
Add meta functions for {}_embedding{}_codegen_forward_{}{}_cuda #2094

Closed
wants to merge 1 commit

Conversation

@ezyang ezyang (Contributor) commented Oct 25, 2023

Summary:
For example, this hits split_embedding_codegen_lookup_rowwise_adagrad_function, among others.

There are a few other ways to write this and I'm open to other options. This version seemed the shortest to write, at the cost of some code duplication. The general strategy: I took embedding_forward_split_template.cu, copy-pasted it into embedding_forward_split_meta_template.cpp, deleted all the CUDA-specific stuff, and made it register a Meta implementation instead of a CUDA implementation. Most parts of the original were excised except the code around the kernel itself, where all of the error checking and output tensor calculation is kept.

The biggest annoyance is the code duplication for the error checking, which is copy-pasted with essentially no modifications from the original CUDA kernel. I could potentially template the original logic so I can transclude it in both places, but that would also have made this diff harder to understand; I'm willing to do it if someone tells me to. Another approach would have been to implement the meta function entirely in Python (which is what I did in my original prototype), but getting it to handle all of the optimizer permutations seemed like a big pain.
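To make the mechanism concrete, here is a minimal hand-written sketch of what such a Meta registration looks like. The op name, argument list, and shape math below are simplified placeholders rather than the actual generated FBGEMM schema; the point is only that the Meta implementation keeps the argument checks and the output tensor calculation, and launches no kernel.

```cpp
// Sketch only: the op name, arguments, and shape computation are simplified
// placeholders, not the real generated FBGEMM signature.
#include <ATen/ATen.h>
#include <torch/library.h>

namespace {

// Meta implementation: same checks and output allocation as the CUDA host
// wrapper, but no kernel launch. Under the Meta dispatch key the inputs are
// meta tensors, so the output allocated from their options is meta as well.
at::Tensor split_embedding_codegen_forward_meta(
    const at::Tensor& dev_weights,
    const at::Tensor& D_offsets,
    const at::Tensor& indices,
    const at::Tensor& offsets,
    int64_t total_D) {
  TORCH_CHECK(indices.dim() == 1, "indices must be 1-D");
  TORCH_CHECK(offsets.dim() == 1, "offsets must be 1-D");
  const auto T = D_offsets.numel() - 1;      // number of embedding tables
  TORCH_CHECK(T > 0, "expected at least one embedding table");
  const auto B = (offsets.numel() - 1) / T;  // batch size
  // Output tensor calculation: only shape and dtype matter; no data is touched.
  return at::empty({B, total_D}, dev_weights.options());
}

} // anonymous namespace

// Register under the Meta key; the existing CUDA registration is untouched.
// (Assumes the operator schema is already defined in the fbgemm library.)
TORCH_LIBRARY_IMPL(fbgemm, Meta, m) {
  m.impl(
      "split_embedding_codegen_forward_placeholder",
      TORCH_FN(split_embedding_codegen_forward_meta));
}
```

The Python alternative mentioned above would register an analogous function from Python (e.g. via torch.library's Meta impl registration), at the cost of spelling out every optimizer permutation by hand.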

Differential Revision: D50653995

netlify bot commented Oct 25, 2023

Deploy Preview for pytorch-fbgemm-docs canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 903de3c |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/6543b0a3ae41450008ea6889 |

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D50653995

ezyang added a commit to ezyang/FBGEMM that referenced this pull request Nov 1, 2023
…rch#2094)

Reviewed By: zou3519

Differential Revision: D50653995

@facebook-github-bot (Contributor)

This pull request has been merged in 824ef10.
