
Add meta functions for {}_embedding{}_codegen_forward_{}{}_cuda #2094

Closed
wants to merge 1 commit

Conversation

@ezyang ezyang (Contributor) commented Oct 25, 2023

Summary:
For example, this hits split_embedding_codegen_lookup_rowwise_adagrad_function, among others.

There are a few other ways to write this and I'm open to other options. This version seemed the shortest to write, at the cost of some code duplication. The general strategy: I took embedding_forward_split_template.cu, copy-pasted it into embedding_forward_split_meta_template.cpp, deleted all the CUDA-specific stuff, and made it register a Meta implementation instead of a CUDA implementation. Most parts of the original were excised except the code around the kernel itself, where all of the error checking and output tensor calculation is kept.

The biggest annoyance is the code duplication for the error checking, which is copy-pasted with essentially no modifications from the original CUDA kernel. I could potentially template the original logic so I can transclude it in both places, but that would also have made this diff harder to understand; I'm willing to do it if someone tells me to. Another approach would have been to implement the meta function entirely in Python (which is what I did in my original prototype), but getting it to handle all of the optimizer permutations seemed like a big pain.
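To make the mechanism concrete, here is a minimal hand-written sketch of what such a Meta registration looks like. The op name, argument list, and shape math below are simplified placeholders rather than the actual generated FBGEMM schema; the point is only that the Meta implementation keeps the argument checks and the output tensor calculation, and launches no kernel.

```cpp
// Sketch only: the op name, arguments, and shape computation are simplified
// placeholders, not the real generated FBGEMM signature.
#include <ATen/ATen.h>
#include <torch/library.h>

namespace {

// Meta implementation: same checks and output allocation as the CUDA host
// wrapper, but no kernel launch. Under the Meta dispatch key the inputs are
// meta tensors, so the output allocated from their options is meta as well.
at::Tensor split_embedding_codegen_forward_meta(
    const at::Tensor& dev_weights,
    const at::Tensor& D_offsets,
    const at::Tensor& indices,
    const at::Tensor& offsets,
    int64_t total_D) {
  TORCH_CHECK(indices.dim() == 1, "indices must be 1-D");
  TORCH_CHECK(offsets.dim() == 1, "offsets must be 1-D");
  const auto T = D_offsets.numel() - 1;      // number of embedding tables
  TORCH_CHECK(T > 0, "expected at least one embedding table");
  const auto B = (offsets.numel() - 1) / T;  // batch size
  // Output tensor calculation: only shape and dtype matter; no data is touched.
  return at::empty({B, total_D}, dev_weights.options());
}

} // anonymous namespace

// Register under the Meta key; the existing CUDA registration is untouched.
// (Assumes the operator schema is already defined in the fbgemm library.)
TORCH_LIBRARY_IMPL(fbgemm, Meta, m) {
  m.impl(
      "split_embedding_codegen_forward_placeholder",
      TORCH_FN(split_embedding_codegen_forward_meta));
}
```

The Python alternative mentioned above would register an analogous function from Python (e.g. via torch.library's Meta impl registration), at the cost of spelling out every optimizer permutation by hand.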

Differential Revision: D50653995

netlify bot commented Oct 25, 2023

Deploy Preview for pytorch-fbgemm-docs canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 903de3c |
| 🔍 Latest deploy log | https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/6543b0a3ae41450008ea6889 |

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D50653995

ezyang added a commit to ezyang/FBGEMM that referenced this pull request Nov 1, 2023
…rch#2094)

Reviewed By: zou3519

Differential Revision: D50653995

@facebook-github-bot (Contributor)

This pull request has been merged in 824ef10.
