
Avoid cloning gradient tensor in embedding backward pass #2526

Closed
jhadidjojo wants to merge 1 commit

Conversation

jhadidjojo
Contributor

Summary: I found a memory spike during the embedding backward kernel `split_embedding_backward_codegen_rowwise_adagrad_unweighted_exact_cuda`, which I traced to code that makes a clone of the gradient tensor. This logic didn't seem to be present in the original code: https://github.com/pytorch/FBGEMM/pull/2347/files#diff-944ab49dcbcf54826cc3e1eab5e3c0c787b5a195f602c2d3052adae14c506d78.

Differential Revision: D56420646
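
For context, a minimal sketch of the idea (illustrative helper name, not the exact FBGEMM code): materialize a copy of grad_output only when its alignment or layout actually requires it, and otherwise reuse the incoming tensor so the common path allocates nothing.

```cpp
#include <ATen/ATen.h>
#include <cstdint>

// Illustrative-only helper (not FBGEMM's actual code): copy grad_output
// only when the kernel's assumptions would otherwise be violated.
at::Tensor maybe_align_grad_output(const at::Tensor& grad_output) {
  const bool misaligned =
      reinterpret_cast<std::uintptr_t>(grad_output.data_ptr()) % 16 != 0;
  if (misaligned) {
    // Copy only when the alignment assumption would be violated.
    return grad_output.clone(at::MemoryFormat::Contiguous);
  }
  if (!grad_output.is_contiguous()) {
    // .contiguous() copies only for non-contiguous inputs.
    return grad_output.contiguous();
  }
  // Already usable: return a cheap handle copy that shares the same
  // storage, so no extra memory is allocated on the common path.
  return grad_output;
}
```

The point of the fix is the last branch: on well-formed inputs the function hands back the same storage instead of an unconditional clone, which is where the memory spike came from.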

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D56420646


netlify bot commented Apr 22, 2024

Deploy Preview for pytorch-fbgemm-docs ready!

🔨 Latest commit: 3f16122
🔍 Latest deploy log: https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/6629059bd7f6f200086dbaa1
😎 Deploy Preview: https://deploy-preview-2526--pytorch-fbgemm-docs.netlify.app

@jhadidjojo jhadidjojo self-assigned this Apr 22, 2024
@flaviotruzzi
Contributor

flaviotruzzi commented Apr 22, 2024

This will break the signature for const, will it not? If I recall correctly, the reason for that clone was that we moved the memory alignment handling into the operator: inside the CUDA kernel, the signatures require the gradient to be const. If we don't do it inside the operator, we need to figure out a way to adjust the meta kernel, which doesn't have access to the data pointers to check the alignment.
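
To illustrate the constraint being raised here (a hypothetical sketch, not FBGEMM's actual registration code): an alignment check needs a real data pointer, which only the concrete CUDA implementation has; a meta implementation only propagates shapes and dtypes, so it cannot decide whether a copy is needed.

```cpp
#include <ATen/ATen.h>
#include <cstdint>

// CUDA-side code can inspect the actual storage, so an alignment check
// (and any copy it triggers) can live here.
bool is_16_byte_aligned(const at::Tensor& t) {
  return reinterpret_cast<std::uintptr_t>(t.data_ptr()) % 16 == 0;
}

// A meta implementation only reasons about shapes/dtypes; meta tensors
// have no storage, so calling data_ptr() here would throw. Any
// alignment-dependent behavior therefore has to stay in the real operator.
at::Tensor backward_meta_sketch(const at::Tensor& grad_output) {
  return at::empty_like(grad_output);
}
```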

@jhadidjojo
Contributor Author

jhadidjojo commented Apr 22, 2024

This will break the signature for const, will it not?

No, I don't think so. You're right that grad_output is a const reference, and trying to assign a new value to it will error (that's probably the error you saw?). Here I'm creating a new (non-const) reference called aligned_grad_output, and it can be bound to grad_output without making a copy.
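
A small sketch of what is being described (names are illustrative, not the PR's exact code): the const parameter is never reassigned, and assigning one at::Tensor to another copies only the reference-counted handle, not the underlying data, so the fast path stays copy-free.

```cpp
#include <ATen/ATen.h>

// Sketch only: aligned_grad_output either shares grad_output's storage
// (no data copy) or, when needed, points at a freshly materialized
// contiguous tensor; grad_output itself is never modified.
void backward_host_sketch(const at::Tensor& grad_output) {
  at::Tensor aligned_grad_output = grad_output;      // handle copy, shares storage
  if (!aligned_grad_output.is_contiguous()) {
    aligned_grad_output = grad_output.contiguous();  // data copy only when required
  }
  // ... pass aligned_grad_output to the CUDA kernel ...
}
```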

@flaviotruzzi
Contributor

Cool. I'm only on my phone and still on leave; can you check with someone else to approve it?

@jhadidjojo
Contributor Author

@pytorchbot merge


pytorch-bot bot commented Apr 23, 2024

This PR needs to be approved by an authorized maintainer before merge.

Summary:

I found a memory spike during the embedding backward kernel `split_embedding_backward_codegen_rowwise_adagrad_unweighted_exact_cuda`, which I traced to code that makes a clone of the gradient tensor. This logic didn't seem to be present in the original code: https://github.com/pytorch/FBGEMM/pull/2347/files#diff-944ab49dcbcf54826cc3e1eab5e3c0c787b5a195f602c2d3052adae14c506d78.

Reviewed By: ezyang

Differential Revision: D56420646
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D56420646

@facebook-github-bot
Contributor

This pull request has been merged in a75037b.
