Replacement is irrelevant for 1-sample multinomial #86342
Conversation
So use the fast path, both on CPU and on MPS. Also, remove some spurious copy-n-paste checks from the MPS codepath.
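Why replacement is irrelevant here: sampling without replacement only differs from sampling with replacement starting from the second draw, because the first drawn category is removed from the pool. With `num_samples == 1` there is no second draw, so both settings are statistically identical and the cheaper with-replacement path can be used unconditionally. A minimal pure-Python sketch of this argument (not the actual ATen kernel, just an illustrative naive sampler):

```python
import random

def multinomial(probs, num_samples, replacement, rng):
    """Naive categorical sampler, illustrative only."""
    out = []
    weights = list(probs)
    for _ in range(num_samples):
        total = sum(weights)
        r = rng.random() * total
        acc = 0.0
        for i, w in enumerate(weights):
            acc += w
            if r < acc:
                out.append(i)
                break
        if not replacement:
            # Remove the drawn category; this only affects LATER draws,
            # so for num_samples == 1 it never changes the result.
            weights[out[-1]] = 0.0
    return out

# Same seed, one sample: both settings produce the same draw.
rng_a, rng_b = random.Random(0), random.Random(0)
probs = [0.1, 0.6, 0.3]
assert multinomial(probs, 1, True, rng_a) == multinomial(probs, 1, False, rng_b)
```

This is the same reasoning the CUDA kernel already relies on when it dispatches one-sample requests to the with-replacement path.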
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86342
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures, 1 Pending. As of commit cbe71cf.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@kulinseth thank you for pointing this out; will do in a separate PR, as it does not have anything to do with this optimization.
@pytorchbot merge -g
Merge started. Your change will be merged once all checks on your PR pass since you used the green (-g) flag (ETA: 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Reuse existing RandomCachedGraph to keep RNG state as part of the graph.
Add `CreateCachedGraphAs` convenience wrapper.
Addresses #86342 (review)
Pull Request resolved: #86437
Approved by: https://github.com/kulinseth
Summary: So use the fast path, both on CPU and on MPS. Also, remove some spurious copy-n-paste checks from the MPS codepath. CUDA already has this optimization, see https://github.com/pytorch/pytorch/blob/dc9c507d24d0c833cb09105177326f1f6bbe99c4/aten/src/ATen/native/cuda/MultinomialKernel.cu#L355-L356
Pull Request resolved: #86342
Approved by: https://github.com/ngimel
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/2232db7fc12301a2226d1921948917d5b23b6888
Reviewed By: seemethere
Differential Revision: D40167219
Pulled By: seemethere
fbshipit-source-id: 52e66c95e74b7f6b2eefbc232e9ac6cac8ced73c
Summary: Reuse existing RandomCachedGraph to keep RNG state as part of the graph. Add `CreateCachedGraphAs` convenience wrapper. Addresses #86342 (review)
Pull Request resolved: #86437
Approved by: https://github.com/kulinseth
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/10aead9adc20bd45b7692e97a64cb76f114c8e16
Reviewed By: seemethere
Differential Revision: D40196735
Pulled By: seemethere
fbshipit-source-id: 89af8392d697def2a8bef21c79147767b3a8d098