Replacement is irrelevant for 1-sample multinomial #86342

Closed
Conversation

@malfet (Contributor) commented Oct 6, 2022

Since replacement is irrelevant when drawing a single sample, use the fast path on both CPU and MPS.

Also remove some spurious copy-and-pasted checks from the MPS codepath.

CUDA already has this optimization, see

```cpp
if (n_sample == 1 && maxShared >= requiredShared) {
  // Optimized allocation-free implementation
```
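The rationale behind the fast path can be sketched in plain Python (hypothetical helper names; PyTorch's actual kernels are C++/Metal): for a single draw, the "remove the chosen category" step of without-replacement sampling never affects anything that is returned, so both procedures are identical.

```python
import random

def draw_one(weights, rng):
    # Inverse-CDF draw of a single category index, proportional to weights.
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1

def draw_without_replacement(weights, k, rng):
    # Draw k distinct indices by zeroing out each chosen weight.
    weights = list(weights)
    out = []
    for _ in range(k):
        idx = draw_one(weights, rng)
        out.append(idx)
        weights[idx] = 0.0  # "remove" the chosen category
    return out

# For k == 1 the removal step influences nothing that is returned, so both
# procedures consume the same randomness and yield the same index -- which
# is why the with-replacement (fast) kernel can serve both cases.
w = [0.1, 0.5, 0.4]
a = draw_one(w, random.Random(0))
b = draw_without_replacement(w, 1, random.Random(0))[0]
assert a == b
```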

@malfet malfet requested a review from kulinseth as a code owner October 6, 2022 00:14
@pytorch-bot bot commented Oct 6, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86342

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures, 1 Pending

As of commit cbe71cf:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/mps Run MPS tests (subset of trunk) release notes: mps Release notes category labels Oct 6, 2022
@malfet malfet added the topic: performance topic category label Oct 6, 2022
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 6, 2022
@kulinseth (Collaborator) left a comment


We need to enable caching for this op. @malfet, currently the operation is not cached, which leads to recompilation on each call; that has perf implications as well. Can we please enable that support here? You can see how it's done in:

@malfet (Contributor, Author) commented Oct 7, 2022

@kulinseth thank you for pointing this out, will do in a separate PR, as it does not have anything to do with this optimization.

@malfet (Contributor, Author) commented Oct 7, 2022

@pytorchbot merge -g

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks on your PR pass since you used the green (-g) flag (ETA: 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@malfet malfet deleted the malfet/speed-up-multinomial-for-1-sample branch October 7, 2022 00:09
malfet added a commit that referenced this pull request Oct 7, 2022
Reuse existing RandomCachedGraph to keep RNG state as part of the graph

Add `CreateCachedGraphAs` convenience wrapper

Addresses #86342 (review)
pytorchmergebot pushed a commit that referenced this pull request Oct 7, 2022
Reuse existing RandomCachedGraph to keep RNG state as part of the graph
Add `CreateCachedGraphAs` convenience wrapper
Addresses #86342 (review)
Pull Request resolved: #86437
Approved by: https://github.com/kulinseth
facebook-github-bot pushed a commit that referenced this pull request Oct 7, 2022
Summary:
Since replacement is irrelevant when drawing a single sample, use the fast path on both CPU and MPS.

Also remove some spurious copy-and-pasted checks from the MPS codepath.

CUDA already has this optimization, see
https://github.com/pytorch/pytorch/blob/dc9c507d24d0c833cb09105177326f1f6bbe99c4/aten/src/ATen/native/cuda/MultinomialKernel.cu#L355-L356

Pull Request resolved: #86342
Approved by: https://github.com/ngimel

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/2232db7fc12301a2226d1921948917d5b23b6888

Reviewed By: seemethere

Differential Revision: D40167219

Pulled By: seemethere

fbshipit-source-id: 52e66c95e74b7f6b2eefbc232e9ac6cac8ced73c
facebook-github-bot pushed a commit that referenced this pull request Oct 10, 2022
Summary:
Reuse existing RandomCachedGraph to keep RNG state as part of the graph
Add `CreateCachedGraphAs` convenience wrapper
Addresses #86342 (review)

Pull Request resolved: #86437
Approved by: https://github.com/kulinseth

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/10aead9adc20bd45b7692e97a64cb76f114c8e16

Reviewed By: seemethere

Differential Revision: D40196735

Pulled By: seemethere

fbshipit-source-id: 89af8392d697def2a8bef21c79147767b3a8d098
Labels: ciflow/mps (Run MPS tests, subset of trunk), ciflow/trunk (Trigger trunk jobs on your pull request), cla signed, Merged, release notes: mps (Release notes category), topic: performance (topic category)
5 participants