Replacement is irrelevant for 1-sample multinomial #86342

Closed
Conversation

@malfet (Contributor) commented Oct 6, 2022

Since replacement is irrelevant when drawing a single sample, use the fast path on both CPU and MPS.

Also remove some spurious copy-and-pasted checks from the MPS codepath.

CUDA already has this optimization, see

```cpp
if (n_sample == 1 && maxShared >= requiredShared) {
  // Optimized allocation-free implementation
```
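The rationale behind the fast path can be sketched in plain Python (hypothetical helper names; PyTorch's actual kernels are C++/Metal): for a single draw, the "remove the chosen category" step of without-replacement sampling never affects anything that is returned, so both procedures are identical.

```python
import random

def draw_one(weights, rng):
    # Inverse-CDF draw of a single category index, proportional to weights.
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1

def draw_without_replacement(weights, k, rng):
    # Draw k distinct indices by zeroing out each chosen weight.
    weights = list(weights)
    out = []
    for _ in range(k):
        idx = draw_one(weights, rng)
        out.append(idx)
        weights[idx] = 0.0  # "remove" the chosen category
    return out

# For k == 1 the removal step influences nothing that is returned, so both
# procedures consume the same randomness and yield the same index -- which
# is why the with-replacement (fast) kernel can serve both cases.
w = [0.1, 0.5, 0.4]
a = draw_one(w, random.Random(0))
b = draw_without_replacement(w, 1, random.Random(0))[0]
assert a == b
```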

@malfet malfet requested a review from kulinseth as a code owner October 6, 2022 00:14
@pytorch-bot bot commented Oct 6, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86342

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures, 1 Pending

As of commit cbe71cf:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/mps Run MPS tests (subset of trunk) release notes: mps Release notes category labels Oct 6, 2022
@malfet malfet added the topic: performance topic category label Oct 6, 2022
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 6, 2022
@kulinseth (Collaborator) left a comment


We need to enable caching for this op. @malfet, currently the operation is not cached, which leads to recompilation on each call; that has perf implications as well. Can we please enable that support here? You can see how it's done in:

@malfet (Contributor, Author) commented Oct 7, 2022

@kulinseth thank you for pointing this out, will do in a separate PR, as it does not have anything to do with this optimization.

@malfet (Contributor, Author) commented Oct 7, 2022

@pytorchbot merge -g

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks on your PR pass since you used the green (-g) flag (ETA: 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@malfet malfet deleted the malfet/speed-up-multinomial-for-1-sample branch October 7, 2022 00:09
malfet added a commit that referenced this pull request Oct 7, 2022
Reuse existing RandomCachedGraph to keep RNG state as part of the graph

Add `CreateCachedGraphAs` convenience wrapper

Addresses #86342 (review)
pytorchmergebot pushed a commit that referenced this pull request Oct 7, 2022
Reuse existing RandomCachedGraph to keep RNG state as part of the graph
Add `CreateCachedGraphAs` convenience wrapper
Addresses #86342 (review)
Pull Request resolved: #86437
Approved by: https://github.com/kulinseth
facebook-github-bot pushed a commit that referenced this pull request Oct 7, 2022
Summary:
Since replacement is irrelevant when drawing a single sample, use the fast path on both CPU and MPS.

Also remove some spurious copy-and-pasted checks from the MPS codepath.

CUDA already has this optimization, see
https://github.com/pytorch/pytorch/blob/dc9c507d24d0c833cb09105177326f1f6bbe99c4/aten/src/ATen/native/cuda/MultinomialKernel.cu#L355-L356

Pull Request resolved: #86342
Approved by: https://github.com/ngimel

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/2232db7fc12301a2226d1921948917d5b23b6888

Reviewed By: seemethere

Differential Revision: D40167219

Pulled By: seemethere

fbshipit-source-id: 52e66c95e74b7f6b2eefbc232e9ac6cac8ced73c
facebook-github-bot pushed a commit that referenced this pull request Oct 10, 2022
Summary:
Reuse existing RandomCachedGraph to keep RNG state as part of the graph
Add `CreateCachedGraphAs` convenience wrapper
Addresses #86342 (review)

Pull Request resolved: #86437
Approved by: https://github.com/kulinseth

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/10aead9adc20bd45b7692e97a64cb76f114c8e16

Reviewed By: seemethere

Differential Revision: D40196735

Pulled By: seemethere

fbshipit-source-id: 89af8392d697def2a8bef21c79147767b3a8d098
Labels: ciflow/mps (Run MPS tests, subset of trunk), ciflow/trunk (Trigger trunk jobs on your pull request), cla signed, Merged, release notes: mps (Release notes category), topic: performance (topic category)
5 participants