-
Notifications
You must be signed in to change notification settings - Fork 25.6k
move uniform_ from TH to ATen and optimize with MKL VSL #30954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
performanceBelow shows the performance compare between original and this pr, the benchmark will compare from:
benchmark code available at this op_bench-py, to reproduce, benchmark uses the metric of numbers processed per second, aka the throughput, the higher the better. single socket
single core
ValidationTo prove the MKL VSL random generator as functional as ATen implementation, test cases are also provided in The test case uses T-test to verify the mean and variance of MKL VSL output aligns with theoretical mean and variance for uniform distribution:
I do cross check the test cases with ATen implementation and MKL VSL implementation, both passed. |
#24781 Migrate uniform_ from the TH to Aten (CPU) |
💊 CircleCI build failures summary and remediationsAs of commit 8097855 (more details on the Dr. CI page):
🕵️ 4 new failures recognized by patternsThe following build failures do not appear to be due to upstream breakages:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need two copies of this code compiled (AVX on/off)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, I just aligned uniform_mkl_kernel
with bernoulli_mkl_kernel
in the same manner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use TensorIterator, please rebase.
Code looks good, please rebase and get rid of errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
cc @pbelevich as it is related to random generators |
I had no time to check in details, but errors looks like related to the introduced change and I cannot merge it as is. |
yes, i was aware of the errors are real... Somehow I have some trouble in understanding the test cases. Still WIP. |
Remove |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rebase
Rebased |
…from TH to ATen)" `uniform_kernel_cpu` is based on #30954 [ghstack-poisoned]
…from TH to ATen)" `uniform_kernel_cpu` is based on #30954 [ghstack-poisoned]
…from TH to ATen)" `uniform_kernel_cpu` is based on #30954 [ghstack-poisoned]
…igrate uniform_ from TH to ATen)" `uniform_kernel_cpu` is based on #30954 [ghstack-poisoned]
…igrate uniform_ from TH to ATen)" `uniform_kernel_cpu` is based on #30954 [ghstack-poisoned]
…from TH to ATen)" `uniform_kernel_cpu` is based on #30954 [ghstack-poisoned]
…igrate uniform_ from TH to ATen)" `uniform_kernel_cpu` is based on #30954 [ghstack-poisoned]
…from TH to ATen)" `uniform_kernel_cpu` is based on #30954 [ghstack-poisoned]
rebased |
…igrate uniform_ from TH to ATen)" `uniform_kernel_cpu` is based on #30954 [ghstack-poisoned]
…from TH to ATen)" `uniform_kernel_cpu` is based on #30954 [ghstack-poisoned]
…igrate uniform_ from TH to ATen)" `uniform_kernel_cpu` is based on #30954 [ghstack-poisoned]
…from TH to ATen)" `uniform_kernel_cpu` is based on #30954 [ghstack-poisoned]
…igrate uniform_ from TH to ATen)" `uniform_kernel_cpu` is based on #30954 [ghstack-poisoned]
…from TH to ATen)" `uniform_kernel_cpu` is based on #30954 [ghstack-poisoned]
…igrate uniform_ from TH to ATen)" `uniform_kernel_cpu` is based on #30954 [ghstack-poisoned]
…from TH to ATen)" `uniform_kernel_cpu` is based on #30954 [ghstack-poisoned]
…igrate uniform_ from TH to ATen)" `uniform_kernel_cpu` is based on #30954 [ghstack-poisoned]
…from TH to ATen)" `uniform_kernel_cpu` is based on #30954 [ghstack-poisoned]
…igrate uniform_ from TH to ATen)" `uniform_kernel_cpu` is based on #30954 Differential Revision: [D20820221](https://our.internmc.facebook.com/intern/diff/D20820221) [ghstack-poisoned]
…from TH to ATen)" `uniform_kernel_cpu` is based on #30954 Differential Revision: [D20820221](https://our.internmc.facebook.com/intern/diff/D20820221) [ghstack-poisoned]
…igrate uniform_ from TH to ATen)" `uniform_kernel_cpu` is based on #30954 Differential Revision: [D20820221](https://our.internmc.facebook.com/intern/diff/D20820221) [ghstack-poisoned]
…from TH to ATen)" `uniform_kernel_cpu` is based on #30954 Differential Revision: [D20820221](https://our.internmc.facebook.com/intern/diff/D20820221) [ghstack-poisoned]
…igrate CPU uniform_ from TH to ATen)" `uniform_kernel_cpu` is based on #30954 Differential Revision: [D20820221](https://our.internmc.facebook.com/intern/diff/D20820221) [ghstack-poisoned]
…rm_ from TH to ATen)" `uniform_kernel_cpu` is based on #30954 Differential Revision: [D20820221](https://our.internmc.facebook.com/intern/diff/D20820221) [ghstack-poisoned]
…ATen) (pytorch#35580) Summary: Pull Request resolved: pytorch#35580 `uniform_kernel_cpu` is based on pytorch#30954 Test Plan: Imported from OSS Differential Revision: D20820221 Pulled By: pbelevich fbshipit-source-id: 13f9fc8fc75b0e9fb48021f2ac08dcb38212a53f
I think we can close this one, as it is already ported. |
This PR moves
uniform_
CPU implementation from TH to ATen, e.g.uniform_cpu_
.When PyTorch is built with MKL support and CPU is deteced to be intel CPU,
uniform_cpu_
will finally call MKL VSL random generator which assures best performance on intel CPUs.Otherwise (MKL is not built or AMD CPU),
uniform_cpu_
will end up in sequential random generator from ATen, this path is identical to old TH implementation.Single socket run speedup: 2.4x~142x
Single core run speedup: 3.2x~13x