Swap detection order in randperm_out_cuda to avoid unnecessary conversion from float when the input is small. #22103
Conversation
@pytorchbot retest this please
@pytorchbot retest this please
@pytorchbot retest this please
@gchanan @ngimel @yf225 @li-roy Could you give a quick review? This is actually a pretty simple change, if you view the diff without whitespace changes: https://github.com/pytorch/pytorch/pull/22103/files?w=1
Can we have some performance benchmarks? I'm concerned about the int64_t-to-half conversion on the CPU that happens for small tensors as a result of this PR.
Also, initialTensorOptions() implicitly sets the scalar type of the tensor to float, so it works now, but the comment in the file says that it's not a stable API; what if that scalar type ever changes? https://github.com/pytorch/pytorch/blob/4453a1ff887dec226355b375d4f1bfa1eb016728/aten/src/ATen/InitialTensorOptions.h I'd prefer the scalar type to be explicitly set to kFloat.
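For illustration, a minimal sketch of the explicit-dtype allocation being asked for, assuming the Half path allocates its float staging tensor roughly the way the current code does (the helper name below is hypothetical and not part of this PR):

```cpp
#include <ATen/ATen.h>

// Hypothetical helper, not from the PR diff: allocate the CUDA float staging
// tensor used by the Half path with an explicitly spelled-out scalar type.
at::Tensor make_float_staging_tensor(int64_t n) {
  // Rather than initialTensorOptions().device(at::kCUDA), which merely happens
  // to default to float today, state the dtype so a future change to that
  // default cannot silently change the behavior here.
  return at::empty({n}, at::TensorOptions().dtype(at::kFloat).device(at::kCUDA));
}
```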
@ngimel The conversion from […]. I have added an explicit dtype. Is there a way to trigger the benchmark?
I don't think there are ready benchmarks that you can trigger; you can use benchmarks similar to those in the original issue #7606, run before and after your PR.
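The #7606 benchmarks are Python scripts; a libtorch (C++) timing sketch in the same spirit could look like the following, where the size, dtype, and iteration counts are illustrative assumptions:

```cpp
#include <torch/torch.h>
#include <cuda_runtime.h>
#include <chrono>
#include <iostream>

// Rough micro-benchmark sketch for randperm on CUDA with a Half output.
// n is kept small so it exercises the CPU-offload path discussed in this PR.
int main() {
  auto opts = torch::TensorOptions().dtype(torch::kHalf).device(torch::kCUDA);
  const int64_t n = 1000;
  const int iters = 100;

  // Warm-up runs so one-time allocator and kernel-launch costs are excluded.
  for (int i = 0; i < 3; ++i) torch::randperm(n, opts);
  cudaDeviceSynchronize();

  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) torch::randperm(n, opts);
  cudaDeviceSynchronize();
  auto end = std::chrono::steady_clock::now();

  std::cout << std::chrono::duration<double, std::micro>(end - start).count() / iters
            << " us per call\n";
  return 0;
}
```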
Here are the perf numbers. It looks like this patch significantly improves performance for Half on CUDA. I turned off CPU turbo boost and always ran the benchmark three times as a warm-up before the run that is reported, so the results should be reliable.
Before this patch:
After this patch:
I also realized that the |
Swap detection order in randperm_out_cuda to avoid unnecessary conversion from float when the input is small.
Summary: Pull Request resolved: pytorch/pytorch#22103
Test Plan: Imported from OSS
Differential Revision: D16153585
Pulled By: li-roy
fbshipit-source-id: 0801b91e7b352c8de8fdfbe929be85d69182b8da
One important comment is missing from pytorch#22103 (not sure what happened). This commit adds it back.
Summary: One important comment is missing from pytorch/pytorch#22103 (not sure what happened); this commit adds it back.
Pull Request resolved: pytorch/pytorch#22984
Differential Revision: D16347044
Pulled By: ezyang
fbshipit-source-id: 0903909a5fb6740b43195136f1a23c28cfb2a02f
Stack from ghstack:
Previously, when n is small and the dtype is Half, randperm on CUDA
would offload to the CPU with a Float type; this commit changes the
offload to use the Half type directly.
This commit basically swaps the following two blocks: the Half-dtype check and the small-input check (a rough sketch of the reordered code is included below).
Differential Revision: D16153585
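Since the two swapped blocks are not reproduced here, the following is a hedged reconstruction of the reordered control flow; the function name, the kSmallN threshold, and the omitted large-n path are assumptions for illustration, not the literal diff:

```cpp
#include <ATen/ATen.h>

// Hedged reconstruction, not the literal diff: after the swap, the small-n
// check runs before the Half check, so a small Half request goes straight to
// the CPU offload in Half instead of being staged through a float tensor.
constexpr int64_t kSmallN = 30000;  // assumed threshold, for illustration only

at::Tensor& randperm_out_cuda_sketch(at::Tensor& result, int64_t n) {
  result.resize_({n});

  // Small-input block (now checked first): generate on the CPU with the
  // requested dtype, including Half, then copy to the CUDA result.
  if (n < kSmallN) {
    auto result_cpu = at::empty({n}, result.options().device(at::kCPU));
    at::randperm_out(result_cpu, n);
    result.copy_(result_cpu);
    return result;
  }

  // Half block (now checked second): only the large-n CUDA sort path still
  // needs the float staging tensor, since the sort kernel cannot handle Half.
  if (result.scalar_type() == at::ScalarType::Half) {
    auto result_float =
        at::empty({n}, at::TensorOptions().dtype(at::kFloat).device(at::kCUDA));
    result.copy_(randperm_out_cuda_sketch(result_float, n));
    return result;
  }

  // ... large-n, non-Half CUDA path omitted ...
  return result;
}
```

Putting the cheap small-n test first means the Half special case is only paid for on the large-n path, where the float staging is genuinely needed.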