Skip to content

Conversation

ngimel
Copy link
Collaborator

@ngimel ngimel commented Jul 6, 2020

Per title. This is not used currently in the pytorch codebase, but it is a legitimate usecase, and we have extensions that want to do that and are forced to roll their own atomic implementations for non-standard types. Whether atomic op returns old value or not should not affect performance, compiler is able to generate correct code depending on whether return value is used. https://godbolt.org/z/DBU_UW.
Atomic operations for non-standard integer types (1,2 and 8 byte-width) are left as is, with void return.

@ngimel ngimel requested review from ezyang and mcarilli July 6, 2020 19:18
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ngimel merged this pull request in 58d7d91.

@jeffdaily
Copy link
Collaborator

@ngimel would you support a PR to have a non-returning interface in addition to your change here? Since atomic return values are not currently used in the pytorch codebase, it would be self-documenting to use something like atomicAddNoReturn. Specifically for ROCm, we can better optimize when the user opts in like this. Forcing atomic to return a value prohibits our optimization today.

Which extensions are you referring to that utilize atomic return values?

@ngimel
Copy link
Collaborator Author

ngimel commented Jul 8, 2020

Some internal custom ops (optimizers mostly) want to have old value, and they are compiling for double in half because that's how dispatch macros work, so they can't use built-in atomicAdd, and are forced to copy-paste things from THCAtomic adding return value. It's a legitimate request for someone to return old value, so having no mechanism for doing that is not ideal.
Sure, we can have atomiAddNoReturn. Out of curiosity, what's preventing ROCm from doing the same thing that nvcc does, and compile to fast instruction if return is ignored, and slow instruction if it's used? I posted godbolt in the PR description illustrating nvcc behavior.

facebook-github-bot pushed a commit that referenced this pull request Jun 29, 2021
Summary:
Enables an important performance optimization for ROCm, in light of the discussion in #41028.

CC jithunnair-amd sunway513

Pull Request resolved: #60607

Reviewed By: jbschlosser

Differential Revision: D29409894

Pulled By: ngimel

fbshipit-source-id: effca258a0f37eaefa35674a7fd19459ca7dc95b
jeffdaily added a commit to ROCm/pytorch that referenced this pull request Jun 29, 2021
Summary:
Enables an important performance optimization for ROCm, in light of the discussion in pytorch#41028.

CC jithunnair-amd sunway513

Pull Request resolved: pytorch#60607

Reviewed By: jbschlosser

Differential Revision: D29409894

Pulled By: ngimel

fbshipit-source-id: effca258a0f37eaefa35674a7fd19459ca7dc95b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants