Conversation

@jerrymannil
Contributor

@jerrymannil jerrymannil commented Jun 14, 2024

The current implementation is very specific to MI100, which causes performance degradation on other GPUs.

Fixes #128631

Benchmarking on MI300X:

```
Before: 1918.5126953125 ms
After:  0.8285150527954102 ms
```

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang

@jerrymannil jerrymannil requested a review from eqy as a code owner June 14, 2024 22:10
@pytorch-bot

pytorch-bot bot commented Jun 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128750

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 36a1e1a with merge base bca2cf0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@linux-foundation-easycla

linux-foundation-easycla bot commented Jun 14, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: jeffdaily / name: Jeff Daily (36a1e1a)
  • ✅ login: jerrymannil / name: Jerry Mannil (b59ef58)

@pytorch-bot pytorch-bot bot added the module: rocm (AMD GPU support for Pytorch) and release notes: cuda (release notes category) labels Jun 14, 2024
@jeffdaily jeffdaily changed the title [ROCM] Fix fp32 atomicAdd for non-MI100 GPUs [ROCm] Fix fp32 atomicAdd for non-MI100 GPUs Jun 14, 2024
@jeffdaily jeffdaily added the ciflow/rocm (Trigger "default" config CI on ROCm) label Jun 15, 2024
@xw285cornell
Contributor

This is awesome! Can you add some benchmark results for this change?

@facebook-github-bot
Contributor

@xw285cornell has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mikaylagawarecki mikaylagawarecki added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label Jun 17, 2024
@jerrymannil
Contributor Author

jerrymannil commented Jun 17, 2024

@xw285cornell

Benchmarking on MI300X:

```
Before: 1918.5126953125 ms
After:  0.8285150527954102 ms
```

@jerrymannil
Contributor Author

@eqy Can you take a look?

@jataylo jataylo requested review from malfet and xw285cornell June 18, 2024 14:43
@xw285cornell
Contributor

very nice, thank you for the contribution!

@facebook-github-bot
Contributor

@pytorchbot merge -f 'Landed internally'

(Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort; instead, consider -i/--ignore-current to continue the merge while ignoring current failures. That allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

mhalk pushed a commit to mhalk/pytorch that referenced this pull request Oct 29, 2024
Current implementation is very specific to MI100.
This is causing performance degradation for other GPUs.

Fixes pytorch#128631

Benchmarking on MI300X:
```
Before:  1918.5126953125 ms
After: 0.8285150527954102 ms
```

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Pull Request resolved: pytorch#128750
Approved by: https://github.com/xw285cornell

(cherry picked from commit 1f0a68b)
jithunnair-amd pushed a commit to ROCm/pytorch that referenced this pull request Mar 17, 2025

Labels

  • ciflow/rocm — Trigger "default" config CI on ROCm
  • Merged
  • module: rocm — AMD GPU support for Pytorch
  • open source
  • release notes: cuda — release notes category
  • triaged — This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Development

Successfully merging this pull request may close these issues.

torch.gather can be slow on AMD with duplicated index

7 participants