[ROCm] Set thread_work_size to 16 for vectorized elementwise kernels for MI300X #160444
Conversation
* A thread_work_size of 16 gives better performance for many elementwise workloads on MI300X. Cherry-pick of ROCm@fb81400. (A rough sketch of the idea follows below.)
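For context, here is a minimal sketch of how a ROCm-specific per-thread work size could be selected at compile time for the vectorized elementwise kernels. This is not the PR's actual diff: the file name, the block size of 256, the default value of 4 on the non-ROCm path, and the exact guard macro are assumptions for illustration.

```cpp
// thread_constants_sketch.h -- hypothetical header, for illustration only.
// Vectorized elementwise kernels process roughly
//   gridDim.x * blockDim.x * thread_work_size()
// elements per launch, so a larger thread_work_size gives each thread a
// bigger tile and reduces the number of blocks launched.

#pragma once

constexpr int num_threads() {
  return 256;  // assumed block size; the real value may differ
}

constexpr int thread_work_size() {
#if defined(USE_ROCM)
  // On MI300X-class GPUs, a larger per-thread tile was observed to
  // perform better for many elementwise workloads (per this PR).
  return 16;
#else
  return 4;  // assumed default for the non-ROCm path
#endif
}

constexpr int block_work_size() {
  return num_threads() * thread_work_size();
}
```

The trade-off this illustrates: a larger thread_work_size amortizes loop and launch overhead across more elements per thread, at the cost of needing enough total work to keep the GPU occupied with fewer blocks.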
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/160444. Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 0a46c2e with merge base 9903ca4. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
[ROCm] Set thread_work_size to 16 for vectorized elementwise kernels for MI300X (#160444)
* thread_work_size of 16 is giving better perf with many workloads for MI300X
cherry-pick of ROCm@fb81400
Pull Request resolved: #160444
Approved by: https://github.com/jeffdaily
cherry-pick of ROCm@fb81400
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd