[ROCm] Set thread_work_size to 16 for vectorized elementwise kernels for MI300X #160444
Conversation
* A thread_work_size of 16 gives better performance for many elementwise workloads on MI300X. Cherry-pick of ROCm@fb81400. (A rough sketch of the idea follows below.)
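For context, here is a minimal sketch of how a ROCm-specific per-thread work size could be selected at compile time for the vectorized elementwise kernels. This is not the PR's actual diff: the file name, the block size of 256, the default value of 4 on the non-ROCm path, and the exact guard macro are assumptions for illustration.

```cpp
// thread_constants_sketch.h -- hypothetical header, for illustration only.
// Vectorized elementwise kernels process roughly
//   gridDim.x * blockDim.x * thread_work_size()
// elements per launch, so a larger thread_work_size gives each thread a
// bigger tile and reduces the number of blocks launched.

#pragma once

constexpr int num_threads() {
  return 256;  // assumed block size; the real value may differ
}

constexpr int thread_work_size() {
#if defined(USE_ROCM)
  // On MI300X-class GPUs, a larger per-thread tile was observed to
  // perform better for many elementwise workloads (per this PR).
  return 16;
#else
  return 4;  // assumed default for the non-ROCm path
#endif
}

constexpr int block_work_size() {
  return num_threads() * thread_work_size();
}
```

The trade-off this illustrates: a larger thread_work_size amortizes loop and launch overhead across more elements per thread, at the cost of needing enough total work to keep the GPU occupied with fewer blocks.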
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/160444. Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 0a46c2e with merge base 9903ca4. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
[ROCm] Set thread_work_size to 16 for vectorized elementwise kernels for MI300X (#160444)
* thread_work_size of 16 is giving better perf with many workloads for MI300X
cherry-pick of ROCm@fb81400
Pull Request resolved: #160444
Approved by: https://github.com/jeffdaily
cherry-pick of ROCm@fb81400
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd