[Triton] [Inductor] Enable Epilogue Subtiling in the blackwell ws template #163145
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163145
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 3bdfdae with merge base c261c71.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
[Triton] [Inductor] Enable Epilogue Subtiling in the blackwell ws template (pytorch#163145)
Summary: Enables support for epilogue subtiling in the blackwell ws template. This requires the ability to call `store_output` twice in the same kernel and reuse the same tensor descriptor across allocations.
Test Plan: Tested with test_max_autotune.py on a Blackwell server.
Differential Revision: D82610077
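The descriptor-reuse requirement can be illustrated with a small sketch. This is not the Inductor implementation: `DescriptorCache`, its `(shape, strides)` keying, and the dict-based "descriptor" are hypothetical stand-ins showing how two `store_output` calls against the same output layout could share one tensor descriptor instead of allocating two.

```python
class DescriptorCache:
    """Hypothetical sketch of reusing one tensor descriptor across stores.

    Descriptors are keyed by (shape, strides), so a second store of a
    tensor with the same layout reuses the cached descriptor rather than
    allocating a fresh one.
    """

    def __init__(self):
        self._cache = {}
        self.allocations = 0  # counts distinct descriptor allocations

    def get(self, shape, strides):
        key = (tuple(shape), tuple(strides))
        if key not in self._cache:
            self.allocations += 1
            self._cache[key] = {"shape": key[0], "strides": key[1]}
        return self._cache[key]


cache = DescriptorCache()
# Two store_output-style calls against the same (M, N) output layout:
d1 = cache.get((128, 256), (256, 1))
d2 = cache.get((128, 256), (256, 1))
assert d1 is d2 and cache.allocations == 1
```

In the real template the descriptor describes global-memory layout for a TMA store; the point of the sketch is only that the second store reuses, rather than reallocates, that descriptor.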
Force-pushed c46dda0 to cce8942.
Force-pushed cce8942 to 4fdb022.
Here is an example of the output code with addmm (omitted the definition of …)
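Since the generated addmm output code is elided above, here is a minimal NumPy sketch of the computation pattern epilogue subtiling produces: the addmm epilogue (`beta * input + alpha * acc`) is applied and stored once per column subtile rather than once over the full output tile. The function name and the `subtile_n` parameter are illustrative, not taken from the PR; in the real kernel the per-subtile store reduces the epilogue's register footprint.

```python
import numpy as np


def addmm_subtiled(inp, mat1, mat2, alpha=1.0, beta=1.0, subtile_n=64):
    """Sketch of addmm (beta*inp + alpha*(mat1 @ mat2)) with a subtiled
    epilogue: the accumulator is computed once, then the epilogue and the
    store run per column subtile of width subtile_n."""
    m, n = mat1.shape[0], mat2.shape[1]
    out = np.empty((m, n), dtype=np.result_type(inp, mat1, mat2))
    acc = mat1 @ mat2  # main-loop accumulation (registers in a real kernel)
    for n0 in range(0, n, subtile_n):
        n1 = min(n0 + subtile_n, n)
        # Epilogue + store for one subtile; in the template this is the
        # second of the two store_output calls sharing a descriptor.
        out[:, n0:n1] = beta * inp[:, n0:n1] + alpha * acc[:, n0:n1]
    return out
```

The result is bitwise-equivalent to the untiled epilogue; only the order and granularity of the stores change.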
Force-pushed 4fdb022 to d15d75e.
Force-pushed b75faba to 64952e8.
Force-pushed d15d75e to b75faba.
Force-pushed 64952e8 to 5cf7a4a.
@eellison My Blackwell-specific tests all pass. This includes a fix to the tests that may unblock other tests. Once this lands I can help with getting …
@pytorchbot merge
Merge failed. Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/reexporting the PR! (Details for Dev Infra team: raised by workflow job.)
@pytorchbot merge
Merge failed. Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/reexporting the PR! (Details for Dev Infra team: raised by workflow job.)
Force-pushed add3a59 to 3bdfdae.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
[Triton] [Inductor] Enable Epilogue Subtiling in the blackwell ws template (pytorch#163145)
Summary: Enables support for epilogue subtiling in the blackwell ws template. This requires the ability to call `store_output` twice in the same kernel and reuse the same tensor descriptor across allocations.
Test Plan: Tested with test_max_autotune.py on a Blackwell server.
Differential Revision: D82610077
Pull Request resolved: pytorch#163145
Approved by: https://github.com/eellison
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @mlazos