[PyTorch] AOTI: generate reused thread_locals when tensors provably have static shape #110892
swolchok wants to merge 6 commits into gh/swolchok/588/base
Conversation
If a Tensor can be reused and has static shape, we can just cache it across iterations. This and the following diff are meant as a quickly shippable overhead reduction for CPU overhead-bound use cases, without relying on memory planning.

Differential Revision: [D50023678](https://our.internmc.facebook.com/intern/diff/D50023678/)
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110892
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 unrelated failure.) As of commit b591468 with merge base 2edc75a. FLAKY: the following job failed, but was likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
desertfire
left a comment
Left comments in the internal diff. Let me know when you think this is ready for review.
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: New commits were pushed while merging. Please rerun the merge command.
@pytorchbot merge
Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
…on-AOT mode" We found performance regression when using cpp wrapper in non-AOT mode due to the change in #110892. #110892 only handles the buffer cache in AOT mode but removes the `reset` call without checking whether AOT mode is on or off. This PR updates the buffer free change to only happen when `V.graph.aot_mode is True`. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
We found performance regression when using cpp wrapper in non-AOT mode due to the change in #110892. #110892 only handles the buffer cache in AOT mode but removes the `reset` call without checking whether AOT mode is on or off. This PR updates the buffer free change to only happen when `V.graph.aot_mode is True`. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
…on-AOT mode" We found performance regression when using cpp wrapper in non-AOT mode due to the change in #110892. #110892 only handles the buffer cache in AOT mode but removes the `reset` call without checking whether AOT mode is on or off. This PR updates the buffer free change to only happen when `V.graph.aot_mode is True`. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
We found performance regression when using cpp wrapper in non-AOT mode due to the change in #110892. #110892 only handles the buffer cache in AOT mode but removes the `reset` call without checking whether AOT mode is on or off. This PR updates the buffer free change to only happen when `V.graph.aot_mode is True`. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
…on-AOT mode" We found performance regression when using cpp wrapper in non-AOT mode due to the change in #110892. #110892 only handles the buffer cache in AOT mode but removes the `reset` call without checking whether AOT mode is on or off. This PR updates the buffer free change to only happen when `V.graph.aot_mode is True`. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
We found performance regression when using cpp wrapper in non-AOT mode due to the change in #110892. #110892 only handles the buffer cache in AOT mode but removes the `reset` call without checking whether AOT mode is on or off. This PR updates the buffer free change to only happen when `V.graph.aot_mode is True`. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
We found a performance regression when using the cpp wrapper in non-AOT mode due to the change in #110892, which only handles the buffer cache in AOT mode but removed the `reset` call without checking whether AOT mode is on or off. This PR updates the buffer-free change to happen only when `V.graph.aot_mode is True`.

Pull Request resolved: #114741
Approved by: https://github.com/jgong5, https://github.com/desertfire
Stack from ghstack (oldest at bottom):
If a Tensor can be reused and has static shape, we can just cache it across iterations.
This is meant as a quickly shippable overhead reduction for CPU overhead-bound use cases, without relying on memory planning.
Differential Revision: D50023678
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler