[PyTorch] AOTI: generate reused thread_locals when tensors provably have static shape #110892
swolchok wants to merge 6 commits into gh/swolchok/588/base
Conversation
If a Tensor can be reused and has static shape, we can just cache it across iterations. This and the following diff are meant as a quickly shippable overhead reduction for CPU overhead-bound use cases, without relying on memory planning.

Differential Revision: [D50023678](https://our.internmc.facebook.com/intern/diff/D50023678/)
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110892
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 unrelated failure.) As of commit b591468 with merge base 2edc75a. FLAKY: the following job failed, but was likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
desertfire
left a comment
Left comments in the internal diff. Let me know when you think this is ready for review.
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: New commits were pushed while merging. Please rerun the merge command.
@pytorchbot merge
Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
…on-AOT mode" We found performance regression when using cpp wrapper in non-AOT mode due to the change in #110892. #110892 only handles the buffer cache in AOT mode but removes the `reset` call without checking whether AOT mode is on or off. This PR updates the buffer free change to only happen when `V.graph.aot_mode is True`. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
We found performance regression when using cpp wrapper in non-AOT mode due to the change in #110892. #110892 only handles the buffer cache in AOT mode but removes the `reset` call without checking whether AOT mode is on or off. This PR updates the buffer free change to only happen when `V.graph.aot_mode is True`. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
…on-AOT mode" We found performance regression when using cpp wrapper in non-AOT mode due to the change in #110892. #110892 only handles the buffer cache in AOT mode but removes the `reset` call without checking whether AOT mode is on or off. This PR updates the buffer free change to only happen when `V.graph.aot_mode is True`. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
We found performance regression when using cpp wrapper in non-AOT mode due to the change in #110892. #110892 only handles the buffer cache in AOT mode but removes the `reset` call without checking whether AOT mode is on or off. This PR updates the buffer free change to only happen when `V.graph.aot_mode is True`. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
…on-AOT mode" We found performance regression when using cpp wrapper in non-AOT mode due to the change in #110892. #110892 only handles the buffer cache in AOT mode but removes the `reset` call without checking whether AOT mode is on or off. This PR updates the buffer free change to only happen when `V.graph.aot_mode is True`. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
We found performance regression when using cpp wrapper in non-AOT mode due to the change in #110892. #110892 only handles the buffer cache in AOT mode but removes the `reset` call without checking whether AOT mode is on or off. This PR updates the buffer free change to only happen when `V.graph.aot_mode is True`. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
We found a performance regression when using the cpp wrapper in non-AOT mode due to the change in #110892, which only handles the buffer cache in AOT mode but removed the `reset` call without checking whether AOT mode is on or off. This PR updates the buffer-free change to happen only when `V.graph.aot_mode is True`.

Pull Request resolved: #114741
Approved by: https://github.com/jgong5, https://github.com/desertfire
Stack from ghstack (oldest at bottom):
If a Tensor can be reused and has static shape, we can just cache it across iterations.
This is meant as a quickly shippable overhead reduction for CPU overhead-bound use cases, without relying on memory planning.
Differential Revision: D50023678
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler