[AOTI] Support module buffer mutation #123164
Conversation
Summary: Fixes #120424. Module buffers may be mutated during a forward pass, so AOTI needs to allow buffer mutation. This will also be a necessary step if we want to extend AOTI to training. [ghstack-poisoned]
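A minimal eager-mode sketch of the pattern this PR enables (this is not the AOTI implementation itself; `StepCounter` and its names are hypothetical): a module whose forward pass mutates a registered buffer in place, which AOTI previously rejected because constants were treated as read-only.

```python
import torch

class StepCounter(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # A buffer is module state that is not a Parameter; before this PR,
        # AOTI treated such constants as read-only.
        self.register_buffer("steps", torch.zeros(1))

    def forward(self, x):
        self.steps += 1  # in-place buffer mutation during the forward pass
        return x + self.steps

m = StepCounter()
out = m(torch.zeros(3))
print(m.steps.item())  # the buffer advanced as a side effect of forward
```

Exporting a module like this through AOTI requires the compiled artifact to treat the buffer as writable state rather than a frozen constant.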
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/123164
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 unrelated failures) As of commit 4faec2d with merge base bcb6e5a:
BROKEN TRUNK - The following job failed but was present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures.
UNSTABLE - The following job failed, but likely due to flakiness present on trunk, and has been marked as unstable.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Reland #122824 because there was a merge conflict.
Stamping
LGTM (provided that Mac CI is green)
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 mandatory check(s) failed. The first few are: Dig deeper by viewing the failures on hud.
@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following 2 checks: pull / linux-docs / build-docs-python-false, pull / linux-focal-cuda12.1-py3.10-gcc9 / test (default, 3, 5, linux.4xlarge.nvidia.gpu). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: Command
Details for Dev Infra team: Raised by workflow job.
@pytorchbot merge -f "Trunk test passed previously; fixed a minor merge conflict"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@desertfire This is the second recent AOTI-related PR that has broken ROCm CI due to no signal; the other one was #122882. The only ROCm CI jobs that run by default on AOTI PRs are the ones for Line 318 in 9288b27.
It seems like it would add about an hour more to the inductor workflow test jobs for both CUDA and ROCm. Would that be acceptable? Testing this out in #123340.
@desertfire Please provide a forward fix for this issue; this is a failing test on trunk:
#123164 removed the code below (so that constants are not read-only) to support module buffer mutation: https://github.com/pytorch/pytorch/blob/a9a9ce6d9cf25f4fb87e1d74c79781dc404f0c59/torch/_inductor/codecache.py#L1685-L1691 However, this may cause relocation overflow when the `.data` section is large. Below is part of the output from `ld --verbose` (`GNU ld (GNU Binutils for Ubuntu) 2.38`): `.data` sits between `.text` and `.bss`, and when `.data` is too large, the relocation of `.text` against `.bss` may overflow during linking. Rename the section to `.ldata` (perhaps that is why `.lrodata` rather than `.rodata` was used previously) so that it no longer sits between the `.text` and `.bss` sections:
```
.text
.rodata
.data
.bss
.lrodata
.ldata
```
We met this issue when fixing #114450 and running the below models on CPU:
- AlbertForMaskedLM
- AlbertForQuestionAnswering
- BlenderbotForCausalLM
- DebertaV2ForMaskedLM
- DebertaV2ForQuestionAnswering
- XGLMForCausalLM

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
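A back-of-the-envelope sketch of the overflow being described: on x86-64, a PC-relative reference from `.text` into `.bss` is typically encoded with a 32-bit signed relocation (e.g. `R_X86_64_PC32`), so its reach is about ±2 GiB. The addresses and sizes below are hypothetical, chosen only to illustrate the failure mode and why moving the large section out from between `.text` and `.bss` fixes it.

```python
# Largest displacement a 32-bit signed relocation can encode (~2 GiB).
REL32_MAX = 2**31 - 1

text_end  = 0x400000            # hypothetical end of .text
data_size = 3 * 2**30           # a 3 GiB .data section holding model weights
bss_start = text_end + data_size  # .bss laid out right after .data

# With the huge .data in between, the .text -> .bss displacement exceeds
# what the 32-bit relocation can encode ("relocation truncated to fit").
displacement = bss_start - text_end
print(displacement > REL32_MAX)

# After renaming the section to .ldata (placed after .bss in the default
# linker script), only small sections separate .text and .bss, so the
# displacement fits comfortably within the 32-bit range.
bss_start_fixed = text_end + 0x1000
print(bss_start_fixed - text_end <= REL32_MAX)
```

The same reasoning explains why read-only constants were placed in `.lrodata` rather than `.rodata`: the `l`-prefixed sections sit past `.bss`, out of the way of short-reach relocations.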
AOTI changes have been breaking on trunk for ROCm because we do not have AOTI testing in the inductor/pull/trunk workflows for ROCm. This PR adds `test_aot_inductor` to the inductor workflow to catch such issues. More context here: #123164 (comment)
Runtime increase for the inductor workflow:
CUDA: PR corresponding to the base commit used for this PR: [100 mins](https://github.com/pytorch/pytorch/actions/runs/8545475047/job/23415210028?pr=123290). This PR: [183 mins](https://github.com/pytorch/pytorch/actions/runs/8562003098/job/23465530389?pr=123340).
ROCm: PR corresponding to the base commit used for this PR: [105 mins](https://github.com/pytorch/pytorch/actions/runs/8545475047/job/23416422145?pr=123290). This PR: [148 mins](https://github.com/pytorch/pytorch/actions/runs/8562003098/job/23466516866?pr=123340).
Pull Request resolved: #123340 Approved by: https://github.com/atalman, https://github.com/desertfire
Summary: Fixes pytorch#120424. Module buffers may be mutated during a forward pass, so AOTI needs to allow buffer mutation. This will also be a necessary step if we want to extend AOTI to training. Pull Request resolved: pytorch#123164 Approved by: https://github.com/digantdesai, https://github.com/malfet, https://github.com/chenyang78, https://github.com/khabinov