[aotinductor] Avoid generating redundant kernel loading code #110510
Conversation
Summary: 1) Stop forcing triton.unique_kernel_names to True for AOTInductor, because the unique kernel name can be read from metadata; 2) Only generate load_kernel once for each kernel since we don't have control flow in our generated code. This solves #105553. [ghstack-poisoned]
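Point (2) above can be sketched as follows. This is a hypothetical illustration of deduplicating kernel-load codegen, not the actual Inductor API; the names `WrapperCodegen`, `emit_load_kernel`, `loadKernel`, and `launchKernel` are all illustrative.

```python
# Hypothetical sketch: emit the load for each kernel only once. Because the
# generated wrapper code has no control flow, a single load statement at the
# first use is guaranteed to execute before every launch of that kernel.
class WrapperCodegen:
    def __init__(self):
        self.lines = []
        self.loaded_kernels = set()  # kernels whose load has already been emitted

    def emit_load_kernel(self, kernel_name):
        if kernel_name in self.loaded_kernels:
            return  # skip redundant load statements
        self.loaded_kernels.add(kernel_name)
        self.lines.append(f'{kernel_name} = loadKernel("{kernel_name}.cubin");')

    def emit_call(self, kernel_name, args):
        self.emit_load_kernel(kernel_name)
        self.lines.append(f"launchKernel({kernel_name}, {', '.join(args)});")

gen = WrapperCodegen()
gen.emit_call("triton_poi_fused_add_0", ["buf0"])
gen.emit_call("triton_poi_fused_add_0", ["buf1"])  # second call emits no reload
print("\n".join(gen.lines))
```

With unique kernel names read from metadata, the seen-set lookup is all that is needed to avoid the redundant loads described in #105553.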
LGTM. Thanks.
@pytorchbot merge -f "only affects AOTInductor tests and they have passed"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Flagging that this caused an aot_inductor regression in the last day.
This handles the cases like pytorch/pytorch#110608 or pytorch/pytorch#110510 where there were a bunch of infra flaky failures in which the runner crashes and no log is found. The `runner_name` and `failure_line` fields are all empty in such cases. Having no associated runner guarantees that the failure is an unrelated infra flake. Testing: on pytorch/pytorch#110608, Dr. CI reported 7 unrelated (flaky) failures and allowed a normal merge; on pytorch/pytorch#110510, it reported 1 new failure plus 7 unrelated failures.
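The heuristic described above can be sketched as a small predicate. This is a hypothetical illustration of the rule as stated (empty `runner_name` and `failure_line` imply an infra flake), not Dr. CI's actual implementation; the function name and record shape are assumptions.

```python
# Hypothetical sketch of the infra-flake heuristic: a failed job record with
# no associated runner and no captured failure line is classified as an
# unrelated infra flake (the runner crashed before producing a log).
def is_infra_flake(job: dict) -> bool:
    return not job.get("runner_name") and not job.get("failure_line")

crashed = {"runner_name": "", "failure_line": ""}
real_failure = {"runner_name": "linux.2xlarge", "failure_line": "AssertionError: ..."}
print(is_infra_flake(crashed), is_infra_flake(real_failure))
```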
Summary: Forward fix a performance regression caused by #110510. Once a model has been run, all those kernel pointers are initialized, and removing the if-nullptr check causes those loadKernel calls to be unnecessarily executed again when we rerun the forward function. Another way to do this is to codegen loadKernel in the initializer, which I may do in a later PR. [ghstack-poisoned]
Pull Request resolved: #110800 Approved by: https://github.com/jansel
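The restored guard can be sketched as follows. This is a hypothetical illustration of what the wrapper codegen emits into the generated C++, shown as a Python codegen helper; `loadKernel` and the kernel name are illustrative, not the exact generated source.

```python
# Hypothetical sketch: the generated C++ keeps each kernel handle in a
# pointer that persists across runs, and reloads only while it is still
# nullptr. Repeated forward() calls therefore skip the load entirely.
def emit_guarded_load(kernel_name: str) -> str:
    return (
        f"if ({kernel_name} == nullptr) {{\n"
        f'    {kernel_name} = loadKernel("{kernel_name}.cubin");\n'
        f"}}\n"
    )

print(emit_guarded_load("triton_poi_fused_mul_1"))
```

Moving the loads into the initializer, as the summary suggests, would remove even the per-call nullptr comparison.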
Summary: To prevent a perf regression like the one caused by #110510. Pull Request resolved: #110972 Approved by: https://github.com/chenyang78
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler