[aotinductor] Avoid generating redundant kernel loading code #110510
Conversation
Summary: 1) Stop forcing triton.unique_kernel_names to True for AOTInductor, because the unique kernel name can be read from metadata; 2) Only generate load_kernel once for each kernel since we don't have control flow in our generated code. This solves #105553. [ghstack-poisoned]
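Point (2) above can be sketched as follows. This is a hypothetical illustration of deduplicating kernel-load codegen, not the actual Inductor API; the names `WrapperCodegen`, `emit_load_kernel`, `loadKernel`, and `launchKernel` are all illustrative.

```python
# Hypothetical sketch: emit the load for each kernel only once. Because the
# generated wrapper code has no control flow, a single load statement at the
# first use is guaranteed to execute before every launch of that kernel.
class WrapperCodegen:
    def __init__(self):
        self.lines = []
        self.loaded_kernels = set()  # kernels whose load has already been emitted

    def emit_load_kernel(self, kernel_name):
        if kernel_name in self.loaded_kernels:
            return  # skip redundant load statements
        self.loaded_kernels.add(kernel_name)
        self.lines.append(f'{kernel_name} = loadKernel("{kernel_name}.cubin");')

    def emit_call(self, kernel_name, args):
        self.emit_load_kernel(kernel_name)
        self.lines.append(f"launchKernel({kernel_name}, {', '.join(args)});")

gen = WrapperCodegen()
gen.emit_call("triton_poi_fused_add_0", ["buf0"])
gen.emit_call("triton_poi_fused_add_0", ["buf1"])  # second call emits no reload
print("\n".join(gen.lines))
```

With unique kernel names read from metadata, the seen-set lookup is all that is needed to avoid the redundant loads described in #105553.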
LGTM. Thanks.
@pytorchbot merge -f "only affects AOTInductor tests and they have passed"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Flagging that this caused an aot_inductor regression in the last day.
This handles the cases like pytorch/pytorch#110608 or pytorch/pytorch#110510 where there were a bunch of infra flaky failures in which the runner crashes and no log is found. The `runner_name` and `failure_line` fields are all empty in such cases. Having no associated runner guarantees that the failure is an unrelated infra flake. Testing: on pytorch/pytorch#110608, Dr. CI reported 7 unrelated (flaky) failures and allowed a normal merge; on pytorch/pytorch#110510, it reported 1 new failure plus 7 unrelated failures.
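The heuristic described above can be sketched as a small predicate. This is a hypothetical illustration of the rule as stated (empty `runner_name` and `failure_line` imply an infra flake), not Dr. CI's actual implementation; the function name and record shape are assumptions.

```python
# Hypothetical sketch of the infra-flake heuristic: a failed job record with
# no associated runner and no captured failure line is classified as an
# unrelated infra flake (the runner crashed before producing a log).
def is_infra_flake(job: dict) -> bool:
    return not job.get("runner_name") and not job.get("failure_line")

crashed = {"runner_name": "", "failure_line": ""}
real_failure = {"runner_name": "linux.2xlarge", "failure_line": "AssertionError: ..."}
print(is_infra_flake(crashed), is_infra_flake(real_failure))
```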
Summary: Forward fix a performance regression caused by #110510. Once a model has been run, all those kernel pointers are initialized, and removing the if-nullptr check causes those loadKernel calls to be unnecessarily executed again when we rerun the forward function. Another way to do this is to codegen loadKernel in the initializer, which I may do in a later PR. [ghstack-poisoned]
Pull Request resolved: #110800 Approved by: https://github.com/jansel
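The restored guard can be sketched as follows. This is a hypothetical illustration of what the wrapper codegen emits into the generated C++, shown as a Python codegen helper; `loadKernel` and the kernel name are illustrative, not the exact generated source.

```python
# Hypothetical sketch: the generated C++ keeps each kernel handle in a
# pointer that persists across runs, and reloads only while it is still
# nullptr. Repeated forward() calls therefore skip the load entirely.
def emit_guarded_load(kernel_name: str) -> str:
    return (
        f"if ({kernel_name} == nullptr) {{\n"
        f'    {kernel_name} = loadKernel("{kernel_name}.cubin");\n'
        f"}}\n"
    )

print(emit_guarded_load("triton_poi_fused_mul_1"))
```

Moving the loads into the initializer, as the summary suggests, would remove even the per-call nullptr comparison.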
Summary: To prevent a perf regression like the one caused by #110510. Pull Request resolved: #110972 Approved by: https://github.com/chenyang78
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler