
Conversation

desertfire
Contributor

@desertfire desertfire commented Jun 20, 2024

Stack from ghstack (oldest at bottom):

Summary: When generating the CUDA kernel load and launch code, certain Triton kernel metadata is needed, but that metadata only exists after kernel auto-tuning is done. DeferredCudaKernelLine is a deferred line that can backfill a string template after kernel auto-tuning. This prepares for the one-pass AOTI codegen implementation.
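The deferred-line mechanism described above can be sketched as follows. This is an illustrative sketch, not the actual PyTorch implementation: the field names, the `render` method, and the metadata keys (`cubin_path`, `shared_mem`) are assumptions chosen to show the idea of emitting a template now and backfilling it once autotuning results exist.

```python
class DeferredCudaKernelLine:
    """A wrapper-code line backfilled from per-kernel autotune metadata."""

    def __init__(self, kernel_name, template, keys):
        self.kernel_name = kernel_name
        self.template = template  # e.g. 'loadKernel("{cubin_path}", {shared_mem});'
        self.keys = keys          # metadata fields the template consumes

    def render(self, autotune_results):
        # Look up this kernel's metadata and fill in the template.
        meta = autotune_results[self.kernel_name]
        return self.template.format(**{k: meta[k] for k in self.keys})


# One-pass codegen emits plain strings and deferred lines interleaved...
lines = [
    "// begin generated wrapper",
    DeferredCudaKernelLine(
        "triton_poi_fused_0",
        'loadKernel("{cubin_path}", {shared_mem});',
        ["cubin_path", "shared_mem"],
    ),
]

# ...and backfills the deferred ones once autotuning has produced metadata.
results = {"triton_poi_fused_0": {"cubin_path": "/tmp/k.cubin", "shared_mem": 4096}}
final = [l if isinstance(l, str) else l.render(results) for l in lines]
print(final[1])
```

The point of the design is that codegen stays single-pass: the wrapper source is emitted in order, and only the few lines that depend on autotune results are revisited.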

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @chauhang

Differential Revision: D61018114

@pytorch-bot

pytorch-bot bot commented Jun 20, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129135

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit dc0e943 with merge base 92151c8 (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

desertfire added a commit that referenced this pull request Jun 21, 2024
Summary: Similar to #129135, use DeferredCudaGridLine to create a deferred grid computation line when generating cpp wrapper.

[ghstack-poisoned]
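The companion deferred-grid idea referenced in the commit above can be sketched similarly. This is a hedged illustration with assumed names (the real PyTorch API may differ): the launch grid depends on block sizes picked by the autotuner (e.g. `XBLOCK`), so the grid-computation line is rendered only after autotuning.

```python
from math import ceil


class DeferredCudaGridLine:
    """A deferred grid-computation line for the generated cpp wrapper."""

    def __init__(self, kernel_name, numel):
        self.kernel_name = kernel_name
        self.numel = numel  # total number of elements the kernel covers

    def render(self, autotune_results):
        # Grid size along x is ceil(numel / XBLOCK), known only after autotune.
        xblock = autotune_results[self.kernel_name]["XBLOCK"]
        grid_x = ceil(self.numel / xblock)
        return f"Grid {self.kernel_name}_grid = {{{grid_x}, 1, 1}};"


line = DeferredCudaGridLine("triton_poi_fused_0", 1000)
print(line.render({"triton_poi_fused_0": {"XBLOCK": 128}}))
```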
desertfire added a commit that referenced this pull request Jun 21, 2024
Summary: Similar to #129135, use DeferredCudaGridLine to create a deferred grid computation line when generating cpp wrapper.

ghstack-source-id: f473a52
Pull Request resolved: #129268
@desertfire desertfire requested a review from eellison August 14, 2024 13:08
@desertfire
Contributor Author

@desertfire has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@eellison
Contributor

eellison commented Aug 14, 2024

I can review if no one else does, but maybe one of the other people working on AOT Inductor from AE GPU could review?

@eellison eellison removed their request for review August 14, 2024 22:49
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 19, 2024
@desertfire desertfire added the ciflow/linux-aarch64 linux aarch64 CI workflow label Aug 19, 2024
desertfire added a commit that referenced this pull request Aug 19, 2024
Summary: When generating CUDA kernel load and launch, certain Triton kernel meta data are needed, but those meta data only exist after kernel auto-tune is done. DeferredCudaKernelLine is a deferred line which can backfill a string template after kernel auto-tune.

ghstack-source-id: f35f762
Pull Request resolved: #129135
@desertfire
Contributor Author

@desertfire has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@pytorchbot merge -f 'Landed internally'

(Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

pytorchmergebot pushed a commit that referenced this pull request Aug 27, 2024
Summary: Similar to #129135, use DeferredCudaGridLine to create a deferred grid computation line when generating cpp wrapper.

Differential Revision: [D61800622](https://our.internmc.facebook.com/intern/diff/D61800622)
Pull Request resolved: #129268
Approved by: https://github.com/angelayi
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024
…29268)

Summary: Similar to pytorch#129135, use DeferredCudaGridLine to create a deferred grid computation line when generating cpp wrapper.

Differential Revision: [D61800622](https://our.internmc.facebook.com/intern/diff/D61800622)
Pull Request resolved: pytorch#129268
Approved by: https://github.com/angelayi
@github-actions github-actions bot deleted the gh/desertfire/415/head branch September 29, 2024 02:11

Labels

ciflow/inductor · ciflow/linux-aarch64 · ciflow/trunk · Merged · module: inductor · topic: not user facing

5 participants