[inductor] Bypass FX graph cache when we have HigherOrderOperators #123325
Conversation
Summary: The initial motivation was to avoid caching when we have triton higher order ops, but it's probably safer to avoid the cache for all higher order ops and allow/implement if/when we find it necessary. Test Plan: Unit test cribbed from: https://docs-preview.pytorch.org/pytorch/tutorials/2783/recipes/torch_compile_user_defined_triton_kernel_tutorial.html?highlight=triton [ghstack-poisoned]
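For context, a minimal sketch of what such a bypass check might look like (the helper and call site below are illustrative, not the PR's exact code; the real cache lives in torch/_inductor/codecache.py):

import torch
from torch._ops import HigherOrderOperator

def _has_higher_order_ops(gm: torch.fx.GraphModule) -> bool:
    # HOPs (e.g. torch.cond, the user-defined Triton kernel wrapper)
    # appear as HigherOrderOperator node targets in the FX graph.
    return any(
        isinstance(node.target, HigherOrderOperator) for node in gm.graph.nodes
    )

def load_with_cache(gm, example_inputs, compile_fn):
    if _has_higher_order_ops(gm):
        # Bypass the cache entirely and compile normally.
        return compile_fn(gm, example_inputs)
    # ... otherwise compute a cache key and look up / store as usual ...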
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/123325
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit f4dde59 with merge base 19f5033.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…rators" Summary: The initial motivation was to avoid caching when we have triton higher order ops, but it's probably safer to avoid the cache for all higher order ops and allow/implement if/when we find it necessary. Test Plan: Unit test cribbed from: https://docs-preview.pytorch.org/pytorch/tutorials/2783/recipes/torch_compile_user_defined_triton_kernel_tutorial.html?highlight=triton cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
…perators" Summary: The initial motivation was to avoid caching when we have triton higher order ops, but it's probably safer to avoid the cache for all higher order ops and allow/implement if/when we find it necessary. Test Plan: Unit test cribbed from: https://docs-preview.pytorch.org/pytorch/tutorials/2783/recipes/torch_compile_user_defined_triton_kernel_tutorial.html?highlight=triton cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
Summary: The initial motivation was to avoid caching when we have triton higher order ops, but it's probably safer to avoid the cache for all higher order ops and allow/implement if/when we find it necessary. Test Plan: Unit test cribbed from: https://docs-preview.pytorch.org/pytorch/tutorials/2783/recipes/torch_compile_user_defined_triton_kernel_tutorial.html?highlight=triton ghstack-source-id: 8496a9d Pull Request resolved: #123325
…perators" Summary: The initial motivation was to avoid caching when we have triton higher order ops, but it's probably safer to avoid the cache for all higher order ops and allow/implement if/when we find it necessary. Test Plan: Unit test cribbed from: https://docs-preview.pytorch.org/pytorch/tutorials/2783/recipes/torch_compile_user_defined_triton_kernel_tutorial.html?highlight=triton cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
Summary: The initial motivation was to avoid caching when we have triton higher order ops, but it's probably safer to avoid the cache for all higher order ops and allow/implement if/when we find it necessary. Test Plan: Unit test cribbed from: https://docs-preview.pytorch.org/pytorch/tutorials/2783/recipes/torch_compile_user_defined_triton_kernel_tutorial.html?highlight=triton ghstack-source-id: 5b8bc4c Pull Request resolved: #123325
…perators" Summary: The initial motivation was to avoid caching when we have triton higher order ops, but it's probably safer to avoid the cache for all higher order ops and allow/implement if/when we find it necessary. Test Plan: Unit test cribbed from: https://docs-preview.pytorch.org/pytorch/tutorials/2783/recipes/torch_compile_user_defined_triton_kernel_tutorial.html?highlight=triton cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler amjames desertfire chauhang [ghstack-poisoned]
Summary: The initial motivation was to avoid caching when we have triton higher order ops, but it's probably safer to avoid the cache for all higher order ops and allow/implement if/when we find it necessary. Test Plan: Unit test cribbed from: https://docs-preview.pytorch.org/pytorch/tutorials/2783/recipes/torch_compile_user_defined_triton_kernel_tutorial.html?highlight=triton ghstack-source-id: 2018983 Pull Request resolved: #123325
test/inductor/test_codecache.py (outdated)

@triton.jit
def add_kernel(
no need to redefine this, import it from torch/testing/_internal/triton_utils.py
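For example, the test could reuse the existing helper instead of redefining it (assuming the kernel is importable under this name, as the reviewer suggests):

from torch.testing._internal.triton_utils import add_kernel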
# HigherOrderOperators should be handled on a case-by-case basis.
# Currently, we just skip caching if we have any.
should we be skipping all of them by default? i assume cond/map etc should just work out of the box
This was @eellison's suggestion. Elias, what do you think? What is the set of higher order ops? Is it currently just triton + cond/map, or are there others as well? And if there are others, is there a criteria to distinguish which we expect to work vs. not?
A subset of all higher order ops: https://fburl.com/biggrep/h06kfrgg
@oulgen Ah thanks. So there are several. So do you think I should allowlist cond/map here specifically, i.e., because they're important? Or were those just two semi-random examples and you're advocating to allow more?
I suspect they are more popular HOPs but perhaps @zou3519 would have a more definitive answer on what is safe here
If I'm understanding correctly, we cache the compiled FX graph from inductor, and this PR is saying that a graph that contains HOPs cannot be cached by default.
If so, that sounds fine to me -- caching for things like torch.cond gets a bit complicated because you also want to walk through the subgraphs. cc @aakhundov
@zou3519 so Sam added a cache from FX graph to the inductor-generated Python file. This does not work for triton HOPs because the index into the dynamo-side kernel table needs to match, which is tricky. But for torch.cond etc. everything is in the graph, so I think that should be fine to cache.
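A rough illustration of that hazard (the table below is a stand-in for the real one in torch/_higher_order_ops/triton_kernel_wrap.py; all names here are illustrative):

# Dynamo registers each user-defined Triton kernel in a process-local
# side table; the FX graph only carries the integer index.
_kernel_side_table = []

def register_kernel(kernel) -> int:
    _kernel_side_table.append(kernel)
    return len(_kernel_side_table) - 1

# A cached artifact baked against, say, idx=3 in one process may point at
# a different kernel (or nothing) in a fresh process, because the table
# is rebuilt from scratch on every run -- hence the cache bypass.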
Is the cache from FX GraphModule to inductor generated python file? (And do we recurse into the submodules of the FX GraphModule? That's what's needed to get other HOPS like cond to work)
Is the cache from FX GraphModule to inductor generated python file?
Exactly.
This PR is saying that a graph that contains HOPs cannot be cached by default.
Correct. In other words, if we find any HOPs, then we bypass the cache altogether and compile normally.
And do we recurse into the submodules of the FX GraphModule?
Hmmm, I guess I still don't know enough about inductor compilation to answer that. Maybe @eellison can help give a definite answer. Is the question about how we compute the key to look up entries or the makeup of the cache entries themselves? I don't know if a code pointer will help, but this is where the cache plugs in: https://github.com/pytorch/pytorch/blob/main/torch/_inductor/compile_fx.py#L464-L471. So at a high level, we compute the cache key from the FX GraphModule (the string representation of the code actually) and the inputs, and then we cache the output of fx_codegen_and_compile()
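Concretely, a sketch of that key computation (simplified; the real implementation hashes more state, e.g. inductor config and torch version):

import hashlib
import torch

def fx_graph_cache_key(gm: torch.fx.GraphModule, example_inputs) -> str:
    # Key on the printed FX code plus input metadata. Note that a
    # submodule invoked via get_attr (as cond/map subgraphs are) does not
    # appear in gm.code, which is why HOP subgraphs need special handling.
    h = hashlib.sha256()
    h.update(gm.code.encode())
    for inp in example_inputs:
        h.update(repr((inp.shape, inp.stride(), inp.dtype)).encode())
    return h.hexdigest()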
torch.cond and torch.while could be supported, but they don't work out of the box: we'd need to recurse into the subgraphs. Maybe we can do that automatically, but it does not work as-is today. I think it's fine to add safety in this PR and follow up to add support; see the sketch below.
It could also make more sense for the folks adding a HOP to add its cache key than Sam -- they will know better about the details of what needs to be cached.
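A follow-up along those lines might recurse like this (illustrative only; it assumes HOP subgraphs are attached as child GraphModules reachable via named_modules, as they are for cond):

import torch

def collect_graph_code(gm: torch.fx.GraphModule) -> str:
    # Include every nested GraphModule (e.g. the true/false branches that
    # torch.cond stores as submodules) so that two graphs differing only
    # inside a subgraph hash differently.
    parts = [gm.code]
    for _, submod in gm.named_modules():
        if isinstance(submod, torch.fx.GraphModule) and submod is not gm:
            parts.append(submod.code)
    return "\n".join(parts)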
@pytorchbot merge
Merge failed. Reason: This PR needs a "release notes:" label. If your changes are user facing and intended to be a part of release notes, please use a label starting with "release notes:". If not, please add the "topic: not user facing" label. To add a label, you can comment to pytorchbot, for example: @pytorchbot label "topic: not user facing". For more information, see the PyTorch AutoLabel Bot wiki. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…ytorch#123325) Summary: The initial motivation was to avoid caching when we have triton higher order ops, but it's probably safer to avoid the cache for all higher order ops and allow/implement if/when we find it necessary. Test Plan: Unit test cribbed from: https://docs-preview.pytorch.org/pytorch/tutorials/2783/recipes/torch_compile_user_defined_triton_kernel_tutorial.html?highlight=triton Pull Request resolved: pytorch#123325 Approved by: https://github.com/eellison
Stack from ghstack (oldest at bottom):
Summary: The initial motivation was to avoid caching when we have triton higher order ops, but it's probably safer to avoid the cache for all higher order ops and allow/implement if/when we find it necessary.
Test Plan: Unit test cribbed from: https://docs-preview.pytorch.org/pytorch/tutorials/2783/recipes/torch_compile_user_defined_triton_kernel_tutorial.html?highlight=triton
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @amjames @desertfire @chauhang