Update partitioner's is_fusible heuristic to respect auto_functionalized #134490
Conversation
We say Node a is fusible into node b if node b is an auto_functionalized node that may reinplace node a later on.

Fixes #134468

Test Plan:
- new test

[ghstack-poisoned]
🔗 Helpful Links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/134490. Note: links to docs will display an error until the docs builds have been completed.

✅ No failures as of commit f071437 with merge base 2553278.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
…functionalized"

We say Node a is fusible into node b if node b is an auto_functionalized node that may reinplace node a later on.

This PR also changes aten.empty to be recomputable w.r.t. the Partitioner (it is, like aten.zeros, cheap to recompute and fusible into other ops).

Fixes #134468

Test Plan:
- new test

[ghstack-poisoned]
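The heuristic described above can be sketched roughly as follows. This is a hypothetical, simplified version: the `Node` class, the `is_fusible` name, and the way mutated-argument names are recorded are all illustrative, not the actual partitioner implementation.

```python
# Hypothetical sketch of the fusibility heuristic described above.
# The simplified Node model and helper names are illustrative only.

class Node:
    def __init__(self, target, kwargs=None):
        self.target = target
        self.kwargs = kwargs or {}

AUTO_FUNCTIONALIZED = "higher_order.auto_functionalized"

def mutated_arg_names(node):
    # In the real HOP, the mutated argument names come from the schema of
    # the wrapped op; here we assume they are recorded on the node.
    return node.kwargs.get("_mutated_args_names", [])

def is_fusible(a, b):
    # Node `a` is fusible into node `b` if `b` is an auto_functionalized
    # node that may later reinplace `a`, i.e. `a` feeds one of `b`'s
    # mutated arguments (directly or via a TensorList).
    if b.target != AUTO_FUNCTIONALIZED:
        return False
    for name in mutated_arg_names(b):
        arg = b.kwargs.get(name)
        if arg is a:
            return True
        if isinstance(arg, list) and any(x is a for x in arg):
            return True
    return False
```

The key design point is that the check only needs to look at the mutated arguments: an auto_functionalized node is itself functional, but reinplacing may later turn it back into an in-place write into `a`'s buffer, which is what makes fusing `a` into it profitable.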
torch/_inductor/pattern_matcher.py
Outdated

```py
if node.target is torch.ops.higher_order.auto_functionalized:
    return False
```
auto_functionalized nodes can have `out=blah` arguments. Those are not mutable (the auto_functionalized node is always functional), so we fix that here.
This is unrelated?
Fair enough, I'll split this out into its own PR
```py
arg = b.kwargs[name]
if a is arg:
    return True
if isinstance(arg, list):
```
should this be a tree_map or something?
We only support TensorList args as inputs to operators, so the list is fine
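The point being made here can be illustrated with a minimal sketch: because operator inputs are at most a Tensor or a flat TensorList, never an arbitrarily nested pytree, a flat `isinstance(arg, list)` check is sufficient and a `tree_map` would be overkill. The helper name below is hypothetical.

```python
# Hypothetical helper illustrating why a flat list check suffices:
# operator inputs are either a single value or a flat TensorList,
# so no recursive pytree traversal is needed.

def arg_contains(arg, node):
    # Direct match: the kwarg value is the node itself.
    if arg is node:
        return True
    # TensorList match: the node appears in a flat list/tuple argument.
    if isinstance(arg, (list, tuple)):
        return any(x is node for x in arg)
    # Deeper nesting does not occur for operator inputs.
    return False
```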
```py
        subtest(torch.empty_like, name="empty_like"),
    ],
)
def test_partitioner_recomputes_factory(self, factory_op):
```
I personally like test_perf tests, since they're "property-based" (i.e. they measure the amount of bytes read and written). It's possible we don't support HOPs yet or something though.
I'll add some test_perf tests too (if possible)
```py
aten.full,
aten.as_strided,
aten.zeros,
aten.empty,
```
I think plausibly aten.empty should be treated even more specially than this, since aten.empty is always free to recompute, regardless of whether or not it's fusible into a downstream op.
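The suggestion above could be sketched as follows. This is a toy illustration under stated assumptions: the function and set names are hypothetical, and the real recomputability logic lives in the min-cut partitioner, not in a standalone predicate like this.

```python
# Hypothetical sketch of the suggestion: most cheap factory ops are only
# worth recomputing when they fuse into a downstream consumer, but
# aten.empty's output is uninitialized, so recomputing it is just an
# allocation and is free regardless of fusibility.

CHEAP_FACTORY_OPS = {"aten.full", "aten.as_strided", "aten.zeros", "aten.empty"}
ALWAYS_FREE_TO_RECOMPUTE = {"aten.empty"}

def is_recomputable(op_name, fusible_into_consumer):
    if op_name in ALWAYS_FREE_TO_RECOMPUTE:
        # No values to recompute; re-running is only an allocation.
        return True
    return op_name in CHEAP_FACTORY_OPS and fusible_into_consumer
```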
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
aten.empty is almost always fusible into its consumer, so we never CSE it. This fixes a bug that looks like the following:

```py
@torch.library.custom_op("_reinplacing::sin_cos", mutates_args={"out_sin", "out_cos"})
def sin_cos(x: torch.Tensor, out_sin: torch.Tensor, out_cos: torch.Tensor) -> None:
    out_sin.copy_(x.sin())
    out_cos.copy_(x.cos())

@torch.compile
def f(x):
    out0 = torch.empty_like(x)
    out1 = torch.empty_like(x)
    sin_cos(x, out0, out1)
    return x.clone(), out0, out1

x = torch.randn(3, requires_grad=True)
f(x)
```

- CSE would de-duplicate the empty nodes
- reinplacing would add an additional clone (because it can't write to both tensors at the same time)
- the clone lowers into a new buffer + a copy_ kernel
- the copy_ kernel is unnecessary because "empty" is special: all reinplacing needed was an additional buffer, it doesn't matter what the values are

We could attempt to fix this on the reinplacing side, but this seemed better as a partitioner heuristic, and the reinplacing fix is a bit more tricky (we'd need to identify that the op never reads from the empty node).

Test Plan:
- new test (the old number was 27, the new number is 21, so this PR helped)

Pull Request resolved: #134703
Approved by: https://github.com/yf225
ghstack dependencies: #134466, #134490, #134491
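The CSE exemption described above can be illustrated with a toy pass. This is a simplified, hypothetical model (Inductor's actual CSE operates on its own IR, not on `(op, args)` tuples): identical pure calls are de-duplicated, but `aten.empty` is exempt because its uninitialized outputs must stay distinct for reinplacing to use them as separate scratch buffers.

```python
# Toy CSE pass over a hypothetical flat IR of (op_name, args) tuples.
# Ops in CSE_EXEMPT are never de-duplicated, mirroring the "never CSE
# aten.empty" heuristic described above.

CSE_EXEMPT = {"aten.empty", "aten.empty_like"}

def cse(nodes):
    # Returns (deduplicated nodes, mapping from old index -> new index).
    seen = {}
    out, remap = [], {}
    for i, (op, args) in enumerate(nodes):
        key = (op, args)
        if op not in CSE_EXEMPT and key in seen:
            # Pure duplicate: reuse the earlier node.
            remap[i] = seen[key]
            continue
        seen[key] = len(out)
        remap[i] = len(out)
        out.append((op, args))
    return out, remap
```

Running this on two identical `aten.empty` calls keeps both nodes, while two identical `aten.add` calls collapse into one, which is exactly the behavior the bug above required.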
…zed (pytorch#134490)

We say Node a is fusible into node b if node b is an auto_functionalized node that may reinplace node a later on. This PR also changes aten.empty to be recomputable w.r.t. the Partitioner (it is, like aten.zeros, cheap to recompute and fusible into other ops).

Fixes pytorch#134468

Test Plan:
- new test

Pull Request resolved: pytorch#134490
Approved by: https://github.com/Chillee
ghstack dependencies: pytorch#134364, pytorch#134466

…ytorch#134491)

Mutated arguments to triton kernels are fusible into the triton kernel.

Test Plan:
- new test

Pull Request resolved: pytorch#134491
Approved by: https://github.com/Chillee
ghstack dependencies: pytorch#134364, pytorch#134466, pytorch#134490

ROCm doesn't trigger the layout optimization that makes the test case valid, so we're going to skip the checks. Should fix the following (I'll close them later):
- pytorch#134481
- pytorch#134519

Pull Request resolved: pytorch#134690
Approved by: https://github.com/FindHao
ghstack dependencies: pytorch#134466, pytorch#134490, pytorch#134491

Fixes pytorch#134119

From user feedback, it's difficult to understand what the tests do. We clarify the docs more.

Pull Request resolved: pytorch#134692
Approved by: https://github.com/albanD
ghstack dependencies: pytorch#134466, pytorch#134490, pytorch#134491, pytorch#134690

Fixes pytorch#134278

Test Plan:
- tested locally

Pull Request resolved: pytorch#134688
Approved by: https://github.com/yushangdi
ghstack dependencies: pytorch#134466, pytorch#134490, pytorch#134491, pytorch#134690, pytorch#134692
Stack from ghstack (oldest at bottom):
We say Node a is fusible into node b if node b is an auto_functionalized
node that may reinplace node a later on.
This PR also changes aten.empty to be recomputable w.r.t the Partitioner
(it is, like aten.zeros, cheap to recompute and fusible into other ops).
Fixes #134468
Test Plan:
- new test