
Conversation

Valentine233
Collaborator

@Valentine233 Valentine233 commented Mar 25, 2024


pytorch-bot bot commented Mar 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/122599

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 13c5c18 with merge base 03a05e7:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Valentine233 Valentine233 added the topic: not user facing label Mar 25, 2024
@Valentine233 Valentine233 marked this pull request as draft March 25, 2024 11:09
@Valentine233 Valentine233 marked this pull request as ready for review March 29, 2024 05:58
Comment on lines 95 to 109
if (
    isinstance(t.data, ir.View)
    and isinstance(t.data.data, ir.PermuteView)
    and t.data.data.dims == [0, 3, 1, 2]
):
    t = ir.Pointwise.create(
        device=t.get_device(),
        dtype=t.get_dtype(),
        inner_fn=t.make_loader(),
        ranges=t.get_size(),
        origin_node=t.get_origin_node(),
        traceback=t.get_traceback(),
    )
    t.realize()
    t.freeze_layout()
Collaborator

Several issues with this change:

  1. It only handles a specific case where the tensor is 4D and permuted in one particular order. Can we make it general? Basically, what we want is a particular order of the last two dims, right?
  2. Related to 1, bmm can actually handle non-contiguous inputs, including inputs transposed in the last two dims: it only requires one of those dims to be contiguous, while the other can have a stride larger than the size of the contiguous one. Is it too strict to always force a contiguous layout here?
  3. Perhaps we can call require_stride_order instead of re-implementing copy_input here? (See the sketch below.)
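For illustration, a minimal sketch of that suggestion, assuming the ExternKernel.require_stride_order / require_contiguous helpers in torch/_inductor/ir.py realize and copy only when the requested stride order is not already satisfied; the wrapper name is hypothetical and this is not the PR's code:

```python
from torch._inductor import ir


def require_bmm_friendly_layout(t):
    # Enforce a C-contiguous stride order on the buffer; inductor inserts a
    # copy only if the existing (or still-flexible) layout cannot satisfy it.
    # ir.ExternKernel.require_contiguous(t) is the same thing spelled directly.
    contiguous_order = list(reversed(range(len(t.get_size()))))  # e.g. [3, 2, 1, 0] for 4D
    return ir.ExternKernel.require_stride_order(t, contiguous_order)
```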

Collaborator Author

Thanks for your suggestions!

A new function, is_pointwise_contiguous_or_transposed_after_perm, is added to deduce the layout of a Pointwise node from its readers, because the node has not been realized yet.

@janeyx99 janeyx99 added the triaged label Apr 1, 2024
@Valentine233 Valentine233 requested a review from jgong5 April 2, 2024 10:29
@Valentine233
Collaborator Author

@leslie-fang-intel @jgong5 Hi, I made some code changes. Please review again :)

@Valentine233 Valentine233 added the ciflow/trunk label Apr 10, 2024
Comment on lines 5044 to 5045
self.assertEqual(out_expected, out_actual)
self.assertEqual(out_expected.stride(), out_actual.stride())
Collaborator

For this particular case, does the test pass with or without the PR change?

Collaborator Author

Yes. This is just to check the accuracy with the PR; I will remove it.
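For readers without the full diff, a stride-comparison test of this shape might look roughly like the sketch below; the function, shapes, and names are illustrative, not the PR's actual test:

```python
import torch


def check_bmm_output_strides():
    # Compare eager vs. compiled results, including output strides, for a bmm
    # whose first input is a permuted (non-contiguous) view.
    def fn(a, b):
        return torch.bmm(a.permute(0, 2, 1), b)

    a = torch.randn(8, 16, 32)
    b = torch.randn(8, 16, 64)
    out_expected = fn(a, b)
    out_actual = torch.compile(fn)(a, b)
    torch.testing.assert_close(out_expected, out_actual)
    assert out_expected.stride() == out_actual.stride()
```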

return t

if all(x.get_device().type == "cpu" for x in [mat1, mat2]):
    if not ir.is_storage_and_layout(mat1):
Collaborator

What if mat1 or mat2 has a flexible layout? Shall we also apply similar logic there?

Collaborator Author

Thanks and modified!
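A rough sketch of "apply the same logic to both inputs" (illustrative only: maybe_fix_bmm_inputs, meta_mat1, and meta_mat2 are assumed names, and may_require_contiguous is the helper discussed further down in this conversation):

```python
from torch._inductor import ir


def maybe_fix_bmm_inputs(mat1, mat2, meta_mat1, meta_mat2):
    # Only touch buffers whose layout is still undecided: unrealized nodes and
    # realized buffers carrying a FlexibleLayout. Fixed layouts are left alone.
    def has_flexible_layout(t):
        if not ir.is_storage_and_layout(t):
            return True
        _, layout = ir.as_storage_and_layout(t, freeze=False)
        return isinstance(layout, ir.FlexibleLayout)

    if has_flexible_layout(mat1):
        mat1 = may_require_contiguous(mat1, meta_mat1)  # helper sketched later
    if has_flexible_layout(mat2):
        mat2 = may_require_contiguous(mat2, meta_mat2)
    return mat1, mat2
```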

Comment on lines 92 to 94
# Make the inputs of bmm contiguous
# because bmm cpu implementation does contiguous() if not
# this is to avoid additional copies in bmm
Collaborator

Is this comment accurate? If the input is transposed in the last two dims, would bmm still make the inputs contiguous?
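As a quick eager-mode illustration of the point under discussion: torch.bmm accepts a CPU input that is transposed in its last two dims without the caller making it contiguous; whether the kernel then copies internally is exactly the question raised above.

```python
import torch

a = torch.randn(4, 8, 16)
b = torch.randn(4, 8, 32)
a_t = a.transpose(1, 2)  # shape (4, 16, 8), non-contiguous
out = torch.bmm(a_t, b)  # no explicit .contiguous() needed by the caller
print(a_t.is_contiguous(), out.shape)  # False torch.Size([4, 16, 32])
```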

Collaborator Author

Thanks and modified!

)


def is_contiguous_or_transposed(sizes, strides):
Collaborator

Transposition can happen in any pair of dims, not necessarily the last two. Also, you are checking stride >= size, not stride == size. The function name doesn't seem to match the implementation here. Perhaps it would be clearer to just inline the implementation in the tuned_bmm code instead of factoring out a util function. (See the sketch below.)
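To make that concrete, a check restricted to the last two dims with exact stride comparisons could look like the following sketch (illustrative only, not the merged code):

```python
def is_last_two_dims_contiguous_or_transposed(sizes, strides):
    # Contiguous: the last dim has stride 1 and the second-to-last dim strides
    # over exactly one row. Transposed: the same condition with the two dims swapped.
    contiguous = strides[-1] == 1 and strides[-2] == sizes[-1]
    transposed = strides[-2] == 1 and strides[-1] == sizes[-2]
    return contiguous or transposed
```

For example, it returns True both for a contiguous (B, M, K) tensor and for the transpose(1, 2) view of a contiguous (B, K, M) tensor.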

Collaborator Author

Thanks and modified!

# Make the inputs of bmm contiguous
# because bmm cpu implementation does contiguous() if not
# this is to avoid additional copies in bmm
def do_bmm_input_contiguous(t, meta_t):
Collaborator

How about may_require_contiguous?
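For illustration, the renamed helper might take roughly this shape (a hedged sketch: deciding from the meta tensor's sizes/strides and reusing the last-two-dims check sketched earlier are assumptions, not necessarily the merged implementation):

```python
def may_require_contiguous(t, meta_t):
    # Force a contiguous layout only when the (meta) input is neither contiguous
    # nor transposed in its last two dims; otherwise bmm's CPU kernel can consume
    # it as-is and no extra copy is needed.
    sizes, strides = meta_t.size(), meta_t.stride()
    if not is_last_two_dims_contiguous_or_transposed(sizes, strides):
        t = ir.ExternKernel.require_contiguous(t)
    return t
```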

Collaborator Author

Thanks and applied!

@Valentine233 Valentine233 requested a review from jgong5 April 12, 2024 02:10
if not ir.is_storage_and_layout(t):
    return True
_, layout = ir.as_storage_and_layout(t, freeze=False)
return not isinstance(layout, ir.FixedLayout)
Collaborator

Why not check ir.FlexibleLayout directly?

Collaborator Author

Modified to ir.FlexibleLayout.

@jgong5 jgong5 requested a review from eellison April 15, 2024 03:11
@Valentine233
Collaborator Author

@eellison Hi, please help review the PR. Thanks!

@Valentine233
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

sanketpurandare pushed a commit to sanketpurandare/pytorch that referenced this pull request Apr 22, 2024
Fixes pytorch#117743.

Add contiguous layout optimization for `bmm` input, to avoid additional copies.

Pull Request resolved: pytorch#122599
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/eellison
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024
Fixes #117743.

Add contiguous layout optimization for `bmm` input, to avoid additional copies.

Pull Request resolved: #122599
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/eellison
@github-actions github-actions bot deleted the contiguous_node branch May 31, 2024 01:55

Labels

ciflow/inductor, ciflow/trunk, Merged, module: inductor, open source, topic: not user facing, triaged


Development

Successfully merging this pull request may close these issues.

[inductor][cpu]cait_m36_384 fp32 static shape default wrapper single thread performance regression

7 participants