
Conversation

@chunyuan-w (Collaborator) commented on Sep 3, 2024


Description

Fixes the FP32 accuracy failure of `resmlp_12_224` and the BF16 accuracy failure of `volo_d1_224` in timm.

In this PR, we check whether the input is contiguous as follows:
If it has a `FixedLayout`, we know the exact strides. For a `FlexibleLayout`, if its data is a `ComputedBuffer`, we can use the buffer's fill order to decide whether it's contiguous. In all other cases, we don't use the GEMM template, since we can't infer whether the input is contiguous.
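To make this concrete, below is a minimal sketch of such a check (an illustration, not the exact code added in this PR). The function name `is_contiguous_input`, the `x.data` access, and the assumption that the first entry of `get_fill_order()` names the dimension that ends up with stride 1 are all illustrative assumptions:

```python
from torch._inductor import ir

def is_contiguous_input(x):
    # Sketch of the decision described above, not the PR's actual code.
    layout = x.get_layout()
    if isinstance(layout, ir.FixedLayout):
        # Strides are already decided, so we can check them directly.
        return layout.stride[-1] == 1
    if isinstance(layout, ir.FlexibleLayout) and isinstance(x.data, ir.ComputedBuffer):
        # Strides are not decided yet; fall back to the buffer's fill order.
        # Assumption: the first entry of the fill order is the dimension that
        # will get stride 1, so a contiguous buffer fills its last dim first.
        fill_order = x.data.get_fill_order()
        return bool(fill_order) and fill_order[0] == len(layout.size) - 1
    # Otherwise we can't infer contiguity, so skip the GEMM template.
    return False
```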

Additional context

The current GEMM template only supports the case where `input.get_stride()[-1] == 1`. In `resmlp_12_224`, when we reach this check, the layout of `input` is a `FlexibleLayout`. The reason is that when realizing the input, which is a `View` IR, the `convert_to_reinterpret_view` call fails:

pytorch/torch/_inductor/ir.py, lines 4712 to 4715 in d14fe3f:

    try:
        return cls.convert_to_reinterpret_view(x)
    except NotImplementedError:
        pass

It finally falls through to this `copy_input` call and returns a `FlexibleLayout`:

    return cls.copy_input(x)

When checking its stride, this `FlexibleLayout` indeed satisfies `input.get_stride()[-1] == 1`, but it is later decided to be a `FixedLayout` with `size = (3072, 196), stride = (1, 3072)`, which is not supported by the GEMM template, thus causing the accuracy issue in this model.
The `FlexibleLayout` is converted to a `FixedLayout` during `CppPackedGemmTemplate.add_choices`, which calls `slice_nd` when rendering the kernel (`slice_nd(X)`). When creating the `SliceView` IR, `as_storage_and_layout` invokes `decide_layout` and converts it to a `FixedLayout` with `size = (3072, 196), stride = (1, 3072)`.
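As a standalone illustration of why that layout is rejected (this snippet is not from the PR; it just reproduces the same shape/stride combination with a plain transposed tensor):

```python
import torch

# A (3072, 196) tensor with stride (1, 3072) is just the transpose of a
# contiguous (196, 3072) tensor: its last dimension steps by 3072 elements,
# so the GEMM template's `input.get_stride()[-1] == 1` assumption fails.
x = torch.empty(196, 3072).t()
print(x.shape, x.stride())   # torch.Size([3072, 196]) (1, 3072)
print(x.stride(-1) == 1)     # False -> not supported by the GEMM template
```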

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang


pytorch-bot bot commented Sep 3, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/134982

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8f2b3f8 with merge base 217ba7b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

chunyuan-w added commits that referenced this pull request (Sep 4 to Sep 6, 2024)
@chunyuan-w marked this pull request as draft on September 4, 2024 14:02
@chunyuan-w changed the title from "[inductor] [cpp] input should be FixedLayout when checking its stride" to "[inductor] [cpp] always converts input to contiguous for max-autotune" on Sep 6, 2024
@chunyuan-w changed the title from "[inductor] [cpp] always converts input to contiguous for max-autotune" to "[inductor] [cpp] fix the input contiguous check in max-autotune" on Sep 7, 2024
@chunyuan-w (Collaborator, Author) commented:

@pytorchbot merge

@pytorchmergebot commented:

Merge failed

Reason: Approvers from one of the following sets are needed:

  • superuser (pytorch/metamates)
  • Core Reviewers (mruberry, lezcano, Skylion007, ngimel, peterbell10, ...)
  • Core Maintainers (soumith, gchanan, ezyang, dzhulgakov, malfet, ...)
Details for Dev Infra team (raised by workflow job)

Failing merge rule: Core Maintainers

    )

    def is_last_dim_stride1(x):
        if isinstance(x.layout, ir.FixedLayout):
@jansel (Contributor) commented on Sep 8, 2024:

If this is failing, should we be calling freeze_layout? Flexible layouts are allowed to change, so any check you do could become false later on. If you have a FlexibleLayout you can force the last dim to be stride=1 without a copy.
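A rough sketch of this suggestion, for illustration only (it assumes the buffer exposes a `freeze_layout()` that defaults to contiguous strides; the real Inductor API and the PR's final code may differ):

```python
from torch._inductor import ir

def ensure_last_dim_stride1(x):
    # Reviewer's idea: instead of checking a FlexibleLayout (which may still
    # change), freeze it now so the last dim is guaranteed to have stride 1.
    layout = x.get_layout()
    if isinstance(layout, ir.FlexibleLayout):
        # Assumed behavior: freezing picks contiguous strides, so the layout
        # can no longer be decided as e.g. (1, 3072) later on.
        x.freeze_layout()
        layout = x.get_layout()
    return isinstance(layout, ir.FixedLayout) and layout.stride[-1] == 1
```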

@chunyuan-w (Collaborator, Author) replied:

Thanks for the suggestion. I've updated the code. Could you help take another look?

chunyuan-w added a commit that referenced this pull request Sep 9, 2024
@chunyuan-w requested a review from jansel on September 9, 2024 06:17
@chunyuan-w (Collaborator, Author) commented:

@pytorchbot merge

@pytorchmergebot commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

chunyuan-w added a commit to chunyuan-w/pytorch that referenced this pull request Sep 10, 2024
chunyuan-w added a commit to chunyuan-w/pytorch that referenced this pull request Sep 11, 2024
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024
kit1980 pushed a commit that referenced this pull request Sep 20, 2024

[inductor] [cpp] fix the input contiguous check in max-autotune (#134982)


Pull Request resolved: #134982
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/jansel
github-actions bot deleted the gh/chunyuan-w/26/head branch on October 12, 2024 02:05