Wrap indirect indexing on CUDA #105055
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/105055

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 1f762df with merge base d0f8ee4:
* FLAKY - The following job failed but was likely due to flakiness present on trunk.
* UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
new_var.update_on_args("index_wrap", (var,), {})
var = new_var
Should `index_wrap` be its own operator? `_unsafe_index`, for example, probably shouldn't generate wrapping.
Either a separate operator, or a `wrap=True` argument that we can selectively disable.
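For illustration, a minimal sketch of the `wrap=True` idea, where the flag (hypothetical here, not this PR's actual signature) controls whether wrapping code is emitted:

```python
def indirect_indexing(index, size, wrap=True):
    """Sketch of the proposed flag: aten.index needs negative indices
    wrapped to index + size, while _unsafe_index has already validated
    its indices and could pass wrap=False to skip the extra code."""
    if wrap:
        return f"tl.where({index} < 0, {index} + {size}, {index})"
    return index

print(indirect_indexing("tmp0", "ks0"))              # wrapped load index
print(indirect_indexing("tmp0", "ks0", wrap=False))  # e.g. for _unsafe_index
```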
What about freeriding off the `check` argument, and just generating this when `check=True`? At the moment we're using `check=False` to circumvent limitations of the bound variable analysis. In all those cases, the indices are always positive, so...
I can think of examples like backwards functions where you might still need wrapping, but could prove that the indices are in-bounds because they've already been checked elsewhere in the program.
Fair enough. Let's write that optimisation in another PR, though. I can add the `if check` in this one if you want, as it'd be correct with the uses we currently have of `_unsafe_index` and `index_put`.
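For context on why gating on `check` is safe for those uses, a small runnable example of the semantics involved: `aten.index` wraps negative indices, while `_unsafe_index` (used by decompositions that have already validated bounds) is allowed to assume non-negative, in-bounds indices.

```python
import torch

a = torch.arange(4.0)

# aten.index must wrap: a[-1] means a[a.numel() - 1].
assert a[torch.tensor([-1])] == a[torch.tensor([3])]

# torch.ops.aten._unsafe_index drops that guarantee: callers promise the
# indices were already validated, so the compiled kernel may omit both
# the bounds check and the wrapping.
```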
test/inductor/test_torchinductor.py (Outdated)

def flip_with_index(a):
    b = -torch.arange(start=-a.numel() + 1, end=-1, device="cuda")
    return a[b]
We optimize away the `indirect_indexing` call, so this never gets wrapped.
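Concretely (my sketch of what index propagation sees, not the PR's code): `b` simplifies to an affine expression of the loop variable, so the load becomes a direct index and no `tl.where` is emitted.

```python
import sympy

i, numel = sympy.symbols("i numel", integer=True, nonnegative=True)

# b[i] for the test above: b = -torch.arange(-numel + 1, -1)[i]
b = -(-numel + 1 + i)

# It folds to numel - 1 - i, a plain sympy expression, so inductor can
# index x[numel - 1 - i] directly, with no runtime wrapping left to test.
assert sympy.simplify(b - (numel - 1 - i)) == 0
```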
@lezcano you don't seem to have fixed this, you just removed the test case. To fix it, I think you need to either do range analysis in the IndexPropagation pass, or add wrapping to the sympy expression.
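To make the second option concrete, a rough illustration (my own, not the PR's code) of what "add wrapping to the sympy expression" can mean: for indices known to lie in `(-size, size)`, Python-style modulo performs exactly the negative-index wrap.

```python
import sympy

i, size = sympy.symbols("i size", integer=True)

# sympy's Mod follows the sign of the divisor (like Python's %), so for
# i in (-size, size) it equals i + size when i < 0 and i otherwise.
wrapped = sympy.Mod(i, size)

assert wrapped.subs({i: -1, size: 8}) == 7   # wraps to the last element
assert wrapped.subs({i: 3, size: 8}) == 3    # non-negative: unchanged
```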
Ah, right, I misunderstood you. Now I see what you mean; I'll fix that.
torch/utils/_sympy/value_ranges.py (Outdated)

return self & other

# Intersection
def __and__(self, other):
Return type for these functions should always be "ValueRange", right?
Yep, similar to `dict`.
Might want to explicitly typehint that. :)
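For illustration, roughly what the explicit annotation would look like, on a simplified stand-in for the real `ValueRanges` class:

```python
from __future__ import annotations
from dataclasses import dataclass
import sympy

@dataclass(frozen=True)
class ValueRanges:
    lower: sympy.Expr
    upper: sympy.Expr

    # Intersection, with the return type spelled out as suggested.
    def __and__(self, other: ValueRanges) -> ValueRanges:
        return ValueRanges(
            sympy.Max(self.lower, other.lower),
            sympy.Min(self.upper, other.upper),
        )

r = ValueRanges(sympy.Integer(-4), sympy.Integer(10))
print(r & ValueRanges(sympy.Integer(0), sympy.Integer(7)))  # [0, 7]
```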
@peterbell10 rehashed the tests, and now we test that we are actually generating wrapping when we claim we do.
> Lifting this to CPU should be rather easy. @jgong5
> Partially fixes #97365. I'd wait to close that issue once this works on CPU as well.
@blzheng is working on fixing a similar issue #102064 with PR #102602. But I guess the fix in this PR can address that problem as well and is more complete. Probably @blzheng can provide a follow-up PR to cover CPP backend.
torch/_inductor/codegen/triton.py (Outdated)

if var.bounds != ValueRanges.unknown():
    # Take the negative part of the bound and add size to it
    # Then take union of that and the positive part
    # This is a tighter bound than that of a generic ops.where, as we have info on the conde
nit: conde -> code ;-)
cond* (as in the condition from the `where`), but yep :D
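In numbers, the tighter bound that comment computes, as a standalone sketch (assuming a concrete `size` and a known range `[lo, hi]` for the index):

```python
# Sketch of the tightened bound for wrap(i) = (i + size if i < 0 else i),
# assuming i is known to lie in [lo, hi] and size is a concrete int.
def wrapped_bounds(lo, hi, size):
    parts = []
    if lo < 0:                     # negative part, shifted up by size
        parts.append((lo + size, min(hi, -1) + size))
    if hi >= 0:                    # non-negative part, unchanged
        parts.append((max(lo, 0), hi))
    # Union of the two pieces; tighter than a generic ops.where bound,
    # which knows nothing about the condition relating the branches.
    return (min(p[0] for p in parts), max(p[1] for p in parts))

print(wrapped_bounds(-4, 3, 8))   # (0, 7): every wrapped index is in-bounds
print(wrapped_bounds(-2, -1, 8))  # (6, 7)
```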
torch/_inductor/codegen/triton.py (Outdated)

new_var = self.cse.generate(
    self.compute,
    f"tl.where({var} < 0, {var} + {str_size}, {var})",
    bounds=new_bounds,
)
As you commented, better to move this to common.py so that it can be applied to both triton and cpp backends? I guess if this is changed to the `ops` calls, it becomes general? We can submit a follow-up PR for that if you prefer.
I just did it this way because that way I could call `update_on_args` in order to get the masks updated appropriately (note that the logic for when to assign the masks to `var` on `update_on_args` is less than good). If that is taken care of, then yes, calling through `ops` here would allow this code to be generic.
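A sketch of what the backend-generic version in common.py could look like, with a toy stand-in for the ops handler (the handler class here is hypothetical; inductor's real one is much richer):

```python
class TritonLikeOps:
    """Toy stand-in for inductor's per-backend ops handler."""
    def lt(self, a, b):        return f"({a} < {b})"
    def add(self, a, b):       return f"({a} + {b})"
    def where(self, c, a, b):  return f"tl.where({c}, {a}, {b})"

def wrap_indirect_index(ops, var, size):
    # Backend-agnostic wrapping: each backend renders its own lt/add/where.
    return ops.where(ops.lt(var, "0"), ops.add(var, size), var)

print(wrap_indirect_index(TritonLikeOps(), "tmp0", "ks0"))
# tl.where((tmp0 < 0), (tmp0 + ks0), tmp0)
```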
Yep, I just started this PR because it was reasonably straightforward and #102602 hadn't been updated in the last month.
OK, the failures moved from "failed to run" to "not accurate". I will investigate the "not accurate" part as well.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 14 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
@pytorchbot drci
drci seems broken, but there's just one unrelated test failing, so I'll merge with -i.
@pytorchbot merge -i
The merge job was canceled. If you believe this is a mistake, then you can re-trigger it through pytorch-bot.
@pytorchbot merge -i
This addresses a confusing bug on HUD and Dr.CI where a bunch of unrelated cancelled signals show up, forcing people to force merge. For example:

* Dr.CI pytorch/pytorch#107339, HUD https://hud.pytorch.org/pr/107339
* Dr.CI pytorch/pytorch#105055, HUD https://hud.pytorch.org/pr/105055

These are cancelled signals from a previous workflow run that had since been retried successfully. The cancelled signals stuck around because the job names differ after a retry, i.e. `manywheel-py3_10-cuda11_8-test (cancel)` became `manywheel-py3_10-cuda11_8-test / test (success)`.

The fix is to use a trie search to check whether a cancelled job has been retried successfully and, if so, remove it from the list.

### Testing
* https://torchci-git-fork-huydhn-remove-wrong-cancel-948947-fbopensource.vercel.app/pr/107339
* https://torchci-git-fork-huydhn-remove-wrong-cancel-948947-fbopensource.vercel.app/pr/105055
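A rough sketch of the deduplication idea (plain prefix matching on the base job name, standing in for the actual trie):

```python
def drop_retried_cancels(jobs):
    """Remove cancelled jobs whose retry later succeeded under a
    slightly different name. The real fix walks a trie of job names;
    a prefix check on the base name is enough to show the idea."""
    def base(name):
        # 'manywheel-py3_10-cuda11_8-test / test' -> 'manywheel-py3_10-cuda11_8-test'
        return name.split(" / ")[0]

    succeeded = {base(name) for name, conclusion in jobs if conclusion == "success"}
    return [(name, conclusion) for name, conclusion in jobs
            if not (conclusion == "cancelled" and base(name) in succeeded)]

jobs = [
    ("manywheel-py3_10-cuda11_8-test", "cancelled"),       # stale signal
    ("manywheel-py3_10-cuda11_8-test / test", "success"),  # successful retry
]
print(drop_retried_cancels(jobs))  # only the successful retry remains
```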
… in CPU"

**Summary**
Fix the issue #109019 of negative values used in tensor indices. This implementation refers to #105055.

**Test Plan**
```
python -m pytest test_torchinductor.py -k test_negative_index
```

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
Stack from ghstack (oldest at bottom):
Lifting this to CPU should be rather easy. @jgong5
Partially fixes #97365. I'd wait to close that issue once this works on CPU as well.
This fix works with dynamic shapes as well.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @anijain2305
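For context, a sketch of the kind of program this PR makes compile correctly on CUDA (assumes a CUDA device is available; the function name is illustrative):

```python
import torch

def gather_neg(x, idx):
    # idx may contain negative entries; the generated Triton kernel now
    # wraps them to idx + size instead of reading out of bounds.
    return x[idx]

x = torch.randn(8, device="cuda")
idx = torch.tensor([-1, 0, -8], device="cuda")

compiled = torch.compile(gather_neg)
torch.testing.assert_close(compiled(x, idx), gather_neg(x, idx))
```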