Wrap indirect indexing on CUDA #105055
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/105055

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 1f762df with merge base d0f8ee4:
* FLAKY - The following job failed but was likely due to flakiness present on trunk.
* UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
new_var.update_on_args("index_wrap", (var,), {})
var = new_var
Should `index_wrap` be its own operator? `_unsafe_index`, for example, probably shouldn't generate wrapping.
Either a separate operator, or a `wrap=True` argument that we can selectively disable.
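For illustration, a minimal sketch of the `wrap=True` idea, where the flag (hypothetical here, not this PR's actual signature) controls whether wrapping code is emitted:

```python
def indirect_indexing(index, size, wrap=True):
    """Sketch of the proposed flag: aten.index needs negative indices
    wrapped to index + size, while _unsafe_index has already validated
    its indices and could pass wrap=False to skip the extra code."""
    if wrap:
        return f"tl.where({index} < 0, {index} + {size}, {index})"
    return index

print(indirect_indexing("tmp0", "ks0"))              # wrapped load index
print(indirect_indexing("tmp0", "ks0", wrap=False))  # e.g. for _unsafe_index
```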
What about freeriding off the `check` argument, and just generating this when `check=True`? At the moment we're using `check=False` to circumvent limitations of the bound variable analysis. In all those cases, the indices are always positive, so...
I can think of examples like backwards functions where you might still need wrapping, but could prove that the indices are in-bounds because they've already been checked elsewhere in the program.
Fair enough. Let's write that optimisation in another PR, though. I can add the `if check` in this one if you want, as it'd be correct with the uses we currently have of `_unsafe_index` and `index_put`.
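For context on why gating on `check` is safe for those uses, a small runnable example of the semantics involved: `aten.index` wraps negative indices, while `_unsafe_index` (used by decompositions that have already validated bounds) is allowed to assume non-negative, in-bounds indices.

```python
import torch

a = torch.arange(4.0)

# aten.index must wrap: a[-1] means a[a.numel() - 1].
assert a[torch.tensor([-1])] == a[torch.tensor([3])]

# torch.ops.aten._unsafe_index drops that guarantee: callers promise the
# indices were already validated, so the compiled kernel may omit both
# the bounds check and the wrapping.
```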
test/inductor/test_torchinductor.py (Outdated)

def flip_with_index(a):
    b = -torch.arange(start=-a.numel() + 1, end=-1, device="cuda")
    return a[b]
We optimize away the `indirect_indexing` call, so this never gets wrapped.
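Concretely (my sketch of what index propagation sees, not the PR's code): `b` simplifies to an affine expression of the loop variable, so the load becomes a direct index and no `tl.where` is emitted.

```python
import sympy

i, numel = sympy.symbols("i numel", integer=True, nonnegative=True)

# b[i] for the test above: b = -torch.arange(-numel + 1, -1)[i]
b = -(-numel + 1 + i)

# It folds to numel - 1 - i, a plain sympy expression, so inductor can
# index x[numel - 1 - i] directly, with no runtime wrapping left to test.
assert sympy.simplify(b - (numel - 1 - i)) == 0
```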
@lezcano you don't seem to have fixed this, you just removed the test case. To fix it, I think you need to either do range analysis in the IndexPropagation pass, or add wrapping to the sympy expression.
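To make the second option concrete, a rough illustration (my own, not the PR's code) of what "add wrapping to the sympy expression" can mean: for indices known to lie in `(-size, size)`, Python-style modulo performs exactly the negative-index wrap.

```python
import sympy

i, size = sympy.symbols("i size", integer=True)

# sympy's Mod follows the sign of the divisor (like Python's %), so for
# i in (-size, size) it equals i + size when i < 0 and i otherwise.
wrapped = sympy.Mod(i, size)

assert wrapped.subs({i: -1, size: 8}) == 7   # wraps to the last element
assert wrapped.subs({i: 3, size: 8}) == 3    # non-negative: unchanged
```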
Ah, right, I misunderstood you. Now I see what you mean; I'll fix that.
torch/utils/_sympy/value_ranges.py (Outdated)

return self & other

# Intersection
def __and__(self, other):
Return type for these functions should always be "ValueRange", right?
Yep, similar to `dict`.
Might want to explicitly typehint that. :)
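For illustration, roughly what the explicit annotation would look like, on a simplified stand-in for the real `ValueRanges` class:

```python
from __future__ import annotations
from dataclasses import dataclass
import sympy

@dataclass(frozen=True)
class ValueRanges:
    lower: sympy.Expr
    upper: sympy.Expr

    # Intersection, with the return type spelled out as suggested.
    def __and__(self, other: ValueRanges) -> ValueRanges:
        return ValueRanges(
            sympy.Max(self.lower, other.lower),
            sympy.Min(self.upper, other.upper),
        )

r = ValueRanges(sympy.Integer(-4), sympy.Integer(10))
print(r & ValueRanges(sympy.Integer(0), sympy.Integer(7)))  # [0, 7]
```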
@peterbell10 rehashed the tests, and now we test that we are actually generating wrapping when we claim we do.
> Lifting this to CPU should be rather easy. @jgong5
> Partially fixes #97365. I'd wait to close that issue once this works on CPU as well.
@blzheng is working on fixing a similar issue #102064 with PR #102602. But I guess the fix in this PR can address that problem as well and is more complete. Probably @blzheng can provide a follow-up PR to cover CPP backend.
torch/_inductor/codegen/triton.py (Outdated)

if var.bounds != ValueRanges.unknown():
    # Take the negative part of the bound and add size to it
    # Then take union of that and the positive part
    # This is a tighter bound than that of a generic ops.where, as we have info on the conde
nit: conde -> code ;-)
cond* (as in the condition from the `where`), but yep :D
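In numbers, the tighter bound that comment computes, as a standalone sketch (assuming a concrete `size` and a known range `[lo, hi]` for the index):

```python
# Sketch of the tightened bound for wrap(i) = (i + size if i < 0 else i),
# assuming i is known to lie in [lo, hi] and size is a concrete int.
def wrapped_bounds(lo, hi, size):
    parts = []
    if lo < 0:                     # negative part, shifted up by size
        parts.append((lo + size, min(hi, -1) + size))
    if hi >= 0:                    # non-negative part, unchanged
        parts.append((max(lo, 0), hi))
    # Union of the two pieces; tighter than a generic ops.where bound,
    # which knows nothing about the condition relating the branches.
    return (min(p[0] for p in parts), max(p[1] for p in parts))

print(wrapped_bounds(-4, 3, 8))   # (0, 7): every wrapped index is in-bounds
print(wrapped_bounds(-2, -1, 8))  # (6, 7)
```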
torch/_inductor/codegen/triton.py (Outdated)

new_var = self.cse.generate(
    self.compute,
    f"tl.where({var} < 0, {var} + {str_size}, {var})",
    bounds=new_bounds,
)
As you commented, better to move this to common.py so that it can be applied to both triton and cpp backends? I guess if this is changed to the `ops` calls, it becomes general? We can submit a follow-up PR for that if you prefer.
I just did it this way because that way I could call `update_on_args` in order to get the masks updated appropriately (note that the logic for when to assign the masks to `var` on `update_on_args` is less than good). If that is taken care of, then yes, calling through `ops` here would allow this code to be generic.
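A sketch of what the backend-generic version in common.py could look like, with a toy stand-in for the ops handler (the handler class here is hypothetical; inductor's real one is much richer):

```python
class TritonLikeOps:
    """Toy stand-in for inductor's per-backend ops handler."""
    def lt(self, a, b):        return f"({a} < {b})"
    def add(self, a, b):       return f"({a} + {b})"
    def where(self, c, a, b):  return f"tl.where({c}, {a}, {b})"

def wrap_indirect_index(ops, var, size):
    # Backend-agnostic wrapping: each backend renders its own lt/add/where.
    return ops.where(ops.lt(var, "0"), ops.add(var, size), var)

print(wrap_indirect_index(TritonLikeOps(), "tmp0", "ks0"))
# tl.where((tmp0 < 0), (tmp0 + ks0), tmp0)
```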
Yep, I just started this PR because it was reasonably straightforward and #102602 hadn't been updated in the last month.
OK, the failures moved from "failed to run" to "not accurate". I will investigate the "not accurate" part as well.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 14 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
@pytorchbot drci
drci seems broken, but there's just one unrelated test failing, so I'll merge with -i.
@pytorchbot merge -i
The merge job was canceled. If you believe this is a mistake, then you can re-trigger it through pytorch-bot.
@pytorchbot merge -i
This addresses a confusing bug on HUD and Dr.CI where a bunch of unrelated cancelled signals show up, forcing people to force merge. For example:

* Dr.CI pytorch/pytorch#107339, HUD https://hud.pytorch.org/pr/107339
* Dr.CI pytorch/pytorch#105055, HUD https://hud.pytorch.org/pr/105055

These are cancelled signals from a previous workflow run that had since been retried successfully. The cancelled signals stuck around because the job names differ after a retry, i.e. `manywheel-py3_10-cuda11_8-test (cancel)` became `manywheel-py3_10-cuda11_8-test / test (success)`.

The fix is to use a trie search to check whether a cancelled job has been retried successfully and, if so, remove it from the list.

### Testing
* https://torchci-git-fork-huydhn-remove-wrong-cancel-948947-fbopensource.vercel.app/pr/107339
* https://torchci-git-fork-huydhn-remove-wrong-cancel-948947-fbopensource.vercel.app/pr/105055
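A rough sketch of the deduplication idea (plain prefix matching on the base job name, standing in for the actual trie):

```python
def drop_retried_cancels(jobs):
    """Remove cancelled jobs whose retry later succeeded under a
    slightly different name. The real fix walks a trie of job names;
    a prefix check on the base name is enough to show the idea."""
    def base(name):
        # 'manywheel-py3_10-cuda11_8-test / test' -> 'manywheel-py3_10-cuda11_8-test'
        return name.split(" / ")[0]

    succeeded = {base(name) for name, conclusion in jobs if conclusion == "success"}
    return [(name, conclusion) for name, conclusion in jobs
            if not (conclusion == "cancelled" and base(name) in succeeded)]

jobs = [
    ("manywheel-py3_10-cuda11_8-test", "cancelled"),       # stale signal
    ("manywheel-py3_10-cuda11_8-test / test", "success"),  # successful retry
]
print(drop_retried_cancels(jobs))  # only the successful retry remains
```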
… in CPU"

**Summary**
Fix the issue #109019 of negative values used in tensor indices. This implementation refers to #105055.

**Test Plan**
```
python -m pytest test_torchinductor.py -k test_negative_index
```

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
Stack from ghstack (oldest at bottom):
Lifting this to CPU should be rather easy. @jgong5
Partially fixes #97365. I'd wait to close that issue once this works on CPU as well.
This fix works with dynamic shapes as well.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @anijain2305
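For context, a sketch of the kind of program this PR makes compile correctly on CUDA (assumes a CUDA device is available; the function name is illustrative):

```python
import torch

def gather_neg(x, idx):
    # idx may contain negative entries; the generated Triton kernel now
    # wraps them to idx + size instead of reading out of bounds.
    return x[idx]

x = torch.randn(8, device="cuda")
idx = torch.tensor([-1, 0, -8], device="cuda")

compiled = torch.compile(gather_neg)
torch.testing.assert_close(compiled(x, idx), gather_neg(x, idx))
```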