
[CUDA] at::native::countRadixUsingMask misuses __activemask intrinsic #98157

Closed
JackWolfard opened this issue Apr 2, 2023 · 0 comments
Labels
module: cuda Related to torch.cuda, and CUDA support in general · triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@JackWolfard (Contributor)

JackWolfard commented Apr 2, 2023

🐛 Describe the bug

Within `at::native::countRadixUsingMask`, the `__activemask` intrinsic is used to conduct a warp vote (`__ballot_sync(__activemask(), vote)`) among the threads counting the distribution of the radix within the input data.

Since at least CUDA 9, `__activemask` should not be used to determine which threads are on the same execution path, as detailed in this NVIDIA blog post. Because there is no guarantee that all threads on the same execution path belong to the same active thread group, the distribution count can be wrong, leading to a false assumption of unique results in `at::native::findPattern` and, from there, a data race. This can affect the `topk`, `kthvalue`, and `median` operators, since all three rely on `at::native::radixSelect`.

The issue is hard to reproduce: even though CUDA does not guarantee that threads on the same execution path operate within the same active thread group, in practice this divergence is rare.
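To illustrate the hazard, here is a minimal sketch, not PyTorch's actual kernel (the kernel name and data layout are invented for this example), contrasting a ballot over `__activemask()` with a ballot over an explicit membership mask:

```cuda
#include <cstdio>

// Hypothetical kernel illustrating the warp-vote pitfall described above.
__global__ void voteSketch(const int* data, int* counts, int n) {
    int tid = threadIdx.x;
    bool vote = (tid < n) && (data[tid] > 0);

    // BUGGY pattern (the one this issue reports): __activemask() returns only
    // the threads that happen to be converged at this instant. If the warp has
    // diverged, each converged subgroup ballots separately, so different
    // threads can observe different bit counts for the "same" vote.
    unsigned buggy = __ballot_sync(__activemask(), vote);

    // Safer pattern: derive the membership mask from the logical shape of the
    // computation (here, a full warp launched with blockDim.x == 32) so every
    // participating thread passes the same mask, forcing reconvergence before
    // the ballot.
    unsigned safe = __ballot_sync(0xffffffffu, vote);

    if (tid == 0) {
        counts[0] = __popc(buggy);  // may disagree across threads under divergence
        counts[1] = __popc(safe);   // consistent count of set votes
    }
}
```

In practice the two masks usually coincide, which is why the miscount (and the downstream data race in the radix selection) surfaces so rarely.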

Example crash when using topk

```python
import torch
x = torch.tensor([0, 1, 2]).cuda()
x.topk(1)
```

```
aten/src/ATen/native/cuda/TensorTopK.cu:144:gatherTopK: assertion: outputSliceSize >= writeIndexStar
```

Versions

Affects all versions since at least #17544

cc @ngimel

JackWolfard added a commit to JackWolfard/pytorch that referenced this issue Apr 2, 2023
@lezcano added the module: cuda and triaged labels Apr 3, 2023
JackWolfard added a commit to JackWolfard/pytorch that referenced this issue Apr 6, 2023
pytorchmergebot pushed a commit to JackWolfard/pytorch that referenced this issue Apr 11, 2023
ZainRizvi pushed a commit that referenced this issue Apr 19, 2023
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants