[CUDA] `at::native::countRadixUsingMask` misuses `__activemask` intrinsic #98157
Labels
module: cuda
Related to torch.cuda, and CUDA support in general
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
🐛 Describe the bug
Within `at::native::countRadixUsingMask`, the `__activemask` intrinsic is used to conduct a warp vote (`__ballot_sync(__activemask(), vote)`) between the threads counting the distribution of the radix within the input data. Since at least CUDA 9, `__activemask` should not be used to determine which threads are on the same execution path, as detailed in this NVIDIA blog post. Because there is no guarantee that all threads on the same execution path are in the same active thread group, the distribution counts can be off, resulting in a wrong assumption of unique results in `at::native::findPattern`, which leads to a data race. This can affect the `topk`, `kthvalue`, and `median` operators, since all three rely on `at::native::radixSelect`. The issue is hard to reproduce: even though CUDA does not guarantee that threads on the same execution path operate within the same active thread group, in practice this divergence is rare.
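To illustrate the failure mode, here is a minimal sketch (not the actual PyTorch kernel; function names are hypothetical) contrasting the problematic `__ballot_sync(__activemask(), ...)` pattern with a safer variant that derives the mask from program logic:

```cuda
// Problematic: __activemask() returns the set of threads that happen to
// be converged at this exact instruction, not the set of threads that
// are logically on the same execution path. Two threads on the same
// path may sit in different active groups, so each group's ballot sees
// only a subset of the votes and the resulting counts diverge.
__device__ int countVotesBuggy(bool vote) {
    unsigned mask = __activemask();  // may under-approximate the path's threads
    unsigned ballot = __ballot_sync(mask, vote);
    return __popc(ballot);
}

// Safer: derive the membership mask from the program logic itself. If
// every thread of the warp reaches this call site (no divergence at
// the call), the full-warp mask 0xffffffff is correct; non-participation
// is expressed through the vote value rather than through divergence.
__device__ int countVotesFixed(bool participates, bool vote) {
    unsigned ballot = __ballot_sync(0xffffffff, participates && vote);
    return __popc(ballot);
}
```

With the fixed variant, all 32 threads synchronize on the ballot, so every thread observes the same complete vote count regardless of how the scheduler has grouped them.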
Example crash when using `topk`
Versions
Affects all versions since at least #17544
cc @ngimel