RuntimeError: Unconvertible NCCL type Short when sending torch.cuda.ShortTensor. #74734
Comments
I can repro this issue, and it looks like short tensors are not supported. Not sure if this is the expected behavior. cc: @cbalioglu
The new feature request is here: #74528
cc: @kwen2501, do you mind checking with the NCCL folks on this? From today's oncall triage meeting, it looks like NCCL does not support 16-bit integers. Correct me if I am missing anything. Thanks!
@timmywanttolearn Just curious -- is there a specific use case that asks for 16-bit integer support? Cc @sjeaugey for visibility.
Sure, I use uniform quantization, and I need to send some 16-bit int tensors.
Indeed, NCCL does not support 16-bit integers at the moment, but if the goal is to do send/recv, there is no real need to wait for specific support. PyTorch can simply implement it using uint8 and doubling the count. We do not implement type-specific NCCL kernels except for reductions; they all map to int8 in the end, simply multiplying the size by the datatype size.
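The uint8 reinterpretation described above can be sketched as follows. This is a hypothetical user-side workaround based on the comment, not an official PyTorch API; the actual `dist.send`/`dist.recv` calls are shown as comments since they require an initialized NCCL process group with GPUs.

```python
import torch

# Assumption: reinterpreting the int16 tensor's storage as uint8 before
# send/recv preserves the bytes while doubling the element count.
t = torch.tensor([1, -2, 300, -400], dtype=torch.int16)

as_bytes = t.view(torch.uint8)           # same storage, 2x the elements
assert as_bytes.numel() == 2 * t.numel()

# Sender (in a real setup): dist.send(as_bytes, dst=1)
# Receiver: allocate a uint8 buffer of doubled size, dist.recv into it,
# then reinterpret it back as int16:
recovered = as_bytes.view(torch.int16)
assert torch.equal(recovered, t)
```

Note that `Tensor.view(dtype)` with a different element size requires a contiguous last dimension whose byte size divides evenly, which holds for a contiguous 1-D int16 tensor.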
I got it. Thank you.
🐛 Describe the bug
The bug happens when I try to use dist.send to send a torch.cuda.ShortTensor.
The code is:
The error is: `RuntimeError: Unconvertible NCCL type Short`
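The original snippet was not captured in this copy of the issue. A minimal repro consistent with the title (a hypothetical reconstruction -- the tensor contents, ranks, and launch method are assumed) would look like this; it requires two GPUs and the NCCL backend, launched e.g. with `torchrun --nproc_per_node=2 repro.py`:

```python
import torch
import torch.distributed as dist

def main():
    # Hypothetical two-rank repro of sending a ShortTensor over NCCL.
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    # torch.cuda.ShortTensor is an int16 tensor on the GPU.
    t = torch.arange(4, dtype=torch.int16, device=f"cuda:{rank}")
    if rank == 0:
        # Expected to raise: RuntimeError: Unconvertible NCCL type Short
        dist.send(t, dst=1)
    else:
        dist.recv(t, src=0)

if __name__ == "__main__":
    main()
```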
Versions
Python version: 3.8
PyTorch version: 1.12.0 (nightly)
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @SciPioneer @H-Huang