[Inductor] atomic_add does not support bf16 #97016
Comments
I remember that bfloat16 atomics on older versions of CUDA were horrendously slow, so perhaps that's why Triton doesn't support them. @ngimel any thoughts on this now?
According to the CUDA programming guide, atomicAdd() on __nv_bfloat16 is only supported on compute capability 8.0 and higher, whereas Triton targets 7.0+. PyTorch works around this limitation with its own fallback implementation in eager mode.
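A minimal sketch (not from the thread) of the capability gate this implies, assuming a CUDA device is available:

```python
import torch

# CUDA only documents atomicAdd() on __nv_bfloat16 for compute capability
# 8.0 and higher, so code that wants to emit a native bf16 atomic add
# has to branch on the device's capability first.
major, minor = torch.cuda.get_device_capability()
native_bf16_atomics = (major, minor) >= (8, 0)
print(f"native bf16 atomicAdd available: {native_bf16_atomics}")
```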
Triton has functions that work on 8.0+ only; atomic bf16 could be one of those.
Ah okay, I could have a go at adding it to Triton if you like?
Yeah, that would be great!
I tried it out, but it looks like the PTX instruction for bf16 atomic add is only available on newer architectures (sm_90).
FYI: I'm adding tl.atomic_add for bf16: triton-lang/triton#1689
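For reference, a minimal sketch of the kind of scatter-add kernel that exercises tl.atomic_add (illustrative names, not Inductor's actual generated code); the tl.atomic_add call is the line that currently rejects bf16:

```python
import triton
import triton.language as tl

@triton.jit
def scatter_add_kernel(out_ptr, idx_ptr, src_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    idx = tl.load(idx_ptr + offs, mask=mask)
    val = tl.load(src_ptr + offs, mask=mask)
    # fails at compile time today if out_ptr points to bf16 data
    tl.atomic_add(out_ptr + idx, val, mask=mask)
```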
+1 |
similar error:
I have made some progress here. Working on it at: https://github.com/plotfi/triton/commits/plotfi-atomic-add-bf16. I want to make it so that if you are on Hopper it will use the native bf16 atomic_add, and use the fallback for Ampere.
What is the status of this? It seems that some triton upstream PRs were rejected? |
🐛 Describe the bug
This may be known already, but Triton does not support atomic_add with bf16, see https://github.com/openai/triton/blob/c9740f0870f6ae2480acd2a76a5fb4c920bc5ce5/python/triton/language/semantic.py#L904. This is not a problem in eager mode, only with torch.compile as it works right now; ideally this op should not be selected in that case? I made a minified repro below, but there is probably an even easier way to replicate this, I'm just unsure how to exactly trigger atomic_add.
Error logs
Minified repro
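The original minified repro is not reproduced above; below is a hypothetical sketch of the kind of code that should hit this path, assuming index_put_ with accumulate=True is lowered by Inductor to a Triton kernel that uses tl.atomic_add:

```python
# Hypothetical repro sketch (not the original minified repro from this issue).
# index_put_ with accumulate=True is one op that Inductor typically lowers to
# tl.atomic_add, which is where the bf16 limitation surfaces.
import torch

def scatter_accumulate(out, idx, src):
    return out.index_put_((idx,), src, accumulate=True)

out = torch.zeros(16, device="cuda", dtype=torch.bfloat16)
idx = torch.randint(0, 16, (1024,), device="cuda")
src = torch.randn(1024, device="cuda", dtype=torch.bfloat16)

scatter_accumulate(out, idx, src)                 # eager mode: works
torch.compile(scatter_accumulate)(out, idx, src)  # fails when the generated kernel hits tl.atomic_add on bf16
```

A common user-level workaround, until bf16 atomics land in Triton, is to accumulate into an fp32 buffer and cast back to bf16 afterwards.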
Versions
cc @ezyang @msaroufim @wconstab @bdhirsh @anijain2305 @zou3519 @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @soumith @ngimel @Xia-Weiwen @desertfire