Forward pass of Flash Attention produces incorrect results in fp32 with tl.dot(allow_tf32=False) #1821
Comments
Thanks, we are working on it. #1671
Is it the same problem? The other issue happens only with
Oh I see, hmm, let's take a look at this after the other issue has been solved.
I reproduced this problem on RTX A6000 and A100 in both pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel and nvcr.io/nvidia/pytorch:23.03-py3 containers (with latest triton: 7b30e24)
Will be fixed with #1913
Repro with modified code from tutorials:
Launch:
Output:
Modifications applied to original code from tutorials:
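As a rough stand-in for the kind of check involved here (this is not the author's original repro, which modified the fused-attention tutorial; all shapes and names below are illustrative assumptions), a minimal sketch would be a single-block attention forward computed entirely in fp32 with tl.dot(..., allow_tf32=False), compared against a plain PyTorch reference:

```python
import torch
import triton
import triton.language as tl


@triton.jit
def attn_fwd_block(q_ptr, k_ptr, v_ptr, o_ptr, N: tl.constexpr, D: tl.constexpr):
    # One program handles the whole (N, D) problem: load Q, K, V tiles and
    # compute softmax(Q @ K^T) @ V entirely in fp32.
    offs_n = tl.arange(0, N)
    offs_d = tl.arange(0, D)
    q = tl.load(q_ptr + offs_n[:, None] * D + offs_d[None, :])
    k = tl.load(k_ptr + offs_n[:, None] * D + offs_d[None, :])
    v = tl.load(v_ptr + offs_n[:, None] * D + offs_d[None, :])
    # allow_tf32=False is the setting this issue is about: both dots should
    # run in full fp32 precision rather than TF32.
    s = tl.dot(q, tl.trans(k), allow_tf32=False)
    p = tl.exp(s - tl.max(s, axis=1)[:, None])
    p = p / tl.sum(p, axis=1)[:, None]
    o = tl.dot(p, v, allow_tf32=False)
    tl.store(o_ptr + offs_n[:, None] * D + offs_d[None, :], o)


N, D = 64, 64
q = torch.randn((N, D), device="cuda", dtype=torch.float32)
k = torch.randn((N, D), device="cuda", dtype=torch.float32)
v = torch.randn((N, D), device="cuda", dtype=torch.float32)
o = torch.empty((N, D), device="cuda", dtype=torch.float32)
attn_fwd_block[(1,)](q, k, v, o, N=N, D=D)

ref = torch.softmax(q @ k.t(), dim=-1) @ v
print("max abs diff vs fp32 reference:", (o - ref).abs().max().item())
```

With allow_tf32=False on fp32 inputs, the kernel output should match the fp32 reference to within normal rounding error; the report is that the modified tutorial kernel does not.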
Triton: latest main (5686c51)
GPU: NVIDIA RTX A6000 (Ampere, sm_86)
Container: pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel
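One general remark when reproducing on Ampere GPUs (my note, not something stated in the report): the fp32 PyTorch reference should itself be computed without TF32, so any mismatch can be attributed to the Triton kernel rather than the reference. This can be pinned explicitly:

```python
import torch

# Force true fp32 on the PyTorch side, independent of the default TF32 settings.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
```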