cuDNN Forward Attention + FP16 non-cuDNN version in /dev/cuda/ #215

Closed · wants to merge 8 commits

Conversation

ademeure (Contributor)

| Kernel | Configuration | Time |
| --- | --- | --- |
| 4 (previous) | FP32 | 1.74 ms |
| 4 | TF32 | 1.70 ms |
| 5 | kernel 4 with BF16 I/O | 0.91 ms |
| 6 | kernel 5 without permute (not realistic) | 0.76 ms |
| 10 | cuDNN BF16, with FP32 conversion | 0.33 ms |
| 11 | cuDNN BF16, direct BF16 inputs | 0.13 ms |

This has been a mess to get working. For example, I wasted 3+ hours before realising that with cuBLASLt, even though the scale type is specified explicitly, alpha/beta need to be FP16 with CUBLAS_COMPUTE_16F and FP32 with CUBLAS_COMPUTE_32F, and there are zero warnings if you get it wrong: it just returns garbage :(
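To make the failure mode concrete, here is a minimal sketch (not code from this PR; the function name and dimensions are made up, error checking omitted) of the rule: `cublasLtMatmul` takes `alpha`/`beta` as `void*` and reads them through the scale type paired with the compute type, so `CUBLAS_COMPUTE_16F`/`CUDA_R_16F` means `__half` scalars while `CUBLAS_COMPUTE_32F`/`CUDA_R_32F` means `float` scalars, with no runtime diagnostic if they mismatch:

```c
#include <cublasLt.h>
#include <cuda_fp16.h>
#include <stddef.h>

// Hypothetical helper: C = A * B for column-major, non-transposed FP16 matrices.
void matmul_fp16(cublasLtHandle_t handle,
                 const __half *A, const __half *B, __half *C,
                 int m, int n, int k) {
    cublasLtMatmulDesc_t desc;
    // Scale type CUDA_R_16F: alpha/beta will be dereferenced as __half.
    cublasLtMatmulDescCreate(&desc, CUBLAS_COMPUTE_16F, CUDA_R_16F);

    cublasLtMatrixLayout_t aL, bL, cL;
    cublasLtMatrixLayoutCreate(&aL, CUDA_R_16F, m, k, m);
    cublasLtMatrixLayoutCreate(&bL, CUDA_R_16F, k, n, k);
    cublasLtMatrixLayoutCreate(&cL, CUDA_R_16F, m, n, m);

    __half alpha = __float2half(1.0f), beta = __float2half(0.0f);
    // float alpha = 1.0f, beta = 0.0f;  // WRONG with 16F compute:
    //                                   // no error, silently wrong results.

    cublasLtMatmul(handle, desc, &alpha, A, aL, B, bL, &beta,
                   C, cL, C, cL, NULL, NULL, 0, 0);

    cublasLtMatrixLayoutDestroy(aL);
    cublasLtMatrixLayoutDestroy(bL);
    cublasLtMatrixLayoutDestroy(cL);
    cublasLtMatmulDescDestroy(desc);
}
```

Switching the descriptor to `CUBLAS_COMPUTE_32F`/`CUDA_R_32F` would flip the requirement: the `__half` scalars above would then be the silent bug and the commented-out `float` ones the correct choice.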

I still haven't managed to get the cuDNN backward pass to give correct results, which means I can't integrate the forward pass as an option for the full training loop in train_gpt2.cu: our current backward pass requires the attention matrix "att", which cuDNN doesn't provide (it produces its own stats tensor instead), unfortunately.
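For context, here is a hedged sketch of what the forward graph looks like with the cudnn-frontend 1.x C++ API (the dimension names B/NH/T/HS and strides are illustrative, not taken from this PR). In training mode (`set_is_inference(false)`), `sdpa()` returns a second FP32 "stats" tensor of shape (B, NH, T, 1) holding the per-row softmax log-sum-exp, and the cuDNN backward consumes that instead of a full (T, T) "att" matrix:

```cpp
#include <cudnn_frontend.h>
namespace fe = cudnn_frontend;

// Sketch only. B = batch, NH = heads, T = sequence length, HS = head size;
// strides assume a (B, T, NH, HS) memory layout.
void build_sdpa_forward(int B, int NH, int T, int HS) {
    fe::graph::Graph graph;
    graph.set_io_data_type(fe::DataType_t::BFLOAT16)
         .set_intermediate_data_type(fe::DataType_t::FLOAT)
         .set_compute_data_type(fe::DataType_t::FLOAT);

    auto make_qkv = [&](const char* name) {
        return graph.tensor(fe::graph::Tensor_attributes()
                                .set_name(name)
                                .set_dim({B, NH, T, HS})
                                .set_stride({NH * T * HS, HS, NH * HS, 1}));
    };
    auto Q = make_qkv("Q");
    auto K = make_qkv("K");
    auto V = make_qkv("V");

    auto options = fe::graph::SDPA_attributes()
                       .set_name("flash_attention")
                       .set_is_inference(false)  // false => also emit stats
                       .set_causal_mask(true);
    // (a real graph would also set the attention scale 1/sqrt(HS); omitted)

    // Training-mode SDPA returns the output O plus the softmax stats.
    // No (T x T) attention matrix exists anywhere in the graph.
    auto [O, stats] = graph.sdpa(Q, K, V, options);
    O->set_output(true);
    stats->set_output(true).set_data_type(fe::DataType_t::FLOAT);
    // The backward graph (graph.sdpa_backward) takes Q, K, V, O, dO and this
    // stats tensor, recomputing the softmax on the fly instead of using "att".
}
```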

dagelf (Contributor) commented Apr 22, 2024

The CI failure isn't because of your commit; it's because HF is taking punishment today 🐳

#217 https://status.huggingface.co/
