Incorrectness in Flash Attention #1
We completely ignore attention_mask here: https://github.com/pacman100/DHS-LLM-Workshop/blob/53672e1b774da7798fb10a50ef8ca5b2750c5608/personal_copilot/training/starcoder_flash_attn_monkey_patch.py#L60

If the input had padding, then this is incorrect (not by a major amount, I think, but that might depend on how much padding the input has). We need to maintain cu_seqlens and use the packed version of flash attention here. But the current implementation is easier to implement and maintain, I guess?

Can we add a note regarding the incorrect behaviour?

Comments

Hello, I use it for continued pretraining with packing, wherein there is no padding and no attention mask involved. As such, it works as intended. I have mentioned this caveat here: huggingface/accelerate#1864 (comment)

I will raise a PR to add this warning. Thank you!

Hey, great!
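For reference, a minimal sketch of the packed (varlen) approach suggested above, assuming flash-attn 2.x's flash_attn_varlen_func. The function name `flash_attention_with_padding`, the tensor shapes, and the mask convention are illustrative assumptions, not part of the monkey patch, and the exact flash-attn argument names may differ between versions.

```python
# Sketch only: a mask-aware path using flash-attn's varlen ("packed") kernel.
# Assumptions (not from the repo): query/key/value have shape
# (batch, seqlen, n_heads, head_dim) and attention_mask has shape
# (batch, seqlen) with 1 = real token, 0 = padding.
import torch
from flash_attn import flash_attn_varlen_func


def flash_attention_with_padding(query, key, value, attention_mask, causal=True):
    batch, seqlen, n_heads, head_dim = query.shape

    # Length of the real (non-padded) part of each sequence.
    seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)  # (batch,)

    # cu_seqlens: cumulative sequence lengths with a leading 0, as the
    # varlen kernel expects: [0, len_0, len_0 + len_1, ...].
    cu_seqlens = torch.nn.functional.pad(
        seqlens.cumsum(0, dtype=torch.int32), (1, 0)
    )
    max_seqlen = int(seqlens.max())

    # Drop padded positions so the real tokens of all sequences are packed
    # back-to-back along one dimension: (total_tokens, n_heads, head_dim).
    keep = attention_mask.bool().reshape(-1)
    q = query.reshape(batch * seqlen, n_heads, head_dim)[keep]
    k = key.reshape(batch * seqlen, n_heads, head_dim)[keep]
    v = value.reshape(batch * seqlen, n_heads, head_dim)[keep]

    out = flash_attn_varlen_func(
        q, k, v,
        cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
        max_seqlen_q=max_seqlen, max_seqlen_k=max_seqlen,
        causal=causal,
    )

    # Scatter the packed output back into the padded (batch, seqlen) layout.
    padded = out.new_zeros(batch * seqlen, n_heads, head_dim)
    padded[keep] = out
    return padded.reshape(batch, seqlen, n_heads, head_dim)
```

This mirrors the usual unpad → varlen kernel → re-pad pattern: padded positions are removed, the kernel attends within each sequence as described by cu_seqlens, and the output is scattered back to the padded layout. With pure packing and no padding, as in the continued-pretraining setup described above, the existing code path is already equivalent.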