
Incorrectness in Flash Attention #1

Closed · mayank31398 opened this issue Aug 23, 2023 · 3 comments

mayank31398 commented Aug 23, 2023

We completely ignore attention_mask here: https://github.com/pacman100/DHS-LLM-Workshop/blob/53672e1b774da7798fb10a50ef8ca5b2750c5608/personal_copilot/training/starcoder_flash_attn_monkey_patch.py#L60

If the input has padding, this is incorrect (probably not by a large amount, but that likely depends on how much padding the input has).
We need to maintain cu_seqlens and use the packed (variable-length) version of flash attention here.
But the current implementation is easier to write and maintain, I guess?

Can we add a note regarding the incorrect behaviour?
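
For reference, a minimal sketch of what the cu_seqlens-based path could look like, assuming flash-attn >= 2.0 (which exposes `flash_attn_varlen_func`). The wrapper name, tensor layout, and manual packing below are illustrative, not the repo's code; the library's `flash_attn.bert_padding.unpad_input`/`pad_input` helpers could do the packing instead.

```python
# Illustrative sketch only: build cu_seqlens from attention_mask and call the
# variable-length ("packed") flash-attn kernel so padded positions are skipped.
# Assumes flash-attn >= 2.0 and CUDA tensors in fp16/bf16.
import torch
import torch.nn.functional as F
from flash_attn import flash_attn_varlen_func


def attention_with_padding(q, k, v, attention_mask, causal=True):
    # q, k, v: (batch, seqlen, num_heads, head_dim); attention_mask: (batch, seqlen) of 0/1
    batch, seqlen, num_heads, head_dim = q.shape

    seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)  # real tokens per sequence
    cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))  # (batch + 1,)
    max_seqlen = int(seqlens.max())

    # Drop padded positions so the remaining tokens are packed back to back.
    keep = attention_mask.flatten().bool()
    q_packed = q.reshape(batch * seqlen, num_heads, head_dim)[keep]
    k_packed = k.reshape(batch * seqlen, num_heads, head_dim)[keep]
    v_packed = v.reshape(batch * seqlen, num_heads, head_dim)[keep]

    out_packed = flash_attn_varlen_func(
        q_packed, k_packed, v_packed,
        cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
        max_seqlen_q=max_seqlen, max_seqlen_k=max_seqlen,
        causal=causal,
    )

    # Scatter the packed output back to the padded (batch, seqlen, ...) layout.
    out = q.new_zeros(batch * seqlen, num_heads, head_dim)
    out[keep] = out_packed
    return out.reshape(batch, seqlen, num_heads, head_dim)
```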

pacman100 (Owner) commented

Hello, I use it for continued pretraining with packing, wherein no padding or attention mask is involved. As such, it works as intended. I have mentioned this caveat here: huggingface/accelerate#1864 (comment)

Note:

The Flash V2 support that I have implemented above ignores padding/attention_mask/custom_mask. It is meant for continued pre-training, with inputs packed to consume the entire sequence length.

I will raise a PR to add this warning. Thank you!
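
For context, packing here typically means concatenating tokenized examples and slicing them into fixed-length blocks, so every position is a real token and no attention_mask is needed. A rough sketch (the helper name and block_size are illustrative, not the workshop's exact code):

```python
# Illustrative sketch of constant-length packing: concatenate tokenized examples
# and cut fixed-size blocks, so there is never any padding to mask out.
def pack_examples(tokenized_examples, block_size=2048):
    """tokenized_examples: iterable of lists of token ids."""
    buffer = []
    for ids in tokenized_examples:
        buffer.extend(ids)
        while len(buffer) >= block_size:
            yield buffer[:block_size]  # a full block, no padding anywhere
            buffer = buffer[block_size:]
    # Trailing tokens that do not fill a block are dropped in this sketch.
```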


mayank31398 (Author) commented

Hey, great.
Sorry, I didn't know you were using it with dense packing.
Closing this issue.
