
Incorrectness in Flash Attention #1

Closed · mayank31398 opened this issue Aug 23, 2023 · 3 comments

mayank31398 commented Aug 23, 2023

We completely ignore attention_mask here: https://github.com/pacman100/DHS-LLM-Workshop/blob/53672e1b774da7798fb10a50ef8ca5b2750c5608/personal_copilot/training/starcoder_flash_attn_monkey_patch.py#L60

If the input has padding, this is incorrect (probably not by a large amount, but that likely depends on how much padding the input has).
We need to maintain cu_seqlens and use the packed (variable-length) version of flash attention here.
But the current implementation is easier to write and maintain, I guess?

Can we add a note regarding the incorrect behaviour?
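
For reference, a minimal sketch of what the cu_seqlens-based path could look like, assuming flash-attn >= 2.0 (which exposes `flash_attn_varlen_func`). The wrapper name, tensor layout, and manual packing below are illustrative, not the repo's code; the library's `flash_attn.bert_padding.unpad_input`/`pad_input` helpers could do the packing instead.

```python
# Illustrative sketch only: build cu_seqlens from attention_mask and call the
# variable-length ("packed") flash-attn kernel so padded positions are skipped.
# Assumes flash-attn >= 2.0 and CUDA tensors in fp16/bf16.
import torch
import torch.nn.functional as F
from flash_attn import flash_attn_varlen_func


def attention_with_padding(q, k, v, attention_mask, causal=True):
    # q, k, v: (batch, seqlen, num_heads, head_dim); attention_mask: (batch, seqlen) of 0/1
    batch, seqlen, num_heads, head_dim = q.shape

    seqlens = attention_mask.sum(dim=-1, dtype=torch.int32)  # real tokens per sequence
    cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))  # (batch + 1,)
    max_seqlen = int(seqlens.max())

    # Drop padded positions so the remaining tokens are packed back to back.
    keep = attention_mask.flatten().bool()
    q_packed = q.reshape(batch * seqlen, num_heads, head_dim)[keep]
    k_packed = k.reshape(batch * seqlen, num_heads, head_dim)[keep]
    v_packed = v.reshape(batch * seqlen, num_heads, head_dim)[keep]

    out_packed = flash_attn_varlen_func(
        q_packed, k_packed, v_packed,
        cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
        max_seqlen_q=max_seqlen, max_seqlen_k=max_seqlen,
        causal=causal,
    )

    # Scatter the packed output back to the padded (batch, seqlen, ...) layout.
    out = q.new_zeros(batch * seqlen, num_heads, head_dim)
    out[keep] = out_packed
    return out.reshape(batch, seqlen, num_heads, head_dim)
```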

pacman100 (Owner) commented

Hello, I use it for continued pretraining with packing, wherein no padding or attention mask is involved. As such, it works as intended. I have mentioned this caveat here: huggingface/accelerate#1864 (comment)

Note:

The Flash V2 support that I have implemented above ignores padding/attention_mask/custom_mask. It is meant for continued pre-training, with inputs packed to consume the entire sequence length.

I will raise a PR to add this warning. Thank you!
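
For context, packing here typically means concatenating tokenized examples and slicing them into fixed-length blocks, so every position is a real token and no attention_mask is needed. A rough sketch (the helper name and block_size are illustrative, not the workshop's exact code):

```python
# Illustrative sketch of constant-length packing: concatenate tokenized examples
# and cut fixed-size blocks, so there is never any padding to mask out.
def pack_examples(tokenized_examples, block_size=2048):
    """tokenized_examples: iterable of lists of token ids."""
    buffer = []
    for ids in tokenized_examples:
        buffer.extend(ids)
        while len(buffer) >= block_size:
            yield buffer[:block_size]  # a full block, no padding anywhere
            buffer = buffer[block_size:]
    # Trailing tokens that do not fill a block are dropped in this sketch.
```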


mayank31398 (Author) commented

Hey, great.
Sorry, I didn't know you were using it with dense packing.
Closing this issue.
