
Fixes for FlashAttention #2126

Merged: 4 commits, Aug 8, 2023
Conversation

tmm1 (Contributor) commented Aug 1, 2023

Why are these changes needed?

We need to cast LlamaRMSNorm layers to bf16 after PEFT conversion to fix errors from flash-attn about dtype support.

I also updated the flash-attention patch.

I am able to run training on llama2 with this change.
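
For reference, here is a minimal sketch of the norm-cast idea (illustrative only, not the exact code in this PR; `cast_norms_to_bf16` is just a placeholder name, and `LlamaRMSNorm` comes from the Hugging Face transformers LLaMA implementation):

```python
import torch
from transformers.models.llama.modeling_llama import LlamaRMSNorm

def cast_norms_to_bf16(model: torch.nn.Module) -> torch.nn.Module:
    # flash-attn kernels only accept fp16/bf16 inputs, so the RMSNorm modules
    # (which PEFT preparation may leave in fp32) are cast back to bfloat16.
    for module in model.modules():
        if isinstance(module, LlamaRMSNorm):
            module.to(torch.bfloat16)
    return model

# typical usage after wrapping the base model with LoRA:
# model = get_peft_model(base_model, lora_config)
# model = cast_norms_to_bf16(model)
```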

Related issue number (if applicable)

Closes #1828

Checks

  • I've run format.sh to lint the changes in this PR.
  • I've included any doc changes needed.
  • I've made sure the relevant tests are passing (if applicable).

tmm1 marked this pull request as ready for review August 1, 2023 07:17
merrymercy merged commit 060c9f1 into lm-sys:main Aug 8, 2023
1 check passed
merrymercy (Member)

@tmm1 Thanks! This is merged.

philschmid

This code looks quite similar to the patch in my blog. We have now noticed that 70B models with GQA are not supported.

Have you seen the same issue?

tmm1 (Contributor, Author) commented Aug 9, 2023

Hi @philschmid, yes. You could refer to this approach:

LAION-AI/Open-Assistant@3c8f93e

philschmid

Yeah, I saw the commit as well. Just wanted to share here so you are aware it's not working for 70B at the moment.

tmm1 (Contributor, Author) commented Aug 12, 2023

@philschmid I'm looking at this again now to add 70B support. Did you end up doing any more work in this area?

I'm also interested in making the patch work for the forward pass with past_key_value support, which still isn't quite working.
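
For anyone following along, here is a rough sketch of the kind of grouped-query-attention handling that 70B support needs (illustrative only, not the code from the Open-Assistant commit above): the key/value heads have to be expanded to match the number of query heads before the flash-attn call, similar to the `repeat_kv` helper in transformers.

```python
import torch

def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    # (batch, num_kv_heads, seq_len, head_dim) -> (batch, num_kv_heads * n_rep, seq_len, head_dim)
    batch, num_kv_heads, seq_len, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    hidden_states = hidden_states[:, :, None, :, :].expand(
        batch, num_kv_heads, n_rep, seq_len, head_dim
    )
    return hidden_states.reshape(batch, num_kv_heads * n_rep, seq_len, head_dim)

# in the patched attention forward, something like:
# n_rep = config.num_attention_heads // config.num_key_value_heads
# key_states = repeat_kv(key_states, n_rep)
# value_states = repeat_kv(value_states, n_rep)
```

(Alternatively, I believe recent flash-attn 2.x kernels accept key/value tensors with fewer heads than the queries directly, which would avoid the explicit repeat.)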

merrymercy (Member)

@tmm1 Hi, I did some minor style cleanup in PR #2212.
I also found that the current implementation does not support generative inference or 70B models.
Could you take a look and fix it? Thanks!

Linked issue: LoRA Fine Tuning Crash at FlashAttention Issue (#1828)