Grouped Query Attention + Refactor Attn #492
Conversation
`GeneralizedAttention` is a valid name, but should we call it `GroupedQueryAttention` to not confuse people?
Or alternatively create:

```python
class GroupedQueryAttention(GeneralizedAttention):
    def __init__(..., groups=G, ...):
        super().__init__(..., kv_n_heads=G, ...)
```

or something like that (following the convention of other implementations).
Co-authored-by: Vitaliy Chiley <6439018+vchiley@users.noreply.github.com>
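For concreteness, a minimal sketch of the suggested wrapper; the `d_model`/`n_heads` parameter names and the `GeneralizedAttention` signature are illustrative assumptions here, not the actual llm-foundry API:

```python
# Hypothetical sketch of the suggested subclass; parameter names are assumptions
# for illustration only, not the llm-foundry signature.
class GroupedQueryAttention(GeneralizedAttention):
    def __init__(self, d_model: int, n_heads: int, groups: int, **kwargs):
        # "groups" maps directly onto kv_n_heads in the generalized class:
        # groups == n_heads recovers MHA, groups == 1 recovers MQA.
        super().__init__(d_model=d_model, n_heads=n_heads, kv_n_heads=groups, **kwargs)
```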
I've renamed it to `GroupedQueryAttention`, as per discussion offline.
Could you please include a comparison run for a normal MHA model before and after this PR? I'd like to make sure we don't have a perf regression (or at least know the magnitude of it).
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
A few nits, but LGTM pending the performance comparison.
Adding grouped query attention to LLM-foundry, and making the GQA class a superclass of MQA and MHA attention, as it is a generalization of those two variants of attention.

Things to note:
- We use `repeat_interleave` to make the grouped key/value tensors the same dimensions as in multi-head attention, which does allocate new memory, compared to using `expand`. This can be updated in the future, but is the safer bet for now, given that we previously saw edge cases with using `expand` vs `repeat` for particular `head_dim` settings causing NaNs (a minimal sketch of this tradeoff follows below).
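A minimal PyTorch sketch of the `repeat_interleave` vs `expand` tradeoff described above; the tensor sizes are illustrative assumptions, and this is not the llm-foundry implementation:

```python
import torch

B, S, n_heads, kv_n_heads, head_dim = 2, 16, 8, 2, 64  # illustrative sizes, not from the PR
k = torch.randn(B, kv_n_heads, S, head_dim)

# repeat_interleave copies each kv head n_heads // kv_n_heads times, allocating
# a new contiguous tensor: extra memory, but no zero-stride views.
k_rep = k.repeat_interleave(n_heads // kv_n_heads, dim=1)
assert k_rep.shape == (B, n_heads, S, head_dim)

# expand would instead return a zero-stride view over the same storage (no copy),
# which is the cheaper alternative mentioned above; a later reshape or
# contiguous() call would still materialize it.
k_exp = k.unsqueeze(2).expand(B, kv_n_heads, n_heads // kv_n_heads, S, head_dim)
assert k_exp.reshape(B, n_heads, S, head_dim).shape == k_rep.shape
```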