
Flash attention #28

Closed
Taytay opened this issue Jan 22, 2024 · 2 comments

Comments

Taytay commented Jan 22, 2024

Firstly, thank you so much for this repo! I'm a huge fan of T5, and these results are extremely impressive.

I saw that you experimented with different positional embeddings like ALiBi in order to facilitate FlashAttention (FA) down the line. Was that because FA doesn't support an additive attention bias? If so, there is a PR to add it that is making progress:

Dao-AILab/flash-attention#617

It would be fun to see this repo get even faster.

PiotrNawrot (Owner) commented
@Taytay Thanks for the nice comments, I'm glad you like the repo! Please accept my apologies for the late reply. I've been very busy lately with the ICML submission.

Yes, exactly. FA didn't support backpropagation through the extra additive bias (applied after the dot products, before the softmax). I've just noticed this PR and it looks great - I'm sure that backprop through this bias would help well beyond the T5 case! Can't wait to have it merged into FA. I'll definitely test it soon after that : ).
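
For anyone reading along, here is a minimal sketch (not code from this repo, names are illustrative) of what that bias term looks like in T5-style attention and why a fused kernel has to know about it: the bias is added to the raw attention scores before the softmax, so any kernel that fuses softmax(QK^T)V without materializing the score matrix must accept the bias as an input and differentiate through it.

```python
import torch
import torch.nn.functional as F

def t5_style_attention(q, k, v, rel_pos_bias):
    # q, k, v: (batch, heads, seq_len, head_dim)
    # rel_pos_bias: (1, heads, seq_len, seq_len), learned relative-position bias
    # Note: T5 does not scale the scores by 1/sqrt(head_dim).
    scores = torch.matmul(q, k.transpose(-1, -2))   # raw dot products
    scores = scores + rel_pos_bias                  # extra additive bias, before softmax
    probs = F.softmax(scores, dim=-1)
    return torch.matmul(probs, v)
```

A fused FlashAttention kernel never materializes `scores`, which is why the bias (and its gradient) has to be handled inside the kernel itself - that is what the linked PR adds.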

Closing for now

harish-kamath commented
Someone has started a repo based on this one with FA2 support (@catie-aq):

https://github.com/catie-aq/flashT5
