This should be a relatively easy implementation. When loading a model with HF transformers, users can pass in `attn_implementation="flash_attention_2"`. Details here.
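For reference, a minimal sketch of what that looks like on the transformers side (the model name and dtype below are placeholders, not anything from our codebase):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a causal LM with FlashAttention-2 as the attention backend.
# flash-attn must be installed, and the model needs fp16/bf16 on a
# supported GPU for the flash_attention_2 backend to be usable.
model_name = "meta-llama/Llama-2-7b-hf"  # placeholder model id

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
```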
I suggest we:
- freeze the versions of transformers and trl in the dependencies
- add flash-attn to the dependencies list
- have ConfigModel map to the parameters of the underlying transformers modules (see the sketch after this list)
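Here is a rough sketch of the ConfigModel idea; the pydantic base class and every field name below are assumptions for illustration, not the actual schema:

```python
from typing import Literal, Optional
from pydantic import BaseModel


class ModelLoadConfig(BaseModel):
    # Hypothetical config model: each field maps onto a kwarg of
    # AutoModelForCausalLM.from_pretrained in transformers.
    model_name_or_path: str
    torch_dtype: Optional[str] = "bfloat16"
    attn_implementation: Literal["eager", "sdpa", "flash_attention_2"] = "sdpa"

    def to_from_pretrained_kwargs(self) -> dict:
        """Kwargs to pass straight through to from_pretrained()."""
        return self.model_dump(exclude={"model_name_or_path"}, exclude_none=True)
```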
Are we supporting the flash_attention feature? https://github.com/Dao-AILab/flash-attention/tree/main