This should be a relatively easy implementation. When loading a model with HF transformers, users can pass in `attn_implementation="flash_attention_2"`. Details here.
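For reference, a minimal sketch of what that looks like on the transformers side (the model name and dtype below are placeholders, not anything from our codebase):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a causal LM with FlashAttention-2 as the attention backend.
# flash-attn must be installed, and the model needs fp16/bf16 on a
# supported GPU for the flash_attention_2 backend to be usable.
model_name = "meta-llama/Llama-2-7b-hf"  # placeholder model id

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
```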
I suggest we:
- freeze the versions of transformers and trl in the dependencies
- add flash-attn to the dependencies list
- have ConfigModel map to the parameters of the underlying transformers modules (see the sketch after this list)
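Here is a rough sketch of the ConfigModel idea; the pydantic base class and every field name below are assumptions for illustration, not the actual schema:

```python
from typing import Literal, Optional
from pydantic import BaseModel


class ModelLoadConfig(BaseModel):
    # Hypothetical config model: each field maps onto a kwarg of
    # AutoModelForCausalLM.from_pretrained in transformers.
    model_name_or_path: str
    torch_dtype: Optional[str] = "bfloat16"
    attn_implementation: Literal["eager", "sdpa", "flash_attention_2"] = "sdpa"

    def to_from_pretrained_kwargs(self) -> dict:
        """Kwargs to pass straight through to from_pretrained()."""
        return self.model_dump(exclude={"model_name_or_path"}, exclude_none=True)
```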
Are we supporting the flash_attention feature? https://github.com/Dao-AILab/flash-attention/tree/main