feat(server): flash attention v2 #624
LGTM
It looks like this will remove compatibility with compute capability 7.5. That means T4 cards and AWS G4 instance types will no longer be supported.
Support for 7.5 is apparently coming; in the meantime, we might try to keep both.
That would be great. The motivation for my 4-bit bnb PR was to improve throughput on Falcon models on G4s. We have workloads that run in regions that only have G4 availability, so maintaining compatibility would be best.
I know. That's why the PR is not merged.
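As an illustration of the "keep both" idea discussed above, here is a minimal sketch of dispatching on compute capability. It is not the code from this PR: the function name `select_attention_backend` and the returned labels are hypothetical, and it assumes Flash Attention v2 needs compute capability 8.0 or newer while v1 remains usable on 7.5 (T4).

```python
from typing import Optional, Tuple


def select_attention_backend(capability: Tuple[int, int]) -> Optional[str]:
    """Pick a flash attention version for a (major, minor) compute capability.

    Hypothetical helper for illustration only. Assumes v2 requires
    capability >= 8.0 and v1 is the fallback for 7.5 (e.g. T4 / AWS G4).
    """
    major, minor = capability
    if major >= 8:
        # Ampere and newer: assumed to be supported by flash attention v2.
        return "v2"
    if (major, minor) == (7, 5):
        # Turing (T4): fall back to flash attention v1.
        return "v1"
    # Anything older: no flash attention kernel assumed available.
    return None


if __name__ == "__main__":
    for cap in [(7, 0), (7, 5), (8, 0), (8, 6), (9, 0)]:
        print(cap, select_attention_backend(cap))
```

In a real server the tuple would typically come from `torch.cuda.get_device_capability()`; it is passed in explicitly here so the sketch runs without a GPU.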
No description provided.