feat(server): flash attention v2 #624

OlivierDehaene · 2023-07-17T16:38:45Z

No description provided.

Narsil

LGTM

krzim · 2023-07-17T21:17:59Z

It looks like this will remove compatibility with compute capability 7.5. That means T4 cards and AWS G4 instance types will no longer be supported.

Narsil · 2023-07-18T09:02:45Z

Support for 7.5 is apparently coming. In the meantime, we might try to keep both v1 and v2.
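
For illustration, "keeping both" could look roughly like the sketch below: choose the kernel version from the GPU's compute capability. This is a minimal sketch, not the code in this PR. The function name `select_flash_attention` and its return strings are hypothetical, and the thresholds assume that v2 targets Ampere (8.0) or newer while v1 still covers Turing (7.5).

```python
def select_flash_attention(major: int, minor: int) -> str:
    """Pick a flash attention version from a CUDA compute capability.

    Hypothetical helper for illustration only. Assumes v2 needs
    compute capability 8.0 or newer and v1 still supports 7.5.
    """
    capability = (major, minor)
    if capability >= (8, 0):
        # Ampere or newer (e.g. A10G, A100): use the v2 kernels
        return "v2"
    if capability == (7, 5):
        # Turing (e.g. T4 on AWS G4): fall back to the v1 kernels
        return "v1"
    # Anything older: no flash attention, use a regular attention path
    return "none"
```

With PyTorch, the `(major, minor)` pair would come from `torch.cuda.get_device_capability()`.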

krzim · 2023-07-18T13:37:34Z

That would be great. The motivation for my 4-bit bnb PR was to improve throughput on Falcon models on G4 instances. We have workloads that run in regions that only have G4 availability, so maintaining compatibility would be best.

OlivierDehaene · 2023-07-18T13:38:34Z

I know. That's why the PR is not merged.

OlivierDehaene added 2 commits July 17, 2023 17:34

feat(server): flash attention v2

107fcfe

fix

2d4b310

Narsil previously approved these changes Jul 17, 2023

njhill mentioned this pull request Jul 17, 2023

Upgrade to Flash attention V2 #627

Closed

OlivierDehaene added 2 commits July 18, 2023 09:21

use native grouped attention

f400f2d

abstraction above flash

bc2f351

fix

d186b13

OlivierDehaene dismissed Narsil’s stale review via d186b13 July 18, 2023 10:36

Narsil previously approved these changes Jul 18, 2023

fix dockerfile

751f26b

OlivierDehaene dismissed Narsil’s stale review via 751f26b July 18, 2023 13:29

OlivierDehaene merged commit 3b71c38 into main Jul 18, 2023
4 of 5 checks passed

OlivierDehaene deleted the feat/flash_v2 branch July 18, 2023 14:21

tgaddair mentioned this pull request Apr 26, 2024

Fallback to Flash Attention v1 for pre-Ampere GPUs predibase/lorax#440

Closed
