
Flash attention speedup is small: only about a 5% improvement in inference speed #49

Closed
zcuuu opened this issue Aug 4, 2023 · 5 comments

zcuuu commented Aug 4, 2023

Hi,
I installed flash attention following the installation steps you provided, and the runtime log confirms it is being used:
use flash_attn rotary
use flash_attn rms_norm

Testing on an A100, installing flash attention gives less than a 5% inference speedup (measured as per-token generation time) compared to not installing it.
So I'd like to ask: in your internal tests, roughly how much of a performance improvement does flash attention bring?

jeffchy commented Aug 4, 2023

FlashAttention mainly helps training: it reduces memory usage and speeds things up. For inference it generally doesn't help much.

@jackaihfia2334

Same here, and I'm using the latest flash-attention-2. Is it that flash-attn really doesn't speed up inference much?

@logicwong (Member)

It actually depends a lot on the sequence length: the speedup is small for short sequences, but quite noticeable for long ones. From a quick test (see the timing sketch below):

  • With a context length around 5000, the speedup is over 20%;
  • With a context length around 50 (the case in the README), the speedup is under 5%.
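For reference, a minimal timing sketch of this kind of comparison (run it once with flash-attn installed, once without). The checkpoint name, dummy prompt, and token counts are illustrative assumptions, not the exact setup behind the numbers above:

```python
# Rough per-token latency measurement on a single CUDA GPU (assumed setup).
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen-7B-Chat"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, trust_remote_code=True, torch_dtype=torch.float16, device_map="cuda"
).eval()

def ms_per_token(context_len: int, new_tokens: int = 32) -> float:
    # Dummy prompt: `context_len` copies of an ordinary token.
    tok_id = tokenizer.encode("hello")[0]
    ids = torch.full((1, context_len), tok_id, dtype=torch.long, device="cuda")
    torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        model.generate(ids, max_new_tokens=new_tokens,
                       min_new_tokens=new_tokens, do_sample=False)
    torch.cuda.synchronize()
    # Note: this includes the prefill over the context, which is exactly where
    # long prompts benefit from FlashAttention.
    return (time.perf_counter() - start) / new_tokens * 1000

for n in (50, 5000):
    print(f"context={n}: {ms_per_token(n):.1f} ms/token")
```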

@logicwong (Member)

One more point: the experiment above only looked at context length, but the output length matters a lot as well.
When the output length is small and the context length is large, the bottleneck is mainly the computation over the context, which is where FlashAttention has an advantage;
when the output length is large, the bottleneck shifts to the single-step autoregressive decoding, and FlashAttention's speedup becomes much weaker.
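To make the prefill/decode distinction concrete, here is a small sketch using PyTorch's scaled_dot_product_attention as a stand-in for a fused attention kernel (shapes and dtypes are assumptions, not Qwen's actual implementation):

```python
# Prefill vs. decode attention shapes (illustrative; not the repo's code).
import torch
import torch.nn.functional as F

B, H, D = 1, 32, 128   # batch, heads, head_dim (assumed)
ctx = 5000             # long context

k = torch.randn(B, H, ctx, D, device="cuda", dtype=torch.float16)
v = torch.randn(B, H, ctx, D, device="cuda", dtype=torch.float16)

# Prefill: all context tokens attend to each other; cost grows ~ctx^2,
# so a fused/tiled kernel like FlashAttention pays off here.
q_prefill = torch.randn(B, H, ctx, D, device="cuda", dtype=torch.float16)
out_prefill = F.scaled_dot_product_attention(q_prefill, k, v, is_causal=True)

# Decode: each autoregressive step has a single query against the KV cache;
# cost grows only ~ctx, and the step time is dominated by the rest of the
# model, so a faster attention kernel helps much less.
q_step = torch.randn(B, H, 1, D, device="cuda", dtype=torch.float16)
out_step = F.scaled_dot_product_attention(q_step, k, v)
```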

@oxyhexagen

> Same here, and I'm using the latest flash-attention-2. Is it that flash-attn really doesn't speed up inference much?

@jackaihfia2334 Hi, does using flash attn 2 require renaming some layers in the original code, or can it be used directly?
