Fuse split rotary for Group Query Attention #940

jinhongyii · 2023-09-19T19:50:56Z

No description provided.

junrushao · 2023-09-19T20:05:33Z

CC @masahi as the original author of this optimization :)

MasterJH5574 · 2023-09-19T20:09:33Z

mlc_llm/transform/fuse_split_rotary_embedding.py

@@ -70,10 +70,84 @@ def split_rotary(

    return split_rotary

+def get_split_rotary_group_query_attention(num_query_heads, num_kv_heads, head_dim, position_embedding_base):


I’m wondering if we can merge the two get_split_rotary functions into one.

We fuse split rotary for GQA into 2 kernels because the numbers of query heads and KV heads are different, while split rotary for non-GQA is fused into 1 kernel. So let's keep them as separate functions.
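For readers unfamiliar with the layout issue, here is a minimal pure-Python sketch of what "split rotary" has to do under GQA. It is not the TIR kernel from this PR. It assumes a fused per-token row laid out as [Q | K | V] and the rotate-half rotary convention; the function names `rotary` and `split_rotary_gqa` and the default base of 10000 are hypothetical, chosen only for illustration. The point it shows is that the Q slice and the K slice have different widths when `num_query_heads != num_kv_heads`.

```python
import math


def rotary(vec, pos, base=10000.0):
    """Apply rotate-half rotary embedding to one head vector at position `pos`."""
    head_dim = len(vec)
    half = head_dim // 2
    out = []
    for i in range(head_dim):
        # Columns i and i + half share one rotation frequency.
        angle = pos * base ** (-2.0 * (i % half) / head_dim)
        partner = -vec[i + half] if i < half else vec[i - half]
        out.append(vec[i] * math.cos(angle) + partner * math.sin(angle))
    return out


def split_rotary_gqa(qkv, pos, num_query_heads, num_kv_heads, head_dim, base=10000.0):
    """Split one fused [Q | K | V] row into heads and rotate Q and K (assumed layout)."""
    q_width = num_query_heads * head_dim
    kv_width = num_kv_heads * head_dim
    assert len(qkv) == q_width + 2 * kv_width

    def heads(flat, n):
        return [flat[h * head_dim:(h + 1) * head_dim] for h in range(n)]

    q = heads(qkv[:q_width], num_query_heads)
    k = heads(qkv[q_width:q_width + kv_width], num_kv_heads)
    v = heads(qkv[q_width + kv_width:], num_kv_heads)
    # Q and K are rotated; V passes through unchanged.
    return ([rotary(h, pos, base) for h in q],
            [rotary(h, pos, base) for h in k],
            v)
```

With `num_query_heads == num_kv_heads` the three slices have equal width, which is the non-GQA case; with fewer KV heads the Q output has a different shape from K and V.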

masahi · 2023-09-19T20:30:34Z

Nice. Does a big model that uses MQA / GQA benefit from this pass? The speedup for 13B was much smaller than for 7B when I tested.

jinhongyii · 2023-09-19T20:34:56Z

Llama 2 70B uses GQA. Indeed, the speedup is small: I observed only 2%.

fuse split rotary for GQA

24e1a99

MasterJH5574 reviewed Sep 19, 2023


format

736afb2

MasterJH5574 approved these changes Sep 19, 2023


MasterJH5574 merged commit 8c87e09 into mlc-ai:main Sep 19, 2023

masahi mentioned this pull request Oct 9, 2023

[Transform] Apply split_rotary optimization on prefill #1033

Merged
