
@gante (Collaborator) commented Jun 25, 2023

⚠️ do not merge!

This is an experimental PR that shares cos and sin across the decoder layers.

If we look at the profile, a LOT of time is spent in apply_rotary_pos_emb_opt. This PR is an attempt to reduce it.

Learnings

PT version: torch==2.1.0.dev20230621+cu118

  • No significant result changes
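The idea in the PR description, computing cos/sin once per forward pass and sharing the tensors across all decoder layers, can be sketched roughly as follows. This is a minimal NumPy illustration, not the actual transformers implementation; `build_rope_cache` is a hypothetical name, and the real code builds these tensors inside the rotary embedding module.

```python
import numpy as np

def build_rope_cache(seq_len, head_dim, base=10000.0):
    # Hypothetical helper: compute the rotary cos/sin tables once,
    # instead of recomputing them in every decoder layer.
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    t = np.arange(seq_len)
    freqs = np.outer(t, inv_freq)                  # (seq_len, head_dim / 2)
    emb = np.concatenate([freqs, freqs], axis=-1)  # (seq_len, head_dim)
    return np.cos(emb), np.sin(emb)

cos, sin = build_rope_cache(seq_len=128, head_dim=64)
# every decoder layer would then reuse the same (cos, sin) pair
```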

Comment on lines +146 to +151
q_embed[..., half_dim:] += q[..., :half_dim] * sin[..., half_dim:]
q_embed[..., :half_dim] += q[..., half_dim:] * sin[..., :half_dim] * -1

k_embed = (key_states * cos)
k_embed[..., half_dim:] += key_states[..., :half_dim] * sin[..., half_dim:]
k_embed[..., :half_dim] += key_states[..., half_dim:] * sin[..., :half_dim] * -1
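To make explicit that the sliced in-place variant above computes the same thing as the usual `rotate_half` formulation, here is a small NumPy sketch (NumPy stands in for PyTorch; the function names `apply_rotary_reference` and `apply_rotary_inplace` are made up for this comparison):

```python
import numpy as np

def rotate_half(x):
    # [-x2, x1], with x split into halves along the last dim
    half = x.shape[-1] // 2
    return np.concatenate([-x[..., half:], x[..., :half]], axis=-1)

def apply_rotary_reference(q, cos, sin):
    # standard formulation: q * cos + rotate_half(q) * sin
    return q * cos + rotate_half(q) * sin

def apply_rotary_inplace(q, cos, sin):
    # the sliced in-place variant from the diff above
    half = q.shape[-1] // 2
    q_embed = q * cos
    q_embed[..., half:] += q[..., :half] * sin[..., half:]
    q_embed[..., :half] += q[..., half:] * sin[..., :half] * -1
    return q_embed

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4, 8, 16))          # (batch, heads, seq, head_dim)
angles = rng.standard_normal((1, 1, 8, 16))     # broadcast over batch and heads
cos, sin = np.cos(angles), np.sin(angles)
assert np.allclose(apply_rotary_reference(q, cos, sin),
                   apply_rotary_inplace(q, cos, sin))
```

The two are algebraically identical; the question in this thread is only which one the PyTorch dispatcher executes with less overhead.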
@fxmarty (Owner) commented Jun 26, 2023

I assume there is a lot of overhead from the multiple aten::slice calls (I had the same issue when slicing past_key_values)

@gante (Collaborator, Author)

The alternative, calling rotate_half, is equally bad 😅

It's quite frustrating to know that our attention layers take more time to apply the rotary embedding than the attention itself 😞

@fxmarty (Owner)

Yes, at least it's good to know there's a strong bottleneck there. Maybe a better PyTorch-based implementation exists (not sure how TGI handles it).

@gante (Collaborator, Author) commented Jun 27, 2023

Using the plotting facilities from #12 (and using the plots in that PR as a reference for the performance on main):

batch size sweep

[plot: llama_sweep_e746c78_batch]

prompt length sweep

[plot: llama_sweep_e746c78_length]

performance conclusions

  • No major performance changes confirmed (I'd attribute the slightly faster runs to a lower room temperature; the transformers baseline runs are also slightly faster)

@fxmarty (Owner) commented Jun 28, 2023

@gante maybe this can be useful for apple-to-apple comparison NVIDIA/cutlass#430 (comment)

@gante (Collaborator, Author) commented Jun 28, 2023

@fxmarty TIL, that's an interesting way to make comparisons!
