MHA optimizations #93234
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/93234
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit b293762. This comment was automatically generated by Dr. CI and updates every 15 minutes.
torch/nn/functional.py
Outdated
-    return linear(q, w, b).chunk(3, dim=-1)
+    proj = linear(q, w, b)
+    # reshape to 3, E and not E, 3 is deliberate for better memory coalescing
+    proj = proj.view(1, *proj.shape[:-1], 3, E).transpose(0, -2).squeeze(-1).contiguous()
will this make old checkpoints return a different result due to a different interpretation of output channels? or is the interpretation the same? 3, E is not the same as E, 3 in that respect... also wondering if this should be made into a `chunk` option somehow
`chunk` was doing the same as (3, E), so checkpoints won't be affected:
>>> E = 10
>>> proj = torch.arange(0, 3 * E)
>>> proj
tensor([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
>>> proj.chunk(3, dim=-1)
(tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), tensor([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]), tensor([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]))
>>> proj.view(1, *proj.shape[:-1], 3, E).transpose(0, -2).squeeze(-1).contiguous()
tensor([[[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]],
[[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]]])
In this case we do it this way because we specifically know we want the chunks to be contiguous. But in most cases it's fine and faster to use `chunk` without calling `contiguous()`. If there were a `contiguous` flag to `torch.chunk` then yes, this kind of trick could also be applied there.
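For illustration, a minimal sketch of the layout point above. The sizes L, N, E here are made up, and the sketch uses `squeeze(-2)` (the corrected index discussed further down in this thread) so the shapes line up:

```python
import torch

# Illustrative sizes only; not the shapes used inside multi_head_attention_forward.
L, N, E = 4, 2, 8
proj = torch.randn(L, N, 3 * E)          # packed q/k/v projection

# chunk() returns three narrow views along the last dim; for a multi-dim input
# they share storage with `proj` and are not contiguous.
q_c, k_c, v_c = proj.chunk(3, dim=-1)
print(q_c.is_contiguous())               # False

# The reshape trick from the diff: view as (..., 3, E), move the 3 to the front,
# and materialize everything in one contiguous copy, so each chunk is contiguous.
q_t, k_t, v_t = (
    proj.view(1, *proj.shape[:-1], 3, E)
        .transpose(0, -2)
        .squeeze(-2)                     # drops the singleton dim left by the leading 1
        .contiguous()
)
print(q_t.is_contiguous())               # True
print(torch.equal(q_c, q_t))             # True: same values, different memory layout
```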
yeah, maybe a `copy=True/False` argument could be introduced for `chunk`/`split`
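A rough sketch of what such a flag could behave like; `chunk_contiguous` below is a hypothetical helper, not an existing PyTorch API, and it pays for three separate copies rather than the single fused copy of the reshape trick above:

```python
import torch

def chunk_contiguous(t: torch.Tensor, chunks: int, dim: int = -1):
    """Hypothetical helper: like t.chunk(chunks, dim), but returns contiguous
    copies, roughly what a copy=True argument on chunk/split might do."""
    return tuple(c.contiguous() for c in t.chunk(chunks, dim=dim))

proj = torch.randn(4, 2, 24)
q, k, v = chunk_contiguous(proj, 3, dim=-1)
print(all(x.is_contiguous() for x in (q, k, v)))  # True
```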
> `chunk` was doing the same as (3, E), so checkpoints won't be affected

I see, I guess I was misled by the original comment "reshape to 3, E and not E, 3".
Made the comment clearer. I think we can leave `chunk()` like it is now, and if the same pattern appears in multiple places we can think about adding a flag later.
Also, there might exist an internal method `torch.transpose_copy`; not sure if it's of any use here or whether it runs faster than transpose + contiguous.
Do you know where this is implemented? I couldn't find it.
Hmm, not sure. Looks like it's some sort of crutch, and not actually implemented anywhere :( https://github.com/pytorch/pytorch/search?q=transpose_copy
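For reference, a quick probe for whether such an op is exposed in a given build (availability varies across PyTorch versions; the fallback is the explicit pattern used in this PR):

```python
import torch

x = torch.randn(3, 5)
if hasattr(torch, "transpose_copy"):
    # If present, transpose_copy materializes the transposed result as a new tensor.
    y = torch.transpose_copy(x, 0, 1)
else:
    # Otherwise, fall back to the explicit transpose + contiguous pattern.
    y = x.transpose(0, 1).contiguous()
print(y.shape)  # torch.Size([5, 3])
```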
torch/nn/functional.py
Outdated
@@ -5178,9 +5185,9 @@ def multi_head_attention_forward(
     #
     # reshape q, k, v for multihead attention and make em batch first
     #
-    q = q.contiguous().view(tgt_len, bsz * num_heads, head_dim).transpose(0, 1)
+    q = q.reshape(tgt_len, bsz * num_heads, head_dim).transpose(0, 1)
So the above changes assure us that q, k, v are contiguous? Could we call view() instead of reshape?
In the path where we call `in_projection_packed`, yes, we could do a view. I think it's also fine in the other paths, but to be safe in case we add other paths in the future I changed to reshape.

Is `reshape` more expensive than `view` on the CPU side?
It does some checking to see if it is safe to call view, otherwise it will clone. I think making it view is more explicit. If the projection changes are meant to assure contiguity of q, k, v at this point of the computation for perf reasons, then we should be explicit. Also, I pray no new paths are added.
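A small sketch of the difference being discussed: `view` requires a compatible memory layout and raises otherwise, while `reshape` silently falls back to a copy (shapes here are arbitrary):

```python
import torch

x = torch.randn(4, 6)

# Contiguous input: view and reshape behave the same and neither copies.
a = x.view(2, 12)
b = x.reshape(2, 12)

# Non-contiguous input: view raises, reshape quietly allocates a new tensor.
xt = x.t()                       # transposed view, not contiguous
try:
    xt.view(24)
except RuntimeError as err:
    print("view failed:", err)
c = xt.reshape(24)               # works, but copies
print(c.is_contiguous())         # True
```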
Got it, changed to view in ad36237
Thanks!
torch/nn/functional.py
Outdated
-    return linear(q, w, b).chunk(3, dim=-1)
+    proj = linear(q, w, b)
+    # reshape to 3, E and not E, 3 is deliberate for better memory coalescing and keeping same order as chunk()
+    proj = proj.unflatten(-1, (3, E)).unsqueeze(0).transpose(0, -2).squeeze(-1).contiguous()
`squeeze(-1)` doesn't look right; the last dimension most likely isn't 1 (unless E happens to be 1).
Should be `squeeze(-2)` probably...
Oops, yes you're right, I forgot to change it when I updated from `transpose(0, -1)`. It still works because of the view() later. Fixed.
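A quick shape check of the squeeze index (illustrative sizes; E is arbitrary):

```python
import torch

L, N, E = 4, 2, 8
proj = torch.randn(L, N, 3 * E)

x = proj.unflatten(-1, (3, E)).unsqueeze(0).transpose(0, -2)
print(x.shape)              # torch.Size([3, 4, 2, 1, 8])

print(x.squeeze(-1).shape)  # torch.Size([3, 4, 2, 1, 8]) -- no-op unless E == 1
print(x.squeeze(-2).shape)  # torch.Size([3, 4, 2, 8])    -- removes the singleton dim
```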
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Slight perf optimizations for regular MHA by reducing the number of kernels called
Before:
![image](https://user-images.githubusercontent.com/30204471/215349212-172c6364-9e3c-4fd1-92b6-8ddd9931613e.png)
After:
![image](https://user-images.githubusercontent.com/30204471/215349247-021dd9e6-f6ca-40a2-8de8-0805af001f69.png)
cc @ngimel
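For readers who want to reproduce the kernel-count comparison, a rough sketch using torch.profiler; the module size, input shape, and device below are illustrative and not necessarily what produced the screenshots above:

```python
import torch
from torch.profiler import profile, ProfilerActivity

device = "cuda" if torch.cuda.is_available() else "cpu"
mha = torch.nn.MultiheadAttention(embed_dim=256, num_heads=8).to(device)
x = torch.randn(32, 64, 256, device=device)   # (seq_len, batch, embed_dim)

activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

# The number of distinct kernels launched per forward pass is what the
# Before/After screenshots compare.
with profile(activities=activities) as prof:
    with torch.no_grad():
        mha(x, x, x)

print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=20))
```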