
Conversation

drisspg (Contributor) commented Dec 16, 2023

Summary

The benchmarks were getting a little stale, and I think it makes more sense now to measure SDPA in isolation rather than end-to-end inside an MHA component.

This is a prerequisite for gathering the data for #115357.

Output from run:

+------------+-----------+-----------+------------+-----------+-----------+----------------+--------------------+--------------------+
| batch_size | num_heads | q_seq_len | kv_seq_len | embed_dim | is_causal |     dtype      |    forward_time    |   backward_time    |
+------------+-----------+-----------+------------+-----------+-----------+----------------+--------------------+--------------------+
|     1      |    16     |    128    |    128     |   2048    |   True    | torch.bfloat16 | 23.86634959839284  | 66.21150835417211  |
|     1      |    16     |    128    |    128     |   2048    |   False   | torch.bfloat16 | 23.452017060481012 | 66.90612225793302  |
|     1      |    16     |    256    |    256     |   2048    |   True    | torch.bfloat16 | 24.478124547749758 |  76.4232068322599  |
|     1      |    16     |    256    |    256     |   2048    |   False   | torch.bfloat16 |  24.6928428998217  | 75.76151192188263  |
|     1      |    16     |    512    |    512     |   2048    |   True    | torch.bfloat16 | 28.69622849393636  | 114.73898496478796 |
|     1      |    16     |    512    |    512     |   2048    |   False   | torch.bfloat16 | 34.399422979913645 | 112.96746158041059 |
|     1      |    16     |   1024    |    1024    |   2048    |   True    | torch.bfloat16 |  65.4690912924707  | 216.26344555988908 |
|     1      |    16     |   1024    |    1024    |   2048    |   False   | torch.bfloat16 | 88.57532404363155  | 212.07790216431025 |
|     8      |    16     |    128    |    128     |   2048    |   True    | torch.bfloat16 | 11.582905380055308 | 70.09557797573505  |
|     8      |    16     |    128    |    128     |   2048    |   False   | torch.bfloat16 | 12.068384909071026 | 70.01491216942668  |
|     8      |    16     |    256    |    256     |   2048    |   True    | torch.bfloat16 | 31.671419646590945 | 203.54910241439939 |
|     8      |    16     |    256    |    256     |   2048    |   False   | torch.bfloat16 |  33.0585768679157  | 209.45609430782497 |
|     8      |    16     |    512    |    512     |   2048    |   True    | torch.bfloat16 | 87.43969700299202  | 469.8729298543185  |
|     8      |    16     |    512    |    512     |   2048    |   False   | torch.bfloat16 | 123.9265550393611  | 580.1084265112877  |
|     8      |    16     |   1024    |    1024    |   2048    |   True    | torch.bfloat16 | 561.1918237991632  | 1181.655174586922  |
|     8      |    16     |   1024    |    1024    |   2048    |   False   | torch.bfloat16 | 884.2707145959139  | 1662.4679416418073 |
+------------+-----------+-----------+------------+-----------+-----------+----------------+--------------------+--------------------+
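For context, a minimal sketch of what timing SDPA in isolation can look like, using torch.nn.functional.scaled_dot_product_attention with torch.utils.benchmark.Timer. This is not the benchmark script added in this PR; the time_sdpa helper, the shapes, and the forward/backward split below are illustrative assumptions.

```python
# Sketch only: times SDPA in isolation, assuming a CUDA device and bfloat16.
import torch
import torch.nn.functional as F
from torch.utils.benchmark import Timer


def time_sdpa(batch_size, num_heads, q_seq_len, kv_seq_len, embed_dim,
              is_causal, dtype=torch.bfloat16, device="cuda"):
    head_dim = embed_dim // num_heads
    q = torch.randn(batch_size, num_heads, q_seq_len, head_dim,
                    dtype=dtype, device=device, requires_grad=True)
    k = torch.randn(batch_size, num_heads, kv_seq_len, head_dim,
                    dtype=dtype, device=device, requires_grad=True)
    v = torch.randn(batch_size, num_heads, kv_seq_len, head_dim,
                    dtype=dtype, device=device, requires_grad=True)

    # Forward-only timing (median over Timer's auto-ranged runs, in seconds).
    fwd = Timer(
        stmt="F.scaled_dot_product_attention(q, k, v, is_causal=is_causal)",
        globals={"F": F, "q": q, "k": k, "v": v, "is_causal": is_causal},
    ).blocked_autorange().median

    # Backward timing: build the graph once, then time repeated backward calls.
    out = F.scaled_dot_product_attention(q, k, v, is_causal=is_causal)
    grad = torch.randn_like(out)
    bwd = Timer(
        stmt="out.backward(grad, retain_graph=True)",
        globals={"out": out, "grad": grad},
    ).blocked_autorange().median

    return fwd, bwd


if __name__ == "__main__":
    fwd_s, bwd_s = time_sdpa(1, 16, 128, 128, 2048, is_causal=True)
    print(f"forward: {fwd_s * 1e6:.2f} us, backward: {bwd_s * 1e6:.2f} us")
```

The actual script sweeps batch_size, num_heads, the sequence lengths, embed_dim, is_causal, and dtype, and tabulates forward_time and backward_time as in the table above.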


pytorch-bot bot commented Dec 16, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/115986

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 3368ffd with merge base 34fe850:

FLAKY - The following job failed but was likely due to flakiness present on trunk.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

drisspg (Contributor, Author) commented Dec 18, 2023

@pytorchbot merge

pytorch-bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label on Dec 18, 2023
pytorchmergebot (Collaborator) commented
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

dmenig pushed a commit to dmenig/pytorch that referenced this pull request Dec 21, 2023
…on (pytorch#115986)

Pull Request resolved: pytorch#115986
Approved by: https://github.com/mikaylagawarecki