REDO of dropout support for mem eff #102038 #103704
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/103704
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 3c3a39b.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@drisspg has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
1 similar comment
@drisspg has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from dd9356b to dbb2d16
@drisspg has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from dbb2d16 to 053644c
@drisspg has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 053644c to 37640a2
@drisspg has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 37640a2 to c18880c
@drisspg has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from c18880c to 0ffaf1f
This pull request was exported from Phabricator. Differential Revision: D46778637
Summary: bypass-github-export-checks

This is a new PR with the changes from pytorch#102038 + pytorch#103201, plus namespacing changes to fix a bug.

This PR builds off of:
- pytorch#101847
- pytorch#100583

It specifically adds dropout support to the memory efficient attention kernel. In the process of doing so, roughly 3 changes were made:
- Update sdpa dispatching to allow inputs requiring grad to be sent to efficient attention
- Update how memory efficient attention passes the rng state from forward to backward, in order to enable cuda_graph support
- Fix a bug in the kernel that produced incorrect gradients for num_keys > 64 with dropout and causal masking set (facebookresearch/xformers#755)

cc albanD voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 anijain2305 aakhundov bertmaher

Pull Request resolved: pytorch#103704
Reviewed By: cpuhrsch
Differential Revision: D46778637
Pulled By: drisspg
fbshipit-source-id: 79cb8d9641b00c27b8ec669be1398b1bf36e7a6e
This pull request was exported from Phabricator. Differential Revision: D46778637
Force-pushed from 0ffaf1f to 5ab17c4
@drisspg has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@drisspg has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
I bisected a 4% slowdown on huggingface to this PR (specifically using ElectraForQuestionAnswering). @drisspg any ideas what could cause this? Before this PR: baseline 102.8ms, 36.2ms with PT2
@davidberard98 This is likely because the model in question is using dropout. Previously, memory-efficient attention would not have been dispatched to; this PR now allows it. A good way to verify this would be to explicitly select which kernel runs: https://pytorch.org/tutorials/intermediate/scaled_dot_product_attention_tutorial.html#explicit-dispatcher-control
This defines the kernel ordering. I have one more PR to add attn_bias support to mem_eff, and then I was going to see if any updates are needed to the priority order.
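For concreteness, here is a minimal sketch of the explicit-dispatcher-control approach from that tutorial, using the torch.backends.cuda.sdp_kernel context manager (the interface exposed around the time of this PR); the shapes and dropout_p below are just placeholders:

```python
import torch
import torch.nn.functional as F

# Placeholder shapes: (batch, heads, seq_len, head_dim).
q, k, v = (torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16,
                       requires_grad=True) for _ in range(3))

# Restrict dispatch to the FlashAttention backend only; with the other
# backends disabled, an unsupported input raises instead of silently
# falling back to a slower kernel.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False,
                                    enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, dropout_p=0.1, is_causal=True)
```

Pinning the backend like this is also a quick way to check whether the regression comes from the dispatcher now preferring mem_eff + dropout over FlashAttention.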
@drisspg do we expect the other PR to recover perf? A 5% performance regression is significant. FWIW the memory compression stayed stable with this PR. I guess our heuristics on which backend to select didn't work great here?
@eellison I do not think the other PR will recover the perf. I suspect the heuristic is choosing the new mem_eff + dropout path now that it exists, when the more performant path is FlashAttn. I am planning, though, to create a third PR once the mem_eff attn_bias PR lands, to re-run the perf sweep and see if there are more modifications to be made to the heuristic.
@drisspg you mentioned selecting the kernel using the tutorial you linked - do you recommend this, or do you think it's better to rely on the heuristics (and wait for your updates to the heuristics)?
@davidberard98 It depends. For users who want maximum performance, I think they should always measure the 3 different options on their specific shapes and hardware and use the dispatcher to pick the fastest one (see the sketch below). I assume this only affects PT2 models because of the sdpa re-writer, and for that there are two options: similar to matmul, we could add an autotune pass? Otherwise I think we need to rely on the heuristic.
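As a rough illustration of measuring the three options, something like the following could time each backend on a representative shape (a sketch only; shapes, dropout_p, and iteration counts are arbitrary, and the mem_eff-with-dropout path assumes a build that already includes this PR):

```python
import torch
import torch.nn.functional as F
from torch.backends.cuda import sdp_kernel

q, k, v = (torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))

# Enable exactly one backend at a time so each timing is attributable.
backends = {
    "flash":   dict(enable_flash=True,  enable_math=False, enable_mem_efficient=False),
    "mem_eff": dict(enable_flash=False, enable_math=False, enable_mem_efficient=True),
    "math":    dict(enable_flash=False, enable_math=True,  enable_mem_efficient=False),
}

for name, flags in backends.items():
    with sdp_kernel(**flags):
        for _ in range(3):  # warm-up iterations
            F.scaled_dot_product_attention(q, k, v, dropout_p=0.1, is_causal=True)
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(20):
            F.scaled_dot_product_attention(q, k, v, dropout_p=0.1, is_causal=True)
        end.record()
        torch.cuda.synchronize()
        print(f"{name}: {start.elapsed_time(end) / 20:.3f} ms per call")
```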
Where are the current heuristics?
Linked up above in my response to David
cc @Chillee who updated the heuristics recently |
This is a new PR with the changes from #102038 + #103201, plus namespacing changes to fix a bug.
Summary
This PR builds off of:
- #101847
- #100583
It specifically adds dropout support to the memory efficient attention kernel. In the process of doing so, roughly 3 changes were made (a minimal usage sketch follows this list):
- Update sdpa dispatching to allow inputs requiring grad to be sent to efficient attention
- Update how memory efficient attention passes the rng state from forward to backward, in order to enable cuda_graph support
- Fix a bug in the kernel that produced incorrect gradients for num_keys > 64 with dropout and causal masking set (facebookresearch/xformers#755)
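A minimal sketch of the combination this PR newly supports, gradients plus dropout routed through the memory-efficient backend (shapes and dropout_p are illustrative; the backend is pinned via torch.backends.cuda.sdp_kernel only to make the dispatch explicit):

```python
import torch
import torch.nn.functional as F

# Inputs that require grad, with dropout enabled: the combination this PR
# newly routes to the memory-efficient kernel.
q, k, v = (torch.randn(2, 8, 256, 64, device="cuda", dtype=torch.float16,
                       requires_grad=True) for _ in range(3))

with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=False,
                                    enable_mem_efficient=True):
    # seq_len > 64 with dropout and causal masking exercises the gradient
    # bug fixed in this PR (facebookresearch/xformers#755).
    out = F.scaled_dot_product_attention(q, k, v, dropout_p=0.2, is_causal=True)
    out.sum().backward()  # backward consumes the rng state saved by the forward
```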
cc @albanD @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @anijain2305 @aakhundov @bertmaher