Enable fused foreach Adam compilation #104121

mlazos · 2023-06-23T19:50:32Z

Fixes #ISSUE_NUMBER

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @anijain2305

pytorch-bot · 2023-06-23T19:50:35Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/104121

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ 1 Unrelated Failure

As of commit 1cb8777:

BROKEN TRUNK - The following job failed but was also present on the merge base 803c144:

👉 Rebase onto the `viable/strict` branch to avoid these failures

cuda11.8-py3.10-gcc7-sm86 / test (inductor_torchbench, 1, 1, linux.g5.4xlarge.nvidia.gpu) (gh)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

mlazos · 2023-06-23T19:50:55Z

Currently getting NaNs so I'm investigating that, but wanted to get feedback on the optim changes

torch/optim/adam.py

mlazos · 2023-06-24T21:19:32Z

> Currently getting NaNs so I'm investigating that, but wanted to get feedback on the optim changes

Fixed by #104137

janeyx99

Instead of using self.compiled and introducing a new "internal-only" flag (which, meh, is not good engineering), we can just overload self.capturable, so dynamo can simply set capturable to True when compiling.

Then we're guaranteed that step will be moved to the right device. E.g., right now one would need to update line 432 in optimizer.py to also check the new flag so that step is loaded correctly.
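A toy model of the suggestion above, as a sketch only: `FakeTensor`, `effective_capturable`, and `load_step` are hypothetical names, not torch.optim APIs, and the placement rule is a simplification. The point it illustrates is that a single `capturable` flag, forced on while compiling, decides whether `step` lives on the parameter's device, so the state-dict loading path only has to check that one flag instead of a separate `compiled` flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for a tensor: just a value and a device name."""
    value: float
    device: str


def effective_capturable(user_capturable: bool, compiling: bool) -> bool:
    # Hypothetical: dynamo overrides capturable to True while compiling,
    # instead of consulting a separate internal-only "compiled" flag.
    return user_capturable or compiling


def load_step(saved_step: FakeTensor, param: FakeTensor, capturable: bool) -> FakeTensor:
    # Hypothetical load_state_dict policy for the "step" entry:
    # capturable -> put step on the parameter's device;
    # otherwise  -> leave step on the device it was saved from.
    if capturable:
        return FakeTensor(saved_step.value, param.device)
    return saved_step


param = FakeTensor(0.0, "cuda:0")
saved = FakeTensor(10.0, "cpu")

print(load_step(saved, param, effective_capturable(False, compiling=False)).device)  # cpu
print(load_step(saved, param, effective_capturable(False, compiling=True)).device)   # cuda:0
```

With one flag, every code path that cares about where `step` lives (the update itself and state-dict loading) reads the same switch, which is the consistency argument made in the review.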

torch/optim/optimizer.py

torch/optim/adamw.py

torch/optim/adam.py

janeyx99

Did a deeper dive into the changes. I think we should clarify what users should expect when they use torch.compile vs. capturable vs. both. I had previously thought of torch.compile as overriding capturable... but is there a case where the eager capturable step will differ from the torch-compiled step?

test/inductor/test_compiled_optimizers.py

torch/optim/adamw.py

torch/optim/adam.py


Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>

janeyx99

🚀 ⭐️

mlazos · 2023-07-05T19:36:07Z

@pytorchbot merge

pytorchmergebot · 2023-07-05T19:37:40Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2023-07-05T23:20:07Z

Merge failed

Reason: 1 jobs have failed, first few of them are: inductor / cuda11.8-py3.10-gcc7-sm86 / test (inductor_torchbench, 1, 1, linux.g5.4xlarge.nvidia.gpu)

Details for Dev Infra team

Raised by workflow job

mlazos · 2023-07-05T23:38:19Z

@pytorchbot merge -f "Failures on trunk"

pytorchmergebot · 2023-07-05T23:39:58Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

eellison · 2023-07-07T00:42:31Z

I think this is responsible for a ~50 s increase in TIMM dynamic-shapes compilation time (131 s → 180 s).


Fixes #104713

### Testing

Tested manually locally using #104121 and confirmed that the correct merge base commit, [803c144](https://github.com/pytorch/pytorch/commits/1cb87771c1efef32df7009d75bed08249df8ecad), is returned instead of the wrong value provided by `baseRefOid` (de7b6e5). Here is the JSON output of the GraphQL query for PR info: https://paste.sh/TJ-QQWz4#fvE3Y6qoJ8vDkRBZ3vowkZ3m

Pull Request resolved: #105098
Approved by: https://github.com/malfet

mlazos added 2 commits June 23, 2023 19:25

Changes to fully fuse Adam

c1f503f

Add compiled optimizer test suite

96c205b

mlazos requested review from albanD and janeyx99 as code owners June 23, 2023 19:50

pytorch-bot bot added the release notes: optimizer label (Relating to optimizers, torch.optim) Jun 23, 2023

github-actions bot added the ciflow/inductor, module: dynamo, and module: inductor labels Jun 23, 2023

janeyx99 reviewed Jun 23, 2023


torch/optim/adam.py (outdated, resolved)

mlazos added 3 commits June 24, 2023 01:42

Fix adam change for pow

e33ff2f

Merge remote-tracking branch 'origin/main' into mlazos/adam-fused

e069c26

Fix test case

445ab88

mlazos added the release notes: inductor label Jun 24, 2023

mlazos added 2 commits June 24, 2023 02:12

Implement internal compile flag + test

298a982

Merge remote-tracking branch 'origin/main' into mlazos/adam-fused

97d0664

mlazos added the ciflow/trunk label (Trigger trunk jobs on your pull request) Jun 24, 2023

Add compiled flag to serdes

1da06d8

mlazos requested a review from janeyx99 June 26, 2023 21:18

janeyx99 reviewed Jun 27, 2023


Override capturable in dynamo

6f192d4

mlazos requested a review from janeyx99 June 27, 2023 19:07

mlazos force-pushed the mlazos/adam-fused branch from c6c4fe0 to 6f192d4 June 27, 2023 22:10

mlazos added 2 commits June 27, 2023 15:11

Merge branch 'main' into mlazos/adam-fused

3e026ee

Ignore asserts handled by the compiler

948f6cc

pytorch deleted a comment from linux-foundation-easycla bot Jun 28, 2023

Fix bug in load_state_dict

827d5ea

janeyx99 reviewed Jun 29, 2023


torch/optim/optimizer.py (resolved)

torch/optim/optimizer.py (resolved)

torch/optim/adamw.py (resolved)

torch/optim/adamw.py (resolved)

torch/optim/adam.py (outdated, resolved)

Updated comments

c6d3b40

janeyx99 reviewed Jun 30, 2023


mlazos added 2 commits June 30, 2023 14:47

Added additional commenting and test

d2260ea

Merge branch 'mlazos/adam-fused' of github.com:pytorch/pytorch into mlazos/adam-fused

643e385

janeyx99 reviewed Jul 5, 2023


test/inductor/test_compiled_optimizers.py (outdated, resolved)

torch/optim/optimizer.py (outdated, resolved)

torch/optim/adamw.py (outdated, resolved)

mlazos and others added 3 commits July 5, 2023 12:23

Update torch/optim/adamw.py

84402fd

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>

Update torch/optim/optimizer.py

8769945

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>

Update test/inductor/test_compiled_optimizers.py

d8e6c45

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>

janeyx99 approved these changes Jul 5, 2023


Updated comments

1cb8777

pytorchmergebot added the merging label Jul 5, 2023

pytorchmergebot removed the merging label Jul 5, 2023

pytorchmergebot added the merging label Jul 5, 2023

pytorchmergebot added Merged and removed merging labels Jul 5, 2023

pytorchmergebot closed this in a290cbf Jul 5, 2023

huydhn mentioned this pull request Jul 6, 2023

Problematic use of baseRefOid to get the merge base commit in trymerge #104713

Closed

janeyx99 mentioned this pull request Jul 10, 2023

Enable running compiled optimizers in CI #104888

Closed

pytorchmergebot pushed a commit that referenced this pull request Jul 10, 2023

Enable running compiled optimizers in CI (#104888)

4063158

as title for reference: this is a followup to #104121 Pull Request resolved: #104888 Approved by: https://github.com/janeyx99

huydhn mentioned this pull request Jul 12, 2023

Use GitHub REST API to get the merge base commit SHA #105098

Closed

github-actions bot deleted the mlazos/adam-fused branch January 4, 2025 02:03


5 participants
