feat(inductor): Improve `Adamax` to be better fused by Inductor and enable it #110345

jon-chuang · 2023-10-01T01:50:24Z

To avoid running into #110342, we replace the complicated cat + max logic with a simple torch.maximum. (See also here in the fairseq repo - the same operation with a different, less explicit syntax.)
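As a rough illustration of the replacement described above, the sketch below compares the two formulations of the Adamax infinity-norm update on plain tensors. The function names `exp_inf_cat_max` and `exp_inf_maximum` are hypothetical and chosen only for this example; the actual change lives in torch/optim/adamax.py and may differ in detail.

```python
import torch


def exp_inf_cat_max(exp_inf, grad, beta2, eps):
    """Older style: stack both candidates on a new leading axis, then reduce."""
    norm_buf = torch.cat(
        [exp_inf.mul(beta2).unsqueeze(0), grad.abs().add(eps).unsqueeze(0)], dim=0
    )
    return torch.amax(norm_buf, dim=0, keepdim=False)


def exp_inf_maximum(exp_inf, grad, beta2, eps):
    """Simpler style: a single elementwise maximum, with no concatenation."""
    return torch.maximum(exp_inf.mul(beta2), grad.abs().add(eps))


torch.manual_seed(0)
exp_inf = torch.rand(10)
grad = torch.randn(10)

old = exp_inf_cat_max(exp_inf, grad, beta2=0.999, eps=1e-8)
new = exp_inf_maximum(exp_inf, grad, beta2=0.999, eps=1e-8)
```

Both functions compute the same elementwise result; the second avoids materializing the stacked buffer, which is presumably what makes it easier for Inductor to fuse.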

Like #110339, we also ensure that step is on device.
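A minimal sketch of what keeping `step` on device can look like, assuming a capturable-style path where `step` is a 0-dim tensor on the parameter's device. `init_step` and `bias_corrected_lr` are hypothetical helpers written for this example, not functions from torch.optim.

```python
import torch


def init_step(param, capturable):
    """Create the step counter (hypothetical helper, not a torch.optim API)."""
    if capturable:
        # Same device as the parameter, so the update never syncs with the host.
        return torch.zeros((), dtype=torch.float32, device=param.device)
    # Otherwise a plain CPU scalar tensor.
    return torch.tensor(0.0, dtype=torch.float32)


def bias_corrected_lr(step, lr, beta1):
    """Adamax step size: lr / (1 - beta1 ** step), computed with tensor ops."""
    return lr / (1 - beta1 ** step)


param = torch.ones(3)
step = init_step(param, capturable=True)
step += 1
clr = bias_corrected_lr(step, lr=0.002, beta1=0.9)
```

Because `step` is a tensor, the bias correction stays a tensor computation that can be traced and fused along with the rest of the update.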

As noted in #107006 (comment), it is likely that num_kernels=2 is optimal, unless one can move the numel=1, shape=(0,) scalar computations entirely into CPU.

That being said, in the foreach case, a parallel foreach scalar computation on GPU doesn't seem too bad.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler

pytorch-bot · 2023-10-01T01:50:27Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110345

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 61c685b with merge base 4e73eee:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jon-chuang · 2023-10-01T03:00:41Z

torch/optim/adamax.py

+                out=exp_inf,
+            )
        else:
+            norm_buf = torch.cat(


`torch.maximum` didn't work for the differentiable case, so we fall back to the previous implementation there.

Error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [10]], which is output 0 of torch::autograd::CopyBackwards, is at version 2; expected version 1 instead.

Related issue on maximum: #54216
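The fallback described in this comment can be sketched roughly as follows. This is an illustration under assumptions, not the PR's exact code: only the `out=exp_inf` argument and the `torch.cat` branch are visible in the diff excerpt above, and the helper name `update_exp_inf_` is hypothetical.

```python
import torch


def update_exp_inf_(exp_inf, grad, beta2, eps, differentiable=False):
    """Update exp_inf in place (hypothetical helper for illustration)."""
    if not differentiable:
        # Fast path: one elementwise maximum written directly into exp_inf.
        torch.maximum(exp_inf.mul(beta2), grad.abs().add(eps), out=exp_inf)
    else:
        # Fallback path: the previous cat + amax formulation, copied back.
        norm_buf = torch.cat(
            [exp_inf.mul(beta2).unsqueeze(0), grad.abs().add(eps).unsqueeze(0)], 0
        )
        exp_inf.copy_(torch.amax(norm_buf, 0, keepdim=False))
    return exp_inf


torch.manual_seed(0)
start = torch.rand(10, dtype=torch.double)
grad = torch.randn(10, dtype=torch.double)

fast = update_exp_inf_(start.clone(), grad, 0.999, 1e-8, differentiable=False)
slow = update_exp_inf_(start.clone(), grad, 0.999, 1e-8, differentiable=True)
```

On tensors that do not require grad, both branches produce the same values; the two branches only behave differently once autograd has to track the update.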


torch/optim/adamax.py

janeyx99

Do you have benchmark numbers on perf for eager foreach (switching differentiable=T/F) showing that this change is not regressing that?

Also, could you move the stylistic changes into another PR?

test/inductor/test_compiled_optimizers.py

test/optim/test_optim.py

janeyx99 · 2023-10-02T22:34:56Z

torch/_dynamo/eval_frame.py

        }

        disabled_multi_tensor_opt_modules = {
-            adamax,


Nice! Thanks for taking this on in general!

torch/optim/adamax.py

test/inductor/test_compiled_optimizers.py


@janeyx99

… against list comprehensions (e.g. complex conversion) (#110613)

Fully fixes: #110506

Depends: #110607

Potential merge conflicts:
- #110339
- #110345
- #110454

Related:
- #110606 (we can apply the improvements here orthogonally to the complex support)

### Results

Benchmark: 100 params.

Breakdowns (float32, dynamo):
```
Adagrad: this PR: 4.4s, main: 8.8s
Adam: this PR: 2.1s, main: 9.8s
AdamW: this PR: 2.5s, main: 8.2s
ASGD: this PR: 3.1s, main: 8.5s
RMSProp: this PR: 1.3s, main: 4.2s
RProp: this PR: 6.7s, main: 14.9s
```

Notes:
1. Adagrad is still slow due to `_get_value` list comprehension. Can be fixed in https://github.com/pytorch/pytorch/pull/110339/files by utilizing capturable path
2. Adamax is not actually compiled (it is currently disabled).
3. Inductor compile time is quite variable. We calculate dynamo by subtracting `call_user_compiler` from `compile_inner` timing.

<details>

This PR:
```
Adagrad (torch.float32): 28.47496461868286s
Adagrad (torch.complex64): 29.379547357559204s
Adam (torch.float32): 17.334211587905884s
Adam (torch.complex64): 29.637500524520874s
Adamax (torch.float32): 2.4749321937561035s
Adamax (torch.complex64): 3.1997995376586914s
AdamW (torch.float32): 18.06532859802246s
AdamW (torch.complex64): 28.25661015510559s
ASGD (torch.float32): 23.70255398750305s
ASGD (torch.complex64): 25.33756995201111s
RMSprop (torch.float32): 7.964028596878052s
RMSprop (torch.complex64): 12.909599781036377s
Rprop (torch.float32): 30.512362003326416s
Rprop (torch.complex64): 44.74405765533447s
```

Main
```
Adagrad (torch.float32): 26.919506072998047s
Adagrad (torch.complex64): 35.190622091293335s
Adam (torch.float32): 25.715000867843628s
Adam (torch.complex64): 24.17716670036316s
Adamax (torch.float32): 2.4404726028442383s
Adamax (torch.complex64): 3.3538928031921387s
AdamW (torch.float32): 25.2022807598114s
AdamW (torch.complex64): 28.915700912475586s
ASGD (torch.float32): 24.108731985092163s
ASGD (torch.complex64): 26.589075088500977s
RMSprop (torch.float32): 10.781344175338745s
RMSprop (torch.complex64): 15.136352777481079s
Rprop (torch.float32): 42.46482181549072s
Rprop (torch.complex64): 48.28277635574341s
```

Seems that it doesn't help the complex case by much (but that's not the majority case). torch.float32 is generally positive, when it does not show drastic improvement / regresses, it is due to inductor variance (by manually inspecting the logs).

</details>

### Benchmark Script
```python
import torch
import time
from torch.optim import Adagrad, Adam, Adamax, AdamW, ASGD, RMSprop, Rprop

OPTIMS = [Adagrad, Adam, Adamax, AdamW, ASGD, RMSprop, Rprop]
DTYPES = [torch.float, torch.cfloat]

NUM_PARAMS = 100
kwargs = { "lr": 0.01, "foreach": True }
summary = []

for optim_cls in OPTIMS:
    for dtype in DTYPES:
        torch._dynamo.reset()
        # torch._inductor.metrics.reset()
        input = torch.ones([10, 10], dtype=dtype, device="cuda:0")
        model = torch.nn.Sequential(
            *[torch.nn.Linear(10, 10, dtype=dtype, device="cuda:0") for _ in range(NUM_PARAMS)]
        )

        model(input).sum().abs().backward()
        opt_compiled = optim_cls(model.parameters(), **kwargs)
        compiled_step = torch.compile(opt_compiled.step)

        with torch.set_grad_enabled(False):
            start_time = time.time()
            compiled_step()
            summary.append(f"{optim_cls.__name__} ({dtype}): {time.time() - start_time}s")

        print(optim_cls, kwargs, dtype, torch._dynamo.utils.compile_times())

for s in summary:
    print(s)
```

CC: @janeyx99 @mlazos

Pull Request resolved: #110613
Approved by: https://github.com/janeyx99

mlazos · 2023-11-06T22:37:13Z

@jon-chuang what's the status on this?

jon-chuang · 2023-11-07T02:34:04Z

@mlazos will be tryna get this PR and a few other optim ones in shape this week.

mlazos · 2023-11-07T04:15:00Z

> @mlazos will be tryna get this PR and a few other optim ones in shape this week.

Cool, lemme know if you need help. It will be great to add another optimizer to our collection of compiled optimizers.

janeyx99

Hey @jon-chuang sorry for the late review here. This PR is close--but could you add testing for the capturable component in https://github.com/pytorch/pytorch/blob/main/test/test_cuda.py#L3109 and in our OptimizerInfos that I introduced in common_optimizers.py? This review applies for adagrad as well :)

mlazos · 2024-01-19T10:13:38Z

closing in favor of #117835

Based off of #110345 Fixes #117812 Pull Request resolved: #117835 Approved by: https://github.com/janeyx99

jon-chuang added 4 commits September 30, 2023 18:33

fix

3327015

fix

d87af9a

replace complicated concat+max with torch.maximum

57c66ce

undo

fc01583

pytorch-bot bot added the release notes: optim label Oct 1, 2023

github-actions bot added module: inductor module: dynamo ciflow/inductor labels Oct 1, 2023

jon-chuang added 2 commits September 30, 2023 21:57

improve

a479826

lint

9627b95

pytorchbot added the open source label Oct 1, 2023

jon-chuang marked this pull request as ready for review October 1, 2023 02:06

jon-chuang requested review from albanD and janeyx99 as code owners October 1, 2023 02:06

jon-chuang added 2 commits September 30, 2023 22:11

break line

8464067

fix

5a722ab

jon-chuang changed the title ~~feat(inductor): Improve Adamax implementation to be better fused by Inductor~~ feat(inductor): Improve Adamax to be better fused by Inductor and enable Oct 1, 2023

jon-chuang added 3 commits September 30, 2023 22:51

improve

ccbc261

fix differentiable

ba9e26a

format

3d000a5

jon-chuang changed the title ~~feat(inductor): Improve Adamax to be better fused by Inductor and enable~~ feat(inductor): Improve Adamax to be better fused by Inductor and enable it Oct 1, 2023

minor

b968c6d

jon-chuang commented Oct 1, 2023


jon-chuang changed the title ~~feat(inductor): Improve Adamax to be better fused by Inductor and enable it~~ feat(inductor): Improve Adamax to be better fused by Inductor and enable it Oct 1, 2023

jon-chuang added 3 commits October 1, 2023 11:22

fix inplace

5d82e9f

fix peak mem test for intermediates

a183425

Merge branch 'main' of https://github.com/pytorch/pytorch into jon-chuang/add-adamax-to-inductor

40a809b

mlazos reviewed Oct 2, 2023

torch/optim/adamax.py (outdated, resolved)

janeyx99 reviewed Oct 2, 2023

test/inductor/test_compiled_optimizers.py (resolved)

janeyx99 reviewed Oct 2, 2023

test/optim/test_optim.py (resolved)

janeyx99 reviewed Oct 2, 2023

torch/optim/adamax.py (outdated, resolved)

jon-chuang commented Oct 2, 2023

torch/optim/adamax.py (outdated, resolved)

This was referenced Oct 3, 2023

feat(optimizer): Adagrad will use device when capturable - True always when compiling with dynamo #110339

Closed

Explore Hybrid (CPU+GPU) Graphs in Scalar parameters #110448

Open

jon-chuang added 3 commits October 3, 2023 11:00

add capturable flag

b32eb67

remove peak mem

21e5163

stylistic

c807fbc

jon-chuang commented Oct 3, 2023

torch/optim/adamax.py (outdated, resolved)

jon-chuang commented Oct 3, 2023

test/inductor/test_compiled_optimizers.py (resolved)

jon-chuang added 2 commits October 3, 2023 14:41

add more adamax configs, fix poor foreach_abs supp

549132e

add meta reg and prim decomp for foreach_abs

936a85a

colesbury added the triaged label Oct 3, 2023

jon-chuang added 2 commits October 4, 2023 18:52

fix foreach_abs

417d8dd

Merge branch 'main' of https://github.com/pytorch/pytorch into jon-chuang/add-adamax-to-inductor

61c685b

jon-chuang mentioned this pull request Oct 5, 2023

perf(inductor): use for loop with shortcut in Optimizers to speedup against list comprehensions (e.g. complex conversion) #110613

Closed

janeyx99 reviewed Dec 15, 2023


This was referenced Jan 19, 2024

Support full graph compilation of foreach Adamax #117812

Closed

Add compilable and capturable foreach adamax with tests #117835

Closed

mlazos closed this Jan 19, 2024

pytorchmergebot pushed a commit that referenced this pull request Jan 20, 2024

Add compilable and capturable foreach adamax with tests (#117835)

aaae2d8

Based off of #110345 Fixes #117812 Pull Request resolved: #117835 Approved by: https://github.com/janeyx99


6 participants
