[mta] Implement fused SGD #94791
Conversation
Force-pushed from 38cd2bd to b3945f1.
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as stale.
test/optim/test_optim.py (outdated)
(optim.SGD,),
[
    {"lr": 0.1, "momentum": 0.0, "dampening": d, "weight_decay": w, "nesterov": n}
    for d, w, n in itertools.product((0.0, 0.5), (0.0, 0.5), (False,))
Minor nit but it seems like this entry and the next entry in the list could be combined by adding another tuple (0.0, 0.5) for momentum.
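A rough sketch of the combined entry being suggested (purely illustrative; the variable name is made up, and nesterov stays False so every momentum/dampening/weight_decay combination remains valid):

from itertools import product

# Hypothetical combined parameterization: fold the momentum=0.0 and momentum=0.5
# entries into a single comprehension by also iterating over momentum.
sgd_configs = [
    {"lr": 0.1, "momentum": m, "dampening": d, "weight_decay": w, "nesterov": n}
    for m, d, w, n in product((0.0, 0.5), (0.0, 0.5), (0.0, 0.5), (False,))
]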
I am NOT done reviewing this! Will continue reviewing later
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as stale.
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
aten/src/ATen/native/native_functions.yaml
- func: _propagate_xla_data(Tensor input, Tensor output) -> ()
  variants: function

- func: _fused_sgd_(Tensor(a!)[] self, Tensor(b!)[] grads, *, float weight_decay, float momentum, float lr, float dampening, bool nesterov, bool maximize, Tensor? grad_scale=None, Tensor? found_inf=None) -> ()
Similar to fused Adam and AdamW, we should allow Tensor lr as well.
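For reference, a hedged sketch of what Tensor lr support could look like from the Python side, mirroring fused Adam(W), which already accepts a 0-dim tensor lr (this assumes the Tensor-lr overload eventually lands for fused SGD; it is not part of the schema above):

import torch

# Sketch only: requires a CUDA device, and assumes fused SGD accepts a 0-dim
# tensor lr the way fused Adam/AdamW do.
model = torch.nn.Linear(8, 8, device="cuda")
lr = torch.tensor(0.1, device="cuda")  # tensor lr instead of a Python float

opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.5, fused=True)
model(torch.randn(4, 8, device="cuda")).sum().backward()
opt.step()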
expected_scales,
expected_growth_trackers,
expected_grad_vals):
for data, scale, growth_tracker, grad_val in zip(input_vals, expected_scales, expected_growth_trackers, expected_grad_vals):
is this purely stylistic?
  # because JIT can't handle Optionals nor fancy conditionals when scripting
  if not torch.jit.is_scripting():
-     _, foreach = _default_to_fused_or_foreach(params, differentiable=False, use_fused=False)
+     fused, foreach = _default_to_fused_or_foreach(params, differentiable=False, use_fused=False)
Could you add a comment here similar to in adam(w):
Note that we default to foreach and pass False to use_fused. This is not a mistake--we want to give the fused implementation bake-in time before making it the default, even if it is typically faster.
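Something along these lines, perhaps (a self-contained sketch rather than the PR's actual sgd() body; it reuses the `_default_to_fused_or_foreach` helper from torch.optim.optimizer shown in the diff above):

import torch
from torch.optim.optimizer import _default_to_fused_or_foreach

def _choose_impl(params, foreach=None, fused=None):
    if fused is None and foreach is None:
        # because JIT can't handle Optionals nor fancy conditionals when scripting
        if not torch.jit.is_scripting():
            # Note that we default to foreach and pass False to use_fused.
            # This is not a mistake: we want to give the fused implementation
            # bake-in time before making it the default, even if it is
            # typically faster.
            fused, foreach = _default_to_fused_or_foreach(
                params, differentiable=False, use_fused=False
            )
        else:
            foreach = False
            fused = False
    return fused, foreach

# e.g. on CPU tensors this falls back to the plain for-loop implementation
print(_choose_impl([torch.randn(2, 2)]))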
Overall looks pretty good. The high-level feedback I have, in addition to the inline comments:
- We now support Tensor lr for fused Adam(W), and we should strive to do the same for SGD. It doesn't have to land as part of this PR if that would make it too large, but then we should be explicit that the fused implementation does not accept a Tensor lr and add the Tensor-lr overloads in a follow-up PR.
- Could you get some benchmarks showing the wins?
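For the benchmark ask, a micro-benchmark along these lines would show the per-step win (a sketch, assuming a CUDA device and the fused=True flag this PR adds; the parameter shapes are arbitrary):

import torch

def bench_sgd(fused, steps=100):
    params = [torch.randn(1024, 1024, device="cuda", requires_grad=True) for _ in range(32)]
    for p in params:
        p.grad = torch.randn_like(p)
    opt = torch.optim.SGD(params, lr=0.1, momentum=0.9, fused=fused, foreach=not fused)
    for _ in range(10):  # warmup
        opt.step()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(steps):
        opt.step()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / steps  # ms per optimizer step

print(f"foreach: {bench_sgd(False):.3f} ms/step, fused: {bench_sgd(True):.3f} ms/step")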
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
    CUDA: _fused_sgd_with_momentum_kernel_cuda_
  autogen: _fused_sgd_with_momentum, _fused_sgd_with_momentum.out

- func: _fused_sgd_with_momentum_.tensor_lr(Tensor(a!)[] self, Tensor(b!)[] grads, Tensor(c!)[] momentum_buffer_list, *, float weight_decay, float momentum, Tensor lr, float dampening, bool nesterov, bool maximize, bool is_first_step, Tensor? grad_scale=None, Tensor? found_inf=None) -> ()
can this be an overload instead?
or like, an optional value?
    for d, w, n in product((0.0, 0.5), (0.0, 0.5), (False,))
] + [
    {"lr": 0.1, "momentum": 0.5, "dampening": d, "weight_decay": w, "nesterov": n, "fused": True}
    for d, w, n in product((0.0,), (0.0, 0.5), (True,))
Should nesterov=False be covered here too?
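i.e., a sketch of the suggested coverage (nesterov=True still requires dampening == 0, so only the nesterov values change):

from itertools import product

fused_sgd_configs = [
    {"lr": 0.1, "momentum": 0.0, "dampening": d, "weight_decay": w, "nesterov": n, "fused": True}
    for d, w, n in product((0.0, 0.5), (0.0, 0.5), (False,))
] + [
    {"lr": 0.1, "momentum": 0.5, "dampening": d, "weight_decay": w, "nesterov": n, "fused": True}
    for d, w, n in product((0.0,), (0.0, 0.5), (False, True))
]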
test/optim/test_optim.py (outdated)
    for lr, d, w, n in itertools.product((0.1, torch.tensor(0.1)), (0.0, 0.5), (0.0, 0.5), (False,))
] + [
    {"lr": lr, "momentum": 0.5, "dampening": d, "weight_decay": w, "nesterov": n}
    for lr, d, w, n in itertools.product((0.1, torch.tensor(0.1)), (0.0,), (0.0, 0.5), (True,))
nesterov=False too?
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Depends on #116583. Related: #94791. Pull Request resolved: #116585. Approved by: https://github.com/janeyx99
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @gujinghui @PenghuiCheng @jianyuh @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen @mcarilli @ptrblck @leslie-fang-intel @voznesenskym @penguinwu @EikanWang @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @avikchaudhuri @gmagogsfm