Conversation


pytorch-bot bot commented Feb 14, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94791

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 9 Unrelated Failures

As of commit 2e83e7a with merge base cf1b494:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: nn release notes category label Feb 14, 2023
@crcrpar crcrpar force-pushed the fused_sgd branch 2 times, most recently from 38cd2bd to b3945f1 on March 15, 2023 07:41
@crcrpar crcrpar changed the title from "[mta] Implement SGD(..., fused)" to "[mta] Implement fused SGD" Apr 23, 2023
@crcrpar crcrpar marked this pull request as ready for review April 23, 2023 23:44
@albanD albanD removed their request for review April 24, 2023 19:29
@mikaylagawarecki mikaylagawarecki added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Apr 25, 2023
@janeyx99 janeyx99 self-assigned this May 8, 2023
@github-actions
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Jul 18, 2023
@janeyx99 janeyx99 removed the Stale label Jul 20, 2023
(optim.SGD,),
[
{"lr": 0.1, "momentum": 0.0, "dampening": d, "weight_decay": w, "nesterov": n}
for d, w, n in itertools.product((0.0, 0.5), (0.0, 0.5), (False,))

Minor nit but it seems like this entry and the next entry in the list could be combined by adding another tuple (0.0, 0.5) for momentum.
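
A rough sketch of the suggested merge, assuming the neighboring entry differs only in its momentum value; `optim_kwargs` is a placeholder name for the list being built here:

import itertools

optim_kwargs = [
    {"lr": 0.1, "momentum": m, "dampening": d, "weight_decay": w, "nesterov": n}
    for m, d, w, n in itertools.product((0.0, 0.5), (0.0, 0.5), (0.0, 0.5), (False,))
]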


@janeyx99 janeyx99 left a comment


I am NOT done reviewing this! Will continue reviewing later

@github-actions
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
- func: _propagate_xla_data(Tensor input, Tensor output) -> ()
variants: function

- func: _fused_sgd_(Tensor(a!)[] self, Tensor(b!)[] grads, *, float weight_decay, float momentum, float lr, float dampening, bool nesterov, bool maximize, Tensor? grad_scale=None, Tensor? found_inf=None) -> ()

Similar to fused Adam and AdamW, we should allow Tensor lr as well.
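
For illustration, a minimal hedged sketch of what a Tensor lr looks like from the Python API once the fused path accepts it (this assumes a CUDA build and a PyTorch version in which fused SGD and Tensor lr support have both landed; shapes are arbitrary):

import torch

params = [torch.randn(16, 16, device="cuda", requires_grad=True)]
for p in params:
    p.grad = torch.randn_like(p)
# Keeping lr as a device tensor avoids a host sync when, e.g., a scheduler updates it.
lr = torch.tensor(0.1, device="cuda")
opt = torch.optim.SGD(params, lr=lr, momentum=0.9, fused=True)
opt.step()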

expected_scales,
expected_growth_trackers,
expected_grad_vals):
for data, scale, growth_tracker, grad_val in zip(input_vals, expected_scales, expected_growth_trackers, expected_grad_vals):

is this purely stylistic?

# because JIT can't handle Optionals nor fancy conditionals when scripting
if not torch.jit.is_scripting():
_, foreach = _default_to_fused_or_foreach(params, differentiable=False, use_fused=False)
fused, foreach = _default_to_fused_or_foreach(params, differentiable=False, use_fused=False)

Could you add a comment here similar to the one in adam(w):
Note that we default to foreach and pass False to use_fused. This is not a mistake--we want to give the fused implementation bake-in time before making it the default, even if it is typically faster.
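
A sketch of how that note could read in context (paraphrasing the adam(w) wording; the surrounding lines are taken from the hunk above):

if not torch.jit.is_scripting():
    # Note: we default to foreach and pass False to use_fused. This is not a
    # mistake: we want to give the fused implementation bake-in time before
    # making it the default, even though it is typically faster.
    fused, foreach = _default_to_fused_or_foreach(params, differentiable=False, use_fused=False)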


@janeyx99 janeyx99 left a comment


Overall looks pretty good. The high-level feedback I have, in addition to the inline comments:

  • We now support tensor lr for fused adam/w; we should strive to do the same for sgd. We don't need to land that as part of this PR if you think it would make the PR too large, but then we should be explicit that the fused implementation does not accept tensor LRs. In a follow-up PR, we can add the tensor LR overloads.
  • Could you get some benchmarks showing the wins?
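
For the benchmark request, a rough, hypothetical timing sketch comparing the foreach and fused paths (it assumes a CUDA device and a build where fused SGD has landed; parameter shapes, counts, and hyperparameters are arbitrary):

import torch

def time_sgd(impl_kwargs, n_params=100, shape=(1024, 1024), steps=50):
    # Pre-populate gradients so only opt.step() is timed.
    params = [torch.randn(*shape, device="cuda", requires_grad=True) for _ in range(n_params)]
    for p in params:
        p.grad = torch.randn_like(p)
    opt = torch.optim.SGD(params, lr=0.1, momentum=0.9, **impl_kwargs)
    opt.step()  # warm-up
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(steps):
        opt.step()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / steps  # milliseconds per step

print("foreach:", time_sgd({"foreach": True}), "ms/step")
print("fused:  ", time_sgd({"fused": True}), "ms/step")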

Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
CUDA: _fused_sgd_with_momentum_kernel_cuda_
autogen: _fused_sgd_with_momentum, _fused_sgd_with_momentum.out

- func: _fused_sgd_with_momentum_.tensor_lr(Tensor(a!)[] self, Tensor(b!)[] grads, Tensor(c!)[] momentum_buffer_list, *, float weight_decay, float momentum, Tensor lr, float dampening, bool nesterov, bool maximize, bool is_first_step, Tensor? grad_scale=None, Tensor? found_inf=None) -> ()

can this be an overload instead?


or like, an optional value?

for d, w, n in product((0.0, 0.5), (0.0, 0.5), (False,))
] + [
{"lr": 0.1, "momentum": 0.5, "dampening": d, "weight_decay": w, "nesterov": n, "fused": True}
for d, w, n in product((0.0,), (0.0, 0.5), (True,))

False too?
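
If the intent is to also exercise nesterov=False in this entry, a minimal sketch of the changed tuple (this reading of the comment is an assumption; `fused_kwargs` is a placeholder name):

from itertools import product

fused_kwargs = [
    {"lr": 0.1, "momentum": 0.5, "dampening": d, "weight_decay": w, "nesterov": n, "fused": True}
    for d, w, n in product((0.0,), (0.0, 0.5), (False, True))
]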

for lr, d, w, n in itertools.product((0.1, torch.tensor(0.1)), (0.0, 0.5), (0.0, 0.5), (False,))
] + [
{"lr": lr, "momentum": 0.5, "dampening": d, "weight_decay": w, "nesterov": n}
for lr, d, w, n in itertools.product((0.1, torch.tensor(0.1)), (0.0,), (0.0, 0.5), (True,))

False too?

Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
Signed-off-by: Masaki Kozuki <mkozuki@nvidia.com>
@github-actions github-actions bot closed this Nov 13, 2023
@janeyx99 janeyx99 reopened this Nov 13, 2023
@github-actions github-actions bot closed this Dec 28, 2023
@crcrpar crcrpar deleted the fused_sgd branch January 1, 2024 14:57
@crcrpar crcrpar mentioned this pull request Jan 1, 2024
pytorchmergebot pushed a commit that referenced this pull request Jan 16, 2024
depends on #116583

rel:
- #94791

Pull Request resolved: #116585
Approved by: https://github.com/janeyx99