[optim] include nn.Parameter as foreach supported #95811
Conversation
[ghstack-poisoned]
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/95811
Note: Links to docs will display an error until the docs builds have completed. ✅ No Failures as of commit 3add0bd. This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: f92e1ac06f2d1c151ebefa45e6a5b444afcf7c30 Pull Request resolved: #95811
@@ -15,6 +15,7 @@
 __all__ = ['Optimizer', 'register_optimizer_step_pre_hook', 'register_optimizer_step_post_hook']
 _global_optimizer_pre_hooks: Dict[int, Callable] = OrderedDict()
 _global_optimizer_post_hooks: Dict[int, Callable] = OrderedDict()
+_foreach_supported_types = [torch.Tensor, torch.nn.parameter.Parameter]
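The diff above hints at the bug being fixed: the foreach default checks exact types against this list, so `nn.Parameter` (a `Tensor` subclass) fell through when only `torch.Tensor` was listed. A torch-free sketch of that exact-type check, using stand-in class names rather than PyTorch's actual code:

```python
# Stand-in classes; in PyTorch these would be torch.Tensor and
# torch.nn.parameter.Parameter.
class Tensor: ...
class Parameter(Tensor): ...

def default_to_foreach(tensors, supported):
    # Exact type match, not isinstance: subclasses must be listed
    # explicitly, so arbitrary subclasses do not silently opt in.
    return all(type(t) in supported for t in tensors)

# Before the PR: Parameter (a Tensor subclass) failed the check, so real
# model parameters never got the faster foreach path.
print(default_to_foreach([Parameter()], [Tensor]))             # False
# After the PR: Parameter is listed explicitly.
print(default_to_foreach([Parameter()], [Tensor, Parameter]))  # True
```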
nit: make it a tuple instead so it's not accidentally mutated?
This is actually intentional! This PR was a realization that the defaulting was way too conservative and did not allow models to be included for faster foreach. We want to allow users to be able to add their own subclasses here if they know that foreach will not break on those tensor subclasses.
@janeyx99 What is the recommended way to add to this list? Should we directly append like the following in trainer code:
torch.optim.optimizer._foreach_supported_types.append(...)
@janeyx99 I have the same question @awgu raised. Do you know how users are adding their own subclasses to the list?
Due to a circular dependency, instead of adding DTensor here, I am first trying out what Andrew suggested. Appending the tensor subclass to the list seems to have no effect at a later stage, so I am wondering how users are doing this, if you are aware.
You may be the first user to try this--what Andrew suggests seems okay for now barring the fact that it's a private API. Once we want to productionize this, we should work together to make this an exposed public API.
It sounds like appending is not working?
> It sounds like appending is not working?
My bad, that was false information! I just double-checked, and append seems to work fine in the unit test locally!
This PR is a result of a realization that models are NOT subscribed to the foreach defaulting, as has been claimed in our documentation for months now. BIG OOPS. Pull Request resolved: pytorch#95811 Approved by: https://github.com/albanD
* [optim] Widen the cases for defaulting to foreach (#95820) Big OOP correction continued. Also added a test this time to verify the defaulting was as expected. The key here is realizing that the grouping for foreach already assumes that the non-param tensorlists follow suit in dtype and device, so it is too narrow to check that _all_ tensors were on CUDA. The main leeway this allowed was state_steps, which are sometimes CPU tensors. Since foreach _can_ handle CPU tensors, this should not introduce breakage. Pull Request resolved: #95820 Approved by: https://github.com/albanD
Stack from ghstack (oldest at bottom):