
[optim] include nn.Parameter as foreach supported #95811

Closed
wants to merge 1 commit

Conversation

janeyx99
Contributor

@janeyx99 janeyx99 commented Mar 1, 2023

This PR is the result of realizing that models have NOT been subscribed to the foreach defaulting, as our documentation has claimed for months now. BIG OOPS.

Stack from ghstack (oldest at bottom):

@janeyx99 janeyx99 requested a review from albanD as a code owner March 1, 2023 19:49
@pytorch-bot pytorch-bot bot added the release notes: nn label Mar 1, 2023
@pytorch-bot

pytorch-bot bot commented Mar 1, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/95811

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3add0bd:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

janeyx99 added a commit that referenced this pull request Mar 1, 2023
ghstack-source-id: f92e1ac06f2d1c151ebefa45e6a5b444afcf7c30
Pull Request resolved: #95811
@janeyx99 janeyx99 added the ciflow/trunk and ciflow/inductor labels Mar 1, 2023
torch/optim/optimizer.py
@@ -15,6 +15,7 @@
 __all__ = ['Optimizer', 'register_optimizer_step_pre_hook', 'register_optimizer_step_post_hook']
 _global_optimizer_pre_hooks: Dict[int, Callable] = OrderedDict()
 _global_optimizer_post_hooks: Dict[int, Callable] = OrderedDict()
+_foreach_supported_types = [torch.Tensor, torch.nn.parameter.Parameter]
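For context, a minimal sketch (not the actual PyTorch source) of how a foreach-defaulting helper might consult this new list: only tensors whose type appears in `_foreach_supported_types` are treated as eligible for the foreach fast path, which is why adding `nn.Parameter` matters for real models.

```python
# Hedged sketch of how a defaulting helper might consult the list; the real
# check in torch/optim lives elsewhere and also looks at devices, dtypes, etc.
import torch
from torch.optim.optimizer import _foreach_supported_types

def _all_foreach_supported(tensors):
    # Only types registered in the list qualify for the foreach path.
    return all(type(t) in _foreach_supported_types for t in tensors)

params = [torch.nn.Parameter(torch.randn(2, 2))]
print(_all_foreach_supported(params))  # True now that nn.Parameter is listed
```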
Collaborator

nit: make it a tuple instead so it's not accidentally mutated?

Contributor Author

This is actually intentional! This PR came from the realization that the defaulting was far too conservative and did not let models benefit from the faster foreach path. We want users to be able to add their own subclasses here if they know that foreach will not break on those tensor subclasses.

Contributor

@janeyx99 What is the recommended way to add to this list? Should we append directly in trainer code, like the following:

torch.optim.optimizer._foreach_supported_types.append(...)
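For illustration, a hedged sketch of that append pattern; `MyTensorSubclass` is a hypothetical placeholder, and `_foreach_supported_types` is a private API that may change without notice:

```python
# Hypothetical registration of a custom tensor subclass so the optimizer's
# foreach defaulting treats it as supported. Private API; subject to change.
import torch
import torch.optim.optimizer as optimizer_mod

class MyTensorSubclass(torch.Tensor):
    pass

optimizer_mod._foreach_supported_types.append(MyTensorSubclass)
```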

Contributor

@janeyx99 I have the same question @awgu raised. Do you know how users are adding their own subclasses to the list?
Due to a circular dependency, instead of adding DTensor here directly, I am first trying out what Andrew suggested. Appending the tensor subclass to the list seems to have no effect at a later stage, so I am wondering how users are doing this, if you are aware.

Contributor Author

You may be the first user to try this; what Andrew suggests seems okay for now, barring the fact that it's a private API. Once we want to productionize this, we should work together to expose it as a public API.

Contributor

It sounds like appending is not working?

Contributor

> It sounds like appending is not working?

My bad, that was false information. I just double-checked and append seems to work fine in the unit test locally!
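For reference, a minimal sanity check along those lines, assuming a hypothetical subclass; it only verifies that the registration is visible, not which kernel path (foreach vs. single-tensor) the optimizer actually takes:

```python
# Register a hypothetical subclass and confirm the list sees it.
import torch
import torch.optim.optimizer as optimizer_mod

class DemoSubclass(torch.Tensor):
    pass

optimizer_mod._foreach_supported_types.append(DemoSubclass)
assert DemoSubclass in optimizer_mod._foreach_supported_types

p = torch.randn(4).as_subclass(DemoSubclass)
assert type(p) in optimizer_mod._foreach_supported_types
```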

janeyx99 added a commit to janeyx99/pytorch that referenced this pull request Mar 2, 2023
This PR is the result of realizing that models have NOT been subscribed to the foreach defaulting, as our documentation has claimed for months now. BIG OOPS.

Pull Request resolved: pytorch#95811
Approved by: https://github.com/albanD
atalman pushed a commit that referenced this pull request Mar 2, 2023
* [optim] include nn.Parameter as foreach supported (#95811)

This PR is the result of realizing that models have NOT been subscribed to the foreach defaulting, as our documentation has claimed for months now. BIG OOPS.

Pull Request resolved: #95811
Approved by: https://github.com/albanD

* [optim] Widen the cases for defaulting to foreach (#95820)

Big OOP correction continued. Also added a test this time to verify the defaulting was as expected.

The key here is realizing that the grouping for foreach already assumes that the non-param tensorlists follow the params in dtype and device, so it is too narrow to require that _all_ tensors be on CUDA. The main leeway this allowed was state_steps, which are sometimes CPU tensors. Since foreach _can_ handle CPU tensors, this should not introduce breakage.

Pull Request resolved: #95820
Approved by: https://github.com/albanD
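To make the widened check concrete, a hedged sketch (not the actual #95820 implementation): params and grads must be CUDA tensors of a supported type, while state_steps are allowed to remain CPU tensors because the foreach kernels can handle them.

```python
# Hedged sketch of the widened foreach defaulting; the real logic lives in
# torch/optim and groups tensors before dispatching.
import torch
from torch.optim.optimizer import _foreach_supported_types

def _default_to_foreach(params, grads):
    # Old, too-narrow rule: every tensorlist (including state_steps) had to be
    # on CUDA. Widened rule: only params and grads are checked, since
    # state_steps may legitimately be CPU tensors.
    return all(
        type(t) in _foreach_supported_types and t.is_cuda
        for t in list(params) + list(grads)
    )
```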
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 5, 2023
This PR is the result of realizing that models have NOT been subscribed to the foreach defaulting, as our documentation has claimed for months now. BIG OOPS.

Pull Request resolved: pytorch/pytorch#95811
Approved by: https://github.com/albanD
pruthvistony pushed a commit to ROCm/pytorch that referenced this pull request May 3, 2023
* [optim] include nn.Parameter as foreach supported (pytorch#95811)

This PR is the result of realizing that models have NOT been subscribed to the foreach defaulting, as our documentation has claimed for months now. BIG OOPS.

Pull Request resolved: pytorch#95811
Approved by: https://github.com/albanD

* [optim] Widen the cases for defaulting to foreach (pytorch#95820)

Big OOP correction continued. Also added a test this time to verify the defaulting was as expected.

The key here is realizing that the grouping for foreach already assumes that the non-param tensorlists follow the params in dtype and device, so it is too narrow to require that _all_ tensors be on CUDA. The main leeway this allowed was state_steps, which are sometimes CPU tensors. Since foreach _can_ handle CPU tensors, this should not introduce breakage.

Pull Request resolved: pytorch#95820
Approved by: https://github.com/albanD
@facebook-github-bot facebook-github-bot deleted the gh/janeyx99/32/head branch June 8, 2023 17:22
Labels
ciflow/inductor, ciflow/trunk (Trigger trunk jobs on your pull request), Merged, release notes: nn (release notes category)
6 participants