Optim foreach cleanup for NAdam #70229

Closed

@pytorch-probot

pytorch-probot bot commented Dec 21, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/d97398b6addafe10494cc89294e05e8ed807009b/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk, ciflow/xla ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-binary-conda ciflow/binaries, ciflow/binaries/conda 🚫 skipped
linux-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries/libtorch 🚫 skipped
linux-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries/libtorch 🚫 skipped
linux-binary-manywheel ciflow/binaries, ciflow/binaries/wheel 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot
Contributor

facebook-github-bot commented Dec 21, 2021

💊 CI failures summary and remediations

As of commit 69ca308 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

mus = [beta1 * (1. - 0.5 * (0.96 ** (step * momentum_decay))) for step in state_steps]
mu_nexts = [beta1 * (1. - 0.5 * (0.96 ** ((step + 1) * momentum_decay)))
            for step in state_steps]
mu_products = [mu * mu_product for mu, mu_product in zip(mus, mu_products)]
Contributor Author

The multitensor NAdam class updates mu_product before the call to nadam and then does its computations on mu_products directly, whereas the singletensor class updates mu_product after the call to nadam and computes the new mu_products within the nadam function. In this combined class I preserved the singletensor behavior, so I added line 245 here to ensure the same behavior.
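
For readers following along, here is a minimal pure-Python sketch of the per-step quantities in the snippet above. The helper names `mu_at` and `nadam_coefficients` are hypothetical and used only for illustration; the default values `beta1=0.9` and `momentum_decay=4e-3` are assumptions, and scalars stand in for the per-parameter lists.

```python
def mu_at(step, beta1=0.9, momentum_decay=4e-3):
    """Momentum coefficient for one step, same formula as the list comprehensions above."""
    return beta1 * (1.0 - 0.5 * (0.96 ** (step * momentum_decay)))


def nadam_coefficients(step, mu_product, beta1=0.9, momentum_decay=4e-3):
    """Return (mu, mu_next, updated mu_product) for a single parameter (hypothetical helper)."""
    mu = mu_at(step, beta1, momentum_decay)
    mu_next = mu_at(step + 1, beta1, momentum_decay)
    # The running product accumulates every mu seen so far.
    return mu, mu_next, mu_product * mu


mu_product = 1.0
history = []
for step in range(1, 4):
    mu, mu_next, mu_product = nadam_coefficients(step, mu_product)
    history.append((mu, mu_next, mu_product))
```

Each step's `mu_next` is the following step's `mu`, and `mu_product` shrinks monotonically because every `mu` lies strictly between 0 and 1 for these assumed defaults.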

Collaborator

This could be done with a foreach op, no?
Also, the existing code is modifying mu_products in place; don't we want to preserve that?

Contributor Author

@mikaylagawarecki mikaylagawarecki Jan 11, 2022

Hm, I didn't do this in place because each mu_product is a float rather than a tensor, so even if it's done in place I don't think it updates the underlying state['mu_product']. There is, however, an update step within the NAdam.step function on line 146 that updates state['mu_product'], which is preserved from the single tensor version. Am I thinking about this correctly?
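
The point about floats can be checked in plain Python. This is a small sketch with a hypothetical `state` dict and an arbitrary `mu` value, not the optimizer's actual code: multiplying a list slot that holds a float rebinds that slot and leaves the dict entry untouched, so an explicit write-back is needed.

```python
state = {"mu_product": 1.0}

# Collecting the value into a list copies the float; there is no shared storage.
mu_products = [state["mu_product"]]

mu = 0.9  # arbitrary illustrative value
mu_products[0] *= mu  # rebinds the list slot to a new float object

# The state entry has not changed.
before_writeback = state["mu_product"]

# Only an explicit assignment propagates the update to the state dict.
state["mu_product"] = mu_products[0]
after_writeback = state["mu_product"]
```

With a tensor in place of the float, an in-place multiply would be visible through every reference to that tensor, which is why the float-versus-tensor distinction matters here.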

Collaborator

Aren't you computing the same thing twice here then?
I am a bit confused now where this value is updated for each case.

In general, I think we want to stay as close to the original code as possible, even if we have to fold some of the state update into the functional function.

Contributor Author

@mikaylagawarecki mikaylagawarecki Jan 11, 2022

Yes, I agree that the same thing is being computed twice, but that is what torch/optim/nadam.py was doing initially. The single tensor version updates mu_product after calling F.nadam (and the same computation is done inside the function on lines 224 and 226), whereas the multitensor one does it before calling F.nadam, so the logic isn't repeated in the function! So I think we need to change the code of one of the functional forms. For example, if I preserve the initial multitensor version and get rid of this change to _multi_tensor_nadam, I think I would have to remove line 226 of _single_tensor_nadam (mu_product = mu_product * mu). Does that make sense?
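
To make the two orderings concrete, here is a rough sketch of the control flow as described in this comment. It is not the code in torch/optim/nadam.py: `single_tensor_style`, `multi_tensor_style`, and `mu_at` are hypothetical names, scalars stand in for the parameter state, and the default hyperparameters are assumptions.

```python
def mu_at(step, beta1=0.9, momentum_decay=4e-3):
    return beta1 * (1.0 - 0.5 * (0.96 ** (step * momentum_decay)))


def single_tensor_style(state, step):
    # Inside the functional call: the product is recomputed locally.
    mu = mu_at(step)
    used = state["mu_product"] * mu
    # Back in the optimizer, after the call: the same multiply is repeated
    # so that the stored state catches up.
    state["mu_product"] *= mu_at(step)
    return used


def multi_tensor_style(state, step):
    # In the optimizer, before the call: the state is updated once.
    state["mu_product"] *= mu_at(step)
    # Inside the functional call: the updated value is used directly.
    return state["mu_product"]


a = {"mu_product": 1.0}
b = {"mu_product": 1.0}
for step in range(1, 6):
    used_a = single_tensor_style(a, step)
    used_b = multi_tensor_style(b, step)
    assert used_a == used_b  # same value feeds the parameter update
```

Both orderings end with the same stored `mu_product`; the only difference in this sketch is that the first performs the multiplication twice per step.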

Collaborator

The two options I see are:

  • We want to keep the exact same behavior as before and so we fold the second update for the single Tensor impl into the functional. And fold the multitensor computation into its functional version.
  • We want to remove this duplicate code and so modify one of the two to match the other. In that case, I think we should keep the version that does not do the computation twice, as it is better! Also, the functional version should perform the full step, so I would argue it is a bug that part of it is done in the optimizer outside of the functional call.
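
As an illustration of the second option, the sketch below lets a hypothetical `functional_step` own the whole mu_product update. It assumes the state is stored in a mutable holder; a one-element list is used here purely as a stand-in for a singleton tensor, and the names and default hyperparameters are assumptions rather than the PR's code.

```python
def mu_at(step, beta1=0.9, momentum_decay=4e-3):
    return beta1 * (1.0 - 0.5 * (0.96 ** (step * momentum_decay)))


def functional_step(mu_product_box, step, beta1=0.9, momentum_decay=4e-3):
    """Hypothetical functional form that performs the full mu_product update."""
    mu = mu_at(step, beta1, momentum_decay)
    # In-place update through the mutable holder, so the caller's state
    # sees the new value without recomputing it afterwards.
    mu_product_box[0] *= mu
    return mu, mu_product_box[0]


# One-element list standing in for a singleton tensor (assumption).
state = {"mu_product": [1.0]}
for step in range(1, 4):
    functional_step(state["mu_product"], step)

final_mu_product = state["mu_product"][0]
```

Here the optimizer never touches `mu_product` itself, so the multiplication happens exactly once per step.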

Contributor Author

Hm, which option do you think I should go ahead with? It seems that ASGD and SGD have the same bug, where part of the state is updated outside the functional form in the single tensor versions. I think they did this (for ASGD and NAdam) because arguments were passed as floats rather than singleton tensors, so they couldn't update state within the function.

@mikaylagawarecki mikaylagawarecki marked this pull request as ready for review December 22, 2021 08:02
Collaborator

@albanD albanD left a comment

Much cleaner!

@mikaylagawarecki
Contributor Author

@mikaylagawarecki has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Feb 9, 2022
Summary: Pull Request resolved: #70229

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767873

Pulled By: mikaylagawarecki

fbshipit-source-id: 833ead14c1d1659351ebfbeb41045a3c7eb96dad
@facebook-github-bot facebook-github-bot deleted the gh/mikaylagawarecki/20/head branch February 13, 2022 15:16
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Feb 15, 2022
Summary: Pull Request resolved: pytorch/pytorch#70229

Test Plan: Imported from OSS

Reviewed By: anjali411

Differential Revision: D33767873

Pulled By: mikaylagawarecki

fbshipit-source-id: 833ead14c1d1659351ebfbeb41045a3c7eb96dad
(cherry picked from commit 9415df6b5c9620c9d53036c28fe3f297c6d4906c)
@mikaylagawarecki mikaylagawarecki added release notes: nn release notes category topic: new features topic category labels May 19, 2022