To add Rectified Adam Algorithm to Optimizers #58968
Conversation
💊 CI failures summary and remediations
As of commit d8ac387 (more details on the Dr. CI page and at hud.pytorch.org/pr/58968): 1 failure not recognized by patterns.
This comment was automatically generated by Dr. CI. Please report bugs/suggestions to the (internal) Dr. CI Users group.
Force-pushed from a648b1c to 2714cb0.
Force-pushed from c2d19b3 to ee985fc.
LGTM!
Discussed offline: we'll keep the name as RAdam instead of PlainRAdam. If we need another version, we could explore calling it something else (ModifiedRAdam/ApproximateRAdam for this one?) or add a toggle.
@iramazanli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This pull request has been merged in 0ff3634.
Summary: Fixes #24892. In the paper https://arxiv.org/pdf/1908.03265.pdf, Liyuan Liu et al. propose a new optimization algorithm that is similar in essence to the Adam algorithm. The paper observes that, without a warmup heuristic, adaptive learning-rate methods can suffer from undesirably large variance in the early stages of training, which can slow overall convergence. The authors therefore propose rectifying the variance of the adaptive learning rate when it is expected to be high. Differing from the paper, we selected 5 instead of 4 as the variance-tractability cutoff. This adjustment is common practice and can also be found in the authors' code repository and in the TensorFlow Swift optimizer library: https://github.com/LiyuanLucasLiu/RAdam/blob/2f03dd197022da442c6a15c47321f4335d113a3f/radam/radam.py#L156 https://github.com/tensorflow/swift-apis/blob/f51ee4618d652a2419e998bf9418ad80bda67454/Sources/TensorFlow/Optimizers/MomentumBased.swift#L638 Pull Request resolved: #58968 Reviewed By: gchanan Differential Revision: D29241736 Pulled By: iramazanli fbshipit-source-id: 288b9b1f3125fdc6c7a7bb23fde1ea5c201c0448
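For context on what the merged change exposes, here is a minimal usage sketch, assuming the optimizer is registered as torch.optim.RAdam with Adam-style constructor arguments; the toy model and loss below are illustrative assumptions, not part of the PR.

```python
import torch

# Illustrative model and optimizer; hyperparameters mirror Adam-style defaults.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.RAdam(model.parameters(), lr=1e-3, betas=(0.9, 0.999),
                              eps=1e-8, weight_decay=0)

for _ in range(100):
    optimizer.zero_grad()
    # Toy objective: drive the outputs toward zero.
    loss = model(torch.randn(32, 10)).pow(2).mean()
    loss.backward()
    optimizer.step()
```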
This pull request has been reverted by 57967dc498dee032dc189f9ab4fc264ab905581e.
This pull request has been reverted by 1abf45e.
Force-pushed from d35dadf to a5e1c12.
@iramazanli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Fixes #24892. In the paper https://arxiv.org/pdf/1908.03265.pdf, Liyuan Liu et al. propose a new optimization algorithm that is similar in essence to the Adam algorithm. The paper observes that, without a warmup heuristic, adaptive learning-rate methods can suffer from undesirably large variance in the early stages of training, which can slow overall convergence. The authors therefore propose rectifying the variance of the adaptive learning rate when it is expected to be high. Differing from the paper, we selected 5 instead of 4 as the variance-tractability cutoff. This adjustment is common practice and can also be found in the authors' code repository and in the TensorFlow Swift optimizer library: https://github.com/LiyuanLucasLiu/RAdam/blob/2f03dd197022da442c6a15c47321f4335d113a3f/radam/radam.py#L156 https://github.com/tensorflow/swift-apis/blob/f51ee4618d652a2419e998bf9418ad80bda67454/Sources/TensorFlow/Optimizers/MomentumBased.swift#L638 Pull Request resolved: #58968 Reviewed By: vincentqb Differential Revision: D29310601 Pulled By: iramazanli fbshipit-source-id: b7bd487f72f1074f266687fd9c0c6be264a748a9
Summary: Previously, in PR #58968, we added RAdam to Optimizers. In this PR we propose a multi-tensor version of RAdam for PyTorch. RAdam was proposed by Liyuan Liu et al. in the paper https://arxiv.org/pdf/1908.03265.pdf and has become one of the most widely used algorithms in the deep learning community. Differing from the paper, we selected 5 instead of 4 as the variance-tractability cutoff, as is common practice. Pull Request resolved: #59161 Reviewed By: vincentqb Differential Revision: D29360576 Pulled By: iramazanli fbshipit-source-id: 7ccdbf12b1ee7f12e66f7d7992123a70cc818b6b
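To make "multi-tensor" concrete, here is a minimal sketch, assuming the private torch._foreach_* ops, of how a per-parameter loop can be replaced by fused list-wide updates. It illustrates the technique on a simplified first-moment update only and is not the actual #59161 implementation.

```python
import torch

# Illustrative parameter list with matching gradients and first-moment buffers.
params = [torch.randn(3, 3) for _ in range(4)]
grads = [torch.randn_like(p) for p in params]
exp_avgs = [torch.zeros_like(p) for p in params]
beta1 = 0.9

# Single-tensor style: one small kernel launch per parameter.
# for m, g in zip(exp_avgs, grads):
#     m.mul_(beta1).add_(g, alpha=1 - beta1)

# Multi-tensor style: one fused call per op across the whole parameter list.
torch._foreach_mul_(exp_avgs, beta1)
torch._foreach_add_(exp_avgs, grads, alpha=1 - beta1)
```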
Fixes: #24892
In the paper https://arxiv.org/pdf/1908.03265.pdf, Liyuan Liu et al. propose a new optimization algorithm that is similar in essence to the Adam algorithm.
The paper observes that, without a warmup heuristic, adaptive learning-rate methods can suffer from undesirably large variance in the early stages of training, which can slow overall convergence.
The authors propose rectifying the variance of the adaptive learning rate when it is expected to be high.
Differing from the paper, we selected 5 instead of 4 as the variance-tractability cutoff. This adjustment is common practice and can also be found in the authors' code repository and in the TensorFlow Swift optimizer library (a sketch of the rectified step follows the links below):
https://github.com/LiyuanLucasLiu/RAdam/blob/2f03dd197022da442c6a15c47321f4335d113a3f/radam/radam.py#L156
https://github.com/tensorflow/swift-apis/blob/f51ee4618d652a2419e998bf9418ad80bda67454/Sources/TensorFlow/Optimizers/MomentumBased.swift#L638
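As referenced above, here is a minimal single-step sketch of the rectification logic with the cutoff of 5 used in this PR, written against plain PyTorch tensor ops. The function and variable names (radam_step, exp_avg, exp_avg_sq, rect, ...) are illustrative assumptions, not the internals of torch.optim.RAdam.

```python
import math
import torch

def radam_step(param, grad, exp_avg, exp_avg_sq, step,
               lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Update biased first- and second-moment estimates, as in Adam.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    bias_correction1 = 1 - beta1 ** step
    bias_correction2 = 1 - beta2 ** step

    # Length of the approximated SMA: its maximum and its value at this step.
    rho_inf = 2 / (1 - beta2) - 1
    rho_t = rho_inf - 2 * step * (beta2 ** step) / bias_correction2

    m_hat = exp_avg / bias_correction1

    if rho_t > 5:  # cutoff of 5, instead of the paper's 4
        # Variance of the adaptive learning rate is tractable: rectify it.
        rect = math.sqrt(
            (rho_t - 4) * (rho_t - 2) * rho_inf
            / ((rho_inf - 4) * (rho_inf - 2) * rho_t)
        )
        denom = (exp_avg_sq / bias_correction2).sqrt().add_(eps)
        param.add_(m_hat / denom, alpha=-lr * rect)
    else:
        # Variance not yet tractable: take an un-adapted, momentum-only step.
        param.add_(m_hat, alpha=-lr)
```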