Add Lookahead+RAdam optimizer #416

Merged
merged 19 commits into from
Sep 23, 2019

Conversation

@kengz (Owner) commented Sep 16, 2019

Experiment Result

Abstract

The Lookahead + RAdam optimizer significantly improves the performance of some RL algorithms (A2C (n-step), PPO) on continuous-domain problems, but does not improve others (A2C (GAE), SAC).

Methods

  1. Implement RAdam and Lookahead optimizers
  2. Update the optim_spec to replace Adam with Lookahead(RAdam) optimizer
  3. Run the benchmark

Implementations inspired by/adapted from LiyuanLucasLiu/RAdam, lonePatient/lookahead_pytorch, and Less Wright's Medium article.
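
For readers unfamiliar with the two optimizers, here is a minimal pure-Python sketch of the idea on a single scalar parameter. It is not the code added in `slm_lab/lib/optimizer.py`; the function names, the toy quadratic objective, and the hyperparameters (`k=5`, `alpha=0.5`, `lr=0.05`) are assumptions chosen for illustration. RAdam rectifies the adaptive step while the variance estimate is still unreliable, and Lookahead keeps a slow copy of the weights that is pulled toward the fast weights every `k` steps.

```python
import math


def radam_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One RAdam update on a scalar parameter; `state` holds t, m, v."""
    state["t"] += 1
    t = state["t"]
    # Exponential moving averages of the gradient and its square (as in Adam).
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** t)

    # Length of the approximated simple moving average.
    rho_inf = 2.0 / (1 - beta2) - 1
    rho_t = rho_inf - 2.0 * t * beta2 ** t / (1 - beta2 ** t)

    # Threshold 4 follows the RAdam paper; some implementations use a
    # slightly different cutoff.
    if rho_t > 4:
        v_hat = math.sqrt(state["v"] / (1 - beta2 ** t))
        rect = math.sqrt(
            (rho_t - 4) * (rho_t - 2) * rho_inf
            / ((rho_inf - 4) * (rho_inf - 2) * rho_t)
        )
        return theta - lr * rect * m_hat / (v_hat + eps)
    # Early steps: variance is intractable, so use a momentum-only update.
    return theta - lr * m_hat


def lookahead_radam(grad_fn, theta0, steps, k=5, alpha=0.5, lr=1e-2):
    """Lookahead wrapped around RAdam: fast weights explore, slow weights follow."""
    fast = theta0
    slow = theta0
    state = {"t": 0, "m": 0.0, "v": 0.0}
    for step in range(1, steps + 1):
        fast = radam_step(fast, grad_fn(fast), state, lr=lr)
        if step % k == 0:
            # Move the slow weights a fraction alpha toward the fast weights,
            # then restart the fast weights from the slow ones.
            slow = slow + alpha * (fast - slow)
            fast = slow
    return slow


def quadratic_grad(x):
    """Gradient of the toy objective (x - 3)^2, an assumption for this demo."""
    return 2.0 * (x - 3.0)


final = lookahead_radam(quadratic_grad, theta0=0.0, steps=2000, lr=0.05)
print(final)
```

In this sketch the inner optimizer state (`m`, `v`, `t`) is left untouched at each synchronization; only the weights are reset, which is one common design choice rather than the only one.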

To Reproduce

Use commit e5988f0 to run the spec files.

Results

All the results contributed will be added to the benchmark, and made publicly available on Dropbox.

We ran the benchmark to directly compare the performance of Adam and Lookahead(RAdam), using the same code and spec files and changing only the optimizer (see the git diff of this PR). Due to limited computational resources, we focused the study on continuous environments from Roboschool.

We find that:

  • A2C (n-step) and PPO improve significantly overall in both the standard Roboschool and the harder Humanoid environments.
  • A2C (n-step), which previously failed completely on the harder Humanoid environments, is now able to learn.
  • A2C (GAE) results are mixed, with some improvements and some degradation.
  • SAC is not improved by the new optimizer, so we exclude its Lookahead + RAdam results below. Instead, we provide rerun/old benchmark results using Adam for comparison and as a benchmark update.

The results are tabulated below. In summary, the new results were run with:

  • A2C (GAE), A2C (n-step), PPO using Lookahead + RAdam optimizer
  • SAC using its Adam optimizer

New Roboschool benchmark

| Env. \ Alg. | A2C (GAE) | A2C (n-step) | PPO | SAC |
|---|---|---|---|---|
| RoboschoolAnt | 787 | 1396 | 1843 | 2915 |
| RoboschoolAtlasForwardWalk | 59.87 | 88.04 | 172 | 800 |
| RoboschoolHalfCheetah | 712 | 439 | 1960 | 2497 |
| RoboschoolHopper | 710 | 285 | 2042 | 2045 |
| RoboschoolInvertedDoublePendulum | 996 | 4410 | 8076 | 8085 |
| RoboschoolInvertedPendulum | 995 | 978 | 986 | 941 |
| RoboschoolReacher | 12.9 | 10.16 | 19.51 | 19.99 |
| RoboschoolWalker2d | 280 | 220 | 1660 | 1894 |

Old Roboschool benchmark

| Env. \ Alg. | A2C (GAE) | A2C (n-step) | PPO | SAC |
|---|---|---|---|---|
| RoboschoolAnt | 1029.51 | 1148.76 | 1931.35 | 2914.75 |
| RoboschoolAtlasForwardWalk | 68.15 | 73.46 | 148.81 | 942.39 |
| RoboschoolHalfCheetah | 895.24 | 409.59 | 1838.69 | 2496.54 |
| RoboschoolHopper | 286.67 | -187.91 | 2079.22 | 2251.36 |
| RoboschoolInvertedDoublePendulum | 1769.74 | 486.76 | 7967.03 | 8085.04 |
| RoboschoolInvertedPendulum | 1000.0 | 997.54 | 930.29 | 941.45 |
| RoboschoolReacher | 14.57 | -6.18 | 19.18 | 19.99 |
| RoboschoolWalker2d | 413.26 | 141.83 | 1368.25 | 1894.05 |
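
To make the new-versus-old comparison easier to check, the following sketch counts, per algorithm, the Roboschool environments where the new score exceeds the old one. The scores are copied from the two Roboschool tables above; the variable names are illustrative, and SAC is left out because both of its columns use Adam.

```python
algos = ["A2C (GAE)", "A2C (n-step)", "PPO"]

# Scores per environment in the order of `algos`, copied from the new table.
new = {
    "RoboschoolAnt": (787, 1396, 1843),
    "RoboschoolAtlasForwardWalk": (59.87, 88.04, 172),
    "RoboschoolHalfCheetah": (712, 439, 1960),
    "RoboschoolHopper": (710, 285, 2042),
    "RoboschoolInvertedDoublePendulum": (996, 4410, 8076),
    "RoboschoolInvertedPendulum": (995, 978, 986),
    "RoboschoolReacher": (12.9, 10.16, 19.51),
    "RoboschoolWalker2d": (280, 220, 1660),
}

# Same layout, copied from the old (Adam) table.
old = {
    "RoboschoolAnt": (1029.51, 1148.76, 1931.35),
    "RoboschoolAtlasForwardWalk": (68.15, 73.46, 148.81),
    "RoboschoolHalfCheetah": (895.24, 409.59, 1838.69),
    "RoboschoolHopper": (286.67, -187.91, 2079.22),
    "RoboschoolInvertedDoublePendulum": (1769.74, 486.76, 7967.03),
    "RoboschoolInvertedPendulum": (1000.0, 997.54, 930.29),
    "RoboschoolReacher": (14.57, -6.18, 19.18),
    "RoboschoolWalker2d": (413.26, 141.83, 1368.25),
}

# For each algorithm, list the environments where the new score is higher.
improved = {
    algo: [env for env in new if new[env][i] > old[env][i]]
    for i, algo in enumerate(algos)
}

for algo in algos:
    print(f"{algo}: improved in {len(improved[algo])} of {len(new)} environments")
```

This only compares the single reported score per cell and says nothing about variance across runs.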

New Humanoid benchmark

Humanoid environments are significantly harder. Note that due to the number of frames required, we could only run Async-SAC.

| Env. \ Alg. | A2C (GAE) | A2C (n-step) | PPO | Async-SAC |
|---|---|---|---|---|
| RoboschoolHumanoid | 99.31 | 54.58 | 2388 | 2621 |
| RoboschoolHumanoidFlagrun | 73.57 | 178 | 2014 | 2056 |
| RoboschoolHumanoidFlagrunHarder | -429 | 253 | 680 | 280 |

Old Humanoid benchmark

| Env. \ Alg. | A2C (GAE) | A2C (n-step) | PPO | Async-SAC |
|---|---|---|---|---|
| RoboschoolHumanoid | 122.23 | -6029.02 | 1554.03 | 2621.46 |
| RoboschoolHumanoidFlagrun | 93.48 | -2079.02 | 1635.64 | 1937.77 |
| RoboschoolHumanoidFlagrunHarder | -472.34 | -24620.71 | 610.09 | 280.18 |
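
Because several old Humanoid scores are negative, percentage changes are not meaningful here, so this sketch reports absolute score differences (new minus old) instead. The numbers are copied from the two Humanoid tables above; the variable names are illustrative, and Async-SAC is omitted since it uses Adam in both runs.

```python
algos = ["A2C (GAE)", "A2C (n-step)", "PPO"]

# Scores per environment in the order of `algos`, from the new Humanoid table.
new = {
    "RoboschoolHumanoid": (99.31, 54.58, 2388),
    "RoboschoolHumanoidFlagrun": (73.57, 178, 2014),
    "RoboschoolHumanoidFlagrunHarder": (-429, 253, 680),
}

# Same layout, from the old (Adam) Humanoid table.
old = {
    "RoboschoolHumanoid": (122.23, -6029.02, 1554.03),
    "RoboschoolHumanoidFlagrun": (93.48, -2079.02, 1635.64),
    "RoboschoolHumanoidFlagrunHarder": (-472.34, -24620.71, 610.09),
}

# Absolute change in score for every (environment, algorithm) pair.
delta = {
    env: {algo: new[env][i] - old[env][i] for i, algo in enumerate(algos)}
    for env in new
}

for env, row in delta.items():
    cells = ", ".join(f"{algo}: {change:+.2f}" for algo, change in row.items())
    print(f"{env} -> {cells}")
```

The A2C (n-step) column moves from large negative scores to positive ones in all three environments, which is the "now able to learn" observation listed in the findings.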

@kengz kengz merged commit bc4c61c into master Sep 23, 2019
@kengz kengz deleted the lookahead branch September 23, 2019 01:40
@kengz kengz changed the title Add Lookahead optimizer Add Lookahead+RAdam optimizer Sep 23, 2019
@kengz kengz added the result experiment result upload label Sep 23, 2019