Add Lookahead+RAdam optimizer #416

kengz · 2019-09-16T05:51:10Z

Experiment Result

Abstract

The Lookahead + RAdam optimizer significantly improves the performance of some RL algorithms (A2C (n-step), PPO) on continuous domain problems, but does not improve others (A2C (GAE), SAC).

Methods

Implement RAdam and Lookahead optimizers
Update the optim_spec to replace Adam with Lookahead(RAdam) optimizer
Run the benchmark
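
For readers unfamiliar with RAdam, the following is a minimal sketch of its variance rectification term as described in the RAdam paper. It is not the code in slm_lab/lib/optimizer.py. The function name `radam_rectification` is hypothetical, and the `rho_t > 4` threshold follows the paper; some implementations use a slightly different cutoff.

```python
import math
from typing import Optional


def radam_rectification(step: int, beta2: float = 0.999) -> Optional[float]:
    """Return the RAdam rectification factor r_t, or None while the
    variance estimate is considered intractable (early steps).

    Hypothetical helper for illustration only; `step` starts at 1.
    """
    # Maximum length of the approximated simple moving average.
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** step
    # Length of the approximated SMA at this step.
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if rho_t <= 4.0:
        # Too few samples: RAdam falls back to an unadapted
        # (momentum-only) update for this step.
        return None
    numerator = (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
    denominator = (rho_inf - 4.0) * (rho_inf - 2.0) * rho_t
    return math.sqrt(numerator / denominator)


early = radam_rectification(1)        # adaptive term disabled at first
later = radam_rectification(5)        # small factor once rho_t exceeds 4
late = radam_rectification(100_000)   # approaches 1 as training proceeds
```

When the factor is available it scales the adaptive Adam step, so the effective learning rate starts small and grows toward the plain Adam step as the factor approaches 1.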

Implementations inspired by/adapted from LiyuanLucasLiu/RAdam, lonePatient/lookahead_pytorch, and Less Wright's Medium article.
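
The Lookahead wrapper can be illustrated with a short, dependency-free sketch. This is an assumption-laden illustration, not the PR's implementation: plain gradient descent stands in for RAdam as the inner optimizer, and the names `lookahead_minimize` and `quadratic_grad` as well as the values `alpha=0.5` and `k=5` are chosen only for the example.

```python
from typing import Callable, List


def lookahead_minimize(
    grad_fn: Callable[[List[float]], List[float]],
    init: List[float],
    inner_lr: float = 0.1,
    alpha: float = 0.5,
    k: int = 5,
    steps: int = 100,
) -> List[float]:
    """Lookahead around plain gradient descent (a stand-in for RAdam).

    Returns the slow weights, which are only updated every k steps.
    """
    slow = list(init)  # slow weights, updated every k inner steps
    fast = list(init)  # fast weights, updated by the inner optimizer
    for step in range(1, steps + 1):
        grads = grad_fn(fast)
        # Inner optimizer step on the fast weights.
        fast = [w - inner_lr * g for w, g in zip(fast, grads)]
        if step % k == 0:
            # Move the slow weights a fraction alpha toward the fast
            # weights, then restart the fast weights from there.
            slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
            fast = list(slow)
    return slow


def quadratic_grad(w: List[float]) -> List[float]:
    """Gradient of f(w) = sum(w_i ** 2), a toy objective."""
    return [2.0 * x for x in w]


result = lookahead_minimize(quadratic_grad, [3.0, -2.0])
```

In the PR the inner optimizer is RAdam rather than gradient descent, but the slow/fast synchronization step has the same shape.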

To Reproduce

Use commit e5988f0 to run the spec files.

Results

All contributed results will be added to the benchmark and made publicly available on Dropbox.

We ran the benchmark to directly compare the performance of Adam and Lookahead(RAdam) using the same code and spec files, changing only the optimizer (see the git diff of this PR). Due to limited computational resources, the study focuses on continuous environments from Roboschool.

We find that:

A2C (n-step) and PPO gain significant improvements overall in both the standard Roboschool and the harder Humanoid environments.
A2C (n-step), which previously failed completely on the harder Humanoid environments, is now able to learn.
A2C (GAE) results are mixed, with some improvements and some degradation.
SAC is not improved in any of the environments, so we exclude its Lookahead + RAdam results below. Instead, we provide rerun or old benchmark results using Adam, for comparison and as a benchmark update.

The results are tabulated below. In summary, the new results were run with:

A2C (GAE), A2C (n-step), PPO using Lookahead + RAdam optimizer
SAC using its Adam optimizer

New Roboschool benchmark

Env. \ Alg.	A2C (GAE)	A2C (n-step)	PPO	SAC
RoboschoolAnt	787	1396	1843	2915
RoboschoolAtlasForwardWalk	59.87	88.04	172	800
RoboschoolHalfCheetah	712	439	1960	2497
RoboschoolHopper	710	285	2042	2045
RoboschoolInvertedDoublePendulum	996	4410	8076	8085
RoboschoolInvertedPendulum	995	978	986	941
RoboschoolReacher	12.9	10.16	19.51	19.99
RoboschoolWalker2d	280	220	1660	1894

Old Roboschool benchmark

Env. \ Alg.	A2C (GAE)	A2C (n-step)	PPO	SAC
RoboschoolAnt	1029.51	1148.76	1931.35	2914.75
RoboschoolAtlasForwardWalk	68.15	73.46	148.81	942.39
RoboschoolHalfCheetah	895.24	409.59	1838.69	2496.54
RoboschoolHopper	286.67	-187.91	2079.22	2251.36
RoboschoolInvertedDoublePendulum	1769.74	486.76	7967.03	8085.04
RoboschoolInvertedPendulum	1000.0	997.54	930.29	941.45
RoboschoolReacher	14.57	-6.18	19.18	19.99
RoboschoolWalker2d	413.26	141.83	1368.25	1894.05
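
As a quick way to read the two Roboschool tables side by side, the sketch below counts, per algorithm, the environments where the new score exceeds the old one. The numbers are copied from the tables above; the helper name `improved_envs` is hypothetical. Each cell is a single reported score, so a raw comparison like this says nothing about variance or statistical significance.

```python
ENVS = [
    "RoboschoolAnt",
    "RoboschoolAtlasForwardWalk",
    "RoboschoolHalfCheetah",
    "RoboschoolHopper",
    "RoboschoolInvertedDoublePendulum",
    "RoboschoolInvertedPendulum",
    "RoboschoolReacher",
    "RoboschoolWalker2d",
]

# New benchmark: Lookahead + RAdam (scores in ENVS order).
NEW = {
    "A2C (GAE)": [787, 59.87, 712, 710, 996, 995, 12.9, 280],
    "A2C (n-step)": [1396, 88.04, 439, 285, 4410, 978, 10.16, 220],
    "PPO": [1843, 172, 1960, 2042, 8076, 986, 19.51, 1660],
}

# Old benchmark: Adam (scores in ENVS order).
OLD = {
    "A2C (GAE)": [1029.51, 68.15, 895.24, 286.67, 1769.74, 1000.0, 14.57, 413.26],
    "A2C (n-step)": [1148.76, 73.46, 409.59, -187.91, 486.76, 997.54, -6.18, 141.83],
    "PPO": [1931.35, 148.81, 1838.69, 2079.22, 7967.03, 930.29, 19.18, 1368.25],
}


def improved_envs(algo: str) -> list:
    """Environments where the new score is strictly higher than the old."""
    return [
        env
        for env, new, old in zip(ENVS, NEW[algo], OLD[algo])
        if new > old
    ]


for algo in NEW:
    print(f"{algo}: improved on {len(improved_envs(algo))} of {len(ENVS)} environments")
```

SAC is left out because both of its columns were produced with Adam.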

New Humanoid benchmark

Humanoid environments are significantly harder. Note that, due to the number of frames required, we could only run SAC in its asynchronous form (Async-SAC).

Env. \ Alg.	A2C (GAE)	A2C (n-step)	PPO	Async-SAC
RoboschoolHumanoid	99.31	54.58	2388	2621
RoboschoolHumanoidFlagrun	73.57	178	2014	2056
RoboschoolHumanoidFlagrunHarder	-429	253	680	280

Old Humanoid benchmark

Env. \ Alg.	A2C (GAE)	A2C (n-step)	PPO	Async-SAC
RoboschoolHumanoid	122.23	-6029.02	1554.03	2621.46
RoboschoolHumanoidFlagrun	93.48	-2079.02	1635.64	1937.77
RoboschoolHumanoidFlagrunHarder	-472.34	-24620.71	610.09	280.18

kengz added 15 commits September 5, 2019 10:05

implement lookahead optimizer

42742c3

try LookAhead on ppo roboschool

fc08261

use lookahead in a2c

a8a0a4a

replace all roboschool benchmark to use Lookahead

9d76af7

remove invalid swingup env

54de6b9

make RAdam and Lookahead global

6e9a8fd

share optimizer based on method availability

a417124

set properties in RAdam and Lookahead for a3c share_memory

3f61399

update RAdam from author, fix adap_lr as scalar

a45373a

use defaults to track lookahead

add6a17

commit working a3c gae cartpole spec

37570d9

handle inner optimizer setting for global nets

8fb302b

lower lr for sac lookahead roboschool

108c24b

Merge remote-tracking branch 'origin/master' into lookahead

d64f24e

revert SAC specs back to old

0f1b5b3

codeclimate bot reviewed Sep 16, 2019

Two review comments on slm_lab/lib/optimizer.py, both resolved.

kengz added 4 commits September 22, 2019 11:02

tune sac harder

d1d9e59

Revert "tune sac harder"

63bb516

This reverts commit d1d9e59.

update benchmark tables

e5988f0

add legend

3be6fd6

kengz merged commit bc4c61c into master Sep 23, 2019

kengz deleted the lookahead branch September 23, 2019 01:40

kengz changed the title ~~Add Lookahead optimizer~~ Add Lookahead+RAdam optimizer Sep 23, 2019

kengz added the result experiment result upload label Sep 23, 2019
