A2C cartpole benchmark #180

lgraesser · 2018-09-23T06:52:43Z

Experiment Result

See PR #180 for an example

The data files to upload below are all created automatically in the data/ folder.

Github does not support .json and .csv upload, but .txt. Please rename those files .txt before uploading.

Abstract

Benchmark experiment for A2C on the Cartpole environment

Methods

generalized advantage estimation for the advantage
random search to sweep hyper-parameters

To Reproduce

JSON spec: a2c_gae_mlp_separate_cartpole_spec.txt
git SHA (contained in the file above): 7254254

Results

All the results contributed will be added to the benchmark, and made publicly available on Dropbox.

1. full experiment data zip: (please find our contact in README and request a "Dropbox file request" to upload it to the public benchmark folder.)
2. experiment graph:
3. max fitness score and experiment_df: 1.20, a2c_gae_mlp_separate_cartpole_experiment_df.txt
4. best trial JSON spec: a2c_gae_mlp_separate_cartpole_t33_spec.txt
5. best trial graph:
6. best session graph (optional):

Discussion (optional)

Looking at the experiment graph, we can tell some effects from the hyperparameters. Going from left to right columns:

entropy_coef (this encourages exploration with larger value): the larger the entropy coef, the slower the convergence. For such a simple environment as cartpole, agent does not need much exploration, so low coef leads to faster learning. Some entropy is still beneficial. Convergence speed peaks at entropy_coef=0.02
lambda (of GAE): performance is not sensitive to it, but high value can cause some drop in strength.
training_frequency (per episode for OnPolicyReplay): as seen, training cannot happen too frequently (high variance) or infrequently (slow learning). There is an optimal frequency for maximum convergence speed, which is 2 for this experiment.
hid_layers_activation: tanh works the best overall.
lr_decay_frequency (lr = learning rate): the strength is concentrated at high value for frequency at 2k; the speed graph has a peak at 2k (one outlier point); the stability graph has a noticeable bump peaking at 4k to 6k. Overall it seems that if decay frequency is too high, learning rate drops very low very quickly, and learning slows down. If lr drops too slow, lr stays high, so learning is unstable. There is an optimal point in between. The effect can be quite noisy.

kengz and others added 23 commits September 22, 2018 13:17

add benchmark spec files

fc506e7

Merge remote-tracking branch 'origin/master' into bench

e8c535e

make dqn polyak

53f5e47

specify resources

b3eb92e

update demo spec

d3d1f8a

Updating a2c gae specs

637f345

Update max cpus for a2c cartpole

490b9e3

speedrun

a077c6e

lower trial

52b21f8

try save deepcopy of spec and info_space

2b83133

lower trial

a45a507

Updated a2c mlp specs

f569a05

Switch nstep separate to linear decay

7254254

ramp up trial. problem confirmed fixed

712629b

update lunar a2c search specs

cdf9a01

cleanup cartpole and lunar benchmark specs

3083da6

Merge branch 'master' into bench

af8f4a7

Merge branch 'bench' of https://github.com/kengz/SLM-Lab into bench

01cf593

Merge branch 'bench' of https://github.com/kengz/SLM-Lab into bench

3d128d5

update demo spec to good params

0b41b90

Merge branch 'bench' of https://github.com/kengz/SLM-Lab into bench

2417984

placeholder for experiment link

3bc3d81

Merge branch 'master' into a2c-cartpole-bench

30fc448

kengz added this to Done in v2.x Research and Engineering Sep 23, 2018

kengz moved this from Done to In progress in v2.x Research and Engineering Sep 23, 2018

kengz merged commit 3ec2f01 into master Sep 23, 2018

v2.x Research and Engineering automation moved this from In progress to Done Sep 23, 2018

kengz added the result experiment result upload label Sep 23, 2018

kengz deleted the a2c-cartpole-bench branch September 23, 2018 07:50

kengz mentioned this pull request Sep 24, 2018

DQN Boltzmann CER cartpole benchmark #184

Merged

6 tasks

This was referenced Sep 24, 2018

A2C with nstep returns cartpole benchmark #185

Merged

Large search spec for a2c pendulum, updated lunar benchmark spec #186

Merged

kengz mentioned this pull request Sep 28, 2018

First HydraDQN solution for CartPole and 2DBall #190

Merged

lgraesser mentioned this pull request Sep 28, 2018

DQN lunar benchmark #191

Merged

6 tasks

kengz mentioned this pull request Sep 30, 2018

DDQN Boltzmann CartPole benchmark #195

Merged

6 tasks

kengz changed the title ~~Benchmark result for A2C cartpole~~ A2C cartpole benchmark Sep 30, 2018

This was referenced Oct 1, 2018

REINFORCE cartpole benchmark #200

Merged

A2C GAE SIL cartpole benchmark #201

Merged

lgraesser mentioned this pull request Oct 1, 2018

Double DQN Lunar Lander benchmark #203

Merged

6 tasks

This was referenced Oct 2, 2018

A3C GAE cartpole benchmark #204

Merged

A3C Nstep Cartpole benchmark #205

Merged

This was referenced Oct 3, 2018

Mountain car DQN benchmark #208

Merged

DDQN mountain bench #209

Merged

This was referenced Oct 5, 2018

PPO cartpole benchmark #211

Merged

PPO SIL cartpole benchmark #212

Merged

DQN Boltzmann cartpole benchmark #213

Merged

DDQN Boltzmann cartpole benchmark #214

Merged

This was referenced Oct 25, 2018

DQN Boltzmann mountain car benchmark #219

Merged

DDQN Boltzmann mountain car benchmark #220

Merged

kengz mentioned this pull request Nov 11, 2018

REINFORCE LunarLander benchmark #232

Merged

6 tasks

This was referenced Dec 10, 2018

DQN Lunar Benchmark #250

Merged

DDQN Lunar Benchmark #251

Merged

sgillen mentioned this pull request Jan 30, 2019

update README to fix debugging link #277

Merged

allan-avatar1 mentioned this pull request Feb 15, 2019

added sudo #286

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A2C cartpole benchmark #180

A2C cartpole benchmark #180

lgraesser commented Sep 23, 2018 •

edited by kengz

A2C cartpole benchmark #180

A2C cartpole benchmark #180

Conversation

lgraesser commented Sep 23, 2018 • edited by kengz

Experiment Result

Abstract

Methods

To Reproduce

Results

Discussion (optional)

lgraesser commented Sep 23, 2018 •

edited by kengz