Please provide citations (e.g., arxiv link, blog post, github repo, etc). Any info on the hyperparameter search process taken in the original work (if available) is a bonus. Please use scientific "e" notation (10e5 instead of 1000000).
Example contribution:
hyperparam name | default value |
---|---|
lr | 25e-5 |
RMSprop momentum | 0.95 |
RMSprop epsilon | 0.01 |
discount factor | 0.99 |
epsilon(-greedy) | 1 annealed to 0.1 over 1 million frames |
minibatch size | 32 |
replay memory size | 10e5 |
weight init | Xavier (Torch default) |
hyperparam name | default value |
---|---|
ADAM lr | 2e-4 |
ADAM momentum beta1 | 0.5 |
minibatch size | 64 or 128 |
image scaling | [-1, 1] |
LeakyReLU slope | 0.2 |
Real labels (label smoothing) | 1 -> [0.7, 1.2] |
Fake labels (label smoothing | 0 -> [0.0, 0.3] |
Weight init | N(0, 0.02) |
Z distribution | n-dim uniform or gaussian (e.g., uniform (-0.2, 0.2) from this implementation |
For Z, sampling from a uniform distribution is simpler, but see the discussion here about interpolation in the latent space; current recommendation is to use a spherical Z and interpolate via a great circle
In the paper, the actor and critic learning rates are reversed. However, to help stabilize the actor network during training, you generally want to encourage the critic network to converge faster; hence the larger initial lr for the critic is suggested here.
hyperparam name | default value |
---|---|
actor lr | 10e-4 |
critic lr | 10e-3 |
critic L2 weight decay | 10e-2 |
discount factor | 0.99 |
target network update tau | 10e-4 |
Ornstein-Uhlenbeck theta | 0.15 |
Ornstein-Uhlenbeck sigma | 0.3 |
minibatch size | 64 on low-dim input, 16 on pixel-input |
replay memory size | 10e5 |
weight init | final layer of actor & critic are uniform(-3 * 10-3, 3 * 10-3) for low-dim input and uniform(-3 * 10-4, 3 * 10-4) for pixel-input; other layers -> Xavier |
- paper
- Correspondence by @muupan with author on hyperparams
- See for more training details
hyperparam name | default value |
---|---|
discount factor | 0.99 |
shared RMSprop eta | 7e-4 |
shared RMSprop alpha | 0.99 |
shared RMSprop epsilon | 0.1 |
A3C entropy regularization beta | 0.01 |
V-network gradients multiplied by | 0.5 |
Weight init | Xavier (Torch default) |
Reward clipping | [-1, 1] on Atari |
# of threads w/ best performance | 16 |