neorl.rl.baselines.acer
Sample Efficient Actor-Critic with Experience Replay (ACER) combines the parallel-agent concept of A2C with a DQN-style replay memory. ACER also introduces truncated importance sampling with bias correction, stochastic dueling network architectures, and a new trust region policy optimization method.
Original paper: https://arxiv.org/abs/1611.01224
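For intuition, truncated importance sampling clips the off-policy ratio ρ = π(a|s)/μ(a|s) at a constant c, and a separate term corrects the bias introduced by the clipping. Below is a minimal NumPy sketch of the two weight factors from the paper; the function name is hypothetical, and c = 10 is the truncation threshold reported in the paper's experiments:

```python
import numpy as np

def acer_weights(pi_probs, mu_probs, c=10.0):
    """Truncated importance weights and bias-correction coefficients.

    rho_bar = min(c, rho) multiplies the main off-policy gradient term,
    while [1 - c/rho]_+ weights the bias-correction term evaluated under
    the current policy. Truncation bounds the variance of the update.
    """
    rho = pi_probs / mu_probs                     # importance ratio pi/mu
    rho_bar = np.minimum(c, rho)                  # truncated weight
    correction = np.maximum(0.0, 1.0 - c / rho)   # bias-correction coefficient
    return rho_bar, correction
```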
- Multi processing: ✔️
- Discrete spaces: ✔️
- Continuous spaces: ❌
- Mixed Discrete/Continuous spaces: ❌
ACER
neorl.rl.make_env.CreateEnvironment
neorl.utils.neorlcalls.RLLogger
Train an ACER agent to optimize the 5-D discrete sphere function
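A minimal sketch of such a training script, assuming NEORL's top-level exports (`ACER`, `MlpPolicy`, `RLLogger`, `CreateEnvironment`) and the `CreateEnvironment` keywords shown below; the hyperparameter values are illustrative, not tuned:

```python
from neorl import ACER, MlpPolicy, RLLogger, CreateEnvironment

# Fitness function: 5-D sphere, f(x) = sum(x_i^2), global minimum 0 at x = 0
def sphere(individual):
    return sum(x**2 for x in individual)

# Discrete parameter space: 5 integer variables in [-100, 100]
nx = 5
bounds = {}
for i in range(1, nx + 1):
    bounds['x' + str(i)] = ['int', -100, 100]

# Wrap the fitness function as an RL environment for ACER
# (the mode and episode_length keywords are assumed here)
env = CreateEnvironment(method='acer', fit=sphere, bounds=bounds,
                        mode='min', episode_length=50)

# Callback that logs progress every check_freq steps
cb = RLLogger(check_freq=1)

# Illustrative hyperparameter values; tune n_steps, q_coef, ent_coef (see Notes)
acer = ACER(MlpPolicy, env=env, n_steps=25, q_coef=0.55, ent_coef=0.02, seed=1)

# The fitness function is accessed total_timesteps times (see Notes below)
acer.learn(total_timesteps=2000, callback=cb)
```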
- ACER can be viewed as the parallel version of DQN with additional enhancements. ACER is also restricted to discrete spaces.
- ACER shows sensitivity to `n_steps`, `q_coef`, and `ent_coef`. It is always good to tune these hyperparameters before using ACER for optimization. In particular, `n_steps` is considered the most important parameter to tune; a quick sweep is sketched after this list.
- The cost of ACER equals the `total_timesteps` passed to the `learn` function, where the original fitness function will be accessed `total_timesteps` times.
- See how ACER is used to solve two common combinatorial problems in TSP <ex1> and KP <ex10>.
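Since `n_steps` dominates performance in practice, a quick sweep can be run before committing the full timestep budget. A sketch reusing `env` and the imports from the sphere example above; the candidate values are arbitrary, and the `rbest` attribute name on `RLLogger` is an assumption:

```python
results = {}
for n_steps in (10, 25, 50):
    cb = RLLogger(check_freq=1)
    agent = ACER(MlpPolicy, env=env, n_steps=n_steps)
    agent.learn(total_timesteps=1000, callback=cb)
    # `rbest` (best fitness found) is an assumed RLLogger attribute name;
    # inspect the logger output if it differs in your NEORL version.
    results[n_steps] = cb.rbest

# Keep the n_steps with the lowest (best) sphere value for the full run
print(min(results.items(), key=lambda kv: kv[1]))
```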
Thanks to our fellows at Stable Baselines: we use their standalone RL implementation as a baseline on which we build advanced neuroevolution algorithms.
Hill, Ashley, et al. "Stable baselines." (2018).