Multi-Step Bootstrapping with ReLAx

Example N-step TD3 implementation with ReLAx

Performance relative to vanilla 1-step TD is measured by averaging learning curves, recorded on a separate evaluation environment, over 4 experiments with random environment seeds.
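As a rough illustration of the averaging step, the sketch below collapses per-seed learning curves into a mean curve and a standard deviation band. The function name `average_curves` and the array layout (one row per seed, one column per evaluation point) are assumptions made for this example. They are not taken from the notebook or from ReLAx, and the numbers in the usage lines are toy values, not experimental results.

```python
import numpy as np

def average_curves(curves):
    """Average learning curves over seeds.

    curves: array-like of shape (n_seeds, n_eval_points), where each row
    holds the evaluation returns of one run (hypothetical layout).
    Returns the per-point mean and standard deviation across seeds.
    """
    curves = np.asarray(curves, dtype=np.float64)
    mean = curves.mean(axis=0)  # average over the seed axis
    std = curves.std(axis=0)    # spread across seeds at each evaluation point
    return mean, std

# Toy values for illustration only (2 seeds, 2 evaluation points).
mean, std = average_curves([[0.0, 2.0], [2.0, 4.0]])
```

Plotting `mean` with a shaded band of `mean ± std` is one common way to compare two algorithms on the same axes.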

The results are summarized in the following plot (n_step_vs_1_step.png):

The only difference in hyperparameter settings between N-step TD3 and vanilla TD3 is the use of multi-step bootstrapping. The averaged curves show a substantial advantage for the 3-step version in both training speed and asymptotic performance. This suggests that N-step TD is often the cheapest way to improve the performance of an RL actor. Note that the gain from N-step TD varies from task to task. For example, early experiments show that on MuJoCo's Ant-v2 environment the 3-step Bellman update performs worse than the 1-step version.
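To make the difference between 1-step and N-step bootstrapping concrete, here is a minimal NumPy sketch of the N-step target for a single trajectory. It is not the ReLAx implementation. The function name `n_step_targets` and its arguments are hypothetical. It assumes `next_values[t]` already holds the target critic estimate at the state reached after step `t` (in TD3 this would typically be the minimum of the two target critics at the smoothed target-policy action), and that `dones[t]` marks a true terminal transition.

```python
import numpy as np

def n_step_targets(rewards, dones, next_values, gamma=0.99, n=3):
    """Compute N-step TD targets for one trajectory (illustrative sketch).

    rewards[t]     : reward received at step t
    dones[t]       : True if the transition at step t ended the episode
    next_values[t] : target critic estimate at the state reached after step t
                     (assumed to be precomputed by the caller)
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=bool)
    next_values = np.asarray(next_values, dtype=np.float64)
    T = len(rewards)
    targets = np.zeros(T)
    for t in range(T):
        ret, discount, last = 0.0, 1.0, t
        # Accumulate up to n discounted rewards, stopping early at a terminal
        # step or at the end of the stored trajectory.
        for k in range(t, min(t + n, T)):
            ret += discount * rewards[k]
            discount *= gamma
            last = k
            if dones[k]:
                break
        # Bootstrap from the critic only if the last step was not terminal.
        if not dones[last]:
            ret += discount * next_values[last]
        targets[t] = ret
    return targets

# Toy trajectory: the episode terminates at the last step.
targets_3 = n_step_targets([1, 1, 1, 1], [False, False, False, True],
                           [10, 10, 10, 10], gamma=0.5, n=3)
targets_1 = n_step_targets([1, 1, 1, 1], [False, False, False, True],
                           [10, 10, 10, 10], gamma=0.5, n=1)
```

With `n=1` this reduces to the usual 1-step target `r + gamma * Q'(s', a')`. Larger `n` relies more on observed rewards and less on the critic estimate, which may explain why the benefit differs between environments.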

Resulting Policy

3_step_td3.mp4

