SLM Lab

Modular Deep Reinforcement Learning framework in PyTorch.

[Demo GIFs: PPO on BeamRider, Breakout, KungFuMaster, MsPacman, Pong, Qbert, Seaquest, SpaceInvaders; SAC on Ant, HalfCheetah, Hopper, Humanoid, InvertedDoublePendulum, InvertedPendulum, Reacher, Walker]
References
  • Installation: How to install SLM Lab
  • Documentation: Usage documentation
  • Benchmark: Benchmark results
  • Gitter: SLM Lab user chatroom

Features

Algorithms

SLM Lab implements a number of canonical RL algorithms with reusable modular components and class inheritance, with a commitment to code quality and performance.

The benchmark results also include complete spec files to enable full reproducibility using SLM Lab.

The table below shows the latest benchmark status. See the full benchmark results here.

Algorithm \ Benchmark                           Atari  Roboschool
SARSA                                           -      -
DQN (Deep Q-Network)                            ✓      -
Double-DQN, Dueling-DQN, PER                    ✓      -
REINFORCE                                       -      -
A2C with GAE & n-step (Advantage Actor-Critic)  ✓      ✓
PPO (Proximal Policy Optimization)              ✓      ✓
SAC (Soft Actor-Critic)                                ✓
SIL (Self Imitation Learning)

Due to their standardized design, all the algorithms can be parallelized asynchronously using Hogwild. Hence, SLM Lab also includes A3C, distributed DQN, and distributed PPO.
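
For context, Hogwild-style training runs several worker processes that update a single set of network parameters held in shared memory, without locking. The sketch below shows the generic pattern in plain PyTorch; it is illustrative only and not SLM Lab's actual implementation (the toy model, data, and loop sizes are made up for the example).

    # Generic Hogwild sketch in plain PyTorch (illustrative only, not SLM Lab code)
    import torch
    import torch.multiprocessing as mp
    import torch.nn as nn

    def worker(shared_model):
        # each worker builds its own optimizer, but the parameters are shared
        optimizer = torch.optim.SGD(shared_model.parameters(), lr=0.01)
        for _ in range(100):
            x, target = torch.randn(8, 4), torch.randn(8, 2)  # toy data
            loss = nn.functional.mse_loss(shared_model(x), target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()  # lock-free update of the shared parameters

    if __name__ == '__main__':
        model = nn.Linear(4, 2)
        model.share_memory()  # place the parameters in shared memory
        workers = [mp.Process(target=worker, args=(model,)) for _ in range(4)]
        for p in workers:
            p.start()
        for p in workers:
            p.join()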

Atari benchmark

See the full benchmark results here.

Click on an algorithm to see the Pull Request in which its results were uploaded.

Env. \ Alg.  DQN           DDQN+PER      A2C (GAE)     A2C (n-step)  PPO
Breakout     425.89        65.04         181.72        389.99        391.32
Pong         20.09         18.34         20.44         20.04         19.66
Qbert        4,787.79      11,673.52     13,328.32     13,259.19     13,691.89
Seaquest     1,118.50      3,751.34      892.68        1,686.08      1,583.04

Roboschool Benchmark

[Demo GIFs: SAC on Ant, HalfCheetah, Hopper, Humanoid, InvertedDoublePendulum, InvertedPendulum, Reacher, Walker]

Roboschool by OpenAI offers free, open source robotics simulations with improved physics. Although it mirrors the environments from MuJoCo, its environments' rewards are different.

See the full benchmark results here.

Click on an algorithm to see the Pull Request in which its results were uploaded.

Env. \ Alg.                       A2C (GAE)     A2C (n-step)  PPO           SAC
RoboschoolAnt                     1029.51       1148.76       1931.35       2914.75
RoboschoolAtlasForwardWalk        68.15         73.46         148.81        942.39
RoboschoolHalfCheetah             895.24        409.59        1838.69       2496.54
RoboschoolHopper                  286.67        -187.91       2079.22       2251.36
RoboschoolInvertedDoublePendulum  1769.74       486.76        7967.03       8085.04
RoboschoolInvertedPendulum        1000.0        997.54        930.29        941.45
RoboschoolReacher                 14.57         -6.18         19.18         19.99
RoboschoolWalker2d                413.26        141.83        1368.25       1894.05

Humanoid environments are significantly harder. Note that due to the number of frames required, we could only run Async-SAC instead of SAC.

Env. \ Alg.                      A2C (GAE)     A2C (n-step)  PPO           Async-SAC
RoboschoolHumanoid               122.23        -6029.02      1554.03       2621.46
RoboschoolHumanoidFlagrun        93.48         -2079.02      1635.64       1937.77
RoboschoolHumanoidFlagrunHarder  -472.34       -24620.71     610.09        280.18

Environments

SLM Lab integrates with multiple environment offerings, including:

  • OpenAI Gym (e.g. the Atari environments above)
  • Roboschool
  • Unity environments (via SLM Env)

Contributions are welcome to integrate more environments!

Metrics and Experimentation

To facilitate better RL development, SLM Lab also comes with a prebuilt metrics and experimentation framework:

  • every run generates metrics, graphs, and data for analysis, as well as its spec for reproducibility
  • scalable hyperparameter search using Ray Tune

Installation

  1. Clone the SLM Lab repo:

    git clone https://github.com/kengz/SLM-Lab.git
  2. Install dependencies (this uses Conda to manage them):

    cd SLM-Lab/
    ./bin/setup

Alternatively, instead of running ./bin/setup, copy-paste the commands from bin/setup_macOS or bin/setup_ubuntu into your terminal, adding sudo where needed, to run the installation manually.

Useful reference: Debugging

Hardware Requirements

Non-image-based environments can run on a laptop. Only image-based environments, such as the Atari games, benefit from a GPU speedup. For these, we recommend 1 GPU and at least 4 CPUs. This can run a single Atari Trial consisting of 4 Sessions.

For desktop, a reference spec is a GTX 1080 GPU, 4 CPUs above 3.0 GHz, and 32 GB RAM.

For cloud computing, start with an affordable instance of AWS EC2 p2.xlarge with a K80 GPU and 4 CPUs. Use the Deep Learning AMI with Conda when creating an instance.

Quick Start

DQN CartPole

Everything in the lab is run using a spec file, which contains all the information for the run to be reproducible. These are located in slm_lab/spec/.
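
To give a rough idea of the shape of a spec, the sketch below mirrors it as a commented Python dict (the actual file is JSON). The field names and values here are illustrative assumptions, not the authoritative schema; consult slm_lab/spec/demo.json for the real fields.

    # Illustrative sketch only -- the real spec is a JSON file such as slm_lab/spec/demo.json
    spec = {
        'dqn_cartpole': {  # the spec name passed on the command line
            'agent': [{
                'name': 'DQN',
                'algorithm': {'name': 'DQN', 'gamma': 0.99},      # algorithm and its hyperparameters (assumed values)
                'memory': {'name': 'Replay', 'max_size': 10000},  # assumed replay settings
                'net': {'type': 'MLPNet', 'hid_layers': [64]},    # assumed network settings
            }],
            'env': [{'name': 'CartPole-v0', 'max_frame': 10000}],  # environment to run (assumed)
            'meta': {'max_session': 4, 'max_trial': 1},            # how many Sessions and Trials to run (assumed)
        },
    }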

Run a quick demo of DQN and CartPole:

conda activate lab
python run_lab.py slm_lab/spec/demo.json dqn_cartpole dev

This will launch a Trial in development mode, which enables verbose logging and environment rendering. An example screenshot is shown below.

Next, run it in training mode. The total_reward should converge to 200 within a few minutes.

python run_lab.py slm_lab/spec/demo.json dqn_cartpole train

Tip: all lab commands should be run from within the Conda environment. Run conda activate lab once at the beginning of a new terminal session.

This will run a new Trial in training mode. At the end of it, all the metrics and graphs will be output to the data/ folder.

A2C Atari

Run A2C to solve Atari Pong:

conda activate lab
python run_lab.py slm_lab/spec/benchmark/a2c/a2c_gae_pong.json a2c_gae_pong train

When running on a headless server, prepend a command with xvfb-run -a, for example xvfb-run -a python run_lab.py slm_lab/spec/benchmark/a2c/a2c_gae_pong.json a2c_gae_pong train

Atari Pong run in dev mode to render the environment

This will run a Trial with multiple Sessions in training mode. In the beginning, the total_reward should be around -21. After about 1 million frames, it should begin to converge to around +21 (perfect score). At the end of it, all the metrics and graphs will be output to the data/ folder.

Below is a trial graph with multiple sessions:

Enjoy mode

Once a Trial completes with a good model saved into the data/ folder, for example data/a2c_gae_pong_2019_08_01_010727, use the enjoy mode to show the trained agent playing the environment. Use the enjoy@{prename} mode to pick a saved trial-session, for example:

python run_lab.py data/a2c_gae_pong_2019_08_01_010727/a2c_gae_pong_spec.json a2c_gae_pong enjoy@a2c_gae_pong_t0_s0

Benchmark

To run a full benchmark, simply pick a file and run it in train mode. For example, for A2C Atari benchmark, the spec file is slm_lab/spec/benchmark/a2c/a2c_atari.json. This file is parametrized to run on a set of environments. Run the benchmark:

python run_lab.py slm_lab/spec/benchmark/a2c/a2c_atari.json a2c_atari train

This will spawn multiple processes to run each environment in its separate Trial, and the data is saved to data/ as usual. See the uploaded benchmark results here.

Experimentation / Hyperparameter search

An Experiment is a hyperparameter search, which samples multiple specs from a search space. An Experiment spawns a Trial for each sampled spec, and each Trial runs multiple repeated Sessions to average its results.

Given a spec file in slm_lab/spec/, if it has a search field defining a search space, then it can be run as an Experiment. For example:

python run_lab.py slm_lab/spec/experimental/ppo/ppo_lam_search.json ppo_breakout search
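
The search space itself is declared under the spec file's search field. As a rough illustration, the sketch below mirrors such a section as a commented Python dict (the actual file is JSON); the key naming and values shown are assumptions for illustration rather than the exact syntax, so refer to slm_lab/spec/experimental/ppo/ppo_lam_search.json for the real definition.

    # Illustrative sketch only -- see slm_lab/spec/experimental/ppo/ppo_lam_search.json for the real syntax
    spec_search_section = {
        'ppo_breakout': {
            # ... the usual agent, env, and meta sections ...
            'search': {
                'agent': [{
                    'algorithm': {
                        'lam__grid_search': [0.70, 0.80, 0.90, 0.95],  # assumed suffix-based search syntax and values
                    },
                }],
            },
        },
    }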

Deep Reinforcement Learning is highly empirical. The lab enables rapid, large-scale experimentation, so it needs a way to quickly analyze data from many trials. The experiment and analytics framework is the scientific method of the lab.

Experiment graph

Segments of the experiment graph summarizing the trials in hyperparameter search.

Multi-trial graph with moving average

The multi-trial experiment graph and its moving average version comparing the trials. These graphs show the effect of different GAE λ values for PPO on the Breakout environment. λ = 0.70 performs best, while λ values closer to 0.90 do not perform as well.

Trial graph with moving average

A trial graph showing the average over repeated sessions, and its moving average version.

Session graph with moving average

A session graph showing the total rewards and its moving average version.

This is the end of the quick start tutorial. Continue reading the full documentation to start using SLM Lab.

Read on: Github | Documentation

Design Principles

SLM Lab is created for deep reinforcement learning research and applications. The design was guided by four principles:

  • modularity
  • simplicity
  • analytical clarity
  • reproducibility

Modularity

  • makes research easier and more accessible: reuse well-tested components and only focus on the relevant work
  • makes learning deep RL easier: the algorithms are complex; SLM Lab breaks them down into more manageable, digestible components
  • components get reused maximally, which means less code, more tests, and fewer bugs

Simplicity

  • the components are designed to closely correspond to the way papers or books discuss RL
  • modular libraries are not necessarily simple. Simplicity balances modularity to prevent overly complex abstractions that are difficult to understand and use

Analytical clarity

  • hyperparameter search results are automatically analyzed and presented hierarchically in increasingly granular detail
  • it should take less than 1 minute to understand if an experiment yielded a successful result using the experiment graph
  • it should take less than 5 minutes to find and review the top 3 parameter settings using the trial and session graphs

Reproducibility

  • only the spec file and a git SHA are needed to fully reproduce an experiment
  • all the results are recorded in BENCHMARK.md
  • experiment reproduction instructions are submitted to the Lab via result Pull Requests
  • the full experiment data contributed are public on Dropbox

Citing

If you use SLM Lab in your research, please cite it as follows:

@misc{kenggraesser2017slmlab,
    author = {Wah Loon Keng and Laura Graesser},
    title = {SLM Lab},
    year = {2017},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/kengz/SLM-Lab}},
}

Contributing

SLM Lab is an MIT-licensed open source project. Contributions are very much welcome, whether it's a quick bug fix or a new feature. Please see CONTRIBUTING.md for more info.

If you have an idea for a new algorithm, environment support, analytics, benchmarking, or new experiment design, let us know.

If you're interested in using the lab for research, teaching or applications, please contact the authors.
