# Intro Deep-RL

[Build Status badge]

## What is this?

The primary use of this repo is to store my implementations of deep reinforcement-learning algorithms and to benchmark them against the implementations in OpenAI's spinup repository. This is my first attempt at deep reinforcement learning, hence the name.

## How can I use this?

Running this

$ deeprl -hid "(64,32)" -n 10 -env 'HalfCheetah-v2' -a ppo --epochs 750 benchmark

and waiting produces this

[Benchmark plot: episode return per epoch (top) and the change in episode return over training for each implementation and seed (bottom), with the t-test p-value annotated]

With the above line, we have run my PPO implementation against OpenAI's for 10 different random seeds. The top plot shows the history of the episode return. The bottom shows the change in the episode return over training for each implementation and each seed. To get a rough idea of whether the change in return is meaningfully different between the two implementations, I compare the distributions in the bottom plot with Student's t-test to calculate the p-value shown.
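To give a concrete picture of that comparison, here is a minimal sketch using SciPy's independent two-sample t-test; the arrays are made-up placeholder numbers, not results from this repo.

```python
# Illustrative only: comparing the per-seed changes in episode return of the
# two implementations with an independent two-sample t-test.
import numpy as np
from scipy import stats

# Placeholder per-seed changes in episode return (final - initial), 10 seeds each
delta_tom = np.array([310.0, 295.0, 330.0, 280.0, 305.0, 340.0, 290.0, 315.0, 300.0, 325.0])
delta_spinup = np.array([300.0, 285.0, 320.0, 275.0, 310.0, 335.0, 295.0, 305.0, 290.0, 315.0])

t_stat, p_value = stats.ttest_ind(delta_tom, delta_spinup)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")  # a large p-value means no evidence of a meaningful difference
```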

I used click to make this nifty command-line tool, which automatically gives you --help flags

$ deeprl --help
Usage: deeprl [OPTIONS] COMMAND [ARGS]...

  Main entry point

Options:
  -exp, --exp_name TEXT           Prefix added to experiment name
  -n, --num_runs INTEGER          Number of different random seeds to run
                                  [default: 3]
  --epochs INTEGER                Number of epochs  [default: 50]
  -steps, --steps_per_epoch INTEGER
                                  Number of steps per epoch  [default: 4000]
  -env, --env_name TEXT           Environment name  [default: Swimmer-v2]
  -hid, --hidden_sizes TEXT       Hidden sizes for actor and critic MLPs
                                  [default: (64,64)]
  --activation TEXT               Activation to use in actor-critic MLPs
                                  [default: tanh]
  -a, --algo [vpg|trpo|ppo|ddpg|td3|sac]
                                  Algorithm (ie agent) to use  [default: vpg]
  --help                          Show this message and exit.

Commands:
  benchmark  Benchmark tom's implementation against spinup and plot
  plot       plot Logging Results
  run        Run experiment and plot Episode Return
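
For anyone curious how that is put together, here is a rough sketch of how a click group with shared options like the ones above can be wired up; it is illustrative rather than the repo's actual source, and only a couple of the options are shown.

```python
# Illustrative sketch of a click group with shared options (not the repo's actual code)
import click

@click.group()
@click.option('-n', '--num_runs', default=3, show_default=True,
              help='Number of different random seeds to run')
@click.option('-env', '--env_name', default='Swimmer-v2', show_default=True,
              help='Environment name')
@click.pass_context
def cli(ctx, num_runs, env_name):
    """Main entry point"""
    ctx.obj = {'num_runs': num_runs, 'env_name': env_name}


@cli.command()
@click.pass_context
def benchmark(ctx):
    """Benchmark tom's implementation against spinup and plot"""
    click.echo(f"benchmarking {ctx.obj['env_name']} over {ctx.obj['num_runs']} seeds")


if __name__ == '__main__':
    cli()
```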

If we then want to know how to use "plot", we can get help on that as well

$ deeprl plot --help
Usage: deeprl plot [OPTIONS]

  plot Logging Results

Options:
  -imp, --implementation [tom|spinup]
                                  Which implementation to run, spinup's or
                                  Tom's
  -v, --value TEXT                Value to plot  [default: AverageEpRet]
  --help                          Show this message and exit.
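
Combining the global options with the subcommand's own options, a plot invocation might then look something like this (the flag values are just examples):

```
$ deeprl -env 'HalfCheetah-v2' plot -imp tom -v AverageEpRet
```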

## How do I install this?

git clone git@github.com:henighan/deeprl-intro.git
cd deeprl-intro
make install
deeprl --help

## How do I test this?

make test

If you want to use the MuJoCo environments, you will also need to follow the instructions here.

## Other notes

Thus far I have only implemented VPG and PPO for Gaussian and categorical policies (I'm still hunting down a bug in the DDPG agent). I stole spinup's logger so that I could easily plot and compare their results and mine side by side.
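
As a rough picture of what that looks like, spinup's EpochLogger is typically used along these lines (illustrative; the exact usage in this repo may differ). Logging per-episode returns under the key EpRet and dumping them once per epoch is what produces columns like AverageEpRet, which the plot command reads by default.

```python
# Illustrative sketch of spinup's EpochLogger usage (not this repo's actual code)
from spinup.utils.logx import EpochLogger

logger = EpochLogger(output_dir='data/example', exp_name='example')

for epoch in range(5):
    for episode in range(10):
        ep_ret = 0.0  # placeholder for the return of one finished episode
        logger.store(EpRet=ep_ret)
    logger.log_tabular('Epoch', epoch)
    # with_min_and_max writes AverageEpRet, StdEpRet, MaxEpRet and MinEpRet columns
    logger.log_tabular('EpRet', with_min_and_max=True)
    logger.dump_tabular()
```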
