Spin Class

Spinning up as a student of reinforcement learning.

This is based on the advice and algorithms explained by OpenAI at Spinning Up. The code is my own except where otherwise indicated in comments.

This code uses Weights and Biases to track experiments.

Install

Prerequisites

Ubuntu 18.04 (presumably works on 20.04 and 22.04)

Install general prerequisites

sudo apt-get install -y \
    git \
    python3.8 \
    python3.8-dev \
    python3.8-distutils \
    python3.8-venv \
    python3-pip \
    swig \
    swig3.0

Install Mujoco

Mujoco is required for the InvertedPendulum-v2 and HalfCheetah-v2 environments.

Install Mujoco prerequisites

sudo apt-get install -y \
    curl \
    libgl1-mesa-dev \
    libgl1-mesa-glx \
    libglew-dev \
    libglfw3-dev \
    libosmesa6-dev \
    net-tools \
    software-properties-common \
    virtualenv \
    wget \
    xpra

sudo wget -O /usr/local/bin/patchelf https://s3-us-west-2.amazonaws.com/openai-sci-artifacts/manual-builds/patchelf_0.9_amd64.elf \
  && sudo chmod a+x /usr/local/bin/patchelf

Install Mujoco library

mkdir -p ~/.mujoco

wget -O ~/.mujoco/mjkey.txt https://roboti.us/file/mjkey.txt
wget -O ~/.mujoco/mjpro150_linux.zip https://roboti.us/download/mjpro150_linux.zip
cd ~/.mujoco && unzip ./mjpro150_linux.zip

# libglewosmesa.so from Mujoco 1.5 is incompatible with Python >= 3.7,
# so get a new version of the library from Mujoco 2.1.0
wget -O ~/.mujoco/mujoco210-linux-x86_64.tar.gz https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz
cd ~/.mujoco && tar -xvzf ./mujoco210-linux-x86_64.tar.gz
cd ~/.mujoco/mjpro150/bin \
  && mv libglewosmesa.so libglewosmesa.old.so \
  && cp ~/.mujoco/mujoco210/bin/libglewosmesa.so . \
  && chmod 775 libglewosmesa.so

Set the library path for the current shell, and append the same line to ~/.bashrc so that new shells load it automatically.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mjpro150/bin:/usr/lib/nvidia
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mjpro150/bin:/usr/lib/nvidia' >> ~/.bashrc
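To confirm that the setting took effect in a new shell, a small check along these lines may help. This is a minimal sketch and not part of the repository: the helper name `missing_library_dirs` is hypothetical, and it only inspects the text of the variable rather than testing that Mujoco actually loads.

```python
import os


def missing_library_dirs(ld_library_path, required_dirs):
    """Return the required directories that are absent from an LD_LIBRARY_PATH string."""
    # Normalize entries so that trailing slashes do not cause false negatives.
    present = {os.path.normpath(p) for p in ld_library_path.split(":") if p}
    return [d for d in required_dirs if os.path.normpath(d) not in present]


# The two directories added by the export line above.
required = [
    os.path.expanduser("~/.mujoco/mjpro150/bin"),
    "/usr/lib/nvidia",
]
missing = missing_library_dirs(os.environ.get("LD_LIBRARY_PATH", ""), required)
if missing:
    print("Missing from LD_LIBRARY_PATH:", ", ".join(missing))
else:
    print("LD_LIBRARY_PATH contains the Mujoco and NVIDIA directories.")
```

If a directory is reported missing, opening a new terminal or running `source ~/.bashrc` is usually the first thing to try.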

Set up virtual environment

From the directory where you want this repository to go, run:

git clone https://github.com/expz/spin-class.git
cd spin-class
virtualenv --python=/usr/bin/python3.8 venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt

(Note that moving the directory after running these commands will break the virtual environment, because it is tied to the exact full path.)

Log into Weights and Biases

This requires a free account with Weights and Biases.

wandb login

Algorithms

Vanilla Policy Gradient

This is implemented for CartPole-v0, InvertedPendulum-v2, FrozenLake-v1 and HalfCheetah-v2.

This will create projects vpg-cartpole, vpg-invertedpendulum, vpg-frozenlake and vpg-halfcheetah in the Weights and Biases UI.

Run reproducibility test

This runs the same algorithm with the same settings but with different random seeds to test how consistent its performance is. From the root directory of this repository, run

source venv/bin/activate
python spin_class/reproducibility.py --algo vpg --env cartpole

for CartPole-v0 or

source venv/bin/activate
python spin_class/reproducibility.py --algo vpg --env invertedpendulum

for InvertedPendulum-v2 or

source venv/bin/activate
python spin_class/reproducibility.py --algo vpg --env frozenlake

for FrozenLake-v1 (slippery). Use --env frozenlake-nonslippery for the non-slippery version. Use

source venv/bin/activate
python spin_class/reproducibility.py --algo vpg --env halfcheetah

for HalfCheetah-v2. VPG does not completely solve the HalfCheetah environment, but it should be able to achieve consistent forward motion.

If you would like to just run for one seed, add the --num-seeds 1 flag.

To use the GPU, add the --device cuda:0 or --device cuda_random flag.
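The idea behind the reproducibility test (same settings, different seeds, then compare the outcomes) can be illustrated with a toy stand-in. This is only a sketch under stated assumptions: `noisy_training_run` is a hypothetical placeholder for a real training run, the score values are made up for illustration, and none of this is the repository's code.

```python
import random
import statistics


def noisy_training_run(seed):
    """Hypothetical stand-in for one training run; returns a seed-dependent final score."""
    rng = random.Random(seed)  # private generator, so the run is fully determined by its seed
    return 200.0 + rng.gauss(0.0, 10.0)


def summarize_across_seeds(run, seeds):
    """Call `run` once per seed and report the mean and sample standard deviation."""
    scores = [run(seed) for seed in seeds]
    # A standard deviation needs at least two samples; report 0.0 for a single seed.
    spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return {"scores": scores, "mean": statistics.mean(scores), "stdev": spread}


summary = summarize_across_seeds(noisy_training_run, seeds=range(5))
print(f"mean={summary['mean']:.1f} stdev={summary['stdev']:.1f}")
```

A small standard deviation relative to the mean suggests the settings are robust to the choice of seed; a large one suggests that a single lucky run should not be trusted.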

Run a hyperparameter search ("sweep")

From the Weights and Biases UI, create a new sweep in the desired project. Copy and paste the contents of spin_class/vpg_cartpole_sweep.yaml or spin_class/vpg_invpen_sweep.yaml (for the CartPole or inverted pendulum environment, respectively) into the settings box and create the sweep.

Then for each search process you would like to start, from the root directory of this repository, run

source venv/bin/activate

and then copy, paste and run the sweep command from the Weights and Biases UI.

To use the GPU, add

  - "--device"
  - "cuda:random"

before the "${args}" entry in the command section of the yaml.
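The placement matters because the sweep agent expands "${args}" into the hyperparameter flags, so the device flag has to sit ahead of it in the list. The same edit can be sketched on a plain Python list. This is an illustration only: `add_device_flag` is a hypothetical helper, and the example command section is assumed from the Weights and Biases default macros rather than copied from this repository's YAML files.

```python
def add_device_flag(command, device):
    """Return a copy of a sweep command list with --device inserted before "${args}"."""
    if "${args}" not in command:
        raise ValueError('command has no "${args}" entry')
    position = command.index("${args}")
    # Build a new list so the caller's original command is left untouched.
    return command[:position] + ["--device", device] + command[position:]


# Assumed command section; check your own sweep YAML for the real entries.
command = ["${env}", "${interpreter}", "${program}", "${args}"]
for entry in add_device_flag(command, "cuda:random"):
    print(f'  - "{entry}"')
```

The printed lines have the same shape as the entries in the command section, which makes it easy to compare them against the YAML in the settings box.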

Double Deep Q-Learning (DDQN)

This is implemented for CartPole-v0 and FrozenLake-v1 (slippery and non-slippery). To use it, use the --algo ddqn flag.

Distributional Q-Learning (C51)

This is implemented for CartPole-v0 and FrozenLake-v1 (slippery and non-slippery). To use it, use the --algo c51 flag.

Deep Deterministic Policy Gradient (DDPG)

This is implemented for InvertedPendulum-v2 and HalfCheetah-v2. To use it, use the --algo ddpg flag.

Twin Delayed Deep Deterministic Policy Gradient (TD3)

This is implemented for InvertedPendulum-v2 and HalfCheetah-v2. To use it, use the --algo td3 flag.

Proximal Policy Optimization (PPO)

This is implemented for CartPole-v0. To use it, use the --algo ppo flag.
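The algorithm and environment combinations listed above can be collected into one table. The sketch below is not part of the repository: `SUPPORTED` and `build_command` are hypothetical names, and it assumes that the other algorithms are launched through the same spin_class/reproducibility.py script and accept the same --env names documented for VPG, which the text above does not state explicitly.

```python
# Algorithm -> environment names, transcribed from the sections above.
# The --env spellings for algorithms other than vpg are an assumption.
DISCRETE = ["cartpole", "frozenlake", "frozenlake-nonslippery"]
CONTINUOUS = ["invertedpendulum", "halfcheetah"]
SUPPORTED = {
    "vpg": DISCRETE + CONTINUOUS,
    "ddqn": DISCRETE,
    "c51": DISCRETE,
    "ddpg": CONTINUOUS,
    "td3": CONTINUOUS,
    "ppo": ["cartpole"],
}


def build_command(algo, env, num_seeds=None, device=None):
    """Build the argument list for a run, rejecting undocumented combinations."""
    if algo not in SUPPORTED:
        raise ValueError(f"unknown algorithm: {algo}")
    if env not in SUPPORTED[algo]:
        raise ValueError(f"{algo} is not documented for {env}")
    command = ["python", "spin_class/reproducibility.py", "--algo", algo, "--env", env]
    if num_seeds is not None:
        command += ["--num-seeds", str(num_seeds)]
    if device is not None:
        command += ["--device", device]
    return command


print(" ".join(build_command("td3", "halfcheetah", num_seeds=1)))
```

Checking the combination before launching avoids starting a run for a pairing that the sections above do not list, such as ppo with halfcheetah.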
