Spinning up as a student of reinforcement learning.
This is based on the advice and algorithms explained in OpenAI's Spinning Up, although the code is my own except where otherwise indicated in comments.
This code uses Weights and Biases to track experiments.
- Ubuntu 18.04 (presumably works on 20.04 and 22.04)
sudo apt-get install -y \
git \
python3.8 \
python3.8-dev \
python3.8-distutils \
python3.8-venv \
python3-pip \
swig \
swig3.0
MuJoCo is required for the InvertedPendulum-v2 and HalfCheetah-v2 environments. To install it and its system dependencies, run:
sudo apt-get install -y \
curl \
libgl1-mesa-dev \
libgl1-mesa-glx \
libglew-dev \
libglfw3-dev \
libosmesa6-dev \
net-tools \
software-properties-common \
virtualenv \
wget \
xpra
sudo wget -O /usr/local/bin/patchelf https://s3-us-west-2.amazonaws.com/openai-sci-artifacts/manual-builds/patchelf_0.9_amd64.elf \
&& sudo chmod a+x /usr/local/bin/patchelf
mkdir -p ~/.mujoco
wget -O ~/.mujoco/mjkey.txt https://roboti.us/file/mjkey.txt
wget -O ~/.mujoco/mjpro150_linux.zip https://roboti.us/download/mjpro150_linux.zip
cd ~/.mujoco && unzip ./mjpro150_linux.zip
# libglewosmesa.so from Mujoco 1.5 is incompatible with Python >= 3.7,
# so get a new version of the library from Mujoco 2.1.0
wget -O ~/.mujoco/mujoco210-linux-x86_64.tar.gz https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz
cd ~/.mujoco && tar -xvzf ./mujoco210-linux-x86_64.tar.gz
cd ~/.mujoco/mjpro150/bin \
&& mv libglewosmesa.so libglewosmesa.old.so \
&& cp ~/.mujoco/mujoco210/bin/libglewosmesa.so . \
&& chmod 775 libglewosmesa.so
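Once these steps finish, a quick check can confirm that the files the rest of this setup relies on are in place. The function below is a minimal sketch that can be pasted into a bash shell; the name `check_mujoco` and its optional root-directory argument are hypothetical conveniences, not part of this repository.

```bash
# Hypothetical helper: verify that the MuJoCo 1.5 files set up above exist
# and that libglewosmesa.so was actually replaced by the 2.1.0 version.
# Usage: check_mujoco [MUJOCO_ROOT]   (defaults to ~/.mujoco)
check_mujoco() {
  local root="${1:-$HOME/.mujoco}"
  local bin="$root/mjpro150/bin"
  local f
  for f in "$root/mjkey.txt" "$bin/libglewosmesa.so" "$bin/libglewosmesa.old.so"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f" >&2
      return 1
    fi
  done
  # cmp -s succeeds only if the files are identical, which would mean
  # the copy from mujoco210 did not happen.
  if cmp -s "$bin/libglewosmesa.so" "$bin/libglewosmesa.old.so"; then
    echo "libglewosmesa.so was not replaced" >&2
    return 1
  fi
  echo "MuJoCo files look OK in $root"
}
```

Running `check_mujoco` with no arguments checks `~/.mujoco`; a non-zero exit status means one of the steps above needs to be repeated.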
Set the library path, and add it to your bash configuration so that it is set automatically in new shells.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mjpro150/bin:/usr/lib/nvidia
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mjpro150/bin:/usr/lib/nvidia' >> ~/.bashrc
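Two small pitfalls in the lines above: if `LD_LIBRARY_PATH` starts out unset, the `export` produces a leading empty entry, which glibc's dynamic loader treats as the current directory; and every run of the `echo` line appends another copy to `~/.bashrc`. A minimal sketch of a guard, using a hypothetical `add_ld_path` function (not part of this repository):

```bash
# Hypothetical helper: append a directory to LD_LIBRARY_PATH only if it is
# not already present, without creating an empty (leading ":") entry.
add_ld_path() {
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$1:"*) ;;  # already on the path: nothing to do
    # ${VAR:+text} expands to nothing when VAR is unset or empty,
    # so no stray colon is added in that case.
    *) export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1" ;;
  esac
}

add_ld_path "$HOME/.mujoco/mjpro150/bin"
add_ld_path /usr/lib/nvidia
```

To keep the setting across shells, the function and the two calls can go in `~/.bashrc` in place of the `echo` line; because the helper skips directories that are already present, re-sourcing `~/.bashrc` does not keep growing the path.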
From the directory where you want this repository to go, run:
git clone https://github.com/expz/spin-class.git
cd spin-class
virtualenv --python=/usr/bin/python3.8 venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
(Note that you cannot run these commands and then move the directory later, because the virtual environment is tied to its exact absolute path.)
Logging in requires a free Weights and Biases account:
wandb login
Vanilla Policy Gradient (VPG) is implemented for CartPole-v0, InvertedPendulum-v2, FrozenLake-v1 and HalfCheetah-v2.
Running it will create the projects vpg-cartpole, vpg-invertedpendulum, vpg-frozenlake and vpg-halfcheetah in the Weights and Biases UI.
The reproducibility script runs the same algorithm with the same settings but different random seeds to test how consistent its performance is. From the root directory of this repository, run
source venv/bin/activate
python spin_class/reproducibility.py --algo vpg --env cartpole
for CartPole-v0 or
source venv/bin/activate
python spin_class/reproducibility.py --algo vpg --env invertedpendulum
for InvertedPendulum-v2 or
source venv/bin/activate
python spin_class/reproducibility.py --algo vpg --env frozenlake
for FrozenLake-v1 (slippery); use --env frozenlake-nonslippery for the non-slippery version. Use
source venv/bin/activate
python spin_class/reproducibility.py --algo vpg --env halfcheetah
for HalfCheetah-v2. VPG does not completely solve the HalfCheetah environment, but it should be able to achieve consistent forward motion.
If you would like to run with just one seed, add the --num-seeds 1 flag.
To use the GPU, add the --device cuda:0 or --device cuda_random flag.
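To run the check on every environment listed above in one go, a small shell function can loop over the `--env` values and pass any extra flags through. This is a minimal sketch; the name `run_vpg_all` is hypothetical, and it assumes it is called from the repository root after `source venv/bin/activate`.

```bash
# Hypothetical helper: run the VPG reproducibility check on each
# environment above in turn. Extra arguments (for example
# --num-seeds 1 or --device cuda:0) are passed through unchanged.
run_vpg_all() {
  local env
  for env in cartpole invertedpendulum frozenlake frozenlake-nonslippery halfcheetah; do
    echo "=== $env ==="
    # Stop at the first failing run and return its exit status.
    python spin_class/reproducibility.py --algo vpg --env "$env" "$@" || return
  done
}
```

For example, `run_vpg_all --num-seeds 1 --device cuda:0` does a single-seed pass over all five configurations on the first GPU. Stopping at the first failure is a deliberate choice, so that a broken setup does not silently run through the remaining environments.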
To run a hyperparameter search, create a new sweep in the desired project from the Weights and Biases UI. Copy and paste the contents of spin_class/vpg_cartpole_sweep.yaml or spin_class/vpg_invpen_sweep.yaml (for the CartPole or InvertedPendulum environment, respectively) into the settings box and create the sweep.
Then for each search process you would like to start, from the root directory of this repository, run
source venv/bin/activate
and then copy, paste and run the sweep command from the Weights and Biases UI.
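To start several search processes on one machine from a single shell, the agent command can be launched in the background several times. This sketch assumes the command copied from the Weights and Biases UI has the form `wandb agent ENTITY/PROJECT/SWEEP_ID`; the helper name `launch_agents` is hypothetical.

```bash
# Hypothetical helper: start N agents for the same sweep in the
# background, then wait until all of them have exited.
# Usage: launch_agents ENTITY/PROJECT/SWEEP_ID [N]
launch_agents() {
  local sweep="$1" n="${2:-1}" i
  for ((i = 0; i < n; i++)); do
    wandb agent "$sweep" &
  done
  wait
}
```

Each agent pulls its own hyperparameter configurations from the sweep, so running several in parallel only speeds up the search; if they all use the GPU, they share it.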
To use the GPU, add
- "--device"
- "cuda:random"
before the "${args}" entry in the command section of the yaml.
Double DQN (DDQN) is implemented for CartPole-v0 and FrozenLake-v1 (slippery and non-slippery). To use it, pass the --algo ddqn flag.
C51 is implemented for CartPole-v0 and FrozenLake-v1 (slippery and non-slippery). To use it, pass the --algo c51 flag.
DDPG is implemented for InvertedPendulum-v2 and HalfCheetah-v2. To use it, pass the --algo ddpg flag.
TD3 is implemented for InvertedPendulum-v2 and HalfCheetah-v2. To use it, pass the --algo td3 flag.
PPO is implemented for CartPole-v0. To use it, pass the --algo ppo flag.
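The environment support listed above can also be exercised per algorithm in one command. This sketch assumes that --algo is passed to the same spin_class/reproducibility.py script used for VPG and that the same --env names apply; the helper name `run_algo` and the table inside it are hypothetical and simply restate the lists above.

```bash
# Hypothetical helper: run the reproducibility script for one algorithm on
# every environment listed for it above. Extra arguments are passed
# through (for example --num-seeds 1).
run_algo() {
  local algo="${1:-}" env
  local -a envs
  shift
  case "$algo" in
    ddqn | c51) envs=(cartpole frozenlake frozenlake-nonslippery) ;;
    ddpg | td3) envs=(invertedpendulum halfcheetah) ;;
    ppo) envs=(cartpole) ;;
    *)
      echo "unknown or undocumented algorithm: $algo" >&2
      return 2
      ;;
  esac
  for env in "${envs[@]}"; do
    python spin_class/reproducibility.py --algo "$algo" --env "$env" "$@" || return
  done
}
```

For example, `run_algo td3 --num-seeds 1` would run TD3 once on InvertedPendulum-v2 and then on HalfCheetah-v2.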