jmcoholich/isaacgym
Hierarchical Reinforcement Learning and Value Optimization for Challenging Quadruped Locomotion

Planning and control architecture

IsaacGym

Modified by Jeremiah Coholich for use in training on the Unitree Aliengo robot for the project Hierarchical Reinforcement Learning and Value Optimization for Challenging Quadruped Locomotion. Original code from NVIDIA: https://developer.nvidia.com/isaac-gym (Preview Release 2)

Models are trained with my fork of the rl_games repo, which includes support for logging with Weights and Biases, among other things.

rl_games fork: https://github.com/jmcoholich/rl_games

This README contains instructions for installing both my modified versions of isaacgym and the rl_games library.

The full documentation for IsaacGym can be found in ~/isaacgym/docs/

Features

Here is a list of features I have added:

Prereqs

  • Ubuntu 18.04 or 20.04.
  • Python 3.6, 3.7 or 3.8.
  • Minimum NVIDIA driver version: 460.32
    • Note: Even if you have no NVIDIA GPU, you will need to install an NVIDIA driver in order to run IsaacGym (I haven't found a better workaround).

To install an NVIDIA driver

sudo apt update
sudo apt install nvidia-driver-470

To install IsaacGym + RL_Games locally

cd ~
git clone git@github.com:jmcoholich/isaacgym.git
cd isaacgym
conda env create -f python/rlgpu_conda_env.yml
conda activate rlgpu
cd python
pip install -e .
cd ~
git clone git@github.com:jmcoholich/rl_games.git
cd rl_games
pip install -e .

To test installation

cd ~/isaacgym/python/examples
python joint_monkey.py

Known Issues

ImportError: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory can be solved by adding the following to your ~/.bashrc file (adjust the path for your username):

export LD_LIBRARY_PATH=/home/username/miniforge3/envs/rlgpu/lib

The path contains miniforge3 because I use mamba instead of conda (highly recommended); adjust it to match your own installation.

Utilities like nano or watch may result in a segmentation fault. To fix this, remove the conflicting terminal libraries from the environment:

cd /home/username/miniforge3/envs/rlgpu/lib
rm libtinfo*
rm libncursesw*

To run training

Baseline end-to-end RL policy

python rlg_train.py --cfg_env 12_F --seed 0 --device 0 --headless --cfg_train 12

To playback trained end-to-end policy

python rlg_train.py --play --checkpoint <run ID>

Proposed method

python rlg_train.py --cfg_env 12_H_new_sota --seed 0 --device 1 --cfg_train 12 --headless

Playback trained policy

python rlg_train.py --play --checkpoint <run ID>

Generalization / high-level evaluation

NOTE: The final evals require about 30 GB of VRAM.

To evaluate the flat and hierarchical policy and generate all figures, run:

bash generate_results.bash <hierarchical policy id> <flat policy id> <name of plots folder>

To generate paper results from the saved checkpoints, run:

bash generate_results.bash 123750 1234750 <name of plots folder> 1 50

To just generate the plots from saved checkpoint evaluations, run:

python generate_all_plots.py --h_id 123750 --f_id 1234750 --save_dir plots/post_acc --no_regen

All Evaluation

To evaluate a flat RL policy, run

python evaluate_policy.py --id <run_id> --flat_policy True

To evaluate the proposed hierarchical method, run

python evaluate_policy.py --id <run_id> --flat_policy False

The evaluate_policy.py script is parallelized across gpus and will automatically scale to use all available gpus.
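The work-splitting could look something like the sketch below. This is illustrative only; the function name and task representation are assumptions, not the script's actual API.

```python
# Illustrative sketch of round-robin GPU scheduling, in the spirit of
# evaluate_policy.py's multi-GPU parallelism. The names here are
# assumptions, not the script's actual API.

def assign_to_gpus(tasks, num_gpus):
    """Distribute evaluation tasks across GPUs round-robin."""
    buckets = {gpu: [] for gpu in range(num_gpus)}
    for i, task in enumerate(tasks):
        buckets[i % num_gpus].append(task)
    return buckets
```

Each bucket would then run in its own worker process pinned to that GPU, so the wall-clock time scales down with the number of available devices.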

Commands breakdown

As mentioned in the paper, the high level does not require training.

The high level policy takes three hyperparameters:

  • The desired direction of travel, given as a multiple of pi: --des_dir 0.0 corresponds to forward, while --des_dir 1.0 is backwards.
  • The coefficient for the desired-direction term of the optimization. This balances the directional term of the cost function against the value function for footsteps: --des_dir_coef 1.0 corresponds to a coefficient of 1, while --des_dir_coef 0.0 means the agent simply picks the highest-value footstep targets (which are basically in place).
  • --box_len sets the size of the box around the current footsteps within which the high-level policy searches.
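Putting these together, the high-level selection can be sketched as scoring each candidate footstep by its learned value plus the weighted directional term. This is a minimal sketch under assumed names; the repository's actual optimization may differ in detail.

```python
import math

def score_candidate(value, step_xy, current_xy, des_dir, des_dir_coef):
    """Score one candidate footstep target (illustrative sketch).

    value: learned value-function estimate for this candidate.
    des_dir: desired direction as a multiple of pi (matches --des_dir).
    des_dir_coef: weight on the directional term (matches --des_dir_coef).
    """
    angle = des_dir * math.pi
    heading = (math.cos(angle), math.sin(angle))
    # Displacement of the candidate from the current footstep position.
    dx = step_xy[0] - current_xy[0]
    dy = step_xy[1] - current_xy[1]
    # Directional term: progress along the desired heading.
    directional = heading[0] * dx + heading[1] * dy
    return value + des_dir_coef * directional
```

With --des_dir_coef 0.0 the score reduces to the value function alone, which is why the agent then picks near-in-place targets.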

By default, the stepping stones (ss) in the training environment form a curriculum that becomes more challenging as the robot progresses in the x-direction and in the plus or minus y-direction. During evaluation, however, we would like to test on terrain of uniform difficulty, so we use the following terrain parameters and arguments:

  • --add_ss is needed to pass custom stepping stone terrain parameters.
  • --ss_infill takes a value in (0.0, 1.0] representing the fraction of infill. Lower is harder.
  • --ss_height_var is a positive float representing the range of variation of stepping stone heights. Higher is harder.
  • --no_ss removes stepping stones from the environment.
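The default curriculum can be pictured as difficulty growing with the robot's progress. The linear form and rate below are assumptions for illustration, not the repository's actual schedule.

```python
def curriculum_height_var(x, y, rate=0.01):
    """Illustrative curriculum: stepping-stone height variation grows
    with progress in +x and with distance in plus or minus y. The
    functional form and rate are assumed, not the repo's actual code."""
    return rate * (max(x, 0.0) + abs(y))
```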

Other parameters:

  • --footstep_targets_in_place disables the moving footstep-target generation that the low-level policy is trained on.
  • --two_ahead_opt makes the high level jointly optimize over the next footstep targets and the pair of targets after that. It improves performance and should be on by default, but ONLY when --des_dir 0.0 (TODO: change this).
  • --plot_values plots the locations where the feet make contact with the ground as dots, one color per leg. The next footstep targets are marked with red columns.
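The joint two-ahead search can be sketched as a brute-force maximization over pairs of candidate targets. The names below are assumptions for illustration; the repository's implementation may structure this differently.

```python
def best_two_ahead(candidates1, candidates2, score):
    """Illustrative joint search over the next footstep target and the
    one after it (cf. --two_ahead_opt): pick the pair with the highest
    combined score rather than greedily optimizing one step at a time."""
    best = None
    best_score = float("-inf")
    for c1 in candidates1:
        for c2 in candidates2:
            s = score(c1) + score(c2)
            if s > best_score:
                best_score = s
                best = (c1, c2)
    return best
```

Jointly scoring the pair avoids the greedy failure mode where the best immediate target leaves no good follow-up target.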

To walk forward on flat ground:

python rlg_train.py --play --checkpoint <run ID> --num_envs 1 --no_ss --plot_values --des_dir_coef 1.0 --des_dir 0.0 --footstep_targets_in_place --two_ahead_opt

To walk forward on stepping stones:

python rlg_train.py --play --checkpoint <run ID> --num_envs 1 --add_ss --ss_infill 0.75 --ss_height_var 0.1 --plot_values --des_dir_coef 1.0 --des_dir 0.0 --footstep_targets_in_place --two_ahead_opt
