Modified by Jeremiah Coholich for use in training on the Unitree Aliengo robot for the project Hierarchical Reinforcement Learning and Value Optimization for Challenging Quadruped Locomotion. Original code from NVIDIA: https://developer.nvidia.com/isaac-gym (Preview Release 2)
Models are trained with my fork of the rl_games repo, which includes support for logging with Weights and Biases, among other things.
rl_games fork: https://github.com/jmcoholich/rl_games
This README contains instructions for installing both my modified versions of isaacgym and the rl_games library.
The full documentation for IsaacGym can be found in ~/isaacgym/docs/
Here is a list of features I have added:
- fast vectorized analytical inverse kinematics for the Aliengo quadruped
- multi-GPU policy evaluation and data gathering pipeline
- procedural terrain generation
- logging with Weights and Biases
- my value-function footstep optimization method
- scripts for generating videos from simulation cameras (rather than screen capture)
- Augmented Random Search (ARS) as an alternative to PPO
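To illustrate the kind of vectorized analytical IK referred to above, here is a minimal planar two-link sketch in NumPy. The function name, link lengths, and knee-sign convention are illustrative assumptions; the actual Aliengo solver also handles the hip-abduction joint.

```python
import numpy as np

def two_link_ik(x, z, l1, l2):
    """Planar two-link inverse kinematics, vectorized over a batch of targets.

    x, z: arrays of foot positions in the hip frame; l1, l2: link lengths.
    Returns (hip, knee) angles using a knee-back convention.
    """
    d_sq = x ** 2 + z ** 2
    # Law of cosines for the knee angle; clip guards against unreachable targets.
    cos_knee = (d_sq - l1 ** 2 - l2 ** 2) / (2 * l1 * l2)
    knee = -np.arccos(np.clip(cos_knee, -1.0, 1.0))
    # Hip angle: target direction minus the offset introduced by the bent knee.
    hip = np.arctan2(z, x) - np.arctan2(l2 * np.sin(knee), l1 + l2 * np.cos(knee))
    return hip, knee
```

Because the math is pure NumPy, a whole batch of foot targets (e.g. one per environment) solves in a single call.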
System requirements:
- Ubuntu 18.04 or 20.04.
- Python 3.6, 3.7 or 3.8.
- Minimum NVIDIA driver version: 460.32
- Note: Even if you have no NVIDIA GPU, you will need to install an NVIDIA driver in order to run IsaacGym (I haven't found a better workaround).
To install an NVIDIA driver:
sudo apt update
sudo apt install nvidia-driver-470
cd ~
git clone git@github.com:jmcoholich/isaacgym.git
cd isaacgym
conda env create -f python/rlgpu_conda_env.yml
conda activate rlgpu
cd python
pip install -e .
cd ~
git clone git@github.com:jmcoholich/rl_games.git
cd rl_games
pip install -e .
To test the installation, run an example:
cd ~/isaacgym/python/examples
python joint_monkey.py
The error ImportError: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory can be solved by adding the following to your ~/.bashrc file:
export LD_LIBRARY_PATH=/home/username/miniforge3/envs/rlgpu/lib
The path contains miniforge3 because I use mamba instead of conda (highly recommended); adjust the path to match your own conda installation and username.
If utilities like nano or watch result in a segmentation fault while the environment is active, remove the conflicting ncurses libraries from the environment:
cd /home/username/miniforge3/envs/rlgpu/lib
rm libtinfo*
rm libncursesw*
To train an end-to-end (flat) policy:
python rlg_train.py --cfg_env 12_F --seed 0 --device 0 --headless --cfg_train 12
To play back a trained end-to-end policy:
python rlg_train.py --play --checkpoint <run ID>
To train a low-level policy for the hierarchical method:
python rlg_train.py --cfg_env 12_H_new_sota --seed 0 --device 1 --cfg_train 12 --headless
To play back a trained policy:
python rlg_train.py --play --checkpoint <run ID>
NOTE: The final evaluations require about 30 GB of VRAM.
To evaluate the flat and hierarchical policy and generate all figures, run:
bash generate_results.bash <hierarchical policy id> <flat policy id> <name of plots folder>
To generate the paper results from the saved checkpoints, run:
bash generate_results.bash 123750 1234750 <name of plots folder> 1 50
To generate only the plots from saved checkpoint evaluations, run:
python generate_all_plots.py --h_id 123750 --f_id 1234750 --save_dir plots/post_acc --no_regen
To evaluate a flat RL policy, run:
python evaluate_policy.py --id <run_id> --flat_policy True
To evaluate the proposed hierarchical method, run:
python evaluate_policy.py --id <run_id> --flat_policy False
The evaluate_policy.py script is parallelized across GPUs and will automatically scale to use all available GPUs.
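A minimal sketch of that scaling pattern, assuming a hypothetical run_eval worker (not the actual evaluate_policy.py internals): evaluation seeds are split round-robin across the GPUs and one worker process handles each device.

```python
import multiprocessing as mp

def run_eval(device_id, run_id, seeds):
    # Placeholder for the real per-GPU evaluation; a real worker would pin
    # itself to the device (e.g. via CUDA_VISIBLE_DEVICES) and run rollouts.
    # Here it just records which device handled which seeds.
    return [(device_id, s) for s in seeds]

def parallel_eval(run_id, seeds, num_gpus):
    # Round-robin the seeds across GPUs, then run one process per GPU.
    chunks = [seeds[i::num_gpus] for i in range(num_gpus)]
    with mp.Pool(num_gpus) as pool:
        results = pool.starmap(
            run_eval, [(gpu, run_id, chunk) for gpu, chunk in enumerate(chunks)]
        )
    # Flatten the per-worker result lists.
    return [r for worker in results for r in worker]
```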
As mentioned in the paper, the high level does not require training.
The high-level policy takes three hyperparameters:
- --des_dir The desired direction of travel. This number is given as a multiple of pi, so --des_dir 0.0 corresponds to forward while --des_dir 1.0 is backwards.
- --des_dir_coef The coefficient for the desired-direction term of the optimization. This balances the directional term of the cost function against the value function for footsteps. --des_dir_coef 1.0 corresponds to a coefficient of 1, while --des_dir_coef 0.0 means the agent will just pick the highest-value footstep targets (which are basically in place).
- --box_len The size of the box around the current footsteps within which the high-level policy will search.
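The high-level selection can be sketched as follows; select_footstep, value_fn, and the exact scoring form are illustrative assumptions rather than the repository's implementation:

```python
import numpy as np

def select_footstep(candidates, current_pos, value_fn, des_dir=0.0, des_dir_coef=1.0):
    """Score candidate footstep targets and return the best one.

    candidates: (N, 2) array of candidate target positions sampled from the
    search box; value_fn maps them to learned values.
    """
    values = value_fn(candidates)
    # Desired heading, given as a multiple of pi (0.0 = forward, 1.0 = backward).
    heading = np.array([np.cos(des_dir * np.pi), np.sin(des_dir * np.pi)])
    # Directional term: displacement of each candidate along the desired heading.
    progress = (candidates - current_pos) @ heading
    # des_dir_coef balances the directional term against the value function;
    # with a coefficient of 0.0 the highest-value (roughly in-place) candidate wins.
    return candidates[np.argmax(values + des_dir_coef * progress)]
```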
By default, the stepping stones (ss) in the training environment are a sort of curriculum that becomes more challenging as the robot progresses in the x-direction and in the plus or minus y-direction. However, during evaluation we would like to test on terrain of uniform difficulty, so we use the following arguments.
Terrain parameters and arguments:
- --add_ss Needed to pass custom stepping-stone terrain parameters.
- --ss_infill A value in (0.0, 1.0] representing the fraction of infill. Lower is harder.
- --ss_height_var A positive float representing the range of variation of stepping-stone heights. Higher is harder.
- --no_ss Removes stepping stones from the environment.
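For intuition, uniform-difficulty stepping-stone terrain of the kind these flags describe could be generated like this (make_stepping_stones and its grid representation are assumptions for illustration, not the repository's terrain code):

```python
import numpy as np

def make_stepping_stones(rows, cols, infill=0.75, height_var=0.1, seed=0):
    """Generate a uniform grid of stepping-stone heights.

    Each cell is present with probability `infill`, and present stones get a
    height drawn uniformly from [-height_var / 2, height_var / 2].
    """
    rng = np.random.default_rng(seed)
    present = rng.random((rows, cols)) < infill
    heights = rng.uniform(-height_var / 2, height_var / 2, size=(rows, cols))
    # Absent stones are dropped far below the walking surface.
    return np.where(present, heights, -10.0)
```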
Other parameters:
- --footstep_targets_in_place Disables the moving footstep-target generation that the low-level policy was trained on, keeping the targets in place.
- --two_ahead_opt Makes the high level jointly optimize over the next footstep targets and the pair of targets after that. It gives better performance and should be on by default, but only works when --des_dir 0.0. (TODO: change this.)
- --plot_values Plots the locations where the feet make contact with the ground as dots, with one color per leg. The next footstep targets are marked with red columns.
To walk forward on flat ground:
python rlg_train.py --play --checkpoint <run ID> --num_envs 1 --no_ss --plot_values --des_dir_coef 1.0 --des_dir 0.0 --footstep_targets_in_place --two_ahead_opt
To walk forward on stepping stones:
python rlg_train.py --play --checkpoint <run ID> --num_envs 1 --add_ss --ss_infill 0.75 --ss_height_var 0.1 --plot_values --des_dir_coef 1.0 --des_dir 0.0 --footstep_targets_in_place --two_ahead_opt
