PyTorch implementation of reinforcement learning algorithms

A PyTorch implementation of RL algorithms, currently covering TRPO, clipped PPO (ClipPPO), A2C, GAIL, and action-dependent control variates (ADCV).

Important notes

  • To run MuJoCo environments, first install mujoco-py and the suggested modified version of gym that supports MuJoCo 1.50.
  • Make sure the PyTorch version is at least 0.4.0.
  • If you have a GPU, it is recommended to set OMP_NUM_THREADS to 1, since PyTorch creates additional threads when performing computations, which can hurt multiprocessing performance. (The problem is most serious on Linux, where multiprocessing can be even slower than a single thread):
export OMP_NUM_THREADS=1
  • Code structure: an Agent collects samples; a Trainer facilitates learning and training; an Evaluator tests trained models in new environments. All examples are placed under the config folder. (A minimal sketch of how these pieces fit together follows this list.)
  • After training several agents on one environment, you can plot their training curves in one figure with
python utils/plot.py --env-name <ENVIRONMENT_NAME> --algo <ALGORITHM1,...,ALGORITHMn> --x_len <ITERATION_NUM> --save_data
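
The code-structure note above, sketched in Python. The class names follow the README's description, but every method name and signature here is an illustrative assumption rather than the repository's actual API:

import gym
import torch

class Agent:
    """Rolls out the current policy and collects samples from the environment."""
    def __init__(self, env, policy):
        self.env, self.policy = env, policy

    def collect_samples(self, horizon=1000):
        states, actions, rewards = [], [], []
        state = self.env.reset()  # old gym API (pre-0.26), matching MuJoCo 1.50-era gym
        for _ in range(horizon):
            action = self.policy(torch.as_tensor(state, dtype=torch.float32))
            next_state, reward, done, _ = self.env.step(action.detach().numpy())
            states.append(state); actions.append(action); rewards.append(reward)
            state = self.env.reset() if done else next_state
        return states, actions, rewards

class Trainer:
    """Consumes collected samples and performs the learning updates."""
    def __init__(self, agent, optimizer):
        self.agent, self.optimizer = agent, optimizer

    def update(self, batch):
        ...  # compute the algorithm-specific loss (TRPO/PPO/A2C) and step the optimizer

class Evaluator:
    """Tests a trained model in a newly created environment."""
    def __init__(self, policy):
        self.policy = policy

    def evaluate(self, env, episodes=10):
        ...  # roll out the policy deterministically and report the mean return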

Policy Gradient Methods

Example

python config/pg/ppo_gym.py --env-name Hopper-v2 --max-iter-num 1000 --gpu
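
For context, the clipped surrogate objective that ClipPPO optimizes can be written in a few lines of PyTorch. This is a generic sketch of the standard PPO-clip loss, not the repository's own implementation; all tensor names are assumptions:

import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio r = pi_theta(a|s) / pi_theta_old(a|s).
    ratio = torch.exp(log_probs - old_log_probs)
    # Pessimistic (clipped) surrogate: take the minimum of the two terms.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()  # negated because optimizers minimize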

Reference

Results

We test the policy gradient code in these MuJoCo environments with default parameters.

Generative Adversarial Imitation Learning

To save trajectory

If you want to run GAIL without existing expert trajectories, TrajGiver will generate them for you. However, make sure an expert policy has already been trained and saved (i.e., train a TRPO or PPO agent on the same environment first), so that TrajGiver can automatically find the expert directory, load the policy network and running states, and run the well-trained policy on the desired environment. (A sketch of the general save-a-trajectory pattern follows.)
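
If you ever need to dump trajectories by hand, the general pattern is a rollout loop followed by torch.save. The function below is a hypothetical illustration; the file format and paths TrajGiver actually uses may differ:

import torch

def save_trajectories(env, policy, num_episodes, path):
    trajs = []
    for _ in range(num_episodes):
        states, actions = [], []
        state, done = env.reset(), False  # old gym API, as elsewhere in this README
        while not done:
            action = policy(torch.as_tensor(state, dtype=torch.float32))
            next_state, _, done, _ = env.step(action.detach().numpy())
            states.append(state)
            actions.append(action.detach().numpy())
            state = next_state
        trajs.append({"states": states, "actions": actions})
    torch.save(trajs, path)  # e.g. a per-environment file such as Hopper-v2.pt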

To do imitation learning

python config/gail/gail_gym.py --env-name Hopper-v2 --max-iter-num 1000  --gpu
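
At its core, GAIL alternates between training a discriminator to separate expert state-action pairs from policy ones and rewarding the policy for fooling it. A generic sketch of those two steps, not the repository's implementation:

import torch
import torch.nn.functional as F

def discriminator_step(disc, disc_opt, expert_sa, policy_sa):
    # Binary classification: expert pairs labeled 1, policy pairs labeled 0.
    expert_logits = disc(expert_sa)
    policy_logits = disc(policy_sa)
    loss = (F.binary_cross_entropy_with_logits(expert_logits, torch.ones_like(expert_logits))
            + F.binary_cross_entropy_with_logits(policy_logits, torch.zeros_like(policy_logits)))
    disc_opt.zero_grad()
    loss.backward()
    disc_opt.step()

def gail_reward(disc, policy_sa):
    # Surrogate reward -log(1 - D(s, a)): larger when the policy fools the discriminator.
    with torch.no_grad():
        return -torch.log(1.0 - torch.sigmoid(disc(policy_sa)) + 1e-8)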

Action-Dependent Control Variate

Example

python config/adcv/v_gym.py --env-name Walker2d-v2 --max-iter-num 1000 --variate mlp --opt minvar --gpu
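
The --variate mlp --opt minvar flags suggest an MLP control variate phi(s, a) whose parameters are fitted to minimize estimator variance. Below is a generic sketch of an action-dependent control-variate surrogate for a reparameterizable Gaussian policy, in the spirit of Stein control variates; every name here is an assumption, not the repository's API:

import torch

def adcv_surrogate(log_probs, q_hat, phi, states, actions, mean, std):
    # Score-function term with the action-dependent baseline phi(s, a) subtracted.
    advantage = (q_hat - phi(states, actions)).detach()
    score_term = (log_probs * advantage).mean()
    # Correction term: pathwise gradient of phi through the reparameterized action,
    # which adds back the bias introduced by the action-dependent baseline.
    eps = ((actions - mean) / std).detach()  # recover the noise that produced each action
    a_reparam = mean + std * eps             # gradients flow through mean and std
    correction = phi(states, a_reparam).mean()
    return -(score_term + correction)        # negated because optimizers minimize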

Results
