PyTorch implementation of reinforcement learning algorithms

A PyTorch implementation of RL algorithms, currently covering TRPO, clipped PPO (ClipPPO), A2C, GAIL, and action-dependent control variates (ADCV).

Important notes

  • To run MuJoCo environments, first install mujoco-py and the suggested modified version of gym that supports MuJoCo 1.50.
  • Make sure the PyTorch version is at least 0.4.0.
  • If you have a GPU, it is recommended to set OMP_NUM_THREADS to 1, since PyTorch creates additional threads when performing computations, which can hurt multiprocessing performance. (The problem is most serious on Linux, where multiprocessing can be even slower than a single thread):
export OMP_NUM_THREADS=1
  • Code structure: an Agent collects samples; a Trainer facilitates learning and training; an Evaluator tests trained models in new environments. All examples are placed under the config folder. (A minimal sketch of how these pieces fit together follows this list.)
  • After training several agents on one environment, you can plot their training curves in one figure with
python utils/plot.py --env-name <ENVIRONMENT_NAME> --algo <ALGORITHM1,...,ALGORITHMn> --x_len <ITERATION_NUM> --save_data
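
The code-structure note above, sketched in Python. The class names follow the README's description, but every method name and signature here is an illustrative assumption rather than the repository's actual API:

import gym
import torch

class Agent:
    """Rolls out the current policy and collects samples from the environment."""
    def __init__(self, env, policy):
        self.env, self.policy = env, policy

    def collect_samples(self, horizon=1000):
        states, actions, rewards = [], [], []
        state = self.env.reset()  # old gym API (pre-0.26), matching MuJoCo 1.50-era gym
        for _ in range(horizon):
            action = self.policy(torch.as_tensor(state, dtype=torch.float32))
            next_state, reward, done, _ = self.env.step(action.detach().numpy())
            states.append(state); actions.append(action); rewards.append(reward)
            state = self.env.reset() if done else next_state
        return states, actions, rewards

class Trainer:
    """Consumes collected samples and performs the learning updates."""
    def __init__(self, agent, optimizer):
        self.agent, self.optimizer = agent, optimizer

    def update(self, batch):
        ...  # compute the algorithm-specific loss (TRPO/PPO/A2C) and step the optimizer

class Evaluator:
    """Tests a trained model in a newly created environment."""
    def __init__(self, policy):
        self.policy = policy

    def evaluate(self, env, episodes=10):
        ...  # roll out the policy deterministically and report the mean return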

Policy Gradient Methods

Example

python config/pg/ppo_gym.py --env-name Hopper-v2 --max-iter-num 1000 --gpu
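
For context, the clipped surrogate objective that ClipPPO optimizes can be written in a few lines of PyTorch. This is a generic sketch of the standard PPO-clip loss, not the repository's own implementation; all tensor names are assumptions:

import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio r = pi_theta(a|s) / pi_theta_old(a|s).
    ratio = torch.exp(log_probs - old_log_probs)
    # Pessimistic (clipped) surrogate: take the minimum of the two terms.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()  # negated because optimizers minimize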

Reference

Results

We test the policy gradient code in these MuJoCo environments with default parameters.

Generative Adversarial Imitation Learning

To save trajectory

If you want to run GAIL without existing expert trajectories, TrajGiver will generate them for you. However, make sure an expert policy has already been trained and saved (i.e., train a TRPO or PPO agent on the same environment first), so that TrajGiver can automatically find the expert directory, load the policy network and running states, and run the well-trained policy on the desired environment. (A sketch of the general save-a-trajectory pattern follows.)
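
If you ever need to dump trajectories by hand, the general pattern is a rollout loop followed by torch.save. The function below is a hypothetical illustration; the file format and paths TrajGiver actually uses may differ:

import torch

def save_trajectories(env, policy, num_episodes, path):
    trajs = []
    for _ in range(num_episodes):
        states, actions = [], []
        state, done = env.reset(), False  # old gym API, as elsewhere in this README
        while not done:
            action = policy(torch.as_tensor(state, dtype=torch.float32))
            next_state, _, done, _ = env.step(action.detach().numpy())
            states.append(state)
            actions.append(action.detach().numpy())
            state = next_state
        trajs.append({"states": states, "actions": actions})
    torch.save(trajs, path)  # e.g. a per-environment file such as Hopper-v2.pt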

To do imitation learning

python config/gail/gail_gym.py --env-name Hopper-v2 --max-iter-num 1000  --gpu
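
At its core, GAIL alternates between training a discriminator to separate expert state-action pairs from policy ones and rewarding the policy for fooling it. A generic sketch of those two steps, not the repository's implementation:

import torch
import torch.nn.functional as F

def discriminator_step(disc, disc_opt, expert_sa, policy_sa):
    # Binary classification: expert pairs labeled 1, policy pairs labeled 0.
    expert_logits = disc(expert_sa)
    policy_logits = disc(policy_sa)
    loss = (F.binary_cross_entropy_with_logits(expert_logits, torch.ones_like(expert_logits))
            + F.binary_cross_entropy_with_logits(policy_logits, torch.zeros_like(policy_logits)))
    disc_opt.zero_grad()
    loss.backward()
    disc_opt.step()

def gail_reward(disc, policy_sa):
    # Surrogate reward -log(1 - D(s, a)): larger when the policy fools the discriminator.
    with torch.no_grad():
        return -torch.log(1.0 - torch.sigmoid(disc(policy_sa)) + 1e-8)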

Action-Dependent Control Variate

Example

python config/adcv/v_gym.py --env-name Walker2d-v2 --max-iter-num 1000 --variate mlp --opt minvar --gpu
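
The --variate mlp --opt minvar flags suggest an MLP control variate phi(s, a) whose parameters are fitted to minimize estimator variance. Below is a generic sketch of an action-dependent control-variate surrogate for a reparameterizable Gaussian policy, in the spirit of Stein control variates; every name here is an assumption, not the repository's API:

import torch

def adcv_surrogate(log_probs, q_hat, phi, states, actions, mean, std):
    # Score-function term with the action-dependent baseline phi(s, a) subtracted.
    advantage = (q_hat - phi(states, actions)).detach()
    score_term = (log_probs * advantage).mean()
    # Correction term: pathwise gradient of phi through the reparameterized action,
    # which adds back the bias introduced by the action-dependent baseline.
    eps = ((actions - mean) / std).detach()  # recover the noise that produced each action
    a_reparam = mean + std * eps             # gradients flow through mean and std
    correction = phi(states, a_reparam).mean()
    return -(score_term + correction)        # negated because optimizers minimize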

Results
