Deep RL Toolkit (RLToolkit) is a flexible and highly efficient reinforcement learning framework, developed for practitioners with the following advantages:
- Reproducible: we provide algorithms that stably reproduce the results of many influential reinforcement learning algorithms.
- Extensible: build new algorithms quickly by inheriting the abstract classes in the framework (see the agent sketch after this list).
- Reusable: algorithms provided in the repository can be directly adapted to a new task by defining a forward network; the training mechanism is built automatically.
- Elastic: computing resources on the cloud are allocated elastically and automatically.
- Lightweight: the core code is under 1,000 lines (check the Demo).
- Stable: much more stable than Stable Baselines 3, thanks to various ensemble methods.
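To make the extensibility and reusability points concrete, the sketch below shows the intended workflow: subclass an abstract agent and supply a forward network. It is a minimal illustration, assuming a hypothetical `BaseAgent` interface; the class and method names here are assumptions, not RLToolkit's actual API.

```python
# Hypothetical sketch of the "inherit the abstract class" workflow;
# `BaseAgent` and its hook names are assumptions, not RLToolkit's real API.
from abc import ABC, abstractmethod

import torch
import torch.nn as nn


class BaseAgent(ABC):
    """Stand-in for the framework's abstract agent class."""

    @abstractmethod
    def predict(self, obs: torch.Tensor) -> int:
        """Select an action for a single observation."""


class DQNAgent(BaseAgent):
    """New algorithm: provide a forward network, inherit the rest."""

    def __init__(self, obs_dim: int, action_dim: int):
        # User-defined forward network: observation -> Q-values.
        self.q_net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def predict(self, obs: torch.Tensor) -> int:
        with torch.no_grad():
            return int(self.q_net(obs).argmax(dim=-1).item())


agent = DQNAgent(obs_dim=4, action_dim=2)  # CartPole-sized dimensions
print(agent.predict(torch.zeros(4)))       # greedy action, e.g. 0 or 1
```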
RLToolkit supports the following environment suites (a minimal rollout sketch follows the list):
- OpenAI Gym
- Atari
- MuJoCo
- PyBullet
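All of these suites expose the Gym interface, so agents interact with them uniformly. Below is a minimal random-rollout sketch written against the classic 4-tuple `gym` step API (matching the `CartPole-v0` examples later in this README); note that newer Gym/Gymnasium releases return five values from `step` and an `(obs, info)` pair from `reset`.

```python
# Minimal random-rollout sketch using the classic Gym API shared by the
# suites above; only `gym` and the CartPole environment are required.
import gym

env = gym.make("CartPole-v0")
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()          # random policy as a placeholder
    obs, reward, done, info = env.step(action)  # classic 4-tuple step API
    total_reward += reward
env.close()
print(f"episode return: {total_reward}")
```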
For details on the DRL algorithms, please check out the educational webpage OpenAI Spinning Up.
If you want to learn more about deep reinforcement learning, please read the deep-rl-class and run the examples.
```bash
# Clone the repository
git clone https://github.com/jianzhnie/deep-rl-toolkit.git

# Run DQN and its variants on the CartPole-v0 environment
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo dqn
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo ddqn
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo dueling_dqn
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo dueling_ddqn

# Run the C51 algorithm on the CartPole-v0 environment
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo c51

# Run the DDPG algorithm on the Pendulum-v1 environment
python examples/cleanrl/cleanrl_runner.py --env Pendulum-v1 --algo ddpg

# Run the PPO algorithm on the CartPole-v0 environment
python examples/cleanrl/cleanrl_runner.py --env CartPole-v0 --algo ppo
```
RLToolkit implements the following model-free deep reinforcement learning (DRL) algorithms (a short sketch contrasting the DQN and Double DQN targets follows the list):

- Deep Q-Network (DQN) (V. Mnih et al. 2015)
- Double DQN (DDQN) (H. van Hasselt et al. 2015)
- Advantage Actor-Critic (A2C) (V. Mnih et al. 2016)
- Vanilla Policy Gradient (VPG) (R. Sutton et al. 2000)
- Natural Policy Gradient (NPG) (S. Kakade et al. 2002)
- Trust Region Policy Optimization (TRPO) (J. Schulman et al. 2015)
- Proximal Policy Optimization (PPO) (J. Schulman et al. 2017)
- Deep Deterministic Policy Gradient (DDPG) (T. Lillicrap et al. 2015)
- Twin Delayed DDPG (TD3) (S. Fujimoto et al. 2018)
- Soft Actor-Critic (SAC) (T. Haarnoja et al. 2018)
- SAC with automatic entropy adjustment (SAC-AEA) (T. Haarnoja et al. 2018)
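As a concrete illustration of the first two entries: Double DQN differs from DQN only in how the bootstrap target is built. The online network selects the next action while the target network evaluates it, which reduces Q-value overestimation. The sketch below is textbook pseudocode written in PyTorch, not RLToolkit's internal implementation.

```python
# Illustrative comparison of the DQN and Double DQN bootstrap targets;
# textbook formulas in PyTorch, not RLToolkit's internal code.
import torch


def dqn_target(reward, next_obs, done, target_net, gamma=0.99):
    # DQN: the target network both selects and evaluates the next action,
    # so the max over noisy estimates tends to overestimate Q-values.
    next_q = target_net(next_obs).max(dim=1).values
    return reward + gamma * (1.0 - done) * next_q


def double_dqn_target(reward, next_obs, done, online_net, target_net, gamma=0.99):
    # Double DQN: the online network selects the action, the target network
    # evaluates it, decoupling selection from evaluation.
    next_action = online_net(next_obs).argmax(dim=1, keepdim=True)
    next_q = target_net(next_obs).gather(1, next_action).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q
```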
Related RL libraries:

- rllib
- coach
- Pearl
- tianshou
- stable-baselines3
- PARL
- openrl
- cleanrl