This repository contains minimalistic implementations of several (deep) reinforcement learning algorithms in PyTorch and TensorFlow 2. It is regularly updated, and new algorithms will be added, including:
- MPO
- Hybrid-MPO
- WD3 / AWD3
Install the package via pip:
pip install git+https://github.com/jspieler/reinforcement-learning.git
Run the algorithms on OpenAI Gym environments, e.g. DDPG on the Pendulum-v1 environment for 150 episodes using PyTorch:
python rl_algorithms/PyTorch/train_agent.py --agent DDPG --env Pendulum-v1 --seed 1234 --ep 150
To use custom algorithm parameters instead of the default ones, add the argument
--config /path/to/config.yaml
See config.yaml for an example.
Alternatively, here is a quick example of how to train DDPG on the Pendulum-v1 environment using PyTorch:

import gym
from rl_algorithms.PyTorch.agents import DDPG
from rl_algorithms.PyTorch.train_agent import set_seeds, train

env = gym.make("Pendulum-v1")
agent = DDPG(env)
set_seeds(env, seed=1234)  # fix random seeds for reproducibility
# train for 150 episodes; filename is the output file for the rewards figure
train(agent, env, num_episodes=150, filename="ddpg_pendulum_v1_rewards.png")
DDPG
Paper: Continuous control with deep reinforcement learning
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous
Implementation: PyTorch / TensorFlow2
Note: The implementation differs from the original paper in several details, among others: actions are fed into the first layer of the critic network (the paper adds them only at the second hidden layer), the weights are initialized differently, and batch normalization is not used.
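The two update rules at the heart of DDPG, the critic's temporal-difference target y = r + γ(1 − d)·Q′(s′, μ′(s′)) and the soft (Polyak) update of the target networks, can be sketched without any deep learning framework. This is a minimal illustration, not the repository's code: `ddpg_target` and `soft_update` are hypothetical helpers, the actor and critic are assumed to be plain callables, and network parameters are assumed to be flat lists of floats.

```python
from typing import Callable, List, Sequence

State = Sequence[float]


def ddpg_target(
    reward: float,
    next_state: State,
    done: bool,
    target_actor: Callable[[State], float],
    target_critic: Callable[[State, float], float],
    gamma: float = 0.99,
) -> float:
    """Critic target y = r + gamma * (1 - done) * Q'(s', mu'(s'))."""
    next_action = target_actor(next_state)  # deterministic target policy mu'
    q_next = target_critic(next_state, next_action)
    return reward + gamma * (1.0 - float(done)) * q_next


def soft_update(target: List[float], source: Sequence[float], tau: float = 0.005) -> None:
    """Polyak averaging in place: theta' <- tau * theta + (1 - tau) * theta'."""
    for i, value in enumerate(source):
        target[i] = tau * value + (1.0 - tau) * target[i]


# Usage with toy (hypothetical) networks
actor = lambda s: 0.5 * s[0]
critic = lambda s, a: s[0] + a
y = ddpg_target(1.0, [2.0], False, actor, critic, gamma=0.5)  # 1.0 + 0.5 * (2.0 + 1.0) = 2.5
```

A small tau keeps the target networks slowly moving, which stabilizes the bootstrapped targets.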
TD3
Paper: Addressing Function Approximation Error in Actor-Critic Methods
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous
Implementation: PyTorch / TensorFlow2
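TD3 modifies the DDPG critic target in two ways: the target action is smoothed with clipped Gaussian noise, and the target bootstraps from the smaller of two target critics (clipped double-Q learning). The plain-Python sketch below shows only that target; `td3_target`, its one-dimensional action, and its default hyperparameters are illustrative assumptions rather than this repository's code, and TD3's third ingredient, delayed actor updates, is not shown.

```python
import random
from typing import Callable, Optional, Sequence

State = Sequence[float]


def clip(x: float, low: float, high: float) -> float:
    return max(low, min(high, x))


def td3_target(
    reward: float,
    next_state: State,
    done: bool,
    target_actor: Callable[[State], float],
    target_critic_1: Callable[[State, float], float],
    target_critic_2: Callable[[State, float], float],
    gamma: float = 0.99,
    policy_noise: float = 0.2,
    noise_clip: float = 0.5,
    action_low: float = -1.0,
    action_high: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    rng = rng or random.Random()
    # Target policy smoothing: add clipped Gaussian noise to the target action
    noise = clip(rng.gauss(0.0, policy_noise), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, action_low, action_high)
    # Clipped double-Q: bootstrap from the smaller of the two target critics
    q_next = min(
        target_critic_1(next_state, next_action),
        target_critic_2(next_state, next_action),
    )
    return reward + gamma * (1.0 - float(done)) * q_next


# Toy check without noise: actor output 2.0 is clipped to 1.0, min(2.0, 10.0) = 2.0
y = td3_target(1.0, [1.0], False, lambda s: 2.0 * s[0],
               lambda s, a: s[0] + a, lambda s, a: 10.0,
               gamma=0.5, policy_noise=0.0)
```

Taking the minimum of two critics counteracts the overestimation bias that a single bootstrapped critic tends to accumulate.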
SAC
Papers: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor / Soft Actor-Critic Algorithms and Applications
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous
Implementation: PyTorch / TensorFlow2
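In SAC the critic bootstraps from an entropy-regularized value, y = r + γ(1 − d)(min_i Q′_i(s′, a′) − α log π(a′|s′)) with a′ ~ π(·|s′), and actions typically come from a tanh-squashed Gaussian whose log-density needs a change-of-variables correction. The sketch below shows both for a one-dimensional action in plain Python; `sample_tanh_gaussian` and `sac_target` are hypothetical names, and the temperature α is kept fixed here, although the second SAC paper also adjusts it automatically.

```python
import math
import random
from typing import Callable, Optional, Sequence, Tuple

State = Sequence[float]


def sample_tanh_gaussian(mean: float, log_std: float, rng: random.Random) -> Tuple[float, float]:
    """Sample a = tanh(u) with u ~ N(mean, std^2); return (a, log pi(a))."""
    std = math.exp(log_std)
    u = rng.gauss(mean, std)
    action = math.tanh(u)
    # Log-density of the pre-squash Gaussian sample u
    log_prob = -0.5 * ((u - mean) / std) ** 2 - log_std - 0.5 * math.log(2.0 * math.pi)
    # Change of variables for tanh: subtract log(d tanh(u)/du) = log(1 - tanh(u)^2)
    log_prob -= math.log(1.0 - action ** 2 + 1e-6)  # small epsilon avoids log(0)
    return action, log_prob


def sac_target(
    reward: float,
    next_state: State,
    done: bool,
    policy: Callable[[State], Tuple[float, float]],  # returns (mean, log_std)
    target_critic_1: Callable[[State, float], float],
    target_critic_2: Callable[[State, float], float],
    alpha: float = 0.2,
    gamma: float = 0.99,
    rng: Optional[random.Random] = None,
) -> float:
    rng = rng or random.Random()
    mean, log_std = policy(next_state)
    next_action, next_log_prob = sample_tanh_gaussian(mean, log_std, rng)
    q_next = min(
        target_critic_1(next_state, next_action),
        target_critic_2(next_state, next_action),
    )
    soft_value = q_next - alpha * next_log_prob  # entropy-regularized value of s'
    return reward + gamma * (1.0 - float(done)) * soft_value
```

Unlike DDPG and TD3, the next action is sampled from the current stochastic policy rather than produced by a deterministic target actor.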
MPO
Paper: Maximum a Posteriori Policy Optimisation
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous & Discrete
Implementation: not yet implemented
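MPO alternates an E-step, which re-weights actions sampled from the current policy by exp(Q(s, a)/η), and an M-step, which fits the parametric policy to those weights by weighted maximum likelihood under a KL trust region. Since MPO is not yet implemented here, the following is only a plain-Python sketch of the E-step for a single state: `mpo_estep_weights` and `temperature_dual` are hypothetical names, and the temperature η would normally be obtained by minimizing the dual g(η) = ηε + η log((1/N) Σ_j exp(Q(s, a_j)/η)) rather than set by hand. The M-step is not shown.

```python
import math
from typing import List, Sequence


def _scaled_logsumexp(values: Sequence[float], temperature: float) -> float:
    """log(sum_j exp(v_j / temperature)), computed stably."""
    scaled = [v / temperature for v in values]
    m = max(scaled)
    return m + math.log(sum(math.exp(x - m) for x in scaled))


def mpo_estep_weights(q_values: Sequence[float], temperature: float) -> List[float]:
    """Non-parametric E-step weights q(a_j|s) proportional to exp(Q(s, a_j) / eta)."""
    log_z = _scaled_logsumexp(q_values, temperature)
    return [math.exp(q / temperature - log_z) for q in q_values]


def temperature_dual(q_values: Sequence[float], temperature: float, epsilon: float) -> float:
    """Single-state MPO dual g(eta) = eta * eps + eta * log(mean_j exp(Q_j / eta))."""
    log_mean_exp = _scaled_logsumexp(q_values, temperature) - math.log(len(q_values))
    return temperature * epsilon + temperature * log_mean_exp


# Q-values of four actions sampled from the current policy at one state (toy numbers)
weights = mpo_estep_weights([1.0, 2.0, 3.0, 4.0], temperature=1.0)
```

A lower temperature concentrates the weights on the highest-valued samples; the KL bound ε in the dual controls how far that non-parametric distribution may move from the current policy.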
Hybrid-MPO
Paper: Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous & Discrete
Implementation: not yet implemented
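Hybrid control requires a policy that emits continuous and discrete actions at the same time. Assuming a factorized hybrid policy, i.e. the continuous and discrete parts are conditionally independent given the state, the joint log-likelihood is the sum of a Gaussian term for the continuous dimensions and a categorical term for the discrete choice, which is the quantity a weighted maximum-likelihood M-step would use. Since Hybrid-MPO is not yet implemented in this repository, the sketch below is purely illustrative: `sample_hybrid_action` and its inputs (means, standard deviations, logits) are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def gaussian_log_prob(x: float, mean: float, std: float) -> float:
    """Log-density of N(mean, std^2) at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_hybrid_action(
    means: Sequence[float],
    stds: Sequence[float],
    logits: Sequence[float],
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], int, float]:
    """Sample (continuous action, discrete action) and their joint log-probability."""
    rng = rng or random.Random()
    # Continuous part: independent Gaussian per action dimension
    continuous = [rng.gauss(m, s) for m, s in zip(means, stds)]
    # Discrete part: categorical distribution defined by the logits
    probs = softmax(logits)
    discrete = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Factorized policy: the joint log-prob is the sum of both parts' log-probs
    log_prob = sum(
        gaussian_log_prob(x, m, s) for x, m, s in zip(continuous, means, stds)
    ) + math.log(probs[discrete])
    return continuous, discrete, log_prob
```

Keeping the two parts separate also makes it straightforward to constrain the continuous and discrete distributions with separate KL bounds during policy improvement.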