Implementation of (Deep) Reinforcement Learning algorithms using PyTorch & TensorFlow2

Reinforcement Learning

This repository contains minimal implementations of several (Deep) Reinforcement Learning algorithms using PyTorch & TensorFlow2. It is under active development, and new algorithms will be added.

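Algorithms

Implemented

  - Deep Deterministic Policy Gradient (DDPG)
  - Twin-Delayed Deep Deterministic Policy Gradient (TD3)
  - Soft Actor-Critic (SAC)

Planned

  - Maximum a Posteriori Policy Optimisation (MPO)
  - Hybrid Maximum a Posteriori Policy Optimization (Hybrid-MPO)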

Quickstart

  1. Install package via pip:

    pip install git+https://github.com/jspieler/reinforcement-learning.git
    
  2. Run an algorithm on an OpenAI Gym environment, e.g. DDPG on the Pendulum-v1 environment for 150 episodes using PyTorch:

    python rl_algorithms/PyTorch/train_agent.py --agent DDPG --env Pendulum-v1 --seed 1234 --ep 150
    

    If you want to use custom parameters for the algorithm instead of the defaults, add the argument --config /path/to/config.yaml. See config.yaml for an example.

  3. Alternatively, here is a quick example of how to train DDPG on the Pendulum-v1 environment using PyTorch:

    import gym

    from rl_algorithms.PyTorch.agents import DDPG
    from rl_algorithms.PyTorch.train_agent import set_seeds, train

    # Create the environment and a DDPG agent with default parameters
    env = gym.make("Pendulum-v1")
    agent = DDPG(env)

    # Seed for reproducibility, then train for 150 episodes
    set_seeds(env, seed=1234)
    train(agent, env, num_episodes=150, filename="ddpg_pendulum_v1_rewards.png")
    

Further information

Deep Deterministic Policy Gradient (DDPG)

Paper: Continuous control with deep reinforcement learning
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous
Implementation: PyTorch / TensorFlow2

Note: The implementation is not exactly the same as described in the original paper, because some of the paper's specific implementation details are not reproduced (actions are fed into the first layer of the critic network, the weight initialization differs, no batch normalization is used, etc.).
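To illustrate the "Off-Policy / Temporal-Difference" part of the description, here is a minimal sketch in plain Python of the two ingredients the DDPG paper describes for training the critic: the one-step TD target and the soft (Polyak) update of the target networks. The function names `td_target` and `soft_update` and the default values of `gamma` and `tau` are assumptions made for this example, not taken from this repository.

```python
def td_target(reward, next_q, done, gamma=0.99):
    """One-step TD target r + gamma * Q'(s', mu'(s')).

    next_q stands for the target critic's value of the target actor's action
    in the next state. Terminal transitions do not bootstrap.
    """
    return reward + gamma * (1.0 - float(done)) * next_q


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: move each target parameter a fraction tau toward the online one."""
    return [tau * p + (1.0 - tau) * tp for tp, p in zip(target_params, online_params)]


# Non-terminal transition: the target bootstraps from the target critic's estimate.
y = td_target(reward=-1.0, next_q=-10.0, done=False, gamma=0.9)

# Terminal transition: the target reduces to the reward.
y_terminal = td_target(reward=-1.0, next_q=-10.0, done=True, gamma=0.9)

# Target parameters drift slowly toward the online parameters.
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
```

In a real implementation the parameters would be network weights (tensors) rather than Python floats, and the targets would be computed for a whole replay-buffer batch at once.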


Twin-Delayed Deep Deterministic Policy Gradient (TD3)

Paper: Addressing Function Approximation Error in Actor-Critic Methods
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous
Implementation: PyTorch / TensorFlow2
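
The TD3 paper adds three changes on top of DDPG: target policy smoothing, clipped double-Q learning, and delayed policy updates. The following plain-Python sketch shows each of them for a single scalar action. All function names, the action bounds of [-1, 1], and the default hyperparameters are assumptions chosen for illustration and may differ from this repository's code and configuration.

```python
import random


def clip(x, low, high):
    """Clamp x to the interval [low, high]."""
    return max(low, min(high, x))


def smooth_target_action(target_action, noise_std=0.2, noise_clip=0.5,
                         action_low=-1.0, action_high=1.0, rng=random):
    """Target policy smoothing: add clipped Gaussian noise, then clip to the action bounds."""
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    return clip(target_action + noise, action_low, action_high)


def td3_target(reward, done, next_q1, next_q2, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the two target critics."""
    return reward + gamma * (1.0 - float(done)) * min(next_q1, next_q2)


def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: update the actor once every policy_delay critic updates."""
    return step % policy_delay == 0


rng = random.Random(0)  # seeded generator so the example is reproducible
next_action = smooth_target_action(0.9, rng=rng)

# next_q1 / next_q2 stand for the two target critics evaluated at next_action.
y = td3_target(reward=1.0, done=False, next_q1=5.0, next_q2=3.0, gamma=0.9)
```

Taking the minimum of the two critics counteracts the overestimation bias that the paper identifies in single-critic actor-critic methods.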


Soft Actor-Critic (SAC)

Paper: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor / Soft Actor-Critic Algorithms and Applications
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous
Implementation: PyTorch / TensorFlow2
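
SAC differs from TD3 mainly in its maximum-entropy objective: the critic target includes an entropy bonus weighted by a temperature alpha, and the second paper listed above tunes alpha automatically. A minimal plain-Python sketch of both formulas follows. The names `sac_target` and `temperature_loss` and all numeric values are assumptions for this example, not this repository's API.

```python
def sac_target(reward, done, next_q1, next_q2, next_log_prob, alpha=0.2, gamma=0.99):
    """Soft TD target: r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s')).

    next_log_prob is the log-probability of an action a' sampled from the
    current policy in the next state.
    """
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * (1.0 - float(done)) * soft_value


def temperature_loss(alpha, log_probs, target_entropy):
    """Batch mean of -alpha * (log pi(a|s) + target_entropy), minimized with respect to alpha."""
    terms = [-alpha * (lp + target_entropy) for lp in log_probs]
    return sum(terms) / len(terms)


# A low log-probability (high entropy) raises the target through the entropy bonus.
y = sac_target(reward=1.0, done=False, next_q1=5.0, next_q2=4.0,
               next_log_prob=-2.0, alpha=0.5, gamma=0.9)

# Temperature objective for a batch of two sampled actions.
loss = temperature_loss(alpha=0.5, log_probs=[-2.0, -1.0], target_entropy=-1.0)
```

A common heuristic sets the target entropy to the negative of the action dimension, which is why -1.0 is used here for a one-dimensional action.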


Maximum a Posteriori Policy Optimisation (MPO)

Paper: Maximum a Posteriori Policy Optimisation
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous & Discrete
Implementation: not yet implemented


Hybrid Maximum a Posteriori Policy Optimization (Hybrid-MPO)

Paper: Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous & Discrete
Implementation: not yet implemented
