Open RL

Open RL is a code repository containing minimalistic implementations of a wide collection of reinforcement learning (RL) algorithms. The purpose of this repo is to make RL more approachable and easier to learn. As such, the code in this repo is optimized for readability and for consistency between algorithms.

Compared to machine learning more broadly, RL is still rather niche, so finding resources for learning it is more difficult. While implementations of two algorithms, Q-networks and vanilla policy gradients, are widely available, easy-to-follow implementations of other algorithms are much harder to find. For many of the algorithms implemented here, no simple implementations appear to exist at all. Interestingly, it's not just state-of-the-art algorithms that lack approachable re-implementations; it's also hard to find clear implementations of foundational algorithms like multi-armed bandits. That's why open-rl was created! Happy learning!

Algorithms

In this repo you will find implementations of the following algorithms.

Model-free learning

Policy-based methods

| Algorithm | Discrete | Continuous |
| --- | --- | --- |
| REINFORCE | ✔️ | ✖️ |
| REINFORCE w/ baseline | ✔️ | ✖️ |
| VPG | ✔️ | ✔️ |
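
To give a feel for the policy-gradient family, here is a minimal sketch of the REINFORCE update on a toy one-state, two-action problem. This is illustrative only, not code from this repo, and the toy reward is made up.

```python
# Minimal REINFORCE sketch (illustrative, not this repo's code): a softmax
# policy over two actions, where action 1 yields a higher (noisy) reward.
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)   # policy parameters for the single state
alpha = 0.1            # learning rate

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for episode in range(500):
    probs = softmax(logits)
    a = rng.choice(2, p=probs)
    reward = float(a == 1) + rng.normal(0, 0.1)  # hypothetical toy reward
    # Score-function gradient: reward * grad log pi(a); for a softmax
    # policy, grad log pi(a) = onehot(a) - probs.
    logits += alpha * reward * (np.eye(2)[a] - probs)

print(softmax(logits))  # probability mass should concentrate on action 1
```

Adding a baseline (the "REINFORCE w/ baseline" row) simply subtracts a learned value estimate from the reward term to reduce variance.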

Value-based methods

| Algorithm | Discrete | Continuous |
| --- | --- | --- |
| DQN | ✔️ | ✖️ |
| Double DQN | ✔️ | ✖️ |
| Dueling DQN | ✔️ | ✖️ |
| DRQN (for POMDPs) | ✔️ | ✖️ |
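
The common thread in these value-based methods is the bootstrapped TD target. Below is a hedged sketch assuming a tabular Q array for brevity (the deep versions replace it with a neural network and a frozen target copy); it also shows the one-line change that turns DQN into Double DQN. All function names here are hypothetical.

```python
# Sketch of DQN-style TD targets on a tabular Q of shape (n_states, n_actions).
import numpy as np

def dqn_target(q_target_net, s_next, reward, done, gamma=0.99):
    # One-step target: r + gamma * max_a Q_target(s', a), cut off at terminals
    return reward + gamma * np.max(q_target_net[s_next]) * (1.0 - done)

def double_dqn_target(q_online, q_target_net, s_next, reward, done, gamma=0.99):
    # Double DQN: the online network *selects* the action, the target
    # network *evaluates* it, which reduces overestimation bias
    a_star = np.argmax(q_online[s_next])
    return reward + gamma * q_target_net[s_next, a_star] * (1.0 - done)
```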

Actor-critic methods

| Algorithm | Discrete | Continuous |
| --- | --- | --- |
| A2C | ✔️ | ✖️ |
| A3C | ✔️ | ✖️ |
| DDPG | ✖️ | ✔️ |
| TD3 | ✖️ | ✔️ |
| SAC | ✖️ | ✔️ |
| PPO | ✖️ | ✔️ |
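
These actor-critic methods differ in many details, but the A2C-style core is a critic trained by TD learning and an actor nudged along grad log pi weighted by the TD error. A hedged tabular/softmax sketch (illustrative only, not this repo's implementation):

```python
# One-step actor-critic update on tabular parameters: V has shape (n_states,),
# logits has shape (n_states, n_actions).
import numpy as np

def actor_critic_update(V, logits, s, a, r, s_next, done,
                        gamma=0.99, lr_actor=0.01, lr_critic=0.1):
    z = logits[s] - logits[s].max()
    probs = np.exp(z) / np.exp(z).sum()
    td_error = r + gamma * V[s_next] * (1 - done) - V[s]  # advantage estimate
    V[s] += lr_critic * td_error                          # critic: TD(0)
    logits[s] += lr_actor * td_error * (np.eye(len(probs))[a] - probs)  # actor
```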

Bandits

Multi-armed bandits

| Algorithm | Discrete | Continuous |
| --- | --- | --- |
| Pure Exploration | ✔️ | ✖️ |
| Epsilon Greedy | ✔️ | ✖️ |
| Thompson Sampling - Bernoulli | ✔️ | ✖️ |
| Thompson Sampling - Gaussian | ✔️ | ✖️ |
| Upper Confidence Bounds (UCB) | ✔️ | ✖️ |
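
Bandit algorithms are small enough to sketch in full. Here is a minimal epsilon-greedy bandit on three hypothetical Bernoulli arms (illustrative only; the arm means are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])  # hypothetical Bernoulli arms
Q = np.zeros(3)                          # running reward estimates
N = np.zeros(3)                          # pull counts
epsilon = 0.1

for t in range(1000):
    if rng.random() < epsilon:
        a = int(rng.integers(3))         # explore a random arm
    else:
        a = int(np.argmax(Q))            # exploit the best estimate so far
    r = float(rng.random() < true_means[a])
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]            # incremental mean update

print(Q)  # estimates should approach true_means, with arm 2 pulled most
```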

Contextual bandits

| Algorithm | Discrete | Continuous |
| --- | --- | --- |
| Linear UCB | ✔️ | ✖️ |
| Linear Thompson Sampling | ✖️ | ✖️ |
| Neural-network approach | ✔️ | ✖️ |
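
For the contextual setting, here is the core of the disjoint LinUCB model (an illustrative sketch, not this repo's code): each arm keeps ridge-regression statistics over contexts and is scored by its reward estimate plus an exploration bonus.

```python
import numpy as np

class LinUCBArm:
    """Per-arm statistics for disjoint LinUCB with d-dimensional contexts."""
    def __init__(self, d, alpha=1.0):
        self.A = np.eye(d)     # regularized Gram matrix: I + sum(x x^T)
        self.b = np.zeros(d)   # reward-weighted feature sum: sum(r x)
        self.alpha = alpha     # exploration strength

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                       # ridge estimate
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, r):
        self.A += np.outer(x, x)
        self.b += r * x
```

At each round you would score every arm's `ucb(x)` for the current context, pull the argmax, and `update` that arm with the observed reward.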

Model-based learning

| Algorithm | Discrete | Continuous |
| --- | --- | --- |
| Dyna-Q | ✔️ | ✖️ |
| Deep Dyna-Q | ✔️ | ✖️ |
| Monte-Carlo Tree Search (MCTS) | ✔️ | ✖️ |
| MB + Model Predictive Control | ✖️ | ✔️ |
| Model-Based Policy Optimization (MBPO) | ✖️ | ✔️ |
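
Dyna-Q captures the essence of model-based learning in a few lines: learn from each real transition, store it in a learned model, then plan by replaying simulated transitions from that model. A hedged tabular sketch (illustrative only; all names are hypothetical):

```python
import numpy as np

def dyna_q_step(Q, model, s, a, r, s_next, rng,
                n_planning=10, alpha=0.1, gamma=0.99):
    # (1) Direct RL: Q-learning update on the real transition
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    # (2) Model learning: deterministic tabular model of the environment
    model[(s, a)] = (r, s_next)
    # (3) Planning: Q-learning updates on randomly replayed model transitions
    transitions = list(model.items())
    for _ in range(n_planning):
        (ps, pa), (pr, ps_next) = transitions[rng.integers(len(transitions))]
        Q[ps, pa] += alpha * (pr + gamma * Q[ps_next].max() - Q[ps, pa])
```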

Offline (batch) learning

| Algorithm | Discrete | Continuous |
| --- | --- | --- |
| Conservative Q-learning (CQL) | ✔️ | ✖️ |
| Model-Based Offline Reinforcement Learning (MOReL) | ✔️ | ✖️ |
| Model-Based Offline Policy Optimization (MOPO) | ✖️ | ✔️ |
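
Offline methods must avoid overvaluing actions the dataset never contains. As an illustration of that idea, here is a sketch of the conservative penalty at the heart of CQL, assuming a tabular Q for brevity (the real algorithm uses neural networks and minibatches; this is not the repo's code):

```python
import numpy as np

def cql_loss(Q, s, a, td_target, alpha=1.0):
    # Standard TD error on a logged (s, a) pair from the offline dataset
    td_error = (Q[s, a] - td_target) ** 2
    # Conservative term: push Q down on all actions (stable logsumexp)
    # while pushing it back up on the action actually seen in the data
    m = Q[s].max()
    logsumexp = m + np.log(np.exp(Q[s] - m).sum())
    return td_error + alpha * (logsumexp - Q[s, a])
```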

Other

| Algorithm | Discrete | Continuous |
| --- | --- | --- |
| Behavioral Cloning | ✔️ | ✖️ |
| Imitation Learning | ✔️ | ✖️ |
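
Behavioral cloning is the simplest entry point in this group: it reduces imitation to supervised learning on expert (state, action) pairs. A minimal sketch of the loss for a discrete action space (illustrative only):

```python
import numpy as np

def bc_loss(policy_logits, expert_actions):
    """Mean cross-entropy between policy logits (batch, n_actions) and
    integer expert action labels (batch,)."""
    z = policy_logits - policy_logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(expert_actions)), expert_actions].mean()
```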

Installation

  • Make sure you have Python 3.7 or higher installed.
  • Clone the repo:

```
# Clone the repo from GitHub
git clone --depth 1 https://github.com/natetsang/open-rl

# Navigate to the root folder
cd open-rl
```

  • Create a virtual environment (instructions shown for Windows 10). These steps use virtualenv, but there are other options too!

```
# If not already installed, you may need to run this first
pip install virtualenv

# Create a virtual environment called 'venv' in the root of the project
virtualenv venv

# Activate the environment
venv\Scripts\activate
```

  • Install the requirements:

```
pip install -r requirements.txt
```

Contributing

If you're interested in contributing to open-rl, please fork the repo and make a pull request. Any support is much appreciated!

Citation

If you use this code, please cite it as follows:

```
@misc{Open-RL,
  author = {Tsang, Nate},
  title = {{Open-RL: Minimalistic implementations of reinforcement learning algorithms}},
  url = {https://github.com/natetsang/open-rl},
  year = {2021}
}
```

Acknowledgements

This repo would not be possible without the following (tremendous) resources, which I relied on heavily while learning RL. I highly recommend going through them to learn more.