RL Algorithms

This table lists the RL algorithms implemented in the Stable Baselines project, along with some useful characteristics: support for recurrent policies, discrete/continuous actions, and multiprocessing.

Name	Refactored [1]	Recurrent	`Box`	`Discrete`	Multi Processing
A2C	✔️	✔️	✔️	✔️	✔️
ACER	✔️	✔️	❌ [5]	✔️	✔️
ACKTR	✔️	✔️	❌ [5]	✔️	✔️
DDPG	✔️	✔️	✔️	❌	❌
DQN	✔️	❌	❌	✔️	❌
GAIL [2]	✔️	✔️	✔️	✔️	✔️ [4]
PPO1	✔️	✔️	✔️	✔️	✔️ [4]
PPO2	✔️	✔️	✔️	✔️	✔️
TRPO	✔️	✔️	✔️	✔️	✔️ [4]

[1]	Whether or not the algorithm has been refactored to fit the `BaseRLModel` class.

[2]	Only implemented for TRPO.

[3]	Only implemented for DDPG.

[4]	Multi Processing with MPI.

[5]	TODO, in project scope.
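
As a rough illustration of how the table can be read, the sketch below transcribes it into a plain Python dictionary and filters it by feature. The names `SUPPORT`, `COLUMNS`, and `algorithms_supporting` are hypothetical helpers for this example only and are not part of Stable Baselines; footnote markers are dropped, so MPI-based multiprocessing is recorded simply as supported.

```python
# Support matrix transcribed from the table above.
# Illustrative only: this structure is NOT part of the Stable Baselines API.
COLUMNS = ("recurrent", "box", "discrete", "multiprocessing")

SUPPORT = {
    # name:  (recurrent, box,   discrete, multiprocessing)
    "A2C":   (True,  True,  True,  True),
    "ACER":  (True,  False, True,  True),   # Box: TODO, see footnote [5]
    "ACKTR": (True,  False, True,  True),   # Box: TODO, see footnote [5]
    "DDPG":  (True,  True,  False, False),
    "DQN":   (False, False, True,  False),
    "GAIL":  (True,  True,  True,  True),   # multiprocessing via MPI, footnote [4]
    "PPO1":  (True,  True,  True,  True),   # multiprocessing via MPI, footnote [4]
    "PPO2":  (True,  True,  True,  True),
    "TRPO":  (True,  True,  True,  True),   # multiprocessing via MPI, footnote [4]
}


def algorithms_supporting(*features):
    """Return the sorted names of algorithms that support every given feature."""
    indices = [COLUMNS.index(feature) for feature in features]
    return sorted(
        name for name, row in SUPPORT.items() if all(row[i] for i in indices)
    )


if __name__ == "__main__":
    # Algorithms usable with continuous (Box) actions and multiprocessing.
    print(algorithms_supporting("box", "multiprocessing"))
```

For example, asking for `"box"` and `"multiprocessing"` together excludes DDPG (no multiprocessing) as well as ACER, ACKTR, and DQN (no `Box` support).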

Actions gym.spaces:

Box: An N-dimensional box that contains every point in the action space.
Discrete: A list of possible actions, where at each timestep only one of the actions can be used.
MultiDiscrete: A list of possible actions, where at each timestep only one action from each discrete set can be used.
MultiBinary: A list of possible actions, where at each timestep any of the actions can be used in any combination.
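
To make the four descriptions above concrete, here is a minimal plain-Python sketch of what a single valid action looks like for each kind of space. It uses only the standard library and deliberately does not call Gym; the function names are hypothetical and chosen for this example. In Gym itself the corresponding classes are `gym.spaces.Box`, `Discrete`, `MultiDiscrete`, and `MultiBinary`.

```python
import random

# Illustrative stand-ins for the semantics of the four action spaces.
# These helpers are assumptions for this example, not the gym.spaces API.


def sample_box(low, high):
    """Box: one float per dimension, each within its own [low, high] bounds."""
    return [random.uniform(lo, hi) for lo, hi in zip(low, high)]


def sample_discrete(n):
    """Discrete: exactly one action index out of n possible actions."""
    return random.randrange(n)


def sample_multi_discrete(nvec):
    """MultiDiscrete: one action index from each discrete set of size nvec[i]."""
    return [random.randrange(n) for n in nvec]


def sample_multi_binary(n):
    """MultiBinary: each of the n actions is independently on (1) or off (0)."""
    return [random.randint(0, 1) for _ in range(n)]


if __name__ == "__main__":
    print(sample_box([-1.0, 0.0], [1.0, 5.0]))  # two floats within the bounds
    print(sample_discrete(4))                   # a single integer in 0..3
    print(sample_multi_discrete([3, 2]))        # one integer per discrete set
    print(sample_multi_binary(5))               # five 0/1 flags
```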
