This table lists the RL algorithms implemented in the Stable Baselines project,
along with some useful characteristics: support for recurrent policies, continuous (Box) and discrete actions, and multiprocessing.

| Name | Refactored [1] | Recurrent | Box | Discrete | Multi Processing |
| --- | --- | --- | --- | --- | --- |
| A2C | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| ACER | ✔️ | ✔️ | ❌ [5] | ✔️ | ✔️ |
| ACKTR | ✔️ | ✔️ | ❌ [5] | ✔️ | ✔️ |
| DDPG | ✔️ | ✔️ | ✔️ | ❌ | ❌ |
| DQN | ✔️ | ❌ | ❌ | ✔️ | ❌ |
| GAIL [2] | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ [4] |
| PPO1 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ [4] |
| PPO2 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| TRPO | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ [4] |

[1] Whether or not the algorithm has been refactored to fit the BaseRLModel class.
[2] Only implemented for TRPO.
[3] Only implemented for DDPG.
[4] Multi Processing with MPI.
[5] TODO, in project scope.
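
When choosing an algorithm, it can help to query the table programmatically. The following is a minimal sketch that transcribes the table into a plain dictionary. The names `SUPPORT` and `algorithms_for` are hypothetical helpers introduced for illustration and are not part of Stable Baselines; the footnote qualifiers (for example, multiprocessing via MPI) are not encoded.

```python
# Hypothetical lookup table transcribed from the table above.
# Not part of the Stable Baselines API; footnotes such as [4] (MPI) are ignored.
# Columns: (recurrent, box, discrete, multiprocessing)
SUPPORT = {
    "A2C":   (True,  True,  True,  True),
    "ACER":  (True,  False, True,  True),
    "ACKTR": (True,  False, True,  True),
    "DDPG":  (True,  True,  False, False),
    "DQN":   (False, False, True,  False),
    "GAIL":  (True,  True,  True,  True),
    "PPO1":  (True,  True,  True,  True),
    "PPO2":  (True,  True,  True,  True),
    "TRPO":  (True,  True,  True,  True),
}

# Column index of each action-space type inside a SUPPORT row.
_COLUMN = {"box": 1, "discrete": 2}


def algorithms_for(action_space, need_multiprocessing=False):
    """Return the sorted names of algorithms that support `action_space`.

    `action_space` is "box" or "discrete". If `need_multiprocessing` is
    True, algorithms without multiprocessing support are filtered out.
    """
    column = _COLUMN[action_space]
    return sorted(
        name
        for name, row in SUPPORT.items()
        if row[column] and (row[3] or not need_multiprocessing)
    )


if __name__ == "__main__":
    print(algorithms_for("box"))
    print(algorithms_for("discrete", need_multiprocessing=True))
```

For instance, `algorithms_for("box", need_multiprocessing=True)` excludes DDPG, because the table marks it as lacking multiprocessing support.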
Actions gym.spaces:
Box
: An N-dimensional box that contains every point in the action space.
Discrete
: A list of possible actions, where at each timestep only one of the actions can be used.
MultiDiscrete
: A list of possible actions, where at each timestep only one action of each discrete set can be used.
MultiBinary
: A list of possible actions, where at each timestep any of the actions can be used in any combination.
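
To make the four definitions concrete, the sketch below shows what a single action drawn from each kind of space looks like. It uses only the Python standard library and deliberately does not call gym. The `sample_*` functions are hypothetical stand-ins written for illustration; in practice the corresponding `gym.spaces` classes would be used instead.

```python
import random


def sample_box(low, high):
    """Box: one real number per dimension, each within its own bounds."""
    return [random.uniform(lo, hi) for lo, hi in zip(low, high)]


def sample_discrete(n):
    """Discrete: exactly one action, an integer in {0, ..., n - 1}."""
    return random.randrange(n)


def sample_multi_discrete(nvec):
    """MultiDiscrete: one integer per discrete set; set i has nvec[i] options."""
    return [random.randrange(n) for n in nvec]


def sample_multi_binary(n):
    """MultiBinary: n independent on/off flags, in any combination."""
    return [random.randint(0, 1) for _ in range(n)]


if __name__ == "__main__":
    # A 3-dimensional continuous action, e.g. three motor torques in [-1, 1].
    print(sample_box([-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]))
    # One of four mutually exclusive actions.
    print(sample_discrete(4))
    # Two discrete sets: the first with 3 options, the second with 2.
    print(sample_multi_discrete([3, 2]))
    # Five buttons, each pressed or not.
    print(sample_multi_binary(5))
```

The outputs are random, but their shapes differ as the definitions describe: a list of floats for Box, a single integer for Discrete, a list of integers for MultiDiscrete, and a list of 0/1 flags for MultiBinary.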