.. automodule:: stable_baselines.trpo_mpi


TRPO
====

Trust Region Policy Optimization (TRPO) is an iterative approach for optimizing policies with guaranteed monotonic improvement.
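Concretely, each policy update solves a constrained surrogate problem. The statement below is the standard formulation from the original paper, where :math:`\hat{A}` is an advantage estimate and :math:`\delta` is the trust-region size:

.. math::

  \max_{\theta} \quad & \mathbb{E}\left[\frac{\pi_\theta(a|s)}{\pi_{\theta_\text{old}}(a|s)}\, \hat{A}(s, a)\right] \\
  \text{s.t.} \quad & \mathbb{E}\left[ D_\text{KL}\big(\pi_{\theta_\text{old}}(\cdot|s) \,\|\, \pi_\theta(\cdot|s)\big)\right] \le \delta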

.. note::

  TRPO requires :ref:`OpenMPI <openmpi>`. If OpenMPI isn't enabled, then TRPO isn't imported into the ``stable_baselines`` module.
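If your installation may lack MPI support, a guarded import is one defensive pattern (a sketch, not an official API):

.. code-block:: python

  try:
      from stable_baselines import TRPO  # raises ImportError when OpenMPI support is missing
  except ImportError:
      TRPO = None  # handle the missing dependency here, e.g. fall back to another algorithm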

Notes
-----

- Original paper: https://arxiv.org/abs/1502.05477

Can I use?
----------

- Recurrent policies: ❌
- Multi processing: ✔️ (using MPI)
- Gym spaces:

============= ====== ===========
Space         Action Observation
============= ====== ===========
Discrete      ✔️     ✔️
Box           ✔️     ✔️
MultiDiscrete ✔️     ✔️
MultiBinary   ✔️     ✔️
============= ====== ===========
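As a minimal illustration of the table above, TRPO accepts both discrete and continuous action spaces with the same policy class (the environment names are just examples):

.. code-block:: python

  import gym

  from stable_baselines.common.policies import MlpPolicy
  from stable_baselines import TRPO

  # Discrete action space
  model_discrete = TRPO(MlpPolicy, gym.make('CartPole-v1'))
  # Box (continuous) action space
  model_continuous = TRPO(MlpPolicy, gym.make('Pendulum-v0'))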

Example
-------

.. code-block:: python

  import gym

  from stable_baselines.common.policies import MlpPolicy
  from stable_baselines import TRPO

  env = gym.make('CartPole-v1')

  # Train a TRPO agent on CartPole and save it to disk
  model = TRPO(MlpPolicy, env, verbose=1)
  model.learn(total_timesteps=25000)
  model.save("trpo_cartpole")

  del model  # remove to demonstrate saving and loading

  model = TRPO.load("trpo_cartpole")

  # Enjoy the trained agent
  obs = env.reset()
  while True:
      action, _states = model.predict(obs)
      obs, rewards, dones, info = env.step(action)
      env.render()
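For evaluation, ``predict`` also takes a ``deterministic`` flag. A short sketch that runs a few bounded episodes instead of the infinite render loop above (the episode count is arbitrary):

.. code-block:: python

  # Run 5 evaluation episodes with deterministic actions
  for episode in range(5):
      obs, done, episode_reward = env.reset(), False, 0.0
      while not done:
          action, _states = model.predict(obs, deterministic=True)
          obs, reward, done, info = env.step(action)
          episode_reward += reward
      print("Episode {}: reward = {}".format(episode, episode_reward))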

Parameters
----------

.. autoclass:: TRPO
  :members:
  :inherited-members: