Support for the most popular deep-learning frameworks: PyTorch and TensorFlow (tf1.x/2.x, in static-graph, eager, or traced mode).
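
Switching frameworks is a single config key. A minimal sketch, assuming the classic dict-based config API and "CartPole-v1" as a stand-in environment:

.. code-block:: python

    from ray import tune

    tune.run(
        "PPO",
        config={
            "env": "CartPole-v1",
            # "torch" for PyTorch; "tf", "tf2", or "tfe" for the TensorFlow variants.
            "framework": "torch",
        },
        stop={"training_iteration": 1},
    )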

Highly distributed learning: RLlib algorithms (such as PPO or IMPALA) let you set the num_workers config parameter so that your workloads can run on 100s of CPUs/nodes, parallelizing and speeding up learning.
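
A minimal sketch of scaling out sample collection, assuming a Ray cluster (or a single multi-core machine) and "CartPole-v1" as a stand-in environment:

.. code-block:: python

    from ray import tune

    tune.run(
        "IMPALA",
        config={
            "env": "CartPole-v1",
            "num_workers": 32,         # parallel rollout workers (Ray actors)
            "num_cpus_per_worker": 1,  # resources requested per worker
        },
    )

On a cluster, these rollout workers are spread across the available nodes automatically.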

Vectorized (batched) and remote (parallel) environments: RLlib auto-vectorizes your gym.Envs via the num_envs_per_worker config. Environment workers can then batch observations and thus significantly speed up the action-computing forward pass. On top of that, RLlib offers the remote_worker_envs config to create individual environments (within a vectorized one) as Ray actors, thus parallelizing even the env stepping process.
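
A minimal sketch combining both options (the values are illustrative, with "CartPole-v1" as a stand-in environment):

.. code-block:: python

    from ray import tune

    tune.run(
        "PPO",
        config={
            "env": "CartPole-v1",
            "num_workers": 2,
            "num_envs_per_worker": 8,    # 8 sub-envs per worker -> batched forward passes
            "remote_worker_envs": True,  # each sub-env becomes its own Ray actor
        },
    )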

Multi-agent RL (MARL): Convert your (custom) gym.Env into a multi-agent env in a few simple steps and start training your agents in any of the following fashions (see the config sketch after this list):
1) Cooperative, with shared or separate policies and/or value functions.
2) Adversarial scenarios using self-play and league-based training.
3) Independent learning of neutral/co-existing agents.
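
A minimal multi-agent config sketch, assuming RLlib's example MultiAgentCartPole env (the import path may differ between Ray versions) and the newer option of specifying policies as a plain set of IDs:

.. code-block:: python

    from ray import tune
    from ray.rllib.examples.env.multi_agent import MultiAgentCartPole

    tune.run(
        "PPO",
        config={
            "env": MultiAgentCartPole,
            "env_config": {"num_agents": 2},
            "multiagent": {
                # Two separate policies; use a single shared entry instead to share weights.
                "policies": {"policy_0", "policy_1"},
                # Map agent IDs (0 and 1 here) to policy IDs; extra call args vary by version.
                "policy_mapping_fn": lambda agent_id, *args, **kwargs: f"policy_{agent_id}",
            },
        },
    )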

External simulators: Don't have your simulation running as a gym.Env in Python? No problem! RLlib supports an external environment API and comes with a pluggable, off-the-shelf client/server setup that allows you to run 100s of independent simulators on the "outside" (e.g. a Windows cloud) connecting to a central RLlib policy server that learns and serves actions. Alternatively, actions can be computed on the client side to save on network traffic.
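
The client side of that setup might look like the following sketch, assuming a PolicyServerInput-based server is already listening on localhost:9900 and `my_simulator` stands in for your external (non-gym) simulation:

.. code-block:: python

    from ray.rllib.env.policy_client import PolicyClient

    # "remote" asks the server for actions; "local" computes them on the client
    # to save on network traffic.
    client = PolicyClient("http://localhost:9900", inference_mode="remote")

    obs = my_simulator.reset()            # hypothetical external simulator
    episode_id = client.start_episode()
    done = False
    while not done:
        action = client.get_action(episode_id, obs)
        obs, reward, done, info = my_simulator.step(action)
        client.log_returns(episode_id, reward)
    client.end_episode(episode_id, obs)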

Offline RL and imitation learning/behavior cloning: You don't have a simulator for your particular problem, but tons of historical data recorded by a legacy (maybe non-RL/ML) system? This branch of reinforcement learning is for you! RLlib comes with several offline RL algorithms (CQL, MARWIL, and DQfD), allowing you to either purely behavior-clone your existing system or learn how to further improve over it.
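
A minimal offline-training sketch, assuming the historical data was logged in RLlib's JSON sample-batch format (e.g. via the "output" config of an earlier run) to a hypothetical directory:

.. code-block:: python

    from ray import tune

    tune.run(
        "MARWIL",
        config={
            "env": "CartPole-v1",             # only needed for obs/action spaces
            "input": "/tmp/my-offline-data",  # hypothetical path to logged episodes
            # beta = 0.0 reduces MARWIL to plain behavior cloning;
            # beta > 0.0 re-weights by advantages to improve over the logged policy.
            "beta": 0.0,
        },
    )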