The most popular deep-learning frameworks: PyTorch and TensorFlow (tf1.x/2.x static-graph/eager/traced).
Highly distributed learning: Our RLlib algorithms (such as our "PPO" or "IMPALA") allow you to set the num_workers
config parameter, such that your workloads can run on hundreds of CPUs/nodes, thus parallelizing and speeding up learning.
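As a minimal sketch of how such a scale-out could be configured (key names follow RLlib's classic config-dict convention; exact names and defaults may differ across versions, and the environment name here is just an example):

```python
# Hypothetical sketch: an RLlib-style config dict that scales rollout
# collection across many worker processes.
ppo_config = {
    "env": "CartPole-v1",   # any registered gym environment (example)
    "framework": "torch",   # or "tf"/"tf2"
    "num_workers": 128,     # parallel rollout workers across CPUs/nodes
    "num_gpus": 1,          # GPUs used by the central learner
}

# With Ray installed, a dict like this would typically be passed to the
# algorithm constructor, e.g.:
#   from ray.rllib.agents.ppo import PPOTrainer
#   trainer = PPOTrainer(config=ppo_config)
#   result = trainer.train()
```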
Vectorized (batched) and remote (parallel) environments: RLlib auto-vectorizes your gym.Envs
via the num_envs_per_worker
config. Environment workers can then batch and thus significantly speed up the action-computing forward pass. On top of that, RLlib offers the remote_worker_envs
config to create single environments (within a vectorized one) as Ray actors, thus parallelizing even the env-stepping process.
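A sketch of how these two settings could be combined in a config dict (the key names follow RLlib's convention, but exact names and semantics may differ by version):

```python
# Hypothetical sketch: vectorized plus remote envs, RLlib-style.
env_config = {
    "num_envs_per_worker": 8,    # each worker steps 8 env copies,
                                 # batching the action forward pass
    "remote_worker_envs": True,  # each env copy becomes a Ray actor,
                                 # so env.step() also runs in parallel
}
```

Batching amortizes the model forward pass over many observations, while remote envs help when a single env.step() is itself expensive.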
Multi-agent RL: Convert your (custom) gym.Envs into a multi-agent environment via a few simple steps and start training your agents in any of the following fashions:
1) Cooperative, with shared or separate policies and/or value functions.
2) Adversarial scenarios using self-play and league-based training.
3) Independent learning of neutral/co-existing agents.
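The policy setup behind these fashions can be sketched as follows. This is an assumed configuration shape following RLlib's "multiagent" config convention; the agent-id naming and mapping function are hypothetical:

```python
# Hypothetical sketch: two policies, with agents mapped to them by a
# user-supplied function (shared vs. separate policies).

def policy_mapping_fn(agent_id):
    # Assumed convention: agent ids like "player_0", "player_1".
    return "shared_policy" if agent_id.startswith("player") else "opponent"

multiagent_config = {
    "multiagent": {
        # Policy specs are normally (policy_cls, obs_space, act_space,
        # config); None would let the library infer them from the env.
        "policies": {
            "shared_policy": (None, None, None, {}),
            "opponent": (None, None, None, {}),
        },
        "policy_mapping_fn": policy_mapping_fn,
    },
}
```

Mapping all agents to one policy id gives shared weights (cooperative), while distinct ids give separate or adversarial policies.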
External simulators: Don't have your simulation running as a gym.Env in Python? No problem! RLlib supports an external environment API and comes with a pluggable, off-the-shelf client/server setup that allows you to run hundreds of independent simulators on the "outside" (e.g. a Windows cloud) connecting to a central RLlib Policy-Server that learns and serves actions. Alternatively, actions can be computed on the client side to save on network traffic.
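The client-side loop of such a setup could be sketched as below. RLlib ships client/server classes for this, but the stub here is a stand-in so the call pattern can be shown without Ray or a live server; all names in the stub are hypothetical:

```python
# Hypothetical sketch of the external-simulator client loop.

class FakePolicyClient:
    """Stand-in mimicking an RLlib-style policy client."""

    def start_episode(self, training_enabled=True):
        return "episode_1"

    def get_action(self, episode_id, observation):
        # A real client would query the central policy server, or
        # compute actions locally to save on network traffic.
        return 0

    def log_returns(self, episode_id, reward):
        pass  # a real client would ship rewards back to the server

    def end_episode(self, episode_id, observation):
        pass

# Typical loop driven by your own (non-gym) simulator:
client = FakePolicyClient()
eid = client.start_episode()
obs = [0.0, 0.0, 0.0, 0.0]      # observation from your simulator
action = client.get_action(eid, obs)
client.log_returns(eid, reward=1.0)
client.end_episode(eid, obs)
```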
Offline RL and imitation learning/behavior cloning: You don't have a simulator for your particular problem, but tons of historic data recorded by a legacy (maybe non-RL/ML) system? This branch of reinforcement learning is for you! RLlib comes with several offline RL algorithms (CQL, MARWIL, and DQfD), allowing you to either purely behavior-clone your existing system or learn how to further improve over it.
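Pointing an offline algorithm at historic data instead of a simulator could look like the following sketch. The "input" and "beta" keys follow RLlib's config-dict convention for offline data and MARWIL, but exact names and supported values may differ by version, and the data path is a placeholder:

```python
# Hypothetical sketch: offline RL / behavior cloning, RLlib-style.
offline_config = {
    "input": "/path/to/your/historic_data.json",  # logged experiences
    "beta": 0.0,        # MARWIL convention: 0.0 = pure behavior
                        # cloning, > 0.0 = try to improve on the
                        # logged policy's advantages
    "explore": False,   # no exploration when learning from fixed data
}
```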