RLlib: Industry-Grade Reinforcement Learning


RLlib is an open-source library for reinforcement learning (RL), offering support for production-level, highly distributed RL workloads while maintaining unified and simple APIs for a large variety of industry applications. Whether you would like to train your agents in a multi-agent setup, purely from offline (historic) datasets, or using externally connected simulators, RLlib offers a simple solution for each of your decision-making needs.

If you either have your problem coded (in Python) as an RL environment or own lots of pre-recorded, historic behavioral data to learn from, you will be up and running in only a few days.

RLlib is already used in production by industry leaders in many different verticals, such as climate control, industrial control, manufacturing and logistics, finance, gaming, automobile, robotics, boat design, and many others.

RLlib in 60 seconds

It only takes a few steps to get your first RLlib workload up and running on your laptop.

RLlib does not automatically install a deep-learning framework, but supports TensorFlow (both 1.x with static-graph and 2.x with eager mode) as well as PyTorch. Depending on your needs, make sure to install either TensorFlow or PyTorch (or both, as shown below):

pip install "ray[rllib]" tensorflow torch
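RLlib leaves the choice of deep-learning framework to you. The sketch below is plain Python, not part of RLlib. It checks which of the two frameworks is importable and returns a matching string for RLlib's framework setting. The helper name pick_framework and the default preference for PyTorch are assumptions made for illustration.

```python
import importlib.util
from typing import Optional


def pick_framework(prefer: str = "torch") -> Optional[str]:
    """Return "torch" or "tf2" depending on what is installed, or None.

    Hypothetical helper: it only checks which packages are importable and
    never imports them.
    """
    # Map each candidate framework string to the package it requires.
    candidates = {"torch": "torch", "tf2": "tensorflow"}
    # Try the preferred framework first, then the remaining ones.
    order = [prefer] + [name for name in candidates if name != prefer]
    for name in order:
        package = candidates.get(name)
        if package is not None and importlib.util.find_spec(package) is not None:
            return name
    return None


if __name__ == "__main__":
    print(pick_framework())
```

TensorFlow 1.x static-graph mode ("tf") is deliberately never auto-selected here. You would have to request it explicitly.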

For installation on computers running Apple Silicon (such as M1), please follow the installation instructions at https://docs.ray.io/en/latest/installation.html#m1-mac-apple-silicon-support. To run our Atari examples, you should also install the following packages:

pip install "gym[atari]" "gym[accept-rom-license]" atari_py

This is all you need to start coding against RLlib. Here is an example of running a PPO algorithm on the Taxi domain. We first create a config for the algorithm, which sets the environment and defines all the training parameters we want. Next, we build the algorithm and train it for a total of 5 iterations. A training iteration includes parallel sample collection by the environment workers, loss calculation on the collected batch, and a model update. As a last step, we evaluate the trained algorithm:

doc_code/rllib_in_60s.py
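The structure of one training iteration (parallel sample collection, a loss on the collected batch, a model update) can be illustrated with a deliberately tiny toy in plain Python. It is not RLlib code and does not mirror RLlib's internals. Several worker threads sample from a hypothetical two-armed bandit, and their samples are concatenated into one batch. A REINFORCE-style policy-gradient step on that batch then updates a softmax policy. The bandit, helper names and learning rate are all assumptions. In RLlib, the workers are Ray actors, and the chosen algorithm (for example, PPO) defines the loss.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy environment: arm 1 always pays 1.0, arm 0 always pays 0.0.
REWARDS = [0.0, 1.0]


def softmax(prefs):
    """Turn action preferences into action probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def collect_samples(prefs, num_samples, seed):
    """One toy 'environment worker': sample (action, reward) pairs."""
    rng = random.Random(seed)  # per-worker RNG keeps the toy reproducible
    probs = softmax(prefs)
    batch = []
    for _ in range(num_samples):
        action = 0 if rng.random() < probs[0] else 1
        batch.append((action, REWARDS[action]))
    return batch


def training_iteration(prefs, iteration, num_workers=4, samples_per_worker=25, lr=0.5):
    """Parallel sampling, a policy-gradient loss on the batch, one update."""
    # 1. Parallel sample collection (threads stand in for remote workers).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [
            pool.submit(collect_samples, prefs, samples_per_worker, 1000 * iteration + w)
            for w in range(num_workers)
        ]
        batch = [sample for f in futures for sample in f.result()]

    # 2. Gradient of the objective mean((r - baseline) * log pi(a)) over the
    #    batch; for a softmax policy, d log pi(a) / d pref_k = 1[k == a] - pi_k.
    probs = softmax(prefs)
    baseline = sum(r for _, r in batch) / len(batch)
    grads = [0.0] * len(prefs)
    for action, reward in batch:
        advantage = reward - baseline
        for k in range(len(prefs)):
            indicator = 1.0 if k == action else 0.0
            grads[k] += advantage * (indicator - probs[k]) / len(batch)

    # 3. Model update: one step of gradient ascent on the objective.
    new_prefs = [p + lr * g for p, g in zip(prefs, grads)]
    return new_prefs, baseline


if __name__ == "__main__":
    prefs = [0.0, 0.0]
    for i in range(5):  # five training iterations
        prefs, mean_reward = training_iteration(prefs, i)
        print(f"iter {i}: mean reward {mean_reward:.2f}, P(arm 1) = {softmax(prefs)[1]:.2f}")
```

Subtracting the batch-mean reward as a baseline leaves the expected gradient direction unchanged while reducing its variance. That is why the probability of the better arm can only grow in this toy.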

Note that you can use any Farama-Foundation Gymnasium environment as env. In rollouts, you can, for instance, specify the number of parallel workers that collect samples from the environment. The framework config lets you choose between "tf2", "tf", and "torch" for execution. You can also tweak RLlib's default model config and set up a separate config for evaluation.
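The model and evaluation tweaks mentioned above can be sketched as plain dictionaries, so that the snippet runs without Ray installed. It assumes a Ray 2.x release in which "fcnet_hiddens" and "fcnet_activation" are keys of RLlib's default model config and "explore" can be overridden for evaluation. The method calls shown only in comments have changed between RLlib versions, so check them against your installed version.

```python
# Hypothetical overrides of RLlib's default model config (Ray 2.x key names).
model_overrides = {
    "fcnet_hiddens": [64, 64],  # two hidden layers with 64 units each
    "fcnet_activation": "relu",
}

# Settings that apply only during evaluation: act greedily instead of exploring.
evaluation_overrides = {"explore": False}

# With Ray installed, these would be passed roughly as:
#   config.training(model=model_overrides)
#   config.evaluation(evaluation_interval=1, evaluation_config=evaluation_overrides)
```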

To learn more about the RLlib training API, see here. Also, see here for a simple example of how to write an action inference loop after training.
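An action inference loop after training follows the Gymnasium pattern: reset the environment, ask the trained policy for an action, step, and repeat until the episode is terminated or truncated. In the sketch below, the policy is kept abstract as a plain callable so that the code runs without Ray. With a trained RLlib Algorithm, you would pass something like algo.compute_single_action, assuming a Ray 2.x release where that method is available. The CountdownEnv toy environment and the run_episode helper are hypothetical.

```python
from typing import Any, Callable, Optional, Tuple


class CountdownEnv:
    """Hypothetical toy env using Gymnasium's reset/step return signatures.

    The observation is a counter. Action 1 decrements it and earns +1,
    action 0 does nothing and earns 0. The episode terminates when the
    counter reaches 0 and is truncated after max_steps steps.
    """

    def __init__(self, start: int = 3, max_steps: int = 10):
        self.start = start
        self.max_steps = max_steps

    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None) -> Tuple[int, dict]:
        self.counter = self.start
        self.steps = 0
        return self.counter, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, dict]:
        self.steps += 1
        reward = 0.0
        if action == 1:
            self.counter -= 1
            reward = 1.0
        terminated = self.counter <= 0
        truncated = self.steps >= self.max_steps and not terminated
        return self.counter, reward, terminated, truncated, {}


def run_episode(env, compute_action: Callable[[Any], Any]) -> float:
    """Roll out one episode with a trained policy and return its total reward."""
    obs, info = env.reset()
    total_reward = 0.0
    terminated = truncated = False
    while not (terminated or truncated):
        action = compute_action(obs)  # e.g. algo.compute_single_action(obs) in RLlib
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    print(run_episode(CountdownEnv(), lambda obs: 1))  # always decrement
```

Checking both terminated and truncated matters. A policy that never reaches a terminal state still ends the episode after max_steps steps instead of looping forever.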

If you want to get a quick preview of which algorithms and environments RLlib supports, click on the dropdowns below:

RLlib Algorithms

  • High-throughput architectures
    • Distributed Prioritized Experience Replay (Ape-X) [PyTorch, TensorFlow]
    • Importance Weighted Actor-Learner Architecture (IMPALA) [PyTorch, TensorFlow]
    • Asynchronous Proximal Policy Optimization (APPO) [PyTorch, TensorFlow]
    • Decentralized Distributed Proximal Policy Optimization (DD-PPO) [PyTorch]
  • Gradient-based
    • Advantage Actor-Critic (A2C, A3C) [PyTorch, TensorFlow]
    • Deep Deterministic Policy Gradients (DDPG, TD3) [PyTorch, TensorFlow]
    • Deep Q Networks (DQN, Rainbow, Parametric DQN) [PyTorch, TensorFlow]
    • Policy Gradients [PyTorch, TensorFlow]
    • Proximal Policy Optimization (PPO) [PyTorch, TensorFlow]
    • Soft Actor Critic (SAC) [PyTorch, TensorFlow]
    • Slate Q-Learning (SlateQ) [PyTorch]
  • Derivative-free
    • Augmented Random Search (ARS) [PyTorch, TensorFlow]
    • Evolution Strategies [PyTorch, TensorFlow]
  • Model-based / Meta-learning / Offline
    • Single-Player AlphaZero (AlphaZero) [PyTorch]
    • Model-Agnostic Meta-Learning (MAML) [PyTorch, TensorFlow]
    • Model-Based Meta-Policy-Optimization (MBMPO) [PyTorch]
    • Dreamer (DREAMER) [PyTorch]
    • Conservative Q-Learning (CQL) [PyTorch]
  • Multi-agent
    • QMIX Monotonic Value Factorisation (QMIX, VDN, IQN) [PyTorch]
    • Multi-Agent Deep Deterministic Policy Gradient (MADDPG) [TensorFlow]
  • Offline
    • Advantage Re-Weighted Imitation Learning (MARWIL) [PyTorch, TensorFlow]
  • Contextual bandits
    • Linear Upper Confidence Bound (LinUCB) [PyTorch]
    • Linear Thompson Sampling (LinTS) [PyTorch]
  • Exploration-based plug-ins (can be combined with any algo)
    • Curiosity (ICM: Intrinsic Curiosity Module) [PyTorch]

Feature Overview

  • RLlib Key Concepts: Learn more about the core concepts of RLlib, such as environments, algorithms, and policies.
  • RLlib Algorithms: Check out the many RL algorithms available in RLlib for model-free and model-based RL, on-policy and off-policy training, multi-agent RL, and more.
  • RLlib Environments: Get started with the environments RLlib supports, such as the Farama Foundation's Gymnasium, PettingZoo, and many custom formats for vectorized and multi-agent environments.

The following is a summary of RLlib's most striking features. Click on the images below to see an example script for each of the listed features:

Customizing RLlib

RLlib provides simple APIs to customize all aspects of your training and experimental workflows. For example, you may code your own environments in Python using the Farama Foundation's Gymnasium or DeepMind's OpenSpiel, provide custom TensorFlow/Keras or PyTorch models, write your own policy and loss definitions, or define custom exploratory behavior.
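As an example of custom exploratory behavior, here is a plain-Python sketch of epsilon-greedy action selection with a linearly decaying epsilon. It is not RLlib's exploration API: RLlib ships its own exploration classes, whose interfaces vary by version. The class name LinearDecayEpsilonGreedy and its parameters are hypothetical, and q_values stands in for whatever action values a model produces.

```python
import random
from typing import Optional, Sequence


class LinearDecayEpsilonGreedy:
    """Hypothetical exploration helper: epsilon-greedy with linear decay."""

    def __init__(self, initial_epsilon: float = 1.0, final_epsilon: float = 0.05,
                 decay_steps: int = 1000, seed: Optional[int] = None):
        self.initial_epsilon = initial_epsilon
        self.final_epsilon = final_epsilon
        self.decay_steps = decay_steps
        self.rng = random.Random(seed)

    def epsilon(self, timestep: int) -> float:
        """Interpolate linearly from initial to final epsilon, then stay flat."""
        fraction = min(timestep / self.decay_steps, 1.0)
        return self.initial_epsilon + fraction * (self.final_epsilon - self.initial_epsilon)

    def select_action(self, q_values: Sequence[float], timestep: int) -> int:
        """Pick a uniformly random action with probability epsilon, else the greedy one."""
        if self.rng.random() < self.epsilon(timestep):
            return self.rng.randrange(len(q_values))
        return max(range(len(q_values)), key=lambda a: q_values[a])
```

Decaying epsilon shifts the agent from mostly random actions early in training to mostly greedy actions later. The final_epsilon floor keeps a small amount of exploration indefinitely.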

By mapping one or more agents in your environments to one or more policies, RLlib makes multi-agent RL (MARL) an easy-to-use, low-level primitive for our users.
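In RLlib, this agent-to-policy mapping is expressed as a policy mapping function passed to the multi-agent part of the algorithm config. In Ray 2.x releases, this is done through AlgorithmConfig.multi_agent(policies=..., policy_mapping_fn=...); check the exact signature for your version. The plain-Python sketch below assumes a hypothetical two-team game with agent IDs such as "red_0" and "blue_1". It maps every agent to its team's shared policy, then groups one step's observations by policy so that each policy could compute actions for all of its agents in one batch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def policy_mapping_fn(agent_id: str, *args, **kwargs) -> str:
    """Map every agent of a team to that team's shared policy.

    Extra arguments are accepted and ignored, because RLlib also passes
    other arguments (such as the episode), depending on the version.
    """
    team = agent_id.split("_")[0]  # hypothetical ID scheme: "<team>_<index>"
    return f"{team}_policy"


def group_by_policy(observations: Dict[str, object]) -> Dict[str, List[Tuple[str, object]]]:
    """Group a multi-agent observation dict by the policy each agent maps to."""
    grouped = defaultdict(list)
    for agent_id, obs in observations.items():
        grouped[policy_mapping_fn(agent_id)].append((agent_id, obs))
    return dict(grouped)


if __name__ == "__main__":
    step_obs = {"red_0": [0.1], "red_1": [0.4], "blue_0": [0.9]}
    print(group_by_policy(step_obs))
```

Sharing one policy per team means all agents of a team are trained on pooled experience. Returning a distinct ID per agent would instead give every agent its own policy.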

RLlib's API stack: Built on top of Ray, RLlib offers off-the-shelf, highly distributed algorithms, policies, loss functions, and default models (including the option to auto-wrap a neural network with an LSTM or an attention net). Furthermore, our library comes with a built-in server/client setup, allowing you to connect hundreds of external simulators (clients) via the network to an RLlib server process, which provides learning functionality and serves action queries. You customize RLlib by subclassing its existing abstractions and overriding certain methods in those subclasses to define custom behavior.
