SRL (ReaLly Scalable RL): Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores

SRL is an efficient, scalable, and extensible distributed Reinforcement Learning system. SRL supports running several state-of-the-art RL algorithms on common environments from a single configuration file, and it also exposes general APIs for users to develop their own environments, policies, and algorithms. SRL even allows users to implement new system components to support their algorithm designs if the current system architecture is not sufficient.

For algorithm developers

  • Our support for multi-agent training goes beyond classic MAPPO experiments. For full control over your agents, check out our experiment configuration doc.

  • We provide a quick start for algorithm developers. Users can migrate their environments and write customized policies and trainers without knowing the details of the system implementation; a sketch of a custom environment follows this list.
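
As a rough illustration of what a user-defined environment looks like, here is a minimal, framework-agnostic sketch in plain Python. The class and method names are illustrative assumptions, not SRL's actual API; the quick start describes the real interfaces.

# Illustrative only: a toy environment exposing the reset/step cycle that
# RL systems expect. Class and method names are assumptions, not SRL's API.
import numpy as np

class MyGridWorld:
    def __init__(self, size: int = 5):
        self.size = size
        self.pos = 0

    def reset(self) -> np.ndarray:
        # Start every episode at the leftmost cell.
        self.pos = 0
        return self._obs()

    def step(self, action: int):
        # Action 0 moves left, action 1 moves right, clipped to the grid.
        self.pos = min(max(self.pos + (1 if action == 1 else -1), 0), self.size - 1)
        done = self.pos == self.size - 1
        reward = 1.0 if done else 0.0
        return self._obs(), reward, done, {}

    def _obs(self) -> np.ndarray:
        # One-hot encoding of the agent's position.
        one_hot = np.zeros(self.size, dtype=np.float32)
        one_hot[self.pos] = 1.0
        return one_hot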

Terminology

RL system components:

Scheduler-related:

  • experiment_name (-e): the name under which an experiment configuration is registered.
  • trial_name (-f): the name given to a particular run when launching an experiment.

Code Structure

  • api: Development APIs for algorithms and environments.
  • apps: Main entry point.
  • base: The base library, including everything unrelated to RL logic, e.g. networking utilities and general data structures & algorithms.
  • codespace: Where developers should place their own code.
  • distributed: Directory for the distributed system.
  • legacy: Implementations of classic algorithms and environments.
  • local: A local version of the distributed system.
  • scripts: Scripts for developers.

Getting Started

See code-style.md for a guide on development. See cluster.md for a description of our cluster.

Prerequisite

  1. Ask the administrators for an account on the cluster.
  2. Setup your VPN. Ask the administrators for details.
  3. On your PC, add the following lines to your ~/.ssh/config:
Host prod 
    HostName 10.210.14.4
    User {YOUR_USER_NAME}

First, sync the repo to frlcpu001:

scripts/sync_repo prod

Alternatively, you can check out the repo on the server. Make sure to sync or check out the code under /home so that it is visible on all nodes.

To run a mini experiment:

python3 -m apps.main start -e my-atari-exp -f $(whoami)-test --mode slurm --wandb_mode offline

This runs the experiment my-atari-exp with the trial name username-test. The mode should be slurm unless you are running the code on your PC or within a container. You can also configure your W&B API key in a terminal to allow --wandb_mode online:

# Get your WANDB api key from: https://wandb.ai/authorize
echo "export WANDB_API_KEY=< set your WANDB_API_KEY here>" >> ~/.profile
# Set wandb Host to our proxy.
echo 'export WANDB_BASE_URL="http://proxy.newfrl.com:8081"' >> ~/.profile

By default, experiments time out after 3 days. You can change this value in your experiment configuration, as sketched below.
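
For illustration only, an experiment configuration might expose the timeout as a plain field, as in the hypothetical sketch below; ExperimentConfig and timeout_seconds are assumed names, not SRL's actual configuration API, so check your experiment configuration class for the real field.

# Hypothetical sketch: `ExperimentConfig` and `timeout_seconds` are
# illustrative names, not SRL's actual configuration API.
import dataclasses

@dataclasses.dataclass
class ExperimentConfig:
    experiment_name: str
    timeout_seconds: int = 3 * 24 * 3600  # default: 3 days

# Extend the timeout of a particular experiment to 7 days.
my_atari_exp = ExperimentConfig(
    experiment_name="my-atari-exp",
    timeout_seconds=7 * 24 * 3600,
)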

System

System documentation has moved; see system documentation.

Monitoring

We use both wandb and Prometheus. Run wandb init and pass --wandb_mode online to use the former.

The Prometheus login is the same as the cluster login.

Check out W&B configuration for how to customize your wandb_run setup.
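
For example, a run can be customized through the standard wandb.init arguments; the project, group, and tag values below are placeholders, and how SRL forwards these options is covered in the W&B configuration doc.

# A minimal sketch using the standard wandb API. Project, group, and tag
# values are placeholders; see the W&B configuration doc for how SRL
# forwards these options to wandb.
import wandb

run = wandb.init(
    project="srl-experiments",            # placeholder project name
    name="my-atari-exp-username-test",    # e.g. mirror experiment/trial names
    group="my-atari-exp",                 # group trials of the same experiment
    tags=["atari", "ppo"],                # placeholder tags
    mode="offline",                       # or "online" once WANDB_API_KEY is set
)
run.finish()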

Check out optimize_your_experiment.md for how to improve the efficiency of your experiment.