SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference. Implements IMPALA and R2D2 algorithms in TF2 with SEED's architecture.
SEED

This repository contains an implementation of distributed reinforcement learning agent where both training and inference are performed on the learner.

Architecture

Two agents are implemented:

  • IMPALA (V-trace)
  • R2D2

The code is already interfaced with the following environments:

  • ATARI games
  • DeepMind Lab
  • Google Research Football

However, any reinforcement learning environment using the gym API can be used.
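As an illustration of what the gym API requires, here is a minimal duck-typed environment exposing the reset/step interface. The class name, observation, and reward scheme are hypothetical, chosen only to show the shape of the contract:

```python
# Minimal sketch of an environment exposing the gym-style API.
# CountingEnv and its reward scheme are made up for illustration;
# any environment with this interface shape can be plugged in.
class CountingEnv:
    def __init__(self, horizon=10):
        self.horizon = horizon  # episode length in steps
        self.t = 0

    def reset(self):
        """Starts a new episode and returns the initial observation."""
        self.t = 0
        return self.t

    def step(self, action):
        """Advances one step; returns (observation, reward, done, info)."""
        self.t += 1
        reward = 1.0 if action == self.t % 2 else 0.0
        done = self.t >= self.horizon
        return self.t, reward, done, {}


env = CountingEnv(horizon=3)
obs = env.reset()
obs, reward, done, info = env.step(1)
```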

For a detailed description of the architecture, please read our paper. Please cite the paper if you use code from this repository in your work.

Bibtex

@article{espeholt2019seed,
    title={SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference},
    author={Lasse Espeholt and Rapha{\"e}l Marinier and Piotr Stanczyk and Ke Wang and Marcin Michalski},
    year={2019},
    eprint={1910.06591},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Prerequisites

There are a few steps you need to take before playing with SEED. The instructions below assume you run the Ubuntu distribution.

  • Install git:

apt-get install git

  • Clone the SEED git repository:

git clone https://github.com/google-research/seed_rl.git
cd seed_rl

Local Machine Training on a Single Level

To make it easy to get started with SEED, we provide a way of running it on a local machine. Just run one of the following commands:

./run_local.sh [Game] [Agent] [Num. actors]
./run_local.sh atari r2d2 4
./run_local.sh football vtrace 4
./run_local.sh dmlab vtrace 4

This builds a Docker image from the SEED source code and starts training inside the resulting container.

Distributed Training using AI Platform

Note that training with AI Platform results in charges for using compute resources.

The first step is to configure GCP and a Cloud project you will use for training:

gcloud auth login
gcloud config set project [YOUR_PROJECT]

Then you just need to execute one of the provided scenarios:

gcp/train_[scenario_name].sh

This builds the Docker image, pushes it to a repository that AI Platform can access, and starts the training process in the cloud. Follow the output of the command for progress. You can also view running training jobs at https://console.cloud.google.com/ml/jobs.

DeepMind Lab Level Cache

By default, the majority of DeepMind Lab's CPU usage comes from generating new scenarios. This cost can be eliminated by enabling the level cache. To enable it, set the level_cache_dir flag in dmlab/config.py. As there are many unique episodes, it is a good idea to share the same cache across multiple experiments. On AI Platform, you can add --level_cache_dir=gs://${BUCKET_NAME}/dmlab_cache to the list of parameters passed to the experiment in gcp/submit.sh.
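For a sense of how such an option is wired up, here is a stdlib sketch using argparse. SEED itself configures flags differently (the actual definition lives in dmlab/config.py), so treat the parser below as illustrative only; the bucket path is a placeholder:

```python
import argparse

# Hedged sketch: parsing a level_cache_dir option and threading it to
# level creation. The real flag in dmlab/config.py is defined with the
# project's own flag machinery; this only illustrates the configuration.
parser = argparse.ArgumentParser()
parser.add_argument(
    '--level_cache_dir', default=None,
    help='Directory (local path or gs:// URL) used to cache generated '
         'DeepMind Lab levels; sharing one cache across experiments '
         'avoids regenerating the same episodes.')

# Example invocation with a placeholder bucket name.
args = parser.parse_args(['--level_cache_dir', 'gs://my-bucket/dmlab_cache'])
print(args.level_cache_dir)
```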

Baseline data on ATARI-57

We provide baseline training data for SEED's R2D2 trained on ATARI games, in the form of training curves, checkpoints and TensorBoard event files. We provide data for 4 independent seeds, run up to 40e9 environment frames.

The hyperparameters and evaluation procedure are the same as in section A.3.1 in the paper.

Training curves

Training curves are available on this page.

Checkpoints and Tensorboard event files

Checkpoints and TensorBoard event files can be downloaded individually here or as a single (70 GB) zip file.
