
2020.06.0

@ryanjulian ryanjulian released this 23 Jun 23:01

The Reinforcement Learning Working Group is proud to announce the 2020.06 release of garage.

As always, we are actively seeking new contributors. If you use garage, please consider submitting a PR with your algorithm or improvements to the framework.

Summary

Please see the CHANGELOG for detailed information on the changes in this release.

This release focused primarily on adding first-class support for meta-RL and multi-task RL. To achieve this, we completely rewrote the sampling API and subsystem, adding a Sampler API which is now multi-environment and multi-agent aware. We also added a library of baseline meta-RL and multi-task algorithms which reach state-of-the-art performance: MAML, PEARL, RL2, MTPPO, MTTRPO, MTSAC, and Task Embeddings.

Highlights in this release:

  • First-class support for meta-RL and multi-task RL, demonstrated using the MetaWorld benchmark
  • More PyTorch algorithms, including MAML, SAC, MTSAC, PEARL, PPO, and TRPO (97% test coverage)
  • More TensorFlow meta-RL algorithms, including RL2 and Task Embeddings (95% test coverage)
  • All-new Sampler API, with first-class support for multiple agents and environments
  • All-new experiment definition decorator @wrap_experiment, which replaces the old run_experiment function
  • Continued improvements to quality and test coverage. Garage now has 90% overall test coverage
  • Simplified and updated the Docker containers, adding better support for CUDA/nvidia-docker2 and removing the complex docker-compose based system

Read below for more information on what's new in this release. See Looking forward for more information on what to expect in the next release.

First-class support for meta-RL and multi-task RL

We added first-class support for meta-RL and multi-task RL, including state-of-the-art performing versions of the following baseline algorithms:

  • MAML
  • PEARL
  • RL2
  • MTPPO
  • MTTRPO
  • MTSAC
  • Task Embeddings

We also added explicit support for meta-task sampling and evaluation.
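For a taste of the new meta-task utilities, here is a minimal sketch. It assumes the v2020.06 module layout for SetTaskSampler and MetaEvaluator; the HalfCheetahDirEnv task and all constructor arguments shown are illustrative, so check the PEARL example for exact usage.

```python
from garage.envs import GarageEnv, normalize
from garage.envs.mujoco import HalfCheetahDirEnv
from garage.experiment import MetaEvaluator
from garage.experiment.task_sampler import SetTaskSampler

# A task sampler produces batches of environment variations (tasks).
train_task_sampler = SetTaskSampler(
    lambda: GarageEnv(normalize(HalfCheetahDirEnv())))
test_task_sampler = SetTaskSampler(
    lambda: GarageEnv(normalize(HalfCheetahDirEnv())))

# The meta-evaluator adapts the policy to held-out tasks and logs its
# post-adaptation performance.
meta_evaluator = MetaEvaluator(test_task_sampler=test_task_sampler,
                               max_path_length=200,
                               n_test_tasks=10)
```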

New Sampler API

The new Sampler API allows you to define a custom worker or rollout function for your algorithm, to control the algorithm's sampling behavior. These Workers are agnostic of the sampling parallelization backend used. This makes it easy to customize sampling behavior without forcing you to write your own sampler.

For example, you can define one Worker and use it to collect samples inside the local process, or use the same Worker to collect many samples in parallel via multiprocessing, without ever touching multiprocessing code or synchronization. Both RL2 and PEARL define custom workers, which implement the special sampling procedures these meta-RL algorithms require.

The sampler is also aware of multiple policies and environments, allowing you to customize it for use with multi-task/meta-RL or multi-agent RL.
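In practice, a custom worker can be very small. Here is a hedged sketch: MyWorker is hypothetical, the base class and method name follow the v2020.06 DefaultWorker, and the overridden behavior is purely illustrative.

```python
from garage.sampler import DefaultWorker


class MyWorker(DefaultWorker):
    """A hypothetical worker with a customized rollout procedure."""

    def rollout(self):
        # Customize how a single rollout is collected here, e.g. reset
        # the agent's hidden state or gather several episodes per call,
        # as the RL2 and PEARL workers do.
        return super().rollout()
```

Because workers are agnostic to the parallelization backend, MyWorker works unchanged with any of the sampling backends listed below.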

Currently available sampling backends are:

  • LocalSampler - collects samples serially within the main optimization process
  • MultiprocessingSampler - collects samples in parallel across multiple processors using the Python standard library's multiprocessing library
  • RaySampler - collects samples in parallel using a ray cluster (which can, of course, just be your local machine)

The API for defining a new Sampler backend is small and well-defined. If you have a bright idea for a new parallel sampler backend, send us a PR!
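Switching backends is a one-argument change. Below is a minimal sketch modeled on the examples/ directory, assuming the v2020.06 runner API; the environment, network sizes, and hyperparameters are illustrative.

```python
import gym

from garage import wrap_experiment
from garage.envs import GarageEnv
from garage.experiment import LocalRunner
from garage.experiment.deterministic import set_seed
from garage.sampler import RaySampler
from garage.torch.algos import TRPO
from garage.torch.policies import GaussianMLPPolicy
from garage.torch.value_functions import GaussianMLPValueFunction


@wrap_experiment
def trpo_pendulum_ray(ctxt=None, seed=1):
    set_seed(seed)
    runner = LocalRunner(ctxt)
    env = GarageEnv(gym.make('InvertedDoublePendulum-v2'))
    policy = GaussianMLPPolicy(env.spec, hidden_sizes=[64, 64])
    value_function = GaussianMLPValueFunction(env_spec=env.spec)
    algo = TRPO(env_spec=env.spec,
                policy=policy,
                value_function=value_function,
                max_path_length=100)
    # Swap RaySampler for LocalSampler or MultiprocessingSampler here to
    # change how samples are collected; no other code changes needed.
    runner.setup(algo, env, sampler_cls=RaySampler)
    runner.train(n_epochs=100, batch_size=1024)


trpo_pendulum_ray()
```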

New Experiment Definition API

We added the @wrap_experiment decorator, which defines the new standard way of declaring an experiment and its hyperparameters in garage. In short, an experiment is a function, and its hyperparameters are the arguments to that function. You can wrap your experiment function with @wrap_experiment to set experiment metadata such as snapshot schedules and log directories.

Calling your experiment function runs the experiment.

wrap_experiment has features such as saving the current git context, automatically naming experiments, and automatically saving the hyperparameters of any experiment function it decorates. Take a look at the examples/ directory for hands-on examples of how to use it.
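For example, here is a minimal sketch; the snapshot_mode option and the hyperparameter names are illustrative.

```python
from garage import wrap_experiment


# Hyperparameters are simply the function's arguments; wrap_experiment
# records them, along with the git context, each time the function runs.
@wrap_experiment(snapshot_mode='last')
def my_experiment(ctxt=None, seed=1, batch_size=4000):
    # ctxt is the experiment context garage passes in; hand it to a
    # LocalRunner/LocalTFRunner to set up training.
    print('running with seed={} and batch_size={}'.format(seed, batch_size))


# Calling the experiment function runs the experiment.
my_experiment()
```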

Improvements to quality and test coverage

Overall test coverage increased from 85% to 90% since v2019.10, and we expect this to keep climbing. We also now define standard benchmarks for all algorithms in the separate benchmarks directory.

Why we skipped 2020.02

Our focus on adding meta- and multi-task RL support required changing and generalizing many APIs in garage. Around January 2020, this support existed, and we were in the process of polishing it for the February 2020 release. Around this time, our development was impacted by the COVID-19 pandemic, which forced many members of the garage core maintainer team to isolate in their homes, slowing communication and the overall development of garage. Rather than rushing to release the software during stressful times, the team decided to skip the February 2020 release and put together a much more polished version for this release milestone.

We intend to return to our regularly-scheduled release cadence for 2020.09.

Who should use this release, and how

Users who want to base a project on a semi-stable version of this software, and who are not interested in bleeding-edge features, should use the release branch and tags.

Platform support

This release has been tested extensively on Ubuntu 18.04 and 20.04. We have also used it successfully on Ubuntu 16.04 and macOS 10.13, 10.14, and 10.15.

Maintenance Plan

We plan on supporting this branch until at least February 2021. Our support will come mostly in the form of attempting to reproduce and fix critical user-reported bugs, conducting quality control on user-contributed PRs to the release branch, and releasing new versions when fixes are committed.

We have no intention of performing proactive maintenance such as dependency upgrades, nor of adding new features, tests, platform support, or documentation. However, we welcome PRs to the maintenance branch (release-2020.06) from contributors wishing to see these enhancements to this version of the software.

Hotfixes

We will post backwards-compatible hotfixes for this release to the branch release-2020.06. New hotfixes will also trigger a new release tag which complies with semantic versioning, i.e. the first hotfix release would be tagged v2020.06.1, the second would be tagged v2020.06.2, etc.

We will not add new features, nor remove existing features from the branch release-2020.06 unless it is absolutely necessary for the integrity of the software.

Next release

We hope to release 2-3 times per year, approximately aligned with the North American academic calendar. We hope to release next in late September 2020, i.e. v2020.09.

Looking forward

The next release of garage will focus primarily on two goals: meta- and multi-task RL algorithms (and associated toolkit support) and stable, well-defined component APIs for fundamental RL abstractions such as Policy, QFunction, ValueFunction, Sampler, ReplayBuffer, Optimizer, etc.

Complete documentation

We are working feverishly to document garage and its APIs, to give the toolkit a full user manual, how-tos, tutorials, per-algorithm documentation and baseline curves, and a reference guide motivating the design and usage of all APIs.

Stable and well-defined component APIs

The toolkit has matured enough that most components have either a fully-described formal API or an informal API which all components of that type implement, and it has grown large enough that we are confident our existing components cover most current RL use cases.

Now we will turn to formalizing the major component APIs and ensuring that the components in garage all conform to them. This will allow us to simplify lots of logic throughout the toolkit, and will make it easier to mix components defined outside garage with those defined inside garage.

More flexible packaging

We intend to remove hard dependencies on TensorFlow, PyTorch, and OpenAI Gym. Instead, garage will detect what software you have installed and activate features accordingly. This will make it much easier to mix and match the garage features you'd like to take advantage of, without having to install a giant list of all possible garage dependencies into your project.
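One likely shape for this is the standard optional-import pattern; the sketch below is purely illustrative and not garage's actual implementation.

```python
# Detect optional frameworks at import time and expose flags that the
# rest of the library can check before activating framework-specific
# features.
try:
    import torch  # noqa: F401
    HAS_PYTORCH = True
except ImportError:
    HAS_PYTORCH = False

try:
    import tensorflow  # noqa: F401
    HAS_TENSORFLOW = True
except ImportError:
    HAS_TENSORFLOW = False
```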

More algorithms and training environments

We plan on adding more multi-task and meta-RL methods, such as PCGrad and ProMP. We also plan to add better support for gameplay domains and associated DQN-family algorithms, and will start adding first-class support for imitation learning.

For training environments, we are actively working on adding PyBullet support.

What about TensorFlow 2.0 support?

Given the uncertainty about the future of TensorFlow, and frequent reports of performance regressions when using TF2, core maintainers have paused work on moving the TensorFlow tree to use the new TF2 eager execution semantics. Note that garage can be installed using TensorFlow 2, but will still make use of the Graph APIs under tf.compat.v1. We are also focusing new algorithm development on the PyTorch tree, but will continue to perform proactive maintenance and usability improvements in the TensorFlow tree.

We'll revisit this decision after the next release (v2020.09), when we hope the future of the TensorFlow APIs will be clearer. We suggest that those who really need eager execution APIs today focus on garage.torch instead.

Users who are eager to add garage support for TF2 are welcome to become contributors and start sending us Pull Requests.

Contributors to this release