ALF

Agent Learning Framework (ALF) is a reinforcement learning framework that emphasizes flexibility and ease of implementing complex algorithms involving many different components. ALF is built on PyTorch. Development of the previous version, based on TensorFlow 2.1, stopped as of February 2020.

Algorithms

| Algorithm | Type | Reference |
|---|---|---|
| A2C | On-policy RL | OpenAI Baselines: ACKTR & A2C |
| PPO | On-policy RL | Schulman et al. "Proximal Policy Optimization Algorithms" arXiv:1707.06347 |
| DDPG | Off-policy RL | Lillicrap et al. "Continuous control with deep reinforcement learning" arXiv:1509.02971 |
| SAC | Off-policy RL | Haarnoja et al. "Soft Actor-Critic Algorithms and Applications" arXiv:1812.05905 |
| OAC | Off-policy RL | Ciosek et al. "Better Exploration with Optimistic Actor-Critic" arXiv:1910.12807 |
| HER | Off-policy RL | Andrychowicz et al. "Hindsight Experience Replay" arXiv:1707.01495 |
| TAAC | Off-policy RL | Yu et al. "TAAC: Temporally Abstract Actor-Critic for Continuous Control" arXiv:2104.06521 |
| DIAYN | Intrinsic motivation/Exploration | Eysenbach et al. "Diversity is All You Need: Learning Diverse Skills without a Reward Function" arXiv:1802.06070 |
| ICM | Intrinsic motivation/Exploration | Pathak et al. "Curiosity-driven Exploration by Self-supervised Prediction" arXiv:1705.05363 |
| RND | Intrinsic motivation/Exploration | Burda et al. "Exploration by Random Network Distillation" arXiv:1810.12894 |
| MuZero | Model-based RL | Schrittwieser et al. "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model" arXiv:1911.08265 |
| MERLIN | Unsupervised learning | Wayne et al. "Unsupervised Predictive Memory in a Goal-Directed Agent" arXiv:1803.10760 |
| Amortized SVGD | General | Feng et al. "Learning to Draw Samples with Amortized Stein Variational Gradient Descent" arXiv:1707.06626 |
| HyperNetwork | General | Ratzlaff and Fuxin. "HyperGAN: A Generative Model for Diverse, Performant Neural Networks." arXiv:1901.11058 |
| MCTS | General | Grill et al. "Monte-Carlo tree search as regularized policy optimization" arXiv:2007.12509 |
| MINE | General | Belghazi et al. "Mutual Information Neural Estimation" arXiv:1801.04062 |
| ParticleVI | General | Liu and Wang. "Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm." arXiv:1608.04471; Liu et al. "Understanding and accelerating particle-based variational inference." arXiv:1807.01750 |
| SVGD optimizer | General | Liu et al. "Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm." arXiv:1608.04471 |
| VAE | General | Higgins et al. "beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework" ICLR2017 |

Installation

ALF currently supports Python 3.7. Note that some pip packages (e.g., pybullet) need the Python dev files, so make sure python3.7-dev is installed:

sudo apt install -y python3.7-dev

Virtualenv is recommended for the installation. After creating and activating a virtual env, you can run the following commands to install ALF:

git clone https://github.com/HorizonRobotics/alf
cd alf
pip install -e .

Documentation

You can read the ALF documentation here.

Examples

All the examples below were trained on a single machine with an Intel(R) Core(TM) i9-7960X CPU @ 2.80GHz (32 CPUs) and one RTX 2080Ti GPU.

You can train any .gin file under alf/examples using the following command:

cd alf/examples; python -m alf.bin.train --gin_file=GIN_FILE --root_dir=LOG_DIR
  • GIN_FILE is the gin configuration file. You can find sample gin configuration files for different tasks under the directory alf/examples (note that some of the examples have not yet been converted to the latest PyTorch version of ALF).
  • LOG_DIR is the directory where you want to store the training results. Note that to train from scratch, you need to use a new value for LOG_DIR. Otherwise, training is assumed to resume from a previous checkpoint (if any).

Or alternatively, train any _conf.py file under alf/examples as follows:

cd alf/examples; python -m alf.bin.train --conf=CONF_FILE --root_dir=LOG_DIR
  • CONF_FILE follows the ALF configuration file format (basically Python). Note that we are in the process of converting all .gin examples to _conf.py examples, because of the flexibility of the ALF configuration format.
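
The two training commands above differ only in the flag used to pass the configuration file. The following is a minimal sketch of a helper that picks the flag from the file name; `build_train_command` and the file name `my_experiment.gin` are hypothetical and not part of ALF, while the flags `--gin_file`, `--conf`, and `--root_dir` are the ones shown above.

```python
import shlex


def build_train_command(config_file, root_dir):
    """Return the `alf.bin.train` command line as a list of arguments.

    Hypothetical helper: `.gin` files are passed with --gin_file and
    `_conf.py` files with --conf, mirroring the two commands above.
    """
    if config_file.endswith(".gin"):
        flag = "--gin_file"
    elif config_file.endswith("_conf.py"):
        flag = "--conf"
    else:
        raise ValueError("expected a .gin or _conf.py file: " + config_file)
    return ["python", "-m", "alf.bin.train",
            flag + "=" + config_file, "--root_dir=" + root_dir]


# Placeholder file and directory names, for illustration only.
cmd = build_train_command("my_experiment.gin", "/tmp/alf_logs/my_experiment")
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, cwd="alf/examples")` so that the command runs from the examples directory.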

During training, you can use tensorboard to show the progress of training:

tensorboard --logdir=LOG_DIR

After training, you can visualize the trained model using the following command:

python -m alf.bin.play --root_dir=LOG_DIR

Troubleshooting: if an error says that no configuration file is found, you are probably not under alf/examples.

A2C

  • Cart pole. The training score reached 200 in only 30 seconds, using 8 environments.

    breakout-training-curve cartpole-video

  • Atari games. The Python package atari-py must be installed for the Atari game environments. The evaluation score (obtained by taking the argmax of the policy) reached 800 on Breakout in 1.5 hours, using 64 environments.

    breakout-training-curve breakout-playing-screen

  • Simple navigation with visual input. Follow the instructions at SocialRobot to install the environment.

    simple-navigation-curve simple0navigation-video
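
The Atari example above reports an evaluation score obtained by taking the argmax of the policy. As a rough illustration of the difference between sampling from a discrete policy and acting greedily, here is a sketch in plain Python. The function names and the logits are made up for this example and are not ALF APIs.

```python
import math
import random


def softmax(logits):
    """Convert unnormalized scores into action probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """Stochastic selection: draw an action from the policy distribution."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def greedy_action(logits):
    """Greedy selection: pick the most probable action (argmax)."""
    return max(range(len(logits)), key=lambda i: logits[i])


logits = [0.1, 2.0, -1.0, 0.5]  # made-up policy outputs for four actions
rng = random.Random(0)
print("greedy:", greedy_action(logits))
print("sampled:", [sample_action(logits, rng) for _ in range(8)])
```

Greedy selection always returns the same action for the same logits, whereas sampling keeps some exploration.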

PPO

  • PR2 grasping state only. Follow the instructions at SocialRobot to install the environment.

    ppo-pr2-curve pr2-video

  • Humanoid. Learning to walk using the pybullet Humanoid environment. The Python package pybullet>=2.5.0 must be installed for the environment. The evaluation score reaches 3k in 50M steps, using 96 parallel environments.

    Humanoid-training-curve Humanoid-video

DDPG

  • FetchSlide (sparse rewards). Need to install the MuJoCo simulator first. This example reproduces the performance of vanilla DDPG reported in OpenAI's Robotics environment paper. Our implementation doesn't use MPI, but obtains (evaluation) performance on par with the original implementation. (The original MPI implementation has 19 workers. Each worker contains 2 environments for rollout and samples a minibatch of size 256 from its replay buffer for computing gradients. The gradients of all workers are summed for a centralized optimizer step. Our implementation simply samples a minibatch of size 5000 from a common replay buffer per optimizer step.) The training took about 1 hour with 38 (19*2) parallel environments on a single GPU.

    ddpg-fetchslide-training-curve
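
The numbers in the FetchSlide description can be checked with a little arithmetic. The sketch below only restates figures quoted above (19 workers, 2 environments and a minibatch of 256 per worker, a single minibatch of 5000). The variable names are illustrative.

```python
# Figures quoted in the DDPG FetchSlide description above.
workers = 19
envs_per_worker = 2
minibatch_per_worker = 256
single_minibatch = 5000  # one minibatch from the common replay buffer

# Total rollout environments across all MPI workers.
parallel_envs = workers * envs_per_worker

# Total samples contributing to one optimizer step in the MPI setup.
mpi_samples_per_step = workers * minibatch_per_worker

# How the single large minibatch compares to the MPI total.
ratio = single_minibatch / mpi_samples_per_step

print(parallel_envs)          # 38
print(mpi_samples_per_step)   # 4864
print(round(ratio, 3))        # 1.028
```

So a minibatch of 5000 uses roughly the same number of samples per optimizer step as the 19 MPI workers combined.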

SAC

  • Bipedal Walker.

    bipedal-walker-training-curve bipedal-walker-video

  • FetchReach (sparse rewards). Need to install the MuJoCo simulator first. The training took about 20 minutes with 20 parallel environments on a single GPU.

    sac-fetchreach-training-curve

  • FetchSlide (sparse rewards). Need to install the MuJoCo simulator first. This is the same task as the DDPG example above, but with SAC as the learning algorithm. It also uses only 20 (instead of 38) parallel environments to improve sample efficiency. The training took about 2 hours on a single GPU.

    sac-fetchslide-training-curve

  • Fetch Environments (sparse rewards) w/ Action Repeat. In some cases we are able to achieve even better performance than reported for DDPG + Hindsight Experience Replay simply by using SAC + Action Repeat with a repeat length of 3 timesteps. See this note to view learning curves, videos, and more details.
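
To make the idea of action repeat concrete, here is a minimal sketch of a wrapper that applies each chosen action for several consecutive timesteps and sums the rewards. It is not ALF's implementation: `ActionRepeat`, `CountdownEnv`, and the three-value `step` return are assumptions made for this toy example.

```python
class CountdownEnv:
    """Toy environment (hypothetical): reward 1.0 per step, ends after `horizon` steps."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done  # (observation, reward, done)


class ActionRepeat:
    """Repeat each action `repeat` times, accumulating the reward."""

    def __init__(self, env, repeat=3):
        self.env = env
        self.repeat = repeat

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:  # stop repeating once the episode ends
                break
        return obs, total_reward, done


env = ActionRepeat(CountdownEnv(horizon=7), repeat=3)
env.reset()
print(env.step(0))  # (3, 3.0, False)
print(env.step(0))  # (6, 3.0, False)
print(env.step(0))  # (7, 1.0, True): the episode ended after one inner step
```

With a repeat length of 3, the agent makes one decision for every three environment steps.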

ICM

  • Super Mario. Playing Super Mario using only intrinsic reward. The Python package gym-retro>=0.7.0 is required for this experiment. A suitable SuperMarioBros-Nes ROM must also be obtained and imported (ROMs are not included in gym-retro). See this doc on how to import ROMs.

    super-mario-training-curve super-mario-video

RND

  • Montezuma's Revenge. Training the hard exploration game Montezuma's Revenge with intrinsic rewards generated by RND. A lucky agent can get an episodic score of 6600 in 160M frames (40M steps with frame_skip=4). A normal agent would get an episodic score of 4000~6000 in the same number of frames. The training took about 6.5 hours with 128 parallel environments on a single GPU.

    mrevenge-training-curve mrevenge-video
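
The frame and step counts in the Montezuma's Revenge example are related through frame_skip. The short calculation below only uses figures quoted above, and its throughput numbers are derived from the approximate 6.5-hour training time, so they are rough estimates.

```python
# Figures quoted in the RND example above.
steps = 40_000_000      # agent steps
frame_skip = 4          # game frames per agent step
parallel_envs = 128
training_hours = 6.5    # approximate wall-clock time

frames = steps * frame_skip               # total game frames
steps_per_env = steps // parallel_envs    # agent steps taken in each environment
seconds = training_hours * 3600
frames_per_second = frames / seconds      # rough aggregate throughput

print(frames)                    # 160000000
print(steps_per_env)             # 312500
print(round(frames_per_second))  # 6838
```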

DIAYN

  • Pendulum. Learning diverse skills without external rewards.

    Discriminator loss Skills learned with DIAYN

Merlin

  • Collect Good Objects. Learn to collect good objects and avoid bad objects. DeepmindLab is required. Follow the instructions at DeepmindLab to install the environment.

    room-collect-good-objects-training-curve room-collect-good-objects

MISC

MuZero

  • 6x6 Go. It took about a day to train a reasonable agent to play 6x6 Go using one GPU.

    6x6-go

Contribute to ALF

You can follow the guideline here.
