Baselines

In this folder we include some simple baselines with runnable examples on bsuite.

Agents and installation instructions

To make the main bsuite library independent of machine learning frameworks, we don't include all of the agent dependencies by default. See below for installation instructions. Note that installation of these dependencies is optional.

We recommend using Python's venv virtual environment system to manage your dependencies and avoid version conflicts.
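
As a minimal sketch of that recommendation, the standard-library `venv` module can create an isolated environment from Python, which is roughly what `python3 -m venv` does on the command line. The helper names `pip_executable` and `create_env_and_install` and the directory name `bsuite_env` are hypothetical and chosen only for illustration; they are not part of bsuite.

```python
import subprocess
import sys
import venv
from pathlib import Path


def pip_executable(env_dir: Path) -> Path:
    """Return the expected location of pip inside the environment at env_dir."""
    # Virtual environments place executables in Scripts/ on Windows, bin/ elsewhere.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "pip.exe"
    return env_dir / "bin" / "pip"


def create_env_and_install(env_dir: Path, requirement: str) -> None:
    """Create a virtual environment and install one requirement into it."""
    # with_pip=True bootstraps pip into the new environment.
    venv.create(env_dir, with_pip=True)
    # Call the environment's own pip so the package lands inside env_dir.
    subprocess.run(
        [str(pip_executable(env_dir)), "install", requirement],
        check=True,
    )
```

For instance, `create_env_and_install(Path("bsuite_env"), "bsuite[baselines]")` would set up an environment containing the TensorFlow agent dependencies described below, assuming network access and a Python build that ships `venv` with pip support.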

TensorFlow agents

The below agents are built using TensorFlow 2, trfl, and Sonnet 2.

To install these dependencies, run:

pip install bsuite[baselines]
  • actor_critic: A feed-forward implementation of the advantage actor-critic (A2C) algorithm, with TD(lambda).
  • actor_critic_rnn: A recurrent version of the above agent.
  • dqn: An implementation of the deep Q-networks (DQN) algorithm.
  • boot_dqn: An implementation of the Bootstrapped DQN with randomized priors algorithm described in Osband et al. 2018.

JAX agents

The below agents are built using JAX, rlax, and Haiku.

To install these dependencies, run:

pip install bsuite[baselines_jax]
  • actor_critic: A feed-forward implementation of the advantage actor-critic (A2C) algorithm, with TD(lambda).
  • actor_critic_rnn: A recurrent version of the above agent.
  • dqn: An implementation of the deep Q-networks (DQN) algorithm.

Third-party agents

Additionally, we provide examples of running existing external baselines from other codebases, which introduce their own dependencies.

  • dopamine_dqn: An implementation of DQN from Dopamine.

    pip install dopamine-rl
  • openai_dqn and openai_ppo: Implementations of DQN and PPO from OpenAI Baselines.

    pip install git+https://github.com/openai/baselines

Running the baselines

Inside each agent folder is a run.py file that runs the agent against a single bsuite environment, or against the entire behavior suite when you pass --bsuite_id=SWEEP. For example, from the baselines/dqn folder, you could run:

python3 run.py --bsuite_id=SWEEP
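
To launch the same script from Python, for example to run a handful of environments one after another, a small wrapper along these lines may help. It is a sketch that relies only on the `--bsuite_id` flag shown above; the helper names `build_run_command` and `run_agent` are hypothetical, and `catch/0` is used purely as an illustrative environment id.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_run_command(bsuite_id: str, python: str = "python3") -> List[str]:
    """Build the command line that runs run.py for one bsuite id (or SWEEP)."""
    return [python, "run.py", f"--bsuite_id={bsuite_id}"]


def run_agent(agent_dir: Path, bsuite_ids: Sequence[str]) -> None:
    """Run the agent in agent_dir once per bsuite id, stopping on the first failure."""
    for bsuite_id in bsuite_ids:
        # cwd=agent_dir mirrors running the command from inside the agent folder.
        subprocess.run(build_run_command(bsuite_id), cwd=agent_dir, check=True)
```

Calling `run_agent(Path("baselines/dqn"), ["catch/0"])` would then run the DQN agent on one environment, while passing `["SWEEP"]` corresponds to the full-suite command above.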