
reinforcement learning algorithms from the book by Sutton and Barto


idsc-frazzoli/subare


ch.ethz.idsc.subare

Library for reinforcement learning in Java, version 0.3.8

The repository includes algorithms, examples, and exercises from the 2nd edition of Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.

Our implementation is inspired by the Python code by Shangtong Zhang, but differs from that reference in two aspects:

  • the algorithms are implemented separately from the problem scenarios
  • the math is carried out in exact precision, which reproduces symmetries in the results whenever the problem features symmetries
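
The effect of exact precision on symmetry can be illustrated with a small sketch. A symmetric problem typically evaluates mirrored states by adding the same terms in mirrored order; with floating point the two sums may differ in the last bits, while exact fractions give identical results. The `Fraction` class and the method names below are hypothetical helpers written for this illustration and are not taken from the subare API.

```java
import java.math.BigInteger;

/** Illustration only: Fraction is a hypothetical helper, not a subare class. */
final class ExactSymmetryDemo {
  /** Immutable rational number in lowest terms with a positive denominator. */
  static final class Fraction {
    final BigInteger num;
    final BigInteger den;

    Fraction(long num, long den) {
      this(BigInteger.valueOf(num), BigInteger.valueOf(den));
    }

    Fraction(BigInteger num, BigInteger den) {
      if (den.signum() == 0)
        throw new ArithmeticException("zero denominator");
      BigInteger gcd = num.gcd(den); // positive, because den is non-zero
      if (den.signum() < 0)
        gcd = gcd.negate(); // moves the sign into the numerator
      this.num = num.divide(gcd);
      this.den = den.divide(gcd);
    }

    Fraction add(Fraction other) {
      return new Fraction(
          num.multiply(other.den).add(other.num.multiply(den)),
          den.multiply(other.den));
    }

    @Override
    public boolean equals(Object object) {
      return object instanceof Fraction
          && num.equals(((Fraction) object).num)
          && den.equals(((Fraction) object).den);
    }

    @Override
    public int hashCode() {
      return 31 * num.hashCode() + den.hashCode();
    }

    @Override
    public String toString() {
      return num + "/" + den;
    }
  }

  /** Adds the terms from first to last in floating point. */
  static double sumForward(double[] terms) {
    double sum = 0;
    for (int i = 0; i < terms.length; ++i)
      sum += terms[i];
    return sum;
  }

  /** Adds the terms from last to first in floating point. */
  static double sumBackward(double[] terms) {
    double sum = 0;
    for (int i = terms.length - 1; 0 <= i; --i)
      sum += terms[i];
    return sum;
  }

  /** Adds the terms from first to last in exact arithmetic. */
  static Fraction sumForward(Fraction[] terms) {
    Fraction sum = new Fraction(0, 1);
    for (int i = 0; i < terms.length; ++i)
      sum = sum.add(terms[i]);
    return sum;
  }

  /** Adds the terms from last to first in exact arithmetic. */
  static Fraction sumBackward(Fraction[] terms) {
    Fraction sum = new Fraction(0, 1);
    for (int i = terms.length - 1; 0 <= i; --i)
      sum = sum.add(terms[i]);
    return sum;
  }

  public static void main(String[] args) {
    double[] approx = { 0.1, 0.2, 0.3 };
    Fraction[] exact = { new Fraction(1, 10), new Fraction(2, 10), new Fraction(3, 10) };
    System.out.println("double sums agree:   " + (sumForward(approx) == sumBackward(approx)));
    System.out.println("fraction sums agree: " + sumForward(exact).equals(sumBackward(exact)));
  }
}
```

The sketch only demonstrates the general principle; how subare represents exact numbers internally is not shown here.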

Algorithms

  • Iterative Policy Evaluation (parallel, in 4.1, p.59)
  • Value Iteration to determine V*(s) (parallel, in 4.4, p.65)
  • Action-Value Iteration to determine Q*(s,a) (parallel)
  • First Visit Policy Evaluation (in 5.1, p.74)
  • Monte Carlo Exploring Starts (in 5.3, p.79)
  • Constant-alpha Monte Carlo
  • Tabular Temporal Difference (in 6.1, p.96)
  • Sarsa: An on-policy TD control algorithm (in 6.4, p.104)
  • Q-learning: An off-policy TD control algorithm (in 6.5, p.105)
  • Expected Sarsa (in 6.6, p.107)
  • Double Sarsa, Double Expected Sarsa, Double Q-Learning (in 6.7, p.109)
  • n-step Temporal Difference for estimating V(s) (in 7.1, p.115)
  • n-step Sarsa, n-step Expected Sarsa, n-step Q-Learning (in 7.2, p.118)
  • Random-sample one-step tabular Q-planning (parallel, in 8.1, p.131)
  • Tabular Dyna-Q (in 8.2, p.133)
  • Prioritized Sweeping (in 8.4, p.137)
  • Semi-gradient Tabular Temporal Difference (in 9.3, p.164)
  • True Online Sarsa (in 12.8, p.309)
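
As a rough illustration of one entry in this list, the following sketch runs Value Iteration with synchronous sweeps against a problem that is described only through an interface, so the algorithm does not depend on the scenario. The `Model` interface, the `corridor` toy problem, and all method names are assumptions made for this example and do not reflect the subare API; the sketch also uses `double` for brevity and is restricted to deterministic transitions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustration only: Model and corridor are hypothetical, not subare classes. */
final class ValueIterationSketch {
  /** Deterministic problem description; states are numbered 0 .. states() - 1. */
  interface Model {
    int states();

    /** Available actions; an empty list marks a terminal state. */
    List<Integer> actions(int state);

    int move(int state, int action);

    double reward(int state, int action, int next);
  }

  /** Repeats synchronous sweeps until no value changes by more than threshold. */
  static double[] valueIteration(Model model, double gamma, double threshold) {
    double[] values = new double[model.states()];
    double delta;
    do {
      double[] next = new double[values.length];
      delta = 0;
      for (int s = 0; s < values.length; ++s) {
        List<Integer> actions = model.actions(s);
        if (actions.isEmpty())
          continue; // terminal states keep the value 0
        double best = Double.NEGATIVE_INFINITY;
        for (int a : actions) {
          int t = model.move(s, a);
          best = Math.max(best, model.reward(s, a, t) + gamma * values[t]);
        }
        next[s] = best;
        delta = Math.max(delta, Math.abs(best - values[s]));
      }
      values = next; // all states are updated from the previous sweep
    } while (delta > threshold);
    return values;
  }

  /** Toy scenario: a corridor with terminal states at both ends, reward -1 per step. */
  static Model corridor(int length) {
    return new Model() {
      @Override
      public int states() {
        return length;
      }

      @Override
      public List<Integer> actions(int state) {
        boolean terminal = state == 0 || state == length - 1;
        return terminal ? Collections.<Integer>emptyList() : Arrays.asList(-1, +1);
      }

      @Override
      public int move(int state, int action) {
        return state + action;
      }

      @Override
      public double reward(int state, int action, int next) {
        return -1;
      }
    };
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(valueIteration(corridor(5), 1.0, 1e-9)));
  }
}
```

Because `valueIteration` only talks to `Model`, a different scenario can be plugged in without touching the algorithm, which is the kind of separation described in the introduction.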

Gallery

prisonersdilemma

Prisoner's Dilemma

gambler_exact

Exact Gambler

Examples

4.1 Gridworld

AV-Iteration q(s,a)

gridworld_qsa_avi

TabularQPlan

gridworld_qsa_rstqp

Monte Carlo

gridworld_qsa_mces

Q-Learning

gridworld_qsa_qlearning

Expected-Sarsa

gridworld_qsa_expected

Sarsa

gridworld_qsa_original

3-step Q-Learning

gridworld_qsa_qlearning3

3-step E-Sarsa

gridworld_qsa_expected3

3-step Sarsa

gridworld_qsa_original3

OTrue Online Sarsa

gridworld_tos_original

ETrue Online Sarsa

gridworld_tos_expected

QTrue Online Sarsa

gridworld_tos_qlearning

4.2 Jack's car rental

Value Iteration v(s)

carrental_vi_true

4.4 Gambler's problem

Value Iteration v(s)

gambler_sv

Action Value Iteration and optimal policy

gambler_avi

Monte Carlo q(s,a)

gambler_qsa_mces

ESarsa q(s,a)

gambler_qsa_esarsa

QLearning q(s,a)

gambler_qsa_qlearn

5.1 Blackjack

Monte Carlo Exploring Starts

blackjack_mces

5.2 Wireloop

AV-Iteration

wire5_avi

TabularQPlan

wire5_qsa_rstqp

Q-Learning

wire5_qsa_qlearning

E-Sarsa

wire5_qsa_expected

Sarsa

wire5_qsa_original

Monte Carlo

wire5_mces

5.8 Racetrack

paths obtained using value iteration

track 1

track1

track 2

track2

6.5 Windygrid

Action Value Iteration

windygrid_qsa_avi

TabularQPlan

windygrid_qsa_rstqp

6.6 Cliffwalk

Action Value Iteration

cliffwalk_qsa_avi

Q-Learning

cliffwalk_qsa_qlearning

TabularQPlan

cliffwalk_qsa_rstqp

Expected Sarsa

cliffwalk_qsa_expected

8.1 Dynamaze

Action Value Iteration

maze5_qsa_avi

Prioritized sweeping

maze2_ps_qlearning


Additional Examples

Repeated Prisoner's dilemma

Exact expected reward of two adversarial optimistic agents depending on their initial configuration:

opts

Exact expected reward of two adversarial Upper-Confidence-Bound agents depending on their initial configuration:

ucbs

Integration

Specify the dependency and repository of the subare library in the pom.xml file of your Maven project:

<dependencies>
  <dependency>
    <groupId>ch.ethz.idsc</groupId>
    <artifactId>subare</artifactId>
    <version>0.3.8</version>
  </dependency>
</dependencies>

<repositories>
  <repository>
    <id>subare-mvn-repo</id>
    <url>https://raw.github.com/idsc-frazzoli/subare/mvn-repo/</url>
    <snapshots>
      <enabled>true</enabled>
      <updatePolicy>always</updatePolicy>
    </snapshots>
  </repository>
</repositories>

The source code is attached to every release.

Contributors

Jan Hakenberg, Christian Fluri

Publications

References
