ch.ethz.idsc.subare

Library for reinforcement learning in Java, version 0.3.8

The repository includes algorithms, examples, and exercises from the 2nd edition of Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.

Our implementation is inspired by the Python code by Shangtong Zhang, but differs from that reference in two respects:

  • the algorithms are implemented separately from the problem scenarios
  • the math uses exact precision, so the results reproduce the symmetries of a problem whenever the problem is symmetric
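To illustrate the second point, the following minimal sketch compares floating-point and exact accumulation. The `Fraction` class is a hypothetical stand-in written only for this example; it is not the scalar type used by the library. The sketch adds the same three terms in mirrored order, which is what happens when two symmetric states accumulate the same contributions in a different sequence.

```java
import java.math.BigInteger;

public class ExactSymmetryDemo {
  /** Hypothetical minimal exact fraction for illustration; not part of the subare API. */
  public static final class Fraction {
    final BigInteger num;
    final BigInteger den;

    public Fraction(long num, long den) {
      this(BigInteger.valueOf(num), BigInteger.valueOf(den));
    }

    Fraction(BigInteger num, BigInteger den) {
      if (den.signum() == 0)
        throw new ArithmeticException("zero denominator");
      if (den.signum() < 0) { // keep the denominator positive
        num = num.negate();
        den = den.negate();
      }
      BigInteger gcd = num.gcd(den); // non-zero because den is non-zero
      this.num = num.divide(gcd);
      this.den = den.divide(gcd);
    }

    public Fraction plus(Fraction other) {
      return new Fraction( //
          num.multiply(other.den).add(other.num.multiply(den)), //
          den.multiply(other.den));
    }

    @Override
    public boolean equals(Object object) {
      if (!(object instanceof Fraction))
        return false;
      Fraction other = (Fraction) object;
      return num.equals(other.num) && den.equals(other.den);
    }

    @Override
    public int hashCode() {
      return 31 * num.hashCode() + den.hashCode();
    }

    @Override
    public String toString() {
      return num + "/" + den;
    }
  }

  /** Adds the terms left to right in double precision. */
  public static double sum(double[] terms) {
    double total = 0;
    for (double term : terms)
      total += term;
    return total;
  }

  /** Adds the terms left to right in exact arithmetic. */
  public static Fraction sum(Fraction[] terms) {
    Fraction total = new Fraction(0, 1);
    for (Fraction term : terms)
      total = total.plus(term);
    return total;
  }

  public static void main(String[] args) {
    // the same three terms, accumulated in mirrored order
    System.out.println(sum(new double[] { 0.1, 0.2, 0.3 })); // prints 0.6000000000000001
    System.out.println(sum(new double[] { 0.3, 0.2, 0.1 })); // prints 0.6
    Fraction a = new Fraction(1, 10), b = new Fraction(2, 10), c = new Fraction(3, 10);
    System.out.println(sum(new Fraction[] { a, b, c })); // prints 3/5
    System.out.println(sum(new Fraction[] { c, b, a })); // prints 3/5
  }
}
```

In double precision the two orders disagree in the last digit, so two states that should have equal values can end up slightly different. With exact fractions the order of accumulation has no effect.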

Algorithms

  • Iterative Policy Evaluation (parallel, in 4.1, p.59)
  • Value Iteration to determine V*(s) (parallel, in 4.4, p.65)
  • Action-Value Iteration to determine Q*(s,a) (parallel)
  • First Visit Policy Evaluation (in 5.1, p.74)
  • Monte Carlo Exploring Starts (in 5.3, p.79)
  • Constant-alpha Monte Carlo
  • Tabular Temporal Difference (in 6.1, p.96)
  • Sarsa: An on-policy TD control algorithm (in 6.4, p.104)
  • Q-learning: An off-policy TD control algorithm (in 6.5, p.105)
  • Expected Sarsa (in 6.6, p.107)
  • Double Sarsa, Double Expected Sarsa, Double Q-Learning (in 6.7, p.109)
  • n-step Temporal Difference for estimating V(s) (in 7.1, p.115)
  • n-step Sarsa, n-step Expected Sarsa, n-step Q-Learning (in 7.2, p.118)
  • Random-sample one-step tabular Q-planning (parallel, in 8.1, p.131)
  • Tabular Dyna-Q (in 8.2, p.133)
  • Prioritized Sweeping (in 8.4, p.137)
  • Semi-gradient Tabular Temporal Difference (in 9.3, p.164)
  • True Online Sarsa (in 12.8, p.309)
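As a rough illustration of how an algorithm from this list can be kept separate from the problem scenario, here is a minimal sketch of value iteration written against a small model interface. The names `Model`, `Outcome`, `valueIteration`, and `gambler` are hypothetical and chosen for this example only; they are not the library's API. The sketch uses `double` rather than exact precision to stay self-contained, and it assumes a heads probability of 0.4 for the gambler's problem.

```java
import java.util.ArrayList;
import java.util.List;

public class ValueIterationSketch {
  /** One possible result of taking an action: probability, successor state, reward. */
  public static final class Outcome {
    final double probability;
    final int next;
    final double reward;

    Outcome(double probability, int next, double reward) {
      this.probability = probability;
      this.next = next;
      this.reward = reward;
    }
  }

  /** Hypothetical problem description; the algorithm below only depends on this interface. */
  public interface Model {
    int states(); // states are numbered 0 .. states() - 1

    boolean isTerminal(int state);

    List<Integer> actions(int state);

    List<Outcome> outcomes(int state, int action);
  }

  /** In-place value iteration; sweeps until the largest change in a sweep is at most tol. */
  public static double[] valueIteration(Model model, double gamma, double tol) {
    double[] v = new double[model.states()]; // terminal states keep value 0
    double delta;
    do {
      delta = 0;
      for (int s = 0; s < v.length; ++s) {
        if (model.isTerminal(s))
          continue;
        double best = Double.NEGATIVE_INFINITY;
        for (int a : model.actions(s)) {
          double q = 0; // expected return of action a in state s
          for (Outcome o : model.outcomes(s, a))
            q += o.probability * (o.reward + gamma * v[o.next]);
          best = Math.max(best, q);
        }
        delta = Math.max(delta, Math.abs(best - v[s]));
        v[s] = best;
      }
    } while (delta > tol);
    return v;
  }

  /** Gambler's problem: state = capital, action = stake, reward 1 on reaching the goal. */
  public static Model gambler(final int goal, final double pHeads) {
    return new Model() {
      public int states() {
        return goal + 1;
      }

      public boolean isTerminal(int s) {
        return s == 0 || s == goal;
      }

      public List<Integer> actions(int s) {
        List<Integer> stakes = new ArrayList<>();
        for (int stake = 1; stake <= Math.min(s, goal - s); ++stake)
          stakes.add(stake);
        return stakes;
      }

      public List<Outcome> outcomes(int s, int stake) {
        List<Outcome> list = new ArrayList<>();
        int win = s + stake;
        list.add(new Outcome(pHeads, win, win == goal ? 1 : 0));
        list.add(new Outcome(1 - pHeads, s - stake, 0));
        return list;
      }
    };
  }

  public static void main(String[] args) {
    double[] v = valueIteration(gambler(100, 0.4), 1.0, 1e-12);
    System.out.println("v(25) = " + v[25]);
    System.out.println("v(50) = " + v[50]);
    System.out.println("v(75) = " + v[75]);
  }
}
```

Because `valueIteration` only sees the `Model` interface, the same method could be applied to a different scenario by supplying another implementation of that interface.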

Gallery

prisonersdilemma

Prisoner's Dilemma

gambler_exact

Exact Gambler

Examples

4.1 Gridworld

AV-Iteration q(s,a)

gridworld_qsa_avi

TabularQPlan

gridworld_qsa_rstqp

Monte Carlo

gridworld_qsa_mces

Q-Learning

gridworld_qsa_qlearning

Expected-Sarsa

gridworld_qsa_expected

Sarsa

gridworld_qsa_original

3-step Q-Learning

gridworld_qsa_qlearning3

3-step E-Sarsa

gridworld_qsa_expected3

3-step Sarsa

gridworld_qsa_original3

True Online Sarsa (original)

gridworld_tos_original

True Online Sarsa (expected)

gridworld_tos_expected

True Online Sarsa (Q-learning)

gridworld_tos_qlearning

4.2 Jack's car rental

Value Iteration v(s)

carrental_vi_true

4.4 Gambler's problem

Value Iteration v(s)

gambler_sv

Action Value Iteration and optimal policy

gambler_avi

Monte Carlo q(s,a)

gambler_qsa_mces

ESarsa q(s,a)

gambler_qsa_esarsa

QLearning q(s,a)

gambler_qsa_qlearn

5.1 Blackjack

Monte Carlo Exploring Starts

blackjack_mces

5.2 Wireloop

AV-Iteration

wire5_avi

TabularQPlan

wire5_qsa_rstqp

Q-Learning

wire5_qsa_qlearning

E-Sarsa

wire5_qsa_expected

Sarsa

wire5_qsa_original

Monte Carlo

wire5_mces

5.8 Racetrack

paths obtained using value iteration

track 1

track1

track 2

track2

6.5 Windygrid

Action Value Iteration

windygrid_qsa_avi

TabularQPlan

windygrid_qsa_rstqp

6.6 Cliffwalk

Action Value Iteration

cliffwalk_qsa_avi

Q-Learning

cliffwalk_qsa_qlearning

TabularQPlan

cliffwalk_qsa_rstqp

Expected Sarsa

cliffwalk_qsa_expected

8.1 Dynamaze

Action Value Iteration

maze5_qsa_avi

Prioritized sweeping

maze2_ps_qlearning


Additional Examples

Repeated Prisoner's dilemma

Exact expected reward of two adversarial optimistic agents depending on their initial configuration:

opts

Exact expected reward of two adversarial Upper-Confidence-Bound agents depending on their initial configuration:

ucbs

Integration

Specify the dependency and repository of the subare library in the pom.xml file of your Maven project:

<dependencies>
  <dependency>
    <groupId>ch.ethz.idsc</groupId>
    <artifactId>subare</artifactId>
    <version>0.3.8</version>
  </dependency>
</dependencies>

<repositories>
  <repository>
    <id>subare-mvn-repo</id>
    <url>https://raw.github.com/idsc-frazzoli/subare/mvn-repo/</url>
    <snapshots>
      <enabled>true</enabled>
      <updatePolicy>always</updatePolicy>
    </snapshots>
  </repository>
</repositories>

The source code is attached to every release.

Contributors

Jan Hakenberg, Christian Fluri

Publications

References

