RL algorithms applied from scratch on the Easy21 card game

zj-0/Easy21-RL

Easy 21 RL

Different RL algorithms implemented from scratch for the Easy21 card game.

This is based on the Easy21 assignment from David Silver's RL course.

Monte Carlo Control

$V^{\ast}(s) = \max_a Q^{\ast}(s,a)$

$\text{With } \epsilon \text{-greedy exploration strategy: }\epsilon = N_0 / (N_0 + N(s_t)) \text{, where } N_0 = 100 \text{ is a constant} $

$\text{With a time-varying scalar step-size of } \alpha_{t} = 1/N(s_t, a_t) $
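The schedule and step size above can be sketched as follows. This is a minimal illustration, not the contents of `monte_carlo.py`: the table shape (dealer 1-10, player 1-21, two actions), the every-visit update, and all function names are assumptions made for the example.

```python
import numpy as np

N0 = 100  # constant from the exploration schedule
# Assumed table shape: dealer card 1-10, player sum 1-21, actions hit/stick.
N_DEALER, N_PLAYER, N_ACTIONS = 10, 21, 2


def make_tables():
    """Return zero-initialised Q(s, a) and visit counts N(s, a)."""
    q = np.zeros((N_DEALER, N_PLAYER, N_ACTIONS))
    n_sa = np.zeros_like(q)
    return q, n_sa


def epsilon(n_sa, state):
    """epsilon = N0 / (N0 + N(s)), with N(s) summed over actions."""
    return N0 / (N0 + n_sa[state].sum())


def choose_action(q, n_sa, state, rng):
    """Epsilon-greedy action for a zero-based (dealer, player) index."""
    if rng.random() < epsilon(n_sa, state):
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q[state]))


def mc_update(q, n_sa, episode):
    """Every-visit Monte Carlo update, in place.

    episode: list of (state, action, return) triples, where return is
    the return observed from that step onward.
    """
    for state, action, ret in episode:
        idx = state + (action,)
        n_sa[idx] += 1
        alpha = 1.0 / n_sa[idx]  # alpha_t = 1 / N(s_t, a_t)
        q[idx] += alpha * (ret - q[idx])


def state_values(q):
    """V(s) = max_a Q(s, a)."""
    return q.max(axis=-1)
```

With $\alpha_t = 1/N(s_t, a_t)$, each entry of `q` is the running mean of the returns observed for that state-action pair.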

For 10,000,000 episodes:

(figure: monte-carlo)

To use: run monte_carlo.py

Sarsa($\lambda$)

$\text{With parameter values }\lambda \in \lbrace{0, 0.1, 0.2, ..., 1}\rbrace $

$\text{With } \epsilon \text{-greedy exploration strategy: }\epsilon = N_0 / (N_0 + N(s_t)) \text{, where } N_0 = 100 \text{ is a constant} $

$\text{With a time-varying scalar step-size of } \alpha_{t} = 1/N(s_t, a_t) $
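A single backward-view Sarsa($\lambda$) step under these settings might look like the sketch below. It is not taken from `sarsa.py`: the accumulating trace, the undiscounted `gamma=1.0` default, the use of `None` for a terminal next state, and the function name are assumptions for illustration.

```python
import numpy as np


def sarsa_lambda_step(q, e, n_sa, s, a, reward, s_next, a_next, lam, gamma=1.0):
    """Apply one tabular Sarsa(lambda) update in place.

    q, e, n_sa: arrays of identical shape (state dims + action dim)
    holding Q values, eligibility traces and visit counts.
    s, s_next: zero-based state index tuples; s_next is None at terminal.
    """
    sa = s + (a,)
    q_next = 0.0 if s_next is None else q[s_next + (a_next,)]
    delta = reward + gamma * q_next - q[sa]  # TD error

    n_sa[sa] += 1
    e[sa] += 1                   # accumulating trace (assumed)
    alpha = 1.0 / n_sa[sa]       # scalar step size alpha_t = 1 / N(s_t, a_t)

    q += alpha * delta * e       # every traced pair moves toward the target
    e *= gamma * lam             # decay all traces
```

Usage: reset `e` to zeros at the start of every episode, and call the function once per transition with the action chosen by the $\epsilon$-greedy policy for `s_next`.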

For 10,000 episodes:

(figure: td_mse_lambda)

(figure: td_mse_ep)

To use: run sarsa.py

Sarsa($\lambda$) with Linear Function Approximation

$\text{Binary feature vector }\phi(s, a) \text{ with } 3 \times 6 \times 2 = 36 \text{ features} $

$\text{Dealer(s) = } \lbrace{[1, 4], [4, 7], [7, 10]}\rbrace $

$\text{Player(s) = } \lbrace[1, 6], [4, 9], [7, 12], [10, 15], [13, 18], [16, 21]\rbrace $

$a = \lbrace{\text{hit}, \text{stick}}\rbrace $
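The coarse coding defined above can be written out as a short sketch. The function name `phi`, the inclusive interval bounds, and the ordering of the 36 entries (dealer, then player, then action) are assumptions for illustration and may differ from `sarsa_linear.py`.

```python
import numpy as np

DEALER_INTERVALS = [(1, 4), (4, 7), (7, 10)]
PLAYER_INTERVALS = [(1, 6), (4, 9), (7, 12), (10, 15), (13, 18), (16, 21)]
ACTIONS = ("hit", "stick")


def phi(dealer, player, action):
    """Return the 36-element binary feature vector for (state, action).

    A feature is 1 when the dealer card, the player sum and the action
    all fall in the corresponding cuboid (bounds treated as inclusive).
    """
    d = np.array([lo <= dealer <= hi for lo, hi in DEALER_INTERVALS], dtype=float)
    p = np.array([lo <= player <= hi for lo, hi in PLAYER_INTERVALS], dtype=float)
    a = np.array([action == name for name in ACTIONS], dtype=float)
    # Outer product of the three indicator vectors, flattened to length 36.
    return np.einsum("i,j,k->ijk", d, p, a).ravel()
```

Because the intervals overlap, more than one feature can be active at once: a dealer card of 4 lies in both [1, 4] and [4, 7], and a player sum of 4 lies in both [1, 6] and [4, 9], so `phi(4, 4, "hit")` has four active entries.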

$\text{With parameter values }\lambda \in \lbrace{0, 0.1, 0.2, ..., 1}\rbrace $

$\text{With } \epsilon \text{-greedy exploration strategy: }\epsilon = 0.05 $

$\text{With a constant step-size of } \alpha_{t} = 0.01 $
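With a linear approximator $\hat{Q}(s, a) = \phi(s, a)^\top \theta$, the update with these constants can be sketched as below. The names `theta`, `q_hat`, `epsilon_greedy` and `linear_sarsa_step`, the accumulating trace, and the undiscounted `gamma=1.0` default are assumptions for illustration, not the code in `sarsa_linear.py`.

```python
import numpy as np

ALPHA = 0.01    # constant step size
EPSILON = 0.05  # constant exploration rate


def q_hat(theta, features):
    """Linear action-value estimate: phi(s, a) . theta."""
    return float(features @ theta)


def epsilon_greedy(theta, action_features, rng):
    """Pick an action index given one feature vector per action."""
    if rng.random() < EPSILON:
        return int(rng.integers(len(action_features)))
    return int(np.argmax([q_hat(theta, f) for f in action_features]))


def linear_sarsa_step(theta, e, feat, reward, feat_next, lam, gamma=1.0):
    """One Sarsa(lambda) update of theta and trace e, both in place.

    feat: phi(s, a) for the current pair; feat_next: phi(s', a') for the
    next pair, or None when s' is terminal.
    """
    q_next = 0.0 if feat_next is None else q_hat(theta, feat_next)
    delta = reward + gamma * q_next - q_hat(theta, feat)  # TD error
    e *= gamma * lam   # decay traces
    e += feat          # gradient of q_hat w.r.t. theta is phi(s, a)
    theta += ALPHA * delta * e
```

Usage: keep `theta` across episodes and reset `e` to zeros at the start of each one.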

For 10,000 episodes:

(figure: td_linear_mse_l)

(figure: td_linear_mse_ep)

To use: run sarsa_linear.py

Dependencies

numpy, tqdm, matplotlib, pandas
