Multi-Armed Bandit Algorithms

RNN (LSTM or GRU)

  • predicts the opponent's policy distribution from the sequence of previous moves
  • hard to learn all possible opponent strategies (deeper network?); see the sketch below
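
A minimal sketch of such a predictor, assuming PyTorch and a three-move game (e.g. rock-paper-scissors); the `MovePredictor` name and layer sizes are illustrative, not the repo's actual model:

```python
import torch
import torch.nn as nn

class MovePredictor(nn.Module):
    # GRU over the opponent's move history; the head outputs logits for
    # their next move. Sizes here are illustrative assumptions.
    def __init__(self, n_moves=3, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(n_moves, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_moves)

    def forward(self, moves):
        # moves: (batch, seq_len) tensor of move indices
        x = self.embed(moves)
        out, _ = self.gru(x)
        return self.head(out[:, -1])  # logits for the next move
```

A predicted distribution is then `MovePredictor()(history).softmax(-1)`.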

UCB (Upper Confidence Bound) [1]

  • balances exploration and exploitation via a confidence bonus (sketch below)
  • deterministic, so predictable by the opponent
  • needs a high exploration constant and/or frequent resets
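
A minimal UCB1 sketch; the exploration constant `c` is exposed as a parameter since, as noted above, a high value helps against adaptive opponents:

```python
import math

def ucb1_select(counts, rewards, c=2.0):
    # counts[i]: plays of arm i; rewards[i]: cumulative reward of arm i.
    # Play every arm once before applying the confidence bound.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    t = sum(counts)
    return max(
        range(len(counts)),
        key=lambda i: rewards[i] / counts[i]
        + math.sqrt(c * math.log(t) / counts[i]),
    )
```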

PUCB (Predictor + Upper Confidence Bound) [2]

  • a predictor (ideally) detects changes in the opponent's strategy and biases exploration accordingly; see the sketch below
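
A sketch of predictor-guided selection; note this uses the PUCT-style bonus popularized by AlphaGo-like search, not Rosin's exact PUCB formula, and `prior` is assumed to come from a predictor such as the RNN above:

```python
import math

def pucb_select(counts, rewards, prior, c=1.0):
    # prior[i]: predictor's probability that arm i is best.
    # PUCT-style variant, a common stand-in for Rosin's PUCB rule.
    t = sum(counts)
    return max(
        range(len(counts)),
        key=lambda i: (rewards[i] / counts[i] if counts[i] else 0.0)
        + c * prior[i] * math.sqrt(t) / (1 + counts[i]),
    )
```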

SER4 (Successive Elimination with Randomized Round-Robin and Resets) [3]

  • runs several randomized trials to find the move with the highest mean reward (sketch below)
  • assumes a constant opponent distribution
  • performs poorly against high-variance strategies
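
A sketch of a single elimination round under simplifying assumptions; the fixed `n_trials` and elimination gap `eps` are illustrative, where the paper derives them from confidence bounds, and `pull(arm)` is a hypothetical reward callback:

```python
import random

def ser_round(arms, pull, n_trials=50, eps=0.05):
    # Play every surviving arm n_trials times in a randomized round-robin
    # (shuffled each pass so the schedule is not predictable), then drop
    # arms whose empirical mean falls more than eps below the best.
    totals = {a: 0.0 for a in arms}
    for _ in range(n_trials):
        schedule = list(arms)
        random.shuffle(schedule)
        for a in schedule:
            totals[a] += pull(a)
    means = {a: totals[a] / n_trials for a in arms}
    best = max(means.values())
    return [a for a in arms if means[a] >= best - eps]
```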

EXP3.R (EXP3 with Resets) [3][4]

  • updates arm probabilities from importance-weighted mean reward estimates and a uniform prior
  • resets when the detected drift in the maximum mean reward exceeds a threshold
  • good against exploitation-biased strategies (sketch below)
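
A sketch of the EXP3 core with a reset hook; the drift-detection test that would trigger `reset()` in EXP3.R is omitted here:

```python
import math
import random

class Exp3R:
    # EXP3 weights with exponential updates; EXP3.R re-initializes the
    # weights when the estimated maximum mean reward drifts too far.
    def __init__(self, n_arms, gamma=0.1):
        self.n, self.gamma = n_arms, gamma
        self.weights = [1.0] * n_arms

    def probs(self):
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n
                for w in self.weights]

    def select(self):
        return random.choices(range(self.n), weights=self.probs())[0]

    def update(self, arm, reward):
        # Importance-weighted reward estimate, then exponential update.
        est = reward / self.probs()[arm]
        self.weights[arm] *= math.exp(self.gamma * est / self.n)

    def reset(self):
        self.weights = [1.0] * self.n
```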

Bayesian (Thompson Sampling) [5]

  • uses a Beta distribution to model each arm's reward probability, updating it after every observation (sketch below)
  • assumes a constant opponent distribution
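
A minimal sketch, assuming Bernoulli (win/loss) rewards and uniform Beta(1, 1) priors:

```python
import random

class BetaThompson:
    # Beta posteriors over Bernoulli reward probabilities: sample each
    # posterior and play the arm with the largest sample.
    def __init__(self, n_arms):
        self.alpha = [1.0] * n_arms  # success pseudo-counts
        self.beta = [1.0] * n_arms   # failure pseudo-counts

    def select(self):
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # reward assumed to be 0 or 1
        if reward:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1
```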

References

[1]: https://link.springer.com/article/10.1023/A:1013689704352

[2]: https://link.springer.com/article/10.1007/s10472-011-9258-6

[3]: https://link.springer.com/article/10.1007/s41060-017-0050-5

[4]: https://cseweb.ucsd.edu/~yfreund/papers/bandits.pdf

[5]: https://arxiv.org/pdf/1707.02038.pdf
