Play with the solutions to the multi-armed-bandit problem.
Lilian Weng
Latest commit 6818c91 Jan 27, 2018


This repo is set up for a blog post I wrote on "The Multi-Armed Bandit Problem and Its Solutions".

Below is the result of a small experiment on solving a Bernoulli bandit with K = 10 slot machines, each with a randomly initialized reward probability.

(Figure: results of the experiment, in three panels.)

  • (Left) Cumulative regret plotted against the time step.
  • (Middle) Estimated reward probability plotted against the true reward probability.
  • (Right) The fraction of steps on which each action is picked during the 5000-step run.
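
For readers who want to see how the three plotted quantities could be computed, here is a minimal sketch of such an experiment. It is not the code in this repo: the names `BernoulliBandit` and `run_epsilon_greedy` are hypothetical, and ε-greedy with ε = 0.1 is chosen only as an example strategy. It assumes NumPy is installed.

```python
import numpy as np


class BernoulliBandit:
    """K slot machines; arm i pays 1 with probability probas[i], else 0.
    (Hypothetical class for illustration, not the repo's implementation.)"""

    def __init__(self, k, rng):
        self.k = k
        self.rng = rng
        self.probas = rng.random(k)  # randomly initialized reward probabilities
        self.best_proba = self.probas.max()

    def pull(self, i):
        # Sample a 0/1 reward from arm i.
        return int(self.rng.random() < self.probas[i])


def run_epsilon_greedy(bandit, n_steps, eps, rng):
    """Run epsilon-greedy (an assumed example strategy) for n_steps."""
    counts = np.zeros(bandit.k, dtype=int)  # times each arm was picked
    estimates = np.zeros(bandit.k)          # running mean reward per arm
    cumulative_regret = np.zeros(n_steps)
    regret = 0.0
    for t in range(n_steps):
        if rng.random() < eps:
            i = int(rng.integers(bandit.k))  # explore: uniformly random arm
        else:
            i = int(np.argmax(estimates))    # exploit: best estimate so far
        r = bandit.pull(i)
        counts[i] += 1
        estimates[i] += (r - estimates[i]) / counts[i]  # incremental mean
        # Regret of this step: gap between the best arm and the chosen arm.
        regret += bandit.best_proba - bandit.probas[i]
        cumulative_regret[t] = regret
    return cumulative_regret, estimates, counts / n_steps


rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
bandit = BernoulliBandit(10, rng)
regrets, estimates, fractions = run_epsilon_greedy(bandit, 5000, 0.1, rng)
```

In this sketch, `regrets` corresponds to the left panel, `estimates` against `bandit.probas` to the middle panel, and `fractions` to the right panel.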