# Thompson Sampling
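Thompson sampling is an algorithm for solving [multi-armed bandit problems](multi_armed_bandit.ipynb). When each play is a win or a loss, it models every `arm`'s unknown payout probability with a [beta distribution](../../beta_distribution.ipynb). It then draws one sample from each arm's distribution and plays the `arm` whose sample is largest.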
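
The update rule can be written out as follows. This assumes a uniform $\mathrm{Beta}(1, 1)$ prior on each arm's payout probability $\theta_i$, which is what the `+ 1` terms in the demo code correspond to. After $S_i$ wins and $F_i$ losses on arm $i$, conjugacy gives a beta posterior. Each trial then plays the arm with the largest posterior sample $\tilde\theta_i$:

$$
\theta_i \sim \mathrm{Beta}(1, 1)
\;\Longrightarrow\;
\theta_i \mid S_i, F_i \sim \mathrm{Beta}(S_i + 1,\; F_i + 1),
\qquad
\mathbb{E}[\theta_i \mid S_i, F_i] = \frac{S_i + 1}{S_i + F_i + 2},
\qquad
\text{play } \arg\max_i \tilde\theta_i,\;\; \tilde\theta_i \sim \mathrm{Beta}(S_i + 1,\; F_i + 1)
$$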

Below is an example of a `Bernoulli multi-armed bandit`. Each of three machines pays out (a win) with its own fixed probability and otherwise loses.

```python
# this code was adapted from the first reference listed below
import numpy as np

print("Begin Thompson sampling demo ")
print("Goal is to maximize payout from three machines")

N = 3  # number of machines
means = np.array([0.3, 0.7, 0.5])  # true probability that each machine pays out
print(f"Machines pay out with probs {means}")

probs = np.zeros(N)  # probabilities sampled at each trial will go here
S = np.zeros(N, dtype=int)  # cumulative number of successes per machine
F = np.zeros(N, dtype=int)  # cumulative number of failures per machine
rnd = np.random.RandomState(7)

for trial in range(10):
    print("\ntrial " + str(trial))

    # draw one sample per machine from Beta(successes + 1, failures + 1);
    # the "+ 1" terms correspond to a uniform Beta(1, 1) prior
    for i in range(N):
        probs[i] = rnd.beta(S[i] + 1, F[i] + 1)

    print("sampling probs =  ", end="")
    for i in range(N):
        print(f"{probs[i]:0.4f} ", end="")
    print()

    # play the machine with the largest sampled probability
    machine = np.argmax(probs)
    print("playing machine " + str(machine), end="")

    p = rnd.random_sample()  # uniform in [0.0, 1.0)

    # the chosen machine pays out with probability means[machine]
    if p < means[machine]:
        print(" -- win")
        S[machine] += 1
    else:
        print(" -- lose")
        F[machine] += 1

    # success/failure counts after this trial
    print("final success vector: ", end="")
    print(S)
    print("final failure vector: ", end="")
    print(F)
```

```
Begin Thompson sampling demo 
Goal is to maximize payout from three machines
Machines pay out with probs [0.3 0.7 0.5]

trial 0
sampling probs =  0.0891 0.8743 0.3494 
playing machine 1 -- win
final success vector: [0 1 0]
final failure vector: [0 0 0]

trial 1
sampling probs =  0.1862 0.3810 0.0398 
playing machine 1 -- lose
final success vector: [0 1 0]
final failure vector: [0 1 0]

trial 2
sampling probs =  0.2957 0.4906 0.2945 
playing machine 1 -- win
final success vector: [0 2 0]
final failure vector: [0 1 0]

trial 3
sampling probs =  0.5661 0.5842 0.3787 
playing machine 1 -- win
final success vector: [0 3 0]
final failure vector: [0 1 0]

trial 4
sampling probs =  0.1958 0.5250 0.9966 
playing machine 2 -- win
final success vector: [0 3 1]
final failure vector: [0 1 0]

trial 5
sampling probs =  0.9278 0.8285 0.5601 
playing machine 0 -- lose
final success vector: [0 3 1]
final failure vector: [1 1 0]

trial 6
sampling probs =  0.6043 0.8013 0.8184 
playing machine 2 -- win
final success vector: [0 3 2]
final failure vector: [1 1 0]
```
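
Ten trials are too few to see the algorithm settle on the best machine. The sketch below wraps the same steps in a function so it can be run for many trials. It is a minimal sketch, not the reference's code: the function name `thompson_bernoulli` and its parameters are hypothetical. It also uses NumPy's newer `np.random.default_rng` generator instead of the legacy `RandomState`, so its random draws will not match the demo output above.

```python
import numpy as np

def thompson_bernoulli(means, n_trials, seed=0):
    """Hypothetical helper: Thompson sampling on a Bernoulli bandit.

    Assumes a uniform Beta(1, 1) prior on each arm's payout probability.
    `means` holds the true payout probabilities, which the algorithm never
    reads directly; they are used only to simulate wins and losses.
    Returns (successes, failures, plays), one entry per arm.
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    n_arms = len(means)
    successes = np.zeros(n_arms, dtype=int)
    failures = np.zeros(n_arms, dtype=int)
    plays = np.zeros(n_arms, dtype=int)

    for _ in range(n_trials):
        # one posterior sample per arm from Beta(successes + 1, failures + 1)
        samples = rng.beta(successes + 1, failures + 1)
        arm = int(np.argmax(samples))  # play the arm with the largest sample
        plays[arm] += 1
        # simulate the payout: a win with probability means[arm]
        if rng.random() < means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return successes, failures, plays

S, F, plays = thompson_bernoulli([0.3, 0.7, 0.5], n_trials=2000, seed=7)
print("plays per machine:", plays)
print("posterior means:", (S + 1) / (S + F + 2))
```

Over a long run, most plays should go to machine 1, the one with the highest true payout probability. The posterior mean for that machine should also approach 0.7. The other machines are still sampled occasionally, because their beta distributions stay wide when they have few observations.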

### References:
* [Visual Studio Magazine](https://visualstudiomagazine.com/articles/2019/06/01/thompson-sampling.aspx)