# The Thompson Sampling model
Here is a simple example of Reinforcement Learning using the Thompson Sampling model. We aim to build a model that identifies, among five slot machines, the one that offers the highest probability of winning.

In [1]:
# 1. Importing the libraries
import numpy as np


In [2]:
# 2. Setting conversion rates and the number of samples
conversionRates = [0.15, 0.04, 0.13, 0.11, 0.05]  # probability of winning on each slot machine
N = 10000  # number of attempts (rounds)
d = len(conversionRates)  # number of slot machines

In [3]:
# Creating the dataset: X[i, j] = 1 means machine j pays out at round i
X = np.zeros((N, d))
for i in range(N):
    for j in range(d):
        # A uniform random float in [0, 1) falls below the conversion rate with probability equal to that rate, so machine j wins at round i
        if np.random.rand() < conversionRates[j]:
            X[i, j] = 1
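
The nested loop above can also be written as a single array comparison. The following is a minimal sketch, not part of the original notebook: the names `conversion_rates`, `n_rounds`, `rng`, and `X_vec`, as well as the seed value, are assumptions chosen for illustration.

```python
import numpy as np

conversion_rates = np.array([0.15, 0.04, 0.13, 0.11, 0.05])
n_rounds = 10000

# Seeded generator (the seed is an arbitrary choice, used only for reproducibility)
rng = np.random.default_rng(0)

# Draw an (n_rounds, d) matrix of uniform floats and compare each column
# against its machine's conversion rate; broadcasting handles the columns.
X_vec = (rng.random((n_rounds, conversion_rates.size)) < conversion_rates).astype(float)

# Each column's observed win frequency should be close to its conversion rate
print(X_vec.mean(axis=0))
```

This produces a dataset with the same shape and meaning as `X`, although the individual random values differ from those produced by the loop.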


In [4]:
# Making arrays to count each machine's wins (positive rewards) and losses (negative rewards)
nPosReward = np.zeros(d)
nNegReward = np.zeros(d)


Thompson Sampling uses a probability distribution called the Beta distribution (distributions are explained further in this chapter), which takes two parameters. For simplicity's sake, let's say that the higher the first parameter is, the better our slot machine is, and the higher the second parameter is, the worse our slot machine is. In the code that follows, the first parameter is a machine's number of wins plus one, and the second is its number of losses plus one.

$x \sim \mathrm{Beta}(a, b)$
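
To make the effect of the two parameters concrete, here is a small sketch that is not part of the original notebook. It relies on the standard fact that the mean of $\mathrm{Beta}(a, b)$ is $a / (a + b)$. The win and loss counts, the helper `beta_mean`, and the seed are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical (wins, losses) counts: no data, a little data, ten times more data
scenarios = [(0, 0), (3, 17), (30, 170)]
stds = []
for wins, losses in scenarios:
    a, b = wins + 1, losses + 1  # same "+ 1" convention as the notebook
    samples = rng.beta(a, b, size=100_000)
    stds.append(samples.std())
    print(f"Beta({a}, {b}): mean = {beta_mean(a, b):.3f}, sample std = {samples.std():.3f}")
```

With no observations the distribution is uniform on [0, 1], so any value is equally plausible. As wins and losses accumulate, the samples concentrate around the observed win rate, which is why a well-explored machine produces less variable draws.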

In [5]:
# Selecting a slot machine by sampling from each machine's Beta distribution, then updating its wins and losses
for i in range(N):
    selected = 0
    maxRandom = 0

    # Draw one random value per machine and keep the machine with the largest draw
    for j in range(d):
        randomBeta = np.random.beta(nPosReward[j] + 1, nNegReward[j] + 1)
        if randomBeta > maxRandom:
            maxRandom = randomBeta
            selected = j

    # Play the selected machine at round i and record the outcome
    if X[i, selected] == 1:
        nPosReward[selected] += 1
    else:
        nNegReward[selected] += 1
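
The selection loop can be packaged as a reusable function in which the inner loop over machines is replaced by one vectorized Beta draw followed by `np.argmax`. This is a sketch under assumptions rather than the notebook's own code: the function name `thompson_sampling`, its `seed` parameter, and the variable names are all hypothetical.

```python
import numpy as np


def thompson_sampling(rewards, seed=None):
    """Run Thompson Sampling over a 0/1 reward matrix of shape (rounds, machines).

    Returns two arrays: wins and losses recorded for each machine.
    """
    rng = np.random.default_rng(seed)
    n_rounds, n_machines = rewards.shape
    wins = np.zeros(n_machines)
    losses = np.zeros(n_machines)

    for t in range(n_rounds):
        # One Beta draw per machine, using the same "+ 1" convention as above
        draws = rng.beta(wins + 1, losses + 1)
        chosen = int(np.argmax(draws))
        if rewards[t, chosen] == 1:
            wins[chosen] += 1
        else:
            losses[chosen] += 1

    return wins, losses


# Usage: simulate rewards with the notebook's conversion rates, then run the sampler
rates = np.array([0.15, 0.04, 0.13, 0.11, 0.05])
data_rng = np.random.default_rng(1)  # arbitrary seed
rewards = (data_rng.random((10000, rates.size)) < rates).astype(float)
wins, losses = thompson_sampling(rewards, seed=2)
print(wins + losses)  # number of times each machine was selected
```

Because the draws are random, the exact selection counts vary with the seeds and will not match the notebook's output line for line.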

In [6]:
# Showing which slot machine is considered the best (the one selected most often)
nSelected = nPosReward + nNegReward
for i in range(d):
    print(f'Machine number {i + 1} was selected {nSelected[i]} times')
print(f'Conclusion: Best machine is machine number {np.argmax(nSelected) + 1}')


Machine number 1 was selected 8646.0 times
Machine number 2 was selected 61.0 times
Machine number 3 was selected 942.0 times
Machine number 4 was selected 289.0 times
Machine number 5 was selected 62.0 times
Conclusion: Best machine is machine number 1
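
One way to judge this result is to compare the expected number of wins under the selection counts printed above with two baselines: always playing the best machine, and spreading plays evenly across all five. The sketch below is an illustrative addition, not part of the original notebook. It uses only the conversion rates and the printed counts, and its variable names are assumptions.

```python
import numpy as np

conversion_rates = np.array([0.15, 0.04, 0.13, 0.11, 0.05])
n_selected = np.array([8646, 61, 942, 289, 62])  # counts taken from the output above
n_rounds = n_selected.sum()

# Share of all plays given to each machine
selection_share = n_selected / n_rounds

# Expected wins: plays on each machine times that machine's conversion rate
expected_thompson = float(n_selected @ conversion_rates)
expected_best = float(n_rounds * conversion_rates.max())     # always machine 1
expected_uniform = float(n_rounds * conversion_rates.mean())  # plays spread evenly

print(f"Share of plays on machine 1: {selection_share[0]:.2%}")
print(f"Expected wins (Thompson allocation): {expected_thompson:.2f}")
print(f"Expected wins (always best machine): {expected_best:.2f}")
print(f"Expected wins (uniform allocation):  {expected_uniform:.2f}")
```

For this particular run, the allocation works out to roughly 1457 expected wins, compared with 1500 for always playing machine 1 and 960 for playing all machines equally often.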
