For this analysis we use two variations of the Thompson Sampling (TS) algorithm. <br>In the first experiment, the server first allocates views randomly between buttons A and B so that we can observe current user behavior. The algorithm then takes over and assigns views according to the collected data. <br>This minimum exploration period lets us measure current user behavior without relying on any historical data, which is why we start from uninformative Beta priors.

To set this up, we run `thompson_server.py` with the following settings: both bandits are initialized with `initial_exploration=True`, and bandit A uses `a_prior=1` and `b_prior=1`, i.e. a uniform Beta(1, 1) prior.
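The allocation scheme described above (random allocation first, then Thompson Sampling) can be illustrated with a small simulation. This is a minimal sketch, not the code in `thompson_server.py`: the function `simulate_thompson` and its `n_explore` parameter (how many views are assigned uniformly at random before sampling takes over) are hypothetical, and clicks are simulated as independent Bernoulli draws from a fixed true CTR per button.

```python
import numpy as np


def simulate_thompson(true_ctr, n_views, n_explore=0, priors=((1, 1), (1, 1)), seed=None):
    """Simulate Beta-Bernoulli Thompson Sampling over len(true_ctr) buttons.

    Hypothetical helper for illustration. The first `n_explore` views are
    assigned uniformly at random; after that, each view goes to the button
    with the largest draw from its Beta posterior.
    """
    rng = np.random.default_rng(seed)
    k = len(true_ctr)
    alpha = np.array([p[0] for p in priors], dtype=float)  # prior "clicks" + observed clicks
    beta = np.array([p[1] for p in priors], dtype=float)   # prior "misses" + observed misses
    views = np.zeros(k, dtype=int)
    clicks = np.zeros(k, dtype=int)
    for t in range(n_views):
        if t < n_explore:
            arm = int(rng.integers(k))                    # exploration: random allocation
        else:
            arm = int(np.argmax(rng.beta(alpha, beta)))   # one posterior draw per button
        clicked = rng.random() < true_ctr[arm]            # simulated user click
        views[arm] += 1
        clicks[arm] += clicked
        alpha[arm] += clicked                             # a click updates alpha
        beta[arm] += 1 - clicked                          # a non-click updates beta
    return views, clicks, alpha, beta


# Illustrative run with the true CTRs from the results table; n_explore=200 is arbitrary
views, clicks, alpha, beta = simulate_thompson((0.072, 0.102), n_views=2000, n_explore=200, seed=42)
print("views:", views, "clicks:", clicks)
```

Setting `n_explore=0` and passing an informative prior for A, e.g. `priors=((6, 78), (1, 1))`, mimics the configuration of the second experiment.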

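```python
import os
from pathlib import Path

import numpy as np
import pandas as pd

# os.chdir(Path.cwd().parent)  # uncomment if the notebook is started one level below the project root

# The CSV holds a single row of summary statistics; turn it into a labelled Series
results = pd.read_csv("data/simulation_results.csv")
results = pd.Series(results.values[0], index=results.columns)
results
```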

```
algorithm          Thompson
true_ctr_a            0.072
true_ctr_b            0.102
bandit_ctr_a       0.048309
bandit_ctr_b       0.100334
button_a_clicks          10
button_a_views          207
button_b_clicks         180
button_b_views         1794
total_regret         14.102
a_alpha                  11
a_beta                  198
b_alpha                 181
b_beta                 1615
dtype: object
```
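The `a_alpha` to `b_beta` entries are consistent with the standard conjugate Beta-Bernoulli update: starting from a Beta(α₀, β₀) prior, every click adds 1 to α and every view without a click adds 1 to β. With the uniform Beta(1, 1) prior, button A's 10 clicks in 207 views give α = 1 + 10 = 11 and β = 1 + 197 = 198, matching the table. A minimal helper for this check (the name `posterior_params` is ours, not part of `thompson_server.py`):

```python
def posterior_params(prior_alpha, prior_beta, clicks, views):
    """Conjugate Beta-Bernoulli update after `clicks` successes in `views` trials."""
    if not 0 <= clicks <= views:
        raise ValueError("clicks must be between 0 and views")
    return prior_alpha + clicks, prior_beta + (views - clicks)


# First experiment: both buttons start from the uniform Beta(1, 1) prior
print(posterior_params(1, 1, clicks=10, views=207))    # (11, 198)
print(posterior_params(1, 1, clicks=180, views=1794))  # (181, 1615)
```

The same update with bandit A's informative Beta(6, 78) prior and 1 click in 80 views reproduces the second experiment's `a_alpha = 7` and `a_beta = 157`.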

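```python
N_SAMPLES = 10000  # posterior draws per button
MDE = 0.02         # minimum detectable effect, in absolute CTR (2 percentage points)

# Draw from each button's Beta posterior
sample_a = np.random.beta(results.loc["a_alpha"], results.loc["a_beta"], N_SAMPLES)
sample_b = np.random.beta(results.loc["b_alpha"], results.loc["b_beta"], N_SAMPLES)

# Share of draws in which B's CTR exceeds A's by more than the MDE,
# i.e. a Monte Carlo estimate of P(ctr_B > ctr_A + MDE)
b_beats_a = sample_b > (sample_a + MDE)
b_beats_a.mean()
```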

```
np.float64(0.9424)
```
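Because 0.9424 is a Monte Carlo estimate, a deterministic cross-check can be useful. The probability can also be written as $P(p_B > p_A + \text{MDE}) = \int_0^1 f_A(x)\,\bigl(1 - F_B(x + \text{MDE})\bigr)\,dx$, where $f_A$ is A's posterior density and $F_B$ is B's posterior CDF. The sketch below evaluates that integral on a fine midpoint grid using only NumPy and the standard library; the function names are hypothetical, and the grid size is an assumption chosen to be far finer than the posterior widths here.

```python
import math

import numpy as np


def beta_pdf(x, a, b):
    """Density of Beta(a, b) on an array of points strictly inside (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return np.exp(log_norm + (a - 1) * np.log(x) + (b - 1) * np.log1p(-x))


def prob_b_beats_a_numeric(a_alpha, a_beta, b_alpha, b_beta, mde=0.02, n_grid=200_000):
    """Approximate P(p_B > p_A + mde) as the integral of f_A(x) * (1 - F_B(x + mde))."""
    if not 0 <= mde < 1:
        raise ValueError("mde must be in [0, 1)")
    dx = 1.0 / n_grid
    x = (np.arange(n_grid) + 0.5) * dx                     # cell midpoints, never exactly 0 or 1
    pdf_a = beta_pdf(x, a_alpha, a_beta)
    cdf_b = np.cumsum(beta_pdf(x, b_alpha, b_beta)) * dx   # running integral of f_B
    shift = int(round(mde / dx))                           # grid cells spanned by the MDE
    surv_b = np.zeros(n_grid)                              # 1 - F_B(x + mde); zero once x + mde >= 1
    surv_b[: n_grid - shift] = np.clip(1.0 - cdf_b[shift:], 0.0, 1.0)
    return float(np.sum(pdf_a * surv_b) * dx)


# Posterior parameters from the first experiment's results table
print(prob_b_beats_a_numeric(11, 198, 181, 1615, mde=0.02))
```

The result should agree with the sampled estimate up to Monte Carlo noise, and unlike the sampled value it does not change between runs.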

![Cumulative Reward](../data/figures/cumulative_reward.png)

![Posterior Grid](../data/figures/posterior_grid.png)

<hr>

For the second experiment we use our prior belief about the current CTR of button A and let the algorithm handle exploration on its own. We believe the CTR of A to be around 7%, so we initialize bandit A with `a_prior = 6` and `b_prior = 78`, a Beta(6, 78) prior with mean 6/84 ≈ 0.071, and set `minimum_exploration = False`. Bandit B keeps the uninformative Beta(1, 1) prior.
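The values 6 and 78 are consistent with the common reading of a Beta prior as pseudo-counts: with prior mean m and total strength n = α + β, set α = m·n and β = (1 − m)·n. For m = 0.07 and n = 6 + 78 = 84 this gives α = 5.88 ≈ 6 and β = 78.12 ≈ 78. A minimal sketch; the helper names are hypothetical, and how the strength of 84 was chosen is not stated here.

```python
import math


def beta_prior_from_mean(mean, strength):
    """Beta(alpha, beta) prior with the given mean and total pseudo-count
    strength = alpha + beta (roughly, how many past views the prior is worth)."""
    if not (0 < mean < 1) or strength <= 0:
        raise ValueError("need 0 < mean < 1 and strength > 0")
    return mean * strength, (1 - mean) * strength


def beta_mean_sd(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    n = alpha + beta
    return alpha / n, math.sqrt(alpha * beta / (n ** 2 * (n + 1)))


alpha, beta = beta_prior_from_mean(0.07, 84)
print(round(alpha), round(beta))   # 6 78
print(beta_mean_sd(6, 78))         # mean about 0.071, sd about 0.028
```

In the results below, button A receives only 80 real views, so the 84 pseudo-views encoded by this prior weigh about as much as the observed data for A.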

```python
# Reload the results file, which now holds the output of the second simulation run
results = pd.read_csv("data/simulation_results.csv")
results = pd.Series(results.values[0], index=results.columns)
results
```

```
algorithm          Thompson
true_ctr_a            0.072
true_ctr_b            0.102
bandit_ctr_a         0.0125
bandit_ctr_b          0.102
button_a_clicks           1
button_a_views           80
button_b_clicks         255
button_b_views         2500
total_regret           7.16
a_alpha                   7
a_beta                  157
b_alpha                 256
b_beta                 2246
dtype: object
```

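```python
N_SAMPLES = 10000  # posterior draws per button
MDE = 0.02         # minimum detectable effect, in absolute CTR (2 percentage points)

# Same posterior comparison as in the first experiment
sample_a = np.random.beta(results.loc["a_alpha"], results.loc["a_beta"], N_SAMPLES)
sample_b = np.random.beta(results.loc["b_alpha"], results.loc["b_beta"], N_SAMPLES)

b_beats_a = sample_b > (sample_a + MDE)
b_beats_a.mean()
```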

```
np.float64(0.9814)
```
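Both probabilities come from unseeded draws, so rerunning the cells gives slightly different values. With N = 10,000 draws the standard error of such an estimate is $\sqrt{p(1-p)/N}$, about 0.0023 for p = 0.9424 and about 0.0014 for p = 0.9814, so the gap between the two experiments is well outside Monte Carlo noise. A minimal sketch of a reproducible version that also reports this error (the function name is hypothetical; it uses NumPy's `default_rng` instead of the global `np.random` state):

```python
import numpy as np


def prob_b_beats_a_mc(a_alpha, a_beta, b_alpha, b_beta, mde=0.02, n_samples=10_000, seed=None):
    """Monte Carlo estimate of P(p_B > p_A + mde) and its standard error."""
    rng = np.random.default_rng(seed)             # seeded generator gives reproducible draws
    sample_a = rng.beta(a_alpha, a_beta, n_samples)
    sample_b = rng.beta(b_alpha, b_beta, n_samples)
    p = float(np.mean(sample_b > sample_a + mde))
    se = float(np.sqrt(p * (1 - p) / n_samples))  # binomial standard error of a proportion
    return p, se


# Posterior parameters from the second experiment's results table
p, se = prob_b_beats_a_mc(7, 157, 256, 2246, seed=0)
print(f"P(B > A + MDE) = {p:.4f} +/- {se:.4f}")
```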

![Cumulative Reward](../data/figures/cumulative_reward.png)

![Posterior Grid](../data/figures/posterior_grid.png)