# A/B Testing Simulation

In [7]:
from scipy.stats import beta,binom
import numpy as np


First we will define the baseline probability of outcome A. For simplicity, we will treat it as a click on a website button. <br> Let's say the current probability of getting a click is 5%.

In [8]:
p = .05

Next we will define the lift, the relative increase in click probability from A to B, as 1%. Multiplying by 1.01 raises the click probability from 5% to 5.05%.  <br>
B is our new feature, a redesign of button A.

In [9]:
lift = 1.01
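
To make the arithmetic concrete, here is a minimal sketch of what the multiplier implies. The names `p_b` and `absolute_gain` are introduced only for illustration and are not part of the notebook.

```python
p = 0.05      # baseline click probability for A
lift = 1.01   # relative lift: B's click probability is 1% higher than A's

# Hypothetical helper names, used only in this sketch
p_b = p * lift            # click probability for B, about 5.05%
absolute_gain = p_b - p   # gain in absolute terms, about 0.05 percentage points

print(p_b)
print(absolute_gain)
```

The absolute difference between the two designs is very small, which is why the simulation uses so many observations.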

Because getting a click is binary ("click" or "no click"), each observation is a draw from a binomial distribution with a single trial (a Bernoulli distribution), taking the value 0 or 1.<br>
We will simulate one million visitors to our website, where a click ("1") on button A has a probability of 5%, and we'll call this set A.

In [10]:
A = binom.rvs(1,p, size = 1000000)
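
As a sanity check on this sampling step, the sketch below draws a smaller sample and confirms that it contains only 0s and 1s, with a mean close to `p`. The smaller size and the fixed `random_state=0` seed are assumptions made here for speed and reproducibility; the notebook itself does not set a seed.

```python
import numpy as np
from scipy.stats import binom

p = 0.05

# One trial per visitor: each draw is 0 (no click) or 1 (click).
# Sample size and seed are illustrative choices, not the notebook's settings.
sample = binom.rvs(1, p, size=100000, random_state=0)

print(np.unique(sample))   # the distinct values present in the sample
print(sample.mean())       # observed click rate, which should be near p
```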

We do the same for B, except that in this example we designate B as the better, more clickable design: its click probability is p*lift, a 1% relative increase over A.

In [11]:
B = binom.rvs(1,p*lift, size = 1000000)

Since no click is equal to zero and a click is equal to one, we can simply sum the sample to get the total number of clicks.

In [12]:
A_clicked = sum(A)

Subtracting the number of clicks from the total sample size gives us the number of observations that did not receive a click.

In [15]:
A_not_clicked = len(A) - A_clicked

We want to estimate the probability of getting a click from our simulated data, and then compare the estimated probability for A with the one for B.  In Bayesian analysis, the beta distribution is the conjugate prior for the success probability of a binomial distribution. Starting from a uniform Beta(1, 1) prior, the posterior after observing the data is Beta(clicks + 1, non-clicks + 1), and we draw one million samples from it.

In [45]:
Abeta = beta.rvs(A_clicked+1, A_not_clicked+1 , size=1000000)
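
To see what this posterior looks like, the sketch below summarizes a Beta(clicks + 1, non-clicks + 1) distribution by its mean and a central 95% interval. The counts are hypothetical round numbers chosen only for illustration; they are not the values drawn in this notebook.

```python
from scipy.stats import beta

# Hypothetical counts for illustration only (not the notebook's actual draws)
clicked = 50000
not_clicked = 950000

# Posterior parameters under a uniform Beta(1, 1) prior
a_post = clicked + 1
b_post = not_clicked + 1

post_mean = beta.mean(a_post, b_post)                        # posterior mean click rate
lower, upper = beta.ppf([0.025, 0.975], a_post, b_post)      # central 95% interval

print(post_mean)
print(lower, upper)
```

With a million observations the interval is narrow, so the posterior samples cluster tightly around the observed click rate.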

Next, we give B, our new design, the same treatment we gave A.

In [46]:
B_clicked = sum(B)

In [47]:
B_not_clicked = len(B) - B_clicked

In [48]:
Bbeta = beta.rvs(B_clicked+1, B_not_clicked+1, size=1000000)

Finally, we can estimate the probability that our original design A has a higher click rate than B, and the probability that our new design B has a higher click rate than A, by counting how often one posterior sample exceeds the other.

In [44]:
# Fraction of paired posterior draws in which one click rate exceeds the other
print("P(A > B): " + str(np.mean(Abeta > Bbeta)))
print("P(B > A): " + str(np.mean(Bbeta > Abeta)))

P(A > B): 0.117405
P(B > A): 0.882595
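
The steps above can be gathered into one reusable function. This is a minimal sketch, not part of the original notebook: the function name `prob_b_beats_a`, its parameters, the seed, and the example counts are all hypothetical.

```python
import numpy as np
from scipy.stats import beta

def prob_b_beats_a(a_clicks, a_total, b_clicks, b_total, draws=100000, seed=0):
    """Estimate P(rate_B > rate_A) by sampling both Beta posteriors.

    Assumes a uniform Beta(1, 1) prior for each design, as in the cells above.
    """
    rng = np.random.RandomState(seed)  # fixed seed so the sketch is reproducible
    a_post = beta.rvs(a_clicks + 1, a_total - a_clicks + 1,
                      size=draws, random_state=rng)
    b_post = beta.rvs(b_clicks + 1, b_total - b_clicks + 1,
                      size=draws, random_state=rng)
    # Fraction of paired draws in which B's click rate exceeds A's
    return np.mean(b_post > a_post)

# Hypothetical counts, chosen to mirror a 5% baseline with a 1% relative lift
better = prob_b_beats_a(50000, 1000000, 50500, 1000000)
# Identical counts for both designs should give a probability near 0.5
same = prob_b_beats_a(50000, 1000000, 50000, 1000000)

print(better)
print(same)
```

Passing counts rather than raw samples keeps the function usable with real experiment data, where only the totals are typically recorded.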


Therefore, our estimates give roughly an 88% probability that our B design converts better than our A design.   We know that B really is better because of the way we simulated each sample at the beginning.