
# Statistical Testing: Calculating sample sizes for A/B testing

---



Before running an A/B test to compare Facebook sign-ups for our game, it is a good idea to estimate how many users we will need in order to tell which sign-up form performs better.



### Calculating necessary sample sizes

The calculation takes four inputs:

- null hypothesis
- expected effect size
- false positive rate
- false negative rate
    
First, we'll import the usual Python modules:

In [1]:
import numpy as np
import scipy.stats

Imagine we have a click-through rate of 5% with the original design. Call this p_a, the probability of a click under design A.

Suppose in addition that we decide the click-through rate must increase to at least 7% to make changing the design worthwhile. Call this p_b.

Finally, we'll calculate the average click-through rate of the two groups, p, assuming that our sample sizes will be equal.

In [2]:
p_a = .05 # assume we have a base click rate of 5% for our original design (A group)
p_b = .07 # we want to detect an increase in click rate to 7%, otherwise not worth changing the design

p = (p_a + p_b)/2.

In addition to these two values, we'll need to decide on false positive and false negative rates. 

We can use these to look up the corresponding values from the Normal distribution (the results are labeled Z below). Here we choose a 5% false positive rate (also called the Type I error rate) and 80% power, which is equivalent to a 20% false negative rate (or Type II error rate).

These rates are fairly standard, but they are conventions rather than requirements.

These choices mean that we expect to falsely say that B is an improvement 5% of the time when actually it is no better than A, and we expect to falsely say B is *not* an improvement 20% of the time when actually it is better than A.
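
To make the "look up values from the Normal distribution" step concrete, here is a minimal sketch of what `scipy.stats.norm.ppf` returns for these two rates. The variable names `alpha`, `beta`, `z_alpha` and `z_power` are chosen for this illustration and are not used elsewhere in the notebook.

```python
from scipy.stats import norm

alpha = 0.05  # false positive (Type I) rate
beta = 0.20   # false negative (Type II) rate, i.e. 80% power

# ppf is the inverse of the cumulative distribution function:
# it returns the z value with the requested area to its left.
z_alpha = norm.ppf(1 - alpha)  # one-tailed critical value, about 1.645
z_power = norm.ppf(1 - beta)   # quantile for 80% power, about 0.842

# cdf undoes ppf, recovering the probabilities we started from.
print(float(z_alpha), float(norm.cdf(z_alpha)))
print(float(z_power), float(norm.cdf(z_power)))
```

A smaller false positive rate or a higher power pushes these z values up, which in turn increases the required sample size.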

In [3]:
Z8 = scipy.stats.norm.ppf(.8) # we will need this to ensure 80% power (20% false negative rate)
Z95 = scipy.stats.norm.ppf(1 - .05) # we will need this for 5% false positive rate (95% confidence level), one-tailed
Z975 = scipy.stats.norm.ppf(1 - .025) # 5% false positive rate for two-tailed case

ES = abs(p_b - p_a)/np.sqrt(p*(1-p))

num_tails = 1 # presumably we are testing design b because we think it will improve the click rate...

if num_tails == 2:
    n = 2*((Z975 + Z8)/ES)**2  # two-tailed
else:
    n = 2*((Z95 + Z8)/ES)**2 # one-tailed

print('You need', round(n), 'samples in each group to get a 5% false positive and 20% false negative rate given effect size')

You need 1743.0 samples in each group to get a 5% false positive and 20% false negative rate given effect size
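
Written out, the cell above evaluates a normal-approximation formula for comparing two proportions with equal group sizes. The symbols are notation introduced here rather than names from the code: $\alpha$ is the false positive rate, $\beta$ the false negative rate, and $z_q$ is the $q$-quantile of the standard Normal distribution, i.e. the value returned by `scipy.stats.norm.ppf(q)`.

$$
ES = \frac{|p_b - p_a|}{\sqrt{p\,(1-p)}}, \qquad p = \frac{p_a + p_b}{2}
$$

$$
n = 2\left(\frac{z_{1-\alpha} + z_{1-\beta}}{ES}\right)^{2} \quad \text{(one-tailed; replace } z_{1-\alpha} \text{ with } z_{1-\alpha/2} \text{ for two-tailed)}
$$

With the numbers used here, $ES \approx 0.0842$, $z_{0.95} \approx 1.645$ and $z_{0.80} \approx 0.842$, so

$$
n \approx 2\left(\frac{1.645 + 0.842}{0.0842}\right)^{2} \approx 1743 .
$$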


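That's it! We have the sample sizes necessary given our requirements. In this case, we need about 1743 people to experience the A design and 1743 people to experience the B design, or roughly 3,486 users in total. The unrounded value is about 1743.5, so rounding up to 1744 per group is the more conservative choice.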
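
As a sanity check, the calculation can be wrapped in a function and compared against a quick simulation. The sketch below is illustrative only: `sample_size_per_group` and `simulated_rejection_rate` are hypothetical helper names, and the simulation assumes the experiment will be analyzed with a one-tailed, pooled two-proportion z-test, which the notebook does not specify.

```python
import math

import numpy as np
from scipy.stats import norm


def sample_size_per_group(p_a, p_b, alpha=0.05, power=0.80, num_tails=1):
    """Per-group n from the same normal-approximation formula used above."""
    p = (p_a + p_b) / 2.0                      # average rate, equal group sizes
    effect_size = abs(p_b - p_a) / math.sqrt(p * (1 - p))
    z_alpha = norm.ppf(1 - alpha / num_tails)  # alpha is split across tails
    z_power = norm.ppf(power)
    return float(2 * ((z_alpha + z_power) / effect_size) ** 2)


def simulated_rejection_rate(p_a, p_b, n, alpha=0.05, n_sims=20000, seed=0):
    """Share of simulated experiments in which the test declares B better."""
    rng = np.random.default_rng(seed)
    clicks_a = rng.binomial(n, p_a, size=n_sims)
    clicks_b = rng.binomial(n, p_b, size=n_sims)
    pooled = (clicks_a + clicks_b) / (2 * n)         # pooled click rate
    std_err = np.sqrt(2 * pooled * (1 - pooled) / n)
    z = (clicks_b / n - clicks_a / n) / std_err
    return float(np.mean(z > norm.ppf(1 - alpha)))   # one-tailed rejection


n = int(round(sample_size_per_group(0.05, 0.07)))
print("samples per group:", n)
# When B really is better (7% vs 5%), this estimates the power.
print("estimated power:", simulated_rejection_rate(0.05, 0.07, n))
# When B is no better than A, this estimates the false positive rate.
print("estimated false positive rate:", simulated_rejection_rate(0.05, 0.05, n))
```

If the formula and the assumed test agree, the estimated power should land near 0.80 and the estimated false positive rate near 0.05, up to simulation noise.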
