# Introduction

Simply put, A/B Testing is a conceptual framework for comparing the effects of an exposure on two or more groups of individuals. It is commonly used in data science workflows to answer an experimental question. For example:

> _Do users spend more time on the application when presented with an in-application engagement opportunity?_

We can imagine several types of engagement opportunities that might be deployed, such as a small token, a new prompt, or a notification.

A/B Testing, in and of itself, is **not** a statistical method. Rather, it is a conceptual framework of steps for determining whether an exposure has an effect on a population. To do so, it draws on specific statistical concepts, such as measures of central tendency and variation, as well as hypothesis testing.

# Review of simple probability 

In [164]:
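# Estimate the event probability as the observed proportion
events_control = 10   # number of events observed in the control group
total_control = 100   # number of individuals in the control group
prob_control = events_control / total_control
print(prob_control)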

0.1


In [165]:
import numpy as np

# Sample proportion of a binary (Bernoulli) outcome; equal to prob_control
binomial_mean = events_control / total_control
# Standard error of that proportion: sqrt(p * (1 - p) / n)
binomial_std = np.sqrt(binomial_mean * (1 - binomial_mean) / total_control)
print(binomial_std)


0.030000000000000002
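
The quantity computed above can be written out explicitly. As a brief sketch, with $x$ events observed among $n$ individuals (symbols chosen here for illustration, not taken from the notebook), the sample proportion and its standard error are:

$$
\hat{p} = \frac{x}{n}, \qquad SE(\hat{p}) = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
$$

For the control group, $x = 10$ and $n = 100$, so $\hat{p} = 0.1$ and $SE(\hat{p}) = \sqrt{0.1 \times 0.9 / 100} = \sqrt{0.0009} = 0.03$, which agrees with the printed value up to floating-point rounding.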


In [168]:
# The normal approximation is reasonable when n*p and n*(1-p) are both large
# (a common rule of thumb is at least 10)
print(total_control * prob_control)
print(total_control * (1-prob_control))

10.0
90.0
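
The check above can be wrapped in a small helper so it is easy to repeat for other sample sizes. This is a minimal sketch: the function name `normal_approx_ok` and the default threshold of 10 are assumptions made for illustration (some references use 5 instead), not part of the original notebook.

```python
def normal_approx_ok(n, p, threshold=10):
    """Return True if both n*p and n*(1-p) meet the rule-of-thumb threshold.

    `threshold=10` is an assumed convention; adjust it to the rule you follow.
    """
    return n * p >= threshold and n * (1 - p) >= threshold


# Control group from the cells above: 100 individuals, p = 0.1
print(normal_approx_ok(100, 0.1))

# A much smaller sample with the same probability
print(normal_approx_ok(20, 0.1))
```

When the check fails, the normal-based standard error and critical values used in the following cells may be a poor approximation.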


In [169]:
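x = 100   # number of events
N = 1000  # sample size
p_hat = x / N  # sample proportion

# Check the normal approximation conditions
print(N * p_hat)
print(N * (1 - p_hat))

# Standard error of the sample proportion
SE = np.sqrt(p_hat * (1 - p_hat) / N)
print(SE)

# Margin of error for a 95% confidence interval (z = 1.96)
print(-1.96 * SE)
print(1.96 * SE)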

100.0
900.0
0.009486832980505138
-0.018594192641790068
0.018594192641790068


In [176]:
import scipy.stats as scs

# norm.ppf returns a quantile of the standard normal distribution.
# For a two-sided test at alpha = 0.05, each tail holds 0.025,
# so the critical value is the 0.975 quantile.
scs.norm.ppf(0.975)

1.959963984540054
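
Combining the standard error with this critical value gives a confidence interval for the proportion. The sketch below uses the normal-approximation (Wald) interval, which is one common choice among several; the variable names `alpha`, `z_crit`, `margin`, `ci_lower`, and `ci_upper` are introduced here for illustration.

```python
import numpy as np
import scipy.stats as scs

x, N = 100, 1000           # events and sample size from the earlier cell
p_hat = x / N              # sample proportion
se = np.sqrt(p_hat * (1 - p_hat) / N)

alpha = 0.05               # assumed significance level (95% confidence)
z_crit = scs.norm.ppf(1 - alpha / 2)   # two-sided critical value

margin = z_crit * se
ci_lower, ci_upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.3f}, margin of error = {margin:.4f}")
print(f"{(1 - alpha):.0%} CI: ({ci_lower:.4f}, {ci_upper:.4f})")
```

Using `scs.norm.ppf(1 - alpha / 2)` rather than a hard-coded 1.96 makes it straightforward to change the confidence level.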

In [203]:
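import pandas as pd

def synthetic_data(p_A, p_B, n_A, n_B):
    """Simulate binary conversions for a control group (A) and a test group (B).

    p_A and p_B are conversion rates; n_A and n_B are group sizes.
    Both groups share one dataframe, so n_A and n_B must be equal.
    """
    # A: k_A non-conversions (0) and n_A - k_A conversions (1), shuffled
    k_A = round(n_A * (1 - p_A))
    a_arr = np.array([0] * k_A + [1] * (n_A - k_A))
    np.random.shuffle(a_arr)

    # B: same construction using the test group's rate
    k_B = round(n_B * (1 - p_B))
    b_arr = np.array([0] * k_B + [1] * (n_B - k_B))
    np.random.shuffle(b_arr)

    # Organize into a dataframe with one column per group
    df = pd.DataFrame({'Control': a_arr,
                       'Test': b_arr})

    return df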

In [216]:
testdf = synthetic_data(0.10, 0.15, 1000, 1000)

In [217]:
testdf

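|     | Control | Test |
| --- | ------- | ---- |
| 0   | 0       | 0    |
| 1   | 0       | 0    |
| 2   | 0       | 1    |
| 3   | 0       | 0    |
| 4   | 0       | 1    |
| ... | ...     | ...  |
| 995 | 0       | 0    |
| 996 | 0       | 0    |
| 997 | 0       | 0    |
| 998 | 0       | 0    |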


In [222]:
testdf_long = testdf.melt()
testdf_long.columns = ['Group', 'Conversion']
testdf_long

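|      | Group   | Conversion |
| ---- | ------- | ---------- |
| 0    | Control | 0          |
| 1    | Control | 0          |
| 2    | Control | 0          |
| 3    | Control | 0          |
| 4    | Control | 0          |
| ...  | ...     | ...        |
| 1995 | Test    | 0          |
| 1996 | Test    | 0          |
| 1997 | Test    | 0          |
| 1998 | Test    | 0          |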
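
With the data in long format, the hypothesis-testing step mentioned in the introduction can be sketched as a pooled two-proportion z-test. This is only one possible test, shown under stated assumptions: the frame `df_long` below is rebuilt from fixed counts (100 conversions out of 1000 for Control and 150 out of 1000 for Test, matching the rates passed to `synthetic_data`) so that the block is self-contained, and all other names are introduced for illustration.

```python
import numpy as np
import pandas as pd
import scipy.stats as scs

# Stand-in for testdf_long: same columns, counts fixed for reproducibility.
# Row order does not affect the test.
df_long = pd.DataFrame({
    'Group': ['Control'] * 1000 + ['Test'] * 1000,
    'Conversion': [0] * 900 + [1] * 100 + [0] * 850 + [1] * 150,
})

# Conversions, group size, and conversion rate per group
summary = df_long.groupby('Group')['Conversion'].agg(['sum', 'count', 'mean'])
print(summary)

x_A, n_A = summary.loc['Control', 'sum'], summary.loc['Control', 'count']
x_B, n_B = summary.loc['Test', 'sum'], summary.loc['Test', 'count']
p_A, p_B = x_A / n_A, x_B / n_B

# Pooled proportion and standard error under H0: p_A == p_B
p_pool = (x_A + x_B) / (n_A + n_B)
se_pool = np.sqrt(p_pool * (1 - p_pool) * (1 / n_A + 1 / n_B))

# z statistic and two-sided p-value from the standard normal distribution
z = (p_B - p_A) / se_pool
p_value = 2 * scs.norm.sf(abs(z))

print(f"z = {z:.3f}, two-sided p-value = {p_value:.5f}")
```

A small p-value suggests that the difference in conversion rates between the groups is unlikely under the null hypothesis of equal rates; the threshold used to call it significant is a choice made before running the experiment.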
