# Designing the experiment

1) Hypothesis: state the hypotheses to test. Every test has a null hypothesis H0 and an alternative hypothesis H1.
The null hypothesis assumes that the two groups have the same distribution; depending on the measured outcome, this means the same mean/median or the same conversion rate.

2) Metric: a metric is needed to measure performance within each group. Candidates include latency (how fast the search algorithm responds),
conversion rate (CR), click-through rate (CTR), mean reciprocal rank (MRR) of clicks, and time spent on result pages.
We also have to decide whether the metric is measured at the individual level or at the group level.

3) Sample size, confidence level, statistical power and effect size: to run an A/B test, enough data points must be collected to reach
statistically significant results that can be trusted. It is vital to keep the type I and type II error rates low, i.e. the false positive and
false negative rates, respectively. A confidence level of 95% and a power of 0.9 are desirable. The effect size describes how large the difference between
the groups is. Detecting small effect sizes of 0.1-0.4 requires a larger sample. Estimating the effect size requires historical data or pilot runs.

4) Randomization: assign users at random so that both groups are representative of the population and are not biased (by gender, geography, ...),
or at least so that both groups are biased in the same way. These are the factors that should be controlled for.

5) Analysis: for a continuous metric such as time spent, compare the mean/median of each group using a t-test, F-test or z-test. For binomial
outcomes such as conversion rate, a chi-squared test or a z-test can be used to analyse the difference.
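
Step 4 does not say how sessions were actually split between the groups. One common approach, shown here purely as an assumption, is to hash a stable user identifier so that the same user always lands in the same group. The names `assign_group`, `user_id` and `experiment_salt` are hypothetical and do not come from the data set used below.

```python
import hashlib
from collections import Counter


def assign_group(user_id: str, experiment_salt: str = "search-test-1") -> str:
    """Deterministically map a user to group 'A' or 'B' with roughly 50/50 odds.

    Hypothetical helper: the identifier and the salt are illustrative only.
    """
    digest = hashlib.sha256(f"{experiment_salt}:{user_id}".encode("utf-8")).hexdigest()
    # Interpret the hash as a large integer; even -> A, odd -> B.
    return "A" if int(digest, 16) % 2 == 0 else "B"


# Check the split on synthetic user ids (not real data).
counts = Counter(assign_group(f"user-{i}") for i in range(10_000))
print(counts)
```

Changing the salt per experiment keeps assignments in one test independent of assignments in another.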


In [113]:
import pandas as pd

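# The file is semicolon-separated; a plain ';' avoids the invalid '\;' escape sequence.
# Note: `input` shadows the Python builtin; the name is kept because later cells use it.
input = pd.read_csv('AB_input.csv', engine="python", index_col=False, delimiter=';')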

In [147]:
#input.set_index('date_id',inplace=True)

input.head(10)


date_id,group_id,session_result,session_count
2018-06-25,A,0,763
2018-06-25,B,0,777
2018-06-25,A,1,3597
2018-06-25,B,1,3551
2018-06-26,A,0,694
2018-06-26,B,0,869
2018-06-26,A,1,3353
2018-06-26,B,1,3514
2018-06-27,A,0,626
2018-06-27,B,0,834


## Pivoting the table for each group

In [115]:
input.info()

<class 'pandas.core.frame.DataFrame'>
Index: 40 entries, 2018-06-25 to 2018-07-04
Data columns (total 3 columns):
group_id          40 non-null object
session_result    40 non-null int64
session_count     40 non-null int64
dtypes: int64(2), object(1)
memory usage: 1.2+ KB


In [116]:
input["session_result"] = input["session_result"].astype("str")


In [117]:
import numpy as np
# total sessions per group (rows) and session result (columns)
input_pv = pd.pivot_table(input, values='session_count', index=['group_id'],
                          columns=['session_result'], aggfunc='sum')

In [118]:
input_pv


group_id,0,1
A,11072,31357
B,10947,34676


In [119]:
input_pv['session_total']=input_pv['0']+input_pv['1']

In [120]:
input_pv['success_rate']=(input_pv['1']/input_pv['session_total'])*100

In [81]:
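# ratio of group sizes n_B / n_A (session totals from the pivot table), used later in the power calculation
ratio = 45623 / 42429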
ratio

1.0752787008885432

In [121]:
input_pv

group_id,0,1,session_total,success_rate
A,11072,31357,42429,73.904641
B,10947,34676,45623,76.005524


In [85]:
input[input['group_id']=='A'].groupby(['session_result'])['session_count'].sum()

session_result
0    11072
1    31357
Name: session_count, dtype: int64

# A/B testing analysis 

Group B has the higher observed conversion rate (76.0% vs 73.9%).

The hypotheses to test are:
 H₀: groups A and B have the same conversion rate
 H₁: group B has a higher conversion rate than group A

In [122]:
# to test the null hypothesis, we calculate the expected conversion rate if both groups were the same
cr_ex= sum(input_pv['1'])/sum(input_pv['session_total'])
cr_ex

0.7499318584472812

In [123]:
input_pv['1_ex']=(cr_ex*input_pv['session_total']).round().astype(int)
input_pv['0_ex']=input_pv['session_total']-input_pv['1_ex']

input_pv

group_id,0,1,session_total,success_rate,1_ex,0_ex
A,11072,31357,42429,73.904641,31819,10610
B,10947,34676,45623,76.005524,34214,11409


In [146]:
from scipy.stats import chi2


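# expected and observed counts, flattened in the same order: [A success, B success, A fail, B fail]
ex = np.array([input_pv['1_ex'].iloc[0], input_pv['1_ex'].iloc[1], input_pv['0_ex'].iloc[0], input_pv['0_ex'].iloc[1]])
obs = np.array([input_pv['1'].iloc[0], input_pv['1'].iloc[1], input_pv['0'].iloc[0], input_pv['0'].iloc[1]])

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected
dis = np.sum(np.square(ex-obs)/ex)

# a 2x2 table has (2-1)*(2-1) = 1 degree of freedom
pval = chi2.sf(dis, df=1)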

print("distance : {0}\np-value: {1}".format(dis,pval))

distance : 51.772202360956705
p-value: 6.232801498429553e-13
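
The cell above rounds the expected counts to integers before computing the statistic. As a cross-check, the following sketch recomputes the Pearson chi-squared statistic with unrounded expected counts, using only the standard library. The counts are copied from the pivot table; the dictionary layout is my own choice. The result should be close to, but not identical with, the value printed above.

```python
import math

# Observed session counts copied from the pivot table above.
observed = {"A": {"fail": 11072, "success": 31357},
            "B": {"fail": 10947, "success": 34676}}

total = sum(sum(row.values()) for row in observed.values())
col_totals = {k: sum(row[k] for row in observed.values()) for k in ("fail", "success")}

chi2_stat = 0.0
for row in observed.values():
    row_total = sum(row.values())
    for outcome, obs_count in row.items():
        # Expected count under H0, kept as a float (no rounding).
        expected = row_total * col_totals[outcome] / total
        chi2_stat += (obs_count - expected) ** 2 / expected

# With 1 degree of freedom the chi-squared survival function reduces to erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2_stat / 2))
print(f"chi2: {chi2_stat:.3f}, p-value: {p_value:.3e}")
```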


I use a confidence level of 95%, so alpha = 0.05.
The p-value above is much smaller than alpha (p-value << alpha).
This means that we can reject the null hypothesis that the conversion rates in groups A and B are the same.


# Z test as an alternative approach

In [125]:
from scipy.stats import norm

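# conversion rates as proportions (success_rate is stored in percent)
mu_A = input_pv['success_rate'].iloc[0]/100
mu_B = input_pv['success_rate'].iloc[1]/100

# variance of a Bernoulli outcome: p * (1 - p)
var_B = mu_B * (1-mu_B)
var_A = mu_A * (1-mu_A)

n_A = input_pv['session_total'].iloc[0]
n_B = input_pv['session_total'].iloc[1]

# difference in rates divided by its (unpooled) standard error
Z = (mu_B - mu_A)/np.sqrt(var_B/n_B + var_A/n_A)
# one-sided p-value P(Z > z), matching H1: B converts better than A
pval = norm.sf(Z)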

print("Z-score: {0}\np-value: {1}".format(Z,pval))


Z-score: 7.187913851513395
p-value: 3.289437404838372e-13
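
A p-value says that the difference is unlikely to be zero, but not how large it is. A confidence interval for the difference in conversion rates complements the test. The sketch below is an addition, not part of the original analysis: it uses the same unpooled standard error as the Z statistic above and the counts from the pivot table, with only the standard library.

```python
from statistics import NormalDist

# Observed counts copied from the pivot table above.
success_A, n_A = 31357, 42429
success_B, n_B = 34676, 45623

p_A = success_A / n_A
p_B = success_B / n_B
diff = p_B - p_A

# Unpooled standard error, the same form used in the Z statistic above.
se = (p_A * (1 - p_A) / n_A + p_B * (1 - p_B) / n_B) ** 0.5

# Two-sided 95% interval: diff +/- z * se, with z the 97.5th percentile of N(0, 1).
z_crit = NormalDist().inv_cdf(0.975)
ci_low, ci_high = diff - z_crit * se, diff + z_crit * se
print(f"difference: {diff:.4f}, 95% CI: ({ci_low:.4f}, {ci_high:.4f})")
```

If the interval excludes zero, that agrees with rejecting H₀ at alpha = 0.05 in a two-sided test.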


The Z test also gives a p-value << 0.05, so again we can safely reject the null hypothesis.

# Calculating Statistical Power:

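The effect size in a binomial A/B test can be expressed as the relative difference in conversion rate. Here we use the standardized difference instead: the difference in conversion rates divided by the pooled standard deviation.

In our case, the effect size is: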

In [140]:
#effect= (input_pv['success_rate'].iloc[1]/input_pv['success_rate'].iloc[0])-1

effect = (mu_B - mu_A) / (np.sqrt((var_B  + var_A) / 2))
effect


0.04850306233040932

In [143]:
from statsmodels.stats.power import zt_ind_solve_power
power = zt_ind_solve_power(effect, nobs1=42429, ratio=ratio, alpha=0.05,power=None)
power

0.9999999159718144
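
To make the number above less of a black box, the sketch below reproduces it by hand with the normal approximation. This is my reading of the calculation, assuming the default two-sided alternative, and not a statement about the library's internals. The effect size and group sizes are copied from the cells above.

```python
from statistics import NormalDist

# Values copied from the cells above.
effect = 0.04850306233040932   # standardized effect size
n_A, n_B = 42429, 45623        # sessions per group
alpha = 0.05

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-sided critical value

# Effective sample size for two independent groups of unequal size.
n_eff = 1 / (1 / n_A + 1 / n_B)
shift = effect * n_eff ** 0.5                # mean of the test statistic under H1

# Probability that the statistic lands in either rejection region.
power_manual = (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)
print(power_manual)
```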

Power is the probability of rejecting the null hypothesis H0 when the alternative hypothesis H1 is true. A power this close to 1 means that, with these sample sizes, a difference of this size would almost always be detected.
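
The same calculation can be turned around to plan a test, as in step 3 of the design: fix alpha and the desired power, then solve for the sample size. The sketch below uses the usual two-sided normal approximation for two equal-sized groups. The helper name `sessions_per_group` is hypothetical, and the formula ignores the negligible contribution of the opposite tail.

```python
from math import ceil
from statistics import NormalDist


def sessions_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.9) -> int:
    """Approximate sessions needed in each of two equal-sized groups.

    Hypothetical helper using the two-sided normal approximation
    n = 2 * ((z_{1 - alpha/2} + z_{power}) / effect_size) ** 2.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for d in (0.05, 0.1, 0.2, 0.4):
    print(f"effect size {d}: about {sessions_per_group(d)} sessions per group")
```

Because the required sample grows with the inverse square of the effect size, halving the effect size roughly quadruples the number of sessions needed.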