## A/B Testing: Chi-Square Test with the Montana State University Library Case Study

In this notebook we run a chi-square test on data from the Montana State University Library case study, followed by post-hoc tests that eliminate the weakest version round by round to find the winner. 
We use SciPy for the tests.

We structure the steps by answering three questions: 
1.   What was the click-through rate for each version?
2.   Which version was the winner?
3.   Do the results seem conclusive?


### Load libraries and data

In [1]:
import numpy as np
import pandas as pd
#np.set_printoptions(suppress=True)
pd.set_option("display.max_colwidth", 1000)
#pd.set_option("display.max_rows", 1000)

In [2]:
# Element list Homepage Version 1 - Interact, 5-29-2013.csv
v1 = pd.read_csv("../data_crazy_egg/HomepageVersion1.csv")

# Element list Homepage Version 2 - Connect, 5-29-2013.csv
v2 = pd.read_csv("../data_crazy_egg/HomepageVersion2.csv")

# Element list Homepage Version 3 - Learn, 5-29-2013.csv
v3 = pd.read_csv("../data_crazy_egg/HomepageVersion3.csv")

# Element list Homepage Version 4 - Help, 5-29-2013.csv
v4 = pd.read_csv("../data_crazy_egg/HomepageVersion4.csv")

# Element list Homepage Version 5 - Services, 5-29-2013.csv
v5 = pd.read_csv("../data_crazy_egg/HomepageVersion5.csv")

### 1. What was the click-through rate for each version?

In [3]:
v5.head(20)

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
0,69,a,FIND,397,True,Homepage Version 5 - Services • http://www.lib.montana.edu/index5.php
1,61,input,s.q,323,True,"created 5-29-2013 • 20 days 4 hours 59 mins • 2064 visits, 1348 clicks"
2,67,a,lib.montana.edu/find/,106,True,
3,62,button,Search,85,True,
4,98,a,Hours,81,True,
5,78,a,REQUEST,57,True,
6,129,area,Montana State University - Home,49,False,
7,87,a,SERVICES,45,True,
8,96,a,News,24,True,
9,76,a,lib.montana.edu/request/,22,True,


In [4]:
# first we get the visits on each page (they are in the last column of the second row; we read them off manually)
v1_visits = 10283
v2_visits = 2742
v3_visits = 2747
v4_visits = 3180
v5_visits = 2064
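
Reading the visit counts by hand works for five files, but the same number could also be pulled out of the snapshot text programmatically. The sketch below is only an illustration: `parse_visits` is a hypothetical helper, and it assumes every export carries a string of the form "... 2064 visits, 1348 clicks" like the one shown for Version 5.

```python
import re

def parse_visits(snapshot_text):
    """Return the visit count found in a snapshot string, or None if absent."""
    # Look for a run of digits immediately followed by the word "visits"
    match = re.search(r"(\d+)\s+visits", snapshot_text)
    if match is None:
        return None
    return int(match.group(1))

# Example string copied from the second row of the Version 5 export shown above
example = "created 5-29-2013 • 20 days 4 hours 59 mins • 2064 visits, 1348 clicks"
v5_visits_parsed = parse_visits(example)
print(v5_visits_parsed)  # 2064
```

If the other exports follow the same layout, the helper could be applied to the second row of each file's "Snapshot information" column instead of the example string.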

In [5]:
# make a list of the page visits for each version
visits_list = [v1_visits, v2_visits, v3_visits, v4_visits, v5_visits]
visits_list

[10283, 2742, 2747, 3180, 2064]

In [6]:
# then we get the clicks for each version. We know the labels of the tested buttons 
# from the use case: INTERACT, CONNECT, LEARN, HELP and SERVICES

# we keep only the rows whose name matches the button label, 
# then select the column that holds the number of clicks
v1_clicks = list(v1[v1['Name'] == 'INTERACT']['No. clicks'])
v2_clicks = list(v2[v2['Name'] == 'CONNECT']['No. clicks'])
v3_clicks = list(v3[v3['Name'] == 'LEARN']['No. clicks'])
v4_clicks = list(v4[v4['Name'] == 'HELP']['No. clicks'])
v5_clicks = list(v5[v5['Name'] == 'SERVICES']['No. clicks'])
v3_clicks

[21]

In [7]:
# make a list out of the clicks 
clicks_list = [v1_clicks, v2_clicks, v3_clicks, v4_clicks, v5_clicks]
clicks_list

[[42], [53], [21], [38], [45]]

In [8]:
# flatten the nested list
clicks_list = [element for sublist in clicks_list for element in sublist]

In [9]:
clicks_list

[42, 53, 21, 38, 45]

In [10]:
# get a list of the number of visits in which the button was not clicked
noclick_list = list()
for visits, clicks in zip(visits_list, clicks_list): # zip the two lists and iterate over their items in parallel
  noclick_list.append(visits - clicks) # for each pair, subtract clicks from visits and append the result to noclick_list

noclick_list

[10241, 2689, 2726, 3142, 2019]

In [11]:
# we get the click-through rate for each version: clicks divided by visits
v1_CTR = v1_clicks[0] / visits_list[0]
v2_CTR = v2_clicks[0] / visits_list[1]
v3_CTR = v3_clicks[0] / visits_list[2]
v4_CTR = v4_clicks[0] / visits_list[3]
v5_CTR = v5_clicks[0] / visits_list[4]
round(v5_CTR, 6)

0.021802

In [12]:
# and put them in a list
CTR_list = [v1_CTR, v2_CTR, v3_CTR, v4_CTR, v5_CTR]
[round(rate, 6) for rate in CTR_list]

[0.004084, 0.019329, 0.007645, 0.01195, 0.021802]

In [13]:
# we make a pandas dataframe with the CTR for each version, sorted from lowest to highest rate
rates = pd.Series(CTR_list)
names = pd.Series(["Interact", "Connect", "Learn", "Help", "Services"])
ctr_df = pd.DataFrame({"rates":rates, "names":names}).sort_values("rates")
ctr_df

Unnamed: 0,rates,names
0,0.004084,Interact
2,0.007645,Learn
3,0.01195,Help
1,0.019329,Connect
4,0.021802,Services


In [14]:
# make contingency table for observed results
observed_3 = pd.DataFrame([clicks_list, noclick_list],
                           columns = ["INTERACT", "CONNECT", "LEARN", "HELP", "SERVICES"],
                           index = ["clicks", "no clicks"]
                            )
observed_3

Unnamed: 0,INTERACT,CONNECT,LEARN,HELP,SERVICES
clicks,42,53,21,38,45
no clicks,10241,2689,2726,3142,2019


--> It seems like **Services** is the winner and **Interact** is the button with the worst click-through rate.
--> But we are not done yet. We still have to check whether this result is statistically significant, that is, how likely differences this large between the buttons' click rates would be if they arose by chance alone. 

To test for statistical significance we do the following: 
1. We test for differences across all of the versions.
2. If the differences are statistically significant, we drop the worst version and run the chi-square test again.
3. We repeat until a single winner remains or the test no longer finds a significant difference between the click rates. 

Null Hypothesis: **Interact, Connect, Learn, Help, Services** have the same ratio of clicks to no-clicks

Alternative Hypothesis: **Interact, Connect, Learn, Help, Services** do not all have the same ratio of clicks to no-clicks

Confidence level: **95%** or 0.95

Significance level (alpha): 1 - 0.95 = 0.05

To reject the Null Hypothesis, the p-value needs to be less than or equal to alpha (p-value <= 0.05)
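
To make the test less of a black box, the statistic can also be worked out by hand. The sketch below is an illustration rather than part of the original analysis: it rebuilds the observed counts from the contingency table above, derives the expected counts under the Null Hypothesis (row total times column total divided by the grand total), and sums the squared deviations. The variable names are chosen here for illustration only.

```python
import numpy as np
from scipy import stats

# Observed counts (columns: INTERACT, CONNECT, LEARN, HELP, SERVICES)
observed = np.array([
    [42, 53, 21, 38, 45],             # clicks
    [10241, 2689, 2726, 3142, 2019],  # no clicks
])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand_total = observed.sum()

# Expected counts if every version shared the same click rate
expected = row_totals * col_totals / grand_total

# Pearson chi-square statistic and its degrees of freedom
chi2_stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p-value: upper tail of the chi-square distribution
p_value = stats.chi2.sf(chi2_stat, dof)

alpha = 0.05
print(f"chi2 = {chi2_stat:.2f}, dof = {dof}, reject H0: {p_value <= alpha}")
```

For a 2x5 table there are 4 degrees of freedom, so no continuity correction is involved and the hand-computed statistic should agree with what `stats.chi2_contingency` reports.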

In [15]:
# we load stats from scipy and calculate the p-value of the chi-square test, which tells us whether we can reject 
# the Null Hypothesis
from scipy import stats
chisq, pvalue, df, expected = stats.chi2_contingency(observed_3)
pvalue

4.852334301093838e-20

In [16]:
print(pvalue <= 0.05)

True


In [17]:
# we put the whole process of finding the winner into a function which performs the chi-square test and then drops the loser
# until a winner is found or there are no more statistical differences between the tested versions
def a_b_algo (dataframe, no_columns): # dataframe for contingency table has to be sorted from loser to winner 
  # TODO: sort the dataframe from loser to winner inside the function
  dataframe_new = dataframe.copy()
  for x in range(no_columns-1): 
    chisq, pvalue, df, expected = stats.chi2_contingency(dataframe_new)
    print(dataframe_new)
    if pvalue < 0.01: # note: stricter than the alpha of 0.05 stated above
      print(pvalue)
      # drop the first column, i.e. the version treated as the current loser
      dataframe_new = dataframe_new.drop(columns=dataframe_new.columns[0])
    else:  
      print("Null Hypothesis can't be rejected")
      break # stop here: the remaining versions do not differ significantly
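
The function expects the contingency table columns to run from loser to winner. A minimal sketch of how that ordering could be produced is shown below; `sort_by_ctr` is a hypothetical helper, not part of the original notebook, and it assumes the table has a "clicks" row and a "no clicks" row as above.

```python
import pandas as pd

# Contingency table with the same counts as in the notebook
observed = pd.DataFrame(
    [[42, 53, 21, 38, 45], [10241, 2689, 2726, 3142, 2019]],
    columns=["INTERACT", "CONNECT", "LEARN", "HELP", "SERVICES"],
    index=["clicks", "no clicks"],
)

def sort_by_ctr(table):
    """Return a copy with columns ordered from lowest to highest click-through rate."""
    # CTR per version: clicks divided by total visits (clicks + no clicks)
    ctr = table.loc["clicks"] / table.sum(axis=0)
    return table[ctr.sort_values().index]

observed_sorted = sort_by_ctr(observed)
print(list(observed_sorted.columns))
# ['INTERACT', 'LEARN', 'HELP', 'CONNECT', 'SERVICES']
```

Passing the sorted table to the elimination function would make each round drop the version with the lowest click-through rate among those remaining.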

In [18]:
a_b_algo(observed_3, 5)

           INTERACT  CONNECT  LEARN  HELP  SERVICES
clicks           42       53     21    38        45
no clicks     10241     2689   2726  3142      2019
4.852334301093838e-20
           CONNECT  LEARN  HELP  SERVICES
clicks          53     21    38        45
no clicks     2689   2726  3142      2019
5.25509870228566e-05
           LEARN  HELP  SERVICES
clicks        21    38        45
no clicks   2726  3142      2019
8.044453904790285e-05
           HELP  SERVICES
clicks       38        45
no clicks  3142      2019
0.007370912499282061
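
An alternative to round-by-round elimination is to test every pair of versions and correct the threshold for the number of tests. The sketch below uses the Bonferroni correction, which is one common choice and an assumption here, not necessarily the correction the case study intended. It reuses the counts from the contingency table; `pairwise_results` and `alpha_adjusted` are names introduced for illustration.

```python
from itertools import combinations

import pandas as pd
from scipy import stats

observed = pd.DataFrame(
    [[42, 53, 21, 38, 45], [10241, 2689, 2726, 3142, 2019]],
    columns=["INTERACT", "CONNECT", "LEARN", "HELP", "SERVICES"],
    index=["clicks", "no clicks"],
)

pairs = list(combinations(observed.columns, 2))
alpha = 0.05
# Bonferroni: divide alpha by the number of pairwise tests
alpha_adjusted = alpha / len(pairs)

pairwise_results = {}
for first, second in pairs:
    # 2x2 sub-table; chi2_contingency applies Yates' correction by default here
    _, pvalue, _, _ = stats.chi2_contingency(observed[[first, second]])
    pairwise_results[(first, second)] = pvalue

for (first, second), pvalue in pairwise_results.items():
    verdict = "significant" if pvalue <= alpha_adjusted else "not significant"
    print(f"{first} vs {second}: p = {pvalue:.3g} ({verdict})")
```

With five versions there are ten pairs, so each pairwise p-value is compared against 0.005 rather than 0.05.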


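## **Result:** 
**Services** comes out of the elimination rounds as the winner. In the final round it is tested against **Help**, and the p-value of about 0.007 is below the 0.01 threshold used in the function, so we reject the Null Hypothesis that these two buttons have the same click rate. One caveat: the contingency table was passed in its original column order rather than sorted from loser to winner, so **Connect**, which has the second-highest CTR, was dropped in the second round. The final test therefore shows that Services beats Help, but it does not show that Services beats Connect. 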