## A/B Testing

Running a chi-square test with data from the Library of Montana University.

Two approaches: resampling and SciPy.

### Data reading

The important pieces of information (clicks on each element of interest & visits on each page) are scattered around. Let's collect them:

In [35]:
import pandas as pd
pd.set_option("max_colwidth", 1000)
pd.set_option("max_rows", 1000)

# read data
v1 = pd.read_csv("./CrazyEgg/V1/Interact.csv")
v2 = pd.read_csv("./CrazyEgg/V2/Connect.csv")
v3 = pd.read_csv("./CrazyEgg/V3/Learn.csv")
v4 = pd.read_csv("./CrazyEgg/V4/Help.csv")
v5 = pd.read_csv("./CrazyEgg/V5/Services.csv")

In [36]:
# clicks on each element
# (.iloc[0] extracts the single matching value; calling int() on a Series is deprecated in recent pandas)
v1_clicks = int(v1.loc[v1["Name"] == "INTERACT", "No. clicks"].iloc[0])
v2_clicks = int(v2.loc[v2["Name"] == "CONNECT", "No. clicks"].iloc[0])
v3_clicks = int(v3.loc[v3["Name"] == "LEARN", "No. clicks"].iloc[0])
v4_clicks = int(v4.loc[v4["Name"] == "HELP", "No. clicks"].iloc[0])
v5_clicks = int(v5.loc[v5["Name"] == "SERVICES", "No. clicks"].iloc[0])

In [37]:
print(v1_clicks, v2_clicks, v3_clicks, v4_clicks, v5_clicks)

42 53 21 38 45


In [38]:
# visits on each page (they are in the last column of the second row, we read them manually)
v1_visits = 10283
v2_visits = 2742
v3_visits = 2747
v4_visits = 3180
v5_visits = 2064

#### Click-through rate

The click-through rate (CTR) is defined as clicks / visits.

In [39]:
# click-through rates
interact_rate = float(v1_clicks / v1_visits)
connect_rate = float(v2_clicks / v2_visits)
learn_rate = float(v3_clicks / v3_visits)
help_rate = float(v4_clicks / v4_visits)
services_rate = float(v5_clicks / v5_visits)

In [40]:
# CTR from worst to best
rates = pd.Series([interact_rate, connect_rate, learn_rate, help_rate, services_rate])
names = pd.Series(["Interact", "Connect", "Learn", "Help", "Services"])

pd.DataFrame({"rates":rates, "names":names}).sort_values("rates")

Unnamed: 0,rates,names
0,0.004084,Interact
2,0.007645,Learn
3,0.01195,Help
1,0.019329,Connect
4,0.021802,Services
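
As a more compact alternative, the same ranking can be sketched with a single DataFrame. The counts are copied from the cells above; the column names `clicks`, `visits`, `ctr` and `no_clicks` are illustrative choices, not part of the original notebook.

```python
import pandas as pd

# counts copied from the cells above
data = pd.DataFrame(
    {"clicks": [42, 53, 21, 38, 45],
     "visits": [10283, 2742, 2747, 3180, 2064]},
    index=["Interact", "Connect", "Learn", "Help", "Services"],
)

# click-through rate and no-clicks, computed for all versions at once
data["ctr"] = data["clicks"] / data["visits"]
data["no_clicks"] = data["visits"] - data["clicks"]

# CTR from worst to best
ranked = data.sort_values("ctr")
print(ranked)
```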


#### Contingency table

The contingency table of observed values records clicks and no-clicks (defined as visits - clicks) for each version.

In [41]:
# no-clicks
v1_noclick = v1_visits - v1_clicks
v2_noclick = v2_visits - v2_clicks
v3_noclick = v3_visits - v3_clicks
v4_noclick = v4_visits - v4_clicks
v5_noclick = v5_visits - v5_clicks

In [42]:
# build the contingency table as a pd.DataFrame
clicks = pd.Series([v1_clicks, v2_clicks, v3_clicks, v4_clicks, v5_clicks])
noclicks = pd.Series([v1_noclick, v2_noclick, v3_noclick, v4_noclick, v5_noclick])

observed = pd.DataFrame(data = [clicks, noclicks])
observed.columns = ["Interact", "Connect", "Learn", "Help", "Services"]
observed.index = ["Click", "No-click"]

observed

Unnamed: 0,Interact,Connect,Learn,Help,Services
Click,42,53,21,38,45
No-click,10241,2689,2726,3142,2019


## Resampling approach

In [43]:
visits = [v1_visits, v2_visits, v3_visits, v4_visits, v5_visits]

overall_clicks = clicks.sum()
overall_visits = sum(visits)
print("clicks:", overall_clicks, "|  visits:", overall_visits)

# expected click-through rate
expected_CTR = overall_clicks / overall_visits
expected_CTR

clicks: 199 |  visits: 21016


0.009468976018271793

In [44]:
# expected clicks per version
v1_exp_clicks = v1_visits * expected_CTR
v2_exp_clicks = v2_visits * expected_CTR
v3_exp_clicks = v3_visits * expected_CTR
v4_exp_clicks = v4_visits * expected_CTR
v5_exp_clicks = v5_visits * expected_CTR

In [45]:
# expected no clicks per version
v1_exp_noclick = v1_visits - v1_exp_clicks
v2_exp_noclick = v2_visits - v2_exp_clicks
v3_exp_noclick = v3_visits - v3_exp_clicks
v4_exp_noclick = v4_visits - v4_exp_clicks
v5_exp_noclick = v5_visits - v5_exp_clicks

In [46]:
# contingency table for expected values
exp_clicks = pd.Series([v1_exp_clicks, 
                        v2_exp_clicks, 
                        v3_exp_clicks, 
                        v4_exp_clicks, 
                        v5_exp_clicks])

exp_noclicks = pd.Series([v1_exp_noclick,
                          v2_exp_noclick, 
                          v3_exp_noclick, 
                          v4_exp_noclick, 
                          v5_exp_noclick])

expected = pd.DataFrame(data = [exp_clicks, exp_noclicks])
expected.columns = ["Interact", "Connect", "Learn", "Help", "Services"]
expected.index = ["Click", "No-click"]

expected

Unnamed: 0,Interact,Connect,Learn,Help,Services
Click,97.36948,25.963932,26.011277,30.111344,19.543967
No-click,10185.63052,2716.036068,2720.988723,3149.888656,2044.456033
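
The expected table can also be obtained directly from the margins of the observed table: each expected cell is (row total × column total) / grand total. The following is a minimal sketch with the observed counts hard-coded from the table above; the variable names are illustrative.

```python
import numpy as np

# observed counts copied from the contingency table above
observed_counts = np.array([
    [42, 53, 21, 38, 45],              # clicks
    [10241, 2689, 2726, 3142, 2019],   # no-clicks
])

row_totals = observed_counts.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed_counts.sum(axis=0, keepdims=True)  # shape (1, 5)
grand_total = observed_counts.sum()

# expected counts under independence of version and click outcome
expected_counts = row_totals * col_totals / grand_total
print(expected_counts.round(2))
```

Under this construction the expected table keeps the same row and column totals as the observed one.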


In [47]:
expected.round()

Unnamed: 0,Interact,Connect,Learn,Help,Services
Click,97.0,26.0,26.0,30.0,20.0
No-click,10186.0,2716.0,2721.0,3150.0,2044.0


In [48]:
observed

Unnamed: 0,Interact,Connect,Learn,Help,Services
Click,42,53,21,38,45
No-click,10241,2689,2726,3142,2019


Next, a function to compute the Pearson residuals, which measure how far the actual counts differ from the expected counts, scaled by the square root of the expected count.

In [49]:
import numpy as np
def pearson_residuals(obs, exp):
    return (obs - exp) / (np.sqrt(exp))

In [50]:
# observed chi-square statistic
R = pearson_residuals(observed, expected) # Pearson residuals
R = R**2 # squared Pearson residuals
chi2observed = R.sum().sum() # sum of squared Pearson residuals
chi2observed

96.74323537983281

In [51]:
overall_noclicks = overall_visits - overall_clicks

We create a "box" with as many clicks as we got over all pages, represented as 1's, and as many no-clicks as we got represented as 0's.

In [52]:
import random
box = [1] * overall_clicks
box.extend([0]*overall_noclicks)
random.shuffle(box)
#print(box)

We run the simulation many times and count how often the simulated chi-square statistic is at least as large as the observed one (i.e. how often something at least as extreme happens by chance).

In [53]:
chi_scores = []
visits = [v1_visits, v2_visits, v3_visits, v4_visits, v5_visits]

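for sim in range(1000):
    # each version's visits are drawn without replacement from the box,
    # independently of the other versions
    sample_clicks = [sum(random.sample(population=box, k=v)) for v in visits]
    sample_noclicks = [visits[i] - sample_clicks[i] for i in range(len(visits))]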

    observed_ = pd.DataFrame([sample_clicks, sample_noclicks],
                             columns = ["Interact", "Connect", "Learn", "Help", "Services"],
                             index = ["Click", "No-click"])

    R = pearson_residuals(observed_, expected)
    R = R**2

    chi_scores.append(R.sum().sum())
    
# convert the list to an array so the comparison is explicitly element-wise
resampled_p_value = np.mean(np.array(chi_scores) >= chi2observed)
resampled_p_value

0.0
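
The loop above draws each version's sample from the box independently, so the total number of clicks can vary between simulations. A stricter permutation test shuffles the box once per simulation and cuts it into consecutive pieces, one per version, which keeps the total fixed. The following is a minimal sketch of that variant, not the notebook's original method; the helper `chi2_stat`, the seed, and the `+ 1` correction (a common convention that avoids reporting a p-value of exactly zero) are assumptions added for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# counts copied from the cells above
visits_arr = np.array([10283, 2742, 2747, 3180, 2064])
clicks_arr = np.array([42, 53, 21, 38, 45])

def chi2_stat(click_counts, visit_counts):
    """Sum of squared Pearson residuals for a 2 x k click / no-click table."""
    noclick_counts = visit_counts - click_counts
    overall_rate = click_counts.sum() / visit_counts.sum()
    exp_click = visit_counts * overall_rate
    exp_noclick = visit_counts - exp_click
    return (((click_counts - exp_click) ** 2 / exp_click).sum()
            + ((noclick_counts - exp_noclick) ** 2 / exp_noclick).sum())

observed_stat = chi2_stat(clicks_arr, visits_arr)

# the "box": one entry per visit, 1 = click, 0 = no-click
box_arr = np.zeros(visits_arr.sum(), dtype=int)
box_arr[:clicks_arr.sum()] = 1

# indices at which the shuffled box is cut into one piece per version
split_points = np.cumsum(visits_arr)[:-1]

n_sims = 1000
sim_stats = np.empty(n_sims)
for i in range(n_sims):
    shuffled = rng.permutation(box_arr)
    sim_clicks = np.array([part.sum() for part in np.split(shuffled, split_points)])
    sim_stats[i] = chi2_stat(sim_clicks, visits_arr)

# share of simulations at least as extreme as the observed statistic
perm_p_value = (np.sum(sim_stats >= observed_stat) + 1) / (n_sims + 1)
print(observed_stat, perm_p_value)
```

With 1000 simulations, the smallest p-value this sketch can report is 1/1001, so a result of 0.0 from the loop above is better read as "smaller than about 0.001".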

## Scipy approach

In [54]:
observed

Unnamed: 0,Interact,Connect,Learn,Help,Services
Click,42,53,21,38,45
No-click,10241,2689,2726,3142,2019


In [55]:
from scipy import stats
# use distinct names so the `expected` DataFrame built earlier is not overwritten
chisq, pvalue, dof, expected_scipy = stats.chi2_contingency(observed)
pvalue

4.852334301093838e-20
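
To see where this p-value comes from, the following sketch recomputes it as the upper tail of a chi-square distribution. The degrees of freedom are (rows - 1) × (columns - 1) = (2 - 1) × (5 - 1) = 4. The observed counts are hard-coded from the table above and the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# observed counts copied from the contingency table above
table = np.array([
    [42, 53, 21, 38, 45],              # clicks
    [10241, 2689, 2726, 3142, 2019],   # no-clicks
])

stat, p_scipy, dof_scipy, _ = stats.chi2_contingency(table)

# p-value = probability that a chi-square variable with `dof_scipy`
# degrees of freedom is at least as large as the statistic
p_tail = stats.chi2.sf(stat, dof_scipy)

print(stat, dof_scipy, p_scipy, p_tail)
```

The statistic returned here should agree with the sum of squared Pearson residuals computed by hand in the resampling section.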

## How do we decide who's the winner?

Read about [Post Hoc Tests](https://alanarnholt.github.io/PDS-Bookdown2/post-hoc-tests-1.html) and find out whether we can declare a clear winner!

In [27]:
# your code here
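
One possible starting point, offered as a sketch rather than the intended solution: run a chi-square test on every pair of versions and compare each p-value against a Bonferroni-corrected threshold. The choice of Bonferroni, the 0.05 significance level, and all variable names are assumptions; the counts are copied from the cells above.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# counts copied from the cells above
version_names = ["Interact", "Connect", "Learn", "Help", "Services"]
version_clicks = [42, 53, 21, 38, 45]
version_visits = [10283, 2742, 2747, 3180, 2064]

pairs = list(combinations(range(len(version_names)), 2))
alpha = 0.05 / len(pairs)  # Bonferroni correction for 10 pairwise tests

pairwise_p = {}
for i, j in pairs:
    pair_table = np.array([
        [version_clicks[i], version_clicks[j]],
        [version_visits[i] - version_clicks[i], version_visits[j] - version_clicks[j]],
    ])
    # for 2 x 2 tables chi2_contingency applies Yates' continuity correction by default
    _, p, _, _ = stats.chi2_contingency(pair_table)
    pairwise_p[(version_names[i], version_names[j])] = p

for (a, b), p in sorted(pairwise_p.items(), key=lambda item: item[1]):
    verdict = "significant" if p < alpha else "not significant"
    print(f"{a} vs {b}: p = {p:.3g} ({verdict})")
```

A version can only be called a clear winner if it differs significantly from every other version; if the top two do not differ significantly, the data do not separate them.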