In [1]:
## A/B Testing - Comparison of Montana State University Library Website Button Performances ##

In [2]:
# Hypothesis testing: It is a statistical analysis method used to test a belief/an argument.
# A/B Testing: It is used to measure the effect of a change between 2 groups or to compare the mean/proportion of 2 groups.
# The main purpose of group comparisons is to test whether possible differences occur by chance.

In [3]:
## PROJECT STEPS ##
# 1. Business Problem
# 2. Data Understanding & Preparing
# 3. A/B Testing (Chi-square Test)

In [4]:
## 1. Business Problem ##

# Montana State University Library has a website that students use to find books and articles.
# On the homepage, under the library image, there is a search bar and 3 large elements/buttons: “Find”, “Request”, “Interact”.
# These buttons provide access to important information and services about the library.

# However, Web Analytics shows that the “Interact” button, ironically, has almost no interaction.
# The way to measure the performance of each of the 3 categories is through click-through rate (CTR).
# This is a common term in online marketing and usually describes the number of clicks divided by the number of times an ad was viewed.

# The main goal of this project is to run an A/B test that compares the CTR (click-through rate) of different texts for the “Interact” button on the Montana State University Library website.
# The website team identified 4 different new versions/texts to test against the “Interact” button: Connect, Learn, Help, Services.

# The metrics to track are:
# Click-through rate (CTR): Number of clicks on the button divided by the total page visits. Selected as a measure of the initial ability of the category title to attract users.
# Drop-off rate for category pages: Percentage of visitors who leave the site from a given page, selected as a measure of the category page’s ability to meet user expectations.
# Homepage-return rate: Percentage of users who navigate from the library homepage to the category page and then return to the homepage. Selected as a measure of the category page’s ability to meet user expectations.
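
In [None]:
# The CTR definition above can be sketched as a small helper. This is only an illustration:
# the function name click_through_rate is an assumption and is not part of the original analysis.
# The example figures (42 clicks, 10283 visits) are the "Interact" values used later in this notebook.

```python
def click_through_rate(clicks: int, visits: int) -> float:
    """Return the click-through rate as a percentage of page visits."""
    if visits <= 0:
        raise ValueError("visits must be a positive number")
    return clicks / visits * 100


# "Interact" button: 42 clicks out of 10283 homepage visits.
interact_ctr = click_through_rate(42, 10283)
print(f"Interact CTR: {interact_ctr:.4f}%")
```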

In [5]:
## 2. Data Understanding and Preparing ##

In [6]:
# import libraries

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [7]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 50)
pd.set_option("display.width", 500)
pd.set_option("display.precision", 4)

In [12]:
# import data

interact = pd.read_csv('Interact - Element list Homepage Version 1, 5-29-2013.csv')
help = pd.read_csv('Help - Element list Homepage Version 4 , 5-29-2013.csv')
connect = pd.read_csv('Connect - Element list Homepage Version 2, 5-29-2013.csv')
learn = pd.read_csv('Learn - Element list Homepage Version 3 , 5-29-2013.csv')
services = pd.read_csv('Services - Element list Homepage Version 5 , 5-29-2013.csv')

In [14]:
interact.head()

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
0,128,area,Montana State University - Home,1291,False,Homepage Version 1 - Interact • http://www...
1,69,a,FIND,842,True,created 5-29-2013 • 20 days 4 hours 21 min...
2,61,input,s.q,508,True,
3,67,a,lib.montana.edu/find/,166,True,
4,78,a,REQUEST,151,True,


In [15]:
help.head()

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
0,74,a,FIND,631,True,Homepage Version 4 - Help • http://www.lib...
1,66,input,s.q,364,True,created 5-29-2013 • 20 days 4 hours 59 min...
2,72,a,lib.montana.edu/find/,139,True,
3,133,area,Montana State University - Home,122,False,
4,83,a,REQUEST,72,True,


In [16]:
connect.head()

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
0,74,a,FIND,502,True,Homepage Version 2 - Connect • http://www....
1,66,input,s.q,357,True,created 5-29-2013 • 20 days 7 hours 34 min...
2,72,a,lib.montana.edu/find/,171,True,
3,133,area,Montana State University Libraries - Home,83,False,
4,103,a,Hours,74,True,


In [17]:
learn.head()

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
0,69,a,FIND,587,True,Homepage Version 3 - Learn • http://www.li...
1,61,input,s.q,325,True,created 5-29-2013 • 20 days 12 hours 21 mi...
2,67,a,lib.montana.edu/find/,142,True,
3,128,area,Montana State University - Home,83,False,
4,98,a,Hours,76,True,


In [20]:
services.head(25)

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information
0,69,a,FIND,397,True,Homepage Version 5 - Services • http://www...
1,61,input,s.q,323,True,created 5-29-2013 • 20 days 4 hours 59 min...
2,67,a,lib.montana.edu/find/,106,True,
3,62,button,Search,85,True,
4,98,a,Hours,81,True,
5,78,a,REQUEST,57,True,
6,129,area,Montana State University - Home,49,False,
7,87,a,SERVICES,45,True,
8,96,a,News,24,True,
9,76,a,lib.montana.edu/request/,22,True,


In [None]:
## Preparing Data ##

In [21]:
# Observed Results

Click = [42, 38, 45, 53, 21] # Number of clicks on the tested button in each version (e.g. clicks on "Services" in the Services version), in the order Interact, Help, Services, Connect, Learn
No_click = [10241, 3142, 2019, 2689, 2726] # Number of visits without a click on the tested button (visits minus clicks), in the same order
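
In [None]:
# A minimal sketch of where the non-click counts come from, assuming they are homepage visits
# minus clicks on the tested button. The visit counts are the ones attached to each version later
# in this notebook; the lowercase variable names are illustrative only.

```python
# Button clicks and homepage visits per version, in the order
# Interact, Help, Services, Connect, Learn.
clicks = [42, 38, 45, 53, 21]
visits = [10283, 3180, 2064, 2742, 2747]

# A visit without a click on the tested button counts as a "no click".
no_clicks = [v - c for c, v in zip(clicks, visits)]
print(no_clicks)  # [10241, 3142, 2019, 2689, 2726]
```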

In [24]:
# Create a Dataframe

observed = pd.DataFrame([Click, No_click],
columns=["Interact", "Help", "Services", "Connect", "Learn"],
index=["Click", "No_click"])

observed

observed_expanded = observed.copy()
observed_expanded

Unnamed: 0,Interact,Help,Services,Connect,Learn
Click,42,38,45,53,21
No_click,10241,3142,2019,2689,2726


In [26]:
# Add the totals of each row
observed_expanded["Total"] = observed_expanded.sum(axis=1)

In [27]:
# Add the totals of each column
observed_expanded.loc["Total", :] = observed_expanded.sum()

In [28]:
observed_expanded

Unnamed: 0,Interact,Help,Services,Connect,Learn,Total
Click,42.0,38.0,45.0,53.0,21.0,199.0
No_click,10241.0,3142.0,2019.0,2689.0,2726.0,20817.0
Total,10283.0,3180.0,2064.0,2742.0,2747.0,21016.0


In [29]:
# Combining all the versions in a dataframe

# Select the row of the tested button in each version and attach the homepage visits and total clicks.
# .copy() gives an independent frame, which avoids any SettingWithCopyWarning when columns are added.
interact_clicks = interact.loc[[9]].copy()
interact_clicks['visits'] = 10283
interact_clicks['total_clicks'] = 3714

help_clicks = help.loc[[7]].copy()
help_clicks['visits'] = 3180
help_clicks['total_clicks'] = 1717

services_clicks = services.loc[[7]].copy()
services_clicks['visits'] = 2064
services_clicks['total_clicks'] = 1348

connect_clicks = connect.loc[[6]].copy()
connect_clicks['visits'] = 2742
connect_clicks['total_clicks'] = 1587

learn_clicks = learn.loc[[10]].copy()
learn_clicks['visits'] = 2747
learn_clicks['total_clicks'] = 1652

all_versions_clicks = pd.concat([interact_clicks,help_clicks,services_clicks,connect_clicks,learn_clicks])
all_versions_clicks

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information,visits,total_clicks
9,87,a,INTERACT,42,True,,10283,3714
7,92,a,HELP,38,True,,3180,1717
7,87,a,SERVICES,45,True,,2064,1348
6,92,a,CONNECT,53,True,,2742,1587
10,87,a,LEARN,21,True,,2747,1652


In [30]:
# We can find the CTR by dividing number of clicks values by visits

all_versions_clicks["CTR"] = (all_versions_clicks["No. clicks"] / all_versions_clicks["visits"]) * 100

all_versions_clicks.sort_values(by="CTR", ascending=False)

Unnamed: 0,Element ID,Tag name,Name,No. clicks,Visible?,Snapshot information,visits,total_clicks,CTR
7,87,a,SERVICES,45,True,,2064,1348,2.1802
6,92,a,CONNECT,53,True,,2742,1587,1.9329
7,92,a,HELP,38,True,,3180,1717,1.195
10,87,a,LEARN,21,True,,2747,1652,0.7645
9,87,a,INTERACT,42,True,,10283,3714,0.4084


In [32]:
# Creating CTR Dataframe

CTR = [0.408441, 1.194969, 2.180233, 1.932896, 0.764470]
columns = ["Interact", "Help", "Services", "Connect", "Learn"]

CTR_df = pd.DataFrame([CTR],
                    columns = columns,  # pass the list itself; [columns] would build a one-level MultiIndex
                    index = ["CTR"])
CTR_df

Unnamed: 0,Interact,Help,Services,Connect,Learn
CTR,0.4084,1.195,2.1802,1.9329,0.7645


In [None]:
# For CTR:

# “Services” and “Connect” are the best performers.
# “Interact” and “Learn” are the worst performers.
#  services > connect > help > learn > interact

# This tells us that some versions have a higher (or lower) observed CTR than others.
# The best version (services) has an observed CTR more than five times that of the worst version (interact).
# But we can't yet be sure that the differences are statistically significant; we'll test that next.

In [None]:
## 3. A/B Testing (Chi-square Test) ##

In [None]:
# A/B Testing Steps:

# 1) Defining Hypotheses
# 2) Testing Hypotheses
# 3) Interpreting the results according to p-value (H0 rejection if p < 0.05)
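
In [None]:
# The three steps above can be wrapped in one helper. This is a sketch, not the notebook's own code:
# the function name chi_square_decision is an assumption, and it uses the same
# stats.chi2_contingency(..., correction=False) call as the tests below.

```python
from scipy import stats


def chi_square_decision(table, alpha=0.05):
    """Run a chi-square test of independence on a 2 x k table.

    Returns the p-value and True when H0 is rejected at level alpha.
    """
    chi2, p_value, dof, expected = stats.chi2_contingency(table, correction=False)
    return float(p_value), bool(p_value < alpha)


# Clicks and non-clicks for Interact, Help, Services, Connect, Learn.
full_table = [[42, 38, 45, 53, 21],
              [10241, 3142, 2019, 2689, 2726]]
p_value, reject_h0 = chi_square_decision(full_table)
print("p-value =", p_value, "| reject H0:", reject_h0)
```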

In [None]:
## TEST 1 ##

In [None]:
# 1) Defining Hypotheses

# Null Hypothesis (H0): CTR(interact) = CTR(help) = CTR(services) = CTR(connect) = CTR(learn)
# There is no difference in click-through rate between the button versions.
# (In other words, any differences observed between these buttons are due to chance.)

# Alternative Hypothesis (H1): At least one button version has a different click-through rate.
# (In other words, there is a real performance difference between at least two of these buttons.)

In [44]:
# 2) Testing Hypotheses

from scipy import stats
chisq, pvalue, df, expected = stats.chi2_contingency(observed, correction=False)

print("chisq =", chisq)
print("pvalue =", pvalue)
print("df =", df)
print("expected =", expected)

chisq = 96.7432353798328
pvalue = 4.852334301093838e-20
df = 4
expected = [[   97.3694804     30.11134374    19.5439665     25.96393224
     26.01127712]
 [10185.6305196   3149.88865626  2044.4560335   2716.03606776
   2720.98872288]]
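
In [None]:
# To show where the "expected" table and the chi-square statistic come from, the sketch below
# recomputes them by hand with NumPy. The variable names are illustrative; the formulas are the
# standard ones for a Pearson chi-square test without continuity correction.

```python
import numpy as np

observed_counts = np.array([
    [42, 38, 45, 53, 21],              # clicks
    [10241, 3142, 2019, 2689, 2726],   # no clicks
])

row_totals = observed_counts.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed_counts.sum(axis=0, keepdims=True)   # shape (1, 5)
grand_total = observed_counts.sum()

# Expected count under H0: row total * column total / grand total.
expected_counts = row_totals * col_totals / grand_total

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = ((observed_counts - expected_counts) ** 2 / expected_counts).sum()

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (observed_counts.shape[0] - 1) * (observed_counts.shape[1] - 1)

print(expected_counts.round(2))
print("chisq =", chi_square, "df =", dof)
```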


In [45]:
alpha = 0.05
if pvalue > alpha:
    print("The p-value is larger than alpha. We can not reject the H0 Hypothesis")
else:
    print("The p-value is smaller than alpha. We reject the H0 Hypothesis")

The p-value is smaller than alpha. We reject the H0 Hypothesis


In [None]:
# The p-value is smaller than alpha. We reject the H0 Hypothesis.

# H0: CTR(interact) = CTR(help) = CTR(services) = CTR(connect) = CTR(learn)
# The result indicates that there is a statistically significant difference in click-through rate between the button versions.
# This means that at least one button version performs differently from the others, and the observed differences are unlikely to be due to chance alone.

In [None]:
## TEST 2 ##

In [47]:
# We rejected the H0 hypothesis, meaning there is a performance difference between the buttons.
# One possible approach to find out which versions differ is to narrow down the candidates: eliminate the worst performer and rerun the test.
# In this case, we will drop “Interact”. (services > connect > help > learn > interact)

observed = observed.drop("Interact", axis=1)
observed

Unnamed: 0,Help,Services,Connect,Learn
Click,38,45,53,21
No_click,3142,2019,2689,2726


In [None]:
# 1) Defining Hypotheses

# Null Hypothesis (H0): CTR(help) = CTR(services) = CTR(connect) = CTR(learn)
# There is no difference in click-through rate between the 4 button versions.
# (That is, any differences observed between these buttons are due to chance.)

# Alternative Hypothesis (H1):
# At least one of the 4 button versions has a different click-through rate.
# (That is, there is a real performance difference between at least two of these buttons.)

In [48]:
# 2) Testing Hypotheses

from scipy import stats
chisq, pvalue, df, expected = stats.chi2_contingency(observed, correction=False)

print("chisq =", chisq)
print("pvalue =", pvalue)
print("df =", df)
print("expected =", expected)

chisq = 22.450979530401828
pvalue = 5.25509870228566e-05
df = 3
expected = [[  46.51635144   30.19174509   40.10938228   40.1825212 ]
 [3133.48364856 2033.80825491 2701.89061772 2706.8174788 ]]


In [49]:
alpha = 0.05
if pvalue > alpha:
    print("The p-value is larger than alpha. We can not reject the H0 Hypothesis")
else:
    print("The p-value is smaller than alpha. We reject the H0 Hypothesis")

The p-value is smaller than alpha. We reject the H0 Hypothesis


In [None]:
# The p-value is smaller than alpha. We reject the H0 Hypothesis.

# H0: CTR(help) = CTR(services) = CTR(connect) = CTR(learn)
# The result indicates that there is a statistically significant difference in click-through rate between the 4 button versions.
# This means that at least one button version performs differently from the others, and the observed differences are unlikely to be due to chance alone.

In [None]:
## TEST 3 ##

In [51]:
# We rejected the H0 hypothesis, meaning there is a performance difference between the buttons.
# We narrow down the candidates: We eliminate the second worst-performing candidate and rerun the test.
# In this case, we will drop Learn. (services > connect > help > learn > interact)

observed = observed.drop("Learn", axis=1)
observed

Unnamed: 0,Help,Services,Connect
Click,38,45,53
No_click,3142,2019,2689


In [None]:
# 1) Defining Hypotheses

# Null Hypothesis (H0): CTR(help) = CTR(services) = CTR(connect)
# There is no difference in click-through rate between the 3 button versions.
# (That is, any differences observed between these buttons are due to chance.)

# Alternative Hypothesis (H1):
# At least one of the 3 button versions has a different click-through rate.
# (That is, there is a real performance difference between at least two of these buttons.)

In [52]:
# 2) Testing Hypotheses

from scipy import stats
chisq, pvalue, df, expected = stats.chi2_contingency(observed, correction=False)

print("chisq =", chisq)
print("pvalue =", pvalue)
print("df =", df)
print("expected =", expected)

chisq = 8.57683071094785
pvalue = 0.013726659948517513
df = 2
expected = [[  54.15477085   35.14951165   46.69571751]
 [3125.84522915 2028.85048835 2695.30428249]]


In [53]:
alpha = 0.05
if pvalue > alpha:
    print("The p-value is larger than alpha. We can not reject the H0 Hypothesis")
else:
    print("The p-value is smaller than alpha. We reject the H0 Hypothesis")

The p-value is smaller than alpha. We reject the H0 Hypothesis


In [None]:
# The p-value is smaller than alpha. We reject the H0 Hypothesis.

# H0: CTR(help) = CTR(services) = CTR(connect)
# The result indicates that there is a statistically significant difference in click-through rate between the 3 button versions.
# This means that at least one button version performs differently from the others, and the observed differences are unlikely to be due to chance alone.

In [None]:
## TEST 4 ##

In [55]:
# We rejected the H0 hypothesis, meaning there is a performance difference between the buttons.
# We narrow down the candidates: We eliminate the third worst-performing candidate and re-run the test.
# In this case we will drop Help. (services > connect > help > learn > interact)

observed = observed.drop("Help", axis=1)
observed

Unnamed: 0,Services,Connect
Click,45,53
No_click,2019,2689


In [None]:
# 1) Defining Hypotheses

# Null Hypothesis (H0): CTR(services) = CTR(connect)
# There is no difference in click-through rate between the 2 button versions.
# (That is, any difference observed between these buttons is due to chance.)

# Alternative Hypothesis (H1):
# The 2 button versions have different click-through rates.
# (That is, there is a real performance difference between these buttons.)

In [56]:
# 2) Testing Hypotheses

from scipy import stats
chisq, pvalue, df, expected = stats.chi2_contingency(observed, correction=False)

print("chisq =", chisq)
print("pvalue =", pvalue)
print("df =", df)
print("expected =", expected)

chisq = 0.36064180794754525
pvalue = 0.5481500077135573
df = 1
expected = [[  42.08739076   55.91260924]
 [2021.91260924 2686.08739076]]


In [57]:
alpha = 0.05
if pvalue > alpha:
    print("The p-value is larger than alpha. We can not reject the H0 Hypothesis")
else:
    print("The p-value is smaller than alpha. We reject the H0 Hypothesis")

The p-value is larger than alpha. We can not reject the H0 Hypothesis


In [None]:
# The p-value is larger than alpha. We cannot reject the H0 Hypothesis.

# H0: CTR(services) = CTR(connect)
# There is no statistically significant difference in click-through rate between the 2 button versions.
# The observed difference between the Services and Connect buttons is consistent with chance, so the null hypothesis (H0) stands.
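
In [None]:
# Tests 1 to 4 repeat the same pattern: test, drop the version with the lowest CTR, test again.
# The loop below is a sketch of that procedure under the same settings (alpha = 0.05, correction=False).
# The names table and history are illustrative and not part of the original analysis.

```python
import pandas as pd
from scipy import stats

table = pd.DataFrame(
    [[42, 38, 45, 53, 21], [10241, 3142, 2019, 2689, 2726]],
    columns=["Interact", "Help", "Services", "Connect", "Learn"],
    index=["Click", "No_click"],
)
alpha = 0.05
history = []  # (versions tested, p-value) for each round

while table.shape[1] >= 2:
    chi2, p_value, dof, _ = stats.chi2_contingency(table, correction=False)
    history.append((list(table.columns), float(p_value)))
    if p_value >= alpha:
        break  # no detectable difference among the remaining versions
    # Drop the version with the lowest observed CTR and test again.
    ctr = table.loc["Click"] / table.sum()
    table = table.drop(columns=ctr.idxmin())

for versions, p in history:
    print(versions, f"p = {p:.4g}")
```

# With the counts above, the loop stops after four rounds with Services and Connect remaining.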

In [None]:
## Analysis of Results ##

In [None]:
# As a result of the hypothesis tests, the null hypothesis of equal click-through rates was rejected for every group of buttons that included Interact, Learn, or Help (Tests 1 to 3).
# However, there is no statistically significant difference in click-through rate between the Services and Connect buttons (Test 4, p ≈ 0.55).
# (In other words, the difference observed between these 2 buttons is consistent with chance.)

## Suggestions:
# The click-through rate (CTR) of Services and Connect is higher than that of the other buttons (and the CTR of the Services button is the highest).
# However, the data do not show a real difference between these 2 versions, so the gap between them may be due to chance. From now on, we can focus on only these two versions.

# For the following reasons, it is highly recommended to change the library home page and to present the design with the "Services" button:
# Services shows the best click-through rate (CTR) among all options.
# Since the homepage-return rate is lower in the Services version, its category page is better at giving users the information they want.
# Students stated that they liked the Services version better than Connect.