 Will changing the app store link to a button improve the click-through rate for our app download page?
 
 Let's figure out whether each customer clicked the app download link or button on the home page.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ttest_ind

In [2]:
df = pd.read_csv("../input/grocery-website-data-for-ab-test/grocerywebsiteabtestdata.csv")
df.head()

Unnamed: 0,RecordID,IP Address,LoggedInFlag,ServerID,VisitPageFlag
0,1,39.13.114.2,1,2,0
1,2,13.3.25.8,1,1,0
2,3,247.8.211.8,1,1,0
3,4,124.8.220.3,0,3,0
4,5,60.10.192.7,0,2,0


## Data Preparation

Server 1 will contain the data for our treatment group and servers 2 and 3 for the control group.

In [11]:
# check whether every row is a distinct IP address (shape[0] is the row count)
df["IP Address"].nunique() == df.shape[0]

False

In [17]:
# grouping to make one row per IP address
df_web = df.groupby(["IP Address", "LoggedInFlag", "ServerID"])["VisitPageFlag"].sum().reset_index(name="sum_VisitPageFlag")

In [18]:
df_web

Unnamed: 0,IP Address,LoggedInFlag,ServerID,sum_VisitPageFlag
0,0.0.108.2,0,1,0
1,0.0.109.6,1,1,0
2,0.0.111.8,0,3,0
3,0.0.160.9,1,2,0
4,0.0.163.1,0,2,0
...,...,...,...,...
99758,99.9.53.7,1,2,0
99759,99.9.65.2,0,2,0
99760,99.9.79.6,1,2,0
99761,99.9.86.3,0,1,1


In [19]:
# flag IP addresses that visited the download page at least once
df_web["visitFlag"] = df_web["sum_VisitPageFlag"].apply(lambda x: 1 if x!= 0 else 0)
df_web.head()

Unnamed: 0,IP Address,LoggedInFlag,ServerID,sum_VisitPageFlag,visitFlag
0,0.0.108.2,0,1,0,0
1,0.0.109.6,1,1,0,0
2,0.0.111.8,0,3,0,0
3,0.0.160.9,1,2,0,0
4,0.0.163.1,0,2,0,0
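
The aggregation and flagging steps above can be illustrated on a small toy frame. This is only a sketch: the rows below are made up, and the boolean comparison with `astype(int)` is offered as a vectorized alternative to the `apply` call, not as the notebook's own code.

```python
import pandas as pd

# Toy data with the notebook's column names (the values are invented).
toy = pd.DataFrame({
    "IP Address": ["1.1.1.1", "1.1.1.1", "2.2.2.2", "3.3.3.3"],
    "LoggedInFlag": [0, 0, 1, 0],
    "ServerID": [1, 1, 2, 3],
    "VisitPageFlag": [1, 1, 0, 0],
})

# Collapse repeated records into one row per (IP, login state, server).
toy_web = (
    toy.groupby(["IP Address", "LoggedInFlag", "ServerID"])["VisitPageFlag"]
    .sum()
    .reset_index(name="sum_VisitPageFlag")
)

# 1 if the IP visited the page at least once, else 0.
toy_web["visitFlag"] = (toy_web["sum_VisitPageFlag"] > 0).astype(int)
print(toy_web)
```

The first IP address appears twice in the toy input with two visits, so it collapses to a single row whose `visitFlag` is 1 rather than 2.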


In [20]:
df_web.tail()

Unnamed: 0,IP Address,LoggedInFlag,ServerID,sum_VisitPageFlag,visitFlag
99758,99.9.53.7,1,2,0,0
99759,99.9.65.2,0,2,0,0
99760,99.9.79.6,1,2,0,0
99761,99.9.86.3,0,1,1,1
99762,99.9.86.9,0,1,0,0


## Split groups for control and treatment

In [21]:
df_web["group"] = df_web["ServerID"].map({1:"Treatment", 2:"Control", 3:"Control"})

In [22]:
df_web.head(10)

Unnamed: 0,IP Address,LoggedInFlag,ServerID,sum_VisitPageFlag,visitFlag,group
0,0.0.108.2,0,1,0,0,Treatment
1,0.0.109.6,1,1,0,0,Treatment
2,0.0.111.8,0,3,0,0,Control
3,0.0.160.9,1,2,0,0,Control
4,0.0.163.1,0,2,0,0,Control
5,0.0.169.1,1,1,0,0,Treatment
6,0.0.178.9,1,2,0,0,Control
7,0.0.181.9,0,1,1,1,Treatment
8,0.0.185.4,1,3,0,0,Control
9,0.0.192.6,1,3,0,0,Control


In [23]:
# keep only visitors who were not logged in
df_web = df_web[df_web["LoggedInFlag"] != 1]
df_web

Unnamed: 0,IP Address,LoggedInFlag,ServerID,sum_VisitPageFlag,visitFlag,group
0,0.0.108.2,0,1,0,0,Treatment
2,0.0.111.8,0,3,0,0,Control
4,0.0.163.1,0,2,0,0,Control
7,0.0.181.9,0,1,1,1,Treatment
11,0.0.20.3,0,1,0,0,Treatment
...,...,...,...,...,...,...
99746,99.9.206.2,0,1,0,0,Treatment
99748,99.9.215.4,0,3,1,1,Control
99759,99.9.65.2,0,2,0,0,Control
99761,99.9.86.3,0,1,1,1,Treatment


## Result Analysis

In [24]:
treatment = df_web[df_web["group"]=="Treatment"]
control = df_web[df_web["group"]=="Control"]

In [25]:
treatment

Unnamed: 0,IP Address,LoggedInFlag,ServerID,sum_VisitPageFlag,visitFlag,group
0,0.0.108.2,0,1,0,0,Treatment
7,0.0.181.9,0,1,1,1,Treatment
11,0.0.20.3,0,1,0,0,Treatment
14,0.0.213.8,0,1,0,0,Treatment
16,0.0.220.4,0,1,1,1,Treatment
...,...,...,...,...,...,...
99741,99.9.175.5,0,1,0,0,Treatment
99744,99.9.199.3,0,1,1,1,Treatment
99746,99.9.206.2,0,1,0,0,Treatment
99761,99.9.86.3,0,1,1,1,Treatment


In [26]:
control

Unnamed: 0,IP Address,LoggedInFlag,ServerID,sum_VisitPageFlag,visitFlag,group
2,0.0.111.8,0,3,0,0,Control
4,0.0.163.1,0,2,0,0,Control
13,0.0.209.9,0,3,1,1,Control
17,0.0.228.7,0,2,0,0,Control
19,0.0.46.2,0,2,0,0,Control
...,...,...,...,...,...,...
99735,99.9.142.4,0,3,0,0,Control
99738,99.9.150.3,0,3,0,0,Control
99740,99.9.162.4,0,3,0,0,Control
99748,99.9.215.4,0,3,1,1,Control


In [27]:
ttest_ind(treatment["visitFlag"], control["visitFlag"], equal_var=False)

Ttest_indResult(statistic=11.879472502167134, pvalue=1.781696815610413e-32)

In [28]:
# count visitors by group and visit flag

df_web_mean = df_web.groupby(["group", "visitFlag"])["group"].count().reset_index(name="Count")
df_web_mean

Unnamed: 0,group,visitFlag,Count
0,Control,0,26839
1,Control,1,6131
2,Treatment,0,12696
3,Treatment,1,3847
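
Because `visitFlag` is binary, the counts above also allow a cross-check of the t-test with a two-proportion z-test. The sketch below is not part of the original analysis; the helper name `two_proportion_ztest` is hypothetical, and it uses only the standard library, with the normal tail computed through `math.erfc`.

```python
import math

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test for the difference of two proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value of a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Counts from the table above: clicks and group sizes.
treatment_clicks, treatment_n = 3847, 12696 + 3847
control_clicks, control_n = 6131, 26839 + 6131

z, p_value = two_proportion_ztest(treatment_clicks, treatment_n,
                                  control_clicks, control_n)
print(f"z = {z:.2f}, p = {p_value:.2e}")
```

With samples this large, the z statistic should land close to the Welch t statistic reported earlier, so both tests point to the same conclusion.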


In [29]:
# click-through rate (mean of visitFlag) per group
df_web.groupby("group").visitFlag.mean()

group
Control      0.185957
Treatment    0.232545
Name: visitFlag, dtype: float64

In [30]:
# crosstab by groups
group = pd.crosstab(df_web_mean["group"], df_web_mean["visitFlag"], values=df_web_mean["Count"], aggfunc="sum", margins=True)

group

group,visitFlag=0,visitFlag=1,All
Control,26839,6131,32970
Treatment,12696,3847,16543
All,39535,9978,49513
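
The row totals give a quick way to check for a sample ratio mismatch before trusting the comparison. The sketch below assumes traffic was meant to be split evenly across the three servers, so the expected control-to-treatment ratio is 2:1; that split is an assumption, since the notebook does not state the intended allocation. For a single degree of freedom, the chi-square tail probability equals `erfc(sqrt(chi2 / 2))`, which keeps the example within the standard library.

```python
import math

# Group sizes from the "All" column of the crosstab.
observed = {"Control": 32970, "Treatment": 16543}

# ASSUMPTION: equal traffic per server, with two control servers and one
# treatment server, i.e. an intended 2:1 split.
expected_share = {"Control": 2 / 3, "Treatment": 1 / 3}

total = sum(observed.values())
chi2 = sum(
    (observed[g] - total * expected_share[g]) ** 2 / (total * expected_share[g])
    for g in observed
)

# Chi-square survival function for 1 degree of freedom.
srm_p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi2 = {chi2:.3f}, p = {srm_p_value:.3f}")
```

A large p-value here means the observed group sizes are consistent with the assumed 2:1 split; a very small one would suggest an assignment or logging problem worth investigating first.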


In [31]:
# Percentage Row
100*group.div(group["All"], axis=0)

group,visitFlag=0,visitFlag=1,All
Control,81.404307,18.595693,100.0
Treatment,76.745451,23.254549,100.0
All,79.847717,20.152283,100.0


In the control group, ~18.6% of users clicked through, versus ~23.3% in the treatment group: an absolute increase of about 4.7 percentage points (roughly a 25% relative lift).
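
To show how much uncertainty surrounds that difference, here is a minimal sketch of a 95% confidence interval for the gap between the two click-through rates, together with the relative lift. It uses the normal approximation with an unpooled standard error; the helper name `diff_confidence_interval` is hypothetical and the calculation is not part of the original notebook.

```python
import math

def diff_confidence_interval(x1, n1, x2, n2, z_crit=1.96):
    """Normal-approximation CI for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z_crit * se, diff + z_crit * se

# Counts from the crosstab: treatment first, then control.
diff, ci_low, ci_high = diff_confidence_interval(3847, 16543, 6131, 32970)

# Relative lift: the absolute difference divided by the control rate.
relative_lift = diff / (6131 / 32970)

print(f"difference = {diff:.4f} (95% CI {ci_low:.4f} to {ci_high:.4f})")
print(f"relative lift = {relative_lift:.1%}")
```

Reporting both figures avoids the ambiguity of saying "4% more": the absolute difference is in percentage points, while the relative lift compares the treatment rate with the control rate.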

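The result of our A/B test suggests that the company can raise the click-through rate to the app download page by approximately 4.7 percentage points among non-logged-in visitors if it replaces the link with an App Store / Play Store button. The difference is statistically significant (Welch's t-test, p ≈ 1.8e-32).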