*For this dataset I decided to apply my knowledge of A/B tests: design an experiment for a product, analyse the results, and draw insights and recommendations.*

### **Website of the grocery store chain**

<img src="https://i.imgur.com/PPBlwBa.png" width="300px">


**In this use case I am working as an analyst for a large grocery chain. One of our company's goals is to drive more customers to download our mobile app and register for the loyalty program.**

*My manager wants to know whether replacing the text link with an app store button will improve the click-through rate to our app download page.*

![link to button for downloading the app](https://i.imgur.com/RXxvVpY.png)

In [1]:
#let's import all the needed libraries for the analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ttest_ind

In [2]:
# Load the grocery website data
grossery_web_data_orig = pd.read_csv('../input/grocery-website-data-for-ab-test/grocerywebsiteabtestdata.csv')
grossery_web_data_orig.head()

| | RecordID | IP Address | LoggedInFlag | ServerID | VisitPageFlag |
|---|---|---|---|---|---|
| 0 | 1 | 39.13.114.2 | 1 | 2 | 0 |
| 1 | 2 | 13.3.25.8 | 1 | 1 | 0 |
| 2 | 3 | 247.8.211.8 | 1 | 1 | 0 |
| 3 | 4 | 124.8.220.3 | 0 | 3 | 0 |
| 4 | 5 | 60.10.192.7 | 0 | 2 | 0 |


**To find the answer I will perform a randomized experiment.**

Our unit of diversion in this experiment is the IP address.  
The population we're targeting: website visitors without an account.  
The duration of the test: 1 week.  
The size of our treatment and control groups: 1/3 of units go to treatment and 2/3 to control.
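
The notebook does not show how visitors were routed to servers. As an illustration of diverting by IP address, here is a minimal sketch that assumes a deterministic hash-based 1/3 vs 2/3 split; the function name `assign_group` and the salt string are hypothetical, not part of the original setup.

```python
import hashlib


def assign_group(ip_address: str, salt: str = "download-button-test") -> str:
    """Hypothetical assignment: hash the IP so the same visitor always lands
    in the same group, then send 1 of every 3 hash buckets to treatment."""
    digest = hashlib.sha256(f"{salt}:{ip_address}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 3  # bucket is 0, 1 or 2
    return "Treatment" if bucket == 0 else "Control"


# Usage: repeat visits from one IP address always get the same group
print(assign_group("39.13.114.2"))
```

Hashing instead of drawing a random group on every request keeps all visits from one IP address in a single group, which is what makes the IP address usable as the unit of diversion.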

*The question is whether each visitor clicked the download-app button on the home page or not.*

### Data preparation

Server 1 serves the treatment group, and servers 2 and 3 serve the control group, so the data is split roughly 33% / 67%.

The two flags we use are LoggedInFlag (whether the user has a profile on our website) and VisitPageFlag (whether the user clicked our download-app link).
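
Since the split is meant to be 1/3 vs 2/3, it is worth checking that the observed group sizes match it (a sample ratio mismatch check, which is an addition and not a step in the original analysis). A minimal sketch using a chi-square goodness-of-fit test on the group totals for visitors without an account, taken from the crosstab later in this notebook (16,543 treatment, 32,970 control):

```python
import math

# Observed group sizes (visitors without an account) and the intended shares
observed = {"Treatment": 16543, "Control": 32970}
expected_share = {"Treatment": 1 / 3, "Control": 2 / 3}

total = sum(observed.values())
# Chi-square goodness-of-fit statistic: sum of (observed - expected)^2 / expected
chi2 = sum(
    (observed[g] - total * expected_share[g]) ** 2 / (total * expected_share[g])
    for g in observed
)
# Two groups give 1 degree of freedom; for df = 1 the chi-square
# survival function equals erfc(sqrt(chi2 / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A large p-value means the observed split is consistent with the intended design; a very small one would suggest a problem with the assignment or the logging.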

In [3]:
#grouping to make one row per IP address
grossery_web_data = grossery_web_data_orig.groupby(['IP Address', 'LoggedInFlag', 'ServerID'])['VisitPageFlag'].sum().reset_index(name='sum_VisitPageFlag')

In [4]:
# Collapse the click count to a 0/1 flag: 1 if the IP address clicked at least once
grossery_web_data['visitFlag'] = (grossery_web_data['sum_VisitPageFlag'] != 0).astype(int)
grossery_web_data.head()

| | IP Address | LoggedInFlag | ServerID | sum_VisitPageFlag | visitFlag |
|---|---|---|---|---|---|
| 0 | 0.0.108.2 | 0 | 1 | 0 | 0 |
| 1 | 0.0.109.6 | 1 | 1 | 0 | 0 |
| 2 | 0.0.111.8 | 0 | 3 | 0 | 0 |
| 3 | 0.0.160.9 | 1 | 2 | 0 | 0 |
| 4 | 0.0.163.1 | 0 | 2 | 0 | 0 |

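The two cells above turn page-view rows into one row per visitor with a 0/1 click flag, so the metric becomes the share of visitors who clicked rather than the number of clicks. A plain-Python sketch of the same logic on a few made-up rows (the IP addresses and values are hypothetical):

```python
from collections import defaultdict

# Hypothetical raw rows: (IP Address, LoggedInFlag, ServerID, VisitPageFlag)
rows = [
    ("192.0.2.1", 0, 1, 0),
    ("192.0.2.1", 0, 1, 1),
    ("192.0.2.1", 0, 1, 1),
    ("192.0.2.2", 0, 3, 0),
    ("192.0.2.2", 0, 3, 0),
    ("192.0.2.3", 1, 2, 1),
]

# Sum VisitPageFlag per (IP, LoggedInFlag, ServerID), like the groupby in In [3]
sums = defaultdict(int)
for ip, logged_in, server, visit in rows:
    sums[(ip, logged_in, server)] += visit

# Collapse each sum to 0/1: did this IP address click at least once?
visit_flag = {key: int(total != 0) for key, total in sums.items()}
print(visit_flag)
```

The first visitor clicked twice but still counts once, so repeat clicks cannot inflate a group's click-through rate.
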

In [5]:
#creating groups for control and treatment
grossery_web_data['group'] = grossery_web_data.ServerID.map({1:'Treatment', 2:'Control', 3:'Control'})

In [6]:
grossery_web_data.dtypes

IP Address           object
LoggedInFlag          int64
ServerID              int64
sum_VisitPageFlag     int64
visitFlag             int64
group                object
dtype: object

In [7]:
# Keep only visitors without an account (drop rows where LoggedInFlag == 1)
grossery_web_data = grossery_web_data[grossery_web_data['LoggedInFlag'] != 1]
grossery_web_data

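| | IP Address | LoggedInFlag | ServerID | sum_VisitPageFlag | visitFlag | group |
|---|---|---|---|---|---|---|
| 0 | 0.0.108.2 | 0 | 1 | 0 | 0 | Treatment |
| 2 | 0.0.111.8 | 0 | 3 | 0 | 0 | Control |
| 4 | 0.0.163.1 | 0 | 2 | 0 | 0 | Control |
| 7 | 0.0.181.9 | 0 | 1 | 1 | 1 | Treatment |
| 11 | 0.0.20.3 | 0 | 1 | 0 | 0 | Treatment |
| ... | ... | ... | ... | ... | ... | ... |
| 99746 | 99.9.206.2 | 0 | 1 | 0 | 0 | Treatment |
| 99748 | 99.9.215.4 | 0 | 3 | 1 | 1 | Control |
| 99759 | 99.9.65.2 | 0 | 2 | 0 | 0 | Control |
| 99761 | 99.9.86.3 | 0 | 1 | 1 | 1 | Treatment |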


### Analyzing the results

In [8]:
treatment = grossery_web_data[grossery_web_data['group']=='Treatment']
control = grossery_web_data[grossery_web_data['group']=='Control']

ttest_ind(treatment['visitFlag'], control['visitFlag'], equal_var = False)

Ttest_indResult(statistic=11.879472502167134, pvalue=1.781696815610413e-32)

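The p-value (about 1.8e-32) is far below the usual 0.05 threshold, so it is very unlikely that the two groups have the same click-through rate. The positive t-statistic means the treatment group clicked more often.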
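
`ttest_ind(..., equal_var=False)` runs Welch's t-test. Because `visitFlag` is 0/1, the statistic can be reproduced from the number of clickers and the size of each group alone (the counts shown in the `In [9]` output). A minimal sketch; the helper name `welch_t_from_counts` is hypothetical:

```python
import math


def welch_t_from_counts(clicks_a: int, n_a: int, clicks_b: int, n_b: int):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom for two
    0/1 samples, computed from the number of ones and the sample sizes."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    # Sample variance (ddof=1) of a 0/1 variable with mean p over n values
    var_a = p_a * (1 - p_a) * n_a / (n_a - 1)
    var_b = p_b * (1 - p_b) * n_b / (n_b - 1)
    se_a2, se_b2 = var_a / n_a, var_b / n_b
    t = (p_a - p_b) / math.sqrt(se_a2 + se_b2)
    df = (se_a2 + se_b2) ** 2 / (se_a2 ** 2 / (n_a - 1) + se_b2 ** 2 / (n_b - 1))
    return t, df


# Treatment: 3847 clickers out of 16543; Control: 6131 out of 32970
t, df = welch_t_from_counts(3847, 16543, 6131, 32970)
print(f"t = {t:.4f}, df = {df:.0f}")
```

The statistic should match the `ttest_ind` output up to floating-point rounding. With tens of thousands of degrees of freedom the t distribution is practically normal, which is why a t of almost 12 gives such a tiny p-value.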

Next, let's count the visitors in each group, split by whether they clicked the link.

In [9]:
# Count visitors per group and click flag
grossery_web_data_diff_mean = grossery_web_data.groupby(['group', 'visitFlag'])['group'].count().reset_index(name='Count')
grossery_web_data_diff_mean

| | group | visitFlag | Count |
|---|---|---|---|
| 0 | Control | 0 | 26839 |
| 1 | Control | 1 | 6131 |
| 2 | Treatment | 0 | 12696 |
| 3 | Treatment | 1 | 3847 |

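Because the outcome is binary, a two-proportion z-test on these counts is another standard way to test the difference (a swapped-in technique, not the one used in this notebook). A minimal sketch using the pooled click-through rate under the null hypothesis that both groups share one rate:

```python
import math

# Counts from the table above: clickers (visitFlag = 1) and group sizes
clicks_t, n_t = 3847, 3847 + 12696  # Treatment
clicks_c, n_c = 6131, 6131 + 26839  # Control

p_t, p_c = clicks_t / n_t, clicks_c / n_c
# Pooled click-through rate, assuming both groups share one rate (the null)
p_pool = (clicks_t + clicks_c) / (n_t + n_c)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
z = (p_t - p_c) / se
# Two-sided p-value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))
print(f"z = {z:.2f}, p = {p_value:.2e}")
```

It leads to the same conclusion as the Welch t-test: the gap between the groups is far too large to be explained by chance.
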

To compare the groups, let's look at the click-through rate (the share of visitors who clicked) in each group.

In [10]:
grossery_web_data.groupby('group').visitFlag.mean()

group
Control      0.185957
Treatment    0.232545
Name: visitFlag, dtype: float64

In [11]:
# Cross-tabulate the counts by group and click flag, with row and column totals
groupped = pd.crosstab(grossery_web_data_diff_mean['group'], grossery_web_data_diff_mean['visitFlag'], values=grossery_web_data_diff_mean['Count'], aggfunc='sum', margins=True)
groupped

| group | visitFlag = 0 | visitFlag = 1 | All |
|---|---|---|---|
| Control | 26839 | 6131 | 32970 |
| Treatment | 12696 | 3847 | 16543 |
| All | 39535 | 9978 | 49513 |


In [12]:
# Row percentages: each count divided by its row total

100*groupped.div(groupped['All'], axis=0)

| group | visitFlag = 0 (%) | visitFlag = 1 (%) | All (%) |
|---|---|---|---|
| Control | 81.404307 | 18.595693 | 100.0 |
| Treatment | 76.745451 | 23.254549 | 100.0 |
| All | 79.847717 | 20.152283 | 100.0 |


In the control group about 18.6% of visitors clicked the link, compared with about 23.3% in the treatment group: an increase of about 4.7 percentage points, or roughly 25% relative to the control rate.
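
The gap can be stated as an absolute difference in percentage points or as a relative lift over the control rate, and a confidence interval shows how precisely it is estimated. A minimal sketch using a normal-approximation (Wald) 95% interval; the interval is an addition to the original analysis:

```python
import math

clicks_t, n_t = 3847, 16543  # Treatment
clicks_c, n_c = 6131, 32970  # Control

p_t, p_c = clicks_t / n_t, clicks_c / n_c
abs_diff = p_t - p_c       # difference in click-through rates
rel_lift = abs_diff / p_c  # lift relative to the control rate

# Unpooled standard error of the difference, then a 95% Wald interval
se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
z_crit = 1.96
ci_low, ci_high = abs_diff - z_crit * se, abs_diff + z_crit * se

print(f"absolute: {100 * abs_diff:.2f} percentage points, relative: {rel_lift:.1%}")
print(f"95% CI (percentage points): [{100 * ci_low:.2f}, {100 * ci_high:.2f}]")
```

The whole interval lies well above zero, so even the pessimistic end of the estimate is a clear improvement.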

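**The result of our A/B test shows that, among visitors without an account, replacing the link with an App Store / Play Store button raises the click-through rate to the app download page by about 4.7 percentage points (from 18.6% to 23.3%, roughly a 25% relative increase), and the difference is statistically significant.**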