*For this dataset I decided to apply my knowledge of A/B tests and learn how to design an experiment for a product, analyse the results, and draw insights and recommendations.*

### **Website of the grocery store chain**

<img src="https://i.imgur.com/PPBlwBa.png" width="300px">


**In this use case I am working as an analyst for a large grocery chain. One of our company's goals is to drive more customers to download our mobile app and register for the loyalty program.**

*My manager is curious whether changing the app store link to a button will improve the click-through rate for our app download page.*

![link to button for downloading the app](https://i.imgur.com/RXxvVpY.png)

In [None]:
#import the libraries needed for the analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ttest_ind

In [None]:
#load the grocery website data
grossery_web_data_orig = pd.read_csv('../input/grocery-website-data-for-ab-test/grocerywebsiteabtestdata.csv')
grossery_web_data_orig.head()

**To find the answer I will perform a randomized experiment.**

The unit of diversion in this experiment is the IP address.  
The population we're targeting is website visitors without an account.  
The duration of the test is 1 week.  
The treatment group gets 1/3 of the units and the control group gets 2/3.

*Let's find out whether each visitor clicked the download app link or button on the home page.*

### Data preparation

Server 1 holds the data for our treatment group, and servers 2 and 3 hold the data for the control group, so the data is split roughly 33% / 67%.
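
Before comparing click rates, it can be worth checking that the observed group sizes actually match the intended 1/3 vs 2/3 split. The sketch below is one possible way to do that with a chi-square goodness-of-fit test. The function name `srm_check` and the counts are hypothetical and are not taken from the dataset.

```python
import math

def srm_check(n_treatment, n_control, expected_treatment_share=1/3):
    """Chi-square goodness-of-fit test (1 degree of freedom) for the group split."""
    total = n_treatment + n_control
    expected_t = total * expected_treatment_share
    expected_c = total - expected_t
    chi2 = ((n_treatment - expected_t) ** 2 / expected_t
            + (n_control - expected_c) ** 2 / expected_c)
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts for illustration only (assumption, not real data).
chi2, p_value = srm_check(n_treatment=3300, n_control=6700)
print(f"chi2 = {chi2:.3f}, p-value = {p_value:.3f}")
```

A large p-value here would suggest the observed split is consistent with the planned one; a very small one would hint at a problem with how traffic was assigned to servers.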

The dataset has two flag columns: LoggedInFlag (shows whether the user has a profile on our website) and VisitPageFlag (shows whether the user clicked on our download app link).

In [None]:
#grouping to make one row per IP address
grossery_web_data = grossery_web_data_orig.groupby(['IP Address', 'LoggedInFlag', 'ServerID'])['VisitPageFlag'].sum().reset_index(name='sum_VisitPageFlag')

In [None]:
#flag each IP address as 1 if it clicked at least once, otherwise 0
grossery_web_data['visitFlag'] = grossery_web_data['sum_VisitPageFlag'].apply(lambda x: 1 if x !=0 else 0)
grossery_web_data.head()
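
To make the two steps above concrete, here is a small sketch on a toy frame: repeated visits from the same IP address are summed into one row, and the sum is then collapsed into a 0/1 click flag. The rows in `toy` are made up for illustration, and the vectorized comparison is an equivalent alternative to the `apply` call, not the code used in this notebook.

```python
import pandas as pd

# Toy data (assumption): three IP addresses, two of them with repeat visits.
toy = pd.DataFrame({
    'IP Address':    ['10.0.0.1', '10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.3'],
    'LoggedInFlag':  [0, 0, 0, 0, 0],
    'ServerID':      [1, 1, 2, 3, 3],
    'VisitPageFlag': [1, 1, 0, 0, 1],
})

# One row per IP address, with the total number of clicks.
per_ip = (toy.groupby(['IP Address', 'LoggedInFlag', 'ServerID'])['VisitPageFlag']
             .sum()
             .reset_index(name='sum_VisitPageFlag'))

# 1 if the IP address clicked at least once, otherwise 0.
per_ip['visitFlag'] = (per_ip['sum_VisitPageFlag'] > 0).astype(int)
print(per_ip)
```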

In [None]:
#creating groups for control and treatment
grossery_web_data['group'] = grossery_web_data.ServerID.map({1:'Treatment', 2:'Control', 3:'Control'})

In [None]:
grossery_web_data.dtypes

In [None]:
#removing all records where the LoggedInFlag=1, so it filters out all the users with accounts
grossery_web_data = grossery_web_data[grossery_web_data['LoggedInFlag'] != 1]
grossery_web_data

### Analyzing the result

In [None]:
treatment = grossery_web_data[grossery_web_data['group']=='Treatment']
control = grossery_web_data[grossery_web_data['group']=='Control']

ttest_ind(treatment['visitFlag'], control['visitFlag'], equal_var = False)

Judging by the p-value, it is unlikely that the two groups have the same mean click rate.

Next, let's count the visitors in the treatment and control groups and see how many of them clicked on the link.
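
Because the outcome is a 0/1 flag, a two-proportion z-test is a common alternative to the t-test used above, and on samples of this kind the two usually agree closely. The following is a minimal sketch using only the standard library; the function `two_proportion_ztest` and the click counts are hypothetical and are not the numbers from this dataset.

```python
import math

def two_proportion_ztest(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for the difference between two click rates."""
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts for illustration only (assumption, not real data).
z, p_value = two_proportion_ztest(clicks_a=230, n_a=1000, clicks_b=380, n_b=2000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```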

In [None]:
#count visitors per group, split by whether they clicked
grossery_web_data_diff_mean = grossery_web_data.groupby(['group', 'visitFlag'])['group'].count().reset_index(name='Count')
grossery_web_data_diff_mean

It would be nice to see the difference in percentages.

In [None]:
grossery_web_data.groupby('group').visitFlag.mean()

In [None]:
#crosstab of counts by group and click flag, with row and column totals
groupped = pd.crosstab(grossery_web_data_diff_mean['group'], grossery_web_data_diff_mean['visitFlag'], values=grossery_web_data_diff_mean['Count'], aggfunc='sum', margins=True)
groupped

In [None]:
#Percentage row
100*groupped.div(groupped['All'], axis=0)

In the control group ~19% of users clicked on the link, versus ~23% in the treatment group, so the jump is about 4 percentage points.

**The result of our A/B test shows that the company can raise the click-through rate on the app download by approximately 4 percentage points if it replaces the link with the App Store / Play Store button.**
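
A difference like this can be reported in two ways: as an absolute difference in percentage points, or as a relative lift over the control rate. The sketch below shows both, plus an approximate 95% confidence interval for the absolute difference. It uses the rounded rates of 19% and 23% quoted above, while the group sizes and the function name `lift_summary` are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def lift_summary(p_control, p_treatment, n_control, n_treatment, confidence=0.95):
    """Absolute difference, relative lift, and a normal-approximation CI."""
    abs_diff = p_treatment - p_control
    rel_lift = abs_diff / p_control
    # Unpooled standard error of the difference between two proportions.
    se = math.sqrt(p_control * (1 - p_control) / n_control
                   + p_treatment * (1 - p_treatment) / n_treatment)
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (abs_diff - z_crit * se, abs_diff + z_crit * se)
    return abs_diff, rel_lift, ci

# Rounded rates from the text; group sizes are hypothetical (assumption).
abs_diff, rel_lift, ci = lift_summary(0.19, 0.23, n_control=2000, n_treatment=1000)
print(f"absolute difference: {abs_diff:.1%} points")
print(f"relative lift: {rel_lift:.1%}")
print(f"95% CI for the difference: ({ci[0]:.1%}, {ci[1]:.1%})")
```

With these rounded rates the relative lift is roughly 21%, which is why it helps to state explicitly whether "4%" means percentage points or a relative change.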