### A/B Testing for Ecommerce

The goal is to understand the results of an A/B test run by an e-commerce website. The company has developed a new web page to try to increase the number of users who "convert," meaning the number of users who decide to pay for the company's product. The objective of this notebook is to help the company decide whether to implement the new page, keep the old page, or run the experiment longer before deciding.

The data contains the following columns:
1. user_id: unique user ID
2. timestamp: time
3. group: experiment group (control or treatment)
4. landing_page: page shown (old_page or new_page)
5. converted: sign-up status after viewing the page (0 or 1)

#### Our strategy to solve this is as follows:
1. Load data
2. EDA
3. Decide which test to carry out
4. Define null & alternate hypothesis
5. Run tests & conclude

In [1]:
# Imports
import pandas as pd 
import numpy as np 

#### 1. Load data

In [2]:
data = pd.read_csv("data/ab_data.csv")

In [3]:
data

Unnamed: 0,user_id,timestamp,group,landing_page,converted
0,851104,11:48.6,control,old_page,0
1,804228,01:45.2,control,old_page,0
2,661590,55:06.2,treatment,new_page,0
3,853541,28:03.1,treatment,new_page,0
4,864975,52:26.2,control,old_page,1
...,...,...,...,...,...
294475,734608,45:03.4,control,old_page,0
294476,697314,20:29.0,control,old_page,0
294477,715931,40:24.5,treatment,new_page,0
294478,759899,20:29.0,treatment,new_page,0


#### 2. High level EDA

In [4]:
# Shape of data
data.shape

(294480, 5)

In [5]:
# Check data types
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 294480 entries, 0 to 294479
Data columns (total 5 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   user_id       294480 non-null  int64 
 1   timestamp     294480 non-null  object
 2   group         294480 non-null  object
 3   landing_page  294480 non-null  object
 4   converted     294480 non-null  int64 
dtypes: int64(2), object(3)
memory usage: 11.2+ MB


In [6]:
# Check if any duplicates
data.duplicated().sum()

0

In [16]:
data[data['user_id'].duplicated()].sort_values(by='user_id')

Unnamed: 0,user_id,timestamp,group,landing_page,converted
230259,630052,16:05.2,treatment,new_page,0
251762,630126,16:00.3,treatment,new_page,0
183371,630137,08:49.9,control,old_page,0
255753,630320,27:37.2,treatment,old_page,0
110634,630471,42:51.5,control,old_page,0
...,...,...,...,...,...
243428,945627,43:17.3,treatment,new_page,1
144693,945645,56:13.6,control,old_page,0
142354,945703,40:51.2,control,new_page,0
186960,945797,23:21.8,control,old_page,0


We will first check for mismatches between group and landing_page, then delete the duplicates.

In [17]:
data.groupby(by=['group','landing_page']).agg(rows=('user_id', 'count'))

group,landing_page,rows
control,new_page,1928
control,old_page,145274
treatment,new_page,145313
treatment,old_page,1965


We can see that some rows are mismatched: every control user should see only 'old_page' and every treatment user should see only 'new_page'. We will delete the mismatched rows.
Before that, let's take a backup of the original DataFrame.

In [19]:
data_bkp = data.copy()

In [30]:
print('Shape of data before deleting mismatches: {0}'.format(data.shape))
data = data[~((data['group']=='control') & (data['landing_page']=='new_page'))]
data = data[~((data['group']=='treatment') & (data['landing_page']=='old_page'))]
print('Shape of data after deleting mismatches: {0}'.format(data.shape))

Shape of data before deleting mismatches: (294480, 5)
Shape of data after deleting mismatches: (290587, 5)


In [32]:
data.groupby(by=['group','landing_page']).agg(rows=('user_id', 'count'))

group,landing_page,rows
control,old_page,145274
treatment,new_page,145313
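
As a side note, the two mismatch filters could also be expressed as one comparison against the page each group is expected to see. The sketch below uses a small made-up frame (`toy`) and a hypothetical `expected_page` mapping; neither is part of the original notebook.

```python
import pandas as pd

# Toy frame with the same column names as the notebook's data (values are made up).
toy = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "group": ["control", "control", "treatment", "treatment"],
    "landing_page": ["old_page", "new_page", "new_page", "old_page"],
    "converted": [0, 1, 0, 1],
})

# Page each group is supposed to see (hypothetical helper mapping).
expected_page = {"control": "old_page", "treatment": "new_page"}

# Keep only rows whose landing_page equals the expected page for their group.
matched = toy[toy["landing_page"] == toy["group"].map(expected_page)]
print(matched)
```

The mapping keeps the rule in one place, which may be easier to extend if more groups or pages were added.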


Check duplicates again

In [33]:
data[data['user_id'].duplicated()].sort_values(by='user_id')

Unnamed: 0,user_id,timestamp,group,landing_page,converted
294478,759899,20:29.0,treatment,new_page,0
2893,773192,55:59.6,treatment,new_page,0


In [34]:
# Let's get rid of them (keeps the first row for each user_id)
data = data.drop_duplicates(subset='user_id')

In [35]:
data[data['user_id'].duplicated()].sort_values(by='user_id')

Unnamed: 0,user_id,timestamp,group,landing_page,converted


In [36]:
# Check if any null
data.isna().any()

user_id         False
timestamp       False
group           False
landing_page    False
converted       False
dtype: bool

In [37]:
# Check number of uniques 
data.nunique()

user_id         290585
timestamp        35991
group                2
landing_page         2
converted            2
dtype: int64

EDA

In [38]:
# Let's see the proportion of users in each group
data_grp_agg = data.groupby(by='group', as_index=False).agg(users=('user_id', 'count'))
data_grp_agg['percentage'] = data_grp_agg['users'] / np.sum(data_grp_agg['users']) * 100
data_grp_agg

Unnamed: 0,group,users,percentage
0,control,145274,49.993634
1,treatment,145311,50.006366


In [39]:
# Let's see the conversion rate within each group
data_converted_agg = data[data['converted']==1].groupby(by=['group','converted'], as_index=False).agg(users=('user_id', 'count'))
data_converted_agg['total_users'] = data_grp_agg['users']
data_converted_agg['conversion_rate'] = data_converted_agg['users']/data_grp_agg['users'] * 100
data_converted_agg

Unnamed: 0,group,converted,users,total_users,conversion_rate
0,control,1,17489,145274,12.03863
1,treatment,1,17264,145311,11.880725
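
Because `converted` is a 0/1 column, its mean within a group is that group's conversion rate. The sketch below illustrates this shortcut on a frame rebuilt from the per-group counts shown above; the names `counts`, `df`, and `summary` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Rebuild a frame from the per-group counts reported above:
# group -> (converted users, total users).
counts = {"control": (17489, 145274), "treatment": (17264, 145311)}

frames = []
for group, (conv, total) in counts.items():
    converted = np.r_[np.ones(conv, dtype=int), np.zeros(total - conv, dtype=int)]
    frames.append(pd.DataFrame({"group": group, "converted": converted}))
df = pd.concat(frames, ignore_index=True)

# The mean of a 0/1 column is the share of ones, i.e. the conversion rate.
summary = df.groupby("group")["converted"].agg(
    users="count", conversions="sum", conversion_rate="mean"
)
summary["conversion_rate"] *= 100  # express as a percentage
print(summary)
```

This avoids relying on the row order of two separately aggregated frames lining up.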


Create contingency table

In [42]:
data_agg = data.groupby(by=['group','converted'], as_index=False).agg(users=('user_id', 'count'))
data_agg

Unnamed: 0,group,converted,users
0,control,0,127785
1,control,1,17489
2,treatment,0,128047
3,treatment,1,17264


In [49]:
tbl = pd.crosstab(data_agg.converted, data_agg.group, values=data_agg.users, aggfunc='sum')

In [50]:
tbl

converted,control,treatment
0,127785,128047
1,17489,17264


The control group shows a slightly higher conversion rate (12.04% vs. 11.88%). We cannot draw a conclusion from this yet, because we still have to assess whether the observed difference is statistically significant. For that we will perform a statistical test.
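
To get a feel for the size of the gap before testing, one option is a rough 95% interval for the difference in conversion rates. The sketch below uses the Wald (normal approximation) interval, which is an assumption on our part and not a method used elsewhere in this notebook; the counts come from the table above.

```python
import math

# Counts taken from the contingency table above.
conv_c, n_c = 17489, 145274   # control: converted users, total users
conv_t, n_t = 17264, 145311   # treatment: converted users, total users

p_c = conv_c / n_c
p_t = conv_t / n_t
diff = p_t - p_c  # treatment minus control

# Standard error of the difference between two independent proportions.
se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)

z = 1.959963984540054  # 97.5th percentile of the standard normal
lower, upper = diff - z * se, diff + z * se

print("difference: {0:.5f}".format(diff))
print("approx. 95% interval: ({0:.5f}, {1:.5f})".format(lower, upper))
```

With these counts the interval includes zero, so the observed gap on its own does not settle the question.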

#### 3. Decide which test to carry out
<img src="tests.png" width="800" height="400"><br>
Source: https://towardsdatascience.com/a-b-testing-a-complete-guide-to-statistical-testing-e3f1db140499

Since our metric is discrete (converted is either 0 or 1) and the sample is large (about 290k users), we can go ahead with Pearson's chi-square test.
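
A common rule of thumb for the chi-square approximation is that every expected cell count should be at least 5. The sketch below checks this for our table using `scipy.stats.contingency.expected_freq`; the threshold of 5 is a convention rather than something stated in this notebook, and `observed` simply restates the contingency table created earlier.

```python
import numpy as np
from scipy.stats.contingency import expected_freq

# Rows: converted = 0, 1. Columns: control, treatment.
observed = np.array([[127785, 128047],
                     [17489, 17264]])

# Expected counts under independence of group and conversion.
expected = expected_freq(observed)
print(expected)
print("All expected counts >= 5:", bool((expected >= 5).all()))
```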

#### 4. Define null and alternate hypothesis

Our null hypothesis H0 is that the two web pages have the same conversion rate.<br>
H0: Both designs have the same conversion rate<br>
Ha: The two designs have different conversion rates<br>

Because the alternative is two-sided, we do not need to know in advance which design is more effective. The statistical test will lead us either to reject H0 or to fail to reject H0.

In [40]:
from scipy import stats

In [51]:
# To run the test, we will use contingency table that we have created
tbl

converted,control,treatment
0,127785,128047
1,17489,17264


#### 5. Run tests

In [58]:
# chi-squared test (for a 2x2 table, scipy applies Yates' continuity correction by default)
chi2_stat, p, dof, expected = stats.chi2_contingency(tbl)
print("Degrees of freedom: {0}".format(dof))
print(expected)

# interpret test statistic
prob = 0.95
alpha = 1-prob

critical = stats.chi2.ppf(prob, dof)

print('\nProbability: {0} \nCritical: {1} \nTest statistic: {2}\n'.format(prob, critical, chi2_stat))
if abs(chi2_stat) >= critical:
    print('Reject null hypothesis')
else:
    print('Fail to reject null hypothesis')

# interpret p-value

print('\nAlpha: {0}\np: {1}\n'.format(alpha, p))
if p <= alpha:
    print('Reject null hypothesis')
else:
    print('Fail to reject null hypothesis')

Degrees of freedom: 1
[[127899.7125385 127932.2874615]
 [ 17374.2874615  17378.7125385]]

Probability: 0.95 
Critical: 3.841458820694124 
Test statistic: 1.7053502645115

Fail to reject null hypothesis

Alpha: 0.050000000000000044
p: 0.19158976298516012

Fail to reject null hypothesis
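
To make the reported statistic less of a black box, the sketch below recomputes it by hand from the observed and expected counts. It assumes the default behaviour of `stats.chi2_contingency` for a 2x2 table, namely Yates' continuity correction, and also shows the uncorrected Pearson statistic for comparison. The variable names are ours.

```python
import numpy as np
from scipy import stats

# Rows: converted = 0, 1. Columns: control, treatment.
observed = np.array([[127785, 128047],
                     [17489, 17264]], dtype=float)

# Expected counts under independence: (row total * column total) / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals @ col_totals / observed.sum()

# Yates' correction shrinks each |observed - expected| by 0.5 before squaring.
chi2_yates = (((np.abs(observed - expected) - 0.5) ** 2) / expected).sum()
# Plain Pearson statistic, without the correction.
chi2_plain = (((observed - expected) ** 2) / expected).sum()

# p-values from the chi-square distribution with 1 degree of freedom.
p_yates = stats.chi2.sf(chi2_yates, df=1)
p_plain = stats.chi2.sf(chi2_plain, df=1)

print("with Yates correction:", chi2_yates, p_yates)
print("without correction:  ", chi2_plain, p_plain)
```

With a sample this large the correction changes the statistic only slightly, so the conclusion does not hinge on it.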


Let's also perform Fisher's exact test, which does not rely on a large-sample approximation, in case 290k samples seems small to you.

In [57]:
alpha = 1-prob
odds_ratio, p = stats.fisher_exact(tbl, alternative="two-sided")
print("Odds ratio: {0}\np_value: {1}\n".format(odds_ratio, p))

if p <= alpha:
    print('Reject null hypothesis')
else:
    print('Fail to reject null hypothesis')

Odds ratio: 0.9851149705891606
p_value: 0.19047607131914907

Fail to reject null hypothesis
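
The odds ratio printed above is the sample odds ratio of the 2x2 table. As a small sketch, it can be reproduced with plain arithmetic; the variable names are ours, and the table layout (rows converted = 0, 1; columns control, treatment) follows the contingency table used throughout.

```python
import numpy as np

# Rows: converted = 0, 1. Columns: control, treatment.
table = np.array([[127785, 128047],
                  [17489, 17264]], dtype=float)
(a, b), (c, d) = table

# Sample odds ratio (a*d)/(b*c). With this layout it equals the odds of
# converting in treatment divided by the odds of converting in control.
odds_ratio = (a * d) / (b * c)

odds_treatment = d / b  # converted / not converted, treatment
odds_control = c / a    # converted / not converted, control

print("odds ratio:", odds_ratio)
print("odds (treatment) / odds (control):", odds_treatment / odds_control)
```

A value slightly below 1 means the odds of converting were marginally lower on the new page in this sample.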
