
In this notebook we walk through a simple A/B test using data available from Kaggle. To make things more concrete, let's set up a hypothetical scenario that frames the problem.

"Electronic Masters" is an online store for computer products. Consumers can buy devices like monitors, computers, laptops, HDMI cables, home theatre systems, batteries, headphones, etc. The UX design team has been working on a new sales page to increase conversion rates for a distinctive type of battery that costs £4.99. 

The current conversion rate of the product page has averaged 13% over the last year. The product manager's goal is to raise it by 2 percentage points: the new page developed by the UX team would be considered a success if the conversion rate rose to 15%. Before replacing the old sales page with the new one, it is wise to test the new page on a smaller group of customers, which limits the losses if it converts worse than the current page. The dev team has run the online test with the new page and provided the dataset for analysis.

In [1]:
import pandas as pd
import numpy as np
import math

from statsmodels.stats import api as sms
from scipy.stats import chi2_contingency

In [2]:
df_file = pd.read_csv('./ab_data.csv')

In [3]:
df_file.head(8)

|   | user_id | timestamp | group | landing_page | converted |
|---|---|---|---|---|---|
| 0 | 851104 | 2017-01-21 22:11:48.556739 | control | old_page | 0 |
| 1 | 804228 | 2017-01-12 08:01:45.159739 | control | old_page | 0 |
| 2 | 661590 | 2017-01-11 16:55:06.154213 | treatment | new_page | 0 |
| 3 | 853541 | 2017-01-08 18:28:03.143765 | treatment | new_page | 0 |
| 4 | 864975 | 2017-01-21 01:52:26.210827 | control | old_page | 1 |
| 5 | 936923 | 2017-01-10 15:20:49.083499 | control | old_page | 0 |
| 6 | 679687 | 2017-01-19 03:26:46.940749 | treatment | new_page | 1 |
| 7 | 719014 | 2017-01-17 01:48:29.539573 | control | old_page | 0 |


In [4]:
print(f'-- Shape: {df_file.shape}\n')

print(f'-- Uniques:\n \
      group{df_file.group.unique()}\n \
      landing_page{df_file.landing_page.unique()}\n \
      converted{df_file.converted.unique()}\n')

print(f'-- NAs:\n{df_file.isna().sum()}\n')

print(f'-- Duplicated records:\n{df_file.user_id.duplicated().sum()}\n')

# info() prints its report directly and returns None, so it is not wrapped in print()
df_file.info()

-- Shape: (294478, 5)

-- Uniques:
       group['control' 'treatment']
       landing_page['old_page' 'new_page']
       converted[0 1]

-- NAs:
user_id         0
timestamp       0
group           0
landing_page    0
converted       0
dtype: int64

-- Duplicated records:
3894

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 294478 entries, 0 to 294477
Data columns (total 5 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   user_id       294478 non-null  int64 
 1   timestamp     294478 non-null  object
 2   group         294478 non-null  object
 3   landing_page  294478 non-null  object
 4   converted     294478 non-null  int64 
dtypes: int64(2), object(3)
memory usage: 11.2+ MB


# The experiment

## Formulating a hypothesis

We will use a two-tailed test, because the new page could perform either better or worse than the current one.

H0: New page conversion is the same.  - The null hypothesis.   
H1: New page conversion is different. - The alternative hypothesis.

Let's set the confidence level at 95%, which gives a significance level (alpha) of 5%. The confidence level means that if we repeated the experiment many times, about 95% of the confidence intervals built this way would contain the true difference in conversion rates. The significance level is the probability of rejecting the null hypothesis when it is actually true (a Type I error, or false positive): here we accept a 5% chance of concluding that the pages differ when in fact they do not.

Later on, the significance level is compared with a p-value computed by a statistical test. If the p-value is lower than the significance level, we reject the null hypothesis and conclude that the difference is statistically significant at the 5% level (95% confidence).
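As a minimal sketch of this decision rule, assume a test statistic that is approximately standard normal under H0 (as for a two-proportion z-test). A two-tailed p-value then counts extreme values in both tails. The helper names `two_tailed_p_value` and `decide` are hypothetical and are not part of the notebook:

```python
from statistics import NormalDist

def two_tailed_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z (hypothetical helper)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare the p-value with the significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Critical value for a two-tailed test at alpha = 0.05 (about 1.96):
# |z| must exceed it for the p-value to fall below alpha
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)

for z in (0.5, 2.5):
    p = two_tailed_p_value(z)
    print(f"z={z:.2f}  p={p:.4f}  -> {decide(p)}")
```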

## Parameters

In [5]:
# Confidence level
confidence_level = 0.95

# Significance Level
significance_level = 0.05

# conversion rates
conversion_a = 0.13  # current conversion
conversion_b = 0.15  # expected conversion for effect measurement

# Effect size for a test comparing two proportions (Cohen's h):
# 2 * (arcsin(sqrt(prop1)) - arcsin(sqrt(prop2)))
effect_size = sms.proportion_effectsize(conversion_a, conversion_b)

# Power: probability of detecting the effect if it really exists (1 - beta)
power = 0.8
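The comment in the cell above gives the arcsine formula that `proportion_effectsize` implements (Cohen's h). Below is a hand-rolled sketch of the same formula, using the 13% and 15% rates from the parameters. The name `cohens_h` is a hypothetical helper. The result is small and negative because prop1 < prop2. For a two-tailed power calculation only its magnitude matters.

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h for two proportions: 2*asin(sqrt(p1)) - 2*asin(sqrt(p2))."""
    return 2 * (math.asin(math.sqrt(p1)) - math.asin(math.sqrt(p2)))

h = cohens_h(0.13, 0.15)  # current vs. target conversion rate
print(f"Cohen's h for 13% vs 15%: {h:.4f}")
```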

## Calculating Sample Size

In [6]:
# calculating sample size
# https://www.statsmodels.org/stable/_modules/statsmodels/stats/power.html#NormalIndPower.solve_power

sample_size_n = sms.NormalIndPower().solve_power(effect_size,
                                                power = power,
                                                alpha = significance_level)
sample_size_n = math.ceil(sample_size_n)  # rounding up

print(f'Size for each group: {sample_size_n}')

Size for each group: 4720
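As a cross-check of `solve_power`, here is a sketch of the closed-form normal approximation. Each arcsine-transformed proportion has variance of about 1/n, so the difference between two groups of size n has variance 2/n. That gives n = 2 * ((z_{1-alpha/2} + z_{power}) / h)^2 per group. As far as I understand, `NormalIndPower` with its default `ratio=1` solves the same equation, adding only the negligible probability from the opposite tail, so the two results should agree. `n_per_group` is a hypothetical helper name:

```python
import math
from statistics import NormalDist

def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Normal-approximation sample size per group for a two-tailed test on Cohen's h."""
    h = abs(2 * (math.asin(math.sqrt(p1)) - math.asin(math.sqrt(p2))))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for power = 0.8
    # Factor 2: the difference of two independent groups has variance 2/n
    return math.ceil(2 * ((z_alpha + z_power) / h) ** 2)

print(n_per_group(0.13, 0.15))
```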


## Data preparation

In [7]:
# Data preparation
df_duplicates_list = df_file[['user_id', 'group']].groupby('user_id').count().reset_index().query('group > 1')
print(f'Number of duplicated user_ids: {len(df_duplicates_list)}')

Number of duplicated user_ids: 3894


This matches the earlier check (3,894).

Let's visually inspect the source records of some of the duplicated 'user_id's:

In [8]:
df_file[df_file.user_id.isin(df_duplicates_list.user_id)].sort_values('user_id').head(10)

|   | user_id | timestamp | group | landing_page | converted |
|---|---|---|---|---|---|
| 230259 | 630052 | 2017-01-17 01:16:05.208766 | treatment | new_page | 0 |
| 213114 | 630052 | 2017-01-07 12:25:54.089486 | treatment | old_page | 1 |
| 22513 | 630126 | 2017-01-14 13:35:54.778695 | treatment | old_page | 0 |
| 251762 | 630126 | 2017-01-19 17:16:00.280440 | treatment | new_page | 0 |
| 183371 | 630137 | 2017-01-20 02:08:49.893878 | control | old_page | 0 |
| 11792 | 630137 | 2017-01-22 14:59:22.051308 | control | new_page | 0 |
| 207211 | 630320 | 2017-01-07 18:02:43.626318 | control | old_page | 0 |
| 255753 | 630320 | 2017-01-12 05:27:37.181803 | treatment | old_page | 0 |
| 96929 | 630471 | 2017-01-07 02:14:17.405726 | control | new_page | 0 |
| 110634 | 630471 | 2017-01-23 01:42:51.501851 | control | old_page | 0 |


'user_id's that appear more than once have inconsistent assignments. Some show up in both groups (e.g. 630320). Others land on both pages within the same group (e.g. 630052). Since the dataset is large enough for the test, we remove all records of these users.

In [9]:
df = df_file[~df_file.user_id.isin(df_duplicates_list.user_id)]
print(f'Total Number of records\n \
        Before dropping duplicates: {df_file.shape[0]} \n \
        After dropping duplicates: {df.shape[0]} \n \
        Duplicates dropped(both duplicates): {df_file.shape[0] - df.shape[0]}')

Total Number of records
         Before dropping duplicates: 294478 
         After dropping duplicates: 286690 
         Duplicates dropped(both duplicates): 7788
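The same filtering can be done without the intermediate `groupby`. Below is a minimal sketch on a small, hypothetical DataFrame. It uses `Series.duplicated(keep=False)`, which flags every row of a repeated `user_id` rather than only the later ones. Counting users and counting rows also explains the numbers above: 3,894 duplicated user_ids become 7,788 dropped rows because each of them appears exactly twice.

```python
import pandas as pd

# Hypothetical toy data: user 2 appears twice with inconsistent assignments
toy = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 4],
    "group": ["control", "control", "treatment", "treatment", "control"],
    "landing_page": ["old_page", "old_page", "new_page", "new_page", "old_page"],
    "converted": [0, 1, 0, 0, 1],
})

# keep=False marks *all* occurrences of a repeated user_id
is_dup = toy["user_id"].duplicated(keep=False)

n_dup_users = toy.loc[is_dup, "user_id"].nunique()  # users, like the groupby/count query
clean = toy[~is_dup]                                 # drop every record of those users

print(f"Duplicated user_ids: {n_dup_users}, rows dropped: {is_dup.sum()}")
```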


## Drawing Samples

In [10]:
df_control_group = df[df.group == 'control'].sample(n = sample_size_n,
                                                    random_state = 32)

df_treatment_group = df[df.group == 'treatment'].sample(n = sample_size_n,
                                                        random_state = 32)

print(f'Length Control group: {df_control_group.shape[0]}')
print(f'Length Treatment group: {df_treatment_group.shape[0]}')

# Merging sample groups control and treatment
df_control_and_treatment = pd.concat([df_control_group,df_treatment_group])


Length Control group: 4720
Length Treatment group: 4720
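Both samples can also be drawn in a single call with `GroupBy.sample` (available since pandas 1.1). The sketch below runs on hypothetical toy data. It is not guaranteed to reproduce the exact rows drawn above, because the random draws may be consumed differently than two separate `DataFrame.sample` calls.

```python
import pandas as pd

# Hypothetical toy data: 6 control rows and 4 treatment rows
toy = pd.DataFrame({
    "user_id": range(10),
    "group": ["control"] * 6 + ["treatment"] * 4,
    "converted": [0, 1, 0, 0, 1, 0, 1, 0, 0, 1],
})

n = 3  # stand-in for sample_size_n
# Draw n rows without replacement from each group
sample = toy.groupby("group").sample(n=n, random_state=32)

print(sample["group"].value_counts())
```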


## Conversion rates

In [11]:
# Conversion rate: successful purchases divided by all users in the group
n_converts_in_control = df_control_group.converted.sum()
conversion_rate_control = n_converts_in_control / df_control_group.shape[0]

n_converts_in_treatment = df_treatment_group.converted.sum()
conversion_rate_treatment = n_converts_in_treatment / df_treatment_group.shape[0]

print(f'Conversion rates for each group:\n \
        Control: {conversion_rate_control}\n \
        Treatment: {conversion_rate_treatment}')



Conversion rates for each group:
         Control: 0.11864406779661017
         Treatment: 0.11970338983050847
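Because `converted` is coded 0/1, the conversion rate is simply its mean. The sketch below uses hypothetical toy data and computes counts and rates for both groups in one expression. Applied to `df_control_and_treatment`, the same expression should reproduce the 560/4720 and 565/4720 figures used in the next section.

```python
import pandas as pd

# Hypothetical toy sample
toy = pd.DataFrame({
    "group": ["control"] * 4 + ["treatment"] * 4,
    "converted": [0, 1, 0, 0, 1, 1, 0, 0],
})

# sum = conversions, count = users, mean = conversion rate (0/1 column)
rates = toy.groupby("group")["converted"].agg(["sum", "count", "mean"])
print(rates)
```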


## Hypothesis Testing

In [12]:
df_summary = df_control_and_treatment[['group','converted']].groupby('group').agg({'converted':['sum','count']})
df_summary.columns = ['converted', 'total']
print(f'{df_summary} \n')

# The contingency table needs converted vs. NOT converted counts per group;
# 'total' already includes the converted users, so it cannot be a column on its own.
contingency_table = np.column_stack([df_summary['converted'],
                                     df_summary['total'] - df_summary['converted']])

chi_value, p_value, dof, expected = chi2_contingency(contingency_table)
print(f'Chi Squared Test - P-Value output: {p_value:.2f}\n')

print("""H0: New page conversion is the same. (null hypothesis)
H1: New page conversion is different. (alternative hypothesis)\n""")

if p_value < significance_level:
    print(f'P-Value is {p_value:.2f}, so it is lower than the Significance Level ({significance_level}),\ntherefore we reject the null hypothesis.') 
else:
    print(f'P-Value is {p_value:.2f}, so it is higher than the Significance Level ({significance_level}),\ntherefore we fail to reject the null hypothesis.') 

           converted  total
group                      
control          560   4720
treatment        565   4720 

Chi Squared Test - P-Value output: 0.90

H0: New page conversion is the same. (null hypothesis)
H1: New page conversion is different. (alternative hypothesis)

P-Value is 0.90, so it is higher than the Significance Level (0.05),
therefore we fail to reject the null hypothesis.


## Conclusion

We are comparing two proportions of a binary (categorical) outcome, and the samples are large: every expected cell count is in the hundreds or thousands. The chi-squared test of independence is therefore suitable here. By default, `chi2_contingency` applies Yates' continuity correction to 2x2 tables.


    Chi Squared Test - P-Value output: 0.90  
    H0: New page conversion is the same. (null hypothesis)  
    H1: New page conversion is different. (alternative hypothesis)  
    P-Value is 0.90, so it is higher than the Significance Level (0.05),
    therefore we fail to reject the null hypothesis.


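At a 95% confidence level, the data show no statistically significant difference between the new design and the previous one, so the null hypothesis cannot be rejected. This does not prove that both pages convert equally. It means that, with this sample size, we found no evidence of a difference. The observed rates (11.86% vs 11.97%) are also far from the 2-percentage-point lift the product manager was hoping for.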
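As a cross-check, here is a minimal standard-library sketch of the pooled two-proportion z-test on the observed counts. Its squared statistic equals the Pearson chi-squared statistic *without* continuity correction, which is what `chi2_contingency(..., correction=False)` would report. The Yates-corrected version used above is slightly more conservative. statsmodels offers the same z-test as `proportions_ztest` in `statsmodels.stats.proportion`. The helpers below are hypothetical names written for illustration:

```python
import math
from statistics import NormalDist

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-tailed p-value)."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

def pearson_chi2(table: list[list[int]]) -> float:
    """Pearson chi-squared statistic (no continuity correction) for a 2x2 table."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = sum(row)
    return sum(
        (table[i][j] - row[i] * col[j] / total) ** 2 / (row[i] * col[j] / total)
        for i in range(2)
        for j in range(2)
    )

# Counts from the summary table: rows = control, treatment; cols = converted, not converted
z, p = two_proportion_z(560, 4720, 565, 4720)
chi2 = pearson_chi2([[560, 4720 - 560], [565, 4720 - 565]])
print(f"z = {z:.4f}, p = {p:.4f}, z^2 = {z**2:.6f}, chi2 = {chi2:.6f}")
```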

## Business motivation

This section uses the provided dataset to reflect on the role the conversion rate plays in e-commerce when a product page receives a large number of views that can turn into sales.

In [13]:

df_visits_prod_page = df.copy()
df_visits_prod_page['date'] = pd.to_datetime(df_visits_prod_page['timestamp']).dt.strftime('%Y-%m-%d')
df_visits_prod_page = df_visits_prod_page[['user_id','date']].groupby('date').count().reset_index()

# Apply the current (13%) and target (15%) conversion rates to the daily visits
df_visits_prod_page['current_estim_purchases'] = np.round(df_visits_prod_page['user_id'] * conversion_a)
df_visits_prod_page['alternative_estim_purchases'] = np.round(df_visits_prod_page['user_id'] * conversion_b)

# product value:
device_value = 4.99

df_visits_prod_page['current_estim_GMV'] = np.round(df_visits_prod_page['current_estim_purchases'] * device_value)
df_visits_prod_page['alternative_estim_GMV'] = np.round(df_visits_prod_page['alternative_estim_purchases'] * device_value)
df_visits_prod_page.columns = ['date', 'n_of_users', 'current_estim_purchases', 'alternative_estim_purchases', 'current_estim_GMV', 'alternative_estim_GMV']


n_days = df_visits_prod_page.date.count()
print(f'Total number of days the dataset covers: {n_days}')
print(f'Total number of visitors to the product page over the period: {df_visits_prod_page.n_of_users.sum()}')
gmv_current = df_visits_prod_page.current_estim_GMV.sum()
gmv_alternative = df_visits_prod_page.alternative_estim_GMV.sum()
print('\nGMV')
print('The current Gross Merchandise Value on period: {}'.format(gmv_current))
print('Alternative Gross Merchandise Value on period: {}'.format(gmv_alternative))
lift = (gmv_alternative - gmv_current) / gmv_current
print(f'Expected Lift: {lift * 100:.2f}%')
# Scale the difference over the observed period to a 30-day month
print(f'Difference that would be achieved in monthly sales for the period\nconsidering the target conversion: {((gmv_alternative - gmv_current) / n_days * 30):.2f}')

Total number of days the dataset covers: 23
Total number of visitors to the product page over the period: 286690

GMV
The current Gross Merchandise Value on period: 185981.0
Alternative Gross Merchandise Value on period: 214590.0
Expected Lift: 15.38%
Difference that would be achieved in monthly sales for the period
considering the target conversion: 37316.09
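The cell above rounds each day's estimated purchases and GMV before summing them. The underlying arithmetic is simply visits * (target rate - current rate) * price. Below is a sketch with a hypothetical round number of visits; `extra_revenue` is a hypothetical helper. Applied to the 286,690 visits without per-day rounding, the same formula gives roughly 28,612 over the period, close to the 28,609 difference between the two GMV totals above.

```python
def extra_revenue(visits: int, rate_current: float, rate_target: float, price: float) -> float:
    """Additional gross revenue if the same visits converted at rate_target instead of rate_current."""
    return visits * (rate_target - rate_current) * price

# Hypothetical round number of visits at the £4.99 price point
print(f"{extra_revenue(10_000, 0.13, 0.15, 4.99):.2f}")
```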


This illustrates why even a small increase in the conversion rate is worth pursuing. With this traffic, 2 extra percentage points of conversion would raise the product's gross merchandise value by roughly 15%.