# A/B Testing

**Business Case:**

The analysts have found that there is a good amount of increasing traffic from return visitors in the United States on a specific product page but the number of chat inquiries from this specific product page is not increasing proportionally to the traffic. The Product Manager thinks of testing the position of the “click to chat” and its effect to drive more conversions i.e more users using the click to chat with a representative.

The current baseline conversion rate is 5.05% and the product team would be happy if they see a 1% increase in conversion rate.

**Problem Understanding:**

We are testing a change in the position of the “click to chat” button (to make it more visible and forthcoming) on a product page for all “repeat visitors” in the United States. Our evaluation criterion for this test is the conversion rate = (1 if the user clicks on the button else 0).

**Hypothesis:**

The null hypothesis: a change in the position of the “click to chat” button has NO effect on the conversion rate.

The alternate hypothesis: a change in the position of the “click to chat” button will have an effect on the conversion rate.

In [1]:
# Import the libraries
import pandas as pd
import numpy as np
import math
import scipy.stats as stats

### Get the data

A Note: This is a dummy data

In [2]:
df = pd.read_csv("ab_test_dataset.csv",usecols=['COOKIE_ID','CONVERTED','ENGAGEMENT_SCORE','TREATMENT_OR_CONTROL'])

In [3]:
df.head()

Unnamed: 0,COOKIE_ID,CONVERTED,ENGAGEMENT_SCORE,TREATMENT_OR_CONTROL
0,t9M8RVgrnNsFt7eSfH8pp7f,0,1.389313,CONTROL
1,kcNzZvLksRGynu7JWpXOWit,0,1.810723,TREATMENT
2,ad0ABkl1OzRHf08klFDFHKd,0,2.069344,TREATMENT
3,oayDMikU0pdXG0CSfom8e4l,0,1.926706,CONTROL
4,ecYeNU1AQVTaXrQMp25Fb85,0,2.043011,TREATMENT


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40245 entries, 0 to 40244
Data columns (total 4 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   COOKIE_ID             40245 non-null  object 
 1   CONVERTED             40245 non-null  int64  
 2   ENGAGEMENT_SCORE      40245 non-null  float64
 3   TREATMENT_OR_CONTROL  40245 non-null  object 
dtypes: float64(1), int64(1), object(2)
memory usage: 1.2+ MB


The column ‘COOKIE_ID’ is the hashed version of a cookie (the unit of randomization for our experiment), the column ‘CONVERTED’ indicates whether the cookie clicked on the “click to chat” button (1) or not(0) and column ‘TREATMENT_OR_CONTROL’ indicates whether the cookie was from control group or treatment group.

In [5]:
df.groupby('COOKIE_ID')['CONVERTED'].sum().to_frame().reset_index()

Unnamed: 0,COOKIE_ID,CONVERTED
0,002CzVgO1LMjZSTEiNhizge,1
1,005EIE0ijz5bxAowmKojAhL,1
2,00ED0ZIuoZgjWSO1wRDPrnn,1
3,00L7cZvmpovvGq5xJp9Nbu7,2
4,00NRhdpJahq0FCzfkqXciMd,1
...,...,...
39995,zzYwTQRXoyVPYTr7Tow1cTy,0
39996,zzZ5dxxcx32jz9T2uQKBC9C,0
39997,zzcO6ncQm9Na44BdNCClZqH,0
39998,zzq0Bxisrt7oOFPJdwvFJhn,0


In [6]:
df[df.COOKIE_ID == '00L7cZvmpovvGq5xJp9Nbu7']

Unnamed: 0,COOKIE_ID,CONVERTED,ENGAGEMENT_SCORE,TREATMENT_OR_CONTROL
29113,00L7cZvmpovvGq5xJp9Nbu7,1,1.632797,CONTROL
37077,00L7cZvmpovvGq5xJp9Nbu7,1,1.632797,CONTROL


We need to handle this issue of duplicate values

In [7]:
df_pre = df.groupby('COOKIE_ID')['CONVERTED'].sum().to_frame().reset_index()
a= df_pre[df_pre['CONVERTED'] > 1]['COOKIE_ID'].index
df_final = df[~df.index.isin(df_pre[df_pre['CONVERTED'] > 1]['COOKIE_ID'].index)]
df_final.head()

Unnamed: 0,COOKIE_ID,CONVERTED,ENGAGEMENT_SCORE,TREATMENT_OR_CONTROL
0,t9M8RVgrnNsFt7eSfH8pp7f,0,1.389313,CONTROL
1,kcNzZvLksRGynu7JWpXOWit,0,1.810723,TREATMENT
2,ad0ABkl1OzRHf08klFDFHKd,0,2.069344,TREATMENT
4,ecYeNU1AQVTaXrQMp25Fb85,0,2.043011,TREATMENT
5,U5x1Qi5N0IhpDRCCnYXmMYV,0,1.649439,TREATMENT


This dataframe has no duplicates and the cookie is either exposed to treatment or control group.

In [8]:
# pivot calculation ( 1 - cookie clicked on the click to chat button , 0 -
# did not click on the click to chat button)
pd.crosstab(df_final['TREATMENT_OR_CONTROL'] , df_final['CONVERTED'])

CONVERTED,0,1
TREATMENT_OR_CONTROL,Unnamed: 1_level_1,Unnamed: 2_level_1
CONTROL,18706,1431
TREATMENT,18152,1711


### Sample size calculation

In [9]:
import math

def ABTest_sample_size(p1,mde,alpha,beta, n_side):
    """
    Parameters:
    p1 :"Baseline conversion Rate"
    mde : "Minimum detectable Effect"
    alpha : significance level
    beta : "statistical power"
    n_side:"two tailed t-test or 1 tail t-test"
    """
    p2=p1+mde 
    z_crit=stats.norm.ppf(alpha/n_side)
    z_crit2=stats.norm.ppf(1-beta)
    n_sample=((z_crit*(math.sqrt(2*p1*(1-p1))))+(z_crit2*(math.sqrt((p1*(1-p1))+(p2*(1-p2))))))**2/mde**2
    return math.ceil(n_sample)

#calcualate the sample size using the above function

"""
our base line conversion rate as mentioned in the business case is 5.05%
the mde or minimum detectable effect is 1%
"""
n_required = ABTest_sample_size(0.0505,0.01,0.05,0.80,2)
n_required

7734

We'd need at least 7734 observations for each group.

### Sampling 

In [10]:
from pandas.core.common import random_state

controlGroup_sample = df_final[df_final['TREATMENT_OR_CONTROL']=='CONTROL'].sample(n=n_required,random_state=242)
treatmentGroup_sample = df_final[df_final['TREATMENT_OR_CONTROL']=='TREATMENT'].sample(n=n_required,random_state=242)
ab_testfinal = pd.concat([controlGroup_sample , treatmentGroup_sample],axis=0)
ab_testfinal.reset_index(drop=True,inplace=True)

ab_testfinal.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15468 entries, 0 to 15467
Data columns (total 4 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   COOKIE_ID             15468 non-null  object 
 1   CONVERTED             15468 non-null  int64  
 2   ENGAGEMENT_SCORE      15468 non-null  float64
 3   TREATMENT_OR_CONTROL  15468 non-null  object 
dtypes: float64(1), int64(1), object(2)
memory usage: 483.5+ KB


In [11]:
pd.crosstab(ab_testfinal['TREATMENT_OR_CONTROL'],ab_testfinal['CONVERTED'])

CONVERTED,0,1
TREATMENT_OR_CONTROL,Unnamed: 1_level_1,Unnamed: 2_level_1
CONTROL,7167,567
TREATMENT,7089,645


For the control group, we see the conversion rate at 7.3% [567/(567+7167)]while for the treatment group we see the conversion at 8.3%[645/(645+7089)]. Our treatment group performs better than the control group but the big question here is to analyze whether their results are statistically significant or not.

### Analyzing the results

In [12]:
from statsmodels.stats.proportion import proportions_ztest, proportion_confint
controlGroup_results = ab_testfinal[ab_testfinal['TREATMENT_OR_CONTROL'] == 'CONTROL']['CONVERTED']
treatmentGroup_results = ab_testfinal[ab_testfinal['TREATMENT_OR_CONTROL'] == 'TREATMENT']['CONVERTED']


n_control = controlGroup_results.count()
n_treatment = treatmentGroup_results.count()
successes = [controlGroup_results.sum(), treatmentGroup_results.sum()]
nobs = [n_control, n_treatment]

z_statistic, pval = proportions_ztest(successes, nobs=nobs)

# function to calculate the confidence interval and evaluate the practical significance
def ci_calculator(x,N,alpha,n_side):
    """
    x= success cases
    N = total sample size
    alpha = significance level
    n_side = one tail or two tailed test
    """

    _p = x/N
    if ((_p*N < 5) or ((1-_p)*N < 5)):
        raise ValueError('the distribution cannot be assumed as normal')
    else :
        m = stats.norm.ppf(alpha/n_side) * math.sqrt( (_p * (1-_p))/N)
    return f'[{_p+m:.3f} , {_p-m:.3f}]'

control_ci = ci_calculator(567,7734,0.05,2) # 567 conversions and sample size 7734
treatment_ci = ci_calculator(645,7734,0.05,2) # 645 conversions and sample size 7734

print(f'p-value: {pval:.3f}')
print(f'ci 95% for control group: {control_ci}')
print(f'ci 95% for treatment group: {treatment_ci}')

p-value: 0.020
ci 95% for control group: [0.068 , 0.079]
ci 95% for treatment group: [0.077 , 0.090]


### Conclusion

The p-value is less than 0.05 (p <0.05) and hence we will reject the null hypothesis and hence we have a statistical significance. Also, the lower bound of the confidence interval for our treatment group is more than the required change (0.0505+0.01=0.0605) and hence we also conclude that there is a practical significance.

**The change that the product manager thought definitely appears to have a positive impact as we observe both statistical and practical significance in the test results.This is a recommendation to launch the change in production!**