***
## A/B Testing

__Scenario:__
 
 `An online store is looking to optimize the layout of their product pages to increase the average time spent by users on the website. They have identified two potential layouts, A (control) and B (test), and want to conduct an A/B test to determine if layout B leads to a statistically significant increase in average time spent on the website compared to layout A.`

__Data:__
 
 `The online store collected data from 500 users who visited their website. They randomly assigned 250 users to the control group (layout A) and 250 users to the test group (layout B). They measured the average time spent on the website in minutes for each user.`

In [1]:
import numpy as np
import pandas as pd
pd.set_option('display.float_format', lambda x: '%.5f' % x)

In [7]:
df = pd.read_csv('ab_continuous_parametric.csv')
df

Unnamed: 0,Group,Time_Spent
0,A,3.91437
1,A,5.99735
2,A,5.28298
3,A,3.49371
4,A,4.42140
...,...,...
495,B,8.13576
496,B,8.02597
497,B,6.94615
498,B,7.62316


In [10]:
# A/B Groups & Target Summary Stats
df.groupby("Group")['Time_Spent'].agg(["count", "median", "mean","var", "max", 'min','sum'])

Group,count,median,mean,var,max,min,sum
A,250,4.99093,4.97938,1.03938,7.5983,1.76894,1244.84599
B,250,6.82073,6.88666,3.92352,12.91725,1.73712,1721.66406


__From the descriptive statistics above we can tell that:__
* Group B users have a higher mean time spent than Group A users (6.89 vs. 4.98 minutes)
* Group B users have a higher variance of time spent than Group A users (3.92 vs. 1.04)

__The null and alternative hypotheses are:__

$H_0 : \mu_A = \mu_B$

$H_1 : \mu_A < \mu_B$

If $H_0$ is rejected in favor of $H_1$, the data support the claim that `layout B leads to a statistically significant increase in average time spent on the website compared to layout A`.
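
Before running any library test, it can help to see how large the gap between the groups is relative to sampling noise. The sketch below computes a Welch-style t-statistic by hand from the summary table; the numbers are copied from the `groupby` output above, and the variable names are illustrative rather than part of the original notebook.

```python
import math
from scipy.stats import t as t_dist

# Summary statistics copied from the groupby output (var is the sample variance)
n_a, mean_a, var_a = 250, 4.97938, 1.03938
n_b, mean_b, var_b = 250, 6.88666, 3.92352

# Ratio of variances: a quick hint that the equal-variance assumption may not hold
var_ratio = var_b / var_a

# Welch t-statistic: difference in means over its estimated standard error
se_a, se_b = var_a / n_a, var_b / n_b
std_err = math.sqrt(se_a + se_b)
t_stat = (mean_a - mean_b) / std_err

# Welch-Satterthwaite approximation of the degrees of freedom
dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))

# One-sided p-value for H1: mu_A < mu_B, i.e. P(T <= t_stat)
p_one_sided = t_dist.cdf(t_stat, dof)

print(f"variance ratio B/A: {var_ratio:.2f}")
print(f"t = {t_stat:.2f}, dof = {dof:.1f}, one-sided p = {p_one_sided:.3g}")
```

The statistic comes out at about -13.5, so the observed difference is many standard errors away from zero. This is only a preview: the assumptions behind the t-test still have to be checked.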

***

***
## Test Pipeline


#### __Normality Assumption__
* Apply the `Shapiro-Wilk test` to each group. If normality is not rejected for either group, a _parametric test_ will be used.
    * Check __Homogeneity of Variance__ with the `Levene test`. If the variances are homogeneous, use the `independent t-test`; otherwise use `Welch's t-test`.
* If the normality assumption is not met for at least one group, use the _non-parametric_ `Mann-Whitney U test`.




In [13]:
def AB_Test_continuous_data(dataframe, group, target, A, B):
    
    from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu    
    
    # Set A/B
    groupA = dataframe[dataframe[group] == A][target]
    groupB = dataframe[dataframe[group] == B][target]
    
    # Normality Assumption (Shapiro-Wilk, alpha = 0.05)
    # H0: Distribution is Normal     --> flag is False (fail to reject H0)
    # H1: Distribution is not Normal --> flag is True (reject H0)
    ntA = shapiro(groupA)[1] < 0.05
    ntB = shapiro(groupB)[1] < 0.05
    
    if (ntA == False) & (ntB == False): # "H0: Normal Distribution"
        # Parametric Test
        # Assumption: Homogeneity of variances
        leveneTest = levene(groupA, groupB)[1] < 0.05
        # H0: Homogeneity: False
        # H1: Heterogeneous: True
        
        # Both t-tests below are two-sided (scipy default)
        # H0: M1 == M2 - False
        # H1: M1 != M2 - True
        if leveneTest == False:
            # Homogeneity of variances --> independent T test
            test = ttest_ind(groupA, groupB, equal_var=True)[1]
        else:
            # Heterogeneous variances --> Welch's t-test
            test = ttest_ind(groupA, groupB, equal_var=False)[1]
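    else:
        # Non-Parametric Test --> Mann-Whitney U test (two-sided, like the t-tests)
        test = mannwhitneyu(groupA, groupB, alternative="two-sided")[1]
        # H0: the two groups come from the same distribution - False
        # H1: the two groups come from different distributions - True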
        
    # Results
    temp = pd.DataFrame({
        "p-value":[test],
        "AB Hypothesis":[test < 0.05]
    })
    temp["Test Type"] = np.where((ntA == False) & (ntB == False), "Parametric", "Non-Parametric")
    temp["AB Hypothesis"] = np.where(temp["AB Hypothesis"] == False, "Fail to Reject H0", "Reject H0")
    temp["Comment"] = np.where(temp["AB Hypothesis"] == "Fail to Reject H0", "A/B groups are similar!", "A/B groups are not similar!")
    
    # Columns
    if (ntA == False) & (ntB == False):
        temp["Homogeneity"] = np.where(leveneTest == False, "Yes", "No")
        temp = temp[["Test Type", "Homogeneity","AB Hypothesis", "p-value", "Comment"]]
    else:
        temp = temp[["Test Type","AB Hypothesis", "p-value", "Comment"]]
    
    return temp

In [14]:
AB_Test_continuous_data(dataframe=df, group="Group", target="Time_Spent", A='A', B='B')

Unnamed: 0,Test Type,Homogeneity,AB Hypothesis,p-value,Comment
0,Parametric,No,Reject H0,0.0,A/B groups are not similar!
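
The hypotheses stated earlier are directional ($H_1 : \mu_A < \mu_B$), whereas `ttest_ind` reports a two-sided p-value unless told otherwise. A minimal sketch of the one-sided Welch variant is shown below. The data are synthetic draws chosen only to roughly resemble the summary statistics of the two groups (an assumption, not the store's actual data), and the `alternative` argument requires SciPy 1.6 or later.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic stand-ins for the two groups (assumed parameters, not the real data)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=5.0, scale=1.0, size=250)
group_b = rng.normal(loc=6.9, scale=2.0, size=250)

# Two-sided Welch's t-test: H1 is mu_A != mu_B
two_sided = ttest_ind(group_a, group_b, equal_var=False)

# One-sided Welch's t-test: H1 is mu_A < mu_B
# ("less" means the first sample's mean is less than the second's)
one_sided = ttest_ind(group_a, group_b, equal_var=False, alternative="less")

print(f"two-sided p-value: {two_sided.pvalue:.3g}")
print(f"one-sided p-value: {one_sided.pvalue:.3g}")
```

When the observed difference points in the hypothesized direction, the one-sided p-value is half the two-sided one, so with a difference as large as the one in this dataset both versions lead to the same decision.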
