### A company ran an A/B test to evaluate its new webpage. Users were split into two groups of nearly equal size: one group saw the new page (B) and the other saw the old page (A). Comparing the two conversion rates and testing the difference statistically shows whether the new page really brings in more paying users.

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.stats.api as sms
from scipy.stats import shapiro, levene, mannwhitneyu, ttest_ind

import warnings
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv(r"D:\cats_dogs\ab_data.csv")
df.head()

|   | user_id | timestamp | group | landing_page | converted |
|---|---|---|---|---|---|
| 0 | 851104 | 11:48.6 | control | old_page | 0 |
| 1 | 804228 | 01:45.2 | control | old_page | 0 |
| 2 | 661590 | 55:06.2 | treatment | new_page | 0 |
| 3 | 853541 | 28:03.1 | treatment | new_page | 0 |
| 4 | 864975 | 52:26.2 | control | old_page | 1 |


### user_id: unique identifier of the user

### timestamp: time of the page view

### group: experiment group (treatment or control)

### landing_page: page shown to the user (old_page or new_page)

### converted: sign-up status after viewing the page (0 = no, 1 = yes)

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 294480 entries, 0 to 294479
Data columns (total 5 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   user_id       294480 non-null  int64 
 1   timestamp     294480 non-null  object
 2   group         294480 non-null  object
 3   landing_page  294480 non-null  object
 4   converted     294480 non-null  int64 
dtypes: int64(2), object(3)
memory usage: 11.2+ MB


In [4]:
df.describe()

|   | user_id | converted |
|---|---|---|
| count | 294480.0 | 294480.0 |
| mean | 787973.538896 | 0.119658 |
| std | 91210.917091 | 0.324562 |
| min | 630000.0 | 0.0 |
| 25% | 709031.75 | 0.0 |
| 50% | 787932.5 | 0.0 |
| 75% | 866911.25 | 0.0 |
| max | 945999.0 | 1.0 |


In [5]:
df.apply(lambda x: x.nunique())

user_id         290585
timestamp        35993
group                2
landing_page         2
converted            2
dtype: int64

In [6]:
df.isnull().sum()

user_id         0
timestamp       0
group           0
landing_page    0
converted       0
dtype: int64

In [7]:
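print(df.shape)
# keep=False drops every row of any user_id that appears more than once,
# so repeated users are removed entirely rather than reduced to a single row
df = df.drop_duplicates(subset='user_id', keep=False)
print(df.shape)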

(294480, 5)
(286690, 5)
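The effect of `keep=False` is easy to miss: it removes every record of a repeated `user_id` instead of keeping one of them. A minimal sketch on a made-up frame (the `toy` data is hypothetical and not taken from `ab_data.csv`):

```python
import pandas as pd

# Hypothetical toy data: user 2 appears twice, users 1 and 3 once.
toy = pd.DataFrame({
    "user_id":   [1, 2, 2, 3],
    "converted": [0, 1, 0, 1],
})

# keep="first" keeps one row per user_id (the first occurrence).
kept_first = toy.drop_duplicates(subset="user_id", keep="first")

# keep=False drops all rows of any user_id that occurs more than once.
kept_none = toy.drop_duplicates(subset="user_id", keep=False)

print(kept_first["user_id"].tolist())
print(kept_none["user_id"].tolist())
```

In the notebook, 294480 - 286690 = 7790 rows are removed. With 290585 unique ids among 294480 rows, this is consistent with 3895 users each appearing exactly twice.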


In [8]:
df.groupby('group')['converted'].count()

group
control      143293
treatment    143397
Name: converted, dtype: int64

In [9]:
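# Select the numeric column before aggregating; calling .mean() on the whole
# frame fails in pandas 2.x because of the string columns
df.groupby('group')['converted'].mean()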

group
control      0.120173
treatment    0.118726
Name: converted, dtype: float64
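To put the two rates in perspective, the gap can be expressed both in percentage points and relative to the control rate. The sketch below simply reuses the two numbers printed above:

```python
# Conversion rates copied from the groupby output above.
rate_control = 0.120173
rate_treatment = 0.118726

# Absolute difference (as a fraction) and relative change versus control.
abs_diff = rate_treatment - rate_control
rel_diff = abs_diff / rate_control

print(f"absolute difference: {abs_diff * 100:.4f} percentage points")
print(f"relative change:     {rel_diff * 100:.2f}%")
```

The treatment rate is slightly lower than the control rate; whether that gap is more than noise is what the tests below address.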

In [10]:
pd.DataFrame(df.loc[:,'landing_page'].value_counts(normalize = True) * 100)

|   | landing_page |
|---|---|
| new_page | 50.018138 |
| old_page | 49.981862 |


In [11]:
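# Rows where group and page disagree (control on new_page, or treatment on old_page); an empty result means none remain
df[((df['group'] == 'control') & (df['landing_page'] == 'new_page')) | ((df['group'] == 'treatment') & (df['landing_page'] == 'old_page'))]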

|   | user_id | timestamp | group | landing_page | converted |
|---|---|---|---|---|---|
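An alternative way to run the same consistency check is a cross-tabulation, which shows all four group/page combinations at once. This is a sketch on hypothetical toy data with one deliberately mismatched row, not output from the real dataset:

```python
import pandas as pd

# Hypothetical toy data; the third row is a control user shown new_page.
toy = pd.DataFrame({
    "group":        ["control", "treatment", "control", "treatment"],
    "landing_page": ["old_page", "new_page", "new_page", "new_page"],
})

# Cross-tabulate group against landing page. The (control, new_page) and
# (treatment, old_page) cells count the mismatched rows.
table = pd.crosstab(toy["group"], toy["landing_page"])
print(table)

# Number of control users who were shown the new page.
n_mismatch = int(table.loc["control", "new_page"])
```

On a clean dataset, both mismatch cells would be zero.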


# AB Test
## Normality Assumption
### H0: The data are normally distributed
### H1: The data are not normally distributed
### If the p-value is less than 0.05, we reject H0 and use a non-parametric test (Mann-Whitney U, `mannwhitneyu`). Otherwise we use a parametric test (t-test, `ttest_ind`).

In [12]:
test_stat, pvalue = shapiro(df.loc[df["landing_page"] == "old_page", "converted"])
print("p-value:",pvalue)
print("test_stat:",test_stat)

p-value: 0.0
test_stat: 0.3792334198951721


In [13]:
test_stat, pvalue = shapiro(df.loc[df["landing_page"] == "new_page", "converted"])
print("p-value:",pvalue)
print("test_stat:",test_stat)

p-value: 0.0
test_stat: 0.37685757875442505


### Both p-values are below 0.05, so we reject the null hypothesis: `converted` is not normally distributed in either group. This is expected, since the column only takes the values 0 and 1.
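The decision rule stated at the top of this section can be written down as a small function. `choose_test` is a hypothetical helper introduced only for illustration; it returns the name of the SciPy function the rule points to.

```python
def choose_test(normality_pvalues, alpha=0.05):
    """Return the name of the SciPy test implied by the rule above.

    `choose_test` is a hypothetical helper, not part of SciPy.
    """
    # If any group rejects normality (p < alpha), fall back to the
    # non-parametric Mann-Whitney U test; otherwise use the t-test.
    if any(p < alpha for p in normality_pvalues):
        return "mannwhitneyu"
    return "ttest_ind"

# Shapiro-Wilk p-values reported above for old_page and new_page.
print(choose_test([0.0, 0.0]))
```

One caveat worth keeping in mind: the SciPy documentation notes that for samples larger than 5000 the Shapiro-Wilk p-value may not be accurate, so the reported `0.0` is best read as "very small" rather than as an exact value.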

## Variance Homogeneity
### H0: Variances are homogeneous

### H1: Variances are not homogeneous

In [14]:
test_stat, pvalue = levene(df.loc[df["landing_page"] == "new_page", "converted"],
                           df.loc[df["landing_page"] == "old_page", "converted"])
print("p-value:",pvalue)  
print("test_stat:",test_stat)

p-value: 0.2322897281547632
test_stat: 1.4267917566652295


### p-value (0.2323) > 0.05, so we fail to reject the null hypothesis: there is no evidence that the variances of the two groups differ.
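For a 0/1 column with mean p, the variance is p(1 - p), so the two group variances can be approximated directly from the conversion rates reported earlier. The sketch below assumes those rounded rates and is only meant to show why the variances are so close:

```python
# Conversion rates from the groupby output earlier in the notebook.
rate_control = 0.120173
rate_treatment = 0.118726

# Variance of a 0/1 variable with mean p is p * (1 - p).
var_control = rate_control * (1 - rate_control)
var_treatment = rate_treatment * (1 - rate_treatment)
ratio = var_control / var_treatment

print(f"control variance:   {var_control:.5f}")
print(f"treatment variance: {var_treatment:.5f}")
print(f"ratio:              {ratio:.4f}")
```

Because the two rates differ by little more than a tenth of a percentage point, the variances differ by about one percent.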

## Hypothesis Testing

### H0: There is no statistically significant difference in conversion between the old page and the new page

### H1: There is a statistically significant difference in conversion between the old page and the new page

### We use the non-parametric Mann-Whitney U test (`mannwhitneyu`) because the data are not normally distributed.

In [16]:
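# alternative="two-sided" makes explicit that a difference in either direction is tested
test_stat, pvalue = mannwhitneyu(df.loc[df["landing_page"] == "new_page", "converted"],
                                 df.loc[df["landing_page"] == "old_page", "converted"],
                                 alternative="two-sided")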

print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 10259026653.0000, p-value = 0.2323


## In Conclusion:
### p-value (0.2323) > 0.05

### We fail to reject the null hypothesis. The test finds no statistically significant difference in conversion between the new page and the old page.
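As a cross-check on this conclusion, a two-proportion z-test is a common choice for binary outcomes such as `converted`. This is a different technique from the Mann-Whitney U test used in the notebook. The sketch below uses only the standard library; `two_proportion_ztest` is a hypothetical helper, and the conversion counts are assumptions reconstructed by rounding group size times the reported rate, not values read from the data.

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference of two proportions (pooled variance).

    Hypothetical helper written for this sketch.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: P(|Z| > |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Group sizes from the notebook. Conversion counts are ASSUMED: they are
# reconstructed by rounding size * rate to whole users.
n_control, n_treatment = 143293, 143397
conv_control = round(n_control * 0.120173)
conv_treatment = round(n_treatment * 0.118726)

z, p_value = two_proportion_ztest(conv_control, n_control,
                                  conv_treatment, n_treatment)
print(f"z = {z:.4f}, p-value = {p_value:.4f}")
```

Under these assumptions the p-value should come out close to the Mann-Whitney result and well above 0.05, which points to the same conclusion.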