## BUSINESS PROBLEM

An A/B test is used to examine whether purchasing activity differs between two separate groups, the Control group and the Test group.

## ABOUT DATASET

Content: This dataset contains information about a company's website, including the number of ads seen and clicked on by users and the resulting earnings.

Context: There are two separate datasets, one for the Control group and one for the Test group. They are stored on separate sheets of the Excel file "ab_testing.xlsx". MaximumBidding was applied to the Control group and AverageBidding was applied to the Test group.

COLUMNS

* Impression: Number of ad views
* Click: Number of clicks on the displayed ad
* Purchase: Number of products purchased after the ad was clicked
* Earning: Earnings from the purchased products

## TASK

To perform the A/B test, the hypotheses are defined first. The test assumptions (normality and homogeneity of variance) are then checked. Depending on the outcome of these checks, a parametric or non-parametric test is applied. Finally, the result is evaluated according to the p-value obtained.
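The workflow described above can be sketched as a single helper. This is a minimal illustration, not the notebook's own code: the function name `run_ab_test`, the `alpha` default of 0.05, and the synthetic data are all assumptions, as is the choice of Welch's t-test when the data look normal but the variances differ.

```python
import numpy as np
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu


def run_ab_test(control, test, alpha=0.05):
    """Hypothetical helper: check assumptions, then pick a two-sample test."""
    control = np.asarray(control, dtype=float)
    test = np.asarray(test, dtype=float)

    # Step 1: Shapiro-Wilk normality check on each group separately.
    normal = all(shapiro(x)[1] > alpha for x in (control, test))

    if normal:
        # Step 2: Levene's test for homogeneity of variance.
        _, p_levene = levene(control, test)
        equal_var = p_levene > alpha
        # Step 3a: parametric test (assumed: Welch's variant if variances differ).
        stat, p = ttest_ind(control, test, equal_var=equal_var)
        name = "Student t-test" if equal_var else "Welch t-test"
    else:
        # Step 3b: non-parametric alternative when normality is rejected.
        stat, p = mannwhitneyu(control, test, alternative="two-sided")
        name = "Mann-Whitney U"

    return {
        "test": name,
        "statistic": float(stat),
        "pvalue": float(p),
        "reject_h0": bool(p < alpha),
    }


# Synthetic, strongly skewed data (not the notebook's dataset) to show
# the non-parametric branch being selected.
rng = np.random.default_rng(0)
skewed_a = rng.exponential(scale=500, size=200)
skewed_b = rng.exponential(scale=500, size=200)
result = run_ab_test(skewed_a, skewed_b)
print(result)
```

With the notebook's data, the call would be `run_ab_test(df_maxbid["Purchase"], df_avebid["Purchase"])` once the two sheets are loaded.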

## PREPARING DATA

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.stats.api as sms
from scipy.stats import ttest_1samp, shapiro, levene, ttest_ind, mannwhitneyu, \
    pearsonr, spearmanr, kendalltau, f_oneway, kruskal
from statsmodels.stats.proportion import proportions_ztest

In [2]:
df_maxbid = pd.read_excel("/kaggle/input/ab-test12/ab_testing.xlsx", sheet_name="Control Group")
df_avebid = pd.read_excel("/kaggle/input/ab-test12/ab_testing.xlsx", sheet_name="Test Group")

In [3]:
df_maxbid.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40 entries, 0 to 39
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Impression  40 non-null     float64
 1   Click       40 non-null     float64
 2   Purchase    40 non-null     float64
 3   Earning     40 non-null     float64
dtypes: float64(4)
memory usage: 1.4 KB


In [4]:
df_maxbid.shape

(40, 4)

In [5]:
df_maxbid.describe().T

,count,mean,std,min,25%,50%,75%,max
Impression,40.0,101711.449068,20302.157862,45475.942965,85726.690349,99790.701078,115212.816543,147539.336329
Click,40.0,5100.657373,1329.985498,2189.753157,4124.304129,5001.220602,5923.803596,7959.125069
Purchase,40.0,550.894059,134.108201,267.028943,470.095533,531.206307,637.957088,801.79502
Earning,40.0,1908.5683,302.917783,1253.989525,1685.847205,1975.160522,2119.802784,2497.295218


In [6]:
df_maxbid.head()

,Impression,Click,Purchase,Earning
0,82529.459271,6090.077317,665.211255,2311.277143
1,98050.451926,3382.861786,315.084895,1742.806855
2,82696.023549,4167.96575,458.083738,1797.827447
3,109914.400398,4910.88224,487.090773,1696.229178
4,108457.76263,5987.655811,441.03405,1543.720179


In [7]:
df_avebid.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40 entries, 0 to 39
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Impression  40 non-null     float64
 1   Click       40 non-null     float64
 2   Purchase    40 non-null     float64
 3   Earning     40 non-null     float64
dtypes: float64(4)
memory usage: 1.4 KB


In [8]:
df_avebid.shape

(40, 4)

In [9]:
df_avebid.describe().T

,count,mean,std,min,25%,50%,75%,max
Impression,40.0,120512.411758,18807.448712,79033.834921,112691.97077,119291.300775,132050.578933,158605.920483
Click,40.0,3967.549761,923.095073,1836.629861,3376.819024,3931.359804,4660.497911,6019.695079
Purchase,40.0,582.106097,161.152513,311.629515,444.626828,551.355732,699.86236,889.91046
Earning,40.0,2514.890733,282.730852,1939.611243,2280.537426,2544.666107,2761.545405,3171.489708


In [10]:
df_avebid.head()

,Impression,Click,Purchase,Earning
0,120103.503796,3216.547958,702.160346,1939.611243
1,134775.943363,3635.082422,834.054286,2929.40582
2,107806.620788,3057.14356,422.934258,2526.244877
3,116445.275526,4650.473911,429.033535,2281.428574
4,145082.516838,5201.387724,749.860442,2781.697521


## Defining the Hypothesis of A/B Testing

* H0: M1 = M2. There is no statistically significant difference between the mean purchases of the Control and Test groups.
* H1: M1 != M2. There is a statistically significant difference between the mean purchases of the Control and Test groups.

In [11]:
df_maxbid["group"] = "Control_Group"
df_avebid["group"] = "Test_Group"

In [12]:
df = pd.concat([df_maxbid, df_avebid], ignore_index=True)

In [13]:
df.head(50)

,Impression,Click,Purchase,Earning,group
0,82529.459271,6090.077317,665.211255,2311.277143,Control_Group
1,98050.451926,3382.861786,315.084895,1742.806855,Control_Group
2,82696.023549,4167.96575,458.083738,1797.827447,Control_Group
3,109914.400398,4910.88224,487.090773,1696.229178,Control_Group
4,108457.76263,5987.655811,441.03405,1543.720179,Control_Group
5,77773.6339,4462.206586,519.669656,2081.85185,Control_Group
6,95110.586266,3555.58067,512.928746,1815.006614,Control_Group
7,106649.183075,4358.027043,747.020123,1965.1004,Control_Group
8,122709.716594,5091.558964,745.985682,1651.662991,Control_Group
9,79498.248658,6653.845515,470.501367,2456.30424,Control_Group


In [14]:
df.groupby("group").agg({"Purchase":"mean"})

group,Purchase
Control_Group,550.894059
Test_Group,582.106097


## ASSUMPTION CHECKING

**1-NORMALITY ASSUMPTION**

* H0: The normality assumption holds (the data are normally distributed)
* H1: The normality assumption does not hold

In [15]:
test_stat, pvalue = shapiro(df.loc[df["group"] == "Control_Group", "Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))#p-value = 0.5891

Test Stat = 0.9773, p-value = 0.5891


In [16]:
test_stat, pvalue = shapiro(df.loc[df["group"] == "Test_Group", "Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))#p-value = 0.1541

Test Stat = 0.9589, p-value = 0.1541


* H0 cannot be rejected for either group because both p-values are greater than 0.05.
* In other words, the normality assumption is met.

**2-VARIANCE HOMOGENEITY**

* H0: Variances are homogeneous
* H1: Variances are not homogeneous

In [17]:
test_stat, pvalue = levene(df.loc[df["group"] == "Control_Group", "Purchase"],
                           df.loc[df["group"] == "Test_Group", "Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue)) #p-value = 0.1083

Test Stat = 2.6393, p-value = 0.1083


* H0 cannot be rejected because the p-value is greater than 0.05.
* In other words, the variance homogeneity assumption is met.

* Since both assumptions are met, a parametric test will be used.



## INDEPENDENT TWO-SAMPLE T TEST:

In [18]:
test_stat, pvalue = ttest_ind(df.loc[df["group"] == "Control_Group", "Purchase"],
                              df.loc[df["group"] == "Test_Group", "Purchase"],
                              equal_var=True)

print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue)) #p-value = 0.3493

Test Stat = -0.9416, p-value = 0.3493
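
As a rough cross-check, the test statistic can be reproduced by hand from the `describe()` summaries above, assuming the pooled-variance form of the two-sample t statistic (the form selected by `equal_var=True`). The symbols $\bar{x}_i$, $s_i$ and $n_i$ (group mean, sample standard deviation and size) are notation introduced here, with group 1 as Control and group 2 as Test. Because $n_1 = n_2 = 40$, the pooled variance reduces to the average of the two sample variances:

$$
\begin{aligned}
s_p^2 &= \frac{(n_1-1)\,s_1^2 + (n_2-1)\,s_2^2}{n_1+n_2-2}
       = \frac{134.108^2 + 161.153^2}{2} \approx 21977.57,
       \qquad s_p \approx 148.25 \\
t &= \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}}
   = \frac{550.894 - 582.106}{148.25\,\sqrt{\dfrac{1}{40}+\dfrac{1}{40}}}
   \approx \frac{-31.212}{33.149} \approx -0.9416
\end{aligned}
$$

This agrees with the value returned by `ttest_ind`, with $n_1 + n_2 - 2 = 78$ degrees of freedom.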


## RESULT

* H0 cannot be rejected because the p-value is greater than 0.05.
* That is, there is no statistically significant difference between the mean purchases of the Control and Test groups.
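
To put the p-value in context, a confidence interval and an effect size for the difference in means can be computed alongside it. The sketch below is an illustrative addition, not part of the original analysis: it works from the summary statistics printed by `describe()` rather than the raw data, and the variable names, the 95% level, and the use of Cohen's d with a pooled standard deviation are assumptions.

```python
import numpy as np
from scipy.stats import t, ttest_ind_from_stats

# Summary statistics for Purchase, copied from the describe() outputs above.
mean_c, std_c, n_c = 550.894059, 134.108201, 40  # Control group
mean_t, std_t, n_t = 582.106097, 161.152513, 40  # Test group

# Pooled-variance t-test computed from summary statistics only.
t_stat, p_value = ttest_ind_from_stats(
    mean_c, std_c, n_c, mean_t, std_t, n_t, equal_var=True
)

# Pooled standard deviation and standard error of the mean difference.
dof = n_c + n_t - 2
pooled_sd = np.sqrt(((n_c - 1) * std_c**2 + (n_t - 1) * std_t**2) / dof)
std_err = pooled_sd * np.sqrt(1 / n_c + 1 / n_t)

# 95% confidence interval for (Control mean - Test mean).
diff = mean_c - mean_t
margin = t.ppf(0.975, dof) * std_err
ci_low, ci_high = diff - margin, diff + margin

# Cohen's d: the mean difference in units of the pooled standard deviation.
cohens_d = diff / pooled_sd

print("t = %.4f, p-value = %.4f" % (t_stat, p_value))
print("95%% CI for the difference: (%.2f, %.2f)" % (ci_low, ci_high))
print("Cohen's d = %.4f" % cohens_d)
```

Because the p-value is above 0.05, the interval contains zero, which is consistent with the conclusion that H0 cannot be rejected.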