
**Business Problem**

Facebook introduced a new bidding system, "average bidding", as an alternative to the existing "maximum bidding". Facebook wants to know whether average bidding brings more conversions than maximum bidding, so it runs an A/B test. The analysis focuses on the "Purchase" metric.

**Dataset Information**

There are two datasets, Control and Test.

The Control dataset uses Maximum Bidding.

The Test dataset uses Average Bidding.

* Impression: Number of times the ad was viewed
* Click: Number of clicks on the ad
* Purchase: Number of items purchased
* Earning: Amount of money earned

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import shapiro, levene, ttest_ind


pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.float_format', lambda x: '%.5f' % x)

In [2]:
df_control = pd.read_excel("/kaggle/input/ab-testing/ab_testing.xlsx", sheet_name="Control Group")
df_control["Group"] = "Control"
df_control.head()

Unnamed: 0,Impression,Click,Purchase,Earning,Group
0,82529.45927,6090.07732,665.21125,2311.27714,Control
1,98050.45193,3382.86179,315.08489,1742.80686,Control
2,82696.02355,4167.96575,458.08374,1797.82745,Control
3,109914.4004,4910.88224,487.09077,1696.22918,Control
4,108457.76263,5987.65581,441.03405,1543.72018,Control


In [3]:
df_test = pd.read_excel("/kaggle/input/ab-testing/ab_testing.xlsx", sheet_name="Test Group")
df_test["Group"] = "Test"
df_test.head()

Unnamed: 0,Impression,Click,Purchase,Earning,Group
0,120103.5038,3216.54796,702.16035,1939.61124,Test
1,134775.94336,3635.08242,834.05429,2929.40582,Test
2,107806.62079,3057.14356,422.93426,2526.24488,Test
3,116445.27553,4650.47391,429.03353,2281.42857,Test
4,145082.51684,5201.38772,749.86044,2781.69752,Test


In [4]:
df = pd.concat([df_control, df_test], axis = 0)
df

Unnamed: 0,Impression,Click,Purchase,Earning,Group
0,82529.45927,6090.07732,665.21125,2311.27714,Control
1,98050.45193,3382.86179,315.08489,1742.80686,Control
2,82696.02355,4167.96575,458.08374,1797.82745,Control
3,109914.40040,4910.88224,487.09077,1696.22918,Control
4,108457.76263,5987.65581,441.03405,1543.72018,Control
...,...,...,...,...,...
35,79234.91193,6002.21358,382.04712,2277.86398,Test
36,130702.23941,3626.32007,449.82459,2530.84133,Test
37,116481.87337,4702.78247,472.45373,2597.91763,Test
38,79033.83492,4495.42818,425.35910,2595.85788,Test


**Defining A/B Test Hypothesis**

H0: M1 = M2 // There is no difference between the mean Purchase of Maximum Bidding (Control) and Average Bidding (Test)

H1: M1 != M2 // There is a difference

p < 0.05: H0 is rejected

p >= 0.05: H0 is NOT rejected

Implementing the A/B Test

1 - Set the hypotheses

2 - Check the assumptions

    a : Normality Assumption (shapiro)
    b : Homogeneity of Variance (levene)

3 - Run the hypothesis test

    a : Apply the "ttest_ind" parametric test if normality holds (with equal_var=False if the variances are not homogeneous)
    b : Apply the "mannwhitneyu" non-parametric test if normality does not hold
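The steps above can be sketched as a single helper. This is only an illustration under stated assumptions: the function name `compare_groups`, the `alpha` parameter, and the synthetic samples are hypothetical and do not come from the notebook; only the SciPy calls are real.

```python
import numpy as np
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu


def compare_groups(a, b, alpha=0.05):
    """Choose and run a two-sample test (hypothetical helper, not from the notebook)."""
    # Step 2a: Shapiro-Wilk on each group. H0: the sample is normally distributed.
    _, p_norm_a = shapiro(a)
    _, p_norm_b = shapiro(b)
    if p_norm_a < alpha or p_norm_b < alpha:
        # Step 3b: normality rejected for at least one group, use the non-parametric test.
        stat, p = mannwhitneyu(a, b, alternative="two-sided")
        return "mannwhitneyu", stat, p

    # Step 2b: Levene test. H0: the two groups have equal variances.
    _, p_var = levene(a, b)
    equal_var = p_var >= alpha

    # Step 3a: t-test; equal_var=False switches to Welch's t-test.
    stat, p = ttest_ind(a, b, equal_var=equal_var)
    return ("ttest_ind" if equal_var else "ttest_ind (Welch)"), stat, p


# Synthetic stand-in data (NOT the notebook's data), just to exercise the helper.
rng = np.random.default_rng(0)
sample_a = rng.normal(550, 130, size=40)
sample_b = rng.normal(580, 160, size=40)

test_name, test_stat, pvalue = compare_groups(sample_a, sample_b)
print(test_name, 'Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))
```

With the notebook's data, the equivalent call would pass the "Purchase" column of each group, for example `compare_groups(df.loc[df["Group"] == "Control", "Purchase"], df.loc[df["Group"] == "Test", "Purchase"])`.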

In [5]:
# First, compute the mean Purchase for each group
df.groupby("Group").agg({"Purchase": "mean"})

Group,Purchase
Control,550.89406
Test,582.1061


# Normality Assumption

In [6]:
test_stat, pvalue = shapiro(df.loc[df["Group"] == "Control", "Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

# p-value: 0.5891 > 0.05
# H0 (the Control group's Purchase is normally distributed) is NOT rejected

Test Stat = 0.9773, p-value = 0.5891


In [7]:
test_stat, pvalue = shapiro(df.loc[df["Group"] == "Test", "Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))
# p-value: 0.1541 > 0.05
# H0 (the Test group's Purchase is normally distributed) is NOT rejected

Test Stat = 0.9589, p-value = 0.1541


# Homogeneity of Variance

In [8]:
test_stat, pvalue = levene(df.loc[df["Group"] == "Control", "Purchase"],
                           df.loc[df["Group"] == "Test", "Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

# p-value: 0.1083 > 0.05
# H0 (the two groups have equal variances) is NOT rejected

Test Stat = 2.6393, p-value = 0.1083


In [9]:
# Assumptions are met, we can use ttest_ind
test_stat, pvalue = ttest_ind(df.loc[df["Group"] == "Control", "Purchase"],
                              df.loc[df["Group"] == "Test", "Purchase"],
                              equal_var=True)
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

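# equal_var=True because the variances are homogeneous; if the Levene test had rejected H0, equal_var=False (Welch's t-test) would be used instead
# p-value: 0.3493 > 0.05
# H0 is NOT rejected: there is no statistically significant difference in mean Purchase between Control and Test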

Test Stat = -0.9416, p-value = 0.3493
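
As a cross-check of what `ttest_ind(..., equal_var=True)` computes, the pooled-variance t statistic can be rebuilt by hand. The sketch below is a minimal illustration on synthetic data: the arrays `x` and `y`, their sizes, and the distribution parameters are made-up stand-ins, not the notebook's Control and Test groups.

```python
import numpy as np
from scipy.stats import ttest_ind, t as t_dist

# Synthetic stand-in samples (NOT the notebook's data).
rng = np.random.default_rng(42)
x = rng.normal(550, 130, size=40)
y = rng.normal(580, 160, size=40)

n1, n2 = len(x), len(y)

# Pooled variance: weighted average of the two sample variances (ddof=1).
pooled_var = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)

# Student's t statistic for the difference in means.
t_manual = (x.mean() - y.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom.
p_manual = 2 * t_dist.sf(abs(t_manual), df=n1 + n2 - 2)

t_scipy, p_scipy = ttest_ind(x, y, equal_var=True)

print('manual: Test Stat = %.4f, p-value = %.4f' % (t_manual, p_manual))
print('scipy : Test Stat = %.4f, p-value = %.4f' % (t_scipy, p_scipy))
```

The two printed lines should agree, which shows that the sign of the test statistic only reflects the order in which the groups are passed (first mean minus second mean).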
