**Work Problem**

**Facebook recently introduced a new bidding type, "average bidding", as an alternative to the existing bidding type called "maximum bidding".**

**One of our clients, bombabomba.com, decided to test this new feature and would like to run an A/B test to see if average bidding brings more conversions than maximum bidding.**


**The A/B test has been running for 1 month and bombabomba.com is now waiting for you to analyse the results of this A/B test.**

**The ultimate success metric for bombabomba.com is Purchase. Therefore, the statistical tests should focus on the Purchase metric.**

**Data Set Story**

**This data set, which contains a company's website information, includes information such as the number of advertisements seen and clicked by users, as well as information about the earnings from these advertisements.**

**There are two separate data sets: one for the Control group and one for the Test group.**

**These data sets are stored in separate sheets of the Excel file ab_testing.xlsx.**

**Maximum Bidding was applied to the control group and Average Bidding to the test group.**


* **Impression: Number of ad views**
* **Click: Number of clicks on the displayed ad**
* **Purchase: Number of products purchased after the ads were clicked**
* **Earning: Earnings from the purchased products**


# **Project Tasks**

# **Task 1: Preparing and Analysing Data**

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import shapiro, levene, ttest_ind
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.float_format', lambda x: '%.5f' % x)

dataframe_control = pd.read_excel("/kaggle/input/ab-test12/ab_testing.xlsx", sheet_name="Control Group")
dataframe_test = pd.read_excel("/kaggle/input/ab-test12/ab_testing.xlsx", sheet_name="Test Group")

df_control = dataframe_control.copy()
df_test = dataframe_test.copy()



**Step 2: Analyse the control and test group data.**

In [2]:

def check_df(dataframe, head=5):
    print("##################### Shape #####################")
    print(dataframe.shape)
    print("##################### Types #####################")
    print(dataframe.dtypes)
    print("##################### Head #####################")
    print(dataframe.head(head))  # honour the head argument instead of the default row count
    print("##################### Tail #####################")
    print(dataframe.tail(head))
    print("##################### NA #####################")
    print(dataframe.isnull().sum())
    print("##################### Quantiles #####################")
    print(dataframe.quantile([0, 0.05, 0.50, 0.95, 0.99, 1]).T)

check_df(df_control)
check_df(df_test)

##################### Shape #####################
(40, 4)
##################### Types #####################
Impression    float64
Click         float64
Purchase      float64
Earning       float64
dtype: object
##################### Head #####################
    Impression      Click  Purchase    Earning
0  82529.45927 6090.07732 665.21125 2311.27714
1  98050.45193 3382.86179 315.08489 1742.80686
2  82696.02355 4167.96575 458.08374 1797.82745
3 109914.40040 4910.88224 487.09077 1696.22918
4 108457.76263 5987.65581 441.03405 1543.72018
##################### Tail #####################
     Impression      Click  Purchase    Earning
35 132064.21900 3747.15754 551.07241 2256.97559
36  86409.94180 4608.25621 345.04603 1781.35769
37 123678.93423 3649.07379 476.16813 2187.72122
38 101997.49410 4736.35337 474.61354 2254.56383
39 121085.88122 4285.17861 590.40602 1289.30895
##################### NA #####################
Impression    0
Click         0
Purchase      0
Earning       0
dtype: int64

**Step 3: After the analysis, combine the control and test group data using the concat method.**

In [3]:
df_control["group"] = "control"
df_test["group"] = "test"

df = pd.concat([df_control, df_test], axis=0, ignore_index=False)
df.head()


    Impression      Click  Purchase    Earning    group
0  82529.45927 6090.07732 665.21125 2311.27714  control
1  98050.45193 3382.86179 315.08489 1742.80686  control
2  82696.02355 4167.96575 458.08374 1797.82745  control
3 109914.40040 4910.88224 487.09077 1696.22918  control
4 108457.76263 5987.65581 441.03405 1543.72018  control


In [4]:
df.tail()

     Impression      Click  Purchase    Earning group
35  79234.91193 6002.21358 382.04712 2277.86398  test
36 130702.23941 3626.32007 449.82459 2530.84133  test
37 116481.87337 4702.78247 472.45373 2597.91763  test
38  79033.83492 4495.42818 425.35910 2595.85788  test
39 102257.45409 4800.06832 521.31073 2967.51839  test
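
The tail above still shows row labels 35 to 39 for the test group because `ignore_index=False` keeps each frame's own index, so every label from 0 to 39 appears twice in `df`. The following is a minimal sketch of that behaviour on toy data (the frames and values are illustrative stand-ins, not the contents of ab_testing.xlsx):

```python
import pandas as pd

# Toy stand-ins for the two groups; values are made up for illustration.
control = pd.DataFrame({"Purchase": [10.0, 20.0, 30.0]})
test = pd.DataFrame({"Purchase": [15.0, 25.0, 35.0]})
control["group"] = "control"
test["group"] = "test"

# ignore_index=False keeps each frame's own labels, so 0..2 appear twice.
kept = pd.concat([control, test], axis=0, ignore_index=False)

# ignore_index=True builds a fresh 0..n-1 index for the combined frame.
fresh = pd.concat([control, test], axis=0, ignore_index=True)

print(kept.index.tolist())
print(fresh.index.tolist())
```

Duplicate labels do not affect the group-based filtering used in the later cells, but label-based lookups such as `df.loc[0]` would return one row per group.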


# **Task 2: Defining the Hypothesis of the A/B Test**

**Step 1: Define the hypothesis.**

* **H0: M1 = M2 (There is no difference between the control group and test group purchase averages.)**

* **H1: M1 != M2 (There is a difference between the control group and test group purchase averages.)**


**Step 2: Analyse the purchase means for the control and test groups.**

In [5]:
df.groupby("group").agg({"Purchase": "mean"})

          Purchase
group
control  550.89406
test     582.10610
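
To put the two means in perspective, the gap can be expressed as an absolute difference and as a relative lift over the control group. This is a small arithmetic sketch using the two means reported above; the variable names are illustrative, and a visible gap says nothing yet about statistical significance.

```python
# Group means copied from the groupby output above.
control_mean = 550.89406
test_mean = 582.10610

# Absolute gap in mean purchases (test minus control).
abs_diff = test_mean - control_mean

# Relative lift of the test group over the control group.
rel_lift = abs_diff / control_mean

print(f"Absolute difference: {abs_diff:.5f}")
print(f"Relative lift: {rel_lift:.2%}")
```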


# **Task 3: Hypothesis Testing**

**Step 1: Make assumption checks before hypothesis testing. These are the Assumption of Normality and Homogeneity of Variance.**

**Test whether the control and test groups separately meet the normality assumption on the Purchase variable.**

# Assumption of Normality:

* **H0: The assumption of normal distribution is met.**
 
* **H1: The assumption of normal distribution is not met.**

* **p < 0.05 H0 is rejected.**

* **p > 0.05 H0 cannot be rejected.**


**According to the test result, does the normality assumption for the control and test groups hold?**

**Interpret the p-values obtained.**

In [6]:
test_stat, pvalue = shapiro(df.loc[df["group"] == "control", "Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9773, p-value = 0.5891


**p-value=0.5891
H0 cannot be rejected. The values of the control group fulfil the assumption of normal distribution.**
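
The cell above runs the Shapiro-Wilk test on the control group only, while the task asks for both groups. One way to check each group in turn is sketched below. It runs on synthetic data (`df_demo`, with arbitrary means and spreads) so that it is self-contained; in the notebook the same loop could be applied to `df`. No result for the real test group is implied here.

```python
import numpy as np
import pandas as pd
from scipy.stats import shapiro

# Synthetic stand-in for df; the means and spreads are arbitrary assumptions.
rng = np.random.default_rng(42)
df_demo = pd.DataFrame({
    "group": ["control"] * 40 + ["test"] * 40,
    "Purchase": np.concatenate([rng.normal(550, 130, 40),
                                rng.normal(580, 160, 40)]),
})

# Run Shapiro-Wilk once per group and keep the p-values.
normality = {}
for name, values in df_demo.groupby("group")["Purchase"]:
    stat, p = shapiro(values)
    normality[name] = p
    verdict = "H0 cannot be rejected" if p > 0.05 else "H0 is rejected"
    print(f"{name}: Test Stat = {stat:.4f}, p-value = {p:.4f} ({verdict})")
```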

# Homogeneity of Variance:

* **H0: Variances are homogeneous.**
* **H1: Variances are not homogeneous.**

* **p < 0.05 H0 is rejected.**

* **p > 0.05 H0 cannot be rejected.**

**Test whether homogeneity of variance is achieved for the control and test groups on the Purchase variable.**

**Is the homogeneity of variance assumption met according to the test result? Interpret the p-values obtained.**

In [7]:
test_stat, pvalue = levene(df.loc[df["group"] == "control", "Purchase"],
                           df.loc[df["group"] == "test", "Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 2.6393, p-value = 0.1083


**p-value=0.1083
H0 cannot be rejected. The control and test groups satisfy the assumption of homogeneity of variance.**

**Step 2: Select the appropriate test based on the Assumption of Normality and Homogeneity of Variance results**

**Since the assumptions are met, an independent two sample t test (parametric test) is performed.**

* **H0: M1 = M2 (There is no statistically significant difference between the control group and test group purchase averages)**
* **H1: M1 != M2 (There is a statistically significant difference between the control group and test group purchase averages)**
* **p < 0.05 H0 is rejected.**

* **p > 0.05 H0 cannot be rejected.**
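
The selection logic behind this step can be sketched as a small helper. `choose_and_run_test` is a hypothetical function, not part of the original notebook, and the fallbacks it uses are a common convention that is assumed here rather than taken from the task: Welch's t test when variances differ, and the Mann-Whitney U test when normality is rejected. The demo data is synthetic.

```python
import numpy as np
from scipy.stats import levene, mannwhitneyu, shapiro, ttest_ind


def choose_and_run_test(a, b, alpha=0.05):
    """Hypothetical helper: pick a two-sample test from the assumption checks."""
    both_normal = shapiro(a).pvalue > alpha and shapiro(b).pvalue > alpha
    if not both_normal:
        # Normality rejected for at least one group: non-parametric fallback (assumed).
        result = mannwhitneyu(a, b, alternative="two-sided")
        return "mannwhitneyu", result.statistic, result.pvalue
    equal_var = levene(a, b).pvalue > alpha
    # equal_var=True is the pooled t test; equal_var=False is Welch's t test.
    result = ttest_ind(a, b, equal_var=equal_var)
    label = "t-test (pooled)" if equal_var else "t-test (Welch)"
    return label, result.statistic, result.pvalue


# Synthetic demo data; the means and spreads are arbitrary assumptions.
rng = np.random.default_rng(7)
control_demo = rng.normal(550, 130, 40)
test_demo = rng.normal(580, 160, 40)

chosen, stat, pvalue = choose_and_run_test(control_demo, test_demo)
print(f"{chosen}: Test Stat = {stat:.4f}, p-value = {pvalue:.4f}")
```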

In [8]:
test_stat, pvalue = ttest_ind(df.loc[df["group"] == "control", "Purchase"],
                              df.loc[df["group"] == "test", "Purchase"],
                              equal_var=True)

print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = -0.9416, p-value = 0.3493
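
The test statistic is negative because the first sample passed to `ttest_ind` (control) has the lower mean. To see what the pooled-variance version (`equal_var=True`) computes, the statistic and two-sided p-value can be reproduced by hand. This is a sketch on synthetic data with arbitrary parameters, not a recomputation of the result above.

```python
import numpy as np
from scipy.stats import t, ttest_ind

# Synthetic samples; the means and spreads are arbitrary assumptions.
rng = np.random.default_rng(0)
a = rng.normal(550, 130, 40)  # stands in for the control group
b = rng.normal(580, 160, 40)  # stands in for the test group

n1, n2 = len(a), len(b)
dof = n1 + n2 - 2

# Pooled variance: sample variances weighted by their degrees of freedom.
sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / dof

# t statistic: difference in means over its pooled standard error.
t_manual = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom.
p_manual = 2 * t.sf(abs(t_manual), dof)

t_scipy, p_scipy = ttest_ind(a, b, equal_var=True)
print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```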


**Step 3: Based on the p-value obtained from the test, interpret whether there is a statistically significant difference between the control and test group purchase averages.**

**p-value=0.3493**
**H0 cannot be rejected. There is no statistically significant difference between the control and test group purchase averages.**