# A/B Test: Comparing the Conversion of Bidding Methods

#### 1) BUSINESS PROBLEM

Facebook recently introduced a new bidding method called "average bidding" as an alternative to the existing "maximum bidding" method. One of our clients, bombabomba.com, has decided to test this new feature and wants to conduct an A/B test to determine if average bidding brings in more conversions compared to maximum bidding.

The A/B test has been running for 1 month, and bombabomba.com is now looking for you to analyze the results of this A/B test. The ultimate success metric for bombabomba.com is "Purchase." Therefore, the focus should be on the Purchase metric for statistical testing.

The dataset used in this project contains information about a company's website, including the number of ad impressions, clicks on ads, and earnings generated from these ads. There are two separate datasets for the control and test groups, which can be found in the ab_testing.xlsx file on different sheets. Maximum Bidding was applied to the control group, while Average Bidding was applied to the test group.

### Dataset Story

This dataset contains information about a company's website: the number of ads that users viewed and clicked, along with the revenue these interactions generated. The control and test groups are stored as two separate datasets on separate sheets of the spreadsheet file. Maximum bidding was applied to the control group and average bidding to the test group.

### Variables

- **Impression**: Number of ad impressions
- **Click**: Number of clicks on ads
- **Purchase**: Number of products purchased after clicking on ads
- **Earning**: Revenue generated from the purchased products

### 2) Data Understanding

In [2]:
# Import libraries and functions

# Notebook shell command; only needed if statsmodels is not already installed
!pip install statsmodels

import itertools
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.stats.api as sms
from scipy.stats import ttest_1samp, shapiro, levene, ttest_ind, mannwhitneyu, \
    pearsonr, spearmanr, kendalltau, f_oneway, kruskal
from statsmodels.stats.proportion import proportions_ztest
from statsmodels.stats.multicomp import MultiComparison

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 10)
pd.set_option('display.float_format', lambda x: '%.5f' % x)



In [3]:
# Load the control and test sheets from the same workbook

df_control = pd.read_excel("ab_testing.xlsx", sheet_name="Control Group")

df_test = pd.read_excel("ab_testing.xlsx", sheet_name="Test Group")

In [4]:
# Display the first few rows of each dataset

df_control.head()

Unnamed: 0,Impression,Click,Purchase,Earning
0,82529.45927,6090.07732,665.21125,2311.27714
1,98050.45193,3382.86179,315.08489,1742.80686
2,82696.02355,4167.96575,458.08374,1797.82745
3,109914.4004,4910.88224,487.09077,1696.22918
4,108457.76263,5987.65581,441.03405,1543.72018


In [5]:
df_test.head()

Unnamed: 0,Impression,Click,Purchase,Earning
0,120103.5038,3216.54796,702.16035,1939.61124
1,134775.94336,3635.08242,834.05429,2929.40582
2,107806.62079,3057.14356,422.93426,2526.24488
3,116445.27553,4650.47391,429.03353,2281.42857
4,145082.51684,5201.38772,749.86044,2781.69752


In [6]:
# Shape (rows, columns) of each dataset

df_control.shape


(40, 4)

In [7]:
df_test.shape

(40, 4)

In [8]:
# Display column types and non-null counts for each dataset

df_test.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40 entries, 0 to 39
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Impression  40 non-null     float64
 1   Click       40 non-null     float64
 2   Purchase    40 non-null     float64
 3   Earning     40 non-null     float64
dtypes: float64(4)
memory usage: 1.4 KB


In [9]:
df_control.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40 entries, 0 to 39
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Impression  40 non-null     float64
 1   Click       40 non-null     float64
 2   Purchase    40 non-null     float64
 3   Earning     40 non-null     float64
dtypes: float64(4)
memory usage: 1.4 KB


#### 3) Data Preparation and Analysis

In [10]:
# Descriptive statistics for each DataFrame

df_control.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Impression,40.0,101711.44907,20302.15786,45475.94296,85726.69035,99790.70108,115212.81654,147539.33633
Click,40.0,5100.65737,1329.9855,2189.75316,4124.30413,5001.2206,5923.8036,7959.12507
Purchase,40.0,550.89406,134.1082,267.02894,470.09553,531.20631,637.95709,801.79502
Earning,40.0,1908.5683,302.91778,1253.98952,1685.8472,1975.16052,2119.80278,2497.29522


In [11]:
df_test.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Impression,40.0,120512.41176,18807.44871,79033.83492,112691.97077,119291.30077,132050.57893,158605.92048
Click,40.0,3967.54976,923.09507,1836.62986,3376.81902,3931.3598,4660.49791,6019.69508
Purchase,40.0,582.1061,161.15251,311.62952,444.62683,551.35573,699.86236,889.91046
Earning,40.0,2514.89073,282.73085,1939.61124,2280.53743,2544.66611,2761.5454,3171.48971


In [12]:
# Check whether any column contains missing values (False means none)

df_test.isnull().sum().any()

False

In [13]:
df_control.isnull().sum().any()

False

In [17]:
# Check the mean Purchase value of the two datasets

df_test["Purchase"].mean()

582.1060966484675

In [18]:
df_control["Purchase"].mean()

550.8940587702316

In [14]:
# Add a Group column that labels the control ("C") and test ("T") rows

df_control["Group"] = "C"

df_test["Group"] = "T"

In [16]:
# Concatenate the control and test datasets (each group's original row labels are kept)

df = pd.concat([df_control, df_test], axis=0, ignore_index=False)

df

Unnamed: 0,Impression,Click,Purchase,Earning,Group
0,82529.45927,6090.07732,665.21125,2311.27714,C
1,98050.45193,3382.86179,315.08489,1742.80686,C
2,82696.02355,4167.96575,458.08374,1797.82745,C
3,109914.40040,4910.88224,487.09077,1696.22918,C
4,108457.76263,5987.65581,441.03405,1543.72018,C
...,...,...,...,...,...
35,79234.91193,6002.21358,382.04712,2277.86398,T
36,130702.23941,3626.32007,449.82459,2530.84133,T
37,116481.87337,4702.78247,472.45373,2597.91763,T
38,79033.83492,4495.42818,425.35910,2595.85788,T
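
Because `ignore_index=False` keeps each group's original row labels, the labels repeat in the combined frame. The following is a minimal sketch of that behavior, assuming toy frames with made-up values (`toy_control` and `toy_test` are hypothetical stand-ins, not the real data):

```python
import pandas as pd

# Toy stand-ins for df_control and df_test (values are made up for illustration).
toy_control = pd.DataFrame({"Purchase": [500.0, 520.0, 480.0]})
toy_test = pd.DataFrame({"Purchase": [530.0, 560.0, 510.0]})

# assign() returns labeled copies and leaves the original frames untouched.
toy_control = toy_control.assign(Group="C")
toy_test = toy_test.assign(Group="T")

# ignore_index=False keeps each frame's own labels, so 0, 1, 2 appear twice.
kept = pd.concat([toy_control, toy_test], axis=0, ignore_index=False)

# ignore_index=True renumbers the rows 0..n-1, so every label is unique.
renumbered = pd.concat([toy_control, toy_test], axis=0, ignore_index=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
print(renumbered.groupby("Group")["Purchase"].mean())
```

Either form works for the group-wise filtering used in this analysis, since rows are selected by the `Group` column rather than by label; unique labels mainly matter when rows are later looked up with `.loc`.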


In [19]:
df["Purchase"].mean()

566.5000777093495

#### Defining the Hypothesis of A/B Testing

In [20]:
# H0: M1 = M2 (M1: control group, maximum bidding; M2: test group, average bidding)

# There is no difference between the mean purchases of the control group and the test group.

# H1: M1 != M2 (there is a difference)

# p-value < 0.05: H0 is rejected

# p-value >= 0.05: H0 cannot be rejected

In [21]:
# Mean Purchase for the control and test groups

df.groupby("Group")["Purchase"].mean()

Group
C   550.89406
T   582.10610
Name: Purchase, dtype: float64

#### Assumption Checking

Verify the assumptions required for the t-test:

***Normality***: Apply a test such as the Shapiro-Wilk or Anderson-Darling test to each group.

- **H0**: The data are normally distributed.
- **H1**: The data are not normally distributed.

***Homogeneity of variance***: Apply a test such as Levene's test or Bartlett's test to check whether the group variances are equal.

**1) Normality Check**

In [22]:
# Normality test on the control group

test_stat, pvalue = shapiro(df.loc[df["Group"] == "C", "Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

# print(f"- Normality is {'not satisfied' if pvalue < 0.05 else 'satisfied'} for the control group")


Test Stat = 0.9773, p-value = 0.5891


0.5891 > 0.05, so H0 cannot be rejected: normality is satisfied for the control group.

In [23]:
# Normality test on the test group

test_stat, pvalue = shapiro(df.loc[df["Group"] == "T", "Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))


Test Stat = 0.9589, p-value = 0.1541


0.1541 > 0.05, so H0 cannot be rejected: normality is satisfied for the test group.

**2) Homogeneity of Variance Check**

In [24]:
test_stat, pvalue = levene(df.loc[df["Group"] == "C", "Purchase"],
                           df.loc[df["Group"] == "T", "Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 2.6393, p-value = 0.1083


0.1083 > 0.05, so H0 cannot be rejected: homogeneity of variance is satisfied.

#### Application of the Hypothesis Test

Neither the Shapiro-Wilk tests nor Levene's test rejected the null hypothesis (H0), so the normality and homogeneity-of-variance assumptions are both treated as satisfied.

Because both assumptions hold, the **Independent Two-Sample T-test** is used.
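
The choice of test follows from the two assumption checks. One way to sketch that decision is shown below; `compare_groups` is a hypothetical helper that is not part of the original notebook, and the Mann-Whitney U and Welch fallbacks are common conventions assumed here rather than steps this analysis needed. The demo inputs are synthetic, not the real Purchase data.

```python
import numpy as np
from scipy.stats import levene, mannwhitneyu, norm, shapiro, ttest_ind


def compare_groups(control, test, alpha=0.05):
    """Pick a two-sample test from the assumption checks and run it.

    Hypothetical helper for illustration. Returns (test_name, p_value).
    """
    # Normality must hold in both groups before a t-test is considered.
    _, p_norm_control = shapiro(control)
    _, p_norm_test = shapiro(test)
    if p_norm_control < alpha or p_norm_test < alpha:
        # Non-parametric fallback when normality is rejected for either group.
        _, p = mannwhitneyu(control, test, alternative="two-sided")
        return "Mann-Whitney U", p

    # Levene's test decides between pooled and unpooled variances.
    _, p_var = levene(control, test)
    equal_var = p_var >= alpha
    _, p = ttest_ind(control, test, equal_var=equal_var)
    return ("Student t-test" if equal_var else "Welch t-test"), p


# Synthetic inputs: exact normal quantiles and a strongly skewed sample.
base = norm.ppf((np.arange(1, 41) - 0.5) / 40)
skewed = np.exp(np.linspace(0, 5, 40))

same_spread = compare_groups(base, base + 0.3)   # normal, equal variances
wider_spread = compare_groups(base, 5 * base)    # normal, unequal variances
non_normal = compare_groups(skewed, base)        # normality rejected

print(same_spread[0], wider_spread[0], non_normal[0])
```

With the real data, both Shapiro-Wilk p-values and the Levene p-value exceed 0.05, so this logic would select the pooled-variance t-test used in the next cell.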

In [25]:
test_stat, pvalue = ttest_ind(df.loc[df["Group"] == "C", "Purchase"],
                              df.loc[df["Group"] == "T", "Purchase"],
                              equal_var=True)

print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = -0.9416, p-value = 0.3493


0.3493 > 0.05, so H0 cannot be rejected.

There is no statistically significant difference between the mean purchases of the control group (maximum bidding) and the test group (average bidding).
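
As a cross-check, the same pooled-variance t-test can be recomputed from the summary statistics alone. The sketch below copies the Purchase count, mean, and standard deviation from the `describe()` output and passes them to `scipy.stats.ttest_ind_from_stats`. The 95% confidence interval for the difference in means is an addition for interpretation, not a step from the original analysis.

```python
import numpy as np
from scipy.stats import t, ttest_ind_from_stats

# Purchase summary statistics copied from the describe() output.
n_c, mean_c, std_c = 40, 550.89406, 134.10820   # control group
n_t, mean_t, std_t = 40, 582.10610, 161.15251   # test group

# Pooled-variance t-test computed from the summaries instead of the raw data.
t_stat, p_value = ttest_ind_from_stats(mean_c, std_c, n_c,
                                       mean_t, std_t, n_t,
                                       equal_var=True)

# 95% confidence interval for the difference in means (control - test).
dof = n_c + n_t - 2
pooled_var = ((n_c - 1) * std_c**2 + (n_t - 1) * std_t**2) / dof
std_err = np.sqrt(pooled_var * (1 / n_c + 1 / n_t))
margin = t.ppf(0.975, dof) * std_err
diff = mean_c - mean_t
ci_low, ci_high = diff - margin, diff + margin

print('Test Stat = %.4f, p-value = %.4f' % (t_stat, p_value))
print('95%% CI for the mean difference: (%.2f, %.2f)' % (ci_low, ci_high))
```

The statistic and p-value should agree with the `ttest_ind` result up to rounding of the inputs. The interval contains zero, which is consistent with failing to reject H0.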