Facebook recently introduced a new bidding type, “average bidding”, as an alternative to its
existing bidding type, called “maximum bidding”. One of our clients, bombabomba.com, has
decided to test this new feature and wants to conduct an A/B test to understand whether average
bidding brings more conversions than maximum bidding.
In this A/B test, bombabomba.com randomly splits its audience into two equally sized
groups: the test group and the control group. A Facebook ad campaign with “maximum
bidding” is served to the “control group” and another campaign with “average bidding” is served
to the “test group”.

The null hypothesis is:
There is no statistically significant difference between average and maximum bidding in the number of purchases.

If this hypothesis is rejected, the two bidding types differ in the number of purchases. The test is two-sided, so the direction of the difference is then read from the group means.
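Written formally, the two-sided hypotheses compare the mean number of purchases in each group. The symbols below are introduced here for illustration only and do not appear elsewhere in the notebook:

```latex
\begin{aligned}
H_0 &: \mu_{\text{control}} = \mu_{\text{test}} \\
H_1 &: \mu_{\text{control}} \neq \mu_{\text{test}}
\end{aligned}
```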

In [1]:
# TWO-SAMPLE INDEPENDENT T-TEST (A/B TEST)

# HYPOTHESES
# H0: There is no statistically significant difference between average and maximum bidding in the number of purchases.
# H1: There is a statistically significant difference.

In [2]:
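import numpy as np
import pandas as pd
import seaborn as sns              # used by has_outliers when plot=True
import matplotlib.pyplot as plt    # used by has_outliers when plot=True
import statsmodels.stats.api as sms
import scipy.stats as stats
from scipy.stats import shapiro
from statsmodels.stats.proportion import proportions_ztest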


control = pd.read_excel("ab_testing_data.xlsx", sheet_name="Control Group")
test = pd.read_excel("ab_testing_data.xlsx", sheet_name="Test Group")

In [3]:
control.head()

Unnamed: 0,Impression,Click,Purchase,Earning
0,82529.459271,6090.077317,665.211255,2311.277143
1,98050.451926,3382.861786,315.084895,1742.806855
2,82696.023549,4167.96575,458.083738,1797.827447
3,109914.400398,4910.88224,487.090773,1696.229178
4,108457.76263,5987.655811,441.03405,1543.720179


In [4]:
test.head()

Unnamed: 0,Impression,Click,Purchase,Earning
0,120103.503796,3216.547958,702.160346,1939.611243
1,134775.943363,3635.082422,834.054286,2929.40582
2,107806.620788,3057.14356,422.934258,2526.244877
3,116445.275526,4650.473911,429.033535,2281.428574
4,145082.516838,5201.387724,749.860442,2781.697521


In [5]:
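# Since we are interested in the number of purchases, we create dataframes that contain only the Purchase variable.
# .copy() keeps later in-place edits (e.g. outlier replacement) from touching the original dataframes.
purc_control = control[['Purchase']].copy()
purc_test = test[['Purchase']].copy()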

In [6]:
type(purc_test)

pandas.core.frame.DataFrame

In [7]:
purc_test.head()

Unnamed: 0,Purchase
0,702.160346
1,834.054286
2,422.934258
3,429.033535
4,749.860442


In [8]:
# Check whether the purchase dataframes of the control and test groups contain outliers.
# The outlier helper functions written in previous classes are reused.

def outlier_thresholds(dataframe, variable):
    quartile1 = dataframe[variable].quantile(0.25)
    quartile3 = dataframe[variable].quantile(0.75)
    interquartile_range = quartile3 - quartile1
    up_limit = quartile3 + 1.5 * interquartile_range
    low_limit = quartile1 - 1.5 * interquartile_range
    return low_limit, up_limit


def has_outliers(dataframe, num_col_names, plot=False):
    variable_names = []
    for col in num_col_names:
        low_limit, up_limit = outlier_thresholds(dataframe, col)
        if dataframe[(dataframe[col] > up_limit) | (dataframe[col] < low_limit)].any(axis=None):
            number_of_outliers = dataframe[(dataframe[col] > up_limit) | (dataframe[col] < low_limit)].shape[0]
            print(col, ":", number_of_outliers)
            variable_names.append(col)
            if plot:
                sns.boxplot(x=dataframe[col])
                plt.show()
    return variable_names

def replace_with_thresholds(dataframe, variable):
    low_limit, up_limit = outlier_thresholds(dataframe, variable)
    dataframe.loc[(dataframe[variable] < low_limit), variable] = low_limit
    dataframe.loc[(dataframe[variable] > up_limit), variable] = up_limit

has_outliers(purc_control, ["Purchase"])

[]

In [9]:
has_outliers(purc_test, ["Purchase"])

[]

In [10]:
purc_test.columns[0]

'Purchase'

In [11]:
has_outliers(purc_test, purc_test.columns)

[]
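The outlier check above relies on the interquartile range (IQR) rule: a value counts as an outlier when it lies more than 1.5 IQRs below the first quartile or above the third. A minimal sketch of that rule on toy numbers follows; `iqr_bounds` and the data are illustrative and not part of the experiment.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) fences of the IQR rule for a numeric Series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Toy data (not from the experiment): nine ordinary values and one extreme value.
toy = pd.Series([10, 12, 11, 13, 12, 11, 10, 12, 13, 100])
low, up = iqr_bounds(toy)

# Keep only the values that fall outside the fences.
outliers = toy[(toy < low) | (toy > up)]
print(low, up)
print(outliers.tolist())  # [100]
```

An empty list, as returned for both purchase dataframes, means no value falls outside these fences.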

In [12]:
# There aren't any outliers.
# The two dataframes are combined with concat method as seen below.
df = pd.concat([purc_control, purc_test], axis=1)
df.columns = ["Control", "Test"]
df.head()

Unnamed: 0,Control,Test
0,665.211255,702.160346
1,315.084895,834.054286
2,458.083738,422.934258
3,487.090773,429.033535
4,441.03405,749.860442


In [13]:
# Normality assumption - Shapiro-Wilk test
# H0: there is no statistically significant difference between the sample distribution and the theoretical normal distribution.
# H1: there is a difference.

def shapiro_test(data_frame):
    sh = shapiro(data_frame)
    print("T calculation statistics for Shapiro-Wilks Test: " + str(sh[0]))
    print("Calculated p-value for Shapiro-Wilks Test: " + str(sh[1]))
    if sh[1] > 0.05:
        print("The distribution is normal.")
    else:
        print("The distribution is not normal.")

# For the control data:
shapiro_test(purc_control)

T calculation statistics for Shapiro-Wilks Test: 0.9772694110870361
Calculated p-value for Shapiro-Wilks Test: 0.5891125202178955
The distribution is normal.


In [14]:
# For the test data:
shapiro_test(purc_test)

T calculation statistics for Shapiro-Wilks Test: 0.9589452147483826
Calculated p-value for Shapiro-Wilks Test: 0.15413185954093933
The distribution is normal.
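To see how the decision rule behaves, the same check can be run on two synthetic samples, one bell-shaped and one strongly skewed. This is a sketch only: `looks_normal` is a hypothetical helper and neither sample comes from the experiment. Note that a p-value above 0.05 means normality is not rejected, which is weaker than proving the data are normal.

```python
import numpy as np
from scipy import stats

def looks_normal(sample, alpha=0.05):
    """True when the Shapiro-Wilk test does not reject normality at level alpha."""
    _, p_value = stats.shapiro(sample)
    return bool(p_value > alpha)

# Synthetic samples, for illustration only.
bell_shaped = stats.norm.ppf(np.linspace(0.01, 0.99, 40))  # evenly spaced normal quantiles
skewed = np.exp(np.linspace(0.0, 5.0, 40))                 # strongly right-skewed values

print(looks_normal(bell_shaped))  # True
print(looks_normal(skewed))       # False
```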


In [15]:
# Variance homogeneity assumption - Levene test
# H0: variances are homogenous
# H1: variances are not homogenous

def levene_test(df1, df2):
    lh = stats.levene(df1, df2)
    print("T calculation statistics for Levene Test: " + str(lh[0]))
    print("Calculated p-value for Levene Test: " + str(lh[1]))
    if lh[1] > 0.05:
        print("Variances are homogenous.")
    else:
        print("Variances are not homogenous.")

levene_test(purc_control["Purchase"], purc_test["Purchase"])

T calculation statistics for Levene Test: 2.6392694728747363
Calculated p-value for Levene Test: 0.10828588271874791
Variances are homogenous.
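One way to read the Levene statistic: with SciPy's default `center="median"`, it is a one-way ANOVA F statistic computed on each observation's absolute deviation from its group median. The sketch below checks that equivalence on made-up samples; the arrays are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

# Made-up samples, used only to illustrate the computation.
a = np.array([4.1, 5.0, 5.9, 6.3, 7.2, 8.8])
b = np.array([1.0, 3.5, 6.1, 9.4, 12.2, 15.9])

# scipy.stats.levene centers on the median by default (center="median").
levene_stat, levene_p = stats.levene(a, b)

# Equivalent view: one-way ANOVA on the absolute deviations from each group's median.
dev_a = np.abs(a - np.median(a))
dev_b = np.abs(b - np.median(b))
anova_stat, anova_p = stats.f_oneway(dev_a, dev_b)

print(levene_stat, anova_stat)  # the two statistics agree
print(levene_p, anova_p)        # and so do the p-values
```

A large statistic therefore means that the typical spread around the median differs between the groups.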


In [16]:
# Calculation of the p-value for the two-sample independent t-test (A/B test)
test_stat, pvalue = stats.ttest_ind(df["Control"], df["Test"], equal_var=True)
# Since the variance homogeneity assumption is satisfied, equal_var is set to True.
print('Test statistics = %.4f, p-value = %.4f' % (test_stat, pvalue))
if pvalue < 0.05:
    print("Hypothesis is rejected.")
else:
    print("Hypothesis is accepted.")


Test statistics = -0.9416, p-value = 0.3493
Hypothesis is accepted.


With p = 0.3493 > 0.05, we fail to reject the null hypothesis: there is no statistically significant difference between average and maximum bidding in terms of the number of purchases.
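For readers who want to see what `stats.ttest_ind(..., equal_var=True)` computes, here is a sketch of the pooled-variance t-test written out step by step and compared against SciPy. The function name `pooled_t_test` and the toy numbers are assumptions made for illustration; they are not the experiment's data.

```python
import numpy as np
from scipy import stats

def pooled_t_test(a, b):
    """Two-sided Student t-test with pooled variance, computed step by step."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    dof = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / dof
    std_err = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (a.mean() - b.mean()) / std_err
    # Two-sided p-value from the t distribution with n_a + n_b - 2 degrees of freedom.
    p_value = 2 * stats.t.sf(abs(t_stat), dof)
    return t_stat, p_value

# Toy numbers, for illustration only.
x = [520.0, 610.0, 480.0, 555.0, 590.0, 505.0]
y = [560.0, 640.0, 530.0, 600.0, 615.0, 575.0]

t_manual, p_manual = pooled_t_test(x, y)
t_scipy, p_scipy = stats.ttest_ind(x, y, equal_var=True)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```

The statistic is the first sample's mean minus the second's, divided by the standard error, so a negative value such as the -0.9416 above indicates that the first argument (control) has the lower sample mean.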

In [17]:
# Click-through rate = click / impression
print(f"Website average click through rate in maximum bidding: {round((control.Click / control.Impression).mean()*100, 2)} %")
print(f"Website average click through rate in average bidding: {round((test.Click / test.Impression).mean()*100, 2)} %")

Website average click through rate in maximum bidding: 5.36 %
Website average click through rate in average bidding: 3.42 %
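Note that the cell above averages the per-row ratios. A different summary is the ratio of the totals, which is what a test on summed clicks and impressions works with. The two can differ noticeably when rows have very different impression counts, as this sketch with hypothetical numbers shows:

```python
import numpy as np

# Hypothetical rows, chosen only to make the difference visible.
impressions = np.array([1000.0, 9000.0])
clicks = np.array([100.0, 90.0])

mean_of_row_rates = (clicks / impressions).mean()  # every row weighs the same
pooled_rate = clicks.sum() / impressions.sum()     # every impression weighs the same

print(round(mean_of_row_rates * 100, 2))  # 5.5
print(round(pooled_rate * 100, 2))        # 1.9
```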


In [18]:
# The total numbers of impressions and clicks are stored as arrays, with the control group first and the test group second.
number_of_impressions = np.array([control.Impression.sum(), test.Impression.sum()])
number_of_clicks = np.array([control.Click.sum(), test.Click.sum()])

In [19]:
number_of_impressions

array([4068457.96270789, 4820496.47030138])

In [20]:
number_of_clicks

array([204026.29490309, 158701.99043224])

In [21]:
# For this problem, the conversion rate is defined as click / impression (the click-through rate).
# H0: there is no statistically significant difference between the click-through rates of the test and control groups.
# H1: there is a difference.
test_stat, pvalue = proportions_ztest(count=number_of_clicks, nobs=number_of_impressions)
if pvalue < 0.05:
    print("Hypothesis is rejected.")
else:
    print("Hypothesis is accepted.")


Hypothesis is rejected.


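As a result, there is a statistically significant difference between the click-through rates of the test and control groups: maximum bidding has the higher rate (5.36% versus 3.42%).
Although maximum bidding attracts more clicks per impression, the number of purchases does not differ significantly between the two groups. Average bidding therefore reaches a similar number of purchases with fewer clicks, which suggests lower click costs.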
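The z-test on two proportions can also be written out by hand. The sketch below uses the pooled-rate form of the statistic, which to my understanding matches the default behaviour of `proportions_ztest`; treat that as an assumption. The inputs are the totals printed above, rounded to whole numbers, and `two_proportion_ztest` is a hypothetical helper.

```python
import math

def two_proportion_ztest(count_a, nobs_a, count_b, nobs_b):
    """Two-sided z-test for equality of two proportions, using the pooled rate."""
    p_a = count_a / nobs_a
    p_b = count_b / nobs_b
    p_pool = (count_a + count_b) / (nobs_a + nobs_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / nobs_a + 1 / nobs_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Totals from the arrays above, rounded to whole numbers (the rounding is an assumption).
z, p = two_proportion_ztest(204026, 4068458, 158702, 4820496)
print(f"control CTR = {204026 / 4068458:.4f}, test CTR = {158702 / 4820496:.4f}")
print(f"z = {z:.2f}, p-value = {p:.3g}")
```

With millions of impressions per group the standard error is tiny, so even a modest gap between the two rates yields a very large z value.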

In [22]:
# Let's create a function to automate the A/B test process.

def ab_test(df1, df2):
    # Each input is a single-column dataframe: control first, test second.
    col1 = df1.columns[0]
    col2 = df2.columns[0]

    # Outlier check: if there are outliers, replace them with the upper or lower thresholds.
    if has_outliers(df1, [col1]):
        replace_with_thresholds(df1, col1)
        print("Control dataframe has outliers and outliers have been replaced with upper or lower thresholds.")
    else:
        print("Control dataframe has no outliers.")
    if has_outliers(df2, [col2]):
        replace_with_thresholds(df2, col2)
        print("Test dataframe has outliers and outliers have been replaced with upper or lower thresholds.")
    else:
        print("Test dataframe has no outliers.")

    # Normality check (Shapiro-Wilk) on both groups.
    sh1 = shapiro(df1[col1])
    sh2 = shapiro(df2[col2])
    if sh1[1] > 0.05 and sh2[1] > 0.05:
        # Normality is not rejected: the Levene test decides whether equal variances are assumed.
        lh = stats.levene(df1[col1], df2[col2])
        test_stat, pvalue = stats.ttest_ind(df1[col1], df2[col2], equal_var=bool(lh[1] > 0.05))
    else:
        # Normality is rejected for at least one group: use the non-parametric Mann-Whitney U test.
        test_stat, pvalue = stats.mannwhitneyu(df1[col1], df2[col2], alternative="two-sided")

    if pvalue < 0.05:
        print("Hypothesis is rejected.")
    else:
        print("Hypothesis is accepted.")
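A variant that returns its decision instead of printing it can be easier to reuse and to test. The following is a sketch under stated assumptions: `choose_and_run_test` is a hypothetical name, it skips the outlier step, and the samples are synthetic rather than taken from the experiment.

```python
import numpy as np
from scipy import stats

def choose_and_run_test(a, b, alpha=0.05):
    """Pick a two-sample test from the assumption checks and return (test_name, p_value)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Normality of both samples decides between parametric and non-parametric tests.
    normal = stats.shapiro(a)[1] > alpha and stats.shapiro(b)[1] > alpha
    if not normal:
        p_value = stats.mannwhitneyu(a, b, alternative="two-sided")[1]
        return "Mann-Whitney U", float(p_value)

    # Variance homogeneity decides between the Student and Welch t-tests.
    equal_var = bool(stats.levene(a, b)[1] > alpha)
    p_value = stats.ttest_ind(a, b, equal_var=equal_var)[1]
    name = "Student t-test" if equal_var else "Welch t-test"
    return name, float(p_value)

# Synthetic bell-shaped samples with the same spread and different centers.
group_a = stats.norm.ppf(np.linspace(0.02, 0.98, 30), loc=100, scale=10)
group_b = stats.norm.ppf(np.linspace(0.02, 0.98, 30), loc=120, scale=10)
name, p_value = choose_and_run_test(group_a, group_b)
print(name, p_value)

# Synthetic skewed samples trigger the non-parametric branch.
skew_a = np.exp(np.linspace(0.0, 5.0, 30))
skew_name, skew_p = choose_and_run_test(skew_a, 2 * skew_a)
print(skew_name, skew_p)
```

Returning the name of the chosen test also makes it visible which branch produced a given p-value, something the printing version leaves implicit.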

In [23]:
# Lastly we are going to compare the earnings between the two datasets. 
# Total and mean values of earning are calculated as seen below:

print("Mean value of earning for control group: " + str(control["Earning"].mean()))
print("Mean value of earning for test group: " + str(test["Earning"].mean()))

Mean value of earning for control group: 1908.568299802749
Mean value of earning for test group: 2514.8907326506173


In [24]:
print(f"Total value of earning for control group: {control.Earning.sum()}")
print(f"Total value of earning for test group: {test.Earning.sum()}")

Total value of earning for control group: 76342.73199210996
Total value of earning for test group: 100595.6293060247


In [25]:
earn_control = control[["Earning"]]
earn_test = test[["Earning"]]

In [26]:
# H0: there is no statistically significant difference between the earnings of the control and test groups.
# H1: there is a difference.

ab_test(earn_control, earn_test)

Control dataframe has no outliers.
Test dataframe has no outliers.
Hypothesis is rejected.


The hypothesis is rejected, so there is a statistically significant difference between average and maximum bidding in terms of earnings.

The mean earning is also higher for average bidding (about 2514.89 versus 1908.57 for maximum bidding).

Based on the observations above and this last earnings comparison, I recommend average bidding.