--------------------------------------------------------------------------------------------------------------------------
# A/B Test Comparison of Conversion Between Bidding Methods
--------------------------------------------------------------------------------------------------------------------------

## Task 1: Data Preparation & Analysis

In [1]:
# import

import itertools
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# !pip install statsmodels
import statsmodels.stats.api as sms
from scipy.stats import ttest_1samp, shapiro, levene, ttest_ind, mannwhitneyu, \
    pearsonr, spearmanr, kendalltau, f_oneway, kruskal
from statsmodels.stats.proportion import proportions_ztest

In [10]:
# Step 1: Read the dataset ab_testing.xlsx, which contains the control and test group data.
# Assign the control and test group data to separate variables.

df_control = pd.read_excel("ab_testing.xlsx", sheet_name="Control Group")
df_test = pd.read_excel("ab_testing.xlsx", sheet_name="Test Group")
df_control["group"] = "Control"
df_test["group"] = "Test"

print("-"*50, "\n df_control head: \n", df_control.head())
print("-"*50, "\n df_test head: \n", df_test.head())

-------------------------------------------------- 
 df_control head: 
       Impression        Click    Purchase      Earning    group
0   82529.459271  6090.077317  665.211255  2311.277143  Control
1   98050.451926  3382.861786  315.084895  1742.806855  Control
2   82696.023549  4167.965750  458.083738  1797.827447  Control
3  109914.400398  4910.882240  487.090773  1696.229178  Control
4  108457.762630  5987.655811  441.034050  1543.720179  Control
-------------------------------------------------- 
 df_test head: 
       Impression        Click    Purchase      Earning group
0  120103.503796  3216.547958  702.160346  1939.611243  Test
1  134775.943363  3635.082422  834.054286  2929.405820  Test
2  107806.620788  3057.143560  422.934258  2526.244877  Test
3  116445.275526  4650.473911  429.033535  2281.428574  Test
4  145082.516838  5201.387724  749.860442  2781.697521  Test


In [3]:
# Step 2: Analyze the control and test group data.

print("-"*50, "\n df_control describe: \n", df_control.describe().T)
print("-"*50, "\n df_test describe: \n", df_test.describe().T)

-------------------------------------------------- 
 df_control describe: 
             count           mean           std           min           25%  \
Impression   40.0  101711.449068  20302.157862  45475.942965  85726.690349   
Click        40.0    5100.657373   1329.985498   2189.753157   4124.304129   
Purchase     40.0     550.894059    134.108201    267.028943    470.095533   
Earning      40.0    1908.568300    302.917783   1253.989525   1685.847205   

                     50%            75%            max  
Impression  99790.701078  115212.816543  147539.336329  
Click        5001.220602    5923.803596    7959.125069  
Purchase      531.206307     637.957088     801.795020  
Earning      1975.160522    2119.802784    2497.295218  
-------------------------------------------------- 
 df_test describe: 
             count           mean           std           min            25%  \
Impression   40.0  120512.411758  18807.448712  79033.834921  112691.970770   
Click        40.0

In [11]:
# Step 3: After the analysis, merge the control and test group data using the concat method.

df = pd.concat([df_control, df_test], keys=("Control", "Test_data"))
print(df)

                 Impression        Click    Purchase      Earning    group
Control   0    82529.459271  6090.077317  665.211255  2311.277143  Control
          1    98050.451926  3382.861786  315.084895  1742.806855  Control
          2    82696.023549  4167.965750  458.083738  1797.827447  Control
          3   109914.400398  4910.882240  487.090773  1696.229178  Control
          4   108457.762630  5987.655811  441.034050  1543.720179  Control
...                     ...          ...         ...          ...      ...
Test_data 35   79234.911929  6002.213585  382.047116  2277.863984     Test
          36  130702.239410  3626.320072  449.824592  2530.841327     Test
          37  116481.873365  4702.782468  472.453725  2597.917632     Test
          38   79033.834921  4495.428177  425.359102  2595.857880     Test
          39  102257.454089  4800.068321  521.310729  2967.518390     Test

[80 rows x 5 columns]


---------------------------------------------------------------------------------------------------------------------------
## Task 2: Defining the Hypotheses of the A/B Test

Group A = df_control: the maximum bidding method was applied to this group.

Group B = df_test: the average bidding method was applied to this group.

I want to analyze 3 distinct scenarios.
*  Hypothesis 1): 

    H0: df_control["Purchase"].mean() == df_test["Purchase"].mean() | The average purchase value is equal for 'maximum bidding' and 'average bidding'.
    
    H1: df_control["Purchase"].mean() != df_test["Purchase"].mean() | ..... not equal.

*   Hypothesis 2): 

    x = df_control["Purchase"] / df_control["Earning"] | y = df_test["Purchase"] / df_test["Earning"] 

    H0: x == y | 'Purchase'/'Earning' value is equal for 'maximum bidding' and 'average bidding'.
    
    H1: x != y | ..... not equal.

*   Hypothesis 3): 

    x = df_control["Click"] / df_control["Impression"] | y = df_test["Click"] / df_test["Impression"] 

    H0: x == y | 'Click'/'Impression' value is equal for 'maximum bidding' and 'average bidding'.
    
    H1: x != y | ..... not equal.
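
The ratios in Hypotheses 2 and 3 can also be computed per observation. The sketch below is only an illustration: it uses the first two rows of each group from the `head()` output above (rounded to two decimals), and the column names `purchase_per_earning` and `click_per_impression` are hypothetical names that do not appear in the original notebook.

```python
import pandas as pd

# First two rows of each group, copied from the head() output above (rounded).
sample = pd.DataFrame({
    "Impression": [82529.46, 98050.45, 120103.50, 134775.94],
    "Click": [6090.08, 3382.86, 3216.55, 3635.08],
    "Purchase": [665.21, 315.08, 702.16, 834.05],
    "Earning": [2311.28, 1742.81, 1939.61, 2929.41],
    "group": ["Control", "Control", "Test", "Test"],
})

# Hypothetical helper columns: one ratio per observation.
sample["purchase_per_earning"] = sample["Purchase"] / sample["Earning"]
sample["click_per_impression"] = sample["Click"] / sample["Impression"]

# Average ratio per group, i.e. the x and y of Hypotheses 2 and 3.
ratio_means = sample.groupby("group")[
    ["purchase_per_earning", "click_per_impression"]
].mean()
print(ratio_means)
```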

---------------------------------------------------------------------------------------------------------------------------
## Task 3: Performing the Hypothesis Test

In [5]:
# hypothesis test function
# This function was shown to us during the Data Science training I received from 'Miuul'. 
# I liked it a lot, so I wanted to use it here as well.

def AB_Test(dataframe, group, target):

    # Necessary packages
    from scipy.stats import shapiro
    import scipy.stats as stats

    # Split A/B
    control = dataframe[dataframe[group] == "Control"][target]  # Old design
    test = dataframe[dataframe[group] == "Test"][target]  # New design

    # Normality assumption (Shapiro-Wilk)
    # H0: Data follow a normal distribution.
    # H1: Data do not follow a normal distribution.
    # Each flag is True when H0 is rejected, i.e. when normality is violated.
    normality_control = shapiro(control)[1] < 0.05
    normality_test = shapiro(test)[1] < 0.05

    if (normality_control == False) & (normality_test == False):  # Both groups look normal
        # Parametric Test
        # Assumption: Homogeneity of variances (Levene's test)
        # H0: Variances are homogeneous.   -> leveneTest is False
        # H1: Variances are heterogeneous. -> leveneTest is True
        leveneTest = stats.levene(control, test)[1] < 0.05

        if leveneTest == False:
            # Homogeneity
            ttest = stats.ttest_ind(control, test, equal_var=True)[1] # Attention! equal_var=True
            # H0: M1 == M2 - False
            # H1: M1 != M2 - True
        else:
            # Heterogeneous
            ttest = stats.ttest_ind(control, test, equal_var=False)[1] #Attention! equal_var=False
            # H0: M1 == M2 - False
            # H1: M1 != M2 - True
    else:
        # Non-Parametric Test (Mann-Whitney U); `ttest` holds the p-value here as well
        ttest = stats.mannwhitneyu(control, test, alternative="two-sided")[1]
        # H0: M1 == M2 - False
        # H1: M1 != M2 - True

    # Result
    temp = pd.DataFrame({
        "AB Hypothesis": [ttest < 0.05],
        "p-value": [ttest]
    })
    temp["Test Type"] = np.where((normality_control == False) & (normality_test == False), "Parametric", "Non-Parametric")
    temp["AB Hypothesis"] = np.where(temp["AB Hypothesis"] == False, "Fail to Reject H0", "Reject H0")
    temp["Comment"] = np.where(temp["AB Hypothesis"] == "Fail to Reject H0", "A/B groups are similar!",
                               "A/B groups are not similar!")

    # Columns
    if (normality_control == False) & (normality_test == False):
        temp["Homogeneity"] = np.where(leveneTest == False, "Yes", "No")
        temp = temp[["Test Type", "Homogeneity", "AB Hypothesis", "p-value", "Comment"]]
    else:
        temp = temp[["Test Type", "AB Hypothesis", "p-value", "Comment"]]

    # Print Hypothesis
    print("# A/B Testing Hypothesis")
    print("H0: A == B")
    print("H1: A != B", "\n")

    return temp

#### Hypothesis 1)

In [12]:
AB_Test(df, "group", "Purchase")

# A/B Testing Hypothesis
H0: A == B
H1: A != B 



    Test Type  Homogeneity      AB Hypothesis   p-value                  Comment
0  Parametric          Yes  Fail to Reject H0  0.349326  A/B groups are similar!


#### Hypothesis 2)

In [8]:
# We can apply the two-proportion z-test because each group has more than 30 observations.

test_stat, pvalue = proportions_ztest(count=[df_control["Purchase"].mean(), df_test["Purchase"].mean()],
                                      nobs=[df_control["Earning"].mean(), df_test["Earning"].mean()])

print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 4.3151, p-value = 0.0000


#### Hypothesis 3)

In [9]:
test_stat, pvalue = proportions_ztest(count=[df_control["Click"].mean(), df_test["Click"].mean()],
                                      nobs=[df_control["Impression"].mean(), df_test["Impression"].mean()])

print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 20.4489, p-value = 0.0000
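
To make the statistic behind these two calls easier to follow, here is a minimal, dependency-free sketch of a pooled two-proportion z-test, which is my understanding of what `proportions_ztest` computes with its default settings. The function name `two_proportion_ztest` and the input counts are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the dataset.

```python
import math

def two_proportion_ztest(count1, nobs1, count2, nobs2):
    """Pooled two-sample z-test for proportions (two-sided)."""
    p1 = count1 / nobs1
    p2 = count2 / nobs2
    # Pooled proportion under H0: p1 == p2
    pooled = (count1 + count2) / (nobs1 + nobs2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / nobs1 + 1 / nobs2))
    z = (p1 - p2) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts, used only to illustrate the calculation.
z_stat, p_val = two_proportion_ztest(50, 100, 30, 100)
print("Test Stat = %.4f, p-value = %.4f" % (z_stat, p_val))
```

Because the statistic is built from `p1 - p2`, a positive value means the first group passed in (the control group in the calls above) has the larger ratio.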


---------------------------------------------------------------------------------------------------------------------------
## Task 4: Analysis of Results

#### Step 1: Can you interpret the results?
* Hypothesis 1):
  
  Since the p-value (0.3493) is greater than 0.05, the null hypothesis (H0) is not rejected.

* Hypothesis 2):
  
  Since the p-value is less than 0.05, the null hypothesis (H0) is rejected, and the alternative hypothesis (H1) is accepted.

* Hypothesis 3):
  
  Since the p-value is less than 0.05, the null hypothesis (H0) is rejected, and the alternative hypothesis (H1) is accepted.
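
The decision rule applied to all three hypotheses can be written as a short sketch. The helper name `decide` is hypothetical; the significance level of 0.05 and the p-values are the ones reported in the outputs above (0.0000 is the rounded printout, not an exact value).

```python
ALPHA = 0.05  # significance level used throughout the notebook

def decide(p_value, alpha=ALPHA):
    """Return the test decision for a given p-value."""
    return "Reject H0" if p_value < alpha else "Fail to Reject H0"

# p-values as printed in the notebook outputs.
reported = {
    "Hypothesis 1": 0.349326,
    "Hypothesis 2": 0.0000,
    "Hypothesis 3": 0.0000,
}
for name, p in reported.items():
    print(name, "->", decide(p))
```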


#### Step 2: Based on the test results, provide recommendations to the customer.

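We can say that the Average Bidding model changed the Purchase/Earning ratio significantly over the one-month period: the positive test statistic in Hypothesis 2 means the ratio is lower under Average Bidding, i.e. earnings per purchase were higher. We also observed an increase in sales during this period, but Hypothesis 1 shows that the difference in average purchases is not statistically significant (p ≈ 0.35). In addition, although our ads were seen by more customers, the click-through rate per impression decreased (Hypothesis 3).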

I think we need to perform another A/B test with more observation units to decide whether to continue using the Average Bidding model or not.
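
As a rough guide to how many observations a follow-up test might need, the sketch below uses the standard normal approximation for a two-sided, two-sample comparison of means. The function name `approx_n_per_group` and the effect sizes are assumptions for illustration only; the real effect size would have to be estimated from the data, and an exact t-based calculation would give slightly larger numbers.

```python
import math
from statistics import NormalDist

def approx_n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample test of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    where d is the standardized effect size (Cohen's d).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Hypothetical effect sizes (small, medium, large).
for d in (0.2, 0.5, 0.8):
    print("d = %.1f -> about %d observations per group" % (d, approx_n_per_group(d)))
```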