### BUSINESS PROBLEM
- Facebook recently introduced a new bidding type, "average bidding", as an alternative to the existing bidding type, "maximum bidding". One of our clients, bombabomba.com, decided to test this new feature and wants to run an A/B test to see whether average bidding increases conversions compared with maximum bidding.
- The A/B test has been running for 1 month, and bombabomba.com is now waiting for you to analyze its results.
- The success criterion for bombabomba.com is Purchase. Therefore, the statistical tests should focus on the Purchase metric.

#### DATASET
- This dataset contains a company's website information: the number of advertisements that users saw and clicked, as well as the earnings those advertisements generated. There are two separate datasets, one for the control group and one for the test group, stored on separate sheets of the ab_testing.xlsx Excel file. Maximum bidding was applied to the control group and average bidding was applied to the test group.

- Impression: Number of advertisement views
- Click: Number of clicks on the displayed advertisements
- Purchase: Number of products purchased after the advertisements were clicked
- Earning: Earnings from the purchased products

#### A/B TESTING (Independent Two-Sample T-Test)
- 1. State the null (H0) and alternative (H1) hypotheses
- 2. Check the assumptions
    * Normal distribution (Shapiro-Wilk test, `shapiro`)
    * Homogeneity of variance (Levene test, `levene`)
- 3. Run the hypothesis test
    * Use the independent two-sample t-test (`ttest_ind`) if the assumptions are met
    * Use the Mann-Whitney U test (`mannwhitneyu`) if the assumptions are not met
- 4. Interpret the results according to the p-value
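
The steps above can be sketched as a single helper. This is a minimal illustration, not the notebook's own code: the function name `compare_groups` and the synthetic data are assumptions, and the fallback to Welch's t-test (`equal_var=False`) when only the variance assumption fails is a common refinement that the list above does not spell out.

```python
import numpy as np
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu


def compare_groups(a, b, alpha=0.05):
    """Pick and run a two-sample test following the steps listed above.

    Returns (test_name, statistic, p_value). Hypothetical helper.
    """
    # Step 2a: Shapiro-Wilk normality check on each group separately.
    normal = shapiro(a).pvalue > alpha and shapiro(b).pvalue > alpha
    if not normal:
        # Normality rejected for at least one group: non-parametric test.
        stat, p = mannwhitneyu(a, b, alternative="two-sided")
        return "mannwhitneyu", stat, p

    # Step 2b: Levene test for homogeneity of variance.
    equal_var = bool(levene(a, b).pvalue > alpha)

    # Step 3: Student's t-test if variances look equal, Welch's otherwise
    # (the Welch branch is an assumption added for this sketch).
    stat, p = ttest_ind(a, b, equal_var=equal_var)
    return ("t-test" if equal_var else "welch t-test"), stat, p


# Synthetic data for illustration only (not the ab_testing.xlsx values).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=550, scale=130, size=40)
group_b = rng.normal(loc=580, scale=160, size=40)

name, stat, p = compare_groups(group_a, group_b)
print(name, round(float(stat), 4), round(float(p), 4))
```

Step 4 then reduces to comparing the returned p-value with `alpha`: H0 is rejected only when the p-value is below it.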

#### TASK1: Prepare the dataset

In [1]:
import pandas as pd
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu
from scipy import stats

In [2]:
pd.set_option('display.float_format', lambda x: '%.5f' % x)

In [3]:
# Control group: maximum bidding
control_data = pd.read_excel("ab_testing.xlsx", sheet_name="Control Group")

# Test group: average bidding
test_data = pd.read_excel("ab_testing.xlsx", sheet_name="Test Group")

In [4]:
def check_df(dataframe, head=5):
    print("INFO".center(70,'='))
    print(dataframe.info())

    print("SHAPE".center(70,'='))
    print('Rows: {}'.format(dataframe.shape[0]))
    print('Columns: {}'.format(dataframe.shape[1]))

    print("TYPES".center(70,'='))
    print(dataframe.dtypes)

    print("HEAD".center(70, '='))
    print(dataframe.head(head))

    print("TAIL".center(70,'='))
    print(dataframe.tail(head))

    print("NULL".center(70,'='))
    print(dataframe.isnull().sum())

    print("QUANTILES".center(70,'='))
    print(dataframe.describe().T)

In [5]:
check_df(control_data)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40 entries, 0 to 39
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Impression  40 non-null     float64
 1   Click       40 non-null     float64
 2   Purchase    40 non-null     float64
 3   Earning     40 non-null     float64
dtypes: float64(4)
memory usage: 1.4 KB
None
Rows: 40
Columns: 4
Impression    float64
Click         float64
Purchase      float64
Earning       float64
dtype: object
    Impression      Click  Purchase    Earning
0  82529.45927 6090.07732 665.21125 2311.27714
1  98050.45193 3382.86179 315.08489 1742.80686
2  82696.02355 4167.96575 458.08374 1797.82745
3 109914.40040 4910.88224 487.09077 1696.22918
4 108457.76263 5987.65581 441.03405 1543.72018
     Impression      Click  Purchase    Earning
35 132064.21900 3747.15754 551.07241 2256.97559
36  86409.94180 4608.25621 345.04603 1781.35769
37 123678.93423 3649.07379 476.16813 2187.72122
38 101997.4

In [6]:
check_df(test_data)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40 entries, 0 to 39
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Impression  40 non-null     float64
 1   Click       40 non-null     float64
 2   Purchase    40 non-null     float64
 3   Earning     40 non-null     float64
dtypes: float64(4)
memory usage: 1.4 KB
None
Rows: 40
Columns: 4
Impression    float64
Click         float64
Purchase      float64
Earning       float64
dtype: object
    Impression      Click  Purchase    Earning
0 120103.50380 3216.54796 702.16035 1939.61124
1 134775.94336 3635.08242 834.05429 2929.40582
2 107806.62079 3057.14356 422.93426 2526.24488
3 116445.27553 4650.47391 429.03353 2281.42857
4 145082.51684 5201.38772 749.86044 2781.69752
     Impression      Click  Purchase    Earning
35  79234.91193 6002.21358 382.04712 2277.86398
36 130702.23941 3626.32007 449.82459 2530.84133
37 116481.87337 4702.78247 472.45373 2597.91763
38  79033.8

In [7]:
df = pd.concat([control_data, test_data], ignore_index=True)
df

     Impression      Click  Purchase    Earning
0   82529.45927 6090.07732 665.21125 2311.27714
1   98050.45193 3382.86179 315.08489 1742.80686
2   82696.02355 4167.96575 458.08374 1797.82745
3  109914.40040 4910.88224 487.09077 1696.22918
4  108457.76263 5987.65581 441.03405 1543.72018
..          ...        ...       ...        ...
75  79234.91193 6002.21358 382.04712 2277.86398
76 130702.23941 3626.32007 449.82459 2530.84133
77 116481.87337 4702.78247 472.45373 2597.91763
78  79033.83492 4495.42818 425.35910 2595.85788
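
Once the two frames are stacked, nothing in the combined frame records which group a row came from. One possible way to keep that information is to tag each frame before concatenating. The sketch below uses tiny made-up frames, and the column name `Group` and the variable names are assumptions, not part of the original dataset.

```python
import pandas as pd

# Tiny stand-in frames; the values are made up for illustration.
control_demo = pd.DataFrame({"Purchase": [500.0, 600.0], "Earning": [1800.0, 2000.0]})
test_demo = pd.DataFrame({"Purchase": [550.0, 650.0], "Earning": [2400.0, 2600.0]})

# Tag each frame before stacking so the group label survives the concat.
labeled = pd.concat(
    [control_demo.assign(Group="control"), test_demo.assign(Group="test")],
    ignore_index=True,
)

# Per-group summary of the success metric.
summary = labeled.groupby("Group")["Purchase"].agg(["count", "mean", "std"])
print(summary)
```

With the label in place, a single `groupby` gives the per-group statistics without going back to the separate frames.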


#### TASK2: DETERMINE HYPOTHESES
* H0: M1 = M2   THERE IS NO STATISTICALLY SIGNIFICANT DIFFERENCE BETWEEN THE PURCHASE AVERAGES OF MAXIMUM BIDDING (M1) AND AVERAGE BIDDING (M2)
* H1: M1 != M2  THERE IS A STATISTICALLY SIGNIFICANT DIFFERENCE BETWEEN THE PURCHASE AVERAGES OF MAXIMUM BIDDING (M1) AND AVERAGE BIDDING (M2)
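
For reference, when both assumptions hold these hypotheses are usually tested with the pooled-variance (Student) two-sample t statistic. The notation is an assumption introduced here: $\bar{x}_i$, $s_i^2$ and $n_i$ stand for the sample mean, sample variance and size of group $i$.

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}
$$

Under H0 the statistic follows a t distribution with $n_1 + n_2 - 2$ degrees of freedom, which is 78 for two groups of 40 observations.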

In [8]:
# Analyze control and test data according to purchase average
control_data["Purchase"].mean()

550.8940587702316

In [9]:
test_data["Purchase"].mean()

582.1060966484677
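
To put the two means side by side, the gap can be expressed as an absolute difference and as a relative lift over the control group. This is a small sketch using the two values printed above; the variable names are assumptions.

```python
# Means copied from the two outputs above.
control_mean = 550.8940587702316  # maximum bidding (control group)
test_mean = 582.1060966484677     # average bidding (test group)

# Absolute gap in mean Purchase and the same gap relative to control.
abs_diff = test_mean - control_mean
rel_lift = abs_diff / control_mean

print(f"Absolute difference: {abs_diff:.2f}")
print(f"Relative lift: {rel_lift:.2%}")
```

The test group's mean is about 31 purchases higher, roughly a 5.7% lift. Whether a gap of that size could plausibly arise by chance is what the hypothesis test has to decide.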

In [10]:
#H0: THE DATA IS NORMALLY DISTRIBUTED
def normal_distribution(df, alpha=0.05):
    test_stat, pvalue = shapiro(df["Purchase"])
    if pvalue > alpha:
        result = "P > "+ str(alpha) + " H0 not rejected"
    else:
        result = "P < "+ str(alpha) + " H0 rejected"

    print("--------------------------")
    print(f"Test Statistic: {test_stat}")
    print(f"P-value: {pvalue}")
    print(f"Result: {result}")
    print("--------------------------")

In [11]:
normal_distribution(control_data)

--------------------------
Test Statistic: 0.9772694110870361
P-value: 0.5891125202178955
Result: P > 0.05 H0 not rejected
--------------------------


In [12]:
normal_distribution(test_data)

--------------------------
Test Statistic: 0.9589453935623169
P-value: 0.15413342416286469
Result: P > 0.05 H0 not rejected
--------------------------


In [13]:
#H0: THE VARIANCE IS HOMOGENEOUS
def variance_homogeneity(df, df2, alpha=0.05):
    test_stat, pvalue = levene(df["Purchase"],df2["Purchase"])
    if pvalue > alpha:
        result = "P > "+ str(alpha) + " H0 not rejected"
    else:
        result = "P < "+ str(alpha) + " H0 rejected"

    print("--------------------------")
    print(f"Test Statistic: {test_stat}")
    print(f"P-value: {pvalue}")
    print(f"Result: {result}")
    print("--------------------------")

In [14]:
variance_homogeneity(control_data,test_data)

--------------------------
Test Statistic: 2.6392694728747363
P-value: 0.10828588271874791
Result: P > 0.05 H0 not rejected
--------------------------


In [15]:
# H0: M1 = M2 (THE PURCHASE AVERAGES OF THE TWO GROUPS ARE EQUAL)
def t_test(df, df2, alpha=0.05):
    test_stat, pvalue = ttest_ind(df["Purchase"],df2["Purchase"], equal_var=True)
    if pvalue > alpha:
        result = "P > "+ str(alpha) + " H0 not rejected"
    else:
        result = "P < "+ str(alpha) + " H0 rejected"

    print("--------------------------")
    print(f"Test Statistic: {test_stat}")
    print(f"P-value: {pvalue}")
    print(f"Result: {result}")
    print("--------------------------")

In [16]:
t_test(control_data,test_data)

--------------------------
Test Statistic: -0.9415584300312966
P-value: 0.34932579202108416
Result: P > 0.05 H0 not rejected
--------------------------
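
The statistic is negative because the first argument (the control group) has the lower mean. To make it clearer what `ttest_ind(..., equal_var=True)` computes, the sketch below reproduces the pooled-variance t-test by hand and compares it with SciPy. The helper name `pooled_t_test` and the synthetic data are assumptions for illustration; they are not the notebook's data.

```python
import numpy as np
from scipy.stats import t as t_dist, ttest_ind


def pooled_t_test(a, b):
    """Two-sided Student t-test with pooled variance, computed by hand."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    dof = n1 + n2 - 2
    # Pooled variance from the two unbiased sample variances.
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / dof
    t_stat = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Two-sided p-value from the t distribution's survival function.
    p_value = 2 * t_dist.sf(abs(t_stat), dof)
    return t_stat, p_value


# Synthetic data for illustration only.
rng = np.random.default_rng(42)
x = rng.normal(550, 130, size=40)
y = rng.normal(580, 160, size=40)

manual = pooled_t_test(x, y)
reference = ttest_ind(x, y, equal_var=True)
print(manual)
print(reference.statistic, reference.pvalue)
```

Both lines should print the same statistic and p-value up to floating-point error.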


#### H0 not rejected (p = 0.349 > 0.05). THERE IS NO STATISTICALLY SIGNIFICANT DIFFERENCE BETWEEN THE PURCHASE AVERAGES OF MAXIMUM BIDDING AND AVERAGE BIDDING