# **AB TESTING PROJECT**

### **BUSINESS PROBLEM**:  A social media  company recently introduced a new type of bidding ,average bidding, as an alternative to existing maximum bidding. One of our customers decided to test this new feature and wants to implement A/B Test in order to see whether average bidding brings more revenue than maximum bidding.  

### **STORY OF THE DATASET**: In "ab_testing.xlsx" dataset, which includes website information of the customer, there is information such as the number of advertisements that users see and click, as well as earnings information from here.

### We have two groups in the dataset as control group and test group

### **ATTRIBUTES**:

#### - Impression: Number of ad views
#### - Click: Indicates the number of clicks on the displayed ad
#### - Purchase: Indicates the number of products purchased after the ads clicked
#### - Earning: Earnings after purchased products

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/ab-testingxlsx/ab_testing.xlsx


## **1. LOAD AND CHECK DATA SET** 

In [2]:
# Let's load the data set 
import pandas as pd
!pip install openpyxl
df_control = pd.read_excel("../input/ab-testingxlsx/ab_testing.xlsx", sheet_name = "Control Group")
df_test = pd.read_excel("../input/ab-testingxlsx/ab_testing.xlsx", sheet_name = "Test Group")

Collecting openpyxl
  Downloading openpyxl-3.0.9-py2.py3-none-any.whl (242 kB)
[K     |████████████████████████████████| 242 kB 596 kB/s 
[?25hCollecting et-xmlfile
  Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.0.9


In [3]:
# Make necessary adjustments for the representation of the dataset
import itertools
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.stats.api as sms
from scipy.stats import ttest_1samp, shapiro, levene, ttest_ind, mannwhitneyu
from statsmodels.stats.proportion import proportions_ztest

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 10)
pd.set_option('display.float_format', lambda x: '%.5f' % x)

In [4]:
# In order to check Control Group and Test Group 
def check_df(dataframe):
    """
    This function prints "shape, types, head, NA values and quantiles" for a given dataframe. 
    
    Parameters
    ----------
    dataframe: dataframe
        Given dataframe for which "shape, types, head, NA values and quantiles" will be shown. 
    
    
    Returns
    -------
    None
        
    """
    print(f"""
        ##################### Shape #####################\n\n\t{dataframe.shape}\n\n
        ##################### Types #####################\n\n{dataframe.dtypes}\n\n
        ##################### Head #####################\n\n{dataframe.head(3)}\n\n
        ##################### NA #####################\n\n{dataframe.isnull().sum()}\n\n
        ##################### Quantiles #####################\n\n{dataframe.quantile([0, 0.05, 0.25, 0.50, 0.75, 0.95, 0.99, 1]).T}\n\n""")


In [5]:
# Review Control Group
check_df(df_control)


        ##################### Shape #####################

	(40, 4)


        ##################### Types #####################

Impression    float64
Click         float64
Purchase      float64
Earning       float64
dtype: object


        ##################### Head #####################

   Impression      Click  Purchase    Earning
0 82529.45927 6090.07732 665.21125 2311.27714
1 98050.45193 3382.86179 315.08489 1742.80686
2 82696.02355 4167.96575 458.08374 1797.82745


        ##################### NA #####################

Impression    0
Click         0
Purchase      0
Earning       0
dtype: int64


        ##################### Quantiles #####################

               0.00000     0.05000     0.25000     0.50000      0.75000  \
Impression 45475.94296 79412.01792 85726.69035 99790.70108 115212.81654   
Click       2189.75316  3367.48426  4124.30413  5001.22060   5923.80360   
Purchase     267.02894   328.66242   470.09553   531.20631    637.95709   
Earning     1253.98952  

In [6]:
# Review Test Group
check_df(df_test)


        ##################### Shape #####################

	(40, 4)


        ##################### Types #####################

Impression    float64
Click         float64
Purchase      float64
Earning       float64
dtype: object


        ##################### Head #####################

    Impression      Click  Purchase    Earning
0 120103.50380 3216.54796 702.16035 1939.61124
1 134775.94336 3635.08242 834.05429 2929.40582
2 107806.62079 3057.14356 422.93426 2526.24488


        ##################### NA #####################

Impression    0
Click         0
Purchase      0
Earning       0
dtype: int64


        ##################### Quantiles #####################

               0.00000     0.05000      0.25000      0.50000      0.75000  \
Impression 79033.83492 83150.50378 112691.97077 119291.30077 132050.57893   
Click       1836.62986  2600.36102   3376.81902   3931.35980   4660.49791   
Purchase     311.62952   356.69540    444.62683    551.35573    699.86236   
Earning     

## **2: DEFINE THE HYPOTHESIS FOR A/B TESTING AND IMPLEMENT RELEVANT TESTS**

#### **Remember the Business Problem**:  A social media company recently introduced a new type of bidding ,average bidding, as an alternative to existing maximum bidding. One of our customers decided to test this new feature and wants to implement A/B Test in order to see whether average bidding brings more revenue than maximum bidding.  

#### In order to solve this business problem, we can check different variables or create new features to examine the difference between existing bidding sytle and new bidding style.

#### According to our dataset;   
 * Control Group represents the existing bidding style which is "maximum bidding"
 * Test Group represent the new bidding style which is "average bidding"

#### Considering the variables we see in Control Group and Test Group, let's think about below questions which can gives us insights: 
* Is there a significant difference on "Click" values between Control Group (max.bidding) and Test Group (avg.bidding)? 
* Is there a significant difference on "Click/Impression" rate between Control Group (max.bidding) and Test Group (avg.bidding)? 
* Is there a significant difference on "Purchase" values between Control Group (max.bidding) and Test Group (avg.bidding)? 
* Is there a significant difference on "Earning" values between Control Group (max.bidding) and Test Group (avg.bidding)? 
* Is there a significant difference on "Purchase/Click" rate between Control Group (max.bidding) and Test Group (avg.bidding)?
* Is there a significant difference on "Purchase/Earning" rate between Control Group (max.bidding) and Test Group (avg.bidding)?

### **2.1. Is there a significant difference on "Click" values between Control Group and Test Group?**

In [7]:
df_control["Click"].describe().T

count     40.00000
mean    5100.65737
std     1329.98550
min     2189.75316
25%     4124.30413
50%     5001.22060
75%     5923.80360
max     7959.12507
Name: Click, dtype: float64

In [8]:
df_test["Click"].describe().T

count     40.00000
mean    3967.54976
std      923.09507
min     1836.62986
25%     3376.81902
50%     3931.35980
75%     4660.49791
max     6019.69508
Name: Click, dtype: float64

#### As seen from the Descriptive Statistics of Control Group and Test Group, 
* There is a clear difference between "Click" values of Control Group (Please see min, max, 50%, mean values) and "Click" values of Test Group. 
* There is no need for testing this situation. 

#### "Click" values of Control Group (max.bidding) is higher than Test Group (avg.bidding).

#### **NOTE:** Examining only "Click" values will not give us a sufficient result. So, we will continue with other questions. 

### **2.2. Is there a significant difference on "Click/Impression" rate between Control Group and Test Group?** 

#### Click/Impression rate is important for us.  
#### Click/Impression = (number of clicks on the displayed ad) / (Number of ad views)

In [9]:
# Calculate "Click/Impression" rate for Control Group 
df_control["Click/Impression Rate"] = df_control["Click"]/df_control["Impression"]
df_control.head()

Unnamed: 0,Impression,Click,Purchase,Earning,Click/Impression Rate
0,82529.45927,6090.07732,665.21125,2311.27714,0.07379
1,98050.45193,3382.86179,315.08489,1742.80686,0.0345
2,82696.02355,4167.96575,458.08374,1797.82745,0.0504
3,109914.4004,4910.88224,487.09077,1696.22918,0.04468
4,108457.76263,5987.65581,441.03405,1543.72018,0.05521


In [10]:
# Calculate "Click/Impression" rate for Test Group 
df_test["Click/Impression Rate"] = df_test["Click"]/df_test["Impression"]
df_test.head()

Unnamed: 0,Impression,Click,Purchase,Earning,Click/Impression Rate
0,120103.5038,3216.54796,702.16035,1939.61124,0.02678
1,134775.94336,3635.08242,834.05429,2929.40582,0.02697
2,107806.62079,3057.14356,422.93426,2526.24488,0.02836
3,116445.27553,4650.47391,429.03353,2281.42857,0.03994
4,145082.51684,5201.38772,749.86044,2781.69752,0.03585


In [11]:
# Descriptive Statistics for "Click/Impression Rate" of Control Group
df_control["Click/Impression Rate"].describe().T

count   40.00000
mean     0.05362
std      0.02485
min      0.02076
25%      0.03922
50%      0.04880
75%      0.05799
max      0.16207
Name: Click/Impression Rate, dtype: float64

In [12]:
# Descriptive Statistics for "Click/Impression Rate" of Test Group
df_test["Click/Impression Rate"].describe().T

count   40.00000
mean     0.03418
std      0.01226
min      0.01473
25%      0.02816
50%      0.03136
75%      0.03726
max      0.07575
Name: Click/Impression Rate, dtype: float64

#### **According to the results of Descriptive Statistics of "Click/Impression" rate:** 
* There are some differences between Control Group and Test Group.
* We need to test this in order to see whether there is a significant difference. 

#### **A. Test of Normality**
#### H0: The assumption of normal distribution is provided.
#### H1: The assumption of normal distribution is not provided.

In [13]:
# Test of Normality for Control Group
test_stat, pvalue = shapiro(df_control["Click/Impression Rate"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.8072, p-value = 0.0000


In [14]:
# Test of Normality for Test Group
test_stat, pvalue = shapiro(df_test["Click/Impression Rate"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.8415, p-value = 0.0001


#### **According to the results of the Test of Normality:** 
* p-values of Control Group and Test Group < 0.05.
* This means that **we reject HO hypothesis**. 
* So, the assumption of normal distribution is not provided.
* As the test of normality is not provided, we will skip the variance homogeneity test and continue with mannwhitneyu non-parametric test. 

#### **B. Mannwhitneyu Non-parametric Test**
#### H0: M1 = M2 (There is no statistically significant difference between "Click/Impression" rate of Control Group and Test Group)
#### H1: M1 != M2 (There is statistically significant difference between "Click/Impression" rate of Control Group and Test Group)

In [15]:
# Mannwhitneyu Non-parametric Test 
test_stat, pvalue = mannwhitneyu(df_control["Click/Impression Rate"],
                                 df_test["Click/Impression Rate"])

print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 1308.0000, p-value = 0.0000


#### **According to the results of Mannwhitneyu Non-parametric Test:** 
* p-value < 0.05
* This means that **we reject HO hypothesis**. 
* So, there is statistically significant difference between "Click/Impression" rate of Control Group and Test Group.
* When we consider the descriptive statistics of "Click/Impression Rate", Control Group (maximum bidding) has higher rates than Test Group.
* This gives an impression that our customer should use max. bidding instead of average bidding. Let's do not hurry up to decide and dive in other features (Purchase, Earning).

### **2.3. Is there a significant difference on "Purchase" values between Control Group and Test Group?** 

In [16]:
# Control Group
df_control["Purchase"].describe().T

count    40.00000
mean    550.89406
std     134.10820
min     267.02894
25%     470.09553
50%     531.20631
75%     637.95709
max     801.79502
Name: Purchase, dtype: float64

In [17]:
# Test Group
df_test["Purchase"].describe().T

count    40.00000
mean    582.10610
std     161.15251
min     311.62952
25%     444.62683
50%     551.35573
75%     699.86236
max     889.91046
Name: Purchase, dtype: float64

#### Descriptive Statistics for Test Group seem to be higher than Control Group. 
#### Let's check whether this difference is statistically significant or not. 

#### **C. Test of Normality**
#### H0: The assumption of normal distribution is provided.
#### H1: The assumption of normal distribution is not provided.

In [18]:
# Test of Normality for Control Group
test_stat, pvalue = shapiro(df_control["Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9773, p-value = 0.5891


In [19]:
# Test of Normality for Test Group
test_stat, pvalue = shapiro(df_test["Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9589, p-value = 0.1541


#### **According to the results of the Test of Normality:** 
* p-values of Control Group and Test Group > 0.05.
* This means that **we can not reject HO hypothesis**. 
* So, the assumption of normal distribution is provided.
* We will continue with the test of variance homogeneity.

#### **D. Test of Variance Homogeneity**
#### H0: Variances are homogeneous.
#### H1: Variances are not homogeneous.

In [20]:
test_stat, pvalue = levene(df_control["Purchase"],df_test["Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 2.6393, p-value = 0.1083


### **According to the results of Levene Test:** 
* p-value > 0.05.
* This means that **we can not reject HO hypothesis**.
* So, variances are homogeneous.
* Both assumptions for normality and variance homogeneity are provided. 
* In this case, we will continue with independent two-sample t-test.

#### **E. Independent Two-Sample T-Test**
#### H0: M1 = M2 (There is no statistically significant difference between the means of "Purchase" of Control Group and Test Group.)
#### H1: M1 != M2 (There is statistically significant difference between the means of "Purchase" of Control Group and Test Group)

In [21]:
test_stat, pvalue = ttest_ind(df_control["Purchase"],df_test["Purchase"],
                              equal_var=True)

print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = -0.9416, p-value = 0.3493


#### **According to the results of Independent Two-Sample T-Test:** 
* p-value > 0.05.
* This means **we can not reject H0 hypothesis**.
* So, there is no statistically significant difference between the means of "Purchase" of Control Group and Test Group.

### **2.4. Is there a significant difference on "Earning" values between Control Group and Test Group?** 

In [22]:
# Control Group
df_control["Earning"].describe().T

count     40.00000
mean    1908.56830
std      302.91778
min     1253.98952
25%     1685.84720
50%     1975.16052
75%     2119.80278
max     2497.29522
Name: Earning, dtype: float64

In [23]:
# Test Group
df_test["Earning"].describe().T

count     40.00000
mean    2514.89073
std      282.73085
min     1939.61124
25%     2280.53743
50%     2544.66611
75%     2761.54540
max     3171.48971
Name: Earning, dtype: float64

#### Descriptive Statistics for Test Group seem to be higher than Control Group. 
#### Let's check whether this difference is statistically significant or not. 

#### **F. Test of Normality**
#### H0: The assumption of normal distribution is provided.
#### H1: The assumption of normal distribution is not provided.

In [24]:
# Control Group
test_stat, pvalue = shapiro(df_control["Earning"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9756, p-value = 0.5306


In [25]:
# Test Group
test_stat, pvalue = shapiro(df_test["Earning"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9780, p-value = 0.6163


#### **According to the results:** 
* p-values of Control Group and Test Group > 0.05.
* This means that **we can not reject HO hypothesis**. 
* So, the assumption of normal distribution is provided.
* We will continue with the test of variance homogeneity

#### **G. Test of Variance Homogeneity**
#### H0: Variances are homogeneous.
#### H1: Variances are not homogeneous.

In [26]:
test_stat, pvalue = levene(df_control["Earning"],df_test["Earning"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.3532, p-value = 0.5540


#### **According to the results of Levene test:** 
* p-value > 0.05.
* This means that **we can not reject HO hypothesis**.
* So, variances are homogeneous.
* Both assumptions for normality and variance homogeneity are provided. 
* In this case, we will continue with independent two-sample t-test.

#### H. Independent Two-Sample T-Test 

#### H0: M1 = M2 (There is no statistically significant difference between the means of "Earning" of Control Group and Test Group).
#### H1: M1 != M2 (There is statistically significant difference between the means of "Earning" of Control Group and Test Group).

In [27]:
test_stat, pvalue = ttest_ind(df_control["Earning"],df_test["Earning"],
                              equal_var=True)

print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = -9.2545, p-value = 0.0000


#### **According to the results of Independent Two-Sample T-Test:** 
* p-value < 0.05.
* This means **we reject H0 hypothesis**.
* So, there is statistically significant difference between the means of "Earning the two groups.

### **2.5. Is there a significant difference on "Purchase/Click" rate between Control Group and Test Group?** 

#### Purchase/Click rate can give us an insight.  
#### Purchase/Click = (Number of products purchased after the ads clicked) / (number of clicks on the displayed ad) 

In [28]:
# Calculate "Purchase/Click" rate for Control Group 
df_control["Purchase/Click Rate"] = df_control["Purchase"]/df_control["Click"]
df_control.head()

Unnamed: 0,Impression,Click,Purchase,Earning,Click/Impression Rate,Purchase/Click Rate
0,82529.45927,6090.07732,665.21125,2311.27714,0.07379,0.10923
1,98050.45193,3382.86179,315.08489,1742.80686,0.0345,0.09314
2,82696.02355,4167.96575,458.08374,1797.82745,0.0504,0.10991
3,109914.4004,4910.88224,487.09077,1696.22918,0.04468,0.09919
4,108457.76263,5987.65581,441.03405,1543.72018,0.05521,0.07366


In [29]:
# Calculate "Purchase/Click" rate for Test Group 
df_test["Purchase/Click Rate"] = df_test["Purchase"]/df_control["Click"]
df_test.head()

Unnamed: 0,Impression,Click,Purchase,Earning,Click/Impression Rate,Purchase/Click Rate
0,120103.5038,3216.54796,702.16035,1939.61124,0.02678,0.1153
1,134775.94336,3635.08242,834.05429,2929.40582,0.02697,0.24655
2,107806.62079,3057.14356,422.93426,2526.24488,0.02836,0.10147
3,116445.27553,4650.47391,429.03353,2281.42857,0.03994,0.08736
4,145082.51684,5201.38772,749.86044,2781.69752,0.03585,0.12523


In [30]:
# Descriptive Statistics of "Purchase/Click Rate" for Control Group
df_control["Purchase/Click Rate"].describe().T

count   40.00000
mean     0.11593
std      0.04542
min      0.04040
25%      0.08525
50%      0.10957
75%      0.14482
max      0.30436
Name: Purchase/Click Rate, dtype: float64

In [31]:
# Descriptive Statistics of "Purchase/Click Rate" for Test Group
df_test["Purchase/Click Rate"].describe().T

count   40.00000
mean     0.12297
std      0.04997
min      0.04456
25%      0.08984
50%      0.11542
75%      0.14364
max      0.24655
Name: Purchase/Click Rate, dtype: float64

#### **According to the results of Descriptive Statistics of "Purchase/Click" rate:** 
* There are slight differences between Control Group and Test Group.
* We need to test this in order to see whether there is a significant difference. 

#### **I. Test of Normality**
#### H0: The assumption of normal distribution is provided.
#### H1: The assumption of normal distribution is not provided.

In [32]:
# Test of Normality for Control Group (Purchase/Click)
test_stat, pvalue = shapiro(df_control["Purchase/Click Rate"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.8720, p-value = 0.0003


In [33]:
# Test of Normality for Test Group (Purchase/Click)
test_stat, pvalue = shapiro(df_test["Purchase/Click Rate"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9299, p-value = 0.0160


#### **According to the results of the Test of Normality (Purchase/Click):** 
* p-values of Control Group and Test Group < 0.05.
* This means that **we reject HO hypothesis**. 
* So, the assumption of normal distribution is not provided.
* As the test of normality is not provided, we will skip the variance homogeneity test and continue with mannwhitneyu non-parametric test. 

#### **J. Mannwhitneyu Non-parametric Test**
#### H0: M1 = M2 (There is no statistically significant difference between "Purchase/Click" rate of Control Group and Test Group)
#### H1: M1 != M2 (There is statistically significant difference between "Purchase/Click" rate of Control Group and Test Group)

In [34]:
# Mannwhitneyu Non-parametric Test (Purchase/Click)
test_stat, pvalue = mannwhitneyu(df_control["Purchase/Click Rate"],
                                 df_test["Purchase/Click Rate"])

print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 752.0000, p-value = 0.6476


#### **According to the results of Mannwhitneyu Non-parametric Test (Purchase/Click):** 
* p-value > 0.05
* This means that **we can not reject HO hypothesis**. 
* So, there is no statistically significant difference between "Click/Impression" rate of Control Group and Test Group.

### **2.6. Is there a significant difference on "Purchase/Earning" rate between Control Group and Test Group?** 

#### Purchase/Earning rate can give us an insight.  
#### Purchase/Earning = (Number of products purchased after the ads clicked) / (Earnings after purchased products) 

In [35]:
# Calculate "Purchase/Earning" rate for Control Group 
df_control["Purchase/Earning Rate"] = df_control["Purchase"]/df_control["Earning"]
df_control.head()

Unnamed: 0,Impression,Click,Purchase,Earning,Click/Impression Rate,Purchase/Click Rate,Purchase/Earning Rate
0,82529.45927,6090.07732,665.21125,2311.27714,0.07379,0.10923,0.28781
1,98050.45193,3382.86179,315.08489,1742.80686,0.0345,0.09314,0.18079
2,82696.02355,4167.96575,458.08374,1797.82745,0.0504,0.10991,0.2548
3,109914.4004,4910.88224,487.09077,1696.22918,0.04468,0.09919,0.28716
4,108457.76263,5987.65581,441.03405,1543.72018,0.05521,0.07366,0.2857


In [36]:
# Calculate "Purchase/Earning" rate for Test Group 
df_test["Purchase/Earning Rate"] = df_test["Purchase"]/df_control["Earning"]
df_test.head()

Unnamed: 0,Impression,Click,Purchase,Earning,Click/Impression Rate,Purchase/Click Rate,Purchase/Earning Rate
0,120103.5038,3216.54796,702.16035,1939.61124,0.02678,0.1153,0.3038
1,134775.94336,3635.08242,834.05429,2929.40582,0.02697,0.24655,0.47857
2,107806.62079,3057.14356,422.93426,2526.24488,0.02836,0.10147,0.23525
3,116445.27553,4650.47391,429.03353,2281.42857,0.03994,0.08736,0.25293
4,145082.51684,5201.38772,749.86044,2781.69752,0.03585,0.12523,0.48575


In [37]:
# Descriptive Statistics of "Purchase/Earning Rate" for Control Group
df_control["Purchase/Earning Rate"].describe().T

count   40.00000
mean     0.29602
std      0.08922
min      0.14903
25%      0.24493
50%      0.28643
75%      0.33707
max      0.54754
Name: Purchase/Earning Rate, dtype: float64

In [38]:
# Descriptive Statistics of "Purchase/Earning Rate" for Test Group
df_test["Purchase/Earning Rate"].describe().T

count   40.00000
mean     0.31291
std      0.09858
min      0.16927
25%      0.23061
50%      0.31323
75%      0.40100
max      0.48575
Name: Purchase/Earning Rate, dtype: float64

#### **According to the results of Descriptive Statistics of "Purchase/Earning" rate:** 
* There are differences between Control Group and Test Group.
* We need to test this in order to see whether there is a significant difference. 

#### **K. Test of Normality**
#### H0: The assumption of normal distribution is provided.
#### H1: The assumption of normal distribution is not provided.

In [39]:
# Test of Normality for Control Group (Purchase/Earning)
test_stat, pvalue = shapiro(df_control["Purchase/Earning Rate"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9547, p-value = 0.1099


In [40]:
# Test of Normality for Test Group (Purchase/Earning)
test_stat, pvalue = shapiro(df_test["Purchase/Earning Rate"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9428, p-value = 0.0430


#### **According to the results of the Test of Normality (Purchase/Earning):** 
* p-value of Control Group > 0.05. This means that we can not reject HO hypothesis. 
* p-value of Test Group < 0.05. This means that we reject HO hypothesis.
* So, the assumption of normal distribution is not provided (Both results should be the same(rejected/not-rejected)).
* As the test of normality is not provided, we will skip the variance homogeneity test and continue with mannwhitneyu non-parametric test. 

#### **L. Mannwhitneyu Non-parametric Test**
#### H0: M1 = M2 (There is no statistically significant difference between "Purchase/Earning" rate of Control Group and Test Group)
#### H1: M1 != M2 (There is statistically significant difference between "Purchase/Earning" rate of Control Group and Test Group)

In [41]:
# Mannwhitneyu Non-parametric Test (Purchase/Earning)
test_stat, pvalue = mannwhitneyu(df_control["Purchase/Earning Rate"],
                                 df_test["Purchase/Earning Rate"])

print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 719.0000, p-value = 0.4386


#### **According to the results of Mannwhitneyu Non-parametric Test (Purchase/Earning):** 
* p-value > 0.05
* This means that **we can not reject HO hypothesis**. 
* So, there is no statistically significant difference between "Click/Impression" rate of Control Group and Test Group.

### **3. SUMMARY OF THE TEST RESULTS**
### In this project I asked 6 questions and implement A/B Testing for each of them. Let's the see results:

#### 1. Is there a significant difference on "Click" values between Control Group (max.bidding) and Test Group (avg.bidding)? 
####      *- YES. It was so clear from the descriptive statistics that we didn't need to test this.*   
#### 2. Is there a significant difference on "Click/Impression" rate between Control Group (max.bidding) and Test Group (avg.bidding)? 
####      *- YES. (Mannwhitneyu Test) Control Group has higher values.* 
#### 3. Is there a significant difference between the means of "Purchase" of Control Group (max.bidding) and Test Group (avg.bidding)? 
####      *- NO. (Independent Two-Sample T-Test)*
#### 4. Is there a significant difference on "Earning" values between Control Group (max.bidding) and Test Group (avg.bidding)? 
####      *- YES. (Independent Two-Sample T-Test) Test Group has higher values.*
#### 5. Is there a significant difference on "Purchase/Click" rate between Control Group (max.bidding) and Test Group (avg.bidding)?
####      *- NO. (Mannwhitneyu Test)*
#### 6. Is there a significant difference on "Purchase/Earning" rate between Control Group (max.bidding) and Test Group (avg.bidding)?
####      *- NO. (Mannwhitneyu Test)*

#### (We have 3 "YES" and 3 "NO". It is difficult to decide :))

### **4. RECOMMENDATIONS TO THE CUSTOMER**

#### The aim of this project was to gain insight in order to decide which bidding style to use for more revenue. 
#### * When we consider "Click", "Click/Impression" and "Earning" values, there is a significant difference between maximum bidding and average bidding. 
#### * However, when we consider "Purchase", "Purchase/Click" and "Purchase/Earning" values, there is no significant difference between maximum bidding and average bidding. 
#### * As our aim is to increase our revenues, "Purchase", "Purchase/Click" and "Purchase/Earning" are more important for us rather than "Click", "Click/Impression" and "Earning".  
#### * As a consequence, there is no need to change the bidding style for now. It is better to keep in our mind that more observation can give more insight. So, I suggest that we analyse the situation 3 months later and re-evalute the results accordingly. 

# 
