# Business Problem

* Facebook recently introduced a new type of bidding, **_average bidding_**, as an alternative to the existing type of bidding called **_maximum bidding_**.
* One of our customers has decided to test this new feature and wants to run an A/B test to see whether average bidding brings more conversions than maximum bidding.
* The A/B test has been going on for 1 month.
* Now, our customer is waiting for you to analyze the results of this A/B test.

**P.S.** The ultimate measure of success for our customer is **Purchase**. Therefore, the focus should be on the **Purchase** metric for statistical tests.

## Dataset Story
* This data set contains a company's website data, including the number of ads that users saw and clicked on, as well as the earnings those ads generated.
* There are two separate data sets: the **_Control_** and the **_Test_** group. 
* Maximum Bidding was applied to the control group and Average Bidding was applied to the test group.

## Variables
* **_Impression_**: Number of ad views
* **_Click_**: Number of clicks on the displayed ad
* **_Purchase_**: Number of products purchased after an ad was clicked
* **_Earning_**: Earnings from the purchased products

In [2]:
# Importing necessary libraries and cosmetic settings
!pip install statsmodels
import pandas as pd
import math
import scipy.stats as st
from sklearn.preprocessing import MinMaxScaler
import statsmodels.stats.api as sms
from scipy.stats import ttest_1samp, shapiro, levene, ttest_ind, mannwhitneyu, pearsonr, spearmanr, kendalltau, f_oneway, kruskal
from statsmodels.stats.proportion import proportions_ztest

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', 500)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.float_format', lambda x: '%.5f' % x)



In [3]:
# Reading, Preparing and Analyzing Data
df1 = pd.read_excel("ab_testing.xlsx", sheet_name="Control Group")
df2 = pd.read_excel("ab_testing.xlsx", sheet_name="Test Group")
df_control = df1.copy()
df_test = df2.copy()

In [4]:
df_control.head()

| | Impression | Click | Purchase | Earning |
|---|---|---|---|---|
| 0 | 82529.45927 | 6090.07732 | 665.21125 | 2311.27714 |
| 1 | 98050.45193 | 3382.86179 | 315.08489 | 1742.80686 |
| 2 | 82696.02355 | 4167.96575 | 458.08374 | 1797.82745 |
| 3 | 109914.4004 | 4910.88224 | 487.09077 | 1696.22918 |
| 4 | 108457.76263 | 5987.65581 | 441.03405 | 1543.72018 |


In [5]:
df_test.head()

| | Impression | Click | Purchase | Earning |
|---|---|---|---|---|
| 0 | 120103.5038 | 3216.54796 | 702.16035 | 1939.61124 |
| 1 | 134775.94336 | 3635.08242 | 834.05429 | 2929.40582 |
| 2 | 107806.62079 | 3057.14356 | 422.93426 | 2526.24488 |
| 3 | 116445.27553 | 4650.47391 | 429.03353 | 2281.42857 |
| 4 | 145082.51684 | 5201.38772 | 749.86044 | 2781.69752 |


In [6]:
df_control.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Impression | 40.0 | 101711.44907 | 20302.15786 | 45475.94296 | 85726.69035 | 99790.70108 | 115212.81654 | 147539.33633 |
| Click | 40.0 | 5100.65737 | 1329.9855 | 2189.75316 | 4124.30413 | 5001.2206 | 5923.8036 | 7959.12507 |
| Purchase | 40.0 | 550.89406 | 134.1082 | 267.02894 | 470.09553 | 531.20631 | 637.95709 | 801.79502 |
| Earning | 40.0 | 1908.5683 | 302.91778 | 1253.98952 | 1685.8472 | 1975.16052 | 2119.80278 | 2497.29522 |


In [7]:
df_test.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Impression | 40.0 | 120512.41176 | 18807.44871 | 79033.83492 | 112691.97077 | 119291.30077 | 132050.57893 | 158605.92048 |
| Click | 40.0 | 3967.54976 | 923.09507 | 1836.62986 | 3376.81902 | 3931.3598 | 4660.49791 | 6019.69508 |
| Purchase | 40.0 | 582.1061 | 161.15251 | 311.62952 | 444.62683 | 551.35573 | 699.86236 | 889.91046 |
| Earning | 40.0 | 2514.89073 | 282.73085 | 1939.61124 | 2280.53743 | 2544.66611 | 2761.5454 | 3171.48971 |


In [8]:
df_control["Purchase"].mean()

550.8940587702316

In [9]:
df_test["Purchase"].mean()

582.1060966484675
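
Before testing, it can help to express the gap between the two sample means as a relative difference. The sketch below simply hard-codes the two means printed above; the variable names are illustrative and not part of the original notebook.

```python
# Sample means copied from the two cells above (not recomputed from the data)
control_mean = 550.8940587702316  # Maximum Bidding (control group)
test_mean = 582.1060966484675     # Average Bidding (test group)

# Absolute and relative difference of the test group over the control group
abs_diff = test_mean - control_mean
rel_lift = abs_diff / control_mean

print(f"Absolute difference: {abs_diff:.2f}")
print(f"Relative lift: {rel_lift:.2%}")
```

A raw gap like this says nothing about significance on its own, which is why the hypothesis test that follows is needed.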

### Identifying the Hypothesis of the A/B Test
* $H_0$ : $\mu_1 = \mu_2$ 

_There is no statistically significant difference between the mean Purchase of the Maximum Bidding group ($\mu_1$) and the mean Purchase of the Average Bidding group ($\mu_2$)._

* $H_1$ : $\mu_1 \neq \mu_2$ 

_The difference between ... is statistically significant._

#### Normality Assumption for Control Group
* $H_0$: _Purchase in the control group is normally distributed._
* $H_1$: _Purchase in the control group is not normally distributed._

In [11]:
test_stat, pvalue = shapiro(df_control["Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9773, p-value = 0.5891


**Since p-value = 0.5891 > 0.05, we cannot reject $H_0$; there is no evidence against normality for the control group.**

#### Normality Assumption for Test Group
* $H_0$: _Purchase in the test group is normally distributed._
* $H_1$: _Purchase in the test group is not normally distributed._

In [12]:
test_stat, pvalue = shapiro(df_test["Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9589, p-value = 0.1541


**Since p-value = 0.1541 > 0.05, we cannot reject $H_0$; there is no evidence against normality for the test group.**

#### Homogeneity of Variance Assumption

* $H_0$: _The variances of Purchase in the control and test groups are equal._
* $H_1$: _The variances of Purchase in the control and test groups are not equal._

In [13]:
test_stat, pvalue = levene(df_control["Purchase"],df_test["Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 2.6393, p-value = 0.1083


**Since p-value = 0.1083 > 0.05, we cannot reject $H_0$; the variances can be treated as homogeneous.**
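
The three assumption checks above feed into the choice of test. The sketch below encodes one common convention for that choice: a pooled t-test when both groups look normal and the variances look equal, Welch's t-test when only normality holds, and Mann-Whitney U otherwise. `choose_test` is a hypothetical helper written for illustration, not a function from SciPy or from the original notebook; the returned strings merely name the SciPy calls one would make.

```python
def choose_test(p_normal_control, p_normal_test, p_levene, alpha=0.05):
    """Suggest a two-sample test from assumption-check p-values (hypothetical helper)."""
    # Normality must not be rejected in either group
    both_normal = p_normal_control > alpha and p_normal_test > alpha
    if not both_normal:
        return "mannwhitneyu"                 # non-parametric alternative
    if p_levene > alpha:
        return "ttest_ind(equal_var=True)"    # pooled-variance t-test
    return "ttest_ind(equal_var=False)"       # Welch's t-test

# p-values reported by the Shapiro-Wilk and Levene cells above
print(choose_test(0.5891, 0.1541, 0.1083))
```

With the p-values from this notebook, the helper points to the pooled-variance independent t-test.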

In [14]:
# Hypothesis Testing
test_stat, pvalue = ttest_ind(df_control["Purchase"], df_test["Purchase"], equal_var=True)
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = -0.9416, p-value = 0.3493


**Since p-value = 0.3493 > 0.05, we cannot reject $H_0$.**

**Hence, the difference between the purchase averages of the maximum bidding and average bidding groups is not statistically significant.**
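
As a sanity check, the reported t statistic can be reproduced by hand from the summary statistics in the `describe()` outputs, using the pooled-variance formula $t = (\bar{x}_1 - \bar{x}_2) / \sqrt{s_p^2 (1/n_1 + 1/n_2)}$ with $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$. This is a minimal sketch that assumes the rounded means and standard deviations shown earlier; the variable names are illustrative.

```python
import math

# Summary statistics of Purchase, copied from the describe() outputs (rounded)
n1, mean1, std1 = 40, 550.89406, 134.10820  # control: Maximum Bidding
n2, mean2, std2 = 40, 582.10610, 161.15251  # test: Average Bidding

# Pooled variance and degrees of freedom for the equal-variance t-test
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * std1**2 + (n2 - 1) * std2**2) / df

# Standard error of the difference in means, then the t statistic
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean1 - mean2) / std_err

print(f"t = {t_stat:.4f}, df = {df}")
```

The result should agree with the `ttest_ind` statistic up to rounding of the inputs; the p-value itself still needs the t distribution, for example via `scipy.stats`.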