<a href="https://www.kaggle.com/code/gizemnalbantarslan/a-b-testing?scriptVersionId=184697186" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

A firm recently introduced a new bid type, "average bidding", as an alternative to its existing bid type, "maximum bidding". The firm wants to run an A/B test to see whether average bidding converts better than maximum bidding. Its ultimate success criterion is Purchase, so the statistical tests in this study focus on the Purchase metric.

**THE DATA STORY**

This data set contains a company's website information, including the number of advertisements that users saw and clicked, as well as the earnings these generated. There are two separate data sets, one for the control group and one for the test group, stored in separate sheets of the ab_testing.xlsx Excel file. Maximum Bidding was applied to the control group and Average Bidding to the test group.

* **Impression** - Number of ad views
* **Click** - Number of clicks on the displayed ads
* **Purchase** - Number of products purchased after the ads were clicked
* **Earning** - Earnings from the purchased products

In [1]:
import itertools
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.stats.api as sms
from scipy.stats import ttest_1samp, shapiro, levene, ttest_ind, mannwhitneyu, \
    pearsonr, spearmanr, kendalltau, f_oneway, kruskal
from statsmodels.stats.proportion import proportions_ztest

In [2]:
df = pd.read_excel("../input/ab-testing/ab_testing.xlsx")
df.head()

Unnamed: 0,Impression,Click,Purchase,Earning
0,82529.459271,6090.077317,665.211255,2311.277143
1,98050.451926,3382.861786,315.084895,1742.806855
2,82696.023549,4167.96575,458.083738,1797.827447
3,109914.400398,4910.88224,487.090773,1696.229178
4,108457.76263,5987.655811,441.03405,1543.720179


# PREPARING AND ANALYZING DATA

In [3]:
control_group = pd.read_excel("../input/ab-testing/ab_testing.xlsx", sheet_name="Control Group")
control_group.head()
control_group.describe().T
control_group.info()  # info is a method; without parentheses it is never called
control_group.shape
control_group.isnull().sum()

Impression    0
Click         0
Purchase      0
Earning       0
dtype: int64

In [4]:
test_group = pd.read_excel("../input/ab-testing/ab_testing.xlsx", sheet_name="Test Group")
test_group.head()
test_group.describe().T
test_group.info()  # info is a method; without parentheses it is never called
test_group.shape
test_group.isnull().sum()

Impression    0
Click         0
Purchase      0
Earning       0
dtype: int64

In [5]:
df_ = pd.concat([control_group, test_group])
df_.head()

Unnamed: 0,Impression,Click,Purchase,Earning
0,82529.459271,6090.077317,665.211255,2311.277143
1,98050.451926,3382.861786,315.084895,1742.806855
2,82696.023549,4167.96575,458.083738,1797.827447
3,109914.400398,4910.88224,487.090773,1696.229178
4,108457.76263,5987.655811,441.03405,1543.720179


# Defining the A/B Test Hypothesis

* H0: M1 = M2 (the mean Purchase of the control group equals that of the test group)
* H1: M1 != M2 (the two means differ)

In [6]:
control_group["Purchase"].mean()

550.8940587702316

In [7]:
test_group["Purchase"].mean()

582.1060966484677

**There is a difference between the two averages. Did this difference occur by chance?**

In [8]:
test_stat, pvalue = shapiro(control_group["Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9773, p-value = 0.5891


In [9]:
test_stat, pvalue = shapiro(test_group["Purchase"])
print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 0.9589, p-value = 0.1541


**Both p-values are greater than 0.05. Therefore, the assumption of normality cannot be rejected.**

In [10]:
test_stat, pvalue = levene(control_group["Purchase"],
                           test_group["Purchase"])

print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = 2.6393, p-value = 0.1083


**The p-value is greater than 0.05. Therefore, the assumption of variance homogeneity cannot be rejected.**

**A parametric test (the independent two-sample t-test) is used because the assumptions of normality and variance homogeneity cannot be rejected.**
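The decision rule behind these checks can be sketched as a single helper. This is a minimal sketch under stated assumptions, not the notebook's own code: the function name `compare_groups`, its `alpha` parameter, and the synthetic `group_a` / `group_b` arrays are hypothetical, and falling back to Welch's t-test or the Mann-Whitney U test when an assumption is rejected is a common convention that the notebook itself does not spell out.

```python
import numpy as np
from scipy.stats import shapiro, levene, ttest_ind, mannwhitneyu


def compare_groups(a, b, alpha=0.05):
    """Pick and run a two-sample test based on assumption checks.

    Hypothetical helper for illustration only.
    Returns (test_name, statistic, p_value).
    """
    # Normality: Shapiro-Wilk on each group separately.
    normal = shapiro(a)[1] > alpha and shapiro(b)[1] > alpha
    if not normal:
        # Normality rejected for at least one group: non-parametric test.
        stat, p = mannwhitneyu(a, b, alternative="two-sided")
        return "mannwhitneyu", stat, p

    # Variance homogeneity: Levene's test.
    equal_var = levene(a, b)[1] > alpha
    # equal_var=False makes ttest_ind run Welch's t-test instead.
    stat, p = ttest_ind(a, b, equal_var=equal_var)
    return ("ttest_ind" if equal_var else "welch_ttest"), stat, p


# Synthetic stand-in data (NOT the notebook's data), used only to run the helper.
rng = np.random.default_rng(0)
group_a = rng.normal(550, 130, size=40)
group_b = rng.normal(580, 160, size=40)

test_name, stat, p = compare_groups(group_a, group_b)
print(test_name, round(float(stat), 4), round(float(p), 4))
```

With the notebook's data, the call would be `compare_groups(control_group["Purchase"], test_group["Purchase"])`.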

In [11]:
test_stat, pvalue = ttest_ind(control_group["Purchase"],
                              test_group["Purchase"],
                              equal_var=True)

print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))

Test Stat = -0.9416, p-value = 0.3493


**The p-value is greater than 0.05. Therefore, the H0 hypothesis cannot be rejected.**
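To make the test statistic concrete, the pooled-variance two-sample t-test can be reproduced by hand and checked against `ttest_ind(..., equal_var=True)`. The sketch below is illustrative only: `pooled_t_test` is a hypothetical helper, and `sample_a` / `sample_b` are synthetic arrays, not the notebook's data.

```python
import numpy as np
from scipy import stats


def pooled_t_test(a, b):
    """Two-sided Student's t-test with pooled variance (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    dof = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances (ddof=1).
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / dof
    # Standard error of the difference in means.
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (a.mean() - b.mean()) / se
    # Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom.
    p = 2 * stats.t.sf(abs(t), dof)
    return t, p


# Synthetic stand-in data (NOT the notebook's data).
rng = np.random.default_rng(42)
sample_a = rng.normal(550, 130, size=40)
sample_b = rng.normal(580, 160, size=40)

t_manual, p_manual = pooled_t_test(sample_a, sample_b)
t_scipy, p_scipy = stats.ttest_ind(sample_a, sample_b, equal_var=True)
print(np.isclose(t_manual, t_scipy), np.isclose(p_manual, p_scipy))
```

The sign of the statistic depends only on the argument order: passing the control group first, as the notebook does, yields a negative value when the test group's mean is higher.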

**At first, when we looked at the averages of the test and control groups, we saw a difference between them (582.11 vs. 550.89).**
**However, the hypothesis test did not allow us to reject the hypothesis that the means are equal (p-value = 0.3493). This suggests that the difference in means may be due to chance.**
**Moving to the new feature may not bring a meaningful improvement in conversion.**
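One way to see how uncertain the observed difference is would be to put a confidence interval around it. The following is a minimal sketch, assuming the same pooled-variance setting as the t-test above; `mean_diff_ci` is a hypothetical helper and the arrays `x` and `y` are synthetic, so the printed interval says nothing about the notebook's actual data. When the 95% interval contains zero, the two-sided pooled t-test at the 0.05 level also fails to reject H0.

```python
import numpy as np
from scipy import stats


def mean_diff_ci(a, b, confidence=0.95):
    """Confidence interval for mean(b) - mean(a) using pooled variance.

    Hypothetical helper for illustration only.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    dof = n1 + n2 - 2
    # Pooled variance and standard error of the difference in means.
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / dof
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = b.mean() - a.mean()
    # Critical t value for a two-sided interval at the requested confidence.
    margin = stats.t.ppf(0.5 + confidence / 2, dof) * se
    return diff - margin, diff + margin


# Synthetic stand-in data (NOT the notebook's data).
rng = np.random.default_rng(7)
x = rng.normal(550, 130, size=40)
y = rng.normal(580, 160, size=40)

low, high = mean_diff_ci(x, y)
print(round(float(low), 2), round(float(high), 2))
```

With the notebook's data, the call would be `mean_diff_ci(control_group["Purchase"], test_group["Purchase"])`.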