![](https://www.travelpayouts.com/blog/wp-content/uploads/2018/11/ab-testing.jpg)
image from [link](https://www.travelpayouts.com/blog/a-b-and-split-tests/)

## AB Testing

A/B testing is one of the most commonly used statistical tests in data science.

* Let A represent a feature or group.
* Let B represent another feature or group.

The topic of interest is whether there is a difference between A and B.

Hypothesis Testing: A statistical method used to test a belief or claim. In an A/B test, the purpose of comparing the groups is to determine whether an observed difference between them is real or could have arisen by chance.
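To see how a difference can arise purely by chance, the sketch below simulates two groups that share the same true conversion rate and prints their observed rates. The 2% rate, the group size of 10,000 and the helper name `simulate_group` are illustrative assumptions, not values from the dataset.

```python
import random

def simulate_group(n_users, true_rate, rng):
    """Return the observed conversion rate of n_users simulated users."""
    conversions = sum(rng.random() < true_rate for _ in range(n_users))
    return conversions / n_users

rng = random.Random(42)  # fixed seed so the run is repeatable

# Both groups share the SAME true rate (assumed 2%), so any gap is pure chance.
observed_diffs = []
for _ in range(5):
    rate_a = simulate_group(10_000, 0.02, rng)
    rate_b = simulate_group(10_000, 0.02, rng)
    observed_diffs.append(rate_a - rate_b)
    print(f"A = {rate_a:.4f}, B = {rate_b:.4f}, difference = {rate_a - rate_b:+.4f}")
```

The observed rates rarely match exactly even though nothing distinguishes the groups, which is why a formal test is needed before reading meaning into a gap.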

The Two Sample Proportion Test used in this project compares the proportions of two groups.

Hypotheses (p1 and p2 are the proportions, here the conversion rates, of the two groups):

* H0: p1 = p2

* H1: p1 != p2

Depending on the resulting p-value:

* If p-value < 0.05, H0 is rejected.

So, there is a statistically significant difference between the proportions of the two groups.

* If p-value >= 0.05, H0 cannot be rejected.

So, there is no statistically significant difference between the proportions of the two groups.
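The decision rule above can be sketched without any third-party library. The function below is a minimal illustration of a pooled two-sample proportion z-test, assuming samples large enough for the normal approximation; the name `two_proportion_ztest` and the counts in the example are hypothetical and are not taken from the dataset.

```python
import math

def two_proportion_ztest(count1, nobs1, count2, nobs2):
    """Pooled two-sided z-test for H0: p1 = p2. Returns (z, p_value)."""
    p1 = count1 / nobs1
    p2 = count2 / nobs2
    # Under H0 both groups share one proportion, so pool the successes.
    pooled = (count1 + count2) / (nobs1 + nobs2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / nobs1 + 1 / nobs2))
    if std_err == 0:
        # No variation at all (all successes or all failures): nothing to test.
        return 0.0, 1.0
    z = (p1 - p2) / std_err
    # Two-sided p-value of a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 50 of 1000 users convert in group A, 30 of 1000 in group B.
z, p_value = two_proportion_ztest(50, 1000, 30, 1000)
alpha = 0.05
decision = "reject H0" if p_value < alpha else "cannot reject H0"
print(f"z = {z:.4f}, p-value = {p_value:.4f} -> {decision}")
```

The sign of z only indicates which group has the larger proportion; the two-sided p-value depends on its absolute value.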


### Data Details:

Index: Row index (stored as "Unnamed: 0" in the CSV file)

user id: User ID (unique)

test group: "ad" if the person saw the advertisement, "psa" if they only saw the public service announcement

converted: True if the person bought the product, otherwise False

total ads: Number of ads seen by the person

most ads day: Day of the week on which the person saw the most ads

most ads hour: Hour of the day in which the person saw the most ads


In [2]:
import numpy as np
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', 500)
pd.set_option('display.float_format', lambda x: '%.5f' % x)

df = pd.read_csv("/kaggle/input/marketing-ab-testing/marketing_AB.csv")

In [3]:
# We are trying to understand the data.

def check_df(dataframe, head=7):
    print("################### Shape ####################")
    print(dataframe.shape)
    print("#################### Info #####################")
    # info() prints its report itself and returns None, so print() also emits "None".
    print(dataframe.info())
    print("################### Nunique ###################")
    print(dataframe.nunique())
    print("##################### NA #####################")
    print(dataframe.isnull().sum())
    print("################## Quantiles #################")
    print(dataframe.describe([0, 0.05, 0.50, 0.95, 0.99, 1]).T)
    print("#################### Head ####################")
    print(dataframe.head(head))

check_df(df)


################### Shape ####################
(588101, 7)
#################### Info #####################
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 588101 entries, 0 to 588100
Data columns (total 7 columns):
 #   Column         Non-Null Count   Dtype 
---  ------         --------------   ----- 
 0   Unnamed: 0     588101 non-null  int64 
 1   user id        588101 non-null  int64 
 2   test group     588101 non-null  object
 3   converted      588101 non-null  bool  
 4   total ads      588101 non-null  int64 
 5   most ads day   588101 non-null  object
 6   most ads hour  588101 non-null  int64 
dtypes: bool(1), int64(4), object(2)
memory usage: 27.5+ MB
None
################### Nunique ###################
Unnamed: 0       588101
user id          588101
test group            2
converted             2
total ads           807
most ads day          7
most ads hour        24
dtype: int64
##################### NA #####################
Unnamed: 0       0
user id          0
test group       0
...

In [4]:
# Data Preparation

# Drop the leftover index column, which carries no information.
df.drop("Unnamed: 0", inplace=True, axis=1)

# Convert the True/False values to 1 and 0.
df["converted"] = df["converted"].astype(int)


df.head()

   user id  test group  converted  total ads  most ads day  most ads hour
0  1069124          ad          0        130        Monday             20
1  1119715          ad          0         93       Tuesday             22
2  1144181          ad          0         21       Tuesday             18
3  1435133          ad          0        355       Tuesday             10
4  1015700          ad          0        276        Friday             14


In [5]:
# Compare the conversion rate (mean of "converted") of those who saw the advertisement with that of those who saw only the PSA.

df.groupby("test group")["converted"].mean()

test group
ad    0.02555
psa   0.01785
Name: converted, dtype: float64
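The two rates printed above can be turned into an effect size. The sketch below uses the rounded rates from the output, so the results are approximate; the variable names are illustrative.

```python
# Conversion rates as printed above (rounded to five decimals).
ad_rate = 0.02555
psa_rate = 0.01785

# Absolute difference in percentage points and relative lift over the psa group.
absolute_diff = ad_rate - psa_rate
relative_lift = absolute_diff / psa_rate

print(f"Absolute difference: {absolute_diff * 100:.2f} percentage points")
print(f"Relative lift: {relative_lift:.1%}")
```

With these rounded inputs the ad group converts about 0.77 percentage points more often, a relative lift of roughly 43%. Whether a gap of that size could be due to chance is what the test in the next cell checks.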

In [6]:
# Count the conversions in each group (ad vs. psa).

ad_converted_count = df.loc[df["test group"] == "ad", "converted"].sum()
psa_converted_count = df.loc[df["test group"] == "psa", "converted"].sum()

# Group sizes: the total number of users in each group.
ad_user_count = (df["test group"] == "ad").sum()
psa_user_count = (df["test group"] == "psa").sum()

# Two-sample proportion z-test: does the conversion rate of the ad group
# differ from the conversion rate of the psa group?

test_stat, pvalue = proportions_ztest(count=[ad_converted_count, psa_converted_count],
                                      nobs=[ad_user_count, psa_user_count])

# count = number of successes (conversions) in each group
# nobs = total number of observations in each group
# count / nobs is the conversion rate of each group, which the test compares.


print('Test Stat = %.4f, p-value = %.4f' % (test_stat, pvalue))


Test Stat = 7.3701, p-value = 0.0000
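The p-value is printed as 0.0000 only because `%.4f` rounds to four decimals; it is not exactly zero. As a rough check, the sketch below recomputes the two-sided p-value from the reported test statistic using the standard normal distribution and only the standard library. The variable names are illustrative, and because the statistic itself is rounded, the result is approximate.

```python
import math

reported_stat = 7.3701  # test statistic as printed above (rounded)

# Two-sided p-value of a standard normal: erfc(|z| / sqrt(2)).
approx_pvalue = math.erfc(abs(reported_stat) / math.sqrt(2))

print('p-value = %.4f' % approx_pvalue)  # fixed-point formatting hides tiny values
print('p-value = %.2e' % approx_pvalue)  # scientific notation shows the magnitude
```

Printing small p-values in scientific notation, or reporting them as "p < 0.0001", avoids the impression that the probability is zero.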


#### Since p < 0.05, H0 is rejected: there is a statistically significant difference between the conversion rates of the two groups.

#### The ad group converted at a higher rate than the psa group (0.02555 vs. 0.01785), so the advertisement had a positive effect on purchases.
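
A significant p-value says that a difference exists, not how large it is. One common complement is a confidence interval for the difference in proportions. The sketch below computes a Wald interval with the standard library; the function name `diff_confidence_interval` and the example counts are hypothetical, since the group sizes are not shown above.

```python
import math
from statistics import NormalDist

def diff_confidence_interval(count1, nobs1, count2, nobs2, confidence=0.95):
    """Wald confidence interval for p1 - p2 (unpooled standard error)."""
    p1 = count1 / nobs1
    p2 = count2 / nobs2
    std_err = math.sqrt(p1 * (1 - p1) / nobs1 + p2 * (1 - p2) / nobs2)
    # Critical value of the standard normal, about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z_crit * std_err, diff + z_crit * std_err

# Hypothetical counts: 50 of 1000 conversions in group A, 30 of 1000 in group B.
low, high = diff_confidence_interval(50, 1000, 30, 1000)
print(f"95% CI for p1 - p2: [{low:.4f}, {high:.4f}]")
```

An interval that lies entirely above 0 points in the same direction as rejecting H0, and its width shows how precisely the size of the effect is known.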