# A/B Test on Ads

I'll run an A/B test on the following dataset, which I pulled from Kaggle.

In [None]:
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
import matplotlib.pyplot as plt

df = pd.read_csv('../input/ad-ab-testing/AdSmartABdata - AdSmartABdata.csv')
print(df.head(5))
print(df.dtypes)

It looks like the A/B test compares the 'exposed' and 'control' groups of the experiment. I organized the data into a matrix for further insight.

In [None]:
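# Rows per group (count() tallies non-null entries, so this is the group size)
print(df.groupby('experiment')[['yes', 'no']].count())
# Total rows across both groups
print(df.groupby('experiment').count()[['yes', 'no']].sum())

# Number of 'yes' and 'no' responses per group
print(df.groupby('experiment')[['yes', 'no']].sum())
# Rows for each (experiment, yes, no) combination
cats = df.groupby(['experiment', 'yes', 'no']).size()
ax = cats.plot.bar(rot=60)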

For both the 'control' and 'exposed' versions of the experiment, the dataset records whether the user clicked yes or no.

In [None]:
# Total 'yes' and 'no' responses in each experiment group
responses = df.groupby('experiment')[['yes', 'no']].sum()
ax = responses.plot.bar(rot=60)
plt.show()

For simplicity, I'll only consider the level of engagement with the different ads. Engagement means seeing the ad and clicking on it (whether yes or no); non-engagement means not clicking on the ad.

In [None]:
# Share of users assigned to each experiment group
df['experiment'].value_counts().plot.pie(figsize=(5, 5))
plt.show()

Although there are more factors to consider (hour of ad engagement, platform on which the ad was engaged, etc.), I'll only run a chi-squared significance test on engagement between the "control" and "exposed" versions of the ads. That is, I'll count both "yes" and "no" as engagement and sum them to measure the level of engagement.
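
As a minimal sketch of that definition, the cell below flags a user as engaged when either response column is set and then summarizes engagement per group. The rows in `toy` are made up for illustration, and the `engaged` column is a hypothetical helper that is not part of the dataset.

```python
import pandas as pd

# Toy rows standing in for the real dataset (values are made up).
toy = pd.DataFrame({
    'experiment': ['control', 'control', 'control', 'exposed', 'exposed', 'exposed'],
    'yes': [1, 0, 0, 0, 1, 0],
    'no':  [0, 1, 0, 0, 0, 0],
})

# A user counts as engaged if they answered at all, whether yes or no.
toy['engaged'] = (toy['yes'] + toy['no']) > 0

# Engaged users, total users, and engagement rate per group.
summary = toy.groupby('experiment')['engaged'].agg(['sum', 'count', 'mean'])
summary.columns = ['engaged', 'total', 'rate']
print(summary)
```

Keeping the flag as a boolean column means the same column can feed both the per-group rate and a contingency table later on.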

First, I need to clean the data. Luckily, this data is already well suited for analysis. Nevertheless, I'll need to check for duplicates in auction_id to avoid any double counting.

In [None]:
# Show any rows whose auction_id appears more than once
print(df[df.duplicated(['auction_id'], keep=False)]['auction_id'])
print('The number of duplicate entries is ' + str(df['auction_id'].duplicated().sum()))
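
If a check like this did turn up repeated auction_id values, one way to remove them would be `drop_duplicates`. The sketch below runs on made-up rows in a hypothetical `toy` frame and keeps the first occurrence of each ID, which is an assumption about how ties should be resolved.

```python
import pandas as pd

# Made-up rows: 'a2' appears twice to simulate a double-counted auction.
toy = pd.DataFrame({
    'auction_id': ['a1', 'a2', 'a2', 'a3'],
    'experiment': ['control', 'exposed', 'exposed', 'control'],
})

# Count repeated IDs, then keep only the first row for each auction_id.
n_dupes = toy['auction_id'].duplicated().sum()
deduped = toy.drop_duplicates(subset='auction_id', keep='first')

print('Duplicates found:', n_dupes)
print('Rows after removal:', len(deduped))
```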

Luckily, there weren't any duplicates in the data. I can now move on to the analysis. I'll use a standard significance threshold of .05 and compare the test's p-value against it to determine whether there is a difference between the two ads. The hypotheses are as follows:

h0 (null hypothesis): There is no significant difference in the engagement between both ads 

h1 (alternative hypothesis): There is a significant difference in the engagement between both ads.

First, I'll make a 2x2 matrix to pass to the chi-squared test. I'll call the matrix F.

In [None]:
# Engaged users in the control group (answered yes or no)
yes_cont = df.loc[df['experiment'] == 'control', 'yes'].sum()
no_cont = df.loc[df['experiment'] == 'control', 'no'].sum()
df_cont_clicked = yes_cont + no_cont

# Engaged users in the exposed group
yes_exp = df.loc[df['experiment'] == 'exposed', 'yes'].sum()
no_exp = df.loc[df['experiment'] == 'exposed', 'no'].sum()
df_exp_clicked = yes_exp + no_exp

# Total users in each group
df_cont_total = (df['experiment'] == 'control').sum()
df_exp_total = (df['experiment'] == 'exposed').sum()

# Rows: control, exposed. Columns: engaged users, total users.
F = np.array([[df_cont_clicked, df_cont_total], [df_exp_clicked, df_exp_total]])
print(F)
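
`chi2_contingency` treats every cell of the table as a count of distinct observations, so a common alternative layout splits each group into engaged and not engaged users instead of engaged and total. Here is a minimal sketch of that layout using `pd.crosstab`, which the analysis above does not use. The rows in `toy` and the `engaged` column are made up for illustration.

```python
import pandas as pd

# Made-up rows: four users per group.
toy = pd.DataFrame({
    'experiment': ['control'] * 4 + ['exposed'] * 4,
    'yes': [1, 0, 0, 0, 1, 1, 0, 0],
    'no':  [0, 1, 0, 0, 0, 0, 1, 0],
})

# Engaged means the user answered either yes or no.
toy['engaged'] = (toy['yes'] + toy['no']) > 0

# Rows: experiment group. Columns: engaged False / True.
# Each user lands in exactly one cell, so the cells sum to the sample size.
table = pd.crosstab(toy['experiment'], toy['engaged'])
print(table)
```

With this layout the four cells add up to the number of users, which is worth checking before passing any table to the test.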

Finally, the Chi-Squared test.

In [None]:
chi2, pval, dof, expected = chi2_contingency(F)
# Round the p-value to two decimals for reporting
rounded_pval = round(pval, 2)
print('After running the Chi-Squared test, the p-value is: ' + str(rounded_pval))

Because our rounded p-value is less than our significance threshold (.03 < .05), we reject the null hypothesis and conclude that there is a significant difference in engagement between the two ads.
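
The decision rule itself can be sketched in a few lines: run the test, compare the p-value to the threshold, and reject the null hypothesis only when the p-value is smaller. The counts in `observed` are hypothetical and are not taken from the dataset. The names `alpha` and `reject_null` are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts. Rows: control, exposed. Columns: engaged, not engaged.
observed = np.array([[50, 450],
                     [80, 420]])

alpha = 0.05  # significance threshold

chi2, pval, dof, expected = chi2_contingency(observed)

# Reject the null hypothesis only when the p-value falls below the threshold.
reject_null = pval < alpha

if reject_null:
    print('Reject h0: engagement differs between the two groups.')
else:
    print('Fail to reject h0: no significant difference detected.')
```

Printing `expected` alongside the result is a quick way to see how far the observed counts sit from what the null hypothesis predicts.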