## Modeling

In this section, we are going to check the p-value of each z test and decide whether to reject or fail to reject the null hypothesis at a 0.05 significance level.

Null hypothesis: There is no difference between the ad success rates of the two groups.

Alternative hypothesis: There is a difference between the ad success rates of the two groups.
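Formally, the hypotheses and the test statistic can be written as below. This assumes the default pooled-variance form that statsmodels' `proportions_ztest` uses for two samples. Here $x$ is the number of successes and $n$ the number of users in each group. Control is listed first, matching the order of the arrays passed in the code.

$$
H_0: p_{\text{control}} = p_{\text{exposed}} \qquad H_1: p_{\text{control}} \neq p_{\text{exposed}}
$$

$$
z = \frac{\hat p_{\text{control}} - \hat p_{\text{exposed}}}{\sqrt{\hat p\,(1-\hat p)\left(\frac{1}{n_{\text{control}}} + \frac{1}{n_{\text{exposed}}}\right)}}, \qquad \hat p = \frac{x_{\text{control}} + x_{\text{exposed}}}{n_{\text{control}} + n_{\text{exposed}}}
$$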



In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
import plotly
import numpy as np

from scipy import stats

In [3]:
abdata = pd.read_csv('../Data/abtesting_cleaned.csv')

In [82]:
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

control_results = abdata[abdata['experiment'] == 'control']['success']
exp_results = abdata[abdata['experiment'] == 'exposed']['success']

n_con = control_results.count()
n_exp = exp_results.count()

successes = [control_results.sum(), exp_results.sum()]
nobs = [n_con, n_exp]
print(nobs)
print(successes)

z_stat, pval = proportions_ztest(successes, nobs=nobs)
(lower_con, lower_exp), (upper_con, upper_exp) = proportion_confint(successes, nobs=nobs, alpha=0.05)

print(f'z statistic: {z_stat:.2f}')
print(f'p-value: {pval:.3f}')
print(f'ci 95% for control group: [{lower_con:.3f}, {upper_con:.3f}]')
print(f'ci 95% for treatment group: [{lower_exp:.3f}, {upper_exp:.3f}]')

[4071, 4006]
[264, 308]
z statistic: -2.11
p-value: 0.035
ci 95% for control group: [0.057, 0.072]
ci 95% for treatment group: [0.069, 0.085]
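As a sanity check, the z statistic and p-value above can be reproduced by hand from the printed counts. The sketch below uses only the Python standard library. `two_proportion_ztest` is a hypothetical helper written for illustration, not part of statsmodels. It assumes the same pooled-variance, two-sided form that `proportions_ztest` uses by default.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(x1, n1, x2, n2):
    """Hypothetical helper: pooled two-sample z-test for proportions (two-sided)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)              # shared success rate assumed under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return z, p_value

# Counts printed by the cell above: control first, exposed second
z, p = two_proportion_ztest(264, 4071, 308, 4006)
print(f"z statistic: {z:.2f}")
print(f"p-value: {p:.3f}")
```

The negative sign only reflects the order control minus exposed. The exposed group has the higher rate.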


Since our p-value (0.035) is less than 0.05, we reject the null hypothesis. There is a significant difference in ad success between the control group, which saw the dummy ad, and the exposed group, which saw the new ad.

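That means users in the exposed group were more likely to click yes after viewing the new ad. The observed success rate is about 7.7% for the exposed group versus 6.5% for the control group. That is a gap of roughly 1.2 percentage points, or a relative lift of about 19%. The two per-group 95% confidence intervals overlap slightly, so they do not show the difference on their own; the evidence for it comes from the z test.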
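Because the per-group intervals overlap, a more direct way to size the effect is a confidence interval for the difference itself. The sketch below uses a simple Wald interval (unpooled normal approximation). This technique was swapped in for illustration and is not computed in the original notebook. `diff_confint` is a hypothetical helper, and the counts are the ones printed above.

```python
from math import sqrt
from statistics import NormalDist

def diff_confint(x1, n1, x2, n2, alpha=0.05):
    """Hypothetical helper: Wald (unpooled) confidence interval for p2 - p1."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    diff = p2 - p1
    return diff, diff - z_crit * se, diff + z_crit * se

# Control (264 of 4071) vs exposed (308 of 4006)
diff, low, high = diff_confint(264, 4071, 308, 4006)
print(f"exposed - control: {diff:.4f}  95% CI: [{low:.4f}, {high:.4f}]")
print(f"relative lift: {diff / (264 / 4071):.1%}")
```

With these counts the interval for the difference sits just above zero, which agrees with the z test. The interval is also wide relative to the point estimate of about 1.2 percentage points.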

## Z Test for Users Who Answered Yes or No at the End of the Ad

Null hypothesis: Among users who answered yes or no, there is no difference in the share of yes answers between the two groups.

Alternative hypothesis: Among users who answered yes or no, there is a difference in the share of yes answers between the two groups.

In [85]:
control_results = abdata.loc[((abdata['experiment'] == 'control') & (abdata['no_response'] == 0))][['success','no_response']]
exp_results = abdata[((abdata['experiment'] == 'exposed')& (abdata['no_response'] == 0))][['success','no_response']]

In [86]:
exp_results.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 657 entries, 2 to 8071
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   success      657 non-null    int64
 1   no_response  657 non-null    int64
dtypes: int64(2)
memory usage: 15.4 KB


In [87]:
n_con = len(control_results)
n_exp = len(exp_results)
nobs = [n_con, n_exp]

In [89]:
successes = [control_results['success'].sum(), exp_results['success'].sum()]
nobs = [n_con, n_exp]
print(nobs)
print(successes)

[586, 657]
[264, 308]


In [90]:
z_stat, pval = proportions_ztest(successes, nobs=nobs)
(lower_con, lower_exp), (upper_con, upper_exp) = proportion_confint(successes, nobs=nobs, alpha=0.05)

print(f'z statistic: {z_stat:.2f}')
print(f'p-value: {pval:.3f}')
print(f'ci 95% for control group: [{lower_con:.3f}, {upper_con:.3f}]')
print(f'ci 95% for treatment group: [{lower_exp:.3f}, {upper_exp:.3f}]')

z statistic: -0.65
p-value: 0.518
ci 95% for control group: [0.410, 0.491]
ci 95% for treatment group: [0.431, 0.507]


Since our p-value (0.518) is greater than 0.05, we fail to reject the null hypothesis. Among users who answered the questionnaire, there is no significant difference in the share of yes answers between the control group (dummy ad) and the exposed group (new ad).

That means that, once the no-response rows are filtered out, respondents in both groups answered yes at about the same rate (roughly 45% vs 47%). The test may also lack power. Only 586 users in the control group and 657 in the exposed group responded with a yes or no, so a small true difference could go undetected.
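To put the sample-size concern in numbers, the sketch below estimates how many respondents per group a two-sided two-proportion z test would need to detect a gap as small as the one observed. It uses a standard normal-approximation formula and assumes 80% power and equal group sizes. It also assumes the observed yes-rates are the true rates, which is a strong assumption. `n_per_group` is a hypothetical helper, not a statsmodels function.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Hypothetical helper: approximate per-group sample size for a two-sided
    two-proportion z test (normal approximation, unpooled variance)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for the test
    z_b = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * variance / (p1 - p2) ** 2)

# Observed yes-rates among respondents: control, exposed
n = n_per_group(264 / 586, 308 / 657)
print(f"respondents needed per group: {n}")
```

Under these assumptions, the estimate is more than ten thousand respondents per group. That is far above the 586 and 657 available, which is consistent with the test being underpowered for an effect of this size.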

## Z Test for No Responses

Null hypothesis: There is no difference in ad disengagement rate (the share of users who gave no response) between the two groups.

Alternative hypothesis: There is a difference in ad disengagement rate between the two groups.

In [76]:
control_results = abdata.loc[((abdata['experiment'] == 'control'))][['success','no_response']]
exp_results = abdata[((abdata['experiment'] == 'exposed'))][['success','no_response']]

In [79]:
n_con = len(control_results)
n_exp = len(exp_results)
nobs = [n_con, n_exp]

# In this test a "success" is a disengaged user (no_response == 1)
successes = [control_results['no_response'].sum(), exp_results['no_response'].sum()]
print(nobs)
print(successes)

[4071, 4006]
[3485, 3349]
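As a consistency check across the tests, the respondent counts used in the previous section should equal the total users minus the non-responders printed here. The short sketch below does that arithmetic with the printed numbers only.

```python
# Totals and no-response counts printed by the cell above
totals = {"control": 4071, "exposed": 4006}
no_response = {"control": 3485, "exposed": 3349}

# Respondents = users who answered yes or no
respondents = {g: totals[g] - no_response[g] for g in totals}
print(respondents)
```

The result matches the 586 and 657 respondents used in the yes-or-no test, so the filtered subsets and the no-response counts agree.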


In [80]:
z_stat, pval = proportions_ztest(successes, nobs=nobs)
(lower_con, lower_exp), (upper_con, upper_exp) = proportion_confint(successes, nobs=nobs, alpha=0.05)

print(f'z statistic: {z_stat:.2f}')
print(f'p-value: {pval:.3f}')
print(f'ci 95% for control group: [{lower_con:.3f}, {upper_con:.3f}]')
print(f'ci 95% for treatment group: [{lower_exp:.3f}, {upper_exp:.3f}]')

z statistic: 2.50
p-value: 0.012
ci 95% for control group: [0.845, 0.867]
ci 95% for treatment group: [0.825, 0.847]


Since our p-value (0.012) is less than 0.05, we reject the null hypothesis. There is a significant difference in ad disengagement rate between the control group (dummy ad) and the exposed group (new ad).

That means users in the exposed group were less likely to leave the ad without answering yes or no. The difference is slight. About 83.6% of exposed users gave no response versus 85.6% of control users, a gap of roughly 2 percentage points. The per-group 95% confidence intervals overlap slightly at their edges, so the significance comes from the z test rather than from comparing the intervals.
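Because a "2%" difference can mean either an absolute or a relative change, the sketch below computes both from the printed counts. It uses only built-in arithmetic, and the variable names are illustrative.

```python
# No-response counts and group sizes printed above
control_rate = 3485 / 4071
exposed_rate = 3349 / 4006

abs_diff = control_rate - exposed_rate   # gap in percentage points (as a fraction)
rel_diff = abs_diff / control_rate       # reduction relative to the control rate

print(f"control: {control_rate:.1%}, exposed: {exposed_rate:.1%}")
print(f"absolute gap: {abs_diff * 100:.1f} percentage points")
print(f"relative reduction: {rel_diff:.1%}")
```

Here the absolute gap is about 2.0 percentage points and the relative reduction is about 2.3%. The two readings are close only because the disengagement rate itself is high.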