## Modeling

### Z Test for the sample that clicked on yes vs no/no response

In this section, we run a z-test on each metric and use its p-value to decide whether to reject or fail to reject the null hypothesis. All tests use a 5% significance level, and all confidence intervals are 95% intervals.

Null hypothesis: There is no significant difference in ad success rate between the two groups.

Alt hypothesis: There is a significant difference in ad success rate between the two groups.
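
The test used throughout this section is a two-sided, pooled two-proportion z-test. As a minimal sketch of what that computes, the snippet below works the statistic out by hand using only the standard library. The function name `two_proportion_ztest` is illustrative, and the counts are the ones printed by the first test cell in this section.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(successes, nobs):
    """Two-sided pooled z-test for the difference between two proportions."""
    (x1, x2), (n1, n2) = successes, nobs
    p1, p2 = x1 / n1, x2 / n2
    # Under the null both groups share one rate, so pool the counts.
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    pval = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, pval


# Counts from the yes vs no/no response test: [control, exposed]
z_stat, pval = two_proportion_ztest(successes=[264, 308], nobs=[4071, 4006])
print(f"z = {z_stat:.3f}, p-value = {pval:.3f}")
```

A negative z statistic here only reflects the order of the groups (control minus exposed); the two-sided p-value does not depend on that order.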



In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
import plotly
import numpy as np

from scipy import stats

In [2]:
abdata = pd.read_csv('../Data/abtesting_cleaned.csv')

outcome_table = pd.DataFrame(columns =['Test', 'P value', 'Results'] )
pd.set_option("display.max_colwidth", None)

In [3]:
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

control_results = abdata[abdata['experiment'] == 'control']['success']
exp_results = abdata[abdata['experiment'] == 'exposed']['success']

n_con = control_results.count()
n_exp = exp_results.count()

successes = [control_results.sum(), exp_results.sum()]
nobs = [n_con, n_exp]
print(nobs)
print(successes)

z_stat, pval = proportions_ztest(successes, nobs=nobs)
(lower_con, lower_exp), (upper_con, upper_exp) = proportion_confint(successes, nobs=nobs, alpha=0.05)

print(f'p-value: {pval:.3f}')
print(f'ci 95% for control group: [{lower_con:.3f}, {upper_con:.3f}]')
print(f'ci 95% for treatment group: [{lower_exp:.3f}, {upper_exp:.3f}]')

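# DataFrame.append was removed in pandas 2.0, so build a one-row frame and concatenate it
new_row = pd.DataFrame([{'Test': 'Yes vs no/no response', 'P value': pval, 'Results':
                      'Reject the null hypothesis. There is a significant difference in ad success between the control group and the exposed group.'}])
outcome_table = pd.concat([outcome_table, new_row], ignore_index=True)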

[4071, 4006]
[264, 308]
p-value: 0.035
ci 95% for control group: [0.057, 0.072]
ci 95% for treatment group: [0.069, 0.085]


Since our p-value is less than 0.05, we reject the null hypothesis. There is a significant difference in ad success between the control group, which saw the dummy ad, and the exposed group, which saw the new ad.

That means users in the exposed group were more likely to click yes after viewing the new ad. The observed ad success rate is about 1.2 percentage points higher for the treatment group (roughly 7.7% versus 6.5%), although the two 95% confidence intervals overlap slightly.
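
Because the conclusion rests on how far apart the two rates are, it can help to look at the difference directly rather than at the two separate intervals. The following is a minimal sketch, not part of the original analysis, that computes the absolute difference, the relative lift, and a Wald 95% confidence interval for the difference from the counts printed above. The helper name `diff_confint` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_confint(x_con, n_con, x_exp, n_exp, alpha=0.05):
    """Wald confidence interval for (exposed rate - control rate)."""
    p_con, p_exp = x_con / n_con, x_exp / n_exp
    diff = p_exp - p_con
    # Unpooled standard error, since the interval does not assume equal rates.
    se = sqrt(p_con * (1 - p_con) / n_con + p_exp * (1 - p_exp) / n_exp)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, (diff - z_crit * se, diff + z_crit * se)


diff, (lower, upper) = diff_confint(264, 4071, 308, 4006)
relative_lift = diff / (264 / 4071)
print(f"absolute difference: {diff:.4f}")
print(f"relative lift: {relative_lift:.1%}")
print(f"95% CI for the difference: [{lower:.4f}, {upper:.4f}]")
```

An interval for the difference that excludes zero is consistent with rejecting the null at the 5% level, even when the two per-group intervals overlap.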

### Z Test for the sample that clicked on yes or no at the end of the ad

Null hypothesis: There is no significant difference in questionnaire engagement rate (the share of yes clicks among users who answered yes or no) between the two groups.

Alt hypothesis: There is a significant difference in questionnaire engagement rate between the two groups.

In [4]:
control_results = abdata.loc[((abdata['experiment'] == 'control') & (abdata['no_response'] == 0))][['success','no_response']]
exp_results = abdata[((abdata['experiment'] == 'exposed')& (abdata['no_response'] == 0))][['success','no_response']]

In [5]:
n_con = len(control_results)
n_exp = len(exp_results)
nobs = [n_con, n_exp]

In [6]:
successes = [control_results['success'].sum(), exp_results['success'].sum()]
nobs = [n_con, n_exp]
print(nobs)
print(successes)

[586, 657]
[264, 308]


In [7]:
z_stat, pval = proportions_ztest(successes, nobs=nobs)
(lower_con, lower_exp), (upper_con, upper_exp) = proportion_confint(successes, nobs=nobs, alpha=0.05)

print(f'p-value: {pval:.3f}')
print(f'ci 95% for control group: [{lower_con:.3f}, {upper_con:.3f}]')
print(f'ci 95% for treatment group: [{lower_exp:.3f}, {upper_exp:.3f}]')
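# DataFrame.append was removed in pandas 2.0, so build a one-row frame and concatenate it
new_row = pd.DataFrame([{'Test': 'Yes vs no', 'P value': pval, 'Results':
                      'Fail to reject the null hypothesis. There is no significant difference in questionnaire engagement between the control group and the exposed group.'}])
outcome_table = pd.concat([outcome_table, new_row], ignore_index=True)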

p-value: 0.518
ci 95% for control group: [0.410, 0.491]
ci 95% for treatment group: [0.431, 0.507]


Since our p-value is greater than 0.05, we fail to reject the null hypothesis. There is no significant difference in questionnaire engagement between the control group, which saw the dummy ad, and the exposed group, which saw the new ad.

That means that, after filtering out the no-response rows, the proportion of respondents who clicked yes is about the same in both groups. This could be due to the small number of users who clicked yes or no: we have only 586 such observations in the control group and 657 in the exposed group.
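
To put a rough number on the sample-size concern, the sketch below estimates the power of a two-sided test at the 5% level to detect a difference as small as the one observed, using a normal approximation. This is an illustrative, post-hoc calculation that assumes the observed rates are the true rates; the function name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_con, n_con, p_exp, n_exp, alpha=0.05):
    """Normal-approximation power of a two-sided two-proportion z-test."""
    norm = NormalDist()
    se = sqrt(p_con * (1 - p_con) / n_con + p_exp * (1 - p_exp) / n_exp)
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = abs(p_exp - p_con) / se
    # Probability that the z statistic lands in either rejection region.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


# Yes rate among users who answered: control 264/586, exposed 308/657
power = approx_power(264 / 586, 586, 308 / 657, 657)
print(f"approximate power: {power:.2f}")
```

A low value here would suggest that a difference of this size is hard to detect with roughly 600 users per group, which fits the non-significant result.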

### Z Test for No Responses

Null hypothesis: There is no significant difference in ad disengagement rate between the two groups.

Alt hypothesis: There is a significant difference in ad disengagement rate between the two groups.

In [8]:
control_results = abdata.loc[((abdata['experiment'] == 'control'))][['success','no_response']]
exp_results = abdata[((abdata['experiment'] == 'exposed'))][['success','no_response']]

In [9]:
n_con = len(control_results)
n_exp = len(exp_results)
nobs = [n_con, n_exp]

successes = [control_results['no_response'].sum(), exp_results['no_response'].sum()]
print(nobs)
print(successes)

[4071, 4006]
[3485, 3349]


In [10]:
z_stat, pval = proportions_ztest(successes, nobs=nobs)
(lower_con, lower_exp), (upper_con, upper_exp) = proportion_confint(successes, nobs=nobs, alpha=0.05)

print(f'p-value: {pval:.3f}')
print(f'ci 95% for control group: [{lower_con:.3f}, {upper_con:.3f}]')
print(f'ci 95% for treatment group: [{lower_exp:.3f}, {upper_exp:.3f}]')
# DataFrame.append was removed in pandas 2.0, so build a one-row frame and concatenate it
new_row = pd.DataFrame([{'Test': 'no response vs response', 'P value': pval, 'Results':
                      'Reject the null hypothesis. There is a significant difference in questionnaire disengagement between the control group and the exposed group.'}])
outcome_table = pd.concat([outcome_table, new_row], ignore_index=True)

p-value: 0.012
ci 95% for control group: [0.845, 0.867]
ci 95% for treatment group: [0.825, 0.847]


Since our p-value is less than 0.05, we reject the null hypothesis. There is a significant difference in questionnaire disengagement rate between the control group, which saw the dummy ad, and the exposed group, which saw the new ad.

That means users in the exposed group were less likely to click away from the ad without answering, and more likely to answer either yes or no at the end of the ad. The difference is slight: the observed no-response rate is about 2 percentage points lower for the treatment group (83.6% versus 85.6%).

## Outcome Table

In [12]:
outcome_table.set_index('Test')

| Test | P value | Results |
| --- | --- | --- |
| Yes vs no/no response | 0.035006 | Reject the null hypothesis. There is a significant difference in ad success between the control group and the exposed group. |
| Yes vs no | 0.518486 | Fail to reject the null hypothesis. There is no significant difference in questionnaire engagement between the control group and the exposed group. |
| no response vs response | 0.012495 | Reject the null hypothesis. There is a significant difference in questionnaire disengagement between the control group and the exposed group. |


## Insights and Recommendations

Insights
* We ran the z-test on several different definitions of 'success'.
* The yes vs no/no response test shows a significant difference between the control and exposed groups. The new ad increased the share of customers who clicked yes on the questionnaire.
* The ad success rate is 6.48% for the control group and 7.69% for the exposed group.
* The test restricted to yes vs no responses shows no significant difference.
* The no-response test shows a significant difference between the control and exposed groups. The exposed group has a lower no-response rate than the control group.
* The no-response rate is 85.61% for the control group and 83.60% for the exposed group.
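
The rates quoted above follow directly from the counts printed in the test cells. A short sketch that reproduces them (the dictionary layout is only for illustration):

```python
# Counts printed by the test cells: (control, exposed)
nobs = (4071, 4006)
counts = {
    "ad success (yes)": (264, 308),
    "no response": (3485, 3349),
}

# Rate per group = count / number of users in that group
rates = {
    metric: tuple(x / n for x, n in zip(xs, nobs))
    for metric, xs in counts.items()
}

for metric, (con, exp) in rates.items():
    print(f"{metric}: control {con:.2%}, exposed {exp:.2%}")
```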


Recommendations
* When success is measured as yes vs no/no response, or as the no-response rate, the data suggest that the new ad gets more people to engage with the questionnaire.
* In the yes vs no test, however, we saw no statistically significant difference between the groups.
* Taken together, this suggests that the new ad is better at drawing a response from customers who normally would not respond.
* Therefore, I recommend that the business use the new ad, as it has been shown to increase customer engagement with the questionnaire.
