# Hypothesis Testing

We use a two-proportion z-test for proportions (completion rates) and Welch's t-test for continuous variables (client age and tenure).

## Completion Rate

Null Hypothesis: The completion rate for the Test group (new design) is equal to the completion rate for the Control group (old design).

p_test = p_control

Alternative Hypothesis: The completion rates of the two groups differ.

p_test != p_control

In [44]:
import numpy as np
import statsmodels.api as sm
import pandas as pd
import scipy.stats as st

In [12]:
# Read the time-spent CSV file
time_df = pd.read_csv('../data/clean/time_spent.csv')
time_df.head()

| Unnamed: 0.1 | Unnamed: 0 | client_id | visitor_id | visit_id | from_step | to_step | time_spent | is_error |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 169 | 201385055_71273495308 | 749567106_99161211863_557568 | start | step_1 | 0 days 00:00:09 | False |
| 1 | 1 | 169 | 201385055_71273495308 | 749567106_99161211863_557568 | step_1 | step_2 | 0 days 00:00:46 | False |
| 2 | 2 | 169 | 201385055_71273495308 | 749567106_99161211863_557568 | step_2 | step_3 | 0 days 00:01:34 | False |
| 3 | 3 | 169 | 201385055_71273495308 | 749567106_99161211863_557568 | step_3 | confirm | 0 days 00:01:04 | False |
| 4 | 4 | 546 | 475037402_89828530214 | 731811517_9330176838_94847 | start | step_1 | 0 days 00:00:10 | False |


In [25]:
completed = np.array([18280, 15205]) # test, control
total = np.array([28570, 23793]) # test, control

completion_rate = completed / total
completion_rate

array([0.63983199, 0.6390535 ])
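Before running the test, it can help to attach an uncertainty to each rate. The following is a minimal sketch that assumes the normal-approximation (Wald) interval is adequate, which is reasonable at these sample sizes. `wald_interval` is a hypothetical helper and is not part of the notebook. The value 1.959964 is the two-sided 95% standard-normal quantile.

```python
import math


def wald_interval(successes, trials, z=1.959964):
    """Hypothetical helper: normal-approximation (Wald) CI for a proportion."""
    p = successes / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)  # z * standard error
    return p - half_width, p + half_width


# Counts from the cell above: test first, control second.
for label, k, n in [("test", 18280, 28570), ("control", 15205, 23793)]:
    low, high = wald_interval(k, n)
    print(f"{label}: {k / n:.4f} (95% CI {low:.4f} to {high:.4f})")
```

Heavily overlapping intervals hint that any difference is small. The z-test remains the formal comparison.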

In [26]:
# Two-proportion z-test (statsmodels), H0: p_test == p_control
z_stat, p_value = sm.stats.proportions_ztest(completed, total)

print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.4f}")

alpha = 0.05  # Significance level
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference in completion rates.")
else:
    print("Fail to reject the null hypothesis: No significant difference in completion rates.")

Z-statistic: 0.1847
P-value: 0.8534
Fail to reject the null hypothesis: No significant difference in completion rates.
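The pooled two-proportion z statistic behind this test can also be written out by hand. Below is a minimal sketch that uses only the standard library. It assumes the pooled-variance form, which is what `proportions_ztest` uses for two samples with its default settings. `two_proportion_ztest` is a hypothetical helper, not a statsmodels function.

```python
import math


def two_proportion_ztest(k1, n1, k2, n2):
    """Hypothetical helper: pooled two-proportion z-test with a two-sided p-value."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)  # common rate under H0: p1 == p2
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


z, p = two_proportion_ztest(18280, 28570, 15205, 23793)  # test, control
print(f"z = {z:.4f}, p = {p:.4f}")
```

Writing the formula out makes clear that the standard error uses the pooled rate, because under the null hypothesis both groups share a single completion rate.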


# Completion Rate with a Cost-Effectiveness Threshold

Null Hypothesis: The completion rate for the Test group is greater than or equal to 1.05 times the completion rate for the Control group. This is a 5% relative uplift, the cost-effectiveness threshold.

p_test >= p_control * 1.05

Alternative Hypothesis: The completion rate for the Test group is lower than 1.05 times the completion rate for the Control group.

p_test < p_control * 1.05

In [28]:
completed = np.array([18280, 15205]) # test, control
total = np.array([28570, 23793]) # test, control

# Scale the control completions by 1.05 so the test compares p_test with 1.05 * p_control
completed[1] = round(completed[1] * 1.05)
completion_rate = completed / total
completion_rate

array([0.63983199, 0.67099567])

In [37]:
# One-sided two-proportion z-test; alternative='smaller' means H1: p_test < adjusted p_control
z_stat, p_value = sm.stats.proportions_ztest(completed, total, alternative='smaller')

# Print results
print("Z-statistic:", z_stat)
print("P-value:", p_value)

alpha = 0.05  # Significance level
if p_value < alpha:
    print("Reject the null hypothesis: The completion rate for the test group is lower than the completion rate for the control group increased by 5%.")
else:
    print("Fail to reject the null hypothesis: The test group's completion rate is consistent with meeting the 5% uplift threshold.")

Z-statistic: -7.464264105798114
P-value: 4.188321807812895e-14
Reject the null hypothesis: The completion rate for the test group is lower than the completion rate for the control group increased by 5%.
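Inflating the control count by 5% and then running the ordinary two-sample test treats the scaled count as though it had been observed. As an alternative, the margin can be tested directly with a one-sided Wald statistic for p_test - 1.05 * p_control, using unpooled variances. This is a swapped-in technique, not what the cell above does. `ratio_margin_ztest` is a hypothetical helper.

```python
import math


def ratio_margin_ztest(k_test, n_test, k_ctrl, n_ctrl, ratio=1.05):
    """Hypothetical helper: one-sided Wald test of
    H0: p_test >= ratio * p_ctrl  vs  H1: p_test < ratio * p_ctrl."""
    p_t, p_c = k_test / n_test, k_ctrl / n_ctrl
    diff = p_t - ratio * p_c
    # Unpooled variance of p_t - ratio * p_c; the two samples are independent.
    se = math.sqrt(p_t * (1 - p_t) / n_test + ratio**2 * p_c * (1 - p_c) / n_ctrl)
    z = diff / se
    p_value = 0.5 * math.erfc(-z / math.sqrt(2))  # lower-tail probability Phi(z)
    return z, p_value


z, p = ratio_margin_ztest(18280, 28570, 15205, 23793)  # unscaled counts
print(f"z = {z:.3f}, one-sided p = {p:.2e}")
```

On these counts the statistic also falls far below alpha. The two approaches differ mainly in how they estimate the standard error of the gap.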


# Age

Null Hypothesis: The mean age of clients in the Test group (new process) equals the mean age of clients in the Control group (old process).

Alternative Hypothesis: The mean ages of the two groups differ.

In [41]:
test = pd.read_csv('../data/clean/client_id_test.csv')
control = pd.read_csv('../data/clean/client_id_control.csv')
info = pd.read_csv('../data/clean/total_client_info.csv')
info

| Unnamed: 0 | client_id | Variation | client_tenure_year | client_tenure_month | client_age | gender | account_number | balance | calls_6_months | logons_6_months |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9988021 | Test | 5.0 | 64.0 | 79.0 | U | 2.0 | 189023.86 | 1.0 | 4.0 |
| 1 | 8320017 | Test | 22.0 | 274.0 | 34.5 | M | 2.0 | 36001.90 | 5.0 | 8.0 |
| 2 | 4033851 | Control | 12.0 | 149.0 | 63.5 | M | 2.0 | 142642.26 | 5.0 | 8.0 |
| 3 | 1982004 | Test | 6.0 | 80.0 | 44.5 | U | 2.0 | 30231.76 | 1.0 | 4.0 |
| 4 | 9294070 | Control | 5.0 | 70.0 | 29.0 | U | 2.0 | 34254.54 | 0.0 | 3.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 50482 | 393005 | Control | 15.0 | 191.0 | 52.5 | M | 2.0 | 60344.67 | 1.0 | 4.0 |
| 50483 | 2908510 | Control | 21.0 | 252.0 | 34.0 | M | 3.0 | 141808.05 | 6.0 | 9.0 |
| 50484 | 7230446 | Test | 6.0 | 74.0 | 62.0 | M | 2.0 | 58778.11 | 2.0 | 5.0 |
| 50485 | 5230357 | Test | 23.0 | 278.0 | 30.5 | M | 2.0 | 61349.70 | 0.0 | 3.0 |


## Mean Ages

In [56]:
info_test = round(info[info['Variation'] == 'Test']['client_age'].mean(), 3)
info_control = round(info[info['Variation'] == 'Control']['client_age'].mean(), 3)
print('Mean client age test group:', info_test)
print('Mean client age control group:', info_control)

Mean client age test group: 47.164
Mean client age control group: 47.498


In [49]:
info_test = info[info['Variation'] == 'Test']['client_age']
info_control = info[info['Variation'] == 'Control']['client_age']
t_stat, p_value = st.ttest_ind(info_test, info_control, alternative = 'two-sided', equal_var = False)

print('t_stat, p_value:', round(t_stat, 2), round(p_value, 5))
if p_value < alpha:
    print('We reject the null hypothesis: Clients in the control group are older on average.')
else: 
    print('We fail to reject the null hypothesis.')

t_stat, p_value: -2.42 0.01569
We reject the null hypothesis: Clients in the control group are older on average.
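With roughly 50,000 clients, a gap of about a third of a year in mean age (47.498 vs 47.164) is enough to reach significance. It is therefore worth pairing the p-value with an effect size. The sketch below uses only the standard library. It reproduces the Welch statistic that `ttest_ind(..., equal_var=False)` is based on and adds Cohen's d. `welch_t` and `cohens_d` are hypothetical helpers, and the toy lists stand in for `info_test.tolist()` and `info_control.tolist()`.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Hypothetical helper: Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Hypothetical helper: standardized mean difference using the pooled sample SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Toy data only; in the notebook you would pass the two client_age series as lists.
a = [1, 2, 3, 4, 5]
b = [2, 3, 4, 5, 6]
print(welch_t(a, b))  # (-1.0, 8.0)
print(cohens_d(a, b))
```

A |d| far below the conventional 0.2 "small" benchmark would suggest that the age difference, while statistically detectable, is of little practical importance.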


# Tenure

## Mean Tenure

In [57]:
info_test = round(info[info['Variation'] == 'Test']['client_tenure_year'].mean(), 3)
info_control = round(info[info['Variation'] == 'Control']['client_tenure_year'].mean(), 3)
print('Mean client tenure test group:', info_test)
print('Mean client tenure control group:', info_control)

Mean client tenure test group: 11.983
Mean client tenure control group: 12.088


In [53]:
info_test = info[info['Variation'] == 'Test']['client_tenure_year']
info_control = info[info['Variation'] == 'Control']['client_tenure_year']
t_stat, p_value = st.ttest_ind(info_test, info_control, alternative = 'two-sided', equal_var = False)

print('t_stat, p_value:', round(t_stat, 2), round(p_value, 5))
if p_value < alpha:
    print('We reject the null hypothesis: Clients in the control group have been with Vanguard for more years.')
else: 
    print('We fail to reject the null hypothesis.')

t_stat, p_value: -1.71 0.08647
We fail to reject the null hypothesis.
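A confidence interval for the difference in means conveys the size of the tenure gap rather than just a yes/no verdict. The following is a minimal sketch that assumes a large-sample normal approximation with unequal variances, which at these group sizes should closely track the Welch test. `diff_in_means_ci` is a hypothetical helper, and the toy lists stand in for the two `client_tenure_year` series.

```python
import math
from statistics import NormalDist, mean, variance


def diff_in_means_ci(a, b, level=0.95):
    """Hypothetical helper: normal-approximation CI for mean(a) - mean(b), unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    diff = mean(a) - mean(b)
    return diff - z * se, diff + z * se


# Toy data only; in the notebook you would pass the two tenure series as lists.
low, high = diff_in_means_ci([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(f"95% CI for the difference: {low:.3f} to {high:.3f}")
```

An interval that contains zero matches a failure to reject at the corresponding alpha. With a two-sided p-value of 0.086 for tenure, the 95% interval would be expected to include zero.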
