<h1 align="center">A/B Testing To Measure Drug Effectiveness</h1>

In this notebook, we show how to conduct an A/B test using a two-sample Z-test.
Drug A, already on the market, relieves a headache in X hours. The same pharmaceutical company is developing a new drug B, and its hypothesis is that B relieves a headache in fewer hours (i.e., the recovery time for drug B is shorter than for drug A). We conducted a clinical trial, and the results are available in the CSV files loaded below.

In [47]:
import pandas as pd
import numpy as np
from scipy import stats as st

In [39]:
control = pd.read_csv("control_group.csv")
control.head()

| Unnamed: 0 | person_id | recovery_time_hrs |
|---|---|---|
| 0 | C001 | 4.4 |
| 1 | C002 | 4.1 |
| 2 | C003 | 3.2 |
| 3 | C004 | 4.3 |
| 4 | C005 | 4.9 |


In [40]:
test = pd.read_csv("test_group.csv")
test.head()

| Unnamed: 0 | person_id | recovery_time_hrs |
|---|---|---|
| 0 | T001 | 3.7 |
| 1 | T002 | 4.1 |
| 2 | T003 | 3.7 |
| 3 | T004 | 3.9 |
| 4 | T005 | 3.7 |


In [41]:
control_mean = control.recovery_time_hrs.mean().round(2)
control_std = control.recovery_time_hrs.std().round(2)
control_size = control.shape[0]
control_mean, control_std, control_size

(4.0, 0.48, 130)

In [42]:
test_mean = test.recovery_time_hrs.mean().round(2)
test_std = test.recovery_time_hrs.std().round(2)
test_size = test.shape[0] 
test_mean, test_std, test_size

(3.91, 0.32, 100)

### Hypothesis

1. Null Hypothesis (H0): No improvement in mean recovery time in the test group (mean_control <= mean_test)
2. Alternative Hypothesis (H1): Improvement in mean recovery time in the test group (mean_control > mean_test)

This is a right-tailed test: the test statistic is built from the difference mean_control - mean_test, so large positive values (a test-group mean well below the control-group mean) count as evidence against H0.
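
The hypotheses and the statistic used to test them can be written compactly. The notation here is introduced for illustration and is not the notebook's own: $\mu_C$ and $\mu_T$ are the population mean recovery times for the control and test groups, and $\bar{x}$, $s$, and $n$ are the sample mean, sample standard deviation, and sample size of each group.

$$
H_0: \mu_C \le \mu_T \qquad\qquad H_1: \mu_C > \mu_T
$$

$$
Z = \frac{\bar{x}_C - \bar{x}_T}{\sqrt{\dfrac{s_C^2}{n_C} + \dfrac{s_T^2}{n_T}}}
$$

For large samples, $Z$ is approximately standard normal at the boundary case $\mu_C = \mu_T$, so H0 is rejected at significance level $\alpha$ when $Z$ exceeds the upper $\alpha$ quantile of the standard normal distribution.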


### Test Using Rejection Region (i.e. Critical Z Value)

In [59]:
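# Squared standard error of each group's sample mean
a = (control_std**2/control_size)
b = (test_std**2/test_size)

# Z statistic: difference in means divided by its standard error
Z_score = (control_mean-test_mean)/np.sqrt(a+b)
Z_score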

1.701962671923127
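
The same calculation can be wrapped in a small reusable function. This is a minimal sketch: the name `two_sample_z` and its parameters are hypothetical, and the inputs are the rounded summary statistics printed earlier in the notebook.

```python
import math

def two_sample_z(mean_1, std_1, n_1, mean_2, std_2, n_2):
    """Z statistic for the difference mean_1 - mean_2 of two independent samples.

    Hypothetical helper: takes summary statistics rather than raw data.
    """
    # Standard error of the difference between the two sample means
    standard_error = math.sqrt(std_1**2 / n_1 + std_2**2 / n_2)
    return (mean_1 - mean_2) / standard_error

# Summary statistics reported above: control = (4.0, 0.48, 130), test = (3.91, 0.32, 100)
z = two_sample_z(4.0, 0.48, 130, 3.91, 0.32, 100)
print(round(z, 3))  # 1.702
```

Passing the control group first keeps the sign convention of the notebook, where a positive Z means the test group recovered faster.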

In [60]:
# For a significance level of 5% (0.05) in a right-tailed test, the critical Z-value is approximately 1.645

alpha = 0.05 # significance level of 5%

critical_z_value = st.norm.ppf(1 - alpha)  # Right-tailed test at 5% significance level
critical_z_value

1.6448536269514722

In [55]:
Z_score > critical_z_value

True

As shown above, the Z-score is higher than the critical Z-value, so it falls inside the rejection region. We therefore reject the null hypothesis at the 5% significance level in favor of the alternative hypothesis that the new drug reduces the recovery time for a headache.
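
The decision depends on the chosen significance level. The sketch below repeats the rejection-region check for several values of alpha. It uses the standard library's `statistics.NormalDist` in place of `scipy.stats.norm` so that the block runs on its own; the function name `right_tailed_decision` is hypothetical.

```python
from statistics import NormalDist

def right_tailed_decision(z_score, alpha):
    """Return (reject_h0, critical_value) for a right-tailed Z-test."""
    # Upper-alpha quantile of the standard normal distribution
    critical_value = NormalDist().inv_cdf(1 - alpha)
    return z_score > critical_value, critical_value

z_score = 1.701962671923127  # Z-score computed earlier in the notebook

decisions = {}
for alpha in (0.10, 0.05, 0.01):
    reject, critical = right_tailed_decision(z_score, alpha)
    decisions[alpha] = reject
    print(f"alpha={alpha:.2f}  critical_z={critical:.3f}  reject_H0={reject}")
```

With this Z-score, H0 is rejected at the 10% and 5% levels but not at the 1% level, so the conclusion above holds only for the 5% threshold chosen in this notebook.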

### Test Using p-Value

In [56]:
# Calculate the p-value corresponding to z score for a right-tailed test
p_value = 1 - st.norm.cdf(Z_score)
p_value

0.0443811829929599

In [62]:
p_value < alpha # p-value is below the 5% significance level (alpha = 0.05)

True

As shown above, the calculated p-value is less than the 5% significance level. We therefore reject the null hypothesis in favor of the alternative hypothesis that the new drug reduces the recovery time for a headache.
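
The rejection-region test and the p-value test always lead to the same decision, because `p_value < alpha` holds exactly when the Z-score exceeds the critical value. The sketch below checks this for the notebook's numbers. It again uses the standard library's `statistics.NormalDist` rather than SciPy so that it is self-contained; the helper name `right_tailed_p_value` is hypothetical.

```python
from statistics import NormalDist

def right_tailed_p_value(z_score):
    """Upper-tail probability P(Z >= z_score) for a standard normal Z."""
    return 1.0 - NormalDist().cdf(z_score)

alpha = 0.05
z_score = 1.701962671923127  # Z-score computed earlier in the notebook

p_value = right_tailed_p_value(z_score)
critical_z_value = NormalDist().inv_cdf(1 - alpha)

print(round(p_value, 4))  # 0.0444

# Both approaches should agree on whether to reject H0
same_decision = (p_value < alpha) == (z_score > critical_z_value)
print(same_decision)  # True
```

In the notebook's own SciPy setup, `st.norm.sf(Z_score)` computes the same upper-tail probability directly.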