
### **What is a Standard Normal Distribution?**

#### Definition

The standard normal distribution is a special case of the normal distribution. 
It is the distribution that occurs when a normal random variable has a mean of 
zero and a standard deviation of 1.

### **Standard Score (Z-Score)**

The normal random variable of a standard normal distribution is called a standard score or z-score.

Every normal random variable can be transformed into a z-score using the following formula:

$z = (X-\mu)/\sigma$


Here $X$ is the normal random variable, $\mu$ is the mean of $X$, and $\sigma$ is the standard deviation of $X$.
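
As a minimal sketch of the transformation above, the snippet below standardises a single value. The function name `z_score` and the example values (a mean of 100 and a standard deviation of 15) are hypothetical and chosen only for illustration.

```python
def z_score(x, mu, sigma):
    """Standardise x given the mean mu and standard deviation sigma of X."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical example: X has mean 100 and standard deviation 15.
z_example = z_score(130, mu=100, sigma=15)
print(z_example)  # 2.0
```

A z-score of 2.0 means the value lies two standard deviations above the mean.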

### **2 Sample Z-Test of Proportions**

A 2 sample Z-test of proportions compares the proportions observed in two independent samples to determine whether the difference between them is statistically significant.

#### Sample Question - Manual Calculation

A pharmaceutical study is testing two new treatments for the flu, drug A and drug B.

Drug A works on 41 people out of a sample of 195. Drug B works on 351 people out of a sample of 605. Do the two drugs work on the same proportion of people?

### **Step 1 - Define Null and Alternative Hypothesis**

$H_0$: The proportions of people that Drug A and Drug B work on are the same.

$H_A$: The proportions of people that Drug A and Drug B work on are not the same.

Test at the significance level of 0.05

$\alpha$ = 0.05

### **Step 2 - Find the two proportions & overall sample proportion**

In [0]:
drug_a_sample_size = 195
drug_a_works = 41

drug_b_sample_size = 605
drug_b_works = 351

p1 = drug_a_works/drug_a_sample_size
p2 = drug_b_works/drug_b_sample_size

print('Proportion of people drug A works on, p1: {}'.format(p1))
print('Proportion of people drug b works on, p2: {}'.format(p2))

p = (drug_a_works + drug_b_works)/(drug_a_sample_size + drug_b_sample_size)
print('Overall sample proportion, p: {}'.format(p))

Proportion of people drug A works on, p1: 0.21025641025641026
Proportion of people drug b works on, p2: 0.5801652892561984
Overall sample proportion, p: 0.49


### **Step 3 - Calculate the Test Statistic**

$z\text{-}statistic = \dfrac{(\hat{p}_2 - \hat{p}_1) - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$

In [0]:
import numpy as np

z_statistic = (p2-p1)/(np.sqrt((p*(1-p)*((1/drug_a_sample_size) + (1/drug_b_sample_size)))))

print(z_statistic)

8.985900954503084
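
The calculation in this step can also be wrapped in a small reusable function. The sketch below uses only the standard library; the function name `two_proportion_z` and its argument names are hypothetical, and it assumes the same pooled-proportion formula and the same $\hat{p}_2 - \hat{p}_1$ ordering as above.

```python
import math


def two_proportion_z(successes_1, n_1, successes_2, n_2):
    """Pooled two-sample z-statistic for (p_2 - p_1), as in the formula above."""
    p_1 = successes_1 / n_1
    p_2 = successes_2 / n_2
    # Overall (pooled) proportion across both samples.
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    # Standard error of the difference under the null hypothesis.
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    return (p_2 - p_1) / standard_error


z_check = two_proportion_z(41, 195, 351, 605)
print(round(z_check, 3))  # 8.986
```

Swapping the order of the two samples flips the sign of the statistic but not its magnitude.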


### **Step 4 - Find the Z-Score Value**

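We now need to determine whether the z-statistic falls into the rejection region. Because the alternative hypothesis is two-sided, the significance level is split across both tails, so we look up the critical z-score for $\alpha/2$.

The critical z-score for $0.05/2 = 0.025$ in each tail is 1.96.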

Z table: https://www.statisticshowto.datasciencecentral.com/tables/z-table/
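
Instead of reading the value from a table, the critical z-score can be computed. The sketch below is one way to do it, assuming Python 3.8+ so that `statistics.NormalDist` is available; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
# Two-tailed test: put alpha/2 in each tail, so look up the 1 - alpha/2 quantile
# of the standard normal distribution (mean 0, standard deviation 1).
critical_value = NormalDist().inv_cdf(1 - alpha / 2)
print(round(critical_value, 2))  # 1.96
```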

### **Step 5 - Compare the calculated z-score to the critical value from z-table**

If the absolute value of the calculated z-statistic is greater than the critical z-score from the z-table, then we reject the null hypothesis in favour of the alternative hypothesis.

In [0]:
z_table_score = 1.96

# Two-tailed test: compare the magnitude of the statistic to the critical value
if abs(z_statistic) > z_table_score:
  print('Z statistic > Z table score, reject the null hypothesis')
else:
  print('Z statistic < Z table score, failed to reject the null hypothesis')

Z statistic > Z table score, reject the null hypothesis
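
An equivalent way to reach the same decision is to convert the z-statistic into a two-sided p-value and compare it with $\alpha$. The sketch below is a standard-library illustration, not part of the original calculation; the helper name `two_sided_p_value` is hypothetical, and the z value is copied from the Step 3 output.

```python
import math


def two_sided_p_value(z):
    """Probability that a standard normal variable is at least |z| away from 0."""
    # erfc keeps precision in the far tails, where 1 - cdf(z) would round to 0.
    return math.erfc(abs(z) / math.sqrt(2))


# Sanity check: the critical value 1.96 corresponds to a p-value of about 0.05.
p_at_critical = two_sided_p_value(1.96)
print(round(p_at_critical, 3))  # 0.05

# z-statistic printed in Step 3.
p_observed = two_sided_p_value(8.985900954503084)
print(p_observed < 0.05)  # True
```

A p-value below 0.05 leads to the same conclusion as the critical-value comparison: reject the null hypothesis.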


### **Alternative Method Using statsmodels (Python Library)**

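Rather than doing the above calculation manually, we can use a Python library to perform it for us. The `proportions_ztest` function from statsmodels returns both the z-statistic and the p-value.

Note that `proportions_ztest` takes the difference as the first proportion minus the second, so its statistic has the opposite sign to the manual calculation, which used $\hat{p}_2 - \hat{p}_1$. The magnitude and the two-sided p-value are unaffected.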

In [0]:
from statsmodels.stats.proportion import proportions_ztest

In [0]:
counts = np.array([41, 351])
nobs = np.array([195, 605])
stat, pval = proportions_ztest(counts, nobs)

print('Z Statistic: {0:0.3f}'.format(stat))
print('P-Value: {0:0.3f}'.format(pval))

if pval < 0.05:
  print('Reject the null hypothesis')
else:
  print('Failed to reject the null hypothesis')

Z Statistic: -8.986
P-Value: 0.000
Reject the null hypothesis


### **References**

[0] https://stattrek.com/probability-distributions/standard-normal.aspx?Tutorial=AP  
[1] https://www.statisticshowto.datasciencecentral.com/z-test/  
[2] https://www.statsmodels.org/stable/generated/statsmodels.stats.proportion.proportions_ztest.html  