# Lab | Inferential statistics
### Instructions
It is assumed that the mean systolic blood pressure is μ = 120 mm Hg. In the Honolulu Heart Study, a sample of n = 100 people had an average systolic blood pressure of 130.1 mm Hg with a standard deviation of 21.21 mm Hg. Is the group significantly different (with respect to systolic blood pressure!) from the regular population?

- Set up the hypothesis test.
- Write down all the steps followed for setting up the test.
- Calculate the test statistic by hand and also code it in Python. It should be 4.76190. We will take a look at how to make decisions based on this calculated value.
- If you finished the previous question, please go through the code for principal_component_analysis_example provided in the files_for_lab folder.

**Step 1:** Define the null hypothesis. This is our assumption about the population. It is denoted by H0, and in this case **H0: μ = 120**.

**Step 2:** Define the alternative hypothesis. This states what holds if our assumption is not true. It is denoted by Ha, and in this case **Ha: μ ≠ 120**.

**Step 3:** Determine if it is a one-tailed or a two-tailed test. Because Ha allows a difference in either direction, this is a **two-tailed test**.

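**Step 4:** Choose the test. The sample size is n = 100 and only the sample standard deviation is known, so a t-test with n - 1 = 99 degrees of freedom is the natural choice. Because n is large, a z-test gives nearly the same result, so we can use either a z-test or a t-test.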

**Step 5:** Set the level of significance (α). It defines the rejection (critical) region and is the probability of rejecting the null hypothesis when it is actually true. In this case **α = 0.05**.
 
**Step 6:** Calculate the test statistic based on the given information.
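Worked by hand with the numbers from the problem statement, the calculation looks as follows. The symbols x̄ (sample mean), μ0 (hypothesized mean) and s (sample standard deviation) are notation assumed here for illustration; they do not appear in the lab text.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
  = \frac{130.1 - 120}{21.21 / \sqrt{100}}
  = \frac{10.1}{2.121}
  \approx 4.7619
```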

**Step 7:** Check the table to determine the critical value.
<br> For a z-test, the critical values are fixed by the confidence level (for example, ±1.96 at 95%).
<br> For a t-test, the critical value also depends on the degrees of freedom (df), which is *sample_size - 1*.
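Instead of reading a printed table, the critical values can be looked up with SciPy's inverse CDF (`ppf`). This is a minimal sketch for this lab's setting (α = 0.05, two-tailed, n = 100); the variable names are illustrative.

```python
from scipy.stats import norm, t

alpha = 0.05   # significance level
df = 100 - 1   # degrees of freedom: sample_size - 1

# Two-tailed test: split alpha evenly between the two tails.
z_crit = norm.ppf(1 - alpha / 2)
t_crit = t.ppf(1 - alpha / 2, df)

print(f"z-critical: {z_crit:.4f}")  # z-critical: 1.9600
print(f"t-critical: {t_crit:.4f}")  # t-critical: 1.9842
```

With 99 degrees of freedom the t-critical value is already close to the z-critical value, which is why both tests lead to the same decision here.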

**Step 8:** Make conclusions:
* If the test statistic falls in the critical region, then we reject the Null Hypothesis
* If the test statistic falls between the two critical regions, then we fail to reject the Null Hypothesis.
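The decision rule in Step 8 can be sketched as a small helper. The function name `two_tailed_decision` is hypothetical, and the second call uses a made-up statistic of 1.20 purely to show the other branch.

```python
def two_tailed_decision(statistic: float, critical_value: float) -> str:
    """Return the Step 8 decision for a two-tailed test."""
    # The critical region is everything beyond +/- critical_value.
    if abs(statistic) > critical_value:
        return "reject H0"
    return "fail to reject H0"


# Statistic and t-critical value (df = 99) as reported in this lab.
print(two_tailed_decision(4.7619, 1.9842))  # reject H0
# Hypothetical statistic that lies between the critical regions.
print(two_tailed_decision(1.20, 1.9842))    # fail to reject H0
```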

In [35]:
import math
import scipy.stats
from scipy.stats import t, norm
import numpy as np

In [95]:
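sample_mean = 130.1  # sample mean
pop_mean = 120       # hypothesized population mean (H0)
pop_std = 21.21      # sample standard deviation; the population value is unknown
n = 100              # sample size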

statistic = (sample_mean - pop_mean)/(pop_std/math.sqrt(n))
print("statistic value", statistic)

t_critic = scipy.stats.t.ppf(1-0.05/2, n-1)
print("t-critical value: ", t_critic) 


# Determining the p-value
p_val = scipy.stats.t.sf(abs(statistic), df = n-1)*2
print("p-value: ", p_val)

statistic value 4.761904761904759
t-critical value:  1.9842169515086827
p-value:  6.562701817208617e-06


### Conclusion

> statistic = 4.76 

> p-value < 0.05

- The test statistic (4.76) exceeds the critical value (1.98) and the p-value is below 0.05, so H0 is rejected, meaning

    **The systolic blood pressure of the Honolulu sample group is significantly different from that of the regular population**

In [2]:
#### IGNORE BELOW THIS LINE ####

### Two-tailed T-test

In [92]:
# scipy.stats.t.ppf(q, df) returns the critical t-value -- q = cumulative probability, df = degrees of freedom
# This is a two-tailed test, so q = 1 - alpha/2 with alpha = 0.05
t_stat = round(abs(sample_mean - pop_mean) / (pop_std / np.sqrt(n)), 4)
t_critic = round(scipy.stats.t.ppf(1-0.05/2, n-1), 4)
print("t-statistic (t-value): ", t_stat)
print("t-critical value: ", t_critic)


# Find the p-value from the t-score in two equivalent ways: lower-tail cdf (cumulative distribution function) or upper-tail sf
p_val_cdf = round((scipy.stats.t.cdf(-abs(t_stat), n-1))*2, 4) # rounding to 4 decimals prints a p-value this small as 0.0
p_val_sf = scipy.stats.t.sf(abs(t_stat), n-1)*2 # sf = survival function = 1 - cdf

print("p-value:", p_val_cdf)
#print("p-value using t.sf:", p_val_sf)

t-statistic (t-value):  4.7619
t-critical value:  1.9842
p-value: 0.0
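The `p-value: 0.0` printed here comes from rounding, not from the test itself. As a small sketch, the unrounded p-value printed by the first t-test cell is reused below as a literal (an assumption made to keep the example self-contained; the names `rounded` and `formatted` are illustrative).

```python
# p-value printed by the first t-test cell in this notebook
p_val = 6.562701817208617e-06

rounded = round(p_val, 4)    # rounding to 4 decimals collapses it to 0.0
formatted = f"{p_val:.2e}"   # scientific notation keeps the magnitude

print(rounded)    # 0.0
print(formatted)  # 6.56e-06
```

Reporting very small p-values in scientific notation, or as "p < 0.0001", avoids suggesting that the probability is exactly zero.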


### Two-tailed Z-test

In [93]:
alpha = 0.05
n_sided = 2

# Calculate z-score
z_stat = round((sample_mean - pop_mean) / (pop_std / np.sqrt(n)), 4)
z_critic = round(scipy.stats.norm.ppf(1-alpha/n_sided), 4)
print("z-statistic (z-value): ", z_stat)
print("z-critical value: ", z_critic)

# Find p-value from z-score
p_val_cdf = 2*(1 - scipy.stats.norm.cdf(abs(z_stat)))
p_val_sf = scipy.stats.norm.sf(abs(z_stat))*2

print("p-value:", p_val_cdf)
# print("p-value using sf:", p_val_sf)

z-statistic (z-value):  4.7619
z-critical value:  1.96
p-value: 1.9177871528608392e-06


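- The z-test gives a smaller p-value (about 1.9e-06) than the t-test (about 6.6e-06) because the normal distribution has lighter tails than the t-distribution with 99 degrees of freedom. Both are far below 0.05, so H0 is rejected in either case.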
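As a cross-check, the two-tailed z-test p-value can be reproduced with the standard library alone, using the identity P(|Z| > z) = erfc(z / √2) for a standard normal Z. This is a sketch that takes the rounded z-statistic from the cell above as given; `p_two_sided` is an illustrative name.

```python
import math

z_stat = 4.7619  # rounded z-statistic from the z-test cell

# For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
# Computing the tail directly avoids the cancellation in 1 - cdf(z)
# that occurs when cdf(z) is very close to 1.
p_two_sided = math.erfc(abs(z_stat) / math.sqrt(2))

print(f"{p_two_sided:.3e}")  # 1.918e-06
```

For the same reason, `scipy.stats.norm.sf` is generally the safer choice than `1 - scipy.stats.norm.cdf` for statistics far out in the tail.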

### Sources:

#### Z-test
- [How to Find a P-Value from a Z-Score in Python](https://softwareto.com/p-value-from-z-score-python/)
- [Z Test: Uses, Formula & Examples](https://statisticsbyjim.com/hypothesis-testing/z-test/)
- [Probability to z-score and vice versa](https://stackoverflow.com/questions/20864847/probability-to-z-score-and-vice-versa)
- [The Standard Normal Distribution | Calculator, Examples & Uses](https://www.scribbr.com/statistics/standard-normal-distribution/)

#### T-test
- [A beginner’s guide to Student’s t-test in python from scratch](https://analyticsindiamag.com/a-beginners-guide-to-students-t-test-in-python-from-scratch%EF%BF%BC/)
- [How to do a t-test in Python?](https://thedatascientist.com/how-to-do-a-t-test-in-python/)
- [INDEPENDENT T-TEST](https://www.pythonfordatascience.org/independent-samples-t-test-python/)
- [Python Statistics – Python p-Value, Correlation, T-test, KS Test](https://data-flair.training/blogs/python-statistics/)
- [T test formula](http://www.sthda.com/english/wiki/t-test-formula)
- [Types of t-tests](https://www.jmp.com/en_ch/statistics-knowledge-portal/t-test.html)