# Imports

In [3]:
import numpy as np
from scipy import stats
import pandas as pd
import math

# Hypothesis testing
# 1. One-sample t-test

It is assumed that the mean systolic blood pressure is μ = 120 mm Hg. In the Honolulu Heart Study, a sample of n = 100 people had an average systolic blood pressure of 130.1 mm Hg with a standard deviation of 21.21 mm Hg. Is the group significantly different (with respect to systolic blood pressure!) from the regular population?

- Set up the hypothesis test.
- Write down all the steps followed for setting up the test.
- Calculate the test statistic by hand and also code it in Python. It should be 4.76190. We will take a look at how to make decisions based on this calculated value.

### Task

**significance level = 0.05 (95% confidence)  
goal: check if the blood pressure of the sample is significantly different from the regular population (120 mm Hg)**

### Set up hypotheses

**- H0: There is NO difference between the blood pressure of the group and the regular population.**  
    This means the sample is consistent with a population mean blood pressure of 120 mm Hg: the gap between the sample mean and 120 is small enough to be explained by sampling variation at the chosen significance level.  
    H0: µ = 120


**- HA: There is some significant difference between the blood pressure of the group and the regular population.**  
    This means that we can't say that the sample group has a mean blood pressure of 120, because the sample provides enough evidence that its mean lies outside the acceptance region.  
    Ha: µ ≠ 120

### Define the problem

We fail to reject H0 at a significance level of 0.05 when the absolute value of our t-statistic is smaller than the t-critical value; otherwise we reject H0. The t-statistic measures how many standard errors the sample mean (130.1 mm Hg) lies from the hypothesized population mean (120 mm Hg).

- H0 is plausible if |t-statistic| < t-critical
- t-statistic = mean_difference / standard_error
- t-statistic = (mean_sample - μ_population) / (std / sqrt(n))
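
Worked by hand with the numbers from the problem statement, the test statistic comes out as follows. The symbols are notation assumed here for illustration: $\bar{x}$ is the sample mean, $\mu_0$ the hypothesized population mean, $s$ the sample standard deviation and $n$ the sample size.

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{130.1 - 120}{21.21 / \sqrt{100}} = \frac{10.1}{2.121} \approx 4.762
$$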

### Set up confidence - t-critical value

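The t-critical value is taken from Student's t distribution (not the Z distribution, because the population standard deviation is estimated from the sample). The significance level α sets how likely we are to reject the null hypothesis when it is actually true (Type I error rate).

- α = Level of significance = P(Type I error) = P(Reject H0 | H0 is true) = 0.05 (95% confidence)  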
- dof (degrees of freedom) = n - 1 = 99  
- t-critical = 1.984 (from tables, two tailed test)
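
The tabulated value can be cross-checked in code. The following is a minimal sketch, assuming SciPy's percent point function `stats.t.ppf` (the inverse of the CDF) is used; the variable names are illustrative. For a two-tailed test, α is split evenly between both tails, so the upper bound is the 1 - α/2 quantile.

```python
from scipy import stats

alpha = 0.05   # significance level
dof = 99       # degrees of freedom, n - 1

# Two-tailed test: alpha / 2 in each tail, so take the 1 - alpha/2 quantile
t_critical = stats.t.ppf(1 - alpha / 2, df=dof)
print(round(t_critical, 3))
```

The printed value should agree with the table value above up to rounding.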

### Calculating our t-statistic

In [4]:
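# How far the sample mean is from the hypothesized population mean
difference_between_means = 130.1 - 120
# Standard error of the mean: s / sqrt(n)
standard_error_for_the_mean = 21.21 / math.sqrt(100)
t = round((difference_between_means / standard_error_for_the_mean), 3)
print(t)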

4.762


### Results

In [11]:
t_critical = 1.984  # t_critical is usually written as t_0.05_99
                    # as it is a two-sided test, the rejection bounds are +/- 1.984
t_statistics = t

t_critical > abs(t_statistics)  # use abs() so a negative statistic is handled too

False

### Conclusions

Our t-statistic is more extreme than the t-critical value.  
This means that the t-statistic does not lie between the acceptance bounds of -1.984 and +1.984, so we can reject the null hypothesis that the mean of the sample is equal to the mean of the population.  
In that case, we have statistical evidence to say that the sample mean of 130.1 mm Hg is significantly different from the population mean of 120 mm Hg.

### Checking results

We check the conclusion by calculating the p-value.  
To calculate the p-value, we use the survival function (sf = 1 - cdf, where cdf is the cumulative distribution function) of the t distribution for the given t-statistic and dof.

In [33]:
t_stat = t
dof = 99
alpha = 0.05

In [30]:
p_value = stats.t.sf(abs(t_stat), df=dof)*2 # multiply by two as it is a two sided test
# 2*(1 - stats.t.cdf(abs(t), df=dof)) -- same results

In [32]:
print(p_value)

6.560183365621503e-06


In [34]:
alpha > p_value

True

The p-value is smaller than alpha, so we can reject the null hypothesis.
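
The whole procedure can be collected into one helper that works directly from summary statistics, since no raw measurements are available here. This is only a sketch: the function name `one_sample_t_from_summary` and the keys of the returned dictionary are hypothetical, not part of SciPy. It relies on `stats.t.sf` and `stats.t.ppf`, and assumes a two-sided test.

```python
import math
from scipy import stats


def one_sample_t_from_summary(sample_mean, pop_mean, sample_std, n, alpha=0.05):
    """Two-sided one-sample t-test from summary statistics (hypothetical helper)."""
    standard_error = sample_std / math.sqrt(n)
    t_stat = (sample_mean - pop_mean) / standard_error
    dof = n - 1
    # Two-sided p-value: probability mass in both tails beyond |t_stat|
    p_value = 2 * stats.t.sf(abs(t_stat), df=dof)
    # Critical value for the same two-sided test
    t_critical = stats.t.ppf(1 - alpha / 2, df=dof)
    return {
        "t_stat": t_stat,
        "dof": dof,
        "t_critical": t_critical,
        "p_value": p_value,
        "reject_h0": bool(p_value < alpha),
    }


# Values from the Honolulu Heart Study example above
result = one_sample_t_from_summary(130.1, 120, 21.21, 100)
print(result)
```

Comparing the p-value with alpha and comparing |t| with the critical value are equivalent decision rules, so both give the same answer for this data.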