# Lab | Inferential statistics


### Instructions

1. It is assumed that the mean systolic blood pressure is `μ = 120 mm Hg`. In the Honolulu Heart Study, a sample of `n = 100` people had an average systolic blood pressure of 130.1 mm Hg with a standard deviation of 21.21 mm Hg. Is the group significantly different (with respect to systolic blood pressure!) from the regular population?

   - Set up the hypothesis test.
   - Write down all the steps followed for setting up the test.
   - Calculate the test statistic by hand and also code it in Python. It should be 4.76190. We will take a look at how to make decisions based on this calculated value.

2. If you have finished the previous question, please go through the code for `principal_component_analysis_example` provided in the `files_for_lab` folder.

`Ho: μ = 120 mm Hg`

`H1: μ != 120 mm Hg`

x̄ = 130.1 mm Hg (sample mean)

std = 21.21 mm Hg (sample standard deviation)

n = 100 people
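
The by-hand calculation with the values above can be written out as follows. This is a sketch that assumes the usual one-sample t statistic, with the sample standard deviation in the denominator; the symbols `\mu_0` (hypothesized mean) and `s` (sample standard deviation) are notation introduced here.

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
  = \frac{130.1 - 120}{21.21 / \sqrt{100}}
  = \frac{10.1}{2.121}
  \approx 4.7619
$$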

In [1]:
import math

sample_mean = 130.1
pop_mean = 120
sample_std = 21.21
n = 100
statistic = (sample_mean - pop_mean)/(sample_std/math.sqrt(n))
print("Statistic is: ", statistic)

Statistic is:  4.761904761904759
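
One possible way to turn this statistic into a decision is sketched below. It assumes a two-sided test at a significance level of 0.05 (the exercise does not fix the level for this question) and a t distribution with `n - 1` degrees of freedom. The names `alpha`, `df`, `critical_value`, `p_value` and `reject_null` are illustrative only.

```python
from scipy import stats

t_statistic = 4.761904761904759  # value computed in the cell above
n = 100
alpha = 0.05                     # assumed significance level
df = n - 1                       # degrees of freedom for a one-sample t-test

# Two-sided critical value: reject Ho when |t| exceeds it.
critical_value = stats.t.ppf(1 - alpha / 2, df)

# Two-sided p-value: probability of a statistic at least this extreme under Ho.
p_value = 2 * stats.t.sf(abs(t_statistic), df)

reject_null = abs(t_statistic) > critical_value

print("Critical value:", critical_value)
print("p-value:", p_value)
print("Reject Ho:", reject_null)
```

Comparing `|t|` with the critical value and comparing the p-value with `alpha` are two views of the same rule, so they always lead to the same decision.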


In [5]:
from scipy import stats
from numpy.random import normal
import numpy as np

samples = {}

for i in range(10):
    sample_name = "sample_" + str(i)
    samples[sample_name] = normal(loc = 80.94, scale = 11.6, size = 25)
    sample_mean = "sample_" + str(i) + "_mean"
    samples[sample_mean] = np.mean(samples[sample_name])
    sample_std = "sample_" + str(i) + "_std"
    samples[sample_std] = np.std(samples[sample_name],ddof=1)
    sample_statistic = "sample_" + str(i) + "_t-statistic"
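    # Note: this line reuses pop_mean (120) and n (100) from the first cell. Each sample here
    # has 25 values and the next cell tests against a mean of 85, so these statistics are
    # not comparable with the ones computed by stats.ttest_1samp(sample, 85).
    samples[sample_statistic] = (samples[sample_mean]- pop_mean)/(samples[sample_std]/math.sqrt(n))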
    print("The t-statistic for the sample {} is: {}".format(i,samples[sample_statistic]))


The t-statistic for the sample 0 is: -35.96451043183534
The t-statistic for the sample 1 is: -31.575164901583932
The t-statistic for the sample 2 is: -36.52720050113699
The t-statistic for the sample 3 is: -41.126239769424714
The t-statistic for the sample 4 is: -33.555009500993044
The t-statistic for the sample 5 is: -26.667895961545646
The t-statistic for the sample 6 is: -32.82560711795883
The t-statistic for the sample 7 is: -29.462957575141786
The t-statistic for the sample 8 is: -47.83689432848486
The t-statistic for the sample 9 is: -38.79057237716093
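
The very large negative values above come from reusing `pop_mean = 120` and `n = 100` from the first cell. A minimal sketch of the same calculation with the sample's own size (25) and the hypothesized mean used in the next cell (85) is shown below, checked against `stats.ttest_1samp`. The seed and the names `rng`, `hypothesized_mean` and `sample_size` are assumptions made for this illustration, not part of the original lab.

```python
import math

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
hypothesized_mean = 85           # mean under Ho, as used in the next cell
sample_size = 25

# One sample drawn with the same parameters as in the loop above.
sample = rng.normal(loc=80.94, scale=11.6, size=sample_size)

# Manual t-statistic using the sample's own size and the hypothesized mean.
sample_mean = np.mean(sample)
sample_std = np.std(sample, ddof=1)  # ddof=1 gives the sample standard deviation
manual_t = (sample_mean - hypothesized_mean) / (sample_std / math.sqrt(sample_size))

# Reference values from SciPy's one-sample t-test.
scipy_t, scipy_p = stats.ttest_1samp(sample, hypothesized_mean)

print("Manual t-statistic:", manual_t)
print("SciPy t-statistic: ", scipy_t)
print("SciPy p-value:     ", scipy_p)
```

With matching inputs, the manual statistic and the one returned by SciPy should agree up to floating-point error.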


In [6]:
print("Assuming a significance level of 0.05")
print()

for i in range(10):
    sample_name = "sample_" + str(i)
    # In the next line, 85 is the hypothesized population mean under Ho.
    print("The p-value of sample {} is: {:-5.3}".format(i,stats.ttest_1samp(samples[sample_name],85)[1]))
    print("The values in the sample are: ")
    print(samples[sample_name])
    sample_mean = "sample_" + str(i) + "_mean"
    print(samples[sample_mean])
    print()
    if ( stats.ttest_1samp(samples[sample_name],85)[1] < 0.05 ):
        print("Therefore we discard the null hypothesis Ho, as it's very unlikely to get sample {} given Ho.".format(i))
    else:
        # Strictly speaking, a p-value >= 0.05 means we fail to reject Ho;
        # it does not show that Ho is true or that the sample is "very likely".
        print("We accept the null hypothesis Ho, as it's very likely to obtain sample {} given Ho".format(i) )
    print()

Assuming a significance level of 0.05

The p-value of sample 0 is: 0.0377
The values in the sample are: 
[81.90436042 79.65358321 82.39749941 75.01975358 93.96286745 61.8080026
 78.28490727 61.76037452 73.57098308 83.6403375  72.97611031 96.37848914
 93.1455673  91.72942521 94.93452544 65.56054805 85.57452469 78.20776294
 80.44445061 96.9748267  87.54564262 60.75928124 68.770748   73.36761971
 84.64009739]
80.12049153485731

Therefore we discard the null hypothesis Ho, as it's very unlikely to get sample 0 given Ho.

The p-value of sample 1 is: 0.198
The values in the sample are: 
[ 69.6246599   85.85827926  81.41702513  79.30724326  83.83062367
  75.82776578  89.0196785   74.91050969  71.28637072  92.76105195
  73.72183214  64.59990546  88.59898626  77.39523311  76.08151148
  92.06205387  70.83803278  68.31881746  93.87150932  75.96586511
  94.52491762 108.64490584  61.00792406  90.44464702 104.9618805 ]
81.79524919583784

We accept the null hypothesis Ho, as it's very likely to obtai