**Instructions**

It is assumed that the mean systolic blood pressure is μ = 120 mm Hg. In the Honolulu Heart Study, a sample of n = 100 people had an average systolic blood pressure of 130.1 mm Hg with a standard deviation of 21.21 mm Hg. 

1. Is the group significantly different (with respect to systolic blood pressure!) from the regular population?

Set up the hypothesis test.
- Write down all the steps followed for setting up the test.
- Calculate the test statistic by hand and also code it in Python. It should be 4.76190. We will take a look at how to make decisions based on this calculated value.

- If you have finished the previous question, please go through the code for principal_component_analysis_example provided in the files_for_lab folder.

In [5]:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)
import warnings
warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt
import seaborn as sns 

In [6]:
import math

sample_mean = 130.1
pop_mean = 120
sample_std = 21.21
n = 100
statistic = (sample_mean - pop_mean)/(sample_std/math.sqrt(n))
print("Statistic is: ", statistic)

Statistic is:  4.761904761904759
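
The same value can be checked by hand. The symbols are chosen here for illustration and do not come from the exercise text: x̄ is the sample mean, s the sample standard deviation, and μ₀ the assumed population mean.

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{130.1 - 120}{21.21 / \sqrt{100}} = \frac{10.1}{2.121} \approx 4.7619
$$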


1. Is the group significantly different (with respect to systolic blood pressure!) from the regular population?


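Yes. The test statistic of about 4.76 is well above the two-sided critical value of roughly 1.98 (t distribution, 99 degrees of freedom, significance level 0.05), so the null hypothesis that the group's mean is 120 mm Hg is rejected.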

#### Set up the hypothesis test.

- Write down all the steps followed for setting up the test.
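
One possible way to lay out the steps, sketched under the assumption of a two-sided one-sample t-test at a significance level of 0.05: state Ho (the mean is 120) and the alternative (the mean is not 120), compute the test statistic, find the critical value for n - 1 degrees of freedom, and compare. The names `alpha`, `df`, `t_crit`, `p_value` and `reject_h0` are introduced here for illustration only.

```python
import math
from scipy import stats

# Summary statistics from the exercise.
sample_mean = 130.1
pop_mean = 120
sample_std = 21.21
n = 100
alpha = 0.05  # assumed significance level

# Degrees of freedom for a one-sample t-test.
df = n - 1

# Test statistic: distance of the sample mean from the hypothesised mean,
# measured in standard errors.
t_stat = (sample_mean - pop_mean) / (sample_std / math.sqrt(n))

# Two-sided critical value and p-value from the t distribution.
t_crit = stats.t.ppf(1 - alpha / 2, df)
p_value = 2 * stats.t.sf(abs(t_stat), df)

# Decision rule: reject Ho when |t| exceeds the critical value.
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.5f}, critical value = {t_crit:.3f}, p-value = {p_value:.2e}")
print("Reject Ho" if reject_h0 else "Fail to reject Ho")
```

Comparing `p_value` with `alpha` leads to the same decision as comparing `abs(t_stat)` with `t_crit`.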


In [9]:
# from scipy import stats
# from numpy.random import normal
# import numpy as np

# samples = {}

# for i in range(10):
#     sample_name = "sample_" + str(i)
#     samples[sample_name] = normal(loc = sample_mean, scale = sample_std, size = n)
#     sample_mean = "sample_" + str(i) + "_mean"
#     samples[sample_mean] = np.mean(samples[sample_name])
#     sample_std = "sample_" + str(i) + "_std"
#     samples[sample_std] = np.std(samples[sample_name],ddof=1)
#     sample_statistic = "sample_" + str(i) + "_t-statistic"
#     samples[sample_statistic] = (samples[sample_mean]- pop_mean)/(samples[sample_std]/math.sqrt(n)) 
#     print("The t-statistic for the sample {} is: {}".format(i,samples[sample_statistic]))

In [10]:
from scipy import stats
from numpy.random import normal
import numpy as np

samples = {}

for i in range(10):
    sample_name = "sample_" + str(i)
    # Draw n values from a normal distribution matching the observed sample.
    samples[sample_name] = normal(loc = 130.1, scale = 21.21, size = n)
    # Separate names for the dictionary keys, so the numeric sample_mean and
    # sample_std defined earlier are not overwritten by strings.
    mean_key = "sample_" + str(i) + "_mean"
    samples[mean_key] = np.mean(samples[sample_name])
    std_key = "sample_" + str(i) + "_std"
    # ddof=1 gives the sample (not population) standard deviation.
    samples[std_key] = np.std(samples[sample_name],ddof=1)
    statistic_key = "sample_" + str(i) + "_t-statistic"
    samples[statistic_key] = (samples[mean_key]- pop_mean)/(samples[std_key]/math.sqrt(n))
    print("The t-statistic for the sample {} is: {}".format(i,samples[statistic_key]))

The t-statistic for the sample 0 is: 6.109995888793006
The t-statistic for the sample 1 is: 7.098520665583747
The t-statistic for the sample 2 is: 4.811224574955763
The t-statistic for the sample 3 is: 4.410720135780319
The t-statistic for the sample 4 is: 2.577265261120735
The t-statistic for the sample 5 is: 4.995143191854517
The t-statistic for the sample 6 is: 2.855961109672858
The t-statistic for the sample 7 is: 3.442171985731387
The t-statistic for the sample 8 is: 4.183110137275605
The t-statistic for the sample 9 is: 5.504422490023835
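
As a sanity check on the manual formula, the sketch below compares it with the statistic returned by `stats.ttest_1samp` on a single simulated sample. The seed and the names `rng`, `sample`, `manual_t` and `result` are assumptions made for this illustration, so the numbers will not match the output above.

```python
import numpy as np
from scipy import stats

# Seeded generator so the sketch is reproducible; the seed is arbitrary.
rng = np.random.default_rng(0)
sample = rng.normal(loc=130.1, scale=21.21, size=100)

# Manual one-sample t-statistic against the assumed population mean of 120.
manual_t = (sample.mean() - 120) / (sample.std(ddof=1) / np.sqrt(sample.size))

# The same test computed by SciPy (two-sided by default).
result = stats.ttest_1samp(sample, 120)

print("manual t:", manual_t)
print("scipy t: ", result.statistic)
print("p-value: ", result.pvalue)
```

The two statistics should agree up to floating-point error, because `ttest_1samp` uses the sample standard deviation (the equivalent of `ddof=1`).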


In [20]:
len(samples["sample_0"])

100

In [21]:
samples.keys()

dict_keys(['sample_0', 'sample_0_mean', 'sample_0_std', 'sample_0_t-statistic', 'sample_1', 'sample_1_mean', 'sample_1_std', 'sample_1_t-statistic', 'sample_2', 'sample_2_mean', 'sample_2_std', 'sample_2_t-statistic', 'sample_3', 'sample_3_mean', 'sample_3_std', 'sample_3_t-statistic', 'sample_4', 'sample_4_mean', 'sample_4_std', 'sample_4_t-statistic', 'sample_5', 'sample_5_mean', 'sample_5_std', 'sample_5_t-statistic', 'sample_6', 'sample_6_mean', 'sample_6_std', 'sample_6_t-statistic', 'sample_7', 'sample_7_mean', 'sample_7_std', 'sample_7_t-statistic', 'sample_8', 'sample_8_mean', 'sample_8_std', 'sample_8_t-statistic', 'sample_9', 'sample_9_mean', 'sample_9_std', 'sample_9_t-statistic'])

In [22]:
print("Assuming a significance level of 0.05")
print()

for i in range(10):
    sample_name = "sample_" + str(i)
    # One-sample t-test against the assumed population mean of 120;
    # index 1 of the result is the two-sided p-value.
    p_value = stats.ttest_1samp(samples[sample_name],120)[1]
    print("The p-value of sample {} is: {:-5.3}".format(i,p_value))
    print("The values in the sample are: ")
    print(samples[sample_name])
    mean_key = "sample_" + str(i) + "_mean"
    print(samples[mean_key])
    print()
    if ( p_value < 0.05 ):
        print("Therefore we reject the null hypothesis Ho, as it's very unlikely to get sample {} given Ho.".format(i))
    else: 
        print("We fail to reject the null hypothesis Ho, as sample {} is consistent with Ho.".format(i) )
    print()

Assuming a significance level of 0.05

The p-value of sample 0 is: 1.97e-08
The values in the sample are: 
[129.17967687 161.53553075 142.28350588 120.35006611 153.51032635
 154.52418149 135.02307635 113.60374277 126.24174261 122.7089168
 109.56686094 135.07713774 120.82140855 154.93250966 129.84259517
 140.19939724 170.26286784 105.56353328 151.82595086 139.05069843
 130.96430385 157.56939171 121.32642206 156.63438552 114.09476574
 130.5880102  124.15086022 158.05715087 136.11798949 108.32738564
 144.53476124 126.60282866 155.88549742 119.73286111  91.13296971
 156.69018505 156.61528637 135.11596595  87.29552151 112.59581621
  99.04363325 111.44404364 139.71626252 143.4981455  103.56326952
 109.63803423 140.2553391  128.50999371 119.2764842  150.2220346
 156.10065305 128.68228773  95.05145956 107.41508043 127.38759184
 127.2765043  135.73053878 118.28953963  98.37294449 131.98044917
 123.309127   159.79419316 139.08606361 131.08140523 131.41567289
 147.72039947 147.02658095 120.738781