# Hypothesis Test

In [1]:
import scipy.stats
import pandas as pd
import numpy as np
%config Completer.use_jedi = False

1. It is assumed that the mean systolic blood pressure is `μ = 120 mmHg`. In the Honolulu Heart Study, a sample of `n = 100` people had an average systolic blood pressure of 130.1 mmHg with a standard deviation of 21.21 mmHg. Is the group significantly different (with respect to systolic blood pressure!) from the regular population?


   - Set up the hypothesis test.
   - Write down all the steps followed for setting up the test.
   - Calculate the test statistic by hand and also code it in Python. It should be 4.76190. What decision can you make based on this calculated value?

Null hypothesis: H<sub>0</sub>: $\mu$ = $\mu$<sub>0</sub> = 120    
Alternative hypothesis: H<sub>1</sub>: $\mu$ $\neq$ $\mu$<sub>0</sub>   

1. Calculate the value of the test statistic:
     
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \Longrightarrow   
t = \frac{130.1 - 120}{21.21 / \sqrt{100}} \Longrightarrow t = 4.7619$$

2. Find the critical value of t for a two-tailed test from the [Student's t distribution table](https://edisciplinas.usp.br/pluginfile.php/1786954/mod_resource/content/1/Tabelat-student-2.pdf) with $\alpha$ = 0.05 and (n - 1) = 99 degrees of freedom:

$$| t_{0.025, 99} | = 1.9842$$

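3. Compare both values and make a decision:    
Since |t| = 4.7619 > 1.9842, we __reject__ the null hypothesis: at the 5% significance level, the group's mean systolic blood pressure differs significantly from 120 mmHg.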

In [2]:
scipy.stats.t.ppf(.975, df=99)

1.9842169515086827
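The exercise also asks for the test statistic to be coded in Python. A minimal sketch, using only the summary statistics quoted in the problem, might look like the following; the variable names (`mu_0`, `x_bar`, `standard_error`, and so on) are illustrative choices, not part of the original notebook.

```python
import math
import scipy.stats

# Summary statistics quoted in the exercise
mu_0 = 120.0   # hypothesized population mean (mmHg)
x_bar = 130.1  # sample mean (mmHg)
s = 21.21      # sample standard deviation (mmHg)
n = 100        # sample size
alpha = 0.05   # significance level for the two-tailed test

# One-sample t statistic: (x_bar - mu_0) / (s / sqrt(n))
standard_error = s / math.sqrt(n)
t_stat = (x_bar - mu_0) / standard_error

# Two-tailed critical value and p-value with n - 1 degrees of freedom
df = n - 1
t_crit = scipy.stats.t.ppf(1 - alpha / 2, df=df)
p_value = 2 * scipy.stats.t.sf(abs(t_stat), df=df)

reject_null = abs(t_stat) > t_crit
print(f"t = {t_stat:.5f}, critical value = {t_crit:.4f}, p-value = {p_value:.2e}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

Comparing `abs(t_stat)` with `t_crit` and comparing `p_value` with `alpha` are equivalent ways of reaching the same decision.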

2. In a packing plant, a machine packs cartons with jars. It is supposed that a new machine will pack faster on average than the machine currently used. To test that hypothesis, the times it takes each machine to pack ten cartons are recorded. The results, in seconds, are shown in the tables in the file `Data/machine.txt`. Assuming the conditions for a t test are met, do the data provide sufficient evidence that one machine is better than the other?

In [3]:
machine = pd.read_csv('Data/machine.txt', sep='\t')
machine.columns = machine.columns.str.strip()
machine

Unnamed: 0,New Machine,Old Machine
0,42.1,42.7
1,41.0,43.6
2,41.3,43.8
3,41.8,43.3
4,42.4,42.5
5,42.8,43.5
6,43.2,43.1
7,42.3,41.7
8,41.8,44.0
9,42.7,44.1


In [4]:
scipy.stats.ttest_ind(machine['Old Machine'], machine['New Machine'])

Ttest_indResult(statistic=3.3972307061176026, pvalue=0.0032111425007745158)

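p-value = 0.0032 < 0.05, so we reject the null hypothesis of equal means. The data provide evidence that the mean packing times differ: the new machine averages 42.14 s against 43.23 s for the old machine, so the new machine is faster.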
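Since the question is specifically whether the new machine packs faster, a one-sided test matches it more directly than the two-sided test above. The sketch below assumes SciPy 1.6 or later, where `ttest_ind` accepts an `alternative` argument, and copies the ten times per machine from the table above into plain lists (the names `new_machine` and `old_machine` are illustrative). The alternative hypothesis is that the old machine's mean time is greater than the new machine's.

```python
import scipy.stats

# Packing times in seconds, copied from the table above
new_machine = [42.1, 41.0, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7]
old_machine = [42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44.0, 44.1]

# H0: mean(old) = mean(new); H1: mean(old) > mean(new), i.e. the new machine is faster.
# equal_var defaults to True (pooled variance), as in the two-sided call above.
result = scipy.stats.ttest_ind(old_machine, new_machine, alternative='greater')

alpha = 0.05
print(f"t = {result.statistic:.4f}, one-sided p-value = {result.pvalue:.4f}")
print("Reject H0" if result.pvalue < alpha else "Fail to reject H0")
```

The t statistic is the same as in the two-sided test; only the p-value changes, becoming half the two-sided value because the observed difference lies in the direction of the alternative.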