# Inferential Statistics

Let's say we want to know how many hours of sleep DSI students get, on average. It's not really viable to ask every single DSI student on every campus (especially if we're checking across cohorts!). So instead, we'll collect a sample of hours of sleep from students on the DC campus, and use hypothesis testing to check whether a DSI student gets, for example, 6 hours of sleep every night, on average.

In [1]:
# List of average hours of sleep each student in DSI 10 gets a night
sleep = [5, 7, 6, 8, 6, 8.5, 6.5, 8, 7.5, 7, 6.5, 6, 8]

In [2]:
# import the necessary libraries 
import numpy as np
from scipy import stats

## Hypothesis Testing

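You probably remember that we talked about confidence intervals in Week 2. A hypothesis test is kind of an inversion of a confidence interval: instead of asking which values of the mean are plausible given the data, we ask whether one specific value is plausible.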
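To make the link between the two concrete, here is a minimal sketch that builds a 95% confidence interval for the mean of the `sleep` sample and compares it with the two-sided t-test at the 5% level. The names `ci_low`, `ci_high`, and `mu_0` are illustrative choices, not part of the original notebook.

```python
import numpy as np
from scipy import stats

# Same DSI 10 sample as in the cell above
sleep = [5, 7, 6, 8, 6, 8.5, 6.5, 8, 7.5, 7, 6.5, 6, 8]

n = len(sleep)
mean = np.mean(sleep)
se = stats.sem(sleep)  # standard error of the mean (uses ddof=1)

# 95% confidence interval for the mean, based on the t distribution
ci_low, ci_high = stats.t.interval(0.95, n - 1, loc=mean, scale=se)
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")

# A hypothesized mean outside the interval is rejected at the 5% level,
# and one inside the interval is not
for mu_0 in (6, 7):
    inside_ci = ci_low <= mu_0 <= ci_high
    rejected = stats.ttest_1samp(sleep, mu_0).pvalue < 0.05
    print(f"mu_0={mu_0}: inside CI={inside_ci}, rejected={rejected}")
```

For a two-sided test, "inside the 95% interval" and "not rejected at the 0.05 level" should always agree, which is what the loop checks.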

1. The first step is setting the null and alternative hypotheses. Exactly one of them has to be true:
$$H_0 {(null)}: \mu = 6$$
$$H_A {(alternative)}: {\mu \ne 6}$$
2. Step two: gather data.
3. Step three: calculate the test statistic.
$$t = \frac{(\bar{x} - \mu)}{\frac{s}{\sqrt{n}}}$$
In words: the **t-statistic** equals the **sample mean** minus the hypothesized **population mean**, divided by the standard error. As a reminder, the standard error is $s_{e} = \frac{s}{\sqrt n}$, where $s$ is the sample standard deviation (we use $s$ because the population $\sigma$ is unknown).
4. Find the p-value: the probability, assuming the null hypothesis is true, of getting a result at least as extreme as the one we observed.
5. Make a conclusion. If the p-value is small enough (conventionally below 0.05), the observed difference would be unlikely under the null hypothesis, so we reject it. Otherwise, we fail to reject the null hypothesis ($H_0$).
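The five steps above can be sketched by hand for the `sleep` sample. This is only an illustration: the names `mu_0`, `alpha`, and `decision` are assumptions made for the example, and the standard error is computed from the sample standard deviation (`ddof=1`), since the population standard deviation is not known here.

```python
import numpy as np
from scipy import stats

# Step 1: hypotheses. H0: mu = 6, HA: mu != 6
mu_0 = 6
alpha = 0.05  # conventional significance level (an assumption)

# Step 2: gather data (the DSI 10 sample)
sleep = [5, 7, 6, 8, 6, 8.5, 6.5, 8, 7.5, 7, 6.5, 6, 8]

# Step 3: calculate the t-statistic
n = len(sleep)
x_bar = np.mean(sleep)
s = np.std(sleep, ddof=1)  # sample standard deviation
se = s / np.sqrt(n)        # standard error of the mean
t_stat = (x_bar - mu_0) / se

# Step 4: two-sided p-value from the t distribution with n - 1 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# Step 5: conclusion
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.4f}, p = {p_value:.4f}, {decision}")
```

The values of `t_stat` and `p_value` should agree with what `stats.ttest_1samp(sleep, 6)` reports.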

### One Sample T-Test

Let's check whether a student in DSI 10 gets 6 hours of sleep on average. For that we'll use a **One Sample T-Test**.

In [3]:
# Run a t-test using the stats library
stats.ttest_1samp(sleep, 6)

Ttest_1sampResult(statistic=3.207134902949094, pvalue=0.007532147111056759)

In [None]:
# What's the T-Test result? 

If the p-value is lower than 0.05, we can reject the null hypothesis and conclude that a student in DSI 10 does not get 6 hours of sleep a night on average. Here the p-value is about 0.0075, so we reject the null hypothesis.

In [4]:
# What about 7 hours?
stats.ttest_1samp(sleep, 7)

Ttest_1sampResult(statistic=-0.26726124191242345, pvalue=0.793805888355454)

In [5]:
np.mean(sleep)

6.923076923076923

In [6]:
# What about 5? 
stats.ttest_1samp(sleep, 5)

Ttest_1sampResult(statistic=6.681531047810611, pvalue=2.2553017396155065e-05)

### Independent Sample T-Test

Let's say we want to see if there's a **statistically significant** difference between the average sleep time of DSI 9 and DSI 10 students (on the DC campus). For that, we will use the **Independent Samples T-Test**.

The formula for the **Independent Samples T-Test**:
![](https://miro.medium.com/max/932/1*1ZUnA4eR5J2WEGhDVPDkEw.png)

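The hypotheses in an independent samples test are a little different: the null hypothesis is that the two population means are equal ($\mu_1 = \mu_2$), and the alternative is that they differ ($\mu_1 \ne \mu_2$).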
![](https://slideplayer.com/slide/3605887/13/images/8/Two+Sample+Hypothesis+Test+with+Independent+Samples.jpg)

In [7]:
# List of average hours of sleep each student in DSI 9 gets a night
dsi_9_sleep = [6.5, 6.18, 6, 7, 6, 5.5, 7, 8.5, 7, 6.5]

In [8]:
# Independent sample T-Test
stats.ttest_ind(dsi_9_sleep, sleep)

Ttest_indResult(statistic=-0.7597986077718687, pvalue=0.4558206435414779)
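As a rough check on what `stats.ttest_ind` does by default (equal variances assumed), here is a sketch of the pooled-variance calculation done by hand. The variable names (`pooled_var`, `se`, `df`) are illustrative, and the formula is the standard pooled two-sample t-statistic rather than anything taken from the original notebook.

```python
import numpy as np
from scipy import stats

dsi_9_sleep = [6.5, 6.18, 6, 7, 6, 5.5, 7, 8.5, 7, 6.5]
sleep = [5, 7, 6, 8, 6, 8.5, 6.5, 8, 7.5, 7, 6.5, 6, 8]

n1, n2 = len(dsi_9_sleep), len(sleep)
var1 = np.var(dsi_9_sleep, ddof=1)  # sample variances
var2 = np.var(sleep, ddof=1)

# Pooled variance: weighted average of the two sample variances
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df

# Standard error of the difference in means
se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_stat = (np.mean(dsi_9_sleep) - np.mean(sleep)) / se
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(f"t = {t_stat:.4f}, p = {p_value:.4f}")
```

The sign of the statistic depends on the order of the arguments: it is negative here because DSI 9, passed first, has the lower sample mean.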

In [9]:
# What are the averages?
print(np.mean(sleep))
print(np.mean(dsi_9_sleep))

6.923076923076923
6.618


In [10]:
# What are the standard deviations? (np.std defaults to ddof=0, the population formula)
print(np.std(sleep))
print(np.std(dsi_9_sleep))

0.9970370305242862
0.788236005267458


In [11]:
# Independent sample T-Test without assuming equal variances (Welch's t-test)
stats.ttest_ind(dsi_9_sleep, sleep, equal_var=False)

Ttest_indResult(statistic=-0.7828268532630052, pvalue=0.4424916504513319)

Based on the p-value (about 0.44, well above 0.05), we fail to reject the null hypothesis: there isn't a **statistically significant** difference between DSI 9 and DSI 10 in the hours of sleep they get every night.
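For comparison, the unequal-variance version (`equal_var=False`, usually called Welch's t-test) can also be sketched by hand. The degrees of freedom below use the Welch-Satterthwaite approximation; the names `a`, `b`, and `welch_df` are illustrative assumptions.

```python
import numpy as np
from scipy import stats

dsi_9_sleep = [6.5, 6.18, 6, 7, 6, 5.5, 7, 8.5, 7, 6.5]
sleep = [5, 7, 6, 8, 6, 8.5, 6.5, 8, 7.5, 7, 6.5, 6, 8]

n1, n2 = len(dsi_9_sleep), len(sleep)

# Each group's contribution to the variance of the difference in means
a = np.var(dsi_9_sleep, ddof=1) / n1
b = np.var(sleep, ddof=1) / n2

# No pooling: the standard error adds the two contributions directly
se = np.sqrt(a + b)
t_stat = (np.mean(dsi_9_sleep) - np.mean(sleep)) / se

# Welch-Satterthwaite approximation for the degrees of freedom
welch_df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

p_value = 2 * stats.t.sf(abs(t_stat), welch_df)
print(f"t = {t_stat:.4f}, df = {welch_df:.2f}, p = {p_value:.4f}")
```

Because the two sample variances are fairly close, the pooled and Welch versions give similar p-values for this data, and the conclusion is the same either way.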

To compare the means of more than two groups, see the documentation for [One-Way ANOVA](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.f_oneway.html).
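As a small sketch of how `stats.f_oneway` relates to the tests above: with only two groups, one-way ANOVA is equivalent to the equal-variance independent samples t-test, so the F-statistic equals the squared t-statistic and the p-values match. No third cohort is available in this notebook, so the example reuses the two existing samples.

```python
from scipy import stats

dsi_9_sleep = [6.5, 6.18, 6, 7, 6, 5.5, 7, 8.5, 7, 6.5]
sleep = [5, 7, 6, 8, 6, 8.5, 6.5, 8, 7.5, 7, 6.5, 6, 8]

# One-way ANOVA on the two cohorts
anova = stats.f_oneway(dsi_9_sleep, sleep)

# Equal-variance independent samples t-test on the same data
ttest = stats.ttest_ind(dsi_9_sleep, sleep)

print(f"F = {anova.statistic:.4f}, t^2 = {ttest.statistic ** 2:.4f}")
print(f"ANOVA p = {anova.pvalue:.4f}, t-test p = {ttest.pvalue:.4f}")
```

With three or more groups, additional samples would simply be passed as extra arguments to `stats.f_oneway`.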