# Inferential Statistics

**Author**: _Adi Bronshtein (DC)_ with additions from _Wessley Bosse (LA)_ and _Jeff Hale (DC)_

---

## Confidence Interval Review

Let's say we wanted to know how many hours of sleep DSI students get, on average. It's not really viable to ask every single DSI student on every campus (especially if we're checking across cohorts!).

So instead, we'll collect a sample of nightly sleep hours from students on the DC campus and use it to build a confidence interval: a range of plausible values for the average hours of sleep. The confidence level we choose changes the width of that range: the more confident we want to be, the wider the interval.

Let's check it out:

#### List of average hours of sleep each student gets a night

In [None]:
sleep = []

#### Import the necessary libraries

In [None]:
import numpy as np
import math 
from scipy import stats

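#### Get the sample's mean and standard deviation ($s$)

In [None]:
mean = np.mean(sleep)
mean

In [None]:
stdv = np.std(sleep, ddof=1)  # ddof=1 divides by n - 1 (sample standard deviation)
stdv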



### Sample

The formula for the sample **mean**: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n}x_{i}$


The formula for the sample **standard deviation**: $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$


- $x_i$ = each value from the sample
- $\bar{x}$ = the sample mean
- $n$ = the size of the sample
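The two sample formulas translate almost directly into code. The sketch below uses a small made-up list of sleep hours (hypothetical, not real DSI data) and checks the hand-written formulas against NumPy; note that `np.std` only matches the sample formula when `ddof=1` is passed.

```python
import math

import numpy as np

# Hypothetical sample of nightly sleep hours (made-up numbers, not real DSI data)
sample = [6.0, 7.5, 8.0, 5.5, 7.0]
n = len(sample)

# Sample mean: x̄ = (1/n) * Σ x_i
x_bar = sum(sample) / n

# Sample standard deviation: s = sqrt(Σ (x_i - x̄)^2 / (n - 1))
s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))

# NumPy equivalents; ddof=1 makes np.std divide by n - 1
print(x_bar, np.mean(sample))
print(s, np.std(sample, ddof=1))
```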


### Population


$\mu$ (mu) is the **population mean**: $\mu = \frac{1}{N} \sum_{i=1}^{N}x_{i}$


$\sigma$ (sigma) is the **population standard deviation**: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$

- $x_i$ = each value from the population
- $\mu$ = the population mean
- $N$ = the size of the population
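The only difference from the sample formula is the denominator ($N$ instead of $n - 1$). As a minimal sketch, treating a tiny made-up list as if it were the whole population (purely hypothetical), NumPy's default `ddof=0` gives the population version:

```python
import math

import numpy as np

# Hypothetical "population": pretend these five values are every member of it
population = [4, 8, 6, 5, 7]
N = len(population)
mu = sum(population) / N

# Population standard deviation: σ = sqrt(Σ (x_i - μ)^2 / N)
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / N)

# np.std defaults to ddof=0, i.e. it divides by N (the population formula)
print(sigma, np.std(population))

# The sample formula (n - 1 denominator) gives a larger value on the same data
print(np.std(population, ddof=1))
```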


**REMEMBER**: Greek letters are for the population! ☝️


## Calculate the Confidence Interval Manually

### 68-95-99.7

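Assuming the sample mean is approximately normally distributed across repeated samples (the central limit theorem makes this reasonable for moderately large samples), and writing the standard error of the mean as $SE = s/\sqrt{n}$:

- A 68% confidence interval can be approximated by adding and subtracting 1 standard error from the sample mean.
- A 95% confidence interval is ~2 standard errors away from the mean.
- A 99.7% confidence interval is ~3 standard errors away from the mean.

![](https://miro.medium.com/max/24000/1*IZ2II2HYKeoMrdLU5jW6Dw.png)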
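The 68-95-99.7 numbers come straight from the standard normal CDF. A quick check with `scipy.stats.norm.cdf`, computing the share of a normal distribution within $k$ standard deviations of its mean (for the sampling distribution of the mean, that standard deviation is the standard error):

```python
from scipy import stats

# Share of a normal distribution lying within k standard deviations of its mean
within_k = {k: stats.norm.cdf(k) - stats.norm.cdf(-k) for k in (1, 2, 3)}

for k, p in within_k.items():
    print(f"within ±{k} SD: {p:.4f}")
# within ±1 SD: 0.6827
# within ±2 SD: 0.9545
# within ±3 SD: 0.9973
```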

#### What value do we need to add and subtract to get the 95% confidence interval (using the approximation)?

In [None]:
diff = 2 * stdv / math.sqrt(len(sleep))  # ~2 standard errors (s / sqrt(n))
diff
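The "2" in the rule of thumb is a rounded value. This sketch, using a made-up `sample` array (hypothetical data), computes the exact two-sided 95% multiplier with `stats.norm.ppf` and compares the resulting margin with the approximation:

```python
import math

import numpy as np
from scipy import stats

# Hypothetical sleep sample (made-up numbers for illustration only)
sample = np.array([6.0, 7.5, 8.0, 5.5, 7.0, 6.5, 7.5, 6.0])

# Standard error of the mean: s / sqrt(n)
se = np.std(sample, ddof=1) / math.sqrt(len(sample))

# Exact two-sided 95% multiplier: the 97.5th percentile of the standard normal
z_95 = stats.norm.ppf(0.975)  # ≈ 1.96

diff_rule_of_thumb = 2 * se  # the 68-95-99.7 approximation
diff_exact = z_95 * se       # exact normal multiplier
print(round(z_95, 4), diff_rule_of_thumb, diff_exact)
```

Because 1.96 is slightly less than 2, the exact 95% interval is a little narrower than the ±2 approximation, which actually corresponds to about 95.45%.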

#### Generate the 95.45% confidence interval

In [None]:
lower_boundary = mean - diff
upper_boundary = mean + diff
(lower_boundary, upper_boundary)

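#### Repeat the process for 99.7% confidence

In [None]:
diff_99 = 3 * stdv / math.sqrt(len(sleep))  # ~3 standard errors
lower_99 = mean - diff_99
upper_99 = mean + diff_99
(lower_99, upper_99)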
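Both manual intervals follow the same pattern, so they can be wrapped in one small function. `manual_ci` is a hypothetical helper (not part of any library), and the data are made up:

```python
import math

import numpy as np


def manual_ci(sample, n_se):
    """Return (lower, upper) = sample mean ± n_se standard errors.

    Hypothetical helper: n_se=2 gives ~95.45% and n_se=3 gives ~99.73%,
    assuming the sampling distribution of the mean is roughly normal.
    """
    values = np.asarray(sample, dtype=float)
    center = values.mean()
    se = values.std(ddof=1) / math.sqrt(len(values))
    return (center - n_se * se, center + n_se * se)


# Hypothetical data (not real DSI measurements)
sleep_example = [6.0, 7.5, 8.0, 5.5, 7.0]
ci_95 = manual_ci(sleep_example, 2)
ci_99 = manual_ci(sleep_example, 3)
print(ci_95, ci_99)
```

The 99.7% interval contains the 95% one: asking for more confidence costs precision.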



### Interpretation (loaned [ok, stolen] directly from the lecture):

Generally, we would say:

- "I am {confidence level}% confident
- that the true population {parameter}
- is between {lower confidence bound} and {upper confidence bound}."

#### If we repeated this sampling process many times, about 99.7% of the intervals built this way would contain the true population mean.
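The population mean is fixed; it is the interval that changes from sample to sample. A small simulation makes this concrete. Everything here is an assumption for illustration: a normal population with a made-up mean and SD, samples of size 50, and a 95% level (replacing the multiplier with 3 gives the 99.7% version).

```python
import numpy as np
from scipy import stats

# Simulation assumptions (all hypothetical): a normal population with known mean and SD
rng = np.random.default_rng(42)
true_mu, true_sigma = 7.0, 1.2
n, reps = 50, 2000

# Draw `reps` independent samples of size n (one sample per row)
samples = rng.normal(true_mu, true_sigma, size=(reps, n))

means = samples.mean(axis=1)
ses = samples.std(axis=1, ddof=1) / np.sqrt(n)
z = stats.norm.ppf(0.975)

# Build one 95% interval per sample and check whether it captured the true mean
lower, upper = means - z * ses, means + z * ses
covered = (lower <= true_mu) & (true_mu <= upper)
coverage = covered.mean()
print(f"{coverage:.3f} of {reps} intervals contain the true mean")
```

The printed share should land near 0.95; any single interval either contains `true_mu` or it doesn't.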

## Confidence Interval with the Stats Module (from SciPy)

#### Create a confidence interval with 95% confidence 

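- first argument = confidence level (e.g. `0.95`)
- second argument (`loc`) = location, i.e. where we center the confidence interval (the sample mean)
- third argument (`scale`) = scale/spread; for a confidence interval of the mean this is the standard error, $s/\sqrt{n}$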
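Under the hood, `stats.norm.interval` is just `loc ± z * scale`, with `z` taken from the inverse CDF (`ppf`). A minimal check with made-up summary numbers (`example_mean` and `example_se` are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical summary numbers (made up for illustration)
example_mean, example_se = 6.8, 0.45

# stats.norm.interval(confidence, loc, scale) returns a (lower, upper) tuple
lower, upper = stats.norm.interval(0.95, loc=example_mean, scale=example_se)

# Same interval by hand: loc ± z * scale
z = stats.norm.ppf(0.975)
print((lower, upper), (example_mean - z * example_se, example_mean + z * example_se))
```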

In [None]:
confidence_95 = stats.norm.interval(0.95, loc=mean, scale=stdv / math.sqrt(len(sleep)))  # scale = standard error
confidence_95

The first value is the lower bound of the confidence interval and the second is the upper bound.

#### What type of Python object is confidence_95?

In [None]:
(f"I am 95% confident that the true mean nightly sleep of DSI students is between "
 f"{confidence_95[0]:.2f} and {confidence_95[1]:.2f} hours.")
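The normal-based interval treats the sample standard deviation as if it were the true one. For small samples, a common alternative (not used in this lesson, shown here only for comparison) is the Student's t distribution, available as `stats.t.interval`. The sketch uses a made-up five-value sample:

```python
import numpy as np
from scipy import stats

# Hypothetical small sample (made-up values, not real DSI data)
small_sample = np.array([6.0, 7.5, 8.0, 5.5, 7.0])
n_small = len(small_sample)
center = small_sample.mean()
se_small = stats.sem(small_sample)  # s / sqrt(n), ddof=1 by default

# Normal-based vs. t-based 95% intervals with the same center and standard error
z_ci = stats.norm.interval(0.95, loc=center, scale=se_small)
t_ci = stats.t.interval(0.95, n_small - 1, loc=center, scale=se_small)

print(f"normal: ({z_ci[0]:.2f}, {z_ci[1]:.2f})")
print(f"t, df={n_small - 1}: ({t_ci[0]:.2f}, {t_ci[1]:.2f})")
```

The t interval is wider, reflecting the extra uncertainty in estimating $\sigma$ from only a few observations; the two converge as $n$ grows.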

#### Use the `stats` package to create a confidence interval with 99% confidence

#### Print out the interpretation with an f-string.

#### Use the `stats` package to create a confidence interval with 90% confidence

#### Print out the interpretation with an f-string.
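One way to check the exercises above is to loop over several confidence levels and watch the interval widen. This is a sketch on a made-up `sleep_demo` array (hypothetical data); replacing it with the real `sleep` sample is the intended use:

```python
import numpy as np
from scipy import stats

# Hypothetical sleep sample (made-up values, not real DSI data)
sleep_demo = np.array([6.0, 7.5, 8.0, 5.5, 7.0, 6.5, 7.5, 6.0])
demo_mean = sleep_demo.mean()
demo_se = stats.sem(sleep_demo)

widths = {}
for level in (0.90, 0.95, 0.99):
    low, high = stats.norm.interval(level, loc=demo_mean, scale=demo_se)
    widths[level] = high - low
    print(f"I am {level:.0%} confident that the true mean nightly sleep of DSI students "
          f"is between {low:.2f} and {high:.2f} hours.")
```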