In [None]:
import numpy as np
import scipy.stats

# Confidence intervals for the Normal distribution: $\sigma$ unknown

In this notebook, you'll practice constructing the most basic confidence intervals: those for the parameter $\mu$ of the Normal distribution when the variance $\sigma^2$ is unknown and therefore has to be estimated from the data by the sample variance $s^2$.

As you saw already, a $(1-\alpha)-$ confidence interval for the unknown parameter $\mu$ of a Normal distribution with **unknown** variance $\sigma^2$ can be constructed according to the following formula:

$(\bar{X} - \frac{s}{\sqrt{n}}t_{1-\alpha/2}^{n-1} \ ; \ \bar{X} + \frac{s}{\sqrt{n}}t_{1-\alpha/2}^{n-1}),$

where $s$ is the sample standard deviation and $t_{1-\alpha/2}^{n-1}$ denotes the $(1-\alpha/2)$-quantile of the Student distribution with $(n-1)$ degrees of freedom.

Indeed,

$P(\bar{X} - \frac{s}{\sqrt{n}}t_{1-\alpha/2}^{n-1} < \mu < \bar{X} + \frac{s}{\sqrt{n}}t_{1-\alpha/2}^{n-1}) = 1 - \alpha$.
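
The probability statement above follows from a standard pivot argument. Here is a brief sketch, assuming the observations $X_1, \dots, X_n$ are i.i.d. Normal with mean $\mu$ and that $s$ is the sample standard deviation computed with the $n-1$ denominator (the symbol $T$ is introduced only for this sketch). Under these assumptions,

$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$

follows the Student distribution with $(n-1)$ degrees of freedom. Since that distribution is symmetric around zero, $t_{\alpha/2}^{n-1} = -t_{1-\alpha/2}^{n-1}$, and hence

$P(-t_{1-\alpha/2}^{n-1} < \frac{\bar{X} - \mu}{s/\sqrt{n}} < t_{1-\alpha/2}^{n-1}) = 1 - \alpha$.

Multiplying all three parts by $s/\sqrt{n}$ and solving the inequalities for $\mu$ gives the two bounds $\bar{X} \mp \frac{s}{\sqrt{n}}t_{1-\alpha/2}^{n-1}$.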


Define a function *t_interval()* that constructs such an interval (i.e., computes and returns its lower and upper bounds) given the sample size $n$, the sample mean $\bar{X}$, the sample standard deviation $s$ and the significance level $\alpha$ (so that the confidence level is $1-\alpha$).

To compute the quantiles, use [scipy.stats.t.ppf()](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.t.html).
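
If you get stuck, here is one possible shape for such a function. It is a minimal sketch rather than the reference solution: the argument order and the names `mean`, `half_width` and `quantile` are assumptions made for this illustration, and `alpha` is interpreted as the significance level, so the interval has confidence level $1-\alpha$.

```python
import numpy as np
from scipy import stats


def t_interval(n, mean, s, alpha):
    """Return (lower, upper) bounds of a (1 - alpha) t-interval for the mean.

    n     -- sample size
    mean  -- sample mean
    s     -- sample standard deviation (not the variance)
    alpha -- significance level, e.g. 0.05 for a 0.95 confidence interval
    """
    # (1 - alpha/2)-quantile of the Student distribution with n - 1 degrees of freedom
    quantile = stats.t.ppf(1 - alpha / 2, df=n - 1)
    half_width = quantile * s / np.sqrt(n)
    return mean - half_width, mean + half_width
```

Note that the function expects the standard deviation $s$; if you are given the sample variance $s^2$, take its square root first.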

In [None]:
# Your code here


Check your function: for the example solved before, where $n = 20, \ \bar{X} = 42$ and $s^2 = 36$ (so $s = 6$), a 0.95-confidence interval for the unknown parameter $\mu$ is approximately $(39.192, 44.808)$.

In [None]:
# Your code here


## The effect of approximation

Before, you constructed so-called z-intervals, i.e., CIs for the mean of the normal distribution with known variance. In the current setting, where the true variance $\sigma^2$ is unknown, you approximate it with the sample variance.

How does such an approximation influence the size of the interval? Make a guess based on your intuition.

Now, let's run a little experiment to find that out. 

For various confidence levels $1-\alpha$ (e.g., $\alpha = 0.1, 0.05, 0.01$) and various sample sizes $n$, sample from the normal distribution with parameter values of your choice.

Then, construct two CIs: the z-interval, where you use the true value of $\sigma$, and the t-interval, where you pretend that the true value is unknown and approximate it with the sample standard deviation.

Then, compare the obtained intervals. Is there any difference?

Use the function *z_interval()* you've implemented before to construct CIs for the mean of the normal distribution with known variance.

To obtain the sample variance, use the **unbiased** estimator of $\sigma^2$, i.e., the one that divides by $n-1$ rather than $n$.
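
One way to organise the experiment is sketched below. Everything specific in it is an assumption made for illustration: the parameter values `mu = 5.0` and `sigma = 2.0`, the seed, the grid of sample sizes, and the local `z_interval()` and `t_interval()` helpers, which merely stand in for the functions you wrote yourself. The unbiased variance estimate is obtained by passing `ddof=1` to NumPy's `std()`.

```python
import numpy as np
from scipy import stats


def z_interval(n, mean, sigma, alpha):
    # Stand-in for your own z_interval(): uses the known sigma and a Normal quantile.
    half_width = stats.norm.ppf(1 - alpha / 2) * sigma / np.sqrt(n)
    return mean - half_width, mean + half_width


def t_interval(n, mean, s, alpha):
    # Stand-in for your own t_interval(): uses the sample s and a Student quantile.
    half_width = stats.t.ppf(1 - alpha / 2, df=n - 1) * s / np.sqrt(n)
    return mean - half_width, mean + half_width


rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
mu, sigma = 5.0, 2.0            # arbitrary "true" parameters

results = []
for alpha in (0.1, 0.05, 0.01):
    for n in (5, 20, 100, 1000):
        sample = rng.normal(mu, sigma, size=n)
        x_bar = sample.mean()
        s = sample.std(ddof=1)  # square root of the unbiased variance estimate

        z_lo, z_hi = z_interval(n, x_bar, sigma, alpha)
        t_lo, t_hi = t_interval(n, x_bar, s, alpha)
        results.append((alpha, n, z_hi - z_lo, t_hi - t_lo))
        print(f"alpha={alpha:<5} n={n:<5} "
              f"z-width={z_hi - z_lo:.3f}  t-width={t_hi - t_lo:.3f}")
```

When reading the output, keep in mind that the width of the t-interval depends on the particular sample through $s$, so it is worth rerunning with different seeds before drawing conclusions.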

In [None]:
# Your code here


In [None]:
# Your code here
