In [None]:
import numpy as np
import scipy.stats

# Confidence intervals for the Normal distribution: $\sigma$ unknown

In this notebook, you'll practice constructing the most basic confidence intervals: those for the parameter $\mu$ of the Normal distribution when the variance $\sigma^2$ is unknown and therefore has to be estimated from the data with the sample variance $s^2$.

As you saw already, a $(1-\alpha)$-confidence interval for the unknown parameter $\mu$ of a Normal distribution with **unknown** variance $\sigma^2$ can be constructed according to the following formula:

$\left(\bar{X} - \frac{s}{\sqrt{n}}t_{1-\alpha/2}^{n-1} \ ; \ \bar{X} + \frac{s}{\sqrt{n}}t_{1-\alpha/2}^{n-1}\right),$

where $s$ is the sample standard deviation and $t_{1-\alpha/2}^{n-1}$ denotes the $(1-\alpha/2)$-quantile of Student's t-distribution with $n-1$ degrees of freedom.

Indeed, the statistic $\frac{\bar{X} - \mu}{s/\sqrt{n}}$ follows Student's t-distribution with $n-1$ degrees of freedom, so

$P\left(\bar{X} - \frac{s}{\sqrt{n}}t_{1-\alpha/2}^{n-1} < \mu < \bar{X} + \frac{s}{\sqrt{n}}t_{1-\alpha/2}^{n-1}\right) = 1 - \alpha$.
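The coverage claim can be checked empirically. The sketch below is a Monte Carlo simulation under assumed, arbitrary parameter values ($\mu = 5$, $\sigma = 10$, $n = 20$, $\alpha = 0.05$, 10,000 repetitions, seed 0): it draws many samples, builds the t-interval from each one using only $\bar{X}$ and $s$, and reports the fraction of intervals that contain the true $\mu$.

```python
import numpy as np
import scipy.stats

# Monte Carlo sketch: repeatedly sample from N(mu, sigma^2), build the
# t-interval each time, and count how often it contains the true mu.
# mu, sigma, n, alpha, the number of repetitions and the seed are arbitrary choices.
rng = np.random.default_rng(0)
mu, sigma, n, alpha = 5.0, 10.0, 20, 0.05
n_reps = 10_000

# (1 - alpha/2)-quantile of Student's t with n - 1 degrees of freedom
t_q = scipy.stats.t.ppf(1 - alpha / 2, n - 1)

samples = rng.normal(mu, sigma, size=(n_reps, n))  # one sample of size n per row
x_bar = samples.mean(axis=1)                        # sample means
s = samples.std(axis=1, ddof=1)                     # sample standard deviations
half_width = t_q * s / np.sqrt(n)

covered = (x_bar - half_width < mu) & (mu < x_bar + half_width)
coverage = covered.mean()
print(f"empirical coverage: {coverage:.3f}")  # expected to be close to 1 - alpha
```

With this many repetitions the empirical coverage should land within about one percentage point of $1 - \alpha$.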


Define a function *t_interval()* that constructs such an interval (i.e., computes and returns its lower and upper bounds) given the sample size $n$, the sample mean $\bar{X}$, the sample standard deviation $s$ and the significance level $\alpha$ (so that the confidence level is $1-\alpha$).

To compute the quantiles, use [scipy.stats.t.ppf()](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.t.html).
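As a quick illustration of the quantile call, the snippet below evaluates $t_{1-\alpha/2}^{n-1}$ for the worked example used later ($n = 20$, $\alpha = 0.05$) and, for comparison, the corresponding standard Normal quantile from *scipy.stats.norm.ppf()*. The t-quantile is larger because Student's t-distribution has heavier tails.

```python
import scipy.stats

# (1 - alpha/2)-quantile of Student's t with n - 1 degrees of freedom
n, alpha = 20, 0.05
t_q = scipy.stats.t.ppf(1 - alpha / 2, n - 1)

# the same quantile of the standard Normal distribution, for comparison
z_q = scipy.stats.norm.ppf(1 - alpha / 2)

print(round(t_q, 4), round(z_q, 4))  # 2.093 1.96
```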

In [None]:
# Your code here

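def t_interval(n, sample_mean, sample_std, a=0.05):
  # (1 - a/2)-quantile of Student's t-distribution with n - 1 degrees of freedom
  t = scipy.stats.t.ppf(1-a/2, n-1)
  # half-width of the interval: t * s / sqrt(n)
  margin = sample_std*t/np.sqrt(n)
  lower = sample_mean - margin
  upper = sample_mean + margin
  return lower, upper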

Check your function: for the example solved before, where $n = 20$, $\bar{X} = 42$ and $s^2 = 36$, a 0.95-confidence interval for the unknown parameter $\mu$ is approximately $(39.192, 44.808)$.

In [None]:
n = 20
sample_mean = 42
s2 = 36
sample_std = np.sqrt(s2)
a=0.05

t_interval(n, sample_mean, sample_std, a)

(39.19191356148055, 44.80808643851945)
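One way to cross-check the hand-rolled function is SciPy's built-in *interval()* method: a t-distribution with $n-1$ degrees of freedom, shifted by the sample mean and scaled by $s/\sqrt{n}$, has exactly the t-interval as its central 95% range. This is a sketch; the name of the first argument differs between SciPy versions (`confidence` in recent releases, `alpha` in older ones), so it is passed positionally here.

```python
import numpy as np
import scipy.stats

# Worked example: n = 20, sample mean 42, sample variance 36 (so s = 6)
n, sample_mean, sample_std = 20, 42, 6.0

# central 95% range of a t(n - 1) distribution located at the sample mean
# and scaled by the standard error s / sqrt(n)
lower, upper = scipy.stats.t.interval(0.95, n - 1, loc=sample_mean,
                                      scale=sample_std / np.sqrt(n))
print(lower, upper)  # should agree with t_interval(...) above
```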

## The effect of approximation

Before, you were constructing so-called z-intervals, i.e., CIs for the mean of the normal distribution with known variance. In the current setting, when the true variance $\sigma^2$ is unknown, you approximate it with the sample variance $s^2$.

How does such an approximation influence the size of the interval? Make a guess based on your intuition.

Now, let's run a little experiment to find that out. 

For various levels of confidence (e.g., $\alpha = 0.1, 0.05, 0.01$) and various sample sizes $n$, sample from the normal distribution with the parameter values of your choice. 

Then, construct two CIs: the z-interval, where you use the true value of $\sigma$, and the t-interval, where you pretend that the true value is unknown and approximate it with the sample standard deviation.

Then, compare the obtained intervals. Is there any difference?

Use the function *z_interval()* you've implemented before to construct CIs for the mean of the normal distribution with known variance.

To obtain the sample variance, use the **unbiased** estimator of $\sigma^2$, i.e., divide the sum of squared deviations by $n-1$ rather than $n$ (*np.var(..., ddof=1)*).
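The difference between the two estimators comes down to the `ddof` argument of *np.var()*. The small data array below is purely illustrative: its mean is 5 and the squared deviations sum to 32, so dividing by $n = 8$ gives 4 while dividing by $n - 1 = 7$ gives $32/7$.

```python
import numpy as np

# illustrative data (mean 5, sum of squared deviations 32)
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(data)

biased = np.var(data)            # ddof=0 (default): divides by n
unbiased = np.var(data, ddof=1)  # ddof=1: divides by n - 1, i.e. s^2

print(biased, unbiased)  # 4.0 4.571428571428571
```

The two estimates differ by the factor $n/(n-1)$, which matters for small samples and becomes negligible as $n$ grows.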

In [None]:
# Your code here

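def z_interval(n, sample_mean, sigma=1, a=0.05):
  # (1 - a/2)-quantile of the standard Normal distribution
  z = scipy.stats.norm.ppf(1-a/2, 0, 1)
  # bounds use the known sigma: sample_mean -/+ z * sigma / sqrt(n)
  lower = sample_mean - sigma*z/np.sqrt(n)
  upper = sample_mean + sigma*z/np.sqrt(n)
  return lower, upper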
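For a first comparison, take the earlier worked example ($n = 20$, $\bar{X} = 42$) and suppose, hypothetically, that $\sigma = 6$ were known exactly. The sketch below builds that z-interval with SciPy's *norm.interval()* and checks it against the formula used in *z_interval()*; it comes out to roughly $(39.37, 44.63)$, narrower than the t-interval $(39.19, 44.81)$ computed from the same numbers.

```python
import numpy as np
import scipy.stats

# Hypothetical comparison: treat sigma = 6 as *known* in the earlier example
n, sample_mean, sigma, a = 20, 42, 6.0, 0.05

# central (1 - a) range of N(sample_mean, sigma^2 / n)
lower, upper = scipy.stats.norm.interval(1 - a, loc=sample_mean,
                                         scale=sigma / np.sqrt(n))

# the same bounds from the z-interval formula
z_q = scipy.stats.norm.ppf(1 - a / 2)
manual = (sample_mean - z_q * sigma / np.sqrt(n),
          sample_mean + z_q * sigma / np.sqrt(n))

print(lower, upper)  # roughly 39.37 44.63
```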

In [None]:
# Your code here

mu = 5
sigma = 10

for n in [20, 100, 1000, 10000]:
  print('\nn =', n)
  for a in [0.1, 0.05, 0.001]:

    print('alpha = ', a)

    # draw a fresh sample and estimate the mean and the (unbiased) variance
    data = np.random.normal(mu, sigma, n)
    sample_mean = np.mean(data)

    s2 = np.var(data, ddof=1)
    sample_std = np.sqrt(s2)

    # z-interval uses the true sigma, t-interval uses the sample estimate s
    l_z, u_z = z_interval(n, sample_mean, sigma, a)
    l_t, u_t = t_interval(n, sample_mean, sample_std, a)

    # interval size (length) = upper bound - lower bound
    print('z-interval size: ', np.round(u_z-l_z, 4),
          't-interval size: ', np.round(u_t-l_t, 4))


n = 20
alpha =  0.1
z-interval size:  7.356 t-interval size:  8.2272
alpha =  0.05
z-interval size:  8.7652 t-interval size:  10.0634
alpha =  0.001
z-interval size:  14.7157 t-interval size:  18.2483

n = 100
alpha =  0.1
z-interval size:  3.2897 t-interval size:  3.3037
alpha =  0.05
z-interval size:  3.9199 t-interval size:  3.9776
alpha =  0.001
z-interval size:  6.5811 t-interval size:  6.6184

n = 1000
alpha =  0.1
z-interval size:  1.0403 t-interval size:  1.0802
alpha =  0.05
z-interval size:  1.2396 t-interval size:  1.2301
alpha =  0.001
z-interval size:  2.0811 t-interval size:  2.1073

n = 10000
alpha =  0.1
z-interval size:  0.329 t-interval size:  0.3297
alpha =  0.05
z-interval size:  0.392 t-interval size:  0.3957
alpha =  0.001
z-interval size:  0.6581 t-interval size:  0.6663
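The sizes above mix two effects: the t-interval's half-width is $t_{1-\alpha/2}^{n-1} s/\sqrt{n}$ while the z-interval's is $z_{1-\alpha/2}\,\sigma/\sqrt{n}$. The random factor $s/\sigma$ fluctuates around 1, which is why the t-interval can occasionally come out slightly narrower (as for $n = 1000$, $\alpha = 0.05$ in the run above). The deterministic part is the quantile ratio, sketched below for the same $\alpha$ and $n$ values as in the experiment: it is always above 1, is larger for smaller $\alpha$, and shrinks toward 1 as $n$ grows.

```python
import scipy.stats

# ratio of the t-quantile (n - 1 degrees of freedom) to the Normal quantile;
# this is how much wider the t-interval would be if s happened to equal sigma
ratios = {}
for a in [0.1, 0.05, 0.001]:
    z_q = scipy.stats.norm.ppf(1 - a / 2)
    for n in [20, 100, 1000, 10000]:
        t_q = scipy.stats.t.ppf(1 - a / 2, n - 1)
        ratios[(a, n)] = t_q / z_q
        print(f"alpha={a:<6} n={n:<6} t/z = {t_q / z_q:.4f}")
```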
