# z-INTERVAL: CI for the normal distribution, $\sigma$ known

In this notebook, you'll practice constructing the most basic confidence intervals: those for the parameter $\mu$ of the Normal distribution when the variance $\sigma^2$ is known and therefore does not have to be estimated from the data.

As you have already seen, a $(1-\alpha)$-confidence interval for the unknown parameter $\mu$ of a Normal distribution with **known** variance $\sigma^2$ can be constructed according to the following formula:

$(\bar{X} - \frac{\sigma}{\sqrt{n}}z_{1-\alpha/2} \ ; \ \bar{X} + \frac{\sigma}{\sqrt{n}}z_{1-\alpha/2}),$

where $z_{1-\alpha/2}$ denotes the $(1-\alpha/2)$-quantile of the standard Normal distribution.
 
Indeed,

$P(\bar{X} - \frac{\sigma}{\sqrt{n}}z_{1-\alpha/2} < \mu < \bar{X} + \frac{\sigma}{\sqrt{n}}z_{1-\alpha/2}) = 1 - \alpha$.
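
To see why, recall that for a sample of size $n$ from $N(\mu, \sigma^2)$ the standardized sample mean follows the standard Normal distribution:

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$

By the symmetry of the standard Normal distribution, $z_{\alpha/2} = -z_{1-\alpha/2}$, so

$P(-z_{1-\alpha/2} < Z < z_{1-\alpha/2}) = 1 - \alpha.$

Substituting the definition of $Z$ and solving the double inequality for $\mu$ gives the interval above.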



In [None]:
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

import scipy.stats

Define a function *z_interval()* that constructs such an interval (i.e., computes and returns its lower and upper bounds) given the sample size $n$, the sample mean $\bar{X}$, the value of $\sigma$ and the significance level $\alpha$ (so that the confidence level is $1-\alpha$).

To compute the quantiles, use [scipy.stats.norm.ppf()](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html).
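
As a quick illustration of the quantile function (a minimal sketch; the variable names `alpha` and `z` are chosen here for illustration only), `ppf` is the inverse of the CDF, so passing $1-\alpha/2$ returns $z_{1-\alpha/2}$:

```python
from scipy.stats import norm

alpha = 0.05  # significance level; the confidence level is 1 - alpha

# (1 - alpha/2)-quantile of the standard Normal distribution N(0, 1)
z = norm.ppf(1 - alpha / 2)
print(round(z, 4))  # 1.96

# Sanity check: applying the CDF to the quantile recovers 1 - alpha/2
print(norm.cdf(z))
```

With the default arguments `loc=0` and `scale=1`, `norm.ppf` refers to the standard Normal distribution, so these do not have to be passed explicitly.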

In [None]:
# Your code here

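def z_interval(n, sample_mean, sigma=1, a=0.05):
  """Return (lower, upper) bounds of the (1 - a) z-interval for the mean, sigma known."""
  # (1 - a/2)-quantile of the standard Normal distribution
  z = scipy.stats.norm.ppf(1-a/2, 0, 1)
  lower = sample_mean - sigma*z/np.sqrt(n)
  upper = sample_mean + sigma*z/np.sqrt(n)
  return lower, upper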

Check your function: for the example solved before, where $n = 100, \ \bar{X} = 5$ and $\sigma = 1$, a 0.95-confidence interval for the unknown parameter $\mu$ is approximately $(4.804, 5.196)$.

In [None]:
n = 100
sample_mean = 5
s = 1

a = 0.05

z_interval(n, sample_mean, s, a)

(4.804003601545995, 5.195996398454005)

## The effect of $\alpha$

Construct CIs for the same data, but now with different confidence levels, e.g., $\alpha = 0.1$ (90% confidence) and $\alpha = 0.01$ (99% confidence).

How does this affect the width of the intervals: do they get narrower or wider? Why?

In [None]:
# Your code here 

n = 100
sample_mean = 5
s = 1

for a in [0.1, 0.05, 0.01]:
  l, u = z_interval(n, sample_mean, s, a)
  # the width of the interval is upper - lower
  print('Confidence: ', 1-a, '\tsize of the interval: ', np.round(u-l, 3))

Confidence:  0.9 	size of the interval:  0.329
Confidence:  0.95 	size of the interval:  0.392
Confidence:  0.99 	size of the interval:  0.515
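
The width of the z-interval equals $2 z_{1-\alpha/2}\,\sigma/\sqrt{n}$, so for fixed $n$ and $\sigma$ it depends on $\alpha$ only through the quantile. The sketch below (variable names such as `alphas` and `widths` are illustrative, and the values $\sigma = 1$, $n = 100$ are taken from the example above) prints the quantile next to the resulting width:

```python
import numpy as np
from scipy.stats import norm

alphas = np.array([0.1, 0.05, 0.01])
sigma, n = 1, 100

# A smaller alpha pushes the quantile further into the tail...
z = norm.ppf(1 - alphas / 2)
# ...and the width grows in proportion to the quantile
widths = 2 * z * sigma / np.sqrt(n)

for a, zq, w in zip(alphas, z, widths):
    print(f"alpha = {a:.2f}  z = {zq:.3f}  width = {w:.3f}")
```

Demanding higher confidence means the interval has to cover more of the sampling distribution of $\bar{X}$, which is why it gets wider.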


## The effect of $n$

Now, let's keep the significance level $\alpha = 0.05$ (i.e., 95% confidence), but alter the sample size. What happens to the CI when it's constructed from fewer or more observations? Why?

In [None]:
sample_mean = 5
s = 1
a = 0.05

for n in [10, 50, 100, 500, 1000]:
  l, u = z_interval(n, sample_mean, s, a)
  # the width of the interval is upper - lower
  print('n:\t', n, '\tsize of the interval: ', np.round(u-l, 3))

n:	 10 	size of the interval:  1.24
n:	 50 	size of the interval:  0.554
n:	 100 	size of the interval:  0.392
n:	 500 	size of the interval:  0.175
n:	 1000 	size of the interval:  0.124
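
Since the width equals $2 z_{1-\alpha/2}\,\sigma/\sqrt{n}$, it shrinks in proportion to $1/\sqrt{n}$: quadrupling the sample size only halves the interval. The sketch below illustrates this and inverts the formula to find the sample size needed for a target width; the helper `width()` and the target value `0.1` are hypothetical choices made for this illustration.

```python
import numpy as np
from scipy.stats import norm

sigma, alpha = 1, 0.05
z = norm.ppf(1 - alpha / 2)

def width(n):
    """Width of the z-interval for sample size n (hypothetical helper)."""
    return 2 * z * sigma / np.sqrt(n)

# Four times as many observations give an interval half as wide
print(round(width(100) / width(400), 6))  # 2.0

# Smallest n for which the width does not exceed a target value:
# solve 2 * z * sigma / sqrt(n) <= target for n
target = 0.1
n_needed = int(np.ceil((2 * z * sigma / target) ** 2))
print(n_needed)
```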


## Empirical confidence

In theory, a significance level $\alpha = 0.05$ (a 95% confidence level) means that if we construct the interval many times, each time from a new sample, the interval will fail to cover the true value of the parameter only about 5% of the time.

Check this. Sample from the Normal distribution with some fixed values of the parameters several times. Based on each sample, construct a confidence interval for $\mu$ and record how many times the true value of $\mu$ isn't covered by it.

Note: you should repeat the sampling a sufficient number of times, and your samples should be large enough. Try, for example, sampling $n = 1000$ values at a time, and repeat the process $N = 10000$ times.

In [None]:
N = 10000
n = 1000
mu = 5
sigma = 1
count = 0
for i in range(N):
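  # draw a fresh sample using the parameters defined above
  data = np.random.normal(mu, sigma, n)
  sample_mean = np.mean(data)
  lower, upper = z_interval(n, sample_mean=sample_mean, sigma=sigma, a=0.05)
  # count the intervals that miss the true mean
  if lower > mu or upper < mu:
    count += 1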

What fraction of the intervals fail to cover the true value of the parameter $\mu$? How is it related to the significance level $\alpha$ of the confidence interval?

In [None]:
count/N

0.0537
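
The observed miss rate is itself random: the number of misses follows a Binomial distribution with $N$ trials and success probability $\alpha$, so the rate has standard error $\sqrt{\alpha(1-\alpha)/N}$. For $N = 10000$ and $\alpha = 0.05$ this is about $0.0022$, so a result such as $0.0537$ lies within two standard errors of $0.05$. Below is a vectorized sketch of the same experiment; the function `miss_rate()`, the fixed seed, and the smaller values of `N` and `n` are assumptions made here to keep the example fast and reproducible, not part of the exercise.

```python
import numpy as np
from scipy.stats import norm

def miss_rate(N, n, mu=5.0, sigma=1.0, a=0.05, seed=0):
    """Fraction of N z-intervals (each built from n draws) that miss mu."""
    rng = np.random.default_rng(seed)
    # one row per repetition of the experiment
    samples = rng.normal(mu, sigma, size=(N, n))
    means = samples.mean(axis=1)
    half_width = norm.ppf(1 - a / 2) * sigma / np.sqrt(n)
    # the interval misses mu exactly when |mean - mu| exceeds the half-width
    missed = np.abs(means - mu) > half_width
    return missed.mean()

N = 4000
rate = miss_rate(N=N, n=200)
se = np.sqrt(0.05 * 0.95 / N)  # binomial standard error of the miss rate
print(rate, se)
```

Increasing `N` shrinks the standard error, so the empirical miss rate should settle closer to $\alpha$.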