## Introduction:
In this assignment I will empirically verify the concept of confidence intervals. A confidence interval is a range of values within which, at a stated confidence level, we expect the true value of a parameter to lie.

Specifically, I will be using a Normal distribution with mean $\mu = 3$ and variance $\sigma^2 = 25$, i.e. $N(3,25)$.

The theoretical $100(1-\alpha)$% confidence interval for the parameter $\mu$ of a $N(\mu,\sigma^2)$ distribution is $\mu \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, where $n$ is the number of observations in the sample. Furthermore, the theoretical $100(1-\alpha)$% confidence interval for the parameter $\sigma^2$ is $[\frac{(n-1)\sigma^2}{\chi^2_{1-\alpha/2}},\frac{(n-1)\sigma^2}{\chi^2_{\alpha/2}}]$, where $\chi^2_p$ denotes the $p$-quantile of the $\chi^2$ distribution with $n-1$ degrees of freedom.
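
The critical value $z_{\alpha/2}$ in the formula for $\mu$ can be computed rather than read from a table. The following is a minimal sketch using the standard library's `statistics.NormalDist`; the helper name `mean_interval` and its arguments are hypothetical and not part of this notebook.

```python
from math import sqrt
from statistics import NormalDist


def mean_interval(mu, sigma, n, alpha):
    """Return (lower, upper) for mu +/- z_{alpha/2} * sigma / sqrt(n)."""
    # z_{alpha/2} is the upper alpha/2 critical value of the standard normal
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return mu - half_width, mu + half_width


for alpha in (0.05, 0.10, 0.20):
    lower, upper = mean_interval(mu=3, sigma=5, n=1000, alpha=alpha)
    print(f"{1 - alpha:.0%} interval for the mean: [{lower:.2f}, {upper:.2f}]")
```

Because the critical values are computed to full precision, the bounds may differ in the last digit from ones based on rounded table values.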

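I will generate 100 random samples of 1000 observations each ($n = 1000$). Because the true parameters are known here, the intervals are built from them, and I count how many of the sample means and sample variances fall outside the theoretical 95%, 90%, and 80% intervals. Those counts are then compared with the 5, 10, and 20 breaches that the respective confidence levels predict.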
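
The experiment described above can be sketched in vectorized form as follows. This is an illustrative sketch, not the code used in this notebook: the function name `count_outside`, the seed, and the use of `np.random.default_rng` are assumptions, and 1.96 is the 95% table value for $z_{0.025}$.

```python
import numpy as np


def count_outside(values, lower, upper):
    """Count entries of `values` that fall outside [lower, upper]."""
    values = np.asarray(values)
    return int(np.sum((values < lower) | (values > upper)))


rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility
samples = rng.normal(loc=3, scale=5, size=(100, 1000))  # 100 samples of n = 1000

sample_means = samples.mean(axis=1)  # one mean per sample (row)

half_width = 1.96 * 5 / np.sqrt(1000)  # z_{0.025} * sigma / sqrt(n)
breaches = count_outside(sample_means, 3 - half_width, 3 + half_width)
print("Sample means outside the 95% interval:", breaches)
```

Since the seed and generator differ from the ones used in the cells below, the count printed here need not match the results reported later.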

In [None]:
import numpy as np
n = 1000  # observations per sample
s = 100   # number of samples
# Create an array of 100 samples with 1000 observations each, drawn from a
# normal distribution with mean 3 and standard deviation 5 (variance 25)
np.random.seed(834920347)
samples = np.random.normal(3, 5, size=(s, n))
print("shape of samples:", samples.shape)

In [None]:
samples

In [None]:
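# Find the mean and variance of each sample.
# Note: np.var defaults to ddof=0 (divides by n); pass ddof=1 for the unbiased
# estimator that the (n-1) chi-square formula assumes. With n = 1000 the two
# differ only by a factor of 999/1000.
mean = []
var = []
for i in range(s):
    mean.append(np.mean(samples[i]))
    var.append(np.var(samples[i]))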

In [None]:
print("Length of list of sample means =",len(mean))
print("Length of list of sample variance =",len(var))

In [None]:
mean[:10] #print out first 10 sample means

In [None]:
var[:10] #print out first 10 sample variance

In [None]:
# Count how many sample statistics fall outside the interval [l_bound, u_bound]
def count_breach_CI(param, l_bound=0, u_bound=1):
    count = 0
    for value in param:
        if value < l_bound or value > u_bound:
            count += 1
    return count

### 95% Confidence Interval:
First we calculate the theoretical 95% confidence interval for $\mu$ and $\sigma^2$, so $\alpha = 0.05$.
From a Normal Curve Area table we get that $z_{0.025} = 1.96$, and from a $\chi^2$ distribution table with $n-1 = 999$ degrees of freedom we get $\chi^2_{0.975} = 1088.49$ and $\chi^2_{0.025} = 913.30$.
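
The tabulated $\chi^2$ values can be cross-checked without a table. The sketch below uses the Wilson-Hilferty approximation, which is an approximation and not an exact quantile function; the helper names `chi2_quantile_approx` and `variance_interval` are hypothetical. With 999 degrees of freedom it should land close to the table values quoted above.

```python
from statistics import NormalDist


def chi2_quantile_approx(p, df):
    """Approximate the p-quantile of a chi-square distribution (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(p)
    a = 2 / (9 * df)
    return df * (1 - a + z * a ** 0.5) ** 3


def variance_interval(sigma2, n, alpha):
    """Return ((n-1)*sigma2 / chi2_{1-alpha/2}, (n-1)*sigma2 / chi2_{alpha/2})."""
    df = n - 1
    upper_q = chi2_quantile_approx(1 - alpha / 2, df)
    lower_q = chi2_quantile_approx(alpha / 2, df)
    return df * sigma2 / upper_q, df * sigma2 / lower_q


print("chi2 0.975 quantile (df=999):", round(chi2_quantile_approx(0.975, 999), 2))
print("chi2 0.025 quantile (df=999):", round(chi2_quantile_approx(0.025, 999), 2))
print("95% interval for the variance:", variance_interval(25, 1000, 0.05))
```

If SciPy is available, an exact quantile function such as `scipy.stats.chi2.ppf` could be substituted for the approximation.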

In [None]:
l_bound_mean = 3 - (1.96*(5/1000**0.5))
u_bound_mean = 3 + (1.96*(5/1000**0.5))
print("Lower Bound for mean:",round(l_bound_mean,2))
print("Upper Bound for mean:",round(u_bound_mean,2))
l_bound_var = 999*25/1088.49
u_bound_var = 999*25/913.3
print("Lower Bound for variance:",round(l_bound_var,2))
print("Upper Bound for variance:",round(u_bound_var,2))

Thus, the 95% confidence interval for the mean $\mu$ of an $N(3,25)$ distribution with 1000 observations is [2.69, 3.31], and for the variance $\sigma^2$ it is [22.94, 27.35].

In [None]:
count95_mean = count_breach_CI(mean, l_bound_mean, u_bound_mean)
count95_var = count_breach_CI(var, l_bound_var, u_bound_var)
print("Number of sample means outside theoretical CI = ", count95_mean)
print("Number of sample variance outside theoretical CI = ", count95_var)

#### Conclusion
4 out of the 100 sample means lay outside the theoretical 95% confidence interval, which means 96% of the sample means were within it. Thus, the share of sample means inside the interval was 1 percentage point higher than the 95% confidence level predicts.

2 out of the 100 sample variances lay outside the theoretical 95% confidence interval, which means 98% of the sample variances were within it. Thus, the share of sample variances inside the interval was 3 percentage points higher than the 95% confidence level predicts.

### 90% Confidence Interval:
Next we calculate the theoretical 90% confidence interval, so $\alpha = 0.1$.
From a Normal Curve Area table we get that $z_{0.05} = 1.645$, and from a $\chi^2$ distribution table we get $\chi^2_{0.95} = 1073.64$ and $\chi^2_{0.05} = 926.63$.

In [None]:
l_bound_mean = 3 - (1.645*(5/1000**0.5))
u_bound_mean = 3 + (1.645*(5/1000**0.5))
print("Lower Bound for mean:",round(l_bound_mean,2))
print("Upper Bound for mean:",round(u_bound_mean,2))
l_bound_var = 999*25/1073.64
u_bound_var = 999*25/926.63
print("Lower Bound for variance:",round(l_bound_var,2))
print("Upper Bound for variance:",round(u_bound_var,2))

Thus, the 90% confidence interval for the mean $\mu$ of an $N(3,25)$ distribution with 1000 observations is [2.74, 3.26], and for the variance $\sigma^2$ it is [23.26, 26.95].

In [None]:
count90_mean = count_breach_CI(mean, l_bound_mean, u_bound_mean)
count90_var = count_breach_CI(var, l_bound_var, u_bound_var)
print("Number of sample means outside theoretical CI = ", count90_mean)
print("Number of sample variance outside theoretical CI = ", count90_var)

#### Conclusion
10 out of the 100 sample means lay outside the theoretical 90% confidence interval, which means 90% of the sample means were within it. This matches exactly what the 90% confidence level predicts.

7 out of the 100 sample variances lay outside the theoretical 90% confidence interval, which means 93% of the sample variances were within it. Thus, the share of sample variances inside the interval was 3 percentage points higher than the 90% confidence level predicts.

### 80% Confidence Interval:
Last we calculate the theoretical 80% confidence interval, so $\alpha = 0.2$.
From a Normal Curve Area table we get that $z_{0.1} = 1.282$, and from a $\chi^2$ distribution table we get $\chi^2_{0.90} = 1056.695$ and $\chi^2_{0.10} = 942.16$.

In [None]:
l_bound_mean = 3 - (1.282*(5/1000**0.5))
u_bound_mean = 3 + (1.282*(5/1000**0.5))
print("Lower Bound for mean:",round(l_bound_mean,2))
print("Upper Bound for mean:",round(u_bound_mean,2))
l_bound_var = 999*25/1056.695
u_bound_var = 999*25/942.16
print("Lower Bound for variance:",round(l_bound_var,2))
print("Upper Bound for variance:",round(u_bound_var,2))

Thus, the 80% confidence interval for the mean $\mu$ of an $N(3,25)$ distribution with 1000 observations is [2.80, 3.20], and for the variance $\sigma^2$ it is [23.64, 26.51].

In [None]:
count80_mean = count_breach_CI(mean, l_bound_mean, u_bound_mean)
count80_var = count_breach_CI(var, l_bound_var, u_bound_var)
print("Number of sample means outside theoretical CI = ", count80_mean)
print("Number of sample variance outside theoretical CI = ", count80_var)

#### Conclusion
20 out of the 100 sample means lay outside the theoretical 80% confidence interval, which means 80% of the sample means were within it. This matches exactly what the 80% confidence level predicts.

14 out of the 100 sample variances lay outside the theoretical 80% confidence interval, which means 86% of the sample variances were within it. Thus, the share of sample variances inside the interval was 6 percentage points higher than the 80% confidence level predicts.
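
One way to judge whether these breach counts are consistent with the nominal levels is to treat each sample as an independent trial that breaches with probability $\alpha$, so that the count is roughly Binomial$(100, \alpha)$. The sketch below applies this to the counts reported in the conclusions above. The function name `breach_summary` is hypothetical, and the binomial model is an added assumption, not something stated in the assignment.

```python
from math import sqrt

# Breach counts reported in the conclusions above (out of 100 samples each)
reported = {
    0.05: {"mean": 4, "variance": 2},
    0.10: {"mean": 10, "variance": 7},
    0.20: {"mean": 20, "variance": 14},
}


def breach_summary(observed, trials, alpha):
    """Return (expected count, standard deviation, z-score) under Binomial(trials, alpha)."""
    expected = trials * alpha
    sd = sqrt(trials * alpha * (1 - alpha))
    return expected, sd, (observed - expected) / sd


for alpha, counts in reported.items():
    for name, observed in counts.items():
        expected, sd, z = breach_summary(observed, 100, alpha)
        print(f"{1 - alpha:.0%} {name}: observed {observed}, "
              f"expected {expected:.0f} +/- {sd:.1f}, z = {z:+.2f}")
```

As a rough rule of thumb, a z-score within about ±2 suggests a deviation that ordinary sampling noise over 100 trials could produce.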