## Interval Estimation Demo for Population Variance

Let's assume we have a population that is normally distributed with mean $\mu=3$ and standard deviation $\sigma=5$, i.e. variance $\sigma^2=25$.
$$
X \sim {\cal N}(3, 25)
$$

Let's take a sample of size 100 from this population.

In [1]:
import pandas as pd
from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np

# sample size
N = 100

# population mean and standard deviation
mu = 3
std = 5

sample = norm.rvs(loc = mu, scale=std, size=N)

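Let's write a function that computes a confidence interval for the population variance and returns the lower and upper limits of the interval.

The $100(1-\alpha)$% CI is $\left[\frac{(N-1)s^2}{\chi^2_{N-1, 1-\alpha/2}}, \frac{(N-1)s^2}{\chi^2_{N-1,\alpha/2}}\right]$, where $s^2$ is the sample variance and $\chi^2_{N-1,p}$ is the $p$-quantile of the chi-square distribution with $N-1$ degrees of freedom.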
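
The interval comes from the standard result that, for a sample from a normal population, $(N-1)s^2/\sigma^2$ follows a chi-square distribution with $N-1$ degrees of freedom. A short derivation, writing $\chi^2_{N-1,p}$ for the $p$-quantile of that distribution (the value returned by `chi2.ppf(p, N-1)`):

$$
\begin{aligned}
\frac{(N-1)s^2}{\sigma^2} &\sim \chi^2_{N-1} \\
P\left(\chi^2_{N-1,\alpha/2} \le \frac{(N-1)s^2}{\sigma^2} \le \chi^2_{N-1,1-\alpha/2}\right) &= 1-\alpha \\
P\left(\frac{(N-1)s^2}{\chi^2_{N-1,1-\alpha/2}} \le \sigma^2 \le \frac{(N-1)s^2}{\chi^2_{N-1,\alpha/2}}\right) &= 1-\alpha
\end{aligned}
$$

Inverting the inequalities swaps the quantiles, which is why the larger quantile appears in the lower limit.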

In [2]:
from scipy.stats import chi2

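def confidence_interval(data, alpha):
    """Return the (lower, upper) limits of the 100(1-alpha)% CI for the population variance."""
    # sample standard deviation (ddof=1 divides by N-1)
    s = np.std(data, ddof=1)
    N = np.shape(data)[0]

    # the upper chi-square quantile gives the lower limit, and vice versa
    L = (N-1)*s**2/chi2.ppf(1-0.5*alpha, N-1)
    H = (N-1)*s**2/chi2.ppf(0.5*alpha, N-1)

    return L, H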

# test
alpha = 0.1
L, H = confidence_interval(sample, alpha)
print(f"{100*(1-alpha):.0f}% CI: [{L}, {H}]")

90% CI: [22.39514784492078, 35.81801996168978]
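
Because the chi-square distribution is right-skewed, this interval is not symmetric around the sample variance $s^2$: the upper limit lies farther from $s^2$ than the lower limit. The sketch below illustrates this under a few assumptions: the helper name `variance_ci` and the seed `random_state=0` are chosen here for illustration only, and the helper simply restates the formula above.

```python
import numpy as np
from scipy.stats import chi2, norm

def variance_ci(data, alpha):
    # same formula as above, restated so this block runs on its own
    n = len(data)
    s2 = np.var(data, ddof=1)
    lower = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, n - 1)
    upper = (n - 1) * s2 / chi2.ppf(alpha / 2, n - 1)
    return s2, lower, upper

# illustrative sample; the seed is an arbitrary choice for reproducibility
sample = norm.rvs(loc=3, scale=5, size=100, random_state=0)

s2, lower, upper = variance_ci(sample, alpha=0.1)
below = s2 - lower   # distance from s^2 down to the lower limit
above = upper - s2   # distance from s^2 up to the upper limit

print(f"s^2 = {s2:.3f}, CI = [{lower:.3f}, {upper:.3f}]")
print(f"distance below: {below:.3f}, distance above: {above:.3f}")
```
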


Let's generate 1,000 samples and compute a 95% confidence interval for the population variance from each of them. Let's see how many of these intervals include the true population variance of 25.

In [3]:
variance_in_interval = list()

# sample size
N = 100

# population mean and standard deviation
mu = 3
std = 5

# significance level 
alpha = 0.05

# number of repeats
M = 1000

for i in range(M):
    # form  sample
    sample = norm.rvs(loc = mu, scale=std, size=N) 

    L, H = confidence_interval(sample, alpha)

    if L <= std**2 <= H:
        variance_in_interval.append(1)
    else:
        variance_in_interval.append(0)

print(f"Proportion of catches: {np.sum(variance_in_interval)/M}")
print(f"This value should be close to {1-alpha}")


Proportion of catches: 0.946
This value should be close to 0.95
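
How close is "close"? The proportion of catches is itself an estimate based on $M$ repeats, so under the nominal level it has a Monte Carlo standard error of roughly $\sqrt{\alpha(1-\alpha)/M}$. The sketch below repeats the experiment in vectorized form and reports that standard error. The generator `rng` and its seed are assumptions made here for reproducibility and are not part of the experiment above.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

N, M = 100, 1000            # sample size, number of repeats
mu, std, alpha = 3, 5, 0.05

# draw all M samples at once: one sample of size N per row
samples = rng.normal(loc=mu, scale=std, size=(M, N))
s2 = samples.var(axis=1, ddof=1)   # sample variance of each row

# interval limits for every sample
lower = (N - 1) * s2 / chi2.ppf(1 - alpha / 2, N - 1)
upper = (N - 1) * s2 / chi2.ppf(alpha / 2, N - 1)

# fraction of intervals that contain the true variance
coverage = np.mean((lower <= std**2) & (std**2 <= upper))

# standard error of a proportion estimated from M repeats at the nominal level
se = np.sqrt(alpha * (1 - alpha) / M)

print(f"Coverage: {coverage:.3f}")
print(f"Nominal: {1 - alpha}, Monte Carlo standard error: {se:.4f}")
```

With $M = 1000$ and $\alpha = 0.05$ the standard error is about 0.0069, so the observed proportion of 0.946 is within one standard error of 0.95.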
