## Interval Estimation Demo

Let's assume we have a population that is normally distributed with mean $\mu=3$ and standard deviation $\sigma=5$ (so the variance is $\sigma^2=25$).
$$
X \sim {\cal N}(3, 25)
$$

Let's take a sample of size 100 from this population.

In [1]:
import pandas as pd
from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np

# sample size
N = 100

# population mean and standard deviation
mu = 3
std = 5

sample = norm.rvs(loc = mu, scale=std, size=N)
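
For reproducible runs it can help to seed the draw. The sketch below assumes the `random_state` argument of `norm.rvs` and an arbitrary seed of 42; the names `seeded_sample` and `standard_error` are introduced here for illustration only. It also prints the standard error $\sigma/\sqrt{N}$, the quantity that scales the width of the intervals computed later.

```python
import numpy as np
from scipy.stats import norm

N = 100            # sample size
mu, sigma = 3, 5   # population mean and standard deviation

# The seed value is an arbitrary choice for illustration; any integer works
seeded_sample = norm.rvs(loc=mu, scale=sigma, size=N, random_state=42)

# Standard error of the sample mean when sigma is known
standard_error = sigma / np.sqrt(N)

print(f"sample mean: {seeded_sample.mean():.3f}")
print(f"sample std (ddof=1): {seeded_sample.std(ddof=1):.3f}")
print(f"standard error of the mean: {standard_error:.3f}")
```

With $\sigma=5$ and $N=100$ the standard error is 0.5, so a sample mean within roughly one unit of $\mu=3$ is unsurprising.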

Let's assume that we know the population standard deviation.

Let's write a function that computes a confidence interval and returns its lower and upper limits.

The $100(1-\alpha)\%$ CI is $[\overline{x}-z_{\alpha/2}\frac{\sigma}{\sqrt{N}}, \overline{x}+z_{\alpha/2}\frac{\sigma}{\sqrt{N}}]$, where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution.
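
As a quick numeric check of the formula, the sketch below computes the critical value $z_{\alpha/2}$ and the resulting half-width $z_{\alpha/2}\,\sigma/\sqrt{N}$ for three common choices of $\alpha$. The helper name `z_margin` is hypothetical and introduced only for this illustration.

```python
import numpy as np
from scipy.stats import norm

def z_margin(alpha, sigma, n):
    """Half-width of the known-sigma interval (hypothetical helper)."""
    z = norm.ppf(1 - alpha / 2)
    return z * sigma / np.sqrt(n)

sigma, N = 5, 100  # values from the setup above

for alpha in (0.10, 0.05, 0.01):
    z = norm.ppf(1 - alpha / 2)
    margin = z_margin(alpha, sigma, N)
    print(f"alpha={alpha:.2f}: z={z:.3f}, margin={margin:.3f}")

# alpha=0.10: z=1.645, margin=0.822
# alpha=0.05: z=1.960, margin=0.980
# alpha=0.01: z=2.576, margin=1.288
```

A smaller $\alpha$ demands more confidence, so the critical value and the interval both grow.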

In [16]:
def confidence_interval(data, alpha, std):
    xbar = np.mean(data)
    N = np.shape(data)[0]

    if std is None:
        # unknown population standard deviation is not handled in this version;
        # fail loudly instead of silently returning None
        raise NotImplementedError("Population standard deviation is required")
    elif std <= 0:
        raise ValueError("Standard deviation should be positive")
    else:
        # population deviation is known and has a valid value
        z_half_alpha = norm.ppf(1-0.5*alpha)
        L = xbar - z_half_alpha*std/np.sqrt(N)
        H = xbar + z_half_alpha*std/np.sqrt(N)
        return L, H

# test
alpha = 0.1
L, H = confidence_interval(sample, alpha, 5)
print(f"{100*(1-alpha):.0f}% CI: [{L}, {H}]")

alpha = 0.05
L, H = confidence_interval(sample, alpha, 5)
print(f"{100*(1-alpha):.0f}% CI: [{L}, {H}]")

alpha = 0.01
L, H = confidence_interval(sample, alpha, 5)
print(f"{100*(1-alpha):.0f}% CI: [{L}, {H}]")

90% CI: [1.4234079590333581, 3.0682615859848306]
95% CI: [1.2658527802390673, 3.2258167647791214]
99% CI: [0.9579201207346442, 3.5337494242835445]


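Let's assume that the population standard deviation is unknown.

Let's revise the function so that it computes a confidence interval without the population standard deviation. It still returns the lower and upper limits of the interval.

The $100(1-\alpha)\%$ CI is $[\overline{x}-t_{\alpha/2,\,N-1}\frac{s}{\sqrt{N}}, \overline{x}+t_{\alpha/2,\,N-1}\frac{s}{\sqrt{N}}]$, where $s$ is the sample standard deviation and $t_{\alpha/2,\,N-1}$ is the upper $\alpha/2$ quantile of Student's $t$ distribution with $N-1$ degrees of freedom.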
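
To see what replacing $z$ with $t$ costs, the sketch below compares the two critical values at $\alpha=0.05$ for a few sample sizes. The sample sizes and the dictionary name `t_crits` are chosen here for illustration and are not part of the original demo.

```python
from scipy.stats import norm, t

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)

# t critical values for a few illustrative sample sizes
t_crits = {}
for n in (5, 30, 100, 1000):
    t_crits[n] = t.ppf(1 - alpha / 2, n - 1)  # n - 1 degrees of freedom
    print(f"N={n:>4}: t={t_crits[n]:.3f}  z={z_crit:.3f}")
```

The $t$ critical value is always larger than the $z$ value, which widens the interval to account for estimating $\sigma$ with $s$, and the gap shrinks as $N$ grows. At $N=100$ the two are already close, which is why the intervals for known and unknown $\sigma$ in this demo look similar.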

In [18]:
from scipy.stats import t

def confidence_interval(data, alpha, std):
    xbar = np.mean(data)
    s = np.std(data, ddof=1)  # sample standard deviation
    N = np.shape(data)[0]

    if std is None:
        # population standard deviation is unknown: use s and the t distribution
        t_half_alpha = t.ppf(1-0.5*alpha, N-1)
        L = xbar - t_half_alpha*s/np.sqrt(N)
        H = xbar + t_half_alpha*s/np.sqrt(N)
        return L, H
    elif std <= 0:
        raise ValueError("Standard deviation should be positive")
    else:
        # population deviation is known and has a valid value
        z_half_alpha = norm.ppf(1-0.5*alpha)
        L = xbar - z_half_alpha*std/np.sqrt(N)
        H = xbar + z_half_alpha*std/np.sqrt(N)
        return L, H

# test
alpha = 0.1
L, H = confidence_interval(sample, alpha, None)
print(f"{100*(1-alpha):.0f}% CI: [{L}, {H}]")

alpha = 0.05
L, H = confidence_interval(sample, alpha, None)
print(f"{100*(1-alpha):.0f}% CI: [{L}, {H}]")

alpha = 0.01
L, H = confidence_interval(sample, alpha, None)
print(f"{100*(1-alpha):.0f}% CI: [{L}, {H}]")

90% CI: [1.3982241194324452, 3.0934454255857435]
95% CI: [1.2329147626426422, 3.2587547823755463]
99% CI: [0.905084891921581, 3.5865846530966077]


Now consider the 95% CI with an unknown population standard deviation.

Let's generate 1,000 samples and compute a 95% confidence interval for each. Then let's see how many of these intervals include the true population mean, 3.

In [23]:
mean_in_interval = list()

# sample size
N = 100

# population mean and standard deviation
mu = 3
std = 5

# significance level 
alpha = 0.05

# number of repeats
M = 1000

for i in range(M):
    # draw a fresh sample from the population
    sample = norm.rvs(loc=mu, scale=std, size=N)

    L, H = confidence_interval(sample, alpha, None)

    if L <= mu <= H:
        mean_in_interval.append(1)
    else:
        mean_in_interval.append(0)

print(f"Proportion of catches: {np.sum(mean_in_interval)/M}")
print(f"This value should be close to {1-alpha}")


Proportion of catches: 0.951
This value should be close to 0.95
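
How close is "close"? The sketch below repeats the experiment in vectorized form and reports the approximate Monte Carlo standard error of the estimated proportion, $\sqrt{\alpha(1-\alpha)/M}$. The function name `coverage`, the fixed seed, and the use of NumPy's `default_rng` in place of `norm.rvs` are choices made here for illustration, not part of the original demo.

```python
import numpy as np
from scipy.stats import t

def coverage(mu, sigma, n, alpha, repeats, seed=0):
    """Estimate how often the t interval contains mu (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # one row per repeated sample
    samples = rng.normal(loc=mu, scale=sigma, size=(repeats, n))
    xbar = samples.mean(axis=1)
    s = samples.std(axis=1, ddof=1)
    margin = t.ppf(1 - alpha / 2, n - 1) * s / np.sqrt(n)
    hits = (xbar - margin <= mu) & (mu <= xbar + margin)
    return hits.mean()

M = 1000
alpha = 0.05
estimate = coverage(mu=3, sigma=5, n=100, alpha=alpha, repeats=M)

# approximate standard error of a proportion near 1 - alpha based on M repeats
mc_error = np.sqrt(alpha * (1 - alpha) / M)

print(f"estimated coverage: {estimate:.3f}")
print(f"approximate Monte Carlo standard error: {mc_error:.4f}")
```

With $M=1000$ the standard error is about 0.007, so estimates between roughly 0.936 and 0.964 are within two standard errors of the nominal 0.95.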
