# Confidence Intervals

Confidence intervals are another important concept of inferential statistics. Because they are easy to communicate, they are often used to express how certain we are about an estimate. In simple words, a confidence interval is a range computed from a sample such that, if the sampling were repeated many times, a stated share of those ranges (e.g. 95%) would contain the true population parameter. In this notebook we will demonstrate the general idea behind confidence intervals using two common examples.

In [10]:
import numpy as np
from scipy import stats

## Confidence Interval of a Sample Mean

The first use case is to estimate the confidence interval of a sample mean. This is common in hypothesis testing, where in most cases we compare the sample means of two groups (e.g. the conversion rate when testing a new landing page). In the following we will demonstrate by simulation that the 95% confidence interval calculated from a sample of size `N` contains the true population mean `mu` in roughly 95% of repeated experiments. To do so, we will repeat the experiment many times (one million here) and measure how often the observed confidence interval contains the true parameter.
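
For reference, the interval computed in the next cell can be written as follows. The notation is introduced here for illustration and is not part of the original notebook: $\bar{x}$ is the sample mean, $s$ the sample standard deviation (computed with `ddof=1`), $N$ the sample size, and $z_{1-\alpha/2}$ the standard normal quantile, which is about 1.96 for $\alpha = 0.05$.

$$
\bar{x} \;\pm\; z_{1-\alpha/2} \, \frac{s}{\sqrt{N}}
$$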

In [46]:
def calculate_confidence_interval_of_sample_mean(sample, alpha):
    """Calculates lower and upper bound of confidence interval of the sample mean"""
    
    N = len(sample)
    sample_mean = sample.mean()
    sample_std = sample.std(ddof=1) # sample standard deviation (ddof=1 divides by N-1)
    z_value = stats.norm.ppf(1-alpha/2)
    ci_lb = sample_mean - z_value * sample_std/np.sqrt(N)
    ci_ub = sample_mean + z_value * sample_std/np.sqrt(N)
    
    return ci_lb, ci_ub


mu = 1000
sigma = 50
N = 50

ci_contains_population_mean = []

for _ in range(1_000_000):
    
    # Draw random sample from population
    sample = np.random.normal(mu, sigma, N)
    
    # Calculate confidence interval
    ci_lb, ci_ub = calculate_confidence_interval_of_sample_mean(sample, 0.05)
    
    # Check if confidence interval contains population mean
    ci_contains_population_mean.append(ci_lb <= mu <= ci_ub)

print(np.mean(ci_contains_population_mean))

0.944587
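
The observed coverage is slightly below 0.95. A likely reason is that the standard deviation is estimated from the sample while the critical value comes from the normal distribution; for normally distributed data, the exact interval uses Student's t distribution with `N - 1` degrees of freedom instead. The following is a minimal sketch of that variant, not part of the original notebook: the function names `t_confidence_interval_of_sample_mean` and `simulate_coverage`, the seed, and the reduced number of experiments are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats


def t_confidence_interval_of_sample_mean(sample, alpha):
    """Return (lower, upper) bound of a t-based confidence interval of the sample mean."""
    n = len(sample)
    sample_mean = np.mean(sample)
    standard_error = np.std(sample, ddof=1) / np.sqrt(n)
    # Student's t quantile with n - 1 degrees of freedom replaces the z-value
    t_value = stats.t.ppf(1 - alpha / 2, df=n - 1)
    return sample_mean - t_value * standard_error, sample_mean + t_value * standard_error


def simulate_coverage(mu, sigma, n, alpha, n_experiments, seed=0):
    """Estimate how often z- and t-based intervals contain mu (vectorized over experiments)."""
    rng = np.random.default_rng(seed)
    # One row per experiment, one column per observation
    samples = rng.normal(mu, sigma, size=(n_experiments, n))
    means = samples.mean(axis=1)
    standard_errors = samples.std(axis=1, ddof=1) / np.sqrt(n)

    critical_values = {
        "z": stats.norm.ppf(1 - alpha / 2),
        "t": stats.t.ppf(1 - alpha / 2, df=n - 1),
    }
    coverage = {}
    for name, critical_value in critical_values.items():
        # The interval contains mu exactly when |mean - mu| <= critical value * standard error
        contains_mu = np.abs(means - mu) <= critical_value * standard_errors
        coverage[name] = float(contains_mu.mean())
    return coverage


coverage = simulate_coverage(mu=1000, sigma=50, n=50, alpha=0.05, n_experiments=100_000)
print(coverage)
```

Both intervals are evaluated on the same simulated samples, so the two coverage values can be compared directly. Because the t quantile is larger than the z quantile for any finite `N`, the t-based interval is wider and its coverage is never lower than that of the z-based interval.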


## Confidence Interval of a Regression Coefficient

tbd