# Topic 08 : Confidence Intervals and p-values

A 95% confidence interval is a random interval that contains the parameter being estimated 95% of the time. In other words, if the experiment were repeated many times and an interval were computed at the 95% level each time, about 95% of those intervals would contain the parameter's true value.

While we briefly discussed confidence intervals and p-values in "Basics 04", I want to revisit the topic and clarify the following:

1. How to construct a two-sided confidence interval for an A/B Test
2. How to use the confidence interval with the p-value

# Constructing a Confidence Interval

We can use a two-sample Welch's $t$-test to construct the confidence interval. As is widely noted ([Deng and Shi, 2016](https://www.kdd.org/kdd2016/papers/files/adf0853-dengA.pdf), for example), data scientists often treat the z-test and the t-test as interchangeable in practice, because in many cases the sample size is large enough that the $t$-statistic is approximately normally distributed.
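To make the large-sample claim concrete, here is a minimal sketch that compares the two-sided 95% critical value of the $t$-distribution with the normal critical value. The degrees of freedom in the loop are arbitrary values chosen for illustration, not taken from the paper.

```python
from scipy import stats

alpha = 0.05

# Two-sided critical value under the standard normal distribution
z_critical = stats.norm.ppf(1 - alpha / 2)

# Degrees of freedom chosen purely for illustration
for df in [5, 30, 100, 1000, 100000]:
  t_critical = stats.t.ppf(1 - alpha / 2, df)
  gap = t_critical - z_critical
  print(f"df={df:>6}: t={t_critical:.4f}, z={z_critical:.4f}, gap={gap:.4f}")
```

The gap shrinks as the degrees of freedom grow, which is why the choice between the two tests rarely matters at typical A/B test sample sizes.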

The $(1-\alpha)\cdot 100\%$ confidence interval is ([Source](https://online.stat.psu.edu/stat415/lesson/3/3.2)):

$$\bar X_t-\bar X_c\pm t_{\alpha/2,r}\sqrt{\frac{s_t^2}{n_t}+\frac{s_c^2}{n_c}}$$

where the degrees of freedom $r$ can be approximated by:

$$r=\frac{\left(\frac{s_t^2}{n_t}+\frac{s_c^2}{n_c}\right)^2}{\frac{(s_t^2/n_t)^2}{n_t-1}+\frac{(s_c^2/n_c)^2}{n_c-1}}$$
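
As a quick sanity check on this approximation, consider a special case that I am assuming purely for illustration: equal sample sizes $n_t=n_c=n$ and equal sample variances $s_t^2=s_c^2=s^2$. The expression then reduces to

$$r=\frac{\left(\frac{2s^2}{n}\right)^2}{2\cdot\frac{(s^2/n)^2}{n-1}}=\frac{4s^4/n^2}{2s^4/\left(n^2(n-1)\right)}=2(n-1)$$

which equals $n_t+n_c-2$, the degrees of freedom of the pooled two-sample $t$-test. In general, $r$ is at most $n_t+n_c-2$.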





Let's test this out.

We will first construct a data generator that generates data from a normal distribution for both control and treatment.

We will then create a function that performs a statistical test and outputs the confidence interval.

Then, we will run the test a bunch of times to see what % of the time the confidence interval contains the true difference in means.

In [23]:
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

def data_generator_normal(n, mu1, mu2, sigma1, sigma2):
  control = np.random.normal(mu1, sigma1, n)
  treatment = np.random.normal(mu2, sigma2, n)
  return control, treatment

def confidence_interval(control, treatment, alpha=0.05):
  """
  Calculates the confidence interval for the difference in means (treatment - control)
  between two independent samples, using Welch's t-interval.

  Args:
  - control: A numpy array containing the data for the control sample
  - treatment: A numpy array containing the data for the treatment sample
  - alpha: Significance level; the interval has confidence level 1 - alpha

  Returns:
  - A tuple containing the lower and upper bounds of the confidence interval
  """
  mean_c, mean_t = np.mean(control), np.mean(treatment)
  var_c, var_t = np.var(control, ddof=1), np.var(treatment, ddof=1)
  n_c, n_t = len(control), len(treatment)

  # squared standard error of the difference in means
  # (Welch's test does not pool the variances)
  se_sq = var_t/n_t + var_c/n_c

  # degrees of freedom (Welch-Satterthwaite approximation)
  df = (se_sq**2) / ((var_t/n_t)**2 / (n_t-1) + (var_c/n_c)**2 / (n_c-1))

  # critical value
  t_critical = stats.t.ppf(1-alpha/2, df)

  # interval
  lower = mean_t - mean_c - t_critical * np.sqrt(se_sq)
  upper = mean_t - mean_c + t_critical * np.sqrt(se_sq)

  return lower, upper

def contains_parameter(lower, upper, param=0):
  return lower <= param and upper >= param


Generate some data

In [10]:
control, treatment = data_generator_normal(1000, 0, 1, 1, 1)

In [13]:
lower, upper = confidence_interval(control, treatment)
print(lower, upper)

0.9699110972204522 1.1458566533045635


Let's check this formula against SciPy's built-in implementation.

In [17]:
from scipy.stats import ttest_ind

# Run Welch's t-test (equal_var=False) once and reuse the result
result = ttest_ind(treatment, control, equal_var=False)
t_statistic, p_value = result.statistic, result.pvalue

# Compute the confidence interval from the same result object.
# Stored as `ci` so the name does not shadow the confidence_interval() function defined above.
ci = result.confidence_interval()

# Print the results
print("Welch's t-test:")
print("t-statistic:", t_statistic)
print("p-value:", p_value)
print("Confidence interval:", ci)


Welch's t-test:
t-statistic: 23.583118259547682
p-value: 1.117240533353608e-108
Confidence interval: ConfidenceInterval(low=0.9699110972204522, high=1.1458566533045635)


We got exactly the same result. Now, let's run this a bunch of times and see how often our 95% confidence intervals include the true parameter. It should be about 95%.

In [29]:
list_of_contains_true_param = []

for _ in range(1000):
  control, treatment = data_generator_normal(10000, 0, 1, 1, 1)
  lower, upper = confidence_interval(control, treatment)
  if contains_parameter(lower, upper, 1):
    list_of_contains_true_param.append(1)
  else:
    list_of_contains_true_param.append(0)

print(round(np.mean(list_of_contains_true_param)*100,4),'% of confidence intervals included the true parameter')

95.0 % of confidence intervals included the true parameter
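
How close to 95% should the simulated coverage be? As a rough back-of-the-envelope check, assume the 1000 runs are independent and each interval covers the true value with probability 0.95, so the number of covering intervals is binomial. The standard error of the observed coverage is then

$$\sqrt{\frac{0.95\times 0.05}{1000}}\approx 0.0069$$

so results within roughly $\pm 1.4$ percentage points of 95% (about 93.6% to 96.4%) are consistent with correct coverage.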


# Why the Confidence Interval is Important

The reason we are discussing confidence intervals even though we have p-values is that a p-value alone says nothing about the precision of the point estimate. A small p-value can indicate statistical significance even when the confidence interval is wide, which signals high uncertainty about the size of the effect. If your goal is to understand the precision of the point estimate, you need a confidence interval.
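
To illustrate the point, here is a minimal sketch with two simulated experiments that share the same true effect but differ in sample size. The helper `welch_summary`, the seed, the sample sizes, and the effect size are all assumptions made for this example. The interval is computed by hand with the Welch formula so the snippet does not depend on a particular SciPy version.

```python
import numpy as np
from scipy import stats

def welch_summary(control, treatment, alpha=0.05):
  """Return (p_value, lower, upper) for the difference in means (treatment - control)."""
  var_c, var_t = np.var(control, ddof=1), np.var(treatment, ddof=1)
  n_c, n_t = len(control), len(treatment)
  # squared standard error of the difference in means
  se_sq = var_t / n_t + var_c / n_c
  # Welch-Satterthwaite degrees of freedom
  df = se_sq**2 / ((var_t / n_t)**2 / (n_t - 1) + (var_c / n_c)**2 / (n_c - 1))
  diff = np.mean(treatment) - np.mean(control)
  half_width = stats.t.ppf(1 - alpha / 2, df) * np.sqrt(se_sq)
  p_value = stats.ttest_ind(treatment, control, equal_var=False).pvalue
  return p_value, diff - half_width, diff + half_width

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

results = {}
for n in (20, 20000):  # illustrative small and large sample sizes
  control = rng.normal(0, 1, n)
  treatment = rng.normal(2, 1, n)  # assumed true difference in means of 2
  p_value, lower, upper = welch_summary(control, treatment)
  results[n] = (p_value, lower, upper)
  print(f"n={n:>5}: p-value={p_value:.2e}, "
        f"95% CI=({lower:.3f}, {upper:.3f}), width={upper - lower:.3f}")
```

Both experiments should come out statistically significant, yet the interval from the small experiment is far wider, so only the confidence interval reveals how loosely the effect size is pinned down.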