# Chapter 5, Question 9: Resampling Methods (Group 3)
## Importing required packages

In [1]:
import numpy as np
import pandas as pd

## Loading the Data

We begin by loading the Boston housing data set, which accompanies the `ISLP` library and is read here from a local copy, `Boston.csv`. We then extract the variable of interest, `medv`, the median value of owner-occupied homes (in \$1000s). Let \( n \) denote the number of observations.

The goal throughout this problem is to estimate population quantities related to `medv` and assess their uncertainty.

---



In [3]:
# Load the Boston dataset
Boston = pd.read_csv("Boston.csv")
medv = Boston["medv"]
n = len(medv)


## a. Estimating the Population Mean of `medv`

To estimate the population mean of `medv`, we use the **sample mean**:

$\hat{\mu} = \dfrac{1}{n} \sum_{i=1}^{n} \mathrm{medv}_i$


This estimator is unbiased and is the standard choice for estimating the population mean.


In [17]:
mu_hat = medv.mean()
round(mu_hat, 2)


22.53

## b. Estimating the Standard Error of the Mean

The standard error of the sample mean measures how much the estimate \( \hat{\mu} \) would vary across repeated samples. It is given by:

$\mathrm{SE}(\hat{\mu}) = \dfrac{s}{\sqrt{n}}$

where \( s \) is the sample standard deviation of `medv`.

**Interpretation:**  
A small standard error indicates that the sample mean is a precise estimate of the population mean.

---



In [18]:
se_mu_hat = medv.std(ddof=1) / np.sqrt(n)
round(se_mu_hat, 2)


0.41
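
The phrase "would vary across repeated samples" can be checked directly when the population is known. The sketch below is illustrative only: it uses a synthetic normal population (the mean, `sigma_demo`, and sample size are assumptions, not properties of the Boston data) and compares the spread of many sample means with \( \sigma / \sqrt{n} \).

```python
import numpy as np

# Synthetic population parameters (assumptions for illustration only)
rng_demo = np.random.default_rng(42)
sigma_demo = 9.0
n_demo = 500
n_repeats = 5000

# Draw many independent samples; each row is one sample, so we get one mean per row
sample_means = rng_demo.normal(
    loc=22.5, scale=sigma_demo, size=(n_repeats, n_demo)
).mean(axis=1)

# Spread of the sample means versus the textbook formula sigma / sqrt(n)
empirical_se = sample_means.std(ddof=1)
theoretical_se = sigma_demo / np.sqrt(n_demo)
print(f"empirical: {empirical_se:.3f}, theoretical: {theoretical_se:.3f}")
```

With real data we only have one sample, which is why Part (b) plugs the sample standard deviation \( s \) into the formula in place of \( \sigma \).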

## c. Bootstrap Estimate of the Standard Error of the Mean

To validate the analytic standard error, we use the **bootstrap** procedure:

1. Repeatedly resample the data with replacement.
2. Compute the sample mean for each bootstrap sample.
3. Estimate the standard error as the standard deviation of the bootstrap means.

The bootstrap estimate closely matches the analytic estimate from Part (b): both round to 0.41.

---



In [19]:
rng = np.random.default_rng(0)
B = 10000

boot_means = [
    rng.choice(medv, size=n, replace=True).mean()
    for _ in range(B)
]

boot_se_mean = np.std(boot_means, ddof=1)
round(boot_se_mean, 2)


0.41
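
Since the same three bootstrap steps recur for other statistics, they could be wrapped in a small helper. The function below is a hypothetical sketch (`bootstrap_se` and its parameters are our own names, not part of the notebook or any library), demonstrated on synthetic data rather than `medv`.

```python
import numpy as np

def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Bootstrap standard error of `statistic` (hypothetical helper)."""
    values = np.asarray(data)
    gen = np.random.default_rng(seed)
    n_obs = len(values)
    # Each replicate: resample n_obs points with replacement, then apply the statistic
    replicates = np.array([
        statistic(gen.choice(values, size=n_obs, replace=True))
        for _ in range(n_boot)
    ])
    # The standard deviation of the replicates is the bootstrap SE
    return replicates.std(ddof=1)

# Usage on synthetic data (assumed values, not the Boston data)
toy = np.random.default_rng(1).normal(loc=22.5, scale=9.0, size=500)
se_boot_toy = bootstrap_se(toy, np.mean)
se_formula_toy = toy.std(ddof=1) / np.sqrt(len(toy))
print(round(se_boot_toy, 3), round(se_formula_toy, 3))
```

Passing `np.median`, or a lambda wrapping `np.percentile`, as `statistic` would cover the other statistics with the same helper.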

## d. 95% Confidence Interval for the Mean

Using the bootstrap standard error, we construct an approximate 95% confidence interval with the **two-standard-error rule**:


$[\hat{\mu} - 2\,\mathrm{SE}(\hat{\mu}),\ \hat{\mu} + 2\,\mathrm{SE}(\hat{\mu})]$

This interval, (21.71, 23.36), is nearly identical to the interval (21.72, 23.35) built from the analytic standard error in Part (b), so the two approaches agree.

---



In [21]:
ci_boot = (
    mu_hat - 2 * boot_se_mean,
    mu_hat + 2 * boot_se_mean
)

ci_formula = (
    mu_hat - 2 * se_mu_hat,
    mu_hat + 2 * se_mu_hat
)

ci_boot_rounded = tuple(round(x, 2) for x in ci_boot)
ci_formula_rounded = tuple(round(x, 2) for x in ci_formula)

ci_boot_rounded, ci_formula_rounded



((21.71, 23.36), (21.72, 23.35))
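
As a cross-check on the two-standard-error rule, one could also read an interval directly off the bootstrap distribution. The sketch below shows this percentile interval, which is a different technique from the one used in this part; it runs on synthetic data, and all names in it are assumptions made for illustration.

```python
import numpy as np

gen_ci = np.random.default_rng(2)
toy_ci = gen_ci.normal(loc=22.5, scale=9.0, size=500)  # synthetic stand-in for medv

# Bootstrap distribution of the sample mean: each row of idx is one resample
idx = gen_ci.integers(0, len(toy_ci), size=(5000, len(toy_ci)))
boot_means_toy = toy_ci[idx].mean(axis=1)

# Percentile interval: the middle 95% of the bootstrap means
ci_percentile = tuple(np.percentile(boot_means_toy, [2.5, 97.5]))

# Two-standard-error interval for comparison
center = toy_ci.mean()
se_toy = boot_means_toy.std(ddof=1)
ci_two_se = (center - 2 * se_toy, center + 2 * se_toy)
print(ci_percentile, ci_two_se)
```

When the bootstrap distribution is roughly symmetric and bell-shaped, as it typically is for a mean with a few hundred observations, the two intervals should be close.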

## e. Estimating the Population Median of `medv`

We estimate the population median of `medv` using the **sample median**, denoted by:

$\hat{\mu}_{\text{med}}$

The median is a robust measure of central tendency and is less sensitive to extreme values than the mean.

---



In [11]:
mu_med_hat = medv.median()
mu_med_hat


21.2

## f. Bootstrap Estimate of the Standard Error of the Median

There is no simple analytic formula for the standard error of the median, so we again use the bootstrap:

1. Generate bootstrap samples.
2. Compute the median for each sample.
3. Estimate the standard error as the standard deviation of the bootstrap medians.

The resulting standard error, about 0.38, is slightly smaller than that of the mean (0.41), so the median is estimated with comparable precision.

---



In [22]:
boot_medians = [
    np.median(rng.choice(medv, size=n, replace=True))
    for _ in range(B)
]

boot_se_median = np.std(boot_medians, ddof=1)
round(boot_se_median, 2)


0.38
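
For context on why the bootstrap is convenient here: a standard large-sample approximation for the standard error of the sample median does exist, but it depends on the unknown population density. Writing \( f \) for that density and \( m \) for the population median (both symbols are introduced here only for illustration), and assuming \( f(m) > 0 \):

$\mathrm{SE}(\hat{\mu}_{\text{med}}) \approx \dfrac{1}{2\, f(m)\, \sqrt{n}}$

Using this formula would require a separate estimate of \( f(m) \), whereas the bootstrap needs only the observed sample.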

## g. Estimating the 10th Percentile of `medv`

We estimate the 10th percentile of `medv`, denoted \( \hat{\mu}_{0.1} \), using the sample percentile. This quantity reflects lower-end housing values in the Boston area.

---



In [12]:
mu_01_hat = np.percentile(medv, 10)
mu_01_hat


12.75

## h. Bootstrap Estimate of the Standard Error of the 10th Percentile

To estimate the variability of the 10th percentile, we apply the bootstrap:

1. Resample the data with replacement.
2. Compute the 10th percentile for each bootstrap sample.
3. Compute the standard deviation of these bootstrap percentiles.

The resulting standard error (about 0.5) is larger than that of the mean (0.41) or the median (0.38), reflecting the greater instability of estimating quantiles in the tail of the distribution.


In [23]:
boot_p10 = [
    np.percentile(rng.choice(medv, size=n, replace=True), 10)
    for _ in range(B)
]

boot_se_p10 = np.std(boot_p10, ddof=1)
round(boot_se_p10, 2)


0.5
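
The three bootstrap standard errors can also be computed from one shared set of resamples, which makes them directly comparable. The sketch below is a vectorized variant on synthetic normal data; the data and all variable names are assumptions. For normal data the median is noisier than the mean, so the ordering of the mean and median need not match what we observed for `medv`.

```python
import numpy as np

gen_cmp = np.random.default_rng(3)
toy_cmp = gen_cmp.normal(loc=22.5, scale=9.0, size=500)  # synthetic, not medv

# One (n_boot, n_obs) index matrix: every row is one bootstrap resample
resamples = toy_cmp[gen_cmp.integers(0, toy_cmp.size, size=(4000, toy_cmp.size))]

# Apply each statistic row-wise, then take the spread across resamples
se_table = {
    "mean": resamples.mean(axis=1).std(ddof=1),
    "median": np.median(resamples, axis=1).std(ddof=1),
    "p10": np.percentile(resamples, 10, axis=1).std(ddof=1),
}
for name, se in se_table.items():
    print(f"{name:>6}: {se:.3f}")
```

Building the full index matrix trades memory for speed; for much larger `n` or `B`, a loop like the ones in the cells above may be the safer choice.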

## Summary

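- The sample mean (22.53) has a standard error of about 0.41, so it estimates the population mean precisely.
- Bootstrap and analytic standard errors for the mean are nearly identical (both round to 0.41).
- The median and the 10th percentile have no simple closed-form standard error, so we estimate their uncertainty with the bootstrap.
- The 10th percentile has the largest standard error of the three (about 0.5, versus 0.38 for the median), consistent with tail quantities being harder to estimate.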