# 5.3.4 The Bootstrap

In [1]:
library(ISLR2)
library(boot)

## Estimating the Accuracy of a Statistic of Interest

We need to set up a function, `alpha.fn()`, which takes as input the $(X,Y)$ data as well as a vector indicating which observations should be used to estimate $\alpha$. The function then outputs the estimate for $\alpha$ based on the selected observations.

In [2]:
alpha.fn <- function(data, index) {
    X <- data$X[index]
    Y <- data$Y[index]
    (var(Y) - cov(X, Y)) / (var(X) + var(Y) - 2 * cov(X, Y))
}

This function returns an estimate for $\alpha$ obtained by applying (5.7) to the observations indexed by the argument `index`.
\begin{align}\tag{5.7}
\hat{\alpha} = \frac{ \hat{\sigma}^2_Y - \hat{\sigma}_{XY} } { \hat{\sigma}^2_X + \hat{\sigma}^2_Y - 2 \hat{\sigma}_{XY} }
\end{align}
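
For context, (5.7) is the plug-in version of the value of $\alpha$ that minimizes $\text{Var}(\alpha X + (1 - \alpha) Y)$. The short derivation below is a sketch written in terms of the population quantities $\sigma^2_X$, $\sigma^2_Y$, and $\sigma_{XY}$; replacing them with their sample estimates gives $\hat{\alpha}$.
\begin{align*}
\text{Var}(\alpha X + (1 - \alpha) Y) &= \alpha^2 \sigma^2_X + (1 - \alpha)^2 \sigma^2_Y + 2 \alpha (1 - \alpha) \sigma_{XY} \\
\frac{d}{d\alpha} \text{Var}(\alpha X + (1 - \alpha) Y) &= 2 \alpha \sigma^2_X - 2 (1 - \alpha) \sigma^2_Y + 2 (1 - 2 \alpha) \sigma_{XY} = 0 \\
\alpha \left( \sigma^2_X + \sigma^2_Y - 2 \sigma_{XY} \right) &= \sigma^2_Y - \sigma_{XY} \\
\alpha &= \frac{ \sigma^2_Y - \sigma_{XY} } { \sigma^2_X + \sigma^2_Y - 2 \sigma_{XY} }
\end{align*}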

In [3]:
alpha.fn(Portfolio, 1:100)

The next command uses the `sample()` function to randomly select 100 observation indices from 1 to 100, with replacement. This is equivalent to constructing a new bootstrap data set and recomputing $\hat{\alpha}$ based on the new data set.

In [4]:
set.seed(7)
alpha.fn(Portfolio, sample(100, 100, replace = TRUE))

The `boot()` function automates this approach: it performs the bootstrap resampling `R` times, records the corresponding estimates for $\alpha$, and computes the resulting standard deviation.

In [5]:
boot(Portfolio, alpha.fn, R = 1000)


ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = Portfolio, statistic = alpha.fn, R = 1000)


Bootstrap Statistics :
     original       bias    std. error
t1* 0.5758321 0.0007959475  0.08969074

The final output shows that using the original data, $\hat{\alpha} = 0.5758$, and that the bootstrap estimate for $\text{SE}(\hat{\alpha})$ is $0.0897$.
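
To make the three columns of the `boot()` output concrete, the same quantities can be approximated by hand with `replicate()`. This is a minimal sketch rather than what `boot()` does internally: the data frame `sim` is simulated as a stand-in for `Portfolio` (an assumption made so the block runs without `ISLR2`), and the names `sim`, `alpha.star`, and `alpha.hat` are illustrative.

```r
# Simulated stand-in for the Portfolio data (an assumption, not the real data)
set.seed(1)
n <- 100
X <- rnorm(n)
Y <- 0.5 * X + rnorm(n)
sim <- data.frame(X = X, Y = Y)

# Same statistic as in the lab: the estimate of alpha from (5.7)
alpha.fn <- function(data, index) {
    X <- data$X[index]
    Y <- data$Y[index]
    (var(Y) - cov(X, Y)) / (var(X) + var(Y) - 2 * cov(X, Y))
}

# Draw R bootstrap samples of the row indices and recompute the statistic
R <- 1000
alpha.star <- replicate(R, alpha.fn(sim, sample(n, n, replace = TRUE)))

# Estimate on the original data, then bias and standard error
alpha.hat <- alpha.fn(sim, 1:n)
c(original = alpha.hat,
  bias = mean(alpha.star) - alpha.hat,
  std.error = sd(alpha.star))
```

The `bias` entry is the mean of the bootstrap estimates minus the original estimate, and `std.error` is their standard deviation, which is how the corresponding columns reported by `boot()` are defined.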

## Estimating the Accuracy of a Linear Regression Model

We need to create the `boot.fn()` function. This takes the `Auto` data set as well as a set of indices for the observations, and returns the intercept and slope estimates for the linear regression model.

In [6]:
boot.fn <- function(data, index)
    coef(lm(mpg ~ horsepower, data = data, subset = index))

In [7]:
boot.fn(Auto, 1:392)

The `boot.fn()` function can be used to create bootstrap estimates for the intercept and slope terms by randomly sampling from among the observations with replacement.

In [8]:
set.seed(1)
boot.fn(Auto, sample(392, 392, replace = TRUE))

In [9]:
boot.fn(Auto, sample(392, 392, replace = TRUE))

In [10]:
boot(Auto, boot.fn, 1000)


ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = Auto, statistic = boot.fn, R = 1000)


Bootstrap Statistics :
      original        bias    std. error
t1* 39.9358610  0.0544513229 0.841289790
t2* -0.1578447 -0.0006170901 0.007343073
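
The printed summary can also be recovered from the object that `boot()` returns, whose `t0` component holds the estimates on the original data and whose `t` component holds one row per bootstrap replicate. The sketch below uses the built-in `mtcars` data as a stand-in for `Auto` (an assumption, so that only the `boot` package is needed); `boot.fn.demo` and `b` are illustrative names.

```r
library(boot)

# mtcars ships with R; it is used here only as a stand-in for Auto
boot.fn.demo <- function(data, index)
    coef(lm(mpg ~ hp, data = data, subset = index))

set.seed(1)
b <- boot(mtcars, boot.fn.demo, R = 1000)

b$t0                  # estimates on the original data
apply(b$t, 2, sd)     # bootstrap standard errors, one per coefficient
colMeans(b$t) - b$t0  # bootstrap bias estimates
```

Working with `b$t` directly is convenient when the replicates are needed for something other than the standard error, such as a histogram of the bootstrap distribution.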

We can also calculate the coefficient estimates and their standard errors using the standard formulas.

In [11]:
summary(lm(mpg ~ horsepower, data = Auto))$coef

              Estimate  Std. Error   t value      Pr(>|t|)
(Intercept) 39.9358610 0.717498656  55.65984 1.220362e-187
horsepower  -0.1578447 0.006445501 -24.48914  7.031989e-81


Note that these standard errors differ from the bootstrap estimates obtained earlier. However, this does not mean that there is a problem with the bootstrap. See page 218 for further explanation.
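
One common reason for such a gap is that the formula-based standard errors rely on the linear model being correct, while the bootstrap does not. The sketch below illustrates this on simulated data where the true relationship is quadratic but a straight line is fit. Everything here (`sim`, `slope.fn`, the chosen coefficients) is an assumption for illustration and is not taken from the `Auto` data.

```r
# Simulated data with a truly quadratic relationship (illustrative assumption)
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- 2 + 0.5 * x^2 + rnorm(n)
sim <- data.frame(x = x, y = y)

# Slope of a deliberately misspecified straight-line fit
slope.fn <- function(data, index)
    coef(lm(y ~ x, data = data, subset = index))[2]

# Bootstrap the slope by resampling rows with replacement
slope.star <- replicate(1000, slope.fn(sim, sample(n, n, replace = TRUE)))

# Compare the formula-based and bootstrap standard errors of the slope
se.formula <- summary(lm(y ~ x, data = sim))$coef[2, "Std. Error"]
se.boot <- sd(slope.star)
c(formula = se.formula, bootstrap = se.boot)
```

The two numbers need not agree, and the size of the gap depends on the simulated data; the point is only that the comparison can be reproduced without relying on the standard formulas.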

We will now compute the bootstrap standard error estimates and the standard linear regression estimates that result from fitting the quadratic model to the data.

In [12]:
boot.fn <- function(data, index)
    coef(
        lm(mpg ~ horsepower + I(horsepower^2),
           data = data, subset = index)
    )

In [13]:
set.seed(1)
boot(Auto, boot.fn, 1000)


ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = Auto, statistic = boot.fn, R = 1000)


Bootstrap Statistics :
        original        bias     std. error
t1* 56.900099702  3.511640e-02 2.0300222526
t2* -0.466189630 -7.080834e-04 0.0324241984
t3*  0.001230536  2.840324e-06 0.0001172164

In [14]:
summary(
    lm(mpg ~ horsepower + I(horsepower^2), data = Auto)
)$coef

                    Estimate   Std. Error   t value      Pr(>|t|)
(Intercept)     56.900099702 1.8004268063  31.60367 1.740911e-109
horsepower      -0.466189630 0.0311246171 -14.97816  2.289429e-40
I(horsepower^2)  0.001230536 0.0001220759  10.08009  2.196340e-21
