The textbook's instructions are written for R; I reproduce them in Python as closely as I can.

# Preface

From the textbook, p. 263:
> In this exercise, we will predict the number of applications received
using the other variables in the College data set.

```python
import numpy as np
import pandas as pd
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
```

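```python
# Read the College data and encode the Yes/No column Private as 1/0
college = pd.read_csv('https://www.statlearning.com/s/College.csv')
college = college.replace({'Yes' : 1, 'No' : 0})
# Predictors: every column except the response (Apps) and the college-name column
x = college.drop(['Apps', 'Unnamed: 0'], axis='columns')
y = college.Apps

# Standardize the predictors. The scaler is fit on all rows, before the
# train/test split in (a), and returns a NumPy array without column names.
scaler = StandardScaler().fit(x)
x = scaler.transform(x)
```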
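The cell above fits `StandardScaler` on every row before the split, so the test rows contribute to the means and standard deviations used for scaling. The effect here is probably small, but a leakage-free variant splits first and fits the scaler on the training rows only, which is easiest with a pipeline. A minimal sketch, assuming synthetic data from `make_regression` as a stand-in for the College predictors:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the College data (assumption: 17 predictors, like x above)
X_raw, y_raw = make_regression(n_samples=200, n_features=17, noise=10.0, random_state=0)

# Split the *unscaled* data first
X_tr, X_te, y_tr, y_te = train_test_split(X_raw, y_raw, random_state=1)

# The pipeline fits StandardScaler on X_tr only, then reuses those statistics on X_te
ols_pipe = make_pipeline(StandardScaler(), LinearRegression())
ols_pipe.fit(X_tr, y_tr)
test_r2 = ols_pipe.score(X_te, y_te)

# make_pipeline names each step after its lowercased class name
scaler_means = ols_pipe.named_steps['standardscaler'].mean_
print(f'test R^2 = {test_r2:.3f}')
```

For plain least squares, scaling does not change the predictions at all; it matters for ridge, lasso, PCR and PLS, whose fits depend on the scale of the predictors.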

# (a)

From the textbook, p. 263:
> Split the data set into a training set and a test set.

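```python
# train_test_split draws from NumPy's global RNG when random_state is not given,
# so seeding it makes the split reproducible. Default split: 75% train, 25% test.
np.random.seed(1)
x_train, x_test, y_train, y_test = train_test_split(x, y)
```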
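Seeding NumPy's global generator works because `train_test_split` falls back to it when `random_state` is not given. Passing `random_state` directly keeps the split reproducible even if other code draws random numbers in between. A small sketch on toy arrays (not the College data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_toy = np.arange(40).reshape(20, 2)   # 20 toy rows, 2 columns
y_toy = np.arange(20)

# Same random_state -> identical split, regardless of the global NumPy state
split_a = train_test_split(X_toy, y_toy, random_state=1)
np.random.rand(5)                      # unrelated draw from the global generator
split_b = train_test_split(X_toy, y_toy, random_state=1)

# Default test_size is 0.25, so 5 of the 20 rows end up in the test set
print(split_a[1].shape)   # (5, 2)
```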

# (b)

From the textbook, p. 263:
> Fit a linear model using least squares on the training set, and report the test error obtained.

```python
results = []

model_b = LinearRegression()
model_b.fit(x_train, y_train)
r2 = model_b.score(x_test, y_test)  # score() returns R^2, not the MSE
results.append(['OLS', r2])
print(r2)
```

```
0.9504599702986559
```


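For ordinary least squares, the test-set $R^2$ is 0.950. The exercise asks for the test *error*, but `score` reports $R^2$; I use $R^2$ as the comparison metric throughout.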
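The test error in the usual sense is the mean squared error on the test set, and it is tied to $R^2$ by $R^2 = 1 - \mathrm{MSE}/\hat\sigma^2_y$, where $\hat\sigma^2_y$ is the divide-by-$n$ variance of the test responses. A sketch computing both with `sklearn.metrics`, on synthetic data rather than the College set:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the College data
X_syn, y_syn = make_regression(n_samples=200, n_features=17, noise=25.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, random_state=1)

ols = LinearRegression().fit(X_tr, y_tr)
y_pred = ols.predict(X_te)

test_mse = mean_squared_error(y_te, y_pred)   # the "test error" the exercise asks for
test_rmse = np.sqrt(test_mse)                 # same units as the response
test_r2 = r2_score(y_te, y_pred)              # equals ols.score(X_te, y_te)

# R^2 = 1 - MSE / Var(y_test), with the population (ddof=0) variance
print(test_mse, test_rmse, test_r2)
```

In the notebook, the analogous call is `mean_squared_error(y_test, model_b.predict(x_test))`.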

# (c)

From the textbook, p. 263:
> Fit a ridge regression model on the training set, with $\lambda$ chosen by cross-validation. Report the test error obtained.

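```python
def powspace(start, stop, power, num):
    """Return `num` values from `start` to `stop`, evenly spaced on the power-th-root scale."""
    start = np.power(start, 1/float(power))
    stop = np.power(stop, 1/float(power))
    return np.power(np.linspace(start, stop, num=num), power)

# Candidate penalty strengths (scikit-learn's alpha plays the role of lambda)
reg_params = powspace(start=0.01, stop=100, power=1.1, num=200)

model_c = RidgeCV(alphas=reg_params, cv=5)  # alpha chosen by 5-fold CV on the training set
model_c.fit(x_train, y_train)
r2 = model_c.score(x_test, y_test)
results.append(['Ridge', r2])
print(r2)
```

```
0.9490946730672606
```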


For Ridge regression, $R^2 = 0.949$ on the test data.
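With `power=1.1`, `powspace` produces an almost evenly spaced grid, so only a handful of the 200 candidates lie below 1 and the small-penalty end is sampled sparsely. A log-spaced grid from `np.logspace` is a common alternative for penalty parameters, and the fitted model's `alpha_` attribute reports which value cross-validation picked. A sketch comparing the two grids and fitting `RidgeCV` on synthetic data (not the College set):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

def powspace(start, stop, power, num):
    """Return `num` values from `start` to `stop`, evenly spaced on the power-th-root scale."""
    start = np.power(start, 1 / float(power))
    stop = np.power(stop, 1 / float(power))
    return np.power(np.linspace(start, stop, num=num), power)

pow_grid = powspace(start=0.01, stop=100, power=1.1, num=200)
log_grid = np.logspace(-2, 2, num=200)   # 0.01 to 100, evenly spaced in log10

# Half of the log grid lies below 1; only a few of the powspace values do
print((pow_grid < 1).sum(), (log_grid < 1).sum())

# Same RidgeCV call as above, with the log-spaced grid (synthetic data)
X_syn, y_syn = make_regression(n_samples=200, n_features=17, noise=25.0, random_state=0)
ridge = RidgeCV(alphas=log_grid, cv=5).fit(X_syn, y_syn)
print('alpha chosen by CV:', ridge.alpha_)
```

In the notebook, `model_c.alpha_` gives the value selected for part (c).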

# (d)

From the textbook, p. 263:
> Fit a lasso model on the training set, with $\lambda$ chosen by cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.

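```python
# LassoCV picks alpha by cross-validation (5-fold by default in recent scikit-learn)
# over an automatically generated alpha path; max_iter is raised from the default
# so coordinate descent has room to converge.
model_d = LassoCV(max_iter=30000)
model_d.fit(x_train, y_train)
r2 = model_d.score(x_test, y_test)
results.append(['Lasso', r2])
print(r2)
print(model_d.coef_)
```

```
0.9446890241597852
[-1.02272457e+02  3.54768484e+03 -5.32229899e+01  6.25902825e+02
 -8.54585779e+01 -0.00000000e+00  0.00000000e+00 -2.91923555e+02
  1.66184100e+02  0.00000000e+00  0.00000000e+00 -9.39736030e+01
 -3.85823337e+01  2.78270724e+01 -1.31335536e+00  3.92128200e+02
  9.11206509e+01]
```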


For lasso regression, $R^2 = 0.945$ on the test data. The lasso set 4 of the 17 coefficients to zero, leaving 13 non-zero estimates; the zeroed coefficients belong to `F.Undergrad`, `P.Undergrad`, `Books` and `Personal`.
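Because `StandardScaler` returns a plain NumPy array, `coef_` carries no column names, and matching zeros to variables by position is easy to get wrong. A small helper (the name `nonzero_summary` is hypothetical) pairs each coefficient with its predictor and counts the non-zero ones. The demo reuses the coefficients printed above, with the predictor order assumed to be the College columns minus `Apps` and the name column:

```python
import numpy as np
import pandas as pd

def nonzero_summary(coef, feature_names):
    """Pair coefficients with predictor names; return (Series, number of non-zero coefficients)."""
    coefs = pd.Series(np.asarray(coef, dtype=float), index=feature_names)
    return coefs, int((coefs != 0).sum())

# Assumed predictor order of x: College columns without 'Apps' and the name column
names = ['Private', 'Accept', 'Enroll', 'Top10perc', 'Top25perc', 'F.Undergrad',
         'P.Undergrad', 'Outstate', 'Room.Board', 'Books', 'Personal', 'PhD',
         'Terminal', 'S.F.Ratio', 'perc.alumni', 'Expend', 'Grad.Rate']

# Lasso coefficients as printed by the cell above
lasso_coef = [-1.02272457e+02, 3.54768484e+03, -5.32229899e+01, 6.25902825e+02,
              -8.54585779e+01, -0.0, 0.0, -2.91923555e+02,
              1.66184100e+02, 0.0, 0.0, -9.39736030e+01,
              -3.85823337e+01, 2.78270724e+01, -1.31335536e+00, 3.92128200e+02,
              9.11206509e+01]

coefs, n_nonzero = nonzero_summary(lasso_coef, names)
print(n_nonzero)                       # 13
print(list(coefs[coefs == 0].index))   # ['F.Undergrad', 'P.Undergrad', 'Books', 'Personal']
```

In the notebook itself, `nonzero_summary(model_d.coef_, college.drop(['Apps', 'Unnamed: 0'], axis='columns').columns)` avoids hard-coding the names.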

# (e)

From the textbook, p. 263:
> Fit a PCR model on the training set, with $M$ chosen by cross-validation. Report the test error obtained, along with the value of $M$ selected by cross-validation.

```python
# PCR: PCA followed by least squares; GridSearchCV picks M by CV (5-fold by default)
params = {'pca__n_components' : np.arange(1, 18)}
pcr = make_pipeline(PCA(), LinearRegression())
grid_search = GridSearchCV(pcr, params)
grid_search.fit(x_train, y_train)
r2 = grid_search.score(x_test, y_test)
results.append(['PCR', r2])
print('Number of components selected:',
      grid_search.best_params_['pca__n_components'])
print(f'R^2 = {r2}')
```

```
Number of components selected: 17
R^2 = 0.9504599702986559
```


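For PCR, $R^2 = 0.950$ on the test data. Cross-validation selected 17 principal components, that is, all of them. With $M = p = 17$, PCR is equivalent to ordinary least squares: the principal component scores are an invertible linear transformation of the centered predictors, so they span the same space and give the same fitted values. This is why its test $R^2$ is identical to the OLS value.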
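The equivalence of PCR with all $p$ components and OLS can be checked numerically. A sketch on synthetic data (not the College set) comparing test-set predictions:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in with p = 17 predictors
X_syn, y_syn = make_regression(n_samples=200, n_features=17, noise=25.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, random_state=1)

ols = LinearRegression().fit(X_tr, y_tr)
# PCR keeping every component, i.e. M = p
pcr_full = make_pipeline(PCA(n_components=17), LinearRegression()).fit(X_tr, y_tr)

# With all components kept, PCR and OLS predictions agree up to floating-point rounding
print(np.allclose(ols.predict(X_te), pcr_full.predict(X_te)))   # True
```

Only when $M < p$ does PCR actually constrain the fit, which is where it can differ from (and sometimes beat) least squares.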

# (f)

From the textbook, p. 263:
> Fit a PLS model on the training set, with $M$ chosen by cross-validation. Report the test error obtained, along with the value of $M$ selected by cross-validation.

```python
# PLS: PLSRegression standardizes X and y internally by default (scale=True);
# GridSearchCV picks M by CV (5-fold by default)
params = {'n_components' : np.arange(1, 18)}
pls = PLSRegression()
grid_search = GridSearchCV(pls, params)
grid_search.fit(x_train, y_train)
r2 = grid_search.score(x_test, y_test)
results.append(['PLS', r2])
print('Number of components selected:',
      grid_search.best_params_['n_components'])
print(f'R^2 = {r2}')
```

```
Number of components selected: 17
R^2 = 0.9504599702986559
```


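The result is the same as with PCR: cross-validation selects 17 components (all of them) and $R^2 = 0.950$. With every component kept, PLS also reproduces the least-squares fit, so its test $R^2$ matches OLS exactly.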

# (g)

From the textbook, p. 263:
> Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these five approaches?

```python
pd.DataFrame(results, columns=['method', 'R^2']).sort_values('R^2', ascending=False)
```

|   | method | R^2 |
|---|--------|-----|
| 0 | OLS | 0.95046 |
| 3 | PCR | 0.95046 |
| 4 | PLS | 0.95046 |
| 1 | Ridge | 0.949095 |
| 2 | Lasso | 0.944689 |


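The results are very similar: every method reaches a test $R^2$ between 0.945 and 0.950, so roughly 95% of the variance in the number of applications among the held-out colleges is explained. The largest gap between methods is under 0.006 (OLS, PCR and PLS versus the lasso). The main factor influencing $R^2$ that I noticed was the random seed used for the train-test split.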
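To put numbers on the seed effect, one can repeat the split for several seeds and refit each model. A sketch (the helper `r2_by_seed` is hypothetical, and synthetic data stands in for the College set) that tabulates test $R^2$ per seed for OLS and a cross-validated lasso:

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import train_test_split

def r2_by_seed(models, X, y, seeds):
    """Test R^2 of each model for each train/test split seed (rows: seeds, columns: models)."""
    rows = {}
    for seed in seeds:
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
        # clone() gives an unfitted copy, so fits for different seeds do not interact
        rows[seed] = {name: clone(m).fit(X_tr, y_tr).score(X_te, y_te)
                      for name, m in models.items()}
    return pd.DataFrame.from_dict(rows, orient='index')

X_syn, y_syn = make_regression(n_samples=200, n_features=17, noise=25.0, random_state=0)
models = {'OLS': LinearRegression(), 'Lasso': LassoCV(max_iter=30000)}
table = r2_by_seed(models, X_syn, y_syn, seeds=range(5))

# Compare the spread across seeds (per column) with the gap between methods
print(table.round(3))
print(table.std())
```

With the notebook's data, the same call would be `r2_by_seed(models, x, y, seeds=range(10))`, optionally adding ridge, PCR and PLS to `models`.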