# DATASCI 503, Homework 6: Splines and Smoothing

This assignment covers **regression splines** (piecewise polynomials joined at knots), **smoothing splines** (which balance fit and smoothness via a penalty term), and **generalized additive models (GAMs)** that extend these ideas to multiple predictors.

---

**Problem 1 (ISLP Ch 7, Exercise 1):** Piecewise Cubic Functions

It was mentioned in this chapter that a cubic regression spline with one knot at $\xi$ can be obtained using a basis of the form $x$, $x^2$, $x^3$, $(x - \xi)^3_+$, where $(x - \xi)^3_+ = (x - \xi)^3$ if $x > \xi$ and equals 0 otherwise. We will now show that a function of the form

$$f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - \xi)^3_+$$

is indeed a cubic regression spline, regardless of the values of $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$.

**(a)** Find a cubic polynomial

$$f_1(x) = a_1 + b_1 x + c_1 x^2 + d_1 x^3$$

such that $f(x) = f_1(x)$ for all $x \leq \xi$. Express $a_1$, $b_1$, $c_1$, $d_1$ in terms of $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$.

> BEGIN SOLUTION

We find that $f_1(x)=\beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$.

In particular, we can let

* $a_1 = \beta_0$,
* $b_1 = \beta_1$,
* $c_1 = \beta_2$,
* $d_1 = \beta_3$,

and write $f_1(x)=a_1 + b_1 x + c_1 x^2 + d_1 x^3$.

> END SOLUTION

**(b)** Find a cubic polynomial

$$f_2(x) = a_2 + b_2 x + c_2 x^2 + d_2 x^3$$

such that $f(x) = f_2(x)$ for all $x > \xi$. Express $a_2$, $b_2$, $c_2$, $d_2$ in terms of $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$. We have now established that $f(x)$ is a piecewise polynomial.

> BEGIN SOLUTION

We calculate that 

$$
\begin{align}
    f_2(x) &= \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - \xi)^3 \\
    &= \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x^3 - 3 \xi x^2 + 3 \xi^2 x - \xi^3) \\
    &= (\beta_0 - \beta_4 \xi^3) + (\beta_1 + 3 \beta_4 \xi^2) x + (\beta_2 - 3 \beta_4 \xi) x^2 + (\beta_3 + \beta_4) x^3
\end{align}
$$

Thus we can take

* $a_2 = \beta_0 - \beta_4 \xi^3$,
* $b_2 = \beta_1 + 3 \beta_4 \xi^2$,
* $c_2 = \beta_2 - 3 \beta_4 \xi$,  and
* $d_2 = \beta_3 + \beta_4$.

Then $f_2(x)=a_2 + b_2 x + c_2 x^2 + d_2 x^3$.

> END SOLUTION

**(c)** Show that $f_1(\xi) = f_2(\xi)$. That is, $f(x)$ is continuous at $\xi$.

> BEGIN SOLUTION

$$
\begin{align}
    f_2(\xi) &= \beta_0 + \beta_1 \xi + \beta_2 \xi^2 + \beta_3 \xi^3 + \beta_4 (\xi - \xi)^3 \\
    &= \beta_0 + \beta_1 \xi + \beta_2 \xi^2 + \beta_3 \xi^3 = f_1(\xi)
\end{align}
$$

> END SOLUTION

**(d)** Show that $f_1'(\xi) = f_2'(\xi)$. That is, $f'(x)$ is continuous at $\xi$.

> BEGIN SOLUTION

$$
\begin{align}
    f_1'(\xi) &= \beta_1 + 2 \beta_2 \xi + 3 \beta_3 \xi^2 \\
    f_2'(\xi) &= (\beta_1 + 3 \beta_4 \xi^2) + 2(\beta_2 - 3 \beta_4 \xi) \xi + 3(\beta_3 + \beta_4) \xi^2 \\
    &= \beta_1 + 2 \beta_2 \xi + 3 \beta_3 \xi^2 = f_1'(\xi)
\end{align}
$$

> END SOLUTION

**(e)** Show that $f_1''(\xi) = f_2''(\xi)$. That is, $f''(x)$ is continuous at $\xi$.

Therefore, $f(x)$ is indeed a cubic spline.

*Hint: Parts (d) and (e) of this problem require knowledge of single-variable calculus. As a reminder, given a cubic polynomial $f_1(x) = a_1 + b_1 x + c_1 x^2 + d_1 x^3$, the first derivative takes the form $f_1'(x) = b_1 + 2c_1 x + 3d_1 x^2$ and the second derivative takes the form $f_1''(x) = 2c_1 + 6d_1 x$.*

> BEGIN SOLUTION

$$
\begin{align}
    f_1''(\xi) &= 2 \beta_2 + 6 \beta_3 \xi \\
    f_2''(\xi) &= 2(\beta_2 - 3 \beta_4 \xi) + 6 (\beta_3 + \beta_4) \xi \\
    &= 2 \beta_2 + 6 \beta_3 \xi = f_1''(\xi)
\end{align}
$$
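
As a sanity check on parts (b) through (e), the coefficient formulas can be verified numerically. The sketch below is illustrative only: the values chosen for $\beta$ and $\xi$ are arbitrary, and the helper names are assumptions rather than anything from the textbook.

```python
import math

# Arbitrary illustrative values (assumptions, not from the exercise).
beta = (0.5, -1.0, 2.0, 0.25, -3.0)
xi = 1.5
b0, b1, b2, b3, b4 = beta


def f(x):
    """The spline written with the truncated power basis."""
    truncated = (x - xi) ** 3 if x > xi else 0.0
    return b0 + b1 * x + b2 * x**2 + b3 * x**3 + b4 * truncated


# Coefficients (a, b, c, d) from parts (a) and (b).
left = (b0, b1, b2, b3)
right = (b0 - b4 * xi**3, b1 + 3 * b4 * xi**2, b2 - 3 * b4 * xi, b3 + b4)


def value(coef, x):
    a, b, c, d = coef
    return a + b * x + c * x**2 + d * x**3


def first_derivative(coef, x):
    _, b, c, d = coef
    return b + 2 * c * x + 3 * d * x**2


def second_derivative(coef, x):
    _, _, c, d = coef
    return 2 * c + 6 * d * x


# f agrees with f_1 to the left of the knot and with f_2 to the right.
assert math.isclose(f(0.3), value(left, 0.3))
assert math.isclose(f(2.7), value(right, 2.7))

# Value, first derivative, and second derivative all match at the knot.
for g in (value, first_derivative, second_derivative):
    assert math.isclose(g(left, xi), g(right, xi))

# The third derivative jumps by 6 * beta_4 at the knot.
third_derivative_jump = 6 * right[3] - 6 * left[3]
print(third_derivative_jump)
```

The third derivative is the only one allowed to jump at $\xi$, and the jump equals $6\beta_4$, so setting $\beta_4 = 0$ would collapse the spline to a single cubic.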

> END SOLUTION

---

**Problem 2 (ISLP Ch 7, Exercise 2):** Smoothing Spline Behavior

Suppose that a curve $\hat{g}$ is computed to smoothly fit a set of $n$ points using the following formula:

$$\hat{g} = \arg\min_g \sum_{i=1}^{n}(y_i - g(x_i))^2 + \lambda \int \left[g^{(m)}(x)\right]^2 dx,$$

where $g^{(m)}$ represents the $m$th derivative of $g$ (and $g^{(0)} = g$). Provide example sketches of $\hat{g}$ in each of the following scenarios.

**(a)** $\lambda = \infty$, $m = 0$

> BEGIN SOLUTION

$\hat{g}(x) = 0$. When $m=0$, the penalty is $\int g(x)^2 dx$. With $\lambda = \infty$, the penalty term dominates completely, forcing $g(x) = 0$ everywhere to minimize the integral of $g^2$.

> END SOLUTION

**(b)** $\lambda = \infty$, $m = 1$

> BEGIN SOLUTION

$\hat{g}(x) = \bar{y}$ (a horizontal line at the mean of $y$). When $m=1$, the penalty is $\int (g'(x))^2 dx$. With $\lambda = \infty$, the penalty forces $g'(x) = 0$ everywhere, meaning $g$ must be constant. The constant that minimizes the RSS is the mean $\bar{y}$.

> END SOLUTION

**(c)** $\lambda = \infty$, $m = 2$

> BEGIN SOLUTION

$\hat{g}(x)$ is the OLS regression line. When $m=2$, the penalty is $\int (g''(x))^2 dx$. With $\lambda = \infty$, the penalty forces $g''(x) = 0$ everywhere, meaning $g$ must be linear (at most degree 1). The linear function that minimizes RSS is the least squares regression line.
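
The limiting fits in parts (a) through (c) can be illustrated on a toy data set. The following is a minimal sketch in plain Python; the six data points are made up for illustration, and the closed-form OLS slope and intercept stand in for the $\lambda = \infty$, $m = 2$ solution.

```python
# Made-up toy data (an assumption for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
n = len(xs)


def rss(g):
    """Residual sum of squares of a candidate function g on the toy data."""
    return sum((y - g(x)) ** 2 for x, y in zip(xs, ys))


x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form simple linear regression coefficients.
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar


def zero_fit(x):
    """m = 0: the penalty forces g = 0."""
    return 0.0


def mean_fit(x):
    """m = 1: the penalty forces g' = 0, so g is the best constant."""
    return y_bar


def line_fit(x):
    """m = 2: the penalty forces g'' = 0, so g is the best line."""
    return intercept + slope * x


for name, g in [("m=0", zero_fit), ("m=1", mean_fit), ("m=2", line_fit)]:
    print(name, rss(g))
```

Each larger $m$ leaves a bigger class of functions unpenalized (zero, constants, lines), so the training RSS can only decrease as $m$ grows.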

> END SOLUTION

**(d)** $\lambda = \infty$, $m = 3$

> BEGIN SOLUTION

$\hat{g}(x)$ is the best-fitting quadratic polynomial. When $m=3$, the penalty is $\int (g'''(x))^2 dx$. With $\lambda = \infty$, the penalty forces $g'''(x) = 0$ everywhere, meaning $g$ must be at most a quadratic polynomial (degree 2). The quadratic that minimizes RSS is the least squares quadratic fit.

> END SOLUTION

**(e)** $\lambda = 0$, $m = 3$

> BEGIN SOLUTION

$\hat{g}(x)$ interpolates all data points. When $\lambda = 0$, the penalty vanishes regardless of $m$, so the criterion reduces to the RSS alone. Any function that passes through every data point exactly attains RSS = 0 (assuming the $x_i$ are distinct), so $\hat{g}$ is an interpolant and can be arbitrarily wiggly between the points.

> END SOLUTION

---

**Problem 3 (ISLP Ch 7, Exercise 3):** Sketching a Basis Function Curve

Suppose we fit a curve with basis functions $b_1(X) = X$, $b_2(X) = (X - 1)^2 I(X \geq 1)$. (Note that $I(X \geq 1)$ equals 1 for $X \geq 1$ and 0 otherwise.) We fit the linear regression model

$$Y = \beta_0 + \beta_1 b_1(X) + \beta_2 b_2(X) + \varepsilon,$$

and obtain coefficient estimates $\hat{\beta}_0 = 1$, $\hat{\beta}_1 = 1$, $\hat{\beta}_2 = -2$. Sketch the estimated curve between $X = -2$ and $X = 2$. Note the intercepts, slopes, and other relevant information.

> BEGIN SOLUTION

For $x < 1$: $I(x \geq 1) = 0$, so $b_2(x) = 0$ and $f(x) = 1 + x$. This is a line with slope 1.

For $x \geq 1$: $I(x \geq 1) = 1$, so $b_2(x) = (x-1)^2$ and $f(x) = 1 + x - 2(x-1)^2$.

**Key features:**

- **y-intercept:** $f(0) = 1 + 0 - 0 = 1$. The curve crosses the y-axis at $(0, 1)$.

- **x-intercept:** For $x < 1$: $1 + x = 0 \Rightarrow x = -1$. The curve crosses the x-axis at $(-1, 0)$.

- **Slope for $x < 1$:** $f'(x) = 1$ (constant slope of 1).

- **Value at knot:** $f(1) = 1 + 1 - 0 = 2$.

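- **Slope for $x \geq 1$:** $f'(x) = 1 - 4(x-1)$. At $x = 1$, the slope is still 1 (continuous first derivative). For $x > 1$, the slope decreases linearly: it reaches zero at $x = 1.25$, where the curve peaks at $f(1.25) = 2.125$, and is negative thereafter. At the right end of the plotting range, $f(2) = 1$ with slope $-3$.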

- **Nature of knot:** At $x = 1$, the function is continuous and the first derivative is continuous (both equal 1 from left and right). However, the second derivative changes: $f''(x) = 0$ for $x < 1$ and $f''(x) = -4$ for $x > 1$. This is a quadratic spline knot with $C^1$ continuity.

The curve is a straight line with slope 1 for $x < 1$, then bends downward (concave down) for $x > 1$ due to the $-2(x-1)^2$ term.
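
To help place points on the sketch, the fitted curve can be evaluated directly. This is a small illustrative snippet; the function name `fitted_curve` and the chosen evaluation points are assumptions.

```python
def fitted_curve(x):
    """Evaluate 1 + b1(x) - 2 * b2(x) with b1(x) = x and b2(x) = (x - 1)^2 I(x >= 1)."""
    b2 = (x - 1.0) ** 2 if x >= 1.0 else 0.0
    return 1.0 + x - 2.0 * b2


# Endpoints, intercepts, the knot, and the peak of the quadratic piece.
for x in [-2.0, -1.0, 0.0, 1.0, 1.25, 2.0]:
    print(x, fitted_curve(x))
```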

> END SOLUTION

---

**Problem 4 (ISLP Ch 7, Exercise 4):** Sketching Basis Function Expansions

Suppose we fit a curve with basis functions $b_1(X) = I(0 \leq X \leq 2) - (X - 1)I(1 \leq X \leq 2)$, $b_2(X) = (X - 3)I(3 \leq X \leq 4) + I(4 < X \leq 5)$. We fit the linear regression model

$$Y = \beta_0 + \beta_1 b_1(X) + \beta_2 b_2(X) + \varepsilon,$$

and obtain coefficient estimates $\hat{\beta}_0 = 1$, $\hat{\beta}_1 = 1$, $\hat{\beta}_2 = 3$. Sketch the estimated curve between $X = -2$ and $X = 6$. Note the intercepts, slopes, and other relevant information.

> BEGIN SOLUTION

We have $\hat{Y} = 1 + b_1(X) + 3 b_2(X)$.

**Evaluating the basis functions in each region:**

| Region | $b_1(X)$ | $b_2(X)$ | $\hat{Y}$ |
|--------|----------|----------|----------|
| $X < 0$ | 0 | 0 | 1 |
| $0 \le X < 1$ | 1 | 0 | 2 |
| $1 \le X \le 2$ | $1 - (X-1) = 2-X$ | 0 | $3 - X$ |
| $2 < X < 3$ | 0 | 0 | 1 |
| $3 \le X \le 4$ | 0 | $X - 3$ | $1 + 3(X-3)$ |
| $4 < X \le 5$ | 0 | 1 | 4 |
| $X > 5$ | 0 | 0 | 1 |

**Key features of the curve:**

- **y-intercept:** At $X = 0$, $\hat{Y} = 2$.
- **For $X < 0$:** Flat at $\hat{Y} = 1$.
- **For $0 \le X < 1$:** Flat at $\hat{Y} = 2$ (slope = 0).
- **For $1 \le X \le 2$:** Linear with slope $-1$, from $(1, 2)$ to $(2, 1)$.
- **For $2 < X < 3$:** Flat at $\hat{Y} = 1$ (slope = 0).
- **For $3 \le X \le 4$:** Linear with slope $3$, from $(3, 1)$ to $(4, 4)$.
- **For $4 < X \le 5$:** Flat at $\hat{Y} = 4$ (slope = 0).
- **For $X > 5$:** Flat at $\hat{Y} = 1$.

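The curve is piecewise linear: flat segments joined by two "ramp" sections, one descending (slope $-1$) between $X = 1$ and $X = 2$, and one ascending (slope $3$) between $X = 3$ and $X = 4$. It has jump discontinuities at $X = 0$ (from 1 up to 2) and at $X = 5$ (from 4 down to 1), and is continuous everywhere else.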
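
The table can be checked by coding the two basis functions directly. This is a minimal sketch; the function names are illustrative and the evaluation points are chosen to hit each region and boundary.

```python
def b1(x):
    """I(0 <= x <= 2) - (x - 1) * I(1 <= x <= 2)."""
    indicator = 1.0 if 0 <= x <= 2 else 0.0
    ramp = (x - 1) if 1 <= x <= 2 else 0.0
    return indicator - ramp


def b2(x):
    """(x - 3) * I(3 <= x <= 4) + I(4 < x <= 5)."""
    ramp = (x - 3) if 3 <= x <= 4 else 0.0
    indicator = 1.0 if 4 < x <= 5 else 0.0
    return ramp + indicator


def y_hat(x):
    """Fitted curve with coefficient estimates (1, 1, 3)."""
    return 1 + b1(x) + 3 * b2(x)


for x in [-1, 0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5]:
    print(x, y_hat(x))
```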

> END SOLUTION

---

**Problem 5:** True or False - Splines and Polynomials

For each statement, indicate whether it is True or False and briefly explain your reasoning.

**(a)** A cubic regression spline with 3 knots has 7 degrees of freedom.

> BEGIN SOLUTION

**True.** A cubic regression spline with K knots has K + 4 degrees of freedom (4 for the cubic polynomial basis, plus 1 additional basis function for each knot). With 3 knots: 3 + 4 = 7 degrees of freedom.
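
The count can be made concrete by listing the truncated power basis functions. The snippet below is an illustrative sketch; the function name and the knot locations are arbitrary assumptions.

```python
def cubic_spline_basis(x, knots):
    """Truncated power basis: 1, x, x^2, x^3, plus (x - k)^3_+ for each knot k."""
    polynomial_part = [1.0, x, x**2, x**3]
    knot_part = [max(x - k, 0.0) ** 3 for k in knots]
    return polynomial_part + knot_part


# Three arbitrary knots give 4 + 3 = 7 basis functions, one coefficient each.
basis_values = cubic_spline_basis(2.5, knots=[1.0, 2.0, 3.0])
print(len(basis_values))
```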

> END SOLUTION

**(b)** A natural cubic spline with 3 knots has 7 degrees of freedom.

> BEGIN SOLUTION

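**False.** A natural cubic spline with K knots has K degrees of freedom. Requiring linearity beyond each of the two boundary knots imposes two constraints per boundary, which removes 4 degrees of freedom: (K + 4) - 4 = K. With 3 knots that gives 3 degrees of freedom, not 7.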

> END SOLUTION

**(c)** As $\lambda \to \infty$ in a smoothing spline, the fitted curve approaches the OLS regression line.

> BEGIN SOLUTION

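**True.** For the standard smoothing spline, whose penalty is $\lambda \int g''(t)^2 dt$, letting $\lambda \to \infty$ makes the penalty term dominate and forces the second derivative to be zero everywhere, which means the function must be linear. Among linear functions, the one that minimizes the RSS is the OLS regression line.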

> END SOLUTION

**(d)** A natural cubic spline is linear beyond the boundary knots because we assume the second derivative equals zero in those regions.

> BEGIN SOLUTION

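**False.** Linearity beyond the boundary knots is a constraint we impose, and it takes two conditions per boundary rather than one. In an outer region the spline is a cubic with $f''(x) = 2c + 6dx$, so it is linear only if both $c = 0$ and $d = 0$; equivalently, the second AND third derivatives must equal zero at the boundary knot. Setting only the second derivative to zero at the boundary knot is not enough.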

> END SOLUTION

---

**Problem 6:** Comparing GAMs and Linear Regression

What is the primary difference between a generalized additive model (GAM) and a standard linear regression model? Choose the best answer:

(a) GAMs can only handle binary outcomes

(b) GAMs require more training data than linear regression

(c) GAMs allow for non-linear relationships between predictors and the response through smooth functions

(d) GAMs cannot include interaction terms

> BEGIN SOLUTION

The correct answer is **(c)**. GAMs allow for non-linear relationships between predictors and the response by replacing the linear terms $\beta_j X_j$ with smooth functions $f_j(X_j)$. This is the defining characteristic that distinguishes GAMs from standard linear regression.

> END SOLUTION

---

**Problem 7:** Model Comparison on Boston Housing Data

Using the Boston housing dataset (provided in `boston.csv`), predict nitrogen oxide concentration (`nox`) from the proportion of non-retail business acres (`indus`).

(a) Compare three modeling approaches using 4-fold cross-validation:
1. A cubic spline with knots at 0, 12, 17, and 30
2. A polynomial regression of degree 12
3. K-nearest neighbors with k=13

Which method achieves the lowest mean squared error? Discuss their relative merits beyond just the MSE.

> BEGIN SOLUTION

Based on 4-fold cross-validation:
- Spline MSE: ~0.0034
- Polynomial MSE: ~0.0033 (lowest)
- KNN MSE: ~0.0040

The polynomial achieves the lowest MSE, but the spline provides a smoother, more interpretable fit. The KNN curve is rough and difficult to interpret. The polynomial shows concerning behavior at the endpoints, suggesting potential overfitting despite its good CV performance. The spline offers a good balance between flexibility and stability, with knots placed at meaningful locations.
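
A comparison of this kind can be sketched as follows. This is a minimal illustration using only NumPy and synthetic stand-in data rather than `boston.csv`, so its output will not reproduce the MSE values quoted above. The helper names, the truncated power basis for the spline, the rescaling of `x` for the polynomial, and the random seeds are all assumptions made for the sketch.

```python
import numpy as np

KNOTS = [0, 12, 17, 30]

# Synthetic stand-in for (indus, nox); NOT the Boston data.
rng = np.random.default_rng(503)
x = rng.uniform(0.5, 27.5, size=400)
y = 0.4 + 0.15 * np.tanh((x - 10) / 5) + rng.normal(0, 0.05, size=400)
LO, HI = x.min(), x.max()


def spline_design(x, knots):
    """Cubic truncated power basis: 1, x, x^2, x^3, (x - k)^3_+ per knot."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def poly_design(x, degree):
    """Polynomial basis on x rescaled to [-1, 1] to limit ill-conditioning."""
    z = 2.0 * (x - LO) / (HI - LO) - 1.0
    return np.vander(z, degree + 1, increasing=True)


def least_squares_predict(design_train, y_train, design_test):
    """Fit by least squares on the training design, then predict."""
    coef, *_ = np.linalg.lstsq(design_train, y_train, rcond=None)
    return design_test @ coef


def spline_predict(x_tr, y_tr, x_te):
    return least_squares_predict(
        spline_design(x_tr, KNOTS), y_tr, spline_design(x_te, KNOTS)
    )


def poly_predict(x_tr, y_tr, x_te):
    return least_squares_predict(poly_design(x_tr, 12), y_tr, poly_design(x_te, 12))


def knn_predict(x_tr, y_tr, x_te, k=13):
    """Average the responses of the k nearest training points (1-D input)."""
    dist = np.abs(x_te[:, None] - x_tr[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_tr[nearest].mean(axis=1)


def cv_mse(x, y, predict, n_folds=4, seed=0):
    """Mean held-out MSE over shuffled folds."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(x)), n_folds)
    errors = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        y_hat = predict(x[train_idx], y[train_idx], x[test_idx])
        errors.append(np.mean((y[test_idx] - y_hat) ** 2))
    return float(np.mean(errors))


results = {
    "spline": cv_mse(x, y, spline_predict),
    "polynomial": cv_mse(x, y, poly_predict),
    "knn": cv_mse(x, y, knn_predict),
}
for name, mse in results.items():
    print(f"{name}: {mse:.4f}")
```

Using the same `seed` for every method means all three are scored on identical folds, so differences in MSE reflect the models rather than the split.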

> END SOLUTION

(b) Fit a GAM using `pygam` to predict `nox` from three predictors: `dis` (distance to employment centers), `indus`, and `rad` (accessibility to radial highways). Use grid search to find optimal smoothing parameters. Plot the partial dependence functions and discuss what they reveal about the relationships.

In [None]:
# BEGIN SOLUTION
import matplotlib.pyplot as plt
import pandas as pd
from pygam import LinearGAM, s

# Load the Boston housing data
boston_data = pd.read_csv("./data/boston.csv")

# Part (b): Fit a GAM with multiple predictors
features_multi = boston_data[["dis", "indus", "rad"]].values
target = boston_data["nox"].values.reshape(-1)

# Fit GAM with grid search for optimal lambda values
gam = LinearGAM(s(0) + s(1) + s(2))
lambda_values = [0.5, 1, 5, 10]
lambda_grid = [lambda_values] * 3
gam.gridsearch(features_multi, target, lam=lambda_grid)
print(f"Optimal lambda values: {gam.lam}")

# Plot partial dependence functions
fig, axes = plt.subplots(1, 3, figsize=(14, 5))
feature_names = ["dis", "indus", "rad"]

for idx, ax in enumerate(axes):
    xx = gam.generate_X_grid(term=idx)
    # A single call returns the estimate and its 95% interval (lower, upper columns)
    pdep, conf_int = gam.partial_dependence(term=idx, X=xx, width=0.95)
    ax.plot(xx[:, idx], pdep)
    ax.plot(xx[:, idx], conf_int[:, 0], c="r", ls="--", label="95% CI")
    ax.plot(xx[:, idx], conf_int[:, 1], c="r", ls="--")
    ax.set_ylim(-0.2, 0.2)
    ax.set_xlabel(feature_names[idx])
    ax.set_ylabel("Partial Dependence")
    ax.set_title(f"Effect of {feature_names[idx]} on nox")
    ax.legend()  # without this call the "95% CI" label is never shown

plt.tight_layout()
plt.show()

print("""
Discussion: The partial dependence plots reveal:
- dis: NOx decreases as distance from employment centers increases (expected)
- indus: NOx increases with industrial proportion, leveling off at high values
- rad: The relationship is unclear with wide confidence intervals, suggesting
  insufficient data to reliably estimate this effect. The wiggly pattern likely
  reflects noise rather than a true relationship.
""")
# END SOLUTION

In [None]:
# Test assertions
# Verify the data was loaded correctly
assert boston_data is not None, "boston_data should be loaded"
assert "nox" in boston_data.columns, "boston_data should have nox column"
assert "indus" in boston_data.columns, "boston_data should have indus column"

# Verify GAM was fitted successfully
assert gam is not None, "GAM model should be defined"

print("All tests passed!")

# BEGIN HIDDEN TESTS
assert len(gam.lam) == 3, "GAM should have 3 smoothing parameters"
# END HIDDEN TESTS