# DATASCI 503, Homework 6: Splines and Smoothing

This assignment covers **regression splines** (piecewise polynomials joined at knots), **smoothing splines** (which balance fit and smoothness via a penalty term), and **generalized additive models (GAMs)** that extend these ideas to multiple predictors.

---

**Problem 1:** Piecewise Cubic Functions (Chapter 7, Exercise 1)

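Consider a cubic regression spline with a single knot at $\xi$, built from the basis $x$, $x^2$, $x^3$, $(x-\xi)^3_+$, where $(x-\xi)^3_+ = (x-\xi)^3$ if $x > \xi$ and $0$ otherwise:

$$f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x-\xi)^3_+$$

We begin by finding cubic polynomials $f_1$ and $f_2$ so that $f(x)$ can be written in the form

$$f(x)=\begin{cases}
f_1(x) & x \le \xi \\
f_2(x) & x > \xi
\end{cases}$$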

#### (a)

Find coefficients $a_1, b_1, c_1, d_1$ such that $f_1(x) = a_1 + b_1 x + c_1 x^2 + d_1 x^3$.

#### (b)

Find coefficients $a_2, b_2, c_2, d_2$ such that $f_2(x) = a_2 + b_2 x + c_2 x^2 + d_2 x^3$.

#### (c)

Show that $f_1(\xi) = f_2(\xi)$ (continuity at the knot).

#### (d)

Show that $f_1'(\xi) = f_2'(\xi)$ (first derivative continuity at the knot).

#### (e)

Show that $f_1''(\xi) = f_2''(\xi)$ (second derivative continuity at the knot).

> BEGIN SOLUTION

#### (a)

To the left of the knot the truncated term $\beta_4 (x-\xi)^3_+$ vanishes, so $f_1(x)=\beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$.

In particular, we can let

* $a_1 = \beta_0$,
* $b_1 = \beta_1$,
* $c_1 = \beta_2$,
* $d_1 = \beta_3$,

and write $f_1(x)=a_1 + b_1 x + c_1 x^2 + d_1 x^3$. 

#### (b)

For $x > \xi$ we have $(x-\xi)^3_+ = (x-\xi)^3$, so expanding the cube and collecting powers of $x$ gives

$$
\begin{align}
    f_2(x) &= \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x - \xi)^3 \\
    &= \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 (x^3 - 3 \xi x^2 + 3 \xi^2 x - \xi^3) \\
    &= (\beta_0 - \beta_4 \xi^3) + (\beta_1 + 3 \beta_4 \xi^2) x + (\beta_2 - 3 \beta_4 \xi) x^2 + (\beta_3 + \beta_4) x^3
\end{align}
$$

Thus we can take

* $a_2 = \beta_0 - \beta_4 \xi^3$,
* $b_2 = \beta_1 + 3 \beta_4 \xi^2$,
* $c_2 = \beta_2 - 3 \beta_4 \xi$,  and
* $d_2 = \beta_3 + \beta_4$.


Then $f_2(x)=a_2 + b_2 x + c_2 x^2 + d_2 x^3$.


#### (c)

$$
\begin{align}
    f_2(\xi) &= \beta_0 + \beta_1 \xi + \beta_2 \xi^2 + \beta_3 \xi^3 + \beta_4 (\xi - \xi)^3 \\
    &= \beta_0 + \beta_1 \xi + \beta_2 \xi^2 + \beta_3 \xi^3 = f_1(\xi)
\end{align}
$$

#### (d)

$$
\begin{align}
    f_1'(\xi) &= \beta_1 + 2 \beta_2 \xi + 3 \beta_3 \xi^2 \\
    f_2'(\xi) &= (\beta_1 + 3 \beta_4 \xi^2) + 2(\beta_2 - 3 \beta_4 \xi) \xi + 3(\beta_3 + \beta_4) \xi^2 \\
    &= \beta_1 + 2 \beta_2 \xi + 3 \beta_3 \xi^2 = f_1'(\xi)
\end{align}
$$

#### (e)

$$
\begin{align}
    f_1''(\xi) &= 2 \beta_2 + 6 \beta_3 \xi \\
    f_2''(\xi) &= 2(\beta_2 - 3 \beta_4 \xi) + 6 (\beta_3 + \beta_4) \xi \\
    &= 2 \beta_2 + 6 \beta_3 \xi = f_1''(\xi)
\end{align}
$$
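
As a numerical sanity check on parts (b) through (e), the sketch below plugs in arbitrary values for the coefficients and the knot (these numbers are assumptions chosen only for illustration), builds $f_1$ and $f_2$ from the coefficients derived above, and compares the two cubics and their derivatives at $\xi$. The third derivatives are expected to differ by $6\beta_4$, which is the only discontinuity a cubic spline allows at a knot.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Arbitrary illustrative values (assumptions, not part of the exercise)
beta = np.array([1.0, -2.0, 0.5, 3.0, -1.5])  # beta_0, ..., beta_4
xi = 0.7

# f_1 uses the first four coefficients directly, as in part (a)
f1 = Polynomial(beta[:4])

# f_2 uses the coefficients a_2, b_2, c_2, d_2 derived in part (b)
f2 = Polynomial(
    [
        beta[0] - beta[4] * xi**3,
        beta[1] + 3 * beta[4] * xi**2,
        beta[2] - 3 * beta[4] * xi,
        beta[3] + beta[4],
    ]
)

# Gap in the function value and in the first two derivatives at the knot
value_gap = abs(f1(xi) - f2(xi))
deriv_gaps = [abs(f1.deriv(k)(xi) - f2.deriv(k)(xi)) for k in (1, 2)]

# The third derivative is allowed to jump; the jump should equal 6 * beta_4
third_jump = f2.deriv(3)(xi) - f1.deriv(3)(xi)

print("value gap:", value_gap)
print("first and second derivative gaps:", deriv_gaps)
print("third derivative jump:", third_jump, "vs 6 * beta_4 =", 6 * beta[4])
```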

> END SOLUTION

---

**Problem 2:** Smoothing Spline Behavior (Chapter 7, Exercise 2)

Consider the smoothing spline optimization problem:

$$\hat{g} = \arg\min_g \left( \sum_{i=1}^{n}\bigl(y_i - g(x_i)\bigr)^2 + \lambda \int \bigl[g^{(m)}(x)\bigr]^2 \, dx \right)$$

where $g^{(m)}$ is the $m$-th derivative of $g$ (with $g^{(0)} = g$). For each of the following scenarios, sketch (or produce in code) what $\hat{g}$ will look like. Generate some sample data using:

```python
import numpy as np

np.random.seed(1)
x = np.linspace(-5, 5, 100)
y = x ** 3 + 4 * x ** 2 + 3 * x + 1 + np.random.randn(100)
```

#### (a)

$\lambda = \infty$, $m = 0$

#### (b)

$\lambda = \infty$, $m = 1$

#### (c)

$\lambda = \infty$, $m = 2$

#### (d)

$\lambda = \infty$, $m = 3$

#### (e)

$\lambda = 0$, $m = 3$

In [None]:
# BEGIN SOLUTION
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pygam import LinearGAM, s
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, SplineTransformer, StandardScaler

np.random.seed(1)
x_vals = np.linspace(-5, 5, 100)
y_vals = x_vals**3 + 4 * x_vals**2 + 3 * x_vals + 1 + np.random.randn(100)
X_data = x_vals.reshape(-1, 1)

# (a) lambda = infinity, m = 0: the penalty is the integral of g(x)^2, which forces g(x) = 0
plt.figure(figsize=(10, 6))
plt.scatter(X_data, y_vals, label="Original Data Points")
plt.plot(X_data, np.zeros(X_data.size), label="$\\hat{g}$", color="red")
plt.title("(a) When $\\lambda = \\infty, m = 0$")
plt.ylabel("y")
plt.xlabel("x")
plt.legend()
plt.show()

# (b) lambda = infinity, m = 1: the penalty forces g' = 0, so g is the least squares constant (the mean of y)
poly_pipeline_b = Pipeline(
    [
        ("poly", PolynomialFeatures(degree=0)),
        ("scaling", StandardScaler()),
        ("linear_regression", LinearRegression()),
    ]
)
poly_pipeline_b.fit(X_data, y_vals)
y_pred_b = poly_pipeline_b.predict(X_data)

plt.figure(figsize=(10, 6))
plt.scatter(X_data, y_vals, label="Original Data Points")
plt.plot(X_data, y_pred_b, label="$\\hat{g}$", color="red")
plt.title("(b) When $\\lambda = \\infty, m = 1$")
plt.ylabel("y")
plt.xlabel("x")
plt.legend()
plt.show()

# (c) lambda = infinity, m = 2: the penalty forces g'' = 0, so g is the least squares line
poly_pipeline_c = Pipeline(
    [
        ("poly", PolynomialFeatures(degree=1)),
        ("scaling", StandardScaler()),
        ("linear_regression", LinearRegression()),
    ]
)
poly_pipeline_c.fit(X_data, y_vals)
y_pred_c = poly_pipeline_c.predict(X_data)

plt.figure(figsize=(10, 6))
plt.scatter(X_data, y_vals, label="Original Data Points")
plt.plot(X_data, y_pred_c, label="$\\hat{g}$", color="red")
plt.title("(c) When $\\lambda = \\infty, m = 2$")
plt.ylabel("y")
plt.xlabel("x")
plt.legend()
plt.show()

# (d) lambda = infinity, m = 3: the penalty forces g''' = 0, so g is the least squares quadratic
poly_pipeline_d = Pipeline(
    [
        ("poly", PolynomialFeatures(degree=2)),
        ("scaling", StandardScaler()),
        ("linear_regression", LinearRegression()),
    ]
)
poly_pipeline_d.fit(X_data, y_vals)
y_pred_d = poly_pipeline_d.predict(X_data)

plt.figure(figsize=(10, 6))
plt.scatter(X_data, y_vals, label="Original Data Points")
plt.plot(X_data, y_pred_d, label="$\\hat{g}$", color="red")
plt.title("(d) When $\\lambda = \\infty, m = 3$")
plt.ylabel("y")
plt.xlabel("x")
plt.legend()
plt.show()

# (e) lambda = 0, m = 3: no penalty, so any g that interpolates the data is optimal; here we connect the points
plt.figure(figsize=(10, 6))
plt.scatter(X_data, y_vals, label="Original Data Points")
plt.plot(X_data, y_vals, label="$\\hat{g}$", color="red")
plt.title("(e) When $\\lambda = 0, m = 3$")
plt.ylabel("y")
plt.xlabel("x")
plt.legend()
plt.show()
# END SOLUTION

In [None]:
# Test assertions
# Verify the data was generated correctly
assert len(x_vals) == 100, "Should have 100 data points"
assert len(y_vals) == 100, "Should have 100 y values"
assert X_data.shape == (100, 1), "X_data should be reshaped to (100, 1)"

# Verify predictions have the correct shape
assert len(y_pred_b) == 100, "Constant prediction should have 100 values"
assert len(y_pred_c) == 100, "Linear prediction should have 100 values"
assert len(y_pred_d) == 100, "Quadratic prediction should have 100 values"

print("All tests passed!")

# BEGIN HIDDEN TESTS
# Verify the constant prediction is approximately the mean
assert abs(y_pred_b.mean() - y_vals.mean()) < 0.1, "Constant should be close to mean"
# Verify linear prediction has some variance
assert y_pred_c.std() > 0, "Linear prediction should vary"
# END HIDDEN TESTS
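
> BEGIN SOLUTION

The limiting cases above can also be checked numerically. The sketch below is a discrete analogue of the smoothing spline problem, not the spline itself: it assumes $g$ is represented only by its values on the evenly spaced grid and replaces the integral of $[g^{(m)}]^2$ with the sum of squared $m$-th order differences. The helper name `difference_smoother` and the value `1e12` standing in for $\lambda = \infty$ are illustrative choices. Under these assumptions the fit for large $\lambda$ should match the least squares polynomial of degree $m - 1$, and $\lambda = 0$ should reproduce the data.

```python
import numpy as np

np.random.seed(1)
x = np.linspace(-5, 5, 100)
y = x**3 + 4 * x**2 + 3 * x + 1 + np.random.randn(100)


def difference_smoother(y, m, lam):
    """Minimize ||y - g||^2 + lam * ||D_m g||^2 over vectors g,
    where D_m takes m-th order differences (D_0 is the identity)."""
    n = len(y)
    D = np.diff(np.eye(n), n=m, axis=0)  # shape (n - m, n)
    # Stack both squared terms into a single least squares problem
    A = np.vstack([np.eye(n), np.sqrt(lam) * D])
    b = np.concatenate([y, np.zeros(D.shape[0])])
    g, *_ = np.linalg.lstsq(A, b, rcond=None)
    return g


big_lambda = 1e12  # stand-in for lambda = infinity
gaps = {}
for m in range(4):
    g_hat = difference_smoother(y, m, big_lambda)
    if m == 0:
        limit = np.zeros_like(y)  # g = 0
    else:
        limit = np.polyval(np.polyfit(x, y, m - 1), x)  # degree m - 1 fit
    gaps[m] = np.max(np.abs(g_hat - limit))
    print(f"m = {m}: max |g_hat - limit| = {gaps[m]:.2e}")

# With no penalty the minimizer reproduces the observations
g_interp = difference_smoother(y, 3, 0.0)
print("lambda = 0 interpolates:", np.allclose(g_interp, y))
```

> END SOLUTION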

---

**Problem 3:** Sketching a Truncated Power Function (Chapter 7, Exercise 3)

Given the function $f(x) = 1 + x - 2(x-1)^2_+$ where $(x)_+ = \max(0, x)$, sketch this curve and identify key features such as:
- Intercepts with the axes
- Slope in different regions
- The location and nature of the knot at $x = 1$

In [None]:
# BEGIN SOLUTION
# Plot the truncated power function f(x) = 1 + x - 2(x-1)^2_+
x_vals = np.linspace(-2, 2, 1000)
y_vals = 1 + x_vals - 2 * ((x_vals - 1) ** 2) * (x_vals >= 1)

plt.figure(figsize=(10, 6))
plt.plot(x_vals, y_vals, label="$f(x) = 1 + x - 2(x-1)^2_+$", color="red")
plt.ylabel("y")
plt.xlabel("x")
plt.axhline(0, color="blue")
plt.axvline(0, color="blue")
plt.annotate(
    "intersects x-axis at (-1, 0)",
    xy=(-1, 0),
    xytext=(-1.8, 0.5),
    arrowprops={"arrowstyle": "<|-|>", "color": "pink", "lw": 3.5, "ls": "--"},
)
plt.annotate(
    "crosses y-axis at (0, 1)",
    xy=(0, 1),
    xytext=(0.1, 0.5),
    arrowprops={"arrowstyle": "<|-|>", "color": "pink", "lw": 3.5, "ls": "--"},
)
plt.annotate("slope is 1 for x < 1", xy=(-0.5, 1))
plt.annotate("knot at x = 1", xy=(1, 2), xytext=(1.2, 2.3))
plt.legend()
plt.title("Truncated Power Function with Knot at x = 1")
plt.show()
# END SOLUTION

In [None]:
# Test assertions
# Verify key properties of the truncated power function
assert len(x_vals) == 1000, "Should have 1000 points for smooth plotting"
assert len(y_vals) == 1000, "y_vals should match x_vals length"

# Verify y-intercept: f(0) = 1 + 0 - 0 = 1
y_at_zero = 1 + 0 - 2 * max(0, 0 - 1) ** 2
assert abs(y_at_zero - 1.0) < 0.01, "y-intercept should be 1"

# Verify x-intercept: f(-1) = 1 + (-1) - 0 = 0
y_at_neg_one = 1 + (-1) - 2 * max(0, -1 - 1) ** 2
assert abs(y_at_neg_one - 0.0) < 0.01, "f(-1) should be 0"

print("All tests passed!")

# BEGIN HIDDEN TESTS
# Verify value at knot: f(1) = 1 + 1 - 0 = 2
y_at_knot = 1 + 1 - 2 * max(0, 1 - 1) ** 2
assert abs(y_at_knot - 2.0) < 0.01, "f(1) should be 2"
# END HIDDEN TESTS
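
> BEGIN SOLUTION

Expanding the truncated term on each side of the knot makes the features in the plot explicit. This is a short derivation from the definition of $(x-1)^2_+$:

$$
f(x)=\begin{cases}
1 + x & x \le 1 \\
-1 + 5x - 2x^2 & x > 1
\end{cases}
\qquad
f'(x)=\begin{cases}
1 & x < 1 \\
5 - 4x & x > 1
\end{cases}
$$

Both pieces give $f(1) = 2$ and slope $1$ at the knot, so $f$ and $f'$ are continuous there, while $f''$ jumps from $0$ to $-4$. To the right of the knot the curve is a downward-opening parabola that peaks at $x = 5/4$ with $f(5/4) = 17/8$.

> END SOLUTION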

---

**Problem 4:** Sketching Basis Function Expansions (Chapter 7, Exercise 4)

Given the model:
$$Y = \beta_0 + \beta_1 b_1(X) + \beta_2 b_2(X) + \varepsilon$$

where:
- $b_1(X) = I(0 \le X \le 2) - (X-1)I(1 \le X \le 2)$
- $b_2(X) = (X-3)I(3 \le X \le 4) + I(4 < X \le 5)$

and $\beta_0 = 1$, $\beta_1 = 1$, $\beta_2 = 3$, sketch the estimated curve and annotate its key features.

In [None]:
# BEGIN SOLUTION
# Plot the basis function expansion
x_vals = np.linspace(-2, 6, 800)
b_1 = ((x_vals >= 0) & (x_vals <= 2)) - (x_vals - 1) * ((x_vals >= 1) & (x_vals <= 2))
b_2 = (x_vals - 3) * ((x_vals >= 3) & (x_vals <= 4)) + ((x_vals > 4) & (x_vals <= 5))
y_vals = 1 + b_1 + 3 * b_2

plt.figure(figsize=(10, 6))
plt.plot(x_vals, y_vals, label="Estimated Curve", color="red")
plt.ylabel("y")
plt.xlabel("x")
plt.axhline(0, color="blue")
plt.axvline(0, color="blue")
plt.annotate(
    "crosses y-axis at (0, 2)",
    xy=(0, 2),
    xytext=(0.1, 2.5),
    arrowprops={"arrowstyle": "<|-|>", "color": "pink", "lw": 3.5, "ls": "--"},
)
plt.annotate("slope = 0", xy=(-1, 1.2))
plt.annotate("slope = 0", xy=(0.1, 1.8))
plt.annotate("slope = -1", xy=(0.8, 1.2))
plt.annotate("slope = 0", xy=(2, 1.2))
plt.annotate("slope = 3", xy=(3.5, 2))
plt.annotate("slope = 0", xy=(4.1, 3.8))
plt.annotate("slope = 0", xy=(5.1, 1.2))
plt.legend()
plt.title("Basis Function Expansion")
plt.show()
# END SOLUTION

In [None]:
# Test assertions
# Verify the data arrays have correct shapes
assert len(x_vals) == 800, "Should have 800 points for smooth plotting"
assert len(y_vals) == 800, "y_vals should match x_vals length"
assert len(b_1) == 800, "b_1 should match x_vals length"
assert len(b_2) == 800, "b_2 should match x_vals length"

# Verify y-intercept: at x=0, b_1=1, b_2=0, so y = 1 + 1 + 0 = 2
# Find index closest to x=0
idx_zero = np.argmin(np.abs(x_vals - 0))
assert abs(y_vals[idx_zero] - 2.0) < 0.1, "y-intercept should be approximately 2"

print("All tests passed!")

# BEGIN HIDDEN TESTS
# Verify value at x=4.5 (in range where b_2=1): y = 1 + 0 + 3*1 = 4
idx_4_5 = np.argmin(np.abs(x_vals - 4.5))
assert abs(y_vals[idx_4_5] - 4.0) < 0.1, "y(4.5) should be approximately 4"
# END HIDDEN TESTS
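
> BEGIN SOLUTION

Substituting $\beta_0 = 1$, $\beta_1 = 1$, $\beta_2 = 3$ and evaluating the indicator functions region by region gives the curve in closed form, which accounts for each slope annotated in the plot:

$$
\hat{Y}(x)=\begin{cases}
1 & x < 0 \\
2 & 0 \le x < 1 \\
3 - x & 1 \le x \le 2 \\
1 & 2 < x < 3 \\
3x - 8 & 3 \le x \le 4 \\
4 & 4 < x \le 5 \\
1 & x > 5
\end{cases}
$$

The curve is continuous at $x = 1, 2, 3, 4$ but jumps from $1$ to $2$ at $x = 0$ and from $4$ back to $1$ at $x = 5$. The near-vertical segments that `plt.plot` draws at those two points only connect neighboring grid values and are not part of the function.

> END SOLUTION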

---

**Problem 5:** True or False - Splines and Polynomials

For each statement, indicate whether it is True or False and briefly explain your reasoning.

(a) A cubic regression spline with 3 knots has 7 degrees of freedom.

(b) A natural cubic spline with 3 knots has 7 degrees of freedom.

(c) As $\lambda \to \infty$ in a smoothing spline, the fitted curve approaches the OLS regression line.

(d) A natural cubic spline is linear beyond the boundary knots because we assume the second derivative equals zero in those regions.

> BEGIN SOLUTION

(a) **True.** A cubic regression spline with K knots has K + 4 degrees of freedom (4 for the cubic polynomial basis, plus 1 additional basis function for each knot). With 3 knots: 3 + 4 = 7 degrees of freedom.

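(b) **False.** A natural cubic spline with K knots has K degrees of freedom: requiring the fit to be linear beyond each of the two boundary knots removes 2 degrees of freedom per side, reducing K + 4 to K. With 3 knots that gives 3 degrees of freedom, not 7.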

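(c) **True.** With the usual penalty on the second derivative, as $\lambda \to \infty$ the penalty term dominates and forces the second derivative to be zero everywhere, so the function must be linear. Among linear functions, the one that minimizes the residual sum of squares is the OLS line.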

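(d) **False.** Linearity beyond the boundary knots is a constraint we impose, and it requires that the second AND third derivatives equal zero at the boundaries (not just the second derivative). A cubic piece whose second derivative vanishes at the boundary knot can still have a nonzero cubic term, so both conditions are needed.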
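
The degree-of-freedom counts in (a) and (b) can be checked by building the bases and counting linearly independent columns. The sketch below uses the truncated power basis for the cubic regression spline and one standard construction of a natural cubic spline basis from truncated cubics. The knot locations, the grid, and the helper names are assumptions chosen for illustration.

```python
import numpy as np


def truncated_power_basis(x, knots):
    """Cubic regression spline basis: 1, x, x^2, x^3, (x - xi_k)^3_+."""
    columns = [x**p for p in range(4)]
    columns += [np.clip(x - k, 0, None) ** 3 for k in knots]
    return np.column_stack(columns)


def natural_cubic_basis(x, knots):
    """Natural cubic spline basis with K columns for K knots
    (the first and last knots act as the boundary knots)."""
    last = knots[-1]

    def d(k):
        num = np.clip(x - k, 0, None) ** 3 - np.clip(x - last, 0, None) ** 3
        return num / (last - k)

    columns = [np.ones_like(x), x]
    columns += [d(k) - d(knots[-2]) for k in knots[:-2]]
    return np.column_stack(columns)


x = np.linspace(-1.0, 5.0, 200)
knots = [1.0, 2.0, 3.0]  # K = 3, illustrative locations

cubic = truncated_power_basis(x, knots)
natural = natural_cubic_basis(x, knots)
cubic_rank = np.linalg.matrix_rank(cubic)
natural_rank = np.linalg.matrix_rank(natural)
print("cubic spline columns / rank:  ", cubic.shape[1], cubic_rank)
print("natural spline columns / rank:", natural.shape[1], natural_rank)

# Outside the boundary knots every natural basis column is linear in x,
# so second differences on the evenly spaced grid vanish (up to rounding)
outside = [natural[x < knots[0]], natural[x > knots[-1]]]
max_second_diff = max(np.abs(np.diff(part, n=2, axis=0)).max() for part in outside)
print("largest second difference outside the knots:", max_second_diff)
```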
> END SOLUTION

---

**Problem 6:** Comparing GAMs and Linear Regression

What is the primary difference between a generalized additive model (GAM) and a standard linear regression model? Choose the best answer:

(a) GAMs can only handle binary outcomes

(b) GAMs require more training data than linear regression

(c) GAMs allow for non-linear relationships between predictors and the response through smooth functions

(d) GAMs cannot include interaction terms

> BEGIN SOLUTION

The correct answer is **(c)**. GAMs allow for non-linear relationships between predictors and the response by replacing the linear terms $\beta_j X_j$ with smooth functions $f_j(X_j)$. This is the defining characteristic that distinguishes GAMs from standard linear regression.
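
To make the contrast concrete, the sketch below fits both model types to synthetic data with a known non-linear additive structure. It is a simplified stand-in for a GAM, not what `pygam` does: each $f_j$ is an unpenalized cubic spline fitted by ordinary least squares, and the data-generating function, knot locations, and helper names are all assumptions made for illustration. The reported errors are training MSEs, so the additive model is guaranteed to fit at least as well. The point is the size of the gap when the true relationship is non-linear.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
x1 = rng.uniform(-3, 3, n)
x2 = rng.uniform(-2, 2, n)
# Hypothetical truth: additive but non-linear in both predictors
y = np.sin(x1) + x2**2 + rng.normal(scale=0.1, size=n)


def cubic_spline_columns(x, knots):
    """Columns x, x^2, x^3, (x - xi)^3_+ for one predictor (no intercept)."""
    powers = [x, x**2, x**3]
    hinges = [np.clip(x - k, 0, None) ** 3 for k in knots]
    return np.column_stack(powers + hinges)


def fit_mse(design, response):
    """Least squares fit; returns the training mean squared error."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return np.mean((response - design @ coef) ** 2)


intercept = np.ones((n, 1))

# Linear regression: beta_0 + beta_1 * x1 + beta_2 * x2
linear_design = np.hstack([intercept, x1[:, None], x2[:, None]])

# Additive model: beta_0 + f_1(x1) + f_2(x2), each f_j a cubic spline
additive_design = np.hstack(
    [
        intercept,
        cubic_spline_columns(x1, knots=[-1.5, 0.0, 1.5]),
        cubic_spline_columns(x2, knots=[-1.0, 0.0, 1.0]),
    ]
)

linear_mse = fit_mse(linear_design, y)
additive_mse = fit_mse(additive_design, y)
print(f"linear MSE:   {linear_mse:.4f}")
print(f"additive MSE: {additive_mse:.4f}")
```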
> END SOLUTION


---

**Problem 7:** Model Comparison on Boston Housing Data

Using the Boston housing dataset (provided in `boston.csv`), predict nitrogen oxide concentration (`nox`) from the proportion of non-retail business acres (`indus`).

#### (a)

Compare three modeling approaches using 4-fold cross-validation:
1. A cubic spline with knots at 0, 12, 17, and 30
2. A polynomial regression of degree 12
3. K-nearest neighbors with k=13

Which method achieves the lowest mean squared error? Plot the fitted curves for all three methods and discuss their relative merits beyond just the MSE.

#### (b)

Fit a GAM using `pygam` to predict `nox` from three predictors: `dis` (distance to employment centers), `indus`, and `rad` (accessibility to radial highways). Use grid search to find optimal smoothing parameters. Plot the partial dependence functions and discuss what they reveal about the relationships.

In [None]:
# BEGIN SOLUTION
# Part (a): Compare three modeling approaches

# Load the Boston housing data
boston_data = pd.read_csv("./data/boston.csv")

features = boston_data["indus"].values.reshape(-1, 1)
target = boston_data["nox"].values.reshape(-1)

# Set up cross-validation
cv_splitter = KFold(n_splits=4, random_state=3, shuffle=True)

# Define the three models
spline_pipeline = Pipeline(
    [
        ("spline", SplineTransformer(degree=3, knots=np.array([0, 12, 17, 30]).reshape(-1, 1))),
        ("linear_regression", LinearRegression()),
    ]
)

poly_pipeline = Pipeline(
    [
        ("poly", PolynomialFeatures(degree=12)),
        ("scaling", StandardScaler()),
        ("linear_regression", LinearRegression()),
    ]
)

knn_model = KNeighborsRegressor(n_neighbors=13)

# Evaluate each model
spline_mse = -cross_val_score(
    spline_pipeline, features, target, scoring="neg_mean_squared_error", cv=cv_splitter
)
print(f"Spline MSE: {spline_mse.mean():.4f}")

poly_mse = -cross_val_score(
    poly_pipeline, features, target, scoring="neg_mean_squared_error", cv=cv_splitter
)
print(f"Polynomial MSE: {poly_mse.mean():.4f}")

knn_mse = -cross_val_score(
    knn_model, features, target, scoring="neg_mean_squared_error", cv=cv_splitter
)
print(f"KNN MSE: {knn_mse.mean():.4f}")

# Fit models and plot
spline_pipeline.fit(features, target)
poly_pipeline.fit(features, target)
knn_model.fit(features, target)

x_plot = np.linspace(features.min(), features.max(), 1000).reshape(-1, 1)

plt.figure(figsize=(10, 6))
plt.scatter(features, target, facecolor="gray", label="Data", alpha=0.3)
plt.plot(x_plot, spline_pipeline.predict(x_plot), label="Spline", color="dodgerblue")
plt.plot(x_plot, poly_pipeline.predict(x_plot), label="Polynomial", color="green")
plt.plot(x_plot, knn_model.predict(x_plot), label="KNN", color="purple")
plt.xlabel("indus")
plt.ylabel("nox")
plt.title("Predicted NOx Concentration vs Industrial Proportion")
plt.legend()
plt.show()

print("""
Discussion: The polynomial achieves the lowest MSE, but the spline provides a smoother,
more interpretable fit. The KNN curve is rough and difficult to interpret. The polynomial
shows concerning behavior at the endpoints, suggesting potential overfitting.
""")
# END SOLUTION

In [None]:
# BEGIN SOLUTION
# Part (b): Fit a GAM with multiple predictors
features_multi = boston_data[["dis", "indus", "rad"]].values
target = boston_data["nox"].values.reshape(-1)

# Fit GAM with grid search for optimal lambda values
gam = LinearGAM(s(0) + s(1) + s(2))
lambda_values = [0.5, 1, 5, 10]
lambda_grid = [lambda_values] * 3
gam.gridsearch(features_multi, target, lam=lambda_grid)
print(f"Optimal lambda values: {gam.lam}")

# Plot partial dependence functions
fig, axes = plt.subplots(1, 3, figsize=(14, 5))
feature_names = ["dis", "indus", "rad"]

for i, ax in enumerate(axes):
    xx = gam.generate_X_grid(term=i)
    ax.plot(xx[:, i], gam.partial_dependence(term=i, X=xx))
    ax.plot(
        xx[:, i],
        gam.partial_dependence(term=i, X=xx, width=0.95)[1],
        c="r",
        ls="--",
        label="95% CI",
    )
    ax.set_ylim(-0.2, 0.2)
    ax.set_xlabel(feature_names[i])
    ax.set_ylabel("Partial Dependence")
    ax.set_title(f"Effect of {feature_names[i]} on nox")

plt.tight_layout()
plt.show()

print("""
Discussion: The partial dependence plots reveal:
- dis: NOx decreases as distance from employment centers increases (expected)
- indus: NOx increases with industrial proportion, leveling off at high values
- rad: The relationship is unclear with wide confidence intervals, suggesting
  insufficient data to reliably estimate this effect. The wiggly pattern likely
  reflects noise rather than a true relationship.
""")
# END SOLUTION

In [None]:
# Test assertions
# Verify the data was loaded correctly
assert boston_data is not None, "boston_data should be loaded"
assert "nox" in boston_data.columns, "boston_data should have nox column"
assert "indus" in boston_data.columns, "boston_data should have indus column"

# Verify models were fitted
assert spline_pipeline is not None, "spline_pipeline should be defined"
assert poly_pipeline is not None, "poly_pipeline should be defined"
assert knn_model is not None, "knn_model should be defined"

# Verify MSE values are reasonable (should be small positive numbers)
assert 0 < spline_mse.mean() < 1, "Spline MSE should be small positive number"
assert 0 < poly_mse.mean() < 1, "Polynomial MSE should be small positive number"
assert 0 < knn_mse.mean() < 1, "KNN MSE should be small positive number"

print("All tests passed!")

# BEGIN HIDDEN TESTS
# Verify GAM was fitted successfully
assert gam is not None, "GAM model should be defined"
assert len(gam.lam) == 3, "GAM should have 3 smoothing parameters"
# END HIDDEN TESTS