# Comprehensive Splines Tutorial - Foundations

This notebook provides a comprehensive introduction to spline methods, covering:
- Mathematical foundations and basis functions
- Regression splines with manual knot placement
- Natural cubic splines with boundary constraints
- Smoothing splines with automatic regularization

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sys
sys.path.append('../src')

from splines import (
    truncated_power, 
    truncated_power_basis_matrix,
    RegressionSpline,
    NaturalCubicSpline,
    SmoothingSpline
)
from utils import (
    generate_sinusoidal_data,
    generate_polynomial_data,
    plot_spline_fit,
    plot_basis_functions,
    plot_smoothing_comparison,  # used in the smoothing-spline section below
    plot_cv_curve,
    mean_squared_error,
    r_squared
)

plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline
np.random.seed(42)

# 1. Truncated Power Basis Functions

## Mathematical Background

For a k-th order spline with knots at $t_1 < \dots < t_m$, the truncated power basis consists of $(m+k+1)$ functions:

- Polynomial terms: $g_1(x) = 1, g_2(x) = x, \dots, g_{k+1}(x) = x^k$
- Truncated power terms: $g_{k+1+j}(x) = (x - t_j)_+^k$, for $j = 1, \dots, m$

where $(x)_+ = \max(x, 0)$ is the positive part function.
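
The `truncated_power` and `truncated_power_basis_matrix` helpers imported above come from `../src/splines`, which is not shown in this notebook. As a rough illustration of the definitions above, here is a minimal NumPy sketch; the `_sketch` names are hypothetical and this is not necessarily how the project implements them.

```python
import numpy as np

def truncated_power_sketch(x, knot, degree):
    """Evaluate (x - knot)_+^degree elementwise."""
    x = np.asarray(x, dtype=float)
    return np.maximum(x - knot, 0.0) ** degree

def basis_matrix_sketch(x, knots, degree):
    """Columns: 1, x, ..., x^degree, then (x - t_j)_+^degree for each knot t_j."""
    x = np.asarray(x, dtype=float)
    poly_cols = [x ** p for p in range(degree + 1)]
    trunc_cols = [truncated_power_sketch(x, t, degree) for t in knots]
    return np.column_stack(poly_cols + trunc_cols)

x_demo = np.array([0.0, 1.0, 2.0])
G_demo = basis_matrix_sketch(x_demo, knots=[1.0], degree=3)
print(G_demo.shape)    # (3, 5): m + k + 1 = 1 + 3 + 1 columns
print(G_demo[:, -1])   # truncated term: zero up to the knot, then (x - 1)^3
```

Each truncated column is identically zero to the left of its knot, which is what lets the spline change its polynomial piece there.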

In [None]:
x = np.linspace(-2, 3, 500)
knot = 0.5

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

degrees = [1, 2, 3]
for ax, degree in zip(axes, degrees):
    y = truncated_power(x, knot, degree)
    ax.plot(x, y, linewidth=2)
    ax.axvline(knot, color='r', linestyle='--', alpha=0.5, label=f'knot at {knot}')
    ax.set_title(f'$(x - {knot})_+^{degree}$', fontsize=14)
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.legend()
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Cubic Spline Basis Functions

For **cubic splines** (degree 3) with 3 interior knots, we have $3 + 3 + 1 = 7$ basis functions:
- $g_1(x) = 1$
- $g_2(x) = x$
- $g_3(x) = x^2$
- $g_4(x) = x^3$
- $g_5(x) = (x - t_1)_+^3$
- $g_6(x) = (x - t_2)_+^3$
- $g_7(x) = (x - t_3)_+^3$

In [None]:
x = np.linspace(0, 10, 500)
knots = np.array([3.0, 5.0, 7.0])
degree = 3

basis_matrix = truncated_power_basis_matrix(x, knots, degree)

print(f"Basis matrix shape: {basis_matrix.shape}")
print(f"Number of basis functions: {basis_matrix.shape[1]}")
print(f"Expected: m + k + 1 = {len(knots)} + {degree} + 1 = {len(knots) + degree + 1}")

In [None]:
fig = plot_basis_functions(x, basis_matrix, knots, 
                           title="Cubic Spline Basis Functions (Truncated Power Basis)",
                           max_functions=7)
plt.show()

## Polynomial Terms vs Truncated Terms

Let's separate and visualize the polynomial terms and truncated power terms.

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

ax = axes[0]
for i in range(degree + 1):
    ax.plot(x, basis_matrix[:, i], label=f'$x^{i}$', linewidth=2)
ax.set_title('Polynomial Terms', fontsize=14)
ax.set_xlabel('x')
ax.set_ylabel('Basis value')
ax.legend()
ax.grid(True, alpha=0.3)

ax = axes[1]
for j, knot in enumerate(knots):
    ax.plot(x, basis_matrix[:, degree + 1 + j], label=f'$(x - {knot})_+^3$', linewidth=2)
    ax.axvline(knot, color='red', linestyle='--', alpha=0.3)
ax.set_title('Truncated Power Terms', fontsize=14)
ax.set_xlabel('x')
ax.set_ylabel('Basis value')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Effect of Polynomial Degree

Compare basis functions for different polynomial degrees (linear, quadratic, cubic).

In [None]:
x = np.linspace(0, 10, 500)
knots = np.array([3.0, 5.0, 7.0])
degrees = [1, 2, 3]

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

for ax, degree in zip(axes, degrees):
    basis_matrix = truncated_power_basis_matrix(x, knots, degree)
    
    for i in range(basis_matrix.shape[1]):
        ax.plot(x, basis_matrix[:, i], alpha=0.7)
    
    for knot in knots:
        ax.axvline(knot, color='r', linestyle='--', alpha=0.2)
    
    ax.set_title(f'Degree {degree} Spline Basis\n({len(knots) + degree + 1} functions)', fontsize=12)
    ax.set_xlabel('x')
    ax.set_ylabel('Basis value')
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Constructing a Spline from Basis Functions

Any spline can be written as a linear combination of basis functions:
$$f(x) = \sum_{j=1}^{m+k+1} \beta_j g_j(x)$$
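
Written out for the cubic case ($k = 3$), the piecewise structure becomes explicit. On the interval between $t_j$ and $t_{j+1}$, only the first $j$ truncated terms are active:
$$f(x) = \beta_1 + \beta_2 x + \beta_3 x^2 + \beta_4 x^3 + \sum_{l=1}^{j} \beta_{4+l}\,(x - t_l)^3, \qquad t_j \le x < t_{j+1}.$$

Each term $(x - t_l)_+^3$ vanishes at $t_l$ together with its first two derivatives, so $f$, $f'$ and $f''$ are continuous across every knot, while the third derivative jumps by
$$f'''(t_j^+) - f'''(t_j^-) = 6\,\beta_{4+j}.$$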

Let's create a spline by choosing coefficients manually.

In [None]:
x = np.linspace(0, 10, 500)
knots = np.array([3.0, 5.0, 7.0])
degree = 3

G = truncated_power_basis_matrix(x, knots, degree)

beta = np.array([1.0, 0.5, -0.1, 0.01, 0.2, -0.3, 0.15])

f_x = G @ beta

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, f_x, linewidth=3, label='Spline $f(x) = \\sum \\beta_j g_j(x)$')

for knot in knots:
    ax.axvline(knot, color='r', linestyle='--', alpha=0.3, linewidth=2)

ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('f(x)', fontsize=12)
ax.set_title('Spline as Linear Combination of Basis Functions', fontsize=14)
ax.legend(fontsize=12)
ax.grid(True, alpha=0.3)
plt.show()

print("Coefficients β:")
for i, b in enumerate(beta):
    print(f"  β_{i+1} = {b:.3f}")

# 2. Regression Splines

Given samples $(x_i, y_i)$ for $i=1,\dots,n$, we estimate the regression function $r(x) = E(Y|X=x)$ by fitting a $k$-th order spline with knots at prespecified locations $t_1, \dots, t_m$.

We minimize:
$$\sum_{i=1}^n \left(y_i - \sum_{j=1}^{m+k+1} \beta_j g_j(x_i)\right)^2 = \|y - G\beta\|_2^2$$

Solution (when $G$ has full column rank): $\hat{\beta} = (G^T G)^{-1} G^T y$
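
The internals of `RegressionSpline` are not shown here, so the following is only a sketch of how the least-squares step could be carried out. It uses `np.linalg.lstsq`, which solves the same problem without forming the inverse explicitly; the helper names are hypothetical.

```python
import numpy as np

def basis_matrix_sketch(x, knots, degree=3):
    """Truncated power basis: 1, x, ..., x^degree, (x - t_j)_+^degree."""
    x = np.asarray(x, dtype=float)
    cols = [x ** p for p in range(degree + 1)]
    cols += [np.maximum(x - t, 0.0) ** degree for t in knots]
    return np.column_stack(cols)

def fit_regression_spline_sketch(x, y, knots, degree=3):
    """Return the least-squares coefficients beta_hat minimising ||y - G beta||^2."""
    G = basis_matrix_sketch(x, knots, degree)
    beta_hat, *_ = np.linalg.lstsq(G, y, rcond=None)
    return beta_hat

# Noise-free sanity check: data generated from a known spline should be recovered.
x_demo = np.linspace(0.0, 1.0, 40)
knots_demo = np.array([0.3, 0.7])
beta_true = np.array([1.0, -2.0, 0.5, 3.0, -4.0, 2.0])
y_demo = basis_matrix_sketch(x_demo, knots_demo) @ beta_true

beta_hat = fit_regression_spline_sketch(x_demo, y_demo, knots_demo)
print(np.round(beta_hat, 6))
```

With noisy data the recovered coefficients will of course differ from the generating ones; the noise-free case only confirms that the fitting step matches the formula.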

## Fit Cubic Regression Spline to Sinusoidal Data

In [None]:
np.random.seed(42)
x_train, y_train = generate_sinusoidal_data(n_samples=50, noise_std=0.2, 
                                            x_range=(0, 10), frequency=1.0)

knots = np.array([2.0, 4.0, 5.0, 6.0, 8.0])

model = RegressionSpline(degree=3)
model.fit(x_train, y_train, knots)

x_test = np.linspace(0, 10, 500)
y_pred = model.predict(x_test)

y_true_func = lambda x: np.sin(2 * np.pi * x / 10)

fig = plot_spline_fit(x_train, y_train, x_test, y_pred, knots, 
                      y_true_func=y_true_func,
                      title="Cubic Regression Spline: Sinusoidal Data")
plt.show()

y_pred_train = model.predict(x_train)
mse = mean_squared_error(y_train, y_pred_train)
r2 = r_squared(y_train, y_pred_train)

print(f"Training MSE: {mse:.4f}")
print(f"Training R²: {r2:.4f}")
print(f"Number of knots: {len(knots)}")
print(f"Number of parameters: {len(knots) + 3 + 1} = {len(model.coefficients)}")

## Effect of Number of Knots

More knots → more flexibility → lower training error, but a higher risk of overfitting.

In [None]:
np.random.seed(42)
x_train, y_train = generate_sinusoidal_data(n_samples=50, noise_std=0.2, x_range=(0, 10))
x_test = np.linspace(0, 10, 500)

knot_configs = [
    (np.array([5.0]), "1 knot"),
    (np.linspace(2, 8, 3), "3 knots"),
    (np.linspace(2, 8, 5), "5 knots"),
    (np.linspace(1, 9, 10), "10 knots")
]

fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.ravel()

for ax, (knots, label) in zip(axes, knot_configs):
    model = RegressionSpline(degree=3)
    model.fit(x_train, y_train, knots)
    y_pred = model.predict(x_test)
    
    y_pred_train = model.predict(x_train)
    mse = mean_squared_error(y_train, y_pred_train)
    r2 = r_squared(y_train, y_pred_train)
    
    ax.scatter(x_train, y_train, alpha=0.5, s=30, label='Data', color='gray')
    ax.plot(x_test, y_pred, 'b-', linewidth=2, label='Spline')
    ax.plot(x_test, np.sin(2*np.pi*x_test/10), 'g--', alpha=0.5, label='True')
    
    for knot in knots:
        ax.axvline(knot, color='r', linestyle='--', alpha=0.3)
    
    ax.set_title(f'{label}\nMSE={mse:.4f}, R²={r2:.4f}', fontsize=11)
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.legend(loc='upper right', fontsize=9)
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Effect of Polynomial Degree

Compare linear (degree=1), quadratic (degree=2), and cubic (degree=3) splines.

In [None]:
np.random.seed(42)
x_train, y_train = generate_polynomial_data(n_samples=50, degree=3, noise_std=0.3)
x_test = np.linspace(-1, 1, 500)

knots = np.linspace(-0.6, 0.6, 4)

degrees = [1, 2, 3]

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for ax, degree in zip(axes, degrees):
    model = RegressionSpline(degree=degree)
    model.fit(x_train, y_train, knots)
    y_pred = model.predict(x_test)
    
    y_pred_train = model.predict(x_train)
    mse = mean_squared_error(y_train, y_pred_train)
    
    ax.scatter(x_train, y_train, alpha=0.5, s=30, color='gray')
    ax.plot(x_test, y_pred, 'b-', linewidth=2)
    
    for knot in knots:
        ax.axvline(knot, color='r', linestyle='--', alpha=0.3)
    
    degree_names = {1: 'Linear', 2: 'Quadratic', 3: 'Cubic'}
    ax.set_title(f'{degree_names[degree]} Spline (k={degree})\nMSE={mse:.4f}', fontsize=12)
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Boundary Behavior Problem

One problem with regression splines is that the estimates tend to display erratic behavior, i.e., they have high variance, at the boundaries.
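
Under the usual assumption of uncorrelated noise with constant variance $\sigma^2$, the variance of the fitted value at a point $x$ is $\sigma^2\, g(x)^T (G^T G)^{-1} g(x)$, where $g(x)$ is the vector of basis functions evaluated at $x$. The sketch below evaluates that factor at the two ends and the middle of a design on $[0, 1]$; the helper names and the rescaled range are assumptions made for illustration.

```python
import numpy as np

def basis_matrix_sketch(x, knots, degree=3):
    """Truncated power basis: 1, x, ..., x^degree, (x - t_j)_+^degree."""
    x = np.asarray(x, dtype=float)
    cols = [x ** p for p in range(degree + 1)]
    cols += [np.maximum(x - t, 0.0) ** degree for t in knots]
    return np.column_stack(cols)

def pointwise_variance_factor(x_train, x_eval, knots, degree=3):
    """Return g(x)^T (G^T G)^{-1} g(x) for every x in x_eval.

    Multiplying by sigma^2 gives Var[f_hat(x)] for a least-squares spline fit.
    """
    G = basis_matrix_sketch(x_train, knots, degree)
    G_eval = basis_matrix_sketch(x_eval, knots, degree)
    # G^T G = R^T R from the QR factorisation, so the factor equals ||R^{-T} g(x)||^2.
    _, R = np.linalg.qr(G)
    Z = np.linalg.solve(R.T, G_eval.T)
    return np.sum(Z ** 2, axis=0)

x_train_demo = np.linspace(0.0, 1.0, 50)
knots_demo = np.linspace(0.2, 0.8, 6)
x_eval_demo = np.array([0.0, 0.5, 1.0])

v_left, v_mid, v_right = pointwise_variance_factor(x_train_demo, x_eval_demo, knots_demo)
print(f"left edge: {v_left:.3f}, middle: {v_mid:.3f}, right edge: {v_right:.3f}")
```

The factor depends only on the design and the knots, not on the observed responses, so the larger values at the edges are a property of the estimator itself.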

Let's demonstrate this issue.

In [None]:
np.random.seed(42)
x_train = np.concatenate([
    np.linspace(0, 1, 5),
    np.linspace(1.5, 8.5, 40),
    np.linspace(9, 10, 5)
])
y_train = np.sin(2*np.pi*x_train/10) + np.random.normal(0, 0.2, len(x_train))

knots = np.linspace(2, 8, 8)

model = RegressionSpline(degree=3)
model.fit(x_train, y_train, knots)

x_test = np.linspace(0, 10, 500)
y_pred = model.predict(x_test)

fig, ax = plt.subplots(figsize=(12, 6))
ax.scatter(x_train, y_train, alpha=0.6, s=40, label='Training data', color='gray', zorder=3)
ax.plot(x_test, y_pred, 'b-', linewidth=2, label='Regression spline')
ax.plot(x_test, np.sin(2*np.pi*x_test/10), 'g--', alpha=0.5, linewidth=2, label='True function')

for knot in knots:
    ax.axvline(knot, color='r', linestyle='--', alpha=0.2)

ax.axvspan(0, 1.5, alpha=0.1, color='red', label='Boundary regions')
ax.axvspan(8.5, 10, alpha=0.1, color='red')

ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title('Boundary Variance Problem in Regression Splines', fontsize=14)
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)
plt.show()

## Comparison with Polynomial Regression

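For the same number of parameters, a spline adapts locally to the data, whereas a global polynomial of matching degree tends to oscillate, especially near the ends of the data range. The comparison below gives both models the same parameter count.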

In [None]:
np.random.seed(123)
x_train, y_train = generate_sinusoidal_data(n_samples=50, noise_std=0.25, x_range=(0, 10))
x_test = np.linspace(0, 10, 500)

knots = np.linspace(2, 8, 5)
spline_model = RegressionSpline(degree=3)
spline_model.fit(x_train, y_train, knots)
y_spline = spline_model.predict(x_test)

n_params_spline = len(knots) + 3 + 1
poly_degree = n_params_spline - 1
poly_coeffs = np.polyfit(x_train, y_train, poly_degree)
y_poly = np.polyval(poly_coeffs, x_test)

fig, ax = plt.subplots(figsize=(12, 6))
ax.scatter(x_train, y_train, alpha=0.5, s=30, label='Data', color='gray', zorder=3)
ax.plot(x_test, y_spline, 'b-', linewidth=2, label=f'Regression spline ({n_params_spline} params)')
ax.plot(x_test, y_poly, 'r-', linewidth=2, label=f'Global polynomial (degree {poly_degree})', alpha=0.7)
ax.plot(x_test, np.sin(2*np.pi*x_test/10), 'g--', alpha=0.5, linewidth=2, label='True function')

for knot in knots:
    ax.axvline(knot, color='b', linestyle='--', alpha=0.2)

ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title('Regression Spline vs Global Polynomial', fontsize=14)
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)
ax.set_ylim(-2, 2)
plt.show()

mse_spline = mean_squared_error(y_train, spline_model.predict(x_train))
mse_poly = mean_squared_error(y_train, np.polyval(poly_coeffs, x_train))

print(f"\nTraining MSE:")
print(f"  Spline: {mse_spline:.4f}")
print(f"  Polynomial: {mse_poly:.4f}")

# 3. Natural Splines vs Regular Splines

**Natural splines** address the boundary variance problem by forcing the piecewise polynomial to have a lower degree to the left of the leftmost knot and to the right of the rightmost knot:
- Degree $k$ polynomials on each interval between adjacent knots
- Degree $(k-1)/2$ polynomials beyond the boundary knots (for odd $k$)
- For cubic natural splines ($k=3$): **linear beyond the boundary knots**

**Key advantage**: With the same $m$ knots, a natural spline needs only $m$ basis functions (vs $m+k+1$ for a regular spline).
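
The `NaturalCubicSpline` class is not shown in this notebook, so its basis may be built differently. One standard construction, given in Hastie, Tibshirani and Friedman's *The Elements of Statistical Learning*, is sketched below under a hypothetical function name. It checks the two properties stated above: $m$ knots give $m$ columns, and every basis function is linear outside the boundary knots.

```python
import numpy as np

def natural_cubic_basis_sketch(x, knots):
    """Natural cubic spline basis: K knots -> K columns.

    N_1 = 1, N_2 = x, N_{k+2} = d_k - d_{K-1}, where
    d_k(x) = ((x - t_k)_+^3 - (x - t_K)_+^3) / (t_K - t_k).
    """
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    K = len(knots)

    def d(k):
        num = np.maximum(x - knots[k], 0.0) ** 3 - np.maximum(x - knots[-1], 0.0) ** 3
        return num / (knots[-1] - knots[k])

    cols = [np.ones_like(x), x]
    cols += [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)

knots_demo = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])

# Three equally spaced points beyond the right boundary knot: a linear
# function has zero second differences on such points.
N_right = natural_cubic_basis_sketch(np.array([11.0, 12.0, 13.0]), knots_demo)
second_diff_right = N_right[0] - 2.0 * N_right[1] + N_right[2]

print(N_right.shape[1])                      # number of basis functions
print(np.allclose(second_diff_right, 0.0))   # linear beyond the right boundary
```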

## Direct Comparison: Regular vs Natural Cubic Splines

In [None]:
np.random.seed(42)
x_train = np.concatenate([
    np.array([0.0, 0.5, 1.0]),
    np.linspace(2, 8, 35),
    np.array([9.0, 9.5, 10.0])
])
y_train = np.sin(2*np.pi*x_train/10) + np.random.normal(0, 0.2, len(x_train))

interior_knots = np.linspace(2, 8, 6)
boundary_knots = np.array([0.0, 10.0])
all_knots = np.sort(np.concatenate([boundary_knots, interior_knots]))

regular_model = RegressionSpline(degree=3)
regular_model.fit(x_train, y_train, interior_knots)

natural_model = NaturalCubicSpline()
natural_model.fit(x_train, y_train, all_knots)

x_test = np.linspace(0, 10, 500)
y_regular = regular_model.predict(x_test)
y_natural = natural_model.predict(x_test)
y_true = np.sin(2*np.pi*x_test/10)

fig, axes = plt.subplots(2, 1, figsize=(14, 10))

ax = axes[0]
ax.scatter(x_train, y_train, alpha=0.5, s=40, label='Training data', color='gray', zorder=3)
ax.plot(x_test, y_regular, 'b-', linewidth=2.5, label='Regular cubic spline')
ax.plot(x_test, y_true, 'g--', alpha=0.5, linewidth=2, label='True function')
for knot in interior_knots:
    ax.axvline(knot, color='r', linestyle='--', alpha=0.2)
ax.axvspan(-0.5, 1.5, alpha=0.1, color='orange', label='Boundary regions')
ax.axvspan(8.5, 10.5, alpha=0.1, color='orange')
ax.set_ylabel('y', fontsize=12)
ax.set_title('Regular Cubic Spline (may have high variance at boundaries)', fontsize=13)
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)
ax.set_ylim(-2, 2)

ax = axes[1]
ax.scatter(x_train, y_train, alpha=0.5, s=40, label='Training data', color='gray', zorder=3)
ax.plot(x_test, y_natural, 'purple', linewidth=2.5, label='Natural cubic spline')
ax.plot(x_test, y_true, 'g--', alpha=0.5, linewidth=2, label='True function')
for knot in all_knots:
    ax.axvline(knot, color='r', linestyle='--', alpha=0.2)
ax.axvspan(-0.5, 1.5, alpha=0.1, color='lightblue', label='Boundary regions')
ax.axvspan(8.5, 10.5, alpha=0.1, color='lightblue')
ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title('Natural Cubic Spline (linear beyond boundaries → stable)', fontsize=13)
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)
ax.set_ylim(-2, 2)

plt.tight_layout()
plt.show()

print(f"Regular spline: {len(interior_knots) + 3 + 1} = {len(regular_model.coefficients)} parameters")
print(f"Natural spline: {len(all_knots)} parameters (fewer!)")

## Zoom In on Boundary Behavior

Let's examine the boundary regions more closely.

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

ax = axes[0]
mask_left = x_test <= 2.5
ax.scatter(x_train[x_train <= 2.5], y_train[x_train <= 2.5], 
          alpha=0.7, s=60, color='gray', zorder=3, label='Data')
ax.plot(x_test[mask_left], y_regular[mask_left], 'b-', linewidth=3, label='Regular spline')
ax.plot(x_test[mask_left], y_natural[mask_left], 'purple', linewidth=3, label='Natural spline', linestyle='--')
ax.plot(x_test[mask_left], y_true[mask_left], 'g:', linewidth=2, label='True', alpha=0.7)
ax.axvline(0, color='red', linestyle=':', alpha=0.5, label='Boundary')
ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title('Left Boundary (x ∈ [0, 2.5])', fontsize=13)
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)

ax = axes[1]
mask_right = x_test >= 7.5
ax.scatter(x_train[x_train >= 7.5], y_train[x_train >= 7.5], 
          alpha=0.7, s=60, color='gray', zorder=3, label='Data')
ax.plot(x_test[mask_right], y_regular[mask_right], 'b-', linewidth=3, label='Regular spline')
ax.plot(x_test[mask_right], y_natural[mask_right], 'purple', linewidth=3, label='Natural spline', linestyle='--')
ax.plot(x_test[mask_right], y_true[mask_right], 'g:', linewidth=2, label='True', alpha=0.7)
ax.axvline(10, color='red', linestyle=':', alpha=0.5, label='Boundary')
ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title('Right Boundary (x ∈ [7.5, 10])', fontsize=13)
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Variance Reduction at Boundaries

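Demonstrate the variance reduction by simulation: refit both models on repeated draws of fresh noise around the true function, then compare the pointwise variance of the predictions. Strictly speaking this is a Monte Carlo simulation rather than a bootstrap, because the responses are regenerated instead of resampled from one dataset.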

In [None]:
np.random.seed(42)
n_samples = 40
x_base = np.concatenate([
    np.array([0.0, 0.5, 1.0]),
    np.linspace(2, 8, 30),
    np.array([9.0, 9.5, 10.0])
])
true_func = lambda x: np.sin(2*np.pi*x/10)

n_bootstrap = 50
x_test = np.linspace(0, 10, 500)

y_regular_samples = []
y_natural_samples = []

for i in range(n_bootstrap):
    y_train = true_func(x_base) + np.random.normal(0, 0.2, len(x_base))
    
    regular_model = RegressionSpline(degree=3)
    regular_model.fit(x_base, y_train, interior_knots)
    y_regular_samples.append(regular_model.predict(x_test))
    
    natural_model = NaturalCubicSpline()
    natural_model.fit(x_base, y_train, all_knots)
    y_natural_samples.append(natural_model.predict(x_test))

y_regular_samples = np.array(y_regular_samples)
y_natural_samples = np.array(y_natural_samples)

var_regular = np.var(y_regular_samples, axis=0)
var_natural = np.var(y_natural_samples, axis=0)

fig, ax = plt.subplots(figsize=(12, 6))

ax.plot(x_test, var_regular, 'b-', linewidth=2, label='Regular spline variance')
ax.plot(x_test, var_natural, 'purple', linewidth=2, label='Natural spline variance')

ax.axvspan(0, 2, alpha=0.1, color='orange')
ax.axvspan(8, 10, alpha=0.1, color='orange')

ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('Variance', fontsize=12)
ax.set_title(f'Prediction Variance over {n_bootstrap} Simulated Noise Draws', fontsize=14)
ax.legend(fontsize=12)
ax.grid(True, alpha=0.3)
plt.show()

boundary_mask = (x_test < 2) | (x_test > 8)
interior_mask = (x_test >= 2) & (x_test <= 8)

print("\nAverage variance at BOUNDARIES:")
print(f"  Regular: {np.mean(var_regular[boundary_mask]):.4f}")
print(f"  Natural: {np.mean(var_natural[boundary_mask]):.4f}")
print(f"  Reduction: {(1 - np.mean(var_natural[boundary_mask])/np.mean(var_regular[boundary_mask]))*100:.1f}%")

print("\nAverage variance in INTERIOR:")
print(f"  Regular: {np.mean(var_regular[interior_mask]):.4f}")
print(f"  Natural: {np.mean(var_natural[interior_mask]):.4f}")

## Parameter Efficiency

Natural splines use fewer parameters while maintaining good fit.

In [None]:
np.random.seed(123)
x_train, y_train = generate_sinusoidal_data(n_samples=50, noise_std=0.2, x_range=(0, 10))
x_test = np.linspace(0, 10, 500)

n_knots_list = [4, 6, 8, 10]

results = []
for n_knots in n_knots_list:
    interior_knots = np.linspace(2, 8, n_knots)
    all_knots = np.sort(np.concatenate([[0, 10], interior_knots]))
    
    reg_model = RegressionSpline(degree=3)
    reg_model.fit(x_train, y_train, interior_knots)
    mse_reg = mean_squared_error(y_train, reg_model.predict(x_train))
    n_params_reg = len(reg_model.coefficients)
    
    nat_model = NaturalCubicSpline()
    nat_model.fit(x_train, y_train, all_knots)
    mse_nat = mean_squared_error(y_train, nat_model.predict(x_train))
    n_params_nat = len(nat_model.coefficients)
    
    results.append({
        'n_knots': n_knots,
        'regular_params': n_params_reg,
        'natural_params': n_params_nat,
        'regular_mse': mse_reg,
        'natural_mse': mse_nat
    })

print("\n" + "="*80)
print(f"{'Knots':<10} {'Regular':<20} {'Natural':<20} {'MSE Comparison':<30}")
print(f"{'(m)':<10} {'Params | MSE':<20} {'Params | MSE':<20} {'Regular vs Natural':<30}")
print("="*80)
for r in results:
    print(f"{r['n_knots']:<10} "
          f"{r['regular_params']:<7} | {r['regular_mse']:<11.4f} "
          f"{r['natural_params']:<7} | {r['natural_mse']:<11.4f} "
          f"{r['regular_mse']/r['natural_mse']:<30.2f}")

# 4. Smoothing Splines

Smoothing splines solve the regularized problem:
$$\min_\beta \|y - G\beta\|_2^2 + \lambda \beta^T \Omega \beta$$

where:
- $G$: natural cubic spline basis with **knots at ALL training points** $x_1, \dots, x_n$
- $\Omega_{ij} = \int g''_i(t) g''_j(t) dt$: penalty matrix (penalizes curvature)
- $\lambda \geq 0$: smoothing parameter

**Solution**: $\hat{\beta} = (G^T G + \lambda \Omega)^{-1} G^T y$

**Alternative formulation**: Minimize over ALL functions $f$:
$$\sum_{i=1}^n (y_i - f(x_i))^2 + \lambda \int (f''(x))^2 dx$$

Smoothing splines circumvent the problem of knot selection (as they just use the inputs as knots), and simultaneously, they control for overfitting by shrinking the coefficients.
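
To make the penalized solve concrete, here is a small sketch that assumes the natural cubic basis from *The Elements of Statistical Learning* with a knot at every (distinct, sorted) input. The `SmoothingSpline` class may be implemented differently, and all names below are hypothetical. The penalty matrix is integrated with two-point Gauss-Legendre quadrature on each knot interval, which is exact here because every $g''_i$ is piecewise linear.

```python
import numpy as np

def natural_basis_sketch(x, knots, deriv=0):
    """Natural cubic spline basis (deriv=0) or its second derivative (deriv=2)."""
    x = np.asarray(x, dtype=float)
    K = len(knots)
    # d^2/dx^2 of (x - t)_+^3 is 6 (x - t)_+
    power, scale = (3, 1.0) if deriv == 0 else (1, 6.0)

    def d(k):
        num = np.maximum(x - knots[k], 0.0) ** power - np.maximum(x - knots[-1], 0.0) ** power
        return scale * num / (knots[-1] - knots[k])

    if deriv == 0:
        first_two = [np.ones_like(x), x]
    else:
        first_two = [np.zeros_like(x), np.zeros_like(x)]
    return np.column_stack(first_two + [d(k) - d(K - 2) for k in range(K - 2)])

def penalty_matrix_sketch(knots):
    """Omega_ij = integral of g_i'' g_j'' between the boundary knots."""
    nodes, weights = np.polynomial.legendre.leggauss(2)
    K = len(knots)
    omega = np.zeros((K, K))
    for a, b in zip(knots[:-1], knots[1:]):
        xq = 0.5 * (b - a) * nodes + 0.5 * (a + b)   # quadrature points in [a, b]
        B2 = natural_basis_sketch(xq, knots, deriv=2)
        omega += 0.5 * (b - a) * (B2 * weights[:, None]).T @ B2
    return omega

rng = np.random.default_rng(0)
x_demo = np.linspace(0.0, 1.0, 12)   # inputs double as knots
y_demo = np.sin(2 * np.pi * x_demo) + rng.normal(0.0, 0.2, x_demo.size)

G_demo = natural_basis_sketch(x_demo, x_demo)
omega_demo = penalty_matrix_sketch(x_demo)

roughness_demo = []
for lam in (1e-4, 1e-2, 1.0):
    beta = np.linalg.solve(G_demo.T @ G_demo + lam * omega_demo, G_demo.T @ y_demo)
    roughness_demo.append(beta @ omega_demo @ beta)   # equals the integral of f''^2
print(roughness_demo)
```

The direct solve is meant for small illustrations only. This basis becomes badly conditioned as the number of knots grows, which is one reason B-spline bases are preferred in practice.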

## Effect of Smoothing Parameter λ

The parameter $\lambda$ controls the bias-variance tradeoff:
- $\lambda \to 0$: more flexible (low bias, high variance); the fit interpolates the data
- $\lambda \to \infty$: smoother (high bias, low variance); the fit approaches the least-squares line
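
Both limits can be checked numerically with a simpler stand-in. The sketch below is not a smoothing spline: it is a discrete analogue on equally spaced points that penalizes squared second differences of the fitted values instead of $\int (f'')^2$. The function name is hypothetical.

```python
import numpy as np

def second_difference_smoother(y, lam):
    """Minimise ||y - f||^2 + lam * ||D f||^2, where D takes second differences."""
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)   # (n - 2) x n second-difference operator
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

rng = np.random.default_rng(0)
x_demo = np.linspace(0.0, 10.0, 30)
y_demo = np.sin(2 * np.pi * x_demo / 10) + rng.normal(0.0, 0.3, x_demo.size)

f_small = second_difference_smoother(y_demo, 0.0)   # no penalty: reproduces the data
f_large = second_difference_smoother(y_demo, 1e8)   # heavy penalty: nearly a straight line
line_demo = np.polyval(np.polyfit(x_demo, y_demo, 1), x_demo)

print(np.max(np.abs(f_small - y_demo)))      # gap to interpolation
print(np.max(np.abs(f_large - line_demo)))   # gap to the least-squares line
```

The straight line appears in the limit because linear sequences are exactly the ones with zero second differences, just as linear functions are the ones with zero curvature penalty.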

In [None]:
np.random.seed(42)
x_train, y_train = generate_sinusoidal_data(n_samples=30, noise_std=0.3, x_range=(0, 10))
x_test = np.linspace(0, 10, 500)
y_true = np.sin(2*np.pi*x_test/10)

lambdas = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
predictions = {}

for lam in lambdas:
    model = SmoothingSpline(lambda_=lam)
    model.fit(x_train, y_train)
    predictions[f'λ={lam}'] = model.predict(x_test)

fig = plot_smoothing_comparison(
    x_train, y_train, x_test, predictions,
    y_true_func=lambda x: np.sin(2*np.pi*x/10),
    title="Effect of Smoothing Parameter λ on Smoothing Splines"
)
plt.show()

print("Observation:")
print("  Small λ → wiggly fit (overfitting)")
print("  Large λ → smooth fit (underfitting)")
print("  Need to select optimal λ via cross-validation!")

## Detailed View: λ Effect on Individual Fits

In [None]:
lambdas_detail = [0.001, 0.1, 10.0]

fig, axes = plt.subplots(1, 3, figsize=(16, 4))

for ax, lam in zip(axes, lambdas_detail):
    model = SmoothingSpline(lambda_=lam)
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    
    y_pred_train = model.predict(x_train)
    mse = mean_squared_error(y_train, y_pred_train)
    r2 = r_squared(y_train, y_pred_train)
    
    ax.scatter(x_train, y_train, alpha=0.6, s=50, color='gray', zorder=3, label='Data')
    ax.plot(x_test, y_pred, 'b-', linewidth=2.5, label='Smoothing spline')
    ax.plot(x_test, y_true, 'g--', alpha=0.5, linewidth=2, label='True function')
    
    ax.set_xlabel('x', fontsize=11)
    ax.set_ylabel('y', fontsize=11)
    ax.set_title(f'λ = {lam}\nMSE={mse:.3f}, R²={r2:.3f}', fontsize=12)
    ax.legend(fontsize=9)
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Cross-Validation for λ Selection

The optimal smoothing parameter is typically chosen via cross-validation.
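
The `cross_validate` method used in the next cell is not shown, so the following is only a sketch of the general K-fold procedure. As a stand-in smoother it uses a cubic truncated-power spline with a ridge penalty on the knot coefficients rather than a true smoothing spline; every name here is hypothetical.

```python
import numpy as np

def basis_matrix_sketch(x, knots, degree=3):
    """Truncated power basis: 1, x, ..., x^degree, (x - t_j)_+^degree."""
    x = np.asarray(x, dtype=float)
    cols = [x ** p for p in range(degree + 1)]
    cols += [np.maximum(x - t, 0.0) ** degree for t in knots]
    return np.column_stack(cols)

def fit_penalized_sketch(x, y, knots, lam, degree=3):
    """Minimise ||y - G b||^2 + lam * (sum of squared truncated-term coefficients)."""
    G = basis_matrix_sketch(x, knots, degree)
    p = G.shape[1]
    penalised = np.zeros(p)
    penalised[degree + 1:] = 1.0   # polynomial terms are left unpenalised
    # Augmented least squares is equivalent to the penalised problem above.
    G_aug = np.vstack([G, np.sqrt(lam) * np.diag(penalised)])
    y_aug = np.concatenate([y, np.zeros(p)])
    beta, *_ = np.linalg.lstsq(G_aug, y_aug, rcond=None)
    return beta

def kfold_cv_error_sketch(x, y, knots, lam, n_folds=5, seed=0):
    """Mean squared prediction error over held-out folds."""
    rng = np.random.default_rng(seed)   # fixed seed: identical folds for every lam
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    sq_err = 0.0
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(x)), held_out)
        beta = fit_penalized_sketch(x[train], y[train], knots, lam)
        pred = basis_matrix_sketch(x[held_out], knots) @ beta
        sq_err += np.sum((y[held_out] - pred) ** 2)
    return sq_err / len(x)

rng = np.random.default_rng(42)
x_demo = np.sort(rng.uniform(0.0, 1.0, 60))
y_demo = np.sin(2 * np.pi * x_demo) + rng.normal(0.0, 0.25, x_demo.size)
knots_demo = np.linspace(0.1, 0.9, 9)

lambdas_demo = np.logspace(-6, 2, 9)
cv_demo = np.array([kfold_cv_error_sketch(x_demo, y_demo, knots_demo, lam)
                    for lam in lambdas_demo])
best_lambda_demo = lambdas_demo[np.argmin(cv_demo)]
print(best_lambda_demo, cv_demo.min())
```

Reusing the same fold assignment for every candidate $\lambda$ keeps the comparison between candidates from being blurred by differences in how the data were split.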

In [None]:
np.random.seed(42)
x_train, y_train = generate_sinusoidal_data(n_samples=60, noise_std=0.25, x_range=(0, 10))

lambdas_cv = np.logspace(-3, 2, 30)

print("Running 5-fold cross-validation...")
model = SmoothingSpline()
best_lambda, cv_errors = model.cross_validate(x_train, y_train, lambdas_cv, cv_folds=5)

print(f"\nOptimal λ = {best_lambda:.4f}")
print(f"CV error at optimal λ = {cv_errors[np.argmin(cv_errors)]:.4f}")

fig = plot_cv_curve(lambdas_cv, cv_errors, best_lambda,
                   title="5-Fold Cross-Validation for Smoothing Parameter Selection")
plt.show()

## Fit with Optimal λ

In [None]:
final_model = SmoothingSpline(lambda_=best_lambda)
final_model.fit(x_train, y_train)

x_test = np.linspace(0, 10, 500)
y_pred = final_model.predict(x_test)
y_true = np.sin(2*np.pi*x_test/10)

fig, ax = plt.subplots(figsize=(12, 6))
ax.scatter(x_train, y_train, alpha=0.5, s=40, label='Training data', color='gray', zorder=3)
ax.plot(x_test, y_pred, 'b-', linewidth=2.5, label=f'Smoothing spline (λ={best_lambda:.4f})')
ax.plot(x_test, y_true, 'g--', alpha=0.6, linewidth=2, label='True function')

for xi in x_train[::5]:
    ax.axvline(xi, color='red', linestyle=':', alpha=0.1)

ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title(f'Smoothing Spline with Optimal λ (selected by CV)\n'
            f'Knots placed at ALL {len(x_train)} training points', fontsize=13)
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)
plt.show()

y_pred_train = final_model.predict(x_train)
print(f"\nFinal model performance:")
print(f"  Training MSE: {mean_squared_error(y_train, y_pred_train):.4f}")
print(f"  Training R²: {r_squared(y_train, y_pred_train):.4f}")

## Functional Minimization Perspective

Smoothing splines can be derived from minimizing:
$$\sum_{i=1}^n (y_i - f(x_i))^2 + \lambda \int (f''(x))^2 dx$$

Let's visualize the components of this objective.

In [None]:
np.random.seed(42)
x_train, y_train = generate_sinusoidal_data(n_samples=20, noise_std=0.2, x_range=(0, 10))

lambdas_viz = [0.001, 0.1, 10.0]
x_test = np.linspace(0, 10, 500)

results = []
for lam in lambdas_viz:
    model = SmoothingSpline(lambda_=lam)
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    y_pred_train = model.predict(x_train)
    
    data_fit = np.sum((y_train - y_pred_train)**2)
    
    second_deriv = np.diff(y_pred, n=2) / (x_test[1] - x_test[0])**2
    roughness = np.sum(second_deriv**2) * (x_test[1] - x_test[0])
    
    results.append({
        'lambda': lam,
        'data_fit': data_fit,
        'roughness': roughness,
        'objective': data_fit + lam * roughness
    })

print("\n" + "="*70)
print(f"{'λ':<12} {'Data Fit':<15} {'Roughness':<15} {'Total Objective':<15}")
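roughness_label = "∫(f'')² dx"  # kept outside the f-string: backslash escapes inside f-string expressions are a SyntaxError before Python 3.12
print(f"{'(param)':<12} {'∑(y-f(x))²':<15} {roughness_label:<15} {'(approx)':<15}")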
print("="*70)
for r in results:
    print(f"{r['lambda']:<12.3f} {r['data_fit']:<15.2f} {r['roughness']:<15.2f} {r['objective']:<15.2f}")

## Comparison: Automatic vs Manual Knot Placement

In [None]:
np.random.seed(123)
x_train, y_train = generate_sinusoidal_data(n_samples=40, noise_std=0.25, x_range=(0, 10))
x_test = np.linspace(0, 10, 500)

manual_knots = np.linspace(2, 8, 6)
reg_model = RegressionSpline(degree=3)
reg_model.fit(x_train, y_train, manual_knots)
y_reg = reg_model.predict(x_test)

smooth_model = SmoothingSpline(lambda_=0.1)
smooth_model.fit(x_train, y_train)
y_smooth = smooth_model.predict(x_test)

fig, ax = plt.subplots(figsize=(12, 6))
ax.scatter(x_train, y_train, alpha=0.5, s=40, label='Data', color='gray', zorder=3)
ax.plot(x_test, y_reg, 'b-', linewidth=2, label=f'Regression spline ({len(manual_knots)} manual knots)', alpha=0.7)
ax.plot(x_test, y_smooth, 'purple', linewidth=2, label=f'Smoothing spline ({len(x_train)} automatic knots)', linestyle='--')
ax.plot(x_test, np.sin(2*np.pi*x_test/10), 'g:', linewidth=2, label='True function', alpha=0.5)

for knot in manual_knots:
    ax.axvline(knot, color='blue', linestyle='--', alpha=0.2)

ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title('Manual Knot Placement vs Automatic (Smoothing Spline)', fontsize=14)
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)
plt.show()

## Key Takeaways

### Basis Functions:
1. **Truncated power basis** provides a natural parametrization for splines
2. For degree $k$ with $m$ knots, we need $m + k + 1$ basis functions
3. The basis consists of:
   - Global polynomial terms: $1, x, x^2, \dots, x^k$
   - Local truncated terms: $(x - t_j)_+^k$ that "activate" at each knot
4. Higher degree → more basis functions → more flexible splines
5. Cubic splines (degree 3) are most common in practice

### Regression Splines:
1. **Regression splines** fit splines to data via least squares: $\hat{\beta} = (G^T G)^{-1} G^T y$
2. More knots → more flexibility → better training fit (but potential overfitting)
3. Higher polynomial degree → smoother derivatives but more parameters
4. **Main limitation**: High variance at boundaries (mitigated by natural splines)
5. **Splines vs polynomials**: Splines offer local control, polynomials are global

### Natural Splines:
1. **Natural splines** are linear beyond boundary knots → reduced variance at boundaries
2. Use only $m$ basis functions (vs $m+k+1$ for regular splines) → more parameter efficient
3. Provide more **stable extrapolation** beyond the data range
4. Particularly useful when data is sparse near boundaries
5. For cubic natural splines: linear extrapolation is often more reasonable than cubic

### Smoothing Splines:
1. **Smoothing splines** eliminate knot selection by using ALL training points as knots
2. **Regularization** via $\lambda \beta^T \Omega \beta$ prevents overfitting
3. Smoothing parameter $\lambda$ controls bias-variance tradeoff:
   - Small $\lambda$ → flexible, wiggly fit
   - Large $\lambda$ → smooth, rigid fit
4. Optimal $\lambda$ chosen via **cross-validation**
5. Equivalent to functional minimization: $\min_f \sum(y_i - f(x_i))^2 + \lambda \int(f'')^2 dx$
6. **Advantages over regression splines**:
   - Automatic knot placement
   - Only one tuning parameter (λ)
   - Computationally efficient with B-spline basis