# Smoothing Splines

## Mathematical Background

Smoothing splines solve the regularized problem:
$$\min_\beta \|y - G\beta\|_2^2 + \lambda \beta^T \Omega \beta$$

where:
- $G$: natural cubic spline basis with **knots at ALL training points** $x_1, \dots, x_n$
- $\Omega_{ij} = \int g''_i(t) g''_j(t) dt$: penalty matrix (penalizes curvature)
- $\lambda \geq 0$: smoothing parameter

**Solution**: $\hat{\beta} = (G^T G + \lambda \Omega)^{-1} G^T y$

**Alternative formulation**: Minimize over all functions $f$ with two continuous derivatives:
$$\sum_{i=1}^n (y_i - f(x_i))^2 + \lambda \int (f''(x))^2 dx$$

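The minimizer of this functional is a natural cubic spline with knots at the unique $x_i$, which is why the two formulations agree. Smoothing splines therefore sidestep knot selection by placing a knot at every input point, and they control overfitting by shrinking the coefficients through the roughness penalty.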
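As a rough illustration of the closed-form solution above, the sketch below solves a system of the same shape, $(G^T G + \lambda \Omega)\beta = G^T y$, using stand-ins rather than a real spline basis: `G` is the identity and `Omega` is built from second differences. This is a discrete analogue (often called a Whittaker smoother), not the natural cubic spline basis used by `SmoothingSpline`; the helper names `second_difference_matrix` and `penalized_fit` are hypothetical.

```python
import numpy as np

def second_difference_matrix(n):
    """Return the (n-2) x n matrix whose rows are [1, -2, 1] stencils."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

def penalized_fit(G, Omega, y, lam):
    """Solve (G^T G + lam * Omega) beta = G^T y without forming an inverse."""
    return np.linalg.solve(G.T @ G + lam * Omega, G.T @ y)

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = np.sin(2 * np.pi * x / 10) + rng.normal(scale=0.3, size=x.size)

G = np.eye(x.size)                    # stand-in basis: one coefficient per x_i
D = second_difference_matrix(x.size)
Omega = D.T @ D                       # stand-in penalty: squared second differences
beta_hat = penalized_fit(G, Omega, y, lam=5.0)
fitted = G @ beta_hat
```

Using `np.linalg.solve` instead of an explicit inverse is generally the more stable choice for a system like this.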

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import sys
sys.path.append('../src')

from splines import SmoothingSpline
from utils import (
    generate_sinusoidal_data,
    plot_smoothing_comparison,
    plot_cv_curve,
    mean_squared_error,
    r_squared
)

%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')

## 1. Effect of Smoothing Parameter λ

The parameter $\lambda$ controls the bias-variance tradeoff:
- $\lambda \to 0$: more flexible (low bias, high variance); the fit interpolates the data
- $\lambda \to \infty$: smoother (high bias, low variance); the fit approaches the least-squares line
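The two limits can be checked numerically on a simplified stand-in. The sketch below uses a second-difference penalty on an equally spaced grid (a discrete analogue of the curvature penalty, not the `SmoothingSpline` class itself); the function name `smooth` and the specific $\lambda$ values are illustrative assumptions.

```python
import numpy as np

def smooth(y, lam):
    """Minimize ||y - z||^2 + lam * ||D z||^2, where D takes second differences."""
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)   # (n-2) x n second-difference matrix
    return np.linalg.solve(np.eye(n) + lam * (D.T @ D), y)

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 30)
y = np.sin(2 * np.pi * x / 10) + rng.normal(scale=0.3, size=x.size)

nearly_interpolating = smooth(y, lam=1e-8)    # tiny penalty: fit tracks the data
nearly_linear = smooth(y, lam=1e8)            # huge penalty: curvature is suppressed
line = np.polyval(np.polyfit(x, y, 1), x)     # ordinary least-squares line

print("max |fit - data| at small lambda:", np.max(np.abs(nearly_interpolating - y)))
print("max |fit - line| at large lambda:", np.max(np.abs(nearly_linear - line)))
```

Straight lines have zero second differences, so they are not penalized at all, which is why the heavily penalized fit collapses onto the least-squares line rather than onto a constant.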

In [None]:
# Generate data
np.random.seed(42)
x_train, y_train = generate_sinusoidal_data(n_samples=30, noise_std=0.3, x_range=(0, 10))
x_test = np.linspace(0, 10, 500)
y_true = np.sin(2*np.pi*x_test/10)

# Try different lambda values
lambdas = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
predictions = {}

for lam in lambdas:
    model = SmoothingSpline(lambda_=lam)
    model.fit(x_train, y_train)
    predictions[f'λ={lam}'] = model.predict(x_test)

# Plot comparison
fig = plot_smoothing_comparison(
    x_train, y_train, x_test, predictions,
    y_true_func=lambda x: np.sin(2*np.pi*x/10),
    title="Effect of Smoothing Parameter λ on Smoothing Splines"
)
plt.show()

print("Observation:")
print("  Small λ → wiggly fit (overfitting)")
print("  Large λ → smooth fit (underfitting)")
print("  Need to select optimal λ via cross-validation!")

## 2. Detailed View: λ Effect on Individual Fits

In [None]:
# Reuse x_train, y_train, x_test, and y_true from the previous cell
lambdas_detail = [0.001, 0.1, 10.0]

fig, axes = plt.subplots(1, 3, figsize=(16, 4))

for ax, lam in zip(axes, lambdas_detail):
    model = SmoothingSpline(lambda_=lam)
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    
    # Compute training-set metrics (these always favor small λ)
    y_pred_train = model.predict(x_train)
    mse = mean_squared_error(y_train, y_pred_train)
    r2 = r_squared(y_train, y_pred_train)
    
    # Plot
    ax.scatter(x_train, y_train, alpha=0.6, s=50, color='gray', zorder=3, label='Data')
    ax.plot(x_test, y_pred, 'b-', linewidth=2.5, label='Smoothing spline')
    ax.plot(x_test, y_true, 'g--', alpha=0.5, linewidth=2, label='True function')
    
    ax.set_xlabel('x', fontsize=11)
    ax.set_ylabel('y', fontsize=11)
    ax.set_title(f'λ = {lam}\nMSE={mse:.3f}, R²={r2:.3f}', fontsize=12)
    ax.legend(fontsize=9)
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 3. Cross-Validation for λ Selection

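The smoothing parameter is typically chosen by cross-validation: fit the model for each candidate $\lambda$ on a log-spaced grid and keep the value with the lowest average held-out error.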
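To make the procedure concrete, here is a minimal K-fold sketch on a simplified stand-in smoother (second-difference penalty on an equally spaced grid). It is not the implementation behind `SmoothingSpline.cross_validate`, whose internals are not shown here; `weighted_smooth`, `kfold_cv_error`, and the $\lambda$ grid are assumptions made for this example. Held-out points are hidden by giving them zero weight, so the penalty alone fills in their fitted values.

```python
import numpy as np

def weighted_smooth(y, w, lam):
    """Minimize sum_i w_i (y_i - z_i)^2 + lam * ||D z||^2 over z."""
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)   # second-difference matrix
    return np.linalg.solve(np.diag(w) + lam * (D.T @ D), w * y)

def kfold_cv_error(y, lam, n_folds=5, seed=0):
    """Average held-out squared error over randomly assigned folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(y.size), n_folds)
    errors = []
    for held_out in folds:
        w = np.ones(y.size)
        w[held_out] = 0.0                 # hide the held-out responses
        z = weighted_smooth(y, w, lam)
        errors.append(np.mean((y[held_out] - z[held_out]) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 60)
y = np.sin(2 * np.pi * x / 10) + rng.normal(scale=0.25, size=x.size)

lambdas = np.logspace(-2, 5, 30)          # grid chosen for this stand-in penalty
cv_errors = np.array([kfold_cv_error(y, lam) for lam in lambdas])
best_lambda = lambdas[np.argmin(cv_errors)]
print(f"Selected lambda: {best_lambda:.4g}")
```

The same fold assignment (fixed `seed`) is reused for every candidate, so differences in CV error reflect $\lambda$ rather than the random split. The scale of $\lambda$ in this discrete version is not comparable to the scale used by the spline model.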

In [None]:
# Generate data
np.random.seed(42)
x_train, y_train = generate_sinusoidal_data(n_samples=60, noise_std=0.25, x_range=(0, 10))

# Define candidate lambda values (log-scale)
lambdas_cv = np.logspace(-3, 2, 30)

# Perform cross-validation
print("Running 5-fold cross-validation...")
model = SmoothingSpline()
best_lambda, cv_errors = model.cross_validate(x_train, y_train, lambdas_cv, cv_folds=5)

print(f"\nOptimal λ = {best_lambda:.4f}")
print(f"CV error at optimal λ = {np.min(cv_errors):.4f}")

# Plot CV curve
fig = plot_cv_curve(lambdas_cv, cv_errors, best_lambda,
                   title="5-Fold Cross-Validation for Smoothing Parameter Selection")
plt.show()

## 4. Fit with Optimal λ

In [None]:
# Fit final model with optimal lambda
final_model = SmoothingSpline(lambda_=best_lambda)
final_model.fit(x_train, y_train)

# Predict
x_test = np.linspace(0, 10, 500)
y_pred = final_model.predict(x_test)
y_true = np.sin(2*np.pi*x_test/10)

# Plot
fig, ax = plt.subplots(figsize=(12, 6))
ax.scatter(x_train, y_train, alpha=0.5, s=40, label='Training data', color='gray', zorder=3)
ax.plot(x_test, y_pred, 'b-', linewidth=2.5, label=f'Smoothing spline (λ={best_lambda:.4f})')
ax.plot(x_test, y_true, 'g--', alpha=0.6, linewidth=2, label='True function')

# Mark training points (automatic knots)
for xi in x_train[::5]:  # Show every 5th for clarity
    ax.axvline(xi, color='red', linestyle=':', alpha=0.1)

ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title(f'Smoothing Spline with Optimal λ (selected by CV)\n'
            f'Knots placed at ALL {len(x_train)} training points', fontsize=13)
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)
plt.show()

# Metrics
y_pred_train = final_model.predict(x_train)
print("\nFinal model performance:")
print(f"  Training MSE: {mean_squared_error(y_train, y_pred_train):.4f}")
print(f"  Training R²: {r_squared(y_train, y_pred_train):.4f}")

## 5. Functional Minimization Perspective

Smoothing splines can be derived from minimizing:
$$\sum_{i=1}^n (y_i - f(x_i))^2 + \lambda \int (f''(x))^2 dx$$

Let's visualize the components of this objective.
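Before applying it to fitted splines, the finite-difference estimate of the roughness term can be sanity-checked on a function whose answer is known. The sketch below assumes an equally spaced grid and uses $f(x) = \sin x$ on $[0, \pi]$, for which $\int (f'')^2 dx = \int \sin^2 x \, dx = \pi/2$; the helper name `roughness` is hypothetical.

```python
import numpy as np

def roughness(f_vals, x):
    """Approximate the integral of (f'')^2 from samples on an equally spaced grid."""
    h = x[1] - x[0]
    second_deriv = np.diff(f_vals, n=2) / h**2   # central second differences
    return np.sum(second_deriv**2) * h           # Riemann sum over interior points

x = np.linspace(0, np.pi, 2001)
approx = roughness(np.sin(x), x)
exact = np.pi / 2                                # integral of sin^2 over [0, pi]
line_roughness = roughness(2 * x + 1, x)         # a straight line has no curvature

print(f"approximate: {approx:.6f}, exact: {exact:.6f}")
print(f"roughness of a straight line: {line_roughness:.2e}")
```

The estimate skips the two endpoints, so it is only an approximation; a finer grid reduces the error.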

In [None]:
# Generate simple data
np.random.seed(42)
x_train, y_train = generate_sinusoidal_data(n_samples=20, noise_std=0.2, x_range=(0, 10))

# Fit models with different lambdas
lambdas_viz = [0.001, 0.1, 10.0]
x_test = np.linspace(0, 10, 500)

results = []
for lam in lambdas_viz:
    model = SmoothingSpline(lambda_=lam)
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    y_pred_train = model.predict(x_train)
    
    # Data fit term
    data_fit = np.sum((y_train - y_pred_train)**2)
    
    # Roughness penalty (approximate via finite differences)
    second_deriv = np.diff(y_pred, n=2) / (x_test[1] - x_test[0])**2
    roughness = np.sum(second_deriv**2) * (x_test[1] - x_test[0])
    
    results.append({
        'lambda': lam,
        'data_fit': data_fit,
        'roughness': roughness,
        'objective': data_fit + lam * roughness
    })

# Display
print("\n" + "="*70)
print(f"{'λ':<12} {'Data Fit':<15} {'Roughness':<15} {'Total Objective':<15}")
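# Build the label outside the f-string: backslash escapes inside f-string
# expressions are a SyntaxError before Python 3.12
roughness_label = "∫(f'')² dx"
print(f"{'(param)':<12} {'∑(y-f(x))²':<15} {roughness_label:<15} {'(approx)':<15}")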
print("="*70)
for r in results:
    print(f"{r['lambda']:<12.3f} {r['data_fit']:<15.2f} {r['roughness']:<15.2f} {r['objective']:<15.2f}")
print("="*70)
print("\nTrade-off: Data fit vs Roughness")
print("  Small λ → prioritize data fit (wiggly curve, high roughness)")
print("  Large λ → prioritize smoothness (poor data fit, low roughness)")

## 6. Comparison: Automatic vs Manual Knot Placement

In [None]:
from splines import RegressionSpline

# Generate data
np.random.seed(123)
x_train, y_train = generate_sinusoidal_data(n_samples=40, noise_std=0.25, x_range=(0, 10))
x_test = np.linspace(0, 10, 500)

# Method 1: Regression spline with manual knots
manual_knots = np.linspace(2, 8, 6)
reg_model = RegressionSpline(degree=3)
reg_model.fit(x_train, y_train, manual_knots)
y_reg = reg_model.predict(x_test)

# Method 2: Smoothing spline with automatic knots
smooth_model = SmoothingSpline(lambda_=0.1)
smooth_model.fit(x_train, y_train)
y_smooth = smooth_model.predict(x_test)

# Plot
fig, ax = plt.subplots(figsize=(12, 6))
ax.scatter(x_train, y_train, alpha=0.5, s=40, label='Data', color='gray', zorder=3)
ax.plot(x_test, y_reg, 'b-', linewidth=2, label=f'Regression spline ({len(manual_knots)} manual knots)', alpha=0.7)
ax.plot(x_test, y_smooth, 'purple', linewidth=2, label=f'Smoothing spline ({len(x_train)} automatic knots)', linestyle='--')
ax.plot(x_test, np.sin(2*np.pi*x_test/10), 'g:', linewidth=2, label='True function', alpha=0.5)

# Show manual knots
for knot in manual_knots:
    ax.axvline(knot, color='blue', linestyle='--', alpha=0.2)

ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title('Manual Knot Placement vs Automatic (Smoothing Spline)', fontsize=14)
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)
plt.show()

print("Smoothing splines: No need to choose knot locations!")
print("  → Uses ALL training points as knots")
print("  → Controls overfitting via regularization (λ)")

## Key Takeaways

1. **Smoothing splines** eliminate knot selection by using ALL training points as knots
2. **Regularization** via $\lambda \beta^T \Omega \beta$ prevents overfitting
3. Smoothing parameter $\lambda$ controls bias-variance tradeoff:
   - Small $\lambda$ → flexible, wiggly fit
   - Large $\lambda$ → smooth, rigid fit
4. Optimal $\lambda$ chosen via **cross-validation**
5. Equivalent to functional minimization: $\min_f \sum(y_i - f(x_i))^2 + \lambda \int(f'')^2 dx$
6. **Advantages over regression splines**:
   - Automatic knot placement
   - Only one tuning parameter (λ)
   - Efficient to compute with a B-spline basis, since the resulting matrices are banded