# Nonlinear Models Uncertainty Quantification Demo

This notebook demonstrates comprehensive uncertainty quantification for nonlinear regression models, comparing:
1. **Delta Method (Asymptotic Theory)** - using scipy.optimize.curve_fit and asymptotic statistics
2. **Conformal Prediction** - distribution-free calibrated prediction intervals

We'll use exponential decay as our nonlinear function and explore both homoskedastic and heteroskedastic noise scenarios.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy import stats
import sys
sys.path.insert(0, '../..')

from src.datasets.nonlinear import ExponentialDecayDataset
from src.models.nonlinear_models import NonlinearModel, exponential_decay_func
from src.uq_methods.nonlinear_uq import NlinfitUQ, ConformalPredictionNonlinear
from src.visualization import setup_plot_style
from src.metrics import picp, mean_interval_width, root_mean_squared_error, r2_score
from src.utils.seeds import set_global_seed

# Set up plotting
setup_plot_style()
%matplotlib inline

## 1. Nonlinear Dataset Generation

We'll create an exponential decay dataset with the form:
$$y = a \cdot e^{-b \cdot x} + c$$

where $(a, b, c)$ are parameters to be estimated. We'll start with homoskedastic noise (constant variance).
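
As a rough illustration of what such a dataset looks like, the sketch below draws noisy samples from the same functional form using plain NumPy. It is not the `ExponentialDecayDataset` implementation: the uniform sampling of $x$ on $[0, 1]$, the reading of the noise level as an absolute standard deviation, and the names `exp_decay_clean`, `x_demo`, and `y_demo` are all assumptions made for this example.

```python
import numpy as np


def exp_decay_clean(x, a=2.0, b=3.0, c=0.5):
    """Noise-free exponential decay: y = a * exp(-b * x) + c."""
    return a * np.exp(-b * x) + c


# Hypothetical stand-in for the dataset class: uniform x on [0, 1] plus
# Gaussian noise with one fixed standard deviation (homoskedastic).
rng = np.random.default_rng(42)
x_demo = rng.uniform(0.0, 1.0, size=100)
noise_sd = 0.08  # assumed here to be an absolute standard deviation
y_demo = exp_decay_clean(x_demo) + rng.normal(0.0, noise_sd, size=x_demo.shape)

# At x = 0 the clean curve equals a + c; for large x it approaches c.
print(exp_decay_clean(np.array([0.0, 10.0])))
```

The offset $c$ is the asymptote of the curve, which is why data near $x = 1$ carries most of the information about it.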

In [None]:
# Set random seed for reproducibility
set_global_seed(42)

# Create exponential decay dataset with homoskedastic noise
dataset = ExponentialDecayDataset(
    n_samples=100,
    noise_model='homoskedastic',
    noise_level=0.08,  # 8% noise
    seed=42,
    a=2.0,  # Amplitude
    b=3.0,  # Decay rate
    c=0.5   # Offset
)

# Generate data
data = dataset.generate()

# Print information
print(f"Dataset: {dataset.get_function_form()}")
print(f"True parameters: {dataset.get_parameters()}")
print(f"\nTraining samples: {len(data.X_train)}")
print(f"Test samples: {len(data.X_test)}")
print(f"Noise model: {dataset.noise_model}")
print(f"Noise level: {dataset.noise_level}")

### Visualize the Dataset

Let's visualize the generated data with training points, test regions, and the true underlying function.

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))

# Plot training data
ax.scatter(data.X_train, data.y_train, alpha=0.6, s=50, label='Training data', 
           color='steelblue', edgecolors='black', linewidth=0.5)

# Plot true function
X_plot = np.linspace(0, 1, 500)
y_true = dataset._generate_clean(X_plot)
ax.plot(X_plot, y_true, 'k-', linewidth=2.5, label='True function', zorder=10)

# Shade test regions
ax.axvspan(0, 0.125, alpha=0.15, color='red', label='Extrapolation (low)')
ax.axvspan(0.875, 1.0, alpha=0.15, color='green', label='Extrapolation (high)')
ax.axvspan(0.375, 0.625, alpha=0.15, color='orange', label='Gap (interpolation)')

ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title('Exponential Decay Dataset with Homoskedastic Noise', fontsize=14, fontweight='bold')
ax.legend(loc='upper right', fontsize=10)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## 2. Nonlinear Least Squares Fitting with scipy.optimize.curve_fit

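We'll fit the exponential decay model using `scipy.optimize.curve_fit`. For unconstrained problems like this one, it defaults to the Levenberg-Marquardt algorithm (`method='lm'`) for nonlinear least squares optimization; when bounds are supplied it switches to a trust-region method instead.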
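
The covariance matrix returned by `curve_fit` is reused later for uncertainty estimates, so it helps to know how it is scaled. The sketch below, on synthetic data invented for this example, checks that with the default `absolute_sigma=False` and no `sigma` argument the returned matrix is close to $s^2 (J^T J)^{-1}$, where $s^2$ is the residual variance and $J$ the Jacobian at the fitted parameters. The helper `jacobian` and all variable names are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit


def exp_decay(x, a, b, c):
    return a * np.exp(-b * x) + c


def jacobian(x, a, b, c):
    """Analytic derivatives of exp_decay with respect to (a, b, c)."""
    e = np.exp(-b * x)
    return np.column_stack([e, -a * x * e, np.ones_like(x)])


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 80)
y = exp_decay(x, 2.0, 3.0, 0.5) + rng.normal(0.0, 0.08, size=x.size)

popt, pcov = curve_fit(exp_decay, x, y, p0=[1.5, 2.5, 0.3])

# Residual variance with n - p degrees of freedom.
resid = y - exp_decay(x, *popt)
s2 = np.sum(resid**2) / (x.size - popt.size)

# Manual covariance: s^2 * (J^T J)^{-1}, evaluated at the fitted parameters.
J = jacobian(x, *popt)
pcov_manual = s2 * np.linalg.inv(J.T @ J)

rel_diff = np.linalg.norm(pcov - pcov_manual) / np.linalg.norm(pcov)
print(f"relative difference: {rel_diff:.2e}")
```

Because the residual variance is already folded into `pcov`, its diagonal can be used directly for parameter standard errors.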

In [None]:
# Flatten data for fitting
X_train_flat = data.X_train.flatten()
y_train_flat = data.y_train.flatten()
X_test_flat = data.X_test.flatten()
y_test_flat = data.y_test.flatten()

# Define the model function (for curve_fit)
def exp_decay(x, a, b, c):
    """Exponential decay: y = a * exp(-b * x) + c"""
    return a * np.exp(-b * x) + c

# Initial guess for parameters
initial_guess = [1.5, 2.5, 0.3]

# Fit using curve_fit
params_fit, cov_matrix = curve_fit(
    exp_decay, 
    X_train_flat, 
    y_train_flat, 
    p0=initial_guess,
    maxfev=10000
)

# Extract fitted parameters
a_fit, b_fit, c_fit = params_fit

# Compute predictions
y_pred_train = exp_decay(X_train_flat, *params_fit)
y_pred_test = exp_decay(X_test_flat, *params_fit)

# Compute residuals and statistics
residuals = y_train_flat - y_pred_train
n = len(y_train_flat)
p = len(params_fit)
dof = n - p  # degrees of freedom
sigma_squared = np.sum(residuals**2) / dof
sigma = np.sqrt(sigma_squared)

# Compute parameter standard errors
param_std_errors = np.sqrt(np.diag(cov_matrix))

print("Fitted Parameters:")
print(f"  a = {a_fit:.4f} ± {param_std_errors[0]:.4f} (true: {dataset.a})")
print(f"  b = {b_fit:.4f} ± {param_std_errors[1]:.4f} (true: {dataset.b})")
print(f"  c = {c_fit:.4f} ± {param_std_errors[2]:.4f} (true: {dataset.c})")
print(f"\nResidual standard error: {sigma:.4f}")
print(f"Degrees of freedom: {dof}")
print(f"\nTraining RMSE: {np.sqrt(np.mean(residuals**2)):.4f}")
print(f"Training R²: {r2_score(y_train_flat, y_pred_train):.4f}")

### Computing Prediction Intervals Using the Delta Method

The delta method uses asymptotic theory to compute prediction intervals. For a nonlinear model $f(x, \theta)$, the variance of the prediction error at a new point $x_{new}$ is approximately:

$$\text{Var}[y_{new} - \hat{y}_{new}] \approx \sigma^2 + \nabla f(x_{new}, \hat{\theta})^T \, \text{Cov}(\hat{\theta}) \, \nabla f(x_{new}, \hat{\theta})$$

where:
- $\sigma^2$ is the residual variance
- $\nabla f$ is the gradient of $f$ with respect to the parameters (one row of the Jacobian)
- $\text{Cov}(\hat{\theta})$ is the parameter covariance matrix. The matrix returned by `curve_fit` with its default `absolute_sigma=False` is already scaled by the estimated $\sigma^2$, so the parameter term is added to $\sigma^2$ rather than multiplied by it

The prediction interval is then: $\hat{y}_{new} \pm t_{1-\alpha/2, n-p} \cdot \sqrt{\text{Var}[y_{new} - \hat{y}_{new}]}$
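
The per-point quadratic form can also be evaluated for all new points at once. The following is a minimal sketch on synthetic data, not the notebook's own pipeline. It assumes the covariance comes from `curve_fit` with the default `absolute_sigma=False`, which is already scaled by the residual variance, so the residual variance is added to the parameter term rather than multiplied with it. The names `jacobian`, `var_mean`, and `var_pred` are illustrative.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit


def exp_decay(x, a, b, c):
    return a * np.exp(-b * x) + c


def jacobian(x, a, b, c):
    """Rows are gradients of exp_decay with respect to (a, b, c)."""
    e = np.exp(-b * x)
    return np.column_stack([e, -a * x * e, np.ones_like(x)])


rng = np.random.default_rng(1)
x_fit = rng.uniform(0.0, 1.0, 100)
y_fit = exp_decay(x_fit, 2.0, 3.0, 0.5) + rng.normal(0.0, 0.08, 100)
popt, pcov = curve_fit(exp_decay, x_fit, y_fit, p0=[1.5, 2.5, 0.3])

dof = x_fit.size - popt.size
s2 = np.sum((y_fit - exp_decay(x_fit, *popt)) ** 2) / dof

x_new = np.linspace(0.0, 1.0, 50)
J_new = jacobian(x_new, *popt)

# Row-wise quadratic form g_i^T pcov g_i, computed without a Python loop.
var_mean = np.einsum("ij,jk,ik->i", J_new, pcov, J_new)
# Prediction variance: residual noise plus parameter uncertainty.
var_pred = s2 + var_mean

half_width = stats.t.ppf(0.975, dof) * np.sqrt(var_pred)
y_hat = exp_decay(x_new, *popt)
lower, upper = y_hat - half_width, y_hat + half_width
```

Dropping the `s2` term from `var_pred` would give a confidence band for the fitted curve instead of a prediction interval for new observations.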

In [None]:
def compute_jacobian(x, params):
    """
    Compute Jacobian matrix of exponential decay function.
    
    Returns: J[i, j] = d(f_i)/d(param_j)
    """
    a, b, c = params
    J = np.zeros((len(x), len(params)))
    
    exp_term = np.exp(-b * x)
    
    # df/da = exp(-b*x)
    J[:, 0] = exp_term
    
    # df/db = -a*x*exp(-b*x)
    J[:, 1] = -a * x * exp_term
    
    # df/dc = 1
    J[:, 2] = 1.0
    
    return J

# Compute Jacobian for test points
J_test = compute_jacobian(X_test_flat, params_fit)

# Compute prediction variance using delta method.
# cov_matrix from curve_fit (absolute_sigma=False) is already scaled by sigma^2,
# so Var[y_new - y_pred] = sigma^2 + J * Cov(params) * J^T
pred_var_delta = np.zeros(len(X_test_flat))
for i in range(len(X_test_flat)):
    # Variance from parameter uncertainty
    param_contrib = J_test[i:i+1, :] @ cov_matrix @ J_test[i:i+1, :].T
    # Total prediction variance (residual + parameter)
    pred_var_delta[i] = sigma_squared + param_contrib[0, 0]

pred_std_delta = np.sqrt(pred_var_delta)

# Compute prediction intervals using t-distribution
alpha = 0.05  # for 95% confidence
t_crit = stats.t.ppf(1 - alpha/2, dof)
margin_delta = t_crit * pred_std_delta

y_lower_delta = y_pred_test - margin_delta
y_upper_delta = y_pred_test + margin_delta

# Compute metrics
coverage_delta = np.mean((y_test_flat >= y_lower_delta) & (y_test_flat <= y_upper_delta))
mean_width_delta = np.mean(y_upper_delta - y_lower_delta)
rmse_delta = root_mean_squared_error(y_test_flat, y_pred_test)
r2_delta = r2_score(y_test_flat, y_pred_test)

print("Delta Method Results (95% prediction intervals):")
print(f"  Coverage: {coverage_delta:.3f}")
print(f"  Mean interval width: {mean_width_delta:.4f}")
print(f"  Test RMSE: {rmse_delta:.4f}")
print(f"  Test R²: {r2_delta:.4f}")
print(f"  t critical value (α={alpha}, df={dof}): {t_crit:.3f}")

### Visualize Delta Method Prediction Intervals

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))

# Sort test data for plotting
sort_idx = np.argsort(X_test_flat)
X_test_sorted = X_test_flat[sort_idx]
y_test_sorted = y_test_flat[sort_idx]
y_pred_sorted = y_pred_test[sort_idx]
y_lower_sorted = y_lower_delta[sort_idx]
y_upper_sorted = y_upper_delta[sort_idx]

# Plot prediction intervals
ax.fill_between(X_test_sorted, y_lower_sorted, y_upper_sorted, 
                alpha=0.3, color='steelblue', label='95% Prediction interval')

# Plot fitted curve
ax.plot(X_test_sorted, y_pred_sorted, 'b-', linewidth=2, label='Fitted curve')

# Plot true function
X_plot = np.linspace(0, 1, 500)
y_true = dataset._generate_clean(X_plot)
ax.plot(X_plot, y_true, 'k--', linewidth=2, label='True function', alpha=0.7)

# Plot training data
ax.scatter(X_train_flat, y_train_flat, alpha=0.6, s=50, 
           color='steelblue', edgecolors='black', linewidth=0.5, label='Training data')

# Plot test data
ax.scatter(X_test_flat, y_test_flat, alpha=0.4, s=20, 
           color='orange', label='Test data')

ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title('Delta Method Prediction Intervals (Asymptotic Theory)', 
             fontsize=14, fontweight='bold')
ax.legend(loc='upper right', fontsize=10)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"Coverage: {coverage_delta:.1%} (target: 95%)")

## 3. Conformal Prediction for Nonlinear Models

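Conformal prediction is a distribution-free method that provides prediction intervals with guaranteed marginal coverage (averaged over all test points), provided the calibration and test points are exchangeable. Unlike the delta method, it makes no assumptions about the error distribution. Note that test points lying in extrapolation or gap regions that the training data does not cover are not exchangeable with the calibration data, so the guarantee does not strictly apply to them.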

### How Conformal Prediction Works:

1. **Split the training data** into proper training set and calibration set
2. **Fit the model** on the proper training set
3. **Compute nonconformity scores** on the calibration set: $s_i = |y_i - \hat{f}(x_i)|$
4. **Compute the quantile** of these scores at level $\lceil (n_{cal} + 1)(1-\alpha) \rceil / n_{cal}$
5. **Form prediction intervals**: $\hat{y}_{new} \pm \text{quantile}$

This provides **finite-sample marginal coverage guarantees** rather than asymptotic ones.
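
The five steps above can be sketched with NumPy and SciPy alone. This is a minimal illustration under stated assumptions, not the `ConformalPredictionNonlinear` implementation: the data is synthetic, the 80/20 split is arbitrary, and `conformal_quantile` is a hypothetical helper that picks the $\lceil (n_{cal} + 1)(1-\alpha) \rceil$-th smallest calibration score.

```python
import math

import numpy as np
from scipy.optimize import curve_fit


def exp_decay(x, a, b, c):
    return a * np.exp(-b * x) + c


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected quantile of the calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic
    if k > n:
        return np.inf  # too few calibration points for this alpha
    return float(np.sort(scores)[k - 1])


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y = exp_decay(x, 2.0, 3.0, 0.5) + rng.normal(0.0, 0.08, 200)

# Steps 1-2: split the data, then fit on the proper training part only.
idx = rng.permutation(x.size)
train, cal = idx[:160], idx[160:]
popt, _ = curve_fit(exp_decay, x[train], y[train], p0=[1.5, 2.5, 0.3])

# Steps 3-4: absolute residuals on the calibration part and their quantile.
scores = np.abs(y[cal] - exp_decay(x[cal], *popt))
q_hat = conformal_quantile(scores, alpha=0.05)

# Step 5: the same half-width q_hat is used around every new prediction.
x_new = np.array([0.1, 0.5, 0.9])
y_hat = exp_decay(x_new, *popt)
lower, upper = y_hat - q_hat, y_hat + q_hat
```

With absolute residuals as scores, every interval has the same width `2 * q_hat`, regardless of where `x_new` lies.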

In [None]:
# Create nonlinear model using our framework
model = NonlinearModel(
    model_func=exponential_decay_func,
    param_names=['a', 'b', 'c'],
    initial_guess=[1.5, 2.5, 0.3]
)

# Fit the model
model.fit(data.X_train, data.y_train)

print("Model fitted successfully!")
print(f"Fitted parameters: a={model.params[0]:.4f}, b={model.params[1]:.4f}, c={model.params[2]:.4f}")

In [None]:
# Apply conformal prediction
conformal_uq = ConformalPredictionNonlinear(
    confidence_level=0.95,
    calibration_fraction=0.2  # Use 20% of training data for calibration
)

# Compute prediction intervals
result_conformal = conformal_uq.compute_intervals(
    model, 
    data.X_train, 
    data.y_train, 
    data.X_test
)

# Compute metrics
coverage_conformal = picp(data.y_test, result_conformal)
mean_width_conformal = mean_interval_width(result_conformal.y_lower, result_conformal.y_upper)
rmse_conformal = root_mean_squared_error(data.y_test.flatten(), result_conformal.y_pred)
r2_conformal = r2_score(data.y_test.flatten(), result_conformal.y_pred)

print("Conformal Prediction Results (95% prediction intervals):")
print(f"  Coverage: {coverage_conformal:.3f}")
print(f"  Mean interval width: {mean_width_conformal:.4f}")
print(f"  Test RMSE: {rmse_conformal:.4f}")
print(f"  Test R²: {r2_conformal:.4f}")
print(f"  Quantile used: {result_conformal.metadata['quantile']:.4f}")
print(f"  Calibration samples: {result_conformal.metadata['n_calibration']}")

### Visualize Conformal Prediction Intervals

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))

# Sort for plotting
sort_idx = np.argsort(X_test_flat)
X_test_sorted = X_test_flat[sort_idx]
y_test_sorted = y_test_flat[sort_idx]
y_pred_conf_sorted = result_conformal.y_pred[sort_idx]
y_lower_conf_sorted = result_conformal.y_lower[sort_idx]
y_upper_conf_sorted = result_conformal.y_upper[sort_idx]

# Plot prediction intervals
ax.fill_between(X_test_sorted, y_lower_conf_sorted, y_upper_conf_sorted, 
                alpha=0.3, color='forestgreen', label='95% Prediction interval')

# Plot fitted curve
ax.plot(X_test_sorted, y_pred_conf_sorted, 'g-', linewidth=2, label='Fitted curve')

# Plot true function
X_plot = np.linspace(0, 1, 500)
y_true = dataset._generate_clean(X_plot)
ax.plot(X_plot, y_true, 'k--', linewidth=2, label='True function', alpha=0.7)

# Plot training data
ax.scatter(X_train_flat, y_train_flat, alpha=0.6, s=50, 
           color='steelblue', edgecolors='black', linewidth=0.5, label='Training data')

# Plot test data
ax.scatter(X_test_flat, y_test_flat, alpha=0.4, s=20, 
           color='orange', label='Test data')

ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title('Conformal Prediction Intervals (Distribution-Free)', 
             fontsize=14, fontweight='bold')
ax.legend(loc='upper right', fontsize=10)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"Coverage: {coverage_conformal:.1%} (target: 95%)")

## 4. Side-by-Side Comparison

Let's compare both methods directly on the same plot.

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Sort data for plotting
sort_idx = np.argsort(X_test_flat)
X_sorted = X_test_flat[sort_idx]
y_sorted = y_test_flat[sort_idx]

# True function
X_plot = np.linspace(0, 1, 500)
y_true = dataset._generate_clean(X_plot)

# --- Delta Method Plot ---
ax = axes[0]
ax.fill_between(X_sorted, y_lower_delta[sort_idx], y_upper_delta[sort_idx], 
                alpha=0.3, color='steelblue', label='95% PI')
ax.plot(X_sorted, y_pred_test[sort_idx], 'b-', linewidth=2, label='Fitted curve')
ax.plot(X_plot, y_true, 'k--', linewidth=2, label='True function', alpha=0.7)
ax.scatter(X_train_flat, y_train_flat, alpha=0.5, s=40, 
           color='steelblue', edgecolors='black', linewidth=0.5)
ax.scatter(X_test_flat, y_test_flat, alpha=0.3, s=15, color='orange')
ax.set_xlabel('x', fontsize=11)
ax.set_ylabel('y', fontsize=11)
ax.set_title(f'Delta Method\nCoverage: {coverage_delta:.1%} | Width: {mean_width_delta:.4f}', 
             fontsize=12, fontweight='bold')
ax.legend(loc='upper right', fontsize=9)
ax.grid(True, alpha=0.3)

# --- Conformal Prediction Plot ---
ax = axes[1]
ax.fill_between(X_sorted, result_conformal.y_lower[sort_idx], result_conformal.y_upper[sort_idx], 
                alpha=0.3, color='forestgreen', label='95% PI')
ax.plot(X_sorted, result_conformal.y_pred[sort_idx], 'g-', linewidth=2, label='Fitted curve')
ax.plot(X_plot, y_true, 'k--', linewidth=2, label='True function', alpha=0.7)
ax.scatter(X_train_flat, y_train_flat, alpha=0.5, s=40, 
           color='steelblue', edgecolors='black', linewidth=0.5)
ax.scatter(X_test_flat, y_test_flat, alpha=0.3, s=15, color='orange')
ax.set_xlabel('x', fontsize=11)
ax.set_ylabel('y', fontsize=11)
ax.set_title(f'Conformal Prediction\nCoverage: {coverage_conformal:.1%} | Width: {mean_width_conformal:.4f}', 
             fontsize=12, fontweight='bold')
ax.legend(loc='upper right', fontsize=9)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 5. Metrics Comparison

Let's create a comprehensive comparison table of all metrics.

In [None]:
import pandas as pd

# Create comparison DataFrame
metrics_comparison = pd.DataFrame({
    'Method': ['Delta Method', 'Conformal'],
    'Coverage': [coverage_delta, coverage_conformal],
    'RMSE': [rmse_delta, rmse_conformal],
    'R²': [r2_delta, r2_conformal],
    'Mean Width': [mean_width_delta, mean_width_conformal],
})

print("\n" + "="*70)
print("METRICS COMPARISON - Homoskedastic Noise".center(70))
print("="*70)
print(metrics_comparison.to_string(index=False))
print("="*70)
print(f"Target Coverage: 0.95\n")

# Visualize metrics
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

methods = ['Delta Method', 'Conformal']
colors = ['steelblue', 'forestgreen']

# Coverage plot
ax = axes[0]
bars = ax.bar(methods, [coverage_delta, coverage_conformal], color=colors, alpha=0.7, edgecolor='black')
ax.axhline(0.95, color='red', linestyle='--', linewidth=2, label='Target (0.95)')
ax.set_ylabel('Coverage', fontsize=11)
ax.set_title('Coverage Comparison', fontsize=12, fontweight='bold')
ax.set_ylim([0.85, 1.0])
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
for bar in bars:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{height:.3f}', ha='center', va='bottom', fontsize=10, fontweight='bold')

# RMSE plot
ax = axes[1]
bars = ax.bar(methods, [rmse_delta, rmse_conformal], color=colors, alpha=0.7, edgecolor='black')
ax.set_ylabel('RMSE', fontsize=11)
ax.set_title('Prediction Accuracy (RMSE)', fontsize=12, fontweight='bold')
ax.grid(True, alpha=0.3, axis='y')
for bar in bars:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{height:.4f}', ha='center', va='bottom', fontsize=10, fontweight='bold')

# Mean Width plot
ax = axes[2]
bars = ax.bar(methods, [mean_width_delta, mean_width_conformal], color=colors, alpha=0.7, edgecolor='black')
ax.set_ylabel('Mean Interval Width', fontsize=11)
ax.set_title('Interval Efficiency', fontsize=12, fontweight='bold')
ax.grid(True, alpha=0.3, axis='y')
for bar in bars:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{height:.4f}', ha='center', va='bottom', fontsize=10, fontweight='bold')

plt.tight_layout()
plt.show()

## 6. Heteroskedastic Noise Scenario

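Now let's test both methods with heteroskedastic (non-constant) noise, where the variance changes with x. This is a more challenging scenario: it violates the constant-variance assumption behind the delta method intervals, and although split conformal prediction still targets 95% coverage on average, its intervals keep a constant width and so cannot track the local noise level.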
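
One way to see the effect of non-constant variance is to measure coverage separately in bins of x. The sketch below uses synthetic data with a noise standard deviation that grows linearly in x, which is an assumed form and may differ from what `ExponentialDecayDataset` generates. The helper `coverage_by_bin` is hypothetical and written for this example. It shows that a constant-width band can hit 95% overall while overcovering where noise is small and undercovering where it is large.

```python
import numpy as np


def coverage_by_bin(x, y, lower, upper, edges):
    """Empirical coverage of [lower, upper] within each x-bin."""
    covered = (y >= lower) & (y <= upper)
    bin_idx = np.digitize(x, edges[1:-1])  # values 0 .. len(edges) - 2
    return np.array([
        covered[bin_idx == b].mean() if np.any(bin_idx == b) else np.nan
        for b in range(len(edges) - 1)
    ])


# Illustrative data: noise standard deviation increases with x.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 20000)
mean = 2.0 * np.exp(-3.0 * x) + 0.5
y = mean + rng.normal(0.0, 0.02 + 0.2 * x)

# A constant-width band tuned to cover 95% of the points overall.
half_width = np.quantile(np.abs(y - mean), 0.95)
edges = np.linspace(0.0, 1.0, 5)  # four equal-width bins on [0, 1]
per_bin = coverage_by_bin(x, y, mean - half_width, mean + half_width, edges)
overall = np.mean(np.abs(y - mean) <= half_width)

print("overall coverage:", round(float(overall), 3))
print("per-bin coverage:", np.round(per_bin, 3))
```

The same diagnostic can be applied to the delta method and conformal intervals computed in this section to check where each method loses coverage.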

In [None]:
# Create dataset with heteroskedastic noise
dataset_hetero = ExponentialDecayDataset(
    n_samples=100,
    noise_model='heteroskedastic',
    noise_level=0.08,
    seed=43,
    a=2.0,
    b=3.0,
    c=0.5
)

# Generate data
data_hetero = dataset_hetero.generate()

print(f"Dataset: {dataset_hetero.get_function_form()}")
print(f"Noise model: {dataset_hetero.noise_model}")
print(f"Training samples: {len(data_hetero.X_train)}")
print(f"Test samples: {len(data_hetero.X_test)}")

In [None]:
# Visualize heteroskedastic data
fig, ax = plt.subplots(figsize=(12, 6))

ax.scatter(data_hetero.X_train, data_hetero.y_train, alpha=0.6, s=50, 
           label='Training data', color='steelblue', edgecolors='black', linewidth=0.5)

X_plot = np.linspace(0, 1, 500)
y_true = dataset_hetero._generate_clean(X_plot)
ax.plot(X_plot, y_true, 'k-', linewidth=2.5, label='True function', zorder=10)

ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title('Exponential Decay Dataset with Heteroskedastic Noise', 
             fontsize=14, fontweight='bold')
ax.legend(loc='upper right', fontsize=10)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("Note: Variance increases with x (heteroskedastic)")

In [None]:
# Fit model with heteroskedastic data using scipy curve_fit
X_train_hetero_flat = data_hetero.X_train.flatten()
y_train_hetero_flat = data_hetero.y_train.flatten()
X_test_hetero_flat = data_hetero.X_test.flatten()
y_test_hetero_flat = data_hetero.y_test.flatten()

params_hetero, cov_hetero = curve_fit(
    exp_decay, 
    X_train_hetero_flat, 
    y_train_hetero_flat, 
    p0=initial_guess,
    maxfev=10000
)

y_pred_train_hetero = exp_decay(X_train_hetero_flat, *params_hetero)
y_pred_test_hetero = exp_decay(X_test_hetero_flat, *params_hetero)

residuals_hetero = y_train_hetero_flat - y_pred_train_hetero
n_hetero = len(y_train_hetero_flat)
p_hetero = len(params_hetero)
dof_hetero = n_hetero - p_hetero
sigma_hetero = np.sqrt(np.sum(residuals_hetero**2) / dof_hetero)

# Delta method intervals
# cov_hetero from curve_fit is already scaled by the residual variance,
# so the parameter term is added to sigma^2, not multiplied by it
J_test_hetero = compute_jacobian(X_test_hetero_flat, params_hetero)
pred_var_delta_hetero = np.zeros(len(X_test_hetero_flat))
for i in range(len(X_test_hetero_flat)):
    param_contrib = J_test_hetero[i:i+1, :] @ cov_hetero @ J_test_hetero[i:i+1, :].T
    pred_var_delta_hetero[i] = sigma_hetero**2 + param_contrib[0, 0]

pred_std_delta_hetero = np.sqrt(pred_var_delta_hetero)
t_crit_hetero = stats.t.ppf(0.975, dof_hetero)
margin_delta_hetero = t_crit_hetero * pred_std_delta_hetero

y_lower_delta_hetero = y_pred_test_hetero - margin_delta_hetero
y_upper_delta_hetero = y_pred_test_hetero + margin_delta_hetero

coverage_delta_hetero = np.mean((y_test_hetero_flat >= y_lower_delta_hetero) & 
                                 (y_test_hetero_flat <= y_upper_delta_hetero))
width_delta_hetero = np.mean(y_upper_delta_hetero - y_lower_delta_hetero)

print(f"Delta Method (heteroskedastic): Coverage = {coverage_delta_hetero:.3f}, Width = {width_delta_hetero:.4f}")

In [None]:
# Conformal prediction with heteroskedastic data
model_hetero = NonlinearModel(
    model_func=exponential_decay_func,
    param_names=['a', 'b', 'c'],
    initial_guess=[1.5, 2.5, 0.3]
)

model_hetero.fit(data_hetero.X_train, data_hetero.y_train)

result_conformal_hetero = conformal_uq.compute_intervals(
    model_hetero, 
    data_hetero.X_train, 
    data_hetero.y_train, 
    data_hetero.X_test
)

coverage_conformal_hetero = picp(data_hetero.y_test, result_conformal_hetero)
width_conformal_hetero = mean_interval_width(result_conformal_hetero.y_lower, 
                                               result_conformal_hetero.y_upper)

print(f"Conformal (heteroskedastic): Coverage = {coverage_conformal_hetero:.3f}, Width = {width_conformal_hetero:.4f}")

### Visualize Heteroskedastic Results

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

sort_idx = np.argsort(X_test_hetero_flat)
X_sorted = X_test_hetero_flat[sort_idx]
y_sorted = y_test_hetero_flat[sort_idx]

X_plot = np.linspace(0, 1, 500)
y_true = dataset_hetero._generate_clean(X_plot)

# Delta Method
ax = axes[0]
ax.fill_between(X_sorted, y_lower_delta_hetero[sort_idx], y_upper_delta_hetero[sort_idx], 
                alpha=0.3, color='steelblue', label='95% PI')
ax.plot(X_sorted, y_pred_test_hetero[sort_idx], 'b-', linewidth=2, label='Fitted curve')
ax.plot(X_plot, y_true, 'k--', linewidth=2, label='True function', alpha=0.7)
ax.scatter(X_train_hetero_flat, y_train_hetero_flat, alpha=0.5, s=40, 
           color='steelblue', edgecolors='black', linewidth=0.5)
ax.scatter(X_test_hetero_flat, y_test_hetero_flat, alpha=0.3, s=15, color='orange')
ax.set_xlabel('x', fontsize=11)
ax.set_ylabel('y', fontsize=11)
ax.set_title(f'Delta Method (Heteroskedastic)\nCoverage: {coverage_delta_hetero:.1%} | Width: {width_delta_hetero:.4f}', 
             fontsize=12, fontweight='bold')
ax.legend(loc='upper right', fontsize=9)
ax.grid(True, alpha=0.3)

# Conformal Prediction
ax = axes[1]
ax.fill_between(X_sorted, result_conformal_hetero.y_lower[sort_idx], 
                result_conformal_hetero.y_upper[sort_idx], 
                alpha=0.3, color='forestgreen', label='95% PI')
ax.plot(X_sorted, result_conformal_hetero.y_pred[sort_idx], 'g-', linewidth=2, label='Fitted curve')
ax.plot(X_plot, y_true, 'k--', linewidth=2, label='True function', alpha=0.7)
ax.scatter(X_train_hetero_flat, y_train_hetero_flat, alpha=0.5, s=40, 
           color='steelblue', edgecolors='black', linewidth=0.5)
ax.scatter(X_test_hetero_flat, y_test_hetero_flat, alpha=0.3, s=15, color='orange')
ax.set_xlabel('x', fontsize=11)
ax.set_ylabel('y', fontsize=11)
ax.set_title(f'Conformal (Heteroskedastic)\nCoverage: {coverage_conformal_hetero:.1%} | Width: {width_conformal_hetero:.4f}', 
             fontsize=12, fontweight='bold')
ax.legend(loc='upper right', fontsize=9)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 7. Final Summary and Comparison

Let's create a comprehensive summary comparing both methods across both noise scenarios.

In [None]:
# Create comprehensive comparison
summary_df = pd.DataFrame({
    'Method': ['Delta Method', 'Conformal', 'Delta Method', 'Conformal'],
    'Noise Type': ['Homoskedastic', 'Homoskedastic', 'Heteroskedastic', 'Heteroskedastic'],
    'Coverage': [coverage_delta, coverage_conformal, 
                 coverage_delta_hetero, coverage_conformal_hetero],
    'Mean Width': [mean_width_delta, mean_width_conformal,
                   width_delta_hetero, width_conformal_hetero],
    'RMSE': [rmse_delta, rmse_conformal,
             root_mean_squared_error(y_test_hetero_flat, y_pred_test_hetero),
             root_mean_squared_error(data_hetero.y_test.flatten(), result_conformal_hetero.y_pred)]
})

print("\n" + "="*80)
print("COMPREHENSIVE COMPARISON: Delta Method vs Conformal Prediction".center(80))
print("="*80)
print(summary_df.to_string(index=False))
print("="*80)
print(f"Target Coverage: 0.95")
print("\nKey Observations:")
print("  - Delta method assumes normally distributed errors with constant variance")
print("  - Conformal prediction is distribution-free but needs exchangeable calibration and test data")
print("  - Both methods are expected to reach near-target coverage under homoskedastic noise")
print("  - Under heteroskedastic noise, compare the coverage rows above: conformal targets marginal coverage only")
print("="*80)

## Summary

This notebook demonstrated:

### 1. Nonlinear Dataset Generation
- Created exponential decay datasets with controlled parameters
- Explored both homoskedastic and heteroskedastic noise models
- Visualized data structure with training/test/gap regions

### 2. Delta Method (Asymptotic Theory)
- Used `scipy.optimize.curve_fit` for nonlinear least squares fitting
- Computed prediction intervals using the delta method:
  - Jacobian matrix for parameter sensitivity
  - Covariance matrix from optimization
  - t-distribution for finite sample correction
- Assumptions: normally distributed errors, constant variance

### 3. Conformal Prediction
- Distribution-free method with finite-sample marginal coverage guarantees under exchangeability
- Split conformal approach:
  - Proper training set for model fitting
  - Calibration set for quantile estimation
- No parametric assumptions about error distribution

### 4. Key Findings

**Homoskedastic Noise:**
- Both methods achieve good coverage (close to 95%)
- Delta method typically produces slightly narrower intervals
- Similar prediction accuracy (RMSE, R²)

**Heteroskedastic Noise:**
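- Delta method may undercover where the noise is larger than average, because it assumes constant variance
- Conformal prediction keeps its marginal coverage guarantee as long as calibration and test data are exchangeable
- With absolute-residual scores the conformal intervals have constant width, so they can be too wide in low-noise regions and too narrow in high-noise ones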

### 5. Method Selection Guidelines

**Choose Delta Method when:**
- Errors are approximately normal with constant variance
- You need the tightest possible intervals
- Computational efficiency is critical
- You want parameter uncertainty quantification

**Choose Conformal Prediction when:**
- Error distribution is unknown or non-normal
- Heteroskedastic noise is present and marginal (average) coverage is sufficient
- Finite-sample marginal coverage guarantees are needed
- Distribution-free methods are preferred
- Robustness is more important than efficiency