# SciPy Interpolation and Curve Fitting

## Learning Objectives

By the end of this notebook, you will be able to:

1. Use `interp1d` for 1D interpolation with various methods
2. Fit curves to data using `curve_fit`
3. Work with splines for smooth data approximation
4. Perform polynomial fitting and understand its limitations
5. Apply interpolation and fitting to real scientific data

---

## 1. Introduction to Interpolation

Interpolation estimates values between known data points. It is useful when you need to:
- Fill in intermediate values from sparse data
- Approximate noisy measurements with a smooth curve
- Resample data to a different resolution
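
To make the idea concrete, here is a minimal sketch of linear interpolation between two neighbouring points. The readings are hypothetical values chosen for illustration, and `lerp` is a helper name introduced here, not a SciPy function. The hand-computed result is compared with NumPy's `np.interp`.

```python
import numpy as np

# Hypothetical readings: hour of day and temperature in °C
x_known = np.array([8.0, 12.0])
y_known = np.array([14.0, 22.0])

def lerp(x, x0, x1, y0, y1):
    """Linear interpolation between (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

# Estimate the temperature at 10:00, halfway between the two readings
by_hand = lerp(10.0, x_known[0], x_known[1], y_known[0], y_known[1])
with_numpy = np.interp(10.0, x_known, y_known)

print(by_hand, with_numpy)
```

Both approaches should give 18.0, the midpoint of 14 and 22, because 10:00 lies halfway between the two readings.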

In [None]:
import numpy as np
from scipy import interpolate
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Set random seed for reproducibility
np.random.seed(42)

# Set up plotting style
plt.style.use('seaborn-v0_8-whitegrid')

print("SciPy interpolation and fitting tools loaded!")

---

## 2. 1D Interpolation with interp1d

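The `interp1d` class takes discrete data points and returns a callable that can be evaluated at any point inside the data range. Recent SciPy documentation marks `interp1d` as legacy and points new code toward alternatives such as `np.interp` or `make_interp_spline`; it is used here because its interface is simple and it remains widely seen in existing code.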

### 2.1 Basic Linear Interpolation

In [None]:
# Example: Temperature measurements at specific times
hours = np.array([0, 4, 8, 12, 16, 20, 24])
temperatures = np.array([15, 12, 14, 22, 25, 20, 16])

# Create linear interpolation function
f_linear = interpolate.interp1d(hours, temperatures, kind='linear')

# Interpolate at new points
hours_new = np.linspace(0, 24, 100)
temp_interp = f_linear(hours_new)

# Plot
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(hours, temperatures, 'ro', markersize=10, label='Measured data')
ax.plot(hours_new, temp_interp, 'b-', linewidth=2, label='Linear interpolation')
ax.set_xlabel('Hour of day')
ax.set_ylabel('Temperature (°C)')
ax.set_title('Temperature Throughout the Day')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()

# Get temperature at specific time
print(f"Temperature at 10:00: {f_linear(10):.1f}°C")
print(f"Temperature at 14:00: {f_linear(14):.1f}°C")

### 2.2 Interpolation Methods

`interp1d` supports several interpolation methods:
- `'linear'`: Linear interpolation (default)
- `'nearest'`: Nearest neighbor
- `'zero'`: Zero-order spline (step function that holds the previous data value)
- `'quadratic'`: Quadratic spline
- `'cubic'`: Cubic spline
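
The difference between the simpler methods is easiest to see at a single query point. The sketch below evaluates three of the methods at `x_query = 2.6`, a value chosen here for illustration because it lies closer to the data point at x=3 than to the one at x=2.

```python
import numpy as np
from scipy import interpolate

x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([0, 0.8, 0.9, 0.1, -0.8, -1])

x_query = 2.6  # lies between the data points at x=2 and x=3
results = {}
for kind in ['linear', 'nearest', 'zero']:
    f = interpolate.interp1d(x, y, kind=kind)
    results[kind] = float(f(x_query))
    print(f"{kind:>8}: {results[kind]:.2f}")
```

`'linear'` blends the two neighbours (0.9 and 0.1), `'nearest'` snaps to the closest point (x=3), and `'zero'` keeps the value of the point to the left (x=2).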

In [None]:
# Compare different interpolation methods
methods = ['linear', 'nearest', 'zero', 'quadratic', 'cubic']

fig, axes = plt.subplots(2, 3, figsize=(15, 8))
axes = axes.flatten()

# Original data
x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([0, 0.8, 0.9, 0.1, -0.8, -1])
x_new = np.linspace(0, 5, 100)

for i, method in enumerate(methods):
    f = interpolate.interp1d(x, y, kind=method)
    y_new = f(x_new)
    
    axes[i].plot(x, y, 'ro', markersize=10, label='Data')
    axes[i].plot(x_new, y_new, 'b-', linewidth=2, label=f'{method}')
    axes[i].set_title(f'{method.capitalize()} Interpolation')
    axes[i].legend()
    axes[i].grid(True, alpha=0.3)

# Hide the last empty subplot
axes[5].axis('off')

plt.tight_layout()
plt.show()

### 2.3 Handling Boundaries

By default, `interp1d` raises a `ValueError` for query points outside the data range. You can change this behavior with `bounds_error` and `fill_value`:

In [None]:
# Sample data
x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 2, 1, 3, 2])

# Different boundary handling options
f_error = interpolate.interp1d(x, y, kind='linear', bounds_error=True)  # Default
f_constant = interpolate.interp1d(x, y, kind='linear', bounds_error=False, fill_value=0)
f_extrapolate = interpolate.interp1d(x, y, kind='linear', fill_value='extrapolate')
f_tuple = interpolate.interp1d(x, y, kind='linear', bounds_error=False, fill_value=(-1, 5))

x_test = np.linspace(-1, 5, 100)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Constant fill value
axes[0].plot(x, y, 'ro', markersize=10, label='Data')
axes[0].plot(x_test, f_constant(x_test), 'b-', linewidth=2)
axes[0].axvline(x[0], color='gray', linestyle='--', alpha=0.5)
axes[0].axvline(x[-1], color='gray', linestyle='--', alpha=0.5)
axes[0].set_title('fill_value=0')
axes[0].legend()

# Extrapolate
axes[1].plot(x, y, 'ro', markersize=10, label='Data')
axes[1].plot(x_test, f_extrapolate(x_test), 'b-', linewidth=2)
axes[1].axvline(x[0], color='gray', linestyle='--', alpha=0.5)
axes[1].axvline(x[-1], color='gray', linestyle='--', alpha=0.5)
axes[1].set_title("fill_value='extrapolate'")
axes[1].legend()

# Tuple fill value (different for below/above)
axes[2].plot(x, y, 'ro', markersize=10, label='Data')
axes[2].plot(x_test, f_tuple(x_test), 'b-', linewidth=2)
axes[2].axvline(x[0], color='gray', linestyle='--', alpha=0.5)
axes[2].axvline(x[-1], color='gray', linestyle='--', alpha=0.5)
axes[2].set_title('fill_value=(-1, 5)')
axes[2].legend()

plt.tight_layout()
plt.show()
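
The plots show the shapes; the short sketch below prints the actual numbers for out-of-range queries, using the same sample data. It assumes nothing beyond the `interp1d` options already shown above.

```python
import numpy as np
from scipy import interpolate

x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 2, 1, 3, 2])

# Default behavior: out-of-range queries raise ValueError
f_strict = interpolate.interp1d(x, y)
try:
    f_strict(5.0)
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")

# Tuple fill: first value is used below the range, second value above it
f_tuple = interpolate.interp1d(x, y, bounds_error=False, fill_value=(-1, 5))
below, above = float(f_tuple(-1.0)), float(f_tuple(5.0))

# Linear extrapolation continues the first and last line segments
f_extra = interpolate.interp1d(x, y, fill_value='extrapolate')
left, right = float(f_extra(-1.0)), float(f_extra(5.0))

print("tuple fill:   ", below, above)
print("extrapolation:", left, right)
```

The extrapolated values follow from the end segments: the first segment rises with slope 2, so it reaches -2 at x=-1, and the last segment falls with slope -1, so it reaches 1 at x=5.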

---

## 3. Spline Interpolation

Splines are piecewise polynomial functions that provide smooth interpolation. SciPy offers several spline classes for different needs.

### 3.1 Cubic Splines

In [None]:
# Generate sample data with some curvature
x = np.array([0, 1, 2, 3, 4, 5, 6])
y = np.array([1, 2.5, 2, 4, 3.5, 5, 4])

# Create cubic spline
cs = interpolate.CubicSpline(x, y)

x_new = np.linspace(0, 6, 200)

# Plot spline and its derivatives
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Original spline
axes[0, 0].plot(x, y, 'ro', markersize=10, label='Data points')
axes[0, 0].plot(x_new, cs(x_new), 'b-', linewidth=2, label='Cubic spline')
axes[0, 0].set_title('Cubic Spline Interpolation')
axes[0, 0].legend()

# First derivative
axes[0, 1].plot(x_new, cs(x_new, 1), 'g-', linewidth=2)
axes[0, 1].axhline(0, color='gray', linestyle='--', alpha=0.5)
axes[0, 1].set_title('First Derivative (Slope)')

# Second derivative
axes[1, 0].plot(x_new, cs(x_new, 2), 'r-', linewidth=2)
axes[1, 0].axhline(0, color='gray', linestyle='--', alpha=0.5)
axes[1, 0].set_title('Second Derivative (Curvature)')

# Integral
integral = cs.antiderivative()
axes[1, 1].plot(x_new, integral(x_new) - integral(0), 'm-', linewidth=2)
axes[1, 1].set_title('Integral (from x=0)')

plt.tight_layout()
plt.show()
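
Because a `CubicSpline` is a piecewise polynomial, its derivative and integral are available in closed form. As a sketch of how that can be used, the example below locates the local extrema of the spline from the roots of its first derivative and computes the definite integral in two equivalent ways. The variable names are illustrative.

```python
import numpy as np
from scipy import interpolate

x = np.array([0, 1, 2, 3, 4, 5, 6])
y = np.array([1, 2.5, 2, 4, 3.5, 5, 4])
cs = interpolate.CubicSpline(x, y)

# Stationary points: roots of the first derivative inside the data range
stationary = cs.derivative().roots(extrapolate=False)
stationary = stationary[(stationary >= x[0]) & (stationary <= x[-1])]

# Classify each one by the sign of the second derivative
for xs in stationary:
    kind = 'maximum' if cs(xs, 2) < 0 else 'minimum'
    print(f"x = {xs:.3f}: local {kind}, y = {float(cs(xs)):.3f}")

# Definite integral over the full range, two equivalent ways
area_direct = float(cs.integrate(x[0], x[-1]))
F = cs.antiderivative()
area_antideriv = float(F(x[-1]) - F(x[0]))
print(f"Integral from 0 to 6: {area_direct:.4f}")
```

Since the data alternate up and down, the spline must have at least one stationary point between each pair of direction changes.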

### 3.2 Spline Boundary Conditions

In [None]:
# Compare different boundary conditions
x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([1, 2, 0.5, 2.5, 1.5, 3])

# Different boundary conditions
bc_types = [
    ('not-a-knot', 'Not-a-knot (default)'),
    ('natural', 'Natural (zero second derivative at ends)'),
    ('clamped', 'Clamped (zero first derivative at ends)'),
    ('periodic', 'Periodic (for periodic data)'),
]

x_new = np.linspace(0, 5, 200)

fig, axes = plt.subplots(2, 2, figsize=(12, 8))
axes = axes.flatten()

for i, (bc, title) in enumerate(bc_types):
    if bc == 'periodic':
        # For periodic, first and last y values should be equal
        y_periodic = np.array([1, 2, 0.5, 2.5, 1.5, 1])  # Modified
        cs = interpolate.CubicSpline(x, y_periodic, bc_type=bc)
        axes[i].plot(x, y_periodic, 'ro', markersize=10)
    else:
        cs = interpolate.CubicSpline(x, y, bc_type=bc)
        axes[i].plot(x, y, 'ro', markersize=10)
    
    axes[i].plot(x_new, cs(x_new), 'b-', linewidth=2)
    axes[i].set_title(title)
    axes[i].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
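
The boundary conditions can also be checked numerically rather than by eye. The sketch below evaluates the relevant derivatives at the two end points for the same data; `extrapolate=False` is passed for the periodic spline so that the right end point is evaluated on the last polynomial piece instead of being wrapped around to the start.

```python
import numpy as np
from scipy import interpolate

x = np.array([0, 1, 2, 3, 4, 5])
y = np.array([1, 2, 0.5, 2.5, 1.5, 3])
ends = x[[0, -1]]

natural = interpolate.CubicSpline(x, y, bc_type='natural')
clamped = interpolate.CubicSpline(x, y, bc_type='clamped')

print("natural, 2nd derivative at ends:", natural(ends, 2))
print("clamped, 1st derivative at ends:", clamped(ends, 1))

# Periodic splines need matching first and last y values
y_periodic = np.array([1, 2, 0.5, 2.5, 1.5, 1])
periodic = interpolate.CubicSpline(x, y_periodic, bc_type='periodic')
d1 = periodic(ends, 1, extrapolate=False)
d2 = periodic(ends, 2, extrapolate=False)
print("periodic, 1st derivative at ends:", d1)
print("periodic, 2nd derivative at ends:", d2)
```

The natural and clamped values should be zero to within rounding error, and the periodic spline should report the same first and second derivative at both ends.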

### 3.3 B-Splines and UnivariateSpline

In [None]:
# UnivariateSpline for smoothing noisy data
np.random.seed(42)

# Generate noisy sinusoidal data
x = np.linspace(0, 2*np.pi, 30)
y_true = np.sin(x)
y_noisy = y_true + np.random.normal(0, 0.15, size=len(x))

# Different smoothing levels
# s parameter controls smoothing: larger s = smoother curve
spline_exact = interpolate.UnivariateSpline(x, y_noisy, s=0)  # Interpolate exactly
spline_smooth1 = interpolate.UnivariateSpline(x, y_noisy, s=0.5)
spline_smooth2 = interpolate.UnivariateSpline(x, y_noisy, s=2)

x_new = np.linspace(0, 2*np.pi, 200)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Exact interpolation (no smoothing)
axes[0].plot(x, y_noisy, 'ro', alpha=0.6, label='Noisy data')
axes[0].plot(x_new, np.sin(x_new), 'g--', label='True function')
axes[0].plot(x_new, spline_exact(x_new), 'b-', linewidth=2, label='s=0')
axes[0].set_title('No Smoothing (s=0)')
axes[0].legend()

# Moderate smoothing
axes[1].plot(x, y_noisy, 'ro', alpha=0.6)
axes[1].plot(x_new, np.sin(x_new), 'g--')
axes[1].plot(x_new, spline_smooth1(x_new), 'b-', linewidth=2)
axes[1].set_title('Moderate Smoothing (s=0.5)')

# Heavy smoothing
axes[2].plot(x, y_noisy, 'ro', alpha=0.6)
axes[2].plot(x_new, np.sin(x_new), 'g--')
axes[2].plot(x_new, spline_smooth2(x_new), 'b-', linewidth=2)
axes[2].set_title('Heavy Smoothing (s=2)')

plt.tight_layout()
plt.show()
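
To see what `s` does in numbers, the sketch below reports the knot count and the sum of squared residuals for several smoothing levels on the same data. With unit weights, `UnivariateSpline` adds knots until the sum of squared residuals drops to about `s`, so a common rule of thumb (assuming the noise standard deviation `sigma` is known) is `s ≈ m * sigma**2` for `m` data points.

```python
import numpy as np
from scipy import interpolate

np.random.seed(42)
sigma = 0.15                      # noise level used to generate the data
x = np.linspace(0, 2 * np.pi, 30)
y_noisy = np.sin(x) + np.random.normal(0, sigma, size=len(x))

# Rule of thumb for unit weights: s is about m * sigma**2
s_rule = len(x) * sigma**2

summary = {}
for s in [0, s_rule, 2]:
    spline = interpolate.UnivariateSpline(x, y_noisy, s=s)
    n_knots = len(spline.get_knots())
    residual = spline.get_residual()   # sum of squared residuals
    summary[s] = (n_knots, residual)
    print(f"s={s:.3f}: {n_knots} knots, residual={residual:.4f}")
```

A larger `s` permits a larger residual, so fewer knots are needed and the curve becomes smoother; `s=0` forces the spline through every point.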

---

## 4. Curve Fitting with curve_fit

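`curve_fit` fits a user-defined model function to data by nonlinear least squares. It returns the best-fit parameters (`popt`) and their estimated covariance matrix (`pcov`); the square roots of the diagonal of `pcov` are the one-standard-deviation parameter uncertainties.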
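
As a minimal sketch of what "least squares" means here, the example below fits a straight line to a few hypothetical measurements and evaluates the quantity being minimized, the sum of squared residuals. For a model that is linear in its parameters, the closed-form solution from `np.polyfit` should agree with `curve_fit`. The helper `sum_sq` is introduced for illustration only.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, m, b):
    return m * x + b

# Hypothetical measurements, roughly following y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

popt, pcov = curve_fit(line, x, y)

def sum_sq(params):
    """Sum of squared residuals for a given parameter vector."""
    return np.sum((y - line(x, *params))**2)

# Closed-form least-squares solution for comparison
m_ref, b_ref = np.polyfit(x, y, 1)

print(f"curve_fit: m={popt[0]:.4f}, b={popt[1]:.4f}, SSR={sum_sq(popt):.4f}")
print(f"polyfit:   m={m_ref:.4f}, b={b_ref:.4f}")
```

Nudging either parameter away from `popt` can only increase `sum_sq`, which is what makes `popt` the least-squares solution.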

### 4.1 Basic Curve Fitting

In [None]:
# Example: Fitting exponential decay
# Model: y = A * exp(-k * t) + C

def exponential_decay(t, A, k, C):
    """Exponential decay model."""
    return A * np.exp(-k * t) + C

# Generate synthetic data with noise
np.random.seed(42)
t_data = np.linspace(0, 5, 25)
A_true, k_true, C_true = 10, 0.8, 2
y_true = exponential_decay(t_data, A_true, k_true, C_true)
y_data = y_true + np.random.normal(0, 0.5, len(t_data))

# Fit the model
popt, pcov = curve_fit(exponential_decay, t_data, y_data, p0=[8, 1, 1])

# Extract fitted parameters and uncertainties
A_fit, k_fit, C_fit = popt
A_err, k_err, C_err = np.sqrt(np.diag(pcov))  # Standard errors

print("Exponential Decay Fit Results")
print("=" * 40)
print(f"{'Parameter':<10} {'True':>10} {'Fitted':>10} {'Error':>10}")
print("-" * 40)
print(f"{'A':<10} {A_true:>10.4f} {A_fit:>10.4f} {A_err:>10.4f}")
print(f"{'k':<10} {k_true:>10.4f} {k_fit:>10.4f} {k_err:>10.4f}")
print(f"{'C':<10} {C_true:>10.4f} {C_fit:>10.4f} {C_err:>10.4f}")

# Plot
t_plot = np.linspace(0, 5, 200)
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(t_data, y_data, 'ro', markersize=8, label='Data')
ax.plot(t_plot, exponential_decay(t_plot, *popt), 'b-', linewidth=2, 
        label=f'Fit: A={A_fit:.2f}, k={k_fit:.2f}, C={C_fit:.2f}')
ax.plot(t_plot, exponential_decay(t_plot, A_true, k_true, C_true), 'g--', 
        linewidth=2, alpha=0.7, label='True function')
ax.set_xlabel('Time')
ax.set_ylabel('Signal')
ax.set_title('Exponential Decay Curve Fitting')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()
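
The diagonal of `pcov` gives the individual parameter errors, but the off-diagonal entries matter too. The sketch below, which repeats the fit so that it runs on its own, converts `pcov` to a correlation matrix and uses first-order error propagation to estimate a one-sigma band around the fitted curve. The Jacobian `J` is written out by hand for this particular model; that is an illustrative choice, not something `curve_fit` returns.

```python
import numpy as np
from scipy.optimize import curve_fit

def exponential_decay(t, A, k, C):
    return A * np.exp(-k * t) + C

np.random.seed(42)
t_data = np.linspace(0, 5, 25)
y_data = exponential_decay(t_data, 10, 0.8, 2) + np.random.normal(0, 0.5, len(t_data))
popt, pcov = curve_fit(exponential_decay, t_data, y_data, p0=[8, 1, 1])

# Correlation matrix: off-diagonal entries near +/-1 mean two parameters
# trade off against each other and are hard to pin down separately
perr = np.sqrt(np.diag(pcov))
corr = pcov / np.outer(perr, perr)
print(np.round(corr, 2))

# First-order error propagation: var[f(t)] is approximately J pcov J^T,
# where J holds the partial derivatives of the model w.r.t. A, k, C
A, k, C = popt
t_plot = np.linspace(0, 5, 50)
J = np.column_stack([
    np.exp(-k * t_plot),                # df/dA
    -A * t_plot * np.exp(-k * t_plot),  # df/dk
    np.ones_like(t_plot),               # df/dC
])
band = np.sqrt(np.einsum('ij,jk,ik->i', J, pcov, J))  # 1-sigma half-width
print(f"Band half-width ranges from {band.min():.3f} to {band.max():.3f}")
```

Plotting `exponential_decay(t_plot, *popt) ± band` with `ax.fill_between` gives a band that accounts for parameter correlations, which simply shifting each parameter by its own error does not.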

### 4.2 Fitting with Bounds

In [None]:
# Example: Gaussian peak fitting with bounds
def gaussian(x, amplitude, center, sigma):
    """Gaussian function."""
    return amplitude * np.exp(-(x - center)**2 / (2 * sigma**2))

# Generate data
np.random.seed(42)
x_data = np.linspace(-5, 5, 50)
y_data = gaussian(x_data, 5, 0.5, 1.2) + np.random.normal(0, 0.3, len(x_data))

# Fit with bounds: amplitude in [0, 10], center in [-3, 3], sigma in [0, 5]
bounds = ([0, -3, 0], [10, 3, 5])  # (lower bounds, upper bounds)
popt, pcov = curve_fit(gaussian, x_data, y_data, p0=[4, 0, 1], bounds=bounds)

print("Gaussian Fit with Bounds")
print("=" * 35)
print(f"Amplitude: {popt[0]:.4f} +/- {np.sqrt(pcov[0,0]):.4f}")
print(f"Center: {popt[1]:.4f} +/- {np.sqrt(pcov[1,1]):.4f}")
print(f"Sigma: {popt[2]:.4f} +/- {np.sqrt(pcov[2,2]):.4f}")

# Plot
x_plot = np.linspace(-5, 5, 200)
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x_data, y_data, 'ro', alpha=0.6, label='Data')
ax.plot(x_plot, gaussian(x_plot, *popt), 'b-', linewidth=2, label='Gaussian fit')
ax.axvline(popt[1], color='green', linestyle='--', label=f'Center = {popt[1]:.2f}')
ax.fill_between(x_plot, gaussian(x_plot, *popt), alpha=0.2)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Gaussian Peak Fitting')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()

### 4.3 Weighted Fitting

In [None]:
# Fitting with uncertainties (weighted least squares)
np.random.seed(42)

# Linear data with varying uncertainties
x_data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y_true = 2 * x_data + 1
uncertainties = np.array([0.5, 0.3, 0.4, 0.8, 0.2, 0.5, 0.6, 0.3])  # Different errors
y_data = y_true + np.random.normal(0, 1, len(x_data)) * uncertainties

def linear(x, m, b):
    return m * x + b

# Unweighted fit
popt_uw, _ = curve_fit(linear, x_data, y_data)

# Weighted fit (sigma parameter)
popt_w, pcov_w = curve_fit(linear, x_data, y_data, sigma=uncertainties, absolute_sigma=True)

print("Linear Fit Comparison")
print("=" * 40)
print("True parameters: m = 2.000, b = 1.000")
print(f"Unweighted fit:  m = {popt_uw[0]:.3f}, b = {popt_uw[1]:.3f}")
print(f"Weighted fit:    m = {popt_w[0]:.3f}, b = {popt_w[1]:.3f}")

# Plot
fig, ax = plt.subplots(figsize=(10, 5))
ax.errorbar(x_data, y_data, yerr=uncertainties, fmt='ro', capsize=5, 
            markersize=8, label='Data with uncertainties')
x_plot = np.linspace(0, 9, 100)
ax.plot(x_plot, linear(x_plot, *popt_uw), 'g--', linewidth=2, label='Unweighted fit')
ax.plot(x_plot, linear(x_plot, *popt_w), 'b-', linewidth=2, label='Weighted fit')
ax.plot(x_plot, 2*x_plot + 1, 'k:', linewidth=2, alpha=0.5, label='True line')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Weighted vs Unweighted Fitting')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()
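
The `absolute_sigma` flag changes only the reported covariance, not the best-fit parameters. The sketch below repeats the weighted fit both ways to illustrate this: with `absolute_sigma=False` (the default), `sigma` is treated as relative weights and `pcov` is rescaled by the reduced chi-square of the fit.

```python
import numpy as np
from scipy.optimize import curve_fit

def linear(x, m, b):
    return m * x + b

np.random.seed(42)
x_data = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
sigma = np.array([0.5, 0.3, 0.4, 0.8, 0.2, 0.5, 0.6, 0.3])
y_data = 2 * x_data + 1 + np.random.normal(0, 1, len(x_data)) * sigma

popt_abs, pcov_abs = curve_fit(linear, x_data, y_data, sigma=sigma,
                               absolute_sigma=True)
popt_rel, pcov_rel = curve_fit(linear, x_data, y_data, sigma=sigma,
                               absolute_sigma=False)

# Reduced chi-square: weighted squared residuals per degree of freedom
residuals = (y_data - linear(x_data, *popt_abs)) / sigma
chi2_red = np.sum(residuals**2) / (len(x_data) - len(popt_abs))

print(f"reduced chi-square: {chi2_red:.3f}")
print("errors (absolute_sigma=True): ", np.sqrt(np.diag(pcov_abs)))
print("errors (absolute_sigma=False):", np.sqrt(np.diag(pcov_rel)))
```

Use `absolute_sigma=True` when the uncertainties are real measurement errors in the same units as `y`; a reduced chi-square far from 1 then suggests that either the model or the stated errors are off.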

### 4.4 Multi-Peak Fitting

In [None]:
# Fitting multiple Gaussian peaks
def multi_gaussian(x, *params):
    """Sum of Gaussian peaks.
    
    params: (a1, c1, s1, a2, c2, s2, ..., baseline)
    """
    y = np.zeros_like(x, dtype=float)  # float output even if x holds integers
    n_peaks = (len(params) - 1) // 3
    for i in range(n_peaks):
        a = params[3*i]
        c = params[3*i + 1]
        s = params[3*i + 2]
        y += a * np.exp(-(x - c)**2 / (2 * s**2))
    y += params[-1]  # baseline
    return y

# Generate data with two peaks
np.random.seed(42)
x_data = np.linspace(0, 10, 100)
y_true = 3 * np.exp(-(x_data - 3)**2 / (2 * 0.5**2)) + \
         5 * np.exp(-(x_data - 7)**2 / (2 * 0.8**2)) + 0.5
y_data = y_true + np.random.normal(0, 0.2, len(x_data))

# Initial guesses: (a1, c1, s1, a2, c2, s2, baseline)
p0 = [3, 3, 0.5, 4, 7, 1, 0.5]
popt, pcov = curve_fit(multi_gaussian, x_data, y_data, p0=p0)

print("Two-Peak Gaussian Fit")
print("=" * 35)
print(f"Peak 1: A={popt[0]:.2f}, center={popt[1]:.2f}, sigma={popt[2]:.2f}")
print(f"Peak 2: A={popt[3]:.2f}, center={popt[4]:.2f}, sigma={popt[5]:.2f}")
print(f"Baseline: {popt[6]:.2f}")

# Plot with individual components
fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(x_data, y_data, 'ko', alpha=0.5, markersize=4, label='Data')
ax.plot(x_data, multi_gaussian(x_data, *popt), 'r-', linewidth=2, label='Total fit')

# Individual peaks
peak1 = popt[0] * np.exp(-(x_data - popt[1])**2 / (2 * popt[2]**2)) + popt[6]
peak2 = popt[3] * np.exp(-(x_data - popt[4])**2 / (2 * popt[5]**2)) + popt[6]
ax.fill_between(x_data, popt[6], peak1, alpha=0.3, label='Peak 1')
ax.fill_between(x_data, popt[6], peak2, alpha=0.3, label='Peak 2')

ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Multi-Peak Gaussian Fitting')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()
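
Multi-peak fits are sensitive to the initial guess `p0`. One possible way to build `p0` automatically, sketched below, is to locate the peaks with `scipy.signal.find_peaks`. The prominence threshold, the starting width `sigma0`, and the use of a low percentile as the baseline estimate are all assumptions made for this example and would need tuning for other data.

```python
import numpy as np
from scipy.signal import find_peaks

np.random.seed(42)
x_data = np.linspace(0, 10, 100)
y_data = (3 * np.exp(-(x_data - 3)**2 / (2 * 0.5**2))
          + 5 * np.exp(-(x_data - 7)**2 / (2 * 0.8**2))
          + 0.5 + np.random.normal(0, 0.2, len(x_data)))

# Rough baseline estimate: a low percentile of the signal (assumption)
baseline0 = np.percentile(y_data, 10)

# Keep only peaks that stand out well above the noise (threshold is a guess)
peak_idx, _ = find_peaks(y_data, prominence=1.0)

p0 = []
for idx in peak_idx:
    amplitude0 = y_data[idx] - baseline0
    center0 = x_data[idx]
    sigma0 = 0.5   # assumed starting width; adjust if the fit fails
    p0 += [amplitude0, center0, sigma0]
p0.append(baseline0)

print(f"Found {len(peak_idx)} peaks at x = {x_data[peak_idx]}")
print("Initial guess p0:", np.round(p0, 2))
```

The resulting list has the same layout as the hand-written guess, `(a1, c1, s1, a2, c2, s2, baseline)`, so it can be passed to `curve_fit` as `p0`.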

---

## 5. Polynomial Fitting

Polynomial fitting is simple and widely used, but it has important limitations: high-degree polynomials tend to oscillate between data points and can diverge quickly outside the data range.

### 5.1 Using numpy.polyfit

In [None]:
# Generate sample data
np.random.seed(42)
x = np.linspace(0, 10, 20)
y_true = 0.5 * x**2 - 3*x + 2
y = y_true + np.random.normal(0, 2, len(x))

# Fit polynomials of different degrees
degrees = [1, 2, 3, 5]

fig, axes = plt.subplots(2, 2, figsize=(12, 8))
axes = axes.flatten()

x_plot = np.linspace(0, 10, 100)

for i, deg in enumerate(degrees):
    # Fit polynomial
    coeffs = np.polyfit(x, y, deg)
    poly = np.poly1d(coeffs)
    
    # Calculate R-squared
    y_pred = poly(x)
    ss_res = np.sum((y - y_pred)**2)
    ss_tot = np.sum((y - np.mean(y))**2)
    r_squared = 1 - ss_res / ss_tot
    
    axes[i].plot(x, y, 'ro', markersize=8, label='Data')
    axes[i].plot(x_plot, poly(x_plot), 'b-', linewidth=2, 
                 label=f'Degree {deg} (R² = {r_squared:.4f})')
    axes[i].plot(x_plot, 0.5*x_plot**2 - 3*x_plot + 2, 'g--', 
                 alpha=0.5, label='True function')
    axes[i].set_title(f'Polynomial Degree {deg}')
    axes[i].legend()
    axes[i].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
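
NumPy also provides a newer polynomial interface, `numpy.polynomial.Polynomial`, which its documentation recommends over `np.polyfit` and `np.poly1d` for new code. The sketch below fits the same kind of data both ways to show that the results agree; note the reversed coefficient order.

```python
import numpy as np
from numpy.polynomial import Polynomial

np.random.seed(42)
x = np.linspace(0, 10, 20)
y = 0.5 * x**2 - 3 * x + 2 + np.random.normal(0, 2, len(x))

# Legacy interface: coefficients from highest to lowest power
coeffs_legacy = np.polyfit(x, y, 2)

# Newer interface: fits in a scaled domain for better conditioning;
# convert() maps the result back to the original x variable
p = Polynomial.fit(x, y, deg=2)
coeffs_new = p.convert().coef       # lowest to highest power

print("polyfit:       ", coeffs_legacy)
print("Polynomial.fit:", coeffs_new[::-1])
```

The object `p` can be called directly, as in `p(x)`, without converting; the conversion is only needed to read the coefficients in terms of the original `x`.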

### 5.2 Overfitting Problem

In [None]:
# Demonstration of overfitting with high-degree polynomials
np.random.seed(42)

# Sparse data
x = np.array([0, 1, 2, 3, 4, 5, 6])
y = np.array([1, 2.1, 1.8, 3.2, 3.0, 4.1, 4.2])

x_plot = np.linspace(-0.5, 6.5, 200)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

degrees = [2, 4, 6]
for i, deg in enumerate(degrees):
    coeffs = np.polyfit(x, y, deg)
    poly = np.poly1d(coeffs)
    
    axes[i].plot(x, y, 'ro', markersize=12, label='Data')
    axes[i].plot(x_plot, poly(x_plot), 'b-', linewidth=2)
    axes[i].set_ylim(-2, 8)
    axes[i].set_title(f'Degree {deg} Polynomial')
    axes[i].grid(True, alpha=0.3)
    
    # Highlight extrapolation issues
    axes[i].axvspan(-0.5, 0, alpha=0.2, color='red')
    axes[i].axvspan(6, 6.5, alpha=0.2, color='red')

plt.suptitle('Overfitting: Higher Degree = Worse Extrapolation', fontsize=12, y=1.02)
plt.tight_layout()
plt.show()

print("Red regions show extrapolation - notice how higher degree polynomials")
print("can have wild behavior outside the data range!")
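
The same effect can be put in numbers. The sketch below fits the sparse data with polynomials of degree 1, 2, and 6, then compares the largest residual at the data points with the prediction one step beyond the data, at x=7 (a query point chosen here for illustration).

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6])
y = np.array([1, 2.1, 1.8, 3.2, 3.0, 4.1, 4.2])

x_outside = 7.0  # one step beyond the last data point
predictions = {}
for deg in [1, 2, 6]:
    poly = np.poly1d(np.polyfit(x, y, deg))
    max_resid = np.max(np.abs(y - poly(x)))
    predictions[deg] = poly(x_outside)
    print(f"degree {deg}: max |residual| = {max_resid:.3f}, "
          f"prediction at x=7: {predictions[deg]:.2f}")
```

The degree-6 polynomial passes through all seven points, so its residuals are essentially zero, yet its prediction at x=7 is strongly negative while the data trend upward toward roughly 5. A perfect fit to the data says nothing about behavior outside it.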

---

## 6. Practical Examples

### 6.1 Radioactive Decay Analysis

In [None]:
# Radioactive decay: N(t) = N0 * exp(-lambda * t)
# Half-life: t_half = ln(2) / lambda

def decay_model(t, N0, half_life):
    """Radioactive decay model."""
    decay_constant = np.log(2) / half_life
    return N0 * np.exp(-decay_constant * t)

# Simulated experimental data
np.random.seed(42)
t_data = np.array([0, 5, 10, 15, 20, 25, 30, 40, 50, 60])
N0_true, half_life_true = 1000, 15  # Initial count, half-life in seconds
N_true = decay_model(t_data, N0_true, half_life_true)
# Add Poisson noise (appropriate for counting experiments)
N_data = np.random.poisson(N_true)
N_errors = np.sqrt(N_data)  # Poisson counting error

# Fit the model
popt, pcov = curve_fit(decay_model, t_data, N_data, p0=[900, 20], 
                       sigma=N_errors, absolute_sigma=True)
N0_fit, half_life_fit = popt
N0_err, half_life_err = np.sqrt(np.diag(pcov))

print("Radioactive Decay Analysis")
print("=" * 45)
print(f"Initial count N0: {N0_fit:.1f} +/- {N0_err:.1f}")
print(f"Half-life: {half_life_fit:.2f} +/- {half_life_err:.2f} seconds")
print(f"True half-life: {half_life_true} seconds")

# Plot
t_plot = np.linspace(0, 60, 200)
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Linear scale
ax1 = axes[0]
ax1.errorbar(t_data, N_data, yerr=N_errors, fmt='ro', capsize=5, 
             markersize=8, label='Data')
ax1.plot(t_plot, decay_model(t_plot, *popt), 'b-', linewidth=2, 
         label=f't½ = {half_life_fit:.1f}s')
ax1.set_xlabel('Time (s)')
ax1.set_ylabel('Count')
ax1.set_title('Radioactive Decay (Linear Scale)')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Log scale
ax2 = axes[1]
ax2.errorbar(t_data, N_data, yerr=N_errors, fmt='ro', capsize=5, markersize=8)
ax2.plot(t_plot, decay_model(t_plot, *popt), 'b-', linewidth=2)
ax2.set_xlabel('Time (s)')
ax2.set_ylabel('Count')
ax2.set_yscale('log')
ax2.set_title('Radioactive Decay (Log Scale)')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
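
The straight line on the log-scale plot suggests a cross-check: taking logarithms turns the decay law into ln N = ln N0 - λt, which can be fitted with a weighted straight line. The sketch below regenerates the same simulated counts and uses `np.polyfit` with weights; this is an alternative estimate for comparison, not a replacement for the nonlinear fit.

```python
import numpy as np

np.random.seed(42)
t_data = np.array([0, 5, 10, 15, 20, 25, 30, 40, 50, 60], dtype=float)
N_true = 1000 * np.exp(-np.log(2) / 15 * t_data)
N_data = np.random.poisson(N_true)

# ln N = ln N0 - lambda * t, with sigma_lnN = sigma_N / N = 1 / sqrt(N)
# np.polyfit expects weights w = 1 / sigma, i.e. sqrt(N) here
slope, intercept = np.polyfit(t_data, np.log(N_data), 1, w=np.sqrt(N_data))

decay_constant = -slope
half_life = np.log(2) / decay_constant
N0 = np.exp(intercept)
print(f"lambda = {decay_constant:.4f} 1/s, t_half = {half_life:.2f} s, N0 = {N0:.0f}")
```

The log transform breaks down if any count is zero and distorts the errors when counts are small, which is one reason to prefer fitting the untransformed model with `curve_fit`.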

### 6.2 Enzyme Kinetics (Michaelis-Menten)

In [None]:
# Michaelis-Menten equation: v = Vmax * [S] / (Km + [S])

def michaelis_menten(S, Vmax, Km):
    """Michaelis-Menten enzyme kinetics."""
    return Vmax * S / (Km + S)

# Experimental data (substrate concentration vs reaction rate)
np.random.seed(42)
S_data = np.array([0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0])
Vmax_true, Km_true = 100, 2.5  # True parameters
v_true = michaelis_menten(S_data, Vmax_true, Km_true)
v_data = v_true + np.random.normal(0, 3, len(S_data))

# Fit
popt, pcov = curve_fit(michaelis_menten, S_data, v_data, p0=[80, 3])
Vmax_fit, Km_fit = popt
Vmax_err, Km_err = np.sqrt(np.diag(pcov))

print("Michaelis-Menten Kinetics Analysis")
print("=" * 45)
print(f"Vmax: {Vmax_fit:.2f} +/- {Vmax_err:.2f} (true: {Vmax_true})")
print(f"Km: {Km_fit:.2f} +/- {Km_err:.2f} (true: {Km_true})")

# Plot
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Standard plot
S_plot = np.linspace(0, 25, 200)
ax1 = axes[0]
ax1.plot(S_data, v_data, 'ro', markersize=10, label='Data')
ax1.plot(S_plot, michaelis_menten(S_plot, *popt), 'b-', linewidth=2, label='Fit')
ax1.axhline(Vmax_fit, color='gray', linestyle='--', alpha=0.5, label=f'Vmax = {Vmax_fit:.1f}')
ax1.axvline(Km_fit, color='green', linestyle='--', alpha=0.5, label=f'Km = {Km_fit:.1f}')
ax1.set_xlabel('[S] (mM)')
ax1.set_ylabel('v (rate)')
ax1.set_title('Michaelis-Menten Plot')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Lineweaver-Burk plot (double reciprocal)
ax2 = axes[1]
ax2.plot(1/S_data, 1/v_data, 'ro', markersize=10, label='Data')
S_lb = np.linspace(0.08, 15, 100)
ax2.plot(1/S_lb, 1/michaelis_menten(S_lb, *popt), 'b-', linewidth=2, label='Fit')
ax2.set_xlabel('1/[S]')
ax2.set_ylabel('1/v')
ax2.set_title('Lineweaver-Burk Plot')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
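
The Lineweaver-Burk plot comes from rearranging the Michaelis-Menten equation into 1/v = (Km/Vmax)(1/[S]) + 1/Vmax, a straight line in the reciprocal variables. The sketch below first recovers the parameters from noise-free rates to confirm the algebra, then repeats the linear fit with noise added.

```python
import numpy as np

def michaelis_menten(S, Vmax, Km):
    return Vmax * S / (Km + S)

S = np.array([0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0])
v_exact = michaelis_menten(S, 100, 2.5)   # noise-free rates

# 1/v = (Km / Vmax) * (1/S) + 1/Vmax  ->  straight line in (1/S, 1/v)
slope, intercept = np.polyfit(1 / S, 1 / v_exact, 1)
Vmax_lb = 1 / intercept
Km_lb = slope * Vmax_lb
print(f"Noise-free: Vmax = {Vmax_lb:.2f}, Km = {Km_lb:.2f}")

# Add the same absolute noise to every rate and repeat the linear fit
np.random.seed(42)
v_noisy = v_exact + np.random.normal(0, 3, len(S))
slope_n, intercept_n = np.polyfit(1 / S, 1 / v_noisy, 1)
print(f"Noisy:      Vmax = {1 / intercept_n:.2f}, Km = {slope_n / intercept_n:.2f}")
```

Taking reciprocals magnifies the noise in the smallest rates, so the low-concentration points dominate the linear fit. This is why the nonlinear fit with `curve_fit` is generally preferred for parameter estimates, with the Lineweaver-Burk plot kept as a visual check.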

---

## Exercises

Practice what you've learned with these exercises.

### Exercise 1: Temperature Interpolation

Weather stations recorded temperatures at 6-hour intervals. Use interpolation to estimate temperatures at any time.

1. Create a cubic spline interpolation
2. Find the estimated temperature at 10:00 AM and 3:30 PM
3. Find the time of maximum temperature
4. Plot the data and interpolation

In [None]:
# Time in hours from midnight
hours = np.array([0, 6, 12, 18, 24])
temperatures = np.array([12, 10, 22, 18, 13])  # Celsius

# Your code here


<details>
<summary>Click to see solution</summary>

```python
# 1. Create cubic spline interpolation
cs = interpolate.CubicSpline(hours, temperatures)

# 2. Estimate temperatures
temp_10am = cs(10)
temp_330pm = cs(15.5)
print("Temperature Interpolation Results")
print("=" * 40)
print(f"Temperature at 10:00 AM: {temp_10am:.1f}°C")
print(f"Temperature at 3:30 PM: {temp_330pm:.1f}°C")

# 3. Find maximum temperature
# Minimize the negative of the spline to locate its maximum
from scipy.optimize import minimize_scalar

result = minimize_scalar(lambda x: -cs(x), bounds=(0, 24), method='bounded')
max_time = result.x
max_temp = cs(max_time)

# Convert decimal hours to HH:MM for display
max_hour = int(max_time)
max_minute = int((max_time % 1) * 60)
print(f"\nMaximum temperature: {max_temp:.1f}°C at {max_hour}:{max_minute:02d}")

# 4. Plot
fig, ax = plt.subplots(figsize=(10, 5))
hours_plot = np.linspace(0, 24, 200)
ax.plot(hours, temperatures, 'ro', markersize=12, label='Measurements')
ax.plot(hours_plot, cs(hours_plot), 'b-', linewidth=2, label='Cubic spline')
ax.plot(10, temp_10am, 'g^', markersize=10, label=f'10:00 AM: {temp_10am:.1f}°C')
ax.plot(15.5, temp_330pm, 'g^', markersize=10, label=f'3:30 PM: {temp_330pm:.1f}°C')
ax.plot(max_time, max_temp, 'r*', markersize=15, label=f'Max: {max_temp:.1f}°C')
ax.set_xlabel('Hour of day')
ax.set_ylabel('Temperature (°C)')
ax.set_title('Daily Temperature Interpolation')
ax.set_xticks([0, 6, 12, 18, 24])
ax.set_xticklabels(['00:00', '06:00', '12:00', '18:00', '24:00'])
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()
```
</details>

### Exercise 2: Exponential Growth Fitting

Bacteria population data follows exponential growth: N(t) = N0 * exp(r*t)

1. Fit the exponential model to the data
2. Calculate the doubling time (t_double = ln(2)/r)
3. Predict the population at t=12 hours
4. Plot with confidence bands

In [None]:
# Time in hours
t_data = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
# Population (thousands)
population = np.array([10, 15, 22, 35, 51, 78, 110, 170, 255])

# Your code here


<details>
<summary>Click to see solution</summary>

```python
def exponential_growth(t, N0, r):
    """Exponential growth model."""
    return N0 * np.exp(r * t)

# 1. Fit the model
popt, pcov = curve_fit(exponential_growth, t_data, population, p0=[10, 0.4])
N0_fit, r_fit = popt
N0_err, r_err = np.sqrt(np.diag(pcov))

print("Exponential Growth Fit Results")
print("=" * 40)
print(f"Initial population N0: {N0_fit:.2f} +/- {N0_err:.2f} thousand")
print(f"Growth rate r: {r_fit:.4f} +/- {r_err:.4f} per hour")

# 2. Calculate doubling time
t_double = np.log(2) / r_fit
t_double_err = np.log(2) / r_fit**2 * r_err  # Error propagation
print(f"\nDoubling time: {t_double:.2f} +/- {t_double_err:.2f} hours")

# 3. Predict at t=12
pop_12 = exponential_growth(12, *popt)
print(f"\nPredicted population at t=12: {pop_12:.0f} thousand")

# 4. Plot with confidence bands
t_plot = np.linspace(0, 12, 100)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(t_data, population, 'ro', markersize=10, label='Data')
ax.plot(t_plot, exponential_growth(t_plot, *popt), 'b-', linewidth=2, 
        label=f'Fit: N0={N0_fit:.1f}, r={r_fit:.3f}')

# Rough uncertainty band: shift both parameters by one standard error together.
# This ignores the correlation between N0 and r, so treat it as a visual guide only.
y_upper = exponential_growth(t_plot, N0_fit + N0_err, r_fit + r_err)
y_lower = exponential_growth(t_plot, N0_fit - N0_err, r_fit - r_err)
ax.fill_between(t_plot, y_lower, y_upper, alpha=0.2, color='blue',
                label='Approx. ±1σ band')

# Mark prediction
ax.plot(12, pop_12, 'g*', markersize=15, label=f'Prediction at t=12: {pop_12:.0f}k')

ax.set_xlabel('Time (hours)')
ax.set_ylabel('Population (thousands)')
ax.set_title(f'Bacterial Growth (Doubling time: {t_double:.2f} hours)')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()
```
</details>

### Exercise 3: Smoothing Noisy Data

Signal processing often requires smoothing noisy data while preserving the underlying trend.

1. Create a UnivariateSpline with appropriate smoothing
2. Compare different smoothing levels
3. Calculate the residuals (data - fit)
4. Choose the best smoothing parameter

In [None]:
# Noisy signal data
np.random.seed(42)
x = np.linspace(0, 4*np.pi, 50)
y_true = np.sin(x) + 0.5 * np.sin(3*x)
y_noisy = y_true + np.random.normal(0, 0.3, len(x))

# Your code here


<details>
<summary>Click to see solution</summary>

```python
# Try different smoothing levels
smoothing_values = [0, 1, 5, 20]

fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten()

x_plot = np.linspace(0, 4*np.pi, 200)

for i, s in enumerate(smoothing_values):
    spline = interpolate.UnivariateSpline(x, y_noisy, s=s)
    y_spline = spline(x_plot)
    
    # Calculate residuals and RMSE
    residuals = y_noisy - spline(x)
    rmse = np.sqrt(np.mean(residuals**2))
    
    axes[i].plot(x, y_noisy, 'ko', alpha=0.5, markersize=5, label='Noisy data')
    axes[i].plot(x_plot, np.sin(x_plot) + 0.5*np.sin(3*x_plot),
                 'g--', alpha=0.5, label='True signal')
    axes[i].plot(x_plot, y_spline, 'r-', linewidth=2, label=f's={s}')
    axes[i].set_title(f'Smoothing s={s} (RMSE={rmse:.3f})')
    axes[i].legend()
    axes[i].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Compare residuals
print("Residual Analysis")
print("=" * 40)
for s in smoothing_values:
    spline = interpolate.UnivariateSpline(x, y_noisy, s=s)
    residuals = y_noisy - spline(x)
    print(f"s={s:>2}: RMSE={np.sqrt(np.mean(residuals**2)):.4f}, "
          f"Std={np.std(residuals):.4f}")

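# Leave-one-out cross-validation: refit without each interior point and
# measure how well the spline predicts the held-out value
print("\nLeave-One-Out Cross-Validation:")
for s in smoothing_values:
    errors = []
    for i in range(1, len(x) - 1):  # skip endpoints to avoid extrapolation
        mask = np.arange(len(x)) != i
        spline = interpolate.UnivariateSpline(x[mask], y_noisy[mask], s=s)
        errors.append(float(y_noisy[i] - spline(x[i])))
    print(f"s={s:>2}: CV RMSE={np.sqrt(np.mean(np.square(errors))):.4f}")

print("\nOptimal Smoothing Selection:")
print("The s with the lowest CV RMSE is the best candidate.")
print("Rule of thumb: s is about m * sigma^2 (here 50 * 0.3**2 = 4.5).")
print("Visual inspection suggests s=5 balances smoothing and fidelity.")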
```
</details>

### Exercise 4: Polynomial vs Spline Comparison

Compare polynomial fitting and spline interpolation for the same data.

1. Fit polynomials of degrees 3, 5, and 7
2. Create a cubic spline
3. Compare extrapolation behavior
4. Discuss which method is more appropriate

In [None]:
# Sample data
x_data = np.array([0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4])
y_data = np.array([1, 1.8, 2.5, 2.3, 2.0, 2.2, 2.8, 3.5, 4.2])

# Your code here


<details>
<summary>Click to see solution</summary>

```python
# Extended range for extrapolation
x_plot = np.linspace(-0.5, 5, 200)

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1 & 2. Fit polynomials and spline
poly3 = np.poly1d(np.polyfit(x_data, y_data, 3))
poly5 = np.poly1d(np.polyfit(x_data, y_data, 5))
poly7 = np.poly1d(np.polyfit(x_data, y_data, 7))
spline = interpolate.CubicSpline(x_data, y_data)

# Plot polynomial degree 3
axes[0, 0].plot(x_data, y_data, 'ro', markersize=10, label='Data')
axes[0, 0].plot(x_plot, poly3(x_plot), 'b-', linewidth=2, label='Degree 3')
axes[0, 0].set_title('Polynomial Degree 3')
axes[0, 0].axvspan(-0.5, 0, alpha=0.2, color='yellow')
axes[0, 0].axvspan(4, 5, alpha=0.2, color='yellow')
axes[0, 0].set_ylim(-2, 10)
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Plot polynomial degree 5
axes[0, 1].plot(x_data, y_data, 'ro', markersize=10, label='Data')
axes[0, 1].plot(x_plot, poly5(x_plot), 'g-', linewidth=2, label='Degree 5')
axes[0, 1].set_title('Polynomial Degree 5')
axes[0, 1].axvspan(-0.5, 0, alpha=0.2, color='yellow')
axes[0, 1].axvspan(4, 5, alpha=0.2, color='yellow')
axes[0, 1].set_ylim(-2, 10)
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Plot polynomial degree 7
axes[1, 0].plot(x_data, y_data, 'ro', markersize=10, label='Data')
axes[1, 0].plot(x_plot, poly7(x_plot), 'm-', linewidth=2, label='Degree 7')
axes[1, 0].set_title('Polynomial Degree 7')
axes[1, 0].axvspan(-0.5, 0, alpha=0.2, color='yellow')
axes[1, 0].axvspan(4, 5, alpha=0.2, color='yellow')
axes[1, 0].set_ylim(-2, 10)
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Plot cubic spline
# Spline only for interpolation range
x_interp = np.linspace(0, 4, 200)
axes[1, 1].plot(x_data, y_data, 'ro', markersize=10, label='Data')
axes[1, 1].plot(x_interp, spline(x_interp), 'c-', linewidth=2, label='Cubic Spline')
axes[1, 1].set_title('Cubic Spline (Interpolation Only)')
axes[1, 1].set_ylim(-2, 10)
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 4. Discussion
print("Comparison Summary")
print("=" * 50)
print("\nPolynomial Fitting:")
print("- Simple and fast")
print("- Can have wild extrapolation behavior")
print("- Higher degrees can overfit and oscillate")
print("- Global fit: changing one point affects entire curve")
print("\nCubic Spline:")
print("- Nearly local: moving one point mainly affects nearby intervals")
print("- Smooth (continuous second derivative)")
print("- Better behaved for interpolation")
print("- Should not be used for extrapolation")
print("\nRecommendation: Use splines for interpolation,")
print("low-degree polynomials for simple trends.")
```
</details>

### Exercise 5: Fitting a Custom Model

A chemical reaction follows the Arrhenius equation: k = A * exp(-Ea / (R * T))

Where:
- k = rate constant
- A = pre-exponential factor
- Ea = activation energy (J/mol)
- R = gas constant (8.314 J/(mol*K))
- T = temperature (K)

1. Define the Arrhenius model
2. Fit to experimental data
3. Calculate activation energy with uncertainty
4. Create an Arrhenius plot (ln(k) vs 1/T)

In [None]:
# Experimental data
# Temperature in Kelvin
T_data = np.array([300, 320, 340, 360, 380, 400, 420, 440])
# Rate constant (arbitrary units)
k_data = np.array([0.0015, 0.0045, 0.012, 0.028, 0.058, 0.11, 0.19, 0.32])

R = 8.314  # Gas constant J/(mol*K)

# Your code here


<details>
<summary>Click to see solution</summary>

```python
# 1. Define Arrhenius model
def arrhenius(T, A, Ea):
    """Arrhenius equation for reaction rate."""
    return A * np.exp(-Ea / (R * T))

# 2. Fit the model
# Initial guess: A=1e10, Ea=50000 J/mol
popt, pcov = curve_fit(arrhenius, T_data, k_data, p0=[1e10, 50000])
A_fit, Ea_fit = popt
A_err, Ea_err = np.sqrt(np.diag(pcov))

# 3. Results
print("Arrhenius Analysis Results")
print("=" * 50)
print(f"Pre-exponential factor A: {A_fit:.2e} +/- {A_err:.2e}")
print(f"Activation energy Ea: {Ea_fit/1000:.2f} +/- {Ea_err/1000:.2f} kJ/mol")

# 4. Create Arrhenius plot
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Standard plot
T_plot = np.linspace(290, 450, 100)
ax1 = axes[0]
ax1.plot(T_data, k_data, 'ro', markersize=10, label='Data')
ax1.plot(T_plot, arrhenius(T_plot, *popt), 'b-', linewidth=2, label='Arrhenius fit')
ax1.set_xlabel('Temperature (K)')
ax1.set_ylabel('Rate constant k')
ax1.set_title('Rate Constant vs Temperature')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Arrhenius plot (linearized)
ax2 = axes[1]
ax2.plot(1000/T_data, np.log(k_data), 'ro', markersize=10, label='Data')
ax2.plot(1000/T_plot, np.log(arrhenius(T_plot, *popt)), 'b-', linewidth=2,
         label=f'Slope = -Ea/(1000 R) = {-Ea_fit/(R*1000):.2f}')
ax2.set_xlabel('1000/T (K⁻¹)')
ax2.set_ylabel('ln(k)')
ax2.set_title('Arrhenius Plot (Linearized)')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Verify with linear fit on Arrhenius plot
print("\nVerification using linear fit on Arrhenius plot:")
slope, intercept = np.polyfit(1/T_data, np.log(k_data), 1)
Ea_linear = -slope * R
print(f"Ea from linear fit: {Ea_linear/1000:.2f} kJ/mol")
```
</details>

---

## Summary

In this notebook, you learned:

1. **1D Interpolation with interp1d**
   - Linear, nearest, zero-order, quadratic, and cubic interpolation
   - Boundary handling and extrapolation options

2. **Spline Interpolation**
   - CubicSpline for smooth interpolation with derivatives
   - Different boundary conditions
   - UnivariateSpline for smoothing noisy data

3. **Curve Fitting with curve_fit**
   - Fitting custom models to data
   - Parameter bounds and constraints
   - Weighted fitting with uncertainties
   - Multi-peak fitting

4. **Polynomial Fitting**
   - Using numpy.polyfit
   - Understanding overfitting limitations

5. **Practical Applications**
   - Radioactive decay analysis
   - Enzyme kinetics (Michaelis-Menten)
   - Arrhenius equation fitting

---

## Next Steps

Continue your SciPy journey with the next notebook:

**[03_optimization.ipynb](03_optimization.ipynb)** - Learn about optimization algorithms, including minimization, root finding, and constrained optimization.