# Model Uncertainty Analysis in Nonlinear Optimization

Understanding parameter uncertainties is crucial in nonlinear inverse problems because:
1. **Data contain noise** → Parameter estimates contain uncertainty
2. **Nonlinear models** → Uncertainty propagation is complex
3. **Decision making** requires confidence intervals and reliability estimates

## 🎯 Overview of Uncertainty Methods

| Method | Type | Computational Cost | Assumptions | Best For |
|--------|------|-------------------|-------------|----------|
| **Linear Approximation** | Local | Low | Near-linear behavior | Quick estimates |
| **Bootstrap** | Resampling | High | Observations are independent and representative of the population | Robust estimates |
| **Jackknife** | Resampling | Medium | Smooth statistics | Bias correction |
| **Monte Carlo** | Simulation | Very High | Known data distribution | Full uncertainty |

---

## 1. 📐 Local (Approximate) Linearization Methods

### 1.1 Jacobian-Based Covariance Estimation

For a nonlinear least-squares problem, the objective function is $E(\mathbf{m}) = \|\mathbf{d} - \mathbf{g}(\mathbf{m})\|^2$

**Theoretical Foundation:**
- **Assumption**: Near the minimum, the objective function behaves quadratically
- **Linear approximation**: $\mathbf{g}(\mathbf{m}) \approx \mathbf{g}(\mathbf{m}_0) + \mathbf{J}(\mathbf{m} - \mathbf{m}_0)$
- **Result**: Parameter covariance matrix becomes analytically tractable

**Mathematical Framework:**

The **Jacobian matrix** at the solution $\hat{\mathbf{m}}$:
$$\mathbf{J}_{ij} = \frac{\partial g_i}{\partial m_j}\bigg|_{\mathbf{m}=\hat{\mathbf{m}}}$$

**Parameter covariance matrix**:
$$\mathbf{C}_m = \sigma^2 (\mathbf{J}^T\mathbf{J})^{-1}$$

where $\sigma^2$ is the data variance estimated from the residuals (assuming uncorrelated noise of equal variance):
$$\sigma^2 = \frac{E(\hat{\mathbf{m}})}{N - M}$$
- $N$ = number of data points
- $M$ = number of model parameters

**Parameter standard errors**:
$$\sigma_{m_i} = \sqrt{[\mathbf{C}_m]_{ii}}$$

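**95% Confidence intervals** (assuming approximately Gaussian errors and large $N - M$; for small $N - M$, replace 1.96 with the corresponding Student-$t$ quantile):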
$$\hat{m}_i \pm 1.96 \sigma_{m_i}$$
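
The formulas above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the exponential forward model `model`, its analytic `jacobian`, the undamped `gauss_newton` solver, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np

def model(m, t):
    """Hypothetical forward model g(m): exponential decay m0 * exp(-m1 * t)."""
    return m[0] * np.exp(-m[1] * t)

def jacobian(m, t):
    """Analytic Jacobian J_ij = d g_i / d m_j, shape (N, M)."""
    e = np.exp(-m[1] * t)
    return np.column_stack([e, -m[0] * t * e])

def gauss_newton(d, t, m_start, n_iter=30):
    """Undamped Gauss-Newton; assumes the starting guess is close enough."""
    m = np.asarray(m_start, dtype=float)
    for _ in range(n_iter):
        residual = d - model(m, t)
        step, *_ = np.linalg.lstsq(jacobian(m, t), residual, rcond=None)
        m = m + step
    return m

def linearized_covariance(m_hat, d, t):
    """C_m = sigma^2 (J^T J)^{-1} with sigma^2 = E(m_hat) / (N - M)."""
    J = jacobian(m_hat, t)
    N, M = J.shape
    sigma2 = np.sum((d - model(m_hat, t)) ** 2) / (N - M)
    return sigma2 * np.linalg.inv(J.T @ J), sigma2

# Synthetic data (illustrative values only)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 40)
m_true = np.array([2.0, 0.7])
d = model(m_true, t) + rng.normal(0.0, 0.05, t.size)

m_hat = gauss_newton(d, t, m_start=[1.8, 0.6])
C_m, sigma2 = linearized_covariance(m_hat, d, t)
std_err = np.sqrt(np.diag(C_m))          # parameter standard errors
ci_low = m_hat - 1.96 * std_err          # 95% interval, Gaussian approximation
ci_high = m_hat + 1.96 * std_err

print("estimate:", m_hat)
print("standard errors:", std_err)
print("95% CI:", list(zip(ci_low, ci_high)))
```

The off-diagonal entries of `C_m` carry the parameter trade-offs (here, between amplitude and decay rate), so it is worth inspecting the full matrix and not only its diagonal.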

### 1.2 Finite-Difference Hessian Method

When analytical derivatives are difficult, use **numerical approximation**:

**Hessian matrix** (second derivatives):
$$H_{ij} = \frac{\partial^2 E}{\partial m_i \partial m_j}$$

**Finite-difference approximation** (central differences with step $\epsilon$, all other parameters held at $\hat{\mathbf{m}}$):
$$H_{ij} \approx \frac{E(m_i + \epsilon, m_j + \epsilon) - E(m_i + \epsilon, m_j - \epsilon) - E(m_i - \epsilon, m_j + \epsilon) + E(m_i - \epsilon, m_j - \epsilon)}{4\epsilon^2}$$

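**Parameter covariance** (near the minimum, $\mathbf{H} \approx 2\mathbf{J}^T\mathbf{J}$ for the objective $E$ of Section 1.1, so this matches the Jacobian-based result):
$$\mathbf{C}_m = 2\sigma^2\mathbf{H}^{-1}$$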

- **Advantages**: No analytical derivatives required
- **Disadvantages**: $O(M^2)$ function evaluations needed
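
A minimal sketch of the finite-difference Hessian follows. It assumes the sum-of-squares objective $E$ from Section 1.1, for which $\mathbf{H} \approx 2\mathbf{J}^T\mathbf{J}$ near the minimum, so the inverse Hessian is scaled by $2\sigma^2$ to agree with the Jacobian-based covariance. A linear forward model (the hypothetical matrix `G`) is used on purpose: the objective is then exactly quadratic, and the result can be checked against $\sigma^2(\mathbf{G}^T\mathbf{G})^{-1}$. The function name `fd_hessian` and the step size are illustrative choices.

```python
import numpy as np

def fd_hessian(E, m, eps=1e-3):
    """Central-difference Hessian of a scalar function E at m.

    Uses 4 evaluations of E per (i, j) pair, i.e. O(M^2) evaluations in total.
    """
    m = np.asarray(m, dtype=float)
    M = m.size
    H = np.empty((M, M))
    for i in range(M):
        for j in range(i, M):
            ei = np.zeros(M)
            ej = np.zeros(M)
            ei[i] = eps
            ej[j] = eps
            H[i, j] = (E(m + ei + ej) - E(m + ei - ej)
                       - E(m - ei + ej) + E(m - ei - ej)) / (4.0 * eps**2)
            H[j, i] = H[i, j]            # the Hessian is symmetric
    return H

# Synthetic straight-line data (illustrative values only)
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 30)
G = np.column_stack([np.ones_like(t), t])
m_true = np.array([1.0, -2.0])
d = G @ m_true + rng.normal(0.0, 0.1, t.size)

def E(m):
    """Sum-of-squares misfit ||d - G m||^2."""
    r = d - G @ m
    return float(r @ r)

m_hat, *_ = np.linalg.lstsq(G, d, rcond=None)
N, M = G.shape
sigma2 = E(m_hat) / (N - M)

H = fd_hessian(E, m_hat)
C_hess = 2.0 * sigma2 * np.linalg.inv(H)       # Hessian-based covariance
C_jac = sigma2 * np.linalg.inv(G.T @ G)        # Jacobian-based covariance

print("max |C_hess - C_jac|:", np.abs(C_hess - C_jac).max())
```

For a genuinely nonlinear model the two covariances differ slightly, because the exact Hessian also contains second-derivative terms of $\mathbf{g}$ weighted by the residuals. The step `eps` usually needs tuning to the parameter scale: too large gives truncation error, too small gives round-off error.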

---

## 2. 🔄 Bootstrap Resampling Method

### 2.1 How Bootstrap Works

#### 🔹 The Analogy

**"The sample is to the population as the bootstrap sample is to the sample."**

Formally:

$$\text{Population} \longrightarrow \text{Sample} \longrightarrow \text{Bootstrap Sample}$$

This expresses a nested analogy of how information flows:

| Level | What it Represents | What We Know | What We Want |
|-------|-------------------|--------------|--------------|
| **Population** | The real world (true distribution $F$) | Unknown | Theoretical sampling distribution of our estimator |
| **Sample** | Our finite dataset (empirical distribution $\hat{F}$) | Known | Approximation of the population |
| **Bootstrap Sample** | A resample from the sample (with replacement) | We can generate many | Approximation of how the estimator would vary across repeated samples from $F$ |

#### 🔹 In Words

1. **We only have one dataset** from an unknown population.

2. If we could **repeatedly sample from the true population**, we'd see how our estimator (mean, regression coefficient, slip rate, etc.) fluctuates — that's the **true sampling distribution**.

3. **But we can't resample the population** — it's gone! 

4. So we **pretend the sample we have is a mini-version of the population**.

5. Each **bootstrap sample** (drawn with replacement from our data) then **mimics what would happen** if we took a new sample from the population.

So, conceptually:

$$\text{Population} \xrightarrow{\text{sample once}} \text{Sample} \xrightarrow{\text{resample many times}} \text{Bootstrap samples}$$

and the **distribution of estimates from bootstrap samples** approximates the **distribution of estimates from true samples**.

#### 🔹 Example

Imagine measuring **vertical GPS velocities at 20 stations** near a fault.

- The **true region** (all possible GPS sites) = **population**
- Your **20 observed stations** = **sample**  
- Each **bootstrap resample** (20 points drawn with replacement from those 20) = **bootstrap sample**

You fit your fault-slip model to each bootstrap sample. The **spread of estimated slip rates** across those resamples tells you:

*"If I had gone back and measured a different random set of 20 GPS stations from the same population, how much might my estimated slip rate have varied?"*

**That's the bootstrap principle in action.**

#### 🔹 Bootstrap Algorithm

**Step-by-step procedure**:
1. **Original dataset**: $\{\mathbf{d}, \mathbf{t}\}$ with $N$ observations
2. **Create bootstrap sample**: Randomly sample $N$ observation pairs $(d_i, t_i)$ **with replacement**
3. **Solve inverse problem**: Find $\hat{\mathbf{m}}_b$ for bootstrap sample
4. **Repeat**: Generate $B$ bootstrap samples (typically $B = 1000$ to $10000$)
5. **Analyze distribution**: Compute statistics from $\{\hat{\mathbf{m}}_1, \hat{\mathbf{m}}_2, ..., \hat{\mathbf{m}}_B\}$
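
The five steps above can be sketched as follows. The solver `fit` is a stand-in (a least-squares straight line) chosen to keep the example short and fast; in practice it would be replaced by whatever nonlinear optimizer solves the inverse problem. The function names, the synthetic data, and `n_boot = 2000` are assumptions made for illustration.

```python
import numpy as np

def fit(d, t):
    """Stand-in solver: least-squares straight line d = m0 + m1 * t."""
    G = np.column_stack([np.ones_like(t), t])
    m, *_ = np.linalg.lstsq(G, d, rcond=None)
    return m

def bootstrap(d, t, n_boot=2000, seed=0):
    """Pairs bootstrap: resample (d_i, t_i) with replacement and refit each time."""
    rng = np.random.default_rng(seed)
    N = d.size
    estimates = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.integers(0, N, size=N)      # N indices drawn with replacement
        estimates[b] = fit(d[idx], t[idx])    # solve on the bootstrap sample
    return estimates

# Step 1: original dataset (synthetic, illustrative values only)
rng = np.random.default_rng(123)
t = np.linspace(0.0, 10.0, 50)
d = 1.0 + 0.5 * t + rng.normal(0.0, 1.0, t.size)

# Steps 2-4: resample, solve, repeat
estimates = bootstrap(d, t)

# Step 5: analyze the distribution of estimates
print("full-data estimate:", fit(d, t))
print("bootstrap std of each parameter:", estimates.std(axis=0, ddof=1))
```

Resampling index arrays, as done here with `idx`, keeps each datum paired with its coordinate, which matters whenever the forward model depends on where or when the observation was made.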

### 2.2 Mathematical Foundation

**Central Limit Theorem Application**:
Under standard regularity conditions, an estimator $\hat{\mathbf{m}}$ of $\mathbf{m}_{true}$ is approximately normally distributed for large sample sizes:

$$\hat{\mathbf{m}} \sim \mathcal{N}(\mathbf{m}_{true}, \mathbf{C}_m)$$

**Bootstrap estimators**:

**Mean**: $\bar{\mathbf{m}}_{boot} = \frac{1}{B}\sum_{b=1}^{B} \hat{\mathbf{m}}_b$

**Covariance**: $\mathbf{C}_{boot} = \frac{1}{B-1}\sum_{b=1}^{B} (\hat{\mathbf{m}}_b - \bar{\mathbf{m}}_{boot})(\hat{\mathbf{m}}_b - \bar{\mathbf{m}}_{boot})^T$

**Confidence intervals**: Use empirical quantiles from bootstrap distribution
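
Given an array of bootstrap estimates of shape $(B, M)$, these three summaries take only a few NumPy calls. The sketch below is illustrative: `bootstrap_summary` is a hypothetical helper, and the toy estimator pair (mean and median of skewed synthetic data) stands in for the parameters of an inverse problem so that the asymmetry of the percentile intervals is visible.

```python
import numpy as np

def bootstrap_summary(estimates, level=0.95):
    """Mean, covariance (1/(B-1) normalization) and percentile intervals."""
    estimates = np.asarray(estimates, dtype=float)     # shape (B, M)
    mean = estimates.mean(axis=0)
    cov = np.cov(estimates, rowvar=False, ddof=1)
    alpha = 1.0 - level
    lower, upper = np.percentile(
        estimates, [100.0 * alpha / 2.0, 100.0 * (1.0 - alpha / 2.0)], axis=0
    )
    return mean, cov, lower, upper

# Toy example: bootstrap the mean and median of skewed synthetic data
rng = np.random.default_rng(42)
data = rng.lognormal(mean=0.0, sigma=1.0, size=60)

B = 2000
estimates = np.empty((B, 2))
for b in range(B):
    resample = rng.choice(data, size=data.size, replace=True)
    estimates[b] = [resample.mean(), np.median(resample)]

mean, cov, lower, upper = bootstrap_summary(estimates)
print("bootstrap mean:", mean)
print("bootstrap covariance:\n", cov)
print("95% percentile intervals:", list(zip(lower, upper)))
```

Unlike $\hat{m}_i \pm 1.96\,\sigma_{m_i}$, the percentile interval need not be symmetric about the estimate, which is how the bootstrap reports skewed uncertainties.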

### 2.3 Bootstrap Advantages

✅ **Model-free**: No assumptions about parameter distributions  
✅ **Nonlinear-friendly**: Captures asymmetric uncertainties  
✅ **Realistic**: Accounts for actual data distribution  
✅ **Flexible**: Works with any optimization algorithm  
✅ **Bias correction**: Can detect and correct estimator bias  
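
To make the bias-correction point concrete: the bootstrap bias estimate is the mean of the bootstrap estimates minus the original estimate, and subtracting it gives a corrected value. The sketch below is a toy illustration under stated assumptions, using the plug-in variance (which divides by $N$ and is known to be biased low) as the `estimator`; the data and `B = 4000` are arbitrary choices.

```python
import numpy as np

def estimator(x):
    """Plug-in variance (divides by N); biased low by a factor (N - 1) / N."""
    return np.var(x)

# Synthetic data (illustrative values only)
rng = np.random.default_rng(7)
data = rng.normal(0.0, 2.0, size=30)

theta_hat = estimator(data)

B = 4000
theta_boot = np.array([
    estimator(rng.choice(data, size=data.size, replace=True)) for _ in range(B)
])

bias_boot = theta_boot.mean() - theta_hat      # estimated bias of the estimator
theta_corrected = theta_hat - bias_boot        # equivalently 2*theta_hat - mean

print("plug-in estimate:       ", theta_hat)
print("bootstrap bias estimate:", bias_boot)
print("bias-corrected estimate:", theta_corrected)
print("unbiased (ddof=1) value:", np.var(data, ddof=1))
```

Bias correction adds its own sampling noise, so it tends to be worthwhile only when the estimated bias is not small compared with the bootstrap standard error.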

### 2.4 Bootstrap Limitations

❌ **Computational cost**: Requires $B$ complete optimizations  
❌ **Sample dependence**: Quality depends on original sample representativeness  
❌ **Convergence issues**: Each bootstrap optimization must converge  

---

## 3. 🔪 Jackknife Method

### 3.1 Jackknife Procedure

**Leave-one-out resampling**:
1. **Original dataset**: $N$ observations
2. **Create jackknife sample**: Remove observation $i$, keep remaining $N-1$
3. **Solve**: Find $\hat{\mathbf{m}}_{-i}$ using reduced dataset
4. **Repeat**: For all $i = 1, 2, ..., N$ (exactly $N$ samples)
5. **Compute statistics**: Analyze $\{\hat{\mathbf{m}}_{-1}, \hat{\mathbf{m}}_{-2}, ..., \hat{\mathbf{m}}_{-N}\}$

### 3.2 Jackknife Estimators

**Bias-corrected estimate**:
$$\hat{\mathbf{m}}_{jack} = N\hat{\mathbf{m}} - \frac{N-1}{N}\sum_{i=1}^{N} \hat{\mathbf{m}}_{-i}$$

**Variance estimate** (squares taken element-wise, giving one variance per parameter):
$$\text{Var}_{jack}(\hat{\mathbf{m}}) = \frac{N-1}{N}\sum_{i=1}^{N} (\hat{\mathbf{m}}_{-i} - \bar{\mathbf{m}}_{jack})^2$$

where $\bar{\mathbf{m}}_{jack} = \frac{1}{N}\sum_{i=1}^{N} \hat{\mathbf{m}}_{-i}$
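
Both estimators can be sketched as a small generic function. The helper `jackknife` and the stand-in solver `fit_line` are hypothetical names for this example. The final lines check the implementation on cases with known answers: for the sample mean, the jackknife variance equals $s^2/N$, and for the plug-in variance, the bias-corrected estimate equals the unbiased sample variance.

```python
import numpy as np

def jackknife(estimator, data):
    """Leave-one-out jackknife: returns (bias-corrected estimate, variance)."""
    data = np.asarray(data, dtype=float)
    N = data.shape[0]
    m_full = np.atleast_1d(estimator(data))
    m_loo = np.array([
        np.atleast_1d(estimator(np.delete(data, i, axis=0))) for i in range(N)
    ])                                             # shape (N, M)
    m_bar = m_loo.mean(axis=0)                     # mean of leave-one-out fits
    m_jack = N * m_full - (N - 1) * m_bar          # bias-corrected estimate
    var_jack = (N - 1) / N * np.sum((m_loo - m_bar) ** 2, axis=0)
    return m_jack, var_jack

def fit_line(rows):
    """Stand-in solver: rows are (t_i, d_i) pairs; fits d = m0 + m1 * t."""
    t, d = rows[:, 0], rows[:, 1]
    G = np.column_stack([np.ones_like(t), t])
    m, *_ = np.linalg.lstsq(G, d, rcond=None)
    return m

# Synthetic data (illustrative values only)
rng = np.random.default_rng(3)
t = np.linspace(0.0, 10.0, 25)
d = 1.0 + 0.5 * t + rng.normal(0.0, 1.0, t.size)

m_jack, var_jack = jackknife(fit_line, np.column_stack([t, d]))
print("bias-corrected estimate:", m_jack)
print("jackknife standard errors:", np.sqrt(var_jack))

# Checks against known results
x = rng.normal(size=20)
mean_jack, mean_var = jackknife(np.mean, x)
var_corrected, _ = jackknife(np.var, x)
print("s^2 / N:", x.var(ddof=1) / x.size, "jackknife:", mean_var[0])
```

Note that `(N - 1) * m_bar` is the same quantity as $\frac{N-1}{N}\sum_i \hat{\mathbf{m}}_{-i}$ in the formula above, written in terms of the leave-one-out mean.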

### 3.3 Jackknife vs Bootstrap

| Aspect | Jackknife | Bootstrap |
|--------|-----------|-----------|
| **Samples** | Exactly $N$ | Typically $B \gg N$ |
| **Sampling** | Deterministic | Random |
| **Bias correction** | Built-in | Optional |
| **Computational cost** | Lower ($N$ fits) | Higher ($B$ fits) |
| **Uncertainty estimates** | Bias and variance only; variance tends to be conservative | Full empirical distribution (quantiles, asymmetry) |

---

## 4. 🎲 Theoretical Comparison

### 4.1 When Each Method Works Best

**Linear Approximation**:
- ✅ Objective function is nearly quadratic near minimum
- ✅ Large datasets (asymptotic regime)
- ✅ Quick uncertainty estimates needed

**Bootstrap**:
- ✅ Nonlinear problems with complex parameter distributions
- ✅ Sufficient computational resources
- ✅ Robust uncertainty quantification needed

**Jackknife**:
- ✅ Small to moderate datasets
- ✅ Bias correction important
- ✅ Computational budget limited
