In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import linalg
np.random.seed(42)  # For reproducibility

---
## Exercise 1: Scalar Kalman Update (1D BLUE)

### Context
Consider estimating the temperature $x$ at a single location. This is the simplest case of data assimilation.

**Given:**
- Background (prior) estimate: $x^b = 20\,^\circ\mathrm{C}$ with variance $\sigma_b^2 = 4\,^\circ\mathrm{C}^2$ (i.e., std. dev. = 2°C)
- Observation: $y = 23\,^\circ\mathrm{C}$ with variance $\sigma_o^2 = 1\,^\circ\mathrm{C}^2$ (i.e., std. dev. = 1°C)

### Questions

**Q1.1** Write down the 1D Kalman gain formula:
$$K = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_o^2}$$

Compute $K$ for the given values. What does this value tell you about the relative trust in the background (model) versus the observation?

**Q1.2** Compute the analysis (posterior mean):
$$x^a = x^b + K(y - x^b)$$

**Q1.3** Compute the analysis variance (posterior variance):
$$\sigma_a^2 = (1 - K)\sigma_b^2$$

Is the posterior uncertainty smaller than both prior and observation uncertainties? Why?

**Q1.4** Now suppose the sensor is less accurate: $\sigma_o^2 = 16$. Recompute $K$, $x^a$, and $\sigma_a^2$. How does this change the analysis?

### Guidelines
- Start by implementing the formulas step by step
- Compare your results with intuition: if $\sigma_o^2 \ll \sigma_b^2$, the analysis should be close to $y$
- The analysis variance should always be smaller than both input variances
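
To check your step-by-step results, the three formulas can be wrapped in one small function. This is a minimal sketch, not the reference solution; the name `scalar_blue` is a hypothetical helper introduced here for illustration.

```python
def scalar_blue(x_b, sigma_b_sq, y, sigma_o_sq):
    """Scalar BLUE / Kalman update (hypothetical helper, not part of the notebook).

    Returns the gain K, the analysis x_a and the analysis variance sigma_a_sq.
    """
    K = sigma_b_sq / (sigma_b_sq + sigma_o_sq)   # gain in [0, 1]
    x_a = x_b + K * (y - x_b)                    # correct the background by the weighted innovation
    sigma_a_sq = (1.0 - K) * sigma_b_sq          # posterior variance
    return K, x_a, sigma_a_sq


# Accurate sensor (Q1.1 to Q1.3) and noisy sensor (Q1.4)
K, x_a, sigma_a_sq = scalar_blue(20.0, 4.0, 23.0, 1.0)
K_n, x_a_n, sigma_a_sq_n = scalar_blue(20.0, 4.0, 23.0, 16.0)

print(f"accurate sensor: K = {K:.2f}, x_a = {x_a:.2f}, sigma_a^2 = {sigma_a_sq:.2f}")
print(f"noisy sensor:    K = {K_n:.2f}, x_a = {x_a_n:.2f}, sigma_a^2 = {sigma_a_sq_n:.2f}")
```

Comparing the two printed lines shows how the gain shifts the analysis toward whichever source has the smaller variance.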

In [None]:
# Exercise 1: Your code here

# Given values
x_b = 20.0  # Background estimate (°C)
sigma_b_sq = 4.0  # Background variance
y = 23.0  # Observation (°C)
sigma_o_sq = 1.0  # Observation variance

# Q1.1: Compute Kalman gain K
# K = ...

# Q1.2: Compute analysis x_a
# x_a = ...

# Q1.3: Compute analysis variance
# sigma_a_sq = ...

# Print your results

In [None]:
# Q1.4: Less accurate sensor
sigma_o_sq_noisy = 16.0

# Recompute K, x_a, sigma_a_sq

---
## Exercise 2: Vector State with Multiple Observations

### Context
This exercise extends to a 2D state, similar to having two parameters in a reduced basis context.

**Given:**
- State: $\mathbf{x} = [x_1, x_2]^\top$ (e.g., temperature at two locations)
- Background: $\mathbf{x}^b = [20, 15]^\top$
- Background covariance: $\mathbf{B} = \begin{pmatrix} 4 & 1 \\ 1 & 2 \end{pmatrix}$ (note the correlation!)
- Observation: We only observe the first component: $y = 23$ with variance $R = 1$
- Observation operator: $\mathbf{H} = [1, 0]$ (like a GEIM sensor $\sigma_1$ that measures only $x_1$)

### Questions

**Q2.1** Compute $\mathbf{H}\mathbf{B}\mathbf{H}^\top + R$. What is the dimension of this matrix?

**Q2.2** Compute the Kalman gain:
$$\mathbf{K} = \mathbf{B}\mathbf{H}^\top(\mathbf{H}\mathbf{B}\mathbf{H}^\top + R)^{-1}$$

What is the dimension of $\mathbf{K}$? Why does the observation at location 1 affect the analysis at location 2?

**Q2.3** Compute the analysis:
$$\mathbf{x}^a = \mathbf{x}^b + \mathbf{K}(y - \mathbf{H}\mathbf{x}^b)$$

**Q2.4** Compute the analysis covariance:
$$\mathbf{P}^a = (\mathbf{I} - \mathbf{K}\mathbf{H})\mathbf{B}$$

Has the uncertainty decreased for both components?

**Q2.5** (Bonus) What would happen if $\mathbf{B}$ were diagonal (no correlation)? Would the observation at location 1 still affect the estimate at location 2?

### Guidelines
- Use numpy for matrix operations
- Remember: $\mathbf{H}$ should be a 2D array even for a single observation (shape `(1, 2)`)
- The correlation in $\mathbf{B}$ is what allows information to spread spatially
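
The matrix formulas above translate almost line by line into NumPy. The sketch below is one possible arrangement, not the reference solution; `blue_update` is a hypothetical helper name. It uses `np.linalg.solve` instead of an explicit inverse, which relies on $\mathbf{B}$ and $\mathbf{H}\mathbf{B}\mathbf{H}^\top + R$ being symmetric.

```python
import numpy as np


def blue_update(x_b, B, y, H, R):
    """Vector BLUE update (hypothetical helper, not part of the notebook)."""
    S = H @ B @ H.T + R                    # innovation covariance, shape (m, m)
    # K = B H^T S^{-1}; since B and S are symmetric, K^T = S^{-1} (H B)
    K = np.linalg.solve(S, H @ B).T        # shape (n, m)
    x_a = x_b + K @ (y - H @ x_b)          # analysis state, shape (n,)
    P_a = (np.eye(len(x_b)) - K @ H) @ B   # analysis covariance, shape (n, n)
    return K, x_a, P_a


x_b = np.array([20.0, 15.0])
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
y = np.array([23.0])

# Correlated background (Q2.1 to Q2.4)
B = np.array([[4.0, 1.0], [1.0, 2.0]])
K, x_a, P_a = blue_update(x_b, B, y, H, R)

# Uncorrelated background (Q2.5)
B_diag = np.array([[4.0, 0.0], [0.0, 2.0]])
K_d, x_a_d, P_a_d = blue_update(x_b, B_diag, y, H, R)

print("correlated   K =", K.ravel(), " x_a =", x_a)
print("uncorrelated K =", K_d.ravel(), " x_a =", x_a_d)
```

The second entry of the gain is the quantity to watch when comparing the two cases.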

In [None]:
# Exercise 2: Your code here

# Given values
x_b = np.array([20.0, 15.0])
B = np.array([[4.0, 1.0],
              [1.0, 2.0]])
H = np.array([[1.0, 0.0]])  # Shape (1, 2)
R = np.array([[1.0]])  # Shape (1, 1)
y = np.array([23.0])  # Shape (1,)

# Q2.1: Compute H @ B @ H.T + R

# Q2.2: Compute Kalman gain K

# Q2.3: Compute analysis x_a

# Q2.4: Compute analysis covariance P_a

In [None]:
# Q2.5: Bonus - Diagonal B (no correlation)
B_diag = np.array([[4.0, 0.0],
                   [0.0, 2.0]])

# Recompute K and x_a

---
## Exercise 3: Connection to PBDW

### Context
Recall that PBDW solves:
$$u^* = \arg\min_{z \in u_{\mathrm{bg}} + V_N} \|z - u_{\mathrm{bg}}\|^2 \quad \text{s.t.} \quad \sigma_m(z) = y_m$$

while BLUE solves:
$$\mathbf{x}^a = \arg\min_{\mathbf{x}} \|\mathbf{x} - \mathbf{x}^b\|_{\mathbf{B}^{-1}}^2 + \|\mathbf{y} - \mathbf{H}\mathbf{x}\|_{\mathbf{R}^{-1}}^2$$
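
As a reminder of why this minimization yields the Kalman gain form used in Exercise 2, here is a short derivation sketch. It assumes $\mathbf{B}$ and $\mathbf{R}$ are symmetric positive definite, and writes $\mathbf{d} = \mathbf{y} - \mathbf{H}\mathbf{x}^b$ for the innovation (a symbol introduced here for brevity). Setting the gradient of the cost to zero gives

$$\mathbf{B}^{-1}(\mathbf{x}^a - \mathbf{x}^b) - \mathbf{H}^\top \mathbf{R}^{-1}(\mathbf{y} - \mathbf{H}\mathbf{x}^a) = \mathbf{0}
\quad\Longrightarrow\quad
\mathbf{x}^a = \mathbf{x}^b + (\mathbf{B}^{-1} + \mathbf{H}^\top \mathbf{R}^{-1} \mathbf{H})^{-1}\mathbf{H}^\top \mathbf{R}^{-1}\,\mathbf{d}$$

and the Sherman-Morrison-Woodbury identity shows that the two gain expressions agree:

$$(\mathbf{B}^{-1} + \mathbf{H}^\top \mathbf{R}^{-1} \mathbf{H})^{-1}\mathbf{H}^\top \mathbf{R}^{-1} = \mathbf{B}\mathbf{H}^\top(\mathbf{H}\mathbf{B}\mathbf{H}^\top + \mathbf{R})^{-1} = \mathbf{K}$$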

### Questions

**Q3.1** Implement the BLUE cost function $J(\mathbf{x})$ and evaluate it at:
- $\mathbf{x} = \mathbf{x}^b$ (background)
- $\mathbf{x} = \mathbf{x}^a$ (analysis from Exercise 2)
- A point that perfectly matches observations: $\mathbf{x}_{\text{obs}} = [23, 15]^\top$

Which point has the lowest cost?

**Q3.2** What happens to BLUE when $\mathbf{R} \to 0$ (perfect observations)? How does this relate to PBDW?

**Q3.3** What happens when the background variances grow without bound, written loosely as $\mathbf{B} \to \infty$ (no prior information)? What is $\mathbf{K}$ in this limit?

### Guidelines
- Use the Mahalanobis norm: $\|\mathbf{x}\|_{\mathbf{A}^{-1}}^2 = \mathbf{x}^\top \mathbf{A}^{-1} \mathbf{x}$
- Think about the limiting cases physically
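
One possible way to code the Mahalanobis norms and to probe the $\mathbf{R} \to 0$ limit numerically is sketched below. The names `blue_cost_sketch` and `blue_analysis` are hypothetical, chosen so they do not clash with the `blue_cost` stub you are asked to complete; the values are those of Exercise 2.

```python
import numpy as np

x_b = np.array([20.0, 15.0])
B = np.array([[4.0, 1.0], [1.0, 2.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
y = np.array([23.0])


def blue_cost_sketch(x, x_b, B, y, H, R):
    """J(x) = (x - x_b)^T B^{-1} (x - x_b) + (y - Hx)^T R^{-1} (y - Hx)."""
    dx = x - x_b
    dy = y - H @ x
    # solve(A, v) computes A^{-1} v without forming the inverse explicitly
    return float(dx @ np.linalg.solve(B, dx) + dy @ np.linalg.solve(R, dy))


def blue_analysis(x_b, B, y, H, R):
    """Analysis state from the Kalman gain form (hypothetical helper)."""
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
    return x_b + K @ (y - H @ x_b)


x_a = blue_analysis(x_b, B, y, H, R)
x_obs = np.array([23.0, 15.0])

costs = {
    "background": blue_cost_sketch(x_b, x_b, B, y, H, R),
    "analysis": blue_cost_sketch(x_a, x_b, B, y, H, R),
    "obs-matching": blue_cost_sketch(x_obs, x_b, B, y, H, R),
}
for name, J in costs.items():
    print(f"J({name}) = {J:.4f}")

# Shrinking R: how closely does H x_a reproduce the observation?
fit = []
for R_val in (1.0, 1e-2, 1e-4):
    x_a_R = blue_analysis(x_b, B, y, H, np.array([[R_val]]))
    fit.append(float((H @ x_a_R)[0]))
    print(f"R = {R_val:g}: H x_a = {fit[-1]:.6f}")
```

Note that the observation-matching point changes only $x_1$, whereas the analysis also moves $x_2$; comparing their background terms is a useful starting point for Q3.1.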

In [None]:
# Exercise 3: Your code here

def blue_cost(x, x_b, B, y, H, R):
    """Compute the BLUE cost function J(x).
    
    J(x) = (x - x_b)^T B^{-1} (x - x_b) + (y - Hx)^T R^{-1} (y - Hx)
    """
    # Your implementation here
    pass

# Use values from Exercise 2
# Evaluate J at different points

---
## Exercise 4: Sequential Kalman Filter

### Context
Now we move to the time-dependent case. Consider a simple 1D dynamical system:
$$x_{k+1} = a \cdot x_k + w_k, \quad w_k \sim \mathcal{N}(0, Q)$$
$$y_k = x_k + v_k, \quad v_k \sim \mathcal{N}(0, R)$$

This is like tracking a temperature that evolves slowly in time.

**Given:**
- Model parameter: $a = 0.95$ (slight decay toward zero)
- Process noise variance: $Q = 0.5$
- Observation noise variance: $R = 2.0$
- Initial estimate: $x_0^a = 10$, $P_0^a = 1$
- True initial state: $x_0^{\text{true}} = 12$

### Questions

**Q4.1** Implement the forecast step:
- $x_k^f = a \cdot x_{k-1}^a$
- $P_k^f = a^2 \cdot P_{k-1}^a + Q$

**Q4.2** Implement the analysis step:
- $K_k = P_k^f / (P_k^f + R)$
- $x_k^a = x_k^f + K_k(y_k - x_k^f)$
- $P_k^a = (1 - K_k) P_k^f$

**Q4.3** Generate synthetic observations for 50 time steps and run the Kalman filter.

**Q4.4** Plot:
- True state
- Observations
- Analysis (with ±2σ confidence interval, where $\sigma_k = \sqrt{P_k^a}$)

### Guidelines
- Generate true state: $x_{k+1}^{\text{true}} = a \cdot x_k^{\text{true}} + w_k$
- Generate observations: $y_k = x_k^{\text{true}} + v_k$
- Store results in arrays for plotting
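
The forecast/analysis cycle can be laid out as in the sketch below. It is one possible arrangement rather than the reference solution. The use of `np.random.default_rng(0)` (instead of the global seed set at the top of the notebook) and the convention that the first observation arrives at step 1 are assumptions made here.

```python
import numpy as np

a, Q, R, n_steps = 0.95, 0.5, 2.0, 50
rng = np.random.default_rng(0)   # assumed seed, only for reproducibility of this sketch

# Synthetic truth and observations (index 0 is the initial time, no observation there)
x_true = np.empty(n_steps + 1)
y_obs = np.full(n_steps + 1, np.nan)
x_true[0] = 12.0
for k in range(n_steps):
    x_true[k + 1] = a * x_true[k] + rng.normal(0.0, np.sqrt(Q))
    y_obs[k + 1] = x_true[k + 1] + rng.normal(0.0, np.sqrt(R))

# Kalman filter
x_a = np.empty(n_steps + 1)
P_a = np.empty(n_steps + 1)
x_a[0], P_a[0] = 10.0, 1.0
for k in range(1, n_steps + 1):
    # Forecast step
    x_f = a * x_a[k - 1]
    P_f = a**2 * P_a[k - 1] + Q
    # Analysis step
    K = P_f / (P_f + R)
    x_a[k] = x_f + K * (y_obs[k] - x_f)
    P_a[k] = (1.0 - K) * P_f

sigma = np.sqrt(P_a)             # half-width of the +/- 2 sigma band is 2 * sigma
rmse = np.sqrt(np.mean((x_a[1:] - x_true[1:]) ** 2))
print(f"RMSE of analysis: {rmse:.3f}, final P_a: {P_a[-1]:.4f}")
```

For Q4.4, the arrays `x_true`, `y_obs`, `x_a` and `sigma` can be passed to `plt.plot`, `plt.scatter` and `plt.fill_between` (lower bound `x_a - 2 * sigma`, upper bound `x_a + 2 * sigma`). Since $P_k^a$ does not depend on the data, it is worth looking at how quickly it settles to a constant.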

In [None]:
# Exercise 4: Your code here

# Parameters
a = 0.95  # Model parameter
Q = 0.5   # Process noise variance
R = 2.0   # Observation noise variance
n_steps = 50

# Initial conditions
x_true_0 = 12.0
x_a_0 = 10.0
P_a_0 = 1.0

# Generate true state and observations
# ...

# Run Kalman filter
# ...

# Plot results
# ...

---
## Exercise 5: Effect of Observation Density

### Context
In GEIM/PBDW, we discussed optimal sensor placement. Here we explore how observation frequency affects Kalman filter performance.

### Questions

**Q5.1** Modify Exercise 4 to assimilate observations only every $k_{\text{obs}}$ time steps (e.g., every 5 steps).

**Q5.2** Compare the filter performance (RMSE) for:
- Observations at every time step
- Observations every 5 time steps
- Observations every 10 time steps

**Q5.3** What is the connection to GEIM sensor placement? How does the correlation length in $\mathbf{B}$ relate to optimal observation spacing?

### Guidelines
- RMSE = $\sqrt{\frac{1}{N}\sum_k (x_k^a - x_k^{\text{true}})^2}$, where $N$ is the number of time steps in the sum
- When no observation is available, only do the forecast step
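
A sketch of the "forecast only when no observation is available" logic follows. The function name `kf_rmse_sketch`, the fixed seed, and the choice to assimilate at steps where `k % obs_interval == 0` are assumptions made here; the `run_kf_with_obs_interval` stub is left for you to complete. The same seed is used for every interval so that all runs see the same true trajectory.

```python
import numpy as np


def kf_rmse_sketch(obs_interval, n_steps=50, a=0.95, Q=0.5, R=2.0, seed=0):
    """Scalar Kalman filter with an observation every obs_interval steps.

    Returns the RMSE of the analysis and the history of the variance P_a.
    Hypothetical helper, not part of the notebook.
    """
    rng = np.random.default_rng(seed)
    x_true, x_a, P_a = 12.0, 10.0, 1.0
    sq_err = 0.0
    P_hist = np.empty(n_steps)
    for k in range(1, n_steps + 1):
        # Truth and observation are drawn at every step, so the random stream
        # (and hence the truth) is identical for all observation intervals
        x_true = a * x_true + rng.normal(0.0, np.sqrt(Q))
        y = x_true + rng.normal(0.0, np.sqrt(R))
        # Forecast step
        x_f = a * x_a
        P_f = a**2 * P_a + Q
        if k % obs_interval == 0:
            # Analysis step, only when an observation is assimilated
            K = P_f / (P_f + R)
            x_a = x_f + K * (y - x_f)
            P_a = (1.0 - K) * P_f
        else:
            # No observation: the forecast becomes the analysis
            x_a, P_a = x_f, P_f
        sq_err += (x_a - x_true) ** 2
        P_hist[k - 1] = P_a
    return np.sqrt(sq_err / n_steps), P_hist


results = {m: kf_rmse_sketch(m) for m in (1, 5, 10)}
for m, (rmse, P_hist) in results.items():
    print(f"obs every {m:2d} steps: RMSE = {rmse:.3f}, mean P_a = {P_hist.mean():.3f}")
```

The RMSE of a single run is a noisy quantity, so averaging over several seeds gives a fairer comparison; the mean of $P_k^a$ is deterministic and can be compared directly.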

In [None]:
# Exercise 5: Your code here

def run_kf_with_obs_interval(obs_interval, n_steps=50):
    """Run Kalman filter with observations every obs_interval steps."""
    # Your implementation here
    pass

# Compare different observation intervals

---
## Exercise 6: 1D Heat Equation (Advanced)

### Context
This exercise connects directly to the reduced basis setting. Consider a discretized 1D heat equation on a rod with 10 nodes.

**Model:**
$$\mathbf{x}_{k+1} = \mathbf{M}\mathbf{x}_k + \mathbf{w}_k$$

where $\mathbf{M}$ is a tridiagonal diffusion matrix.

**Observations:**
We have sensors at positions 3 and 7 (0-indexed), similar to GEIM magic points.

### Questions

**Q6.1** Set up the problem:
- Create the diffusion matrix $\mathbf{M}$ (use $\alpha = 0.1$ for diffusion coefficient)
- Create observation operator $\mathbf{H}$ for sensors at positions 3 and 7
- Define $\mathbf{Q}$ (small process noise) and $\mathbf{R}$ (observation noise)

**Q6.2** Implement the full Kalman filter for this system.

**Q6.3** Visualize the true state, observations, and analysis as a space-time plot.

**Q6.4** Compare with a scenario where sensors are at positions 2 and 3 (clustered). How does sensor placement affect reconstruction quality?

### Guidelines
- Diffusion matrix: $M_{i,i} = 1 - 2\alpha$, $M_{i,i\pm1} = \alpha$ (wherever the neighboring index exists)
- Start with a Gaussian initial condition
- Use `plt.imshow` for space-time plots
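
The setup and one filter cycle might look like the sketch below. Several choices are assumptions made here because the exercise leaves them open: the boundary rows of $\mathbf{M}$ are simply truncated, $\mathbf{Q} = 0.01\,\mathbf{I}$, $\mathbf{R} = 0.1\,\mathbf{I}$, the filter starts from a zero state with identity covariance, and the helper names `make_H` and `kf_step` are hypothetical.

```python
import numpy as np

n_x, alpha, n_steps = 10, 0.1, 30

# Tridiagonal diffusion matrix; end rows are truncated (assumed boundary treatment)
M = (1 - 2 * alpha) * np.eye(n_x) + alpha * (np.eye(n_x, k=1) + np.eye(n_x, k=-1))


def make_H(sensors, n_x):
    """Selection matrix with one row per sensor (hypothetical helper)."""
    H = np.zeros((len(sensors), n_x))
    H[np.arange(len(sensors)), sensors] = 1.0
    return H


def kf_step(x_a, P_a, y, M, H, Q, R):
    """One forecast + analysis cycle of the linear Kalman filter."""
    x_f = M @ x_a
    P_f = M @ P_a @ M.T + Q
    S = H @ P_f @ H.T + R
    K = np.linalg.solve(S, H @ P_f).T          # K = P_f H^T S^{-1} (S, P_f symmetric)
    x_new = x_f + K @ (y - H @ x_f)
    P_new = (np.eye(len(x_f)) - K @ H) @ P_f
    return x_new, P_new


H = make_H([3, 7], n_x)
Q = 0.01 * np.eye(n_x)    # assumed small process noise
R = 0.1 * np.eye(2)       # assumed observation noise

rng = np.random.default_rng(0)
grid = np.arange(n_x)
x_true = np.exp(-0.5 * ((grid - 4.5) / 1.5) ** 2)   # Gaussian initial condition
x_a, P_a = np.zeros(n_x), np.eye(n_x)               # assumed initial guess

truth_hist, analysis_hist = [], []
for _ in range(n_steps):
    x_true = M @ x_true + rng.multivariate_normal(np.zeros(n_x), Q)
    y = H @ x_true + rng.multivariate_normal(np.zeros(2), R)
    x_a, P_a = kf_step(x_a, P_a, y, M, H, Q, R)
    truth_hist.append(x_true)
    analysis_hist.append(x_a)

truth_hist = np.array(truth_hist)        # shape (n_steps, n_x), ready for plt.imshow
analysis_hist = np.array(analysis_hist)
print("diag(P_a) after the last step:", np.round(np.diag(P_a), 3))
```

For Q6.4, calling `make_H([2, 3], n_x)` and rerunning the loop gives the clustered configuration; comparing `np.diag(P_a)` between the two runs shows where each sensor layout leaves the state poorly constrained.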

In [None]:
# Exercise 6: Your code here

n_x = 10  # Number of spatial points
alpha = 0.1  # Diffusion coefficient
n_steps = 30

# Q6.1: Set up the problem
# Create diffusion matrix M
# Create observation operator H for sensors at positions 3 and 7
# Define Q and R

# Q6.2: Implement Kalman filter

# Q6.3: Visualize results

---
## Summary Questions

After completing the exercises, answer these conceptual questions:

1. **What is the role of the Kalman gain $\mathbf{K}$?** How does it balance prior information and observations?

2. **How does the correlation structure in $\mathbf{B}$ affect the analysis?** Why is this important for spatial problems?

3. **What are the main differences between PBDW and the Kalman filter?**
   - Treatment of uncertainty
   - Treatment of observations
   - Time-dependence

4. **When would you prefer the Kalman filter over PBDW, and vice versa?**

5. **What are the computational challenges of the Kalman filter for large-scale problems?** (Hint: think about the dimension of $\mathbf{P}$)
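
To put numbers on the hint in question 5, the sketch below counts the memory needed to store a dense covariance matrix in double precision (8 bytes per entry). The helper name `cov_bytes` and the example state dimensions are assumptions chosen for illustration.

```python
def cov_bytes(n, bytes_per_entry=8):
    """Memory in bytes for a dense n x n covariance matrix (hypothetical helper)."""
    return bytes_per_entry * n * n


for n in (10, 10**3, 10**6):
    print(f"n = {n:>9,d}: dense P needs {cov_bytes(n):.3e} bytes")
```

Storage is only part of the cost: the forecast step $\mathbf{M}\mathbf{P}\mathbf{M}^\top$ also involves dense matrix products of the same dimension.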

In [None]:
# Space for your answers (as comments or markdown)