# EnKF on the Lorenz-63 system
Master 2 exercise: implement and analyze an Ensemble Kalman Filter (EnKF) on the Lorenz-63 model.
Goals: (i) code Lorenz-63 and visualize chaos, (ii) illustrate the butterfly effect, (iii) assimilate noisy observations with an EnKF when the forecast model is imperfect (coarse time-stepping, parameter bias), (iv) discuss the filter's behavior (spread, RMSE, inflation, partial observations).


## Model and parameters
Lorenz-63 ODE on $\mathbb{R}^3$:

$$\dot x = \sigma (y - x), \quad \dot y = x(\rho - z) - y, \quad \dot z = xy - \beta z.$$

Default: $\sigma = 10$, $\rho = 28$, $\beta = 8/3$. Domain: all of $\mathbb{R}^3$; for these parameters, trajectories are drawn onto the Lorenz attractor after a short transient.
Time step suggestion: $\Delta t = 0.01$ with RK4 for a trusted "truth".
Observations: full state with additive Gaussian noise $R = \sigma_o^2 I$ unless otherwise specified.
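
As a quick check of the equations above, the following minimal sketch evaluates the right-hand side at the two nontrivial equilibria $C_\pm = (\pm\sqrt{\beta(\rho-1)}, \pm\sqrt{\beta(\rho-1)}, \rho-1)$, where it should vanish. The names `lorenz63_rhs` and `nontrivial_equilibria` are illustrative and not part of the exercise skeleton.

```python
import numpy as np

def lorenz63_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 ODE (illustrative helper)."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def nontrivial_equilibria(rho=28.0, beta=8.0 / 3.0):
    """Equilibria C+ and C- (they exist for rho > 1)."""
    c = np.sqrt(beta * (rho - 1.0))
    return np.array([[c, c, rho - 1.0], [-c, -c, rho - 1.0]])

for eq in nontrivial_equilibria():
    # The residual should be zero up to floating-point rounding.
    print(eq, np.abs(lorenz63_rhs(eq)).max())
```

The origin is a third equilibrium; with the default parameters all three are unstable, which is why trajectories keep wandering on the attractor instead of settling.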


In [None]:
import numpy as np
import matplotlib.pyplot as plt

np.set_printoptions(precision=3, suppress=True)
rng = np.random.default_rng(0)
plt.style.use('seaborn-v0_8')


---
## Q1. Implement the model and integrators
Tasks:
- Implement `lorenz63(x, sigma, rho, beta)`.
- Implement `rk4_step` (reference integrator) and `euler_step` (coarse/degraded).
- Quick test: integrate with RK4 for 2000 steps (dt=0.01) from x0=[1,1,1] and plot the (x,z) projection.
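
Before moving to Lorenz-63, it can help to test the integrators on a problem with a known solution. The sketch below is one possible way to write the two steppers and check their convergence order on $\dot x = -x$. The `_demo` names, `decay`, and `final_error` are illustrative assumptions, chosen so they do not clash with the functions you are asked to write.

```python
import numpy as np

def euler_step_demo(f, x, dt, **kwargs):
    """One explicit Euler step (first-order accurate)."""
    return x + dt * f(x, **kwargs)

def rk4_step_demo(f, x, dt, **kwargs):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(x, **kwargs)
    k2 = f(x + 0.5 * dt * k1, **kwargs)
    k3 = f(x + 0.5 * dt * k2, **kwargs)
    k4 = f(x + dt * k3, **kwargs)
    return x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def decay(x, rate=1.0):
    """Test problem dx/dt = -rate * x, with exact solution x0 * exp(-rate * t)."""
    return -rate * x

def final_error(step, dt, t_end=1.0):
    """Absolute error at t_end when integrating decay from x = 1."""
    x = np.array([1.0])
    for _ in range(int(round(t_end / dt))):
        x = step(decay, x, dt)
    return abs(x[0] - np.exp(-t_end))

for name, step in [("Euler", euler_step_demo), ("RK4", rk4_step_demo)]:
    ratio = final_error(step, 0.1) / final_error(step, 0.05)
    # Halving dt should divide the error by about 2 (Euler) or 16 (RK4).
    print(name, ratio)
```

The same order gap is what makes Euler with a larger dt a convenient "degraded" forecast model later in the exercise.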


In [None]:
def lorenz63(x, sigma=10.0, rho=28.0, beta=8.0/3.0):
    raise NotImplementedError

def rk4_step(f, x, dt, **kwargs):
    raise NotImplementedError

def euler_step(f, x, dt, **kwargs):
    raise NotImplementedError



---
## Q2. Butterfly effect
Tasks:
- Integrate two trajectories with initial states separated by $10^{-6}$ in x (use RK4, dt=0.01).
- Plot the Euclidean distance vs time (semilogy).
- Change parameters (e.g., $\rho=20$) or use Euler with a larger dt to see the sensitivity to parameters and to numerical accuracy.

Hint: expect exponential separation in the chaotic regime; outside it (e.g., $\rho=20$, where trajectories eventually settle onto a stable fixed point) the separation no longer grows exponentially.
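
The twin-trajectory experiment can be sketched as follows. This is a self-contained illustration with its own compact helpers (`l63`, `rk4`, `separation` are hypothetical names). The fitting window $2 < t < 12$ is an arbitrary choice, made so that the fit ends before the distance saturates at the size of the attractor.

```python
import numpy as np

def l63(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Lorenz-63 right-hand side."""
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4(f, s, dt):
    """One classical RK4 step."""
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def separation(n_steps=3000, dt=0.01, eps=1e-6):
    """Distance between two RK4 trajectories whose initial x differs by eps."""
    a = np.array([1.0, 1.0, 1.0])
    b = a + np.array([eps, 0.0, 0.0])
    dist = np.empty(n_steps + 1)
    dist[0] = np.linalg.norm(a - b)
    for k in range(1, n_steps + 1):
        a, b = rk4(l63, a, dt), rk4(l63, b, dt)
        dist[k] = np.linalg.norm(a - b)
    return dist

dist = separation()
t = 0.01 * np.arange(dist.size)
# Slope of log(distance) over an early window: a rough growth-rate estimate.
window = (t > 2.0) & (t < 12.0)
growth_rate = np.polyfit(t[window], np.log(dist[window]), 1)[0]
print(growth_rate)  # in the notebook, plot with plt.semilogy(t, dist)
```

On the semilog plot, the roughly straight early segment is the exponential growth phase; the plateau that follows is saturation, not a loss of chaos.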


---
## Q3. Truth and observations
Create a "truth" trajectory with RK4, dt=0.01, parameters as above. Sample noisy observations every `obs_interval` steps with noise std `obs_noise_std`.
Question: how do sparser observations affect the analysis problem?
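
The observation-sampling part can be sketched independently of the model. The helper below (hypothetical name `sample_observations`) takes an already computed truth array and returns noisy full-state observations. It assumes the first observation is taken at step `obs_interval` rather than at step 0, and it uses a smooth stand-in curve instead of a Lorenz-63 trajectory.

```python
import numpy as np

def sample_observations(truth, obs_interval, obs_noise_std, rng):
    """Noisy full-state observations of every obs_interval-th row of truth.

    Returns (obs_times, obs), where obs_times are step indices into truth.
    """
    obs_times = np.arange(obs_interval, truth.shape[0], obs_interval)
    noise = rng.normal(0.0, obs_noise_std, size=(obs_times.size, truth.shape[1]))
    return obs_times, truth[obs_times] + noise

# Stand-in "truth" (a smooth curve, not Lorenz-63) just to exercise the function.
demo_rng = np.random.default_rng(1)
steps = np.arange(1000)
truth_demo = np.column_stack(
    [np.sin(0.01 * steps), np.cos(0.01 * steps), 0.001 * steps]
)
obs_times, obs = sample_observations(
    truth_demo, obs_interval=5, obs_noise_std=2.0, rng=demo_rng
)
print(obs_times[:3], obs.shape)
```

Returning step indices alongside the observations keeps the bookkeeping explicit when the filter later has to decide at which steps to run an analysis.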


In [None]:
def simulate_truth_obs(x0, n_steps, dt, obs_interval, obs_noise_std, params=None):
    raise NotImplementedError



---
## Q4. EnKF with a degraded forecast model
We now build a stochastic EnKF where the forecast model is imperfect: coarse integrator (Euler, larger dt) and biased parameter (e.g., $\rho=26$ instead of 28).
Tasks:
- Implement `enkf_lorenz` using a chosen forecast stepper (Euler or RK4) and forecast parameters.
- Use full-state observations (H=I); perturb observations with $\mathcal{N}(0,R)$.
- Track ensemble mean/spread and compute RMSE vs truth.
Questions: How does model imperfection show up? Does inflation help?
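
The analysis step with perturbed observations can be sketched on its own and checked against scalar Kalman theory. The function name `enkf_analysis` and the Gaussian test setup (prior $\mathcal{N}(0, 4I)$, $R = I$, observation $(2,2,2)$) are assumptions made for this illustration only.

```python
import numpy as np

def enkf_analysis(ensemble, y_obs, H, R, rng):
    """Stochastic EnKF analysis for an ensemble of shape (N_ens, n_state)."""
    n_ens = ensemble.shape[0]
    anomalies = ensemble - ensemble.mean(axis=0)      # (N_ens, n_state)
    P_f = anomalies.T @ anomalies / (n_ens - 1)       # sample forecast covariance
    S = H @ P_f @ H.T + R                             # innovation covariance
    K = np.linalg.solve(S, H @ P_f).T                 # K = P_f H^T S^{-1}
    # Each member assimilates its own perturbed copy of the observation.
    perturbed = y_obs + rng.multivariate_normal(np.zeros(len(y_obs)), R, size=n_ens)
    return ensemble + (perturbed - ensemble @ H.T) @ K.T

demo_rng = np.random.default_rng(2)
prior = demo_rng.normal(0.0, 2.0, size=(2000, 3))     # prior N(0, 4 I)
posterior = enkf_analysis(
    prior, np.array([2.0, 2.0, 2.0]), np.eye(3), np.eye(3), demo_rng
)
# Scalar Kalman theory for this setup: gain 0.8, mean 1.6, variance 0.8 per component.
print(posterior.mean(axis=0), posterior.var(axis=0, ddof=1))
```

With a large ensemble the sample statistics land close to the theoretical values; with 10 to 50 members, as in this exercise, expect visible sampling noise in both the gain and the posterior spread.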


In [None]:
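def enkf_lorenz(x0, dt, n_steps, obs, obs_times, R, N_ens=30, inflation=1.0,
                forecast_stepper=None, forecast_params=None, truth=None):
    """Stochastic EnKF with full-state observations (H = I).

    obs_times holds the step indices at which the rows of obs were taken.
    If truth (shape (n_steps, 3), on the same step grid as the filter) is
    given, rmse[k] is the RMSE of the ensemble mean against truth[k];
    otherwise rmse is filled with NaN.
    """
    H = np.eye(3)
    forecast_stepper = forecast_stepper or rk4_step
    forecast_params = forecast_params or {}
    # Initial ensemble: x0 plus Gaussian perturbations (std 2 per component).
    ensemble = x0 + rng.normal(0.0, 2.0, size=(N_ens, 3))
    mean = np.zeros((n_steps, 3))
    spread = np.zeros(n_steps)
    rmse = np.full(n_steps, np.nan)
    mean[0] = ensemble.mean(axis=0)
    spread[0] = ensemble.std(axis=0).mean()
    if truth is not None:
        rmse[0] = np.sqrt(np.mean((mean[0] - truth[0]) ** 2))
    obs_lookup = {int(t): y for t, y in zip(obs_times, obs)}

    for k in range(1, n_steps):
        # Forecast: propagate each member with the (possibly degraded) model.
        for i in range(N_ens):
            ensemble[i] = forecast_stepper(lorenz63, ensemble[i], dt, **forecast_params)

        # Multiplicative inflation about the ensemble mean (applied every step).
        if inflation != 1.0:
            m = ensemble.mean(axis=0)
            ensemble = m + inflation * (ensemble - m)

        # Analysis: only at steps where an observation is available.
        if k in obs_lookup:
            y_obs = obs_lookup[k]
            m_f = ensemble.mean(axis=0)
            A = (ensemble - m_f).T
            P_f = (A @ A.T) / (N_ens - 1)
            S = H @ P_f @ H.T + R
            K = (P_f @ H.T) @ np.linalg.inv(S)
            for i in range(N_ens):
                pert = rng.multivariate_normal(np.zeros(3), R)
                ensemble[i] = ensemble[i] + K @ (y_obs + pert - H @ ensemble[i])

        mean[k] = ensemble.mean(axis=0)
        spread[k] = ensemble.std(axis=0).mean()
        if truth is not None:
            rmse[k] = np.sqrt(np.mean((mean[k] - truth[k]) ** 2))
    return mean, spread, rmse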


---
## Q5. Experiments and analysis
Run the following and discuss (plots + short comments):
1. Baseline: truth (RK4, dt=0.01, rho=28), obs_interval=5, obs_noise_std=2.0; EnKF with Euler forecast dt=0.02 and rho=26; N_ens=40; inflation=1.02. Plot truth vs mean (x component) and spread; compute RMSE.
2. Effect of inflation: vary inflation in [1.0, 1.1].
3. Effect of ensemble size: N_ens = 10 vs 50.
4. Partial observations: observe only x (adjust H) and comment on performance.

Hints: without assimilation, model error should be visible as bias/drift; well-tuned inflation counteracts ensemble collapse; partial observations increase spread and degrade the unobserved components.
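
For the partial-observation experiment, the sketch below shows how the gain changes when only x is observed. The forecast covariance is made up for illustration (x and y correlated, z uncorrelated with both), so the numbers are not Lorenz-63 statistics. The analysis covariance uses the standard Kalman filter formula $P_a = (I - KH)P_f$.

```python
import numpy as np

# Illustrative forecast covariance: x and y correlated, z uncorrelated with both.
P_f = np.array([[4.0, 3.0, 0.0],
                [3.0, 4.0, 0.0],
                [0.0, 0.0, 4.0]])
H = np.array([[1.0, 0.0, 0.0]])      # observe x only
R = np.array([[1.0]])                # observation-error variance (sigma_o = 1)

S = H @ P_f @ H.T + R                # innovation covariance, shape (1, 1)
K = P_f @ H.T @ np.linalg.inv(S)     # Kalman gain, shape (3, 1)
P_a = (np.eye(3) - K @ H) @ P_f      # analysis covariance

print(K.ravel())       # approximately [0.8, 0.6, 0.0]
print(np.diag(P_a))    # approximately [0.8, 2.2, 4.0]
```

An unobserved component is corrected only through its forecast covariance with the observed one: here y is updated and its variance shrinks, while z is left untouched.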


---
## Q6. Further questions (short answers)
- When is inflation too large? What symptom do you see?
- How would you augment the state to estimate $\rho$ online? What would you expect?
- In high-dimensional systems we localize covariances; why is localization unnecessary here?
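
As a starting point for the state-augmentation question, here is a minimal sketch of an augmented forecast step and the matching observation operator. It assumes a persistence model for the parameter ($\rho_{k+1} = \rho_k$) and uses Euler for brevity; `augmented_euler_step` and `H_aug` are hypothetical names.

```python
import numpy as np

def augmented_euler_step(w, dt, sigma=10.0, beta=8.0 / 3.0):
    """Euler step for w = [x, y, z, rho]; rho is carried along unchanged."""
    x, y, z, rho = w
    dxyz = np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])
    return np.concatenate([w[:3] + dt * dxyz, [rho]])

# Observation operator for the augmented state: observe x, y, z but not rho.
H_aug = np.hstack([np.eye(3), np.zeros((3, 1))])

w0 = np.array([1.0, 1.0, 1.0, 26.0])
w1 = augmented_euler_step(w0, dt=0.01)
print(w1, H_aug @ w1)
```

Under this persistence assumption the forecast never changes $\rho$, so each member's value can move only at analysis time, through the sample covariance between $\rho$ and the observed components.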


---
## Deliverables
- Plots for Q2, Q3 (obs vs truth), Q5 (mean vs truth, RMSE vs spread).
- Short comments on each question.
- Code cells filled.
