# Data assimilation (DA) & the ensemble Kalman filter (EnKF)
by Patrick N. Raanes

### Data assimilation (DA) is:
<figure style="float:right;width:350px;">
    <img src="./resources/DA_bridges.jpg" alt='DA "bridges" data and models.'/>
    <figcaption>Data assimilation "bridges" data and models.<br>Attribution: Data Assimilation Research Team, <a href="http://www.aics.riken.jp">www.aics.riken.jp</a>.</figcaption>
</figure>
 * the calibration of big models with big data;
 * the fusing of forecasts with observations.
 
The problem of DA fits well within the frameworks of "state estimation" and "sequential inference". A concise overview of DA is given by Wikle and Berliner: [A Bayesian tutorial for data assimilation](http://web-static-aws.seas.harvard.edu/climate/pdf/2007/Wikle_Berliner_InPress.pdf).

Modern DA builds on state estimation techniques such as the Kalman filter (KF), which is a recursive least-squares regression algorithm. An early and famous application was navigation for the Apollo missions to the moon, but it also has applications outside of control systems, such as speech recognition, video tracking, and finance. An [introduction by pictures](http://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/) is provided by Tim Babb. An [interactive tutorial](https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python) has been made by Roger Labbe.
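
To make the "recursive least-squares" description concrete, here is a minimal sketch of a scalar KF estimating a constant from noisy, direct observations. The function `kf_update` and all numbers (`truth`, `obs_var`, the prior) are assumptions made for illustration; they are not taken from the tutorial's resources.

```python
import numpy as np

def kf_update(mean, var, obs, obs_var):
    """One KF analysis step for a scalar state that is observed directly."""
    gain = var / (var + obs_var)           # Kalman gain, always between 0 and 1
    new_mean = mean + gain * (obs - mean)  # pull the estimate towards the observation
    new_var = (1 - gain) * var             # the uncertainty shrinks with every update
    return new_mean, new_var

rng = np.random.default_rng(0)
truth, obs_var = 3.0, 0.5**2   # hypothetical true value and observation error variance
mean, var = 0.0, 10.0**2       # deliberately vague prior

for _ in range(50):
    obs = truth + np.sqrt(obs_var) * rng.standard_normal()
    mean, var = kf_update(mean, var, obs, obs_var)

print(f"estimate = {mean:.2f}, std. dev. = {np.sqrt(var):.2f}")
```

Each observation is processed once and then discarded; only `mean` and `var` are carried forward. For a constant state and a vague prior, the estimate stays close to the running average of the observations, which is the least-squares fit.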

When it was first proposed to apply the KF to DA (specifically, weather forecasting), the idea sounded ludicrous because of some severe challenges:

#### Technical challenges in DA (vs. "classic" state estimation):
 * size of data and models;
 * nonlinearity of models;
 * sparsity and inhomogeneity of data.

Some of these challenges may be recognized in the video below.

*Execute/run the cells (in order) to bring up the video*

In [None]:
from resources.resources import *

In [None]:
envisat_video()

### The EnKF is
an ensemble (Monte-Carlo) formulation of the KF
that manages (fairly well) to deal with the above challenges of DA.

For those familiar with the method of 4D-Var, **further advantages of the EnKF** include it being:
 * Non-invasive: the models are treated as black boxes, and no explicit Jacobian is required.
 * Bayesian: 
   * provides an ensemble of possible realities;
       - arguably the most practical form of "uncertainty quantification";
       - an ideal way to initialize "ensemble forecasts";
   * uses "flow-dependent" background covariances in the analysis.
 * Highly parallelizable:
   * distributed across realizations for model forecasting;
   * distributed across local domains for observation analysis.
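
The points above can be illustrated with a rough sketch of one forecast/analysis cycle of a stochastic (perturbed-observation) EnKF. The toy `model`, the helper `enkf_analysis`, the observation operator `H`, and all sizes and numbers are hypothetical choices made for this illustration, not the implementation used later in the tutorial.

```python
import numpy as np

rng = np.random.default_rng(42)

def model(x):
    """Hypothetical black-box forecast model (a nonlinear toy map)."""
    return x + 0.1 * np.sin(x)

def enkf_analysis(E, y, H, R, rng):
    """Stochastic EnKF update.

    E: (N, M) ensemble, one member per row.  y: (p,) observation.
    H: (p, M) linear observation operator.   R: (p, p) obs. error covariance.
    """
    N = E.shape[0]
    A = E - E.mean(axis=0)                # ensemble anomalies
    Y = A @ H.T                           # anomalies in observation space
    C_xy = A.T @ Y / (N - 1)              # state-observation cross-covariance
    C_yy = Y.T @ Y / (N - 1) + R          # innovation covariance
    K = np.linalg.solve(C_yy, C_xy.T).T   # ensemble estimate of the Kalman gain
    # Each member assimilates its own randomly perturbed copy of the observation.
    D = y + rng.multivariate_normal(np.zeros(len(y)), R, size=N)
    return E + (D - E @ H.T) @ K.T

M, N = 3, 100                             # state dimension and ensemble size
H = np.array([[1.0, 0.0, 0.0]])           # observe only the first state component
R = np.array([[0.1]])
E0 = 2 * rng.standard_normal((N, M))      # initial ensemble
# Forecast: the model is only ever *called*, once per member, so no Jacobian
# is needed, and the loop could be distributed across processes.
E = np.array([model(x) for x in E0])
y = np.array([1.0])
Ea = enkf_analysis(E, y, H, R, rng)
```

The covariances are estimated from the current forecast ensemble rather than prescribed in advance, which is what "flow-dependent" refers to. The analysis ensemble `Ea` is itself a set of possible realities that can be fed straight into the next forecast.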
   
The rest of this tutorial provides an EnKF-centric presentation of DA; it also has a [theoretical companion](./resources/DA_intro.pdf).

---
**Exc:** Word association.
Fill in the `X`'s in the table to group the words according to meaning.

`Filtering, Sample, Random, measurements, Kalman filter (KF), Monte-Carlo, observations, State estimation, Data fusion`

---
`Data Assimilation (DA)     Ensemble    Stochastic     data        
X                          X           X              X           
X                                      X              X           
X                          
X`

In [None]:
#show_answer('thesaurus 1')

* "The answer" is given from the perspective of DA. Do you agree with it?
* Can you describe the (important!) nuances between the similar words?

---
**Exc:** Word association (advanced).
Group these words:

`Inverse problems, Sample point, Probability, Sequential, Inversion, Realization, Relative frequency, Iterative, Estimation, Single draw, Serial, Approximation, Regression, Fitting`

---

`Statistical inference    Ensemble member     Quantitative belief    Recursive 
X                        X                   X                      X         
X                        X                   X                      X         
X                        X                                          X         
X                        
X                        
X                        
`
          



In [None]:
#show_answer('thesaurus 2')

### Next: [Bayesian inference](T2 - Bayesian inference.ipynb)