# Data assimilation (DA) & the ensemble Kalman filter (EnKF)
by Patrick N. Raanes

In [1]:
from resources import *


#### Data assimilation is:
<figure style="float:right;width:350px;">
    <img src="./DA_bridges.jpg" alt='DA "bridges" data and models.'/>
    <figcaption>Data assimilation "bridges" data and models.<br>Attribution: Data Assimilation Research Team, <a href="http://www.aics.riken.jp">www.aics.riken.jp</a>.</figcaption>
</figure>
 * the calibration/fusion of big models with big data;
 * the process of combining model forecasts with observational data;
 * the set of techniques specialized for sequential inference.
 
A concise overview of DA is given by Wikle and Berliner: [A Bayesian tutorial for data assimilation (DA)](http://web-static-aws.seas.harvard.edu/climate/pdf/2007/Wikle_Berliner_InPress.pdf)

Modern DA builds on "state estimation" techniques such as the Kalman filter (KF), which is a recursive least-squares regression algorithm. An early, famous application was navigating the Apollo spacecraft to the moon, but it also has applications outside of control systems, such as speech recognition, video tracking, and finance. An [introduction by pictures](http://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/) is provided by Tim Babb. An [interactive tutorial](https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python) has been made by Roger Labbe.
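
To make the "recursive least-squares" description concrete, here is a minimal sketch (not taken from DAPPER; the function name `kf_constant` and all numbers are illustrative assumptions). It runs a scalar KF on noisy observations of a constant. With a very uncertain prior, the recursive estimate should essentially coincide with the sample mean, which is the least-squares estimate of a constant.

```python
import numpy as np

def kf_constant(ys, r, x0=0.0, p0=1e6):
    """Scalar KF for a constant state, observed directly with noise variance r.

    x0, p0: prior mean and variance (p0 is large, i.e. nearly uninformative).
    Hypothetical helper for illustration only.
    """
    x, p = x0, p0
    for y in ys:
        k = p / (p + r)      # Kalman gain: weight given to the new observation
        x = x + k * (y - x)  # update the mean with the innovation (y - x)
        p = (1 - k) * p      # update (shrink) the variance
    return x, p

rng = np.random.default_rng(0)
ys = 3.0 + rng.normal(size=50)   # truth = 3, unit-variance observation noise

x_kf, p_kf = kf_constant(ys, r=1.0)
print("KF estimate :", x_kf)
print("Sample mean :", ys.mean())
print("KF variance :", p_kf, "(compare with r/n =", 1.0 / len(ys), ")")
```

Each observation is processed once and then discarded; only the current mean and variance are carried forward, which is what "recursive" refers to here.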

The problem of DA fits well within the framework of state estimation.
However, anecdotally, when it was first proposed to apply the KF to DA (specifically, weather forecasting), the idea sounded ludicrous because of some severe challenges:

#### Technical challenges in DA (vs. "classic" state estimation):
 * size of data and models;
 * nonlinearity of models;
 * sparsity and inhomogeneity of data.

Some of these challenges may be recognized in the video below.

In [2]:
envisat_video()

#### The EnKF is
an ensemble (Monte-Carlo) formulation of the KF
that manages (fairly well) to deal with the above challenges of DA.

In addition, its advantages (vs. 4D-Var) include that it is
 * Non-invasive: the models are treated as black boxes, and no explicit Jacobian is required.
 * Bayesian:
   * provides an ensemble of possible realities;
   * uses "flow-dependent" background covariances.
 * Highly parallelizable:
   * distributed across realizations for model forecasting;
   * distributed across local domains for observation analysis.
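
As a rough illustration of the points above, the following is a minimal sketch of one forecast-analysis cycle of a stochastic (perturbed-observation) EnKF. It is not the DAPPER implementation: the names `model` and `enkf_analysis`, the toy dynamics, the linear observation operator `H`, and all sizes and numbers are assumptions made for this example. Note that the forecast only ever calls `model` on individual members (black box, no Jacobian), and that the covariances are estimated from the current ensemble ("flow-dependent").

```python
import numpy as np

rng = np.random.default_rng(42)

def model(x):
    """Hypothetical black-box forecast model (a mildly nonlinear toy map)."""
    return x + 0.1 * np.sin(x)

def enkf_analysis(E, y, H, R, rng):
    """Stochastic EnKF analysis step (illustrative sketch).

    E: (N, M) ensemble, y: (P,) observation,
    H: (P, M) linear observation operator, R: (P, P) obs. error covariance.
    """
    N = E.shape[0]
    A = E - E.mean(axis=0)                 # state anomalies
    Y = A @ H.T                            # observed anomalies
    C_yy = Y.T @ Y / (N - 1) + R           # innovation covariance estimate
    C_xy = A.T @ Y / (N - 1)               # state-obs cross-covariance estimate
    K = np.linalg.solve(C_yy, C_xy.T).T    # gain K = C_xy @ inv(C_yy)
    D = rng.multivariate_normal(np.zeros(len(y)), R, size=N)  # obs perturbations
    return E + (y + D - E @ H.T) @ K.T     # update every member

N, M = 100, 3                              # ensemble size, state dimension
E0 = rng.normal(loc=0.0, scale=2.0, size=(N, M))

# Forecast: each member is propagated independently (trivially parallel).
Ef = np.array([model(x) for x in E0])

# Analysis: observe only the first state component.
H = np.array([[1.0, 0.0, 0.0]])
R = np.array([[0.1]])
y = np.array([1.5])
Ea = enkf_analysis(Ef, y, H, R, rng)

print("Forecast mean/var of observed component:", Ef[:, 0].mean(), Ef[:, 0].var())
print("Analysis mean/var of observed component:", Ea[:, 0].mean(), Ea[:, 0].var())
```

The analysis ensemble should be pulled towards the observation, with reduced spread in the observed component. The localized, per-domain analysis mentioned in the list is not shown here.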
   
The rest of this tutorial provides an EnKF-centric presentation of DA; it also has a [theoretical companion](./DA_intro.pdf).

**Exc:** Word association.
Group the following words according to meaning.

`measurements, Data Assimilation (DA), data, Estimation, Fitting, State estimation, Recursive, Approximation, Random, observations, Statistical inference, Sequential, Kalman filter (KF), Serial, Inverse problems, Inversion, Monte-Carlo, Iterative, Ensemble, Data fusion, Filtering, Stochastic, Sample, Regression`

In [3]:
show_answer('thesaurus')

"The answer" is given from the perspective of DA. Do you agree with it?

Can you describe the (important!) nuances between the similar words?

## Next: [Bayesian inference](T2 - Bayesian inference.ipynb)