# Kalman Filter Simple

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from meteo_imp.data import hai, units
from meteo_imp.data_preparation import MeteoDataTest
from meteo_imp.old.kalman.imputation import KalmanImputation
from meteo_imp.old.kalman.model import *
from ipywidgets import interact, interact_manual, IntSlider

In [None]:
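def gap2res(var_sel, gap_len, gap_start, model, n_iter):
    # create an artificial gap in the selected variables of the Hainich data
    data = MeteoDataTest(hai).add_gap(gap_len, var_sel, gap_start)
    # fit the Kalman model for `n_iter` iterations and collect the results
    return (KalmanImputation(data.data, model)
            .fit(n_iter=n_iter)
            .to_result(data.data_compl_tidy, var_names=data.data.columns,
                       units=units, pred_all=True))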

## Introduction

The model uses a latent state variable $x$, modelled over time, to impute gaps in the observations $y$.

### Equations

The equations of the model are:

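$$\begin{align} p(x_t | x_{t-1}) & = \mathcal{N}(Ax_{t-1}, Q) \\
p(y_t | x_t) & = \mathcal{N}(Hx_t, R) \end{align}$$

where $A$ is the transition matrix, $Q$ the process noise covariance, $H$ the observation matrix and $R$ the observation noise covariance.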
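To make the two equations concrete, here is a minimal NumPy sketch that draws one trajectory from the model. The function `simulate` and its arguments are hypothetical illustrations, not part of `meteo_imp`; `x0` is assumed to be the state just before the first time step.

```python
import numpy as np

def simulate(A, H, Q, R, x0, n_steps, seed=0):
    """Draw states x_t and observations y_t from the linear Gaussian model."""
    rng = np.random.default_rng(seed)
    k, m = A.shape[0], H.shape[0]
    xs = np.empty((n_steps, k))
    ys = np.empty((n_steps, m))
    x = np.asarray(x0, dtype=float)
    for t in range(n_steps):
        # x_t ~ N(A x_{t-1}, Q)
        x = A @ x + rng.multivariate_normal(np.zeros(k), Q)
        # y_t ~ N(H x_t, R)
        ys[t] = H @ x + rng.multivariate_normal(np.zeros(m), R)
        xs[t] = x
    return xs, ys

# example: two independent random walks observed with small noise
xs, ys = simulate(np.eye(2), np.eye(2), 0.1 * np.eye(2), 0.01 * np.eye(2),
                  x0=np.zeros(2), n_steps=50)
```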

The Kalman filter and smoother consist of 3 steps:

- predict (estimate the state at time $t$ using the observations up to time $t-1$)
- update (correct the state at time $t$ using the observation at time $t$)
- smooth (backward pass that refines the state at time $t$ using the observations after time $t$)

In case of missing data the update step is skipped.
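The predict/update loop, including the skipped update for missing data, can be sketched in NumPy as below. This is an illustrative implementation under the assumption that a time step containing any NaN is treated as fully missing (a simplification: a real multivariate filter could still use the observed components); `kalman_filter`, `x0` and `P0` are hypothetical names, not the `meteo_imp` API.

```python
import numpy as np

def kalman_filter(y, A, H, Q, R, x0, P0):
    """Kalman filter; rows of y that contain NaN are treated as missing."""
    n, k = y.shape[0], A.shape[0]
    x_f, P_f = np.empty((n, k)), np.empty((n, k, k))
    x, P = np.asarray(x0, dtype=float), np.asarray(P0, dtype=float)
    for t in range(n):
        # predict: p(x_t | y_1..y_{t-1})
        x = A @ x
        P = A @ P @ A.T + Q
        # update: only when y_t is observed, otherwise skipped
        if not np.isnan(y[t]).any():
            S = H @ P @ H.T + R              # innovation covariance
            K = np.linalg.solve(S, H @ P).T  # Kalman gain P H^T S^-1
            x = x + K @ (y[t] - H @ x)
            P = P - K @ H @ P
        x_f[t], P_f[t] = x, P
    return x_f, P_f

# scalar local level model with a 2-step gap
y = np.array([[1.0], [np.nan], [np.nan], [2.0]])
one = np.eye(1)
x_f, P_f = kalman_filter(y, one, one, 0.1 * one, 0.01 * one, x0=[0.0], P0=one)
```

During the gap the filtered mean stays constant while its variance grows by $Q$ at each step.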

After smoothing the whole dataset, the missing data ($y_t$) can be imputed from the smoothed state $x^s_t$, with covariance $P^s_t$, using:

$$p(y_t) = \mathcal{N}(Hx^s_t, R + HP^s_tH^\top)$$
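A minimal sketch of the full pipeline (filter, Rauch-Tung-Striebel backward smoothing pass, then imputation with the formula above) could look like this. It makes the same simplifying assumption about missing rows as a plain filter would, and `kalman_smoother` and `impute` are hypothetical helpers for illustration only.

```python
import numpy as np

def kalman_smoother(y, A, H, Q, R, x0, P0):
    """Forward Kalman filter followed by a Rauch-Tung-Striebel backward pass."""
    n, k = y.shape[0], A.shape[0]
    x_p, P_p = np.empty((n, k)), np.empty((n, k, k))
    x_f, P_f = np.empty((n, k)), np.empty((n, k, k))
    x, P = np.asarray(x0, dtype=float), np.asarray(P0, dtype=float)
    for t in range(n):
        x, P = A @ x, A @ P @ A.T + Q          # predict
        x_p[t], P_p[t] = x, P
        if not np.isnan(y[t]).any():           # update (skipped if missing)
            S = H @ P @ H.T + R
            K = np.linalg.solve(S, H @ P).T
            x, P = x + K @ (y[t] - H @ x), P - K @ H @ P
        x_f[t], P_f[t] = x, P
    # smooth: refine each state with the information from later time steps
    x_s, P_s = x_f.copy(), P_f.copy()
    for t in range(n - 2, -1, -1):
        J = P_f[t] @ A.T @ np.linalg.inv(P_p[t + 1])
        x_s[t] = x_f[t] + J @ (x_s[t + 1] - x_p[t + 1])
        P_s[t] = P_f[t] + J @ (P_s[t + 1] - P_p[t + 1]) @ J.T
    return x_s, P_s

def impute(x_s, P_s, H, R):
    """Mean H x^s_t and covariance R + H P^s_t H^T of p(y_t) for every t."""
    mean = x_s @ H.T
    cov = R + H @ P_s @ H.T  # matmul broadcasts over the time axis
    return mean, cov

# scalar local level model, almost noise-free observations around a 3-step gap
one = np.eye(1)
y = np.array([[0.0], [np.nan], [np.nan], [np.nan], [4.0]])
x_s, P_s = kalman_smoother(y, one, one, one, 1e-8 * one, x0=[0.0], P0=1e6 * one)
mean, cov = impute(x_s, P_s, one, 1e-8 * one)
```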

## Local Level Model

A local level model is a Kalman filter model where the transition matrix (`A`) and the observation matrix (`H`) are identity matrices. This means that the state has the same dimension as the observations, each observation is the state plus observation noise, and the state changes only through the process noise (a random walk).
$$A = I$$
$$H = I$$

In [None]:
#| include: false
@interact_manual(TA=True, SW_IN=True, VPD=True,
                 gap_len=IntSlider(value=10, min=1, max=100, continuous_update=False),
                 gap_start=IntSlider(value=30, min=1, max=100),
                 n_iter=IntSlider(value=10, min=5, max=15))
def show_diff_gaps_res(TA, SW_IN, VPD, gap_len, gap_start, n_iter):
    # collect the names of the variables selected with the checkboxes
    selected = {'TA': TA, 'SW_IN': SW_IN, 'VPD': VPD}
    var_sel = tuple(name for name, is_sel in selected.items() if is_sel)
    gap2res(var_sel, gap_len, gap_start, LocalLevelModel, n_iter).display_results()
    


### Comments

The model essentially produces a linear interpolation between the last observation before the gap and the first observation after it. This works reasonably well for short gaps, but not for long ones.

As expected, the error increases with the distance from the observations.
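Both observations can be checked with a worked formula. For a scalar random walk whose values at the gap boundaries are known exactly (an idealisation: real boundary observations carry noise $R$ and the smoother also uses the data further away), conditioning on both ends gives a linear mean and a variance $Q\,k(n-k)/n$ that peaks in the middle of the gap. The helper `local_level_gap` below is hypothetical and only illustrates this limit case.

```python
import numpy as np

def local_level_gap(y_before, y_after, gap_len, Q, R):
    """Predicted mean and variance of y inside a gap of a scalar local level model,
    assuming noise-free observations right before and right after the gap."""
    n = gap_len + 1                 # steps between the two boundary observations
    k = np.arange(1, n)             # positions inside the gap
    mean = y_before + k / n * (y_after - y_before)  # linear interpolation
    var = Q * k * (n - k) / n + R   # random-walk bridge variance plus observation noise
    return mean, var

mean, var = local_level_gap(y_before=0.0, y_after=4.0, gap_len=3, Q=1.0, R=0.0)
```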

## Local Slope Model 

The local slope model (also known as the local linear trend model) extends the local level model by also keeping track of the slope in the state variable.

The transition matrix (`A`) is:

$$A = \left[\begin{array}{cc}I & I \\ 0 & I\end{array}\right]$$

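The state is $x \in \mathbb{R}^{2N}$, where $N$ is the number of observed variables: the upper half keeps track of the level and the lower half of the slope, so $A \in \mathbb{R}^{2N \times 2N}$.

Hence, the observation matrix (`H`), which selects only the level, is:

$$H = \left[\begin{array}{cc}I & 0 \end{array}\right]$$

with $H \in \mathbb{R}^{N \times 2N}$. In the univariate case ($N = 1$) each $I$ is simply the scalar 1; in the multivariate case it is the $N \times N$ identity matrix.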
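The block matrices above can be built with NumPy as in the following sketch; `local_slope_matrices` is a hypothetical helper, not the construction used inside `LocalSlopeModel`.

```python
import numpy as np

def local_slope_matrices(n_vars):
    """Transition and observation matrices of a local slope model with n_vars variables."""
    I = np.eye(n_vars)
    Z = np.zeros((n_vars, n_vars))
    # level_t = level_{t-1} + slope_{t-1};  slope_t = slope_{t-1}
    A = np.block([[I, I], [Z, I]])
    # only the level is observed
    H = np.hstack([I, Z])
    return A, H

A, H = local_slope_matrices(3)
x = np.array([1.0, 2.0, 3.0, 0.5, 0.5, 0.5])  # levels followed by slopes
y_next = H @ A @ x                              # level after one step
```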


In [None]:
#| include: false
@interact_manual(TA=True, SW_IN=True, VPD=True,
                 gap_len=IntSlider(value=11, min=1, max=100),
                 gap_start=IntSlider(value=63, min=1, max=100),
                 n_iter=(10, 15))
def show_diff_gaps_res(TA, SW_IN, VPD, gap_len, gap_start, n_iter):
    # collect the names of the variables selected with the checkboxes
    selected = {'TA': TA, 'SW_IN': SW_IN, 'VPD': VPD}
    var_sel = tuple(name for name, is_sel in selected.items() if is_sel)
    gap2res(var_sel, gap_len, gap_start, LocalSlopeModel, n_iter).display_results()
    


### Comments

Adding the slope helps the model follow the pattern in the data, but it is still not enough for long gaps.

The average gap length over all short gaps (fewer than 200 missing observations) is:

| Variable | Average gap length (observations) |
|:---------|----------------------------------:|
| TA       | 11.3392                           |
| VPD      | 9.64244                           |
| SW_IN    | 5.72689                           |
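A statistic like this can be computed from the lengths of runs of consecutive missing values. The sketch below is one possible way to do it with NumPy; it is not necessarily how the table above was produced, and `gap_lengths` and `mean_short_gap_len` are hypothetical helpers.

```python
import numpy as np

def gap_lengths(values):
    """Lengths of the runs of consecutive NaNs in a 1-D array."""
    missing = np.isnan(np.asarray(values, dtype=float)).astype(int)
    # pad with zeros so every run has a start (+1) and an end (-1) in the diff
    diff = np.diff(np.concatenate(([0], missing, [0])))
    starts = np.flatnonzero(diff == 1)
    ends = np.flatnonzero(diff == -1)
    return ends - starts

def mean_short_gap_len(values, max_len=200):
    """Average length of the gaps shorter than max_len observations."""
    lens = gap_lengths(values)
    short = lens[lens < max_len]
    return short.mean() if short.size else float("nan")

x = [1.0, np.nan, np.nan, 3.0, np.nan, 5.0, np.nan, np.nan, np.nan]
lens = gap_lengths(x)
```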

## Next Steps

The way forward consists of three parallel directions:

- implement Factor Analysis (to take into account the correlation between variables) combined with a conditional Gaussian
- add ERA5 data
- use a more complex state

Implementation:

- solve the issue with negative covariances
- consider re-implementing EM with gradient descent
- parameter initialization


## ERA Data

ERA5 data has an hourly frequency, while our meteorological data is half-hourly.
Since we are interested in the change of the state, the idea is to numerically compute the slope of the ERA5 observations.

Let $e_t$ be the ERA5 observation at time $t$ (in half-hour steps); ERA5 observations are available only for odd values of $t$.

We compute the slope of $e$ as:

$$s_t = \begin{cases}
 (e_{t+1} - e_{t-1})/2 & \text{if } t \bmod 2 = 0 \\
 (e_{t} - e_{t-2})/2 & \text{if } t \bmod 2 = 1
\end{cases}$$

In both cases only ERA5 values at odd times are used, and $s_t$ is the slope per half-hour step.
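The slope formula can be sketched in NumPy as follows, assuming the ERA5 series has been placed on the half-hourly grid with NaN at the even time steps. `era_slope` is a hypothetical helper; time steps near the start or end whose required neighbours do not exist are left as NaN.

```python
import numpy as np

def era_slope(e):
    """Slope per half-hour step of ERA5 values that are defined only at odd t."""
    e = np.asarray(e, dtype=float)
    s = np.full_like(e, np.nan)
    t = np.arange(len(e))
    # even t: central difference between the two neighbouring ERA5 values
    even = t[(t % 2 == 0) & (t >= 1) & (t + 1 < len(e))]
    s[even] = (e[even + 1] - e[even - 1]) / 2
    # odd t: backward difference with the previous ERA5 value
    odd = t[(t % 2 == 1) & (t >= 2)]
    s[odd] = (e[odd] - e[odd - 2]) / 2
    return s

# linear signal e_t = 2t observed only at odd t
e = np.array([np.nan, 2.0, np.nan, 6.0, np.nan, 10.0, np.nan, 14.0])
s = era_slope(e)
```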

Then the state transition equation becomes:

$$p(x_t | x_{t-1}) = \mathcal{N}(Ax_{t-1} + Bs_t , Q)$$

where:
- $B \in \mathbb{R}^{k \times k}$ is the control input matrix
- $x_t \in \mathbb{R}^k$ and $s_t \in \mathbb{R}^k$
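With the control input, only the predict step of the Kalman filter changes. A minimal sketch, assuming $s_t$ is treated as a known, noise-free input (`predict_with_control` is a hypothetical name):

```python
import numpy as np

def predict_with_control(x, P, A, B, Q, s_t):
    """Kalman predict step for p(x_t | x_{t-1}) = N(A x_{t-1} + B s_t, Q)."""
    x_pred = A @ x + B @ s_t
    # s_t is treated as known, so it shifts the mean but adds no variance
    P_pred = A @ P @ A.T + Q
    return x_pred, P_pred

x_pred, P_pred = predict_with_control(
    x=np.array([1.0, 2.0]), P=np.eye(2), A=np.eye(2), B=np.eye(2),
    Q=0.1 * np.eye(2), s_t=np.array([0.5, -1.0]))
```

If $s_t$ were instead modelled as uncertain, $s_t \sim \mathcal{N}(\hat{s}_t, \Sigma_s)$ independent of the state, the predicted covariance would gain the extra term $B \Sigma_s B^\top$.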

Possible additions:

- if $H \ne I$, the ERA5 observations need to be transformed into the latent space using $H^{-1}$ (which requires $H$ to be invertible)
- include the uncertainty of the ERA5 measurements
- add a bias term for the ERA5 slope