In [None]:
from resources.workspace import *
from IPython.display import display
from scipy.integrate import odeint
import copy

%matplotlib inline

# Lyapunov vectors and ensemble based covariances

We return now to the discussion of perturbations in nonlinear models.  Recall the equations for the evolution of perturbations.  For a small perturbation <span style='font-size:1.25em'> $\boldsymbol{\delta}$</span> of the control trajectory, we may define *linear* dynamics, generated by the Jacobian equation,
<h2>$$\begin{align}
\frac{{\rm d} \boldsymbol{\delta}}{{\rm d} t} = \nabla f_{\rvert_{\mathbf{x}^c}} \boldsymbol{\delta},
\end{align}$$</h2>
computed along some control trajectory <span style='font-size:1.25em'> $\mathbf{x}^c(t)$</span>.  This system of equations is known as the <a href="http://glossary.ametsoc.org/wiki/Tangent_linear_model" target="blank"><b>tangent-linear model</b></a>.  The tangent space can be understood as the <b>space of perturbations</b> at a point (or along a trajectory).  It can be used to define a vector field on a "space" describing the time evolution of trajectories. 

<div style='width:900px'>
<img src="./resources/Tangentialvektor.svg">
</div>

<b>By derivative work: McSush (talk); Tangentialvektor.png: TN. The original uploader was TN at German Wikipedia (Tangentialvektor.png) [Public domain], <a href="https://commons.wikimedia.org/wiki/File:Tangentialvektor.svg">via Wikimedia Commons</a></b>
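The claim that the tangent-linear model describes the evolution of small perturbations can be checked numerically.  The following is a minimal sketch, assuming the Lorenz-63 model with the standard parameters; the helper names `f`, `jac`, `augmented` and `rk4` are illustrative and not part of the tutorial's workspace.  It integrates a control trajectory together with a perturbation under the Jacobian, and compares the result with the difference of two nearby nonlinear trajectories.

```python
import numpy as np

SIGMA, BETA, RHO = 10.0, 8.0 / 3.0, 28.0

def f(x):
    """Lorenz-63 vector field."""
    return np.array([
        SIGMA * (x[1] - x[0]),
        x[0] * (RHO - x[2]) - x[1],
        x[0] * x[1] - BETA * x[2],
    ])

def jac(x):
    """Jacobian of the Lorenz-63 vector field at the state x."""
    return np.array([
        [-SIGMA, SIGMA, 0.0],
        [RHO - x[2], -1.0, -x[0]],
        [x[1], x[0], -BETA],
    ])

def augmented(u):
    """Joint dynamics of the state u[:3] and the perturbation u[3:]."""
    x, delta = u[:3], u[3:]
    return np.concatenate([f(x), jac(x) @ delta])

def rk4(g, u, h):
    """One RK-4 step of size h for the autonomous system du/dt = g(u)."""
    k1 = g(u)
    k2 = g(u + 0.5 * h * k1)
    k3 = g(u + 0.5 * h * k2)
    k4 = g(u + h * k3)
    return u + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

x_c = np.array([-6.1, 1.2, 32.5])               # control initial condition
delta_0 = 1e-6 * np.array([1.0, 0.0, 0.0])      # small initial perturbation
u = np.concatenate([x_c, delta_0])              # control + linear perturbation
x_p = x_c + delta_0                             # perturbed nonlinear trajectory

h, n_steps = 0.001, 100
for _ in range(n_steps):
    u = rk4(augmented, u, h)
    x_p = rk4(f, x_p, h)

nonlinear_diff = x_p - u[:3]                    # true evolution of the perturbation
linear_pred = u[3:]                             # tangent-linear prediction
rel_err = np.linalg.norm(nonlinear_diff - linear_pred) / np.linalg.norm(linear_pred)
print("relative error of the tangent-linear prediction:", rel_err)
```

The relative error should stay small as long as the perturbation itself remains small; increasing the size of `delta_0` or the integration time is a simple way to see where the linear approximation starts to break down.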

As we have seen with the [breeding of errors](#breeding), we can compute the log-average growth rate of the dominant growing mode of the tangent-linear model to approximate the leading Lyapunov exponent.  In a linear system, with fixed matrix <span style='font-size:1.25em'> $\mathbf{M}$</span> this corresponds exactly to taking the log-average growth rate in the power method.  Recall the generalization of the power method to the QR algorithm.  We might hope to find **all of the log-average growth rates** in the tangent-linear model by computing the log averages of the diagonal elements in the QR factors <span style='font-size:1.25em'> $\mathbf{U}_k$</span>. 

In this case, the Gram-Schmidt factors form a basis for the tangent linear model where:
<ol>
    <li> the leading vector aligns with the dominant growth mode;</li>
    <li> the subsequent vectors separate out the lower order growth rates.</li>
</ol>
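For a fixed matrix, the recollection above can be sketched in a few lines.  This is an illustration only: the symmetric matrix `M` below is a hypothetical example built to have known eigenvalues, and the iteration count is an arbitrary choice.  Repeated QR factorizations separate the growth rates, and the log-averages of the absolute diagonal entries of the upper triangular factors approach the logarithms of the absolute eigenvalues, in descending order.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed propagator: symmetric, with known eigenvalues 3, 1 and 0.5
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))
eigs = np.array([3.0, 1.0, 0.5])
M = V @ np.diag(eigs) @ V.T

E = np.eye(3)              # initial orthonormal basis of perturbations
log_sum = np.zeros(3)
n_iter = 500
for _ in range(n_iter):
    # propagate the basis, then re-orthonormalize it via QR
    E, U = np.linalg.qr(M @ E)
    log_sum += np.log(np.abs(np.diag(U)))

growth = log_sum / n_iter  # log-average growth rate in each direction
print("log-average growth rates:", growth)
print("log of the eigenvalues:  ", np.log(eigs))
```

The first rate is what the power method alone would give; the re-orthogonalization is what prevents the remaining columns from collapsing onto the dominant direction.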

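Suppose the tangent-linear model is computed discretely in time, where <span style=font-size:1.25em> $\textbf{M}_k$</span> takes the perturbations at time <span style=font-size:1.25em>$t_{k-1}$</span> to time <span style=font-size:1.25em>$t_k$</span>.  Suppose that, in matrix form, applying <span style=font-size:1.25em>$\mathbf{M}_k$</span> to an orthogonal matrix <span style=font-size:1.25em>$\mathbf{E}_{k-1}$</span> produces the following QR factorization, with <span style=font-size:1.25em>$\mathbf{E}_k$</span> orthogonal and <span style=font-size:1.25em>$\mathbf{U}_k$</span> upper triangular,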
<h3>$$\begin{align}
\mathbf{M}_k \mathbf{E}_{k-1} &= \mathbf{E}_k \mathbf{U}_k
\end{align}$$</h3>

**Exc 4.36**: Suggest a constructive definition for:
<ol>
    <li> the Lyapunov exponents of the nonlinear model</li>
    <li> the "Lyapunov vectors" </li>
</ol>
<b>Note:</b> We will not stress yet what kind of Lyapunov vector we are constructing.  It turns out that Lyapunov exponents are generally <b>globally defined</b>, but there are several types of Lyapunov vectors that are locally defined.

In [None]:
# Answer 

# show_answer('lyapunov_vs_es')

This construction leads to the **"backward" Lyapunov vectors**.  They are denoted "backward" because they contain information from the past, leading up to the current time, i.e., the "errors of the day" described by Toth and Kalnay.  

For completeness, we remark that the forcing singular vectors mentioned above can be shown to converge to the "forward" Lyapunov vectors.  These are denoted "forward" because they contain information about the future evolution of perturbations, pulled back to the current time.  A third type of Lyapunov vector, called "covariant", is also often studied.  Similar to how eigenvectors are always mapped to a scaled copy of themselves, covariant vectors share an analogous property with respect to time-varying dynamics.  

A full discussion of the significance of forward, backward, and covariant vectors goes well beyond the scope of this tutorial.  For a comprehensive discussion of Lyapunov theory, aimed at practitioners, it is recommended to read the work of [Legras & Vautard](http://www.lmd.ens.fr/legras/publis/liapunov.pdf) and [Kuptsov & Parlitz](https://arxiv.org/abs/1105.5228).

## Ensemble based covariances

When we restrict the forecasting problem to the situation where we assume:
<ul>
    <li> we can perfectly model and compute the purely deterministic dynamics; and</li>
    <li> prediction error originates solely from the uncertainty in initial conditions,</li>
</ul> 
it is realistic to think of the data assimilation problem as tracking the evolution of perturbations along a control trajectory.  This control trajectory is the "true" state of the dynamical system under study: we receive observations of it, and we try to predict its behavior at future times.

In this case, we suppose we have a prior distribution for the state of the dynamical system.  Suppose we sample an ensemble of "nearby" initial conditions from this prior.  If the ensemble remains sufficiently close to the control trajectory, and we can use the [approximation for the evolution of perturbations](#perturbation_equation) to accurately model the forecast errors, then the ensemble spread will be characterized by the backward Lyapunov vectors and their growth and decay rates.

Consider the conceptual image below.  Suppose the initial prior covariance is given by the unit disk, centered on the "true" state of a dynamical system.  If the evolution of uncertainty can be well approximated by the tangent linear model along the "truth", we will expect the covariance to stretch and deform into an ellipse according to the directions of **growth** and **decay**.
<div style='width:800px'>
<img src="./resources/LyapunovDiagram.svg">
</div>
<b>By Mrocklin (original creation) [<a href="https://creativecommons.org/licenses/by-sa/3.0">CC BY-SA 3.0</a> or <a href="http://www.gnu.org/copyleft/fdl.html">GFDL</a>], <a href="https://commons.wikimedia.org/wiki/File:LyapunovDiagram.svg">via Wikimedia Commons</a></b>
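The deformation of the unit disk can be made concrete in a toy linear setting.  The sketch below assumes a hypothetical two-dimensional propagator `M` that stretches one direction by a factor of 2, contracts the orthogonal direction by a factor of 0.5, and then rotates by 30 degrees; these numbers are chosen purely for illustration.  Under linear dynamics, the covariance evolves as $\mathbf{M} \mathbf{P} \mathbf{M}^{\rm T}$, and its eigen-decomposition gives the axes of the resulting ellipse.

```python
import numpy as np

# Hypothetical linear propagator: stretch / contract, then rotate by 30 degrees
theta = np.pi / 6
rotation = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)],
])
M = rotation @ np.diag([2.0, 0.5])

P_0 = np.eye(2)                 # prior covariance: the unit disk
P_1 = M @ P_0 @ M.T             # covariance after one linear step

# eigh returns the eigenvalues in ascending order, with orthonormal eigenvectors
variances, axes = np.linalg.eigh(P_1)
semi_axes = np.sqrt(variances)  # semi-axis lengths of the ellipse

print("semi-axis lengths:", semi_axes)
print("direction of growth:", axes[:, 1])
print("direction of decay: ", axes[:, 0])
```

The semi-axis lengths recover the contraction and stretching factors, 0.5 and 2, and the direction of growth is the rotated first coordinate axis (up to sign).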

In the following, we will implement a simple, "square-root" ensemble Kalman filter in the Lorenz-63 model.  The ensemble Kalman filter will be given **noisy observations** of a control trajectory.   Along the control trajectory, we will compute the **QR factorization** of the tangent-linear model.  
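As a brief aside on what "square-root" means here, the sketch below checks, for a small random ensemble, that transforming the forecast anomalies by the symmetric square root of a matrix $\mathbf{T}$ reproduces the Kalman filter covariance update.  It is an illustration under stated assumptions rather than the definitive filter: the dimensions, the identity observation operator, and the observation error covariance are hypothetical choices, and the square root is computed with `np.linalg.eigh`, which for a symmetric positive definite matrix gives the same result as a construction based on the singular value decomposition.

```python
import numpy as np

rng = np.random.default_rng(2)

n, N = 3, 4                     # state dimension and ensemble size (illustrative)
H = np.eye(n)                   # assume the full state is observed
R = 0.25 * np.eye(n)            # assumed observation error covariance

# forecast anomalies (columns sum to zero) and forecast covariance
A_f = rng.standard_normal((n, N))
A_f -= A_f.mean(axis=1, keepdims=True)
P_f = A_f @ A_f.T / (N - 1)

# Kalman gain
S_inv = np.linalg.inv(H @ P_f @ H.T + R)
K = P_f @ H.T @ S_inv

# ensemble transform and its symmetric square root
T = np.eye(N) - (H @ A_f).T @ S_inv @ (H @ A_f) / (N - 1)
w, V = np.linalg.eigh(T)
T_sqrt = V @ np.diag(np.sqrt(w)) @ V.T

# analysis anomalies and the covariance they represent
A_a = A_f @ T_sqrt
P_a = A_a @ A_a.T / (N - 1)

print("max deviation from (I - KH) P_f:", np.abs(P_a - (np.eye(n) - K @ H) @ P_f).max())
```

The symmetric square root also leaves the vector of ones unchanged, so the analysis anomalies remain centered and can be added back to the analysis mean.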

We will define the **projection coefficient** of the covariance matrix <span style='font-size:1.25em'> $\mathbf{P}_k$</span> into the $j$-th QR factor to be the quantity,
<h3>$$\begin{align}
\left(\mathbf{Q}_k^j\right)^{\rm T} \mathbf{P}_k \mathbf{Q}_k^j.
\end{align}$$</h3>
We will compute the projection coefficients of the ensemble based covariance into each of the QR factors, as defined above.  
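To fix ideas, here is a minimal sketch of the projection coefficients for a sample covariance.  The ensemble and the orthonormal basis `Q` below are randomly generated stand-ins, not the EnKF ensemble or the QR factors of the tangent-linear model.  Each coefficient is the variance of the ensemble in the direction of one basis vector, so the coefficients are non-negative and sum to the trace of the covariance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ensemble with very different spread in each coordinate
N = 50
scales = np.array([[3.0], [1.0], [0.1]])
ens = scales * rng.standard_normal((3, N))

# ensemble based covariance from the anomalies
A = ens - ens.mean(axis=1, keepdims=True)
P = A @ A.T / (N - 1)

# stand-in orthonormal basis; the columns play the role of the QR factors
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))

# projection coefficient of P into each column of Q
proj = np.array([Q[:, j] @ P @ Q[:, j] for j in range(3)])

print("projection coefficients:", proj)
print("sum of coefficients:    ", proj.sum())
print("trace of P:             ", np.trace(P))
```

Equivalently, the coefficients are the diagonal entries of `Q.T @ P @ Q`.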

<b>Exc 4.38</b>: Can you conjecture how the projection coefficients will vary in the index $j$? <br>
**Hint**: consider **Exc 4.34**.

<b>Exc 4.40</b>: Test your conjecture from <b>Exc 4.38</b>.  Use the code below to investigate the relationship between the ensemble based covariance and its projection into each of the QR factors.  We plot the average projection coefficient for the EnKF covariance into each of the QR factors over the number of analyses.  Similarly, we plot the EnKF mean square error, to verify that the ensemble mean lies within the variance of the error in these directions.

**Note**: we may consider the QR factorizations to give approximately the "true" Lyapunov vectors when the log-average growth rate approaches the Lyapunov exponents for the system.  The Lyapunov exponents are approximately given by,
<h3>$$\begin{align}
\lambda_1 &\approx 0.905 \\
\lambda_2 & = 0 \\
\lambda_3 & \approx -14.571.
\end{align}$$</h3>

In [None]:
SIGMA = 10.0
BETA  = 8/3
RHO   = 28.0

sigma = SIGMA
beta = BETA
rho = RHO

def dxdt(xyz, t0, sigma=SIGMA, beta=BETA, rho=RHO):
    """Compute the time-derivative of the Lorenz-63 system."""
    x, y, z = xyz
    return array([
        sigma * (y - x),
        x * (rho - z) - y,
        x * y - beta * z
    ])

def l63_jac(x):
    """Compute the Jacobian of the Lorenz-63 system at the state x."""
    jac = np.array([
        [-sigma,  sigma,     0 ],
        [rho - x[2], -1,  -x[0]],
        [x[1],      x[0], -beta]
        ]
    )
    
    return jac

def l63_step_TLM(x, Y, h):
    """Step the state x and the tangent-linear perturbations Y forward by h via RK-4."""
    
    h_mid = h/2

    # calculate the evolution of x to the midpoint
    x_mid = l63_rk4_step(x, h_mid)

    # calculate x to the next time step
    x_next = l63_rk4_step(x_mid, h_mid)

    k_y_1 = l63_jac(x).dot(Y)
    k_y_2 = l63_jac(x_mid).dot(Y + k_y_1 * (h / 2.0))
    k_y_3 = l63_jac(x_mid).dot(Y + k_y_2 * (h / 2.0))
    k_y_4 = l63_jac(x_next).dot(Y + k_y_3 * h)

    Y_next = Y + (h / 6.0) * (k_y_1 + 2 * k_y_2 + 2 * k_y_3 + k_y_4)

    return [x_next, Y_next]

def l63_rk4_step(xyz, h):
    """ calculate the evolution of Lorenz-63 one step forward via RK-4"""
    
    k_xyz_1 = dxdt(xyz, h, sigma=SIGMA, beta=BETA, rho=RHO)
    k_xyz_2 = dxdt(xyz + k_xyz_1 * (h / 2.0), h, sigma=SIGMA, beta=BETA, rho=RHO)
    k_xyz_3 = dxdt(xyz + k_xyz_2 * (h / 2.0), h, sigma=SIGMA, beta=BETA, rho=RHO)
    k_xyz_4 = dxdt(xyz + k_xyz_3 * h, h, sigma=SIGMA, beta=BETA, rho=RHO)

    xyz_step = xyz + (h / 6.0) * (k_xyz_1 + 2 * k_xyz_2 + 2 * k_xyz_3 + k_xyz_4)

    return xyz_step

def animate_enkf_covariance(nanl=0):    
    
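    # Experiment configuration
    tanl = 0.005                      # time between analyses
    h = 0.001                         # integration step size
    tl_steps = int(round(tanl / h))   # integration steps per analysis cycle
    N = 4                             # ensemble size
    obs_un = 0.25                     # observation error variance
    obs_dim = 3                       # number of observed components
    R = np.eye(obs_dim) * obs_un      # observation error covariance
    H = np.eye(3, M=obs_dim).T        # observation operator
    proj_traj = np.zeros([nanl, 3])   # projection coefficients at each analysis
    err = np.zeros([nanl])            # mean square error at each analysis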
    
    seed(1)
    x_0 = array([-6.1, 1.2, 32.5])               # define the control
    
    # define the initial ensemble: standard normal perturbations, centered to
    # have zero mean, and added to the control
    A_f = randn([3, N])
    a_m = np.mean(A_f, axis=1)
    A_f = A_f.T - a_m
    del a_m
    A_f = (x_0 + A_f).T
                  
    lam = np.zeros(3)
    Q = np.eye(3)
        
    # for each analysis cycle
    for kk in range(nanl):
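        for j in range(tl_steps):
            # propagate the control and the tangent-linear perturbations by h
            x_0, Q = l63_step_TLM(x_0, Q, h)
            # propagate each ensemble member by two half steps, matching the control
            for l in range(2):
                for m in range(N):
                    A_f[:, m] = l63_rk4_step(A_f[:, m], h / 2)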
            
        # perform QR step and find the local Lyapunov exponents
        Q, U = np.linalg.qr(Q)
        lam += np.log(np.abs(np.diag(U))) / tanl
        
        # define an observation
        y_0 = H @ x_0 + np.random.multivariate_normal(np.zeros([obs_dim]), R)
        
        # forecast mean
        x_f = np.mean(A_f, axis=1)
        
        # forecast anomalies
        A_f = (A_f.T - x_f).T
        
        # forecast covariance
        P_f = (N-1) ** (-1) * A_f @ A_f.T 
        
        # form the Kalman gain    
        K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R) 
        
        # analysis mean, and its mean square error relative to the control
        x_a = x_f + K @ (y_0 - H @ x_f)
        err[kk] = np.mean((x_a - x_0)**2)
        
        # update the anomalies with the symmetric square root of the transform T
        T = np.eye(N) - (N-1)**(-1) * (H @ A_f).T @ np.linalg.inv(H @ P_f @ H.T + R) @ (H @ A_f)
        U, S, V_h = np.linalg.svd(T)
        T = U @ np.diag(np.sqrt(S)) @ U.T
        A_f = A_f @ T
        
        # analysis covariance
        P_a = (N-1)**(-1) * A_f @ A_f.T
        # find the analysis projection coefficients
        for i in range(3):
            proj_traj[kk, i] = Q[:, i].T @ P_a @ Q[:, i]
            
        # re-center the analysis anomalies on the analysis mean
        A_f = (x_a + A_f.T).T
        
        
    # PLOTTING
    avg_proj = np.zeros([nanl, 3])
    
    # we plot the running average of the projection coefficient into each
    # backward Lyapunov vector (BLV)
    for i in range(1,nanl):
        avg_proj[i, :] = np.mean(proj_traj[:i, :], axis=0)
    
    lam = lam / nanl
    fig = plt.figure(figsize=(16,8))
    ax = plt.subplot(111)
    
    for i in range(3):
        ax.plot(range(nanl), avg_proj[:, i], label='BLV projection ' + str(i + 1) + ', log-avg growth rate ' + 
                str(np.round(lam[i],decimals=3)).zfill(3))
                
    ax.axhline(y=np.mean(err), color='k', label='Mean square error')
    plt.legend(fontsize=20)
    ax.set_xbound([1, nanl])
    ax.set_yscale('log')
    ax.tick_params(labelsize=20)
    plt.show()
    
w = interactive(animate_enkf_covariance,nanl=(10,20010,2000))
w

<b>Exc 4.42</b>: Answer the following questions.
<ol>
   <li>How do the projection coefficients relate to the log-average growth rates in each direction? </li> 
    <li>What is significant about the <b>effective rank</b> of the covariance?</li>
   <li>Can you conjecture what this means about the necessary number of ensemble members to prevent filter divergence?
       <br> <b>Hint</b>: the ensemble should capture the effective spread of the uncertainty.</li>
</ol>