In [1]:
#libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import erf


# Information Content Assessment (ICA) with the Rodgers technique

For a detailed demo, see RodgersICADemo.ipynb

## Introduction

This notebook demonstrates and performs information content assessments based on the methodology of Rodgers, 2000. It investigates the relationship between state space (the parameters one wants to determine, such as ocean microplastic concentration) and observation space (what is measured, such as satellite-observed Earth reflectance).

The relationship between spaces can be modeled as 

$ \textbf{y} = F(\textbf{x}) $

where $\textbf{y}$ is the vector of measurements, with length $\textit{m}$, corresponding to a state space defined by the parameters $\textbf{x}$, with length $\textit{n}$. $F(\textbf{x})$ is a forward model (simulation) of reality representing the relationship between spaces. In the example noted above, $\textbf{y}$ is what we observe from the satellite, $\textbf{x}$ contains the ocean microplastic parameter we wish to determine (among others), and $ F(\textbf{x})$ is highly nonlinear and cannot be analytically inverted. 

In practice, this technique relies on the Jacobian matrix, comprised of the partial derivatives of $F(\textbf{x})$ with respect to each state space parameter. It describes the sensitivity of the forward model at a defined point within state space:

$ K_{j,i}(\textbf{x}) = \frac{\partial F_i (\textbf{x})}{\partial x_j}$

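where the measurement index $\textit{i}$ runs from 1 to $\textit{m}$ and the parameter index $\textit{j}$ runs from 1 to $\textit{n}$. With this index ordering the Jacobian is stored in the code below as an $[n \times m]$ array, `jac`, which is the transpose of the $[m \times n]$ matrix $K$ that appears in the matrix expressions in this section.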
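
When $F(\textbf{x})$ has no analytic derivative, one common way to estimate the Jacobian is by finite differences. The sketch below is an illustration only: `toy_forward` is a hypothetical stand-in for a real forward model, and `finite_difference_jacobian` and its step size are assumptions that are not part of this notebook. The result follows the $K_{j,i}$ index ordering above, so it has shape [n x m].

```python
import numpy as np

def toy_forward(x):
    # hypothetical forward model: maps n=2 parameters to m=3 measurements
    return np.array([x[0] + x[1], x[0] * x[1], x[0] ** 2])

def finite_difference_jacobian(forward, x, step=1e-6):
    # forward-difference estimate of K[j, i] = dF_i / dx_j, shape [n x m]
    x = np.asarray(x, dtype=float)
    f0 = forward(x)
    jac = np.zeros((x.size, f0.size))
    for j in range(x.size):
        x_pert = x.copy()
        x_pert[j] += step          # perturb one parameter at a time
        jac[j, :] = (forward(x_pert) - f0) / step
    return jac

jac = finite_difference_jacobian(toy_forward, [1.0, 2.0])
print(jac.shape)  # (2, 3)
```

The step size is a trade-off: too large and the local linearity assumption fails, too small and numerical noise in the forward model dominates.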



In this work we incorporate estimates of measurement uncertainty, simulation errors, and prior knowledge with this Jacobian to predict the uncertainty in determining parameter values. The approach involves several simplifications and approximations described in Rodgers, 2000 and the demonstration publications Knobelspiesse et al., 2012, Chen et al., 2017 and Knobelspiesse and Nag, 2018. For example, this technique assumes that $F(\textbf{x})$ is a close model of physical reality, that differences between it and reality can be quantified, that a retrieval algorithm to determine parameters from observations does not impose additional errors, that $F(\textbf{x})$ is locally linear, and that measurement and parameter uncertainty can be simply expressed with error covariance matrices. In this sense the approach can be considered the 'best case scenario' that demonstrates the capability limits of a measurement system.

Measurement uncertainty is defined by the error covariance matrix, $\textit{S}_{\epsilon}$, with dimension $[m \times m]$. The square of the one sigma uncertainty associated with each of the $\textit{m}$ measurements is on the diagonal, while the other elements represent covariance between measurements.

Another important aspect of ICA is to incorporate what knowledge of the parameters exists prior to making a measurement. This is expressed in the $\textit{a priori}$ error covariance matrix, $S_{a}$. This has dimensions $[n \times n]$ and contains the square of the one sigma prior uncertainties on the diagonal (and covariance otherwise). The ICA will not predict retrieval uncertainties worse than these prior uncertainties. In some cases, little to no information exists prior to a measurement, in which case an 'uninformative' prior with large assessed uncertainties is used.

The above quantities are combined to predict the retrieval error covariance matrix ($[n \times n]$):

$ \hat{S} =  \left[ K^T S_{\epsilon}^{-1} K + S_{a}^{-1} \right]^{-1} $

Like the other covariance matrices, the square roots of the diagonal terms correspond to one sigma uncertainty estimates.
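
As a small worked illustration of the expression for $\hat{S}$, the sketch below uses made-up numbers (3 measurements, 2 parameters) that are assumptions and not values from this notebook. Here `K` is written in the $[m \times n]$ orientation so that the matrix products are conformable exactly as in the formula.

```python
import numpy as np

# hypothetical example: m=3 measurements, n=2 parameters
K = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])            # Jacobian, [m x n] orientation
S_eps = np.diag([0.5**2] * 3)         # measurement error covariance, sigma = 0.5
S_a = np.diag([10.0**2] * 2)          # a priori error covariance, sigma = 10

# S_hat = [K^T S_eps^-1 K + S_a^-1]^-1
S_hat = np.linalg.inv(K.T @ np.linalg.inv(S_eps) @ K + np.linalg.inv(S_a))
sigma_hat = np.sqrt(np.diag(S_hat))   # one sigma retrieval uncertainties

print(sigma_hat)
```

In this toy case the predicted uncertainties are far smaller than the prior sigma of 10, because the measurements are sensitive to both parameters.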

In some cases, the forward model has known and quantifiable uncertainties. In our example of detecting microplastic with remote sensing observations, this might be how we account for the atmospheric absorption of a trace gas such as ozone. We can estimate ozone and use it in our forward model, but that estimate may have uncertainty with an impact that we can quantify. This can be incorporated into the error covariance matrix in the following manner:

$ S_{\epsilon} = S_{y} +  K_b S_b K_b^T $

where $S_b$ is the error covariance matrix of the forward model parameters $\textbf{b}$ (such as the ozone amount in the example above), $K_b$ is the Jacobian of the forward model with respect to those parameters, and $S_y$ is the instrumental error covariance matrix. The above can be combined to:

$ \hat{S} =  \left[ K^T \left(S_{y} +  K_b S_b K_b^T\right)^{-1} K + S_{a}^{-1} \right]^{-1} $
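
The effect of the model error term on $S_{\epsilon}$ can be seen with a few made-up numbers. The values below are assumptions chosen for illustration (3 measurements and a single model parameter), with `K_b` written in the $[m \times r]$ orientation so that the product matches the formula as written.

```python
import numpy as np

# hypothetical example: m=3 measurements, r=1 model parameter
S_y = np.diag([0.5**2] * 3)              # instrumental error covariance, [m x m]
K_b = np.array([[0.2], [0.1], [0.0]])    # model parameter Jacobian, [m x r] orientation
S_b = np.array([[0.1**2]])               # model parameter error covariance, [r x r]

# S_eps = S_y + K_b S_b K_b^T
S_eps = S_y + K_b @ S_b @ K_b.T

print(np.diag(S_eps))   # inflated variances
print(S_eps[0, 1])      # model error also introduces covariance between measurements
```

Even when the instrumental errors are uncorrelated, the model error term generally makes $S_{\epsilon}$ a full matrix, since one uncertain model parameter affects several measurements at once.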



An alternative formulation of the above that has value for ICA is the averaging kernel matrix, $\textbf{A}$, which is the derivative of the retrieved (posterior) state, $\hat{\textbf{x}}$, with respect to the true state:

$ A = \left[ K^T S_{\epsilon}^{-1} K + S_{a}^{-1} \right]^{-1} K^T S_{\epsilon}^{-1} K $

A perfect retrieval has an averaging kernel equal to the identity matrix, and an individual value of 1 on the diagonal indicates success for that particular parameter. A useful single scalar encapsulation of retrieval success is the Degrees of Freedom for Signal (DFS), which is the trace of the averaging kernel:

$DFS = trace(A)$
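
A minimal sketch of the averaging kernel and DFS calculation follows, using illustrative values that are assumptions (3 measurements, 2 parameters, `K` in the $[m \times n]$ orientation of the formula).

```python
import numpy as np

# hypothetical example: m=3 measurements, n=2 parameters
K = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
S_eps_inv = np.linalg.inv(np.diag([0.5**2] * 3))   # inverse measurement error covariance
S_a_inv = np.linalg.inv(np.diag([10.0**2] * 2))    # inverse a priori error covariance

KtSK = K.T @ S_eps_inv @ K
A = np.linalg.inv(KtSK + S_a_inv) @ KtSK           # averaging kernel, [n x n]
DFS = np.trace(A)                                  # degrees of freedom for signal

print(np.diag(A))
print(DFS)
```

The DFS cannot exceed the number of parameters $n$; a value close to $n$ (here 2) indicates that nearly all parameters are determined by the measurements rather than by the prior.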

We can also calculate the Shannon Information Content: 

$SIC = 0.5 \ln \left| \hat{S}^{-1} S_{a} \right| $

where $ \left| \cdot \right| $ indicates the determinant of the enclosed matrix. This formulation is analogous to the reduction in entropy from the prior to posterior states, and like the $DFS$ represents the information content as a scalar value.
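
The SIC can be sketched numerically as below. The inputs are illustrative assumptions, and the use of `np.linalg.slogdet` is a substitution made here for numerical robustness (it returns the sign and the log of the absolute determinant directly); the direct determinant form of the formula is computed alongside it for comparison.

```python
import numpy as np

# hypothetical example: m=3 measurements, n=2 parameters
K = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
S_eps = np.diag([0.5**2] * 3)
S_a = np.diag([10.0**2] * 2)

# inverse of the retrieval error covariance matrix
S_hat_inv = K.T @ np.linalg.inv(S_eps) @ K + np.linalg.inv(S_a)

# SIC = 0.5 ln |S_hat^-1 S_a|, via the log-determinant
sign, logdet = np.linalg.slogdet(S_hat_inv @ S_a)
SIC = 0.5 * logdet

# the same quantity from the determinant directly
SIC_direct = 0.5 * np.log(np.linalg.det(S_hat_inv @ S_a))

print(SIC, SIC_direct)
```

Because the natural logarithm is used, the result is in nats; dividing by $\ln 2$ converts it to bits.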

We are also interested in the detectability of a particular parameter or parameters. We formulate this as the probability of a successful nonzero retrieval result, given the true value and the ICA predicted retrieval uncertainty.

$P_{nz} = 1 - 0.5 \left[1+erf \left(\frac{-x_j}{\sqrt{2\hat{S}_{j,j}}}\right) \right]$

where $erf$ is the error function and the subscript $j$ denotes an element within the parameter vector $\textbf{x}$. This is derived from the treatment of the posterior probability as a Gaussian distribution function centered at the value of $x_j$ with width defined by the predicted uncertainty $\sqrt{\hat{S}_{j,j}}$. The integral over all positive values in that distribution, compared to the total distribution, is the probability of detection. Said another way, the probability is 1 minus the cumulative distribution function (the integral of the probability distribution function) evaluated at zero. For a positive true value $x_j$, the range of the probability is $0.5 < P_{nz} < 1.0 $, meaning that in the worst case we are equally likely to successfully or unsuccessfully make a detection.


Detection probability:

The cumulative distribution function (CDF) is the integral of the probability distribution function (PDF) from $-\infty$ to $x$. We want the complement, evaluated at zero:

$P_d = 1 - CDF(0) = \frac{1}{2} \left[1 - erf \left( \frac{-x_i}{ \sqrt{2\hat{S}_{i,i}}} \right)   \right]   $

where $erf$ is the error function associated with the integral of a Gaussian PDF. For example, where $x_i = 0.5$ and $\sqrt{\hat{S}_{i,i}} = 1.0$, $P_d = 0.691$. In other words, there is a 69.1% probability that the retrieved value of $x_i$ is greater than zero. Note that $ 0.5 < P_d < 1.0 $ for $x_i > 0$.
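
The numbers in this example can be checked with a few lines of code. This sketch uses `math.erf` from the Python standard library, which for scalar input gives the same value as the `scipy.special.erf` imported at the top of this notebook; the variable names are chosen here for illustration.

```python
import math

x_i = 0.5        # parameter value
sigma_i = 1.0    # predicted one sigma uncertainty, sqrt(S_hat[i, i])

# P_d = 1 - CDF(0) = 0.5 * [1 - erf(-x_i / (sigma_i * sqrt(2)))]
P_d = 0.5 * (1.0 - math.erf(-x_i / (sigma_i * math.sqrt(2.0))))

print(round(P_d, 3))  # 0.691
```

As `x_i` approaches zero the probability approaches 0.5, and as `x_i` becomes large relative to `sigma_i` it approaches 1.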

## References

Chen, X., Wang, J., Liu, Y., Xu, X., Cai, Z., Yang, D., Yan, C.-X., and Feng, L.: Angular dependence of aerosol information content in CAPI/TanSat observation over land: Effect of polarization and synergy with A-train satellites, Remote Sens. Environ., 196, 163-177, https://doi.org/10.1016/j.rse.2017.05.007, 2017.

Knobelspiesse, K., Cairns, B., Mishchenko, M., Chowdhary, J., Tsigaridis, K., van Diedenhoven, B., Martin, W., Ottaviani, M., and Alexandrov, M.: Analysis of fine-mode aerosol retrieval capabilities by different passive remote sensing instrument designs, Opt. Express, 20(19), 21457-21484, https://doi.org/10.1364/OE.20.021457, 2012.

Knobelspiesse, K. and Nag, S.: Remote sensing of aerosols with small satellites in formation flight, Atmos. Meas. Tech., 11(7), 3935-3954, https://doi.org/10.5194/amt-11-3935-2018, 2018.

Rodgers, C. D.: Inverse Methods for Atmospheric Sounding: Theory and Practice, World Scientific, Singapore, 2000. 



## Inputs


Dimensions specified below

$\textit{m}$: length of the measurement vector

$\textit{n}$: length of the state (parameter) vector

$\textit{r}$: length of the model error vector


To perform this ICA, you'll need a few things:

1. Jacobian matrix, defined as $ K_{j,i}(\textbf{x}) = \frac{\partial F_i (\textbf{x})}{\partial x_j}$ where the measurement index $\textit{i}$ runs from 1 to $\textit{m}$ and the parameter index $\textit{j}$ runs from 1 to $\textit{n}$. $F(\textbf{x})$ is the forward model at the state defined by $\textbf{x}$.

    jac [n x m]
    
2. Model error Jacobian matrix, defined as $ K_{b,u,i}(\textbf{x}) = \frac{\partial F_i (\textbf{x})}{\partial b_u}$ where the measurement index $\textit{i}$ runs from 1 to $\textit{m}$ and the model parameter index $\textit{u}$ runs from 1 to $\textit{r}$. $F(\textbf{x})$ is the forward model at the state defined by $\textbf{x}$. Note there are options in the Rodgers function to omit consideration of model errors if these are not known.

    jac_me [r x m]
    
3. Measurement error covariance matrix, $S_{\epsilon}$. This can be specified as either an $\textit{m}$ length vector of sigma squared measurement uncertainties or an $\textit{m} \times \textit{m}$ full covariance matrix. In the former case, it is assumed that the measurement uncertainties are uncorrelated.

    err [m x m] or [m]   

4. Model error covariance matrix, $S_{b}$. This can be specified as either an $\textit{r}$ length vector of sigma squared model parameter uncertainties or an $\textit{r} \times \textit{r}$ full covariance matrix. In the former case, it is assumed that the model parameter uncertainties are uncorrelated. Note there are options below to omit consideration of model errors if these are not known.

    err_me [r x r] or [r]   
    
5. A priori error covariance matrix, $S_{a}$. This can be specified as either an $\textit{n}$ length vector of sigma squared prior uncertainties or an $\textit{n} \times \textit{n}$ full covariance matrix. In the former case, it is assumed that the prior uncertainties are uncorrelated.

    ap [n x n] or [n]   
    
6. If calculating the detection probability: the value of the parameter in question

    mu [scalar]
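
To make the dimension requirements in this list concrete, the sketch below builds placeholder inputs and checks that their shapes agree. The sizes and values are arbitrary assumptions for illustration; real Jacobians would come from a forward model.

```python
import numpy as np

n, m, r = 4, 6, 2   # hypothetical sizes: parameters, measurements, model error parameters

jac = np.ones((n, m))          # placeholder Jacobian, [n x m]
jac_me = np.ones((r, m))       # placeholder model error Jacobian, [r x m]
err = np.full(m, 0.5**2)       # measurement variances, length m
err_me = np.full(r, 0.1**2)    # model parameter variances, length r
ap = np.full(n, 10.0**2)       # a priori variances, length n
mu = 0.5                       # parameter value used for the detection probability

# shape checks mirroring the requirements listed above
assert jac.shape == (ap.shape[0], err.shape[0])
assert jac_me.shape == (err_me.shape[0], err.shape[0])
```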
    
        

## Outputs


### from rodgers()
1. Retrieval error covariance matrix, $\hat{S}$.

    S_hat [n x n]
    
2. Shannon Information Content, $SIC$.

    SIC [scalar]
    
3. Averaging kernel matrix, $A$. 

    AvgK [n x n]

4. Degrees of Freedom for Signal, $DFS$. 

    DFS [scalar]  

### from detect_prob()
1. Probability of detection, $P_d$. Note this requires inputs mu (parameter value in question) and sigma ($\sqrt{\hat{S}_{p,p}}$ where $p$ is the parameter index.)

    Pd [scalar]
    
    Pd_pcnt_str [percentage as string]


In [14]:
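#make or load derivatives
#
#
# OUTPUTS - 
# jac: Jacobian matrix, [n x m]
# jac_me: Jacobian with respect to the model error parameters, [r x m]
#

def get_jac():
    #placeholder: add code here to read/create and format the Jacobians relevant to the ICA,
    #  then return jac, jac_me
    raise NotImplementedError('get_jac() needs code to supply jac and jac_me')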

In [15]:
#define diagonal terms for error covariance matrices
#numpts (the number of observations, m) must be defined before running this cell
ap=np.full(4,10.0)**2 #a priori error covariance diagonal: sigma of 10 for each of 4 parameters
err=np.full(numpts,0.5)**2 #measurement error covariance diagonal: sigma of 0.5 (code also takes 2d input)
me_err=np.array([0.1**2,0.1**2]) #variances of the two model error parameters

In [5]:
#function to calculate the retrieval error covariance matrix and related information content metrics.

#input Jacobian, jac, [n x m], error covariance matrix err, [m x m] or [m], and a priori matrix ap, [n x n] or [n]
#model_error and model_error_jacobian are associated with model uncertainty: the parameterized
#  uncertainty and its Jacobian, [r x m]. Leave them empty to omit model error.
def rodgers(jac, err, ap, model_error=(), model_error_jacobian=()): 
    #model error is included in the following manner: Se' = Se + Kb Sb Kbt, where 
    #  Se is the same as above, Sb is the model parameter uncertainty, Kb the model parameter jacobian
    
        #check if error covariance matrix is square, or just diagonal values. If latter make full matrix
    if err.ndim == 1:
        ln=np.shape(err)
        err2d = np.zeros((ln[0], ln[0]))
        np.fill_diagonal(err2d, err)
        err=err2d

        #check if a priori covariance matrix is square, or just diagonal values. If latter make full matrix
    if ap.ndim == 1:
        ln=np.shape(ap)
        ap2d = np.zeros((ln[0], ln[0]))
        np.fill_diagonal(ap2d, ap)
        ap=ap2d        
            
        #section to verify compatable dimensions ------------------------------------------------------
    sh_jac = np.shape(jac)
    sh_err = np.shape(err)
    sh_ap = np.shape(ap)
    
    n_dim = sh_jac[0]
    m_dim = sh_jac[1]
    
    if not((sh_err[0] == sh_err[1]) and (sh_ap[0] == sh_ap[1])):
        print('ERROR: error covariance matrix or a priori matrix are not square')
        print('Error covariance matrix dimensions')
        print(sh_err)
        print('A priori matrix dimensions')
        print(sh_ap)
        return -1, -1, -1, -1
    
    if not(sh_jac[0] == sh_ap[0]):
        print('ERROR: n dimensions inconsistent, should be Jacobian [n x m]; a priori [n x n]')
        print('Jacobian matrix dimensions')
        print(sh_jac)
        print('A priori matrix dimensions')
        print(sh_ap)
        return -1, -1, -1, -1
    
    if not(sh_jac[1] == sh_err[0]):
        print('ERROR: m dimensions inconsistent, should be Jacobian [n x m]; error covariance [m x m]')
        print('Jacobian matrix dimensions')
        print(sh_jac)
        print('Error covariance matrix dimensions')
        print(sh_err)
        return -1, -1, -1, -1
        
    #section to generate model derived error -------------------------------------------------------
    
    if len(model_error) > 0:
        me=np.asarray(model_error)
        jac_me=np.asarray(model_error_jacobian) #model error Jacobian, [r x m]
        
        #check if model error covariance is square, or just diagonal values. If latter make full matrix
        if me.ndim == 1:
            err_me=np.diag(me)
        else:
            err_me=me
    
        jac_me_t=np.transpose(jac_me)      
    
        #Kb Sb KbT; jac_me is stored [r x m], so its transpose comes first
        JacmetMeJacme = np.matmul(jac_me_t,np.matmul(err_me,jac_me))
        err = err + JacmetMeJacme
    
    #perform inverse and matrix multiplication calculations ----------------------------------------
    jac_t=np.transpose(jac) #transpose of the stored [n x m] Jacobian
    
    try: 
        err_i=np.linalg.inv(err) #inverse of error covariance matrix (Se-1)
    except np.linalg.LinAlgError:
        print("ERROR: problem inverting error covariance matrix")
        return -1, -1, -1, -1
    
    try: 
        ap_i=np.linalg.inv(ap) #inverse of a priori error covariance matrix
    except np.linalg.LinAlgError:
        print("ERROR: problem inverting a priori covariance matrix")
        return -1, -1, -1, -1

    KtSK = np.matmul(jac,np.matmul(err_i,jac_t)) #calculates KT Se-1 K (jac is stored [n x m], so it plays the role of KT)

    try: 
        S_hat = np.linalg.inv(KtSK+ap_i) #calculate the inverse of (above + Sa-1)
    except np.linalg.LinAlgError:
        print("ERROR: problem inverting retrieval error covariance matrix")
        return -1, -1, -1, -1
    
    SIC = 0.5*np.log(np.linalg.det(np.matmul((KtSK+ap_i),ap))) #calculate Shannon Information Content    
    AvgK = np.matmul(S_hat,KtSK) #averaging kernel
    DFS = np.trace(AvgK) #degrees of freedom for signal (DFS) which is trace of averaging kernel
    
    return S_hat, SIC, AvgK, DFS  #returns retrieval error covariance matrix, Shannon Information Content, averaging kernel and DFS

In [6]:
def print_out(S_hat, SIC, AvgK, DFS, jac, err, ap, me_err, numpts, params, me_params ):

    S_hat_diag=np.diagonal(S_hat)
    Err=np.sqrt(S_hat_diag)

    np.set_printoptions(formatter={'float': '{: 0.2f}'.format})
    print('Error covariance matrix:')
    print(S_hat)
    print()

    np.set_printoptions(formatter={'float': '{: 0.5f}'.format})
    print('Averaging kernel matrix:')
    print(AvgK)
    print()
    np.set_printoptions(formatter={'float': '{: 0.5f}'.format})
    print('Model Parameters:       ', params)
    print('Number of observations: ', numpts)
    print('A priori uncertainty:   ', np.sqrt(ap))
    print('Uncertainties:          ', Err)
    print('Shannon Information Content:      ', SIC)
    print('Degrees of freedom for signal:    ', DFS)

In [11]:
#calculates the probability of detection given the parameter value (mu) and uncertainty (sigma)
#assumes the PDF is Gaussian (normally distributed)
def detect_prob(mu, sigma, doprint=0): 

    Pd = 1-0.5*(1+erf((-1*mu)/(sigma*np.sqrt(2))))  #detection probability, modified from CDF function

    Pd_pcnt_str=str(np.around(Pd*100,decimals=1))+'% positive probability' #string output version

    if doprint > 0:
        print(Pd_pcnt_str)

    return Pd, Pd_pcnt_str