In the last notebook, we saw that the Jacobian of the predicted heating was extremely ill-conditioned. Including the upper-atmospheric humidity (which has nearly zero variance) caused this ill-conditioning, just as it did for the linear response functions. In this notebook, I experiment with removing these near-zero-variance features from the inputs to the neural network.
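To see why a nearly constant input causes trouble, here is a small synthetic sketch. The data are made up, not taken from NGAqua. The sample covariance of three inputs, one of which has a standard deviation of about 1e-6, is nearly singular, and dropping that input removes the problem. If the network standardizes its inputs (an assumption here), its Jacobian with respect to the raw inputs picks up a similar factor of 1/σ.

```python
import numpy as np

# Synthetic inputs (hypothetical): 1000 samples of three features
rng = np.random.default_rng(0)
n = 1000
x_good1 = rng.normal(0.0, 1.0, n)    # ordinary, well-varying feature
x_good2 = rng.normal(0.0, 1.0, n)    # another ordinary feature
x_flat = rng.normal(0.0, 1e-6, n)    # nearly constant, like upper-level humidity

X_syn = np.stack([x_good1, x_good2, x_flat], axis=1)

# Condition number of the sample covariance with and without the flat feature
cond_all = np.linalg.cond(np.cov(X_syn, rowvar=False))
cond_drop = np.linalg.cond(np.cov(X_syn[:, :2], rowvar=False))

# Assuming standardized inputs z = (x - mean) / std, d(out)/dx = d(out)/dz / std,
# so each feature's derivatives are amplified by 1 / std
amplification = 1.0 / X_syn.std(axis=0)

print(f"cond with flat feature:    {cond_all:.2e}")
print(f"cond without flat feature: {cond_drop:.2f}")
print("1/std per feature:", amplification)
```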

In [None]:
import numpy as np
import xarray as xr
import torch
from scipy.linalg import eigvals
from torch.autograd import Variable

from lib.models.torch_models import predict, jacobian, train_euler_network

In [None]:
import holoviews as hv
from lib.hvops import quadmesh
hv.extension('matplotlib')

invert_opts = dict(plot=dict(invert_yaxis=True, invert_axes=True))


Let's load the data.

In [None]:
data = np.load("../data/ml/ngaqua/time_series_data.npz")


X = data['X']
G = data['G']
scale = data['scales']
w = data['w']

# we need to grab the pressure field from a different file
p = xr.open_dataset("../data/raw/ngaqua/stat.nc").p.values
# t = dt * np.arange(X.shape[0])

Next, we compute the mean and standard deviation of each feature.

In [None]:
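# mean and standard deviation of each feature over time and the two horizontal axes
mu = np.apply_over_axes(np.mean, X, axes=(0,1,2)).ravel()
# drop the last 14 features: the uppermost humidity (qt) levels
mu = mu[:-14]

sig = np.apply_over_axes(np.std, X, axes=(0,1,2)).ravel()
sig = sig[:-14]

# the same statistics for all features, reshaped to (variable, level): row 0 is sl, row 1 is qt
x_mean = np.apply_over_axes(np.mean, X, axes=(0,1,2)).reshape((2,-1))
x_std = np.apply_over_axes(np.std, X, axes=(0,1,2)).reshape((2,-1))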

Let's find and highlight the 200 hPa level.

In [None]:
%%opts Points(color='red')

ind = np.searchsorted(p[::-1], 200)
nz = p.shape[0]
print("200hPA index is", ind)


lay = hv.Curve((p, x_std[1,:])).opts(**invert_opts) * hv.Points((p[-ind], x_std[1,-ind]))
lay.redim.label(y="std(qt) (g/kg)", x="p (hPa)")
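Because `p` appears to be stored from the surface upward (decreasing pressure), the cell above reverses it before calling `np.searchsorted`, which requires ascending input. A small sketch with a made-up pressure profile (not the NGAqua grid) shows how the returned index translates into the set of levels at or below 200 hPa, i.e. with $p \ge 200$ hPa:

```python
import numpy as np

# Hypothetical pressure levels (hPa), ordered from the surface upward like `p`
p_demo = np.array([1000., 850., 700., 500., 300., 200., 100., 50., 10.])
nz_demo = p_demo.shape[0]

# searchsorted needs ascending input, so search the reversed profile.
# The result counts the levels with pressure strictly below 200 hPa.
n_above = np.searchsorted(p_demo[::-1], 200)

# Levels to keep: everything at or below the 200 hPa level
keep = p_demo >= 200
p_kept = p_demo[:nz_demo - n_above]

print(n_above, p_kept)
```

Slicing with `[:nz - n_above]` and masking with `p >= 200` select the same levels, which is the kind of truncation that `[:-14]` performs with a fixed count.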

The humidity above 200 hPa has nearly zero variance, so from now on we should not use it as an input to the algorithm. Let's train a neural network that excludes these levels. We train with only a random subset of 100,000 samples.

In [None]:
stepper = train_euler_network(data, n=2, nsteps=1, ntrain=100000, weight_decay=0.0, nhidden=256)

Let's compute the linear response function $M(x)=\log(I + h J(x))$ about the mean profile, where $J(x)$ is the Jacobian of the learned tendency and $h$ is the time step.
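The matrix logarithm turns the one-step linear map $I + hJ$ into the generator of a continuous-time linear system that reproduces that map over one step, since $\exp(M) = I + hJ$. A minimal sketch with a random stand-in for $J$ (not the network's Jacobian) checks this round trip with `scipy.linalg.logm` and `scipy.linalg.expm`, and shows that $M \approx hJ$ when $h$ is small:

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(1)
nfeat = 4
h = 1e-3                                 # hypothetical small step
J = rng.normal(size=(nfeat, nfeat))      # stand-in Jacobian, not from the network
I = np.eye(nfeat)

# M = log(I + h J); real_if_close drops any negligible imaginary round-off
M = np.real_if_close(logm(I + h * J))

# exp(M) recovers the one-step map (up to round-off)
roundtrip_ok = np.allclose(expm(M), I + h * J)

# for small h the log is close to its first-order term h J
first_order_err = np.abs(M - h * J).max()

print(roundtrip_ok, first_order_err)
```

The difference $M - hJ$ is of order $h^2$ (the next term of the series is $-(hJ)^2/2$), so the logarithm matters mainly when $h\lVert J\rVert$ is not small.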

In [None]:
lrf= stepper.linear_response_function(mu)

Let's plot this Jacobian.

In [None]:
def lrf_plots(p, sig, jac, nz=34):
    p_t = p
    # q has fewer grid points
    p_q = p[:-14]
    
    opts ="""
    QuadMesh[invert_yaxis=True, invert_xaxis=True, colorbar=True](cmap='viridis') {+axiswise}
    VLine(color='red')
    """
    
    jac = jac * sig
    
    # this holomap plots the panes of the LRF
    holomap = hv.HoloMap(kdims=['d', 'f'])
    holomap['sl', 'sl'] = quadmesh(p_t, p_t, jac[:nz,:nz])
    holomap['sl', 'qt'] = quadmesh(p_q, p_t, jac[:nz,nz:])
    holomap['qt', 'sl'] = quadmesh(p_t, p_q, jac[nz:,:nz])
    holomap['qt', 'qt'] = quadmesh(p_q, p_q, jac[nz:,nz:])
    holomap = holomap.redim.label(x="p (in)", y="p (out)")
    
    # cannot render ndlayout in layout
    lrf=holomap['sl', 'sl'] + holomap['sl', 'qt'] + holomap['qt', 'sl']  + holomap['qt', 'qt']
    
    # now let's look at the eigenvalues
    lam = eigvals(jac)
    eigpts = hv.Points((lam.real, lam.imag), kdims=[r'$\Re$', r'$\Im$'])

    
    return (lrf + eigpts * hv.VLine(0.0)).opts(opts)

In [None]:
lrf_plots(p, sig, lrf).cols(2)
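Inside `lrf_plots`, the line `jac = jac * sig` relies on NumPy broadcasting: `sig` has one entry per input, so it multiplies each column of the Jacobian. Each column then shows the response to a one-standard-deviation perturbation of that input. A tiny check with made-up numbers confirms this equals right-multiplying by $\mathrm{diag}(\sigma)$:

```python
import numpy as np

jac_demo = np.arange(6.0).reshape(2, 3)   # 2 outputs x 3 inputs (made-up values)
sig_demo = np.array([1.0, 10.0, 100.0])   # per-input standard deviations (made-up)

scaled = jac_demo * sig_demo              # broadcasts along the last (input) axis
same_as_diag = np.allclose(scaled, jac_demo @ np.diag(sig_demo))

print(scaled)
print(same_as_diag)
```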

These panels look very similar to the results we obtained with the linear response functions. However, there are many eigenvalues with small positive real parts, which correspond to weakly growing modes. This is not necessarily a problem, because we can add a small amount of damping.
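To see why damping helps, note that adding a linear relaxation $-(x-\mu)/\tau$ to the tendency changes its Jacobian from $J$ to $J - I/\tau$. This shifts every eigenvalue left by $1/\tau$ without changing the eigenvectors. A quick numerical check with a random stand-in matrix (not the network's Jacobian):

```python
import numpy as np

rng = np.random.default_rng(2)
nfeat = 5
J = rng.normal(scale=0.1, size=(nfeat, nfeat))   # stand-in tendency Jacobian (1/day)
tau = 20.0                                       # damping timescale (days)

lam = np.linalg.eigvals(J)
lam_damped = np.linalg.eigvals(J - np.eye(nfeat) / tau)

# every damped eigenvalue equals some undamped eigenvalue shifted by -1/tau
shift_ok = all(np.min(np.abs(lam - 1.0 / tau - ld)) < 1e-8 for ld in lam_damped)

# the largest growth rate drops by exactly 1/tau
growth_before = lam.real.max()
growth_after = lam_damped.real.max()

print(shift_ok, growth_before, growth_after)
```

So a growth rate smaller than $1/\tau$ is removed by the damping, while faster-growing modes remain unstable.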

# Time stepping with neural networks

The data we are using are only sampled at a coarse interval of 3 hours. By contrast, the coarse-resolution model will probably use a 10-20 minute time step. This means that the dynamical model we are trying to fit is
$$ x^i_{n+1} = \underbrace{f(f(\ldots f}_{\text{m times}}(x^i_n))) + \int_{t_n}^{t_{n+1}} g(x(t), t) dt$$ 
where $i$ is the horizontal spatial index and $n$ is the time index. The function $f$ is applied $m=\frac{\Delta t}{h}$ times, where $h$ is the GCM's time step and $\Delta t$ is the sampling interval of the stored output. The integral on the right represents the approximately known terms, such as advection, and $f$ represents the unknown source terms.

We find $f$ by solving the minimization problem
$$
\min_{a} \lim_{m \rightarrow \infty} \sum_{i,n} ||x^{i}_{n+1} - F^{(m)} x^i_{n} - g_n^{i}||_W^2 \quad \text{s.t.}\quad F^{(m)}(\cdot) = \underbrace{f(f(\ldots f}_{\text{m times}}(\cdot))),\ f(x) = x +  \frac{ \Delta t}{m} a(x).
$$
Intuitively, the forward operator $F^{(m)}$ is the result of applying $m$ forward Euler steps to the system $\dot{x} = a(x)$.
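A minimal sketch of $F^{(m)}$ and the weighted loss, written for a generic tendency `a`. Here `a` is a hypothetical linear decay $a(x) = -x$ rather than the neural network, and the weight $W$ is treated as an elementwise (diagonal) weight, which is an assumption. For this choice $F^{(m)}(x) = (1 - \Delta t/m)^m x$, which approaches the exact solution $e^{-\Delta t}x$ as $m \to \infty$, mirroring the limit in the objective. The names `euler_forward` and `loss` are illustrative, not functions from `lib.models.torch_models`.

```python
import numpy as np

def euler_forward(a, x, dt, m):
    """Apply m forward Euler steps of size dt/m to dx/dt = a(x)."""
    h = dt / m
    for _ in range(m):
        x = x + h * a(x)
    return x

def loss(a, x_n, x_np1, g_n, dt, m, w):
    """Sum of w * (x_{n+1} - F^(m)(x_n) - g_n)^2, with w a diagonal weight (assumed)."""
    resid = x_np1 - euler_forward(a, x_n, dt, m) - g_n
    return np.sum(w * resid**2)

decay = lambda x: -x          # hypothetical linear tendency
x0 = np.array([1.0, 2.0])
dt = 0.125                    # 3 hours in days

for m in (1, 10, 100):
    print(m, euler_forward(decay, x0, dt, m))
print("exact", np.exp(-dt) * x0)
```

The error of the $m$-step operator shrinks roughly like $\Delta t^2/(2m)$ for this example, so a handful of substeps already approximates the exact flow over one sampling interval closely.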

Let's perform this fit for $m=10$.

In [None]:
stepper = train_euler_network(data, n=5, nsteps=10, ntrain=100000, weight_decay=0.00, nhidden=20)

In [None]:
M = stepper.linear_response_function(mu)
lrf_plots(p, sig, M).cols(2)

There are two lessons from this Euler-stepping approach. The first is that we can achieve similar performance with many fewer hidden nodes (20 instead of 256).

## Single-column performance of time-stepping schemes

In [None]:
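# a single column at horizontal index (8, 0), without the 14 uppermost qt levels
x = X[:-1,8,0,:-14]   # state at sample n
xp = X[1:, 8,0,:-14]  # state at sample n+1
g = G[:-1,8,0,:-14]   # known forcing at sample n

# time in days (samples are 3 hours = 0.125 days apart)
t = np.arange(x.shape[0])*.125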

In [None]:
xp_pred = predict(stepper, x)

# apparent source: finite-difference tendency over 3 h (3/24 day) minus the known forcing
q1_truth = (xp-x)/(3/24)-g
q1_pred = (xp_pred-x)/(3/24) - g

# q1_pred = predict(stepper.rhs, x)
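The apparent-source calculation above is a finite-difference tendency with the known forcing removed. A synthetic sketch (made-up source `S`, forcing `g_syn`, and states) checks that if the data were generated by a single forward Euler step with total tendency $S + g$, the same formula recovers $S$ exactly:

```python
import numpy as np

dt = 3 / 24                      # 3 hours in days, as in the cell above
rng = np.random.default_rng(3)
x_n = rng.normal(size=(8, 5))    # 8 times x 5 features (synthetic)
S = rng.normal(size=(8, 5))      # "unknown" source term (synthetic)
g_syn = rng.normal(size=(8, 5))  # known forcing, as a rate (synthetic)

# one forward Euler step with the combined tendency
x_np1 = x_n + dt * (S + g_syn)

# apparent source: (x_{n+1} - x_n) / dt - g
q_apparent = (x_np1 - x_n) / dt - g_syn

print(np.abs(q_apparent - S).max())
```

With real data the states are not generated this way, so the apparent source also absorbs any error in the known forcing and any nonlinearity within the 3-hour interval.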

In [None]:
%%output size=200
%%opts QuadMesh[invert_yaxis=True, colorbar=True, aspect=3](cmap='inferno')

hv.HoloMap({
    'pred': quadmesh(t, p, q1_pred[:,:34].T, kdims=['t (d)', 'p (mb)']),
    'truth':  quadmesh(t, p, q1_truth[:,:34].T, kdims=['t (d)', 'p (mb)'])
}).layout().cols(1)

# Prognostic performance (with L2 regularization)

Here I test the prognostic performance of a neural network trained with a single time step (nsteps=1) and L2 regularization (weight_decay=0.01).

In [None]:
from scipy.interpolate import interp1d
from scipy.integrate import odeint

In [None]:
stepper = train_euler_network(data, n=4, nsteps=1, ntrain=100000, weight_decay=0.01, nhidden=256)

In [None]:
M = stepper.linear_response_function(mu)
lrf_plots(p, sig, M).cols(2)

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

Let's make interpolators for the forcing and the observed time series, so that both can be evaluated at the arbitrary times requested by the ODE solver.

In [None]:
forcing = interp1d(t, g, axis=0)
actual = interp1d(t, x, axis=0)
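`odeint` evaluates the right-hand side at times between the 3-hourly samples, which is why the forcing and observations are wrapped in interpolators. With `axis=0`, `interp1d` interpolates along time and returns a whole profile at each requested time. A small check with made-up data:

```python
import numpy as np
from scipy.interpolate import interp1d

t_demo = np.array([0.0, 0.125, 0.25])            # sample times (days)
g_demo = np.array([[0.0, 10.0],
                   [2.0, 20.0],
                   [4.0, 30.0]])                 # 3 times x 2 features (made-up)

forcing_demo = interp1d(t_demo, g_demo, axis=0)  # linear in time by default

mid = forcing_demo(0.0625)                       # halfway between the first two samples
print(mid)
```

By default `interp1d` raises a `ValueError` for times outside the sampled interval, so the integration window has to stay inside `t`.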

In [None]:
def print_soln(f):
    
    # integrate the single-column ODE over the first 400 samples (50 days)
    nt = 400
    y = odeint(f, x[0], t[:nt])
    
    fig, axs = plt.subplots(2,2, figsize=(8,5), sharey=True)
    qt_levs = np.arange(11)*2.5
    t_levs = np.arange(12)*25 + 275

    # left column: truth over the same window; right column: prediction
    t_im = axs[0,0].contourf(x[:nt,:34].T, levels=t_levs)
    axs[0,1].contourf(y[:,:34].T, levels=t_levs)
    q_im = axs[1,0].contourf(x[:nt,34:].T, levels=qt_levs)
    axs[1,1].contourf(y[:,34:].T, levels=qt_levs)

    plt.colorbar(t_im, ax=axs[0,:].tolist())
    plt.colorbar(q_im, ax=axs[1,:].tolist())
    
    axs[0,1].set_title("Prediction")
    axs[0,0].set_title("Truth")
    
    axs[0,0].set_ylabel('sl')
    axs[1,0].set_ylabel('qt')
   

def f(x, t):
    return predict(stepper.rhs, x) #- (x-mu)/4 + forcing(t)

print_soln(f)

We can see that the solution blows up when the neural network is the only active source term.

Adding weak damping toward the mean, with timescale $\tau = 20$ days, stabilizes the scheme.

In [None]:
def f(x, t):
    return predict(stepper.rhs, x) - (x-mu)/20 #+ forcing(t)
print_soln(f)
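A scalar toy problem (with a hypothetical growth rate, not taken from the network) illustrates the trade-off. With an unstable mode $\dot x = \lambda x$ and relaxation toward a mean $\mu$, the anomaly evolves as $\dot x' = (\lambda - 1/\tau)x'$, so the damping stabilizes the system only if $1/\tau$ exceeds the growth rate $\lambda$:

```python
import numpy as np
from scipy.integrate import odeint

lam = 0.03        # hypothetical growth rate (1/day)
mu0 = 0.0         # mean state to relax toward
x0 = [1.0]        # initial anomaly
t_days = np.linspace(0.0, 50.0, 401)

def rhs(x, t, tau=None):
    """Unstable linear tendency plus optional relaxation toward mu0 on timescale tau."""
    dxdt = lam * x
    if tau is not None:
        dxdt = dxdt - (x - mu0) / tau
    return dxdt

undamped = odeint(rhs, x0, t_days)[:, 0]
damped = odeint(rhs, x0, t_days, args=(20.0,))[:, 0]

print(undamped[-1], damped[-1])   # grows like exp(1.5) vs decays like exp(-1)
```

Adding forcing can excite faster-growing modes, which is consistent with needing a shorter $\tau$ once the advective forcing is switched on.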

What happens if we add in the advection forcing?

In [None]:
def f(x, t):
    return predict(stepper.rhs, x) - (x-mu)/20 + forcing(t)
print_soln(f)

The scheme is unstable again, so let's strengthen the damping by reducing $\tau$ from 20 days to 4 days.

In [None]:
def f(x, t):
    return predict(stepper.rhs, x) - (x-mu)/4 + forcing(t)
print_soln(f)

We can also damp toward the observed time series instead of the mean.

In [None]:
def f(x, t):
    return predict(stepper.rhs, x) - (x-actual(t))/4 + forcing(t)
print_soln(f)

The result is basically the same as before. What happens if we turn off the neural network scheme?

In [None]:
def f(x, t):
    return - (x-actual(t))/4 + forcing(t)
print_soln(f)

We get a much worse prediction.
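To put a number on "much worse", one option is the root-mean-square error of each run against the truth, computed feature by feature. `rmse_by_level` is a hypothetical helper, not part of the original notebook. Since `print_soln` does not return its solution, one would recompute `y = odeint(f, x[0], t[:400])` for each choice of `f` and call `rmse_by_level(y, x[:400])`.

```python
import numpy as np

def rmse_by_level(pred, truth):
    """RMSE over time for each feature; pred and truth have shape (time, feature)."""
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    return np.sqrt(np.mean((pred - truth) ** 2, axis=0))

# made-up example: 3 times x 2 features
truth_demo = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
pred_demo = np.array([[2.0, 2.0], [0.0, 2.0], [2.0, 2.0]])
err = rmse_by_level(pred_demo, truth_demo)
print(err)
```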