In the notebook at `reports/1.0-ndb-chris-slides.ipynb` I found that the advection forcings used in the model are extremely noisy, and probably inaccurate. The liquid water temperature forcing is especially noisy near the tropopause of the model. The goal of this notebook is to develop a more stable method for computing these vertical advection terms.

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

import numpy as np
import xarray as xr
from lib.advection import material_derivative
from lib.thermo import liquid_water_temperature

import holoviews as hv
from lib.holoviews import quadmesh

hv.extension('matplotlib')

In [None]:
def loc(x):
    return x.isel(x=0, y=8)


def plot_tz(x, **kwargs):
    plt.figure(dpi=125, figsize=(6,2))
    return x.transpose('z', 'time').plot(**kwargs)

In [None]:
    
fields_3d = xr.open_mfdataset(
    "../data/raw/2/NG_5120x2560x34_4km_10s_QOBS_EQX/coarse/3d/*.nc",
    preprocess=lambda x: x.drop('p'),
).isel(y=slice(24, 40)).compute()

fields_loc = loc(fields_3d).compute()
sl = fields_loc.pipe(lambda x: liquid_water_temperature(x.TABS, x.QN, x.QP))

Here is the vertical velocity field. It seems pretty reasonable.

In [None]:
fields_loc.W.sel(time=slice(100, 120)).pipe(plot_tz)

Let's compute $\frac{ds_l}{dz}$ using centered differences.

In [None]:
dsldz = sl.centderiv('z')
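
`centderiv` is a project-specific helper, so its exact stencil is not shown here. As a rough point of comparison, the sketch below computes a centered difference on an unevenly spaced vertical grid with `np.gradient`. The grid `z`, the test profiles, and the function name `centered_derivative` are assumptions made for illustration only.

```python
import numpy as np


def centered_derivative(f, z, axis=-1):
    """Differentiate f along `axis` with respect to the coordinate z.

    np.gradient uses second-order centered differences at interior points
    (also on unevenly spaced grids) and one-sided differences at the two ends.
    """
    return np.gradient(f, z, axis=axis)


# Hypothetical stretched vertical grid (m) and two test profiles.
z = np.array([0.0, 50.0, 150.0, 300.0, 500.0, 800.0])
s_linear = 300.0 + 0.005 * z   # constant gradient of 0.005 K/m
s_quadratic = z ** 2           # gradient 2*z

dsdz_linear = centered_derivative(s_linear, z)
dsdz_quadratic = centered_derivative(s_quadratic, z)
```

The linear profile is differentiated exactly everywhere. The quadratic one is exact only at interior points, because the end points fall back to one-sided differences.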

In [None]:
dsldz.sel(time=slice(100, 120)).pipe(plot_tz, vmax=.02)

In [None]:
(dsldz*fields_loc.W).sel(time=slice(100, 120)).pipe(plot_tz)

Clearly, the very strong stratification anomalies near the tropopause of the simulation amplify small vertical velocities there and make the vertical advection noisy overall.
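
To make the amplification argument concrete, here is a small synthetic sketch. The same small vertical velocity fluctuations are multiplied by a weak and by a strong stratification, and the noise in the product scales with the stratification. All numbers are made up for illustration and are not taken from the simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values, chosen only for illustration.
w_noise = 0.01 * rng.standard_normal(10_000)  # small vertical velocity fluctuations (m/s)
dsdz_troposphere = 0.004                       # weak stratification (K/m)
dsdz_tropopause = 0.02                         # strong stratification (K/m)

# Vertical advection w * ds/dz for the same velocity noise at both levels.
adv_troposphere = w_noise * dsdz_troposphere
adv_tropopause = w_noise * dsdz_tropopause

# The noise level of the product grows by the ratio of the stratifications.
amplification = adv_tropopause.std() / adv_troposphere.std()
```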

In [None]:
sl = fields_3d.pipe(lambda x: liquid_water_temperature(x.TABS, x.QN, x.QP))
w = fields_3d.W

vert_adv = (sl.centderiv('z') * w).compute()

In [None]:
from scipy.ndimage import gaussian_filter

In [None]:
vert_adv.dims

In [None]:
X_xr = vert_adv.stack(samples=['time' ,'x', 'y']).transpose('samples', 'z')
X = X_xr.values

# Remove the mean at each level and divide by a fixed scale factor of 5e-4
dm = (X - X.mean(axis=0)) / .0005

# Denoising using autoencoders

Here I train an auto-encoder which ignores the top 10 grid points as inputs and has an internal hidden layer with 5 nodes.

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    # Inputs: the 34 vertical levels minus the top 10
    Dense(34, input_shape=(34-10,)),
    Activation('relu'),
    # Bottleneck layer with 5 nodes
    Dense(5),
    Activation('relu'),
    Dense(34),
    Activation('relu'),
    # Linear output layer reconstructing all 34 levels
    Dense(34),
])


# For a mean squared error regression problem
model.compile(optimizer='rmsprop',
              loss='mse')

# Inputs exclude the top 10 levels; targets are the full profiles
model.fit(dm[:, :-10], dm, epochs=1)

In [None]:
denoised = model.predict(dm[:,:-10])*.0005 + X.mean(axis=0)
denoised = xr.DataArray(denoised, coords=X_xr.coords).unstack('samples')
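
The scaling and slicing around the network can be checked without Keras. The sketch below uses a synthetic stand-in for `X` (random numbers, not the model output) and assumes, as the text implies, that the last 10 entries along `z` are the top of the model. It confirms that the inputs have 24 levels, the targets have 34, and that multiplying by the scale and adding the mean undoes the normalization.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_levels, n_top = 8, 34, 10
SCALE = 0.0005  # same fixed scale factor as in the cells above

# Synthetic stand-in for the (samples, z) matrix X.
X = 1e-3 * rng.standard_normal((n_samples, n_levels))

mean = X.mean(axis=0)
dm = (X - mean) / SCALE

inputs = dm[:, :-n_top]  # all levels except the last 10 go into the network
targets = dm             # the network is trained to output all 34 levels

# Undo the scaling, as is done to the network's predictions.
restored = targets * SCALE + mean
```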

In [None]:
denoised.isel(x=0,y=8).sel(time=slice(100, 120)).pipe(plot_tz)

As you can see, this data is substantially denoised, and might be a better forcing dataset. An alternative could be to alter the objective function to make the model fit slack.