Given the problems with incompressibility and other issues, I have completely changed the strategy for building the training dataset. In this notebook I analyze the new version of the forcing data.

In [None]:
import numpy as np
import pandas as pd
import xarray as xr
import holoviews as hv
import torch
import glob
from collections.abc import Mapping
from holoviews import streams
hv.extension('bokeh')
%opts Image[colorbar=True, invert_yaxis=True, width=500, height=200] (cmap='viridis') {+framewise +axiswise}
%opts Raster[colorbar=True, invert_yaxis=True, width=500, height=200] (cmap='viridis') {+framewise +axiswise}
%opts QuadMesh[colorbar=True, invert_yaxis=True, width=500, height=200] (cmap='viridis') {+framewise +axiswise}

In [None]:
ds = xr.open_dataset("../data/training_data.nc")
ds

Let's look at a typical grid point in the tropics, (x, y) = (0, 32)...

In [None]:
loc = ds.isel(x=0,y=32)

In [None]:
%%opts Curve[width=500, height=100]
lay = hv.Raster(loc.FQT.values.T, label='FQT') + hv.Raster(loc.FSLI.values.T, label='FSLI') \
    + hv.Curve(loc.Prec.values, vdims=['Prec'])
lay.cols(1).redim(x='time', z='c', y='z')

It looks like this new data does not have the banding artifact near the boundary layer. What about the zonal mean?

In [None]:
mu = ds.mean(['x', 'time'])

In [None]:
lay =  hv.Raster(mu.FSLI.values, label='FSLI') + hv.Raster(mu.FQT.values, label='FQT')
lay.cols(1).redim(x='y', z='c', y='z')

There are still some strange artifacts near the edge of the domain for FSLI, but that is not unexpected. The humidity forcing looks better. Maybe there are still some bugs in how I'm computing temperature.

# Horizontal momentum tendencies

These also look somewhat strange. I am not sure whether they are physical.

In [None]:
%%opts Raster[width=400, height=200]{+axiswise}

fields = ['U', 'V', 'FV', 'FU']
hmap = hv.HoloMap({key: hv.Raster(mu[key].values) for key in fields}).redim(z='c', x='y', y='z')
hmap.layout().cols(2)

# Comparison with debugging data

In [None]:
id = 'deadly_becquerel'
path = f'../data/samNN/{id}/NG1_test_0000002.pkl'


def norm(x):
    return np.sqrt((x**2).mean(axis=(-1,-2)))

# get first debugging point
dbg = torch.load(f'../data/samNN/{id}/NG1_test_0000001.pkl')
state, dt = dbg['args']

# get first time point
ds = xr.open_dataset("../data/training_data.nc").isel(time=0)


In [None]:
norm(ds.W-state['W']).plot()

In [None]:
norm(ds['FQT']-state['FQT']).plot()

In [None]:
(norm(ds['FSLI']-state['FSLI'])*86400).plot()

In [None]:
(norm(ds['FQT']-state['FQT'])*86400).plot()

There is still a big discrepancy in FQT and FSLI at the first time point.
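To narrow down where such a discrepancy lives, it can help to compute the RMS difference over named horizontal dimensions and then find the worst level. The following is a minimal sketch on synthetic data; `rms_by_level`, the `(z, y, x)` dimension order, and the arrays `train` and `debug` are illustrative assumptions, not the notebook's actual fields.

```python
import numpy as np
import xarray as xr


def rms_by_level(a, b, dims=("y", "x")):
    """RMS difference of two DataArrays over the given (horizontal) dims."""
    return np.sqrt(((a - b) ** 2).mean(list(dims)))


# Synthetic stand-ins for a training field and a debugging field.
rng = np.random.default_rng(0)
train = xr.DataArray(rng.normal(size=(4, 3, 5)), dims=("z", "y", "x"))
debug = train.copy()
debug[2] = debug[2] + 1.0  # add a uniform offset at vertical level 2 only

err = rms_by_level(train, debug)   # one value per z level
worst_level = int(err.argmax("z"))  # index of the level with the largest error
```

Using dimension names instead of axis positions keeps the reduction correct even if the arrays are transposed.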

In [None]:
ds.FQT[5].plot()

In [None]:
def index_like(x, y):
    if isinstance(x, Mapping):

        keys = set(x) & set(y.data_vars)
        return xr.Dataset({key: index_like(x[key], y[key]) for key in keys})
    else:
        if x.shape[0] == 1:
            x = x[0]
        return xr.DataArray(x, dims=y.dims, coords=y.coords)
    
    
def open_debug_state_like_ds(path: str, ds: xr.Dataset) -> xr.Dataset:
    """Open SAM debugging output as xarray object
    
    Parameters
    ----------
    path
        path to pickle saved by torch
    ds
        dataset to use as a template
        
    Returns
    -------
    state
        dataset with fields from path
    """
    dbg = torch.load(path)
    # 'args' holds (state, dt); only the state is needed here
    state = dbg['args'][0]
    return index_like(state, ds)


def concat_datasets(args, name='mode'):
    """Concatenate datasets with a new named index
    
    This function is especially useful for comparing two datasets with
    shared variables with holoviews
    
    Parameters
    ----------
    args : list
        list of (name, dataset) pairs
    name : str
        name of the new dimension
    
    Returns
    -------
    ds : xr.Dataset
        concatenated data
    """
    
    names, vals = zip(*args)
    
    # get list of vars
    vars = set(vals[0])
    for val in vals:
        vars = vars & set(val)
    vars = list(vars)
    
    vals = [val[vars] for val in vals]
        
    return xr.concat(vals, dim=pd.Index(names, name=name))




i = 4
# sort the matches so that index i picks the same file on every run
path = sorted(glob.glob(f'../data/samNN/{id}/NG1_test_0000*.pkl'))[i]
dbg = open_debug_state_like_ds(path, ds)
cds = concat_datasets([('DBG', dbg),('Train', ds)], name='source')

In [None]:
for key in dbg:
    if dbg[key].ndim > 1:
        print(norm(dbg[key]-ds[key]))

All these should be identically zero. That they are not indicates some problem.
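A compact way to see which variables violate this is to report the maximum absolute difference for each shared variable and keep only those above a tolerance. This is a minimal sketch on synthetic datasets; the helper `mismatched_vars` and the tolerance `atol` are hypothetical and not part of the notebook.

```python
import numpy as np
import xarray as xr


def mismatched_vars(a, b, atol=1e-10):
    """Return {name: max abs difference} for shared variables differing by more than atol."""
    report = {}
    for key in sorted(set(a.data_vars) & set(b.data_vars)):
        max_diff = float(np.abs(a[key] - b[key]).max())
        if max_diff > atol:
            report[key] = max_diff
    return report


# Synthetic stand-ins for the debugging and training datasets.
a = xr.Dataset({
    "W": (("z", "x"), np.zeros((3, 4))),
    "QT": (("z", "x"), np.ones((3, 4))),
})
b = a.copy(deep=True)
b["QT"] = b["QT"] + 0.5  # perturb one variable only

report = mismatched_vars(a, b)
print(report)
```

An empty report would mean the two datasets agree to within `atol` on every shared variable.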

In [None]:
%%opts Curve[invert_axes=True] {+framewise}

variables_to_plot = [ 'FSLI', 'FQT','SLI', 'QT', 'W', 'U']
data_to_plot = cds[variables_to_plot].to_array(dim='variable', name='value')


hv.Dataset(data_to_plot).to.curve("z", dynamic=True)\
  .overlay("source")


The debugging and training data are mostly similar, although not identical. The neural network should hopefully not be sensitive to these kinds of small differences. I need to check, but I suspect the problem is that the network is simply too sensitive to FQT and FSLI.

Even though I have ironed out many of the issues with the training data, there are still some grid points where the two sources differ greatly. Here is a little interface for exploring the domain.

In [None]:
%%opts Curve[invert_axes=True] {+framewise}

variables_to_plot = [ 'FSLI', 'FQT','SLI', 'QT', 'W', 'U']
data_to_plot = cds[variables_to_plot].to_array(dim='variable', name='value')


def curves( x, y):
    if None in [x, y]:
        x, y = (0,0)
        
    return hv.Dataset(cds['W'].sel(x=x, y=y, method='nearest')).to.curve("z").overlay("source")
    


w_im = hv.Image(ds.Prec)
# pointer = streams.SingleTap(transient=True, source=w_im)
pointer = streams.PointerXY(x=0,y=0, source=w_im)


dmap = hv.DynamicMap(curves, kdims=['x', 'y'], streams=[pointer]).redim.values(key=variables_to_plot)


dmap.select(key='W').redim.range(W=(-.3, .3))  +  w_im