# Preview

# Issues

today
- branch nobm_day to nobm_mon and finish pipeline to get monthly rrs
- refactor construction of train, validate, test generation

later
- transform of outputs
- pca outputs, to reduce dimensionality as needed
- pca inputs, to reduce complexity
- test for signal
  - 1 vs 2 nearest neighbor outputs
  - chl-a retrieval algorithms
- dealing with unbalanced data (are they unbalanced?)
- try classification only

## Imports

In [None]:
import importlib
import os
import warnings
import datetime as dt

from IPython.display import Markdown
from scipy.stats import zscore
import holoviews as hv
import hvplot.xarray
import numpy as np
import pandas as pd
import panel as pn
import param as p
import xarray as xr

#from re_nobm_pcc import preprocess
from re_nobm_pcc import kit

warnings.filterwarnings(action='ignore', category=FutureWarning)
hv.opts.defaults(hv.opts.Curve(active_tools=[]), hv.opts.Image(active_tools=[]))
hv.extension('bokeh')

## Raw Data

The OASIM model requires absorption and backscattering for each phytoplankton group.

In [None]:
ds = []
for item in ['dia', 'chl', 'cya', 'coc', 'pha', 'din']:
    path = f'../data/oasim_param/{item}1.txt'
    df = pd.read_table(path, sep='\t', dtype={0: int})
    df.columns = ('wavelength', 'absorption', 'scattering')
    da = df.set_index('wavelength').to_xarray().expand_dims('component')
    da['component'] = [item]
    ds.append(da)
ds = xr.concat(ds, 'component')
(
    ds.hvplot.line(x='wavelength', y='absorption', by='component')
    + ds.hvplot.line(x='wavelength', y='scattering', by='component')
).cols(1)

The NOBM data provided by Cecile contain the ocean constituents, which are sufficient inputs for the OASIM Fortran library to calculate Rrs.

Below 350 nm, however, there is no phytoplankton absorption data, so those Rrs values should be ignored.

In [None]:
# One file per month in chronological order, covering 24 years from 1998.
paths = [f'../data/rrs_day/rrs{1998+i}{1+j:02}.nc' for i in range(24) for j in range(12)]
ds = xr.open_mfdataset(paths)

In [None]:
# FIXME move to re_nobm_pcc.simulate.py once oasim is fixed
ds = ds.roll({'lon': ds.sizes['lon'] // 2})
coords = xr.Dataset(
    coords={
        'lon': np.linspace(-180, 180, ds.sizes['lon']),
        'lat': np.linspace(-84, 71.4, ds.sizes['lat'])
    },
)
ds = xr.merge((ds.drop_vars(('lon', 'lat')), coords))
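
Since Rrs below 350 nm should be ignored, one option is to drop those wavelengths before any further processing. The following is a minimal sketch on a synthetic dataset, assuming `wavelength` is a sorted dimension coordinate in nanometres. The names `demo` and `demo_valid` and all values are illustrative and do not come from the NOBM files.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the Rrs dataset; sizes and values are made up.
wavelength = np.arange(250, 751, 25)
rng = np.random.default_rng(0)
demo = xr.Dataset(
    {'rrs': (('wavelength', 'lat'), rng.random((wavelength.size, 3)))},
    coords={'wavelength': wavelength},
)

# Keep only wavelengths at or above 350 nm (label-based slices are inclusive).
demo_valid = demo.sel({'wavelength': slice(350, None)})
```

Selecting with a slice drops the unusable bands outright, whereas `demo.where(demo['wavelength'] >= 350)` would keep them as NaN, which may be preferable if downstream code expects a fixed wavelength axis.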

In [None]:
class Dashboard(p.Parameterized):
    
    # part of the GUI
    date = p.Date(dt.date(1998, 1, 1))
    h2o = p.Selector(['tot', 'dtc', 'pic', 'cdc', 't', 's'], label='Ocean Property Variable')
    phy = p.Selector(ds['component'].values.tolist(), label='Phytoplankton Group')
    # needed as dependencies, not part of the GUI
    data = p.ClassSelector(xr.Dataset)
    stream = hv.streams.Tap(x=0, y=0)
    
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.tap = xr.DataArray(
            np.empty((ds.sizes['wavelength'], 0), dtype=ds['rrs'].dtype),
            dims=('wavelength', 'tap'),
            name='rrs',
        )
    
    @p.depends('date', watch=True, on_init=True)
    def _load_date(self):
        self.data = ds.sel({'date': np.datetime64(self.date)}).load()
        
    @p.depends('data', 'h2o')
    def plt_h2o(self):
        da = self.data[self.h2o]
        return da.hvplot.image(x='lon', y='lat', clabel=self.h2o, title='')
    
    @p.depends('data', 'phy')
    def plt_phy(self):
        da = self.data['phy'].sel({'component': self.phy})
        plt = da.hvplot.image(x='lon', y='lat', clabel=self.phy, title='')
        self.stream.source = plt
        return plt

    @p.depends('stream.x', 'stream.y')
    def plt_rrs(self):
        da = self.data['rrs'].sel(
            {'lon': self.stream.x, 'lat': self.stream.y},
            method='nearest',
        )
        da = da.expand_dims('tap')
        self.tap = xr.concat((self.tap, da), dim='tap')
        return self.tap.hvplot(x='wavelength', by='tap', title='')
    
    
dash = Dashboard(name='NOBM Variables and Computed Rrs')
pn.Column(
    pn.panel(dash.param, parameters=['date', 'h2o', 'phy'], widgets={'date': pn.widgets.DatePicker}),
    dash.plt_h2o,
    dash.plt_phy,
    dash.plt_rrs,
)

# OBPG Algorithms

OC4 (SeaWiFS) from https://oceancolor.gsfc.nasa.gov/atbd/chlor_a/
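
For reference, OC4 is a maximum band ratio algorithm: a fourth-order polynomial in the log of the largest blue Rrs (443, 490 or 510 nm) over the green Rrs (555 nm). Below is a rough sketch of that form. The coefficient values are recalled from the SeaWiFS OC4 fit and are an assumption that should be checked against the ATBD linked above; the function name `oc4` is hypothetical.

```python
import numpy as np

# ASSUMPTION: SeaWiFS OC4 coefficients (a0..a4) as recalled; verify against the ATBD.
OC4_SEAWIFS = (0.3272, -2.9940, 2.7218, -1.2259, -0.5683)


def oc4(rrs443, rrs490, rrs510, rrs555, coef=OC4_SEAWIFS):
    """Maximum band ratio estimate of chlorophyll-a (mg m^-3)."""
    # Largest of the three blue bands, elementwise.
    blue = np.maximum(np.maximum(rrs443, rrs490), rrs510)
    x = np.log10(blue / rrs555)
    # np.polyval expects the highest-order coefficient first.
    log_chl = np.polyval(coef[::-1], x)
    return 10.0 ** log_chl


# Two toy pixels: blue/green ratio of 2 (clearer water) and of 1.
chl = oc4(
    np.array([0.008, 0.004]),
    np.array([0.006, 0.004]),
    np.array([0.005, 0.004]),
    np.array([0.004, 0.004]),
)
```

Because the function uses only NumPy ufuncs, it should also accept `xarray.DataArray` inputs selected from `rrs` at the nearest wavelengths.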

## Preprocessed Data

The features and labels are both model output from NASA GMAO, produced with the [NOBM and OASIM](https://gmao.gsfc.nasa.gov/gmaoftp/NOBM) models. The labels are four phytoplankton chlorophyll densities output by NOBM. The features are normalized water-leaving radiances output by OASIM, which takes the NOBM output as its input.

### Features

One NetCDF file contains all the predictor data. Note that the `FillValue` attribute is not set to `9.99e11` in the netCDF file (Cecile will fix in next version). There are no explicit coordinates given; they are documented as attributes.
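
Until the attribute is fixed, missing values can be masked by hand after opening the file. The following is a minimal sketch on a toy array, assuming missing cells are stored as the sentinel `9.99e11`; the names `raw` and `masked` are illustrative.

```python
import numpy as np
import xarray as xr

FILL = 9.99e11  # sentinel quoted in the text above

# Toy array standing in for one variable from the features file.
raw = xr.DataArray(np.array([[0.1, FILL], [FILL, 0.4]]), dims=('y', 'x'))

# Replace the sentinel with NaN. A relative tolerance is used instead of
# exact equality in case the file stores the sentinel as float32.
masked = raw.where(abs(raw - FILL) > 1e-6 * FILL)
```

With NaN in place of the sentinel, xarray reductions such as `masked.mean()` skip the missing cells by default.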

### Labels

Each of twelve NetCDF files contains a month of NOBM model output. The first is representative. Unlike the HyperLwn file, this one contains coordinates.

The `PhytoChl` xarray.Dataset includes the different phytoplankton groups as variables.

# Plot your Data

## Features

The radiances currently make a nice map, but the data should be more sparsely sampled.

A few "typical" hyperspectral radiances.

Mean centered radiances and corresponding phytoplankton abundances.

An SVD reduces the wavelength dimension to `k` vectors that account for the most variation in the features. The singular values are:

The corresponding vectors:

A matrix of univariate (diagonal) and bivariate (off-diagonal) histograms of the `scores`, the coefficients that reconstruct each spectrum as a linear combination of the `vectors` above.
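
The relationship between singular values, `vectors` and `scores` can be sketched with plain NumPy. This is not necessarily how the notebook computes them; the synthetic matrix `X` (pixels by wavelengths) and its sizes are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 pixels by 50 wavelengths, built from 3 latent
# spectra plus a small amount of noise.
n_pixel, n_wavelength, k = 200, 50, 3
X = rng.normal(size=(n_pixel, 3)) @ rng.normal(size=(3, n_wavelength))
X += 0.01 * rng.normal(size=X.shape)

# Mean center each wavelength before decomposing.
Xc = X - X.mean(axis=0)

# Thin SVD: Xc = U @ diag(s) @ Vt, with s sorted in decreasing order.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

vectors = Vt[:k]            # (k, n_wavelength) basis spectra
scores = U[:, :k] * s[:k]   # (n_pixel, k) coefficients for each pixel

# Rank-k reconstruction of the centered spectra and the variance it retains.
Xk = scores @ vectors
explained = (s[:k] ** 2).sum() / (s ** 2).sum()
```

Here `scores` equals the projection `Xc @ vectors.T`, so each row holds the `k` coefficients for one pixel, and `explained` is the fraction of total variance kept by the first `k` components.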

## Labels

A map of the phytoplankton labels in `PhytoChl` at one month.

The distribution of the four phytoplankton groups.