#### Produce transfer function between HS2 and EcoPuck data

This notebook is part of the auv-python project (private repository at https://github.com/mbari-org/auv-python). It demonstrates how to read and make interactive plots of millions of data points accessed from a STOQS database.

To execute it (for example):

```bash
    cd GitHub  # Or other appropriate directory on your computer
    git clone https://github.com/mbari-org/auv-python.git
    cd auv-python
    poetry install
    poetry shell
    cd notebooks
    jupyter notebook
    # Open this notebook and run it from your browser - interactive zooming does not work in VS Code
```

The URLs in the pooch.retrieve() calls below were generated by going to https://stoqs.shore.mbari.org/stoqs_auv_compare/ and clicking the buttons of the Measured Parameters to be included in the Parquet file, then opening the "Measured Parameter Data Access" section and clicking the "Estimate requirements" button to verify that the estimated values are within the available values of the server.
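
The same kind of URL can also be assembled in code, which makes the selected parameters and constraints easier to read than the encoded string. The following is a minimal sketch using only the standard library; `build_parquet_url` is a hypothetical helper (not part of auv-python or STOQS), and the query keys and values are simply copied from the URL used in this notebook.

```python
from urllib.parse import urlencode

# Base endpoint copied from the pooch.retrieve() call in this notebook
BASE_URL = "https://stoqs.shore.mbari.org/stoqs_auv_compare_t/api/measuredparameter.parquet"


def build_parquet_url(parameter_names, platform, start, end, depth_min, depth_max):
    """Hypothetical helper: assemble a measuredparameter.parquet request URL."""
    # A list of tuples keeps the repeated parameter__name keys and their order
    query = [("parameter__name", name) for name in parameter_names]
    query += [
        ("measurement__instantpoint__activity__platform__name", platform),
        ("measurement__instantpoint__timevalue__gt", start),
        ("measurement__instantpoint__timevalue__lt", end),
        ("measurement__depth__gte", depth_min),
        ("measurement__depth__lte", depth_max),
        ("collect", "name"),
        ("include", "activity__name"),
    ]
    # urlencode() percent-encodes the values and turns spaces into "+"
    return f"{BASE_URL}?{urlencode(query)}"


url = build_parquet_url(
    ["ecopuck_bbp700 (m^-1 sr^-1)", "ecopuck_chl (ug/l)", "hs2_bb700 (m-1)", "hs2_fl700"],
    platform="dorado",
    start="2020-09-02 02:39:37",
    end="2023-02-16 16:50:46",
    depth_min="-8.36",
    depth_max="265.11",
)
print(url)
```

Changing the constraints changes the returned data, so a URL built this way would need its own known_hash value in pooch.retrieve().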

In [None]:
# Do all the imports here and then load the data so that we can randomly execute
# any of the plotting cells below

import colorcet
import holoviews as hv
import hvplot.pandas
import os
import ipywidgets as widgets
import pandas as pd
import panel as pn
import pooch
import statsmodels.api as sm
from bokeh.models.formatters import PrintfTickFormatter
from holoviews.operation.datashader import datashade

hv.extension("bokeh")


In [None]:
# Takes several minutes to retrieve the data the first time, thereafter it's read from a local cache
data_file = pooch.retrieve(
    url="https://stoqs.shore.mbari.org/stoqs_auv_compare_t/api/measuredparameter.parquet?parameter__name=ecopuck_bbp700+%28m%5E-1+sr%5E-1%29&parameter__name=ecopuck_chl+%28ug%2Fl%29&parameter__name=hs2_bb700+%28m-1%29&parameter__name=hs2_fl700&measurement__instantpoint__activity__platform__name=dorado&measurement__instantpoint__timevalue__gt=2020-09-02+02%3A39%3A37&measurement__instantpoint__timevalue__lt=2023-02-16+16%3A50%3A46&measurement__depth__gte=-8.36&measurement__depth__lte=265.11&collect=name&include=activity__name",
    known_hash="063071ffb1454b07625ac4b58c77f847830a798d37f68494d2c1c797fd097bc4",
)
df = pd.read_parquet(data_file)
df.describe()

In [None]:
df

In [None]:
# The following cells make time series comparison plots of all of the diamond mission data, in order:
# 'adinos', 'bg_biolum', 'diatoms', 'hdinos', 'intflash', 'nbflash_high', 'nbflash_low', and 'profile'
# Do not commit following cell outputs to the repository - they are too big!

non_time_indx = ['platform', 'activity__name', 'depth', 'latitude', 'longitude']

In [None]:
hs2_plot = df.droplevel(non_time_indx)["hs2_bb700 (m-1)"].hvplot(width=800, height=300)
ecopuck_plot = df.droplevel(non_time_indx)["ecopuck_bbp700 (m^-1 sr^-1)"].hvplot()
hs2_plot * ecopuck_plot

The following cells make comparison biplots of all of the diamond mission data, in order:
'adinos', 'bg_biolum', 'diatoms', 'hdinos', 'intflash', 'nbflash_high', 'nbflash_low', and 'profile'.
There should be a slope of 1.0 for all of the plots.
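
Because biplot() passes the x column to sm.OLS() without adding a constant, the model it fits is y = b * x, and the least-squares estimate of that single coefficient is b = sum(x*y) / sum(x*x). The sketch below shows that calculation in plain Python so the expected slope can be checked by hand; `slope_through_origin` is a hypothetical name and the numbers are made up for illustration, not taken from STOQS.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope b for the model y = b * x (no intercept term)."""
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be zero")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


# Made-up illustrative values, not STOQS data: y is exactly twice x
xs = [0.001, 0.002, 0.003, 0.004]
ys = [0.002, 0.004, 0.006, 0.008]

slope = slope_through_origin(xs, ys)
print(f"y = {slope:.4f} * x")
```

Two instruments that agree perfectly would give a slope of 1.0 here; a slope that differs from 1.0 is the scale factor relating one instrument's values to the other's.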

In [None]:
def biplot(df, x="hs2_bb700 (m-1)", y="ecopuck_bbp700 (m^-1 sr^-1)"):
    """Print OLS regression info for y against x and return a datashaded biplot."""
    dfa = df[[x, y]].dropna()
    # No constant term is added, so this fit is forced through the origin
    results = sm.OLS(dfa[y], dfa[x]).fit()
    print(results.summary())
    # Slope.from_scatter() does its own linear fit, which includes an intercept
    slope_plot = hv.Slope.from_scatter(hv.Scatter(dfa.to_numpy())).opts(line_width=1, color='red')
    pts = hv.Points(dfa, [x, y])
    # { and } cause problems in opts title, so we need to remove them from the x variable
    x_label = x.replace("{", "").replace("}", "")
    # Use .iloc[0]: positional [] indexing on a labeled Series is deprecated in pandas
    title = f"{y} = {results.params.iloc[0]:.4f} * {x_label}"
    return datashade(pts).opts(width=700, height=700, title=title) * slope_plot

In [None]:
biplot(df)