#### Use Holoviews to compare HS2 and Ecopuck parameters from stoqs_mb_diamonds

This notebook is part of the auv-python project (private repository at https://github.com/mbari-org/auv-python). It demonstrates how to read millions of data points from a STOQS database and plot them interactively.

To execute it (for example):

```bash
    cd GitHub  # Or other appropriate directory on your computer
    git clone https://github.com/mbari-org/auv-python.git
    cd auv-python
    poetry install
    poetry shell
    cd notebooks
    jupyter lab
    # Open this notebook and run it from your browser - interactive zooming does not work in VS Code
```

The URL in the cell below was generated by going to https://stoqs.shore.mbari.org/stoqs_mb_diamonds/ and clicking the buttons of the Measured Parameters to be included in the Parquet file, then opening the "Measured Parameter Data Access" section and clicking the "Estimate requirements" button to verify that the estimated requirements are within what the server has available. The cells below load the data into a Pandas DataFrame and make interactive zoomable plots.
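
The query string repeats the `parameter__name` key once per selected parameter, and the parameter names contain spaces, slashes, carets, and parentheses. As a minimal sketch (not the notebook's own code), the same kind of URL can be assembled with the standard library so that those characters are percent-encoded explicitly. `build_stoqs_url` is a hypothetical helper name, and whether the server treats the encoded and unencoded forms identically is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://stoqs.shore.mbari.org/stoqs_mb_diamonds/api/measuredparameter.parquet"


def build_stoqs_url(parms, base=BASE):
    """Hypothetical helper: one parameter__name pair per parameter, percent-encoded."""
    query = [("parameter__name", p) for p in parms]
    query += [("collect", "name"), ("include", "activity__name")]
    return base + "?" + urlencode(query)


url = build_stoqs_url(["ecopuck_chl (ug/l)", "hs2_fl700"])
print(url)

# Decoding the query string recovers the original parameter names
print(parse_qs(urlsplit(url).query)["parameter__name"])
```

When no `fname` is given, pooch names its cached file after the URL, so a differently encoded URL would most likely be downloaded again rather than read from the existing cache.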

In [1]:
import pandas as pd
import panel.widgets as pnw
import pooch

# Construct the url to retrieve the data from the STOQS database:
parms = [
    "ecopuck_bbp700 (m^-1 sr^-1)",
    "ecopuck_cdom (ppb)",
    "ecopuck_chl (ug/l)",
    "hs2_bb420 (m-1)",
    "hs2_bb700 (m-1)",
    "hs2_fl700",
]
    
stoqs_url = "https://stoqs.shore.mbari.org/stoqs_mb_diamonds/api/measuredparameter.parquet?"
stoqs_url += "parameter__name=" + "&parameter__name=".join(parms)
stoqs_url += "&collect=name&include=activity__name"

# Takes several minutes to retrieve the data the first time; thereafter it's read from a local cache
file_name = pooch.retrieve(
    url=stoqs_url,
    known_hash="be75fa2c75be4e08b4cb72f345cf09b6aa6533713d4668f729d0955a4ad684da",
    progressbar=True,
)
df = pd.read_parquet(file_name)
df.describe()

name,ecopuck_bbp700 (m^-1 sr^-1),ecopuck_cdom (ppb),ecopuck_chl (ug/l),hs2_bb420 (m-1),hs2_bb700 (m-1),hs2_fl700
count,3450397.0,3450397.0,3450397.0,5483381.0,5483229.0,6161929.0
mean,0.0008322027,1.497288,2.652438,0.005167818,1.039454,0.0007445026
std,0.0009314967,1.04613,4.003378,0.009249174,2418.817,0.001257494
min,9.45507e-05,-1.041218,0.02455455,-0.002622544,-0.00523946,-0.09492986
25%,0.0003120663,1.22715,0.31171,0.002466258,0.001838945,0.0001042317
50%,0.0005455853,1.40895,1.05485,0.003690073,0.003388353,0.0002880702
75%,0.0009768606,1.61802,3.16163,0.006027729,0.006294274,0.0008601799
max,0.006669172,87.95484,29.8205,16.94382,5663972.0,0.1075705
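
The summary shows that `hs2_bb700 (m-1)` has a maximum of about 5.7e6 while its 75th percentile is about 0.006, so a few extreme values dominate the mean and standard deviation and would stretch the axes of a plot. One possible way to handle this, sketched here on a small synthetic frame, is to mask values outside a chosen quantile range. The helper name `mask_outliers` and the quantile limits are assumptions for illustration, not something this notebook does.

```python
import pandas as pd


def mask_outliers(frame, column, lower=0.001, upper=0.999):
    """Return a copy of frame with values of column outside the quantile range set to NaN."""
    lo, hi = frame[column].quantile([lower, upper])
    out = frame.copy()
    # Series.where keeps values where the condition holds and inserts NaN elsewhere
    out[column] = out[column].where(out[column].between(lo, hi))
    return out


# Synthetic values that mimic one extreme outlier among typical readings
demo = pd.DataFrame({"hs2_bb700 (m-1)": [0.002, 0.003, 0.004, 0.006, 5663972.0]})
cleaned = mask_outliers(demo, "hs2_bb700 (m-1)", lower=0.0, upper=0.8)
print(cleaned)
```

Masking with NaN rather than dropping rows leaves the other columns of each row available for plots that do not involve the masked column.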


In [None]:
# See that ecopuck data exist at the tail end
df.tail()
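
The column names carry units, for example `hs2_bb420 (m-1)`, so they must be given exactly when selecting data for a plot. A small hypothetical helper, `find_column`, sketches one way to look up the full name from the unit-free parameter name. It assumes the name and its units are separated by a single space, as in the summary above.

```python
def find_column(columns, name):
    """Return the single column equal to name, or starting with name followed by a space."""
    matches = [c for c in columns if c == name or c.startswith(name + " ")]
    if len(matches) != 1:
        raise KeyError(f"Expected exactly one column for {name!r}, found {matches}")
    return matches[0]


# Column names as they appear in the df.describe() output
columns = [
    "ecopuck_bbp700 (m^-1 sr^-1)",
    "ecopuck_cdom (ppb)",
    "ecopuck_chl (ug/l)",
    "hs2_bb420 (m-1)",
    "hs2_bb700 (m-1)",
    "hs2_fl700",
]
print(find_column(columns, "hs2_bb420"))
print(find_column(columns, "hs2_fl700"))
```

With the real data this would be called as `find_column(df.columns, "hs2_bb420")`.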

In [None]:
# Use datashader to make interactive biplots of the data
# Do not commit the outputs of the following cells to the repository - they are too big!
import colorcet
import holoviews as hv
from holoviews.operation.datashader import datashade
hv.extension("bokeh")

In [None]:
# Compare ecopuck and HS2 Chlorophyll
pts_hs2eco = hv.Points(df, ['ecopuck_chl (ug/l)', 'hs2_fl700'])
plots = ( datashade(pts_hs2eco, cmap=colorcet.fire).opts(width=800, height=600) )
plots

In [None]:
# Compare ecopuck bbp700 and HS2 bb420 backscatter (column names must include the units)
pts_hs2eco = hv.Points(df, ['ecopuck_bbp700 (m^-1 sr^-1)', 'hs2_bb420 (m-1)'])
plots = ( datashade(pts_hs2eco, cmap=colorcet.fire).opts(width=800, height=600) )
plots

In [None]:
# Compare ecopuck bbp700 and HS2 bb700 backscatter (column names must include the units)
pts_hs2eco = hv.Points(df, ['ecopuck_bbp700 (m^-1 sr^-1)', 'hs2_bb700 (m-1)'])
plots = ( datashade(pts_hs2eco, cmap=colorcet.fire).opts(width=800, height=600) )
plots

In [None]:
# Ecopuck BB vs. Chl
pts_hs2eco = hv.Points(df, ['ecopuck_bbp700 (m^-1 sr^-1)', 'ecopuck_chl (ug/l)'])
plots = ( datashade(pts_hs2eco, cmap=colorcet.fire).opts(width=800, height=600) )
plots

In [None]:
# HS2 BB vs. Chl (column names must include the units where present)
pts_hs2eco = hv.Points(df, ['hs2_bb420 (m-1)', 'hs2_fl700'])
plots = ( datashade(pts_hs2eco, cmap=colorcet.fire).opts(width=800, height=600) )
plots

In [None]:
# Compute proxy_cal_factor for use in 5.2-mpm-bg_biolume-PiO-paper.ipynb and src/data/resample.py
proxy_cal_factor = df['hs2_fl700'].quantile(0.99)
print(f"proxy_cal_factor = {proxy_cal_factor:.6f}")
# proxy_cal_factor = 0.006493
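
Using the 0.99 quantile rather than the maximum keeps the factor insensitive to a handful of extreme values: in the summary above the maximum of `hs2_fl700` is about 0.108, far above the recorded factor of 0.006493. The following sketch illustrates that behaviour on synthetic values (not STOQS data). It relies on pandas skipping NaN and interpolating linearly between ranked values by default.

```python
import pandas as pd

# 100 well-behaved synthetic values, one extreme outlier, and one missing value
values = [i / 1000 for i in range(100)] + [50.0, None]
series = pd.Series(values, dtype="float64")

# NaN is skipped, so the quantile is taken over the 101 valid values
q99 = series.quantile(0.99)
print(f"0.99 quantile = {q99:.3f}, max = {series.max():.1f}")
```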