# PAR

### Purpose
The purpose of this notebook is to calculate the gross range user min/max values and the seasonally varying climatology range tables used to populate the QARTOD parameter tables for the OOI Coastal and Global Scale Nodes (CGSN) Photosynthetically Active Radiation (PAR) sensor data streams. The instrument is the Biospherical Instruments QSP-2200, which is deployed only on the Coastal wire-following profilers (WFPs).

Because identifiably good data are sparse, and because of the nature of PAR itself, we elected to generate the gross range and climatology values from the "NOAA MSL12 Ocean Color - Science Quality - VIIRS SNPP" Kd(PAR) product, coupled with a vertical model in which PAR decreases exponentially with depth. This model provides the expected values and ranges.
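For reference, the depth model implemented in the cells below amounts to exponential (Beer-Lambert) attenuation of a clear-sky surface value, with the constants taken directly from the notebook's code:

$$E_d(z, t) = E_d(0, t)\, \exp\left(-K_d(t)\, z\right)$$

$$E_d(0, t) = \frac{0.5\, E_{sw}(t)}{0.21739130434} \approx 2.3\, E_{sw}(t)$$

where $E_{sw}(t)$ is the clear-sky direct shortwave irradiance (W m$^{-2}$) estimated with `pysolar`, $K_d(t)$ is the monthly VIIRS Kd(PAR) averaged around the site (m$^{-1}$), $z$ is depth (m), and $E_d$ is PAR in $\mu$mol photons m$^{-2}$ s$^{-1}$.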

### Test Parameters

| Dataset Name | OOINet Name | Fail Range |
| ------------ | ----------- | ----- |
| parad_k_par  | parad_k_par | 0 - 5000 $\mu$mol photons m$^{-2}$ s$^{-1}$ |

#### Import libraries

In [None]:
import os, sys, datetime, pytz, re
import dateutil.parser as parser
import pandas as pd
import numpy as np
import xarray as xr
import warnings
import gc
import json
warnings.filterwarnings("ignore")

In [None]:
from dask.diagnostics import ProgressBar

#### Import the ```ooinet``` M2M toolbox

In [None]:
sys.path.append("/home/areed/Documents/OOI/reedan88/ooinet/")
from ooinet import M2M

#### Install the ```pysolar``` package if not already installed

In [None]:
from ftplib import FTP
!pip install pysolar
from pysolar.solar import get_altitude
from pysolar.radiation import get_radiation_direct

#### Import ```ooi_data_explorations``` toolbox

In [None]:
sys.path.append("/home/areed/Documents/OOI/oceanobservatories/ooi-data-explorations/python/")
from ooi_data_explorations.qartod.qc_processing import process_gross_range, process_climatology, woa_standard_bins, \
    inputs, ANNO_HEADER, CLM_HEADER, GR_HEADER

---
## Identify Data Streams
This section identifies all of the data streams associated with a specific instrument. This is done by querying UFrame and iteratively walking through all of the API endpoints. The results are saved to a csv file so this step doesn't have to be repeated each time.

First, set the instrument to search for using OOI terminology:

In [None]:
instrument = "PAR"

### Query OOINet for Data Streams
First, check whether the datasets have already been downloaded; if not, use the ```M2M.search_datasets``` tool to search the OOINet API and return a table of all of the available datasets for the given instrument.

In [None]:
try:
    datasets = pd.read_csv(f"../data/{instrument}_datasets.csv")
except FileNotFoundError:
    datasets = M2M.search_datasets(instrument=instrument, English_names=True)
    # Save the datasets so the search does not have to be repeated
    datasets.to_csv(f"../data/{instrument}_datasets.csv", index=False)

Separate out the CGSN datasets from the EA and RCA datasets:

In [None]:
cgsn = datasets["array"].apply(lambda x: x.startswith(("CP", "GA", "GI", "GP", "GS")))
datasets = datasets[cgsn]

Remove the ```PARAD``` instruments mounted on gliders and AUVs ("MOAS") as well as on surface-piercing profilers (CSPPs):

In [None]:
mask = datasets["refdes"].apply(lambda x: "MOAS" not in x and "SP001" not in x)
datasets = datasets[mask]
datasets
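The two `apply` filters above can also be written with pandas' vectorized string methods. The sketch below runs them on a small hypothetical table (only `CP02PMUO-WFP01-05-PARADK000` comes from this notebook; the other reference designators are made up for illustration) and assumes the `array` column holds two-letter array codes.

```python
import pandas as pd

# Hypothetical rows standing in for the M2M.search_datasets() output
datasets = pd.DataFrame({
    "array": ["CP", "CE", "RS", "CP", "CP", "CP"],
    "refdes": [
        "CP02PMUO-WFP01-05-PARADK000",
        "CE09OSPM-WFP01-05-PARADK000",   # hypothetical
        "RS01SBPS-SF01A-3C-PARADA101",   # hypothetical
        "CP05MOAS-GL336-05-PARADM000",   # hypothetical glider
        "CP01CNSP-SP001-10-PARADJ000",   # hypothetical CSPP
        "CP04OSPM-WFP01-05-PARADK000",   # hypothetical
    ],
})

cgsn_prefixes = ["CP", "GA", "GI", "GP", "GS"]

# Keep CGSN arrays, then drop gliders/AUVs ("MOAS") and CSPPs ("SP001") by substring
is_cgsn = datasets["array"].str[:2].isin(cgsn_prefixes)
is_excluded = datasets["refdes"].str.contains("MOAS|SP001", regex=True)
filtered = datasets[is_cgsn & ~is_excluded]
print(filtered["refdes"].tolist())
```

Matching the first two characters with `isin` is equivalent to `startswith` here only because every prefix is exactly two letters long.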

---
## Single Reference Designator
The reference designator acts as a key for an instrument located at a specific location. First, select a reference designator (refdes) to request data from OOINet.

In [None]:
reference_designators = sorted(datasets["refdes"].unique())
print("Number of reference designators: " + str(len(reference_designators)))
for refdes in reference_designators:
    print(refdes)

#### Select a reference designator

In [None]:
#k=6
#refdes = reference_designators[k]
refdes = "CP02PMUO-WFP01-05-PARADK000"
print(refdes)

#### Sensor Vocab
The vocab provides information about the instrument model and type, its location (with descriptive names), depth, and manufacturer. Get the vocab for the given reference designator.

In [None]:
vocab = M2M.get_vocab(refdes)
vocab

#### Sensor Deployments
Download the deployment information for the selected reference designator:

In [None]:
deployments = M2M.get_deployments(refdes)
deployments

#### Sensor Data Streams
Next, get the data streams available for the given reference designator:

In [None]:
datastreams = M2M.get_datastreams(refdes)
datastreams

---
## Metadata 
The metadata contains the following important key pieces of data for each reference designator: **method**, **stream**, **particleKey**, and **count**. The method and stream are necessary for identifying and loading the relevant dataset. The particleKey tells us which data variables in the dataset we should be calculating the QARTOD parameters for. The count lets us know which dataset (the recovered instrument, recovered host, or telemetered) contains the most data and likely has the best record to use to calculate the QARTOD tables. 

In [None]:
metadata = M2M.get_metadata(refdes)
metadata
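The **count** column can be used to pick the method/stream combination with the most data. A minimal sketch on hypothetical metadata rows (the method, stream, and count values are made up; the real table comes from `M2M.get_metadata(refdes)`):

```python
import pandas as pd

# Hypothetical metadata rows with the same key columns as the M2M metadata table
metadata = pd.DataFrame({
    "method": ["telemetered", "telemetered", "recovered_wfp", "recovered_wfp"],
    "stream": ["stream_a", "stream_a", "stream_b", "stream_b"],
    "particleKey": ["time", "parad_k_par", "time", "parad_k_par"],
    "count": [1200, 1200, 4800, 4800],
})

# Largest particle count for each method/stream combination
counts = metadata.groupby(["method", "stream"])["count"].max()
best_method, best_stream = counts.idxmax()
print(best_method, best_stream, counts.max())
```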

#### Sensor Parameters
Each instrument returns multiple parameters containing a variety of low-level instrument output and metadata. However, we are interested in science-relevant parameters for calculating the relevant QARTOD test limits. We can identify the science parameters based on the preload database, which designates the science parameters with a "data level" of L1 or L2. 

Consequently, we go through several steps to identify the relevant parameters. First, we query the preload database with the metadata for the reference designator. Then, we filter the metadata for the science-relevant parameters.
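The core of the data-level filter is a dictionary lookup from parameter ID (`pdId`) to data level, keeping only levels above 0. A sketch on hypothetical IDs and levels (in the notebook these come from `M2M.get_metadata()` and `M2M.get_parameter_data_levels()`):

```python
import pandas as pd

# Hypothetical parameter IDs, particle keys, and data levels
metadata = pd.DataFrame({
    "pdId": ["PD_a", "PD_b", "PD_c", "PD_d"],
    "particleKey": ["time", "parad_k_par", "raw_counts", "not_in_preload"],
})
data_levels = {"PD_a": 0, "PD_b": 1, "PD_c": 0}

# Keep rows whose data level is known and above 0 (L1 or L2); unknown pdIds are dropped
mask = metadata["pdId"].map(lambda pid: data_levels.get(pid) is not None and data_levels[pid] > 0)
science = metadata[mask]
print(science["particleKey"].tolist())
```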

In [None]:
def filter_science_parameters(metadata):
    """Return only the metadata rows for science parameters (preload data level L1 or L2)."""

    def filter_parameter_ids(pdId, pid_dict):
        # Keep a parameter only if the preload database assigns it a data level above 0
        data_level = pid_dict.get(pdId)
        return data_level is not None and data_level > 0

    # Filter the parameters for processed science parameters
    data_levels = M2M.get_parameter_data_levels(metadata)
    mask = metadata["pdId"].apply(lambda x: filter_parameter_ids(x, data_levels))
    metadata = metadata[mask]

    return metadata

def filter_metadata(metadata):
    """Return the science parameters, minus any temperature parameters, grouped by refdes, method, and stream."""
    science_vars = filter_science_parameters(metadata)
    # Next, eliminate any temperature parameters from the stream
    mask = science_vars["particleKey"].apply(lambda x: "temp" not in x)
    science_vars = science_vars[mask]
    science_vars = science_vars.groupby(by=["refdes","method","stream"]).agg(lambda x: pd.unique(x.values.ravel()).tolist())
    science_vars = science_vars.reset_index()
    science_vars = science_vars.applymap(lambda x: x[0] if len(x) == 1 else x)
    science_vars = science_vars.explode(column="particleKey")
    return science_vars

In [None]:
science_vars = filter_science_parameters(metadata)
science_vars = science_vars.groupby(by=["refdes","method","stream"]).agg(lambda x: pd.unique(x.values.ravel()).tolist())
science_vars = science_vars.reset_index()
science_vars = science_vars.applymap(lambda x: x[0] if len(x) == 1 else x)
science_vars

---
## Generate the expected irradiance

The Kd(PAR) data come from the "NOAA MSL12 Ocean Color - Science Quality - VIIRS SNPP" satellite data product. We download the Level-3 (L3) monthly KdPAR files from the FTP server because the ERDDAP and THREDDS servers proved too unstable to rely on. See the data products web page for more information: https://coastwatch.noaa.gov/cw/satellite-data-products/ocean-color/science-quality/viirs-snpp.html. 

First, check whether the data have already been downloaded; if not, download them.

In [None]:
from ooinet import Download
from queue import Queue

Download the Coastwatch data via FTP in three steps:
1. Connect to the FTP server
2. Set up the directory to download the CoastWatch data to
3. Download the data
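Separating the "what still needs downloading" check from the FTP transfer makes it testable without a network connection. `files_to_download` is a hypothetical helper, not part of `ooinet`; the list it returns would then be fed to the `ftp.retrbinary` loop. The file names in the usage example are made up.

```python
import os
import tempfile


def files_to_download(remote_files, save_dir):
    """Return the names in remote_files that are missing or zero-length in save_dir."""
    pending = []
    for name in remote_files:
        path = os.path.join(save_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            pending.append(name)
    return pending


# Usage with hypothetical file names in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a_kdpar.nc"), "wb") as f:
        f.write(b"complete")                              # a finished download
    open(os.path.join(tmp, "b_kdpar.nc"), "wb").close()   # an interrupted (empty) download
    pending = files_to_download(["a_kdpar.nc", "b_kdpar.nc", "c_kdpar.nc"], tmp)

print(pending)  # ['b_kdpar.nc', 'c_kdpar.nc']
```

Treating zero-length files as missing mirrors the notebook's own check, so an interrupted transfer is retried on the next run.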

In [None]:
# Step 1: connect to the FTP server and list the monthly KdPAR files
ftp_server = 'ftp.star.nesdis.noaa.gov'
ftp_server_path = '/pub/socd1/mecb/coastwatch/viirs/science/L3/global/kd/monthly/WW00/'
ftp = FTP(ftp_server)
ftp.login(user='anonymous')
ftp.cwd(ftp_server_path)
files = ftp.nlst('*kdpar.nc')

# Step 2: set up the local download directory
saveDir = os.path.abspath("../data/coastwatch/raw/")
Download.setup_download_dir(saveDir)

# Step 3: download any files that are missing or empty
for file in files:
    download_path = os.path.join(saveDir, file)
    if not os.path.isfile(download_path) or os.path.getsize(download_path) == 0:
        with open(download_path, 'wb') as f:
            ftp.retrbinary(f'RETR {file}', f.write)

Close the FTP connection

In [None]:
ftp.close()

---
## Process the CoastWatch data

In [None]:
# Get the latitude and longitude of the deployments
latitude = deployments["latitude"].mean()
longitude = deployments["longitude"].mean()

# Get the min depth and max depth of the instrument as scalars
min_depth = float(vocab["mindepth"].iloc[0])
max_depth = float(vocab["maxdepth"].iloc[0])

Load the CoastWatch data

In [None]:
with ProgressBar():
    kd = xr.open_mfdataset("../data/coastwatch/raw/*.nc", combine='nested', concat_dim='time', engine='netcdf4', parallel=True)

Clean up the dataset and limit the geographic extent to a ±0.09375° box around the site in question

In [None]:
kd = kd.drop_vars(['coord_ref', 'palette'])
kd = kd.where((kd['lat'] >= latitude - 0.09375) & (kd['lat'] <= latitude + 0.09375), drop=True)
kd = kd.where((kd['lon'] >= longitude - 0.09375) & (kd['lon'] <= longitude + 0.09375), drop=True)
kd

Average the CoastWatch Kd(PAR) data over the altitude, latitude, and longitude dimensions to get a single time series for the site

In [None]:
kd = kd.mean(dim=['altitude', 'lat', 'lon'], keep_attrs=True)
kd = kd.sortby('time')
kd = kd.compute()
kd
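The subset-then-average steps above can be checked on a small synthetic grid. This sketch uses made-up Kd(PAR) values that are constant in space for each month, a hypothetical site location, the notebook's ±0.09375° half-width, and omits the `altitude` dimension of the real files, so the box average should simply return each month's value.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical synthetic grid standing in for the CoastWatch monthly KdPAR files
lat = np.round(np.arange(39.0, 41.0, 0.0375), 4)
lon = np.round(np.arange(-72.0, -70.0, 0.0375), 4)
time = pd.date_range("2020-01-01", periods=3, freq="MS")
values = np.ones((time.size, lat.size, lon.size)) * np.array([0.10, 0.20, 0.30])[:, None, None]
kd = xr.Dataset({"kd_par": (["time", "lat", "lon"], values)},
                coords={"time": time, "lat": lat, "lon": lon})

site_lat, site_lon, half_width = 40.0, -70.9, 0.09375  # hypothetical site; half-width from the notebook

# Boolean box on the lat/lon coordinates; drop=True removes grid cells outside it
in_box = ((kd["lat"] >= site_lat - half_width) & (kd["lat"] <= site_lat + half_width)
          & (kd["lon"] >= site_lon - half_width) & (kd["lon"] <= site_lon + half_width))
box = kd.where(in_box, drop=True)
kd_site = box.mean(dim=["lat", "lon"])
print(kd_site["kd_par"].values)
```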

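Calculate the clear-sky irradiance at local solar noon for this site for each time step in the Kd(PAR) record

In [None]:
date = pd.to_datetime(kd.time.values).to_pydatetime()
surface = []
for d in date:
    # Approximate local solar noon in UTC from the site longitude (15 degrees per hour),
    # ignoring the equation of time (a correction of at most about 16 minutes)
    noon = datetime.datetime(d.year, d.month, d.day, 12, tzinfo=datetime.timezone.utc) \
        - datetime.timedelta(hours=longitude / 15)
    altitude = get_altitude(latitude, longitude, noon)
    # PAR is approximately 50% of the shortwave radiation; dividing by 0.21739130434 (= 1/4.6)
    # converts W/m^2 to umol/m^2/s
    surface.append(get_radiation_direct(noon, altitude) * 0.5 / 0.21739130434)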
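The surface-irradiance cell multiplies the `pysolar` shortwave estimate by 0.5 and divides by 0.21739130434, which is 1/4.6. The net factor is therefore about 2.3, and 4.6 µmol photons J⁻¹ is a commonly used approximate conversion for daylight PAR. The constant names below are illustrative, not from the notebook.

```python
# Constants as used in the notebook's surface-irradiance cell
PAR_FRACTION = 0.5                    # PAR taken as ~50% of shortwave irradiance
UMOL_PER_JOULE = 1 / 0.21739130434    # 0.21739130434 is 1/4.6, so this is ~4.6


def shortwave_to_par(sw_w_m2):
    """Convert shortwave irradiance (W m^-2) to PAR (umol photons m^-2 s^-1)."""
    return sw_w_m2 * PAR_FRACTION * UMOL_PER_JOULE


print(round(shortwave_to_par(1000.0), 3))  # 2300.0
```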

Create a 2D array with Ed(PAR) estimated as a function of depth from the satellite Kd(PAR) values and model estimates of clear-sky irradiance

In [None]:
depths = np.arange(min_depth, max_depth + 0.125, 0.125)
ed = np.zeros([len(date), len(depths)])
for i in range(len(date)):
    ed[i, :] = surface[i] * np.exp(-kd.kd_par.values[i] * depths)
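The row-by-row loop can be replaced by NumPy broadcasting. Shaping the per-time inputs as `(time, 1)` and the depth grid as `(1, depth)` produces the full `(time, depth)` array in one expression. The inputs below are hypothetical values chosen only to compare the two forms.

```python
import numpy as np

# Hypothetical inputs: surface PAR and Kd(PAR) for three months
surface = np.array([1500.0, 1800.0, 1200.0])    # umol photons m^-2 s^-1
kd_par = np.array([0.08, 0.12, 0.10])           # m^-1
depths = np.arange(30.0, 35.0 + 0.125, 0.125)   # m, 0.125 m spacing as in the notebook

# Broadcasting (time, 1) against (1, depth) gives a (time, depth) array
ed = surface[:, None] * np.exp(-kd_par[:, None] * depths[None, :])

# Same result with the explicit loop
ed_loop = np.zeros([len(surface), len(depths)])
for i in range(len(surface)):
    ed_loop[i, :] = surface[i] * np.exp(-kd_par[i] * depths)

print(ed.shape, np.allclose(ed, ed_loop))
```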

Convert to an xarray dataset

In [None]:
# convert to an xarray dataset
ed = xr.Dataset({
    'parad_k_par': (['time', 'depth'], ed),
}, coords={'time': kd.time.values, 'depth': depths})




---
## Gross Range
The Gross Range QARTOD test consists of two parameters: a fail range, which indicates when the data are bad, and a suspect (user) range, which indicates when the data are either questionable or interesting. The fail range values are set based upon the instrument/measurement and associated calibration. The user range for ```PAR``` is based on the modeled near-surface irradiance values.
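To make the two ranges concrete, here is a minimal sketch of how a gross range test applies them. It assumes the standard QARTOD flag values (1 pass, 3 suspect, 4 fail, 9 missing). The flagging itself is done by the QARTOD implementation that consumes these tables, not by this notebook, and the suspect span used here is hypothetical.

```python
import numpy as np


def gross_range_flags(values, fail_span, suspect_span):
    """Flag values against fail and suspect spans using QARTOD flag codes."""
    values = np.asarray(values, dtype=float)
    flags = np.ones(values.shape, dtype=np.uint8)                            # 1 = pass
    flags[(values < suspect_span[0]) | (values > suspect_span[1])] = 3       # 3 = suspect
    flags[(values < fail_span[0]) | (values > fail_span[1])] = 4             # 4 = fail
    flags[np.isnan(values)] = 9                                              # 9 = missing
    return flags


# Fail span from the table above; the suspect span is a made-up example
flags = gross_range_flags([-5.0, 100.0, 3000.0, 6000.0, np.nan],
                          fail_span=[0, 5000], suspect_span=[0, 2500])
print(flags.tolist())  # [4, 1, 3, 4, 9]
```

The fail check runs after the suspect check so that a value outside both spans ends up flagged as fail.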

Set the parameters and the gross range fail range

In [None]:
parameters = ['parad_k_par']
limits = [0, 5000]


Create the initial gross range entry

In [None]:
quantile = ed['depth'].quantile(0.01).values     # shallowest 1% of the model depth grid
sub = ed.where(ed.depth <= quantile, drop=True)  # limit gross range estimation to near-surface values
sub = sub.max(dim=['depth'], keep_attrs=True)
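To see what the near-surface selection does, here is a sketch on a hypothetical `ed`-like dataset with made-up surface values and Kd(PAR). `quantile(0.01)` of the depth coordinate picks roughly the shallowest 1% of the model grid. Because modeled PAR decreases monotonically with depth, the maximum over that layer equals the value at the shallowest depth.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical model output with the same layout as `ed`
time = pd.date_range("2020-01-01", periods=2, freq="MS")
depths = np.arange(30.0, 130.0 + 0.125, 0.125)   # 801 depth levels
surface = np.array([1500.0, 1800.0])
kd_par = np.array([0.05, 0.08])
ed = xr.Dataset(
    {"parad_k_par": (["time", "depth"], surface[:, None] * np.exp(-kd_par[:, None] * depths))},
    coords={"time": time, "depth": depths},
)

# Shallowest 1% of the depth grid, then the maximum PAR within that layer
quantile = ed["depth"].quantile(0.01).values
sub = ed.where(ed.depth <= quantile, drop=True).max(dim="depth")
print(round(float(quantile), 6))
```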

**Generate the gross range lookup table**

In [None]:
# Generate the gross range lookup table
site, node, sensor = refdes.split("-", 2)

gross_range_table = pd.DataFrame()
for index in datastreams.index:
    method = datastreams["method"].loc[index]
    stream = datastreams["stream"].loc[index]
    gr_lookup = process_gross_range(sub, parameters, limits, site=site, node=node, sensor=sensor, stream=stream)

    # add the stream name and the source comment
    gr_lookup['notes'] = ('User range modeled from data collected by the NOAA VIIRS satellite and estimates of '
                          'clear sky irradiance from the pysolar package.')
    gross_range_table = pd.concat([gross_range_table, gr_lookup], ignore_index=True)

**Check the results**

In [None]:
gross_range_table

In [None]:
for ind in gross_range_table.index:
    print(gross_range_table.loc[ind]["qcConfig"])

**Save the gross range table**

In [None]:
gross_range_table.to_csv(f"../results/gross_range/{refdes}.csv", index=False, columns=GR_HEADER)

---
## Climatology
For the climatology QARTOD test, we first bin the data by month and take the mean. The binned monthly means are then fit with a two-cycle harmonic via ordinary least squares (OLS) regression, and the ranges are set at ±3$\sigma$, where $\sigma$ is derived from the OLS fit. For PAR, the data we fit come from the CoastWatch-based depth model generated earlier in the notebook.
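A minimal sketch of the harmonic fit follows. It assumes that "two-cycle" means annual plus semi-annual terms and that $\sigma$ is the residual standard deviation. `fit_two_harmonic` is a hypothetical helper, and the `Climatology` class in `ooi_data_explorations` may compute `monthly_fit` and `monthly_std` differently. The input is synthetic: three years of monthly means with a known seasonal cycle plus noise.

```python
import numpy as np


def fit_two_harmonic(month, y):
    """OLS fit of an annual plus semi-annual harmonic to monthly mean values.

    Returns the coefficients, the fitted values, and the residual standard deviation.
    """
    w = 2 * np.pi * np.asarray(month, dtype=float) / 12
    # Design matrix: intercept, annual cos/sin, semi-annual cos/sin
    X = np.column_stack([np.ones_like(w), np.cos(w), np.sin(w), np.cos(2 * w), np.sin(2 * w)])
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    sigma = np.std(y - fitted, ddof=X.shape[1])
    return beta, fitted, sigma


# Hypothetical monthly means: known seasonal cycle plus Gaussian noise
rng = np.random.default_rng(0)
month = np.arange(1, 37)
truth = 1000 + 300 * np.cos(2 * np.pi * month / 12) + 50 * np.sin(4 * np.pi * month / 12)
beta, fitted, sigma = fit_two_harmonic(month, truth + rng.normal(0, 10, month.size))

# Monthly climatology bounds: fitted value +/- 3 sigma for months 1..12
lower, upper = fitted[:12] - 3 * sigma, fitted[:12] + 3 * sigma
print(np.round(beta, 1), round(float(sigma), 1))
```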

In [None]:
from ooi_data_explorations.qartod.climatology import Climatology

In [None]:
def make_climatology_table(ds, param, tinp, zinp, sensor_range, depth_bins):
    """Calculate the climatology table for a parameter, either for the full record (depth_bins=None) or for each depth bin."""
    
    climatologyTable = pd.DataFrame()
    
    if depth_bins is None:
        # Filter out the data outside the sensor range
        m = (ds[param] > sensor_range[0]) & (ds[param] < sensor_range[1]) & (~np.isnan(ds[param]))
        param_data = ds[param][m]
        
        # Fit the climatology for the selected data
        pmin, pmax = [0, 0]
        
        try:
            climatology = Climatology()
            climatology.fit(param_data)

            # Create the depth index
            zspan = pd.interval_range(start=pmin, end=pmax, periods=1, closed="both")

            # Create the monthly bins
            tspan = pd.interval_range(0, 12, closed="both")

            # Calculate the climatology data
            vmin = climatology.monthly_fit - climatology.monthly_std*3
            vmin = np.floor(vmin*100000)/100000
            for vind in vmin.index:
                if vmin[vind] < sensor_range[0] or vmin[vind] > sensor_range[1]:
                    vmin[vind] = sensor_range[0]
            vmax = climatology.monthly_fit + climatology.monthly_std*3
            for vind in vmax.index:
                if vmax[vind] < sensor_range[0] or vmax[vind] > sensor_range[1]:
                    vmax[vind] = sensor_range[1]
            vmax = np.floor(vmax*100000)/100000
            vdata = pd.Series(data=zip(vmin, vmax), index=vmin.index).apply(lambda x: [v for v in x])
            vspan = vdata.values.reshape(1,-1)

            # Build the climatology dataframe
            climatologyTable = pd.concat([climatologyTable, pd.DataFrame(data=vspan, columns=tspan, index=zspan)])

        except Exception:
            # Here is where to create nans if insufficient data to fit
            # Create the depth index
            zspan = pd.interval_range(start=pmin, end=pmax, periods=1, closed="both")

            # Create the monthly bins
            tspan = pd.interval_range(0, 12, closed="both")

            # Create a series filled with nans
            vals = []
            for i in np.arange(len(tspan)):
                vals.append([np.nan, np.nan])
            vspan = pd.Series(data=vals, index=tspan).values.reshape(1,-1)

            # Add to the data
            climatologyTable = pd.concat([climatologyTable, pd.DataFrame(data=vspan, columns=tspan, index=zspan)])
            
        del ds, vspan, tspan, zspan
        gc.collect()
        
    else:        
    # Iterate through the depth bins to calculate the climatology for each depth bin
        for dbin in depth_bins:
            # Get the pressure range to bin from
            pmin, pmax = dbin[0], dbin[1]

            # Select the data from the pressure range
            bin_data = ds.where((ds[zinp] >= pmin) & (ds[zinp] <= pmax), drop=True)

            # sort based on time and make sure we have a monotonic dataset
            bin_data = bin_data.sortby('time')
            _, index = np.unique(bin_data['time'], return_index=True)
            bin_data = bin_data.isel(time=index)

            # Filter out the data outside the sensor range
            m = (bin_data[param] > sensor_range[0]) & (bin_data[param] < sensor_range[1]) & (~np.isnan(bin_data[param]))
            bin_data = bin_data.where(m, drop=True)
            param_data = bin_data[param]

            # Fit the climatology for the selected data
            try:
                climatology = Climatology()
                climatology.fit(param_data)

                # Create the depth index
                zspan = pd.interval_range(start=pmin, end=pmax, periods=1, closed="both")

                # Create the monthly bins
                tspan = pd.interval_range(0, 12, closed="both")

                # Calculate the climatology data
                vmin = climatology.monthly_fit - climatology.monthly_std*3
                vmin = np.floor(vmin*100000)/100000
                for vind in vmin.index:
                    if vmin[vind] < sensor_range[0] or vmin[vind] > sensor_range[1]:
                        vmin[vind] = sensor_range[0]
                vmax = climatology.monthly_fit + climatology.monthly_std*3
                vmax = np.floor(vmax*100000)/100000
                for vind in vmax.index:
                    if vmax[vind] < sensor_range[0] or vmax[vind] > sensor_range[1]:
                        vmax[vind] = sensor_range[1]
                vdata = pd.Series(data=zip(vmin, vmax), index=vmin.index).apply(lambda x: [v for v in x])
                vspan = vdata.values.reshape(1,-1)

                # Build the climatology dataframe
                climatologyTable = pd.concat([climatologyTable, pd.DataFrame(data=vspan, columns=tspan, index=zspan)])

            except Exception:
                # Here is where to create nans if insufficient data to fit
                # Create the depth index
                zspan = pd.interval_range(start=pmin, end=pmax, periods=1, closed="both")

                # Create the monthly bins
                tspan = pd.interval_range(0, 12, closed="both")

                # Create a series filled with nans
                vals = []
                for i in np.arange(len(tspan)):
                    vals.append([np.nan, np.nan])
                vspan = pd.Series(data=vals, index=tspan).values.reshape(1,-1)

                # Add to the data
                climatologyTable = pd.concat([climatologyTable, pd.DataFrame(data=vspan, columns=tspan, index=zspan)])

            del bin_data, vspan, tspan, zspan
            gc.collect()
    
    return climatologyTable#, climatology
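The per-month loops that clip `vmin` and `vmax` to the sensor range can be vectorized with NumPy. `climatology_bounds` is a hypothetical helper that reproduces the depth-bin branch: floor both bounds to five decimals, then replace any bound outside the sensor range with the matching sensor limit. The fit and standard deviation values in the usage line are made up.

```python
import numpy as np


def climatology_bounds(monthly_fit, monthly_std, sensor_range, n_std=3, decimals=5):
    """Return [[vmin, vmax], ...] per month, mirroring the loops in make_climatology_table."""
    lo, hi = sensor_range
    scale = 10 ** decimals
    fit = np.asarray(monthly_fit, dtype=float)
    std = np.asarray(monthly_std, dtype=float)
    # Floor both bounds to the requested number of decimals
    vmin = np.floor((fit - n_std * std) * scale) / scale
    vmax = np.floor((fit + n_std * std) * scale) / scale
    # Any bound outside the sensor range is replaced by the matching sensor limit
    vmin = np.where((vmin < lo) | (vmin > hi), lo, vmin)
    vmax = np.where((vmax < lo) | (vmax > hi), hi, vmax)
    return [[float(a), float(b)] for a, b in zip(vmin, vmax)]


bounds = climatology_bounds([100.0, 2000.0, 4900.0], [50.0, 100.0, 50.0], [0, 5000])
print(bounds)  # [[0.0, 250.0], [1700.0, 2300.0], [4750.0, 5000.0]]
```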

**Get the depth bins and filter based on max depth**

In [None]:
depth_bins = woa_standard_bins()
pmax = ed["depth"].max().values
pmin = ed["depth"].min().values
mask = (depth_bins[:, 0] < pmax) | ((depth_bins[:, 0] < pmax) & (depth_bins[:, 1] > pmax)) | (depth_bins[:, 1] < pmin)
depth_bins = depth_bins[mask]
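The bin mask ORs together three conditions. However, both "straddles `pmax`" and "ends shallower than `pmin`" already imply "starts shallower than `pmax`", so the mask reduces to that first test. A quick check on hypothetical bin edges (the notebook's bins come from `woa_standard_bins()`, assumed here to be an `(n, 2)` array of top/bottom depths):

```python
import numpy as np

# Hypothetical bin edges (m) and modeled depth range
depth_bins = np.array([[0, 10], [10, 20], [20, 30], [30, 50], [50, 75], [75, 100]])
pmin, pmax = 25.0, 60.0

starts_above_pmax = depth_bins[:, 0] < pmax
straddles_pmax = (depth_bins[:, 0] < pmax) & (depth_bins[:, 1] > pmax)
shallower_than_pmin = depth_bins[:, 1] < pmin

# The last two conditions each imply the first, so OR-ing them in changes nothing
three_term = starts_above_pmax | straddles_pmax | shallower_than_pmin
print(np.array_equal(three_term, starts_above_pmax), depth_bins[starts_above_pmax].tolist())
```

Bins lying entirely shallower than `pmin` (here `[0, 10]` and `[10, 20]`) are kept. With no model data in them, they would likely end up as NaN rows in the climatology table. If that is not wanted, an overlap test `(start < pmax) & (end > pmin)` would drop them.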

In [None]:
# Initialize the climatology lookup table
climatologyLookup = pd.DataFrame()

# Setup the Table Header
TBL_HEADER = [f"[{m},{m}]" for m in range(1, 13)]

# Set the subsite-node-sensor
subsite, node, sensor = refdes.split("-", 2)

# Set the parameters
param = "parad_k_par"

# ----------------- Depth tables ---------------------
# Get the sensor range of the parameter to test
print(f"##### Calculating climatology for {param} #####")
sensor_range = [0, 5000]
        
# Generate the climatology table with the depth bins
climatologyTable = make_climatology_table(ed, param, "time", "depth", sensor_range, depth_bins)

# Create the tableName
tableName = f"{refdes}-{param}.csv"

# Save the results
climatologyTable.to_csv(f"../results/climatology/climatology_tables/{tableName}", header=TBL_HEADER)
        
# ------------------ Lookup tables ------------------
# Check which streams have the param in it
streams = np.unique(datastreams["stream"])
for stream in streams:
    qc_dict = {
        "subsite": subsite,
        "node": node,
        "sensor": sensor,
        "stream": stream,
        "parameters": {
            "inp": param,
            "tinp": "time",
            "zinp": "depth",
        },
        "climatologyTable": f"climatology_tables/{refdes}-{param}.csv",
        "source": "Climatology values are calculated from and applicable to standard depth bins.",
        "notes": "User range modeled from data collected by the NOAA VIIRS satellite and estimates of \
                  clear sky irradiance from the pysolar package."
    }
    # Append to the lookup table
    climatologyLookup = climatologyLookup.append(qc_dict, ignore_index=True)

**Check the last climatologyTable for reasonableness**

In [None]:
climatologyTable

**Check that all of the entries made it into the climatologyLookup table**

In [None]:
climatologyLookup

In [None]:
for p in climatologyLookup["parameters"]:
    print(p)

In [None]:
for t in climatologyLookup["climatologyTable"]:
    print(t)

**Save the climatologyLookup table**

In [None]:
climatologyLookup.to_csv(f"../results/climatology/{refdes}.csv", index=False, columns=CLM_HEADER)