# Climatology 

### Purpose
This notebook assesses how well the QARTOD climatology test, as implemented by OOI, captures and flags the passage of a mid-water-column eddy across the OOI Global Argentine Basin array in July-August 2016.

In [None]:
import sys, os
import yaml
import json
import pandas as pd
import numpy as np
import xarray as xr
import datetime
import gc
import warnings
warnings.filterwarnings("ignore")

In [None]:
# Import the QARTOD test fitting
sys.path.append("/home/andrew/Documents/OOI-CGSN/oceanobservatories/ooi-data-explorations/python/ooi_data_explorations/")
from qartod import gross_range, climatology

In [None]:
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

#### Connect to OOI Data Explorer ERDDAP Server

In [None]:
from erddapy import ERDDAP
# Connect to the server
erd = ERDDAP(
    server="https://erddap.dataexplorer.oceanobservatories.org/erddap",
    protocol="tabledap"
)

#### Set OOINet API access
To access and download data from OOINet, you need an OOINet API username and access token. Both can be found on your profile after logging in to OOINet. For security, your username and access token should NOT be stored in this notebook or Python script. Store them instead in a yaml file named user_info.yaml, kept in the same directory.

In [None]:
# Import the OOINet M2M tool
sys.path.append("/home/andrew/Documents/OOI-CGSN/ooinet/ooinet/")
from m2m import M2M

In [None]:
# Read the credentials; the context manager closes the file when done
with open("../../../../QAQC_Sandbox/user_info.yaml") as f:
    userinfo = yaml.safe_load(f)
username = userinfo["apiname"]
token = userinfo["apikey"]

In [None]:
OOINet = M2M(username, token)

---
## Identify Datasets
Identify all of the OOI **```CTD```** datasets that are located at the Global Argentine Basin Array on the Apex Surface Mooring, Flanking Mooring A, and Flanking Mooring B. This is done by querying OOINet and iteratively walking through all of the API endpoints. Additionally, we request the **vocab** associated with each dataset in order to get their deployed depth.
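
The depth lookup described above can be sketched in plain Python. This is only an illustration: `FAKE_VOCAB`, `get_vocab`, and `attach_depths` are hypothetical stand-ins (the real notebook calls `OOINet.get_vocab`), and the reference designators and depths are invented.

```python
# Hypothetical stand-in for the vocab records returned by the API.
# The names and depths below are invented for illustration only.
FAKE_VOCAB = {
    "EXAMPLE-CTD-A": {"mindepth": 350},
    "EXAMPLE-CTD-B": {"mindepth": 20},
    "EXAMPLE-CTD-C": {"mindepth": 60},
}


def get_vocab(refdes):
    """Hypothetical replacement for the API call; returns a dict with 'mindepth'."""
    return FAKE_VOCAB[refdes]


def attach_depths(refdes_list):
    """Look up the deployed depth of each instrument and sort shallow to deep."""
    records = [
        {"refdes": refdes, "depth": int(get_vocab(refdes)["mindepth"])}
        for refdes in refdes_list
    ]
    return sorted(records, key=lambda rec: rec["depth"])


datasets = attach_depths(list(FAKE_VOCAB))
for rec in datasets:
    print(rec["refdes"], rec["depth"])
```

Collecting the records in a list and sorting once at the end keeps the lookup loop simple, whatever container the results finally go into.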

#### Argentine Basin Apex Surface Mooring (SUMO) CTDs

In [None]:
ga01sumo_datasets = OOINet.search_datasets(array="GA01SUMO", instrument="CTD", English_names=True)

# Get the instrument deployment depth
# (DataFrame.append was removed in pandas 2.0, so collect the records in a list first)
records = []
for refdes in ga01sumo_datasets["refdes"]:
    vocab = OOINet.get_vocab(refdes)
    records.append({"refdes": refdes, "depth": int(vocab["mindepth"])})
depths = pd.DataFrame(records)

# Merge the deployment depth with the datasets
ga01sumo_datasets = ga01sumo_datasets.merge(depths, how="left", left_on=["refdes"], right_on=["refdes"])
ga01sumo_datasets = ga01sumo_datasets.sort_values(by="depth")
ga01sumo_datasets

Save the datasets

In [None]:
ga01sumo_datasets.to_csv("../results/GA01SUMO_datasets.csv", index=False)

#### Argentine Basin Flanking Mooring A (FLMA) CTD Datasets

In [None]:
ga03flma_datasets = OOINet.search_datasets(array="GA03FLMA", instrument="CTD", English_names=True)

# Get the instrument deployment depth
# (DataFrame.append was removed in pandas 2.0, so collect the records in a list first)
records = []
for refdes in ga03flma_datasets["refdes"]:
    vocab = OOINet.get_vocab(refdes)
    records.append({"refdes": refdes, "depth": int(vocab["mindepth"])})
depths = pd.DataFrame(records)

# Merge the deployment depth with the datasets
ga03flma_datasets = ga03flma_datasets.merge(depths, how="left", left_on=["refdes"], right_on=["refdes"])
ga03flma_datasets = ga03flma_datasets.sort_values(by="depth")
ga03flma_datasets

Save the results

In [None]:
ga03flma_datasets.to_csv("../results/GA03FLMA_datasets.csv", index=False)

#### Argentine Basin Flanking Mooring B (FLMB) CTDs

In [None]:
ga03flmb_datasets = OOINet.search_datasets(array="GA03FLMB", instrument="CTD", English_names=True)

# Get the instrument deployment depth
# (DataFrame.append was removed in pandas 2.0, so collect the records in a list first)
records = []
for refdes in ga03flmb_datasets["refdes"]:
    vocab = OOINet.get_vocab(refdes)
    records.append({"refdes": refdes, "depth": int(vocab["mindepth"])})
depths = pd.DataFrame(records)

# Merge the deployment depth with the datasets
ga03flmb_datasets = ga03flmb_datasets.merge(depths, how="left", left_on=["refdes"], right_on=["refdes"])
ga03flmb_datasets = ga03flmb_datasets.sort_values(by="depth")
ga03flmb_datasets

Save the results

In [None]:
ga03flmb_datasets.to_csv("../results/GA03FLMB_datasets.csv", index=False)

---
## Data
The next step is to fetch the data from each of the datasets identified above. For this notebook, we request, access, and open the CTD data from the OOI Data Explorer ERDDAP server using the **```erddapy```** Python package (https://github.com/ioos/erddapy). Each dataset is saved into a dictionary keyed by its reference designator.

#### GA01SUMO

In [None]:
ga01sumo = dict.fromkeys(ga01sumo_datasets["refdes"])
for refdes in ga01sumo_datasets["refdes"]:
    datasets = pd.read_csv(erd.get_search_url(search_for=refdes.lower(), response="csv"))["Dataset ID"]
    dset = datasets.iloc[0]
    erd.dataset_id = dset
    data = erd.to_xarray()
    data = data.swap_dims({"obs":"time"})
    ga01sumo[refdes] = data

Next, we want to plot and explore the data to identify useful datasets for this analysis

In [None]:
# Lay the panels out on a two-column grid, one panel per dataset
nfigs = len(ga01sumo.keys())
ncols = 2
nrows = int(np.ceil(nfigs / ncols))

In [None]:
param = "sea_water_temperature"
color = "tab:red"
nfigs = len(ga01sumo.keys())
xmin, xmax = None, None
# Plot the seawater temperature data for all of the ga01sumo datasets
fig, ax = plt.subplots(nrows=int(np.ceil(nfigs/2)), ncols=2, figsize=(20, 5*nrows))

for k, refdes in enumerate(ga01sumo.keys()):
    row, col = int(np.floor(k/2)), int(k % 2)
    data = ga01sumo.get(refdes)
    yavg, ystd = data[param].mean(skipna=True), data[param].std(skipna=True)
    ymin, ymax = yavg-5*ystd, yavg+5*ystd
    ax[row, col].plot(data["time"], data[param], marker=".", linestyle="", color=color)
    ax[row, col].set_ylabel(data[param].attrs["long_name"] + "\n" + data[param].attrs["units"], fontsize=12)
    ax[row, col].set_title(data.attrs["title"] + " \n " + refdes)
    ax[row, col].set_ylim(ymin, ymax)
    ax[row, col].grid()
    xmin_k, xmax_k = ax[row, col].get_xlim()
    if xmin is None or xmin_k < xmin:
        xmin = xmin_k
    if xmax is None or xmax_k > xmax:
        xmax = xmax_k
    
# Set a uniform x-axis range on every panel of the 2D grid
for a in ax.flat:
    a.set_xlim(xmin, xmax)
    
#fig.autofmt_xdate()

In [None]:
saveDir = "../results/figures"
figName = "_".join(("GA01SUMO", "all", "ctds", param))
fig.savefig(f"{saveDir}/{figName}.png")

Plot all of the timeseries on top of each other, using different colors to denote different instruments

In [None]:
param = "sea_water_temperature"

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(15, 9))

# Set discrete category values
levels, categories = pd.factorize(ga01sumo_datasets['depth'])
colors = [plt.cm.tab10(i) for i in levels] # using the "tab10" colormap
handles = [matplotlib.patches.Patch(color=plt.cm.tab10(i), label=c) for i, c in enumerate(categories)]
for h in handles:
    h.set_edgecolor("black")
    
# Plot the data
for k, refdes in enumerate(ga01sumo.keys()):
    # Assign the category based on depth
    depth = int(ga01sumo_datasets.loc[ga01sumo_datasets["refdes"] == refdes, "depth"].iloc[0])
    color = plt.cm.tab10(list(categories).index(depth))
    data = ga01sumo.get(refdes)
    yavg, ystd = data[param].mean(skipna=True), data[param].std(skipna=True)
    ymin, ymax = yavg-5*ystd, yavg+5*ystd
    ax.plot(data["time"], data[param], marker=".", linestyle="", markersize=1, color=color, alpha=0.5)

ax.legend(handles=handles, title='Deployed Depth', fontsize=12, edgecolor="black")

#### GA03FLMA

In [None]:
ga03flma = dict.fromkeys(ga03flma_datasets["refdes"])
for refdes in ga03flma_datasets["refdes"]:
    datasets = pd.read_csv(erd.get_search_url(search_for=refdes.lower(), response="csv"))["Dataset ID"]
    dset = datasets.iloc[0]
    erd.dataset_id = dset
    data = erd.to_xarray()
    data = data.swap_dims({"obs":"time"})
    ga03flma[refdes] = data

In [None]:
param = "sea_water_temperature"
color = "tab:red"
nrows = len(ga03flma.keys())
xmin, xmax = None, None
# Plot the seawater temperature data for all of the ga03flma datasets
fig, ax = plt.subplots(nrows=nrows, ncols=1, figsize=(15, 5*nrows))

for k, refdes in enumerate(ga03flma.keys()):
    data = ga03flma.get(refdes)
    yavg, ystd = data[param].mean(skipna=True), data[param].std(skipna=True)
    ymin, ymax = yavg-5*ystd, yavg+5*ystd
    ax[k].plot(data["time"], data[param], marker=".", linestyle="", color=color)
    ax[k].set_ylabel(data[param].attrs["long_name"] + "\n" + data[param].attrs["units"], fontsize=12)
    ax[k].set_title(data.attrs["title"] + " :: " + refdes)
    ax[k].set_ylim(ymin, ymax)
    ax[k].grid()
    xmin_k, xmax_k = ax[k].get_xlim()
    if xmin is None or xmin_k < xmin:
        xmin = xmin_k
    if xmax is None or xmax_k > xmax:
        xmax = xmax_k
    
# Set uniform xgrid
for k in np.arange(0, len(ax), 1):
    ax[k].set_xlim(xmin, xmax)

In [None]:
saveDir = "../results/figures"
figName = "_".join(("GA03FLMA", "all", "ctds", param))
fig.savefig(f"{saveDir}/{figName}.png")

Plot a specific reference designator

In [None]:
refdes = "GA01SUMO-RII11-02-CTDMOQ016"
data = ga01sumo.get(refdes)

fig, ax = plt.subplots(nrows=2, ncols=1, figsize=(16,10))

# Plot sea_water_temperature
param = "sea_water_temperature"
color = "tab:red"

yavg, ystd = data[param].mean(skipna=True), data[param].std(skipna=True)
ymin, ymax = yavg-8*ystd, yavg+8*ystd

ax[0].plot(data["time"], data[param], marker=".", linestyle="", color=color)
ax[0].set_ylabel(data[param].attrs["long_name"] + "\n" + data[param].attrs["units"], fontsize=16, weight="bold")
ax[0].set_ylim(ymin, ymax)
ax[0].grid()
ax[0].set_title(data.attrs["title"] + " \n " + refdes, fontsize=16, weight="bold")

# Plot the practical salinity
param = "sea_water_practical_salinity"
color = "tab:blue"

yavg, ystd = data[param].mean(skipna=True), data[param].std(skipna=True)
ymin, ymax = yavg-8*ystd, yavg+8*ystd

ax[1].plot(data["time"], data[param], marker=".", linestyle="", color=color)
ax[1].set_ylabel(data[param].attrs["long_name"], fontsize=16, weight="bold")
ax[1].set_ylim(ymin, ymax)
ax[1].grid()

fig.autofmt_xdate()

In [None]:
fig.savefig(f"../results/figures/{refdes}_temperature_salinity.png", facecolor="white", transparent=False)

#### GA03FLMB

In [None]:
ga03flmb = dict.fromkeys(ga03flmb_datasets["refdes"])
for refdes in ga03flmb_datasets["refdes"]:
    datasets = pd.read_csv(erd.get_search_url(search_for=refdes.lower(), response="csv"))["Dataset ID"]
    dset = datasets.iloc[0]
    erd.dataset_id = dset
    data = erd.to_xarray()
    data = data.swap_dims({"obs":"time"})
    ga03flmb[refdes] = data

In [None]:
param = "sea_water_practical_salinity"
color = "tab:blue"
nrows = len(ga03flmb.keys())
xmin, xmax = None, None
# Plot the practical salinity data for all of the ga03flmb datasets
fig, ax = plt.subplots(nrows=nrows, ncols=1, figsize=(15, 5*nrows))

for k, refdes in enumerate(ga03flmb.keys()):
    data = ga03flmb.get(refdes)
    yavg, ystd = data[param].mean(skipna=True), data[param].std(skipna=True)
    ymin, ymax = yavg-5*ystd, yavg+5*ystd
    ax[k].plot(data["time"], data[param], marker=".", linestyle="", color=color)
    ax[k].set_ylabel(data[param].attrs["long_name"] + "\n" + data[param].attrs["units"], fontsize=12)
    ax[k].set_title(data.attrs["title"] + " :: " + refdes)
    ax[k].set_ylim(ymin, ymax)
    ax[k].grid()
    xmin_k, xmax_k = ax[k].get_xlim()
    if xmin is None or xmin_k < xmin:
        xmin = xmin_k
    if xmax is None or xmax_k > xmax:
        xmax = xmax_k
    
# Set uniform xgrid
for k in np.arange(0, len(ax), 1):
    ax[k].set_xlim(xmin, xmax)

In [None]:
saveDir = "../results/figures"
figName = "_".join(("GA03FLMB", "all", "ctds", param))
fig.savefig(f"{saveDir}/{figName}.png")

Plot a particular parameter (data variable) for all the given CTDs at a particular depth across all moorings in an array

In [None]:
param = "sea_water_temperature"
#param = "sea_water_practical_salinity"

In [None]:
# Set the colors
arrays = ["FLMA", "FLMB", "SUMO"]
levels, categories = pd.factorize(arrays)
colors = [plt.cm.tab10(i) for i in levels] # using the "tab10" colormap
handles = [matplotlib.patches.Patch(color=plt.cm.tab10(i), label=c) for i, c in enumerate(categories)]
for h in handles:
    h.set_edgecolor("black")

# Plot the data 
for depth in ga03flma_datasets["depth"]:
    fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(15, 9))

    # ----------------------------------------------------------
    # Flanking Mooring A
    flma = ga03flma_datasets[ga03flma_datasets["depth"] == depth]["refdes"].values[0]
    flma_data = ga03flma.get(flma)
    # Plot the flma data
    ax.plot(flma_data["time"], flma_data[param], linestyle="", marker=".", color=colors[0], alpha=0.7)
    yavg, ystd = flma_data[param].mean(skipna=True), flma_data[param].std(skipna=True)
    ymin, ymax = yavg-5*ystd, yavg+5*ystd
    
    # ----------------------------------------------------------
    # Flanking Mooring B
    flmb = ga03flmb_datasets[ga03flmb_datasets["depth"] == depth]["refdes"].values[0]
    flmb_data = ga03flmb.get(flmb)
    # Plot the flmb data
    ax.plot(flmb_data["time"], flmb_data[param], linestyle="", marker=".", color=colors[1], alpha=0.7)
    
    # ----------------------------------------------------------
    # Surface Mooring
    try:
        sumo = ga01sumo_datasets[ga01sumo_datasets["depth"] == depth]["refdes"].values[0]
        sumo_data = ga01sumo.get(sumo)
        # Plot the sumo data
        ax.plot(sumo_data["time"], sumo_data[param], linestyle="", marker=".", color=colors[2], alpha=0.7)
    except IndexError:
        # The surface mooring has no CTD at this depth
        sumo = None
        
    # ----------------------------------------------------------
    # Set the figure
    ax.grid()
    ax.set_ylabel(flma_data[param].attrs["long_name"] + "\n" + flma_data[param].attrs["units"], fontsize=16)
    #if depth >= 500:
    ax.set_ylim(ymin, ymax)
    ax.legend(handles=handles, title='Array', fontsize=12, edgecolor="black")
    ax.set_title(f"CTD Data at {int(depth)}m", fontsize=16)
    fig.autofmt_xdate()

In [None]:
saveDir = "../results/figures"
figName = "_".join((refdes, "time", "series"))
fig.savefig(f"{saveDir}/{figName}.png")

### Add Deployment Information
Datasets downloaded from Data Explorer do not come with the deployment information as a data variable, unlike datasets downloaded from the OOI Data Portal (ooinet.oceanobservatories.org). Here, we step through adding deployment information to the ERDDAP dataset by requesting the deployment times from the Data Portal via the OOI API.
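
As a rough illustration of the labelling step, the sketch below assigns a deployment number to each timestamp from a list of deployment windows. It uses plain Python rather than xarray; `label_deployments` is a hypothetical helper, and the deployment dates are invented rather than taken from the OOI API.

```python
from datetime import datetime


def label_deployments(times, deployments):
    """Return a deployment number per timestamp (0 if outside every window)."""
    labels = [0] * len(times)
    for number, start, end in sorted(deployments):
        for i, t in enumerate(times):
            if start <= t <= end:
                labels[i] = number
    return labels


# Invented deployment windows: (deployment number, start, end)
deployments = [
    (1, datetime(2015, 3, 15), datetime(2015, 11, 14)),
    (2, datetime(2015, 11, 15), datetime(2016, 11, 8)),
]
times = [
    datetime(2015, 1, 1),
    datetime(2015, 6, 1),
    datetime(2016, 7, 20),
    datetime(2017, 1, 1),
]
labels = label_deployments(times, deployments)
print(labels)
```

Timestamps that fall outside every window keep the initial value of 0, which makes data recorded between deployments easy to spot.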

In [None]:
for refdes in ga01sumo:
    # Get the deployment info for the given reference designator
    deployments = OOINet.get_deployments(refdes)
    
    # Get the dataset
    data = ga01sumo.get(refdes)

    # Add the deployment info to the dataset
    d = xr.DataArray(data=np.zeros(data["time"].shape), dims=data["time"].dims, coords=data["time"].coords, name="deployment")
    for depNum in sorted(deployments["deploymentNumber"]):
        # Get the start and end times of the deployments
        deployStart = deployments[deployments["deploymentNumber"] == depNum]["deployStart"].values[0]
        deployEnd = deployments[deployments["deploymentNumber"] == depNum]["deployEnd"].values[0]
        # Add in the deployment number
        d[(d.time >= deployStart) & (d.time <= deployEnd)] = depNum
        
    data[d.name] = d

In [None]:
for refdes in ga03flma:
    # Get the deployment info for the given reference designator
    deployments = OOINet.get_deployments(refdes)
    
    # Get the dataset
    data = ga03flma.get(refdes)

    # Add the deployment info to the dataset
    d = xr.DataArray(data=np.zeros(data["time"].shape), dims=data["time"].dims, coords=data["time"].coords, name="deployment")
    for depNum in sorted(deployments["deploymentNumber"]):
        # Get the start and end times of the deployments
        deployStart = deployments[deployments["deploymentNumber"] == depNum]["deployStart"].values[0]
        deployEnd = deployments[deployments["deploymentNumber"] == depNum]["deployEnd"].values[0]
        # Add in the deployment number
        d[(d.time >= deployStart) & (d.time <= deployEnd)] = depNum
        
    data[d.name] = d

In [None]:
for refdes in ga03flmb:
    # Get the deployment info for the given reference designator
    deployments = OOINet.get_deployments(refdes)
    
    # Get the dataset
    data = ga03flmb.get(refdes)

    # Add the deployment info to the dataset
    d = xr.DataArray(data=np.zeros(data["time"].shape), dims=data["time"].dims, coords=data["time"].coords, name="deployment")
    for depNum in sorted(deployments["deploymentNumber"]):
        # Get the start and end times of the deployments
        deployStart = deployments[deployments["deploymentNumber"] == depNum]["deployStart"].values[0]
        deployEnd = deployments[deployments["deploymentNumber"] == depNum]["deployEnd"].values[0]
        # Add in the deployment number
        d[(d.time >= deployStart) & (d.time <= deployEnd)] = depNum
        
    data[d.name] = d

In [None]:
ga03flma.get(refdes)

In [None]:
# Quick print of the CTD data
fig, (ax0, ax1, ax2) = plt.subplots(nrows=3, ncols=1, figsize=(15, 12))

# Plot the temperature data
avg, std = data.sea_water_temperature.mean(), data.sea_water_temperature.std()
ymin, ymax = avg-5*std, avg+5*std
ax0.plot(data.time, data.sea_water_temperature, marker=".", linestyle="", color="tab:red")
ax0.grid()
ax0.set_ylabel(data.sea_water_temperature.attrs["long_name"], fontsize=16)
#ax0.set_ylim((ymin, ymax))
#ax0.set_xlim((pd.to_datetime("2016-07-15"),pd.to_datetime("2016-08-17")))
ax0.set_title(data.attrs["title"] + "\n" + dset, fontsize=16)

# Plot the salinity data
avg, std = data.sea_water_practical_salinity.mean(), data.sea_water_practical_salinity.std()
ymin, ymax = avg-5*std, avg+5*std
ax1.plot(data.time, data.sea_water_practical_salinity, marker=".", linestyle="", color="tab:blue")
ax1.grid()
ax1.set_ylabel(data.sea_water_practical_salinity.attrs["long_name"], fontsize=16)
#ax1.set_ylim((ymin, ymax))
#ax1.set_xlim((pd.to_datetime("2016-07-15"),pd.to_datetime("2016-08-17")))


# Plot the pressure data
avg, std = data.sea_water_pressure.mean(), data.sea_water_pressure.std()
ymin, ymax = avg-5*std, avg+5*std
ax2.plot(data.time, data.sea_water_pressure, marker=".", linestyle="", color="black")
ax2.grid()
ax2.set_ylabel(data.sea_water_pressure.attrs["long_name"], fontsize=16)
#ax2.set_ylim((ymin, ymax))
#ax2.set_xlim((pd.to_datetime("2016-07-15"),pd.to_datetime("2016-08-17")))

ax2.invert_yaxis()

### Clean up dataset
Before going any further, we clean up the downloaded dataset. First, we drop all of the "qc" test variables delivered with the dataset, so that we can see for ourselves how these flags are calculated.
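
The name-based filter can be sketched with a plain dictionary standing in for the dataset. The helper `drop_qc_variables` and the variable name `sea_water_temperature_qc_agg` are hypothetical and used only for illustration.

```python
def drop_qc_variables(variables):
    """Return a copy of the mapping without entries whose name contains 'qc'."""
    return {name: values for name, values in variables.items() if "qc" not in name}


# A toy "dataset": variable name -> values (names are hypothetical)
dataset = {
    "time": [0, 1, 2],
    "sea_water_temperature": [2.1, 2.2, 2.0],
    "sea_water_temperature_qc_agg": [1, 1, 1],
}
cleaned = drop_qc_variables(dataset)
print(sorted(cleaned))
```

Building a new mapping instead of deleting entries in place avoids modifying the collection while iterating over it.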

#### Select a single dataset to analyze

In [None]:
refdes = "GA03FLMA-RIM01-02-CTDMOG047"
data = ga03flma.get(refdes)

In [None]:
# Drop every variable whose name contains "qc"
qc_vars = [var for var in data.variables if "qc" in var]
data = data.drop_vars(qc_vars)

## Implement QARTOD
Finally, we are ready to run the QARTOD tests and calculate the flags.
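
For orientation, here is a minimal pure-Python sketch of the gross range logic, using the standard QARTOD flag values (1 = good, 3 = suspect, 4 = fail, 9 = missing). It is not the `ioos_qc` implementation. The function name `gross_range_flags` and the suspect span are assumptions; only the temperature fail span of (-5, 35) comes from this notebook.

```python
import math

GOOD, SUSPECT, FAIL, MISSING = 1, 3, 4, 9


def gross_range_flags(values, fail_span, suspect_span):
    """Flag each value against a fail span and a narrower suspect span."""
    flags = []
    for v in values:
        if v is None or math.isnan(v):
            flags.append(MISSING)
        elif v < fail_span[0] or v > fail_span[1]:
            flags.append(FAIL)
        elif v < suspect_span[0] or v > suspect_span[1]:
            flags.append(SUSPECT)
        else:
            flags.append(GOOD)
    return flags


temps = [2.5, 7.9, 40.0, float("nan"), -6.0]
# suspect_span is an invented example range
flags = gross_range_flags(temps, fail_span=(-5, 35), suspect_span=(1.0, 6.0))
print(flags)
```

The fail check runs before the suspect check, so a value outside both spans receives the more severe flag.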

In [None]:
from qartod.gross_range import GrossRange
from qartod.climatology import Climatology
from ioos_qc.config import QcConfig, ContextConfig, Config
from ioos_qc.qartod import gross_range_test, climatology_test, ClimatologyConfig

#### Define some useful functions
Before calculating the QARTOD flags, we define a few helper functions that simplify adding the QARTOD flags to the cleaned-up datasets, aggregating the results, and plotting the climatology.
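
One of the helpers in this section combines the flags from the individual tests into a single aggregate flag by keeping the largest flag at each time step. A minimal sketch of that idea, with the hypothetical name `aggregate_flags` and invented flag values:

```python
def aggregate_flags(*flag_series):
    """Element-wise maximum across equally long flag sequences."""
    return [max(step) for step in zip(*flag_series)]


# Invented flags from two tests at four time steps
gross_range = [1, 1, 4, 1]
climatology = [1, 3, 1, 1]
aggregate = aggregate_flags(gross_range, climatology)
print(aggregate)
```

Taking the maximum means the most severe result wins. Note that under this rule a missing flag (9) would outrank a fail (4), which may or may not be the desired behavior.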

In [None]:
def add_qartod_test(data, param, test, test_results):
    # Generate the test name
    testName = "_".join((param, "qartod", test))
    
    # Create an xarray DataArray of the test_results
    test_results = xr.DataArray(
        data=test_results,
        dims=data[param].dims,
        coords=data[param].coords,
    )
    
    data[testName] = test_results
    return data

In [None]:
def aggregate_results(data, param):
    """Aggregate the results of all qartod tests for a given parameter"""
    # Get the qartod variables
    qartod_vars = [var for var in data if (param in var) and ("qartod" in var)]
            
    # Stack the flags (np.vstack needs a sequence, not a generator)
    # and take the maximum flag at each time step
    aggregate = np.vstack([data[v] for v in qartod_vars])
    aggregate = aggregate.max(axis=0)
    return aggregate

In [None]:
def plot_climatology(data, param, climatology, **kwargs):
    
    # Check the kwargs
    if "color" in kwargs.keys():
        color = kwargs.get("color")
    else:
        color = "tab:red"
    
    # Initialize the figure
    fig, ax = plt.subplots(figsize=(12,8))
    
    # Plot the Observations
    ax.plot(data.time, data[param], marker=".", linestyle="", color=color, zorder=0, label="Observations")
    yavg, ystd = np.mean(data[param]), np.std(data[param])
    ymin, ymax = yavg-ystd*7, yavg+ystd*7
    ax.set_ylim((ymin, ymax))
    
    # Climatological fit with a +/- 2 standard deviation envelope
    for t in climatology.fitted_data.index:
        t0 = pd.Timestamp(year=t.year, month=t.month, day=1)
        mu = climatology.monthly_fit.loc[t.month]
        std = climatology.monthly_std.loc[t.month]
        ax.hlines(mu, t0, t, color="black", linewidth=3, label="Climatological Fit")
        ax.fill_between([t0, t], [mu+2*std, mu+2*std], [mu-2*std, mu-2*std], color=color, alpha=0.3, label=r"2*$\sigma$")

    
    # Add legend and labels
    handles, labels = ax.get_legend_handles_labels()[0][0:3], ax.get_legend_handles_labels()[1][0:3]
    ax.legend(handles, labels, fontsize=12)
    ax.set_title(data.attrs["platform_name"], fontsize=16, weight="bold")
    ax.set_ylabel(data[param].attrs["long_name"], fontsize=16, weight="bold")
    ax.grid()
    fig.autofmt_xdate()
    
    return fig, ax

#### Calculate QARTOD
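The climatology step can be sketched as follows, under simplifying assumptions: plain monthly means stand in for whatever fit `Climatology.fit` performs, the sample standard deviation is used, and the data are invented. The suspect bounds are the monthly value plus or minus two standard deviations, rounded outward to two decimals as in the cell below. The helper names are hypothetical.

```python
import math
from statistics import mean, stdev


def monthly_bounds(samples, n_sigma=2):
    """samples: iterable of (month, value). Returns {month: (vmin, vmax)}."""
    by_month = {}
    for month, value in samples:
        by_month.setdefault(month, []).append(value)
    bounds = {}
    for month, values in by_month.items():
        mu, sigma = mean(values), stdev(values)
        # Round the bounds outward to two decimal places
        vmin = math.floor((mu - n_sigma * sigma) * 100) / 100
        vmax = math.ceil((mu + n_sigma * sigma) * 100) / 100
        bounds[month] = (vmin, vmax)
    return bounds


def climatology_flags(samples, bounds, fail_span):
    """1 = good, 3 = suspect (outside monthly bounds), 4 = fail (outside fail span)."""
    flags = []
    for month, value in samples:
        vmin, vmax = bounds[month]
        if value < fail_span[0] or value > fail_span[1]:
            flags.append(4)
        elif value < vmin or value > vmax:
            flags.append(3)
        else:
            flags.append(1)
    return flags


# Invented July temperatures used to build the climatology
history = [(7, 2.0), (7, 2.2), (7, 2.4), (7, 2.2), (7, 2.2)]
bounds = monthly_bounds(history)
# Invented new observations, e.g. a warm anomaly passing the mooring
new_obs = [(7, 2.3), (7, 3.1), (7, 40.0)]
flags = climatology_flags(new_obs, bounds, fail_span=(-5, 35))
print(bounds, flags)
```

A real but anomalous signal, such as an eddy, shows up as a run of suspect flags, which is the behavior this notebook sets out to examine.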

In [None]:
# Parameters and tests to run
parameters = ["sea_water_practical_salinity", "sea_water_temperature"]
tests = ["gross_range", "climatology"]

# Dictionary object to store output results
results = dict.fromkeys(parameters)
for key in results.keys():
    results[key] = dict.fromkeys(tests, {})

In [None]:
for param in parameters:
    if param == "sea_water_temperature":
        fail_span = (-5, 35)
    elif param == "sea_water_practical_salinity":
        fail_span = (0, 42)
    else:
        raise ValueError(f"Param {param} not found.")
    # Iterate through the tests
    for test in tests:
        if test == "gross_range":
            # Calculate the gross range ranges
            fail_min, fail_max = fail_span
            gross_range = GrossRange(fail_min, fail_max)
            gross_range.fit(data, param)
            # Run the gross range test
            test_results = gross_range_test(
                inp=data[param],
                fail_span=(gross_range.fail_min, gross_range.fail_max),
                suspect_span=(gross_range.suspect_min, gross_range.suspect_max)
            )
            # Save the fit results
            results[param][test] = gross_range
        elif test == "climatology":
            # Fit the monthly climatology
            climatology = Climatology()
            climatology.fit(data, param, fail_span=fail_span)
            # Make the climatology config object
            c = ClimatologyConfig()
            for ind in climatology.monthly_fit.index:
                vmin = np.floor((climatology.monthly_fit.loc[ind] - 2*climatology.monthly_std.loc[ind])*100)/100
                vmax = np.ceil((climatology.monthly_fit.loc[ind] + 2*climatology.monthly_std.loc[ind])*100)/100
                c.add(tspan=[ind, ind],
                      vspan=[vmin, vmax],
                      fspan=[fail_span[0], fail_span[1]],
                      zspan=[0,5000],
                      period="month")
            # Run the climatology test
            test_results = climatology_test(c,
                               inp=data[param],
                               tinp=data["time"],
                               zinp=data["sea_water_pressure"]
                              )
            # Save the fit results
            results[param][test] = climatology
        else:
            # Stop here so stale results from a previous test are never saved
            raise ValueError(f"Unknown test {test} for param {param}")
        # Save the results
        data = add_qartod_test(data, param, test, test_results)
    
    # Now add the aggregate flag for the given parameter
    aggregate = aggregate_results(data, param)
    data = add_qartod_test(data, param, "aggregate", aggregate)

Check the QARTOD flags for the given dataset

In [None]:
results

In [None]:
# Initialize the figure
fig, ax = plt.subplots(nrows=2, ncols=1, figsize=(16, 12))

param = "sea_water_temperature"
color = "tab:red"

# Plot the Observations
ax[0].plot(data.time, data[param], marker=".", linestyle="", color=color, zorder=0, label="Observations")
yavg, ystd = np.mean(data[param]), np.std(data[param])
ymin, ymax = yavg-ystd*7, yavg+ystd*7
ax[0].set_ylim((ymin, ymax))

cdata = results[param]["climatology"]
# Climatological fit +/- 2 standard deviations (currently disabled)
#for t in cdata.fitted_data.index:
#    t0 = pd.Timestamp(year=t.year, month=t.month, day=1)
#    mu = cdata.monthly_fit.loc[t.month]
#    std = cdata.monthly_std.loc[t.month]
#    ax[0].hlines(mu, t0, t, color="black", linewidth=3, label="Climatological Fit")
#    ax[0].fill_between([t0, t], [mu+2*std, mu+2*std], [mu-2*std, mu-2*std], color=color, alpha=0.3, label="2*$\sigma$")


# Add legend and labels
handles, labels = ax[0].get_legend_handles_labels()[0][0:3], ax[0].get_legend_handles_labels()[1][0:3]
ax[0].legend(handles, labels, fontsize=12)
ax[0].set_title(data.attrs["platform_name"] + "\n" + refdes, fontsize=16, weight="bold")
ax[0].set_ylabel(data[param].attrs["long_name"], fontsize=16, weight="bold")
ax[0].grid()

for tick in ax[0].yaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")


param = "sea_water_practical_salinity"
color = "tab:blue"

# Plot the Observations
ax[1].plot(data.time, data[param], marker=".", linestyle="", color=color, zorder=0, label="Observations")
yavg, ystd = np.mean(data[param]), np.std(data[param])
ymin, ymax = yavg-ystd*7, yavg+ystd*7
ax[1].set_ylim((ymin, ymax))

cdata = results[param]["climatology"]
# Climatological fit +/- 2 standard deviations (currently disabled)
#for t in cdata.fitted_data.index:
#    t0 = pd.Timestamp(year=t.year, month=t.month, day=1)
#    mu = cdata.monthly_fit.loc[t.month]
#    std = cdata.monthly_std.loc[t.month]
#    ax[1].hlines(mu, t0, t, color="black", linewidth=3, label="Climatological Fit")
#    ax[1].fill_between([t0, t], [mu+2*std, mu+2*std], [mu-2*std, mu-2*std], color=color, alpha=0.3, label="2*$\sigma$")


# Add legend and labels
handles, labels = ax[1].get_legend_handles_labels()[0][0:3], ax[1].get_legend_handles_labels()[1][0:3]
ax[1].legend(handles, labels, fontsize=12)
ax[1].set_ylabel(data[param].attrs["long_name"], fontsize=16, weight="bold")
ax[1].grid()

fig.autofmt_xdate()

for tick in ax[1].xaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")
    
for tick in ax[1].yaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")

In [None]:
refdes

In [None]:
fig.savefig(f"../results/figures/{refdes}_temp_sal.png", facecolor="white", transparent=False)

In [None]:
fig, ax = plot_climatology(data, param, results[param]["climatology"], color="tab:blue")
handles, labels = ax.get_legend_handles_labels()[0][0:3], ax.get_legend_handles_labels()[1][0:3]
ax.legend(handles, labels, fontsize=12, loc="lower right", edgecolor="black")

In [None]:
# Make the figName from the reference designator, tested parameter, and the specific QARTOD test
figName = "_".join((refdes, param, test))

# Save the results
saveDir = "../results/figures/"
if not os.path.isdir(saveDir):
    os.makedirs(saveDir)
fig.savefig(f"{saveDir}/{figName}.png", facecolor="white", transparent=False)

In [None]:
param = "sea_water_temperature"

In [None]:
fig, ax = plot_climatology(data, param, results[param]["climatology"], color="tab:red")
#handles, labels = ax.get_legend_handles_labels()
#first_legend.set_bbox_to_anchor((1, 0.7))
#ax.add_artist(first_legend)
xmin, xmax = ax.get_xlim()

# Plot the flagged data
flagged_data = data.where((data.sea_water_temperature_qartod_climatology == 3), drop=True)
ax.plot(flagged_data.time, flagged_data.sea_water_temperature, linestyle="", marker=".", color="tab:blue", label="Flagged")
handles, labels = ax.get_legend_handles_labels()[0][0:4], ax.get_legend_handles_labels()[1][0:4]
ax.legend(handles, labels, fontsize=12, loc="lower right", edgecolor="black")


for tick in ax.xaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")
    
for tick in ax.yaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")

In [None]:
figName = "_".join((refdes, param, test, "with_flags"))
figName

In [None]:
fig.savefig(f"{saveDir}/{figName}.png", facecolor="white", transparent=False)

Plot a histogram, by month, of the climatology-flagged data
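
The monthly tally behind such a histogram can be sketched in plain Python by counting the suspect flags (value 3) per calendar month. `monthly_suspect_counts` is a hypothetical helper, and the timestamps and flags are invented.

```python
from collections import Counter
from datetime import datetime


def monthly_suspect_counts(times, flags, suspect=3):
    """Count suspect flags per (year, month), in chronological order."""
    counts = Counter()
    for t, flag in zip(times, flags):
        if flag == suspect:
            counts[(t.year, t.month)] += 1
    return dict(sorted(counts.items()))


# Invented timestamps and climatology flags
times = [
    datetime(2016, 6, 30),
    datetime(2016, 7, 20),
    datetime(2016, 7, 25),
    datetime(2016, 8, 2),
    datetime(2016, 8, 3),
]
flags = [1, 3, 3, 3, 1]
counts = monthly_suspect_counts(times, flags)
print(counts)
```

Unlike a resample-based count, this sketch omits months with no suspect flags instead of reporting them as zero. It also counts occurrences of the flag rather than summing the flag values themselves.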

In [None]:
param="sea_water_temperature"
climatology = results[param]["climatology"]

In [None]:
flags = (data.sea_water_temperature_qartod_climatology == 3).resample(time="M").sum()
flags

In [None]:
fig, ax = plt.subplots(nrows=2, ncols=1, figsize=(15, 20), sharex=True)

# ------------------------------------------------
# Plot the first figure with the time series data
# Plot the Observations
ax[0].plot(data.time, data[param], marker=".", linestyle="", color="tab:red", zorder=0, label="Observations")
yavg, ystd = np.mean(data[param]), np.std(data[param])
ymin, ymax = yavg-ystd*7, yavg+ystd*7
ax[0].set_ylim((ymin, ymax))

# Climatological fit with a +/- 2 standard deviation envelope
for t in climatology.fitted_data.index:
    t0 = pd.Timestamp(year=t.year, month=t.month, day=1)
    mu = climatology.monthly_fit.loc[t.month]
    std = climatology.monthly_std.loc[t.month]
    ax[0].hlines(mu, t0, t, color="black", linewidth=3, label="Climatological Fit")
    ax[0].fill_between([t0, t], [mu+2*std, mu+2*std], [mu-2*std, mu-2*std], color="tab:red", alpha=0.3, label=r"2*$\sigma$")

# Plot the flagged data
flagged_data = data.where((data.sea_water_temperature_qartod_climatology == 3), drop=True)
ax[0].plot(flagged_data.time, flagged_data[param], linestyle="", marker=".", color="tab:blue", label="Flagged")
handles, labels = ax[0].get_legend_handles_labels()[0][0:4], ax[0].get_legend_handles_labels()[1][0:4]
ax[0].legend(handles, labels, fontsize=12, loc="lower right", edgecolor="black")

for tick in ax[0].yaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")
    
# Add legend and labels
handles, labels = ax[0].get_legend_handles_labels()[0][0:3], ax[0].get_legend_handles_labels()[1][0:3]
ax[0].legend(handles, labels, fontsize=12)
ax[0].set_title(data.attrs["platform_name"], fontsize=16, weight="bold")
ax[0].set_ylabel(data[param].attrs["long_name"], fontsize=16, weight="bold")
ax[0].grid()
    
# -----------------------------------------------------
# Plot the histogram of the flags
ax[1].bar(flags.time, flags.values, width=24.0, edgecolor="k", color="tab:blue")
ax[1].set_ylabel(f"Suspect flag count for\n{flags.name}", fontsize=16, weight="bold")
#ax[1].set_title(refdes, fontsize=16, weight="bold")
ax[1].grid()



fig.autofmt_xdate()


for tick in ax[1].xaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")
    
for tick in ax[1].yaxis.get_major_ticks():
    tick.label1.set_fontsize(12)
    tick.label1.set_fontweight("bold")

In [None]:
fig.savefig(f"../results/figures/{refdes}_{param}_timeseries_with_histogram.png", facecolor="white", transparent=False)

In [None]:
data.sea_water_temperature_qartod_climatology.resample(time="M").sum()