# QARTOD - NetCDF Examples

This notebook provides examples of running QARTOD on a netCDF file. For background, see [NcQcConfig Usage](https://ioos.github.io/ioos_qc/usage.html#ncqcconfig) in the docs.

There are several ways to integrate `ioos_qc` into a netCDF-based workflow.

**Option A:** Store test configurations in your netCDF file, then pass that file to `ioos_qc` and let it run the tests and update the file with the results.
  * You only need to add test configurations to the file once; after that you can rerun the tests on the same file as often as needed
  * When storing results in the netCDF file, the `ioos_qc` library ensures those variables are CF-compliant
  * This option is the most portable, since the data, configuration, and results are all in one place
  * The downside is that test configurations are harder to manage, since they live in each file rather than in a common external location
  
  
**Option B:** Store test configurations externally, then pass your configuration and netCDF file to `ioos_qc` and let it run the tests and update the file with the results.
  * In this case, you manage your test configurations outside the file, which scales better when you have many datasets
  * The library still stores the results in the netCDF file, which ensures self-describing, CF-compliant results


**Option C:** Store test configurations externally, pass your configuration and netCDF file to `ioos_qc`, and manually update the netCDF variables with the test results.
  * In this case, you extract variables from the netCDF file, use `ioos_qc` methods to run the tests, and then manually update the netCDF file with the results
  * This provides the most control, but doesn't take advantage of shared code in the `ioos_qc` library
  * It's up to you to ensure your resulting netCDF file is CF-compliant
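
For Options B and C, the configuration lives outside the netCDF file. The following is a minimal sketch of one way to do that, assuming a plain JSON file. The file name `qc_config.json` and its temporary location are hypothetical and not something `ioos_qc` requires. The dictionary has the same nesting as the config dictionaries used later in this notebook.

```python
import json
import tempfile
from pathlib import Path

# Test configuration keyed by netCDF variable name, then module, then test name
config = {
    "pco2_seawater": {
        "qartod": {
            "gross_range_test": {
                "suspect_span": [200, 600],
                "fail_span": [0, 1200],
            }
        }
    }
}

# Hypothetical location for the shared configuration file
config_path = Path(tempfile.mkdtemp()) / "qc_config.json"

# Save the configuration once...
config_path.write_text(json.dumps(config, indent=2))

# ...and reload it wherever the tests are run
loaded_config = json.loads(config_path.read_text())
```

Keeping the configuration in a file like this lets many datasets share one set of thresholds, at the cost of the netCDF file no longer being fully self-describing.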


In [None]:
# Setup directories
from pathlib import Path
basedir = Path().absolute()
libdir = basedir.parent.parent.parent

# Other imports
import pandas as pd
import numpy as np
import xarray as xr
from datetime import datetime

from bokeh.layouts import gridplot
from bokeh.plotting import figure, show, output_file, output_notebook
output_notebook()

In [None]:
# # Install QC library
# # (GitHub no longer serves the unauthenticated git:// protocol, so use https)
# !pip install git+https://github.com/ioos/ioos_qc.git

# # Alternative installation (install specific branch):
# !pip uninstall -y ioos_qc
# !pip install git+https://github.com/ioos/ioos_qc.git@BRANCHNAME

# Alternative installation (run with local updates):
!pip uninstall -y ioos_qc
import sys
sys.path.append(str(libdir))
    
from ioos_qc.config import NcQcConfig
from ioos_qc import qartod

## Load the netCDF dataset


The example netCDF dataset contains data from a pCO2 sensor on the Ocean Observatories Initiative (OOI) Coastal Endurance Inshore Surface Mooring instrument frame, at 7 meters depth on the Oregon Shelf break.


In [None]:
filename = basedir.joinpath('pco2_netcdf_example.nc')
pco2 = xr.open_dataset(filename)

In [None]:
for dim in pco2.dims:
    print(dim)

In [None]:
for var in pco2.variables:
    print(var)

In [None]:
# Helper function to plot QC results using Bokeh
def plot_ncresults(ncdata, var_name, results, title, test_name):

    time = np.array(ncdata.variables['time'])
    obs = np.array(ncdata.variables[var_name])
    qc_test = results[var_name]['qartod'][test_name]

    qc_pass = np.ma.masked_where(qc_test != 1, obs)
    qc_suspect = np.ma.masked_where(qc_test != 3, obs)
    qc_fail = np.ma.masked_where(qc_test != 4, obs)
    qc_notrun = np.ma.masked_where(qc_test != 2, obs)

    p1 = figure(x_axis_type="datetime", title=test_name + ' : ' + title)
    p1.grid.grid_line_alpha=0.3
    p1.xaxis.axis_label = 'Time'
    p1.yaxis.axis_label = 'Observation Value'

    p1.line(time, obs,  legend_label='obs', color='#A6CEE3')
    p1.circle(time, qc_notrun, size=2, legend_label='qc not run', color='gray', alpha=0.2)
    p1.circle(time, qc_pass, size=4, legend_label='qc pass', color='green', alpha=0.5)
    p1.circle(time, qc_suspect, size=4, legend_label='qc suspect', color='orange', alpha=0.7)
    p1.circle(time, qc_fail, size=6, legend_label='qc fail', color='red', alpha=1.0)

    #output_file("qc.html", title="qc example")

    show(gridplot([[p1]], plot_width=800, plot_height=400))
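
The plotting helper above splits the observations by flag value (1 for pass, 2 for not run, 3 for suspect, 4 for fail) using masked arrays. As a small sketch of how that masking behaves, using made-up observations and flags rather than the pCO2 data:

```python
import numpy as np

# Made-up observations and QARTOD flags, for illustration only
obs = np.array([410.0, 650.0, 1300.0, 405.0])
flags = np.array([1, 3, 4, 2])

# masked_where hides every element where the condition is True,
# so "flags != 1" leaves only the passing observations visible
qc_pass = np.ma.masked_where(flags != 1, obs)
qc_suspect = np.ma.masked_where(flags != 3, obs)
qc_fail = np.ma.masked_where(flags != 4, obs)

# compressed() drops the masked entries and returns what is left
passing = qc_pass.compressed().tolist()
suspect = qc_suspect.compressed().tolist()
failing = qc_fail.compressed().tolist()
```

Because each masked array keeps the original length, it can be plotted against the same `time` axis, and the masked points are simply not drawn.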

# Option A

# Option B

# Option C: Manually run tests and store results

Store test configurations externally, pass your configuration and netCDF file to `ioos_qc`, and manually update the netCDF variables with the test results.

## Setup & Run a single QC test

In [None]:
# Create the config object
# The key "pco2_seawater" indicates which variable in the netcdf file this config should run against
config = {
    'pco2_seawater': {
        'qartod': {
            'gross_range_test': {
                'suspect_span': [200, 600],
                'fail_span': [0, 1200]
            }
        }
    }
}

qc = NcQcConfig(config)
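
To make the two spans in the config above concrete, here is a rough pure-Python sketch of the gross range logic. It is an illustration only, not the `ioos_qc` implementation: the function name `gross_range_flags` is hypothetical, the input values are made up, and handling of missing data is left out.

```python
def gross_range_flags(values, fail_span, suspect_span):
    """Illustrative only: flag each value against the two spans."""
    fail_lo, fail_hi = fail_span
    suspect_lo, suspect_hi = suspect_span
    flags = []
    for value in values:
        if value < fail_lo or value > fail_hi:
            flags.append(4)  # fail: outside the widest acceptable range
        elif value < suspect_lo or value > suspect_hi:
            flags.append(3)  # suspect: valid but outside the expected range
        else:
            flags.append(1)  # pass
    return flags

# Made-up pCO2 values checked against the spans from the config above
example_flags = gross_range_flags(
    [150.0, 420.0, 700.0, 1500.0],
    fail_span=[0, 1200],
    suspect_span=[200, 600],
)
```

In this sketch the span boundaries are inclusive, so a value of exactly 600 would still pass.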

In [None]:
# To run the QC on a netCDF file, pass the path to the file, not the loaded netCDF dataset
qc_gross_range = qc.run(filename)

In [None]:
# Check that the output is an OrderedDict and that the test ran correctly
print(qc_gross_range)

In [None]:
plot_ncresults(pco2, 'pco2_seawater', qc_gross_range, 'pCO2 seawater', 'gross_range_test')

## Multiple tests 

When using the `NcQcConfig` object with tests that require an ancillary variable, such as lat/lon for `location_test` or time for `rate_of_change_test`, you must extract the ancillary variables from the netCDF file and pass them to the `qc.run` method as keyword arguments.

In [None]:
nclat = np.array(pco2.variables['lat'])
nclon = np.array(pco2.variables['lon'])
nctime = np.array(pco2.variables['time'])
ncobs = np.array(pco2.variables['pco2_seawater'])

In [None]:
nctime

In [None]:
# Create the config object
config = {
    'pco2_seawater': {
        'qartod': {
            'gross_range_test': {
                'suspect_span': [200, 600],
                'fail_span': [0, 1200]
            },
            'location_test': {
                'bbox': [-124.5, 44, -123.5, 45]
            },
            'spike_test': {
                'suspect_threshold': 10,
                'fail_threshold': 100
            },
            'flat_line_test': {
                'tolerance': 1,
                'suspect_threshold': 3600,
                'fail_threshold': 86400
            },
        }
    }
}

qc = NcQcConfig(config)
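
The location test is the reason `lat` and `lon` are needed as ancillary variables. The sketch below shows the basic idea with a hypothetical helper, `location_flags`. It assumes the `bbox` is ordered as `[lon_min, lat_min, lon_max, lat_max]`, which matches the values in the config above, and it ignores details such as invalid coordinates that a real implementation would also check.

```python
def location_flags(lons, lats, bbox):
    """Illustrative only: flag positions that fall outside a bounding box."""
    # Assumed ordering: [lon_min, lat_min, lon_max, lat_max]
    lon_min, lat_min, lon_max, lat_max = bbox
    flags = []
    for lon, lat in zip(lons, lats):
        inside = lon_min <= lon <= lon_max and lat_min <= lat <= lat_max
        flags.append(1 if inside else 4)  # 1 = pass, 4 = fail
    return flags

# Made-up positions: the first is inside the box, the second is too far east
example_location_flags = location_flags(
    lons=[-124.0, -120.0],
    lats=[44.6, 44.6],
    bbox=[-124.5, 44, -123.5, 45],
)
```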

In [None]:
qc_results = qc.run(filename, pco2_seawater={'lat':nclat,'lon':nclon,'tinp':nctime})
qc_results
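
Printing the full results can be hard to read for long time series. One option is to count how often each flag occurs per test. The sketch below uses a hypothetical helper, `summarize_flags`, and made-up results that follow the same `results[variable]['qartod'][test_name]` nesting used by the plotting function earlier. It assumes the flag arrays contain no masked entries.

```python
from collections import Counter

def summarize_flags(results, var_name, module="qartod"):
    """Count how often each flag value occurs for every test that was run."""
    summary = {}
    for test_name, flags in results[var_name][module].items():
        summary[test_name] = dict(Counter(int(flag) for flag in flags))
    return summary

# Made-up results with the same nesting as the results used in this notebook
fake_results = {
    "pco2_seawater": {
        "qartod": {
            "gross_range_test": [1, 1, 3, 4],
            "spike_test": [2, 1, 1, 2],
        }
    }
}
flag_counts = summarize_flags(fake_results, "pco2_seawater")
```

With real output you would call it as `summarize_flags(qc_results, 'pco2_seawater')`.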

In [None]:
plot_ncresults(pco2, 'pco2_seawater', qc_results, 'pCO2 seawater', 'flat_line_test')

In [None]:
plot_ncresults(pco2, 'pco2_seawater', qc_results, 'pCO2 seawater', 'spike_test')

In [None]:
plot_ncresults(pco2, 'pco2_seawater', qc_results, 'pCO2 seawater', 'gross_range_test')

In [None]:
plot_ncresults(pco2, 'pco2_seawater', qc_results, 'pCO2 seawater', 'location_test')

# TODO: Currently, the aggregate (roll-up) flag is not implemented for netCDF files
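
As a rough idea of what a roll-up involves, the sketch below combines the per-test flags for each observation into a single flag by taking the highest-priority one. The helper `aggregate_flags` and the precedence order (missing, then not evaluated, then pass, then suspect, then fail) are assumptions made for illustration, not the behavior of `ioos_qc`. Flag 9 is the QARTOD value for missing data.

```python
# Assumed precedence, from lowest to highest priority (an illustrative choice)
PRECEDENCE = [9, 2, 1, 3, 4]  # missing, not evaluated, pass, suspect, fail

def aggregate_flags(flag_lists):
    """Combine per-test flags into one flag per observation."""
    rank = {flag: position for position, flag in enumerate(PRECEDENCE)}
    aggregated = []
    # zip(*flag_lists) yields the flags from every test for one observation
    for flags_at_t in zip(*flag_lists):
        aggregated.append(max(flags_at_t, key=rank.__getitem__))
    return aggregated

# Made-up flags from three tests over four observations
example_aggregate = aggregate_flags([
    [1, 1, 3, 2],
    [1, 4, 1, 2],
    [2, 1, 1, 9],
])
```

Under this precedence a single failing test is enough to fail the observation, while an observation is only reported as not evaluated when no test produced a pass, suspect, or fail flag.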