# QARTOD - NetCDF Examples

This notebook provides examples of running QARTOD on a netCDF file. For background, see [NcQcConfig Usage](https://ioos.github.io/ioos_qc/usage.html#ncqcconfig) in the docs.

There are several ways to integrate `ioos_qc` into a netCDF-based workflow.

**Option A:** Store test configurations externally, pass your configuration and netCDF file to `ioos_qc`, and manually update netCDF variables with the test results
  * In this case, you extract variables from the netCDF file, use `ioos_qc` methods to run tests, and then manually update the netCDF file with the results
  * This provides the most control, but doesn't take advantage of shared code in the `ioos_qc` library
  * It's up to you to ensure that the resulting netCDF file is self-describing and CF-compliant

**Option B:** Store test configurations externally, then pass your configuration and netCDF file to `ioos_qc`, and let it run tests and update the file with results
  * This takes advantage of `ioos_qc` code to store results and configuration in the netCDF file, and to ensure a self-describing, CF-compliant file
  * Managing your test configurations outside the file works better when dealing with a large number of datasets/configurations

**Option C:** Store test configurations in your netCDF file, then pass that file to `ioos_qc` and let it run tests and update the file with results
  * You only need to add test configurations to the file once; after that, you can rerun the tests on the same file as often as needed
  * This option is the most portable, since the data, configuration, and results are all in one place
  * The downside is that test configurations are harder to manage, since they are stored in each file instead of a common external location
  
  

In [1]:
# Setup directories
from pathlib import Path
basedir = Path().absolute()
libdir = basedir.parent.parent.parent

# Other imports
import pandas as pd
import numpy as np
import xarray as xr
from datetime import datetime
import netCDF4 as nc4

import tempfile
import os
import shutil

from bokeh.layouts import gridplot
from bokeh.plotting import figure, show, output_file, output_notebook
output_notebook()

In [2]:
# # Install QC library
# !pip install git+https://github.com/ioos/ioos_qc.git

# # Alternative installation (install specific branch):
# !pip uninstall -y ioos_qc
# !pip install git+https://github.com/ioos/ioos_qc.git@BRANCHNAME

# Alternative installation (run with local updates):
!pip uninstall -y ioos_qc
import sys
sys.path.append(str(libdir))
    
from ioos_qc.config import NcQcConfig
from ioos_qc import qartod

Found existing installation: ioos-qc 1.0.0
Uninstalling ioos-qc-1.0.0:
  Successfully uninstalled ioos-qc-1.0.0


## Load the netCDF dataset


The example netCDF dataset contains data from a pCO2 sensor on the Ocean Observatories Initiative (OOI) Coastal Endurance Inshore Surface Mooring instrument frame, at 7 meters depth, located on the Oregon Shelf break.


In [3]:
filename = basedir.joinpath('pco2_netcdf_example.nc')
pco2 = xr.open_dataset(filename)

In [4]:
for dim in pco2.dims:
    print(dim)

spectrum
time


In [5]:
for var in pco2.variables:
    print(var)

obs
time
deployment
id
dcl_controller_timestamp
driver_timestamp
ingestion_timestamp
internal_timestamp
light_measurements
passed_checksum
port_timestamp
preferred_timestamp
provenance
record_time
record_type
thermistor_raw
unique_id
voltage_battery
absorbance_ratio_434
absorbance_ratio_620
pco2w_thermistor_temperature
absorbance_blank_434
absorbance_blank_620
pco2_seawater
absorbance_ratio_434_qc_executed
absorbance_ratio_434_qc_results
absorbance_ratio_620_qc_executed
absorbance_ratio_620_qc_results
pco2w_thermistor_temperature_qc_executed
pco2w_thermistor_temperature_qc_results
pco2_seawater_qc_executed
pco2_seawater_qc_results
lat
lon


In [6]:
# Plot raw data
data=pco2['pco2_seawater']
t = np.array(pco2['time'])
x = np.array(data)

p1 = figure(x_axis_type="datetime", title='pco2_seawater')
p1.grid.grid_line_alpha=0.3
p1.xaxis.axis_label = 'Time'
p1.yaxis.axis_label = data.units
p1.line(t, x)

show(gridplot([[p1]], plot_width=800, plot_height=400))

# QC Configuration

Here we define the generic config object for multiple QARTOD tests, plus the aggregate/rollup flag.

In [7]:
# The key "pco2_seawater" indicates which variable in the netcdf file this config should run against
config = {
    'pco2_seawater': {
        'qartod': {
            'gross_range_test': {
                'suspect_span': [200, 2400],
                'fail_span': [0, 3000]
            },
            'spike_test': {
                'suspect_threshold': 500,
                'fail_threshold': 1000                
            },
            'location_test': {
                'bbox': [-124.5, 44, -123.5, 45]
            },
            'flat_line_test': {
                'tolerance': 1,
                'suspect_threshold': 3600,
                'fail_threshold': 86400
            },
            'aggregate': {}
        }
    }
}
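
To make the `gross_range_test` entry concrete, here is a minimal NumPy sketch of how `fail_span` and `suspect_span` could map values to QARTOD flags (1 = pass, 3 = suspect, 4 = fail). It is an illustration written for this notebook, not the `ioos_qc` implementation: the function name `gross_range_flags` is hypothetical, span boundaries are assumed to be inclusive, and missing-value handling is left out.

```python
import numpy as np

# QARTOD flag values
PASS, SUSPECT, FAIL = 1, 3, 4

def gross_range_flags(values, fail_span, suspect_span):
    """Hypothetical helper: flag values against fail and suspect spans."""
    values = np.asarray(values, dtype=float)
    flags = np.full(values.shape, PASS, dtype=np.uint8)
    # Outside the suspect span (but possibly still inside the fail span)
    flags[(values < suspect_span[0]) | (values > suspect_span[1])] = SUSPECT
    # Outside the fail span overrides suspect
    flags[(values < fail_span[0]) | (values > fail_span[1])] = FAIL
    return flags

# Same spans as the config above, applied to a few made-up values
example_flags = gross_range_flags([-5, 150, 800, 2500, 3200],
                                  fail_span=[0, 3000],
                                  suspect_span=[200, 2400])
print(example_flags)  # [4 3 1 3 4]
```

With the spans from the config, a value of 2500 is only suspect, while 3200 falls outside `fail_span` and fails.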


In [8]:
# Helper method to plot QC results using Bokeh
def plot_ncresults(ncdata, var_name, results, title, test_name):

    time = np.array(ncdata.variables['time'])
    obs = np.array(ncdata.variables[var_name])
    qc_test = results[var_name]['qartod'][test_name]

    # QARTOD flag values: 1 = pass, 2 = not evaluated, 3 = suspect, 4 = fail
    # Mask everything except the flag of interest so each class plots separately
    qc_pass = np.ma.masked_where(qc_test != 1, obs)
    num_pass = (qc_test == 1).sum()
    qc_suspect = np.ma.masked_where(qc_test != 3, obs)
    num_suspect = (qc_test == 3).sum()
    qc_fail = np.ma.masked_where(qc_test != 4, obs)
    num_fail = (qc_test == 4).sum()
    qc_notrun = np.ma.masked_where(qc_test != 2, obs)

    p1 = figure(x_axis_type="datetime", title=f'{test_name} : {title} : p/s/f={num_pass}/{num_suspect}/{num_fail}')
    p1.grid.grid_line_alpha=0.3
    p1.xaxis.axis_label = 'Time'
    p1.yaxis.axis_label = 'Observation Value'

    p1.line(time, obs,  legend_label='obs', color='#A6CEE3')
    p1.circle(time, qc_notrun, size=2, legend_label='qc not run', color='gray', alpha=0.2)
    p1.circle(time, qc_pass, size=4, legend_label='qc pass', color='green', alpha=0.5)
    p1.circle(time, qc_suspect, size=4, legend_label='qc suspect', color='orange', alpha=0.7)
    p1.circle(time, qc_fail, size=6, legend_label='qc fail', color='red', alpha=1.0)

    #output_file("qc.html", title="qc example")

    show(gridplot([[p1]], plot_width=800, plot_height=400))

# Option A: Manually run tests and store results

Store test configurations externally, pass your configuration and netcdf file to `ioos_qc`, and manually update netcdf variables with results of the test

In [9]:
# Create NcQcConfig object 
# Note: For tests that need tinp, zinp, etc, use args to define the t, x, y, z dimensions
#       In this case, we need latitude and longitude for the location test
qc = NcQcConfig(config, lon='lon', lat='lat')

# Run tests
# Note: pass in the path to the file, *not* the netCDF dataset object
results = qc.run(filename)

In [10]:
# The results are an OrderedDict, with an entry for each variable and test
results

OrderedDict([('pco2_seawater',
              OrderedDict([('qartod',
                            OrderedDict([('gross_range_test',
                                          masked_array(data=[4, 4, 1, ..., 1, 1, 1],
                                                       mask=False,
                                                 fill_value=999999,
                                                      dtype=uint8)),
                                         ('spike_test',
                                          masked_array(data=[2, 4, 4, ..., 1, 1, 2],
                                                       mask=False,
                                                 fill_value=999999,
                                                      dtype=uint8)),
                                         ('location_test',
                                          masked_array(data=[1, 1, 1, ..., 1, 1, 1],
                                                       mask=False,
                        

In [11]:
plot_ncresults(pco2, 'pco2_seawater', results, 'pCO2 seawater', 'gross_range_test')

In [12]:
plot_ncresults(pco2, 'pco2_seawater', results, 'pCO2 seawater', 'spike_test')

In [13]:
plot_ncresults(pco2, 'pco2_seawater', results, 'pCO2 seawater', 'flat_line_test')

In [14]:
plot_ncresults(pco2, 'pco2_seawater', results, 'pCO2 seawater', 'location_test')

In [15]:
# To see overall results, use the aggregate test
plot_ncresults(pco2, 'pco2_seawater', results, 'pCO2 seawater', 'aggregate')
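
As a rough illustration of what an aggregate/rollup flag represents, the sketch below combines per-test flag arrays so that the most severe flag wins for each observation. It assumes a precedence of fail over suspect over pass over not-evaluated over missing, which matches the printed results in this notebook (an observation that passes one test and is not evaluated by another rolls up to pass). The helper `rollup_flags` is hypothetical and is not the library's own code.

```python
import numpy as np

# QARTOD flag values
PASS, NOT_EVALUATED, SUSPECT, FAIL, MISSING = 1, 2, 3, 4, 9

def rollup_flags(flag_arrays):
    """Hypothetical helper: combine per-test flags into one flag per observation."""
    flag_arrays = [np.asarray(f) for f in flag_arrays]
    result = np.full(flag_arrays[0].shape, MISSING, dtype=np.uint8)
    # Later entries in this tuple take precedence over earlier ones
    for flag in (MISSING, NOT_EVALUATED, PASS, SUSPECT, FAIL):
        for flags in flag_arrays:
            result[flags == flag] = flag
    return result

# Made-up flags for two tests over four observations
gross_range = np.array([4, 1, 1, 1])
spike = np.array([2, 3, 1, 2])
rolled_up = rollup_flags([gross_range, spike])
print(rolled_up)  # [4 3 1 1]
```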

In [16]:
# Store results manually
# This is just a simple example and stores the aggregate test flag as a variable. 
# You can expand upon this, or use the ioos_qc library to store the results for you (see subsequent examples)

# Create output file
outfile_a = os.path.join(tempfile.gettempdir(), 'out_a.nc')
shutil.copy(filename, outfile_a)

# Store results
with nc4.Dataset(outfile_a, 'r+') as nc_file:
    qc_agg = nc_file.createVariable('qartod_aggregate', 'u1', ('time',), fill_value=2)
    qc_agg[:] = results['pco2_seawater']['qartod']['aggregate']
    

In [17]:
# Print results 
out_a = xr.open_dataset(outfile_a)
print(out_a['qartod_aggregate'])

<xarray.DataArray 'qartod_aggregate' (time: 7339)>
array([4., 4., 4., ..., 3., 1., 1.], dtype=float32)
Coordinates:
    obs      (time) int64 ...
  * time     (time) datetime64[ns] 2015-10-08T19:35:30.569000448 ... 2015-04-...
    lat      (time) float64 ...
    lon      (time) float64 ...
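
The `qartod_aggregate` variable written above carries no metadata, so a reader of the file cannot tell what the numbers mean. One way to make it more self-describing is to attach CF-style flag attributes. The sketch below only builds the attribute dictionary; the helper name `qartod_flag_attrs`, the `long_name` text, and the choice of `aggregate_quality_flag` as the `standard_name` are assumptions for illustration, not something this notebook or `ioos_qc` prescribes.

```python
import numpy as np

def qartod_flag_attrs(long_name):
    """Hypothetical helper: CF flag attributes for a QARTOD flag variable."""
    return {
        'long_name': long_name,
        # Assumed standard name for a rollup flag
        'standard_name': 'aggregate_quality_flag',
        # flag_values should have the same data type as the flag variable (u1)
        'flag_values': np.array([1, 2, 3, 4, 9], dtype=np.uint8),
        # One space-separated meaning per flag value, in the same order
        'flag_meanings': 'PASS NOT_EVALUATED SUSPECT FAIL MISSING',
    }

attrs = qartod_flag_attrs('QARTOD aggregate flag for pco2_seawater')
for name, value in attrs.items():
    print(name, '=', value)
```

Inside the `with nc4.Dataset(outfile_a, 'r+')` block, the dictionary could be applied with `qc_agg.setncatts(attrs)`. To link the flag to the data, the name `qartod_aggregate` could also be appended to the `ancillary_variables` attribute of `pco2_seawater`, taking care not to overwrite any value already stored there.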


# Option B

Store test configurations externally, then pass your configuration and netcdf file to `ioos_qc`, and let it run tests and update the file with results

In [18]:
# We already have results from the previous run, but re-create them here for completeness
qc = NcQcConfig(config, lon='lon', lat='lat')
results = qc.run(filename)
results

OrderedDict([('pco2_seawater',
              OrderedDict([('qartod',
                            OrderedDict([('gross_range_test',
                                          masked_array(data=[4, 4, 1, ..., 1, 1, 1],
                                                       mask=False,
                                                 fill_value=999999,
                                                      dtype=uint8)),
                                         ('spike_test',
                                          masked_array(data=[2, 4, 4, ..., 1, 1, 2],
                                                       mask=False,
                                                 fill_value=999999,
                                                      dtype=uint8)),
                                         ('location_test',
                                          masked_array(data=[1, 1, 1, ..., 1, 1, 1],
                                                       mask=False,
                        

In [19]:
# Create output file
outfile_b = os.path.join(tempfile.gettempdir(), 'out_b.nc')
shutil.copy(filename, outfile_b)

# Use the library to store the results to the netcdf file
qc.save_to_netcdf(outfile_b, results)


In [20]:
# Explore results: qc test variables are named [variable_name]_qartod_[test_name]
out_b = xr.open_dataset(outfile_b)
print(out_b)

<xarray.Dataset>
Dimensions:                                   (spectrum: 14, time: 7339)
Coordinates:
    obs                                       (time) int64 ...
  * time                                      (time) datetime64[ns] 2015-10-0...
    lat                                       (time) float64 ...
    lon                                       (time) float64 ...
Dimensions without coordinates: spectrum
Data variables:
    deployment                                (time) int32 ...
    id                                        (time) |S64 ...
    dcl_controller_timestamp                  (time) object ...
    driver_timestamp                          (time) datetime64[ns] ...
    ingestion_timestamp                       (time) datetime64[ns] ...
    internal_timestamp                        (time) datetime64[ns] ...
    light_measurements                        (time, spectrum) float32 ...
    passed_checksum                           (time) float32 ...
    port_timestamp   

In [21]:
# Gross range test
# Note how the config used is stored in the ioos_qc_* attributes of this variable
out_b['pco2_seawater_qartod_gross_range_test']

In [22]:
# Aggregate/rollup flag
out_b['pco2_seawater_qartod_aggregate']
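
Once the aggregate flag is stored next to the data, a common next step is to keep only the observations that passed. The sketch below shows the idea on small made-up arrays with NumPy masked arrays; the helper `keep_passing` is hypothetical and the values are not taken from the example file.

```python
import numpy as np

PASS = 1  # QARTOD flag value for a passing observation

def keep_passing(values, flags):
    """Hypothetical helper: mask every observation whose flag is not PASS."""
    values = np.asarray(values, dtype=float)
    flags = np.asarray(flags)
    return np.ma.masked_where(flags != PASS, values)

obs = [410.2, 3500.0, 395.7, 2450.0]   # made-up pCO2 values
agg = [1, 4, 1, 3]                     # made-up aggregate flags
good = keep_passing(obs, agg)
print(good.compressed())  # [410.2 395.7]
```

With the dataset opened above, a similar selection could be written with xarray as `out_b['pco2_seawater'].where(out_b['pco2_seawater_qartod_aggregate'] == 1)`, which replaces non-passing values with NaN.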

# Option C

Store test configurations in your netcdf file, then pass that file to `ioos_qc` and let it run tests and update the file with results.

In Option B, we used the library to store both the results and the config in the netCDF file itself. We can now load that same file and run the tests again without redefining the config, so the file alone is enough to repeat the QC run.

In [23]:
# Create a copy of the output from B 
outfile_c = os.path.join(tempfile.gettempdir(), 'out_c.nc')
shutil.copy(outfile_b, outfile_c)

# Load this file into the NcQcConfig object
qc = NcQcConfig(outfile_c, lon='lon', lat='lat')

# Run tests and store results
results_c = qc.run(outfile_c)
qc.save_to_netcdf(outfile_c, results_c)


In [24]:
# Explore results
out_c = xr.open_dataset(outfile_c)
print(out_c)

<xarray.Dataset>
Dimensions:                                   (spectrum: 14, time: 7339)
Coordinates:
    obs                                       (time) int64 ...
  * time                                      (time) datetime64[ns] 2015-10-0...
    lat                                       (time) float64 ...
    lon                                       (time) float64 ...
Dimensions without coordinates: spectrum
Data variables:
    deployment                                (time) int32 ...
    id                                        (time) |S64 ...
    dcl_controller_timestamp                  (time) object ...
    driver_timestamp                          (time) datetime64[ns] ...
    ingestion_timestamp                       (time) datetime64[ns] ...
    internal_timestamp                        (time) datetime64[ns] ...
    light_measurements                        (time, spectrum) float32 ...
    passed_checksum                           (time) float32 ...
    port_timestamp   