# QARTOD - Single Test

This notebook shows the simplest use case for the IOOS QC package (`ioos_qc`): a single QARTOD test performed on a time series loaded into a Pandas DataFrame. It shows how to define the test configuration and how the output is structured, and it ends with an example of using the flags in data visualization.

# Setup

In [None]:
import numpy as np
import pandas as pd

from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show
from bokeh.plotting import output_notebook

from ioos_qc import qartod
from ioos_qc.config import Config
from ioos_qc.stores import PandasStore
from ioos_qc.streams import PandasStream

output_notebook()

# Load data

Load data from a .csv file hosted in the `ioos_qc` GitHub repository and put it into a Pandas DataFrame.

The data are water level observations from a fixed [station in Kotzebue, AK](https://www.google.com/maps?q=66.895035,-162.566752).

In [None]:
url = "https://github.com/ioos/ioos_qc/raw/master/docs/source/examples"
fname = f"{url}/water_level_example.csv"

variable_name = "sea_surface_height_above_sea_level"

data = pd.read_csv(fname, parse_dates=["time"])
data.head()

# Call test method directly

Each QARTOD test can be called directly, passing in the data and test parameters manually.

In [None]:
qc_results = qartod.spike_test(
    inp=data[variable_name],
    suspect_threshold=0.8,
    fail_threshold=3,
)

print(qc_results)
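The spike test returns one flag per input value. A quick way to see how those flags are distributed is to count each value. The sketch below uses a small hypothetical flag array (not output from this dataset) so that it runs on its own; in the notebook you could pass `qc_results` from the cell above instead.

```python
import numpy as np

def count_flags(flags):
    """Return a {flag_value: count} dict for an array of QARTOD flags."""
    # Any masked entries are dropped before counting
    values = np.ma.compressed(np.ma.asarray(flags))
    unique, counts = np.unique(values, return_counts=True)
    return {int(u): int(c) for u, c in zip(unique, counts)}

# Hypothetical flags for illustration only:
# 1 = pass, 2 = not evaluated, 3 = suspect, 4 = fail
example_flags = np.array([2, 1, 1, 3, 1, 4, 1, 2], dtype=np.uint8)
print(count_flags(example_flags))  # {1: 4, 2: 2, 3: 1, 4: 1}
```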

# QC configuration and running

While you can call qartod methods directly, we recommend using IOOS-QC's conceptual objects instead. This involves:
1. Creation of a Config (configuration) object
2. Translation of the input data to a Stream object
3. Application of the Config to the Stream
4. Translation of the results to a Store object
5. Combination of the original data and the results into a new DataFrame

Descriptions of each test and its inputs can be found in the [ioos_qc.qartod module documentation](https://ioos.github.io/ioos_qc/api/ioos_qc.html#module-ioos_qc.qartod).

[QartodFlags](https://ioos.github.io/ioos_qc/api/ioos_qc.html#ioos_qc.qartod.QartodFlags) defines the flag meanings.
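For reference, `QartodFlags` uses these primary values: 1 = GOOD, 2 = UNKNOWN (not evaluated), 3 = SUSPECT, 4 = FAIL, 9 = MISSING. The sketch below mirrors those values in a local `IntEnum` named `Flag` (a hypothetical stand-in, not the library class) to turn numeric flags into readable labels. In practice you would compare against the attributes of `ioos_qc.qartod.QartodFlags` directly.

```python
from enum import IntEnum

class Flag(IntEnum):
    """Local mirror of the QARTOD primary flag values (illustrative stand-in)."""
    GOOD = 1
    UNKNOWN = 2
    SUSPECT = 3
    FAIL = 4
    MISSING = 9

def flag_labels(flags):
    """Map numeric flags to names; values outside the enum become 'INVALID'."""
    labels = []
    for value in flags:
        try:
            labels.append(Flag(int(value)).name)
        except ValueError:
            labels.append("INVALID")
    return labels

print(flag_labels([1, 3, 4, 2, 9]))  # ['GOOD', 'SUSPECT', 'FAIL', 'UNKNOWN', 'MISSING']
```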


The configuration object can be initialized using a dictionary or a YAML file. Here is one example:

In [None]:
qc_config = {
    "streams": {
        variable_name: {
            "qartod": {
                "spike_test": {
                    "suspect_threshold": 0.8,
                    "fail_threshold": 3,
                },
            },
        },
    },
}
qc = Config(qc_config)

And now we can run the test.

In [None]:
# Convert the data to a Stream (Pandas dataframe to a PandasStream)
pandas_stream = PandasStream(data)

# Pass the configuration to the Stream's run method to create a results generator
result_generator = pandas_stream.run(qc)

# Put the results in a new Pandas dataframe and combine with the original data
store = PandasStore(result_generator)
results_store = store.save(write_data=False, write_axes=False)
qc_results = pd.concat([data, results_store], axis=1)

qc_results.head()
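The result columns in this notebook follow the pattern `<stream>_<module>_<test>`, which is why the plotting code reads `sea_surface_height_above_sea_level_qartod_spike_test`. The sketch below assumes that pattern holds for every stream and test in a configuration (it is inferred from the column names used here, not taken from the library source) and derives the expected column names from a config dictionary. The helper `expected_result_columns` is hypothetical.

```python
def expected_result_columns(config):
    """List result column names as <stream>_<module>_<test> (assumed naming pattern)."""
    columns = []
    for stream_id, modules in config["streams"].items():
        for module_name, tests in modules.items():
            for test_name in tests:
                columns.append(f"{stream_id}_{module_name}_{test_name}")
    return columns

# Same structure as the qc_config dictionary used in this notebook
example_config = {
    "streams": {
        "sea_surface_height_above_sea_level": {
            "qartod": {
                "spike_test": {"suspect_threshold": 0.8, "fail_threshold": 3},
            },
        },
    },
}
print(expected_result_columns(example_config))
# ['sea_surface_height_above_sea_level_qartod_spike_test']
```

In the notebook, `set(expected_result_columns(qc_config)) <= set(qc_results.columns)` is one quick way to check that the store produced the columns you expect.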

These results can be visualized using Bokeh.

Passing the results to Bokeh through a `ColumnDataSource` avoids a timezone-related issue.

In [None]:
# create a ColumnDataSource by passing the dataframe of results and original data
source = ColumnDataSource(data=qc_results)
title = "Water Level [MHHW] [m] : Kotzebue, AK"

p1 = figure(x_axis_type="datetime", title=f"Spike Test : {title}")
p1.grid.grid_line_alpha = 0.3
p1.xaxis.axis_label = "Time"
p1.yaxis.axis_label = "Spike Test Result"
p1.line(x='time', y='sea_surface_height_above_sea_level_qartod_spike_test', source=source, color="blue")

show(gridplot([[p1]], width=800, height=400))

# Alternative Configuration Method

Here is the same example, but loading the configuration from a YAML file (`spike_test.yaml`) instead.
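The contents of `spike_test.yaml` are not shown in this notebook. Assuming the file mirrors the dictionary configuration above, the sketch below writes an equivalent YAML file. The helper `ensure_config_file` is hypothetical, and it only writes the file when none exists at that path, so an existing `spike_test.yaml` is left untouched.

```python
from pathlib import Path

# Assumed YAML equivalent of the qc_config dictionary defined earlier
SPIKE_TEST_YAML = """\
streams:
  sea_surface_height_above_sea_level:
    qartod:
      spike_test:
        suspect_threshold: 0.8
        fail_threshold: 3
"""

def ensure_config_file(path="./spike_test.yaml", text=SPIKE_TEST_YAML):
    """Write the YAML config to `path` unless a file is already there."""
    config_path = Path(path)
    if not config_path.exists():
        config_path.write_text(text)
    return config_path
```

Calling `ensure_config_file()` before `Config("./spike_test.yaml")` makes sure the file the next cell expects is present.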

In [None]:
qc = Config("./spike_test.yaml")

# Convert the data to a Stream (Pandas dataframe to a PandasStream)
pandas_stream = PandasStream(data)

# Pass the run method the config to use
result_generator = pandas_stream.run(qc)

# Put the results in a new Pandas dataframe and combine with the original data
store = PandasStore(result_generator)
results_store = store.save(write_data=False, write_axes=False)
qc_results = pd.concat([data, results_store], axis=1)
qc_results.head()

# Using the Flags

The array of flags can then be used to split the data by flag value and create color-coded plots.
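The plotting function below builds one series per flag value with `np.ma.masked_where`, masking every point whose flag does not match so that each scatter layer shows a single category. A minimal sketch of that splitting step, using hypothetical toy values and no Bokeh; `split_by_flag` is an illustrative helper, not part of `ioos_qc`:

```python
import numpy as np

def split_by_flag(values, flags, flag_values=(1, 2, 3, 4)):
    """Return {flag: masked array} keeping only the points whose flag matches."""
    values = np.asarray(values, dtype=float)
    flags = np.asarray(flags)
    return {f: np.ma.masked_where(flags != f, values) for f in flag_values}

# Hypothetical water levels and flags, for illustration only
levels = [0.10, 0.12, 2.50, 0.11, 0.95]
flags = [2, 1, 4, 1, 3]
layers = split_by_flag(levels, flags)
print(layers[1].compressed())  # only the two points flagged 1 remain
```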

In [None]:
def plot_results(data, var_name, test_name, title):
    """Plot a timeseries of the original data colored by quality flag.

    Args:
        data: pd.DataFrame of original data and results, including a time column
        var_name: string name of the variable to plot
        test_name: name of the test whose flags are used for coloring
        title: string to add to the plot title
    """

    # create a ColumnDataSource by passing the dataframe of original data and results
    source = ColumnDataSource(data=data)
    qc_test = data[var_name + '_qartod_' + test_name]

    # Create a separate series for each flag value, masking every other point
    # (QARTOD flags: 1 = GOOD, 2 = UNKNOWN/not evaluated, 3 = SUSPECT, 4 = FAIL)
    source.data["qc_pass"] = np.ma.masked_where(qc_test != 1, data[var_name])
    source.data["qc_suspect"] = np.ma.masked_where(qc_test != 3, data[var_name])
    source.data["qc_fail"] = np.ma.masked_where(qc_test != 4, data[var_name])
    source.data["qc_notrun"] = np.ma.masked_where(qc_test != 2, data[var_name])

    # start the figure
    p1 = figure(x_axis_type="datetime", title=test_name + " : " + title)
    p1.grid.grid_line_alpha = 0.3
    p1.xaxis.axis_label = "Time"
    p1.yaxis.axis_label = "Observation Value"

    # plot the data, and the data colored by flag
    p1.line(x='time', y=var_name, source=source, legend_label="obs", color="#A6CEE3")
    p1.scatter(
        x='time', y='qc_notrun', source=source,
        size=2, color="gray", alpha=0.2,
        legend_label="qc not run",
    )
    p1.scatter(
        x='time', y='qc_pass', source=source,
        size=4, color="green", alpha=0.5,
        legend_label="qc pass",
    )
    p1.scatter(
        x='time', y='qc_suspect', source=source,
        size=4, color="orange", alpha=0.7,
        legend_label="qc suspect",
    )
    p1.scatter(
        x='time', y='qc_fail', source=source,
        size=6, color="red", alpha=1.0,
        legend_label="qc fail",
    )

    # show the plot
    show(gridplot([[p1]], width=800, height=400))

In [None]:
plot_results(qc_results, variable_name, "spike_test", title)

Plot the flag values again (the `p1` figure created earlier) for comparison.

In [None]:
show(gridplot([[p1]], width=800, height=400))