## Parsing and Plotting QC Algorithm Results and Annotations

In this example we will learn how to programmatically download OOI JSON data and work with the QC algorithm results as well as annotations. We will use data from the Global Irminger Sea Apex Surface Mooring - Near Surface Instrument Frame - Dissolved Oxygen for this example, but the mechanics apply to all datasets that are processed through the OOI Cyberinfrastructure (CI) system. You will learn:

* how to find the data you are looking for
* how to use the machine to machine API to request JSON data
* how to explore and interactively plot data using bokeh
* how to parse and visualize QC results
* how to parse and visualize Annotations

For the instrument in this example, you will need the Reference Designator, Stream and Data Delivery Method to make the request to the M2M API. More information about the instrument can be found here:
http://ooi.visualocean.net/instruments/view/GI01SUMO-RID16-06-DOSTAD000

![GI01SUMO-RID16-06-DOSTAD000](../../images/GI01SUMO-RID16-06-DOSTAD000.png)
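The reference designator at the end of the URL above encodes three of the request inputs. As a minimal sketch, assuming the designator always follows the pattern subsite-node-sensor and that only the sensor part contains a further hyphen (as it does for this instrument), it can be split as follows. `split_refdes` is a hypothetical helper written for this example, not part of any OOI library.

```python
def split_refdes(refdes):
    """Split a reference designator into (subsite, node, sensor).

    Assumes the pattern SUBSITE-NODE-SENSOR, where SENSOR may itself
    contain a hyphen (e.g. '06-DOSTAD000').
    """
    # maxsplit=2 keeps the hyphen inside the sensor part intact
    subsite, node, sensor = refdes.split('-', 2)
    return subsite, node, sensor


print(split_refdes('GI01SUMO-RID16-06-DOSTAD000'))
# ('GI01SUMO', 'RID16', '06-DOSTAD000')
```

The Stream and Data Delivery Method are not part of the reference designator and still have to be looked up for the instrument.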

In [1]:
import requests
import datetime

Before we get started, log in at https://ooinet.oceanobservatories.org/ and obtain your <b>API username and API token</b> under your profile (top right corner), or use the credentials provided below.

In [2]:
username = 'OOIAPI-D8S960UXPK4K03'
token = 'IXL48EQ2XY'

Specify your inputs.

In [3]:
subsite = 'GI01SUMO'
node = 'RID16'
sensor = '06-DOSTAD000'
method = 'recovered_host'
stream = 'dosta_abcdjm_dcl_instrument_recovered'
beginDT = '2015-09-01T01:01:01.900Z'
endDT = '2016-03-01T01:01:01.900Z'

Build the GET request URL and send the request to the M2M API endpoint.

In [14]:
base_url = 'https://ooinet.oceanobservatories.org/api/m2m/12576/sensor/inv'  # no trailing slash; the join below adds the separators

data_request_url ='/'.join((base_url,subsite,node,sensor,method,stream))
params = {
    'beginDT':beginDT,
    'endDT':endDT,
    'limit':10000,   
}

r = requests.get(data_request_url, params=params,auth=(username, token))
data = r.json()
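When a request does not return what you expect, it can help to look at the full URL that was sent. The sketch below rebuilds that URL with only the standard library, assuming the same path layout and query parameters as the cell above. `build_data_request_url` is a hypothetical helper for illustration; `requests` performs the equivalent encoding internally.

```python
from urllib.parse import urlencode


def build_data_request_url(base_url, subsite, node, sensor, method, stream, **params):
    """Join the path components and append the URL-encoded query string."""
    # Strip any trailing slash so the join never produces '//'
    path = '/'.join([base_url.rstrip('/'), subsite, node, sensor, method, stream])
    if not params:
        return path
    return path + '?' + urlencode(params)


url = build_data_request_url(
    'https://ooinet.oceanobservatories.org/api/m2m/12576/sensor/inv',
    'GI01SUMO', 'RID16', '06-DOSTAD000',
    'recovered_host', 'dosta_abcdjm_dcl_instrument_recovered',
    beginDT='2015-09-01T01:01:01.900Z',
    endDT='2016-03-01T01:01:01.900Z',
    limit=10000,
)
print(url)
```

With `requests`, the URL actually sent is also available as `r.url`, and calling `r.raise_for_status()` before `r.json()` raises an exception on an HTTP error instead of trying to parse the error response.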

How many data points were returned?

In [15]:
len(data)

10001

Examine the content of the first data point.

In [16]:
data[0]

{'raw_temperature': 441.0,
 'estimated_oxygen_concentration_qc_executed': 9,
 'ctdbp_cdef_dcl_instrument_recovered-pressure': 11.967000007629395,
 'red_phase': 8.777000427246094,
 'dcl_controller_timestamp': 'empty',
 'dosta_abcdjm_cspp_tc_oxygen_qc_results': 13,
 'product_number': 4831,
 'estimated_oxygen_saturation_qc_results': 1,
 'driver_timestamp': 3718195882.4196177,
 'dissolved_oxygen_qc_executed': 29,
 'internal_timestamp': 0.0,
 'optode_temperature': 8.5,
 'serial_number': '457',
 'temp_compensated_phase': 32.02000045776367,
 'optode_temperature_qc_executed': 29,
 'ctdbp_cdef_dcl_instrument_recovered-temp': 8.668100357055664,
 'dissolved_oxygen_qc_results': 29,
 'calibrated_phase': 32.02000045776367,
 'ingestion_timestamp': 3718195887.715,
 'port_timestamp': 3650058060.025,
 'estimated_oxygen_saturation': 99.99099731445312,
 'optode_temperature_qc_results': 21,
 'pk': {'node': 'RID16',
  'stream': 'dosta_abcdjm_dcl_instrument_recovered',
  'subsite': 'GI01SUMO',
  'deployment'

Convert the JSON response to a pandas DataFrame and convert the timestamps, which are given in seconds since 1900-01-01, to datetimes.

In [19]:
import pandas as pd
import numpy as np
import json

In [20]:
df = pd.DataFrame.from_records(map(json.loads, map(json.dumps,data)))
df['time'] = pd.to_datetime(df['time'], unit='s', origin=pd.Timestamp('1900-01-01'))
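To make the time arithmetic visible, here is a minimal standard-library sketch of the same conversion for a single value. It uses the `port_timestamp` of the first record shown earlier (3650058060.025) and assumes, as the `pd.to_datetime` call does, that the value counts seconds since 1900-01-01 with no time zone attached. `ooi_seconds_to_datetime` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Reference epoch used by the data timestamps
EPOCH_1900 = datetime(1900, 1, 1)


def ooi_seconds_to_datetime(seconds):
    """Convert seconds since 1900-01-01 to a naive datetime."""
    return EPOCH_1900 + timedelta(seconds=seconds)


ts = ooi_seconds_to_datetime(3650058060.025)
print(ts)  # 2015-09-01 01:01:00.025000
```

The result falls at the very start of the requested time range, as expected for the first record.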

Extract the dissolved oxygen parameter for plotting.

In [21]:
time = list(df['time'].values)
oxygen = list(df['dissolved_oxygen'].values)

Plot the data.

In [22]:
import os
from bokeh.plotting import figure, output_file, reset_output, show, ColumnDataSource, save
from bokeh.models import BoxAnnotation
from bokeh.io import output_notebook

In [23]:
p = figure(width=800,
           height=400,
           title='Global Irminger Sea Apex Surface Mooring - Near Surface Instrument Frame - Dissolved Oxygen',
           x_axis_label='Time (GMT)',
           y_axis_label='Oxygen umol kg-1',
           x_axis_type='datetime')

p.circle(time, oxygen, fill_color='white', fill_alpha=0.2, size=4)
output_notebook()
show(p)

Keep only the time, the dissolved oxygen values and their QC columns.

In [24]:
df = df[['time', 'dissolved_oxygen','dissolved_oxygen_qc_results','dissolved_oxygen_qc_executed']]
df.head()

| | time | dissolved_oxygen | dissolved_oxygen_qc_results | dissolved_oxygen_qc_executed |
|---|---|---|---|---|
| 0 | 2015-09-01 01:01:00.025 | 280.942387 | 29 | 29 |
| 1 | 2015-09-01 01:01:02.025 | 280.709263 | 29 | 29 |
| 2 | 2015-09-01 01:18:01.721 | 280.943044 | 29 | 29 |
| 3 | 2015-09-01 01:48:00.990 | 280.627839 | 29 | 29 |
| 4 | 2015-09-01 02:18:02.252 | 280.96351 | 29 | 29 |


Each QC test is assigned one bit, and the flags for all tests are OR'd together to produce a single value for each data point. So, given a qc_executed value of 29, we can see which tests were run by reversing the process:

QC table
```
Test name              Bit position
                         15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
global_range_test         0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1
dataqc_localrangetest     0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0
dataqc_spiketest          0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0
dataqc_polytrendtest      0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  0
dataqc_stuckvaluetest     0  0  0  0  0  0  0  0  0  0  0  1  0  0  0  0
dataqc_gradienttest       0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0
dataqc_propagateflags     0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
```



In [25]:
np.unpackbits(np.array(29).astype('uint8'))

array([0, 0, 0, 1, 1, 1, 0, 1], dtype=uint8)

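The array lists the most significant bit first, so its last element is bit 0. If you compare this result to the table above, reading the array from right to left, you can see that the following tests were executed: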

```
global_range_test
dataqc_spiketest
dataqc_polytrendtest
dataqc_stuckvaluetest
```
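The same encoding and decoding can be written out in plain Python. This is a minimal sketch that copies the bit positions from the QC table above; `QC_BITS`, `combine_flags` and `decode_flags` are hypothetical names introduced for this example.

```python
# Bit position of each test, copied from the QC table above
QC_BITS = {
    'global_range_test': 0,
    'dataqc_localrangetest': 1,
    'dataqc_spiketest': 2,
    'dataqc_polytrendtest': 3,
    'dataqc_stuckvaluetest': 4,
    'dataqc_gradienttest': 5,
    'dataqc_propagateflags': 7,
}


def combine_flags(test_names):
    """OR together the bit of every named test into one integer."""
    value = 0
    for name in test_names:
        value |= 1 << QC_BITS[name]
    return value


def decode_flags(value):
    """Return the names of the tests whose bit is set in value."""
    return [name for name, bit in QC_BITS.items() if value & (1 << bit)]


executed = combine_flags(['global_range_test', 'dataqc_spiketest',
                          'dataqc_polytrendtest', 'dataqc_stuckvaluetest'])
print(executed, format(executed, '08b'))  # 29 00011101
print(decode_flags(29))
```

Decoding 29 returns the same four test names listed above, in bit order.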

We can craft a function that creates a new boolean variable for each test that was run, containing the pass/fail results from that test:

In [26]:
def parse_qc(df):
    # Work on a copy so the caller's DataFrame (possibly a slice) is not modified
    df = df.copy()
    # Names of the parameters that have QC columns, e.g. 'dissolved_oxygen'
    qc_vars = [x.split('_qc_results')[0] for x in df.columns if 'qc_results' in x]

    for var in qc_vars:
        qc_result = var + '_qc_results'
        qc_executed = var + '_qc_executed'
        # Bit position -> name of the new boolean column for that test
        names = {
            0: var + '_global_range_test',
            1: var + '_dataqc_localrangetest',
            2: var + '_dataqc_spiketest',
            3: var + '_dataqc_polytrendtest',
            4: var + '_dataqc_stuckvaluetest',
            5: var + '_dataqc_gradienttest',
            7: var + '_dataqc_propagateflags',
        }
        # OR all rows together in case a different set of tests was run on some data point. *This should never happen*
        executed = np.bitwise_or.reduce(df[qc_executed].values)
        executed_bits = np.unpackbits(executed.astype('uint8'))
        # unpackbits is most-significant-bit first, so reverse to start at bit 0
        for index, value in enumerate(executed_bits[::-1]):
            name = names.get(index)
            if value and name is not None:  # skip bits that have no named test
                mask = 2 ** index
                df[name] = (df[qc_result].values & mask) > 0
        df.drop([qc_executed, qc_result], axis=1, inplace=True)
    return df
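To see what the function computes for a single data point, the sketch below applies the same masking to one pair of values. It uses `optode_temperature_qc_executed` (29) and `optode_temperature_qc_results` (21) from the first record shown earlier, and follows the same interpretation as `parse_qc`: a set bit in the results value means the test passed. `TEST_NAMES` and `qc_pass_fail` are hypothetical names for this example.

```python
# Bit position -> test name, copied from the QC table
TEST_NAMES = {
    0: 'global_range_test',
    1: 'dataqc_localrangetest',
    2: 'dataqc_spiketest',
    3: 'dataqc_polytrendtest',
    4: 'dataqc_stuckvaluetest',
    5: 'dataqc_gradienttest',
    7: 'dataqc_propagateflags',
}


def qc_pass_fail(executed, results):
    """Return {test_name: passed} for each test whose executed bit is set."""
    outcome = {}
    for bit, name in TEST_NAMES.items():
        mask = 1 << bit
        if executed & mask:
            # The test ran; its result bit tells us whether it passed
            outcome[name] = bool(results & mask)
    return outcome


print(qc_pass_fail(29, 21))
# {'global_range_test': True, 'dataqc_spiketest': True,
#  'dataqc_polytrendtest': False, 'dataqc_stuckvaluetest': True}
```

Here 21 is 29 with bit 3 cleared, so under this interpretation the polytrend test is the only one of the four executed tests that did not pass for that optode temperature value.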

Run the function. The result gives us the QC algorithm result for every data point. True = test passed.

In [27]:
df_qc = parse_qc(df)
df_qc.head()

| | time | dissolved_oxygen | dissolved_oxygen_global_range_test | dissolved_oxygen_dataqc_spiketest | dissolved_oxygen_dataqc_polytrendtest | dissolved_oxygen_dataqc_stuckvaluetest |
|---|---|---|---|---|---|---|
| 0 | 2015-09-01 01:01:00.025 | 280.942387 | True | True | True | True |
| 1 | 2015-09-01 01:01:02.025 | 280.709263 | True | True | True | True |
| 2 | 2015-09-01 01:18:01.721 | 280.943044 | True | True | True | True |
| 3 | 2015-09-01 01:48:00.990 | 280.627839 | True | True | True | True |
| 4 | 2015-09-01 02:18:02.252 | 280.96351 | True | True | True | True |


Select data points that failed the global range test, for example.

In [28]:
df_qc[~df_qc['dissolved_oxygen_global_range_test']]  # ~ inverts the boolean column, selecting the failures

| | time | dissolved_oxygen | dissolved_oxygen_global_range_test | dissolved_oxygen_dataqc_spiketest | dissolved_oxygen_dataqc_polytrendtest | dissolved_oxygen_dataqc_stuckvaluetest |
|---|---|---|---|---|---|---|
| 8717 | 2016-02-06 16:32:46.568 | 400.346893 | False | True | True | True |
| 8878 | 2016-02-09 14:48:02.315 | 419.155457 | False | True | True | True |
| 8879 | 2016-02-09 15:18:01.063 | 446.684005 | False | True | True | True |
| 8880 | 2016-02-09 15:33:00.387 | 409.086437 | False | True | True | True |
| 8984 | 2016-02-11 13:03:02.232 | 410.509816 | False | True | True | True |
| 9537 | 2016-02-21 14:45:30.841 | 400.732683 | False | True | True | True |
| 9541 | 2016-02-21 16:30:20.460 | 402.557524 | False | True | True | True |
| 9645 | 2016-02-23 13:48:02.166 | 405.283286 | False | True | True | True |
| 9646 | 2016-02-23 14:18:02.873 | 451.285173 | False | True | True | True |
| 9861 | 2016-02-27 12:17:44.221 | 403.570887 | False | True | True | True |


Plot points that failed the test in red.

In [29]:
colormap = {False: 'red', True: 'green'}
colors = [colormap[x] for x in df_qc['dissolved_oxygen_global_range_test']]

In [30]:
p = figure(width=800,
           height=400,
           title='Global Irminger Sea Apex Surface Mooring - Near Surface Instrument Frame - Dissolved Oxygen',
           x_axis_label='Time (GMT)',
           y_axis_label='Oxygen umol kg-1',
           x_axis_type='datetime')

p.circle(time, oxygen, color=colors, fill_alpha=0.2, size=4)
output_notebook()
show(p)

Import annotations for 'GI01SUMO-RID16-06-DOSTAD000'. See the request_annotations.ipynb notebook for more details.

In [31]:
import netCDF4 as nc

In [32]:
beginDT = int(nc.date2num(datetime.datetime.strptime("2012-01-01T01:00:01Z",'%Y-%m-%dT%H:%M:%SZ'),'seconds since 1970-01-01')*1000)
endDT = int(nc.date2num(datetime.datetime.utcnow(),'seconds since 1970-01-01')*1000)

anno_base_url = 'https://ooinet.oceanobservatories.org/api/m2m/12580/anno/find?' # base url and port for annotations

params = { # define parameters
    'beginDT':beginDT,
    'endDT':endDT,
    'refdes':'GI01SUMO-RID16-06-DOSTAD000'
}

r = requests.get(anno_base_url, params=params,auth=(username, token)) # send data request

anno_data = pd.read_json(json.dumps(r.json())) # convert json response to pandas dataframe
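The annotation endpoint expects `beginDT` and `endDT` as integer milliseconds since 1970-01-01. As an alternative sketch that needs only the standard library, the same values can be computed with `datetime`, assuming the input strings are in UTC as the trailing `Z` suggests. `to_epoch_ms` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_epoch_ms(iso_string):
    """Convert 'YYYY-MM-DDTHH:MM:SSZ' (UTC) to integer milliseconds since 1970-01-01."""
    dt = datetime.strptime(iso_string, '%Y-%m-%dT%H:%M:%SZ')
    # Mark the parsed time as UTC so timestamp() does not apply the local zone
    dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


begin_ms = to_epoch_ms('2012-01-01T01:00:01Z')
now_ms = int(datetime.now(timezone.utc).timestamp() * 1000)
print(begin_ms)  # 1325379601000
```

Using a timezone-aware "now" avoids depending on the local time zone of the machine running the notebook.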

Set up a function to convert the annotation timestamps, which are in milliseconds since 1970-01-01, to datetimes. Note that this is a different time base from the one used for the data, which is in seconds since 1900-01-01.

In [33]:
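def convert_time(time_stamp):
    # Annotation times are milliseconds since 1970-01-01. Values that cannot
    # be converted (such as a missing value) are returned unchanged.
    try:
        time_stamp = int(time_stamp) / 1000
        time_stamp = nc.num2date(time_stamp, 'seconds since 1970-01-01')
    except (TypeError, ValueError):
        pass
    return time_stamp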

# convert time stamps
anno_data['beginDT'] = anno_data['beginDT'].apply(convert_time)
anno_data['endDT'] = anno_data['endDT'].apply(convert_time)
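Because the two time bases differ only by a fixed offset and a factor of 1000, raw values can also be compared directly without going through datetimes. The sketch below assumes both bases are plain second counts with no leap-second handling; the helper names are hypothetical.

```python
from datetime import datetime

# Seconds between 1900-01-01 and 1970-01-01
EPOCH_OFFSET = int((datetime(1970, 1, 1) - datetime(1900, 1, 1)).total_seconds())


def anno_ms_to_data_seconds(ms):
    """Annotation time (ms since 1970) -> data time (s since 1900)."""
    return ms / 1000 + EPOCH_OFFSET


def data_seconds_to_anno_ms(seconds):
    """Data time (s since 1900) -> annotation time (ms since 1970)."""
    return int(round((seconds - EPOCH_OFFSET) * 1000))


print(EPOCH_OFFSET)                            # 2208988800
print(anno_ms_to_data_seconds(1325379601000))  # 3534368401.0
print(data_seconds_to_anno_ms(3650058060.025))
```

This is handy when checking whether a raw `time` value from the data response falls between an annotation's `beginDT` and `endDT`.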

Print the annotations.

In [34]:
for i in range(len((anno_data['annotation'].values))):
    print(i)
    print(anno_data['annotation'].iloc[i])
    print('start time:', anno_data['beginDT'].iloc[i])
    print('end time:', anno_data['endDT'].iloc[i],'\n')

0
Deployment 1: Status data sent from the buoy included leak detects in the buoy well, drop in battery voltage, and loss of wind turbine input. Upon recovery, the buoy was primarily intact but several instruments were damaged and/or missing. Ice build-up on the tower is speculated to be the cause for much of the damage. No telemetered or recovered_host data expected. Functional instruments could continue to collect data using internal battery power and storage cards.
start time: 2015-02-15 00:00:00
end time: 2015-08-22 00:00:00 

1
Deployment 2: A period of violent weather caused power outages on multiple instruments. No telemetered or recovered_host data expected. Functional instruments could continue to collect data using internal battery power and storage cards. Upon recovery, the buoy well was flooded.
start time: 2016-01-27 00:00:00
end time: 2016-07-19 00:00:00 

2
Deployment 4: at 10:03 UTC on 12 October 2017 the Irminger Sea Surface Mooring stopped all communications. Current s

Select information from the annotation at index 4 (the fifth annotation, since indexing starts at 0) and create the final plot.

In [35]:
anno_start_time = anno_data['beginDT'].iloc[4]
anno = anno_data['annotation'].iloc[4]

In [36]:
p = figure(width=800,
           height=400,
           title='Global Irminger Sea Apex Surface Mooring - Near Surface Instrument Frame - Dissolved Oxygen',
           x_axis_label='Time (GMT)',
           y_axis_label='Oxygen umol kg-1',
           x_axis_type='datetime')

p.line([anno_start_time,time[-1]], [(min(oxygen)-10),(min(oxygen)-10)], line_width=10, legend='Annotation: '+anno)
p.circle(time, oxygen, color=colors, fill_alpha=0.2, size=4)
p.legend.location = "top_left"

output_notebook()
show(p)


Optionally, you can save the plot as an html file for sharing.

In [37]:
output_file('plot.html')  # file name, relative to the current working directory
save(p)  # writes the file and returns its absolute path

'/Users/knuth/Documents/ooi/repos/github/ooi_datateam_notebooks/notebooks/metadata_access/plot.html'