# Example Notebook:
This notebook provides an example of how to use the OOINet download tool to perform the following functions:
* Search for datasets
* Identify desired reference designator
* Get the associated metadata for a given reference designator
* Request netCDF datasets for a reference designator
* Download the netCDF dataset to your local machine

The key parameter that the OOINet API requires is the "reference designator." A reference designator may be thought of as a type of instrument located at a fixed location and depth. For example, below we use **CP01CNSM-RID27-03-CTDBPC000**, which is the CTD located at 7 meters depth on the Pioneer Array Central Surface Mooring at approximately (latitude, longitude) of (40.14, -70.7783).
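
A reference designator can be broken into the array, node, and instrument keys used when searching for datasets. The following is a minimal sketch, assuming the array and node are the first two hyphen-separated fields and the instrument is everything after them, as in the example above. `split_refdes` is a hypothetical helper for illustration, not part of the OOINet tool.

```python
def split_refdes(refdes):
    """Split a reference designator into (array, node, instrument)."""
    # Split on the first two hyphens only, so the instrument keeps its own hyphen
    array, node, instrument = refdes.split("-", 2)
    return array, node, instrument


print(split_refdes("CP01CNSM-RID27-03-CTDBPC000"))
# ('CP01CNSM', 'RID27', '03-CTDBPC000')
```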

In [23]:
# This is necessary if not installed as a package
import sys
import pandas as pd
import numpy as np
sys.path.append("../ooinet/")

In [2]:
# Import the M2M module
from m2m import M2M

## Initialize the Tool
To use the OOINet tool, you must initialize it with your OOINet **username** and **token**. These may be found by logging in to ooinet.oceanobservatories.org and looking under your profile. If you have not registered with OOI, you cannot query OOINet via M2M, since it requires authentication.

Personally, I store my OOI username and token locally in a yaml file which is excluded from git tracking.

In [5]:
import yaml
import warnings
warnings.filterwarnings("ignore")
# Import user info for accessing UFrame
# safe_load is used because yaml.load requires an explicit Loader in recent PyYAML versions
with open('/home/andrew/Documents/OOI-CGSN/QAQC_Sandbox/user_info.yaml') as f:
    userinfo = yaml.safe_load(f)
username = userinfo['apiname']
token = userinfo['apikey']

In [6]:
OOI = M2M(username, token)

---
## Search Datasets
First, we can search the available OOI Reference Designators ("refdes" for short) on the following keys: **array**, **node**, **instrument**. Additionally, we can request "**English_names**", which returns the descriptive names for the associated array, node, and instrument. Below, we will search for the available CTD instruments on the Pioneer Array Central Surface Mooring.

The major caveat with the search is that, similar to searching ERDDAP datasets, the search terms must be a partial or full match based on OOI nomenclature. For example, we have to search for "CTD", "CTDBP", or the full instrument name "03-CTDBPC000". We can't search for "conductivity", "temperature", or other CTD-related instrument terms.
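
To illustrate the caveat, here is a minimal sketch of partial matching against OOI instrument names. It assumes a plain, case-sensitive substring test; the tool's actual matching logic may differ, and `matches` is a hypothetical helper.

```python
def matches(term, name):
    """Return True if the search term is a partial or full match of an OOI name."""
    # Assumption: a simple substring test stands in for the tool's matching
    return term in name


instruments = ["03-CTDBPC000", "03-CTDBPD000"]
for term in ["CTD", "CTDBP", "03-CTDBPC000", "conductivity"]:
    hits = [name for name in instruments if matches(term, name)]
    print(term, "->", hits)
```

Only terms drawn from the OOI names themselves produce hits; a descriptive word such as "conductivity" returns an empty list.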

In [8]:
datasets = OOI.search_datasets(array="CP01CNSM", instrument="CTD", English_names=True)
datasets

https://ooinet.oceanobservatories.org/api/m2m/12576/sensor/inv/CP01CNSM


Unnamed: 0,array,array_name,node,node_name,instrument,instrument_name,refdes,url,deployments
0,CP01CNSM,Coastal Pioneer Central Surface Mooring,RID27,Near Surface Instrument Frame,03-CTDBPC000,CTD,CP01CNSM-RID27-03-CTDBPC000,https://ooinet.oceanobservatories.org/api/m2m/...,"[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]"
1,CP01CNSM,Coastal Pioneer Central Surface Mooring,MFD37,Seafloor Multi-Function Node (MFN),03-CTDBPD000,CTD,CP01CNSM-MFD37-03-CTDBPD000,https://ooinet.oceanobservatories.org/api/m2m/...,"[1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]"


From the above datasets, we're going to select the CTDBP instrument on the Pioneer Array Central Surface Mooring Near-Surface Instrument Frame (located at 7m depth), which has a reference designator **CP01CNSM-RID27-03-CTDBPC000**. 

In [9]:
refdes = "CP01CNSM-RID27-03-CTDBPC000"

---
## Metadata
Next, we can query OOINet for the metadata associated with the selected reference designator. The metadata contains valuable information such as the available methods and streams (which are required to download the data), the particleKeys (the data variable names), and the associated units.

In [10]:
metadata = OOI.get_metadata(refdes=refdes)
metadata

Unnamed: 0,pdId,particleKey,type,shape,units,fillValue,stream,unsigned,method,count,beginTime,endTime,refdes
0,PD1,conductivity,FLOAT,SCALAR,S m-1,-9999999,ctdbp_cdef_dcl_instrument,False,telemetered,2402863,2013-11-21T18:16:00.000Z,2021-07-14T10:02:52.791Z,CP01CNSM-RID27-03-CTDBPC000
1,PD2,pressure,FLOAT,SCALAR,dbar,-9999999,ctdbp_cdef_dcl_instrument,False,telemetered,2402863,2013-11-21T18:16:00.000Z,2021-07-14T10:02:52.791Z,CP01CNSM-RID27-03-CTDBPC000
2,PD6,temp,FLOAT,SCALAR,ºC,-9999999,ctdbp_cdef_dcl_instrument,False,telemetered,2402863,2013-11-21T18:16:00.000Z,2021-07-14T10:02:52.791Z,CP01CNSM-RID27-03-CTDBPC000
3,PD7,time,DOUBLE,SCALAR,seconds since 1900-01-01,-9999999,ctdbp_cdef_dcl_instrument,False,telemetered,2402863,2013-11-21T18:16:00.000Z,2021-07-14T10:02:52.791Z,CP01CNSM-RID27-03-CTDBPC000
4,PD10,port_timestamp,DOUBLE,SCALAR,seconds since 1900-01-01,-9999999,ctdbp_cdef_dcl_instrument,False,telemetered,2402863,2013-11-21T18:16:00.000Z,2021-07-14T10:02:52.791Z,CP01CNSM-RID27-03-CTDBPC000
5,PD11,driver_timestamp,DOUBLE,SCALAR,seconds since 1900-01-01,-9999999,ctdbp_cdef_dcl_instrument,False,telemetered,2402863,2013-11-21T18:16:00.000Z,2021-07-14T10:02:52.791Z,CP01CNSM-RID27-03-CTDBPC000
6,PD12,internal_timestamp,DOUBLE,SCALAR,seconds since 1900-01-01,-9999999,ctdbp_cdef_dcl_instrument,False,telemetered,2402863,2013-11-21T18:16:00.000Z,2021-07-14T10:02:52.791Z,CP01CNSM-RID27-03-CTDBPC000
7,PD16,preferred_timestamp,STRING,SCALAR,1,empty,ctdbp_cdef_dcl_instrument,False,telemetered,2402863,2013-11-21T18:16:00.000Z,2021-07-14T10:02:52.791Z,CP01CNSM-RID27-03-CTDBPC000
8,PD93,date_time_string,STRING,SCALAR,1,empty,ctdbp_cdef_dcl_instrument,False,telemetered,2402863,2013-11-21T18:16:00.000Z,2021-07-14T10:02:52.791Z,CP01CNSM-RID27-03-CTDBPC000
9,PD863,ingestion_timestamp,DOUBLE,SCALAR,seconds since 1900-01-01,-9999,ctdbp_cdef_dcl_instrument,False,telemetered,2402863,2013-11-21T18:16:00.000Z,2021-07-14T10:02:52.791Z,CP01CNSM-RID27-03-CTDBPC000
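
The time variables in the metadata table above use units of seconds since 1900-01-01. The sketch below shows one way to convert such values to and from Python datetimes. It assumes the timestamps are in UTC (suggested by the trailing "Z" on beginTime and endTime, but not confirmed here); both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Reference epoch taken from the units column of the metadata table
EPOCH_1900 = datetime(1900, 1, 1, tzinfo=timezone.utc)


def seconds_to_datetime(seconds):
    """Convert seconds since 1900-01-01 to a timezone-aware datetime."""
    return EPOCH_1900 + timedelta(seconds=seconds)


def datetime_to_seconds(dt):
    """Convert a timezone-aware datetime to seconds since 1900-01-01."""
    return (dt - EPOCH_1900).total_seconds()


# Round trip using the beginTime reported in the metadata table
begin = datetime(2013, 11, 21, 18, 16, tzinfo=timezone.utc)
seconds = datetime_to_seconds(begin)
print(seconds, seconds_to_datetime(seconds).isoformat())
```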


---
## Deployment Information
When we searched for CTD datasets on the Pioneer Central Surface Mooring, it returned a table which listed the available deployment numbers for each of the datasets. We can get much more detailed information on the deployments for a particular reference designator by requesting the deployment information from OOINet.

In [11]:
deployments = OOI.get_deployments(refdes=refdes)
deployments

Unnamed: 0,deploymentNumber,uid,assetId,latitude,longitude,depth,deployStart,deployEnd,deployCruise,recoverCruise
0,14,CGINS-CTDBPC-50109,3092,40.1328,-70.778,7,2021-03-31 15:36:00,,AR52,
0,13,CGINS-CTDBPC-06841,3164,40.1405,-70.7696,7,2020-10-29 14:57:00,2021-04-03 12:05:00,AR48,AR52
0,12,CGINS-CTDBPC-07208,1323,40.1332,-70.7783,7,2019-09-27 18:30:00,2020-11-06 13:17:00,AR39,AR48
0,11,CGINS-CTDBPC-50002,1415,40.1401,-70.7714,7,2019-04-06 14:35:00,2019-09-26 17:15:00,AR34,AR39
0,10,CGINS-CTDBPC-50108,3090,40.1334,-70.7777,7,2018-10-30 01:48:00,2019-04-07 18:08:00,AR31,AR34
0,9,CGINS-CTDBPC-50002,1415,40.1397,-70.7713,7,2018-03-24 21:32:00,2018-10-29 12:31:00,AR28,AR31
0,8,CGINS-CTDBPC-07208,1323,40.1334,-70.7783,7,2017-10-29 14:15:00,2018-03-29 19:37:00,AR24,AR28
0,7,CGINS-CTDBPC-50002,1415,40.1398,-70.7712,7,2017-06-09 14:24:00,2017-11-01 20:33:00,AR18,AR24
0,6,CGINS-CTDBPC-50108,3090,40.1334,-70.7785,7,2016-10-13 18:36:00,2017-06-09 16:05:00,AR8,AR18
0,5,CGINS-CTDBPC-06841,3164,40.1404,-70.7713,7,2016-05-13 13:50:00,2016-10-13 19:34:00,AR4,AR8


---
## Vocab Information
Additionally, if we are interested in more detailed information on the location that the reference designator is assigned to, we can request the vocab information for the given reference designator. The vocab information includes some of the "**English_names**" info we requested when searching for datasets, as well as instrument model, manufacturer, and the descriptive names for the reference designator location.

In [12]:
vocab = OOI.get_vocab(refdes=refdes)
vocab

Unnamed: 0,@class,vocabId,refdes,instrument,tocL1,tocL2,tocL3,manufacturer,model,mindepth,maxdepth
0,.VocabRecord,475,CP01CNSM-RID27-03-CTDBPC000,CTD,Coastal Pioneer,Central Surface Mooring,Near Surface Instrument Frame,Sea-Bird,SBE 16plusV2,7.0,7.0


---
## Calibration Information
We can also request the calibration information for a given reference designator. Since individual instruments are swapped during each mooring deployment and recovery, the calibration coefficients for a reference designator are different for each deployment. OOI first loads all the available calibration coefficients for a given reference designator. Then, for each deployment, it finds the calibration coefficients whose calibration date most closely _precedes_ the start of the deployment. The result is a table, sorted by deployment number for a reference designator, with the uid of the specific instrument, its calibration coefficients, when the instrument was calibrated, and the source of the calibration coefficients.
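
The selection rule described above can be sketched in a few lines. This is an illustration of the rule only, not the tool's implementation; `select_calibration` is a hypothetical helper, and treating a calibration dated on the deployment start day as valid is an assumption. The dates are two calibration dates that appear in this notebook for CGINS-CTDBPC-50109 and the start of deployment 14.

```python
from datetime import date


def select_calibration(cal_dates, deploy_start):
    """Return the latest calibration date on or before the deployment start."""
    candidates = [d for d in cal_dates if d <= deploy_start]
    # No calibration precedes the deployment: nothing to select
    return max(candidates) if candidates else None


cal_dates = [date(2015, 4, 3), date(2020, 1, 16)]
deploy_start = date(2021, 3, 31)
print(select_calibration(cal_dates, deploy_start))
# 2020-01-16
```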

In [13]:
calibrations = OOI.get_calibrations(refdes, deployments)
calibrations

Unnamed: 0,deploymentNumber,uid,calCoef,calDate,value,calFile
0,14,CGINS-CTDBPC-50109,CC_i,2020-01-16,-1.052683e-04,CGINS-CTDBPC-50109__20200116_Cal_Info.xlsx
1,14,CGINS-CTDBPC-50109,CC_h,2020-01-16,1.508985e-01,CGINS-CTDBPC-50109__20200116_Cal_Info.xlsx
2,14,CGINS-CTDBPC-50109,CC_g,2020-01-16,-9.743448e-01,CGINS-CTDBPC-50109__20200116_Cal_Info.xlsx
3,14,CGINS-CTDBPC-50109,CC_pa0,2020-01-16,-3.282650e-02,CGINS-CTDBPC-50109__20200116_Cal_Info.xlsx
4,14,CGINS-CTDBPC-50109,CC_pa2,2020-01-16,-4.223009e-12,CGINS-CTDBPC-50109__20200116_Cal_Info.xlsx
...,...,...,...,...,...,...
303,1,CGINS-CTDBPC-07208,CC_a3,2012-10-25,1.864655e-07,CGINS-CTDBPC-07208__20121025_Cal_Info.xlsx
304,1,CGINS-CTDBPC-07208,CC_ptempa0,2012-10-25,-6.337669e+01,CGINS-CTDBPC-07208__20121025_Cal_Info.xlsx
305,1,CGINS-CTDBPC-07208,CC_cpcor,2012-10-25,-9.570000e-08,CGINS-CTDBPC-07208__20121025_Cal_Info.xlsx
306,1,CGINS-CTDBPC-07208,CC_ctcor,2012-10-25,3.250000e-06,CGINS-CTDBPC-07208__20121025_Cal_Info.xlsx


It is also possible to request the calibration history for a specific instrument by utilizing the **uid** of the instrument and using the lower-level ```_get_api``` method and ```urls``` attribute to construct your own request.

In [27]:
# Set up the calibration url and arguments to pass to the request
cal_url = OOI.urls["cal"]
uid = "CGINS-CTDBPC-50109" # This is unique to each instrument
params = {
    "uid": uid
}

# Make the request
calInfo = OOI._get_api(cal_url, params=params)

# Put the data into a pandas dataframe, sorted by calibration date and coefficient name
# (rows are collected in a list first because DataFrame.append was removed in pandas 2.0)
columns = ["uid", "calCoef", "calDate", "value", "calFile"]
rows = []
for c in calInfo["calibration"]:
    for cc in c["calData"]:
        rows.append({
            "uid": cc["assetUid"],
            "calCoef": cc["eventName"],
            "calDate": OOI._convert_time(cc["eventStartTime"]),
            "value": cc["value"],
            "calFile": cc["dataSource"]
        })
instrumentCals = pd.DataFrame(rows, columns=columns)
instrumentCals.sort_values(by=["calDate", "calCoef"], inplace=True)
instrumentCals

Unnamed: 0,uid,calCoef,calDate,value,calFile
61,CGINS-CTDBPC-50109,CC_a0,2015-04-03,1.245406e-03,CGINS-CTDBPC-50109__20150403_Cal_Info.xlsx
74,CGINS-CTDBPC-50109,CC_a1,2015-04-03,2.778690e-04,CGINS-CTDBPC-50109__20150403_Cal_Info.xlsx
77,CGINS-CTDBPC-50109,CC_a2,2015-04-03,-1.490805e-06,CGINS-CTDBPC-50109__20150403_Cal_Info.xlsx
89,CGINS-CTDBPC-50109,CC_a3,2015-04-03,1.957949e-07,CGINS-CTDBPC-50109__20150403_Cal_Info.xlsx
96,CGINS-CTDBPC-50109,CC_cpcor,2015-04-03,-9.570000e-08,CGINS-CTDBPC-50109__20150403_Cal_Info.xlsx
...,...,...,...,...,...
49,CGINS-CTDBPC-50109,CC_ptcb1,2020-01-16,3.000000e-04,CGINS-CTDBPC-50109__20200116_Cal_Info.xlsx
36,CGINS-CTDBPC-50109,CC_ptcb2,2020-01-16,0.000000e+00,CGINS-CTDBPC-50109__20200116_Cal_Info.xlsx
90,CGINS-CTDBPC-50109,CC_ptempa0,2020-01-16,-5.388684e+01,CGINS-CTDBPC-50109__20200116_Cal_Info.xlsx
84,CGINS-CTDBPC-50109,CC_ptempa1,2020-01-16,5.517781e+01,CGINS-CTDBPC-50109__20200116_Cal_Info.xlsx


---
## Download Datasets
The ultimate goal of the queries above was to identify which data stream(s) we want to request and download. Now we want to request those data streams and get the associated netCDF files. This process involves the following steps:
1. Identify the methods and data streams for the selected reference designator
2. Request the THREDDS server url for the data sets
3. Get the catalog of datasets on the THREDDS server
4. Parse the catalog for the desired netCDF files
5. Download the identified netCDF files to a local directory

**1.** Get the methods and data streams associated with the given reference designator:

In [11]:
streams = OOI.get_datastreams(refdes)
streams

Unnamed: 0,refdes,method,stream
0,CP01CNSM-RID27-03-CTDBPC000,recovered_host,ctdbp_cdef_dcl_instrument_recovered
1,CP01CNSM-RID27-03-CTDBPC000,recovered_inst,ctdbp_cdef_instrument_recovered
2,CP01CNSM-RID27-03-CTDBPC000,telemetered,ctdbp_cdef_dcl_instrument


**2.** Now, we request the THREDDS server url from OOINet. At a minimum, this requires the reference designator, method, and stream as inputs. This will request the datasets for _all_ deployments.

If we want to further limit the request to a specific deployment or a specific time period, we can do that by passing the arguments **beginDT** (begin datetime) and **endDT** (end datetime). 
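
As a sketch of limiting a request to one deployment, the start and end times of deployment 11 from the deployment table can be formatted as timestamp strings. The string format below mirrors the beginTime and endTime values shown in the metadata table; whether `get_thredds_url` expects exactly this format, and that it takes **beginDT** and **endDT** as keyword arguments, are assumptions. `to_m2m_time` is a hypothetical helper.

```python
from datetime import datetime


def to_m2m_time(dt):
    """Format a datetime like the beginTime/endTime strings in the metadata table."""
    return dt.strftime("%Y-%m-%dT%H:%M:%S.000Z")


# Deployment 11 start and end times, copied from the deployment table
begin_dt = to_m2m_time(datetime(2019, 4, 6, 14, 35))
end_dt = to_m2m_time(datetime(2019, 9, 26, 17, 15))
print(begin_dt, end_dt)

# Hypothetical call, assuming the keyword names match the argument names above:
# thredds_url = OOI.get_thredds_url(refdes, method, stream, beginDT=begin_dt, endDT=end_dt)
```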

Additionally, we can input some optional arguments that will return diagnostic information. The **include_provenance** will return a separate text file with information on the provenance of the data, such as the calibration coefficients applied. The **include_annotations** returns a separate text file of annotations, which are descriptions of issues and information associated with the given dataset.

In [12]:
method = "recovered_inst"
stream = "ctdbp_cdef_instrument_recovered"

In [13]:
thredds_url = OOI.get_thredds_url(refdes, method, stream)
thredds_url

'https://opendap.oceanobservatories.org/thredds/catalog/ooi/areed@whoi.edu/20200804T213650353Z-CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/catalog.html'

**3.** With the appropriate THREDDS url, we can query the THREDDS catalog to list the files generated for the request:

In [14]:
catalog = OOI.get_thredds_catalog(thredds_url)
catalog

['',
 'ooi/areed@whoi.edu/20200804T213650353Z-CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/status.txt',
 'ooi/areed@whoi.edu/20200804T213650353Z-CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/status.json',
 'ooi/areed@whoi.edu/20200804T213650353Z-CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0011_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_aggregate_provenance.json',
 'ooi/areed@whoi.edu/20200804T213650353Z-CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0011_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20190406T144517-20190926T161808.nc',
 'ooi/areed@whoi.edu/20200804T213650353Z-CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0011_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered.ncml',
 'ooi/areed@whoi.edu/20200804T213650353Z-CP01C

**4.** Next, we want to parse the THREDDS catalog for the netCDF datasets which contain the relevant data. We can use the **exclude** keyword to filter out netCDF datasets which may be delivered as part of the THREDDS request but which we aren't interested in. For example, glider datasets frequently return engineering (ENG) data streams which contain engineering files but no relevant data.
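
The filtering idea can be sketched as follows. This is not the implementation of `parse_catalog`; it assumes that the files of interest end in `.nc` and that exclusion is a substring test on the path. `filter_netcdf` and the shortened catalog paths are hypothetical.

```python
def filter_netcdf(catalog, exclude=()):
    """Keep entries ending in .nc whose path contains none of the exclude strings."""
    return [
        entry for entry in catalog
        if entry.endswith(".nc") and not any(term in entry for term in exclude)
    ]


# Shortened, made-up catalog entries in the style of the listing above
catalog = [
    "",
    "request/status.txt",
    "request/deployment0011_CTDBPC000_aggregate_provenance.json",
    "request/deployment0011_CTDBPC000.ncml",
    "request/deployment0011_CTDBPC000_20190406T144517-20190926T161808.nc",
    "request/deployment0011_ENG000000_20190406T144517-20190926T161808.nc",
]
print(filter_netcdf(catalog, exclude=["ENG", "gps"]))
```

Only the CTDBPC000 `.nc` entry survives: the status, provenance, and `.ncml` files fail the extension test, and the ENG file is dropped by the exclude list.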

In [15]:
netCDF_datasets = OOI.parse_catalog(catalog, exclude=["ENG", "gps"])
netCDF_datasets

['ooi/areed@whoi.edu/20200804T213650353Z-CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0011_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20190406T144517-20190926T161808.nc',
 'ooi/areed@whoi.edu/20200804T213650353Z-CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0008_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20171029T141519-20180329T190320.nc',
 'ooi/areed@whoi.edu/20200804T213650353Z-CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0007_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20170609T142931-20171101T202931.nc',
 'ooi/areed@whoi.edu/20200804T213650353Z-CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0006_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20161013T183731-20161122T190001.nc',
 'ooi/areed@whoi.edu/2020080

**5.** Finally, we can download the netCDF files to a specified save directory (**save_dir**):

In [17]:
save_dir = "/home/andrew/Documents/OOI-CGSN/ooinet/examples/"
OOI.download_netCDF_files(netCDF_datasets, save_dir=save_dir)

Downloading file 1 of 7: ooi/areed@whoi.edu/20200804T213650353Z-CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0011_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20190406T144517-20190926T161808.nc 

Downloading file 2 of 7: ooi/areed@whoi.edu/20200804T213650353Z-CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0008_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20171029T141519-20180329T190320.nc 

Downloading file 3 of 7: ooi/areed@whoi.edu/20200804T213650353Z-CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0007_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20170609T142931-20171101T202931.nc 

Downloading file 4 of 7: ooi/areed@whoi.edu/20200804T213650353Z-CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0006_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp
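
For readers who want to see what a download step might involve, the sketch below pairs each catalog path with a URL and a local file name. The `fileServer` base URL is an assumption about how the THREDDS server exposes files over HTTP, and `download_netCDF_files` may work differently. `build_download_plan` and the example path are hypothetical.

```python
import os

# Assumed base URL of the THREDDS HTTP file service (not confirmed by this notebook)
FILE_SERVER = "https://opendap.oceanobservatories.org/thredds/fileServer/"


def build_download_plan(datasets, save_dir):
    """Pair each catalog path with a download URL and a local file path."""
    plan = []
    for path in datasets:
        url = FILE_SERVER + path
        # Save each file under its own name inside save_dir
        local = os.path.join(save_dir, os.path.basename(path))
        plan.append((url, local))
    return plan


datasets = ["ooi/user@example.org/request/deployment0011_example.nc"]
for url, local in build_download_plan(datasets, "data"):
    print(url, "->", local)
```

Each pair could then be fetched with a standard call such as `urllib.request.urlretrieve(url, local)`.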

---
## Open netCDF Datasets
This section is still in development. OOI makes use of overlapping deployments: a replacement mooring is deployed before the mooring currently in the water is recovered. This typically results in several days of overlapping data for a given mooring, allowing for direct comparison between two deployed instruments at the same location.

However, this is complicated by the way ```xarray``` handles indexing, which disallows overlapping time dimensions as a primary index. This can be avoided by concatenating, but beware: this requires loading _all_ data into memory, and for large datasets (such as for profilers) you will run out of memory before the dataset is fully concatenated.

The alternative is to pass a ```preprocess``` routine into the ```xarray.open_mfdataset``` method which trims deployments based on start and end times of the preceding and following deployments to avoid overlaps. This is the approach I've taken to developing the method to facilitate opening multiple deployments as a single multidimensional dataset.
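
A minimal sketch of the trimming logic, using deployment times from the deployment table. The policy shown, ending each deployment where the next one starts so that the newer instrument is kept during an overlap, is an assumption for illustration; the method under development may trim differently. `trim_windows` is a hypothetical helper.

```python
from datetime import datetime


def trim_windows(deployments):
    """Clip each deployment's end time to the start of the next deployment.

    deployments: list of (number, start, end); end is None if still deployed.
    Returns (number, start, end) tuples sorted by start, with no overlaps.
    """
    ordered = sorted(deployments, key=lambda d: d[1])
    windows = []
    for i, (number, start, end) in enumerate(ordered):
        if i + 1 < len(ordered):
            next_start = ordered[i + 1][1]
            # Assumed policy: where two deployments overlap, keep the newer one
            if end is None or end > next_start:
                end = next_start
        windows.append((number, start, end))
    return windows


deployments = [
    (12, datetime(2019, 9, 27, 18, 30), datetime(2020, 11, 6, 13, 17)),
    (13, datetime(2020, 10, 29, 14, 57), datetime(2021, 4, 3, 12, 5)),
    (14, datetime(2021, 3, 31, 15, 36), None),
]
for number, start, end in trim_windows(deployments):
    print(number, start, end)
```

Inside a ```preprocess``` function, each dataset could then be cut to its window with something like `ds.sel(time=slice(start, end))`, assuming `time` is the dataset's index coordinate.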