# Example Notebook:
This notebook provides an example of how to use the OOINet download tool to perform the following functions:
* Search for datasets
* Identify desired reference designator
* Get the associated metadata for a given reference designator
* Request netCDF datasets for a reference designator
* Download the netCDF dataset to your local machine

The key parameter that the OOINet API requires is the "reference designator." A reference designator may be thought of as a type of instrument located at a fixed location and depth. For example, below we use **CP01CNSM-RID27-03-CTDBPC000**, which is the CTD located at 7 meters depth on the Pioneer Array Central Surface Mooring at approximately (latitude, longitude) of (40.14, -70.7783).
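
As a small illustration of how a reference designator is put together, the sketch below splits one into its array, node, and instrument parts, following the column split that appears in the search results later in this notebook. `split_refdes` and `RefDes` are hypothetical helpers written for this example; they are not part of the OOINet tool.

```python
from typing import NamedTuple


class RefDes(NamedTuple):
    array: str       # e.g. "CP01CNSM"
    node: str        # e.g. "RID27"
    instrument: str  # e.g. "03-CTDBPC000"


def split_refdes(refdes: str) -> RefDes:
    """Split a reference designator into array, node, and instrument."""
    # Split on the first two hyphens only: the instrument part contains one itself
    parts = refdes.split("-", 2)
    if len(parts) != 3:
        raise ValueError(f"Unexpected reference designator: {refdes!r}")
    return RefDes(*parts)


parsed = split_refdes("CP01CNSM-RID27-03-CTDBPC000")
print(parsed.array, parsed.node, parsed.instrument)
```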

In [1]:
import sys
import warnings

import numpy as np
import pandas as pd

warnings.filterwarnings("ignore")

# This is necessary if ooinet is not installed as a package
sys.path.append("../ooinet/")

In [2]:
# Import the M2M module
import M2M

## Initialize the Tool
In order to utilize the OOINet tool, it needs to be initialized with your OOINet **username** and **token**. These may be found by logging onto ooinet.oceanobservatories.org and looking under your profile. If you have not registered with OOI, you cannot query OOINet via M2M, since it requires authentication.

Personally, I store my OOI username and token locally in a YAML file which is excluded from git tracking.

In [3]:
import yaml

# Import user info for accessing UFrame; the context manager closes the file after reading
with open('/home/areed/Documents/OOI/reedan88/QAQC_Sandbox/user_info.yaml') as f:
    userinfo = yaml.safe_load(f)
username = userinfo['apiname']
token = userinfo['apikey']

In [4]:
OOI = M2M(username, token)

TypeError: 'module' object is not callable

---
## Search Datasets
First, we can search the available OOI Reference Designators ("refdes" for short) on the following keys: **array**, **node**, **instrument**. Additionally, we can request "**English_names**", which will return the descriptive names for the associated array, node, and instrument. Below, we will search for the available CTD instruments on the Pioneer Array Central Surface Mooring.

The major caveat with the search is that, similar to searching ERDDAP datasets, the search terms must be a partial or full match to OOI nomenclature. For example, we have to search for "CTD", "CTDBP", or the full instrument name "03-CTDBPC000". We can't search "conductivity", "temperature" or other CTD-related instrument terms.
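
To make the partial-match caveat concrete, here is a minimal sketch of substring matching against the parts of a reference designator. It is only an illustration of the idea: `matches` is a hypothetical helper, the case-sensitive comparison is an assumption, and none of this is the tool's actual search code.

```python
# Reference designators that appear in this notebook's search results
KNOWN_REFDES = [
    "CP01CNSM-RID27-03-CTDBPC000",
    "CP01CNSM-MFD37-03-CTDBPD000",
]


def matches(refdes, array=None, node=None, instrument=None):
    """Return True if every supplied term is a substring of its refdes part."""
    parts = dict(zip(("array", "node", "instrument"), refdes.split("-", 2)))
    terms = {"array": array, "node": node, "instrument": instrument}
    # Terms left as None are ignored (assumption: they mean "match anything")
    return all(term in parts[key] for key, term in terms.items() if term is not None)


# "CTD" is part of the OOI instrument name, so both entries match
ctd_hits = [r for r in KNOWN_REFDES if matches(r, array="CP01CNSM", instrument="CTD")]

# "conductivity" is a descriptive term, not OOI nomenclature, so nothing matches
no_hits = [r for r in KNOWN_REFDES if matches(r, instrument="conductivity")]

print(ctd_hits)
print(no_hits)
```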

In [5]:
datasets = M2M.search_datasets(array="CP01CNSM", instrument="CTD", English_names=True)
datasets

|   | array | array_name | node | node_name | instrument | instrument_name | refdes | url | deployments |
|---|---|---|---|---|---|---|---|---|---|
| 0 | CP01CNSM | Coastal Pioneer Central Surface Mooring | RID27 | Near Surface Instrument Frame | 03-CTDBPC000 | CTD | CP01CNSM-RID27-03-CTDBPC000 | https://ooinet.oceanobservatories.org/api/m2m/... | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... |
| 1 | CP01CNSM | Coastal Pioneer Central Surface Mooring | MFD37 | Seafloor Multi-Function Node (MFN) | 03-CTDBPD000 | CTD | CP01CNSM-MFD37-03-CTDBPD000 | https://ooinet.oceanobservatories.org/api/m2m/... | [1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1... |


From the above datasets, we're going to select the CTDBP instrument on the Pioneer Array Central Surface Mooring Near-Surface Instrument Frame (located at 7 m depth), which has the reference designator **CP01CNSM-RID27-03-CTDBPC000**.

In [6]:
refdes = "CP01CNSM-RID27-03-CTDBPC000"

---
## Metadata
Next, we can query OOINet for the metadata associated with the selected reference designator. The metadata contains valuable information such as the available methods and streams (which are required to download the data), the particleKeys (the data variable names), and the associated units.

In [7]:
metadata = M2M.get_metadata(refdes=refdes)
metadata

|   | pdId | particleKey | type | shape | units | fillValue | stream | unsigned | method | count | beginTime | endTime | refdes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PD1 | conductivity | FLOAT | SCALAR | S m-1 | -9999999 | ctdbp_cdef_dcl_instrument | False | telemetered | 2768311 | 2013-11-21T18:16:05.230Z | 2022-07-14T18:03:00.557Z | CP01CNSM-RID27-03-CTDBPC000 |
| 1 | PD2 | pressure | FLOAT | SCALAR | dbar | -9999999 | ctdbp_cdef_dcl_instrument | False | telemetered | 2768311 | 2013-11-21T18:16:05.230Z | 2022-07-14T18:03:00.557Z | CP01CNSM-RID27-03-CTDBPC000 |
| 2 | PD5 | density | FLOAT | FUNCTION | kg m-3 | -9999999 | ctdbp_cdef_dcl_instrument | False | telemetered | 2768311 | 2013-11-21T18:16:05.230Z | 2022-07-14T18:03:00.557Z | CP01CNSM-RID27-03-CTDBPC000 |
| 3 | PD6 | temp | FLOAT | SCALAR | ºC | -9999999 | ctdbp_cdef_dcl_instrument | False | telemetered | 2768311 | 2013-11-21T18:16:05.230Z | 2022-07-14T18:03:00.557Z | CP01CNSM-RID27-03-CTDBPC000 |
| 4 | PD7 | time | DOUBLE | SCALAR | seconds since 1900-01-01 | -9999999 | ctdbp_cdef_dcl_instrument | False | telemetered | 2768311 | 2013-11-21T18:16:05.230Z | 2022-07-14T18:03:00.557Z | CP01CNSM-RID27-03-CTDBPC000 |
| 5 | PD10 | port_timestamp | DOUBLE | SCALAR | seconds since 1900-01-01 | -9999999 | ctdbp_cdef_dcl_instrument | False | telemetered | 2768311 | 2013-11-21T18:16:05.230Z | 2022-07-14T18:03:00.557Z | CP01CNSM-RID27-03-CTDBPC000 |
| 6 | PD11 | driver_timestamp | DOUBLE | SCALAR | seconds since 1900-01-01 | -9999999 | ctdbp_cdef_dcl_instrument | False | telemetered | 2768311 | 2013-11-21T18:16:05.230Z | 2022-07-14T18:03:00.557Z | CP01CNSM-RID27-03-CTDBPC000 |
| 7 | PD12 | internal_timestamp | DOUBLE | SCALAR | seconds since 1900-01-01 | -9999999 | ctdbp_cdef_dcl_instrument | False | telemetered | 2768311 | 2013-11-21T18:16:05.230Z | 2022-07-14T18:03:00.557Z | CP01CNSM-RID27-03-CTDBPC000 |
| 8 | PD13 | practical_salinity | FLOAT | FUNCTION | 1 | -9999999 | ctdbp_cdef_dcl_instrument | False | telemetered | 2768311 | 2013-11-21T18:16:05.230Z | 2022-07-14T18:03:00.557Z | CP01CNSM-RID27-03-CTDBPC000 |
| 9 | PD16 | preferred_timestamp | STRING | SCALAR | 1 | empty | ctdbp_cdef_dcl_instrument | False | telemetered | 2768311 | 2013-11-21T18:16:05.230Z | 2022-07-14T18:03:00.557Z | CP01CNSM-RID27-03-CTDBPC000 |
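
The timestamp variables in the metadata are given in "seconds since 1900-01-01". The sketch below shows one way to convert between that representation and Python datetimes. The helper names are made up for this example, and treating the epoch as UTC is an assumption (suggested by the trailing "Z" on beginTime and endTime, but not stated in the metadata itself).

```python
from datetime import datetime, timedelta, timezone

# Epoch named in the units column: "seconds since 1900-01-01" (assumed to be UTC)
OOI_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def seconds_to_datetime(seconds):
    """Convert seconds since 1900-01-01 to a timezone-aware datetime."""
    return OOI_EPOCH + timedelta(seconds=seconds)


def datetime_to_seconds(dt):
    """Convert a timezone-aware datetime to seconds since 1900-01-01."""
    return (dt - OOI_EPOCH).total_seconds()


# beginTime from the metadata table: 2013-11-21T18:16:05.230Z
begin = datetime(2013, 11, 21, 18, 16, 5, 230000, tzinfo=timezone.utc)
begin_seconds = datetime_to_seconds(begin)
round_trip = seconds_to_datetime(begin_seconds)

print(begin_seconds)
print(round_trip.isoformat())
```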


---
## Deployment Information
When we searched for CTD datasets on the Pioneer Central Surface Mooring, it returned a table which listed the available deployment numbers for each of the datasets. We can get much more detailed information on the deployments for a particular reference designator by requesting the deployment information from OOINet.

In [8]:
deployments = M2M.get_deployments(refdes=refdes)
deployments

|   | deploymentNumber | uid | assetId | latitude | longitude | depth | deployStart | deployEnd | deployCruise | recoverCruise |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | CGINS-CTDBPC-07208 | 1405 | 40.13678 | -70.76978 | 7.0 | 2013-11-21 18:16:00 | 2014-04-18 10:33:00 | KN214 | KN217 |
| 1 | 2 | CGINS-CTDBPC-06841 | 3258 | 40.1339 | -70.7789 | 7.0 | 2014-12-13 18:47:00 | 2014-12-15 20:58:00 | KN224 | KN224 |
| 2 | 3 | CGINS-CTDBPC-06841 | 3258 | 40.14022 | -70.77128 | 7.0 | 2015-05-07 17:34:00 | 2015-10-23 19:40:00 | AT27 | AT31 |
| 3 | 4 | CGINS-CTDBPC-50002 | 1497 | 40.13323 | -70.77843 | 7.0 | 2015-10-23 18:49:00 | 2016-04-04 12:03:00 | AT31 | AR1-07 |
| 4 | 5 | CGINS-CTDBPC-06841 | 3258 | 40.14037 | -70.77133 | 7.0 | 2016-05-13 13:50:00 | 2016-10-13 19:34:00 | AR4 | AR8 |
| 5 | 6 | CGINS-CTDBPC-50108 | 3184 | 40.13342 | -70.77847 | 7.0 | 2016-10-13 18:36:00 | 2017-06-09 16:05:00 | AR8 | AR18 |
| 6 | 7 | CGINS-CTDBPC-50002 | 1497 | 40.139817 | -70.77115 | 7.0 | 2017-06-09 14:24:00 | 2017-11-01 20:33:00 | AR18 | AR24 |
| 7 | 8 | CGINS-CTDBPC-07208 | 1405 | 40.133383 | -70.7783 | 7.0 | 2017-10-29 14:15:00 | 2018-03-29 19:37:00 | AR24 | AR28 |
| 8 | 9 | CGINS-CTDBPC-50002 | 1497 | 40.13975 | -70.77128 | 7.0 | 2018-03-24 21:32:00 | 2018-10-29 12:31:00 | AR28 | AR31 |
| 9 | 10 | CGINS-CTDBPC-50108 | 3184 | 40.133367 | -70.7777 | 7.0 | 2018-10-30 01:48:00 | 2019-04-07 18:08:00 | AR31 | AR34 |
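
Several consecutive deployments in the table above overlap in time, which matters later when opening multiple files as a single dataset. The sketch below checks for such overlaps using a few rows copied from the table. `find_overlaps` is a hypothetical helper, not a function of the OOINet tool.

```python
from datetime import datetime

# (deploymentNumber, deployStart, deployEnd) copied from the deployment table
DEPLOYMENTS = [
    (3, "2015-05-07 17:34:00", "2015-10-23 19:40:00"),
    (4, "2015-10-23 18:49:00", "2016-04-04 12:03:00"),
    (5, "2016-05-13 13:50:00", "2016-10-13 19:34:00"),
    (6, "2016-10-13 18:36:00", "2017-06-09 16:05:00"),
]

FMT = "%Y-%m-%d %H:%M:%S"


def find_overlaps(deployments):
    """Return (earlier, later, overlap) for consecutive deployments that overlap."""
    rows = sorted(
        ((num, datetime.strptime(start, FMT), datetime.strptime(end, FMT))
         for num, start, end in deployments),
        key=lambda row: row[1],  # order by deployment start time
    )
    overlaps = []
    for (num_a, _, end_a), (num_b, start_b, _) in zip(rows, rows[1:]):
        # An overlap exists if the later deployment starts before the earlier one ends
        if start_b < end_a:
            overlaps.append((num_a, num_b, end_a - start_b))
    return overlaps


overlaps = find_overlaps(DEPLOYMENTS)
for earlier, later, duration in overlaps:
    print(f"Deployments {earlier} and {later} overlap by {duration}")
```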


---
## Vocab Information
Additionally, if we are interested in more detailed information on the location that the reference designator is assigned to, we can request the vocab information for the given reference designator. The vocab information includes some of the "**English_names**" info we requested when searching for datasets, as well as instrument model, manufacturer, and the descriptive names for the reference designator location.

In [9]:
vocab = M2M.get_vocab(refdes=refdes)
vocab

|   | @class | vocabId | refdes | instrument | tocL1 | tocL2 | tocL3 | manufacturer | model | mindepth | maxdepth |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | .VocabRecord | 475 | CP01CNSM-RID27-03-CTDBPC000 | CTD | Coastal Pioneer | Central Surface Mooring | Near Surface Instrument Frame | Sea-Bird | SBE 16plusV2 | 7.0 | 7.0 |


---
## Calibration Information
We can also request the calibration information for a given reference designator. Since individual instruments are swapped during each mooring deployment and recovery, the calibration coefficients for a reference designator are different for each deployment. OOI loads all the available calibration coefficients for a given reference designator. Then, for each deployment, it finds the calibration coefficients whose calibration date most closely _precedes_ the start of the deployment. The result is a table, sorted by deployment number for a reference designator, with the uid of the specific instrument, its calibration coefficients, when the instrument was calibrated, and the source of the calibration coefficients.

In [10]:
calibrations = M2M.get_calibrations_by_refdes(refdes, deployments)
calibrations

|   | deploymentNumber | uid | calCoef | value | calFile |
|---|---|---|---|---|---|
| 0 | 1 | CGINS-CTDBPC-07208 | CC_i | -3.231306e-04 | CGINS-CTDBPC-07208__20121025_Cal_Info.xlsx |
| 1 | 1 | CGINS-CTDBPC-07208 | CC_h | 1.352599e-01 | CGINS-CTDBPC-07208__20121025_Cal_Info.xlsx |
| 2 | 1 | CGINS-CTDBPC-07208 | CC_g | -9.907533e-01 | CGINS-CTDBPC-07208__20121025_Cal_Info.xlsx |
| 3 | 1 | CGINS-CTDBPC-07208 | CC_pa0 | -1.261873e-02 | CGINS-CTDBPC-07208__20121025_Cal_Info.xlsx |
| 4 | 1 | CGINS-CTDBPC-07208 | CC_pa2 | -4.831339e-12 | CGINS-CTDBPC-07208__20121025_Cal_Info.xlsx |
| ... | ... | ... | ... | ... | ... |
| 347 | 16 | CGINS-CTDBPC-50109 | CC_a3 | 1.770880e-07 | CGINS-CTDBPC-50109__20220130_Cal_Info.xlsx |
| 348 | 16 | CGINS-CTDBPC-50109 | CC_ptempa0 | -5.388587e+01 | CGINS-CTDBPC-50109__20220130_Cal_Info.xlsx |
| 349 | 16 | CGINS-CTDBPC-50109 | CC_cpcor | -9.570000e-08 | CGINS-CTDBPC-50109__20220130_Cal_Info.xlsx |
| 350 | 16 | CGINS-CTDBPC-50109 | CC_ctcor | 3.250000e-06 | CGINS-CTDBPC-50109__20220130_Cal_Info.xlsx |
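
The rule for matching calibrations to deployments can be sketched in a few lines. This is an illustration of the selection logic as described in this section, not the code OOI or the tool actually runs. `select_calibration` is a hypothetical helper, the deployment start dates are invented for the example, and treating a calibration dated on the deployment start day as valid is an assumption.

```python
from datetime import date


def select_calibration(cal_dates, deploy_start):
    """Return the latest calibration date on or before deploy_start, or None."""
    # Assumption: a calibration dated exactly on the deployment start still counts
    candidates = [d for d in cal_dates if d <= deploy_start]
    return max(candidates) if candidates else None


# Calibration dates that appear for uid CGINS-CTDBPC-50109 in this notebook
cal_dates = [date(2015, 4, 3), date(2022, 1, 30)]

# Hypothetical deployment start dates, chosen only for illustration
late_pick = select_calibration(cal_dates, date(2022, 4, 1))
early_pick = select_calibration(cal_dates, date(2016, 1, 1))
no_pick = select_calibration(cal_dates, date(2014, 1, 1))

print(late_pick, early_pick, no_pick)
```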


It is also possible to request the calibration history for a specific instrument by utilizing the **uid** of the instrument. The cell below does this with ```get_calibrations_by_uid```; alternatively, the lower-level ```_get_api``` method and ```OOINet.URLS``` attribute can be used to construct your own request.

In [11]:
uid = "CGINS-CTDBPC-50109" # This is unique to each instrument
instrument_cals = M2M.get_calibrations_by_uid(uid)
instrument_cals

|   | uid | calCoef | calDate | value | calFile |
|---|---|---|---|---|---|
| 77 | CGINS-CTDBPC-50109 | CC_a0 | 2015-04-03 00:00:00 | 0.001245 | CGINS-CTDBPC-50109__20150403_Cal_Info.xlsx |
| 88 | CGINS-CTDBPC-50109 | CC_a1 | 2015-04-03 00:00:00 | 0.000278 | CGINS-CTDBPC-50109__20150403_Cal_Info.xlsx |
| 91 | CGINS-CTDBPC-50109 | CC_a2 | 2015-04-03 00:00:00 | -0.000001 | CGINS-CTDBPC-50109__20150403_Cal_Info.xlsx |
| 102 | CGINS-CTDBPC-50109 | CC_a3 | 2015-04-03 00:00:00 | 0.0 | CGINS-CTDBPC-50109__20150403_Cal_Info.xlsx |
| 115 | CGINS-CTDBPC-50109 | CC_cpcor | 2015-04-03 00:00:00 | -0.0 | CGINS-CTDBPC-50109__20150403_Cal_Info.xlsx |
| ... | ... | ... | ... | ... | ... |
| 59 | CGINS-CTDBPC-50109 | CC_ptcb1 | 2022-01-30 00:00:00 | 0.0003 | CGINS-CTDBPC-50109__20220130_Cal_Info.xlsx |
| 45 | CGINS-CTDBPC-50109 | CC_ptcb2 | 2022-01-30 00:00:00 | 0.0 | CGINS-CTDBPC-50109__20220130_Cal_Info.xlsx |
| 113 | CGINS-CTDBPC-50109 | CC_ptempa0 | 2022-01-30 00:00:00 | -53.88587 | CGINS-CTDBPC-50109__20220130_Cal_Info.xlsx |
| 100 | CGINS-CTDBPC-50109 | CC_ptempa1 | 2022-01-30 00:00:00 | 55.17642 | CGINS-CTDBPC-50109__20220130_Cal_Info.xlsx |


---
## Download Datasets
The ultimate goal of the queries above was to identify which data stream(s) we are interested in downloading. Now we want to request those data streams and get the associated netCDF files. This process involves the following steps:
1. Identify the methods and data streams for the selected reference designator
2. Request the THREDDS server url for the data sets
3. Get the catalog of datasets on the THREDDS server
4. Parse the catalog for the desired netCDF files
5. Download the identified netCDF files to a local directory

**1.** Get the methods and data streams associated with the given reference designator:

In [12]:
streams = M2M.get_datastreams(refdes)
streams

|   | refdes | method | stream |
|---|---|---|---|
| 0 | CP01CNSM-RID27-03-CTDBPC000 | recovered_host | ctdbp_cdef_dcl_instrument_recovered |
| 1 | CP01CNSM-RID27-03-CTDBPC000 | recovered_inst | ctdbp_cdef_instrument_recovered |
| 2 | CP01CNSM-RID27-03-CTDBPC000 | telemetered | ctdbp_cdef_dcl_instrument |


**2.** Now, we request the THREDDS server url from OOINet. At a minimum, this requires the reference designator, method, and stream as inputs. This will request the datasets for _all_ deployments.

If we want to further limit the request to a specific deployment or a specific time period, we can do that by passing the arguments **beginDT** (begin datetime) and **endDT** (end datetime). 

Additionally, we can input some optional arguments that will return diagnostic information. The **include_provenance** will return a separate text file with information on the provenance of the data, such as the calibration coefficients applied. The **include_annotations** returns a separate text file of annotations, which are descriptions of issues and information associated with the given dataset.
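
The notebook does not spell out the format expected for **beginDT** and **endDT**. The sketch below assumes they take the same ISO-style UTC strings that appear in the beginTime and endTime columns of the metadata (for example `2013-11-21T18:16:05.230Z`); check the tool's documentation before relying on this. `to_ooi_timestamp` is a hypothetical helper.

```python
from datetime import datetime, timezone


def to_ooi_timestamp(dt):
    """Format a datetime like the beginTime/endTime strings in the metadata."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are already in UTC
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    # strftime has no millisecond directive, so build that part by hand
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


# Start and end of deployment 5, taken from the deployment table
begin_dt = to_ooi_timestamp(datetime(2016, 5, 13, 13, 50))
end_dt = to_ooi_timestamp(datetime(2016, 10, 13, 19, 34))

print(begin_dt)
print(end_dt)
```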

In [13]:
method = "recovered_inst"
stream = "ctdbp_cdef_instrument_recovered"

In [14]:
thredds_url = M2M.get_thredds_url(refdes, method, stream, goldCopy=True)
thredds_url

'https://thredds.dataexplorer.oceanobservatories.org/thredds/catalog/ooigoldcopy/public/CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/catalog.html'

**3.** With the appropriate THREDDS url, we can query the THREDDS catalog to get the netCDF datasets. The catalog may include empty datasets (this is common for some instruments), which are removed in the next step:

In [15]:
catalog = M2M.get_thredds_catalog(thredds_url)
catalog

['catalog.html?dataset=ooigoldcopy/public/CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0001_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20131121T181601-20140217T132711.nc',
 'catalog.html?dataset=ooigoldcopy/public/CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0004_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20151023T191528-20160402T034848.nc',
 'catalog.html?dataset=ooigoldcopy/public/CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0005_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20160513T135001-20161013T193001.nc',
 'catalog.html?dataset=ooigoldcopy/public/CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0006_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20161013T183731-20161122T190001.nc',
 'catalog.html?dataset=o

**4.** Next, we want to clean up the THREDDS catalog by removing unwanted or empty datasets. Depending on the instrument, a dataset file may be generated even if there was no data in the time period of the file.

In [19]:
catalog = M2M.clean_catalog(catalog, stream, deployments)
catalog

['catalog.html?dataset=ooigoldcopy/public/CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0001_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20131121T181601-20140217T132711.nc',
 'catalog.html?dataset=ooigoldcopy/public/CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0004_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20151023T191528-20160402T034848.nc',
 'catalog.html?dataset=ooigoldcopy/public/CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0005_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20160513T135001-20161013T193001.nc',
 'catalog.html?dataset=ooigoldcopy/public/CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0006_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20161013T183731-20161122T190001.nc',
 'catalog.html?dataset=o
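
Each catalog entry encodes the deployment number and the time span of the file in its name, which is the kind of information a cleaning step could filter on. The sketch below parses those fields from an entry. It is a hypothetical illustration based on the file names shown above, not the implementation of `clean_catalog`.

```python
import re
from datetime import datetime

# Matches e.g. "deployment0004_<...>_20151023T191528-20160402T034848.nc"
FILENAME_PATTERN = re.compile(
    r"deployment(?P<deployment>\d{4})_.*_"
    r"(?P<start>\d{8}T\d{6})-(?P<end>\d{8}T\d{6})\.nc$"
)


def parse_catalog_entry(entry):
    """Extract deployment number and time range from a catalog entry, or None."""
    match = FILENAME_PATTERN.search(entry)
    if match is None:
        return None
    fmt = "%Y%m%dT%H%M%S"
    return {
        "deployment": int(match["deployment"]),
        "start": datetime.strptime(match["start"], fmt),
        "end": datetime.strptime(match["end"], fmt),
    }


entry = (
    "catalog.html?dataset=ooigoldcopy/public/"
    "CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/"
    "deployment0004_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-"
    "ctdbp_cdef_instrument_recovered_20151023T191528-20160402T034848.nc"
)
info = parse_catalog_entry(entry)
print(info)
```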

**5.** Finally, we can download the netCDF files to a specified save directory (**saveDir**). The available function utilizes multithreading to speed up I/O operations. 

In [20]:
saveDir = "/home/areed/Documents/OOI/reedan88/ooinet/examples/data/"
M2M.download_netCDF_files(catalog, goldCopy=True, saveDir=saveDir)

Downloading https://thredds.dataexplorer.oceanobservatories.org/thredds/fileServer/ooigoldcopy/public/CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0005_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20160513T135001-20161013T193001.nc to /home/areed/Documents/OOI/reedan88/ooinet/examples/data//deployment0005_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20160513T135001-20161013T193001.nc
Downloading https://thredds.dataexplorer.oceanobservatories.org/thredds/fileServer/ooigoldcopy/public/CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/deployment0001_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20131121T181601-20140217T132711.nc to /home/areed/Documents/OOI/reedan88/ooinet/examples/data//deployment0001_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered_20131121T181601-20140217T132711.nc
Downloading https://thredds.
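
Judging from the download log, each `catalog.html?dataset=...` entry maps onto a URL under the THREDDS `fileServer` path. The sketch below reproduces that mapping and builds a local path with `os.path.join`, which avoids the doubled slash seen in the log. `catalog_entry_to_url` is a hypothetical helper inferred from the output, not the tool's own code, and `/tmp/data/` is a placeholder directory.

```python
import os

# Base URL as it appears in the download log
FILE_SERVER = "https://thredds.dataexplorer.oceanobservatories.org/thredds/fileServer/"
CATALOG_PREFIX = "catalog.html?dataset="


def catalog_entry_to_url(entry, base=FILE_SERVER):
    """Build a fileServer download URL from a 'catalog.html?dataset=...' entry."""
    if not entry.startswith(CATALOG_PREFIX):
        raise ValueError(f"Not a catalog entry: {entry!r}")
    return base + entry[len(CATALOG_PREFIX):]


entry = (
    "catalog.html?dataset=ooigoldcopy/public/"
    "CP01CNSM-RID27-03-CTDBPC000-recovered_inst-ctdbp_cdef_instrument_recovered/"
    "deployment0005_CP01CNSM-RID27-03-CTDBPC000-recovered_inst-"
    "ctdbp_cdef_instrument_recovered_20160513T135001-20161013T193001.nc"
)
url = catalog_entry_to_url(entry)

# os.path.join copes with a trailing separator on the directory
filename = url.rsplit("/", 1)[-1]
local_path = os.path.join("/tmp/data/", filename)

print(url)
print(local_path)
```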

---
## Open netCDF Datasets
What if, instead of downloading the datasets to a local directory, we wanted to load those datasets into a single dataset remotely? This can be done using the ```load_netCDF_datasets``` method, which takes in the THREDDS catalog queried above and utilizes ```xarray.open_mfdataset``` to load the datasets into a single ```xarray.Dataset```. Additionally, ```load_netCDF_datasets``` checks each file to make sure it is properly formed by identifying empty or malformed datasets.

However, there is a complication when opening multiple netCDF files as a single dataset. OOI makes use of overlapping deployments, such that the CTD for deployment 11 goes into the water and starts collecting data _before_ the CTD from deployment 10 comes out of the water. Unfortunately, ```xarray.open_mfdataset``` disallows overlapping primary dimensions. Consequently, the ```load_netCDF_datasets``` method utilizes a preprocessing routine to trim datasets to avoid overlapping times. In the above example, the trimming would cut off the end of the CTD record for deployment 10 in favor of keeping the start of the CTD record from deployment 11.
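
The trimming idea can be sketched with plain Python lists standing in for the time coordinate of each file. This is not the preprocessing routine used by ```load_netCDF_datasets```. `trim_overlaps` is a hypothetical helper, the timestamps are made-up numbers, and each record is assumed to be non-empty and sorted.

```python
def trim_overlaps(records):
    """Trim each record so that it ends before the next record begins.

    `records` is a list of (deployment, times) pairs; each `times` list is
    assumed to be non-empty and sorted in increasing order.
    """
    ordered = sorted(records, key=lambda rec: rec[1][0])  # order by first timestamp
    trimmed = []
    for (deployment, times), nxt in zip(ordered, ordered[1:] + [None]):
        if nxt is not None:
            next_start = nxt[1][0]
            # Drop the tail of this record in favor of the start of the next one
            times = [t for t in times if t < next_start]
        trimmed.append((deployment, times))
    return trimmed


# Made-up timestamps: deployment 11 starts (t=25) before deployment 10 ends (t=40)
records = [(10, [0, 10, 20, 30, 40]), (11, [25, 35, 45])]
trimmed = trim_overlaps(records)

# After trimming, the combined time axis is strictly increasing
merged = [t for _, times in trimmed for t in times]
print(trimmed)
print(merged)
```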

If you desire to retain the overlapping time periods, this may be accomplished by concatenating the datasets. Beware: concatenation requires loading _all_ of the data into memory and cannot take advantage of the built-in dask parallel processing, which significantly slows down dataset loading. For large datasets (such as profilers), you are likely to run out of memory before the dataset is fully concatenated.