## A more detailed overview of this notebook

This notebook began as a comparison of chlorophyll measured near the
surface by a profiler nine times per day with surface chlorophyll observed by the 
MODIS satellite instrument once every eight days. Its scope has since expanded considerably. 


One such expansion is the consideration of other data sources. We have, for example, 
a snapshot of the global ocean called GLODAP. After inspecting that on a global
scale we turn to a comparison of vertical profiles through the water column, 
specifically salinity and temperature. We want to compare GLODAP profiles, as somewhat 
*static* snapshots, with ongoing profile measurements from ARGO drifters.


The Regional Cabled Array (RCA)
is an observatory stretching across the sea floor from the coast of Oregon 500 km out to
Axial Seamount. This observatory includes two types of profilers that rise and fall through
the water column: deep profilers that ascend from the sea floor, and shallow profilers 
that rest on platforms at 200 meters depth and ascend to within a few meters of the surface.


We begin the RCA work focused on the shallow profiler as this is where the highest
concentration of chlorophyll is found.


Some terminology used in the rest of this notebook:

* Regional Cabled Array (RCA): A cabled observatory on the sea floor off the coast of Oregon
* Site: A location in the RCA
* Platform: A mechanical structure -- static or mobile -- that resides at a site.
* Instrument: An electronic device carrying one or more sensors
* Sensor: A device that measures some aspect of the ocean like pH or temperature
* Stream: A stream of data produced by a sensor as part of an instrument located on a platform at a site in the RCA


This notebook describes a Python package called **yodapy** used to obtain stream data.


Here we use the traditional data path model:


* search for data
* order data
* download data
* analyze data


We prefer a newer approach where data are already in place on the public cloud and the model is

* analyze data


Since that is our end goal, some of the data for this project will be set
in place in advance (not done yet as of 3-20). 


Back to our process here: Once the data are in place we say that **yodapy** has finished its task.
We then turn to analysis using Python and particularly **XArray**. 

## Purpose of the **yodapy** Python package

`yodapy` is a contraction of **Y**our **O**cean **DA**ta **PY**thon library. It was written 
by Don Setiawan to facilitate working with **OOI** data in general (not just profiler data).


Before `yodapy` was written the process of finding and ordering data for OOI 
was a bit *involved*.  `yodapy` was developed to make this process more 
*programmable* and to provide search capability
without having to know precise terms like 
`RS01SBPS-SF01A-3D-SPKIRA101-streamed-spkir_data_record`.  
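To give a feel for what such a term encodes, here is a minimal sketch that splits the example above into its hyphen-separated fields. The field names (`site`, `node`, `port`, `instrument`, `method`, `stream`) are our assumption about the OOI naming convention, not something `yodapy` defines, and the helper `parse_stream_id` is hypothetical.

```python
from typing import NamedTuple


class StreamId(NamedTuple):
    """Fields of a full stream identifier (field names are illustrative assumptions)."""
    site: str
    node: str
    port: str
    instrument: str
    method: str
    stream: str


def parse_stream_id(identifier: str) -> StreamId:
    # Split on the first five hyphens only, so a hyphen in the trailing
    # stream name (if one ever occurred) would be left intact.
    parts = identifier.split("-", 5)
    if len(parts) != 6:
        raise ValueError(f"expected 6 hyphen-separated fields, got {len(parts)}")
    return StreamId(*parts)


example = parse_stream_id("RS01SBPS-SF01A-3D-SPKIRA101-streamed-spkir_data_record")
print(example)
```

Having to assemble all six fields by hand for every sensor of interest is the burden that a search capability removes.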



This notebook uses `yodapy` to search for, identify, order and download data, all in Python. 
A future extension of `yodapy` will make this process even simpler, referencing data that
are already in place on the public cloud. Rather than order and download data you simply
start working with them using the `XArray` idiom. 

<BR>

> **Takeaway 1: This notebook reviews `yodapy` specific to Regional Cabled Array (RCA) 
data but the pattern of use is relevant to data from other OOI segments.**

<BR>

> **Takeaway 2: A heads-up on authenticating: The OOI system requires you to *authenticate* your identity.
You do this by registering your email at their website. This is unrestricted, there is
no cost and it only takes a couple of minutes. `yodapy` helps you manage your resulting
credentials so once this is set up you are authenticated automatically.**

## Notebook features

Come back and re-write this (and never index your own book)

### Section on GLODAP and ARGO

### Regional Cabled Array and MODIS

- OOI / RCA data orders with `yodapy`
- working with `xarray` `DataArrays` and `Datasets` 
- plotting with `matplotlib`
  - line and scatter plots, multiple y-axes, labels, marker type and size
  - profiler curtain plots: time - depth - chlorophyll (as color) 
  - animation of time series data
  - interactivity
  - color bars
  - intrinsic plotting from DataArrays (with modifiers)

### Ordering and pulling ARGO data from the Coriolis system

## Data management

This Jupyter notebook resides in 
a sub-directory of the user's home directory `~`. It is bundled 
as an open source 
[GitHub repository](https://github.com/robfatland/chlorophyll)
(abbreviated 'repo') managed with the
`git` utility. 
The repo is not intended for large data volumes.  


The data must reside *elsewhere* in the 
working environment, i.e. not within the repo directory.
I use `~/data/` with sub-directories to organize data content outside
of the `~/chlorophyll` directory. 
Each data source (MODIS, GLODAP, ARGO, RCA, ...) gets a dedicated sub-directory in `~/data`.


`xarray` has a wildcard multi-file open utility: `xr.open_mfdataset("Identifier*.nc")`.
This maps multiple NetCDF files to a single Dataset. 
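The layout and the wildcard can be sketched with the standard library alone. In this sketch a temporary directory stands in for `~/data`, the lower-case sub-directory names are an assumption, and the empty `Identifier_*.nc` files are placeholders rather than real NetCDF files, so the `open_mfdataset` call itself appears only as a comment.

```python
from pathlib import Path
import tempfile


def data_dir(source: str, root: Path) -> Path:
    """Return (and create if needed) the sub-directory for one data source."""
    d = root / source
    d.mkdir(parents=True, exist_ok=True)
    return d


# A temporary directory stands in for ~/data so no real files are touched.
root = Path(tempfile.mkdtemp()) / "data"

# One dedicated sub-directory per data source (names are assumptions).
for source in ("modis", "glodap", "argo", "rca"):
    data_dir(source, root)

# Empty placeholder files standing in for a multi-file download.
for i in range(3):
    (root / "rca" / f"Identifier_{i}.nc").touch()
(root / "rca" / "notes.txt").touch()

# The wildcard pattern, and the file names it matches.
pattern = str(root / "rca" / "Identifier*.nc")
matches = sorted(p.name for p in (root / "rca").glob("Identifier*.nc"))
print(matches)

# With real NetCDF files in place, the same pattern would be handed to xarray:
#   ds = xr.open_mfdataset(pattern)
```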


The RCA data are ordered using a less convenient dimension, namely
observation number `obs`. This is just an ordinal integer 1, 2, 3, ...
The code in this notebook modifies this to use dimension `time`.
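A minimal sketch of that dimension change, using a small synthetic Dataset rather than real RCA data. The variable name `chlorophyll`, its values and the hourly timestamps are invented for illustration. `swap_dims` is one `xarray` method that does this; the code elsewhere in this notebook may take a different route.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for an ordered dataset: everything is indexed by `obs`.
obs = np.arange(1, 5)
time = np.array(
    ["2019-01-01T00:00", "2019-01-01T01:00", "2019-01-01T02:00", "2019-01-01T03:00"],
    dtype="datetime64[ns]",
)
ds = xr.Dataset(
    data_vars={"chlorophyll": ("obs", np.array([0.1, 0.2, 0.3, 0.4]))},
    coords={"obs": obs, "time": ("obs", time)},
)

# Make `time` the dimension; `obs` stays on as an ordinary coordinate.
ds_time = ds.swap_dims({"obs": "time"})

# With `time` as the dimension, selection by timestamp becomes possible.
subset = ds_time.sel(time=slice("2019-01-01T01:00", "2019-01-01T02:00"))
print(ds_time)
```

Selecting by time range, as in the last step, is the practical reason for preferring `time` over `obs`.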

## Obtain Regional Cabled Array data using `yodapy`

As noted above the `yodapy` library enables Python-based access to OOI data. In this case we will focus
on the Regional Cabled Array (RCA) and particularly on the shallow profiler found at the site 
**Oregon Slope Base**. This site is at the base of the continental slope in about 3000 meters of water.
The shallow profiler rises and falls nine times per day through the upper 200 meters of the water column.


### OOI data access back-story


Ordering data from **OOI** requires you to pre-register (free, using your email address). Registration provides 
the credentials you supply when placing a data order. Orders typically take a few minutes for the OOI
servers to assemble, after which you receive an email with a download link. You then download the data to local storage,
read the files into memory and proceed from there: a very labor-intensive process.


### How `yodapy` helps


[`yodapy`](http://github.com/cormorack/yodapy) helps you automate OOI data access at each step. 
It sets up a credentials directory within your home directory, 
which helps you avoid accidentally pushing your credentials to GitHub where they would be public. `yodapy` 
lets you create an `OOI()` object that includes methods for finding sensor data of interest, 
for ordering time-bounded datasets for those sensors, for downloading these data, and for attaching them to a data 
structure (an `xarray` `Dataset`) for further analysis. Once your data are present as a 
`Dataset`, `yodapy` has completed its job. 


The next cell installs `yodapy`. Run this each time you start up this notebook server unless your installation
of the `yodapy` library persists. 


### Getting OOI credentials


To get data from OOI you first create a User account as follows:


- Visit the [OOI website](https://ooinet.oceanobservatories.org/#)
- On the login menu (upper right) select **Register**
- Fill out the New User Registration Form
- Once you have your login credentials: Log in
- The 'Login' menu at the upper right should now show your User name; it is also a dropdown menu
  - Use this menu to select User Profile
- At the bottom of your User Profile page you should find **API Username** and **API Token**
  - These two strings comprise your authentication 
  - Keep them somewhere safe
  - Notice that the **Refresh API Token** button permits you to regenerate them whenever you like


Use your OOI API Token with `yodapy` as described further down to automate your authentication process.
If this works as intended you can safely use OOI and not have to worry about cutting and pasting these
token strings every time you want to get data access.

## Install `yodapy` if needed

In [None]:
# Ensure that the latest build of yodapy is installed directly from GitHub using pip
!pip install git+https://github.com/cormorack/yodapy.git -q     # -q cuts the stdout clutter

# this line of code verifies yodapy is installed
from yodapy.utils.creds import set_credentials_file
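If you would rather find out whether the install step is needed before running it, the standard library can look for a package without importing it. This is a sketch only; the helper name `is_installed` is hypothetical and not part of `yodapy`.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """True if a top-level package can be located, without importing it."""
    return importlib.util.find_spec(package) is not None


needs_install = not is_installed("yodapy")
print("yodapy needs installing:", needs_install)
```

When `needs_install` is `False` the `pip install` line can be skipped for this session.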

## One time only: Configure OOI credentials using `yodapy`

Only the first time through here: Carefully follow the instructions in the Python cell below.
You are (temporarily) telling `yodapy` what your `OOI username` and `token` are. 
`yodapy` creates a hard-to-notice sub-directory of your home directory
that contains these credentials in a text file. As long as you are not publishing
your home directory someplace public your credentials will be hidden away.


#### 'Why am I doing this *credentials* business?'


When you use `yodapy` to order data from OOI it will use this 'hidden away' copy
of your credentials to convince OOI your order is legitimate.  
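Because the cell that follows ships with placeholder strings, it is easy to run it before substituting real values. The sketch below is a rough heuristic for catching that; the function `looks_unfilled` is hypothetical, and a genuine token that happened to contain four consecutive X characters would be flagged wrongly.

```python
def looks_unfilled(username: str, token: str) -> bool:
    """Heuristic: True if either string is blank or still contains a run of X's."""
    return any((not s.strip()) or ("XXXX" in s) for s in (username, token))


username = "OOIAPI-XXXXXXXXXXXXXX"   # placeholder as it appears in this notebook
token = "XXXXXXXXXXXX"               # placeholder as it appears in this notebook

if looks_unfilled(username, token):
    print("Placeholders detected: not writing credentials.")
else:
    print("Strings look filled in; set_credentials_file(...) could be called here.")
```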

In [2]:
# Run the next line of code to create authentication credentials for the OOI data system. Do this
# by **carefully** substituting your actual credentials in the username and token strings
# in this line of code:


if False: 
    
    set_credentials_file(data_source='ooi', username='OOIAPI-XXXXXXXXXXXXXX', token='XXXXXXXXXXXX')


# Change `if False:` to `if True:` and run the cell once so that the one line above executes.
# Once it runs: Change it back to `if False:` and delete your credentials. You can obscure them with XXXXX as they are seen now.
# After you obscure your credentials: Be sure not to run this code again with the guard enabled, as it would
#   overwrite your stored authentication info.
#
# You can verify this worked by examining the .credentials file in ~/.yodapy. The credentials should match. Notice that 
#   this (slightly hidden) directory is directly connected to your home directory; whereas this IPython notebook 
#   is presumably in a distinct directory; so there should be no chance of a GitHub push sending your 
#   credentials to GitHub. 
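The separation described in those comments can also be checked in code. This sketch assumes the credentials file is `~/.yodapy/.credentials` and the repo is at `~/chlorophyll`, the locations named in this notebook; the helper `is_inside` is hypothetical. It inspects paths only and never reads the credentials themselves.

```python
from pathlib import Path


def is_inside(path: Path, directory: Path) -> bool:
    """True if `path` is `directory` itself or lies somewhere beneath it."""
    path, directory = path.resolve(), directory.resolve()
    return path == directory or directory in path.parents


home = Path.home()
creds = home / ".yodapy" / ".credentials"   # location named in this notebook
repo = home / "chlorophyll"                 # repo location used in this notebook

print("credentials file exists:", creds.exists())
print("credentials inside repo:", is_inside(creds, repo))
```

If your repo lives somewhere else, adjust `repo` accordingly; the second line should print `False`.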

## Regional Cabled Array data for 2019

* 3 sites: OSB, AXB, OOE for Oregon Slope Base, Axial Base, Oregon Offshore Endurance
* 3 Platforms: Shallow and Deep profilers plus shallow platform (fixed at 200m depth)
* Large collection of instruments, each with one or more sensors
  * CTD + Dissolved Oxygen
  * PAR, Spectral Irradiance, Spectrophotometer (attenuation / absorbance), Fluorometers 
  * Nitrate, pH, pCO2
  * Ocean velocity measurement

## Initialize the `OOI()` object