# Introduction




These notebooks tell *data interpretation* stories. 


While data interpretation is the main focus, the mechanics of bringing the 
data into view are present here as well. Python code is relegated to modules
with names like `NotebookModule.py`. This streamlines the notebooks for reading 
while keeping (hopefully instructive) code close by.


The data stories start with the "shallow profiler". From there they expand further
into the OOI sensor diaspora and then introduce datasets from other programs,
particularly ARGO, MODIS and the ROMS model.


The initial focus is the cabled array shallow profiler (a link is still to be added here). 
A shallow profiler is really 
two connected entities: The **profiler** and its supporting **platform**. 



## Science questions


The following questions and fragmentary 'answers' are jotted down to begin framing
data interpretation stories. Suggestion: Read them as if -- to your sudden astonishment --
you found yourself able to hear the thoughts of an oceanographer riding the same bus as you.



If we start at the ocean surface at noon and swim down 200 meters: What happens
to the sunlight? Answer: It dwindles to nothing; very dark.



How deep is the ocean? 3700 meters on average.



Does this mean that 35/37ths of the **entire ocean** is just *dark* all the time? Answer: Yes.
Unless one brings a flashlight... or if there are bioluminescent fish... or an erupting volcano happens
to be nearby. Mostly it is dark.



What about temperature? Answer: It gets colder as you go down... but never so cold that it freezes.



What about salinity? Answer: The water gets slightly saltier as you go down.



What about oxygen dissolved in the water? Answer: Hmmmmm... not sure.



What about carbon dioxide? Answer: Not sure. I think this is complicated by 
what happens to carbon dioxide when it dissolves into sea water. (This is called
*carbonate chemistry*.)



What about chlorophyll? Where is that found? Answer: Solar-powered algae are the ocean's version
of plants; and just as on land they need sunlight for energy. So they are going to prefer to
live above 200 meters depth, where there is light. So as a *guess*: Chlorophyll 
is at a maximum near the surface. (Algae use chlorophyll to capture light energy, which they then store in the form of sugar.)



What about particulate backscatter? Fluorescent organic molecules? pH? Nitrates? Isn't iron important?
Answer: I'd like to buy another vowel. 



How do these "upper 200 meters" profiles of the ocean change over the course of a single day?
How do they change over a month? A season? A year? Seven years?



How do these profiles change on the continental shelf? At the subduction boundary? Further offshore in the deep ocean? 



Rather than diving and rising at a single location: If we use an autonomous glider to run transect profiles
back and forth: Does this not create a kind of "wall of data"?  How would such a data wall change our picture of the 
water column? 



How can satellite data expand the view we have of surface waters to more of a map-like representation?



Can surface water maps of temperature or color be augmented to include maps of variations in sea surface height?
Answer: Yes. Question: What do those tell us?



Can we extend the characterization of the water column downward below 200 meters? Can we extend it to other locations?
Answer: Yes, particularly by means of an observation program called ARGO. 



Can we explain variations in surface salinity in terms of nearby estuary discharge and coastal marine currents?



This and related questions about surface water conditions suggest an entire field of study: How does 
the surface of the ocean interact with the bottom layer of Earth's atmosphere?



Supposing we can identify freshwater pulses offshore in surface layers: Do those freshwater masses have 
distinctive signatures indicating their history of origination on dry land? (Such water is called 'terrigenous'.)



Can we identify upwelling events along continental margins? 



Can we compare upwelling signals at a given location to terrigenous signals? 



Is it possible to identify and characterize smaller ('dissolved') organic carbon complexes in relation to
plankton ecosystems?




## Context of OOI



### What is OOI?


OOI stands for Ocean Observatories Initiative. It is a collection of **arrays**,
localized collections of sensors and supporting infrastructure. A single array 
spans an area of typically several thousand square kilometers.
There are seven of these arrays in total in OOI: Five in the 
northern hemisphere and two in the southern ocean. The southern ocean arrays operated and gathered data for 
a few years starting in 2014; but those sensor deployments have been discontinued. The northern hemisphere arrays are 
called (in no particular order) the Regional Cabled Array,
Global Station Papa Array, Coastal Endurance Array, Coastal Pioneer Array and Global Irminger Sea Array.
Each has its own location, history and relevance to global ocean research.



### What is the agenda of this repository?


Demonstrate data interpretation; first as a **technical process** and
second in terms of ocean science. 



### What are some technical details to watch out for?


#### Open and subset a NetCDF data file via the `xarray Dataset`  


Data provided by OOI tends to be "not ready for use". There are several steps needed; and
these are not automated. They require some interactive thought and refinement. 


- Convert the principal dimension from `obs` or `row` to `time` 
    - `obs/row` are generic terms with values running 1, 2, 3... (hinders combining files into longer time series)
- Re-name certain data variables for easier use; and delete anything that is not of interest
- Identify the time range of interest
- Write a specific subset file
    - For example: Subset files that are small can live within the repo


```
import xarray as xr

# This code runs 'one line at a time' (not as a block) to iteratively streamline the data

#   Suggestion: Pay particular attention to the construct ds = ds.some_operation(). This ensures 
#     that the results of some_operation() are retained in the new version of the Dataset. 

ds = xr.open_dataset(filename)
ds                                         # notice the output will show dimension as "row" and "time" as a data variable


ds = ds.swap_dims({'row': 'time'})         # moves 'time' into the dimension slot
ds = ds.rename({'some_ridiculously_long_data_variable_name':'temperature'})
ds = ds.drop_vars('some_data_variable_that_has_no_interest_at_this_point')    # drop_vars() replaces the deprecated drop()


ds = ds.dropna('time')                     # if any data variable value == 'NaN' this entry is deleted: Includes all
                                           #   corresponding data variable values, corresponding coordinates and 
                                           #   the corresponding dimension value. This enables plotting data such
                                           #   as pH that happens to be rife with NaNs. 

ds.z.plot()                                # this produces a simple chart showing gaps in the data record
ds.somedata.mean()                         # prints the mean of the given data variable

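# dt64_from_doy(year, day_of_year) is a helper that is not defined in this snippet; it is assumed to return a numpy datetime64
ta0 = dt64_from_doy(2021, 60)              # these time boundaries are set iteratively...
ta1 = dt64_from_doy(2021, 91)              #   ...to focus in on a particular time range with known data...
ds.sel(time=slice(ta0,  ta1)).z.plot()     #   ...where this plot is the proof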


ds.sel(time=slice(ta0,  ta1)).to_netcdf(outputfile)           # writes a time-bounded data subset to a new NetCDF file
```
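
The snippet above calls `dt64_from_doy()` without defining it. Here is a minimal sketch of what such a helper might look like, assuming it takes a year and a day of year (1 = January 1) and returns a `numpy.datetime64`. The version in this repository's modules may well differ.

```python
import numpy as np

def dt64_from_doy(year, doy):
    """Hypothetical helper: convert (year, day of year) to a numpy datetime64 at day resolution."""
    # Day 1 is January 1, so add (doy - 1) days to the start of the year.
    return np.datetime64(f'{year:04d}-01-01') + np.timedelta64(doy - 1, 'D')

ta0 = dt64_from_doy(2021, 60)    # 2021-03-01 (2021 is not a leap year)
ta1 = dt64_from_doy(2021, 91)    # 2021-04-01
```

With these bounds `slice(ta0, ta1)` covers March 2021. Label-based slices in xarray include both endpoints.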

### Depth and time


Datasets have a depth variable `z` and a time dimension `time`. These are derived by the data 
system and permit showing sensor values (like temperature) either in terms of depth below the 
surface; or in time relative to some benchmark. 
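
The two views can be sketched on a tiny synthetic Dataset. The values are made up, and treating `z` as negative downward is an assumption rather than something stated here.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for part of one ascent: five samples, one per minute.
# Values are made up; z is taken as negative downward (an assumption).
time = pd.date_range('2021-03-01', periods=5, freq='min')
ds = xr.Dataset(
    data_vars={
        'z': ('time', np.array([-200.0, -150.0, -100.0, -50.0, -10.0])),
        'temperature': ('time', np.array([7.1, 7.6, 8.4, 9.9, 11.2])),
    },
    coords={'time': time},
)

# Time view: temperature for the first three samples (label slices include both ends)
by_time = ds.temperature.sel(time=slice(time[0], time[2]))

# Depth view: make z the dimension so values can be looked up by depth
by_depth = ds.swap_dims({'time': 'z'})
temp_at_100m = float(by_depth.temperature.sel(z=-100.0))
```

Swapping the dimension to `z` works cleanly here because depth changes monotonically during a single ascent. A longer record with repeated depths would need to be split into individual profiles first.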

### Some complicating data features


- Some signals may have dropouts: Missing data is usually flagged as `NaN`
    - See the section above on using the xarray `.dropna(dimension)` feature to clean this up
- Nitrate data also include ***dark sample*** data
- Spectrophotometer instruments measure both ***optical absorption*** and ***beam attenuation***
    - For both of these about 82 individual channel values are recorded
        - Each channel is centered at a unique wavelength in the visible spectrum
        - The wavelength channels are separated by about 4 nm
        - The data are noisy
        - Some channels contain no data
    - Sampling frequency needed
- Spectral irradiance carries seven channels (wavelengths) of data
- Current measurements give three axis results: north, east, up
    - ADCP details needed
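
As an illustration of the multi-channel case, here is a sketch of one way to set aside channels that contain no data and average over the noise in the rest. It uses a small made-up array (samples by channels) in place of real spectrophotometer data.

```python
import numpy as np

# Synthetic stand-in: 4 samples x 6 wavelength channels (made-up values)
rng = np.random.default_rng(0)
signal = 1.0 + 0.05 * rng.standard_normal((4, 6))   # noisy values around 1.0
signal[:, 2] = np.nan                                # a channel with no data at all
signal[1, 4] = np.nan                                # a single dropout

empty = np.isnan(signal).all(axis=0)                 # True for channels with no data
usable = signal[:, ~empty]                           # keep only channels with some data
channel_mean = np.nanmean(usable, axis=0)            # average over samples, ignoring NaNs
```

Removing the empty channels before calling `np.nanmean` avoids the all-NaN warning and keeps the result free of NaN values.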




### Data product levels


The 
[OOI Data Catalog Documentation](https://dataexplorer.oceanobservatories.org/help/overview.html#data-products) 
describes three levels of data product, summarized: 


* Level 1 ***Instrument deployment***: Unprocessed, parsed data parameter that is in instrument/sensor 
units and resolution. See note below defining a *deployment*. This is not data we are interested in using, as a rule.


* Level 1+ ***Full-instrument time series***: A join of recovered and telemetered 
streams for non-cabled instrument deployments. For high-resolution cabled and recovered data, this product is 
binned to 1-minute resolution to allow for efficient visualization and downloads for users that do not need 
the full-resolution, gold copy (Level 2) time series. We'd like to hold out for 'gold standard'.


* Level 2 ***Full-resolution, gold standard time series***: The calibrated full-resolution dataset 
(scientific units). L2 data have been processed, pre-built, and served 
from the OOI system to the 
[OOI Data Explorer](https://dataexplorer.oceanobservatories.org/)
and to users. The mechanisms are THREDDS and ERDDAP; the file format is
NetCDF-CF. There is one file for every instrument, stream, and deployment. For more, refer to this
[Data Download](https://dataexplorer.oceanobservatories.org/help/overview.html#download-data-map-overview) link.
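
To make the Level 1+ idea of 1-minute binning concrete, here is a sketch using synthetic 1 Hz data. How OOI actually performs the binning is not described here; averaging within each minute is an assumption.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in: three minutes of 1 Hz temperature samples (made-up values)
time = pd.Timestamp('2021-03-01') + pd.to_timedelta(np.arange(180), unit='s')
temperature = xr.DataArray(
    np.linspace(8.0, 9.0, 180), coords={'time': time}, dims='time', name='temperature'
)

# Bin to 1-minute resolution by averaging all samples that fall in each minute
binned = temperature.resample(time='1min').mean()
```

The 180 input samples reduce to three values, one per minute. This is the trade-off between the binned product and the full-resolution Level 2 series.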



## OOI terminology



- **instrument**: A physical device with one or more sensors.
- **stream**: Sensor data.
- **deployment**: The act of putting infrastructure in the water, or the length of 
time between a platform going in the water and being recovered and brought back to shore. There are 
multiple deployment files per instrument. 
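
Because each instrument has multiple deployment files, building a longer time series means joining them. Below is a sketch using two synthetic "deployments", each indexed by a generic `row` dimension. The function name and the values are illustrative only.

```python
import numpy as np
import pandas as pd
import xarray as xr

def make_deployment(start, n):
    """Build a synthetic deployment: n hourly samples indexed by a generic 'row' dimension."""
    time = pd.Timestamp(start) + pd.to_timedelta(np.arange(n), unit='h')
    return xr.Dataset({
        'time': ('row', time.values),
        'temperature': ('row', np.full(n, 8.0)),    # made-up constant values
    })

deployments = [make_deployment('2021-03-01', 3), make_deployment('2021-06-01', 3)]

# 'row' restarts in every file, so swap to 'time' before joining
by_time = [d.swap_dims({'row': 'time'}) for d in deployments]
combined = xr.concat(by_time, dim='time')
```

Once `time` is the dimension, the joined record keeps each sample's timestamp, and the gap between deployments stays visible in the data.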



## Shallow profiler overview



The deployments run 2015 to present; with intermittent interruptions due to servicing and *force majeure*. 
There are three shallow profilers in the cabled array:


1. 500m depth: Outer edge of the continental shelf off of central Oregon
2. 3100m depth: Further out at the base of the continental shelf off central Oregon
3. 2100m depth: At the Juan de Fuca plate boundary at the base of Axial Seamount



### Platform


The shallow profiler **platform** is tethered to
the sea floor by means of two long cables. It is positively buoyant and sits 200 meters below 
the surface. The platform has both power and a data connection back to shore. 


### Profiler


The shallow profiler **profiler** rests in a cradle on the **platform**. 
It is also positively buoyant. Under normal circumstances
this profiler is allowed to rise to near the surface (depth of approximately 10 meters) nine times each day. 
This is accomplished by means of a single cable on a winch. As the profiler ascends its "upward facing"
sensors acquire data. Once the profiler reaches the top of the profile it is winched back down again. 


Mean time in minutes for...

```
Ascent:    67
Descent:   45      (exception: local noon and midnight descents are about an hour longer)
Rest:      45
```


#### Shallow profiler + platform data considerations


Much of the data are acquired by instruments / sensors on a continuous basis. This means that 
data acquired on the *platform* can be compared to parked *profiler* data. (See related notebook.)
Other data are acquired only at particular parts of the profiling sequence.


The profiler
rises through and disturbs the water on ascent; and then is winched back down. Ascent data are
considered more pristine; although pH and pCO2 are unique in that they are recorded on *descent*.


The table below shows data available from the profiler and its 200m-depth retaining platform.
Pressure, density, salinity, temperature and depth are interrelated. In particular, pressure 
in decibars and depth in meters are very nearly the same. Charts of sensor value against depth 
effectively treat profiles as "instantaneous" snapshots of the upper water column. 
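
The near-equivalence of decibars and meters follows from the hydrostatic relation (pressure = density × gravity × depth). Here is a rough sketch, assuming a constant seawater density of 1025 kg/m^3 and pressure measured relative to the surface:

```python
# Hydrostatic approximation: pressure = density * gravity * depth
RHO_SEAWATER = 1025.0    # kg/m^3, a typical near-surface value (assumed constant)
GRAVITY = 9.81           # m/s^2
PA_PER_DBAR = 1.0e4      # 1 decibar = 10,000 pascals

def depth_from_pressure(pressure_dbar):
    """Approximate depth in meters from sea pressure in decibars (surface = 0)."""
    return pressure_dbar * PA_PER_DBAR / (RHO_SEAWATER * GRAVITY)

depth_at_platform = depth_from_pressure(200.0)    # roughly 199 meters
```

Under these assumptions the two numbers differ by less than one percent over the upper 200 meters. A full conversion also accounts for latitude and the compressibility of seawater.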


To help economize the code I have implemented short codes (one or two letters) for the various data types.


| Code       | Instrument / sensor             | Profiler | Platform |
|------------|---------------------------------|----------|----------|
| PA         | CTD / Pressure                  | Yes      | Yes      |
| K          | CTD / Density                   | Yes      | Yes      |
| Z          | CTD / Depth                     | Yes      | Yes      |
| T          | CTD / Temperature               | Yes      | Yes      |
| S          | CTD / Salinity                  | Yes      | Yes      |
| O          | CTD / Dissolved Oxygen          | Yes      | ?        |
| A          | Fluorometer / Chlorophyll-A     | Yes      | ?        |
| B          | Fluorometer / Backscatter       | Yes      | ?        |
| C          | Fluorometer / CDOM / FDOM       | Yes      | ?        |
| H          | pH                              | Yes      | No       |
| R          | pCO2                            | Yes      | ?        |
| N          | nitrate                         | Yes      | ?        |
| ND         | nitrate - dark counts           | Yes      | ?        |
| P          | PAR                             | Yes      | ?        |
| L          | Spectrophotometer: 83 channels  | Yes      | No       |
| I          | Spectral Irradiance: 7 channels | Yes      | ?        |
| UE, UN, UU | Current east / north / up       | Yes      | No       |
| DE, DN, DU | ADCP                            | No       | Yes      |