![](https://github.com/simonscmap/pycmap/blob/master/docs/figures/CMAP.png?raw=true)

*Mohammad Dehghani Ashkezari <mdehghan@uw.edu>* 

*Ginger Armbrust*

*Raphael Hagen*

*Michael Denholtz*

<a href="https://colab.research.google.com/github/simonscmap/pycmap/blob/master/docs/SCOPE2019.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a><a href="https://mybinder.org/v2/gh/simonscmap/pycmap/master?filepath=docs%2FSCOPE2019.ipynb"><img align="right" src="https://mybinder.org/badge_logo.svg" alt="Open in Binder" title="Open and Execute in Binder"></a>

<a class="anchor" id="toc"></a>

## Table of Contents:
* [Installation](#installation)
* [**Data Retrieval (selected methods)**](#dataRetrieval)
    * [API](#api) 
    * [Catalog](#catalog)
    * [Search Catalog](#searchCatalog)
    * [Cruise Trajectory](#cruiseTrajectory)
    * [Subset by Space-Time](#spaceTime)
    * [Colocalize Along Cruise Track](#matchCruise)
    * [Query](#query)

* [**Data Visualization**](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_data_vizualization.html)
    * [Histogram](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_histogram.html#histogram)
    * [Time Series](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_time_series.html#timeseries)
    * [Regional Map, Contour Plot, 3D Surface Plot](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_rm_cp_3d.html#rmcp3d)
    * [Section Map, Section Contour](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_section_map_contour.html#sectionmapcontour)
    * [Depth Profile](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_depth_profile.html#depthprofile)
    * [Cruise Track](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_cruise_track.html#cruisetrackplot)
    * [Correlation Matrix Along Cruise Track](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_correlation_matrix_cruise_track.html#corrmatrixcruise)
    
* [**Case Studies**](#caseStudy)
    * [Attach Environmental Parameters to the SeaFlow Observations](#caseStudy1)
    * [Inter-Annual Variability of Eddy Induced Temperature Anomaly](#caseStudy2)


<a class="anchor" id="dataRetrieval"></a>
<br/><br/><br/><br/><br/><br/><br/><br/>
<center>
<h1> API: Data Retrieval </h1>
</center>
<br/><br/><br/><br/><br/><br/><br/><br/>

In [None]:
# enable intellisense
%config IPCompleter.greedy=True

<a class="anchor" id="installation"></a> 
<a href="#toc" style="float: right;">Table of Contents</a>
## Installation
pycmap can be installed using *pip*: 
<br />`pip install pycmap`

To use pycmap, you will need to obtain an API key from the Simons CMAP website:
<a href="https://simonscmap.com">https://simonscmap.com</a>.

### Note:
You may install pycmap on cloud-based Jupyter notebooks (such as [Colab](https://colab.research.google.com/)) by running the following command in a code cell: 
<br />`!pip install pycmap`

In [None]:
# !pip install pycmap -q    #uncomment to install pycmap on Colab
import pycmap
pycmap.__version__

<a class="anchor" id="api"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## [*API( )*](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_api.html#pycmapapi)
To retrieve data, we need to create an instance of the system's API and pass the API key. It is not necessary to pass the API key every time you run pycmap, because the key will be stored locally. The API class has other optional parameters to adjust its behavior. All parameters can be updated persistently at any point in the code.

Register at https://simonscmap.com and get an API key, if you haven't already.

In [None]:
api = pycmap.API()

<a class="anchor" id="catalog"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## [*get_catalog()*](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_catalog.html#getcatalog)

Returns a dataframe containing the details of all variables in the Simons CMAP database. 
<br />This method requires no input.

In [None]:
api.get_catalog()

<a class="anchor" id="searchCatalog"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## [*search_catalog(keywords)*](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_search_catalog.html#searchcatalog)


Returns a dataframe containing a subset of the Simons CMAP catalog of variables. 

All variables in the Simons CMAP catalog are annotated with a collection of semantically related keywords. This method takes the passed keywords and returns all of the variables annotated with similar keywords. The passed keywords should be separated by spaces. The search result is sensitive neither to the order of the keywords nor to their case. The passed keywords can provide any 'hint' associated with the target variables. Below are a few examples: 

* the exact variable name (e.g. NO3), or its linguistic term (Nitrate) 
* methodology (model, satellite ...), instrument (CTD, seaflow), or disciplines (physics, biology ...) 
* the cruise official name (e.g. KOK1606), or unofficial cruise name (Falkor) 
* the name of the data producer (e.g. Penny Chisholm) or institution name (MIT) 

<br />If you searched for a variable with semantically-related-keywords and did not get the correct results, please let us know. We can update the keywords at any point.
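
The order-insensitive and case-insensitive behavior described above can be pictured with a small, self-contained sketch. This is only an illustration and not how Simons CMAP implements the search: `matches` and `entry_keywords` are hypothetical names, the keyword annotations are made up, and the sketch uses exact keyword matches rather than the similarity matching mentioned above.

```python
def matches(query, annotations):
    """True when every whitespace-separated query keyword appears in the annotations."""
    wanted = {word.lower() for word in query.split()}
    available = {word.lower() for word in annotations}
    return wanted <= available  # subset test ignores keyword order


# Hypothetical keyword annotations for a single catalog entry.
entry_keywords = {"Nitrite", "NO2", "Falkor", "in-situ", "chemistry"}

print(matches("nitrite falkor", entry_keywords))   # True
print(matches("FALKOR Nitrite", entry_keywords))   # True: order and case are ignored
print(matches("nitrite kok1606", entry_keywords))  # False: 'kok1606' is not annotated
```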


### Example:
Returns a list of Nitrite measurements during the Falkor cruise, if any exist.

In [None]:
api.search_catalog('nitrite falkor')

<a class="anchor" id="cruiseTrajectory"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## [*cruise_trajectory(cruiseName)*](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_cruise_trajectory.html#cruise-traj)

Returns a dataframe containing the trajectory of the specified cruise.

### Example:
Returns the meso_scope cruise trajectory.
The example below passes 'scope' as the cruise name. Because several cruises have the term 'scope' in their name, all of them are listed and the method asks for a more specific name.

In [None]:
api.cruise_trajectory('scope')

<a class="anchor" id="cruiseVariables"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## [*cruise_variables(cruiseName)*](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_cruise_variables.html#cruisevars)

Returns a dataframe containing all registered variables (at Simons CMAP) during the specified cruise.

### Example:
Returns a list of measured variables during the *Diel* cruise (KM1513).

In [None]:
api.cruise_variables('diel')

<a class="anchor" id="spaceTime"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## [*space_time(table, variable, dt1, dt2, lat1, lat2, lon1, lon2, depth1, depth2)*](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_subset_ST.html#subset-st)

Returns a subset of data according to the specified space-time constraints (dt1, dt2, lat1, lat2, lon1, lon2, depth1, depth2).
<br />The results are ordered by time, lat, lon, and depth (if present), respectively.

> **Parameters:** 
>> **table: string**
>>  <br />Table name (each dataset is stored in a table). A full list of table names can be found in [catalog](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_catalog.html#getcatalog).
>> <br />
>> <br />**variable: string**
>>  <br />Variable short name which directly corresponds to a field name in the table. A subset of this variable is returned by this method according to the spatio-temporal cut parameters (below). Pass **'\*'** wild card to retrieve all fields in a table. A full list of variable short names can be found in [catalog](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_catalog.html#getcatalog).
>> <br />
>> <br />**dt1: string**
>>  <br />Start date or datetime. This parameter sets the lower bound of the temporal cut. <br />Example values: '2016-05-25' or '2017-12-10 17:25:00'
>> <br />
>> <br />**dt2: string**
>>  <br />End date or datetime. This parameter sets the upper bound of the temporal cut. 
>> <br />
>> <br />**lat1: float**
>>  <br />Start latitude [degree N]. This parameter sets the lower bound of the meridional cut. Note latitude ranges from -90&deg; to 90&deg;.
>> <br />
>> <br />**lat2: float**
>>  <br />End latitude [degree N]. This parameter sets the upper bound of the meridional cut. Note latitude ranges from -90&deg; to 90&deg;.
>> <br />
>> <br />**lon1: float**
>>  <br />Start longitude [degree E]. This parameter sets the lower bound of the zonal cut. Note longitude ranges from -180&deg; to 180&deg;.
>> <br />
>> <br />**lon2: float**
>>  <br />End longitude [degree E]. This parameter sets the upper bound of the zonal cut. Note longitude ranges from -180&deg; to 180&deg;.
>> <br />
>> <br />**depth1: float**
>>  <br />Start depth [m]. This parameter sets the lower bound of the vertical cut. Note depth is a positive number (it is 0 at surface and grows towards ocean floor).
>> <br />
>> <br />**depth2: float**
>>  <br />End depth [m]. This parameter sets the upper bound of the vertical cut. Note depth is a positive number (it is 0 at surface and grows towards ocean floor).


>**Returns:** 
>>  Pandas dataframe.
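
Since every bound follows the conventions listed in the parameter descriptions, a quick sanity check before sending a request might look like the sketch below. `check_bounds` is a hypothetical helper and not a pycmap function, and it assumes that each lower bound must not exceed the corresponding upper bound.

```python
def check_bounds(lat1, lat2, lon1, lon2, depth1, depth2):
    """Return a list of problems with the requested box; empty when it looks valid."""
    problems = []
    if not (-90 <= lat1 <= lat2 <= 90):
        problems.append("latitude must satisfy -90 <= lat1 <= lat2 <= 90")
    if not (-180 <= lon1 <= lon2 <= 180):
        problems.append("longitude must satisfy -180 <= lon1 <= lon2 <= 180")
    if not (0 <= depth1 <= depth2):
        problems.append("depth must satisfy 0 <= depth1 <= depth2")
    return problems


print(check_bounds(28, 38, -71, -50, 0, 100))  # []
print(check_bounds(28, 38, 289, 310, 0, 100))  # longitudes given in a 0..360 convention
```

The second call reports a longitude problem because the values use a 0&deg; to 360&deg; convention instead of -180&deg; to 180&deg;.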


### Example:
This example retrieves a subset of in-situ salinity measurements by [Argo floats](https://cmap.readthedocs.io/en/latest/catalog/datasets/Argo.html#argo).

In [None]:
api.space_time(
              table='tblArgoMerge_REP', 
              variable='argo_merge_salinity_adj', 
              dt1='2015-05-01', 
              dt2='2015-05-30', 
              lat1=28, 
              lat2=38, 
              lon1=-71, 
              lon2=-50, 
              depth1=0, 
              depth2=100
              ) 

<a class="anchor" id="matchCruise"></a>
<a href="#toc" style="float: right;">Table of Contents</a>

## [*along_track(cruise, targetTables, targetVars, depth1, depth2, temporalTolerance, latTolerance, lonTolerance, depthTolerance)*](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_match_cruise_track_datasets.html#matchcruise)

This method colocalizes a cruise trajectory with the specified target variables. The matching results rely on the tolerance parameters because these parameters set the matching boundaries between the cruise trajectory and target datasets. Please note that the number of matching entries for each target variable might vary depending on the temporal and spatial resolutions of the target variable. In principle, if the cruise trajectory is fully covered by the target variable's spatio-temporal range, there should always be matching results if the tolerance parameters are larger than half of their corresponding spatial/temporal resolutions. Please explore the [catalog](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_catalog.html#getcatalog) to find appropriate target variables to colocalize with the desired cruise. 

<br />This method returns a dataframe containing the cruise trajectory joined with the target variable(s).
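
The half-resolution rule mentioned above can be checked with a one-dimensional toy grid. The sketch below is not part of pycmap: `grid_matches` is a hypothetical helper, and the target grid is assumed to be regular with nodes at integer multiples of its resolution (0.25&deg; in this example).

```python
def grid_matches(position, tolerance, resolution):
    """Return the grid nodes lying within +/- tolerance of a track position.

    The grid is assumed regular, with nodes at integer multiples of `resolution`.
    """
    first = int((position - tolerance) // resolution)
    last = int((position + tolerance) // resolution) + 1
    nodes = [k * resolution for k in range(first, last + 1)]
    return [x for x in nodes if abs(x - position) <= tolerance]


# Worst case: the track position sits exactly midway between two grid nodes.
print(grid_matches(22.125, 0.13, 0.25))  # [22.0, 22.25] -> tolerance above half the resolution
print(grid_matches(22.125, 0.10, 0.25))  # []            -> tolerance below half the resolution
```

A tolerance slightly above half of the resolution always catches at least one node, whereas a smaller tolerance can leave some track positions without any match.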

## <br/><br/>(see slides &rarr;)


> **Parameters:** 
>> **cruise: string**
>>  <br />The official cruise name. If applicable, you may also use cruise "nickname" ('Diel', 'Gradients_1' ...). <br />A full list of cruise names can be retrieved using [cruise](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_catalog.html#getcatalog) method.
>> <br />
>> <br />**targetTables: list of string**
>>  <br />Table names of the target datasets to be matched with the cruise trajectory. Notice cruise trajectory can be matched with multiple target datasets. A full list of table names can be found in [catalog](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_catalog.html#getcatalog).
>> <br />
>> <br />**targetVars: list of string**
>>  <br />Variable short names to be matched with the cruise trajectory. A full list of variable short names can be found in [catalog](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_catalog.html#getcatalog).
>> <br />
>> <br />**depth1: float**
>>  <br />Start depth [m]. This parameter sets the lower bound of the depth cut on the target datasets. 'depth1' and 'depth2' allow matching a cruise trajectory (which is at the surface, hopefully!) with target variables at lower depth. Note depth is a positive number (depth is 0 at surface and grows towards ocean floor).
>> <br />
>> <br />**depth2: float**
>>  <br />End depth [m]. This parameter sets the upper bound of the depth cut on the target datasets. Note depth is a positive number (depth is 0 at surface and grows towards ocean floor).
>> <br />
>> <br />**temporalTolerance: list of int**
>> <br />Temporal tolerance values between the cruise trajectory and target datasets. The size and order of values in this list should match those of targetTables. If only a single integer value is given, that would be applied to all target datasets. This parameter is in day units except when the target variable represents monthly climatology data in which case it is in month units. Notice fractional values are not supported in the current version.
>> <br />
>> <br />**latTolerance: list of float or int**
>> <br />Spatial tolerance values in meridional direction [deg] between the cruise trajectory and target datasets. The size and order of values in this list should match those of targetTables. If only a single float value is given, that would be applied to all target datasets. A "safe" value for this parameter can be slightly larger than half of the target variable's spatial resolution.
>> <br />
>> <br />**lonTolerance: list of float or int**
>> <br />Spatial tolerance values in zonal direction [deg] between the cruise trajectory and target datasets. The size and order of values in this list should match those of targetTables. If only a single float value is given, that would be applied to all target datasets. A "safe" value for this parameter can be slightly larger than half of the target variable's spatial resolution.
>> <br />
>> <br />**depthTolerance: list of float or int**
>> <br />Spatial tolerance values in vertical direction [m] between the cruise trajectory and target datasets. The size and order of values in this list should match those of targetTables. If only a single float value is given, that would be applied to all target datasets. 

>**Returns:** 
>>  Pandas dataframe.
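
The tolerance parameters accept either one value per target table or a single value applied to all targets. The sketch below illustrates that rule only. `per_target` is a hypothetical helper, not a pycmap function, and pycmap may handle the expansion differently internally.

```python
def per_target(value, n_targets):
    """Expand a scalar tolerance to one entry per target, or validate a list."""
    if isinstance(value, (int, float)):
        return [value] * n_targets
    values = list(value)
    if len(values) != n_targets:
        raise ValueError(f"expected {n_targets} tolerances, got {len(values)}")
    return values


target_tables = ["tblSeaFlow", "tblDarwin_Nutrient_Climatology"]
print(per_target(5, len(target_tables)))             # [5, 5]
print(per_target([0.01, 0.25], len(target_tables)))  # [0.01, 0.25]
```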

### Example:
Colocalizes the Gradients_3 cruise with the prochloro_abundance and PO4_darwin_clim variables from the SeaFlow and Darwin nutrient climatology datasets, respectively.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import pycmap

api = pycmap.API()
df = api.along_track(
                    cruise='gradients_3', 
                    targetTables=['tblSeaFlow', 'tblDarwin_Nutrient_Climatology'],
                    targetVars=['prochloro_abundance', 'PO4_darwin_clim'],
                    depth1=0, 
                    depth2=5, 
                    temporalTolerance=[0, 0],
                    latTolerance=[0.01, 0.25],
                    lonTolerance=[0.01, 0.25],
                    depthTolerance=[5, 5]
                    )




################# Simple Plot #################
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
c1, c2 = 'firebrick', 'slateblue'
t1, t2 = 'tblSeaFlow', 'tblDarwin_Nutrient_Climatology'
v1, v2 = 'prochloro_abundance', 'PO4_darwin_clim'
ax1.plot(df['lat'], df[v1], 'o', color=c1, markeredgewidth=0, label='SeaFlow', alpha=0.2)
ax1.tick_params(axis='y', labelcolor=c1)   # match the axis color to the SeaFlow markers
ax1.set_ylabel(v1 + api.get_unit(t1, v1), color=c1)
ax2.plot(df['lat'], df[v2], 'o', color=c2, markeredgewidth=0, label='Darwin', alpha=0.2)
ax2.tick_params(axis='y', labelcolor=c2)   # match the axis color to the Darwin markers
ax2.set_ylabel(v2 + api.get_unit(t2, v2), color=c2)
ax1.set_xlabel('Latitude')
fig.tight_layout()
plt.show()

<a class="anchor" id="query"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## [*query(query)*](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_query.html#query)
<br />Simons CMAP datasets are hosted in a SQL database, and the pycmap package provides a number of pre-developed methods to extract and retrieve subsets of the data. The rest of this documentation is dedicated to exploring and explaining these methods. In addition to the pre-developed methods, we intend to leave the database open to custom scan queries for interested users. This method takes a custom SQL query statement and returns the results in the form of a Pandas dataframe. The full list of table names and variable names (fields) can be obtained using the [get_catalog()](Catalog.ipynb) method. In fact, one may use this very method to retrieve the table and field names: `query('EXEC uspCatalog')`. Each dataset is stored in a table, and each table field represents a variable. All data tables have the following fields:

* [time] [date or datetime] NOT NULL,
* [lat] [float] NOT NULL,
* [lon] [float] NOT NULL,
* [depth] [float] NOT NULL,

### Note:
Tables that represent a climatological dataset, such as 'tblDarwin_Nutrient_Climatology', do not have a 'time' field. Likewise, a table that represents a surface dataset, such as a satellite product, has no 'depth' field. 'depth' is a positive number in meters; it is zero at the surface and grows towards the ocean floor. 'lat' and 'lon' are in degrees, ranging from -90&deg; to 90&deg; and -180&deg; to 180&deg;, respectively.

<br />Please keep in mind that some of the datasets are massive (tens of TB), so avoid queries without a WHERE clause (`SELECT * FROM TABLENAME`). Always try to add constraints on the time, lat, lon, and depth fields (see the basic examples below). 
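
One way to make sure every custom query carries such constraints is to build the statement with a small helper. The sketch below only assembles the SQL string and does not contact the database. `bounded_select` is a hypothetical helper, not part of pycmap, and the table and field names are taken from the examples in this notebook.

```python
def bounded_select(table, fields, dt1, dt2, lat1, lat2, lon1, lon2):
    """Build a SELECT statement restricted to a time/lat/lon box."""
    columns = ", ".join(fields)
    return (
        f"SELECT {columns} FROM {table} "
        f"WHERE [time] BETWEEN '{dt1}' AND '{dt2}' "
        f"AND lat BETWEEN {lat1} AND {lat2} "
        f"AND lon BETWEEN {lon1} AND {lon2}"
    )


sql = bounded_select(
    "tblSST_AVHRR_OI_NRT", ["[time]", "lat", "lon", "sst"],
    "2016-06-01", "2016-06-02", 23, 24, -160, -158,
)
print(sql)
# The resulting string could then be passed to api.query(sql).
```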

<br/>Moreover, the database hosts a wide range of predefined stored procedures and functions to streamline nearly all CMAP data services. For instance, retrieving the catalog information is achieved with a single call to the *uspCatalog* procedure. These predefined procedures can be called using the pycmap package (see the example below). Alternatively, one may use any SQL client to execute these procedures to retrieve and visualize data (examples: [Azure Data Studio](https://docs.microsoft.com/en-us/sql/azure-data-studio/download?view=sql-server-ver15), or [Plotly Falcon](https://plot.ly/free-sql-client-download/)). With the predefined procedures, all CMAP data services are centralized at the database layer, which greatly simplifies developing apps in different programming languages (pycmap, web app, cmap4r, ...). Please note that you can improve the current procedures or add new ones by contributing to the [CMAP database repository](https://github.com/simonscmap/DB). 
Below is a selected list of stored procedures and functions; their arguments are described in more detail later:



* uspCatalog
* uspSpaceTime
* uspTimeSeries
* uspDepthProfile
* uspSectionMap
* uspCruises
* uspCruiseByName
* uspCruiseBounds
* uspWeekly
* uspMonthly
* uspQuarterly
* uspAnnual
* uspMatch
* udfDatasetReferences
* udfMetaData_NoRef





<br />Happy SQL Injection!
<br />
<br />
<br />

### Example:
A sample stored procedure returning the list of all cruises hosted by Simons CMAP.

In [None]:
api.query('EXEC uspCruises')

### Example:
A sample query returning the timeseries of sea surface temperature (sst).

In [None]:
api.query(
         '''
         SELECT [time], AVG(lat) AS lat, AVG(lon) AS lon, AVG(sst) AS sst FROM tblsst_AVHRR_OI_NRT
         WHERE
         [time] BETWEEN '2016-06-01' AND '2016-10-01' AND
         lat BETWEEN 23 AND 24 AND
         lon BETWEEN -160 AND -158
         GROUP BY [time]
         ORDER BY [time]
         '''
         )

<a class="anchor" id="caseStudy"></a>
<br/><br/><br/><br/><br/><br/><br/><br/>
<center>
<h1> Case Studies </h1>
</center>
<br/><br/><br/><br/><br/><br/><br/><br/>

<a class="anchor" id="caseStudy1"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## (see slides &rarr;)<br/><br/>
## Case Study 1: <br/><br/> Attach Environmental Parameters to the SeaFlow Observations 

In this study, we take all SeaFlow cruises (approximately 35 cruises) and colocalize them with 50+ environmental variables. The idea is to identify the environmental variables that are highly correlated with SeaFlow abundances. These variables then serve as predictors for machine learning algorithms that capture the SeaFlow variations. The trained machine learning models are then used to generate spatial maps of picophytoplankton (Prochlorococcus, Synechococcus, and pico-Eukaryotes).   
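
As a rough picture of the "identify the highly-correlated variables" step, the sketch below ranks a few covariates by the absolute value of their Pearson correlation with an abundance series. It is a minimal illustration under stated assumptions: the numbers are synthetic and made up for the example (they are not CMAP data), and `pearson`, `abundance`, and the covariate names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Synthetic values purely for illustration (not CMAP data).
abundance = [1.0, 2.0, 3.0, 4.0, 5.0]
covariates = {
    "covariate_a": [2.0, 4.0, 6.0, 8.0, 10.0],   # rises with abundance
    "covariate_b": [5.0, 4.0, 3.0, 2.0, 1.0],    # falls as abundance rises
    "covariate_c": [1.0, -1.0, 0.0, -1.0, 1.0],  # no linear relationship
}

# Strongest (positive or negative) correlations first.
ranking = sorted(
    ((name, pearson(abundance, values)) for name, values in covariates.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

With the colocalized data held in a Pandas dataframe, `DataFrame.corr()` computes the same coefficient for every pair of columns at once.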

In [None]:
%%time
"""
Author: Mohammad Dehghani Ashkezari <mdehghan@uw.edu>

Date: 2019-08-13

Function: Colocalizes tens of variables along-track of cruises with underway Seaflow measurements.
"""

import os
import pycmap
from collections import namedtuple
import pandas as pd


def all_cruises(api):
    """
    Returns a list of seaflow cruises, excluding the AMT cruises.
    """
    cruises = api.cruises().Name
    return list(cruises[~cruises.str.contains("AMT")])


def match_params():
    """
    Creates a collection of variables (and their tolerances) to be colocalized along the cruise trajectory.
    """
    Param = namedtuple('Param', ['table', 'variable', 'temporalTolerance', 'latTolerance', 'lonTolerance', 'depthTolerance'])
    params = []
    params.append(Param('tblSeaFlow', 'prochloro_abundance', 0, 0.1, 0.1, 5))
    params.append(Param('tblSeaFlow', 'prochloro_diameter', 0, 0.1, 0.1, 5))
    params.append(Param('tblSeaFlow', 'prochloro_carbon_content', 0, 0.1, 0.1, 5))
    params.append(Param('tblSeaFlow', 'prochloro_biomass', 0, 0.1, 0.1, 5))
    params.append(Param('tblSeaFlow', 'synecho_abundance', 0, 0.1, 0.1, 5))
    params.append(Param('tblSeaFlow', 'synecho_diameter', 0, 0.1, 0.1, 5))
    params.append(Param('tblSeaFlow', 'synecho_carbon_content', 0, 0.1, 0.1, 5))
    params.append(Param('tblSeaFlow', 'synecho_biomass', 0, 0.1, 0.1, 5))
    params.append(Param('tblSeaFlow', 'picoeuk_abundance', 0, 0.1, 0.1, 5))
    params.append(Param('tblSeaFlow', 'picoeuk_diameter', 0, 0.1, 0.1, 5))
    params.append(Param('tblSeaFlow', 'picoeuk_carbon_content', 0, 0.1, 0.1, 5))
    params.append(Param('tblSeaFlow', 'picoeuk_biomass', 0, 0.1, 0.1, 5))
    params.append(Param('tblSeaFlow', 'total_biomass', 0, 0.1, 0.1, 5))

    ######## Ship Data (not calibrated)
    params.append(Param('tblCruise_Salinity', 'salinity', 0, 0.1, 0.1, 5))
    params.append(Param('tblCruise_Temperature', 'temperature', 0, 0.1, 0.1, 5))

    ######## satellite
    params.append(Param('tblSST_AVHRR_OI_NRT', 'sst', 1, 0.25, 0.25, 5))
    params.append(Param('tblSSS_NRT', 'sss', 1, 0.25, 0.25, 5))
    params.append(Param('tblCHL_REP', 'chl', 4, 0.25, 0.25, 5))
    params.append(Param('tblModis_AOD_REP', 'AOD', 15, 1, 1, 5))
    params.append(Param('tblAltimetry_REP', 'sla', 1, 0.25, 0.25, 5))
    params.append(Param('tblAltimetry_REP', 'adt', 1, 0.25, 0.25, 5))
    params.append(Param('tblAltimetry_REP', 'ugos', 1, 0.25, 0.25, 5))
    params.append(Param('tblAltimetry_REP', 'vgos', 1, 0.25, 0.25, 5))

    ######## model
    params.append(Param('tblPisces_NRT', 'Fe', 4, 0.5, 0.5, 5))
    params.append(Param('tblPisces_NRT', 'NO3', 4, 0.5, 0.5, 5))
    params.append(Param('tblPisces_NRT', 'O2', 4, 0.5, 0.5, 5))
    params.append(Param('tblPisces_NRT', 'PO4', 4, 0.5, 0.5, 5))
    params.append(Param('tblPisces_NRT', 'Si', 4, 0.5, 0.5, 5))
    params.append(Param('tblPisces_NRT', 'PP', 4, 0.5, 0.5, 5))
    params.append(Param('tblPisces_NRT', 'CHL', 4, 0.5, 0.5, 5))
    params.append(Param('tblDarwin_Nutrient_Climatology', 'NH4_darwin_clim', 0, 0.5, 0.5, 5))
    params.append(Param('tblDarwin_Nutrient_Climatology', 'NO2_darwin_clim', 0, 0.5, 0.5, 5))
    params.append(Param('tblDarwin_Nutrient_Climatology', 'SiO2_darwin_clim', 0, 0.5, 0.5, 5))
    params.append(Param('tblDarwin_Nutrient_Climatology', 'DOC_darwin_clim', 0, 0.5, 0.5, 5))
    params.append(Param('tblDarwin_Nutrient_Climatology', 'DON_darwin_clim', 0, 0.5, 0.5, 5))
    params.append(Param('tblDarwin_Nutrient_Climatology', 'DOP_darwin_clim', 0, 0.5, 0.5, 5))
    params.append(Param('tblDarwin_Nutrient_Climatology', 'DOFe_darwin_clim', 0, 0.5, 0.5, 5))
    params.append(Param('tblDarwin_Nutrient_Climatology', 'PIC_darwin_clim', 0, 0.5, 0.5, 5))
    params.append(Param('tblDarwin_Nutrient_Climatology', 'ALK_darwin_clim', 0, 0.5, 0.5, 5))
    params.append(Param('tblDarwin_Nutrient_Climatology', 'FeT_darwin_clim', 0, 0.5, 0.5, 5))
    params.append(Param('tblDarwin_Plankton_Climatology', 'prokaryote_c01_darwin_clim', 0, 0.5, 0.5, 5))
    params.append(Param('tblDarwin_Plankton_Climatology', 'prokaryote_c02_darwin_clim', 0, 0.5, 0.5, 5))
    params.append(Param('tblDarwin_Plankton_Climatology', 'picoeukaryote_c03_darwin_clim', 0, 0.5, 0.5, 5))
    params.append(Param('tblDarwin_Plankton_Climatology', 'picoeukaryote_c04_darwin_clim', 0, 0.5, 0.5, 5))

    ####### World Ocean Atlas (WOA)
    params.append(Param('tblWOA_Climatology', 'density_WOA_clim', 0, .75, .75, 5))
    params.append(Param('tblWOA_Climatology', 'nitrate_WOA_clim', 0, 0.75, 0.75, 5))
    params.append(Param('tblWOA_Climatology', 'phosphate_WOA_clim', 0, 0.75, 0.75, 5))
    params.append(Param('tblWOA_Climatology', 'silicate_WOA_clim', 0, 0.75, 0.75, 5))
    params.append(Param('tblWOA_Climatology', 'oxygen_WOA_clim', 0, 0.75, 0.75, 5))
    params.append(Param('tblWOA_Climatology', 'salinity_WOA_clim', 0, 0.75, 0.75, 5))

    tables, variables, temporalTolerance, latTolerance, lonTolerance, depthTolerance = [], [], [], [], [], []
    for i in range(len(params)):
        tables.append(params[i].table)
        variables.append(params[i].variable)
        temporalTolerance.append(params[i].temporalTolerance)
        latTolerance.append(params[i].latTolerance)
        lonTolerance.append(params[i].lonTolerance)
        depthTolerance.append(params[i].depthTolerance)
    return tables, variables, temporalTolerance, latTolerance, lonTolerance, depthTolerance




def main():
    api = pycmap.API()
    cruises = all_cruises(api)
    cruises = ['KOK1606']   # limiting to only one cruise (for presentation)
    exportDir = './export/'
    os.makedirs(exportDir, exist_ok=True)   # create the export directory if it is missing
    tables, variables, temporalTolerance, latTolerance, lonTolerance, depthTolerance = match_params()
    df = pd.DataFrame({})
    for cruise in cruises:
        print('\n********************************')
        print('Preparing %s cruise...' % cruise)
        print('********************************\n')
        data = api.along_track(
                              cruise=cruise,     
                              targetTables=tables,
                              targetVars=variables,
                              temporalTolerance=temporalTolerance, 
                              latTolerance=latTolerance, 
                              lonTolerance=lonTolerance, 
                              depthTolerance=depthTolerance,
                              depth1=0,
                              depth2=5
                              )
        if len(df) < 1:
            df = data
        else:
            df = pd.concat([df, data], ignore_index=True)
        data.to_csv('%s%s.csv' % (exportDir, cruise), index=False)
    df.to_csv('%ssfMatch.csv' % exportDir, index=False)      
    return df    
    
    
    
    

##############################
#                            #
#           main             #
#                            #
##############################


if __name__ == '__main__':
    df = main()
    

<a class="anchor" id="caseStudy2"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## (see slides &rarr;)<br/><br/>
## Case Study 2: <br/><br/>Inter-Annual Variability of Eddy Induced Temperature Anomaly 
In this example, we iteratively retrieve daily eddy locations and colocalize them with satellite and model variables (SST, CHL, SLA, and NO3). To infer the eddy induced effects we also compute an estimate of the local background. Subtracting the background field from that of eddy domain results in the eddy induced effects. For demonstration purposes, the script below is limited to a small region within a one-day period (see the root of the script).
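
The background subtraction can be pictured with a few synthetic points. The sketch below is a simplified, in-memory stand-in for the SQL queries in the script that follows and is not the script itself. `eddy_anomaly` is a hypothetical helper and the sample values are made up. It mirrors the script's selection rule with `outer=4` and `inner=2`: a background point must lie inside the outer box while both its latitude and its longitude fall outside the inner box, which in practice keeps the corner patches of the outer box.

```python
from statistics import mean


def eddy_anomaly(points, lat, lon, del_lat, del_lon, outer=4, inner=2):
    """Mean inside the eddy box minus mean of the surrounding background.

    `points` is an iterable of (lat, lon, value) tuples.
    """
    def within(p, scale):
        # Is the point inside the box scaled by `scale`, per axis?
        return abs(p[0] - lat) <= scale * del_lat, abs(p[1] - lon) <= scale * del_lon

    signal, background = [], []
    for p in points:
        lat_in, lon_in = within(p, 1)
        lat_outer, lon_outer = within(p, outer)
        lat_inner, lon_inner = within(p, inner)
        if lat_in and lon_in:
            signal.append(p[2])
        elif lat_outer and lon_outer and not lat_inner and not lon_inner:
            background.append(p[2])
    return mean(signal) - mean(background)


# Synthetic (lat, lon, value) samples around a hypothetical eddy centred at (25, -158).
samples = [
    (25.00, -158.00, 21.0),   # eddy domain
    (25.25, -157.75, 21.5),   # eddy domain
    (25.75, -158.00, 99.0),   # buffer between eddy and background: ignored
    (26.50, -156.50, 20.0),   # background
    (23.50, -159.50, 20.5),   # background
]
anomaly = eddy_anomaly(samples, lat=25.0, lon=-158.0, del_lat=0.5, del_lon=0.5)
print(anomaly)  # 1.0
```

The sketch raises `statistics.StatisticsError` when either region contains no points, whereas the script returns `None` values in that situation.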

In [None]:
%%time
"""
Author: Mohammad Dehghani Ashkezari <mdehghan@uw.edu>

Date: 2019-11-01

Function: Colocalize (match) eddy data set with a number of satellite & model variables (e.g. SST, CHL, NO3, etc ...).
"""

import os
import pycmap
from collections import namedtuple
import pandas as pd
from datetime import datetime, timedelta, date




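def sparse_dates(y1, y2, m1, m2, d1, d2):
    """
    Returns every date whose year, month, and day fall within [y1, y2], [m1, m2], and [d1, d2], respectively.
    """
    dts = []
    for y in range(y1, y2+1):
        for m in range(m1, m2+1):
            for d in range(d1, d2+1):
                dts.append(datetime(y, m, d))
    return dts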


def eddy_time_range(api):
    """
    Returns the start-date and end-date of the eddy dataset.
    """
    query = "SELECT min([time]) AS min_time, max([time]) max_time FROM tblMesoscale_Eddy"
    df = api.query(query)
    dt1 = datetime.strptime(df.loc[0, 'min_time'], '%Y-%m-%dT%H:%M:%S.000Z')
    dt2 = datetime.strptime(df.loc[0, 'max_time'], '%Y-%m-%dT%H:%M:%S.000Z')
    return [dt1 + timedelta(days=x) for x in range((dt2-dt1).days + 1)]


def daily_eddies(api, day, lat1, lat2, lon1, lon2):
    """
    Returns eddies at a given date (day) delimited by the spatial parameters (lat1, lat2, lon1, lon2).
    """
    query = """
            SELECT * FROM tblMesoscale_Eddy 
            WHERE 
            [time]='%s' 
            AND
            lat BETWEEN %f AND %f AND
            lon BETWEEN %f AND %f
            """ % (day, lat1, lat2, lon1, lon2)
    return api.query(query)


def match_covariate(api, table, variable, dt1, dt2, lat, del_lat, lon, del_lon, depth, del_depth):
    """
    Returns the mean and standard deviation of the variable within the eddy domain and within the background field.
    """

    def has_depth(table):
        return table in ['tblPisces_NRT', 'tblDarwin_Nutrient', 'tblDarwin_Ecosystem', 'tblDarwin_Phytoplankton']
    query = "SELECT AVG(%s) AS %s, STDEV(%s) AS %s FROM %s " % (variable, variable, variable, variable+'_std', table)
    query += "WHERE [time] BETWEEN '%s' AND '%s' AND " % (dt1, dt2)
    query += "[lat] BETWEEN %f AND %f AND " % (lat-del_lat, lat+del_lat)
    query += "[lon] BETWEEN %f AND %f " % (lon-del_lon, lon+del_lon)
    if has_depth(table):   
        query += " AND [depth] BETWEEN %f AND %f " % (depth-del_depth, depth+del_depth)
    try:
        signal = api.query(query)
    except Exception:
        return None, None, None, None    

    outer, inner = 4, 2
    query = "SELECT AVG(%s) AS %s, STDEV(%s) AS %s FROM %s " % (variable, variable+'_bkg', variable, variable+'_bkg_std', table)
    query += "WHERE [time] BETWEEN '%s' AND '%s' AND " % (dt1, dt2)
    query += "[lat] BETWEEN %f AND %f AND " % (lat-outer*del_lat, lat+outer*del_lat)
    query += "[lat] NOT BETWEEN %f AND %f AND " % (lat-inner*del_lat, lat+inner*del_lat)
    query += "[lon] BETWEEN %f AND %f AND " % (lon-outer*del_lon, lon+outer*del_lon)
    query += "[lon] NOT BETWEEN %f AND %f " % (lon-inner*del_lon, lon+inner*del_lon)
    if has_depth(table):    
        query += "AND [depth] BETWEEN %f AND %f " % (depth-del_depth, depth+del_depth)
    try:     
        background = api.query(query)
    except Exception:
        return None, None, None, None    
    # Unpack the mean and standard deviation of the eddy signal and of the background.
    sig, sig_std = None, None
    try:
        if len(signal) > 0:
            sig, sig_std = signal.loc[0, variable], signal.loc[0, variable+'_std']
    except Exception:
        sig, sig_std = None, None
    bkg, bkg_std = None, None
    try:
        if len(background) > 0:
            bkg, bkg_std = background.loc[0, variable+'_bkg'], background.loc[0, variable+'_bkg_std']
    except Exception:
        bkg, bkg_std = None, None
    return sig, sig_std, bkg, bkg_std




def match_params():
    """
    Prepares a list of variables (and their associated tolerances) to be colocalized with eddies.
    """
    Param = namedtuple('Param', ['table', 'variable', 'temporalTolerance', 'latTolerance', 'lonTolerance', 'depthTolerance'])
    params = []

    ######## satellite
    params.append(Param('tblSST_AVHRR_OI_NRT', 'sst', 0, 0.5, 0.5, 5))
#     params.append(Param('tblSSS_NRT', 'sss', 0, 0.5, 0.5, 5))
    params.append(Param('tblCHL_REP', 'chl', 4, 0.5, 0.5, 5))
#     params.append(Param('tblModis_AOD_REP', 'AOD', 15, 1, 1, 5))
    params.append(Param('tblAltimetry_REP', 'sla', 0, 0.5, 0.5, 5))

    ######## model
    params.append(Param('tblPisces_NRT', 'NO3', 4, 0.5, 0.5, 5))
#     params.append(Param('tblPisces_NRT', 'Fe', 4, 0.5, 0.5, 5))
#     params.append(Param('tblPisces_NRT', 'O2', 4, 0.5, 0.5, 5))
#     params.append(Param('tblPisces_NRT', 'PO4', 4, 0.5, 0.5, 5))
#     params.append(Param('tblPisces_NRT', 'Si', 4, 0.5, 0.5, 5))
#     params.append(Param('tblPisces_NRT', 'PP', 4, 0.5, 0.5, 5))
#     params.append(Param('tblPisces_NRT', 'CHL', 4, 0.5, 0.5, 5))
#     params.append(Param('tblDarwin_Nutrient', 'PO4', 2, 0.5, 0.5, 5))
#     params.append(Param('tblDarwin_Nutrient', 'SiO2', 2, 0.5, 0.5, 5))
#     params.append(Param('tblDarwin_Nutrient', 'O2', 2, 0.5, 0.5, 5))
#     params.append(Param('tblDarwin_Ecosystem', 'phytoplankton', 2, 0.5, 0.5, 5))
#     params.append(Param('tblDarwin_Ecosystem', 'zooplankton', 2, 0.5, 0.5, 5))
#     params.append(Param('tblDarwin_Ecosystem', 'CHL', 2, 0.5, 0.5, 5))
#     params.append(Param('tblDarwin_Ecosystem', 'primary_production', 2, 0.5, 0.5, 5))
#     params.append(Param('tblDarwin_Phytoplankton', 'diatom', 2, 0.5, 0.5, 5))
#     params.append(Param('tblDarwin_Phytoplankton', 'coccolithophore', 2, 0.5, 0.5, 5))
#     params.append(Param('tblDarwin_Phytoplankton', 'picoeukaryote', 2, 0.5, 0.5, 5))
#     params.append(Param('tblDarwin_Phytoplankton', 'picoprokaryote', 2, 0.5, 0.5, 5))
#     params.append(Param('tblDarwin_Phytoplankton', 'mixotrophic_dinoflagellate', 2, 0.5, 0.5, 5))


    tables, variables, temporalTolerance, latTolerance, lonTolerance, depthTolerance = [], [], [], [], [], []
    for i in range(len(params)):
        tables.append(params[i].table)
        variables.append(params[i].variable)
        temporalTolerance.append(params[i].temporalTolerance)
        latTolerance.append(params[i].latTolerance)
        lonTolerance.append(params[i].lonTolerance)
        depthTolerance.append(params[i].depthTolerance)
    return tables, variables, temporalTolerance, latTolerance, lonTolerance, depthTolerance



def main(y1, y2, m1, m2, d1, d2, edd_lat1, edd_lat2, edd_lon1, edd_lon2):
    """
    Instantiates the API class and uses 'match_covariate()' to colocalize the retrieved eddies
    with the specified variables. One CSV file is written per day.
    """
    api = pycmap.API()
    daysDir = './export/eddy/days/'
    os.makedirs(daysDir, exist_ok=True)

    days = sparse_dates(y1, y2, m1, m2, d1, d2)
    tables, variables, temporalTolerance, latTolerance, lonTolerance, depthTolerance = match_params()

    for day_ind, day in enumerate(days):
        eddies = daily_eddies(api, str(day), edd_lat1, edd_lat2, edd_lon1, edd_lon2)
        eddies['time'] = pd.to_datetime(eddies['time'])
        for variable in variables:
            eddies[variable] = None
            eddies[variable+'_std'] = None
            eddies[variable+'_bkg'] = None
            eddies[variable+'_bkg_std'] = None
        print('Day %s:  %d / %d' % (str(day), day_ind+1, len(days)))
        for e in range(len(eddies)):
            print('\tEddy %d / %d' % (e+1, len(eddies)))
            for i in range(len(variables)):
                # print('\t\t%d. Matching %s' % (i+1, variables[i]))
                dt1 = str(eddies.loc[e, 'time'] + timedelta(days=-temporalTolerance[i]))
                dt2 = str(eddies.loc[e, 'time'] + timedelta(days=temporalTolerance[i]))
                lat, del_lat = eddies.loc[e, 'lat'], latTolerance[i]
                lon, del_lon = eddies.loc[e, 'lon'], lonTolerance[i]
                depth, del_depth = 0, depthTolerance[i]
                v, v_std, bkg, bkg_std = match_covariate(api, tables[i], variables[i], dt1, dt2, lat, del_lat, lon, del_lon, depth, del_depth)
                eddies.loc[e, variables[i]] = v
                eddies.loc[e, variables[i]+'_std'] = v_std
                eddies.loc[e, variables[i]+'_bkg'] = bkg 
                eddies.loc[e, variables[i]+'_bkg_std'] = bkg_std
        eddies.to_csv(daysDir+str(day.date())+'.csv', index=False)
    return eddies




    
    

##############################
#                            #
#           main             #
#                            #
##############################




if __name__ == '__main__':
    ### time window
    y1, y2 = 2014, 2014
    m1, m2 = 1, 1
    d1, d2 = 1, 1
    ### spatial range
    edd_lat1, edd_lat2 = 20, 30
    edd_lon1, edd_lon2 = -160, -150
    eddies = main(y1, y2, m1, m2, d1, d2, edd_lat1, edd_lat2, edd_lon1, edd_lon2)
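
The background query in `match_covariate()` combines an outer box with separate latitude and longitude exclusions. The sketch below restates that WHERE clause in plain Python so the selected region is easy to inspect. `in_box` and `in_background` are hypothetical helpers written for illustration only; they are not part of pycmap, and the sample coordinates are assumptions.

```python
def in_box(value, center, half_width):
    """Inclusive range test, mirroring SQL BETWEEN."""
    return center - half_width <= value <= center + half_width


def in_background(lat, lon, eddy_lat, eddy_lon, del_lat, del_lon, outer=4, inner=2):
    """Mirror the WHERE clause of the background query in match_covariate()."""
    return (
        in_box(lat, eddy_lat, outer * del_lat)
        and not in_box(lat, eddy_lat, inner * del_lat)
        and in_box(lon, eddy_lon, outer * del_lon)
        and not in_box(lon, eddy_lon, inner * del_lon)
    )


# Assumed eddy center at (25N, 155W) with the 0.5-degree tolerances used in match_params().
eddy_lat, eddy_lon, tol = 25.0, -155.0, 0.5

corner = in_background(26.5, -153.5, eddy_lat, eddy_lon, tol, tol)     # diagonal from the eddy
due_north = in_background(26.5, -155.0, eddy_lat, eddy_lon, tol, tol)  # same longitude as the eddy
center = in_background(25.0, -155.0, eddy_lat, eddy_lon, tol, tol)     # the eddy center itself

print(corner, due_north, center)
```

Because the latitude and longitude exclusions are applied independently, only the four diagonal corner blocks qualify as background in this sketch; grid cells due north, south, east, or west of the eddy are left out. If a full ring around the eddy is wanted instead, one option would be to exclude only the points that fall inside the inner box in both latitude and longitude.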
    


<a class="anchor" id="studyCase3"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## Study Case 3: Colocalize Eddies with Seaflow (slides &rarr;)

In [None]:
"""
Author: Mohammad Dehghani Ashkezari <mdehghan@uw.edu>

Date: 2019-10-22

Function: Colocalize (match) the eddy data set with SeaFlow variables.
"""

import os
import pycmap
from collections import namedtuple
import pandas as pd



def match_params():
    """Creates a collection of variables (and their tolerances) to be colocalized with the seaflow data set."""
    Param = namedtuple('Param', ['table', 'variable', 'temporalTolerance', 'latTolerance', 'lonTolerance', 'depthTolerance'])
    params = []
    ######## other seaflow
    params.append(Param('tblSeaFlow', 'prochloro_abundance', 1, 0.5, 0.5, 0))
#     params.append(Param('tblSeaFlow', 'prochloro_diameter', 1, 0.5, 0.5, 0))
#     params.append(Param('tblSeaFlow', 'prochloro_carbon_content', 1, 0.5, 0.5, 0))
#     params.append(Param('tblSeaFlow', 'prochloro_biomass', 1, 0.5, 0.5, 0))
#     params.append(Param('tblSeaFlow', 'synecho_abundance', 1, 0.5, 0.5, 0))
#     params.append(Param('tblSeaFlow', 'synecho_diameter', 1, 0.5, 0.5, 0))
#     params.append(Param('tblSeaFlow', 'synecho_carbon_content', 1, 0.5, 0.5, 0))
#     params.append(Param('tblSeaFlow', 'synecho_biomass', 1, 0.5, 0.5, 0))
#     params.append(Param('tblSeaFlow', 'picoeuk_abundance', 1, 0.5, 0.5, 0))
#     params.append(Param('tblSeaFlow', 'picoeuk_diameter', 1, 0.5, 0.5, 0))
#     params.append(Param('tblSeaFlow', 'picoeuk_carbon_content', 1, 0.5, 0.5, 0))
#     params.append(Param('tblSeaFlow', 'picoeuk_biomass', 1, 0.5, 0.5, 0))
#     params.append(Param('tblSeaFlow', 'total_biomass', 1, 0.5, 0.5, 0))

    ######## eddy vars
    params.append(Param('tblMesoscale_Eddy', 'eddy_polarity', 1, 0.5, 0.5, 0))
    params.append(Param('tblMesoscale_Eddy', 'eddy_age', 1, 0.5, 0.5, 0))
#     params.append(Param('tblMesoscale_Eddy', 'eddy_radius', 1, 0.5, 0.5, 0))
#     params.append(Param('tblMesoscale_Eddy', 'eddy_A', 1, 0.5, 0.5, 0))
#     params.append(Param('tblMesoscale_Eddy', 'eddy_U', 1, 0.5, 0.5, 0))
#     params.append(Param('tblMesoscale_Eddy', 'track', 1, 0.5, 0.5, 0))

    tables, variables, temporalTolerance, latTolerance, lonTolerance, depthTolerance = [], [], [], [], [], []
    for i in range(len(params)):
        tables.append(params[i].table)
        variables.append(params[i].variable)
        temporalTolerance.append(params[i].temporalTolerance)
        latTolerance.append(params[i].latTolerance)
        lonTolerance.append(params[i].lonTolerance)
        depthTolerance.append(params[i].depthTolerance)
    return tables, variables, temporalTolerance, latTolerance, lonTolerance, depthTolerance



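def main():
    """
    Matches the SeaFlow observations, one day at a time, with the variables listed in 'match_params()'
    and exports the daily and the combined results as CSV files.
    """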
    api = pycmap.API()
    days = api.query("SELECT DISTINCT CONVERT(DATE, [time]) [time] FROM tblSeaFlow where [time]<='2018-01-17' ORDER BY [time] DESC")['time']
    tables, variables, temporalTolerance, latTolerance, lonTolerance, depthTolerance = match_params()
    df = pd.DataFrame({})
    exportDir = './export/eddy/'
    os.makedirs(exportDir, exist_ok=True)
    for ind, day in enumerate(days):
        day = day.split('T')[0]
        print('\n********************************')
        print('Preparing %s (%d / %d) ...' % (day, ind+1, len(days)))
        print('********************************\n')
        data = api.match(
                         sourceTable='tblSeaFlow',
                         sourceVar='',   
                         targetTables=tables,
                         targetVars=variables,
                         dt1=day,
                         dt2=day,
                         lat1=-90,
                         lat2=90,
                         lon1=-180,
                         lon2=180,
                         depth1=0,
                         depth2=5,
                         temporalTolerance=temporalTolerance, 
                         latTolerance=latTolerance, 
                         lonTolerance=lonTolerance, 
                         depthTolerance=depthTolerance
                         )
        if len(df) < 1:
            df = data
        else:
            df = pd.concat([df, data], ignore_index=True)    
        data.to_csv('%s%s.csv' % (exportDir, day), index=False)
    df.to_csv('%sedMatch.csv' % exportDir, index=False)    
    
    
    
    

##############################
#                            #
#           main             #
#                            #
##############################


if __name__ == '__main__':
    main()
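
Both versions of `match_params()` build six parallel lists with an explicit loop. As a minimal sketch, the same lists can be obtained by transposing the list of named tuples with `zip`. The `Param` definition and the three entries below are copied from the cell above; nothing here calls the pycmap API.

```python
from collections import namedtuple

Param = namedtuple('Param', ['table', 'variable', 'temporalTolerance',
                             'latTolerance', 'lonTolerance', 'depthTolerance'])

params = [
    Param('tblSeaFlow', 'prochloro_abundance', 1, 0.5, 0.5, 0),
    Param('tblMesoscale_Eddy', 'eddy_polarity', 1, 0.5, 0.5, 0),
    Param('tblMesoscale_Eddy', 'eddy_age', 1, 0.5, 0.5, 0),
]

# zip(*params) groups the i-th field of every Param together;
# map(list, ...) converts each group from a tuple to a list.
tables, variables, temporalTolerance, latTolerance, lonTolerance, depthTolerance = map(list, zip(*params))

print(tables)
print(variables)
```

The unpacking assumes `params` holds at least one entry; with an empty list there is nothing to transpose and the assignment raises a `ValueError`, so the explicit loop is the safer choice if every line might be commented out.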
    