![](https://github.com/simonscmap/pycmap/blob/master/docs/figures/CMAP.png?raw=true)

*Mohammad D. Ashkezari*

eScience Institute, Jan 2023



<br/><br/><br/>

<a href="https://colab.research.google.com/github/simonscmap/Workshops/blob/master/eScience_Jan2023/eScience_Jan2023.ipynb"><img align="left" src="colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a>


<a class="anchor" id="toc"></a>

## Table of Contents:
* [Installation](#installation)
* [**Data Retrieval (selected methods)**](#dataRetrieval)
    * [API](#api) 
    * [Catalog](#catalog)
    * [Search Catalog](#searchCatalog)
    * [List of Cruises](#cruises)
    * [Cruise Trajectory](#cruiseTrajectory)
    * [Retrieve Dataset](#getDataset)
    * [Subset by Space-Time](#spaceTime)
    * [Colocalize](#matchCruise)   
    * [List of Pre-Colocalized Datasets](#datasetsWithAncillary) 
    * [Retrieve Dataset With Pre-Colocalized Data](#getDatasetWithAncillary)
    * [Dynamic Climatology](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/data_vizualization/pycmap_climatology.html#climatology)
    * [Custom SQL Query](#query)
    
    

* [**Data Visualization (selected methods)**](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_data_vizualization.html)
    * [Histogram](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/data_vizualization/pycmap_histogram.html#histogram)
    * [Time Series](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/data_vizualization/pycmap_time_series.html#timeseries)
    * [Regional Map, Contour Plot, 3D Surface Plot](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/data_vizualization/pycmap_rm_cp_3d.html#rmcp3d)
    * [Section Map, Section Contour](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/data_vizualization/pycmap_section_map_contour.html#sectionmapcontour)
    * [Depth Profile](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/data_vizualization/pycmap_depth_profile.html#depthprofile)
    * [Cruise Track](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/data_vizualization/pycmap_cruise_track.html#cruisetrackplot)    
    


<br/><br/><br/><br/>
## See Documentation For More:

In [None]:
from IPython.display import IFrame
IFrame("https://cmap.readthedocs.io/en/latest/user_guide/API_ref/api_ref.html", width=1400, height=1000)

<a class="anchor" id="dataRetrieval"></a>
<br/><br/><br/><br/><br/><br/><br/><br/><br/><br/>
<center>
<h1> API: Data Retrieval </h1>
</center>
<br/><br/><br/><br/><br/><br/><br/><br/>

<a class="anchor" id="installation"></a> 
<a href="#toc" style="float: right;">Table of Contents</a>
## Installation
pycmap can be installed using *pip*: 
<br />`pip install pycmap`

To use pycmap, you will need an API key from the Simons CMAP website:
<a href="https://simonscmap.com">https://simonscmap.com</a>.

### Note:
You may install pycmap on cloud-based Jupyter notebooks (such as [Colab](https://colab.research.google.com/)) by running the following command in a code cell: 
<br />`!pip install pycmap`

In [None]:
# !pip install pycmap -q    #uncomment to install pycmap on Colab
import pycmap
pycmap.__version__

<a class="anchor" id="api"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## [*API( )*](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_api.html#pycmapapi)
To retrieve data, create an instance of the API class and pass it your API key. You do not need to pass the key every time you run pycmap, because the key is stored locally. The API class has other optional parameters that adjust its behavior, and all parameters can be updated persistently at any point in the code.

Register at https://simonscmap.com and get an API key, if you haven't already.

In [None]:
api = pycmap.API(token="YOUR_KEY")

<a class="anchor" id="catalog"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## [*get_catalog()*](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/data_retrieval/pycmap_catalog.html#getcatalog)

Returns a dataframe containing the details of all variables in the Simons CMAP database. 
<br />This method requires no input.

In [None]:
api.get_catalog()

<a class="anchor" id="searchCatalog"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## [*search_catalog(keywords)*](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/data_retrieval/pycmap_search_catalog.html#searchcatalog)


Returns a dataframe containing a subset of the Simons CMAP catalog of variables. 

All variables in the Simons CMAP catalog are annotated with a collection of semantically related keywords. This method takes the passed keywords and returns all variables annotated with similar keywords. Separate the keywords with blank spaces. The search is insensitive to keyword order and to case. The keywords can be any 'hint' associated with the target variables, for example: 

* the exact variable name (e.g. NO3), or its linguistic term (Nitrate) 
* methodology (model, satellite ...), instrument (CTD, seaflow), or discipline (physics, biology ...) 
* the official cruise name (e.g. KOK1606), or unofficial cruise name (Falkor) 
* the name of the data producer (e.g. Penny Chisholm) or institution (MIT) 

<br />If you search for a variable with semantically related keywords and do not get the correct results, please let us know. We can update the keywords at any point.
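
The order- and case-insensitivity described above can be illustrated with a toy matcher. This is only a sketch: the function `matches` and the sample annotations are hypothetical, it requires exact keyword matches, and it is not how the Simons CMAP service finds "similar" keywords.

```python
def matches(query, annotations):
    """Return True if every keyword in `query` appears in `annotations`.

    Hypothetical helper for illustration only: keywords are split on
    whitespace and compared case-insensitively, ignoring their order.
    """
    wanted = {word.lower() for word in query.split()}
    available = {word.lower() for word in annotations}
    return wanted <= available


# Made-up annotations for a single variable
annotations = ["Silicate", "in-situ", "cruise", "KM1712"]

print(matches("silicate in-situ", annotations))  # True
print(matches("IN-SITU Silicate", annotations))  # True (order and case ignored)
print(matches("nitrate in-situ", annotations))   # False
```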


In [None]:
api.search_catalog("silicate in-situ")

<a class="anchor" id="cruises"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## [*cruises()*](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/data_retrieval/pycmap_list_cruises.html#list-cruises)

Returns a dataframe containing the list of cruises registered at Simons CMAP.

In [None]:
api.cruises()

<a class="anchor" id="cruiseTrajectory"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## [*cruise_trajectory(cruiseName)*](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/data_retrieval/pycmap_cruise_trajectory.html#cruise-traj)

Returns a dataframe containing the trajectory of the specified cruise.

> **Parameters:** 
>> **cruiseName: string**
>>  <br />The official cruise name. If applicable, you may also use cruise “nickname” (‘Diel’, ‘Gradients_1’ …). A full list of cruise names can be retrieved using the `cruises()` method.
>> <br />


>**Returns:** 
>>  Pandas dataframe.


In [None]:
api.cruise_trajectory("KM1712")

In [None]:
from pycmap.viz import plot_cruise_track
plot_cruise_track(["KM1712"])

<a class="anchor" id="cruiseVariables"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## [*cruise_variables(cruiseName)*](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/data_retrieval/pycmap_cruise_variables.html#cruisevars)

Returns a dataframe containing all variables (registered at Simons CMAP) that were measured during the specified cruise.
> **Parameters:** 
>> **cruiseName: string**
>>  <br />The official cruise name. If applicable, you may also use cruise “nickname” (‘Diel’, ‘Gradients_1’ …). A full list of cruise names can be retrieved using the `cruises()` method.
>> <br />


>**Returns:** 
>>  Pandas dataframe.

### Example:
Returns a list of measured variables during the KM1712 cruise.

In [None]:
api.cruise_variables("KM1712")

<a class="anchor" id="getDataset"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## [*get_dataset(tableName)*](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/data_retrieval/pycmap_retrieve_dataset.html#retrieve-dataset)

Returns the entire dataset. Note that this method does not return the dataset metadata. Use the Metadata method to get the dataset metadata.

> **Parameters:** 
>> **tableName: string**
>>  <br />Table name (each dataset is stored in a table). A full list of table names can be found in [catalog](https://simonscmap.com/catalog).
>> <br />


>**Returns:** 
>>  Pandas dataframe.



In [None]:
api.get_dataset("tblAMT13_Chisholm")

<a class="anchor" id="spaceTime"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## [*space_time(table, variable, dt1, dt2, lat1, lat2, lon1, lon2, depth1, depth2)*](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/data_retrieval/pycmap_subset_ST.html#subset-st)

Returns a subset of data according to the specified space-time constraints (dt1, dt2, lat1, lat2, lon1, lon2, depth1, depth2).
<br />The results are ordered by time, lat, lon, and depth (if it exists), respectively.

> **Parameters:** 
>> **table: string**
>>  <br />Table name (each dataset is stored in a table). A full list of table names can be found in [catalog](https://simonscmap.com/catalog).
>> <br />
>> <br />**variable: string**
>>  <br />Variable short name which directly corresponds to a field name in the table. A subset of this variable is returned by this method according to the spatio-temporal cut parameters (below). Pass **'\*'** wild card to retrieve all fields in a table. A full list of variable short names can be found in [catalog](https://simonscmap.com/catalog).
>> <br />
>> <br />**dt1: string**
>>  <br />Start date or datetime. This parameter sets the lower bound of the temporal cut. <br />Example values: '2016-05-25' or '2017-12-10 17:25:00'
>> <br />
>> <br />**dt2: string**
>>  <br />End date or datetime. This parameter sets the upper bound of the temporal cut. 
>> <br />
>> <br />**lat1: float**
>>  <br />Start latitude [degree N]. This parameter sets the lower bound of the meridional cut. Note latitude ranges from -90&deg; to 90&deg;.
>> <br />
>> <br />**lat2: float**
>>  <br />End latitude [degree N]. This parameter sets the upper bound of the meridional cut. Note latitude ranges from -90&deg; to 90&deg;.
>> <br />
>> <br />**lon1: float**
>>  <br />Start longitude [degree E]. This parameter sets the lower bound of the zonal cut. Note longitude ranges from -180&deg; to 180&deg;.
>> <br />
>> <br />**lon2: float**
>>  <br />End longitude [degree E]. This parameter sets the upper bound of the zonal cut. Note longitude ranges from -180&deg; to 180&deg;.
>> <br />
>> <br />**depth1: float**
>>  <br />Start depth [m]. This parameter sets the lower bound of the vertical cut. Note depth is a positive number (it is 0 at surface and grows towards ocean floor).
>> <br />
>> <br />**depth2: float**
>>  <br />End depth [m]. This parameter sets the upper bound of the vertical cut. Note depth is a positive number (it is 0 at surface and grows towards ocean floor).


>**Returns:** 
>>  Pandas dataframe.
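
Since each pair of parameters is a lower and an upper bound, it can help to sanity-check them before sending a request. The sketch below is a hypothetical client-side helper (`check_bounds` is not part of pycmap); it only encodes the ranges stated in the parameter descriptions above and assumes each lower bound should not exceed its upper bound.

```python
def check_bounds(lat1, lat2, lon1, lon2, depth1, depth2):
    """Return a list of problems found in the spatial bounds (empty if none).

    Hypothetical helper, not part of pycmap.
    """
    problems = []
    if not (-90 <= lat1 <= lat2 <= 90):
        problems.append("latitude must satisfy -90 <= lat1 <= lat2 <= 90")
    if not (-180 <= lon1 <= lon2 <= 180):
        problems.append("longitude must satisfy -180 <= lon1 <= lon2 <= 180")
    if not (0 <= depth1 <= depth2):
        problems.append("depth must satisfy 0 <= depth1 <= depth2")
    return problems


print(check_bounds(28, 38, -71, -50, 0, 100))  # []
print(check_bounds(38, 28, -71, -50, 0, 100))  # one latitude problem
```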


### Example:
This example retrieves a subset of in-situ salinity measurements by [Argo floats](https://simonscmap.com/catalog/datasets/ARGO_Core).

In [None]:
api.space_time(
              table="tblArgoCore_REP", 
              variable="PSAL", 
              dt1="2022-05-01", 
              dt2="2022-05-30", 
              lat1=28, 
              lat2=38, 
              lon1=-71, 
              lon2=-50, 
              depth1=0, 
              depth2=100
              ) 

<a class="anchor" id="matchCruise"></a>
<a href="#toc" style="float: right;">Table of Contents</a>

## [*along_track(cruise, targetTables, targetVars, depth1, depth2, temporalTolerance, latTolerance, lonTolerance, depthTolerance)*](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/pycmap_match_cruise_track_datasets.html#matchcruise)

This method colocalizes a cruise trajectory with the specified target variables. The matching results rely on the tolerance parameters because these parameters set the matching boundaries between the cruise trajectory and target datasets. Please note that the number of matching entries for each target variable might vary depending on the temporal and spatial resolutions of the target variable. In principle, if the cruise trajectory is fully covered by the target variable's spatio-temporal range, there should always be matching results if the tolerance parameters are larger than half of their corresponding spatial/temporal resolutions. Please explore the [catalog](https://simonscmap.com/catalog) to find appropriate target variables to colocalize with the desired cruise. 

<br />This method returns a dataframe containing the cruise trajectory joined with the target variable(s).
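
The "half of the resolution" rule mentioned above can be checked in one dimension. The sketch below assumes a regular grid with nodes at integer multiples of the resolution, which is a simplification; `has_match` is a hypothetical helper, not a pycmap function.

```python
def has_match(x, resolution, tolerance):
    """True if a grid node lies within `tolerance` of position `x`.

    Assumes (for illustration) grid nodes at integer multiples of `resolution`.
    """
    nearest = round(x / resolution) * resolution  # closest grid node
    return abs(x - nearest) <= tolerance


# Worst case: the point sits exactly midway between two nodes of a 0.5 degree grid
print(has_match(0.25, resolution=0.5, tolerance=0.25))  # True  (tolerance = half resolution)
print(has_match(0.25, resolution=0.5, tolerance=0.2))   # False (tolerance too small)
```

A point is never farther than half a grid step from its nearest node, so a tolerance of at least half the resolution always finds a match on such a grid.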



> **Parameters:** 
>> **cruise: string**
>>  <br />The official cruise name. If applicable, you may also use cruise "nickname" ('Diel', 'Gradients_1' ...). <br />A full list of cruise names can be retrieved using the `cruises()` method.
>> <br />
>> <br />**targetTables: list of string**
>>  <br />Table names of the target datasets to be matched with the cruise trajectory. Note that a cruise trajectory can be matched with multiple target datasets. A full list of table names can be found in [catalog](https://simonscmap.com/catalog).
>> <br />
>> <br />**targetVars: list of string**
>>  <br />Variable short names to be matched with the cruise trajectory. A full list of variable short names can be found in [catalog](https://simonscmap.com/catalog).
>> <br />
>> <br />**depth1: float**
>>  <br />Start depth [m]. This parameter sets the lower bound of the depth cut on the target datasets. 'depth1' and 'depth2' allow matching a cruise trajectory (which is at the surface, hopefully!) with target variables at lower depth. Note depth is a positive number (depth is 0 at surface and grows towards ocean floor).
>> <br />
>> <br />**depth2: float**
>>  <br />End depth [m]. This parameter sets the upper bound of the depth cut on the target datasets. Note depth is a positive number (depth is 0 at surface and grows towards ocean floor).
>> <br />
>> <br />**temporalTolerance: list of int**
>> <br />Temporal tolerance values between the cruise trajectory and target datasets. The size and order of values in this list should match those of targetTables. If only a single integer value is given, it is applied to all target datasets. This parameter is in units of days, except when the target variable represents monthly climatology data, in which case it is in units of months. Note that fractional values are not supported in the current version.
>> <br />
>> <br />**latTolerance: list of float or int**
>> <br />Spatial tolerance values in meridional direction [deg] between the cruise trajectory and target datasets. The size and order of values in this list should match those of targetTables. If only a single float value is given, it is applied to all target datasets. A "safe" value for this parameter is slightly larger than half of the target variable's spatial resolution.
>> <br />
>> <br />**lonTolerance: list of float or int**
>> <br />Spatial tolerance values in zonal direction [deg] between the cruise trajectory and target datasets. The size and order of values in this list should match those of targetTables. If only a single float value is given, it is applied to all target datasets. A "safe" value for this parameter is slightly larger than half of the target variable's spatial resolution.
>> <br />
>> <br />**depthTolerance: list of float or int**
>> <br />Spatial tolerance values in vertical direction [m] between the cruise trajectory and target datasets. The size and order of values in this list should match those of targetTables. If only a single float value is given, that would be applied to all target datasets. 

>**Returns:** 
>>  Pandas dataframe.
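
The tolerance lists must line up with `targetTables`, and a single value is applied to every target. The sketch below makes that convention explicit on the client side; `expand_tolerance` is a hypothetical helper and does not reflect how pycmap handles the parameters internally.

```python
def expand_tolerance(value, target_tables):
    """Return one tolerance per target table.

    Hypothetical helper: a single number is repeated for every table,
    while a list must already have one entry per table.
    """
    if isinstance(value, (int, float)):
        return [value] * len(target_tables)
    values = list(value)
    if len(values) != len(target_tables):
        raise ValueError(
            f"expected {len(target_tables)} tolerance values, got {len(values)}"
        )
    return values


tables = ["tblSeaFlow_v1_5", "tblDarwin_Nutrient_Climatology"]
print(expand_tolerance(0.25, tables))          # [0.25, 0.25]
print(expand_tolerance([0.01, 0.25], tables))  # [0.01, 0.25]
```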

### Example:
Colocalizes the Gradients_3 cruise with the abundance_prochloro and PO4_darwin_clim variables from the SeaFlow and Darwin (climatology) datasets, respectively.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import pycmap

api = pycmap.API()
df = api.along_track(
                    cruise='gradients_3', 
                    targetTables=['tblSeaFlow_v1_5', 'tblDarwin_Nutrient_Climatology'],
                    targetVars=['abundance_prochloro', 'PO4_darwin_clim'],
                    depth1=0, 
                    depth2=5, 
                    temporalTolerance=[0, 0],
                    latTolerance=[0.01, 0.25],
                    lonTolerance=[0.01, 0.25],
                    depthTolerance=[5, 5]
                    )




################# Simple Plot #################
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
c1, c2 = 'firebrick', 'slateblue'
t1, t2 = 'tblSeaFlow_v1_5', 'tblDarwin_Nutrient_Climatology'
v1, v2 = 'abundance_prochloro', 'PO4_darwin_clim'
# Left axis: SeaFlow abundance; tick and label colors match the markers
ax1.plot(df['lat'], df[v1], 'o', color=c1, markeredgewidth=0, label='SeaFlow', alpha=0.2)
ax1.tick_params(axis='y', labelcolor=c1)
ax1.set_ylabel(v1 + api.get_unit(t1, v1), color=c1)
# Right axis: Darwin climatology
ax2.plot(df['lat'], df[v2], 'o', color=c2, markeredgewidth=0, label='Darwin', alpha=0.2)
ax2.tick_params(axis='y', labelcolor=c2)
ax2.set_ylabel(v2 + api.get_unit(t2, v2), color=c2)
ax1.set_xlabel('Latitude')
fig.tight_layout()

<a class="anchor" id="sample"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## Custom Colocalization:
<code>Sample(source, targets, replaceWithMonthlyClimatolog)</code>

<br />Samples the target datasets using the time-location of the source dataset.
<br />Returns a dataframe containing the original source data and the joined colocalized target variables.
<br />



> **Parameters:** 
>> **source: dataframe**
>>  <br />A dataframe containing the source dataset (must have time-location columns).
>> <br />
>> <br />**targets: dict**
>>  <br />A dictionary containing the target tables/variables and tolerance parameters. The items in the `tolerances` list are: temporal tolerance [days], meridional tolerance [deg], 
>>    zonal tolerance [deg], and vertical tolerance [m], respectively.
>>    Below is an example for `targets` parameter:<br />
>>    <br />targets = {
>>    <br />        "tblSST_AVHRR_OI_NRT": {
>>    <br />                                "variables": ["sst"],
>>    <br />                                "tolerances": [1, 0.25, 0.25, 5]
>>    <br />                                },
>>    <br />        "tblAltimetry_REP": {
>>    <br />                                "variables": ["sla", "adt", "ugosa", "vgosa"],
>>    <br />                                "tolerances": [1, 0.25, 0.25, 5]
>>    <br />                               }
>>    <br />        }
>> <br />
>> <br />**replaceWithMonthlyClimatolog: boolean**
>>  <br />If `True`, monthly climatology of the target variables is colocalized when the target dataset's temporal range does not cover the source data. If `False`, only contemporaneous target data are colocalized. 
>> <br />

>**Returns:** 
>>  Pandas dataframe.
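
Because the order of the four `tolerances` values matters, naming them when building the `targets` dictionary can reduce mistakes. The sketch below uses a hypothetical helper, `make_target`, which is not part of pycmap; it only assembles the dictionary layout shown in the parameter description above.

```python
def make_target(variables, temporal, lat, lon, depth):
    """Build one `targets` entry.

    Hypothetical helper: tolerances are stored in the documented order,
    i.e. temporal [days], meridional [deg], zonal [deg], vertical [m].
    """
    return {
        "variables": list(variables),
        "tolerances": [temporal, lat, lon, depth],
    }


targets = {
    "tblSST_AVHRR_OI_NRT": make_target(["sst"], temporal=1, lat=0.25, lon=0.25, depth=5),
    "tblAltimetry_REP": make_target(
        ["sla", "adt", "ugosa", "vgosa"], temporal=1, lat=0.25, lon=0.25, depth=5
    ),
}
print(targets["tblSST_AVHRR_OI_NRT"])
```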

In [None]:
targets = {
            "tblSST_AVHRR_OI_NRT": {
                                    "variables": ["sst"],
                                    "tolerances": [0, 0.25, 0.25, 0]
                                    },
    
            }


pycmap.Sample(
              source=api.get_dataset("tblAMT13_Chisholm"), 
              targets=targets, 
              replaceWithMonthlyClimatolog=True
             )

<a class="anchor" id="query"></a>
<a href="#toc" style="float: right;">Table of Contents</a>
## [*query(sql)*](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/data_retrieval/pycmap_query.html#query)
<br />Simons CMAP datasets are hosted in a SQL database, and the pycmap package provides a number of pre-developed methods to extract and retrieve subsets of the data. The rest of this documentation is dedicated to exploring and explaining these methods. In addition to the pre-developed methods, we intend to leave the database open to custom scan queries for interested users. This method takes a custom SQL query statement and returns the results in the form of a Pandas dataframe. The full list of table names and variable names (fields) can be obtained using the [get_catalog()](https://cmap.readthedocs.io/en/latest/user_guide/API_ref/pycmap_api/data_retrieval/pycmap_catalog.html#getcatalog) method. In fact, one may use this very method to retrieve the table and field names: `query('EXEC uspCatalog')`. Each dataset is stored in a table, and each table field represents a variable. All data tables have the following fields:

* [time] [date or datetime] NOT NULL,
* [lat] [float] NOT NULL,
* [lon] [float] NOT NULL,
* [depth] [float] NOT NULL,

### Note:
Tables that represent a climatological dataset, such as 'tblDarwin_Nutrient_Climatology', do not have a 'time' field. Also, if a table represents a surface dataset, such as satellite products, there is no 'depth' field. 'depth' is a positive number in meters; it is zero at the surface and grows towards the ocean floor. 'lat' and 'lon' are in degrees, ranging from -90&deg; to 90&deg; and -180&deg; to 180&deg;, respectively.

<br />Please keep in mind that some of the datasets are massive in size (10s of TB), so avoid queries without a WHERE clause (`SELECT * FROM TABLENAME`). Always try to add some constraints on the time, lat, lon, and depth fields (see the basic examples below). 
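
One way to make sure a query always carries space-time constraints is to assemble it from the bounds. The sketch below is a hypothetical helper (`bounded_select` is not part of pycmap) that only builds the SQL string; it interpolates its arguments directly, so it should be given trusted values only.

```python
def bounded_select(table, variable, dt1, dt2, lat1, lat2, lon1, lon2):
    """Build a SELECT statement that is always constrained in time, lat, and lon.

    Hypothetical helper: values are interpolated as-is, so pass only
    trusted table names, variable names, and bounds.
    """
    return (
        f"SELECT [time], lat, lon, {variable} FROM {table} "
        f"WHERE [time] BETWEEN '{dt1}' AND '{dt2}' "
        f"AND lat BETWEEN {lat1} AND {lat2} "
        f"AND lon BETWEEN {lon1} AND {lon2}"
    )


sql = bounded_select("tblsst_AVHRR_OI_NRT", "sst",
                     "2016-06-01", "2016-10-01", 23, 24, -160, -158)
print(sql)
# The string could then be passed to the query method, e.g. api.query(sql)
```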

<br/>Moreover, the database hosts a wide range of predefined stored procedures and functions to streamline nearly all CMAP data services. For instance, retrieving the catalog information is achieved with a single call to this procedure: *uspCatalog*. These predefined procedures can be called using the pycmap package (see example below). Alternatively, one may use any SQL client to execute these procedures to retrieve and visualize data (examples: [Azure Data Studio](https://docs.microsoft.com/en-us/sql/azure-data-studio/download?view=sql-server-ver15), or [Plotly Falcon](https://plot.ly/free-sql-client-download/)). With the predefined procedures, all CMAP data services are centralized at the database layer, which dramatically facilitates developing apps in different programming languages (pycmap, web app, cmap4r, ...). Please note that you can improve the current procedures or add new procedures by contributing to the [CMAP database repository](https://github.com/simonscmap/DB). 
Below is a selected list of stored procedures and functions (their arguments will be described in more detail subsequently):



* uspCatalog
* uspSpaceTime
* uspTimeSeries
* uspDepthProfile
* uspSectionMap
* uspCruises
* uspCruiseByName
* uspCruiseBounds
* uspWeekly
* uspMonthly
* uspQuarterly
* uspAnnual
* uspMatch
* udfDatasetReferences
* udfMetaData_NoRef





<br />Happy SQL Injection!
<br />
<br />
<br />

### Example:
A sample stored procedure returning the list of all cruises hosted by Simons CMAP.

In [None]:
api.query('EXEC uspCruises')

### Example:
A sample query returning the time series of sea surface temperature (sst).

In [None]:
api.query(
         '''
         SELECT [time], AVG(lat) AS lat, AVG(lon) AS lon, AVG(sst) AS sst FROM tblsst_AVHRR_OI_NRT
         WHERE
         [time] BETWEEN '2016-06-01' AND '2016-10-01' AND
         lat BETWEEN 23 AND 24 AND
         lon BETWEEN -160 AND -158
         GROUP BY [time]
         ORDER BY [time]
         '''
         )
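
The `GROUP BY [time]` clause in the query above collapses all grid cells inside the lat/lon box into one average per time step. The plain-Python sketch below mimics that aggregation on a few made-up rows; the values are illustrative and do not come from the database.

```python
from collections import defaultdict
from statistics import mean

# Made-up (time, lat, lon, sst) rows standing in for grid cells inside the box
rows = [
    ("2016-06-01", 23.1, -159.0, 25.0),
    ("2016-06-01", 23.9, -158.5, 26.0),
    ("2016-06-02", 23.5, -159.5, 27.0),
]

# Group the sst values by time step
sst_by_time = defaultdict(list)
for time, lat, lon, sst in rows:
    sst_by_time[time].append(sst)

# One mean value per time step, ordered by time
time_series = [(time, mean(values)) for time, values in sorted(sst_by_time.items())]
print(time_series)  # [('2016-06-01', 25.5), ('2016-06-02', 27.0)]
```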