<a class="anchor" id="top"></a>

## Outline:
* [Getting Started](#getting-started)
* [Data Structure](#data-structure)
* [Catalog](#catalog)
<br/>
* [**Data Visualization**](#data-visualization)
    * [Histogram](#histogram)
    * [Regional Map](#map)
    * [Time Series](#timeseries)
    * [Mutual Trends](#mutual)
    * [Section Map](#section)
    * [Depth Profile](#depth-profile)
    * [Cruise Sampling](#cruise)
    * [Amplicon 16s](#amplicon)
    * [Colocalize Amplicon](#colocalize-amplicon)
    
* [**Data Retrieval**](#retrieval)
    * [Calling Pre-defined Functions](#retrieval) 
        * [Space-Time Subset](#space-time)
        * [Time Series Subset](#time-series-subset)
        * [Depth Profile Subset](#depth-profile-subset)
        * [Section Subset](#section-subset)
    * [Direct SQL Query](#sql)
        * [SQL: Regional Map](#sql-regional)
        * [SQL: Time Series](#sql-time-series)
        
* [**Synthesis Analysis**](#synthesis)
    * [Colocalize Custom External Dataset](#external)

* [**Use Case**](#use-case)
    * [CP-Lyase (Oscar Sosa)](#cplyase)

* [**CMAP Online Documentation**](#doc)
* [**Open Discussion**](#discussion)
* [**Contact**](#contact)

<a class="anchor" id="getting-started"></a>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>


<center>
<h1> Getting Started </h1>
</center>

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>





In [None]:
import IPython
IPython.display.IFrame('https://simons-ocean-atlas-documentation.readthedocs.io/en/latest/getting_started/installation.html', width=1200, height=800)

<a class="anchor" id="data-structure"></a>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>


<center>
<h1> Dataset Structure </h1>
</center>
<br/>
<br/>



| time        | lat           | lon  | depth | [var1] | [...] | [varn] |
| -----------   | -----------   | ----- | ----- | ----- | ----- | ----- |
| <%Y-%m-%dT%H:%M:%S>  | [-90, 90] | [-180, 180] | positive number | number | number | number |

<br/>
<br/>

<center>
<h3>    
see <a href=https://github.com/mdashkezari/opedia/tree/master/template> here</a> for more details
</h3>    
</center>
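
The layout in the table above can be checked row by row before a dataset is shared. The sketch below is a minimal, standard-library illustration: `validate_row` is a hypothetical helper (it is not part of opedia), and accepting a depth of zero as "positive" is an assumption.

```python
from datetime import datetime

TIME_FORMAT = '%Y-%m-%dT%H:%M:%S'   # time format from the table above


def validate_row(row):
    """Return a list of problems found in one record (empty list if none).

    Hypothetical helper for illustration; not part of opedia.
    """
    problems = []
    try:
        datetime.strptime(row['time'], TIME_FORMAT)
    except (KeyError, TypeError, ValueError):
        problems.append('time must look like 2016-04-30T00:00:00')
    # (column, lower bound, upper bound); zero depth is assumed to be valid
    checks = [('lat', -90, 90), ('lon', -180, 180), ('depth', 0, float('inf'))]
    for key, low, high in checks:
        value = row.get(key)
        if not isinstance(value, (int, float)) or not (low <= value <= high):
            problems.append('%s must be a number in [%s, %s]' % (key, low, high))
    return problems


# made-up records (illustration only)
good = {'time': '2016-04-30T00:00:00', 'lat': 22.75, 'lon': -158.0, 'depth': 5.0}
bad = {'time': '04/30/2016', 'lat': 95.0, 'lon': -158.0, 'depth': -5.0}
print(validate_row(good))   # no problems expected
print(validate_row(bad))    # time, lat, and depth are flagged
```

Returning a list of messages rather than raising on the first problem lets every issue in a row be reported at once.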


<br/>
<br/>
<br/>

<center> <h3> A sample dataset </h3> </center>
<center> provided by Katherine Heal <i>et al.</i> (Ingalls Lab, UW) </center>

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>





In [None]:
import pandas as pd
pd.read_excel('./data/KM1314_ParticulateCobalamins_2018_06_12_vPublished.xlsx')

<a class="anchor" id="catalog"></a>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>


<center>
<h1> Catalog </h1>
</center>


<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>





In [None]:
from opedia import getCatalog
import pandas as pd


df = pd.read_csv('./data/catalog.csv')
print(df[['Variable', 'Table_Name']].to_string())

<a class="anchor" id="data-visualization"></a>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>


<center>
<h1> Data Visualization </h1>
</center>


<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>





<a class="anchor" id="histogram"></a>

# Plot Distribution (Satellite, Core Argo Floats)

Create histograms of sea surface temperature (satellite), and temperature / salinity measurements by Argo floats.
<br/> <br/>
**Note:**<br/> 
* Satellite SST data set is a daily-global product with spatial resolution $\frac{1}{4}^\circ \times \frac{1}{4}^\circ$.<br/>

* Argo float data set has irregular temporal and spatial resolution. <br/>


In [None]:
from opedia import plotDist as DIS

tables = ['tblSST_AVHRR_OI_NRT', 'tblArgoMerge_REP', 'tblArgoMerge_REP']           # see catalog.csv  for the complete list of tables and variable names    
variables = ['sst', 'argo_merge_temperature_adj', 'argo_merge_salinity_adj']       # see catalog.csv  for the complete list of tables and variable names
startDate = '2016-04-30'   
endDate = '2016-04-30'
lat1, lat2 = 20, 24
lon1, lon2 = -170, 150
depth1, depth2 = 0, 1500
fname = 'Dist'
exportDataFlag = False      # True if you want to download data

DIS.plotDist(tables, variables, startDate, endDate, lat1, lat2, lon1, lon2, depth1, depth2, fname, exportDataFlag)

<a class="anchor" id="map"></a>

# Plot Regional Maps (Satellite, Model)

Create a regional map using satellite and model data.
<br/> <br/>
**Notes:**<br/> 
* Pisces model is a weekly-averaged global model with spatial resolution $\frac{1}{2}^\circ \times \frac{1}{2}^\circ$ (data is available only at one-week intervals).<br/>

* Satellite SST data set is a daily-global product with spatial resolution $\frac{1}{4}^\circ \times \frac{1}{4}^\circ$.<br/>
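
The resolutions quoted in the notes give a rough idea of how many grid points a regional query covers. The sketch below is a back-of-the-envelope estimate only: `grid_points` is a hypothetical helper, and it assumes a regular grid with nodes on both edges of the box, which may not match how each data set places its grid.

```python
def grid_points(lat1, lat2, lon1, lon2, resolution):
    """Approximate number of nodes of a regular lat/lon grid inside a box.

    Hypothetical helper; assumes nodes fall on both box edges.
    """
    n_lat = int(round((lat2 - lat1) / resolution)) + 1
    n_lon = int(round((lon2 - lon1) / resolution)) + 1
    return n_lat * n_lon


# same box as the regional map example
lat1, lat2, lon1, lon2 = 10, 70, -180, -80
print(grid_points(lat1, lat2, lon1, lon2, 0.25))   # quarter-degree product
print(grid_points(lat1, lat2, lon1, lon2, 0.5))    # half-degree model
```

Halving the grid spacing roughly quadruples the number of points per time step, which is worth keeping in mind when setting `exportDataFlag`.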

In [None]:
from opedia import plotRegional as REG


tables = ['tblsst_AVHRR_OI_NRT', 'tblPisces_NRT']    # see catalog.csv  for the complete list of tables and variable names
variables = ['sst', 'Fe']                            # see catalog.csv  for the complete list of tables and variable names   
startDate = '2016-04-30'
endDate = '2016-04-30'
lat1, lat2 = 10, 70
lon1, lon2 = -180, -80
depth1, depth2 = 0, 0.5
fname = 'regional'
exportDataFlag = False       # True if you want to download data

REG.regionalMap(tables, variables, startDate, endDate, lat1, lat2, lon1, lon2, depth1, depth2, fname, exportDataFlag)

<a class="anchor" id="timeseries"></a>

# Plot Time Series (Model, Satellite)

Create time series plots using satellite and model data.
<br/> <br/>
**Note:**<br/> 
* Pisces model is a weekly-averaged global model with spatial resolution $\frac{1}{2}^\circ \times \frac{1}{2}^\circ$ (data is available only at one-week intervals).<br/>

* Satellite wind data set is a 6-hourly global product with spatial resolution $\frac{1}{4}^\circ \times \frac{1}{4}^\circ$.<br/>

* Satellite Altimetry data set is a daily-global product with spatial resolution $\frac{1}{4}^\circ \times \frac{1}{4}^\circ$.<br/>

In [None]:
from opedia import plotTS as TS

tables = ['tblSST_AVHRR_OI_NRT', 'tblAltimetry_REP', 'tblPisces_NRT']    # see catalog.csv  for the complete list of tables and variable names
variables = ['sst', 'sla', 'NO3']                                        # see catalog.csv  for the complete list of tables and variable names
startDate = '2016-03-29'
endDate = '2016-05-29'
lat1, lat2 = 25, 30
lon1, lon2 = -160, -155
depth1, depth2 = 0, 5
fname = 'TS'
exportDataFlag = False                                                   # True if you want to download data

TS.plotTS(tables, variables, startDate, endDate, lat1, lat2, lon1, lon2, depth1, depth2, fname, exportDataFlag)

<a class="anchor" id="mutual"></a>

# Plot one dataset against another (Model, Satellite)

Plot one variable against another (plotXY) using satellite and model data.
<br/> <br/>
**Note:**<br/> 
* Pisces model is a weekly-averaged global model with spatial resolution $\frac{1}{2}^\circ \times \frac{1}{2}^\circ$ (data is available only at one-week intervals).<br/>

* Satellite wind data set is a 6-hourly global product with spatial resolution $\frac{1}{4}^\circ \times \frac{1}{4}^\circ$.<br/>

* Satellite Altimetry data set is a daily-global product with spatial resolution $\frac{1}{4}^\circ \times \frac{1}{4}^\circ$.<br/>

In [None]:
from opedia import plotXY as XY

tables = ['tblSST_AVHRR_OI_NRT', 'tblAltimetry_REP', 'tblPisces_NRT']    # see catalog.csv  for the complete list of tables and variable names
variables = ['sst', 'sla', 'NO3']                                        # see catalog.csv  for the complete list of tables and variable names
startDate = '2015-03-29'
endDate = '2016-03-29'
lat1, lat2 = 35, 40
lon1, lon2 = -160, -155
depth1, depth2 = 0, 5
fname = 'XY'
exportDataFlag = False                                                   # True if you want to download data

XY.plotXY(tables, variables, startDate, endDate, lat1, lat2, lon1, lon2, depth1, depth2, fname, exportDataFlag)

<a class="anchor" id="section"></a>

# Plot Section Map (Model outputs)

Create section maps using Darwin and PISCES model outputs.
<br/> <br/>
**Notes:**
* Darwin_Climatology is a monthly climatology version of the Darwin model with spatial resolution $\frac{1}{2}^\circ \times \frac{1}{2}^\circ$.<br/>

* Pisces model is a weekly-averaged global model with spatial resolution $\frac{1}{2}^\circ \times \frac{1}{2}^\circ$ (data is available only at one-week intervals).<br/>


In [None]:
from opedia import plotSection as SEC

tables = ['tblDarwin_Nutrient_Climatology', 'tblPisces_NRT']     # see catalog.csv  for the complete list of tables and variable names      
variables = ['CDOM_darwin_clim', 'Fe']                           # see catalog.csv  for the complete list of tables and variable names
startDate = '2016-04-30'                                         # PISCES is a weekly model, and here we are using monthly climatology of Darwin model
endDate = '2016-04-30'
lat1, lat2 = 20, 55
lon1, lon2 = -159, -157
depth1, depth2 = 0, 6000
fname = 'SEC'
exportDataFlag = False                                           # True if you want to download data

SEC.sectionMap(tables, variables, startDate, endDate, lat1, lat2, lon1, lon2, depth1, depth2, fname, exportDataFlag)

<a class="anchor" id="depth-profile"></a>

# Plot Depth Profile (BGC-Argo Floats, Model outputs)

Create depth profile plots using model and BGC-Argo float profiles.
<br/> <br/>
**Notes:**
* Darwin_Climatology is a monthly climatology version of the Darwin model with spatial resolution $\frac{1}{2}^\circ \times \frac{1}{2}^\circ$.<br/>

* Argo float data set has irregular temporal and spatial resolution. <br/>

In [None]:
from opedia import plotDepthProfile as DEP

tables = ['tblArgoMerge_REP', 'tblDarwin_Chl_Climatology']     # see catalog.csv  for the complete list of tables and variable names      
variables = ['argo_merge_chl_adj', 'chl01_darwin_clim']        # see catalog.csv  for the complete list of tables and variable names
startDate = '2016-04-30'   
endDate = '2016-04-30'
lat1, lat2 = 20, 24
lon1, lon2 = -170, -160
depth1, depth2 = 0, 1500
fname = 'DEP'
exportDataFlag = False                                         # True if you want to download data

DEP.plotDepthProfile(tables, variables, startDate, endDate, lat1, lat2, lon1, lon2, depth1, depth2, fname, exportDataFlag)

<br/> <br/>
<a class="anchor" id="cruise"></a>

# Colocalize Darwin model and satellite data with cruise

Compare the underway (in-situ) picoeukaryote abundance measurements performed during the "Gradients1.0" cruise with satellite chlorophyll data and picoeukaryote climatological estimates provided by the Darwin model.

<br/> 
**Notes:**<br/> 

* In-situ picoeukaryote abundance measurements come from the SeaFlow data set, which has 3-minute temporal resolution and irregular spatial resolution.

* Satellite Chlorophyll data used in this example is a daily-global reprocessed and optimally interpolated data set with $4~{\rm km}\times4~{\rm km}$ spatial resolution. 

* Darwin_Climatology is a monthly climatology version of the Darwin model with spatial resolution $\frac{1}{2}^\circ \times \frac{1}{2}^\circ$.<br/>

<br/>
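
To make the role of the spatial tolerance concrete, the sketch below averages gridded values that fall within +/- a tolerance (in degrees) of each track point. It is an illustration under stated assumptions, not opedia's implementation: `colocalize_point` is a hypothetical helper, the box average is an assumed matching rule, the numbers are made up, and longitude wrap-around at +/-180 is ignored.

```python
from statistics import mean


def colocalize_point(lat, lon, grid, tolerance):
    """Mean of gridded values within +/- tolerance degrees of (lat, lon).

    grid is a list of (lat, lon, value) tuples. Returns None if nothing matches.
    Hypothetical helper for illustration; not part of opedia.
    """
    matches = [value for g_lat, g_lon, value in grid
               if abs(g_lat - lat) <= tolerance and abs(g_lon - lon) <= tolerance]
    return mean(matches) if matches else None


# made-up gridded values (illustration only)
grid = [(22.50, -158.00, 0.08), (22.75, -158.00, 0.10),
        (23.00, -158.00, 0.12), (23.50, -158.00, 0.30)]
# two made-up track points: one inside the gridded region, one far from it
track = [(22.75, -158.00), (30.00, -158.00)]

for lat, lon in track:
    print(lat, lon, colocalize_point(lat, lon, grid, tolerance=0.3))
```

A larger tolerance pulls in more grid cells per track point, which smooths the matched signal; a smaller one may leave track points with no match at all.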


In [None]:
from opedia import plotCruise as CRS

DB_Cruise = True                 # < True > if cruise trajectory already exists in DB. < False > if arbitrary cruise file (e.g. virtual) 
source = 'tblSeaFlow'            # cruise table name or path to csv trajectory file    
cruise = 'Gradients1.0'              # cruise name, or file name of the csv trajectory file     
resampTau = '6H'                 # resample the cruise trajectory making trajectory time-space resolution coarser: e.g. '6H' (6 hourly), '3T' (3 minutes), ... '0' (ignore)  
fname = 'alongTrack'             # figure filename
tables = ['tblSeaFlow', 'tblDarwin_Plankton_Climatology', 'tblCHL_OI_REP']    # list of variable table names               
variables = ['picoeuk', 'picoeukaryote_c03_darwin_clim', 'chl']               # list of variable names           
spatialTolerance = 0.3           # colocalizer spatial tolerance (+/- degrees) 
exportDataFlag = False           # export the cruise trajectory and colocalized data on disk
depth1 = 0                      # depth range start (m) 
depth2 = 5                       # depth range end (m)  


df = CRS.getCruiseTrack(DB_Cruise, source, cruise)
df = CRS.resample(df, resampTau) 
loadedTrack = CRS.plotAlongTrack(tables, variables, cruise, resampTau, df, spatialTolerance, depth1, depth2, fname, exportDataFlag, marker='-', msize=30, clr='darkturquoise')

<br/> <br/> 
<a class="anchor" id="amplicon"></a>

# Exact Amplicon Sequence Variants (16S) Along Cruise Track
### Query by taxonomy level, clustering threshold, and size fraction

The example below retrieves the "topN" number of most abundant sequenced organisms along track of the cruise. One can aggregate and visualize the relative abundance of the organisms according to their taxonomy level, clustering levels, and size fractions. The cruise, 'ANT28-5', is an Atlantic latitudinal transect. <br/> <br/>

**Thanks to Jed Fuhrman and Jesse McNichol (USC) for the beautiful dataset!**  <br/> <br/> 
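
The "topN" selection described above amounts to summing relative abundance per taxon and keeping the largest totals. The sketch below illustrates that idea with the standard library; `top_n_by_taxon`, the record layout, and the values are all hypothetical and do not reflect how `esv.plotESVs` is implemented.

```python
from collections import defaultdict


def top_n_by_taxon(records, tax, top_n):
    """Sum relative abundance per taxon and return the top_n largest totals.

    records is a list of dicts, each with taxonomy keys and a
    'relative_abundance' value. Hypothetical helper; not part of opedia.
    """
    totals = defaultdict(float)
    for record in records:
        totals[record[tax]] += record['relative_abundance']
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# made-up records with placeholder genus labels (illustration only)
records = [
    {'genus': 'genus_a', 'relative_abundance': 0.30},
    {'genus': 'genus_b', 'relative_abundance': 0.10},
    {'genus': 'genus_a', 'relative_abundance': 0.25},
    {'genus': 'genus_c', 'relative_abundance': 0.20},
]
print(top_n_by_taxon(records, 'genus', 2))
```

Changing `tax` to a coarser level (for example 'phylum') merges more organisms into each group, so the same `top_n` covers a larger share of the community.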

In [None]:
from opedia import esv

############## set parameters ################
# only plot the top_N number of most abundant organisms
topN = 5           
# aggregate organisms by their taxa level
tax = ['domain', 'kingdom', 'phylum', 'class', 'order', 'genus', 'species'][5]
depth1 = 20
depth2 = depth1
cruise_name = 'ANT28-5'
cluster_level = [89, 92, 96, 97, 98, 99, 100][0]        # minimum similarity percentage to be clustered
size_frac_lower = [0.2, 3, 8][0]                        # size in micrometers
size_frac_upper = [None, 3, 8][1]                       # size in micrometers
##############################################

esv.plotESVs(topN, tax, depth1, depth2, cruise_name, cluster_level, size_frac_lower, size_frac_upper)

<br/><br/>
<a class="anchor" id="colocalize-amplicon"></a>

# Colocalize 16S dataset (or any other) with Model and Satellite

Here, the retrieved trends of relative abundances are colocalized with other datasets, in this case with Darwin model. The results are stored in a .csv file in the ./data directory. 

In [None]:
from opedia import colocalize as COL

DB = False                           # < True > if source data exists in the database. < False > if the source data set is a spreadsheet file on disk. 
source = './data/esv.csv'            # the source table name (or full filename)    
temporalTolerance = 3                # colocalizer temporal tolerance (+/- days)
latTolerance = 0.3                   # colocalizer meridional tolerance (+/- degrees)
lonTolerance = 0.3                   # colocalizer zonal tolerance (+/- degrees) 
depthTolerance = 5                   # colocalizer depth tolerance (+/- meters)
tables = ['tblDarwin_Plankton_Climatology', 'tblDarwin_Plankton_Climatology', 'tblDarwin_Plankton_Climatology']    # list of variable table names               
variables = ['prokaryote_c01_darwin_clim', 'prokaryote_c02_darwin_clim', 'cocco_c05_darwin_clim']                  # list of variable names           
exportPath = './data/loaded.csv'     # path to save the colocalized data set 
    
COL.matchSource(DB, source, temporalTolerance, latTolerance, lonTolerance, depthTolerance, tables, variables, exportPath)    

<a class="anchor" id="retrieval"></a>

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

<center>
<h1> Data Retrieval </h1>
<h3> Extract customized subsets of data:  calling pre-defined functions</h3> 
</center>



<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>





<a class="anchor" id="space-time"></a>

# Space-Time subset
This tutorial shows how to retrieve a generic distribution of a variable within a predefined space-time domain. You need to know the variable and table names, both of which can be found in the catalog. Data is retrieved in the form of a DataFrame with time, space, and variable columns. <br/> <br/> 

In [None]:
from opedia import subset

############## set parameters ################
table = 'tblsst_AVHRR_OI_NRT'
variable = 'sst'       
dt1 = '2016-06-01'
dt2 = '2016-06-05'
lat1, lat2, lon1, lon2 = 23, 24, -160, -158  
depth1, depth2 = 0, 0
##############################################

df = subset.spaceTime(table, variable, dt1, dt2, lat1, lat2, lon1, lon2, depth1, depth2)    # retrieves a DataFrame
#df.to_csv('data.csv', index=False)      # save the retrieved data into a csv file
df

<a class='anchor' id='time-series-subset'> </a>

# Time series subset
This tutorial shows how to retrieve a time series of a variable within a predefined space-time domain. You need to know the variable and table names, both of which can be found in the catalog. The *timeSeries* function computes the mean and standard deviation of the variable per time period. Data is retrieved in the form of a DataFrame with time, space, and variable columns. <br/> <br/> 
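
The per-period statistics can be pictured as a group-by on the time column. The sketch below is a standard-library illustration with made-up values; `time_series` is a hypothetical helper, and the use of the population standard deviation (`pstdev`) is an assumption, since the source does not say which estimator *timeSeries* uses.

```python
from collections import defaultdict
from statistics import mean, pstdev


def time_series(rows):
    """Group (time, value) pairs by time; return sorted (time, mean, std) tuples.

    Hypothetical helper for illustration; not part of opedia.
    """
    groups = defaultdict(list)
    for time, value in rows:
        groups[time].append(value)
    return [(t, mean(v), pstdev(v)) for t, v in sorted(groups.items())]


# made-up SST values (deg C) at a few grid points on two days
rows = [('2016-06-01', 25.0), ('2016-06-01', 26.0), ('2016-06-01', 27.0),
        ('2016-06-02', 26.0), ('2016-06-02', 28.0)]
for t, avg, std in time_series(rows):
    print(t, avg, round(std, 3))
```

Each output row summarizes all grid points inside the lat/lon box for one time step, so a wider box tends to increase the reported spread.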

In [None]:
from opedia import subset

############## set parameters ################
table = 'tblsst_AVHRR_OI_NRT'
variable = 'sst'       
dt1 = '2016-06-01'
dt2 = '2016-07-01'
lat1, lat2, lon1, lon2 = 23, 24, -160, -158  
depth1, depth2 = 0, 0
##############################################

df = subset.timeSeries(table, variable, dt1, dt2, lat1, lat2, lon1, lon2, depth1, depth2)    # retrieves a DataFrame
#df.to_csv('data.csv', index=False)      # save the retrieved data into a csv file
df

<a class='anchor' id='depth-profile-subset'></a>

# Depth profile subset
This tutorial shows how to retrieve a depth profile of a variable within a predefined space-time domain. You need to know the variable and table names, both of which can be found in the catalog. The *depthProfile* function computes the mean and standard deviation of the variable per depth level. Data is retrieved in the form of a DataFrame. <br/> <br/> 

In [None]:
from opedia import subset

############## set parameters ################
table = 'tblPisces_NRT'
variable = 'CHL'       
dt1 = '2016-04-30'
dt2 = '2016-04-30'
lat1, lat2, lon1, lon2 = 23, 24, -160, -158  
depth1, depth2 = 0, 6000
##############################################

df = subset.depthProfile(table, variable, dt1, dt2, lat1, lat2, lon1, lon2, depth1, depth2)    # retrieves a DataFrame
#df.to_csv('data.csv', index=False)      # save the retrieved data into a csv file
df

<a class='anchor' id='section-subset'></a>

# Section subset
This tutorial shows how to retrieve a section profile of a variable within a predefined space-time domain. You need to know the variable and table names, both of which can be found in the catalog. Data is retrieved in the form of a DataFrame with time, space, and variable columns. <br/> <br/> 

In [None]:
from opedia import subset

############## set parameters ################
table = 'tblPisces_NRT'
variable = 'Fe'       
dt1 = '2016-04-30'
dt2 = '2016-04-30'
lat1, lat2, lon1, lon2 = 22, 50, -160, -158  
depth1, depth2 = 0, 6000
##############################################

df = subset.section(table, variable, dt1, dt2, lat1, lat2, lon1, lon2, depth1, depth2)    # retrieves a DataFrame
#df.to_csv('data.csv', index=False)      # save the retrieved data into a csv file
df

<a class="anchor" id="sql"></a>

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

<center>
<h1> Data Retrieval </h1>
<h3> Extract customized subsets of data: <u>direct SQL query</u> </h3>      
</center>


<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>





<a class='anchor' id='sql-regional'></a>

# SQL: Regional Map
If you are familiar with SQL or T-SQL, you can use the "dbFetch()" function to execute any generic query and retrieve data. Below is a simple example showing how to retrieve a snapshot and plot a basic map.
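
The example in this section turns the query result into a 2-D array with a plain reshape, which depends on the rows arriving sorted and on a single depth level being returned. As a hedged alternative, the sketch below places each value by its coordinates, so row order does not matter. `to_grid` is a hypothetical helper and the rows are made up.

```python
def to_grid(rows):
    """Arrange (lat, lon, value) rows into a 2-D list indexed [lat][lon].

    Cells with no matching row are left as None.
    Hypothetical helper for illustration; not part of opedia.
    """
    lats = sorted({lat for lat, _, _ in rows})
    lons = sorted({lon for _, lon, _ in rows})
    lat_index = {lat: i for i, lat in enumerate(lats)}
    lon_index = {lon: j for j, lon in enumerate(lons)}
    grid = [[None] * len(lons) for _ in lats]
    for lat, lon, value in rows:
        grid[lat_index[lat]][lon_index[lon]] = value
    return lats, lons, grid


# made-up rows, deliberately out of order (illustration only)
rows = [(10.5, -179.5, 4.0), (10.0, -180.0, 1.0),
        (10.5, -180.0, 3.0), (10.0, -179.5, 2.0)]
lats, lons, grid = to_grid(rows)
print(lats, lons)
print(grid)
```

Missing cells show up as `None` instead of shifting every later value by one position, which makes gaps in the data easier to spot.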

In [None]:
from opedia import db
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


def plot(dt, lat, lon, data):
    plt.imshow(data, extent=[np.min(lon), np.max(lon), np.min(lat), np.max(lat)], origin='lower', vmin=0, vmax=1e-4)   # 'lower' puts the first row (smallest lat) at the bottom
    plt.title(field + '\n ' + dt)
    plt.colorbar()
    plt.show()

    
def prepareQuery(args):
    query = "SELECT [time], lat, lon, depth, %s FROM %s WHERE "
    query += "[time] BETWEEN '%s' AND '%s' AND "
    query += "lat BETWEEN %f AND %f AND "
    query += "lon BETWEEN %f AND %f AND "
    query += "depth BETWEEN %f AND %f "
    query += "ORDER BY [time], lat, lon, depth "
    query = query % args
    return query 



############## set parameters ################
table = 'tblPisces_NRT'
field = 'Fe'        # Mole concentration of dissolved Iron 
dt1 = '2017-06-03'
dt2 = '2017-06-03'
lat1, lat2, lon1, lon2 = 10, 55, -180, -100  
depth1 = 0
depth2 = 1
##############################################


args = (field, table, dt1, dt2, lat1, lat2, lon1, lon2, depth1, depth2)
query = prepareQuery(args)
df = db.dbFetch(query)        
lat = df.lat.unique()
lon = df.lon.unique()
shape = (len(lat), len(lon))
data = df[field].values.reshape(shape)
#df.to_csv(field+'.csv', index=False)    # export data
plot(dt1, lat, lon, data)

<a class='anchor' id='sql-time-series'></a>

# SQL: Time Series
If you are familiar with SQL or T-SQL, you can use the "dbFetch()" function to execute any generic query and retrieve data. Below is a simple example showing how to retrieve a time series and plot it.
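
Because the examples in this section build the query by string formatting, a mistyped or unexpected table or variable name goes straight into the SQL text. The sketch below shows one possible guard: a check that a name looks like a plain identifier before it is interpolated. `check_identifier` is a hypothetical helper, and the allowed character set is an assumption about the naming used in the catalog.

```python
import re

# letters, digits, and underscores only; must not start with a digit (assumed naming rule)
IDENTIFIER = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')


def check_identifier(name):
    """Return name unchanged if it looks like a plain SQL identifier, else raise.

    Hypothetical helper for illustration; not part of opedia.
    """
    if not IDENTIFIER.fullmatch(name):
        raise ValueError('unexpected table or variable name: %r' % name)
    return name


print(check_identifier('tblsst_AVHRR_OI_NRT'))
try:
    check_identifier('sst; DROP TABLE tblPisces_NRT')
except ValueError as err:
    print(err)
```

Calling such a check on the table and variable names before building the query string keeps the rest of the query-building code unchanged.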

In [None]:
from opedia import db
import matplotlib.pyplot as plt
%matplotlib inline


def plot(t, y):
    plt.plot(t, y, 'o')
    plt.xlabel('time')
    plt.show()
    
    
    
def prepareQuery(args):
    query = "SELECT [time], AVG(lat) AS lat, AVG(lon) AS lon, AVG(%s) AS %s FROM %s WHERE "
    query += "[time] BETWEEN '%s' AND '%s' AND "
    query += "lat BETWEEN %f AND %f AND "
    query += "lon BETWEEN %f AND %f "   
    query += "GROUP BY [time] "
    query += "ORDER BY [time] "
    query = query % args
    return query 


############## set parameters ################
table = 'tblsst_AVHRR_OI_NRT'
variable = 'sst'       
dt1 = '2016-06-01'
dt2 = '2016-10-01'
lat1, lat2, lon1, lon2 = 23, 24, -160, -158  
##############################################
args = (variable, variable, table, dt1, dt2, lat1, lat2, lon1, lon2)
query = prepareQuery(args)
df = db.dbFetch(query)        
#df.to_csv(variable+'.csv', index=False)    # export data
plot(df['time'], df[variable])

<a class="anchor" id="synthesis"></a>

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

<center>
<h1> Synthesis Analysis </h1>
</center>


<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>





<a class='anchor' id='external'></a>


# Colocalize a custom dataset with Darwin model, and satellite data

Colocalize a custom dataset (particulate cobalamins observed on the KM1314 cruise) with climatological POC and prokaryote estimates provided by the Darwin model, and with dissolved iron concentration. The dataset should be in either '.xlsx' or '.csv' format with 'time', 'lat', 'lon', and 'depth' columns. 


| time        | lat           | lon  | depth | [var1] | [...] | [varn] |
| -----------   | -----------   | ----- | ----- | ----- | ----- | ----- |
| <%Y-%m-%dT%H:%M:%S>  | [-90, 90] | [-180, 180] | positive number | number | number | number |


<br/> 
**Notes:**<br/> 

* Darwin_Climatology is a monthly climatology version of the Darwin model with spatial resolution $\frac{1}{2}^\circ \times \frac{1}{2}^\circ$.<br/>

* Pisces model is a weekly-averaged global model with spatial resolution $\frac{1}{2}^\circ \times \frac{1}{2}^\circ$ (data is available only at one-week intervals).<br/>

<br/>


**Thanks to Anitra Ingalls, Katherine Heal *et al.* (Ingalls Lab, UW) for the beautiful dataset!**  <br/> <br/> 


In [None]:
from opedia import colocalize as COL

DB = False                            # < True > if source data exists in the database. < False > if the source data set is a spreadsheet file on disk. 
source = './data/KM1314_ParticulateCobalamins_2018_06_12_vPublished.xlsx'            # the source table name (or full filename)    
temporalTolerance = 1                # colocalizer temporal tolerance (+/- days)
latTolerance = 0.3                   # colocalizer meridional tolerance (+/- degrees)
lonTolerance = 0.3                   # colocalizer zonal tolerance (+/- degrees) 
depthTolerance = 5                   # colocalizer depth tolerance (+/- meters)
tables = ['tblDarwin_Nutrient_Climatology', 'tblPisces_NRT', 'tblDarwin_Plankton_Climatology']    # list of variable table names               
variables = ['poc_darwin_clim', 'Fe', 'prokaryote_c01_darwin_clim']                            # list of variable names           
exportPath = './data/loaded.csv'         # path to save the colocalized data set 
    
COL.matchSource(DB, source, temporalTolerance, latTolerance, lonTolerance, depthTolerance, tables, variables, exportPath)    

<a class="anchor" id="use-case"></a>

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

<center>
<h1> Use Case </h1>
</center>


<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>





<a class='anchor' id='cplyase'></a>


# Colocalize TARA-CPLyase with model nutrient/chemical fields

Oscar Sosa *et al.* studied the TARA dataset to investigate the oceanic distribution and abundance of the bacterial carbon-phosphorus (C-P) lyase pathway, a specialized enzyme complex that oxidizes phosphonates to acquire phosphate. A number of environmental parameters from the Darwin and PISCES models were synthesized with this study. Dissolved iron showed a strong positive correlation with the relative abundance of the C-P lyase gene, particularly in the metagenomic samples from surface waters.

The code below shows the process of colocalizing Oscar's dataset with the PISCES model. Since most of the TARA observations were carried out before the start of the PISCES public dataset (year 2012), we had to compute and colocalize the monthly climatology of the model outputs. 
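
A monthly climatology averages all values that fall in the same calendar month, regardless of year. The sketch below illustrates the idea with the standard library and made-up weekly values; `monthly_climatology` is a hypothetical helper and is not how the climatology used in this study was actually produced.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_climatology(records):
    """Average values by calendar month across all years.

    records is a list of ('YYYY-MM-DD', value) pairs; returns {month: mean}.
    Hypothetical helper for illustration; not part of opedia.
    """
    by_month = defaultdict(list)
    for date, value in records:
        by_month[datetime.strptime(date, '%Y-%m-%d').month].append(value)
    return {month: mean(values) for month, values in sorted(by_month.items())}


# made-up weekly values from two different years (illustration only)
records = [('2015-06-03', 1.0), ('2016-06-01', 3.0), ('2016-06-08', 2.0),
           ('2016-07-06', 4.0)]
print(monthly_climatology(records))
```

Because the year is discarded, an observation from any year can be matched to the climatological value for its month, which is what makes the comparison with pre-2012 samples possible.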


<br/> 
**Notes:**<br/> 

* Pisces model is a weekly-averaged global model with spatial resolution $\frac{1}{2}^\circ \times \frac{1}{2}^\circ$ (data is available only at one-week intervals).<br/>

<br/>


**Thanks to Oscar Sosa *et al.* (Karl Lab, UH)**  <br/> <br/> 


In [None]:
from opedia import db
import pandas as pd
import numpy as np



def appendColumn(df, cols, std):
    for col in cols:
        df[col] = ''
        if std:
            df[col+'_std'] = ''
    return df

def prepareQuery(month, lat1, lat2, lon1, lon2, depth1, depth2):
    args = (month, lat1, lat2, lon1, lon2, depth1, depth2)
    query = "SELECT [month], lat, lon, Fe, NO3, O2, PO4, Si, PP, PHYC, CHL FROM tblPisces_NRT WHERE "
    query = query + "[month]=%s AND "
    query = query + "lat>=%f AND lat<=%f AND "
    query = query + "lon>=%f AND lon<=%f AND "
    query = query + "depth>=%f AND depth<=%f "
    query = query % args
    return query


def localize(month, lat1, lat2, lon1, lon2, depth1, depth2):
    query = prepareQuery(month, lat1, lat2, lon1, lon2, depth1, depth2)
    df = db.dbFetch(query)        
    return df




margin = 0.3                                # lat/lon tolerance (+/- 0.3 deg)
depth_margin = 5                            # depth tolerance (+/- 5 m)
df = pd.read_csv('./Tara_CPlyase.csv')      # load the original data set

cols = ['month', 'lat1', 'lat2', 'lon1', 'lon2', 'depth1', 'depth2']
std = False
df = appendColumn(df, cols, std)

cols = ['Fe', 'NO3', 'Model_O2', 'PO4', 'Si', 'PP', 'PHYC', 'CHL']
std = True
df = appendColumn(df, cols, std)

for i in range(len(df)):
    month = str(df.time[i]).split("/")[0]       # first '/'-separated field of the time string
    lat, lon, depth = df.lat[i], df.lon[i], df.depth[i]
    lat1, lat2 = lat - margin, lat + margin
    lon1, lon2 = lon - margin, lon + margin
    depth1, depth2 = depth - depth_margin, depth + depth_margin
    data = localize(month, lat1, lat2, lon1, lon2, depth1, depth2)
    # store the search window used for this observation
    bounds = (month, lat1, lat2, lon1, lon2, depth1, depth2)
    for col, val in zip(['month', 'lat1', 'lat2', 'lon1', 'lon2', 'depth1', 'depth2'], bounds):
        df.at[i, col] = val
    # store mean and standard deviation of each model variable within the window
    for col in cols:
        src = 'O2' if col == 'Model_O2' else col     # model oxygen is stored as 'Model_O2'
        df.at[i, col] = np.mean(data[src])
        df.at[i, col + '_std'] = np.std(data[src])
df.to_csv('Tara_CPlyase_loaded.csv', index=False)



<a class='anchor' id='doc'></a>

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

<center>
<h1> CMAP Online Documentation </h1>
</center>


<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>



In [None]:
import IPython
IPython.display.IFrame('https://simons-ocean-atlas-documentation.readthedocs.io/en/latest/index.html', width=1200, height=800)

<a class='anchor' id='discussion'></a>

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

<center>
<h1> Open Discussion </h1>
</center>

<br/>
<br/>

<ul style="list-style-type:circle">
  <li>What other public datasets should be ingested in the database?</li>
  <li>Any sane, practical suggestions for dealing with variable naming conventions (ontology)?</li>
  <li>Do you have any comment/concern about the suggested data structure?</li> 
  <li>How about the suggested metadata?</li> 
  <li>General: any questions about how to contribute to the project?</li>
  <li>...</li> 
</ul>


<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>



<a class='anchor' id='contact'></a>

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

<center>
<h1> Contacts </h1>

<ul style="list-style-type:none">
  <li>Norland Raphael Hagen (nrhagen@uw.edu)</li>  
  <li>Mohammad Ashkezari (mdehghan@uw.edu)</li>
</ul>


</center>

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

