<a href="https://colab.research.google.com/github/simonscmap/pycmap/blob/master/docs/Sampling.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a>

<a href="https://mybinder.org/v2/gh/simonscmap/pycmap/master?filepath=docs%2FSampling.ipynb"><img align="right" src="https://mybinder.org/badge_logo.svg" alt="Open in Binder" title="Open and Execute in Binder"></a>

## *Sample(source, targets, replaceWithMonthlyClimatolog)*

<br />Samples the target datasets at the time-location coordinates of the source dataset. The source can be any dataframe with time and location (lat, lon, depth) columns, so it does not need to be hosted in the Simons CMAP database. For instance, you may load your own CSV file into a dataframe and pass it to the `source` argument. The target datasets must be in the Simons CMAP database and are specified by a dictionary, as described below. The results depend strongly on the tolerance parameters (specified within the `targets` dictionary) because they set the matching boundaries between the source and target datasets.
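
Before passing your own CSV file as the source, it can help to confirm that the file carries the time and location columns. The sketch below is only an illustration: the helper `missing_source_columns` is hypothetical (not part of pycmap), and the column names `time`, `lat`, `lon`, and `depth` are assumed from the example output shown on this page. Whether `depth` is strictly required for surface-only targets is not stated here.

```python
import csv
import io

# Assumption: column names taken from the example output on this page.
REQUIRED_COLUMNS = ("time", "lat", "lon", "depth")


def missing_source_columns(csv_text):
    """Return the assumed-required columns that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader)]
    return [col for col in REQUIRED_COLUMNS if col not in header]


# Hypothetical file content: one observation with all four coordinate columns.
sample_csv = (
    "time,lat,lon,depth,my_measurement\n"
    "2003-09-14T09:55:00,47.983,-11.539,9.28,0.92\n"
)

print(missing_source_columns(sample_csv))
print(missing_source_columns("lat,lon\n47.9,-11.5\n"))
```

Once the header looks right, the file can be read into a dataframe (for example with `pandas.read_csv`) and passed to the `source` argument.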



> **Parameters:** 
>> **source: dataframe**
>>  <br />A dataframe containing the source dataset (must have time and location columns).
>> <br />
>> <br />**targets: dict**
>>  <br />A dictionary containing the target tables, their variables, and the tolerance parameters. The keys must be table names, and the variables to sample are listed under the `variables` key. The items in the `tolerances` list are, in order: temporal tolerance [days], meridional (latitude) tolerance [deg], 
>>    zonal (longitude) tolerance [deg], and vertical tolerance [m].
>>    Below is an example of the `targets` parameter:<br />
>>    <br />targets = {
>>    <br />        "tblSST_AVHRR_OI_NRT": {
>>    <br />                                "variables": ["sst"],
>>    <br />                                "tolerances": [1, 0.25, 0.25, 5]
>>    <br />                                },
>>    <br />        "tblPisces_NRT": {
>>    <br />                                "variables": ["NO3", "PO4", "Fe", "O2", "Si", "PP"],
>>    <br />                                "tolerances": [4, 0.5, 0.5, 5]
>>    <br />                               }
>>    <br />        }    
>> <br />
Please explore the [catalog](Catalog.ipynb) to find more target datasets. 
<br /><br />
>> <br />**replaceWithMonthlyClimatolog: boolean**
>>  <br />If `True`, the monthly climatology of the target variables is sampled when the target dataset's temporal range does not cover the source data. If `False`, only contemporaneous target data are sampled. 
>> <br />
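
To make the role of the `tolerances` list more concrete, the sketch below computes the bounds around a single source point, using the documented order: temporal [days], meridional [deg], zonal [deg], vertical [m]. It assumes a simple symmetric window (point ± tolerance); the function `matching_window` is hypothetical, and the library's actual matching logic may differ, for example in how it treats the sea surface or the date line.

```python
from datetime import datetime, timedelta


def matching_window(time, lat, lon, depth, tolerances):
    """Return (low, high) bounds around one source point.

    Assumption: a symmetric window of point +/- tolerance on each axis.
    tolerances order: [temporal (days), meridional (deg), zonal (deg), vertical (m)].
    """
    if len(tolerances) != 4:
        raise ValueError("tolerances must have exactly four items")
    if any(t < 0 for t in tolerances):
        raise ValueError("tolerances must be non-negative")
    t_days, d_lat, d_lon, d_depth = tolerances
    delta = timedelta(days=t_days)
    return {
        "time": (time - delta, time + delta),
        "lat": (lat - d_lat, lat + d_lat),        # meridional = north-south
        "lon": (lon - d_lon, lon + d_lon),        # zonal = east-west
        "depth": (depth - d_depth, depth + d_depth),
    }


# First source point from the example on this page, with the tblPisces_NRT tolerances.
window = matching_window(
    datetime(2003, 9, 14, 9, 55), 47.983, -11.539, 9.28, [4, 0.5, 0.5, 5]
)
for axis, (low, high) in window.items():
    print(axis, low, high)
```

Under this reading, a tolerance of `0` on an axis requires an exact match on that axis, and wider tolerances trade precision for a higher chance of finding target data.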

>**Returns:** 
>>  A pandas dataframe containing the original source dataframe joined with the sampled target variables.
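
In the example output on this page, the added columns appear to be named `CMAP_<variable>_<table>`. The sketch below builds the column names one would expect for a given `targets` dictionary. The pattern is inferred from that output only and is not a documented guarantee; the helper `expected_target_columns` is hypothetical.

```python
def expected_target_columns(targets):
    """List the column names expected in the result, one per target variable.

    Assumption: names follow the CMAP_<variable>_<table> pattern seen in the
    example output on this page.
    """
    columns = []
    for table, spec in targets.items():
        for variable in spec["variables"]:
            columns.append(f"CMAP_{variable}_{table}")
    return columns


targets = {
    "tblWOA_Climatology": {
        "variables": ["density_WOA_clim", "salinity_WOA_clim"],
        "tolerances": [0, 0.75, 0.75, 5],
    }
}

print(expected_target_columns(targets))
```

A list like this can be used to select only the sampled columns from the returned dataframe, or to check that every requested variable came back.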

## Example
The example below uses a cruise dataset ([AMT13](https://simonscmap.com/catalog/datasets/AMT13_Prochlorococcus_Abundance)) as the source and samples the target datasets at its coordinates. The targets combine satellite products, numerical model outputs, and climatologies.

#!pip install pycmap -q     #uncomment to install pycmap, if necessary

import pycmap


targets = {
        # satellite SST
        "tblSST_AVHRR_OI_NRT": {
                                "variables": ["sst"],
                                "tolerances": [0, 0.25, 0.25, 0]
                                },
        
        # satellite SSS
        "tblSSS_NRT": {
                        "variables": ["sss"],
                        "tolerances": [0.0, 0.25, 0.25, 0]
                        },
        
        # BioGeoChemical Numerical Near-Real-Time Model
        "tblPisces_NRT": {
                          "variables": ["NO3", "PO4", "Fe", "O2", "Si", "PP"],
                          "tolerances": [4, 0.5, 0.5, 5]
                         },
    
        # Darwin Model (Climatology)
        "tblDarwin_Plankton_Climatology": {
                                           "variables": ["prokaryote_c01_darwin_clim", "prokaryote_c02_darwin_clim", "picoeukaryote_c03_darwin_clim", "cocco_c05_darwin_clim", "diazotroph_c10_darwin_clim", "diatom_c15_darwin_clim", "dinoflagellate_c26_darwin_clim", "zooplankton_c36_darwin_clim"],
                                           "tolerances": [0, 0.5, 0.5, 5]
                                          },
               
        # World Ocean Atlas Climatology
        "tblWOA_Climatology": {
                                "variables": ["density_WOA_clim", "salinity_WOA_clim", "nitrate_WOA_clim", "phosphate_WOA_clim", "silicate_WOA_clim", "oxygen_WOA_clim"],
                                "tolerances": [0, 0.75, 0.75, 5]
                                }    
        }



api = pycmap.API(token="<YOUR_API_KEY>")

source = api.get_dataset("tblAMT13_Chisholm")

pycmap.Sample(
              source=source, 
              targets=targets, 
              replaceWithMonthlyClimatolog=True
             )

Gathering metadata .... 
Sampling starts
Sampling finished                                                                                                    

Unnamed: 0,time,lat,lon,depth,Pro_abundance,Pro_total_ecotype_abundance,Pro_eMED4_abundance,Pro_eMIT9312_abundance,Pro_eMIT9211_abundance,Pro_eNATL2A_abundance,...,CMAP_diazotroph_c10_darwin_clim_tblDarwin_Plankton_Climatology,CMAP_diatom_c15_darwin_clim_tblDarwin_Plankton_Climatology,CMAP_dinoflagellate_c26_darwin_clim_tblDarwin_Plankton_Climatology,CMAP_zooplankton_c36_darwin_clim_tblDarwin_Plankton_Climatology,CMAP_density_WOA_clim_tblWOA_Climatology,CMAP_salinity_WOA_clim_tblWOA_Climatology,CMAP_nitrate_WOA_clim_tblWOA_Climatology,CMAP_phosphate_WOA_clim_tblWOA_Climatology,CMAP_silicate_WOA_clim_tblWOA_Climatology,CMAP_oxygen_WOA_clim_tblWOA_Climatology
0,2003-09-14T09:55:00,47.983,-11.539,9.280,0.920600,96901.054453,95878.590000,96.139927,1.000000,923.324525,...,0.072623,0.013547,0.184144,4.954793e-04,25.995854,35.564519,0.390463,0.118319,0.651668,5.683887
1,2003-09-14T09:55:00,47.983,-11.539,43.017,0.269000,21001.014360,19852.152500,24.226982,1.000000,1121.921075,...,0.021383,0.059329,0.084571,7.722752e-05,26.524515,35.569350,1.441909,0.193554,0.952697,5.738659
2,2003-09-14T09:55:00,47.983,-11.539,53.629,0.123600,4502.290607,3847.636500,6.132032,1.000000,645.522075,...,0.010668,0.053715,0.053461,2.120723e-05,26.819662,35.577024,2.483506,0.257478,1.237088,5.681908
3,2003-09-14T09:55:00,47.983,-11.539,62.327,0.095467,2468.064461,2120.974000,4.158968,3.010477,338.685800,...,0.004956,0.037266,0.032368,2.311244e-06,27.076436,35.585125,3.981607,0.333831,1.423695,5.637496
4,2003-09-14T09:55:00,47.983,-11.539,78.587,0.057667,2315.212687,1420.990750,2.133862,1.000000,889.088075,...,0.002066,0.021521,0.018344,-6.080327e-07,27.313025,35.595950,6.384935,0.452379,2.019073,5.608836
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
278,2003-10-12T12:43:59,-47.762,-51.420,62.744,0.001800,5.623099,1.261979,0.821804,1.000000,0.391510,...,0.000210,0.112924,0.037131,-8.328463e-06,27.214811,34.404506,13.960580,1.131356,5.711447,6.595229
279,2003-10-12T12:43:59,-47.762,-51.420,83.045,0.001867,5.154729,0.697516,0.457213,1.000000,1.000000,...,0.000148,0.063733,0.024957,-9.337925e-06,27.336333,34.411320,14.186612,1.187938,4.615173,6.660532
280,2003-10-12T12:43:59,-47.762,-51.420,103.250,0.001267,14.260102,7.436363,1.007110,1.000000,0.759817,...,0.000085,0.037983,0.015820,-1.014305e-05,27.434355,34.393612,13.790549,1.248067,5.389191,6.619431
281,2003-10-12T12:43:59,-47.762,-51.420,163.682,0.000533,5.353990,0.353990,1.000000,1.000000,1.000000,...,,,,,,,,,,
