## Create WaterFrame object from ERDDAP

A demo of the `from_erddap` method, which simplifies data requests to an ERDDAP server.

In this first demonstration, we collect data from the dataset "SBE 37 CTD Data", which is hosted on the ERDDAP server:
http://erddap.emso.eu/erddap/tabledap/SBE37_cb46_2bc4_0b4f.html

The `from_erddap` method receives the following parameters:

```
    Parameters
    ----------
    
        server     : The ERDDAP server URL        
        dataset_id : The dataset id to query
        
    Optional Parameters
    -------------------
    
        variables  : List of variables to get from the ERDDAP server. It can be a
                     comma-separated string or a list.
        constraints : Query constraints to apply to the ERDDAP query. This can be a
                      list or a dictionary.
        read_csv_kwargs : Dictionary with the parameters to pass to the read_csv
                          function that converts the ERDDAP response to a pandas
                          DataFrame.
        auth : Tuple with the username and password to authenticate to a protected ERDDAP server.
```

This method uses the erddap-python library to make the metadata and data requests. It assumes that the dataset is a tabledap dataset and uses only ERDDAP constraints to build the query.
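To make the role of the constraints more concrete, the sketch below assembles a tabledap CSV request URL by hand. It is only an illustration of the ERDDAP URL layout, not the code that `from_erddap` or erddap-python actually runs; `build_tabledap_url` is a hypothetical helper, and the comparison operators are left unencoded for readability (some HTTP clients may need them percent-encoded too).

```python
from urllib.parse import quote


def build_tabledap_url(server, dataset_id, variables=None, constraints=None):
    """Hypothetical helper: assemble a tabledap CSV request URL."""
    # Accept either a comma-separated string or a list of variable names.
    if isinstance(variables, str):
        variables = [name.strip() for name in variables.split(",")]
    # Accept either one dictionary or a list of dictionaries.
    if isinstance(constraints, dict):
        constraints = [constraints]

    query = ",".join(variables or [])
    for constraint in constraints or []:
        for key, value in constraint.items():
            # The key carries the variable and the operator, e.g. 'time>='.
            # Only the value is percent-encoded here.
            query += "&" + key + quote(str(value), safe="")
    return f"{server}/tabledap/{dataset_id}.csv?{query}"


url = build_tabledap_url(
    "http://erddap.emso.eu/erddap",
    "SBE37_cb46_2bc4_0b4f",
    variables="time,temperature",
    constraints=[{"time>=": "2017-11-28T00:00:00Z"},
                 {"time<=": "2017-12-05T07:31:41Z"}],
)
print(url)
```

Passing the constraints as a single dictionary, `{"time>=": ..., "time<=": ...}`, gives the same URL in this sketch.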

A WaterFrame object is returned. Its `data` property holds the DataFrame result of the query, with the time variable as the index if it is available.

ERDDAP offers "server-side functions" that alter the results (orderBy, orderByClosest, orderByMean, etc.). This functionality is not currently accessible through the `from_erddap` method.

In [1]:
from mooda.input import from_erddap

emso_erddap_url = 'http://erddap.emso.eu/erddap'
dataset_id = 'SBE37_cb46_2bc4_0b4f'

wf = from_erddap(emso_erddap_url, 
                 dataset_id,
                 constraints=[ {'time>=' : '2017-11-28T00:00:00Z'} , { 'time<=' : '2017-12-05T07:31:41Z' } ])

wf

| time | temperature | conductivity | pressure | salinity | sound_velocity | pH |
|---|---|---|---|---|---|---|
| 2017-11-28 00:00:02+00:00 | 17.3800 | 47.8698 | 19.301 | 37.2488 | 1516.855 | |
| 2017-11-28 00:00:22+00:00 | 17.3766 | 47.8675 | 19.295 | 37.2499 | 1516.846 | |
| 2017-11-28 00:00:42+00:00 | 17.3760 | 47.8662 | 19.328 | 37.2493 | 1516.844 | |
| 2017-11-28 00:01:02+00:00 | 17.3774 | 47.8671 | 19.294 | 37.2488 | 1516.847 | |
| 2017-11-28 00:01:22+00:00 | 17.3734 | 47.8641 | 19.318 | 37.2499 | 1516.837 | |
| ... | ... | ... | ... | ... | ... | ... |
| 2017-12-05 07:30:21+00:00 | 16.3986 | 46.5413 | 19.567 | 36.9839 | 1513.624 | |
| 2017-12-05 07:30:41+00:00 | 16.3869 | 46.5391 | 19.551 | 36.9928 | 1513.598 | |
| 2017-12-05 07:31:01+00:00 | 16.3874 | 46.5423 | 19.547 | 36.9951 | 1513.602 | |
| 2017-12-05 07:31:21+00:00 | 16.3843 | 46.5382 | 19.563 | 36.9944 | 1513.592 | |


In [5]:
from pprint import pprint 

pprint(wf.vocabulary)

OrderedDict([('time',
              {'_CoordinateAxisType': 'Time',
               '_dataType': 'double',
               'actual_range': (cftime.DatetimeGregorian(2009, 5, 29, 18, 35, 42, 0),
                                cftime.DatetimeGregorian(2017, 12, 5, 7, 31, 41, 0)),
               'axis': 'T',
               'ioos_category': 'Time',
               'long_name': 'Date/Time',
               'source_name': 'Date/Time',
               'standard_name': 'time',
               'time_origin': '01-JAN-1970 00:00:00',
               'time_precision': '1970-01-01T00:00:00Z',
               'units': 'seconds since 1970-01-01T00:00:00Z'}),
             ('temperature',
              {'_dataType': 'float',
               'actual_range': (11.7101, 27.1452),
               'ioos_category': 'Temperature',
               'long_name': 'sea_water_temperature',
               'units': 'degC'}),
             ('conductivity',
              {'_dataType': 'float',
               'actual_range': (29.39

In [3]:
# WaterFrame normal usage

wf.plot_timeseries()

# Do WaterFrame datasets need a DEPTH variable? Can pressure be used instead?
# Does the time variable need to be named TIME (uppercase) by convention?

KeyError: 'DEPTH'

## Demo 2 of the from_erddap method

Examples of the argument types accepted, as handled by the erddap-python library.

In [6]:
import datetime as dt 

variables = ['time', 'pressure'] 
# datetime objects can be used for all time constraints
timemin = dt.datetime(2017,11,1)

# ERDDAP constraint keywords are accepted
# Refer to https://coastwatch.pfeg.noaa.gov/erddap/tabledap/documentation.html#query
timemax = 'max(time)-1months' 

wf2 = from_erddap(emso_erddap_url, 
                  dataset_id,
                  constraints=[ {'time>=' : timemin} , { 'time<' : timemax } ])

wf2

| time | temperature | conductivity | pressure | salinity | sound_velocity | pH |
|---|---|---|---|---|---|---|
| 2017-11-01 00:00:15+00:00 | 21.2696 | 51.9652 | 19.440 | 37.2021 | 1527.695 | |
| 2017-11-01 00:00:35+00:00 | 21.2709 | 51.9659 | 19.348 | 37.2015 | 1527.697 | |
| 2017-11-01 00:00:55+00:00 | 21.2711 | 51.9660 | 19.348 | 37.2014 | 1527.697 | |
| 2017-11-01 00:01:35+00:00 | 21.2706 | 51.9659 | 19.384 | 37.2018 | 1527.697 | |
| 2017-11-01 00:01:55+00:00 | 21.2715 | 51.9669 | 19.386 | 37.2018 | 1527.699 | |
| ... | ... | ... | ... | ... | ... | ... |
| 2017-11-05 07:30:15+00:00 | 20.6234 | 51.2837 | 19.510 | 37.2139 | 1525.979 | |
| 2017-11-05 07:30:35+00:00 | 20.6247 | 51.2847 | 19.536 | 37.2135 | 1525.982 | |
| 2017-11-05 07:30:55+00:00 | 20.6269 | 51.2857 | 19.450 | 37.2125 | 1525.985 | |
| 2017-11-05 07:31:15+00:00 | 20.6245 | 51.2844 | 19.439 | 37.2134 | 1525.980 | |
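
The last rows of this result stop just before 2017-11-05 07:31:41, which is what the `'max(time)-1months'` upper bound suggests. The following sketch reproduces that arithmetic locally. It assumes that `max(time)` equals the upper bound of the `actual_range` reported for the time variable, and `subtract_months` is a hypothetical helper; how ERDDAP itself treats a day that does not exist in the target month is not covered here, so the clamping rule is an assumption.

```python
import calendar
from datetime import datetime


def subtract_months(moment, months):
    """Hypothetical helper: step a datetime back by whole calendar months."""
    # Count months from year 0 so year boundaries are handled uniformly.
    total = moment.year * 12 + (moment.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Assumption: clamp the day when the target month is shorter.
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


# Upper bound of 'actual_range' for the time variable of this dataset.
max_time = datetime(2017, 12, 5, 7, 31, 41)
upper_bound = subtract_months(max_time, 1)
print(upper_bound)  # 2017-11-05 07:31:41
```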


## erddap-python request to WaterFrame object

This demo shows how to take advantage of the server-side functionality that ERDDAP offers, using the erddap-python methods directly.

In [7]:
from erddapClient import ERDDAP_Tabledap 

remote = ERDDAP_Tabledap(emso_erddap_url, dataset_id)

# Show the global metadata
remote.info


OrderedDict([('area', 'Mediterranean'),
             ('author', 'Universitat Politecnica de Catalunya (UPC)'),
             ('cdm_data_type', 'Other'),
             ('contact', 'enoc.martinez@upc.edu'),
             ('Conventions', 'COARDS, CF-1.6, ACDD-1.3, NCCSV-1.0'),
             ('defaultGraphQuery',
              'time%2Ctemperature%2C&time>=2017-11-29T00%3A00%3A00Z&time<=2017-12-06T00%3A00%3A00Z&.draw=lines&.marker=5|5&.color=0x000000&.colorBar=|||||&.bgColor=0xffccccff'),
             ('geospatial_lat_max', '41.18212'),
             ('geospatial_lat_min', '41.18212'),
             ('geospatial_lon_max', '1.75257'),
             ('geospatial_lon_min', '1.75257'),
             ('infoUrl', 'http://www.obsea.es'),
             ('institution',
              'SARTI Research Group. Electronics Dept. Universitat Politecnica de Catalunya (UPC)'),
             ('institution_edmo_code', '2150'),
             ('institution_references',
              'http://www.obsea.es/, http://cdsarti.or

In [8]:
# Show the variables' metadata (the equivalent of the WaterFrame vocabulary)
remote.variables

OrderedDict([('time',
              {'_dataType': 'double',
               '_CoordinateAxisType': 'Time',
               'actual_range': (cftime.DatetimeGregorian(2009, 5, 29, 18, 35, 42, 0),
                cftime.DatetimeGregorian(2017, 12, 5, 7, 31, 41, 0)),
               'axis': 'T',
               'ioos_category': 'Time',
               'long_name': 'Date/Time',
               'source_name': 'Date/Time',
               'standard_name': 'time',
               'time_origin': '01-JAN-1970 00:00:00',
               'time_precision': '1970-01-01T00:00:00Z',
               'units': 'seconds since 1970-01-01T00:00:00Z'}),
             ('temperature',
              {'_dataType': 'float',
               'actual_range': (11.7101, 27.1452),
               'ioos_category': 'Temperature',
               'long_name': 'sea_water_temperature',
               'units': 'degC'}),
             ('conductivity',
              {'_dataType': 'float',
               'actual_range': (29.3952, 59.5073),
  

In [9]:
# Request daily mean values

variables = ['time', 'pressure', 'temperature', 'salinity']
remote.clearQuery()

df = (remote.setResultVariables(variables)
            .addConstraint({'time>=' : dt.datetime(2013,1,1)})
            .addConstraint({'time<=' : 'max(time)'})
            .orderByMean('time/1day')  # Request a server side operation
            .getDataFrame(header=0, names=variables, parse_dates=True, index_col='time')
     )
df

| time | pressure | temperature | salinity |
|---|---|---|---|
| 2013-04-10 00:00:00+00:00 | 19.452881 | 12.741023 | 37.943239 |
| 2013-04-11 00:00:00+00:00 | 19.436894 | 12.971672 | 37.945033 |
| 2013-04-12 00:00:00+00:00 | 19.431021 | 12.826318 | 37.922738 |
| 2013-04-13 00:00:00+00:00 | 19.441862 | 12.928260 | 37.925933 |
| 2013-04-14 00:00:00+00:00 | 19.419134 | 12.956201 | 37.914295 |
| ... | ... | ... | ... |
| 2017-12-01 00:00:00+00:00 | 19.373599 | 16.874540 | 37.157962 |
| 2017-12-02 00:00:00+00:00 | 19.385518 | 16.684279 | 37.298129 |
| 2017-12-03 00:00:00+00:00 | 19.452079 | 16.263324 | 37.157068 |
| 2017-12-04 00:00:00+00:00 | 19.467208 | 16.081971 | 37.037986 |
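
For intuition about what `orderByMean('time/1day')` asks the server to do, here is a rough local equivalent using only the standard library: group the rows by calendar day and average each group. This is a sketch of the idea, not ERDDAP's implementation. `daily_means` is a hypothetical helper, and the samples are a handful of temperature readings copied from the first query result in this notebook, far too few to be real daily means.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# (time, temperature) pairs copied from the first query result;
# only a few rows per day, so the averages are illustrative only.
samples = [
    ("2017-11-28 00:00:02", 17.3800),
    ("2017-11-28 00:00:22", 17.3766),
    ("2017-11-28 00:00:42", 17.3760),
    ("2017-12-05 07:30:21", 16.3986),
    ("2017-12-05 07:30:41", 16.3869),
]


def daily_means(rows):
    """Hypothetical helper: average the values that fall on the same day."""
    buckets = defaultdict(list)
    for stamp, value in rows:
        # Truncate each timestamp to its calendar day (the group key).
        day = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").date()
        buckets[day].append(value)
    # One mean per day, in chronological order.
    return {day: mean(values) for day, values in sorted(buckets.items())}


for day, value in daily_means(samples).items():
    print(day, round(value, 4))
```

Doing the aggregation on the server, as the request above does, avoids downloading every raw sample just to reduce it to one row per day.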


In [10]:
# Create the WaterFrame object from the ERDDAP_Tabledap instance
import mooda as md

# Make the WaterFrame
wf = md.WaterFrame()
wf.data = df
wf.metadata = remote.info
# Include only the requested variables in the vocabulary
wf.vocabulary = {key: value for key, value in remote.variables.items() if key in df.columns}

wf

| time | pressure | temperature | salinity |
|---|---|---|---|
| 2013-04-10 00:00:00+00:00 | 19.452881 | 12.741023 | 37.943239 |
| 2013-04-11 00:00:00+00:00 | 19.436894 | 12.971672 | 37.945033 |
| 2013-04-12 00:00:00+00:00 | 19.431021 | 12.826318 | 37.922738 |
| 2013-04-13 00:00:00+00:00 | 19.441862 | 12.928260 | 37.925933 |
| 2013-04-14 00:00:00+00:00 | 19.419134 | 12.956201 | 37.914295 |
| ... | ... | ... | ... |
| 2017-12-01 00:00:00+00:00 | 19.373599 | 16.874540 | 37.157962 |
| 2017-12-02 00:00:00+00:00 | 19.385518 | 16.684279 | 37.298129 |
| 2017-12-03 00:00:00+00:00 | 19.452079 | 16.263324 | 37.157068 |
| 2017-12-04 00:00:00+00:00 | 19.467208 | 16.081971 | 37.037986 |
