# Marine EOV Broker



In [1]:
from lib.MarineRiBroker import MarineBroker
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

print(MarineBroker.ERDDAP_OUTPUT_FORMATS)
print(MarineBroker.EOV_LIST)

['csv', 'geoJson', 'json', 'nc', 'ncCF', 'odvTxt']
['EV_OXY', 'EV_SEATEMP', 'EV_SALIN', 'EV_CURR', 'EV_CHLA', 'EV_CO2', 'EV_NUTS']


## Start the broker

Startup takes some time (about 17 s of wall time in the run below), and performance still needs work. At startup the broker:
* loads the vocabularies
* loads the dataset metadata from every ERDDAP server
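
One way to collect dataset metadata with a single request per server is ERDDAP's built-in `allDatasets` table, which lists each dataset's ID together with its spatial and temporal extent. The sketch below only builds that URL. Whether the broker actually loads its metadata this way is not stated here, and the helper name `all_datasets_url` is hypothetical.

```python
from urllib.parse import quote

# Columns of ERDDAP's built-in "allDatasets" table that describe each dataset's extent.
EXTENT_COLUMNS = [
    "datasetID",
    "minLongitude", "maxLongitude",
    "minLatitude", "maxLatitude",
    "minTime", "maxTime",
]


def all_datasets_url(server, columns=EXTENT_COLUMNS):
    """Build the URL listing every dataset of one ERDDAP server (hypothetical helper)."""
    base = server.rstrip("/")
    # The comma-separated column list is percent-encoded, as in the query URLs of this notebook.
    return f"{base}/tabledap/allDatasets.csv?{quote(','.join(columns), safe='')}"


print(all_datasets_url("https://www.ifremer.fr/erddap"))
```

Caching the downloaded table on disk between sessions would be one possible way to shorten startup.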


**Question:**
Do we want to work with every dataset on the ERDDAP servers, or build a fixed list of datasets?

In [2]:
%%time
broker = MarineBroker()

INFO:root:Querying vocabulary server for EOV : EV_OXY
INFO:root:Querying vocabulary server for EOV : EV_SEATEMP
INFO:root:Querying vocabulary server for EOV : EV_SALIN
INFO:root:Querying vocabulary server for EOV : EV_CURR
INFO:root:Querying vocabulary server for EOV : EV_CHLA
INFO:root:Querying vocabulary server for EOV : EV_CO2
INFO:root:Querying vocabulary server for EOV : EV_NUTS


CPU times: user 3.31 s, sys: 352 ms, total: 3.66 s
Wall time: 17.3 s


## Create a request to the broker
The user must provide the EOVs, the start and end dates, the latitude/longitude bounding box, and the desired output format.
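
The request parameters could be checked before any server is contacted. The sketch below is an assumption about how such a check might look: the `BrokerRequest` class is hypothetical and not part of the broker API, the two lists are copied from the output of the first cell, and the -180..180 longitude range is assumed.

```python
from dataclasses import dataclass
from datetime import date

# Values printed by the first cell of this notebook.
ERDDAP_OUTPUT_FORMATS = ['csv', 'geoJson', 'json', 'nc', 'ncCF', 'odvTxt']
EOV_LIST = ['EV_OXY', 'EV_SEATEMP', 'EV_SALIN', 'EV_CURR', 'EV_CHLA', 'EV_CO2', 'EV_NUTS']


@dataclass(frozen=True)
class BrokerRequest:
    """Hypothetical container for the user's request parameters."""
    eovs: list
    start: str      # ISO date, e.g. "2020-01-01"
    end: str
    min_lon: float
    max_lon: float
    min_lat: float
    max_lat: float
    fmt: str

    def validate(self):
        """Raise ValueError on the first inconsistent parameter."""
        unknown = [e for e in self.eovs if e not in EOV_LIST]
        if unknown:
            raise ValueError(f"unknown EOVs: {unknown}")
        if self.fmt not in ERDDAP_OUTPUT_FORMATS:
            raise ValueError(f"unsupported output format: {self.fmt}")
        if date.fromisoformat(self.start) > date.fromisoformat(self.end):
            raise ValueError("start date is after end date")
        # Assumption: longitudes are expressed in the -180..180 convention.
        if not (-180 <= self.min_lon <= self.max_lon <= 180):
            raise ValueError("invalid longitude range")
        if not (-90 <= self.min_lat <= self.max_lat <= 90):
            raise ValueError("invalid latitude range")


BrokerRequest(["EV_SALIN", "EV_OXY"], "2020-01-01", "2020-01-31", -10, 30, 30, 60, "csv").validate()
```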

When a query is created, the broker:
* first checks every dataset for a match with any EOV requested by the user
* then checks whether those datasets match the requested time range and bounding box
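
The two steps above can be sketched as a simple filter. This is an illustration under assumptions, not the broker's code: the `DatasetMeta` record, the `select_datasets` function, and the two sample datasets are made up, and "match" is interpreted here as an overlap between the dataset's extent and the requested one.

```python
from dataclasses import dataclass


@dataclass
class DatasetMeta:
    """Hypothetical metadata record for one dataset."""
    name: str
    eovs: set
    start: str      # ISO dates compare correctly as strings
    end: str
    min_lon: float
    max_lon: float
    min_lat: float
    max_lat: float


def overlaps(a_min, a_max, b_min, b_max):
    """True when the closed intervals [a_min, a_max] and [b_min, b_max] intersect."""
    return a_min <= b_max and b_min <= a_max


def select_datasets(datasets, eovs, start, end, min_lon, max_lon, min_lat, max_lat):
    wanted = set(eovs)
    selected = []
    for ds in datasets:
        # Step 1: keep only datasets providing at least one requested EOV.
        if not (ds.eovs & wanted):
            continue
        # Step 2: keep only datasets whose extent intersects the requested time range and box.
        if (overlaps(ds.start, ds.end, start, end)
                and overlaps(ds.min_lon, ds.max_lon, min_lon, max_lon)
                and overlaps(ds.min_lat, ds.max_lat, min_lat, max_lat)):
            selected.append(ds)
    return selected


# Made-up metadata, for illustration only.
catalog = [
    DatasetMeta("floats", {"EV_SALIN", "EV_SEATEMP"}, "2000-01-01", "2024-01-01", -180, 180, -90, 90),
    DatasetMeta("old_cruise", {"EV_SALIN"}, "1990-01-01", "1995-12-31", -20, 10, 40, 50),
    DatasetMeta("currents", {"EV_CURR"}, "2019-01-01", "2021-01-01", -10, 30, 30, 60),
]
hits = select_datasets(catalog, ["EV_SALIN", "EV_OXY"], "2020-01-01", "2020-01-31", -10, 30, 30, 60)
print([ds.name for ds in hits])
```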

In [3]:
%%time
# logger.setLevel(logging.DEBUG)
# Bounding box -10, 30, 30, 60: longitude -10..30 and latitude 30..60 (see the generated URLs)
queries = broker.submit_request(["EV_SALIN", "EV_OXY", "EV_SEATEMP", "EV_CO2", "EV_CHLA"], "2020-01-01", "2020-01-31", -10, 30, 30, 60, "csv")

CPU times: user 303 ms, sys: 73.5 ms, total: 376 ms
Wall time: 6.29 s


## Results

The interesting part!
The broker returns the result as a list of Python objects defined in the broker API.

In [4]:
queries

[<lib.MarineRiBroker.ErddapRequest at 0x7f509f7e5bb0>,
 <lib.MarineRiBroker.ErddapRequest at 0x7f509f7de550>]

In [5]:
[q.query_url for q in queries]

['https://www.ifremer.fr/erddap/tabledap/ArgoFloats.csv?time%2Clatitude%2Clongitude%2Cpsal%2Cdoxy%2Ctemp&time%3E=2020-01-01&time%3C=2020-01-31&latitude%3E=30&latitude%3C=60&longitude%3E=-10&longitude%3C=30',
 'https://www.ifremer.fr/erddap/tabledap/ArgoFloats-synthetic-BGC.csv?time%2Clatitude%2Clongitude%2Cpsal%2Cbisulfide%2Ccdom%2Cdoxy%2Cmlpl_doxy%2Cmolar_nitrate%2Ctemp&time%3E=2020-01-01&time%3C=2020-01-31&latitude%3E=30&latitude%3C=60&longitude%3E=-10&longitude%3C=30']

### Looking inside an ErddapRequest object


In [6]:
sample_query = queries[0]
print(sample_query.query_url)
print(sample_query.dataset_name)
print(sample_query.query_variables)


https://www.ifremer.fr/erddap/tabledap/ArgoFloats.csv?time%2Clatitude%2Clongitude%2Cpsal%2Cdoxy%2Ctemp&time%3E=2020-01-01&time%3C=2020-01-31&latitude%3E=30&latitude%3C=60&longitude%3E=-10&longitude%3C=30
ArgoFloats
['psal', 'doxy', 'temp']
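
The query URL shown above follows the ERDDAP tabledap pattern: server, dataset name, output format, a percent-encoded list of variables, then one percent-encoded constraint per bound. The sketch below rebuilds such a URL from its parts. The function `build_tabledap_url` is hypothetical and is not how `ErddapRequest` is known to be implemented.

```python
from urllib.parse import quote


def build_tabledap_url(server, dataset_name, variables, constraints, fmt="csv"):
    """Assemble an ERDDAP tabledap query URL (hypothetical helper)."""
    # "time,latitude,..." becomes "time%2Clatitude%2C..."
    columns = quote(",".join(variables), safe="")
    # "time>=2020-01-01" becomes "&time%3E=2020-01-01" ("=" is left unencoded)
    filters = "".join("&" + quote(c, safe="=") for c in constraints)
    return f"{server}/tabledap/{dataset_name}.{fmt}?{columns}{filters}"


url = build_tabledap_url(
    "https://www.ifremer.fr/erddap",
    "ArgoFloats",
    ["time", "latitude", "longitude", "psal", "doxy", "temp"],
    ["time>=2020-01-01", "time<=2020-01-31",
     "latitude>=30", "latitude<=60",
     "longitude>=-10", "longitude<=30"],
)
print(url)
```

With these inputs the sketch produces the same string as the `query_url` of the first request.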


### Showing all URLs

In [7]:
for q in queries:
    print(q.query_url)

https://www.ifremer.fr/erddap/tabledap/ArgoFloats.csv?time%2Clatitude%2Clongitude%2Cpsal%2Cdoxy%2Ctemp&time%3E=2020-01-01&time%3C=2020-01-31&latitude%3E=30&latitude%3C=60&longitude%3E=-10&longitude%3C=30
https://www.ifremer.fr/erddap/tabledap/ArgoFloats-synthetic-BGC.csv?time%2Clatitude%2Clongitude%2Cpsal%2Cbisulfide%2Ccdom%2Cdoxy%2Cmlpl_doxy%2Cmolar_nitrate%2Ctemp&time%3E=2020-01-01&time%3C=2020-01-31&latitude%3E=30&latitude%3C=60&longitude%3E=-10&longitude%3C=30


### Convenience method to retrieve data in xarray

ERDDAP keeps the variable attributes in the output data.

The global attributes could be adjusted in the `to_xarray` method so that they carry the proper min/max values.
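
As a rough sketch of that adjustment, the extent of the returned data can be recomputed and stored under ACDD-style attribute names. The function `extent_attributes` is hypothetical, and the sample values are taken from the data frame shown later in this notebook. On an `xarray.Dataset`, the resulting dictionary could be applied with `ds.attrs.update(...)`.

```python
def extent_attributes(times, latitudes, longitudes):
    """Compute ACDD-style global attributes from the values actually returned."""
    return {
        # ISO 8601 timestamps in the same format sort correctly as strings.
        "time_coverage_start": min(times),
        "time_coverage_end": max(times),
        "geospatial_lat_min": min(latitudes),
        "geospatial_lat_max": max(latitudes),
        "geospatial_lon_min": min(longitudes),
        "geospatial_lon_max": max(longitudes),
    }


attrs = extent_attributes(
    times=["2020-01-05T05:32:30Z", "2020-01-26T05:22:00Z"],
    latitudes=[44.40783166666667, 41.886285],
    longitudes=[-2.898385, 28.993265],
)
print(attrs)
```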

In [8]:
ds = sample_query.to_xarray()
ds

### The same data as a pandas DataFrame

In [9]:
df = sample_query.to_pandas_dataframe()
df


| | time | latitude | longitude | psal | doxy | temp |
|---|---|---|---|---|---|---|
| 0 | UTC | degrees_north | degrees_east | PSU | micromole/kg | degree_Celsius |
| 1 | 2020-01-05T05:32:30Z | 44.40783166666667 | -2.898385 | 35.075 | | 13.525 |
| 2 | 2020-01-05T05:32:30Z | 44.40783166666667 | -2.898385 | 35.073 | | 13.525 |
| 3 | 2020-01-05T05:32:30Z | 44.40783166666667 | -2.898385 | 35.074 | | 13.526 |
| 4 | 2020-01-05T05:32:30Z | 44.40783166666667 | -2.898385 | 35.073 | | 13.528 |
| ... | ... | ... | ... | ... | ... | ... |
| 241065 | 2020-01-26T05:22:00Z | 41.886285 | 28.993265 | 22.291 | | 8.971 |
| 241066 | 2020-01-26T05:22:00Z | 41.886285 | 28.993265 | 22.294 | | 8.976 |
| 241067 | 2020-01-26T05:22:00Z | 41.886285 | 28.993265 | 22.297 | | 8.98 |
| 241068 | 2020-01-26T05:22:00Z | 41.886285 | 28.993265 | 22.301 | | 8.985 |
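
Row 0 of the frame above holds the units rather than a measurement, because ERDDAP's `csv` output puts a units line right after the header. The sketch below separates that line from the data using only the standard library. The function `split_units` is hypothetical, and the sample text reuses the first data row shown above.

```python
import csv
import io


def split_units(csv_text):
    """Return (units, rows) from ERDDAP-style CSV text: header line, units line, then data."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    units = dict(zip(header, next(reader)))          # second line: one unit per column
    rows = [dict(zip(header, record)) for record in reader]
    return units, rows


sample = (
    "time,latitude,longitude,psal,doxy,temp\n"
    "UTC,degrees_north,degrees_east,PSU,micromole/kg,degree_Celsius\n"
    "2020-01-05T05:32:30Z,44.40783166666667,-2.898385,35.075,,13.525\n"
)
units, rows = split_units(sample)
print(units["psal"], rows[0]["temp"])
```

With pandas, passing `skiprows=[1]` to `read_csv` is a common way to drop the units line so that numeric columns are parsed as numbers.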


## To be done
* optimizations

## Other ideas

* download the data to disk?
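
A minimal sketch of the download idea, assuming each `query_url` is simply streamed to a local file named after the dataset and output format. Both helper functions and the `data` directory are hypothetical, and `download` is defined but not called here since it needs network access.

```python
import shutil
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlopen


def local_path(query_url, directory="data"):
    """Map a query URL to a local file, e.g. .../tabledap/ArgoFloats.csv?... -> data/ArgoFloats.csv."""
    return Path(directory) / Path(urlsplit(query_url).path).name


def download(query_url, directory="data"):
    """Stream the response to disk without loading it fully into memory."""
    target = local_path(query_url, directory)
    target.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(query_url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


print(local_path("https://www.ifremer.fr/erddap/tabledap/ArgoFloats.csv?time%2Clatitude"))
```

Two requests against the same dataset would map to the same file name in this sketch, so a real implementation would need a scheme to tell them apart.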