# Marine EOV Broker



```python
from marine_eov_broker import MarineRiBroker
import logging
import matplotlib.pyplot as plt

# Root logger at INFO so the broker's progress messages are displayed
# (switch to DEBUG for more detail).
logger = logging.getLogger()
logger.setLevel(logging.INFO)
# logger.setLevel(logging.DEBUG)

print(MarineRiBroker.ERDDAP_OUTPUT_FORMATS)
print(MarineRiBroker.EOV_LIST)
```

```
['csv', 'geoJson', 'json', 'nc', 'ncCF', 'odvTxt']
['EV_OXY', 'EV_SEATEMP', 'EV_SALIN', 'EV_CURR', 'EV_CHLA', 'EV_CO2', 'EV_NUTS']
```


## Start the broker

Starting the broker takes some time (its performance still needs improvement), because on startup it:
* loads the vocabularies
* loads the dataset metadata from every ERDDAP server
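To give an idea of what loading ERDDAP metadata involves, the sketch below builds two standard ERDDAP endpoints a broker could call at startup: the built-in `allDatasets` table, which lists every dataset ID on a server, and the per-dataset `info` page, which lists its global and variable attributes. The server list and function names are hypothetical; only the URL patterns follow ERDDAP's conventions, and the vocabulary server queried in the log below is not reproduced.

```python
from urllib.parse import quote

# Hypothetical server list: the two ERDDAP instances that appear in this notebook's results.
ERDDAP_SERVERS = [
    "http://erddap.emso.eu/erddap",
    "https://www.ifremer.fr/erddap",
]


def all_datasets_url(server):
    """ERDDAP's built-in allDatasets table, restricted to the datasetID column."""
    return f"{server.rstrip('/')}/tabledap/allDatasets.csv?datasetID"


def dataset_info_url(server, dataset_id):
    """Per-dataset page listing global and variable attributes, as CSV."""
    return f"{server.rstrip('/')}/info/{quote(dataset_id)}/index.csv"


def startup_urls(servers):
    """First request per server: the list of dataset IDs it publishes."""
    return [all_datasets_url(s) for s in servers]


if __name__ == "__main__":
    for url in startup_urls(ERDDAP_SERVERS):
        print(url)
    # Metadata for a single dataset, e.g. one of the IDs found later in this notebook.
    print(dataset_info_url(ERDDAP_SERVERS[1], "ArgoFloats"))
```

One `info` request per dataset on every server is one plausible reason why startup takes several seconds.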


**Question:**
Do we want to work with all datasets on the ERDDAP servers, or do we want to build a fixed list of them?
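Either answer can be expressed as a single switch. The hypothetical helper below (not part of the broker) keeps every discovered dataset ID when no allow-list is given, and otherwise keeps only the listed IDs. With a fixed list, the broker could also skip fetching metadata for datasets it would never use.

```python
from typing import Iterable, Optional


def select_datasets(discovered: Iterable[str], allow_list: Optional[set] = None) -> list:
    """Keep every discovered dataset ID, or only those in an explicit allow-list.

    allow_list=None  -> work with all datasets found on the ERDDAP servers.
    allow_list={...} -> work with a fixed list. Discovery order is preserved.
    """
    if allow_list is None:
        return list(discovered)
    return [ds for ds in discovered if ds in allow_list]


if __name__ == "__main__":
    ids = ["ArgoFloats", "SDC_GLO_AGG_V2", "ArgoFloats-synthetic-BGC"]
    print(select_datasets(ids))
    print(select_datasets(ids, {"ArgoFloats"}))
```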

```python
%%time
broker = MarineRiBroker.MarineBroker()
```

```
INFO:root:Querying vocabulary server for EOV : EV_OXY
INFO:root:Querying vocabulary server for EOV : EV_SEATEMP
INFO:root:Querying vocabulary server for EOV : EV_SALIN
INFO:root:Querying vocabulary server for EOV : EV_CURR
INFO:root:Querying vocabulary server for EOV : EV_CHLA
INFO:root:Querying vocabulary server for EOV : EV_CO2
INFO:root:Querying vocabulary server for EOV : EV_NUTS

CPU times: user 7.23 s, sys: 1.31 s, total: 8.54 s
Wall time: 13.7 s
```


## Create a request to the broker
The user must provide the EOVs, the minimum and maximum date, latitude and longitude, and the desired output format.

When creating a query, the broker:
* first checks every dataset to see whether it matches any of the EOVs requested by the user
* then checks whether those datasets match the time range and bounding box requested by the user
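A minimal sketch of that two-step filter, assuming each dataset's metadata has already been reduced to a set of detected EOVs plus a time range and a bounding box, and that "match" means the ranges overlap. `DatasetMeta` and `match_datasets` are hypothetical names, not the broker's API, and a bounding box crossing the antimeridian is not handled.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DatasetMeta:
    """Hypothetical summary of one dataset's ERDDAP metadata."""
    dataset_id: str
    eovs: set = field(default_factory=set)  # EOVs detected among the dataset's variables
    time_min: date = date.min
    time_max: date = date.max
    lon_min: float = -180.0
    lat_min: float = -90.0
    lon_max: float = 180.0
    lat_max: float = 90.0


def intervals_overlap(a_min, a_max, b_min, b_max) -> bool:
    """True if the closed intervals [a_min, a_max] and [b_min, b_max] intersect."""
    return a_min <= b_max and b_min <= a_max


def match_datasets(datasets, eovs, start, end, lon_min, lat_min, lon_max, lat_max):
    requested = set(eovs)
    # Step 1: keep datasets exposing at least one requested EOV.
    candidates = [d for d in datasets if d.eovs & requested]
    # Step 2: keep those whose time range and bounding box overlap the request.
    return [
        d.dataset_id
        for d in candidates
        if intervals_overlap(d.time_min, d.time_max, start, end)
        and intervals_overlap(d.lon_min, d.lon_max, lon_min, lon_max)
        and intervals_overlap(d.lat_min, d.lat_max, lat_min, lat_max)
    ]
```

Running the cheap EOV check first means the temporal and spatial checks only run on datasets that could contribute data.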

```python
start_date = "2002-10-01"
end_date = "2003-01-01"
min_lon = -180
min_lat = -90
max_lon = 180
max_lat = 90
```
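Before submitting, the parameters above can be sanity-checked locally. The sketch below is a hypothetical helper (this notebook does not show whether the broker validates its inputs); it reuses the output formats and EOV list printed at the top and assumes `min_lon <= max_lon`, i.e. no antimeridian crossing.

```python
from datetime import date

# Values printed by the first cell of this notebook.
ERDDAP_OUTPUT_FORMATS = ["csv", "geoJson", "json", "nc", "ncCF", "odvTxt"]
EOV_LIST = ["EV_OXY", "EV_SEATEMP", "EV_SALIN", "EV_CURR", "EV_CHLA", "EV_CO2", "EV_NUTS"]


def validate_request(eovs, start_date, end_date, min_lon, min_lat, max_lon, max_lat, fmt):
    """Raise ValueError if the request parameters are inconsistent (hypothetical helper)."""
    unknown = [e for e in eovs if e not in EOV_LIST]
    if unknown:
        raise ValueError(f"Unknown EOVs: {unknown}")
    if fmt not in ERDDAP_OUTPUT_FORMATS:
        raise ValueError(f"Unsupported output format: {fmt}")
    # date.fromisoformat parses the "YYYY-MM-DD" strings used above.
    if date.fromisoformat(start_date) > date.fromisoformat(end_date):
        raise ValueError("start_date is after end_date")
    if not (-180 <= min_lon <= max_lon <= 180):
        raise ValueError("longitudes must satisfy -180 <= min_lon <= max_lon <= 180")
    if not (-90 <= min_lat <= max_lat <= 90):
        raise ValueError("latitudes must satisfy -90 <= min_lat <= max_lat <= 90")
```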

```python
%%time
response = broker.submit_request(["EV_SALIN", "EV_OXY", "EV_SEATEMP", "EV_CO2", "EV_CHLA"],
                                 start_date,
                                 end_date,
                                 min_lon,
                                 min_lat,
                                 max_lon,
                                 max_lat,
                                 "nc"
                                 )
```

```
CPU times: user 1.73 s, sys: 159 ms, total: 1.88 s
Wall time: 33.8 s
```


## Results

The interesting part!
The broker returns a BrokerResponse object. Its attribute **queries** is a pandas DataFrame.

This DataFrame has one row per dataset matching the user request, holding the dataset's global attributes, its query URL and its ErddapRequest object.
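Each `query_url` is an ERDDAP tabledap request. A minimal sketch of how such a URL can be assembled from the request parameters, following ERDDAP's `{server}/tabledap/{datasetID}.{fileType}?{variables}&{constraints}` pattern and the percent-encoding visible in the `defaultDataQuery` column. The helper name is hypothetical, and since the `query_url` values below are truncated, the broker's actual URLs may differ in detail.

```python
from urllib.parse import quote


def tabledap_url(server, dataset_id, file_type, variables, start, end,
                 min_lon, min_lat, max_lon, max_lat):
    """Build an ERDDAP tabledap request URL (hypothetical helper).

    `start` and `end` are "YYYY-MM-DD" strings, both treated as inclusive at midnight UTC.
    Commas, '<', '>' and ':' are percent-encoded; '=' is left as-is.
    """
    constraints = [
        f"time>={start}T00:00:00Z",
        f"time<={end}T00:00:00Z",
        f"longitude>={min_lon}",
        f"longitude<={max_lon}",
        f"latitude>={min_lat}",
        f"latitude<={max_lat}",
    ]
    query = quote(",".join(variables), safe="")
    query += "".join("&" + quote(c, safe="=") for c in constraints)
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.{file_type}?{query}"


if __name__ == "__main__":
    print(tabledap_url("https://www.ifremer.fr/erddap", "ArgoFloats", "nc",
                       ["time", "latitude", "longitude", "temp"],
                       "2002-10-01", "2003-01-01", -180, -90, 180, 90))
```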

```python
response.queries
```

|  | query_url | cdm_data_type | citation | Conventions | creator_name | creator_url | data_mode | data_type | defaultDataQuery | defaultGraphQuery | ... | time_coverage_duration | time_coverage_resolution | wmo_platform_code | cdm_altitude_proxy | cdm_profile_variables | cdm_trajectory_variables | software_version | testOutOfDate | user_manual_version | creator_email |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EMSO_Western_Ionian_Sea_CTD_2002_2003 | http://erddap.emso.eu/erddap/tabledap/EMSO_Wes... | Point | Favali, P., Beranzoli, L., Etiope, G., & Marin... | OceanSITES v1.4, SeaDataNet_1.0,COARDS, CF-1.6... | Istituto Nazionale Geofisica e Vulcanologia - ... | www.moist.it | R | OceanSITES time-series data | time%2CCNDC%2CPRES%2CTEMP&time%3E=2003-02-14T0... | time%2CTEMP&time%3E=2003-02-14T00%3A00%3A00Z&t... | ... |  |  |  |  |  |  |  |  |  |  |
| ENVRIplus_b122nnnn | http://erddap.emso.eu/erddap/tabledap/ENVRIplu... | TimeSeries |  | SeaDataNet_1.0, CF-1.6, NCCSV-1.1 | BODC | https://www.bodc.ac.uk/ |  |  | SDN_LOCAL_CDI_ID%2Cdepth%2Clongitude%2Clatitud... | time,TEMPPR01,PSALPR01&time%3E=2007-07-03T00%3... | ... |  |  |  |  |  |  |  |  |  |  |
| Emso_Ligure_Dyfamed_TSO2 | http://erddap.emso.eu/erddap/tabledap/Emso_Lig... | Point | These data were collected and made freely avai... | OceanSITES-1.3, COARDS, CF-1.6, ACDD-1.3, NCCS... | IMEV Villefranche-sur-mer |  | D | OceanSITES profile data |  | time%2CPRES%2CDOXY&time>=2010-01-01T00%3A00%3A... | ... | P251Y-2802M-3DT-23H-34M-26S | P1M | 68418.0 |  |  |  |  |  |  |  |
| ArgoFloats-synthetic-BGC | https://www.ifremer.fr/erddap/tabledap/ArgoFlo... | TrajectoryProfile |  | Argo-3.1 CF-1.6, COARDS, ACDD-1.3 | Argo | http://www.argodatamgt.org/Documentation |  |  |  | longitude%2Clatitude%2Cph_in_situ_total&time>=... | ... |  |  |  | pres | cycle_number, latitude, longitude, time | platform_number | 1.11 (version 30.06.2020 for ARGO_simplified_p... | now-5days | 1.0 |  |
| ArgoFloats | https://www.ifremer.fr/erddap/tabledap/ArgoFlo... | TrajectoryProfile |  | Argo-3.1, CF-1.6, COARDS, ACDD-1.3 | Argo | https://argo.ucsd.edu/ |  |  |  | longitude%2Clatitude%2Ctemp&time>=now-2d&time<... | ... |  |  |  | pres | cycle_number, data_type, format_version, handb... | platform_number, project_name, pi_name, platfo... |  |  | 3.1 | support@argo.net |
| SDC_GLO_AGG_V2 | https://www.ifremer.fr/erddap/tabledap/SDC_GLO... | Point |  | COARDS, CF-1.6, ACDD-1.3 |  |  |  |  |  | longitude%2Clatitude%2Ctemp&time>=2019-01-01T0... | ... |  |  |  |  |  |  |  |  |  |  |


**Or just the list of dataset IDs**

```python
response.get_datasets_list()
```

```
['EMSO_Western_Ionian_Sea_CTD_2002_2003',
 'ENVRIplus_b122nnnn',
 'Emso_Ligure_Dyfamed_TSO2',
 'ArgoFloats-synthetic-BGC',
 'ArgoFloats',
 'SDC_GLO_AGG_V2']
```

### Access a dataset with its dataset ID and check its parameters

```python
dataset_id = response.get_datasets_list()[-2]
print(dataset_id)
```

```
ArgoFloats
```


```python
response.get_dataset(dataset_id).found_eovs
```

```
{'EV_SALIN': 'psal', 'EV_OXY': 'doxy', 'EV_SEATEMP': 'temp'}
```
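`found_eovs` maps each requested EOV to the variable that represents it in this dataset; `EV_CO2` and `EV_CHLA` were requested but not found here, so they are absent. One way to build such a mapping is to compare each variable's CF `standard_name` with a set of accepted names per EOV. The sketch below does this with a small hand-written table, which is an assumption for illustration: the broker derives its terms from the vocabulary server queried at startup and may match on other attributes.

```python
# Hypothetical EOV -> CF standard names table; the broker's real terms come
# from its vocabulary server, which is not reproduced here.
EOV_STANDARD_NAMES = {
    "EV_SEATEMP": {"sea_water_temperature", "sea_water_potential_temperature"},
    "EV_SALIN": {"sea_water_practical_salinity", "sea_water_salinity"},
    "EV_OXY": {"moles_of_oxygen_per_unit_mass_in_sea_water",
               "mass_concentration_of_oxygen_in_sea_water"},
}


def find_eovs(variables, requested):
    """Map each requested EOV to the first variable whose standard_name matches.

    `variables` maps a variable name to its standard_name attribute.
    EOVs with no matching variable are left out of the result.
    """
    found = {}
    for eov in requested:
        accepted = EOV_STANDARD_NAMES.get(eov, set())
        for var, std_name in variables.items():
            if std_name in accepted:
                found[eov] = var
                break
    return found


if __name__ == "__main__":
    # Variable names mirror the output above; their standard_name values are assumed.
    variables = {"temp": "sea_water_temperature",
                 "psal": "sea_water_practical_salinity",
                 "doxy": "moles_of_oxygen_per_unit_mass_in_sea_water",
                 "pres": "sea_water_pressure"}
    print(find_eovs(variables, ["EV_SALIN", "EV_OXY", "EV_SEATEMP", "EV_CO2", "EV_CHLA"]))
```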

### Execute a query and get the result as a pandas DataFrame...

```python
df = response.query_to_pandas_dataframe(dataset_id)
df
```

| row | time | latitude | longitude | psal | doxy | temp |
|---|---|---|---|---|---|---|
| 0 | 2002-10-08 22:54:36 | -2.216 | -34.822000 |  |  | 4.625 |
| 1 | 2002-10-08 22:54:36 | -2.216 | -34.822000 |  |  | 4.465 |
| 2 | 2002-10-08 22:54:36 | -2.216 | -34.822000 |  |  | 4.425 |
| 3 | 2002-10-08 22:54:36 | -2.216 | -34.822000 |  |  | 4.382 |
| 4 | 2002-10-08 22:54:36 | -2.216 | -34.822000 |  |  | 4.349 |
| ... | ... | ... | ... | ... | ... | ... |
| 355075 | 2002-12-29 20:44:00 | 41.784 | -64.081001 | 34.933998 |  | 3.510 |
| 355076 | 2002-12-29 20:44:00 | 41.784 | -64.081001 | 34.931000 |  | 3.464 |
| 355077 | 2002-12-29 20:44:00 | 41.784 | -64.081001 | 34.931000 |  | 3.439 |
| 355078 | 2002-12-29 20:44:00 | 41.784 | -64.081001 | 34.930000 |  | 3.409 |
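The notebook does not show how `query_to_pandas_dataframe` fetches the data. If the request uses ERDDAP's `.csv` file type, the response has the column names on the first line and the units on the second, so a loader has to set that units row aside. A stdlib-only sketch of such a parser (hypothetical, not the broker's code):

```python
import csv
import io

# Cell values treated as missing (assumption: empty cells or the literal "NaN").
MISSING = {"", "NaN"}


def parse_erddap_csv(text):
    """Split ERDDAP .csv output into a {column: unit} dict and a list of row dicts.

    Line 1 holds the column names and line 2 the units; missing cells become None.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    units = dict(zip(header, next(reader)))
    rows = [
        {col: (None if val in MISSING else val) for col, val in zip(header, record)}
        for record in reader
    ]
    return units, rows
```

With pandas, `pd.read_csv(url, skiprows=[1])` skips the same units row and yields a DataFrame directly.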


### ... or an Xarray dataset

```python
# Same ArgoFloats query, returned as an xarray Dataset.
ds = response.query_to_xarray(dataset_id)
ds
```

### Retrieve only a specific EOV

```python
# Returns only the variable mapped to EV_SEATEMP (a pandas Series).
temp = response.query_to_pandas_dataframe(dataset_id, "EV_SEATEMP")
temp
```

```
row
0         4.625
1         4.465
2         4.425
3         4.382
4         4.349
          ...  
355075    3.510
355076    3.464
355077    3.439
355078    3.409
355079    3.409
Name: temp, Length: 355080, dtype: float32
```
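The returned Series is named `temp`, the variable that `found_eovs` associates with `EV_SEATEMP` for this dataset. Selecting a single EOV can therefore be seen as looking up its variable name and keeping that column. A minimal stdlib sketch under that assumption (the helper is hypothetical; the broker may instead request only that variable from ERDDAP):

```python
def eov_column(rows, found_eovs, eov):
    """Return (variable name, values) for the variable that `found_eovs` maps `eov` to.

    Hypothetical helper: `rows` is a list of dicts (one per row) and `found_eovs`
    has the same shape as `response.get_dataset(dataset_id).found_eovs`.
    """
    if eov not in found_eovs:
        raise KeyError(f"{eov} was not found in this dataset")
    var = found_eovs[eov]
    return var, [row[var] for row in rows]


if __name__ == "__main__":
    found = {"EV_SALIN": "psal", "EV_OXY": "doxy", "EV_SEATEMP": "temp"}
    rows = [{"temp": 4.625, "psal": None}, {"temp": 4.465, "psal": None}]
    print(eov_column(rows, found, "EV_SEATEMP"))
```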

### Download a dataset as a NetCDF file

```python
response.query_to_file_download(dataset_id, "nc")
```

```
True
```
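`query_to_file_download` returns `True`, which suggests a success flag. A stdlib sketch of a download helper with that contract, assuming the file is fetched from a query URL built with the requested file type (the function name and error handling are hypothetical):

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_to_file(url, dest):
    """Stream `url` into the file `dest`; return True on success, False on I/O errors.

    urllib's URLError and HTTPError are subclasses of OSError, so network and
    file-system failures are both reported as False.
    """
    try:
        with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Self-contained demo with a local file:// URL instead of an ERDDAP server.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "source.nc"
        src.write_bytes(b"placeholder bytes")
        print(download_to_file(src.as_uri(), Path(tmp) / "copy.nc"))
```

Streaming with `shutil.copyfileobj` avoids holding a large NetCDF file in memory.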