# Marine EOV Broker



In [1]:
from marine_eov_broker import MarineRiBroker
import logging
import matplotlib.pyplot as plt

logger = logging.getLogger()
logger.setLevel(logging.INFO)
# logger.setLevel(logging.DEBUG)

print(MarineRiBroker.ERDDAP_OUTPUT_FORMATS)
print(MarineRiBroker.EOV_LIST)

['csv', 'geoJson', 'json', 'nc', 'ncCF', 'odvTxt']
['EV_OXY', 'EV_SEATEMP', 'EV_SALIN']


## Start the broker

Startup takes several seconds (performance still needs improvement) because the broker:
* loads the vocabularies on startup
* loads dataset metadata from all ERDDAP servers
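
To give a sense of what the metadata-loading step involves, here is a minimal sketch that builds the ERDDAP URLs one could fetch for a mapping of servers to dataset IDs, in the same shape as the dictionary passed to `MarineBroker` in the next cell. ERDDAP exposes per-dataset metadata under `info/<datasetID>/index.json` and a catalogue of every dataset through the `allDatasets` tabledap dataset. The helper name `metadata_urls` and the convention that `None` means "all datasets" are assumptions made for illustration; the broker may fetch its metadata differently.

```python
from typing import Dict, List, Optional


def metadata_urls(servers: Dict[str, Optional[List[str]]]) -> List[str]:
    """Return the ERDDAP metadata URLs to fetch for each server.

    A list of dataset IDs restricts the work to those datasets; None means
    "every dataset on the server", which requires the allDatasets catalogue.
    (Hypothetical helper, not part of marine_eov_broker.)
    """
    urls = []
    for server, dataset_ids in servers.items():
        base = server.rstrip("/")
        if dataset_ids is None:
            # One request that lists every dataset ID on the server.
            urls.append(f"{base}/tabledap/allDatasets.csv?datasetID")
        else:
            # One metadata request per explicitly listed dataset.
            urls.extend(f"{base}/info/{ds}/index.json" for ds in dataset_ids)
    return urls


print(metadata_urls({"https://www.ifremer.fr/erddap": ["ArgoFloats"]}))
```

With a fixed list, the number of metadata requests grows with the number of listed datasets rather than with the size of each server's catalogue, which is likely one reason a restricted broker starts faster.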


**Question:**
Do we want to work with all datasets on the ERDDAP servers, or should we build a fixed list of datasets?

In [2]:
%%time
# broker = MarineRiBroker.MarineBroker()
broker = MarineRiBroker.MarineBroker({"https://www.ifremer.fr/erddap": ["ArgoFloats"]})

INFO:root:Querying vocabulary server for EOV : EV_OXY
INFO:root:Querying vocabulary server for EOV : EV_SEATEMP
INFO:root:Querying vocabulary server for EOV : EV_SALIN


CPU times: user 141 ms, sys: 0 ns, total: 141 ms
Wall time: 8.44 s


## Create a request to the broker
The user must provide the EOVs, the minimum and maximum date, latitude and longitude, and the desired output format.

When creating a query, the broker:
* first checks every dataset for a match with any EOV requested by the user
* then checks whether the matching datasets overlap the requested time range and bounding box
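
The two steps above can be sketched as a simple filter. This is only an illustration under stated assumptions: `DatasetInfo`, `select_datasets`, and the two catalogue entries are hypothetical, and the broker's actual matching logic may differ (for instance, this sketch ignores bounding boxes that cross the antimeridian).

```python
from datetime import date
from typing import Dict, List, NamedTuple, Tuple


class DatasetInfo(NamedTuple):
    """Hypothetical, simplified view of one dataset's metadata."""
    dataset_id: str
    eov_variables: Dict[str, str]             # EOV code -> variable name
    time_range: Tuple[date, date]             # (start, end)
    bbox: Tuple[float, float, float, float]   # (min_lon, min_lat, max_lon, max_lat)


def overlaps(a_min, a_max, b_min, b_max) -> bool:
    """True when the closed intervals [a_min, a_max] and [b_min, b_max] intersect."""
    return a_min <= b_max and b_min <= a_max


def select_datasets(datasets, eovs, start, end, bbox) -> List[str]:
    min_lon, min_lat, max_lon, max_lat = bbox
    selected = []
    for ds in datasets:
        # Step 1: keep datasets exposing at least one requested EOV.
        if not any(eov in ds.eov_variables for eov in eovs):
            continue
        # Step 2: keep datasets whose coverage intersects the request.
        ds_min_lon, ds_min_lat, ds_max_lon, ds_max_lat = ds.bbox
        if (overlaps(ds.time_range[0], ds.time_range[1], start, end)
                and overlaps(ds_min_lon, ds_max_lon, min_lon, max_lon)
                and overlaps(ds_min_lat, ds_max_lat, min_lat, max_lat)):
            selected.append(ds.dataset_id)
    return selected


# Made-up catalogue entries, for illustration only.
catalogue = [
    DatasetInfo("argo_like", {"EV_OXY": "doxy", "EV_SEATEMP": "temp", "EV_SALIN": "psal"},
                (date(1997, 7, 28), date(2022, 12, 31)), (-180.0, -90.0, 180.0, 90.0)),
    DatasetInfo("pacific_only", {"EV_SEATEMP": "temp"},
                (date(2010, 1, 1), date(2022, 12, 31)), (150.0, -30.0, 180.0, 30.0)),
]
request_bbox = (-40, 35, 2, 62)  # North-east Atlantic Ocean

print(select_datasets(catalogue, ["EV_SALIN", "EV_OXY", "EV_SEATEMP"],
                      date(2022, 1, 16), date(2022, 1, 17), request_bbox))
```

Checking the EOVs first is cheap and discards irrelevant datasets before any coverage comparison is made.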

In [3]:
eovs_request = ["EV_SALIN", "EV_OXY", "EV_SEATEMP"]

start_date = "2022-01-16"
end_date = "2022-01-17"
# North-east Atlantic Ocean
min_lon = -40
min_lat = 35
max_lon = 2
max_lat = 62

# logger.setLevel(logging.DEBUG)

In [4]:
%%time
response = broker.submit_request(eovs_request, 
                                 start_date,
                                 end_date,
                                 min_lon,
                                 min_lat,
                                 max_lon,
                                 max_lat,
                                 "nc"
                                 )

CPU times: user 31.2 ms, sys: 0 ns, total: 31.2 ms
Wall time: 1.45 s


![EOV Broker in action](images/marine_eov_broker_mechanism.png)

## Results

The interesting part!
The broker returns a `BrokerResponse` object. Its **queries** attribute is a pandas DataFrame indexed by dataset ID.

The DataFrame contains each dataset's global attributes, as well as the query URL and the EOVs found.

In [5]:
response.queries

| | query_url | cdm_altitude_proxy | cdm_data_type | cdm_profile_variables | cdm_trajectory_variables | Conventions | creator_email | creator_name | creator_url | defaultGraphQuery | ... | summary | time_coverage_end | time_coverage_start | title | user_manual_version | Westernmost_Easting | query_object | EV_OXY | EV_SEATEMP | EV_SALIN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ArgoFloats | https://www.ifremer.fr/erddap/tabledap/ArgoFlo... | pres | TrajectoryProfile | cycle_number, data_type, format_version, handb... | platform_number, project_name, pi_name, platfo... | Argo-3.1, CF-1.6, COARDS, ACDD-1.3 | support@argo.net | Argo | https://argo.ucsd.edu/ | `longitude%2Clatitude%2Ctemp&time>=now-2d&time<...` | ... | Argo float vertical profiles from Coriolis Glo... | 2026-12-27T14:48:20Z | 1997-07-28T20:26:20Z | Argo Float Measurements | 3.1 | -179.99942 | `<marine_eov_broker.MarineRiBroker.ErddapReques...` | doxy | temp | psal |


**Get the dataset IDs**

In [6]:
print(f"Found datasets IDs : {response.get_datasets_list()}")

Found datasets IDs : ['SDC_NAT_CLIM_TS_V2_050_m', 'Emso_Azores_Chemini_IRON', 'SDC_GLO_AGG_V2', 'ArgoFloats-synthetic-BGC', 'ArgoFloats']


### Access a dataset with its dataset ID

In [6]:
dataset_id = response.get_datasets_list()[4]
print(f"Selected dataset: {dataset_id}")

Selected dataset: ArgoFloats


### Get the description of the EOV variables found in the dataset

In [None]:
response.get_dataset_EOVs_list(dataset_id)

### Get the query URL for the dataset ID

In [None]:
response.get_dataset_query_url(dataset_id)
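
For orientation, an ERDDAP tabledap query URL has the general shape `<server>/tabledap/<datasetID>.<format>?<variables>&<constraints>`. The sketch below assembles such a URL from the request parameters used earlier. The function `build_tabledap_url` and the choice of requested variables are assumptions made for illustration; the URL returned by the broker may select different variables or encode its constraints differently.

```python
from urllib.parse import quote

# Same values as printed by MarineRiBroker.ERDDAP_OUTPUT_FORMATS above.
OUTPUT_FORMATS = ["csv", "geoJson", "json", "nc", "ncCF", "odvTxt"]


def build_tabledap_url(server, dataset_id, variables, start, end, bbox, fmt="nc"):
    """Assemble an ERDDAP tabledap URL (hypothetical helper, not the broker's code)."""
    if fmt not in OUTPUT_FORMATS:
        raise ValueError(f"Unsupported output format: {fmt}")
    min_lon, min_lat, max_lon, max_lat = bbox
    constraints = [
        f"time>={start}", f"time<={end}",
        f"longitude>={min_lon}", f"longitude<={max_lon}",
        f"latitude>={min_lat}", f"latitude<={max_lat}",
    ]
    # Percent-encode "<" and ">" so the URL is safe to pass to HTTP clients.
    query = ",".join(variables) + "".join(
        "&" + quote(c, safe="=") for c in constraints
    )
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.{fmt}?{query}"


url = build_tabledap_url(
    "https://www.ifremer.fr/erddap",
    "ArgoFloats",
    ["time", "latitude", "longitude", "doxy", "temp", "psal"],
    "2022-01-16",
    "2022-01-17",
    (-40, 35, 2, 62),
    "nc",
)
print(url)
```

Validating the format before building the URL means an unsupported extension fails locally rather than as an HTTP error from the server.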

### Get the selected dataset as a Pandas DataFrame...

In [11]:
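# Alternative: load the whole dataset with the response helper
# df = response.dataset_to_pandas_dataframe(dataset_id)
# df
# List the distinct platform numbers present in the ArgoFloats result
response.queries.loc["ArgoFloats"].query_object.to_xarray().to_dataframe()["platform_number"].drop_duplicates()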

row
0       6901135
51      6901136
100     6901150
154     1901212
271     4901413
349     5902302
452     6900324
513     6900966
609     6900968
704     6901237
774     6901238
845     6901413
940     6901519
1300    6901597
1456    4901109
1528    4901133
1599    4901194
1700    6901572
Name: platform_number, dtype: object

### ... or an Xarray dataset

In [None]:
ds = response.dataset_to_xarray(dataset_id)
ds

### Download a dataset as a NetCDF file

In [None]:
response.dataset_to_file_download(dataset_id, "nc")