<img src="https://raw.githubusercontent.com/euroargodev/argopy/master/docs/_static/argopy_logo_long.png" alt="argopy logo" width="200"/>

# Training Camp - Sept 22<sup>nd</sup> 2025

***

## Notebook Title : Select and fetch Argo data

**Author contact : [G. Maze](https://annuaire.ifremer.fr/cv/17182)**

**Description:**

This notebook describes:
- how to select (region, float, profile) Argo data to fetch,
- how to trigger data fetching (load or download),
- the format of the returned data.

It is all based on the [DataFetcher](https://argopy.readthedocs.io/en/v1.3.0/generated/argopy.fetchers.ArgoDataFetcher.html#argopy.fetchers.ArgoDataFetcher). 

This is basically a notebook to explore this [documentation section](https://argopy.readthedocs.io/en/v1.3.0/user-guide/fetching-argo-data/data_selection.html).

For more details on BGC, data sources and user modes, please refer to the dedicated notebooks.

*This notebook was developed with Argopy version: 1.3*

***

Let's start with the usual import:

In [None]:
from argopy import DataFetcher

To keep cell output from getting too large, we won't display xarray object attributes:

In [None]:
import xarray as xr
xr.set_options(display_expand_attrs=False)

Before selecting any data, let’s first create a DataFetcher instance:

In [None]:
f = DataFetcher()
f

<br>

The printed representation of ``f`` indicates that `erddap` is the data source for this fetcher (the default choice) and that "No access point initialised". An access point is a data selection method.

The second line of the printout lists all the access points available for this data source.

There are 3 data selection methods that can be used on this [DataFetcher](https://argopy.readthedocs.io/en/v1.3.0/generated/argopy.fetchers.ArgoDataFetcher.html#argopy.fetchers.ArgoDataFetcher) instance:
    
- 🗺 ``region`` for a space/time domain,
- 🤖 ``float`` for one or more floats,
- ⚓ ``profile`` for one or more profiles.

We will now review each of these.

## Selecting data to fetch

### 🗺 Select data for a space/time domain

The ``region`` access point takes a rectangular box definition of the space/time bounds to include. Argopy expects one of the following 2 formats to define a box:

``box = [lon_min, lon_max, lat_min, lat_max, pres_min, pres_max]``

or

``box = [lon_min, lon_max, lat_min, lat_max, pres_min, pres_max, datim_min, datim_max]``


Longitude, latitude and pressure limits are float values. Starting and ending datetime must be objects convertible to a [Pandas datetime](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html).
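
To make the two box layouts concrete, here is a minimal sketch that labels the elements of a box list. The function `describe_box` is a hypothetical helper written for this notebook, not part of Argopy, and it only checks the number of elements, nothing else.

```python
def describe_box(box):
    """Hypothetical helper (not part of Argopy): label the elements of a box list."""
    if len(box) not in (6, 8):
        raise ValueError("a box must have 6 or 8 elements, got %d" % len(box))
    return {
        "lon": (box[0], box[1]),
        "lat": (box[2], box[3]),
        "pres": (box[4], box[5]),
        # Time bounds are optional: None means they were not given in the list.
        "time": (box[6], box[7]) if len(box) == 8 else None,
    }


box_3d = describe_box([-75, -45, 20, 30, 0, 10])
box_4d = describe_box([-75, -45, 20, 30, 0, 10, "2011-01", "2011-06"])
print(box_3d)
print(box_4d)
```

The order of the elements matters: longitude bounds come first, then latitude, then pressure, and finally the optional time bounds.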

Let's try the most exhaustive definition first, and select data from 75W to 45W, 20N to 30N, 0db to 10db and from January to May 2011:

In [None]:
box = [-75, -45, 20, 30, 0, 10, '2011-01', '2011-06']
f = f.region(box)
f

<br>

Now that the [DataFetcher](https://argopy.readthedocs.io/en/v1.3.0/generated/argopy.fetchers.ArgoDataFetcher.html#argopy.fetchers.ArgoDataFetcher) instance has been initialised with an access point, the print provides a little bit more information.

Note that the last time bound is exclusive: that’s why here we specify June to retrieve data collected in May.
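
The exclusive upper bound can be pictured as a half-open interval. The sketch below uses only the standard library and a hypothetical helper, `in_time_range`; it assumes that `'2011-01'` and `'2011-06'` are read as the first instant of each month, and it is an illustration of the rule rather than Argopy's own implementation.

```python
from datetime import datetime


def in_time_range(t, t_min, t_max):
    """Hypothetical helper: True when t_min <= t < t_max (upper bound excluded)."""
    return t_min <= t < t_max


# Assumption: '2011-01' and '2011-06' map to the first instant of each month.
t_min = datetime(2011, 1, 1)
t_max = datetime(2011, 6, 1)

last_of_may = in_time_range(datetime(2011, 5, 31, 23, 59), t_min, t_max)
first_of_june = in_time_range(datetime(2011, 6, 1), t_min, t_max)
print(last_of_may, first_of_june)
```

A measurement from the last minute of May falls inside the selection, while one stamped exactly at the upper bound does not.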

#### EXERCISE
Make the [DataFetcher](https://argopy.readthedocs.io/en/v1.3.0/generated/argopy.fetchers.ArgoDataFetcher.html#argopy.fetchers.ArgoDataFetcher) instance select data for a single date, say Feb. 12th of 2009:

In [None]:
box = [-75, -45, 20, 30, 0, 10, ...
f = f.region(box)
f

### 🤖 For one or more floats

If you know a float's unique identifier, called a WMO number, you can use the ``float`` access point to select one or more floats by their WMO platform numbers.

For instance, to select data for float WMO 6902746:

In [None]:
f = f.float(6902746)
f

<br>
To fetch data for a collection of floats, input them in a list.

#### EXERCISE
Make the [DataFetcher](https://argopy.readthedocs.io/en/v1.3.0/generated/argopy.fetchers.ArgoDataFetcher.html#argopy.fetchers.ArgoDataFetcher) instance select data from two floats, let's say 6902746 and 6902755:

In [None]:
f = f.float(...
f

### ⚓ For one or more profiles

Use the ``profile`` access point to specify one or more float WMO platform numbers and the profile cycle number(s) to retrieve.

For instance, to select data from the 12th profile of float WMO 6902755:

In [None]:
f = f.profile(6902755, 12)
f

<br>

Note that the profile number corresponds to the cycle number.

#### EXERCISE
Make the [DataFetcher](https://argopy.readthedocs.io/en/v1.3.0/generated/argopy.fetchers.ArgoDataFetcher.html#argopy.fetchers.ArgoDataFetcher) instance select data from the first 2 cycles of two floats, let's say again 6902746 and 6902755:

In [None]:
f = f.profile(...
f

## Trigger data fetching

### Default and recommended data structures

Once the access point is created, we can trigger data fetching by accessing the ``data`` property of the [DataFetcher](https://argopy.readthedocs.io/en/v1.3.0/generated/argopy.fetchers.ArgoDataFetcher.html#argopy.fetchers.ArgoDataFetcher) instance:

In [None]:
%%time
ds = f.data
ds

<br>

The ``%%time`` command is a Jupyter magic to monitor how much time is needed to execute the cell.

Argopy works primarily with xarray [Dataset](https://docs.xarray.dev/en/stable/api/dataset.html) objects, and this is what is returned here when fetching Argo data.

The ``data`` property keeps track of downloaded data, so if you re-execute the same command, nothing is downloaded again and the cached data are returned.

In a notebook, to make sure that you trigger a fresh data download, you can use the explicit ``to_xarray`` method:

In [None]:
%%time
ds = f.to_xarray()
ds

<br>
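
The difference between the cached ``data`` property and the explicit ``to_xarray`` call can be pictured with a toy class. This is only a sketch of the behaviour described above: `ToyFetcher` is hypothetical, it downloads nothing, and Argopy's internal mechanism may be implemented differently.

```python
class ToyFetcher:
    """Toy stand-in for a fetcher (hypothetical, not Argopy code)."""

    def __init__(self):
        self.n_downloads = 0  # counts simulated downloads
        self._cache = None    # holds the result served by the property

    def to_xarray(self):
        # Explicit method: always performs a (simulated) download.
        self.n_downloads += 1
        return {"download_id": self.n_downloads}

    @property
    def data(self):
        # Property: download once, then reuse the stored result.
        if self._cache is None:
            self._cache = self.to_xarray()
        return self._cache


toy = ToyFetcher()
first = toy.data
second = toy.data        # served from the cache, no new download
fresh = toy.to_xarray()  # forces a new simulated download
print(toy.n_downloads)
```

Accessing the property twice costs a single simulated download, while each explicit method call adds one more.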

Note also that a [DataFetcher](https://argopy.readthedocs.io/en/v1.3.0/generated/argopy.fetchers.ArgoDataFetcher.html#argopy.fetchers.ArgoDataFetcher) returns data as a collection of points, not profiles. This is the default choice, primarily driven by performance considerations.

But no worries, Argopy makes it very easy to go from a collection of points to profiles and vice versa. See this notebook for an illustration.
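
To picture the two layouts without any Argo data, the sketch below converts a flat list of points into profiles and back, using plain Python containers. It does not use Argopy: the function names are hypothetical and the sample values are made up for illustration.

```python
from collections import defaultdict

# Made-up sample values, for illustration only: one record per measurement.
points = [
    {"platform": 1, "cycle": 1, "pres": 5.0, "temp": 20.1},
    {"platform": 1, "cycle": 1, "pres": 10.0, "temp": 19.8},
    {"platform": 1, "cycle": 2, "pres": 5.0, "temp": 20.3},
]


def points_to_profiles(points):
    """Group measurements by (platform, cycle): one entry per profile."""
    profiles = defaultdict(list)
    for p in points:
        profiles[(p["platform"], p["cycle"])].append((p["pres"], p["temp"]))
    return dict(profiles)


def profiles_to_points(profiles):
    """Flatten profiles back into one record per measurement."""
    return [
        {"platform": platform, "cycle": cycle, "pres": pres, "temp": temp}
        for (platform, cycle), levels in profiles.items()
        for pres, temp in levels
    ]


profiles = points_to_profiles(points)
round_trip = profiles_to_points(profiles)
print(len(points), "points ->", len(profiles), "profiles")
```

In the point layout every record repeats its platform and cycle, whereas the profile layout stores them once per profile.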

### Alternative data structures

Argopy also makes it easy to fetch data in alternative data structures, such as a [Pandas DataFrame](https://pandas.pydata.org/docs/reference/frame.html) or the legacy netCDF dataset:

In [None]:
%%time
df = f.to_dataframe()
df.sample(5)

<br>

A [Pandas DataFrame](https://pandas.pydata.org/docs/reference/frame.html) may be useful, but it is not very efficient, since every per-profile property has to be repeated on each row belonging to that profile.

In [None]:
%%time
ds = f.to_dataset()
ds

<br>

Here a netCDF dataset is returned, but this feature exists primarily for compatibility with legacy software and procedures.

We strongly recommend that users work with the default xarray Dataset, since it allows more Argopy features to be used.

***
![logo](https://raw.githubusercontent.com/euroargodev/argopy-training/refs/heads/main/notebooks/template_argopy_training_EAONE.png)