<img src="https://raw.githubusercontent.com/euroargodev/argopy/master/docs/_static/argopy_logo_long.png" alt="argopy logo" width="200"/>

# Training Camp - Sept 22<sup>nd</sup> 2025

***

## Notebook Title : Working with Argo index files

**Author contact : [G. Maze](https://annuaire.ifremer.fr/cv/17182)**

**Description:**

The Argo dataset is a collection of millions of files. CSV indexes of these files exist to make Argo data discovery easier. This notebook will take you through the Argopy features for fetching and searching Argo file indexes with the [ArgoIndex](https://argopy.readthedocs.io/en/v1.3.0/generated/argopy.ArgoIndex.html#argopy.ArgoIndex):
- Load one of the Argo indexes supported by Argopy,
- Search the index,
- Read properties of an index,
- Export the index.

This notebook basically illustrates [this section of the Argopy documentation](https://argopy.readthedocs.io/en/v1.3.0/advanced-tools/stores/argoindex.html).

🏷️ This notebook was developed with [Argopy version *1.3.0*](https://argopy.readthedocs.io/en/v1.3.0)

©  [European Union Public Licence (EUPL) v1.2](https://github.com/euroargodev/argopy-training/blob/main/LICENSE), see at the bottom of this notebook for more.

**Table of Contents**
- [Load an Argo index of files](#load-an-argo-index-of-files)
    - [✏️ EXERCICE](#✏️-exercice)
    - [🔍 Pro tip](#🔍-pro-tip)
- [Searching the index](#searching-the-index)
  - [Single filter](#single-filter)
    - [🔍 Pro tip](#🔍-pro-tip)
  - [Multiple filters](#multiple-filters)
    - [✏️ EXERCICE](#✏️-exercice)
- [Index properties](#index-properties)
    - [✏️ EXERCICE](#✏️-exercice)
    - [✏️ EXERCICE](#✏️-exercice)
- [Index export](#index-export)
  - [Pandas Dataframe](#pandas-dataframe)
    - [🔍 Pro tip](#🔍-pro-tip)
    - [🔍 Pro tip](#🔍-pro-tip)
  - [CSV index file](#csv-index-file)
- [🔍 Pro tip](#🔍-pro-tip)
- [🏁 End of the notebook](#🏁-end-of-the-notebook)
    - [👀 Useful argopy commands](#👀-useful-argopy-commands)
    - [⚖️ License Information](#⚖️-license-information)
    - [🤝 Sponsor](#🤝-sponsor)
***

Let's start with the import of the Argopy class handling Argo index:

In [None]:
from argopy import ArgoIndex

## Load an Argo index of files

Index files supported by Argopy are [documented here](https://argopy.readthedocs.io/en/v1.3.0/advanced-tools/stores/argoindex.html#index-file-supported). 

Note that each index file has a shortcut keyword that makes it easier to call:

| Index file                                | Shortcut |
|-------------------------------------------|----------|
| ar_index_global_prof.txt                  | core     |
| argo_bio-profile_index.txt                | bgc-b    |
| argo_synthetic-profile_index.txt          | bgc-s    |
| ar_index_global_meta.txt                  | meta     |
| etc/argo-index/argo_aux-profile_index.txt | aux      |

The default [ArgoIndex](https://argopy.readthedocs.io/en/v1.3.0/generated/argopy.ArgoIndex.html#argopy.ArgoIndex) class is created with the core profile files index:

In [None]:
idx = ArgoIndex()
idx

#### ✏️ EXERCICE

The index file to load is specified with the `index_file` argument. Load the BGC-Argo synthetic files index.

In [None]:
# Your code here

#### 🔍 Pro tip

By default, an [ArgoIndex](https://argopy.readthedocs.io/en/v1.3.0/generated/argopy.ArgoIndex.html#argopy.ArgoIndex) will fetch index files from the [Ifremer GDAC server](https://data-argo.ifremer.fr). 

Other GDAC servers are also available, as well as local files if you have a local copy of the GDAC. You can thus *plug* the [ArgoIndex](https://argopy.readthedocs.io/en/v1.3.0/generated/argopy.ArgoIndex.html#argopy.ArgoIndex) into any path: local, http or ftp.

See here for a list of [GDAC host shortnames](https://argopy.readthedocs.io/en/v1.3.0/advanced-tools/stores/argoindex.html#id2).

<br>

Once you have created an [ArgoIndex](https://argopy.readthedocs.io/en/v1.3.0/generated/argopy.ArgoIndex.html#argopy.ArgoIndex) instance, you can trigger loading with the `load` method:

In [None]:
idx.load()
idx

<br>

The number of files in the loaded index is given by the `N_RECORDS` attribute:

In [None]:
idx.N_RECORDS

## Searching the index

If you loaded an index, there is a good chance that you will search it with some filters.

An [ArgoIndex](https://argopy.readthedocs.io/en/v1.3.0/generated/argopy.ArgoIndex.html#argopy.ArgoIndex) instance comes with a `query` extension providing multiple filters that can be used individually or combined.

### Single filter

The available **single** filters are listed below, with example arguments:
```python
idx.query.wmo(1901393)
idx.query.cyc(1)
idx.query.wmo_cyc(1901393, [1,12])

# Taking an index BOX definition:
box = [-60, -55, 40., 45., '2007-08-01', '2007-09-01']
idx.query.lon(box) # Only lon_min/lon_max used
idx.query.lat(box) # Only lat_min/lat_max used
idx.query.lon_lat(box) # Only lon_min/lon_max/lat_min/lat_max used
idx.query.date(box)    # Only date_min/date_max used
idx.query.box(box)

idx.query.params(['C1PHASE_DOXY', 'DOWNWELLING_PAR'])  # Only for BGC profile index
idx.query.parameter_data_mode({'BBP700': 'D'})  # Only for BGC profile index

idx.query.profiler_type(845)
idx.query.profiler_label('NINJA')
```
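To make the `box` definition concrete, here is a minimal standard-library sketch of the selection that a box filter expresses. It is not how Argopy implements `query.box`: the index excerpt is made up for illustration, the helper names `read_rows` and `in_box` are hypothetical, and the choice of inclusive or exclusive bounds is an assumption.

```python
import csv
from datetime import datetime

# Made-up excerpt in the core profile index convention (WMO numbers and values are placeholders).
SAMPLE_INDEX = """\
# Title : illustrative excerpt, not real data
file,date,latitude,longitude,ocean,profiler_type,institution,date_update
coriolis/1000001/profiles/R1000001_001.nc,20070815120000,42.500,-57.200,A,846,IF,20080101000000
coriolis/1000001/profiles/R1000001_002.nc,20071001120000,42.700,-57.000,A,846,IF,20080101000000
aoml/1000002/profiles/R1000002_010.nc,20070820060000,30.000,-57.500,A,845,AO,20080101000000
"""

def read_rows(text):
    """Parse index text into dictionaries, skipping the '#' comment header."""
    lines = [line for line in text.splitlines() if not line.startswith("#")]
    return list(csv.DictReader(lines))

def in_box(row, box):
    """True if a row falls inside [lon_min, lon_max, lat_min, lat_max, date_min, date_max]."""
    lon_min, lon_max, lat_min, lat_max, date_min, date_max = box
    when = datetime.strptime(row["date"], "%Y%m%d%H%M%S")
    return (
        lon_min <= float(row["longitude"]) <= lon_max
        and lat_min <= float(row["latitude"]) <= lat_max
        # Assumption: lower date bound inclusive, upper date bound exclusive.
        and datetime.fromisoformat(date_min) <= when < datetime.fromisoformat(date_max)
    )

box = [-60, -55, 40., 45., '2007-08-01', '2007-09-01']
matches = [row["file"] for row in read_rows(SAMPLE_INDEX) if in_box(row, box)]
print(matches)
```

Only the first row satisfies the longitude, latitude and date constraints together. The second row is outside the date range and the third is outside the latitude range.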

Let's work with the full BGC-Argo index, `bgc-b`. We can look for profiles in the Ionian Sea in 2024, for instance:

In [None]:
%%time
idx = ArgoIndex(index_file='bgc-b').load()
idx

In [None]:
%%time
idx = idx.query.box([15.5, 23, 35, 39, '2024-01', '2025-01'])
idx

<br>

The number of files matching the filter is given by the `N_MATCH` attribute:

In [None]:
idx.N_MATCH

<br>

A quick geographic look at the result can be obtained with the `plot` extension and its `trajectory` method:

In [None]:
idx.plot.trajectory();

#### 🔍 Pro tip

The trajectory plot is customizable. For instance, if the trajectories of individual floats clutter the figure, they can be removed with the `traj` argument, and the free space around the plotted points can be extended with `padding`:

In [None]:
idx.plot.trajectory(traj=False, padding=2);

### Multiple filters

It is also possible to combine several single filters with the `compose` method.

In this scenario, the filter composition must be provided as a dictionary argument to `compose()`.

Each key of the dictionary is the name of a single filter to compose, and the filter arguments are passed as the corresponding value.

Examples:
```python
idx.query.compose({'box': BOX, 'wmo': WMOs})
idx.query.compose({'box': BOX, 'params': 'DOXY'})
idx.query.compose({'box': BOX, 'params': (['DOXY', 'DOXY2'], {'logical': 'and'})})
idx.query.compose({'params': 'DOXY', 'profiler_label': 'ARVOR'})
```
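The examples above leave `BOX` and `WMOs` undefined. The sketch below fills them in so that the dictionary structure is explicit. The box is the Ionian Sea box used earlier in this notebook, the WMO numbers are hypothetical placeholders, and `run_composition` is an illustrative helper, not part of Argopy.

```python
# Ionian Sea box used earlier in this notebook.
BOX = [15.5, 23, 35, 39, '2024-01', '2025-01']
# Hypothetical WMO numbers, for illustration only.
WMOs = [1000001, 1000002]

# One key per single filter name, one value per filter argument.
composition = {'box': BOX, 'wmo': WMOs}

# A tuple of (argument, options dictionary) passes extra options to a filter,
# as in the 'logical' example above.
composition_with_options = {'box': BOX, 'params': (['DOXY', 'DOXY2'], {'logical': 'and'})}

def run_composition(idx, composition):
    """Apply a composition to an existing ArgoIndex instance (needs argopy and data access)."""
    return idx.query.compose(composition)

print(sorted(composition))
```

With a loaded index, `run_composition(idx, composition)` would be equivalent to calling `idx.query.compose(composition)` directly.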

#### ✏️ EXERCICE

Look for the number of oxygen profiles sampled globally in 2022.

In [None]:
# Your code here

## Index properties

Once you have loaded your index of interest, and possibly run a filter query, it can be useful to access some properties through the following methods:

```python
idx.read_wmo()
idx.read_dac_wmo()
idx.read_params()
idx.read_domain()
idx.records_per_wmo()
```

and
```python
idx.uri
idx.read_files()
```


For instance, this can be used to get the number of profiles per float:

In [None]:
idx.records_per_wmo()
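As a rough illustration of what "records per WMO" means, the count can be reproduced by hand from the relative file paths of a profile index, where the second path component is the float WMO number. This is a sketch with hypothetical paths and a hypothetical helper `wmo_of`. It does not show how Argopy computes the result, nor the exact type it returns.

```python
from collections import Counter

# Hypothetical relative paths, in the "<dac>/<wmo>/profiles/<file>.nc" layout.
files = [
    "coriolis/1000001/profiles/SD1000001_001.nc",
    "coriolis/1000001/profiles/SD1000001_002.nc",
    "aoml/1000002/profiles/SR1000002_001.nc",
]

def wmo_of(path):
    """Return the float WMO number, i.e. the second component of the relative path."""
    return int(path.split("/")[1])

# Count how many profile files belong to each float.
profiles_per_float = Counter(wmo_of(path) for path in files)
print(profiles_per_float)
```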

#### ✏️ EXERCICE

Load the BGC-Argo synthetic files index and list all BGC parameters available in the Ionian Sea.

💡 Code hint:
```python
ionian_sea_box = [15.5, 23, 35, 39]
```

In [None]:
# Your code here

In [None]:
idx.read_params()

<br>

If you are interested in looping through index files, the `read_files()` method and `uri` attribute are for you.

`read_files()` will return the list of relative paths, as they are in the index file:

In [None]:
idx.read_files()[0:10]

<br>

while the `uri` attribute will return absolute paths, which depend on the GDAC host used by the [ArgoIndex](https://argopy.readthedocs.io/en/v1.3.0/generated/argopy.ArgoIndex.html#argopy.ArgoIndex) instance:

In [None]:
idx.uri[0:10]
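The relation between the two can be sketched as a simple join between a GDAC root and a relative path. This assumes the usual GDAC layout, where profile files sit under a `dac/` folder. The FTP root and the relative path below are assumptions made for illustration, and `to_uri` is a hypothetical helper, not an Argopy function.

```python
def to_uri(host, relative_path):
    """Join a GDAC root and an index relative path (profile files live under 'dac/')."""
    return f"{host.rstrip('/')}/dac/{relative_path}"

# The Ifremer HTTPS root is the default host mentioned above; the FTP root is an assumption.
hosts = ["https://data-argo.ifremer.fr", "ftp://ftp.ifremer.fr/ifremer/argo"]

# Hypothetical relative path, as it could appear in a profile index file.
relative_path = "coriolis/1000001/profiles/SD1000001_001.nc"

uris = [to_uri(host, relative_path) for host in hosts]
for uri in uris:
    print(uri)
```

The same relative path therefore leads to a different absolute path for each host, which is why `uri` depends on the host while `read_files()` does not.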

#### ✏️ EXERCICE

Compare the URIs of the files of a single Argo float profile, using 3 different GDAC hosts.

In [None]:
# Your code here

## Index export

Let's load and search an index for export demonstration purposes:

In [None]:
idx = ArgoIndex(index_file='bgc-s').load()
idx

In [None]:
idx.query.compose({'params': 'CHLA', 'profiler_label': 'ARVOR'})

### Pandas Dataframe

If the [ArgoIndex](https://argopy.readthedocs.io/en/v1.3.0/generated/argopy.ArgoIndex.html#argopy.ArgoIndex) does not provide you with the most appropriate manipulation methods, you can still export the index, or the search results, as a [Pandas Dataframe](https://pandas.pydata.org/docs/reference/frame.html) with the `to_dataframe` method:

In [None]:
df = idx.to_dataframe()
df.head()

#### 🔍 Pro tip

If it suits the development of your software or procedure better, you can restrict the export to the first rows of the index with the `nrows` argument:

In [None]:
idx.to_dataframe(nrows=2)

#### 🔍 Pro tip

By default, the `to_dataframe()` method exports the search results if a query has been run.

If you still want to export the full index, you can use the `index` argument:

In [None]:
df = idx.to_dataframe(index=True)
df.shape

### CSV index file

You may also want to export your search results to a CSV file that follows the Argo convention. This can be done with the `to_indexfile()` method:

In [None]:
idx.to_indexfile('MyArgoIndexFile.csv')

<br>

This file can then be loaded elsewhere with an [ArgoIndex](https://argopy.readthedocs.io/en/v1.3.0/generated/argopy.ArgoIndex.html#argopy.ArgoIndex):

In [None]:
idx = ArgoIndex(host='.', index_file='MyArgoIndexFile.csv', convention='bgc-s').load()
idx

## 🔍 Pro tip

You can loop through the unique floats of an Argo index as [ArgoFloat](https://argopy.readthedocs.io/en/v1.3.0/generated/argopy.ArgoFloat.html#argopy.ArgoFloat) objects. This can be useful to retrieve information about a float that is not in the index.

As an example, let's get the deployment dates for floats in a specific region:

In [None]:
idx = ArgoIndex(index_file='bgc-s').query.box([15.5, 23, 35, 39, '2024-01', '2025-01'])
idx

In [None]:
for a_float in idx.iterfloats():
    ds = a_float.open_dataset('meta')
    print(a_float.WMO, ds['LAUNCH_DATE'].data)

## 🏁 End of the notebook

***
#### 👀 Useful argopy commands
```python
argopy.reset_options()
argopy.show_options()
argopy.status()
argopy.clear_cache()
argopy.show_versions()
```
#### ⚖️ License Information
This Jupyter Notebook is licensed under the **European Union Public Licence (EUPL) v1.2**.

| Permissions      | Limitations     | Conditions                     |
|------------------|-----------------|--------------------------------|
| ✔ Commercial use | ❌ Liability     | ⓘ License and copyright notice |
| ✔ Modification   | ❌ Trademark use | ⓘ Disclose source              |
| ✔ Distribution   | ❌ Warranty      | ⓘ State changes                |
| ✔ Patent use     |                  | ⓘ Network use is distribution  |
| ✔ Private use    |                  | ⓘ Same license                 |

For more details, visit: [EUPL v1.2 Full Text](https://github.com/euroargodev/argopy-training/blob/main/LICENSE).

#### 🤝 Sponsor
![logo](https://raw.githubusercontent.com/euroargodev/argopy-training/refs/heads/main/for_nb_producers/template_argopy_training_EAONE.png)
***
