![header](logos/logo.png)

<center><h2> Ecopath 40 Years - Ostend, 7 June 2024</h2></center>

<center><h4> Downloading Copernicus Marine Service products using the Python API </h4></center>

# 1. Introduction

This is a modified version of a Jupyter notebook developed by Ergane Fouchet and presented during the 2022 Copernicus Marine Service
Mediterranean Sea Training Workshop. The original version can be downloaded
[here](https://marine.copernicus.eu/services/user-learning-services/mediterranean-sea-training-2022-discover-copernicus-marine-service).

Welcome to this Copernicus Marine Service training on the Mediterranean Sea!

In this Jupyter Notebook, you will learn everything you need to access and download the Copernicus Marine Service products with the **Copernicus Marine Toolbox**. To do so, we are going to follow a basic exercise whose objective will be to download the Mediterranean Sea temperature and chlorophyll concentration in several different ways.

In this notebook, we will use the **Copernicus Marine Toolbox Python API**.

# 2. Presentation and access to the data

We are going to work with 2 Copernicus Marine Service datasets. In the following sections, you will learn how to find information on these products and how to access them.

## 2.1. Presentation of the products used

From the Copernicus Marine Service [catalogue](https://resources.marine.copernicus.eu/products), you can explore all the products available with many filters to select the region you are interested in, the parameters you want to study, etc. 

In this exercise, we are interested in the sea temperature and the chlorophyll concentration; we will get the data from the following two products:
- The [*Mediterranean Sea Physics Reanalysis*](https://data.marine.copernicus.eu/product/MEDSEA_MULTIYEAR_PHY_006_004/description) model for the temperature;
- The [*Mediterranean Sea Biogeochemistry Reanalysis*](https://data.marine.copernicus.eu/product/MEDSEA_MULTIYEAR_BGC_006_008/description) model for the chlorophyll concentration.

The links associated with the products will take you to the catalogue information page, where you can find the description of the product, its temporal and spatial resolution, processing level, temporal coverage and more.

<div class="alert alert-block alert-warning">
    <b>Get Copernicus Marine Service User credentials</b>
<hr>
    Whichever way you choose to download the products, you will need your Copernicus Marine Service User credentials. If you are not registered yet, you can get them from the <a href="http://marine.copernicus.eu/services-portfolio/register-now/" target="_blank">Copernicus Marine Service registration page</a>.
</div>


## 2.2. The Copernicus Marine Toolbox

The **Copernicus Marine Toolbox** is a software tool that offers the following capabilities through both a Command Line Interface (CLI) and a Python API:

- Metadata Information: List and retrieve metadata information on all variables, datasets, products, and their associated documentation.
- Subset Datasets: Subset datasets to extract only the parts of interest, in your preferred format, such as Analysis-Ready Cloud-Optimized (ARCO) Zarr or NetCDF.
- Advanced Filters: Apply simple or advanced filters to get multiple files, in original formats like NetCDF/GeoTIFF, via direct Marine Data Store connections.
- No Quotas: Enjoy no quotas, neither on volume size nor bandwidth.

In this notebook, we will focus on its Python API. To use the **Copernicus Marine Toolbox** inside a Python program, you need to import it:

In [1]:
import copernicusmarine

Before starting, register your credentials in the Copernicus Marine Toolbox, so that you can use it freely from now on:

In [4]:
USERNAME = "PUT_YOUR_USERNAME_HERE"
PASSWORD = "PUT_YOUR_PASSWORD_HERE"

copernicusmarine.login(username=USERNAME, password=PASSWORD, skip_if_user_logged_in=True)

INFO - 2024-06-03T15:03:54Z - You are already logged in. Skipping login.


True
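
Hard-coding a password in a notebook makes it easy to share by accident. Here is a minimal sketch of an alternative, assuming the credentials have been exported beforehand in two environment variables. The variable names (`CMEMS_USERNAME`, `CMEMS_PASSWORD`) and the helper `read_credentials` are hypothetical, chosen only for this example; they are not part of the toolbox.

```python
import os


def read_credentials(env=None):
    """Return (username, password) read from environment variables.

    The variable names below are hypothetical, chosen for this sketch;
    they are not defined by the Copernicus Marine Toolbox.
    """
    env = os.environ if env is None else env
    try:
        return env["CMEMS_USERNAME"], env["CMEMS_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"Environment variable {missing} is not set") from None


# Demonstration with a stand-in mapping instead of the real environment
demo_env = {"CMEMS_USERNAME": "jdoe", "CMEMS_PASSWORD": "not-a-real-password"}
USERNAME, PASSWORD = read_credentials(demo_env)
print(USERNAME)
```

Calling `read_credentials()` without arguments reads the real environment; the resulting pair can then be passed to `copernicusmarine.login` exactly as in the cell above.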

## 2.3. Exploring the product with *copernicusmarine.describe*



In the previous lecture, you should have learned how to obtain information about the Copernicus products you want to use.

You can also obtain information about a product with the `describe` function of the Copernicus Marine Toolbox Python API; here we examine the dataset `cmems_mod_med_phy-tem_my_4.2km_P1Y-m`, which contains the temperature:

In [11]:
from pprint import pprint

# Keep only the catalogue entries that mention this dataset
datasets = ['cmems_mod_med_phy-tem_my_4.2km_P1Y-m']

json_data = copernicusmarine.describe(contains=datasets, include_description=True)

# Pretty-print the returned dictionary
pprint(json_data)

ERROR - 2024-06-03T15:07:08Z - Client version 1.2.1 is not compatible with current backend service. Please update to the latest client version.
ERROR - 2024-06-03T15:07:08Z - Client version 1.2.1 is not compatible with current backend service. Please update to the latest client version.
{'products': [{'description': 'The Med MFC physical multiyear product is '
                              'generated by a numerical system composed of an '
                              'hydrodynamic model, supplied by the Nucleous '
                              'for European Modelling of the Ocean (NEMO) and '
                              'a variational data assimilation scheme '
                              '(OceanVAR) for temperature and salinity '
                              'vertical profiles and satellite Sea Level '
                              'Anomaly along track data. It contains a '
                              'reanalysis dataset and an interim dataset which '
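
The dictionary returned by `describe` can be long. Below is a minimal sketch of how to print only a shortened description per product, assuming the result keeps the layout visible in the output above (a `products` list whose entries carry a `description` key). The helper name `short_descriptions` and the stand-in dictionary are made up for this example.

```python
import textwrap


def short_descriptions(catalogue, width=60):
    """Return one shortened description per product in the catalogue."""
    return [
        textwrap.shorten(product.get("description", ""), width=width, placeholder="...")
        for product in catalogue.get("products", [])
    ]


# Stand-in for the dictionary returned by copernicusmarine.describe
sample = {
    "products": [
        {"description": "The Med MFC physical multiyear product is generated by "
                        "a numerical system composed of an hydrodynamic model"},
    ]
}

for line in short_descriptions(sample):
    print(line)
```

With the real result, `short_descriptions(json_data)` would be called instead of using `sample`.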
                         

You can check which options the `describe` function accepts by displaying its help with `?`:

In [13]:
copernicusmarine.describe?

Signature:
copernicusmarine.describe(
    include_description: bool = False,
    include_datasets: bool = False,
    include_keywords: bool = False,
    include_all_versions: bool = False,
    contains: list[str] = [],
    overwrite_metadata_cache: bool = False,
    no_metadata_cache: bool = False,
    disabl

## 2.4. Download with *copernicusmarine.get*

The **get** function of the **Copernicus Marine Toolbox Python API** is the most basic way to download files from the Copernicus Marine Data Store. It retrieves the files in the exact format produced by the production center. This means that you cannot subset the data: you can only download entire files of the product, optionally selecting them by name. You can see the structure of the directories of the product in the `DATA ACCESS` section of the product web page in the catalogue (click on `Browse`).

For most products, each NetCDF file contains one day of data, and the files are grouped by month in the product's directories.

In the following cell, we will download the temperature data for the first nine days of February 2019 for the entire Mediterranean Sea:

In [18]:
from pathlib import Path

DATASET_ID = "med-cmcc-tem-rean-d"
FILE_FILTER = "*2019020*"
OUTPUT_DIR = Path("data")

output_files = copernicusmarine.get(
    dataset_id=DATASET_ID,
    filter=FILE_FILTER,
    output_directory=OUTPUT_DIR,
    force_download=True
)

ERROR - 2024-06-03T15:11:51Z - Client version 1.2.1 is not compatible with current backend service. Please update to the latest client version.
ERROR - 2024-06-03T15:11:51Z - Client version 1.2.1 is not compatible with current backend service. Please update to the latest client version.
ERROR - 2024-06-03T15:11:51Z - Client version 1.2.1 is not compatible with current backend service. Please update to the latest client version.
INFO - 2024-06-03T15:11:52Z - Dataset version was not specified, the latest one was selected: "202012"
INFO - 2024-06-03T15:11:52Z - Dataset part was not specified, the first one was selected: "default"
INFO - 2024-06-03T15:11:52Z - Service was not specified, the default one was selected: "original-files"
INFO - 2024-06-03T15:11:52Z - Downloading using service original-files...


100%|██████████| 9/9 [00:07<00:00,  1.16it/s]


The function returns a list containing the paths of all the files that have been downloaded:

In [21]:
output_files[0]

PosixPath('data/MEDSEA_MULTIYEAR_PHY_006_004/med-cmcc-tem-rean-d_202012/2019/02/20190201_d-CMCC--TEMP-MFSe3r1-MED-b20200901_re-sv01.00_(2).nc')
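
To see why `*2019020*` selects exactly the first nine days of February, the pattern can be tried locally on a list of file names. This is only a sketch: it assumes that the `filter` option follows Unix shell-style wildcard rules, which Python's `fnmatch` module implements. The file names are generated here on the model of the one shown above rather than listed from the server.

```python
from fnmatch import fnmatch

FILE_FILTER = "*2019020*"

# File names modelled on the downloaded file, for 1-12 February 2019
candidates = [
    f"201902{day:02d}_d-CMCC--TEMP-MFSe3r1-MED-b20200901_re-sv01.00.nc"
    for day in range(1, 13)
]

# Keep only the names that match the wildcard pattern
selected = [name for name in candidates if fnmatch(name, FILE_FILTER)]

print(len(selected))                       # number of matching files
print(selected[0][:8], selected[-1][:8])   # first and last matching dates
```

Days 10 to 12 are rejected because their names contain `2019021`, not `2019020`.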

You can see all the options of the `get` function by using `?` again:

In [22]:
copernicusmarine.get?

Signature:
copernicusmarine.get(
    dataset_url: Optional[str] = None,
    dataset_id: Optional[str] = None,
    dataset_version: Optional[str] = None,
    dataset_part: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,

## 2.5. Download with *copernicusmarine.subset*

Now we start the real part of this exercise! The `subset` command allows you to download a specific part of the dataset. It ignores the structure of the original product and reshapes all the data into a single file.

Using `subset`, we can choose which variables to download and specify the part of the domain. For example, now we will download three years of monthly data for the temperature of the Adriatic Sea.

In [32]:
from pathlib import Path
from datetime import datetime

# Bounding box of the Adriatic Sea (degrees east, degrees north)
LONGITUDE_RANGE = [11.5, 20.]
LATITUDE_RANGE = [40., 45.979]

# Time window to download
START_DATE = datetime(2015, 1, 1)
END_DATE = datetime(2018, 1, 1)

# Monthly temperature dataset and the variable to extract from it
TEMPERATURE_DATASET = "med-cmcc-tem-rean-m"
TEMPERATURE_VARIABLES = ['thetao']
OUTPUT_DIR = Path("data")

downloaded_data_file = copernicusmarine.subset(
    dataset_id=TEMPERATURE_DATASET,
    output_directory=OUTPUT_DIR,
    variables=TEMPERATURE_VARIABLES,
    minimum_longitude=LONGITUDE_RANGE[0],
    maximum_longitude=LONGITUDE_RANGE[1],
    minimum_latitude=LATITUDE_RANGE[0],
    maximum_latitude=LATITUDE_RANGE[1],
    start_datetime=START_DATE,
    end_datetime=END_DATE,
    force_download=True
)

print(f'Data downloaded inside file {downloaded_data_file}')

ERROR - 2024-06-03T16:17:10Z - Client version 1.2.1 is not compatible with current backend service. Please update to the latest client version.
ERROR - 2024-06-03T16:17:10Z - Client version 1.2.1 is not compatible with current backend service. Please update to the latest client version.
ERROR - 2024-06-03T16:17:10Z - Client version 1.2.1 is not compatible with current backend service. Please update to the latest client version.
INFO - 2024-06-03T16:17:11Z - Dataset version was not specified, the latest one was selected: "202012"
INFO - 2024-06-03T16:17:11Z - Dataset part was not specified, the first one was selected: "default"
INFO - 2024-06-03T16:17:12Z - Service was not specified, the default one was selected: "arco-time-series"
INFO - 2024-06-03T16:17:13Z - Downloading using service arco-time-series...
INFO - 2024-06-03T16:17:14Z - Estimated size of the dataset file is 583.727 MB.
INFO - 2024-06-03T16:17:14Z - Writing to local storage. Please wait...


  0%|          | 0/9195 [00:00<?, ?it/s]

INFO - 2024-06-03T16:17:27Z - Successfully downloaded to data/med-cmcc-tem-rean-m_thetao_11.50E-20.00E_40.02N-45.94N_1.02-5754.04m_2015-01-01-2018-01-01_(2).nc
Data downloaded inside file data/med-cmcc-tem-rean-m_thetao_11.50E-20.00E_40.02N-45.94N_1.02-5754.04m_2015-01-01-2018-01-01_(2).nc


Now it's your turn! Use what you have just learned to download the chlorophyll data (variable `chl`) from the `med-ogs-pft-rean-m` dataset for the same domain for which we have just downloaded the temperature.
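
One possible way to guarantee that both downloads cover the same domain is to keep the shared arguments in a dictionary and reuse it. The sketch below only builds those arguments, using the keyword names of the `subset` call above. The helper `domain_arguments` is hypothetical, and the download itself is left to you.

```python
from datetime import datetime


def domain_arguments(longitude_range, latitude_range, start, end):
    """Build the keyword arguments describing a space-time domain."""
    if longitude_range[0] > longitude_range[1]:
        raise ValueError("minimum longitude exceeds maximum longitude")
    if latitude_range[0] > latitude_range[1]:
        raise ValueError("minimum latitude exceeds maximum latitude")
    if start > end:
        raise ValueError("start date is after end date")
    return {
        "minimum_longitude": longitude_range[0],
        "maximum_longitude": longitude_range[1],
        "minimum_latitude": latitude_range[0],
        "maximum_latitude": latitude_range[1],
        "start_datetime": start,
        "end_datetime": end,
    }


# Same Adriatic Sea domain and period as the temperature download
ADRIATIC = domain_arguments(
    [11.5, 20.0], [40.0, 45.979], datetime(2015, 1, 1), datetime(2018, 1, 1)
)
print(ADRIATIC["minimum_longitude"], ADRIATIC["maximum_latitude"])
```

The dictionary can then be unpacked into the call with `**ADRIATIC`, next to `dataset_id`, `variables` and `output_directory`.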