# Hazard assessment for river flooding using river discharge statistics
## Accessing data

This notebook shows how to download the river discharge dataset from the Copernicus Climate Data Store (CDS) via its API for use in the subsequent analysis. The dataset is downloaded for the whole of Europe; it cannot be subset by area before downloading.

Note: alternatively, this dataset can be accessed via the mirror on the CLIMAAX data server, which is provided to speed up data access.

### Load libraries

`````{admonition} Find more info about the libraries used in this workflow here
:class: hint dropdown

In this notebook we will use the following Python libraries:
- [os](https://docs.python.org/3/library/os.html) - Provides a way to interact with the operating system, allowing the creation of directories and file manipulation.
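- [glob](https://docs.python.org/3/library/glob.html) - Finds files whose names match a wildcard pattern.
- [xarray](https://docs.xarray.dev/en/stable/) - Library for working with labelled multi-dimensional arrays.
- [cdsapi](https://pypi.org/project/cdsapi/) - Client for requesting and downloading data from the CDS via its API.
- [zipfile](https://docs.python.org/3/library/zipfile.html) - Provides tools for reading and extracting ZIP archives.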

These libraries enable the download of the dataset.
`````

In [1]:
import os
from glob import glob
import cdsapi
import zipfile
import xarray as xr

### Create the directory structure
The next cell creates a directory called 'FLOOD_RIVER_discharges' in the same directory where this notebook is saved, together with subfolders for storing the downloaded data.

In [2]:
# Define the folder for the flood workflow
workflow_folder = 'FLOOD_RIVER_discharges'
os.makedirs(workflow_folder, exist_ok=True)

# Folder for the downloaded and aggregated data, inside the workflow folder
data_folder = os.path.join(workflow_folder, 'data')
os.makedirs(data_folder, exist_ok=True)

data_folder_catch = os.path.join(data_folder, 'EHYPEcatch')
os.makedirs(data_folder_catch, exist_ok=True)

### Data access parameters

In the cell below we will select three GCM-RCM model combinations (see the dataset documentation for the available combinations). Using several model combinations helps to assess the uncertainty range in the river discharge data that arises from the different climate models.

In [3]:
gcms = ["ec_earth","hadgem2_es","mpi_esm_lr"]
rcms = ["racmo22e","rca4","csc_remo2009"]
ens_members = ['r12i1p1','r1i1p1','r1i1p1']
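
The three lists are matched by position: the first GCM goes with the first RCM and the first ensemble member, and so on. A minimal sketch of that pairing is given below. It uses `zip` instead of index lookups, and the name `combinations` is introduced here for illustration only.

```python
gcms = ["ec_earth", "hadgem2_es", "mpi_esm_lr"]
rcms = ["racmo22e", "rca4", "csc_remo2009"]
ens_members = ["r12i1p1", "r1i1p1", "r1i1p1"]

# The lists must have equal length, otherwise zip would silently drop entries
assert len(gcms) == len(rcms) == len(ens_members)

# Each tuple is one GCM-RCM-ensemble member combination
combinations = list(zip(gcms, rcms, ens_members))
for gcm, rcm, ens_member in combinations:
    print(f"{gcm} + {rcm} ({ens_member})")
```

Keeping the combinations as tuples makes it harder to accidentally pair a GCM with the wrong RCM when the lists are edited.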

We also need to initialize the API client so that we can connect to the CDS servers and download the data.

In [None]:
client = cdsapi.Client()
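
The client needs CDS credentials to connect. As far as we are aware, `cdsapi` looks for them in a `.cdsapirc` file in the home directory or in the `CDSAPI_URL` and `CDSAPI_KEY` environment variables; please check the CDS API setup instructions for the authoritative details. The sketch below only checks whether either of these appears to be present. The helper `cds_credentials_available` is hypothetical and not part of `cdsapi`.

```python
import os


def cds_credentials_available(home_dir=None, environ=None):
    """Return True if CDS credentials appear to be configured (hypothetical helper)."""
    home_dir = os.path.expanduser("~") if home_dir is None else home_dir
    environ = os.environ if environ is None else environ
    # Assumption: credentials live in ~/.cdsapirc or in these two environment variables
    has_rc_file = os.path.isfile(os.path.join(home_dir, ".cdsapirc"))
    has_env_vars = "CDSAPI_URL" in environ and "CDSAPI_KEY" in environ
    return has_rc_file or has_env_vars


if not cds_credentials_available():
    print("No CDS credentials found; cdsapi.Client() is likely to fail.")
```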

### Downloading river discharge timeseries - historical daily values

First we will download catchment-level discharge data for the historical period. The data are available for several E-HYPEcatch model realizations; we will download all of them.

The daily timeseries are downloaded for the period 2001-2005. If a different period is required for comparison with local observations, the selection can be adjusted in the API request below under "period".

In [None]:
for ii, rcm in enumerate(rcms):
    gcm = gcms[ii]
    ens_member = ens_members[ii]
    file = os.path.join(data_folder_catch, 'download.zip')
    dataset = "sis-hydrology-variables-derived-projections"
    request = {
        "product_type": "essential_climate_variables",
        "variable": ["river_discharge"],
        "variable_type": "absolute_values",
        "time_aggregation": "daily",
        "experiment": ["historical"],
        "hydrological_model":   ["e_hypecatch_m00",
                                "e_hypecatch_m01",
                                "e_hypecatch_m02",
                                "e_hypecatch_m03",
                                "e_hypecatch_m04",
                                "e_hypecatch_m05",
                                "e_hypecatch_m06",
                                "e_hypecatch_m07"],
        "rcm": rcm,
        "gcm": gcm,
        "ensemble_member": ens_member,
        "period": ["2001_2005"]
    }
    client.retrieve(dataset, request, file)

    # Unzip the file that was just downloaded, and remove the zip file
    with zipfile.ZipFile(file, 'r') as zObject:
        zObject.extractall(path=data_folder_catch)
    os.remove(file)

2025-07-23 16:28:30,401 INFO [2025-01-29T00:00:00] This dataset is no longer supported by the data providers. Data and documentation are provided as is. Users are encouraged to use our [Forum](https://forum.ecmwf.int/) to raise any item of discussion with respect to this dataset.
2025-07-23 16:28:30,402 INFO Request ID is 8ded3f52-88c2-494c-97e7-928cddba03ae
2025-07-23 16:28:30,538 INFO status has been updated to accepted
2025-07-23 16:29:20,362 INFO status has been updated to running
2025-07-23 16:42:49,984 INFO status has been updated to successful
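
Each download in this notebook ends with the same two steps: extract the archive, then delete it. As a sketch, these steps could be wrapped in a small function. The name `extract_and_remove` is hypothetical, and the demonstration uses a throwaway archive in a temporary folder rather than a real CDS download.

```python
import os
import tempfile
import zipfile


def extract_and_remove(zip_path, target_dir):
    """Extract every member of zip_path into target_dir, delete the archive, return member names."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        names = archive.namelist()
        archive.extractall(path=target_dir)
    # Remove the archive only after it has been closed
    os.remove(zip_path)
    return names


# Demonstration with a placeholder archive (not real discharge data)
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "download.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("example.nc", b"placeholder")
    extracted = extract_and_remove(demo_zip, tmp)
    print(extracted)
```

Returning the member names makes it easy to log which files each request produced.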

In [4]:
def preprocess_daily(ds):
    filename = ds.encoding['source'].split("/")[-1].split("\\")[-1]
    ds['gcm_rcm'] = f'{filename.split("_")[3]}_{filename.split("_")[6]}'
    ds = ds.set_coords('gcm_rcm').expand_dims('gcm_rcm')

    ds['catchmodel'] = filename.split("_")[2]
    ds = ds.set_coords('catchmodel').expand_dims('catchmodel')    

    return ds
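
The preprocessing function relies on the position of each token in the underscore-separated file name. The sketch below isolates that parsing step so it can be checked without opening any data. The function `parse_daily_filename` and the example file name are hypothetical; the placeholder tokens `GCM-A` and `RCM-B` stand in for the real model names, and only the token positions follow the function above.

```python
def parse_daily_filename(path):
    """Extract the catchment model and GCM-RCM label from a daily file name (hypothetical helper)."""
    # Strip the directory part for both "/" and "\\" separators
    filename = path.split("/")[-1].split("\\")[-1]
    parts = filename.split("_")
    return {
        "catchmodel": parts[2],               # third token
        "gcm_rcm": f"{parts[3]}_{parts[6]}",  # fourth and seventh tokens
    }


# Hypothetical path with placeholder model names
example = "data/rdis_day_E-HYPEcatch00-EUR-11_GCM-A_historical_r1i1p1_RCM-B_na_2001-2005_catch_v1.nc"
print(parse_daily_filename(example))
```

Note that this positional approach only works as long as none of the tokens themselves contain an underscore.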

In [15]:
# Collect the extracted daily files and open them lazily as a single dataset
files = glob(os.path.join(data_folder_catch, 'rdis_day_E-HYPEcatch*-EUR-11_*_catch_v1.nc'))
ds_day = xr.open_mfdataset(files, preprocess=preprocess_daily)

In [16]:
ds_day

Summary of the data array in `ds_day`, as reported by xarray:

| | Array | Chunk |
|---|---|---|
| Bytes | 5.68 GiB | 135.98 kiB |
| Shape | (8, 3, 1826, 34810) | (1, 1, 1, 34810) |
| Dask graph | 43824 chunks in 100 graph layers | |
| Data type | float32 numpy.ndarray | |

In [18]:
ds_day.to_netcdf(os.path.join(data_folder, 'rdis_day_E-HYPEcatch_allmodels.nc'))
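
Before writing a file of this size it can help to check the expected memory footprint. The sketch below reproduces the numbers in the dataset summary above from the array shape alone, assuming 4 bytes per float32 value. The dimension names in the comments are our reading of the preprocessing function, not something the summary states.

```python
from math import prod

shape = (8, 3, 1826, 34810)      # assumed order: catchmodel, gcm_rcm, time, catchment
chunk_shape = (1, 1, 1, 34810)   # one chunk per model, combination and day
bytes_per_value = 4              # float32

total_gib = prod(shape) * bytes_per_value / 2**30
chunk_kib = prod(chunk_shape) * bytes_per_value / 2**10
# Number of chunks is the product of chunks along each dimension
n_chunks = prod(s // c for s, c in zip(shape, chunk_shape))

print(f"{total_gib:.2f} GiB in {n_chunks} chunks of {chunk_kib:.2f} kiB")
```

The time dimension of 1826 steps corresponds to five years of daily values including one leap day.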

### Downloading river discharge timeseries - monthly means

Next, we will download the historical monthly means of river discharge for 1971-2000 from the E-HYPEcatch models. These are useful for checking longer-term discharge statistics in the historical climate.

In [None]:
for ii, rcm in enumerate(rcms):
    gcm = gcms[ii]
    ens_member = ens_members[ii]
    file = os.path.join(data_folder_catch, 'download.zip')
    dataset = "sis-hydrology-variables-derived-projections"
    request = {
        "product_type": "climate_impact_indicators",
        "variable": ["river_discharge"],
        "variable_type": "absolute_values",
        "time_aggregation": "monthly_mean",
        "experiment": ["historical"],
        "hydrological_model":   ["e_hypecatch_m00",
                                "e_hypecatch_m01",
                                "e_hypecatch_m02",
                                "e_hypecatch_m03",
                                "e_hypecatch_m04",
                                "e_hypecatch_m05",
                                "e_hypecatch_m06",
                                "e_hypecatch_m07"],
        "rcm": rcm,
        "gcm": gcm,
        "ensemble_member": ens_member,
        "period": ["1971_2000"]
    }
    client.retrieve(dataset, request, file)

    # Unzip the file that was just downloaded, and remove the zip file
    with zipfile.ZipFile(file, 'r') as zObject:
        zObject.extractall(path=data_folder_catch)
    os.remove(file)


We will also download monthly mean discharges under the RCP4.5 and RCP8.5 scenarios for the future periods 2011-2040, 2041-2070 and 2071-2100:

In [None]:
for ii, rcm in enumerate(rcms):
    gcm = gcms[ii]
    ens_member = ens_members[ii]

    for period in ["2011_2040","2041_2070","2071_2100"]:
        file = os.path.join(data_folder_catch, 'download.zip')
        dataset = "sis-hydrology-variables-derived-projections"
        request = {
            "product_type": "climate_impact_indicators",
            "variable": ["river_discharge"],
            "variable_type": "absolute_values",
            "time_aggregation": "monthly_mean",
            "experiment": ["rcp_4_5","rcp_8_5"],
            "hydrological_model":  ["e_hypecatch_m00",
                                    "e_hypecatch_m01",
                                    "e_hypecatch_m02",
                                    "e_hypecatch_m03",
                                    "e_hypecatch_m04",
                                    "e_hypecatch_m05",
                                    "e_hypecatch_m06",
                                    "e_hypecatch_m07"],
            "rcm": rcm,
            "gcm": gcm,
            "ensemble_member": ens_member,
            "period": period
        }
        client.retrieve(dataset, request, file)

        # Unzip the file that was just downloaded, and remove the zip file
        with zipfile.ZipFile(file, 'r') as zObject:
            zObject.extractall(path=data_folder_catch)
        os.remove(file)
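
The download requests in this notebook share most of their fields. As a sketch of how the repetition could be factored out, the hypothetical helper `build_request` below (not part of `cdsapi` or of the original workflow) assembles the request dictionary from the fields that vary. The field names and values are copied from the requests above.

```python
# The eight E-HYPEcatch realizations, e_hypecatch_m00 to e_hypecatch_m07
HYDROLOGICAL_MODELS = [f"e_hypecatch_m{i:02d}" for i in range(8)]


def build_request(product_type, variable, time_aggregation,
                  experiments, gcm, rcm, ens_member, period):
    """Assemble a request dictionary for the river discharge dataset (hypothetical helper)."""
    return {
        "product_type": product_type,
        "variable": [variable],
        "variable_type": "absolute_values",
        "time_aggregation": time_aggregation,
        "experiment": list(experiments),
        "hydrological_model": list(HYDROLOGICAL_MODELS),
        "rcm": rcm,
        "gcm": gcm,
        "ensemble_member": ens_member,
        "period": period,
    }


# Example: the monthly-mean request for one model combination and one future period
request = build_request(
    "climate_impact_indicators", "river_discharge", "monthly_mean",
    ["rcp_4_5", "rcp_8_5"], "ec_earth", "racmo22e", "r12i1p1", "2041_2070",
)
print(request["hydrological_model"])
```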

We will make use of a preprocessing function to write model names and scenarios to the dataset dimensions:

In [19]:
def preprocess_monthly_mean(ds):
    filename = ds.encoding['source'].split("/")[-1].split("\\")[-1]
    ds['gcm_rcm'] = f'{filename.split("_")[4]}_{filename.split("_")[7]}'
    ds = ds.set_coords('gcm_rcm').expand_dims('gcm_rcm')

    ds['catchmodel'] = filename.split("_")[3]
    ds = ds.set_coords('catchmodel').expand_dims('catchmodel')    

    ds['scenarios'] = filename.split("_")[5]
    ds = ds.set_coords('scenarios').expand_dims('scenarios')

    ds['time_period'] = filename.split("_")[9]
    ds = ds.set_coords('time_period').expand_dims('time_period')

    ds['time'] = ds.time.dt.month

    return ds

Now we can read the monthly means of river discharge into a single dataset variable and save it to disk as one file for ease of future access.

In [20]:
files = glob(os.path.join(data_folder_catch, 'rdis_ymonmean_abs_E-HYPEcatch*-EUR-11_*_na_*_catch_v1.nc'))
ds_monmean = xr.open_mfdataset(files, preprocess=preprocess_monthly_mean)

In [21]:
ds_monmean

Summary of the data array in `ds_monmean`, as reported by xarray:

| | Array | Chunk |
|---|---|---|
| Bytes | 611.89 MiB | 271.95 kiB |
| Shape | (4, 3, 8, 4, 12, 34810) | (2, 1, 1, 1, 1, 34810) |
| Dask graph | 3456 chunks in 973 graph layers | |
| Data type | float32 numpy.ndarray | |

In [None]:
ds_monmean.to_netcdf(os.path.join(data_folder, 'rdis_ymonmean_abs_E-HYPEcatch_allmodels.nc'))

### Downloading data on flood occurrence (extreme river discharges)

We will download river discharge data corresponding to the 50-year return period, that is, extreme river discharges expected to be exceeded on average once in 50 years. As with the timeseries data, we will download these data for different climate scenarios, time periods and catchment models.
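
To make the meaning of a return period concrete, the sketch below converts it into probabilities. It assumes that exceedances in different years are independent and that the climate is stationary within the period considered, which is a common textbook simplification and not something the dataset documentation is quoted on here. The function names are introduced for illustration only.

```python
def annual_exceedance_probability(return_period_years):
    """Probability that the return level is exceeded in any single year."""
    return 1.0 / return_period_years


def prob_at_least_one_exceedance(return_period_years, n_years):
    """Probability of at least one exceedance within n_years, assuming independent years."""
    p = annual_exceedance_probability(return_period_years)
    return 1.0 - (1.0 - p) ** n_years


# A 50-year event has a 2% chance per year, and roughly a 45% chance
# of occurring at least once within a 30-year period
print(annual_exceedance_probability(50))
print(round(prob_at_least_one_exceedance(50, 30), 2))
```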

Downloading 50-year return period river discharges for the historical climate:

In [None]:
for ii, rcm in enumerate(rcms):
    gcm = gcms[ii]
    ens_member = ens_members[ii]
    
    # The historical experiment covers the reference period only
    for period in ["1971_2000"]:
        file = os.path.join(data_folder_catch, 'download.zip')
        dataset = "sis-hydrology-variables-derived-projections"
        request = {
            "product_type": "climate_impact_indicators",
            "variable": ["flood_recurrence_50_years_return_period"],
            "variable_type": "absolute_values",
            "time_aggregation": "annual_mean",
            "experiment": ["historical"],
            "hydrological_model": ["e_hypecatch_m00",
                                    "e_hypecatch_m01",
                                    "e_hypecatch_m02",
                                    "e_hypecatch_m03",
                                    "e_hypecatch_m04",
                                    "e_hypecatch_m05",
                                    "e_hypecatch_m06",
                                    "e_hypecatch_m07"],
            "rcm": rcm,
            "gcm": gcm,
            "ensemble_member": ens_member,
            "period": period
        }   
        client.retrieve(dataset, request, file)

        # Unzip the file that was just downloaded, and remove the zip file
        with zipfile.ZipFile(file, 'r') as zObject:
            zObject.extractall(path=data_folder_catch)
        os.remove(file)

Downloading 50-year return period river discharges for the future time periods:

In [None]:
for ii, rcm in enumerate(rcms):
    gcm = gcms[ii]
    ens_member = ens_members[ii]

    for period in ["2011_2040","2041_2070","2071_2100"]:
        file = os.path.join(data_folder_catch, 'download.zip')
        dataset = "sis-hydrology-variables-derived-projections"
        request = {
            "product_type": "climate_impact_indicators",
            "variable": ["flood_recurrence_50_years_return_period"],
            "variable_type": "absolute_values",
            "time_aggregation": "annual_mean",
            "experiment": ["rcp_4_5","rcp_8_5"],
            "hydrological_model":  ["e_hypecatch_m00",
                                    "e_hypecatch_m01",
                                    "e_hypecatch_m02",
                                    "e_hypecatch_m03",
                                    "e_hypecatch_m04",
                                    "e_hypecatch_m05",
                                    "e_hypecatch_m06",
                                    "e_hypecatch_m07"],
            "rcm": rcm,
            "gcm": gcm,
            "ensemble_member": ens_member,
            "period": period
        }   
        client.retrieve(dataset, request, file)

        # Unzip the file that was just downloaded, and remove the zip file
        with zipfile.ZipFile(file, 'r') as zObject:
            zObject.extractall(path=data_folder_catch)
        os.remove(file)

We will make use of a preprocessing function to write model names and scenarios to the dataset dimensions:

In [None]:
def preprocess_flood_occurence(ds):
    filename = ds.encoding['source'].split("/")[-1].split("\\")[-1]
    ds['gcm_rcm'] = f'{filename.split("_")[4]}_{filename.split("_")[7]}'
    ds = ds.set_coords('gcm_rcm').expand_dims('gcm_rcm')

    ds['scenarios'] = filename.split("_")[5]
    ds = ds.set_coords('scenarios').expand_dims('scenarios')

    time_period = [filename.split("_")[9]]
    ds = ds.assign_coords(time_period=("time",time_period))
    return ds

We will read the dataset of extreme river discharges into a single dataset variable and save it on disk:

In [25]:
files = glob(os.path.join(data_folder_catch, 'rdisreturnmax50_tmean_abs_E-HYPEcatch*_catch_v1.nc'))
ds_flood = xr.open_mfdataset(files, preprocess=preprocess_flood_occurence)

In [27]:
ds_flood

Summary of the data array in `ds_flood`, as reported by xarray:

| | Array | Chunk |
|---|---|---|
| Bytes | 7.97 MiB | 271.95 kiB |
| Shape | (3, 5, 4, 34810) | (1, 1, 2, 34810) |
| Dask graph | 45 chunks in 151 graph layers | |
| Data type | float32 numpy.ndarray | |

In [26]:
ds_flood.to_netcdf(os.path.join(data_folder, 'rdisreturnmax50_tmean_abs_E-HYPEcatch_allmodels.nc'))

Now all of the data that we need for the analysis has been retrieved and aggregated. In the next notebooks this data will be used to analyze the impact of climate scenarios on the seasonal and extreme river discharges as a proxy for flood hazard.
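
As a final sanity check, one could confirm that the three aggregated files written in this notebook exist before moving on. The sketch below assumes the data folder is `FLOOD_RIVER_discharges/data` relative to the notebook; the helper `missing_outputs` is hypothetical.

```python
import os

# File names as written by the to_netcdf calls in this notebook
EXPECTED_FILES = [
    "rdis_day_E-HYPEcatch_allmodels.nc",
    "rdis_ymonmean_abs_E-HYPEcatch_allmodels.nc",
    "rdisreturnmax50_tmean_abs_E-HYPEcatch_allmodels.nc",
]


def missing_outputs(folder, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in folder."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(folder, name))]


# Assumed location of the data folder; adjust if the folder was created elsewhere
data_folder = os.path.join("FLOOD_RIVER_discharges", "data")
missing = missing_outputs(data_folder)
print("Missing files:", missing if missing else "none")
```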

Author of the workflow:  
Natalia Aleksandrova (Deltares)