This notebook demonstrates how to download the CORDEX-ML-Bench dataset for any of the three included domains. All data are available in [this Zenodo repository](https://zenodo.org/records/15797226).

In [1]:
import os
import requests
import zipfile

First, we need to select the domain to download from the following options:

- New Zealand (`NZ`)
- Europe (`ALPS`)
- South Africa (`????`) *TODO*

In [2]:
domain = 'ALPS'
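
A typo in the domain code would only surface later as a failed HTTP request, so it can help to validate the value up front. The sketch below is a hypothetical helper (`check_domain` and `KNOWN_DOMAINS` are not part of the notebook) and assumes that only the two codes listed above are valid for now, since the South Africa code is not yet given.

```python
# Domain codes taken from the list above; the South Africa code is still
# undefined in this notebook, so it is deliberately left out (assumption).
KNOWN_DOMAINS = {"NZ", "ALPS"}


def check_domain(domain):
    """Return `domain` unchanged, or raise ValueError if the code is unknown."""
    if domain not in KNOWN_DOMAINS:
        raise ValueError(
            f"Unknown domain {domain!r}; expected one of {sorted(KNOWN_DOMAINS)}"
        )
    return domain


check_domain("ALPS")  # passes silently for a known code
```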

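Next, we specify the folder where the dataset will be stored. The path below is specific to the author's system, so replace it with a location that exists on your machine.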

In [3]:
DATA_PATH = f'/r/scratch/users/mschillinger/data/cordexbench/{domain}'
os.makedirs(DATA_PATH, exist_ok=True)
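
Since each domain's archive is approximately 5 GB, and the zip file and its extracted contents briefly coexist on disk, it may be worth checking the free space first. The following is a minimal sketch using the standard-library `shutil.disk_usage`; the helper name `has_free_space` and the 10 GB default (roughly twice the archive size) are assumptions, not values from the dataset documentation.

```python
import shutil


def has_free_space(path, required_gb=10.0):
    """Return True if the filesystem containing `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    # 1 GB is taken as 1e9 bytes here; the threshold itself is an assumption.
    return free_bytes >= required_gb * 1e9


# Example: check the current working directory before downloading.
if not has_free_space("."):
    print("Warning: less than 10 GB free; the download or extraction may fail.")
```

In the notebook, the same check could be run on `DATA_PATH` instead of `"."`.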

Below, we automate the download and extraction of the archive for the selected domain. The download time varies by domain and connection speed, but each domain’s archive is approximately 5 GB, so it should finish reasonably quickly.

In [4]:
def download_and_extract(domain, data_path=DATA_PATH):
    """Download the zip archive for `domain` from Zenodo and extract it into `data_path`."""
    BASE_URL = "https://zenodo.org/records/15797226/files"

    zip_path = os.path.join(data_path, f"{domain}_domain.zip")
    download_url = f"{BASE_URL}/{domain}_domain.zip?download=1"

    # Stream the zip file to disk in chunks so the archive is never held fully in memory.
    # The timeout guards against a connection that stalls without raising an error.
    with requests.get(download_url, stream=True, timeout=60) as r:
        r.raise_for_status()
        with open(zip_path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)

    # Extract zip contents into data_path
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(data_path)

    # Remove the zip file after extraction
    os.remove(zip_path)
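
An interrupted download leaves a truncated file on disk, which then makes extraction fail. One way to catch this earlier is to test the archive before extracting it. The sketch below uses the standard-library `zipfile.is_zipfile` and `ZipFile.testzip`; the helper name `is_valid_zip` is hypothetical, and in this notebook it would have to be called inside `download_and_extract`, between the download and the extraction, because the zip file is deleted at the end of that function.

```python
import zipfile


def is_valid_zip(zip_path):
    """Return True if `zip_path` is a readable zip archive whose members pass their CRC checks."""
    if not zipfile.is_zipfile(zip_path):
        return False
    try:
        with zipfile.ZipFile(zip_path, "r") as zf:
            # testzip() returns the name of the first corrupt member, or None if all are fine.
            return zf.testzip() is None
    except zipfile.BadZipFile:
        return False
```

Note that `testzip()` reads every member of the archive, so on a file of several gigabytes the check takes noticeable time.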

In [None]:
download_and_extract(domain)
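
After the call returns, a quick sanity check is to count the extracted files and their total size. The following is a minimal sketch with a hypothetical helper, `summarize_directory`; it makes no assumption about the folder layout inside the archive and simply walks whatever directory it is given.

```python
import os


def summarize_directory(root):
    """Return (file_count, total_bytes) for all files found under `root`."""
    file_count, total_bytes = 0, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return file_count, total_bytes


# Example usage in this notebook (DATA_PATH is defined in an earlier cell):
# n_files, n_bytes = summarize_directory(DATA_PATH)
# print(f"{n_files} files, {n_bytes / 1e9:.2f} GB")
```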