[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/geacomputing/UCY2Sept/blob/main/Python_code/Hands-on/6_Downloading_Data_from_Copernicus_API.ipynb)


In [None]:
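# Remove Colab's default sample data folder
!rm -rf /content/sample_data


# Install the NetCDF command-line tools (ncdump, ncgen, nccopy)
!apt-get install -y netcdf-bin


# Install the Python packages listed in the course requirements file
!pip install -r https://raw.githubusercontent.com/geacomputing/UCY2Sept/main/requirements.txt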

In [None]:
import cdsapi
import xarray as xr
from datetime import datetime

## 🛠️ Exercise: Setting Up Your `.cdsapirc` File

Before using the **Copernicus CDS API**, you need to authenticate using your personal API key.  
This is done via a hidden config file called `.cdsapirc`, stored in your home directory.

---

### ✅ Step-by-Step Instructions

1. **Register** on the [Copernicus Climate Data Store (CDS)](https://cds.climate.copernicus.eu).
   - Create an account (if you don’t already have one).
   - After login, go to:  
     👉 [https://cds.climate.copernicus.eu/how-to-api](https://cds.climate.copernicus.eu/how-to-api)

2. **Copy your credentials**, which look like this:

    ```
    url: https://cds.climate.copernicus.eu/api
    key: 123456:abcdef12-3456-7890-abcd-1234567890ab
    ```

3. **Use the Python code below** to create the `.cdsapirc` file with your credentials:

    ```python
    from pathlib import Path

    # Replace the key with YOUR OWN credentials from the CDS website
    cds_url = "https://cds.climate.copernicus.eu/api"         # Real CDS API endpoint
    cds_key = "123456:abcdef12-3456-7890-abcd-1234567890ab"   # Made-up example key

    # Build the config content
    cdsapirc_content = f"""url: {cds_url}
    key: {cds_key}
    """

    # Write to ~/.cdsapirc    
    config_path = Path.home() / ".cdsapirc" # <-------this is where the cdsapi expects to find the file!
    config_path.write_text(cdsapirc_content)

    print(f"✅ .cdsapirc file created at: {config_path}")
    ```
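
Once the file is written, it can be worth reading it back before making any requests. The sketch below is a minimal check and not part of `cdsapi`: `read_cdsapirc` is a hypothetical helper name, and it assumes the simple `name: value` layout shown above. It runs on a temporary file with made-up credentials, so it does not touch your real `.cdsapirc`.

```python
from pathlib import Path
import tempfile


def read_cdsapirc(path):
    """Parse a .cdsapirc-style file into a dict (hypothetical helper, not part of cdsapi)."""
    config = {}
    for line in Path(path).read_text().splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        # Split on the first colon only, because URLs and keys contain colons too
        name, value = line.split(":", 1)
        config[name.strip()] = value.strip()
    return config


# Demonstrate on a throwaway file with made-up credentials
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / ".cdsapirc"
    demo_path.write_text(
        "url: https://cds.climate.copernicus.eu/api\n"
        "key: 123456:abcdef12-3456-7890-abcd-1234567890ab\n"
    )
    demo_config = read_cdsapirc(demo_path)

# Report any entry that is absent or empty
missing = [name for name in ("url", "key") if not demo_config.get(name)]
print("Missing entries:", missing or "none")
```

To check your own file, the same helper could be called as `read_cdsapirc(Path.home() / ".cdsapirc")`.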



In [None]:
from pathlib import Path

cds_url = "https://cds.climate.copernicus.eu/api"  
cds_key = ""

if not cds_key:
    raise ValueError("Key is empty; provide a valid CDS API key before proceeding.")

try:
    # Build the content without leading indentation so the file holds clean "name: value" lines
    cdsapirc_content = f"url: {cds_url}\nkey: {cds_key}\n"

    # Write to ~/.cdsapirc, which is where cdsapi expects to find the file
    config_path = Path.home() / ".cdsapirc"
    config_path.write_text(cdsapirc_content)

    print(f"✅ .cdsapirc file created at: {config_path}")
except Exception as e:
    print(f"❌ Failed to write .cdsapirc file: {e}")


## Downloading ERA5 Temperature Data from Copernicus CDS API

In this section, we will demonstrate how to use the Copernicus Climate Data Store (CDS) Python API to programmatically request and download climate data.

Specifically, we will download **ERA5 reanalysis data** for 2-meter temperature over Cyprus for selected times on June 1, 2022.  

The example includes:  
- Specifying the dataset and parameters  
- Defining the geographic area (Cyprus bounding box)  
- Setting the date and time of interest  
- Requesting data in NetCDF format  
- Saving the downloaded file locally

Make sure you have your `.cdsapirc` file properly configured with your credentials before running this code.


In [None]:
# Initialize the CDS API client
c = cdsapi.Client()

# Define parameters for the data request
variable = "2m_temperature"
year = "2022"
month = "06"
day = "01"
times = ["00:00", "06:00", "12:00", "18:00"]
area = [35.7, 32.2, 34.5, 34.0]  # [North, West, South, East] bounding box for Cyprus

# Create a dynamic filename
date_str = f"{year}{month}{day}"
filename = f"era5_{variable}_cyprus_{date_str}.nc"  # ".nc" is the NetCDF file extension

# Request data from CDS
c.retrieve(
    "reanalysis-era5-single-levels",
    {
        "product_type": "reanalysis",
        "format": "netcdf",
        "variable": variable,
        "year": year,
        "month": month,
        "day": day,
        "time": times,
        "area": area,
    },
    filename
)

print(f"Data successfully downloaded and saved as {filename}")

## Inspecting the Downloaded ERA5 NetCDF File

After downloading the data from the Copernicus Climate Data Store (CDS), we use `xarray` to load and examine the contents of the NetCDF file.

This step will help us:

- Verify that the requested variable is present (ERA5 NetCDF files store `2m_temperature` under the short name `t2m`).
- Confirm the spatial area (bounding box for Cyprus).
- Check the available times (e.g., 00:00, 06:00, 12:00, 18:00).
- Review coordinate ranges and dimensions.

We'll also compare the actual values in the dataset against our original request to ensure the data matches our expectations.
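
One way to make the bounding-box comparison concrete is a small tolerance check. The sketch below is an illustration only: `bbox_matches` is a hypothetical helper, and the 0.25° tolerance assumes the regular 0.25° grid on which ERA5 single-level data is normally delivered, so the returned coordinates need not coincide exactly with the requested edges. The coordinate values in the example are made-up placeholders, not output from a real download.

```python
def bbox_matches(lat_min, lat_max, lon_min, lon_max, area, tol=0.25):
    """Return True if the coordinate ranges lie within `tol` degrees of the requested area.

    `area` uses the CDS ordering [North, West, South, East].
    """
    north, west, south, east = area
    return (
        abs(lat_max - north) <= tol
        and abs(lat_min - south) <= tol
        and abs(lon_min - west) <= tol
        and abs(lon_max - east) <= tol
    )


requested_area = [35.7, 32.2, 34.5, 34.0]  # Cyprus box used in this notebook

# Placeholder ranges for illustration; in the notebook these would come from
# ds.latitude.min().item(), ds.latitude.max().item(), and so on.
ok = bbox_matches(34.5, 35.5, 32.25, 34.0, requested_area)
too_small = bbox_matches(34.5, 35.0, 32.25, 34.0, requested_area)
print(ok, too_small)
```

The first call passes because every edge is within one grid step of the request; the second fails because the northern edge is 0.7° short.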


In [None]:
# Load the downloaded NetCDF file
import xarray as xr
ds = xr.open_dataset(filename)

# Print the dataset summary
print(ds)

# Check variable names
print("\nVariables in dataset:", list(ds.data_vars))

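# Confirm the variable requested. In the NetCDF file, ERA5 2m temperature
# is stored under its short name "t2m" rather than "2m_temperature".
short_name = "t2m"
if short_name in ds:
    print(f"\nVariable '{short_name}' found with dimensions: {ds[short_name].dims}")
else:
    print(f"\nVariable '{short_name}' not found; available: {list(ds.data_vars)}")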

# Check coordinate ranges and dimension sizes
print("\nCoordinates summary:")
print(ds.coords)

# Check times
print("\nTimes in dataset:")
print(ds.valid_time.values)

# Confirm area bounding box by checking min/max lat/lon
lat_min = ds.latitude.min().item()
lat_max = ds.latitude.max().item()
lon_min = ds.longitude.min().item()
lon_max = ds.longitude.max().item()

print(f"\nLatitude range: {lat_min:.2f} to {lat_max:.2f}")
print(f"Longitude range: {lon_min:.2f} to {lon_max:.2f}")

# Compare the requested times with the times stored in the file
expected_times = [f"{year}-{month}-{day}T{t}" for t in times]
print(f"\nExpected times : {expected_times}")
print(f"Actual times   : {[str(t) for t in ds.valid_time.values]}")

## ERA5 Data Download for Multiple Domains via CDS API

This guide demonstrates how to programmatically download **ERA5 reanalysis data** from the [Copernicus Climate Data Store](https://cds.climate.copernicus.eu/) using the **CDS API**, iterating over several geographic domains.

---


## 🔧 Script Functionality Overview

The script downloads **2m air temperature** (`2m_temperature`) data from ERA5, for a fixed date and times, over **4 different geographic domains**.

Each domain is defined by:

- a `name` (used in filenames)
- a bounding box `area`, defined by `[North, West, South, East]`
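
Because the `[North, West, South, East]` ordering is easy to get wrong, a quick sanity check before sending a request can help. This is a minimal sketch, not something the CDS API provides: `validate_area` is a hypothetical helper, and it assumes the box does not cross the antimeridian (so West must be smaller than East).

```python
def validate_area(area):
    """Check a [North, West, South, East] bounding box and return it as a list of floats."""
    if len(area) != 4:
        raise ValueError("area must have four values: [North, West, South, East]")
    north, west, south, east = (float(v) for v in area)
    if not (-90.0 <= south < north <= 90.0):
        raise ValueError(f"need -90 <= South < North <= 90, got South={south}, North={north}")
    # Assumption: boxes that cross the antimeridian are not handled here
    if not west < east:
        raise ValueError(f"need West < East, got West={west}, East={east}")
    return [north, west, south, east]


cyprus = validate_area([35.7, 32.2, 34.5, 34.0])
print("Valid box:", cyprus)

# North and South swapped by mistake
try:
    validate_area([34.5, 32.2, 35.7, 34.0])
except ValueError as err:
    print("Rejected:", err)
```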

---

## Domains Used

1. **Cyprus** – Eastern Mediterranean island  
   Area: `[35.7, 32.2, 34.5, 34.0]`

2. **Central Europe** – A slice from the North Sea to the Alps  
   Area: `[55.0, 5.0, 45.0, 15.0]`

3. **Northern Africa** – From Morocco to Libya  
   Area: `[35.0, -10.0, 20.0, 30.0]`

4. **Scandinavia** – Covering Norway, Sweden, and Finland  
   Area: `[71.0, 5.0, 55.0, 30.0]`

---

## Temporal Parameters

The script downloads data for:

- **Date**: `2022-06-01`
- **Times**: `00:00`, `06:00`, `12:00`, `18:00` (4 times per day)
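
The request takes the date as separate zero-padded strings and the times as `HH:MM` strings. As a small sketch, these can be derived from a single date string and a 6-hour step rather than typed by hand; the variable names mirror the ones used in this notebook.

```python
from datetime import datetime

date = datetime.strptime("2022-06-01", "%Y-%m-%d")

# The request parameters are zero-padded strings
year = date.strftime("%Y")
month = date.strftime("%m")
day = date.strftime("%d")

# One time step every 6 hours, starting at midnight
times = [f"{hour:02d}:00" for hour in range(0, 24, 6)]

print(year, month, day, times)
```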

---

## Output

- All `.nc` (NetCDF) files are saved to a folder named `downloads/`
- Filenames follow the format:  
  `era5_2m_temperature_<domain_name>_<yyyymmdd>.nc`  
  e.g., `era5_2m_temperature_cyprus_20220601.nc`
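
The naming scheme can be sketched as a small function. `era5_filename` is a hypothetical helper name used only for illustration; it builds the path and does not create the folder or any file.

```python
from pathlib import Path


def era5_filename(variable, domain_name, year, month, day, output_dir="downloads"):
    """Build <output_dir>/era5_<variable>_<domain_name>_<yyyymmdd>.nc as a Path."""
    date_str = f"{year}{month}{day}"
    return Path(output_dir) / f"era5_{variable}_{domain_name}_{date_str}.nc"


path = era5_filename("2m_temperature", "cyprus", "2022", "06", "01")
print(path.as_posix())  # downloads/era5_2m_temperature_cyprus_20220601.nc
```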

---

## Loop Logic

For each domain:
- The script constructs a custom filename
- Submits a request to the CDS API with the specified area and time
- Downloads the file in NetCDF format
- Prints progress messages for each download
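
The loop can be tried out without network access by passing the download step in as a function. The sketch below is an assumption-laden illustration, not the notebook's actual script: `download_domains` and `fake_retrieve` are hypothetical names, and the stand-in only records what would have been requested.

```python
import os
import tempfile


def download_domains(domains, retrieve, request_base, date_str, output_dir="downloads"):
    """Call `retrieve(request, filename)` once per domain and return the filenames."""
    os.makedirs(output_dir, exist_ok=True)
    variable = request_base["variable"]
    saved = []
    for domain in domains:
        name, area = domain["name"], domain["area"]
        filename = os.path.join(output_dir, f"era5_{variable}_{name}_{date_str}.nc")
        request = {**request_base, "area": area}  # copy the base, then add this domain's box
        print(f"Downloading {variable} for domain '{name}' into {filename} ...")
        retrieve(request, filename)
        print(f"Download complete for domain '{name}'")
        saved.append(filename)
    return saved


calls = []


def fake_retrieve(request, filename):
    """Stand-in for the real download: records the area and the target file name."""
    calls.append((request["area"], os.path.basename(filename)))


demo_domains = [
    {"name": "cyprus", "area": [35.7, 32.2, 34.5, 34.0]},
    {"name": "central_europe", "area": [55.0, 5.0, 45.0, 15.0]},
]
base = {
    "product_type": "reanalysis",
    "format": "netcdf",
    "variable": "2m_temperature",
    "year": "2022",
    "month": "06",
    "day": "01",
    "time": ["00:00", "06:00", "12:00", "18:00"],
}

# Use a temporary folder so the demo leaves nothing behind
with tempfile.TemporaryDirectory() as tmp:
    saved = download_domains(demo_domains, fake_retrieve, base, "20220601", output_dir=tmp)
```

With a configured client `c`, the stand-in could be swapped for `lambda request, filename: c.retrieve("reanalysis-era5-single-levels", request, filename)`.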

---

## Summary

This script shows how to automate ERA5 downloads across multiple regions with a loop over structured parameters. Because every domain goes through the same request and naming scheme, the resulting files are consistent and easy to compare.

For more info, visit the [CDS API documentation](https://cds.climate.copernicus.eu/how-to-api).

---


In [None]:

import os

# Initialize CDS API client
c = cdsapi.Client()

# Define date and time parameters
variable = "2m_temperature"
year = "2022"
month = "06"
day = "01"
times = ["00:00", "06:00", "12:00", "18:00"]

# Define multiple domains using bounding boxes: [North, West, South, East]
domains = [
    {
        "name": "cyprus",
        "area": [35.7, 32.2, 34.5, 34.0]
    },
    {
        "name": "central_europe",
        "area": [55.0, 5.0, 45.0, 15.0]
    },
    {
        "name": "northern_africa",
        "area": [35.0, -10.0, 20.0, 30.0]
    },
    {
        "name": "scandinavia",
        "area": [71.0, 5.0, 55.0, 30.0]
    }
]

# Create output directory
output_dir = "downloads"
os.makedirs(output_dir, exist_ok=True)

# Loop through domains and download
for domain in domains:
    area = domain["area"]
    name = domain["name"]
    date_str = f"{year}{month}{day}"
    filename = os.path.join(output_dir, f"era5_{variable}_{name}_{date_str}.nc")

    print(f"\n🔽 Downloading {variable} for domain '{name}' into {filename} ...")
    c.retrieve(
        "reanalysis-era5-single-levels",
        {
            "product_type": "reanalysis",
            "format": "netcdf",
            "variable": variable,
            "year": year,
            "month": month,
            "day": day,
            "time": times,
            "area": area
        },
        filename
    )
    print(f"✅ Download complete for domain '{name}'!")
