# ERA5-Land Monthly Averaged

ERA5-Land is a high-resolution reanalysis dataset that provides a consistent, detailed view of land variables over several decades. It combines model data with atmospheric forcing from ERA5: the input variables are corrected for altitude differences, and observations influence the result only indirectly, through that forcing. This makes it well suited to land-surface applications such as flood and drought forecasting. Despite some inherent uncertainties, its long temporal coverage and fine spatial resolution make it a valuable resource for decision-making and environmental analysis.

**Information on Dataset:**
* Source: [ERA5-Land Monthly Data](https://cds.climate.copernicus.eu/datasets/reanalysis-era5-land-monthly-means?tab=overview)
* Author: str.ucture GmbH
* Notebook-Version: 1.2 (Updated: March 05, 2025)

## 1. Specifying the paths and working directories

In [1]:
import os

''' ---- Specify Directories Here ---- '''
download_folder = r".\data\era5-land-monthly-data\download"
working_folder = r".\data\era5-land-monthly-data\working"
geotiff_folder = r".\data\era5-land-monthly-data\geotiff"
csv_folder = r".\data\era5-land-monthly-data\csv"
output_folder = r".\data\era5-land-monthly-data\output"
''' ----- End of Declaration ---- '''

os.makedirs(download_folder, exist_ok=True)
os.makedirs(working_folder, exist_ok=True)
os.makedirs(geotiff_folder, exist_ok=True)
os.makedirs(csv_folder, exist_ok=True)
os.makedirs(output_folder, exist_ok=True)

## 2. Download and Extract the Dataset

### 2.1 Authentication

In [2]:
import cdsapi

def main():
    # API key for authentication (replace with your own CDS API key; avoid sharing it publicly)
    api_key = "fdae60fd-35d4-436f-825c-c63fedab94a4"
    api_url = "https://cds.climate.copernicus.eu/api"

    # Creation of the CDS API client
    client = cdsapi.Client(url=api_url, key=api_key)
    return client
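
Hard-coding the key in the notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the key could instead be read from the environment. The helper name `load_cds_credentials` and the constant `DEFAULT_CDS_URL` are hypothetical; the variable names `CDSAPI_URL` and `CDSAPI_KEY` are assumed here to match the ones `cdsapi` itself looks up.

```python
import os

# Assumption: same endpoint as used in the cell above.
DEFAULT_CDS_URL = "https://cds.climate.copernicus.eu/api"


def load_cds_credentials(environ=None):
    """Return (url, key) read from environment variables.

    Hypothetical helper, not part of cdsapi. Raises RuntimeError if
    no key is set, so a missing key fails early and visibly.
    """
    env = os.environ if environ is None else environ
    url = env.get("CDSAPI_URL", DEFAULT_CDS_URL)
    key = env.get("CDSAPI_KEY")
    if not key:
        raise RuntimeError("Set the CDSAPI_KEY environment variable to your CDS API key.")
    return url, key


# Possible usage inside main():
#     url, key = load_cds_credentials()
#     client = cdsapi.Client(url=url, key=key)
```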

### 2.2 Select the Variable Group and Variable

In [3]:
import ipywidgets as widgets
import _utils.extra_era5_land_monthly as utils

var_group_name_list = utils.var_group_name_list
var_group_dict = utils.var_group_dict

selected_variable_group = widgets.Dropdown(
    options=var_group_name_list,
    value=var_group_name_list[0],
    description="Select a variable group",
    style=dict(description_width='initial'),
    layout=widgets.Layout(width='50%'),
)

selected_variable_group

Dropdown(description='Select a variable group', layout=Layout(width='50%'), options=('var_group_temperature', …

In [4]:
current_variable_group = var_group_dict[selected_variable_group.value]

selected_variable = widgets.Dropdown(
    options=current_variable_group,
    value=current_variable_group[1],
    description="Select the variable of interest",
    style=dict(description_width='initial'),
    layout=widgets.Layout(width='50%'),
)

selected_variable

Dropdown(description='Select the variable of interest', index=1, layout=Layout(width='50%'), options=('2m_dewp…

### 2.3 Define Bounding Box Extents (Bbox)

In [5]:
# Define the bounding box coordinates (WGS84)
# CDS "area" format: [North, West, South, East]
bbox_wgs84_deutschland = [56.0, 5.8, 47.2, 15.0]
bbox_wgs84_de_standard = [5.7, 47.1, 15.2, 55.2]  # Standard format: [West, South, East, North]
bbox_wgs84_konstanz = [47.9, 8.9, 47.6, 9.3]
bbox_wgs84_konstanz_standard = [9.0, 47.6, 9.3, 47.8]  # Standard format: [West, South, East, North]

# Alternatively, use a shapefile for precise geographic filtering
import geopandas as gpd
import math

# Load the shapefile of Germany (WGS84 projection) for geographic boundary filtering
de_shapefile = r"./shapefiles/de_boundary.shp"
de_gdf = gpd.read_file(de_shapefile)

# Extract the bounding box of the shapefile
de_bounds = de_gdf.total_bounds

# Snap the bounding box outward to a 0.1° grid and add a 0.1° buffer for a slightly larger extent
de_bounds_adjusted = [(math.floor(de_bounds[0]* 10)/10)-0.1,
                      (math.floor(de_bounds[1]* 10)/10)-0.1,
                      (math.ceil(de_bounds[2]* 10)/10)+0.1,
                      (math.ceil(de_bounds[3]* 10)/10)+0.1]

# Rearrange the coordinates to the format: [North, West, South, East]
bbox_de_bounds_adjusted = [de_bounds_adjusted[3], de_bounds_adjusted[0],
                           de_bounds_adjusted[1], de_bounds_adjusted[2]]
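
The snapping, buffering, and reordering steps above can be wrapped in one function. This is a minimal sketch: the name `to_cds_area` is hypothetical, the example bounds are illustrative values rather than the real extent of the shapefile, and the final rounding assumes the buffer is a multiple of 0.1°.

```python
import math


def to_cds_area(bounds, buffer=0.1):
    """Convert (west, south, east, north) into the CDS order
    [north, west, south, east], snapped outward to 0.1 degrees
    and enlarged by `buffer` degrees on every side."""
    west, south, east, north = bounds
    west = math.floor(west * 10) / 10 - buffer
    south = math.floor(south * 10) / 10 - buffer
    east = math.ceil(east * 10) / 10 + buffer
    north = math.ceil(north * 10) / 10 + buffer
    # Round to one decimal to remove floating-point noise such as 5.699999...
    return [round(value, 1) for value in (north, west, south, east)]


# Illustrative bounds only (not read from de_boundary.shp)
print(to_cds_area((5.87, 47.27, 15.04, 55.06)))
```

With `geopandas`, the input would be `de_gdf.total_bounds`, which is already ordered as (west, south, east, north).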

### 2.4 Define "dataset" and "request"

The hours that can be requested depend on the **product_type** chosen in the request. For example:

* For product_type = monthly_averaged_reanalysis, the only available hour is 00:00.
* For product_type = monthly_averaged_reanalysis_by_hour_of_day, multiple hours (00:00 to 23:00) can be selected; a separate output band is generated for the monthly average of each selected hour.

This Notebook will primarily focus on the **monthly_averaged_reanalysis** product_type.
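
The relationship between product type and hours can be sketched as a small request builder. The function `build_request` and its parameters are hypothetical; the keys and values mirror the request used in this notebook, and the `area` key is left out so the caller can add it.

```python
def build_request(variable, years, by_hour_of_day=False, hours=None):
    """Build a CDS request dict for ERA5-Land monthly means.

    by_hour_of_day=False -> monthly_averaged_reanalysis, time fixed to 00:00
    by_hour_of_day=True  -> monthly_averaged_reanalysis_by_hour_of_day,
                            with `hours` (0..23) or all 24 hours if None
    """
    if by_hour_of_day:
        product_type = "monthly_averaged_reanalysis_by_hour_of_day"
        selected = range(24) if hours is None else hours
        times = [f"{hour:02d}:00" for hour in selected]
    else:
        product_type = "monthly_averaged_reanalysis"
        times = ["00:00"]

    return {
        "product_type": [product_type],
        "variable": variable,
        "year": [str(year) for year in years],
        "month": [f"{month:02d}" for month in range(1, 13)],
        "time": times,
        "data_format": "netcdf",
        "download_format": "unarchived",
    }


monthly = build_request("2m_temperature", range(1950, 2026))
hourly = build_request("2m_temperature", [2024], by_hour_of_day=True, hours=[0, 6, 12, 18])
print(monthly["time"], hourly["time"])
```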

In [6]:
# Definition of the dataset and the request parameters
dataset = "reanalysis-era5-land-monthly-means"
request = {
    "product_type": ["monthly_averaged_reanalysis"],
    "variable": selected_variable.value,
    "year": [str(year) for year in range(1950, 2025 + 1)],
    # Months run from 01 to 12 (range(13) would also request a non-existent month 0)
    "month": [f"{month:02d}" for month in range(1, 13)],
    "time": ["00:00"],
    "data_format": "netcdf",
    "download_format": "unarchived",
    "area": bbox_de_bounds_adjusted
}

In [7]:
download_folder_subset = os.path.join(download_folder, f"{request['product_type'][0]}", f"{selected_variable.value}")
os.makedirs(download_folder_subset, exist_ok=True)

# Execute it to download the dataset:
def main_retrieve():
    dataset_filename = f"{selected_variable.value}.nc"
    dataset_filepath = os.path.join(download_folder_subset, dataset_filename)

    # Download the dataset only if it has not been downloaded before
    if not os.path.isfile(dataset_filepath):
        # Create the CDS API client
        client = main()
        # Download the dataset with the defined request parameters
        client.retrieve(dataset, request, dataset_filepath)
    else:
        print("Dataset already downloaded.")

if __name__ == "__main__":
    main_retrieve()

Dataset already downloaded.


### 2.5 Extract the ZIP file in folder

> Note: Because the dataset is downloaded for a single variable, CDS delivers a single NetCDF file and does not wrap it in a ZIP archive. The extraction code below is therefore commented out.
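
If a request does return an archive (for example when several variables are requested), the extraction step can be made conditional. The sketch below uses `zipfile.is_zipfile` from the standard library to decide; the function name `resolve_download` is hypothetical and not part of the original notebook.

```python
import os
import zipfile


def resolve_download(filepath, extract_folder):
    """Return the list of NetCDF files contained in a download.

    If `filepath` is a ZIP archive, extract it into `extract_folder`
    and return the extracted *.nc paths; otherwise assume the file
    is already a plain NetCDF file and return it unchanged.
    """
    if zipfile.is_zipfile(filepath):
        with zipfile.ZipFile(filepath) as archive:
            archive.extractall(extract_folder)
            return [
                os.path.join(extract_folder, name)
                for name in archive.namelist()
                if name.endswith(".nc")
            ]
    return [filepath]
```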

In [8]:
# import zipfile

# extract_folder = os.path.join(working_folder, f"{selected_variable.value}")
# os.makedirs(extract_folder, exist_ok=True)

# # Extract the zip file
# try:
#     if not os.listdir(extract_folder):
#         dataset_filename = f"{dataset}-{request['product_type'][0]}-{selected_variable.value}.nc"
#         dataset_filepath = os.path.join(download_folder, dataset_filename)

#         with zipfile.ZipFile(dataset_filepath, 'r') as zip_ref:
#             zip_ref.extractall(extract_folder)
#             print(f"Successfully extracted files to: {extract_folder}")
#     else:
#         print("Folder is not empty. Skipping extraction.")
# except FileNotFoundError:
#     print(f"Error: The file {dataset_filepath} was not found.")
# except zipfile.BadZipFile:
#     print(f"Error: The file {dataset_filepath} is not a valid zip file.")
# except Exception as e:
#     print(f"An unexpected error occurred: {e}")

## 3. Investigate the Metadata of the NetCDF4 file

In [9]:
# Print the list of NetCDF files inside the download subfolder
filename_list = os.listdir(download_folder_subset)
print(filename_list)

['2m_temperature.nc']


In [10]:
import netCDF4 as nc

# Define the file path for the selected NetCDF dataset
nc_filename = '2m_temperature.nc'
nc_filepath = os.path.join(download_folder_subset, nc_filename)

# Open the NetCDF file in read mode
nc_dataset = nc.Dataset(nc_filepath, mode='r')

# List all variables in the dataset
variables_list = list(nc_dataset.variables.keys())
print(f"Available variables: {variables_list}")

Available variables: ['number', 'valid_time', 'latitude', 'longitude', 'expver', 't2m']


In [11]:
import pandas as pd

# Define variable name from available variables and read variable data
variable_name = 't2m'
variable_data = nc_dataset[variable_name]

# Create a summary of the primary variables
summary = {
    "Variable Name": variable_name,
    "Data Type": variable_data.dtype,
    "Shape": variable_data.shape,
    "Variable Info": f"{variable_data.dimensions}",
    "Units": getattr(variable_data, "units", "N/A"),
    "Long Name": getattr(variable_data, "long_name", "N/A"),
}

# Display the summary of the data set as a DataFrame for better visualisation
nc_summary = pd.DataFrame(list(summary.items()), columns=['Description', 'Details'])

# Display the summary DataFrame
nc_summary

| | Description | Details |
|---|---|---|
| 0 | Variable Name | t2m |
| 1 | Data Type | float32 |
| 2 | Shape | (902, 82, 96) |
| 3 | Variable Info | ('valid_time', 'latitude', 'longitude') |
| 4 | Units | K |
| 5 | Long Name | 2 metre temperature |


In [12]:
# Print a summary of all the variables of the dataset
rows = []
for variable in variables_list:
    try:
        var_obj = nc_dataset.variables[variable]
        unit = getattr(var_obj, 'units', 'N/A')
        shape = var_obj.shape
        rows.append({
            "nc_variables": variable,
            "unit": unit,
            "shape": shape
        })
    except Exception as e:
        print(f"Error processing variable {variable}: {e}")

# Create a DataFrame
df = pd.DataFrame(rows)
df

| | nc_variables | unit | shape |
|---|---|---|---|
| 0 | number | 1 | () |
| 1 | valid_time | seconds since 1970-01-01 | (902,) |
| 2 | latitude | degrees_north | (82,) |
| 3 | longitude | degrees_east | (96,) |
| 4 | expver | | (902,) |
| 5 | t2m | K | (902, 82, 96) |
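
The `valid_time` axis is stored as seconds since 1970-01-01, so dates before 1970 appear as negative numbers. A minimal sketch of decoding these values with the standard library, and of checking that 902 time steps match monthly data from January 1950 to February 2025, is shown here. The helper names `decode_valid_time` and `months_inclusive` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_valid_time(seconds):
    """Convert 'seconds since 1970-01-01' into a UTC datetime.

    Adding a timedelta to the epoch also works for negative values,
    i.e. for time steps before 1970.
    """
    return EPOCH + timedelta(seconds=int(seconds))


def months_inclusive(start, end):
    """Number of monthly steps from `start` to `end`, both included."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


first = decode_valid_time(-631152000)  # value corresponding to 1950-01-01
print(first.date())
print(months_inclusive(datetime(1950, 1, 1), datetime(2025, 2, 1)))
```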


## 4. Export the dataset in CSV Format

### 4.1 Filter Data by Bounding Box and Export as CSV

In [13]:
import xarray as xr

# Function for converting NetCDF data into a Pandas DataFrame
def netcdf_to_dataframe(nc_filepath, bounding_box=None):

    with xr.open_dataset(nc_filepath) as nc_dataset:
        # Access the variable data from the dataset
        variable_data = nc_dataset[variable_name]

        # Ensure latitude and longitude names are correct
        latitude_name = 'latitude' if 'latitude' in nc_dataset.coords else 'lat'
        longitude_name = 'longitude' if 'longitude' in nc_dataset.coords else 'lon'

        # Filter the data based on the bounding box [West, South, East, North], if provided
        if bounding_box:
            filtered_data = variable_data.where(
                (nc_dataset[longitude_name] >= bounding_box[0]) & (nc_dataset[longitude_name] <= bounding_box[2]) &
                (nc_dataset[latitude_name] >= bounding_box[1]) & (nc_dataset[latitude_name] <= bounding_box[3]),
                drop=True
            )
        else:
            filtered_data = variable_data

        # Convert the xarray dataset to a pandas DataFrame
        df = filtered_data.to_dataframe().reset_index()

        # Remove columns that are not needed (varies depending on the dataset)
        if 'number' in df.columns:
            df = df.drop(columns=['number'])
        if 'expver' in df.columns:
            df = df.drop(columns=['expver'])

        # Separate valid_time into date and time
        df['valid_time'] = pd.to_datetime(df['valid_time'])
        df['date'] = df['valid_time'].dt.date
        df['time'] = df['valid_time'].dt.time
        df = df.set_index(['date', 'time', latitude_name, longitude_name])

        return df
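
The bounding-box filter in the function expects the standard order [West, South, East, North], not the CDS order used for the download request. A plain-Python sketch of the same inclusive test, with the hypothetical helpers `in_bbox` and `filter_points`, makes the axis order explicit:

```python
def in_bbox(lon, lat, bbox):
    """True if the point lies inside bbox = [west, south, east, north].

    Both edges are inclusive, matching the >= / <= comparisons
    used in the filter above.
    """
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def filter_points(points, bbox):
    """Keep only the (lon, lat) pairs that fall inside the bounding box."""
    return [(lon, lat) for lon, lat in points if in_bbox(lon, lat, bbox)]


# Standard-format box for Konstanz, as defined earlier in the notebook
bbox_wgs84_konstanz_standard = [9.0, 47.6, 9.3, 47.8]
grid_points = [(8.9, 47.7), (9.1, 47.7), (9.3, 47.8), (9.4, 47.9)]
print(filter_points(grid_points, bbox_wgs84_konstanz_standard))
```

Passing a box in CDS order to the function would swap latitude and longitude and typically select nothing, so converting to the standard order first is the safer choice.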

In [14]:
dataframe = netcdf_to_dataframe(nc_filepath=nc_filepath)
dataframe

| date | time | latitude | longitude | valid_time | t2m |
|---|---|---|---|---|---|
| 1950-01-01 | 00:00:00 | 55.2 | 5.7 | 1950-01-01 | NaN |
| 1950-01-01 | 00:00:00 | 55.2 | 5.8 | 1950-01-01 | NaN |
| 1950-01-01 | 00:00:00 | 55.2 | 5.9 | 1950-01-01 | NaN |
| 1950-01-01 | 00:00:00 | 55.2 | 6.0 | 1950-01-01 | NaN |
| 1950-01-01 | 00:00:00 | 55.2 | 6.1 | 1950-01-01 | NaN |
| ... | ... | ... | ... | ... | ... |
| 2025-02-01 | 00:00:00 | 47.1 | 14.8 | 2025-02-01 | 271.331543 |
| 2025-02-01 | 00:00:00 | 47.1 | 14.9 | 2025-02-01 | 271.675293 |
| 2025-02-01 | 00:00:00 | 47.1 | 15.0 | 2025-02-01 | 272.151855 |
| 2025-02-01 | 00:00:00 | 47.1 | 15.1 | 2025-02-01 | 272.720215 |


In [15]:
# # Define csv filename and filepath for the output
# csv_filename = f"{nc_filename.replace('.nc','.csv')}"
# csv_folderpath = os.path.join(csv_folder, f"{request['product_type'][0]}")
# os.makedirs(csv_folderpath, exist_ok=True)

# csv_filepath = os.path.join(csv_folderpath, csv_filename)

# csv_filepath

# # Export the DataFrame as CSV if it does not already exist
# if not os.path.isfile(csv_filepath):
#     dataframe = netcdf_to_dataframe(nc_filepath=nc_filepath)
#     dataframe.to_csv(csv_filepath, sep=",", encoding='utf8')
# else:
#     print(f"File already exists at {csv_filepath}.\nSkipping export.")
#     print("Reading existing CSV file...")
#     dataframe = pd.read_csv(csv_filepath).set_index(['date', 'time', 'latitude', 'longitude'])

# # Display DataFrame
# dataframe