# Final Project

This project develops a framework for the spatial and temporal harmonization of ERA5 reanalysis data, enabling a direct comparison with meteorological station observations from the German Weather Service (Deutscher Wetterdienst, DWD). Gridded ERA5 near-surface temperature data are interpolated to station locations and aggregated to a common temporal resolution. The harmonized datasets are then compared using simple evaluation metrics such as bias and root mean square error (RMSE).
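The evaluation step can be sketched as follows. This is a minimal illustration on synthetic values, not the project's final implementation: the helper name `bias_and_rmse` and both input series are hypothetical, and the sign convention (ERA5 minus observation, so a positive bias means ERA5 is warmer) is an assumption.

```python
import numpy as np
import pandas as pd


def bias_and_rmse(model: pd.Series, obs: pd.Series) -> tuple[float, float]:
    """Return (bias, RMSE) of model minus observation on shared timestamps."""
    # Align both series on their common time index and drop incomplete pairs.
    paired = pd.concat({"model": model, "obs": obs}, axis=1, join="inner").dropna()
    diff = paired["model"] - paired["obs"]
    bias = float(diff.mean())                  # mean error
    rmse = float(np.sqrt((diff ** 2).mean()))  # root mean square error
    return bias, rmse


# Synthetic values for illustration only (not real ERA5 or DWD data).
times = pd.date_range("2024-07-24 00:00", periods=4, freq="h")
era5_at_station = pd.Series([17.0, 16.0, 17.0, 15.0], index=times)
station_obs = pd.Series([16.0, 16.0, 16.0, 16.0], index=times)

bias, rmse = bias_and_rmse(era5_at_station, station_obs)
print(f"bias = {bias:.2f} K, RMSE = {rmse:.2f} K")
```

Aligning on the shared index before differencing means that timestamps present in only one dataset are ignored rather than silently compared against a neighbouring hour.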

## Imports and Prerequisites

In [1]:
import io, zipfile, requests
import pandas as pd

## Access data from the DWD

In [17]:
station_id = "02667"  # station in Cologne/Bonn
date_str = "2024-07-24"  # target day, formatted as YYYY-MM-DD

base = "https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/recent"
zip_url = f"{base}/stundenwerte_TU_{station_id}_akt.zip"

# Download the station archive into memory
r = requests.get(zip_url, timeout=60)
r.raise_for_status()

# The archive contains metadata files plus one data file named "produkt_*.txt"
zf = zipfile.ZipFile(io.BytesIO(r.content))
produkt = next(n for n in zf.namelist() if n.lower().startswith("produkt_") and n.lower().endswith(".txt"))

with zf.open(produkt) as f:
    df = pd.read_csv(f, sep=";")
    #print(df["MESS_DATUM"].min(), df["MESS_DATUM"].max())

# MESS_DATUM is stored as YYYYMMDDHH; unparseable entries become NaT
df["time"] = pd.to_datetime(df["MESS_DATUM"], format="%Y%m%d%H", errors="coerce")
# Keep only the hourly air temperatures (TT_TU) of the target day
target_date = pd.to_datetime(date_str).date()
df_day = df[df["time"].dt.date == target_date][["time", "TT_TU"]]


df_day

| | time | TT_TU |
|---|---|---|
| 0 | 2024-07-24 00:00:00 | 16.3 |
| 1 | 2024-07-24 01:00:00 | 16.3 |
| 2 | 2024-07-24 02:00:00 | 16.2 |
| 3 | 2024-07-24 03:00:00 | 16.0 |
| 4 | 2024-07-24 04:00:00 | 16.2 |
| 5 | 2024-07-24 05:00:00 | 16.2 |
| 6 | 2024-07-24 06:00:00 | 16.4 |
| 7 | 2024-07-24 07:00:00 | 17.7 |
| 8 | 2024-07-24 08:00:00 | 18.4 |
| 9 | 2024-07-24 09:00:00 | 18.6 |


In [14]:
df_day.describe()

| | TT_TU |
|---|---|
| count | 24.0 |
| mean | 18.091667 |
| std | 2.505631 |
| min | 13.9 |
| 25% | 16.2 |
| 50% | 18.05 |
| 75% | 20.525 |
| max | 22.5 |
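
For the temporal harmonization step, the hourly station values can be aggregated to a coarser common resolution. The sketch below uses synthetic data and rests on two assumptions that should be checked against the project requirements and the DWD dataset description: the target resolution is taken to be the daily mean, and `-999` is treated as the missing-value marker. The frame name `hourly` is hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic hourly values for two days, for illustration only (not real DWD data).
times = pd.date_range("2024-07-24 00:00", periods=48, freq="h")
hourly = pd.DataFrame({"time": times, "TT_TU": 16.0})
hourly.loc[hourly["time"].dt.day == 25, "TT_TU"] = 20.0
hourly.loc[3, "TT_TU"] = -999.0  # assumed missing-value marker

# Mask the marker so it cannot distort the mean, then aggregate to daily means.
hourly["TT_TU"] = hourly["TT_TU"].replace(-999.0, np.nan)
daily = hourly.set_index("time")["TT_TU"].resample("D").mean()
print(daily)
```

Masking before aggregation matters here: a single unmasked `-999` among 24 hourly values would shift that day's mean by more than 40 K.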
