# ZAMG Data Hub: Synoptic Data

Sources:
- **Weather station SYNOP data (hourly):** https://data.hub.zamg.ac.at/dataset/synop-v1-1h
- **API client:** https://dataset.api.hub.zamg.ac.at/app/frontend/station/historical/synop-v1-1h?anonymous=true

Note the parameter descriptions in the file *synop_params.tsv* (especially for precipitation amounts and wind).
Tmax is reported at 18:00 UTC, Tmin is reported at 06:00 UTC.
Newer reports contain both values at 06:00 and at 18:00 UTC.
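
The reporting times above can be turned into one row per day by taking Tmin from the 06:00 UTC report and Tmax from the 18:00 UTC report. The following is a minimal sketch on made-up values; the station id `11035`, the helper name `daily_extremes`, and the sample temperatures are assumptions for illustration, and it does not handle newer reports that carry both values at both hours.

```python
import pandas as pd

# Synthetic SYNOP-like records for one station (all values are invented).
obs = pd.DataFrame({
    "time": pd.to_datetime([
        "2020-07-01T06:00:00Z", "2020-07-01T12:00:00Z", "2020-07-01T18:00:00Z",
        "2020-07-02T06:00:00Z", "2020-07-02T18:00:00Z",
    ]),
    "station": [11035] * 5,
    "Tmax": [None, None, 28.4, None, 30.1],
    "Tmin": [15.2, None, None, 16.8, None],
})

def daily_extremes(df):
    """Hypothetical helper: Tmin from the 06 UTC report, Tmax from the 18 UTC report."""
    df = df.assign(date=df["time"].dt.date)
    tmin = df.loc[df["time"].dt.hour == 6].groupby(["station", "date"])["Tmin"].first()
    tmax = df.loc[df["time"].dt.hour == 18].groupby(["station", "date"])["Tmax"].first()
    # Align both series on (station, date) and return a flat table.
    return pd.concat([tmin, tmax], axis=1).reset_index()

extremes = daily_extremes(obs)
print(extremes)
```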

In [1]:
import pandas as pd, datetime as dt, numpy as np, requests as req
import matplotlib.pyplot as plt
import bz2, os
pd.set_option('display.max_rows', 128)

In [2]:
def generate_url(date_from, date_to, station_ids, parameters):
    get_params = {"parameters": ",".join(parameters), "start": date_from,
                "end":date_to, "station_ids": ",".join([str(val) for val in station_ids]),
                "output_format":"csv"}
    return "https://dataset.api.hub.zamg.ac.at/v1/station/historical/synop-v1-1h?"+'&'.join([f"{key}={val}" for key, val in get_params.items()])
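
As an alternative to joining the query string by hand, the same URL can be built with `urllib.parse.urlencode`, which escapes any characters that are not URL-safe. This is only a sketch: the name `generate_url_encoded`, the constant `BASE_URL`, and the station id `11035` are assumptions made for the example, and `safe=",:"` is chosen so that the result matches what `generate_url` produces for ordinary inputs.

```python
from urllib.parse import urlencode

BASE_URL = "https://dataset.api.hub.zamg.ac.at/v1/station/historical/synop-v1-1h"

def generate_url_encoded(date_from, date_to, station_ids, parameters):
    """Hypothetical variant of generate_url that lets urlencode do the escaping."""
    query = {
        "parameters": ",".join(parameters),
        "start": date_from,
        "end": date_to,
        "station_ids": ",".join(str(val) for val in station_ids),
        "output_format": "csv",
    }
    # safe=",:" leaves commas and colons unescaped so the URL stays readable.
    return f"{BASE_URL}?{urlencode(query, safe=',:')}"

example_url = generate_url_encoded(
    "2022-01-01T00:00:00Z", "2022-12-31T23:00:00Z", [11035], ["T", "Tmax"])
print(example_url)
```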

Read the available stations and parameters from the metadata.

In [3]:
metadata = req.get("https://dataset.api.hub.zamg.ac.at/v1/station/historical/synop-v1-1h/metadata").json()
params = pd.DataFrame(metadata.get("parameters")).astype(
    {"name": "string", "long_name": "string", "desc": "string", "unit": "string"})
# pandas 2.x rejects the unit-less "datetime64" dtype, so the unit is given explicitly.
stations = pd.DataFrame(metadata.get("stations")).astype(
    {"type": "string", "id": int, "group_id": "Int64", "name": "string", "state": "string",
     "lat": float, "lon": float, "altitude": float, "valid_from": "datetime64[ns]", "valid_to": "datetime64[ns]",
     "has_sunshine": bool, "has_global_radiation": bool, "is_active": bool })

In [4]:
params.to_csv("synop_params.tsv", sep="\t", index=False)
stations.sort_values(["state", "id"]).to_csv("synop_stations.tsv", sep="\t", index=False)
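
Once the station list is available as a DataFrame, it can be filtered before any data is requested, for example to keep only active stations in one state. The sketch below uses an invented metadata excerpt instead of the live API response; the station names, ids, altitudes, and the variable `active_vienna` are assumptions, while the column names follow the ones used above.

```python
import pandas as pd

# Hypothetical excerpt of the "stations" metadata; all values are invented.
sample_metadata = {
    "stations": [
        {"id": 1, "name": "STATION A", "state": "Wien", "altitude": 198.0, "is_active": True},
        {"id": 2, "name": "STATION B", "state": "Salzburg", "altitude": 3109.0, "is_active": True},
        {"id": 3, "name": "STATION C", "state": "Wien", "altitude": 177.0, "is_active": False},
    ]
}

sample_stations = pd.DataFrame(sample_metadata["stations"]).astype(
    {"id": int, "altitude": float, "is_active": bool})

# Keep only stations that are still active and located in the state "Wien".
mask = sample_stations["is_active"] & (sample_stations["state"] == "Wien")
active_vienna = sample_stations.loc[mask]
print(active_vienna[["id", "name"]])
```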

Query specific parameters for one or more stations.

In [7]:
# Append data to existing datafile.
measurements = pd.read_parquet("../zamg_weatherdata.parquet") if os.path.exists("../zamg_weatherdata.parquet") else None

In [None]:
station_ids = stations.loc[stations.name.isin(["GUMPOLDSKIRCHEN", "WIEN-INNERE STADT",
   "WIEN/HOHE WARTE", "RAX/SEILBAHN-BERGSTAT", "BREGENZ", "SONNBLICK - AUTOM."]), "id"].sort_values()
parameters = ["T", "Tmax", "Tmin", "Td", "rel", "dd", "ff", "Pg", "Pp", "RR3", "RRR", "tr", "tr3", "sonne"]
dtypes = {"station": int}
dtypes.update({val:float  for val in parameters})
first_year = 1993
last_year = 2022
# Download one year per request and keep only rows that contain a temperature value.
for station_id in station_ids:
    print(f"Loading station {station_id}...")
    for year in range(first_year, last_year+1, 1):
        print(f"    Loading year {year}...")
        download_url = generate_url(f"{year}-01-01T00:00:00Z", f"{year}-12-31T23:50:00Z", [station_id], parameters)
        df = pd.read_csv(download_url, sep=",", dtype=dtypes, parse_dates=["time"]).query("T.notna()")
        measurements = pd.concat([df, measurements]) if measurements is not None else df
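
Because new downloads are concatenated with data loaded from an existing file, running the loop again for a year that is already stored would produce duplicate rows. One possible safeguard, sketched here on invented values, is to drop duplicates on `station` and `time` after concatenating. The helper name `merge_measurements` and the sample frames are assumptions and not part of the notebook.

```python
import pandas as pd

# Invented sample data: "existing" stands for the stored file, "downloaded" for a new request.
existing = pd.DataFrame({
    "time": pd.to_datetime(["2022-01-01T06:00:00Z", "2022-01-01T18:00:00Z"]),
    "station": [11035, 11035],
    "T": [1.5, 3.0],
})
downloaded = pd.DataFrame({
    "time": pd.to_datetime(["2022-01-01T18:00:00Z", "2022-01-02T06:00:00Z"]),
    "station": [11035, 11035],
    "T": [3.0, 0.5],
})

def merge_measurements(new, old):
    """Hypothetical helper: concatenate and keep one row per (station, time)."""
    combined = pd.concat([new, old], ignore_index=True)
    # keep="first" prefers the freshly downloaded row when both frames contain it.
    deduplicated = combined.drop_duplicates(subset=["station", "time"], keep="first")
    return deduplicated.sort_values(["station", "time"]).reset_index(drop=True)

merged = merge_measurements(downloaded, existing)
print(merged)
```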

In [None]:
pd.io.clipboards.to_clipboard(measurements.sample(5).to_markdown(index=False), excel=False)    # pip install tabulate --upgrade

In [8]:
measurements = measurements.sort_values(["station", "time"])
measurements.to_csv("../zamg_weatherdata.csv.bz2", compression={'method': 'bz2', 'compresslevel': 9}, sep=";", encoding="utf-8", index=False)
csv_data = measurements.to_csv(sep=";", index=False)
with bz2.open("../zamg_weatherdata_unicode.csv.bz2", "wb") as f: f.write(csv_data.encode("utf-16"))
measurements.to_parquet("../zamg_weatherdata.parquet", compression="brotli")

## Number of measurements per day

Prints a statistic of how many daily measurements are available per year and station.
Only days for which both a 06:00 UTC and an 18:00 UTC measurement were transmitted are counted.

In [16]:
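# Keep only the 06 and 18 UTC reports.
main_measurements = measurements.loc[(measurements.time.dt.hour == 6) | (measurements.time.dt.hour == 18), ["time", "station", "T"]]
# A day counts only if both reports carry a temperature value.
# pandas 2.x rejects the unit-less "datetime64" dtype, so the unit is given explicitly.
daily_measurements = main_measurements.groupby([main_measurements.time.dt.date, "station"]).aggregate(count=("T", "count")) \
    .query("count == 2").reset_index().astype({"time": "datetime64[ns]"})
# Count the complete days per year (rows) and station (columns).
yearly_measurements = daily_measurements.groupby([daily_measurements.time.dt.year, "station"]).aggregate({"count": "size"}).unstack().fillna(0).astype(int)
yearly_measurements.to_clipboard()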
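
To make the counting rule easier to follow, here is a small sketch on invented data: three days for one station, of which only two have both a 06:00 and an 18:00 report. The frame `sample` and the station id `1` are assumptions, and the code is a simplified restatement of the idea rather than the exact cell above.

```python
import pandas as pd

# Invented records for a single station.
sample = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-03-01 06:00", "2021-03-01 18:00",   # complete day
        "2021-03-02 06:00",                       # 18:00 report missing
        "2021-03-03 06:00", "2021-03-03 18:00",   # complete day
        "2021-03-03 12:00",                       # ignored, not a main hour
    ]),
    "station": [1] * 6,
    "T": [2.0, 8.0, 1.0, 3.0, 9.0, 7.0],
})

# Keep the 06 and 18 o'clock reports and count them per station and day.
main = sample.loc[sample["time"].dt.hour.isin([6, 18])]
per_day = main.groupby(["station", main["time"].dt.date])["T"].count()

# A day is complete when both reports are present.
complete_days = per_day[per_day == 2].reset_index(name="count")
complete_days["year"] = pd.to_datetime(complete_days["time"]).dt.year

# Rows are years, columns are stations, values are the number of complete days.
per_year = complete_days.groupby(["year", "station"]).size().unstack(fill_value=0)
print(per_year)
```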