# Stationbench Tutorial

This tutorial demonstrates how to use the stationbench repository to:
1. Preprocess weather forecast and ground truth data
2. Calculate verification metrics
3. Compare multiple forecasts and visualize results

This tutorial runs in a notebook environment. The same commands can be run in a terminal or script.

## Setup

First, complete the [setup guide](setup.md), then import the required packages.

In [5]:
import xarray as xr
import pandas as pd
import wandb
import numpy as np
import os

## 1. Data Preprocessing

Stationbench expects forecast data and ground truth observations in Zarr format. Let's look at both datasets.

### Data format

The forecast data should be a Zarr dataset with the following structure:

```
<xarray.Dataset>
Dimensions:
  - time: Forecast initialization times
  - prediction_timedelta: Forecast lead times
  - latitude: Grid latitudes
  - longitude: Grid longitudes

Coordinates:
  - latitude: (latitude) float32, grid latitudes in degrees North
  - longitude: (longitude) float32, grid longitudes in degrees East  
  - prediction_timedelta: (prediction_timedelta) timedelta64[ns], forecast lead times
  - time: (time) datetime64[ns], initialization times

Data variables:
  - 10m_wind_speed: (time, prediction_timedelta, latitude, longitude) float32
  - 2m_temperature: (time, prediction_timedelta, latitude, longitude) float32
  - ssrd: (time, prediction_timedelta, latitude, longitude) float32
```
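
Before running a benchmark, it can help to confirm that a dataset follows this layout. The helper below is a hypothetical sketch, not part of stationbench: `check_forecast_layout` is an illustrative name, and it only compares dimension names and their order against the structure listed above. Whether stationbench requires this exact order is not stated here, so treat the check as a convenience.

```python
# Dimension order taken from the format description above.
EXPECTED_DIMS = ("time", "prediction_timedelta", "latitude", "longitude")


def check_forecast_layout(variable_dims):
    """Return a list of problems; an empty list means the layout matches.

    `variable_dims` maps each variable name to its tuple of dimension names.
    For an xarray dataset `ds`, it can be built with
    {name: da.dims for name, da in ds.data_vars.items()}.
    """
    problems = []
    for name, dims in variable_dims.items():
        if tuple(dims) != EXPECTED_DIMS:
            problems.append(f"{name}: expected {EXPECTED_DIMS}, got {tuple(dims)}")
    return problems


# Illustrative inputs: one matching layout and one missing the lead-time dimension.
good = {"2m_temperature": EXPECTED_DIMS, "10m_wind_speed": EXPECTED_DIMS}
bad = {"2m_temperature": ("time", "latitude", "longitude")}
print(check_forecast_layout(good))
print(check_forecast_layout(bad))
```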

For this tutorial, let's create a simple forecast dataset.


In [17]:
# Regular latitude/longitude grid covering Europe
lats = np.linspace(36, 72, 74)
lons = np.linspace(-15, 45, 124)
# Daily initialization times, each with hourly lead times from 0h to 23h
times = pd.date_range("2023-03-18", "2023-07-31", freq="D")
prediction_timedeltas = pd.timedelta_range(start="0h", end="23h", freq="1h")

dims = ["time", "prediction_timedelta", "latitude", "longitude"]
shape = (len(times), len(prediction_timedeltas), len(lats), len(lons))

# Random values stand in for real model output. The variable names differ from
# those in the format description above, so they are passed explicitly via
# --name_2m_temperature and --name_10m_wind_speed in step 2.
forecast = xr.Dataset(
    data_vars=dict(
        temperature=(dims, np.random.rand(*shape)),
        wind_speed=(dims, np.random.rand(*shape)),
    ),
    coords=dict(
        latitude=("latitude", lats),
        longitude=("longitude", lons),
        prediction_timedelta=("prediction_timedelta", prediction_timedeltas),
        time=("time", times),
    ),
)

# Write the dataset to a local Zarr store, overwriting any previous copy
forecast.to_zarr("data/forecast.zarr", mode="w")
forecast
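
Each value in this dataset refers to the moment `time + prediction_timedelta`, often called the valid time; the preprocessing log in step 2 shows a matching `valid_time` coordinate with dimensions `(init_time, lead_time)`. As a rough illustration of that relationship, in plain Python rather than stationbench's own implementation:

```python
from datetime import datetime, timedelta

# Two initialization times and three lead times, mirroring the daily/hourly
# layout of the example forecast.
init_times = [datetime(2023, 3, 18), datetime(2023, 3, 19)]
lead_times = [timedelta(hours=h) for h in range(3)]

# valid_time[i][j] is the moment that forecast i at lead time j refers to.
valid_time = [[init + lead for lead in lead_times] for init in init_times]

for row in valid_time:
    print([t.isoformat() for t in row])
```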

Let's also have a look at the METEOSTAT ground truth data.

In [14]:
ground_truth_path = 'gs://jua-benchmarking/ground_truth/synoptic/synoptic-2023-1h-v3.zarr'
ground_truth = xr.open_zarr(ground_truth_path)
ground_truth

The output (abridged) lists Dask-backed `float64` arrays in two layouts: arrays of shape `(75496,)`, 589.81 kiB each in 2 chunks of shape `(37748,)`, and arrays of shape `(8760, 75496)`, 4.93 GiB each in 1568 chunks of shape `(90, 4719)`.

Note that the ground truth is not gridded: it is unstructured point data, with one series per station. Stationbench automatically aligns the gridded forecast to the station locations using linear interpolation.
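
To make the alignment step concrete, here is a minimal sketch of bilinear interpolation from a regular grid to a single station. It is written in plain Python for clarity and reflects an assumption about what linear interpolation means here, not stationbench's code. `interpolate_to_station` is a hypothetical name, and stations outside the grid are rejected rather than extrapolated.

```python
from bisect import bisect_right


def interpolate_to_station(lats, lons, field, lat, lon):
    """Bilinearly interpolate `field[i][j]` (at lats[i], lons[j]) to (lat, lon).

    `lats` and `lons` must be sorted in ascending order, and the station must
    lie inside the grid.
    """
    if not (lats[0] <= lat <= lats[-1] and lons[0] <= lon <= lons[-1]):
        raise ValueError("station lies outside the grid")

    # Index of the lower-left corner of the grid cell containing the station.
    i = min(bisect_right(lats, lat) - 1, len(lats) - 2)
    j = min(bisect_right(lons, lon) - 1, len(lons) - 2)

    # Fractional position of the station inside the cell, between 0 and 1.
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])

    # Interpolate along longitude on both latitude rows, then along latitude.
    south = field[i][j] * (1 - tx) + field[i][j + 1] * tx
    north = field[i + 1][j] * (1 - tx) + field[i + 1][j + 1] * tx
    return south * (1 - ty) + north * ty


# A single grid cell with made-up values; the station sits at its center.
lats = [36.0, 37.0]
lons = [-15.0, -14.0]
field = [[10.0, 12.0],
         [14.0, 16.0]]
print(interpolate_to_station(lats, lons, field, 36.5, -14.5))  # 13.0
```

For whole datasets, xarray's `Dataset.interp` offers this kind of interpolation in vectorized form.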

## 2. Calculate Verification Metrics

Now we'll calculate RMSE between the forecast and ground truth data.
For this we need to set the following parameters:
- `--forecast_loc`: Location of the forecast data (required)
- `--ground_truth_loc`: Location of the ground truth data (required)
- `--start_date`: Start date for benchmarking (required)
- `--end_date`: End date for benchmarking (required)
- `--output`: Output path for benchmarks (required)
- `--region`: Region to benchmark (see `regions.py` for available regions)
- `--name_10m_wind_speed`: Name of 10m wind speed variable (optional)
- `--name_2m_temperature`: Name of 2m temperature variable (optional)

In [18]:
forecast_loc = "data/forecast.zarr"
ground_truth_loc = "gs://jua-benchmarking/ground_truth/synoptic/synoptic-2023-1h-v3.zarr"
start_date = "2023-03-18"
end_date = "2023-07-31"
output = "data/tutorial_benchmark.zarr"
region = "europe"
name_10m_wind_speed = "wind_speed"
name_2m_temperature = "temperature"

# Build the command from the parameters above, one flag per line
command = (
    "poetry run python ../stationbench/calculate_metrics.py"
    f" --forecast_loc {forecast_loc}"
    f" --ground_truth_loc {ground_truth_loc}"
    f" --start_date {start_date}"
    f" --end_date {end_date}"
    f" --output {output}"
    f" --region {region}"
    f" --name_10m_wind_speed {name_10m_wind_speed}"
    f" --name_2m_temperature {name_2m_temperature}"
)

print("Running benchmark command: ", command)
os.system(command)

Running benchmark command:  poetry run python ../stationbench/calculate_metrics.py --forecast_loc data/forecast.zarr --ground_truth_loc gs://jua-benchmarking/ground_truth/synoptic/synoptic-2023-1h-v3.zarr --start_date 2023-03-18 --end_date 2023-07-31 --output data/tutorial_benchmark.zarr --region europe --name_10m_wind_speed wind_speed --name_2m_temperature temperature


2025-01-13 17:13:36,760 - root - INFO - Dask dashboard http://127.0.0.1:8787/status
2025-01-13 17:13:36,760 - __main__ - INFO - preprocessing dataset data/forecast.zarr
2025-01-13 17:13:36,968 - __main__ - INFO - creating valid time...
2025-01-13 17:13:36,970 - __main__ - INFO - Selecting region: https://linestrings.com/bbox/#-15,36,45,72
2025-01-13 17:13:36,971 - __main__ - INFO - Finished processing of data/forecast.zarr: <xarray.Dataset> Size: 479MB
Dimensions:         (latitude: 74, longitude: 124, lead_time: 24, init_time: 136)
Coordinates:
  * latitude        (latitude) float64 592B 36.0 36.49 36.99 ... 71.51 72.0
  * longitude       (longitude) float64 992B -15.0 -14.51 -14.02 ... 44.51 45.0
  * lead_time       (lead_time) timedelta64[ns] 192B 00:00:00 ... 23:00:00
  * init_time       (init_time) datetime64[ns] 1kB 2023-03-18 ... 2023-07-31
    valid_time      (init_time, lead_time) datetime64[ns] 26kB 2023-03-18 ......
Data variables:
    2m_temperature  (init_time, lead_time, 

256
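
The trailing `256` is the value returned by `os.system`. On Unix this is the raw wait status, and 256 corresponds to exit code 1, which conventionally signals an error, so it is worth checking the full log when you see it. The sketch below is one alternative: it passes the arguments as a list and raises an exception on a non-zero exit. It uses only the flags listed above; `build_command` and `run_benchmark` are hypothetical helper names.

```python
import subprocess


def build_command(forecast_loc, ground_truth_loc, start_date, end_date,
                  output, region, name_10m_wind_speed, name_2m_temperature):
    """Assemble the calculate_metrics.py call as an argument list."""
    return [
        "poetry", "run", "python", "../stationbench/calculate_metrics.py",
        "--forecast_loc", forecast_loc,
        "--ground_truth_loc", ground_truth_loc,
        "--start_date", start_date,
        "--end_date", end_date,
        "--output", output,
        "--region", region,
        "--name_10m_wind_speed", name_10m_wind_speed,
        "--name_2m_temperature", name_2m_temperature,
    ]


def run_benchmark(args):
    """Run the command; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(args, check=True)


args = build_command(
    forecast_loc="data/forecast.zarr",
    ground_truth_loc="gs://jua-benchmarking/ground_truth/synoptic/synoptic-2023-1h-v3.zarr",
    start_date="2023-03-18",
    end_date="2023-07-31",
    output="data/tutorial_benchmark.zarr",
    region="europe",
    name_10m_wind_speed="wind_speed",
    name_2m_temperature="temperature",
)
print("Running benchmark command:", " ".join(args))
# run_benchmark(args)  # uncomment to actually run the benchmark
```

Passing a list avoids shell quoting issues if a path contains spaces.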

## 3. Compare Multiple Forecasts

Let's compare our forecast against reference forecasts and visualize the results using Weights & Biases.

In [None]:
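# NOTE: `PointBasedBenchmarking` and `benchmarks` are not defined earlier in
# this tutorial. Import the class from the stationbench package and load the
# evaluation benchmarks (presumably the metrics written in step 2) before
# running this cell.

# Initialize W&B
wandb.init(project="stationbench-tutorial", name="example-comparison")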

# Define reference forecasts
reference_forecasts = {
    "reference_model": "data/reference_benchmarks.zarr"
}

# Regions to analyze
regions = ["europe", "north_america"]

# Generate comparison metrics and plots
benchmarking = PointBasedBenchmarking(wandb_run=wandb.run)
metrics = benchmarking.generate_metrics(
    evaluation_benchmarks=benchmarks,
    reference_benchmark_locs=reference_forecasts,
    region_names=regions
)

# Log metrics to W&B
wandb.log(metrics)

## Understanding the Results

The comparison generates several visualizations:

1. **Geographical scatter plots**:
   - RMSE values at each station location
   - Skill scores comparing against reference forecasts

2. **Time series plots**:
   - RMSE evolution over forecast lead time
   - Skill score evolution over forecast lead time

These plots are automatically uploaded to your W&B project where you can:
- Compare different model versions
- Track performance improvements
- Share results with your team

Visit your W&B project page to explore the interactive visualizations!
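
To make the plotted quantities concrete, the sketch below computes an RMSE per lead time and a skill score relative to a reference forecast. The skill score used here, `1 - rmse_model / rmse_reference`, is a common convention and an assumption; stationbench may define it differently. The numbers are made up purely for illustration.

```python
import math


def rmse(forecast, observed):
    """Root mean square error over paired forecast/observation values."""
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))


def skill_score(rmse_model, rmse_reference):
    """Positive when the model beats the reference, 0 when they tie."""
    return 1.0 - rmse_model / rmse_reference


# Toy values for two lead times at three stations (not real data).
observed = {"0h": [10.0, 12.0, 14.0], "1h": [11.0, 13.0, 15.0]}
model = {"0h": [11.0, 12.0, 13.0], "1h": [11.0, 15.0, 15.0]}
reference = {"0h": [12.0, 12.0, 12.0], "1h": [13.0, 15.0, 17.0]}

for lead in observed:
    model_rmse = rmse(model[lead], observed[lead])
    reference_rmse = rmse(reference[lead], observed[lead])
    print(lead, round(model_rmse, 3), round(skill_score(model_rmse, reference_rmse), 3))
```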