# Forecast Skill Validation

This example shows how to evaluate Salient's probabilistic forecasts against observations and calculate meaningful metrics. It demonstrates [validation best practices](https://salientpredictions.notion.site/Validation-0220c48b9460429fa86f577914ea5248) such as:

- Proper scoring using the Continuous Ranked Probability Score (CRPS)
  - Considers the full forecast distribution to reward both accuracy and precision
  - Less sensitive to climatology decisions than metrics like Anomaly Correlation
- A long backtesting period (2015-2022)
  - Short evaluation periods are subject to noise
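
To illustrate why CRPS rewards both accuracy and precision, the sketch below approximates CRPS from a set of forecast quantiles as twice the integral of the pinball (quantile) loss over quantile level. It is a minimal illustration, not the implementation behind `sk.skill.crps`. The function names, quantile levels, and temperatures are hypothetical, and integrating only over the available levels ignores the tails of the distribution.

```python
import numpy as np


def pinball_loss(obs, quantile_values, levels):
    """Pinball (quantile) loss of each forecast quantile against one observation."""
    diff = obs - quantile_values
    return np.where(diff >= 0, levels * diff, (levels - 1) * diff)


def crps_from_quantiles(obs, quantile_values, levels):
    """Approximate CRPS as 2 * integral of pinball loss over the quantile levels."""
    losses = pinball_loss(obs, quantile_values, levels)
    # Trapezoidal rule over the range of levels that the forecast provides
    return 2.0 * np.sum(0.5 * (losses[1:] + losses[:-1]) * np.diff(levels))


# Hypothetical quantile levels and forecasts (degC), all scored against one observation
levels = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
obs = 20.5
sharp = np.array([19.0, 19.5, 20.0, 20.5, 21.0])  # narrow and well centered
wide = np.array([16.0, 18.0, 20.0, 22.0, 24.0])  # well centered but imprecise
biased = sharp + 3.0  # narrow but centered in the wrong place

crps_sharp = crps_from_quantiles(obs, sharp, levels)
crps_wide = crps_from_quantiles(obs, wide, levels)
crps_biased = crps_from_quantiles(obs, biased, levels)
print(crps_sharp, crps_wide, crps_biased)
```

Lower CRPS is better. The narrow, well-centered forecast scores best, while both the overly wide forecast and the confident but biased forecast are penalized.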


In [None]:
import xarray as xr
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import sys
import os

try:
    import salientsdk as sk
except ModuleNotFoundError as e:
    if os.path.exists("../salientsdk"):
        sys.path.append(os.path.abspath(".."))
        import salientsdk as sk
    else:
        raise ModuleNotFoundError("Install salient SDK with: pip install salientsdk") from e

# Prevent wrapping on tables for readability
pd.set_option("display.width", None)
pd.set_option("display.max_columns", None)
pd.set_option("display.expand_frame_repr", False)

# The variable that we'll be evaluating.
var = "temp"
fld = "vals"
timescale = "sub-seasonal"
ref_model = "clim"  # Works across all timescale values.

fast = True
if fast:
    # 1 year of data shows how the mechanics of the validation work.  This is
    # not recommended for full validation, but it quickly demonstrates the approach.
    (start_date, end_date) = ("2021-01-01", "2021-12-31")
else:
    # Validating 2015-2022 will replicate skill scores from hindcast_summary
    # with split_set = "test"
    (start_date, end_date) = ("2015-01-01", "2022-12-31")

sk.set_file_destination("validation_example")
sk.login("username", "password")

<requests.sessions.Session at 0x7ff3b4c5ba10>

## Set geographic bounds

The Salient SDK uses a "Location" object to specify the geographic bounds of a request.


In [None]:
if True:  # single lat/lon point
    loc = sk.Location(26.125, -97.375)  # SpaceX spaceport
elif False:  # "shapefile" - gridded analysis of the ERCOT footprint
    lons = [-107, -98, -94, -94, -100, -103, -103, -107]
    lats = [32, 25, 29, 34, 36, 36, 32, 32]
    shape_file = sk.upload_shapefile(list(zip(lons, lats)), "ercot_simple", force=False)
    loc = sk.Location(shapefile=shape_file)
elif True:  # "location_file"
    loc = sk.Location(
        location_file=sk.upload_location_file(
            lats=[37.7749, 33.9416, 32.7336],
            lons=[-122.4194, -118.4085, -117.1897],
            names=["SFO", "LAX", "SAN"],
            geoname="CA_Airports",
        )
    )
else:
    raise ValueError("Invalid location type")

print(loc)

(26.125, -97.375)


## Forecast

The [`forecast_timeseries`](https://sdk.salientpredictions.com/api/#salientsdk.forecast_timeseries) API endpoint and SDK function return Salient's native temporally granular weekly/monthly/quarterly forecasts.


In [None]:
date_range = sk.get_hindcast_dates(start_date=start_date, end_date=end_date, timescale=timescale)

fcst = sk.forecast_timeseries(
    loc=loc,
    variable=var,
    field=fld,
    date=date_range,  # OK to request multiple forecast dates
    timescale=timescale,
    model=["blend", ref_model],
    reference_clim="30_yr",  # this is the climatology used by data_timeseries
    verbose=False,
    force=False,
    strict=False,  # There is missing data in 2020.  Work around it.
)

# Because we requested multiple forecast dates and models, the result is a table of
# file names with one row per forecast date and model
print(fcst)

                                             file_name        date  model
0    validation_example/forecast_timeseries_3275aee...  2021-01-03  blend
1    validation_example/forecast_timeseries_904f9fc...  2021-01-03   clim
2    validation_example/forecast_timeseries_95c7514...  2021-01-10  blend
3    validation_example/forecast_timeseries_2e45506...  2021-01-10   clim
4    validation_example/forecast_timeseries_d7d535b...  2021-01-17  blend
..                                                 ...         ...    ...
99   validation_example/forecast_timeseries_abf18bf...  2021-12-12   clim
100  validation_example/forecast_timeseries_754b1b4...  2021-12-19  blend
101  validation_example/forecast_timeseries_d290771...  2021-12-19   clim
102  validation_example/forecast_timeseries_55f61a5...  2021-12-26  blend
103  validation_example/forecast_timeseries_42c20e5...  2021-12-26   clim

[104 rows x 3 columns]


In [None]:
# Each forecast file holds a single model and a single forecast_date
print(xr.load_dataset(fcst["file_name"].values[0]))

<xarray.Dataset> Size: 1kB
Dimensions:                 (quantiles: 23, lead_weekly: 5, nbnds: 2,
                             location: 1)
Coordinates:
  * quantiles               (quantiles) float64 184B 0.01 0.025 ... 0.975 0.99
    forecast_period_weekly  (lead_weekly, nbnds) datetime64[ns] 80B 2020-12-3...
  * lead_weekly             (lead_weekly) int32 20B 1 2 3 4 5
    lat                     (location) float64 8B 26.12
    lon                     (location) float64 8B -97.38
    month                   int32 4B 12
    forecast_date_weekly    datetime64[ns] 8B 2020-12-30
Dimensions without coordinates: nbnds, location
Data variables:
    vals_weekly             (lead_weekly, location, quantiles) float64 920B 1...
Attributes:
    clim_period:  ['1990-01-01', '2019-12-31']
    short_name:   temp
    timescale:    sub-seasonal
    region:       north-america


## Historical

Download daily historical values from [`data_timeseries`](https://sdk.salientpredictions.com/api/#salientsdk.data_timeseries), then aggregate them to match the forecast periods so that observations and forecasts are compared over the same dates.
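
The aggregation step can be sketched as follows: average the daily observations that fall inside each forecast period. This is an illustration on synthetic data, not the SDK's internal logic. The function name, the dates, and the half-open `[start, end)` treatment of period bounds are assumptions, and a mean is a natural aggregate for temperature but not necessarily for other variables.

```python
import numpy as np

# Synthetic daily observations: 14 days starting 2021-01-03
days = np.arange(np.datetime64("2021-01-03"), np.datetime64("2021-01-17"))
daily_temp = np.arange(14, dtype=float)  # 0, 1, ..., 13 degC

# Hypothetical forecast period bounds, one [start, end) row per lead
period_bounds = np.array(
    [["2021-01-03", "2021-01-10"], ["2021-01-10", "2021-01-17"]],
    dtype="datetime64[D]",
)


def aggregate_to_periods(times, values, bounds):
    """Average daily values within each [start, end) forecast period."""
    means = []
    for start, end in bounds:
        in_period = (times >= start) & (times < end)
        # Leave NaN where no observed days fall inside the period
        means.append(values[in_period].mean() if in_period.any() else np.nan)
    return np.array(means)


weekly_obs = aggregate_to_periods(days, daily_temp, period_bounds)
print(weekly_obs)
```

Each element of `weekly_obs` lines up with one forecast lead, which is what allows a forecast distribution to be scored against a single observed value.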


In [None]:
# Get additional historical data beyond end_date to make sure we have enough
# observed days to compare with the final forecast.
duration = {"sub-seasonal": 8 * 5, "seasonal": 31 * 3, "long-range": 95 * 4}[timescale]
hist = sk.data_timeseries(
    loc=loc,
    variable=var,
    field=fld,
    start=np.datetime64(start_date) - np.timedelta64(5, "D"),
    end=np.datetime64(end_date) + np.timedelta64(duration, "D"),
    frequency="daily",
    # reference_clim="30_yr",  implicitly uses 30 yr climatology
    verbose=False,
    force=False,
)
print(xr.load_dataset(hist))

<xarray.Dataset> Size: 7kB
Dimensions:  (time: 410, location: 1)
Coordinates:
  * time     (time) datetime64[ns] 3kB 2020-12-27 2020-12-28 ... 2022-02-09
    lat      (location) float64 8B 26.12
    lon      (location) float64 8B -97.38
Dimensions without coordinates: location
Data variables:
    vals     (time, location) float64 3kB 18.9 21.31 22.17 ... 15.58 14.28 14.44
Attributes:
    long_name:   2 metre temperature
    units:       degC
    clim_start:  1990-01-01
    clim_end:    2019-12-31


## Calculate Skill Metrics

Compare the forecast and observed datasets to see how well they match.


In [None]:
skill_fcst = sk.skill.crps(observations=hist, forecasts=fcst[fcst["model"] == "blend"])
print(skill_fcst)

<xarray.Dataset> Size: 76B
Dimensions:      (location: 1, lead_weekly: 5)
Coordinates:
    lat          (location) float64 8B 26.12
    lon          (location) float64 8B -97.38
  * lead_weekly  (lead_weekly) int32 20B 1 2 3 4 5
Dimensions without coordinates: location
Data variables:
    crps         (lead_weekly, location) float64 40B 0.4233 0.757 ... 0.8848
Attributes:
    clim_period:  ['1990-01-01', '2019-12-31']
    short_name:   crps
    timescale:    sub-seasonal
    region:       north-america
    long_name:    CRPS


### Calculate Relative Skill

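CRPS on its own lacks context: it is expressed in the units of the variable (here degC) and lower is better, but it has no built-in baseline. A "skill score" compares the forecast's CRPS with that of a reference forecast to give a relative value. In the example below, we compare the Salient blend with climatology (historical averages).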
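
A common definition of the CRPS skill score is `1 - CRPS_forecast / CRPS_reference`. The sketch below applies it to hypothetical values. That `sk.skill.crpss` uses exactly this formula is an assumption, although the values printed in this notebook appear consistent with it.

```python
import numpy as np


def crps_skill_score(crps_forecast, crps_reference):
    """Relative improvement of the forecast CRPS over the reference CRPS."""
    return 1.0 - np.asarray(crps_forecast) / np.asarray(crps_reference)


# Hypothetical CRPS values: better than, equal to, and worse than the reference
crps_forecast = [0.5, 1.0, 1.5]
crps_reference = [1.0, 1.0, 1.0]

score = crps_skill_score(crps_forecast, crps_reference)
print(score)
```

A positive score means the forecast beats the reference, zero means no improvement, and a negative score means the forecast is worse than the reference. Multiplying by 100 expresses the score as a percentage.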


In [None]:
# Calculate CRPS for the reference model.  This is the baseline that Salient Blend is compared against.
skill_ref = sk.skill.crps(observations=hist, forecasts=fcst[fcst["model"] == ref_model])

skill_score = sk.skill.crpss(forecast=skill_fcst, reference=skill_ref)

print(skill_score)

<xarray.Dataset> Size: 76B
Dimensions:      (location: 1, lead_weekly: 5)
Coordinates:
    lat          (location) float64 8B 26.12
    lon          (location) float64 8B -97.38
  * lead_weekly  (lead_weekly) int32 20B 1 2 3 4 5
Dimensions without coordinates: location
Data variables:
    crpss        (lead_weekly, location) float64 40B 0.5714 0.252 ... 0.1038
Attributes:
    short_name:  crpss
    long_name:   CRPSS


### Assemble all results into a single table


In [None]:
skill_table = (
    (
        xr.merge(
            [
                (skill_ref.rename({"crps": "Reference CRPS"})).round(2),
                skill_fcst.rename({"crps": "Salient CRPS"}).round(2),
                (skill_score * 100).rename({"crpss": "CRPS Skill Score (%)"}).round(1),
            ]
        )
        .to_dataframe()
        .reset_index()
    )
    .dropna(how="any")
    .drop(columns=["lat", "lon"])
)
if "location" in skill_table.columns and loc.location_file is None:
    skill_table = skill_table.drop(columns=["location"])


print(skill_table)

   location  lead_weekly  Reference CRPS  Salient CRPS  CRPS Skill Score (%)
0         0            1            0.99          0.42                  57.1
1         0            2            1.01          0.76                  25.2
2         0            3            1.01          0.81                  20.1
3         0            4            1.00          0.84                  16.1
4         0            5            0.99          0.88                  10.4


### Compare to pre-computed skill

Salient pre-calculates skill metrics as a convenience. Retrieve them with `hindcast_summary` to compare against the values calculated above.


In [None]:
skill_summ = pd.read_csv(
    sk.hindcast_summary(
        loc=loc,
        interp_method="linear",
        metric="crps",
        variable=var,
        timescale=timescale,
        reference=ref_model,
        split_set="test",
    )
)
print(skill_summ.drop(columns="Reference Model"))

     Lead  Reference CRPS  Salient CRPS  Salient CRPS Skill Score (%)
0  Week 1            0.96          0.38                          60.1
1  Week 2            0.96          0.72                          25.1
2  Week 3            0.96          0.84                          11.7
3  Week 4            0.96          0.88                           8.1
4  Week 5            0.96          0.88                           8.2


In [None]:
# "fast" mode does not download forecasts for the full test period,
# so the manual results aren't expected to match the pre-computed skill.
if fast:
    print("Set fast=False for consistent calculations.")
else:
    skill_fcst["crps"].plot(x="lead_weekly", label="manual calculation")
    plt.plot(skill_fcst.lead_weekly, skill_summ["Salient CRPS"], label="hindcast_summary")
    plt.legend()

Set fast=False for consistent calculations.
