# Consistent scores

A "consistent scoring function" is a scoring function for which following the forecast directive optimises the forecaster's expected score. It is important to use consistent scoring functions to avoid the "forecaster's dilemma" ([Lerch et al., 2017](https://projecteuclid.org/journals/statistical-science/volume-32/issue-1/Forecasters-Dilemma-Extreme-Events-and-Forecast-Evaluation/10.1214/16-STS588.full)), i.e. should a forecaster issue an honest forecast, or one that optimises their expected score? Consistent scoring functions are formally defined in [Gneiting, 2011](https://www.tandfonline.com/doi/abs/10.1198/jasa.2011.r10138).


The MSE is a consistent score for forecasting the mean, and the MAE is a consistent score for forecasting the median. But did you know that there is a whole family of scores that are consistent for predicting the mean, the median, or a quantile?

The consistent scoring module in `scores` provides access to these families of consistent scores, which can be used to emphasise predictive performance across desired decision thresholds.
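
As a rough illustration of what consistency means for the two familiar scores mentioned above, the sketch below searches over constant forecasts for a small made-up sample and finds which one minimises each mean score. The sample values and the helper name `best_constant_forecast` are hypothetical and not part of `scores`; only the Python standard library is used.

```python
# Made-up sample of observations (illustrative only)
observations = [0, 1, 2, 3, 14]
candidates = range(0, 15)  # candidate constant forecasts


def best_constant_forecast(score):
    """Return the candidate forecast with the lowest mean score."""
    def mean_score(forecast):
        return sum(score(forecast, obs) for obs in observations) / len(observations)
    return min(candidates, key=mean_score)


# Squared error and absolute error as pointwise scoring functions
best_for_mse = best_constant_forecast(lambda x, y: (x - y) ** 2)
best_for_mae = best_constant_forecast(lambda x, y: abs(x - y))
```

For this sample the squared-error search lands on the sample mean (4) and the absolute-error search lands on the sample median (2), which is the behaviour that consistency describes.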

### Example 1. A score consistent with the mean
Let's jump into an example that illustrates a scoring function that is consistent with predicting the mean value using `scores`. Suppose we want to evaluate the performance of expected daily precipitation forecasts, but we want to place increasing importance on correctly forecasting extreme values. 


We want to use the `consistent_expectile_score` in `scores` to evaluate the expected daily precipitation forecasts. Note that the mean relates to [expectiles](https://en.wikipedia.org/wiki/Expectile) in a way analogous to how the median relates to quantiles: the mean is the expectile at level 0.5, just as the median is the quantile at level 0.5.

Our consistent scoring function for expectiles is 

$$ S(x, y) =
    \begin{cases}
    (1 - \alpha)(\phi(y) - \phi(x) - \phi'(x)(y-x)), & y < x \\
    \alpha(\phi(y) - \phi(x) - \phi'(x)(y-x)), & x \leq y
    \end{cases}
$$

Where:
- $x$ is the forecast
- $y$ is the observation
- $\alpha$ is the expectile level
- $\phi$ is a [convex function](https://en.wikipedia.org/wiki/Convex_function) of a single variable
- $\phi'$ is the [subderivative](https://en.wikipedia.org/wiki/Subderivative) of $\phi$. The subderivative is a generalisation of the derivative for convex functions and coincides with the derivative when the convex function is differentiable
- $S(x,y)$ is the score.

To make the score consistent with the mean, set $\alpha=0.5$.
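
The piecewise definition above can be written out directly for a single forecast/observation pair. This is a minimal scalar sketch for illustration, not the `scores` implementation; the function name `expectile_score_pointwise` is hypothetical. With $\alpha = 0.5$ and $\phi(z) = z^2$ the score reduces to half the squared error, which ties back to the MSE being consistent for the mean.

```python
def expectile_score_pointwise(x, y, alpha, phi, phi_prime):
    """Evaluate S(x, y) for one forecast x and one observation y."""
    # Term shared by both branches of the piecewise definition
    core = phi(y) - phi(x) - phi_prime(x) * (y - x)
    # Weight by (1 - alpha) when y < x, and by alpha when x <= y
    weight = (1 - alpha) if y < x else alpha
    return weight * core


# phi(z) = z**2 with derivative 2z, and alpha = 0.5
half_squared_error = expectile_score_pointwise(
    2.0, 5.0, 0.5, phi=lambda z: z**2, phi_prime=lambda z: 2 * z
)
```

Here `half_squared_error` equals $0.5 \times (5 - 2)^2 = 4.5$, and swapping the forecast and observation gives the same value because $\alpha = 0.5$ weights both branches equally.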

Next we need to determine our $\phi(z)$ and $\phi'(z)$ functions. Let's assume that the importance of predicting extreme rainfall increases exponentially.

First we create the weighting function

$$\phi''(z) = \exp\left(\frac{z}{10}\right)$$

This will place increasing importance on more extreme rainfall thresholds.
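
To get a feel for how quickly this weighting grows, the snippet below evaluates $\exp(z/10)$ at a few rainfall amounts. The thresholds (0, 50 and 100) are arbitrary values picked for illustration.

```python
import math

# Arbitrary illustrative rainfall thresholds
thresholds = [0, 50, 100]
weights = [math.exp(z / 10) for z in thresholds]

# Weight at each threshold relative to the weight at zero
relative_weights = [w / weights[0] for w in weights]
```

The weight at 50 is roughly 148 times the weight at 0, and the weight at 100 is roughly 22,000 times the weight at 0.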

Next we need to integrate $\phi''(z)$ twice so that we can derive the functions for our consistent scoring function.

$$\phi'(z) = 10\exp\left(\frac{z}{10}\right)$$
and
$$\phi(z) = 100\exp\left(\frac{z}{10}\right)$$

Here both constants of integration have been taken to be zero.

These equations can be substituted into $S(x, y)$. Note that to use the `consistent_expectile_score` in `scores`, you only need to define $\phi$ and $\phi'$, not $S(x, y)$ itself.
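
A quick numerical sanity check can confirm that a chosen pair of functions fits together, i.e. that $\phi'$ really is the derivative of $\phi$ and that the derivative of $\phi'$ recovers the weighting function. The sketch below assumes both integration constants are taken as zero, giving $\phi(z) = 100\exp(z/10)$, and uses a central finite difference. The helper `central_difference` and the test point are hypothetical choices, not part of `scores`.

```python
import math


def phi(z):
    """phi with both integration constants set to zero (an assumption)."""
    return 100 * math.exp(z / 10)


def phi_prime(z):
    """First derivative of phi."""
    return 10 * math.exp(z / 10)


def weighting(z):
    """Second derivative of phi, i.e. the weighting function."""
    return math.exp(z / 10)


def central_difference(func, z, h=1e-5):
    """Approximate the derivative of func at z."""
    return (func(z + h) - func(z - h)) / (2 * h)


z0 = 20.0  # arbitrary test point
error_first = abs(central_difference(phi, z0) - phi_prime(z0))
error_second = abs(central_difference(phi_prime, z0) - weighting(z0))
```

Both errors should be tiny compared with the function values. A large mismatch would suggest a slip in the integration.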

Let's illustrate how this can be done in `scores` using some synthetic rainfall data.

In [49]:
from scores.continuous import consistent_expectile_score
import numpy as np
import pandas as pd
import xarray as xr

In [78]:
# First define phi and phi prime, matching the functions derived above
def xr_loss_func(s):
    """Phi: phi(s) = 100 * exp(s / 10)"""
    return 100 * np.exp(s / 10)

def xr_loss_func_prime(s):
    """Phi prime: phi'(s) = 10 * exp(s / 10), the derivative of phi"""
    return 10 * np.exp(s / 10)

In [68]:
# Generate some synthetic data
lat = np.linspace(-90, 90, 10)
lon = np.linspace(-180, 180, 20)
times = pd.date_range('2023-11-19', periods=5)

forecast = np.random.uniform(0, 100, size=(len(lat), len(lon), len(times)))
forecast = xr.DataArray(
    forecast,
    dims=('lat', 'lon', 'time'),
    coords={'lat': lat, 'lon': lon, 'time': times}
)

obs = forecast + np.random.random((len(lat), len(lon), len(times)))
obs = obs.clip(min=0)
obs = xr.DataArray(
    obs,
    dims=('lat', 'lon', 'time'),
    coords={'lat': lat, 'lon': lon, 'time': times}
)

In [79]:
# Calculate score
consistent_expectile_score(forecast, obs, alpha=0.5, phi=xr_loss_func, phi_prime=xr_loss_func_prime)

In [82]:
forecast