# Probabilistic metric integration
After developing probabilistic metrics in #2232, we need to ensure they are compatible with useful features such as grid search over model parameters. Two key problems need to be solved for this:

1. Probabilistic metrics take the output of `predict_quantiles` or `predict_interval` (or `predict_proba`), whereas normal metrics take the output of `predict`. This means we need to change which predictions are used inside the grid search.

2. Some probabilistic metrics have their own hyperparameters, for example the quantile used in a pinball loss. Currently this is inferred from the input data; for a grid search, however, we will need some way to tell it which quantile to produce.
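
To make the role of the quantile hyperparameter concrete, here is a minimal sketch of the textbook pinball (quantile) loss for a single quantile level. The function name `pinball_loss` and the toy numbers are illustrative assumptions; sktime's `PinballLoss` may differ in details such as averaging or scaling.

```python
import numpy as np


def pinball_loss(y_true, y_quantile, alpha):
    """Mean pinball loss of a forecast of the alpha-quantile (textbook form)."""
    y_true = np.asarray(y_true, dtype=float)
    y_quantile = np.asarray(y_quantile, dtype=float)
    diff = y_true - y_quantile
    # Under-forecasts (diff > 0) are weighted by alpha,
    # over-forecasts (diff < 0) by (1 - alpha).
    return float(np.mean(np.maximum(alpha * diff, (alpha - 1) * diff)))


# Toy data (illustrative only): the same forecast scored at two quantile levels.
y_true = [10.0, 12.0, 14.0]
y_q = [11.0, 11.0, 11.0]

loss_median = pinball_loss(y_true, y_q, alpha=0.5)
loss_upper = pinball_loss(y_true, y_q, alpha=0.9)
```

The same forecast gets a different score at each `alpha`, which is why the metric and the forecaster have to agree on which quantiles are being produced and evaluated.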

To solve 1., we could either create some `set_default` function that determines which method the forecaster uses for predict (`_predict`, `_predict_quantiles` or `_predict_interval`), or use tags inside the grid search evaluation to retrieve the type of metric being used and call the corresponding predict function.
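
The tag-based option could look roughly like the sketch below. It uses stand-in classes so that it runs without sktime. The helper `predict_for_metric`, the lookup table, and the tag values `"pred"` and `"pred_interval"` are assumptions for illustration; only `"pred_quantiles"` is used elsewhere in this notebook.

```python
# Assumed mapping from the metric's "scitype:y_pred" tag to a forecaster method.
# Only "pred_quantiles" appears in this notebook; the other keys are assumptions.
PREDICT_METHOD_BY_TAG = {
    "pred": "predict",
    "pred_quantiles": "predict_quantiles",
    "pred_interval": "predict_interval",
}


def predict_for_metric(forecaster, scoring, fh, X=None):
    """Call the predict method whose output type matches the metric (hypothetical)."""
    tag = scoring.get_tag("scitype:y_pred")
    try:
        method_name = PREDICT_METHOD_BY_TAG[tag]
    except KeyError:
        raise ValueError(f"Unsupported prediction type: {tag!r}") from None
    return getattr(forecaster, method_name)(fh, X=X)


class DummyMetric:
    """Stand-in for a metric that exposes a y_pred type tag."""

    def __init__(self, y_pred_type):
        self._tags = {"scitype:y_pred": y_pred_type}

    def get_tag(self, name):
        return self._tags[name]


class DummyForecaster:
    """Stand-in forecaster; each method only reports which one was called."""

    def predict(self, fh, X=None):
        return "point"

    def predict_quantiles(self, fh, X=None):
        return "quantiles"

    def predict_interval(self, fh, X=None):
        return "interval"


point_result = predict_for_metric(DummyForecaster(), DummyMetric("pred"), fh=[1])
quantile_result = predict_for_metric(
    DummyForecaster(), DummyMetric("pred_quantiles"), fh=[1]
)
```

A lookup table keeps the dispatch in one place, so supporting a further prediction type only needs one more entry.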

To solve 2., we could do a small refactor of the probabilistic metrics, where we specify the hyperparameter(s) we want and the metric retrieves the matching data from the input (and raises an error if it isn't there). This lets a metric require a specific quantile, but reduces flexibility, as a user has to instantiate a new metric object for each different set of quantiles they want to evaluate.
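
As a rough sketch of that refactor, the hypothetical class below takes the quantile level(s) at construction time, looks up the matching columns in the prediction, and raises if one is missing. The class name `FixedQuantilePinballLoss` and the toy data are assumptions; the `("Quantiles", alpha)` column layout mirrors the `predict_quantiles` output shown later in this notebook.

```python
import numpy as np
import pandas as pd


class FixedQuantilePinballLoss:
    """Hypothetical pinball loss that requires specific quantile levels."""

    def __init__(self, alpha):
        # accept a single level or a collection of levels
        self.alpha = [alpha] if np.isscalar(alpha) else list(alpha)

    def __call__(self, y_true, y_pred):
        y_true = np.asarray(y_true, dtype=float)
        scores = {}
        for a in self.alpha:
            col = ("Quantiles", a)
            if col not in y_pred.columns:
                raise ValueError(f"y_pred has no column for quantile {a}")
            diff = y_true - y_pred[col].to_numpy()
            scores[a] = float(np.mean(np.maximum(a * diff, (a - 1) * diff)))
        return pd.Series(scores, name="pinball_loss")


# Toy quantile predictions (illustrative only) with three quantile levels.
columns = pd.MultiIndex.from_product([["Quantiles"], [0.1, 0.5, 0.9]])
y_pred = pd.DataFrame([[9.0, 11.0, 13.0], [9.0, 11.0, 13.0]], columns=columns)
y_true = [10.0, 12.0]

# Only the requested level is scored, even though y_pred contains more.
median_loss = FixedQuantilePinballLoss(alpha=0.5)(y_true, y_pred)
```

Because the levels live on the metric instance, an evaluation loop could read `alpha` from the metric and request exactly those quantiles from the forecaster, at the cost of one metric object per set of quantiles.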

In [1]:
# Basic imports
import warnings

warnings.simplefilter(action="ignore", category=FutureWarning)
import numpy as np
import pandas as pd

In [2]:
# Prep data/forecaster
from sktime.datasets import load_airline
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.theta import ThetaForecaster

y = np.log1p(load_airline())
y_train, y_test = temporal_train_test_split(y)
fh = np.arange(len(y_test)) + 1

f = ThetaForecaster(sp=12)
f.fit(y_train)
y_pred = f.predict(fh=fh)
q_pred = f.predict_quantiles(fh=fh, alpha=0.5)
i_pred = f.predict_interval(fh=fh)

In [3]:
q_pred.head()

        Quantiles
              0.5
1958-01   5.84779
1958-02  5.841117
1958-03  5.998219
1958-04  5.954095
1958-05  5.950747


In [4]:
i_pred.head()

         Coverage
              0.9
            lower     upper
1958-01  5.771386  5.924195
1958-02  5.750227  5.932007
1958-03  5.894854  6.101584
1958-04  5.839605  6.068584
1958-05  5.826123  6.075371


In [5]:
# Define probabilistic metric
from sktime.performance_metrics.forecasting.probabilistic import PinballLoss

loss = PinballLoss()

In [6]:
loss(y_test, q_pred)

        0.5
0  0.026143


In [7]:
from sktime.forecasting.model_evaluation import evaluate
from sktime.forecasting.model_selection import (
    ExpandingWindowSplitter,
    ForecastingGridSearchCV,
)

cv = ExpandingWindowSplitter(
    initial_window=24, step_length=12, fh=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
)

param_grid = {"sp": [6, 12]}

gcv = ForecastingGridSearchCV(f, cv, param_grid, scoring=loss)

In [8]:
gcv.fit(y_test)

KeyError: 'mean_test_PinballLoss'

`ForecastingGridSearchCV` relies on `sktime.forecasting.model_evaluation.evaluate` to compute metric scores, so this is what we need to change to make it work. It also has its own `score()` function, which could be changed as well, but this isn't used in fitting.

In [None]:
evaluate(f, cv, y_test, scoring=loss)

TypeError: ['y_pred should be a pd.DataFrame', 'y_pred should be a pd.DataFrame']

If we naively substitute a quantile loss for the normal loss, we get an input error (as expected).

We will first try changing the evaluate function.

In [None]:
import time

from sklearn.base import clone

from sktime.forecasting.base import ForecastingHorizon
from sktime.utils.validation.forecasting import (
    check_cv,
    check_fh,
    check_scoring,
    check_X,
)
from sktime.utils.validation.series import check_series


def evaluate(
    forecaster,
    cv,
    y,
    X=None,
    strategy="refit",
    scoring=None,
    fit_params=None,
    return_data=False,
):
    """Evaluate forecaster using timeseries cross-validation.

    Parameters
    ----------
    forecaster : sktime.forecaster
        Any forecaster
    cv : Temporal cross-validation splitter
        Splitter of how to split the data into test data and train data
    y : pd.Series
        Target time series to which to fit the forecaster.
    X : pd.DataFrame, default=None
        Exogenous variables
    strategy : {"refit", "update"}
        Must be "refit" or "update". The strategy defines whether the `forecaster` is
        only fitted on the first train window data and then updated, or always refitted.
    scoring : subclass of sktime.performance_metrics.BaseMetric, default=None.
        Used to get a score function that takes y_pred and y_test arguments
        and accept y_train as keyword argument.
        If None, then uses scoring = MeanAbsolutePercentageError(symmetric=True).
    fit_params : dict, default=None
        Parameters passed to the `fit` call of the forecaster.
    return_data : bool, default=False
        Returns three additional columns in the DataFrame, by default False.
        The cells of the columns contain each a pd.Series for y_train,
        y_pred, y_test.

    Returns
    -------
    pd.DataFrame
        DataFrame that contains several columns with information regarding each
        refit/update and prediction of the forecaster.
    """
    _check_strategy(strategy)
    cv = check_cv(cv, enforce_start_with_window=True)
    scoring = check_scoring(scoring)
    y = check_series(
        y,
        enforce_univariate=forecaster.get_tag("scitype:y") == "univariate",
        enforce_multivariate=forecaster.get_tag("scitype:y") == "multivariate",
    )
    X = check_X(X)
    fit_params = {} if fit_params is None else fit_params

    # Define score name.
    score_name = "test_" + scoring.name

    # Initialize dataframe.
    results = pd.DataFrame()

    # Run temporal cross-validation.
    for i, (train, test) in enumerate(cv.split(y)):
        # split data
        y_train, y_test, X_train, X_test = _split(y, X, train, test, cv.fh)

        # create forecasting horizon
        fh = ForecastingHorizon(y_test.index, is_relative=False)

        # fit/update
        start_fit = time.perf_counter()
        if i == 0 or strategy == "refit":
            forecaster = clone(forecaster)
            forecaster.fit(y_train, X_train, fh=fh, **fit_params)

        else:  # if strategy == "update":
            forecaster.update(y_train, X_train)
        fit_time = time.perf_counter() - start_fit

        # predict
        start_pred = time.perf_counter()
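        # pick the predict method matching the metric's expected input;
        # fit_params belong to `fit` only and are not forwarded here
        if scoring.get_tag("scitype:y_pred") == "pred_quantiles":
            y_pred = forecaster.predict_quantiles(fh, X=X_test)
        else:
            y_pred = forecaster.predict(fh, X=X_test)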

        pred_time = time.perf_counter() - start_pred

        # score
        score = scoring(y_test, y_pred, y_train=y_train)

        # save results
        results = results.append(
            {
                score_name: score,
                "fit_time": fit_time,
                "pred_time": pred_time,
                "len_train_window": len(y_train),
                "cutoff": forecaster.cutoff,
                "y_train": y_train if return_data else np.nan,
                "y_test": y_test if return_data else np.nan,
                "y_pred": y_pred if return_data else np.nan,
            },
            ignore_index=True,
        )

    # post-processing of results
    if not return_data:
        results = results.drop(columns=["y_train", "y_test", "y_pred"])
    results["len_train_window"] = results["len_train_window"].astype(int)

    return results


def _split(y, X, train, test, fh):
    """Split y and X for given train and test set indices."""
    y_train = y.iloc[train]
    y_test = y.iloc[test]

    cutoff = y_train.index[-1]
    fh = check_fh(fh)
    fh = fh.to_relative(cutoff)

    if X is not None:
        X_train = X.iloc[train, :]

        # We need to expand test indices to a full range, since some forecasters
        # require the full range of exogenous values.
        test = np.arange(test[0] - fh.min(), test[-1]) + 1
        X_test = X.iloc[test, :]
    else:
        X_train = None
        X_test = None

    return y_train, y_test, X_train, X_test


def _check_strategy(strategy):
    """Assert strategy value.

    Parameters
    ----------
    strategy : str
        strategy of how to evaluate a forecaster

    Raises
    ------
    ValueError
        If strategy value is not in expected values, raise error.
    """
    valid_strategies = ("refit", "update")
    if strategy not in valid_strategies:
        raise ValueError(f"`strategy` must be one of {valid_strategies}")

In [None]:
evaluate(f, cv, y_test, scoring=loss)

       0.05      0.95
0  0.008705  0.007918


                test_PinballLoss  fit_time  pred_time  len_train_window   cutoff
0  0.05 0.95 0 0.008705 0.007918   0.00721   0.008943                24  1959-12


## Other solution
A different option would be to change forecasters so that they include a default implementation for predict.

```{python}
class Forecaster:
    def __init__(self):
        self.pred_default = "point"
        # map each prediction type to the method that produces it
        self.pred_types = {
            "point": self._predict,
            "quantile": self._predict_quantiles,
            "interval": self._predict_interval,
        }

    def _predict(self, X, fh):
        raise NotImplementedError()

    def _predict_quantiles(self, X, fh):
        raise NotImplementedError()

    def _predict_interval(self, X, fh):
        raise NotImplementedError()

    def predict(self, X, fh, pred_type=None):
        # fall back to the forecaster's default prediction type
        if pred_type is None:
            pred_type = self.pred_default
        return self.pred_types[pred_type](X, fh)
```

This would require changing the base class of forecasters, and it still wouldn't solve the issue of how we pass the required quantiles to the predict function.