[DOC] forecaster tutorial: multivariate forecasting, probabilistic forecasting (#2041)

This PR adds to the forecasting tutorial:
* a section on multivariate forecasting
* a section on prediction intervals and predictive quantiles
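
The new section relates the `coverage` argument of `predict_interval` to the `alpha` argument of `predict_quantiles`. As a minimal sketch of that arithmetic only (plain Python, no `sktime` calls; the helper names `coverage_to_quantiles` and `quantile_to_coverage` are hypothetical and not part of the library), assuming symmetric intervals as described in the notebook text:

```python
def coverage_to_quantiles(coverage):
    """Return the (lower, upper) quantile levels of a symmetric interval.

    Hypothetical helper for illustration: a nominal coverage of 0.9
    corresponds to the 0.05 and 0.95 quantiles.
    """
    if not 0 < coverage < 1:
        raise ValueError("coverage must lie strictly between 0 and 1")
    return 0.5 - coverage / 2, 0.5 + coverage / 2


def quantile_to_coverage(alpha):
    """Return the coverage of the symmetric interval bounded by the alpha-quantile.

    For alpha < 0.5 the quantile is the lower bound, with coverage
    (0.5 - alpha) * 2; for alpha > 0.5 it is the upper bound, with
    coverage (alpha - 0.5) * 2. Both cases reduce to abs(alpha - 0.5) * 2.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return abs(alpha - 0.5) * 2


lower, upper = coverage_to_quantiles(0.95)
print(f"coverage 0.95 -> quantiles {lower:.3f} and {upper:.3f}")
print(f"alpha 0.975 -> coverage {quantile_to_coverage(0.975):.2f}")
```

Under these assumptions, the bounds returned by `predict_interval` for `coverage=0.95` should match the quantile predictions for `alpha=[0.025, 0.975]`.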
kejsitake committed Mar 23, 2022
1 parent 52edddb commit 81bfb34
Showing 2 changed files with 197 additions and 14 deletions.
210 changes: 196 additions & 14 deletions examples/01_forecasting.ipynb
Expand Up @@ -60,7 +60,8 @@
" * [1.2.1 Basic deployment workflow in a nutshell](#section_1_2_1)\n",
" * [1.2.2 Forecasters that require the horizon already in `fit`](#section_1_2_2)\n",
" * [1.2.3 Forecasters that can make use of exogeneous data](#section_1_2_3)\n",
" * [1.2.4 Prediction intervals](#section_1_2_4) \n",
" * [1.2.4 Multivariate Forecasters](#section_1_2_4)\n",
" * [1.2.5 Prediction intervals and quantile forecasts](#section_1_2_5) \n",
" * [1.3 basic evaluation workflow - evaluating a batch of forecasts against ground truth observations](#section_1_3) \n",
" * [1.3.1 The basic batch forecast evaluation workflow in a nutshell - function metric interface](#section_1_3_1)\n",
" * [1.3.2 The basic batch forecast evaluation workflow in a nutshell - metric class interface](#section_1_3_2) \n",
Expand Down Expand Up @@ -593,11 +594,107 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1.2.4 prediction intervals<a class=\"anchor\" id=\"section_1_2_4\"></a>\n",
"#### 1.2.4. multivariate forecasting <a class=\"anchor\" id=\"section_1_2_4\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Not all forecasters in sktime are multivariate. Some examples of multivariate forecasters are: `MultiplexForecaster`, `EnsembleForecaster`,`TransformedTargetForecaster` etc. In order to determine is a forecaster can be multivariate, one can look at the `scitype:y` in `tags`, which should be set to `multivariate` or '`both`. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to find the complete list of multivariate forecasters you can use the code below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sktime.registry import all_estimators\n",
"\n",
"for forecaster in all_estimators(filter_tags={\"scitype:y\": [\"multivariate\", \"both\"]}):\n",
" print(forecaster[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is an example of the general workflow of multivariate `ColumnEnsembleForecaster` using the longley dataset from `sktime.datasets`. The workflow is the same as in the univariate forecasters, but the input has more than one variables (columns)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sktime.datasets import load_longley\n",
"from sktime.forecasting.compose import ColumnEnsembleForecaster\n",
"from sktime.forecasting.exp_smoothing import ExponentialSmoothing\n",
"from sktime.forecasting.trend import PolynomialTrendForecaster\n",
"\n",
"`sktime` provides a unified interface to return prediction interval when forecasting. This is possible directly in the `predict` function, by setting the `return_pred_int` argument to `True`. The `predict` method then returns a second argument, Not all forecasters are capable of returning prediction intervals, in which case an error will be raised.\n",
"_, y = load_longley()\n",
"\n",
"Obtaining prediction intervals can be done as part of any workflow involving `predict`, by adding the argument `return_pred_int` - below, we illustrate this by modifying the basic workflow in Section 1.2:"
"y = y.drop(columns=[\"UNEMP\", \"ARMED\", \"POP\"])\n",
"\n",
"forecasters = [\n",
" (\"trend\", PolynomialTrendForecaster(), 0),\n",
" (\"ses\", ExponentialSmoothing(trend=\"add\"), 1),\n",
"]\n",
"\n",
"forecaster = ColumnEnsembleForecaster(forecasters=forecasters)\n",
"forecaster.fit(y, fh=[1, 2, 3])\n",
"\n",
"y_pred = forecaster.predict()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The input to the multivariate forecaster `y` is a `pandas.DataFrame` where each column is a variable."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result of the multivariate forecaster `y_pred` is a `pandas.DataFrame` where columns are the predicted values for each variable. The variables in `y_pred` are the same as in `y`, the input to the multivariate forecaster."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_pred"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1.2.5 prediction intervals and quantile forecasts <a class=\"anchor\" id=\"section_1_2_5\"></a>\n",
"\n",
"`sktime` provides a unified interface to return prediction interval when forecasting. This is possible using the `predict_interval` function. Not all forecasters are capable of returning prediction intervals, in which case an error will be raised. If a forecaster can return prediction intervals `capability:pred_int` in `tags` dictionary should be set to `True`.\n"
]
},
{
Expand All @@ -615,24 +712,53 @@
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# simple workflow\n",
"y = load_airline()\n",
"\n",
"fh = np.arange(1, 13)\n",
"\n",
"forecaster = ThetaForecaster(sp=12)\n",
"forecaster.fit(y)\n",
"\n",
"# setting return_pred_int argument to True; alpha determines percentiles\n",
"# intervals are lower = alpha/2-percentile, upper = (1-alpha/2)-percentile\n",
"alpha = 0.05 # 2.5%/97.5% prediction intervals\n",
"y_pred, y_pred_ints = forecaster.predict(fh, return_pred_int=True, alpha=alpha)"
"# interval coverage determines percentiles\n",
"# percentiles for an interval are:\n",
"# lower bound = 50 - coverage/2 upper bound = 50 + (coverage/2)\n",
"coverage = 0.95 # 2.5%/97.5% prediction intervals\n",
"y_pred = forecaster.predict(fh)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_pred"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`y_pred_ints` is a `pandas.DataFrame` with columns `lower` and `upper`, and rows the indices for which forecasts were made (same as in `y_pred`). Entries are lower/upper (as column name) bound of the nominal alpha predictive interval for the index in the same row."
"#### predict_interval"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`coverage` argument can be interpreted as the nominal coverage of the prediction interval and it can be a `float` of `list of floats`. The interval is symmetric, for example, a coverage of `90` returns values at the lower: `5 (50 - (coverage/2)` and upper: `95 (50 + (coverage/2)` percentiles."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_pred_ints = forecaster.predict_interval(fh, coverage=coverage)"
]
},
{
Expand All @@ -644,6 +770,13 @@
"y_pred_ints"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`y_pred_ints` is a `pandas.DataFrame` with third-level columns `lower` and `upper`, and rows the indices for which forecasts were made (same as in `y_pred`). Entries are lower/upper (as column name) bound of the nominal coverage predictive interval for the index in the same row. The first level is variable name from y in fit, second level coverage fractions for which intervals were computed, in the same order as in input `coverage`."
]
},
{
"cell_type": "markdown",
"metadata": {},
Expand All @@ -666,14 +799,16 @@
"metadata": {},
"outputs": [],
"source": [
"fig, ax = plot_series(y, y_pred, labels=[\"y\", \"y_pred\"])\n",
"from sktime.utils import plotting\n",
"\n",
"fig, ax = plotting.plot_series(y, y_pred, labels=[\"y\", \"y_pred\"])\n",
"ax.fill_between(\n",
" ax.get_lines()[-1].get_xdata(),\n",
" y_pred_ints[\"lower\"],\n",
" y_pred_ints[\"upper\"],\n",
" y_pred_ints[\"Coverage\"][coverage][\"lower\"],\n",
" y_pred_ints[\"Coverage\"][coverage][\"upper\"],\n",
" alpha=0.2,\n",
" color=ax.get_lines()[-1].get_c(),\n",
" label=f\"{1 - alpha}% prediction intervals\",\n",
" label=f\"{coverage}% prediction intervals\",\n",
")\n",
"ax.legend();"
]
Expand All @@ -682,7 +817,53 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"NOTE: this should be turned into a one-liner, by moving this to `utils.plotting` - contributions are appreciated."
"#### predict_quantile"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"sktime offers `predict_quantile` as a unified interface to return quantile values of predictions. Similar to `predict_interval` not all forecasters can return prediction quantiles. All forecasters that can return prediction intervals can also return prediction quantiles (because they are probabilistic).\n",
"\n",
"`alpha` argument can be interpreted as value at the percentile prediction for each variable and similar to the case of the predict_interval, can be a `float` of `list of floats`. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_pred_quantiles = forecaster.predict_quantiles(fh, alpha=[0.275, 0.975])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`y_pred_quantiles`, the output of predict_quantiles is a `pandas.DataFrame` with columns the values of the percentiles, and rows the indices for which forecasts were made (same as in `y_pred`). Entries are the quantile predictions for each variable of the forecaster. The higer level indices of columns are an indicator of the variable we are returning quantiles for. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_pred_quantiles"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Conversion from quantile to interval bound:** \n",
"\n",
"\n",
"**alpha < 0.5:** The alpha-quantile predictions for a variable would be equal to the lower bound of the interval with coverage = (0.5 - alpha) * 2\n",
"\n",
"**alpha > 0.5:** The alpha-quantile predictions for a variable would be equal to the upper bound of the interval with coverage = (alpha - 0.5) * 2"
]
},
{
Expand Down Expand Up @@ -2714,6 +2895,7 @@
}
],
"metadata": {
"celltoolbar": "Raw Cell Format",
"hide_input": false,
"interpreter": {
"hash": "bc250fec99d1b72e5bb23d9fb06e1f1ac90e860438a1535c061277d2caf5ebfc"
Expand Down
1 change: 1 addition & 0 deletions sktime/forecasting/theta.py
Expand Up @@ -84,6 +84,7 @@ class ThetaForecaster(ExponentialSmoothing):

_fitted_param_names = ("initial_level", "smoothing_level")
_tags = {
"scitype:y": "univariate",
"ignores-exogeneous-X": True,
"capability:pred_int": True,
"requires-fh-in-fit": False,
Expand Down
