[DOC] forecaster tutorial: multivariate forecasting, probabilistic forecasting #2041

merged 3 commits into from Mar 23, 2022
210 changes: 196 additions & 14 deletions examples/01_forecasting.ipynb
@@ -60,7 +60,8 @@
" * [1.2.1 Basic deployment workflow in a nutshell](#section_1_2_1)\n",
" * [1.2.2 Forecasters that require the horizon already in `fit`](#section_1_2_2)\n",
" * [1.2.3 Forecasters that can make use of exogeneous data](#section_1_2_3)\n",
" * [1.2.4 Prediction intervals](#section_1_2_4) \n",
" * [1.2.4 Multivariate Forecasters](#section_1_2_4)\n",
" * [1.2.5 Prediction intervals and quantile forecasts](#section_1_2_5) \n",
" * [1.3 basic evaluation workflow - evaluating a batch of forecasts against ground truth observations](#section_1_3) \n",
" * [1.3.1 The basic batch forecast evaluation workflow in a nutshell - function metric interface](#section_1_3_1)\n",
" * [1.3.2 The basic batch forecast evaluation workflow in a nutshell - metric class interface](#section_1_3_2) \n",
@@ -613,11 +614,107 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1.2.4 prediction intervals<a class=\"anchor\" id=\"section_1_2_4\"></a>\n",
"#### 1.2.4. multivariate forecasting <a class=\"anchor\" id=\"section_1_2_4\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Not all forecasters in sktime are multivariate. Some examples of multivariate forecasters are: `MultiplexForecaster`, `EnsembleForecaster`,`TransformedTargetForecaster` etc. In order to determine is a forecaster can be multivariate, one can look at the `scitype:y` in `tags`, which should be set to `multivariate` or '`both`. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In order to find the complete list of multivariate forecasters you can use the code below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sktime.registry import all_estimators\n",
"\n",
"for forecaster in all_estimators(filter_tags={\"scitype:y\": [\"multivariate\", \"both\"]}):\n",
" print(forecaster[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is an example of the general workflow of multivariate `ColumnEnsembleForecaster` using the longley dataset from `sktime.datasets`. The workflow is the same as in the univariate forecasters, but the input has more than one variables (columns)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sktime.datasets import load_longley\n",
"from sktime.forecasting.compose import ColumnEnsembleForecaster\n",
"from sktime.forecasting.exp_smoothing import ExponentialSmoothing\n",
"from sktime.forecasting.trend import PolynomialTrendForecaster\n",
"\n",
"`sktime` provides a unified interface to return prediction interval when forecasting. This is possible directly in the `predict` function, by setting the `return_pred_int` argument to `True`. The `predict` method then returns a second argument, Not all forecasters are capable of returning prediction intervals, in which case an error will be raised.\n",
"_, y = load_longley()\n",
"\n",
"Obtaining prediction intervals can be done as part of any workflow involving `predict`, by adding the argument `return_pred_int` - below, we illustrate this by modifying the basic workflow in Section 1.2:"
"y = y.drop(columns=[\"UNEMP\", \"ARMED\", \"POP\"])\n",
"\n",
"forecasters = [\n",
" (\"trend\", PolynomialTrendForecaster(), 0),\n",
" (\"ses\", ExponentialSmoothing(trend=\"add\"), 1),\n",
"]\n",
"\n",
"forecaster = ColumnEnsembleForecaster(forecasters=forecasters)\n",
"forecaster.fit(y, fh=[1, 2, 3])\n",
"\n",
"y_pred = forecaster.predict()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The input to the multivariate forecaster `y` is a `pandas.DataFrame` where each column is a variable."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result of the multivariate forecaster `y_pred` is a `pandas.DataFrame` where columns are the predicted values for each variable. The variables in `y_pred` are the same as in `y`, the input to the multivariate forecaster."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_pred"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1.2.5 prediction intervals and quantile forecasts <a class=\"anchor\" id=\"section_1_2_5\"></a>\n",
"\n",
"`sktime` provides a unified interface to return prediction interval when forecasting. This is possible using the `predict_interval` function. Not all forecasters are capable of returning prediction intervals, in which case an error will be raised. If a forecaster can return prediction intervals `capability:pred_int` in `tags` dictionary should be set to `True`.\n"
]
},
{
@@ -635,24 +732,53 @@
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# simple workflow\n",
"y = load_airline()\n",
"\n",
"fh = np.arange(1, 13)\n",
"\n",
"forecaster = ThetaForecaster(sp=12)\n",
"forecaster.fit(y)\n",
"\n",
"# setting return_pred_int argument to True; alpha determines percentiles\n",
"# intervals are lower = alpha/2-percentile, upper = (1-alpha/2)-percentile\n",
"alpha = 0.05 # 2.5%/97.5% prediction intervals\n",
"y_pred, y_pred_ints = forecaster.predict(fh, return_pred_int=True, alpha=alpha)"
"# interval coverage determines percentiles\n",
"# percentiles for an interval are:\n",
"# lower bound = 50 - coverage/2 upper bound = 50 + (coverage/2)\n",
"coverage = 0.95 # 2.5%/97.5% prediction intervals\n",
"y_pred = forecaster.predict(fh)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_pred"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`y_pred_ints` is a `pandas.DataFrame` with columns `lower` and `upper`, and rows the indices for which forecasts were made (same as in `y_pred`). Entries are lower/upper (as column name) bound of the nominal alpha predictive interval for the index in the same row."
"#### predict_interval"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`coverage` argument can be interpreted as the nominal coverage of the prediction interval and it can be a `float` of `list of floats`. The interval is symmetric, for example, a coverage of `90` returns values at the lower: `5 (50 - (coverage/2)` and upper: `95 (50 + (coverage/2)` percentiles."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_pred_ints = forecaster.predict_interval(fh, coverage=coverage)"
]
},
{
@@ -664,6 +790,13 @@
"y_pred_ints"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`y_pred_ints` is a `pandas.DataFrame` with third-level columns `lower` and `upper`, and rows the indices for which forecasts were made (same as in `y_pred`). Entries are lower/upper (as column name) bound of the nominal coverage predictive interval for the index in the same row. The first level is variable name from y in fit, second level coverage fractions for which intervals were computed, in the same order as in input `coverage`."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -686,14 +819,16 @@
"metadata": {},
"outputs": [],
"source": [
"fig, ax = plot_series(y, y_pred, labels=[\"y\", \"y_pred\"])\n",
"from sktime.utils import plotting\n",
"\n",
"fig, ax = plotting.plot_series(y, y_pred, labels=[\"y\", \"y_pred\"])\n",
"ax.fill_between(\n",
" ax.get_lines()[-1].get_xdata(),\n",
" y_pred_ints[\"lower\"],\n",
" y_pred_ints[\"upper\"],\n",
" y_pred_ints[\"Coverage\"][coverage][\"lower\"],\n",
" y_pred_ints[\"Coverage\"][coverage][\"upper\"],\n",
" alpha=0.2,\n",
" color=ax.get_lines()[-1].get_c(),\n",
" label=f\"{1 - alpha}% prediction intervals\",\n",
" label=f\"{coverage}% prediction intervals\",\n",
")\n",
"ax.legend();"
]
@@ -702,7 +837,53 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"NOTE: this should be turned into a one-liner, by moving this to `utils.plotting` - contributions are appreciated."
"#### predict_quantile"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"sktime offers `predict_quantile` as a unified interface to return quantile values of predictions. Similar to `predict_interval` not all forecasters can return prediction quantiles. All forecasters that can return prediction intervals can also return prediction quantiles (because they are probabilistic).\n",
"\n",
"`alpha` argument can be interpreted as value at the percentile prediction for each variable and similar to the case of the predict_interval, can be a `float` of `list of floats`. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_pred_quantiles = forecaster.predict_quantiles(fh, alpha=[0.275, 0.975])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`y_pred_quantiles`, the output of predict_quantiles is a `pandas.DataFrame` with columns the values of the percentiles, and rows the indices for which forecasts were made (same as in `y_pred`). Entries are the quantile predictions for each variable of the forecaster. The higer level indices of columns are an indicator of the variable we are returning quantiles for. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_pred_quantiles"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Conversion from quantile to interval bound:** \n",
"\n",
"\n",
"**alpha < 0.5:** The alpha-quantile predictions for a variable would be equal to the lower bound of the interval with coverage = (0.5 - alpha) * 2\n",
"\n",
"**alpha > 0.5:** The alpha-quantile predictions for a variable would be equal to the upper bound of the interval with coverage = (alpha - 0.5) * 2"
]
},
{
@@ -2766,6 +2947,7 @@
}
],
"metadata": {
"celltoolbar": "Raw Cell Format",
"hide_input": false,
"interpreter": {
"hash": "bc250fec99d1b72e5bb23d9fb06e1f1ac90e860438a1535c061277d2caf5ebfc"
1 change: 1 addition & 0 deletions sktime/forecasting/theta.py
@@ -84,6 +84,7 @@ class ThetaForecaster(ExponentialSmoothing):

_fitted_param_names = ("initial_level", "smoothing_level")
_tags = {
"scitype:y": "univariate",
"ignores-exogeneous-X": True,
"capability:pred_int": True,
"requires-fh-in-fit": False,