[ENH] probabilistic forecasting rework part 2 - distribution forecast metrics log-loss, CRPS #4276

Merged
merged 105 commits into main from proba-metrics on Mar 19, 2023

Conversation

fkiraly
Collaborator

@fkiraly fkiraly commented Mar 1, 2023

Experimental PR that introduces distribution forecast metrics, based on the back-end agnostic, pandas-like, skbase-based distribution object interface in #4190.

The design allows both approximate and exact calculations, because the responsibility for providing values of certain mathematical expressions rests with the base distribution object.
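As a rough illustration of that split, the sketch below uses hypothetical classes (`ToyDistribution` and `ToyNormal` are not part of `sktime`): the base class supplies a sampling-based approximation of the expectation E|X - x|, and a concrete distribution overrides it with a closed form. The method name `energy` and its signature here are assumptions made for the example.

```python
import math
import random


class ToyDistribution:
    """Hypothetical base class offering approximate, sampling-based defaults."""

    def sample(self, rng):
        raise NotImplementedError

    def energy(self, x, n_samples=20000, seed=0):
        """Approximate E|X - x| by Monte Carlo; subclasses may override."""
        rng = random.Random(seed)
        total = sum(abs(self.sample(rng) - x) for _ in range(n_samples))
        return total / n_samples


class ToyNormal(ToyDistribution):
    """Hypothetical normal distribution with an exact energy formula."""

    def __init__(self, mu, sigma):
        self.mu = mu
        self.sigma = sigma

    def sample(self, rng):
        return rng.gauss(self.mu, self.sigma)

    def energy(self, x, **kwargs):
        """Exact E|X - x| for a normal distribution (closed form)."""
        z = (x - self.mu) / self.sigma
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
        return self.sigma * (z * (2 * cdf - 1) + 2 * pdf)


d = ToyNormal(0.0, 1.0)
exact = d.energy(0.5)                    # closed form from the subclass
approx = ToyDistribution.energy(d, 0.5)  # sampling fallback from the base class
```

A metric written against `energy` does not need to know which of the two paths produced the value.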

The implementation is still preliminary, as the base class duplicates the existing probabilistic, non-distributional metrics. Later changes should not alter the interface, though.

@fkiraly fkiraly added the API design API design & software architecture label Mar 1, 2023
@fkiraly fkiraly requested a review from RNKuhns as a code owner March 4, 2023 18:13
@fkiraly fkiraly changed the title [ENH] experimental - distribution forecast metrics log-loss, CRPS [ENH] probabilistic forecasting rework part 2 - distribution forecast metrics log-loss, CRPS Mar 5, 2023
fkiraly added a commit that referenced this pull request Mar 18, 2023
…ability distributions (#4190)

This experimental PR introduces backend agnostic probability distributions, based on `BaseObject`, towards #4359

Currently, `BaseForecaster.predict_proba` returns a `tensorflow_probability` object, which makes `tensorflow_probability` a heavy dependency (>300MB in the Python environment).

This PR, instead, introduces a `BaseObject`-based interface for probability distributions, which is back-end agnostic, but has `tensorflow-probability` as one of the options for a back-end.

The conceptual model mixes indexed objects as in `pandas` and the array-based distributions in `tensorflow-probability`, i.e., array distributions which have `pandas`-based `index`, `columns`, and can be sub-set using `loc` and `iloc` indexing, similar to `pd.DataFrame`.
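A minimal sketch of this conceptual model, assuming only `pandas`: the class `ToyIndexedNormal` and its helper `_Indexer` are hypothetical names for illustration and do not reproduce the actual `sktime` implementation. Parameters are stored as `pd.DataFrame` objects, so `index`, `columns`, `loc`, and `iloc` can be delegated to `pandas`.

```python
import pandas as pd


class ToyIndexedNormal:
    """Hypothetical array of normal distributions with pandas-style labels."""

    def __init__(self, mu, sigma, index=None, columns=None):
        self.mu = pd.DataFrame(mu, index=index, columns=columns)
        # align sigma to the labels of mu
        self.sigma = pd.DataFrame(
            sigma, index=self.mu.index, columns=self.mu.columns
        )

    @property
    def index(self):
        return self.mu.index

    @property
    def columns(self):
        return self.mu.columns

    @property
    def shape(self):
        return self.mu.shape

    def mean(self):
        return self.mu.copy()

    @property
    def loc(self):
        return _Indexer(self, "loc")

    @property
    def iloc(self):
        return _Indexer(self, "iloc")


class _Indexer:
    """Forwards the key to the parameter frames and rebuilds the distribution.

    Simplification: keys must keep the result two-dimensional (lists or slices).
    """

    def __init__(self, dist, kind):
        self.dist = dist
        self.kind = kind

    def __getitem__(self, key):
        mu = getattr(self.dist.mu, self.kind)[key]
        sigma = getattr(self.dist.sigma, self.kind)[key]
        return ToyIndexedNormal(mu, sigma)


idx = pd.date_range("2023-01-01", periods=3, freq="D")
d = ToyIndexedNormal(
    mu=[[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]],
    sigma=[[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],
    index=idx,
    columns=["a", "b"],
)
by_position = d.iloc[[0, 2], [1]]  # rows 0 and 2, column "b"
by_label = d.loc[idx[:2], ["a"]]   # first two dates, column "a"
```

Both subsets are again distribution objects that carry their own `index` and `columns`, which is the property the row/column subsetting relies on.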

Advantages:
* row/column subsetting of the return, compatibility with `pandas` indices
* decouples the distributions from the `tensorflow` back-end, foundation for distribution metrics/losses
* allows adding methods to the distribution outside `tensorflow`, in `sktime`, e.g., a heuristic integrated cdf or the energy statistics required for the CRPS metric (see `energy` method)
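To make the link between energy statistics and the metrics concrete, here is a hedged, dependency-free sketch for a single normal forecast. It uses the standard identity CRPS(F, y) = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from F, and defines log-loss as the negative log density at the observation. All function names are hypothetical and chosen for this example; they are not the `sktime` API.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def normal_cross_energy(y, mu, sigma):
    """E|X - y| for X ~ N(mu, sigma^2), in closed form."""
    z = (y - mu) / sigma
    phi = normal_pdf(z, 0.0, 1.0)
    big_phi = normal_cdf(z, 0.0, 1.0)
    return sigma * (z * (2 * big_phi - 1) + 2 * phi)


def normal_self_energy(sigma):
    """E|X - X'| for independent X, X' ~ N(mu, sigma^2)."""
    return 2 * sigma / math.sqrt(math.pi)


def crps(y, mu, sigma):
    """CRPS from the two energy terms; lower is better."""
    return normal_cross_energy(y, mu, sigma) - 0.5 * normal_self_energy(sigma)


def log_loss(y, mu, sigma):
    """Negative log density of the forecast at the observation."""
    return -math.log(normal_pdf(y, mu, sigma))


# an observation far from the forecast mean scores worse on both metrics
close = (crps(0.1, 0.0, 1.0), log_loss(0.1, 0.0, 1.0))
far = (crps(3.0, 0.0, 1.0), log_loss(3.0, 0.0, 1.0))
```

For distributions without closed forms, the two energy terms could instead be approximated by sampling, without changing the `crps` function.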

STEP: sktime/enhancement-proposals#31
Related issue: #1746

Includes a full test suite for the interface, based on the `skbase` design.

Roadmap for subsequent PRs:

* part 2: probabilistic (distributional) metrics, see #4276
* part 3: use in forecasters as return object if `tensorflow-probability` is not present; deprecation for 0.18.0 or 0.19.0
* part 4: use in tuned forecasters
* further probability distributions native to `sktime`
* interfacing more distributions from `tfp`
@fkiraly fkiraly merged commit 4912bec into main Mar 19, 2023
20 checks passed
@fkiraly fkiraly deleted the proba-metrics branch March 19, 2023 11:59
@fkiraly fkiraly added the module:probability&simulation probability distributions and simulators label Apr 17, 2023
Labels
* API design: API design & software architecture
* enhancement: Adding new functionality
* implementing framework: Implementing or improving framework for learning tasks, e.g., base class functionality
* module:forecasting: forecasting module: forecasting, incl probabilistic and hierarchical forecasting
* module:metrics&benchmarking: metrics and benchmarking modules
* module:probability&simulation: probability distributions and simulators