Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] quantile method for distributions, default implementation of forecaster predict_quantiles if predict_proba is present #4513

Merged
merged 4 commits into from Apr 28, 2023

Conversation

fkiraly
Copy link
Collaborator

@fkiraly fkiraly commented Apr 27, 2023

This PR introduces a quantile method to all sktime distributions.

This complements ppf (which also returns quantiles) in that it returns quantiles in the same format as forecasters' predict_quantiles, and broadcasts quantile points in the same way as in forecasters.

This PR also adds a default implementation of predict_quantiles if predict_proba is present, through calling quantiles of the predict_proba return.

This also enables the common usage pattern beyond 0.18.0, where quantile is called on the predict_proba return, similar to the tensorflow method, but in an sktime compatible mtype.

@fkiraly fkiraly added implementing framework Implementing or improving framework for learning tasks, e.g., base class functionality module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting enhancement Adding new functionality module:probability&simulation probability distributions and simulators labels Apr 27, 2023
@fkiraly
Copy link
Collaborator Author

fkiraly commented Apr 27, 2023

FYI @yarnabrina, re probabilistic predictions - would appreciate a reivew!

@fkiraly fkiraly changed the title [ENH] quantiles method for distributions, default implementation of forecaster predict_quantiles if predict_proba is present [ENH] quantile method for distributions, default implementation of forecaster predict_quantiles if predict_proba is present Apr 27, 2023
@yarnabrina
Copy link
Collaborator

Looks okay, but two questions:

  1. How do I try/test these? Which estimators has this capability as of now? I got none (just Normal and TFNormal distributions) with this: pytest --co -k "test_quantile" sktime/proba/tests/test_all_distrs.py.

  2. Since _predict_interval can work with _predict_quantiles, and you added support for _predict_quantiles using _predict_proba, can we extend _predict_proba support to _predict_interval as well?

@fkiraly
Copy link
Collaborator Author

fkiraly commented Apr 28, 2023

  1. How do I try/test these? Which estimators has this capability as of now? I got none (just Normal and TFNormal distributions) with this: pytest --co -k "test_quantile" sktime/proba/tests/test_all_distrs.py.

This is not an estimator capability, but a capability of distributions. Currently indeed there are only Normal and TFNormal, as well as the abstract tensorflow based adapter _BaseTFDistribution (that's tested through TFNormal).

  1. Since _predict_interval can work with _predict_quantiles, and you added support for _predict_quantiles using _predict_proba, can we extend _predict_proba support to _predict_interval as well?

That happens indirectly, my thinking is that the preferred method to use as a default for _predict_interval is _predict_quantiles, not _predict_proba. My reasoning for this priority is that these methods are content-wise the same except how the columns are parametrized and named.

To see that we always have a default if just one of the methods is implemented, in the current PR, let's consider two cases where _predict_interval is not implemented.

Case 1: both _predict_quantiles is implemented, and _predict_proba may or may not be implemented.
Then _predict_interval gets the default from _predict_quantiles.

Case 2: only _predict_proba is implemented, not _predict_quantiles.
Then _predict_interval gets the default from _predict_quantiles, which in turn gets it from _predict_proba.

These cases are distinct and exhaustive, assuming at least one other method is implemented.

Do you think we should do this differently? Or, same logic, but implement it differently?

@fkiraly
Copy link
Collaborator Author

fkiraly commented Apr 28, 2023

@yarnabrina, merging to have some fix in place for the release action, but happy to change the logic later on

@fkiraly fkiraly merged commit 91290af into main Apr 28, 2023
22 checks passed
@fkiraly fkiraly deleted the proba-quantiles branch April 28, 2023 17:58
fkiraly added a commit that referenced this pull request Apr 28, 2023
scheduled deprecation and change actions for the 0.18.0 release

* remove `VectorizedDF.get_iloc_indexer`
* switch default of `legacy_interface` in `predict_proba` to `False`

For deprecation of the `predict_proba` legacy interface, depends on:

* #4513
* #4514
@yarnabrina
Copy link
Collaborator

yarnabrina commented Apr 29, 2023

Do you think we should do this differently? Or, same logic, but implement it differently?

I completely agree with the logic, but I think I'm missing how predict_interval would work if both _predict_interval and _predict_quantiles are undefined (in the sense of not overridden at estimator level) and only _predict_proba is overridden.

If actual (non-base) estimator do not define _predict_interval or _predict_quantiles, the checks with _has_implementation_of will be False, but _predict_proba implementation will make can_do_proba as True. So, it goes pass L1995, and also goes passL2003. My question/doubt is how will L2030 work then, because I don't see pred_int being defined anywhere else in the function.

Do we not need something like this here in _predict_interval as well? Of course the logic needs to be a bit different to conform to the output format.

elif implements_proba:

If I have missed some logic which will call _predict_quantiles of BaseForecaster from _predict_interval of BaseForecaster to make use of _predict_proba in actual estimator in such cases, please let me know.

@fkiraly
Copy link
Collaborator Author

fkiraly commented Apr 29, 2023

@yarnabrina, moved discussion here:
#4528

(I've merged the above as a fix to a blocker in the deprecation which was blocking the release, but we should get this sorted out!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adding new functionality implementing framework Implementing or improving framework for learning tasks, e.g., base class functionality module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting module:probability&simulation probability distributions and simulators
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants