[ENH] empirical quantile parameterized distribution #236

fkiraly · 2024-04-04T23:58:55Z

Towards #235.

This PR adds an empirical quantile parameterized distribution QPD_Empirical, inheriting from Empirical, but parameterized by quantile points/predictions.

This PR also factors out the predict_proba logic from MultipleQuantileRegressor as a parameter change between QPD_Empirical and Empirical, and uses QPD_Empirical directly to simplify the logic. This part is a pure refactor and does not change the internal logic.

As it changes the type of the return object, we may consider a deprecation cycle.

For discussion at the moment.

Alternative approaches for consideration:

adding an argument to Empirical which, if passed, does the reparametrization.
not inheriting form Empirical, but from a joint mixin or parent.

FYI @Ram0nB, @FelixWick, @setoguchi-naoki - I would be interested to hear what you think.

FelixWick · 2024-04-05T15:36:46Z

The refactoring seems ok to me.
General comment: I think you shouldn't call the empirical quantile fitting QPD, since it is not really parameterized by the quantiles. (If I understand your Empirical mode correctly as kind of spline fit.)

fkiraly · 2024-04-05T15:44:49Z

General comment: I think you shouldn't call the empirical quantile fitting QPD, since it is not really parameterized by the quantiles.

Could you explain this? I think it is fully parameterized by the quantiles.

The argument quantiles contains:

quantile points in the index, e.g., [0.2, 0.5, 0.8]
the corresponding quantiles (quantile values), e.g., [40, 42, 47]
this, for every sample index

Therefore, I think that the resulting distribution is fully parameterized by the quantiles that were provided.

Apologies, I should have probably written the docstring.

FelixWick · 2024-04-05T17:00:59Z

But the quantiles are used as inputs to a spline that is fitted (i.e., parameters of the spline are estimated). The quantiles are not the parameters of the spline function. That means you have to do a fit for each sample. A QPD on the other hand does not do a fit, instead the quantiles are directly used as parameters of the function (what is much faster, of course).

fkiraly · 2024-04-05T20:54:21Z

But the quantiles are used as inputs to a spline that is fitted

Oh, I see.

Actually, no spline is fitted here. The distribution is just a weighted mixture supported at the quantile points, so in fact they are parameters quite directly, no?

FelixWick · 2024-04-06T06:01:28Z

Ok, I see. Yeah, in that case I think it's ok to call it QPD 👍. Thanks for the explanation.

Ram0nB · 2024-04-16T09:35:01Z

The refactoring looks good to me.

One thing that comes to mind is whether we also want to add the logic of converting a quantile prediction to a distribution estimate to the BaseProbaRegressor. Currently, the implementation of BaseProbaRegressor's _predict_proba uses the var and mean prediction to return a normal distribution. Maybe we can enhance BaseProbaRegressor's _predict_proba such that it uses the QPD_Empirical if _predict_quantiles/_predict_interval are available, and else the current logic. This way, we don't assume a normal distribution if multiple quantiles are available. What are your thoughts on this @fkiraly ?

fkiraly · 2024-04-16T15:42:48Z

What are your thoughts on this @fkiraly ?

Excellent suggestion, in my opinion!

Yes, the normal assumption has bothered me for a while, but there haven't been too good alternatives before the various empiricals had been implemented.

One question of course is, one would need to choose some arbitrary quantile points if we would be using empirical QPD.
Perhaps, all the percentiles?

Further, a problem could be lack of smoothness, which have the risk of suddenly breaking user workflows that involve losses assuming continuous distributions, this might be a major issue to finish discussion on before doing sth too quickly.

Should we open a new issue to discuss the predict_proba default?

fkiraly · 2024-04-16T18:02:09Z

moved discussion here:
#246

fkiraly added 3 commits April 5, 2024 00:53

qpd empirical

16d8231

refactor multiplequantile

0994395

Update qpd_empirical.py

b8277fe

fkiraly added enhancement module:probability&simulation probability distributions and simulators module:regression probabilistic regression module implementing algorithms Implementing algorithms, estimators, objects native to skpro labels Apr 4, 2024

fkiraly added 3 commits April 5, 2024 01:06

get_test_params

7ae9c06

Update qpd_empirical.py

7c63633

iloc

a63173a

fkiraly mentioned this pull request Apr 5, 2024

[ENH] Consolidate quantile parameterized distributions in few classes #235

Open

fkiraly mentioned this pull request Apr 16, 2024

[ENH] non-parametric default for _predict_proba if _predict_quantiles is available #246

Open

fkiraly added 2 commits April 18, 2024 10:40

Merge branch 'main' into qpd-empirical

53f4741

Update distributions.rst

ed94822

fkiraly merged commit 8704448 into main Apr 18, 2024
34 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ENH] empirical quantile parameterized distribution #236

[ENH] empirical quantile parameterized distribution #236

fkiraly commented Apr 4, 2024 •

edited

Loading

FelixWick commented Apr 5, 2024

fkiraly commented Apr 5, 2024 •

edited

Loading

FelixWick commented Apr 5, 2024

fkiraly commented Apr 5, 2024

FelixWick commented Apr 6, 2024

Ram0nB commented Apr 16, 2024

fkiraly commented Apr 16, 2024

fkiraly commented Apr 16, 2024

[ENH] empirical quantile parameterized distribution #236

[ENH] empirical quantile parameterized distribution #236

Conversation

fkiraly commented Apr 4, 2024 • edited Loading

FelixWick commented Apr 5, 2024

fkiraly commented Apr 5, 2024 • edited Loading

FelixWick commented Apr 5, 2024

fkiraly commented Apr 5, 2024

FelixWick commented Apr 6, 2024

Ram0nB commented Apr 16, 2024

fkiraly commented Apr 16, 2024

fkiraly commented Apr 16, 2024

fkiraly commented Apr 4, 2024 •

edited

Loading

fkiraly commented Apr 5, 2024 •

edited

Loading