Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] empirical quantile parameterized distribution #236

Merged
merged 8 commits into from
Apr 18, 2024
Merged

Conversation

fkiraly
Copy link
Collaborator

@fkiraly fkiraly commented Apr 4, 2024

Towards #235.

This PR adds an empirical quantile parameterized distribution QPD_Empirical, inheriting from Empirical, but parameterized by quantile points/predictions.

This PR also factors out the predict_proba logic from MultipleQuantileRegressor as a parameter change between QPD_Empirical and Empirical, and uses QPD_Empirical directly to simplify the logic. This part is a pure refactor and does not change the internal logic.

As it changes the type of the return object, we may consider a deprecation cycle.

For discussion at the moment.

Alternative approaches for consideration:

  • adding an argument to Empirical which, if passed, does the reparametrization.
  • not inheriting form Empirical, but from a joint mixin or parent.

FYI @Ram0nB, @FelixWick, @setoguchi-naoki - I would be interested to hear what you think.

@fkiraly fkiraly added enhancement module:probability&simulation probability distributions and simulators module:regression probabilistic regression module implementing algorithms Implementing algorithms, estimators, objects native to skpro labels Apr 4, 2024
@FelixWick
Copy link

The refactoring seems ok to me.
General comment: I think you shouldn't call the empirical quantile fitting QPD, since it is not really parameterized by the quantiles. (If I understand your Empirical mode correctly as kind of spline fit.)

@fkiraly
Copy link
Collaborator Author

fkiraly commented Apr 5, 2024

General comment: I think you shouldn't call the empirical quantile fitting QPD, since it is not really parameterized by the quantiles.

Could you explain this? I think it is fully parameterized by the quantiles.

The argument quantiles contains:

  • quantile points in the index, e.g., [0.2, 0.5, 0.8]
  • the corresponding quantiles (quantile values), e.g., [40, 42, 47]
  • this, for every sample index

Therefore, I think that the resulting distribution is fully parameterized by the quantiles that were provided.

Apologies, I should have probably written the docstring.

@FelixWick
Copy link

But the quantiles are used as inputs to a spline that is fitted (i.e., parameters of the spline are estimated). The quantiles are not the parameters of the spline function. That means you have to do a fit for each sample. A QPD on the other hand does not do a fit, instead the quantiles are directly used as parameters of the function (what is much faster, of course).

@fkiraly
Copy link
Collaborator Author

fkiraly commented Apr 5, 2024

But the quantiles are used as inputs to a spline that is fitted

Oh, I see.

Actually, no spline is fitted here. The distribution is just a weighted mixture supported at the quantile points, so in fact they are parameters quite directly, no?

@FelixWick
Copy link

Ok, I see. Yeah, in that case I think it's ok to call it QPD 👍. Thanks for the explanation.

@Ram0nB
Copy link
Contributor

Ram0nB commented Apr 16, 2024

The refactoring looks good to me.

One thing that comes to mind is whether we also want to add the logic of converting a quantile prediction to a distribution estimate to the BaseProbaRegressor. Currently, the implementation of BaseProbaRegressor's _predict_proba uses the var and mean prediction to return a normal distribution. Maybe we can enhance BaseProbaRegressor's _predict_proba such that it uses the QPD_Empirical if _predict_quantiles/_predict_interval are available, and else the current logic. This way, we don't assume a normal distribution if multiple quantiles are available. What are your thoughts on this @fkiraly ?

@fkiraly
Copy link
Collaborator Author

fkiraly commented Apr 16, 2024

What are your thoughts on this @fkiraly ?

Excellent suggestion, in my opinion!

Yes, the normal assumption has bothered me for a while, but there haven't been too good alternatives before the various empiricals had been implemented.

One question of course is, one would need to choose some arbitrary quantile points if we would be using empirical QPD.
Perhaps, all the percentiles?

Further, a problem could be lack of smoothness, which have the risk of suddenly breaking user workflows that involve losses assuming continuous distributions, this might be a major issue to finish discussion on before doing sth too quickly.

Should we open a new issue to discuss the predict_proba default?

@fkiraly
Copy link
Collaborator Author

fkiraly commented Apr 16, 2024

moved discussion here:
#246

@fkiraly fkiraly merged commit 8704448 into main Apr 18, 2024
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement implementing algorithms Implementing algorithms, estimators, objects native to skpro module:probability&simulation probability distributions and simulators module:regression probabilistic regression module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants