[ENH] empirical quantile parameterized distribution #236
The refactoring seems ok to me.
Could you explain this? I think it is fully parameterized by the quantiles. The argument
Therefore, I think that the resulting distribution is fully parameterized by the quantiles that were provided. Apologies, I should probably have written the docstring.
But the quantiles are used as inputs to a spline that is fitted (i.e., the parameters of the spline are estimated). The quantiles are not the parameters of the spline function. That means you have to do a fit for each sample. A QPD, on the other hand, does not do a fit; instead, the quantiles are used directly as parameters of the function (which is much faster, of course).
Oh, I see. Actually, no spline is fitted here. The distribution is just a weighted mixture supported at the quantile points, so in fact they are parameters quite directly, no?
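To make the "weighted mixture supported at the quantile points" idea concrete, here is a minimal NumPy sketch. It is not the code from this PR: the helper names `quantile_weights` and `mixture_cdf` are hypothetical, and the midpoint weighting scheme is an assumption chosen for illustration (each quantile point receives the probability mass between the midpoints of its neighbouring levels). The point it shows is that nothing is fitted; the quantile values are used as the support as-is.

```python
import numpy as np


def quantile_weights(alpha):
    """Turn sorted quantile levels into mixture weights (hypothetical helper).

    Assumed scheme: the unit interval is cut at the midpoints between
    consecutive levels, and each quantile point gets the mass of its cell.
    """
    alpha = np.asarray(alpha, dtype=float)
    if alpha.ndim != 1 or alpha.size == 0:
        raise ValueError("alpha must be a non-empty 1D sequence")
    if np.any(alpha <= 0) or np.any(alpha >= 1) or np.any(np.diff(alpha) <= 0):
        raise ValueError("alpha must be strictly increasing and inside (0, 1)")
    midpoints = (alpha[:-1] + alpha[1:]) / 2
    edges = np.concatenate(([0.0], midpoints, [1.0]))
    return np.diff(edges)  # non-negative, sums to 1


def mixture_cdf(x, points, weights):
    """CDF of the discrete mixture: total weight of support points <= x."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights[points <= x]))


# Quantile predictions for one sample: levels and the predicted values.
alpha = [0.1, 0.5, 0.9]
q_values = [-1.0, 0.0, 2.0]

weights = quantile_weights(alpha)          # no estimation step involved
cdf_at_median = mixture_cdf(0.0, q_values, weights)
```

With these levels the weights come out as roughly 0.3, 0.4 and 0.3, so the step-function CDF reaches about 0.7 at the predicted median. Other weighting conventions are possible; the one above is only one reasonable choice.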
Ok, I see. Yeah, in that case I think it's ok to call it QPD 👍. Thanks for the explanation.
The refactoring looks good to me. One thing that comes to mind is whether we also want to add the logic for converting a quantile prediction into a distribution estimate to BaseProbaRegressor. Currently, BaseProbaRegressor's _predict_proba uses the mean and variance predictions to return a normal distribution. Maybe we can enhance _predict_proba so that it uses QPD_Empirical if _predict_quantiles/_predict_interval are available, and falls back to the current logic otherwise. This way, we don't assume a normal distribution when multiple quantiles are available. What are your thoughts on this, @fkiraly?
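The dispatch proposed above could look roughly like the sketch below. This is a toy illustration, not skpro code: `ToyBaseProbaRegressor`, `DEFAULT_ALPHA`, and the dictionary return values are all hypothetical stand-ins, used so the example runs without skpro installed. A real implementation would return distribution objects and would have to settle on the default quantile levels.

```python
import numpy as np


class ToyBaseProbaRegressor:
    """Hypothetical stand-in for a probabilistic regressor base class."""

    # Assumption: some fixed grid of levels has to be chosen for the default.
    DEFAULT_ALPHA = (0.1, 0.25, 0.5, 0.75, 0.9)

    def _predict(self, X):
        raise NotImplementedError

    def _predict_var(self, X):
        raise NotImplementedError

    def _predict_quantiles(self, X, alpha):
        raise NotImplementedError

    def _has_quantiles(self):
        # True if the subclass overrides _predict_quantiles.
        base_impl = ToyBaseProbaRegressor._predict_quantiles
        return type(self)._predict_quantiles is not base_impl

    def predict_proba(self, X):
        if self._has_quantiles():
            # Quantiles available: describe an empirical, quantile-based law.
            alpha = list(self.DEFAULT_ALPHA)
            return {
                "kind": "quantile_empirical",
                "alpha": alpha,
                "quantiles": self._predict_quantiles(X, alpha),
            }
        # Fallback: the existing normal assumption from mean and variance.
        return {
            "kind": "normal",
            "mu": self._predict(X),
            "sigma": np.sqrt(self._predict_var(X)),
        }


class MeanVarOnly(ToyBaseProbaRegressor):
    def _predict(self, X):
        return np.zeros(len(X))

    def _predict_var(self, X):
        return np.full(len(X), 4.0)


class WithQuantiles(ToyBaseProbaRegressor):
    def _predict_quantiles(self, X, alpha):
        # One row per sample, one column per level (dummy values).
        return np.tile(np.asarray(alpha), (len(X), 1))


X = np.zeros((3, 2))
normal_result = MeanVarOnly().predict_proba(X)
quantile_result = WithQuantiles().predict_proba(X)
```

In this sketch `normal_result["kind"]` is `"normal"` and `quantile_result["kind"]` is `"quantile_empirical"`. The hard-coded `DEFAULT_ALPHA` makes visible that such a default needs an essentially arbitrary choice of quantile levels.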
Excellent suggestion, in my opinion! Yes, the normal assumption has bothered me for a while, but there were no good alternatives before the various empirical distributions were implemented. One open question, of course, is that we would need to choose some arbitrary quantile points if we used the empirical QPD. A further problem could be the lack of smoothness, which risks suddenly breaking user workflows that involve losses assuming continuous distributions. This might be a major issue, and we should finish discussing it before doing something too quickly. Should we open a new issue to discuss the
moved discussion here:
Towards #235.

This PR adds an empirical quantile parameterized distribution `QPD_Empirical`, inheriting from `Empirical`, but parameterized by quantile points/predictions.

This PR also factors out the `predict_proba` logic from `MultipleQuantileRegressor` as a parameter change between `QPD_Empirical` and `Empirical`, and uses `QPD_Empirical` directly to simplify the logic. This part is a pure refactor and does not change the internal logic.

As it changes the type of the return object, we may consider a deprecation cycle.

For discussion at the moment.

Alternative approaches for consideration:

- `Empirical` which, if passed, does the reparametrization.
- `Empirical`, but from a joint mixin or parent.

FYI @Ram0nB, @FelixWick, @setoguchi-naoki - I would be interested to hear what you think.