[ENH] interface probabilistic regressors from ngboost package #135

Closed
satya-pattnaik opened this issue May 2, 2021 · 19 comments · Fixed by #215
Labels
- feature request: New feature or request
- good first issue: Good for newcomers
- interfacing algorithms: Interfacing existing algorithms/estimators from third party packages
- module:regression: probabilistic regression module

Comments

@satya-pattnaik

satya-pattnaik commented May 2, 2021

Update by @fkiraly - we should interface ngboost regressors as probabilistic supervised (tabular) regressors.
As discussed below, this should be an skpro regressor, which in turn can be used in sktime reduction forecasters (such as YfromX). The original request was to use ngboost as a forecaster in sktime, but ngboost is a tabular probabilistic regressor, so it needs to go through an additional reduction step (which is now implemented in sktime).

It should be straightforward to interface ngboost using the probabilistic regressor extension template:
https://github.com/sktime/skpro/blob/main/extension_templates/regression.py
so adding it as a good first issue.

The main technical concern might be translating the ngboost probability distributions into skpro probability distributions, but that should also be addressable with a lean adaptation layer (personally, I would add that adaptation in an adapter utility subpackage in regression.ngboost).
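
A minimal sketch of what such an adaptation layer could look like, assuming ngboost exposes the fitted distribution parameters as a dict (for its Normal distribution, keys `loc` and `scale`) and that the matching skpro distribution takes `mu` and `sigma`. The helper name `ngboost_params_to_skpro_kwargs` and the name table are hypothetical, not part of either package, and would need checking against the installed versions.

```python
# Hypothetical adapter: rename ngboost parameter names to skpro parameter names.
# The name table is an assumption for illustration, not an API of either package.
_PARAM_NAME_MAP = {
    "Normal": {"loc": "mu", "scale": "sigma"},
    "Laplace": {"loc": "mu", "scale": "scale"},
}


def ngboost_params_to_skpro_kwargs(dist_name, params):
    """Translate an ngboost-style params dict into skpro-style keyword arguments."""
    if dist_name not in _PARAM_NAME_MAP:
        raise NotImplementedError(f"no adapter for distribution {dist_name!r}")
    name_map = _PARAM_NAME_MAP[dist_name]
    # Keep the values untouched, only the parameter names change.
    return {name_map[key]: value for key, value in params.items()}


kwargs = ngboost_params_to_skpro_kwargs(
    "Normal", {"loc": [0.5, 1.5], "scale": [1.0, 2.0]}
)
print(kwargs)
```

The resulting keyword arguments would then be passed, together with the row index and column names of the prediction, to the corresponding skpro distribution class.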

Original request below.


Is your feature request related to a problem? Please describe.
Can we build a probabilistic forecaster using NGBoost? A probabilistic regression method like NGBoost gives us confidence intervals out of the box, unlike other methods where we need a heuristic (like quantile loss) to get the confidence/prediction intervals.

Describe the solution you'd like
A rough sketch:

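import numpy as np
from ngboost import NGBRegressor
from scipy.stats import norm

class ProbForecaster:
    def __init__(self, window_length=10):
        self.window_length = window_length
        self.estimator_ = NGBRegressor()
    def fit(self, y, X=None):
        # reduction to tabular regression: each lagged window predicts the next value
        y = np.asarray(y, dtype=float)
        windows = np.lib.stride_tricks.sliding_window_view(y, self.window_length)
        self.estimator_.fit(windows[:-1], y[self.window_length:])
        self.last_window_ = windows[-1:]
        return self
    def predict(self, fh=None, X=None, alpha=0.95):
        # one-step-ahead only; alpha is the interval coverage, Normal distribution assumed
        mean_preds = self.estimator_.predict(self.last_window_)
        distribution = self.estimator_.pred_dist(self.last_window_)
        conf_ints = norm.interval(alpha, loc=distribution.loc, scale=distribution.scale)
        return mean_preds, conf_ints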
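
The interval step of the sketch can be illustrated with the standard library alone. The following is a sketch under the assumption that the predictive distribution is Normal with location `loc` and scale `scale`; `normal_interval` is a hypothetical helper, and `coverage` plays the role of `alpha` in the sketch above.

```python
from statistics import NormalDist


def normal_interval(loc, scale, coverage=0.95):
    """Central prediction interval of a Normal(loc, scale) with the given coverage."""
    # Split the probability mass outside the interval equally between both tails.
    tail = (1.0 - coverage) / 2.0
    dist = NormalDist(mu=loc, sigma=scale)
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


lower, upper = normal_interval(loc=10.0, scale=2.0, coverage=0.95)
print(lower, upper)
```

For 95% coverage this is the familiar `loc ± 1.96 * scale` rule, so a single fitted distribution yields intervals at any coverage level without refitting.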

Describe alternatives you've considered
Using Xgboost/LightGBM with quantile loss makes the calculation of prediction intervals inefficient.
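
For context, a sketch of the quantile (pinball) loss that this alternative relies on. The loss is defined for one quantile level at a time, so a 95% interval would typically need one fitted model for the lower bound and another for the upper bound, which is presumably the inefficiency meant here. The function name `pinball_loss` is illustrative only.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Mean pinball (quantile) loss of predictions for a single quantile level."""
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        error = actual - predicted
        # Under-prediction is weighted by quantile, over-prediction by (1 - quantile).
        total += quantile * error if error >= 0 else (quantile - 1.0) * error
    return total / len(y_true)


y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 1.5, 1.5]
loss_low = pinball_loss(y_true, y_pred, quantile=0.025)
loss_high = pinball_loss(y_true, y_pred, quantile=0.975)
print(loss_low, loss_high)
```

The same predictions score very differently at the two quantile levels, which is why each interval bound is trained against its own loss.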

Additional context
Ngboost Doc- https://stanfordmlgroup.github.io/projects/ngboost/

@fkiraly
Collaborator

fkiraly commented May 2, 2021

Interesting!
NGboost is a probabilistic supervised prediction algorithm (for tabular data), not a forecaster - so we would first have to build a full probabilistic interface. That would be a great thing to have for sktime.

Now about something that's partly funny and partly not funny...
"NGboost" is basically identical to the probabilistic boosting algorithm I proposed (earlier) in section 6.4 of my 2018 paper https://arxiv.org/abs/1801.00753
The probabilistic prediction interface of the NGboost python package closely follows principles in the sktime companion package skpro (scikit-learn like probabilistic prediction), or the R package mlr3proba.

I'm reasonably certain that Ng et al know of both the methodological paper and the software interface designs, but still don't cite them... that's not very nice.

Anyway, we should develop skpro into the probabilistic scikit-learn that it was meant to be, but unfortunately it's currently without maintainer. Might that be something you would be interested in?

@satya-pattnaik
Author

Yes, 6.4 of the above mentioned paper deals with a similar concept.
That sounds good to me, I can use skpro to build it. The question is how to build that probabilistic interface. If I can get a starting point, I can carry it on from there. I was planning to use make_reduction (which I now assume is not the correct way?).

@fkiraly
Collaborator

fkiraly commented May 3, 2021

well, if you want to use skpro for probabilistic predictions, the package itself needs an update - it's a bit of a larger project than just adding a method. If you're interested to do that, we should set up a call - also (I believe you are part of the mentoring programme?) discuss with your mentor.

@satya-pattnaik
Author

Yes, I will discuss this with my mentor. Thanks

@mloning added the "feature request" label (New feature or request) on May 6, 2021
@drackham

I'm very interested in the capability discussed here. It doesn't appear like there has been any progress here. Would you correct me if that is in fact not the case? Thank you!

@fkiraly
Collaborator

fkiraly commented Mar 31, 2023

actually yes!

We've been re-working the probabilistic forecasting interface:
sktime/sktime#4359

This will enable using probabilistic supervised learners in compositors like make_reduction much more easily.
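
As background on what reduction means here, a plain-Python sketch of the windowing step, assuming a simple one-step-ahead setup. This is not sktime's `make_reduction` implementation; `make_windows` is a hypothetical helper that only shows how a series becomes a tabular regression problem.

```python
def make_windows(series, window_length):
    """Turn a series into (lagged-window, next-value) pairs for tabular regression."""
    features, targets = [], []
    for start in range(len(series) - window_length):
        # The window of past values is one feature row, the next value is its target.
        features.append(series[start:start + window_length])
        targets.append(series[start + window_length])
    return features, targets


X, y = make_windows([1, 2, 3, 4, 5], window_length=2)
print(X)  # [[1, 2], [2, 3], [3, 4]]
print(y)  # [3, 4, 5]
```

A probabilistic tabular regressor fitted on such pairs returns a distribution for the next value, which is what a reduction forecaster would pass on as its probabilistic forecast.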

Want to help work on this, @drackham ? It's a bit of an engineering project, but there's a step-by-step roadmap.

Would be much appreciated!

We'll probably move this a bit over the Easter holdays where the volunteer contributors tend to have more time.

What would also be helpful is testing the probabilistic forecasting interface and reporting your experiences or any design suggestions (in sktime/sktime#4359). It will be released as experimental in 0.17.0 and in full in 0.18.0.

@fkiraly changed the title from "Probabilistic Forecasting using NGboost" to "[ENH] Probabilistic Forecasting using NGboost" on Mar 31, 2023
@fkiraly
Collaborator

fkiraly commented Mar 31, 2023

speaking of which, @frthjf, are you still around?

I would like to move the probabilistic interface into skpro within the next year or so and use it as an import.

That would be step number 7 or 8 in sktime/sktime#4359 (not there yet, but see context above).

@drackham

@fkiraly thank you!

I discovered sktime recently, and these types of models are not really in my core areas of competency, so I'd likely be unable to contribute effectively. That said, I'll take a look at the contribution documentation and see if my apprehension is unwarranted.

I'll also take a look at the probabilistic forecasting interface and report back. Thanks again!

@frthjf
Collaborator

frthjf commented Mar 31, 2023

@fkiraly I am still around :-)

Just to clarify, do I understand correctly that the plan is to resurrect the skpro package which in turn becomes an optional dependency of sktime? I could certainly help with updating the skpro code.

@fkiraly
Collaborator

fkiraly commented Mar 31, 2023

> @fkiraly I am still around :-)

Nice to hear of you again! Let's catch up, discord perhaps?

> Just to clarify, do I understand correctly that the plan is to resurrect the skpro package which in turn becomes an optional dependency of sktime? I could certainly help with updating the skpro code.

Yes!

For now, I've been working in the sktime/proba module. Have a look and let me know what you think!

The design is a mix of pandas, sklearn (base interface), and tensorflow-probability (parameter broadcast).

I'd like to move it out to skpro, and make skpro a core (not optional) dependency of sktime. Both sktime and skpro would eventually depend on skbase which has the base class framework.

@fkiraly
Collaborator

fkiraly commented Apr 1, 2023

For comments, the topical issue is here, @frthjf:
sktime/sktime#4359

(this issue is about a specific probabilistic forecaster)

@frthjf
Collaborator

frthjf commented Apr 1, 2023

I see, in that case, why move this out of sktime into an independent skpro package if it's required back in anyway? Wouldn't it be easier to port the skpro features into sktime instead? It seems to me that the sktime package already has all the CI and package infrastructure we would have to recreate for a skpro re-release.

@fkiraly
Collaborator

fkiraly commented Apr 2, 2023

(will continue on sktime/sktime#4359 for architecture discussion)

@fkiraly fkiraly transferred this issue from sktime/sktime Oct 28, 2023
@fkiraly
Collaborator

fkiraly commented Oct 28, 2023

The skpro package is now sufficiently mature to accommodate an interface to ngboost - moved the issue therefore to skpro.

@fkiraly added the labels "good first issue" (Good for newcomers), "module:regression" (probabilistic regression module), and "interfacing algorithms" (Interfacing existing algorithms/estimators from third party packages) on Oct 28, 2023
@fkiraly
Collaborator

fkiraly commented Oct 28, 2023

FYI @drackham, @satya-pattnaik, @frthjf - I have updated the issue with instructions on "how-to". Together with the skpro machinery and the existing integration to sktime, this should now be pretty straightforward.

Would be great if one of you would like to implement this, I'm happy to advise!

Also FYI @Alex-JG3, @Ram0nB, in case one of you is interested, this intersects with your previous contribution topics.

@fkiraly changed the title from "[ENH] Probabilistic Forecasting using NGboost" to "[ENH] interface probabilistic regressors from ngboost package" on Oct 28, 2023
@KiwiAthlete

KiwiAthlete commented Jan 22, 2024

Have you considered XGBoostLSS and LightGBMLSS as well? Both offer great flexibility and are based on the two most commonly used boosting machines for tabular data. There is no sklearn API available yet, but there is a PR for it.

Shall I open a new issue for this?

@fkiraly
Collaborator

fkiraly commented Jan 26, 2024

Excellent suggestion, @KiwiAthlete - opened an issue here: #135

@ShreeshaM07
Contributor

I'll try to start working on adding an interface of ngboost to skpro soon after going through its docs.

@fkiraly
Collaborator

fkiraly commented Mar 17, 2024

Great! I notice I linked the wrong issue, fixed the link.
I'd recommend you continue discussing in the issue #135 what the interface would look like.

fkiraly pushed a commit that referenced this issue May 2, 2024
Resolves #135 - adds a `NGBoostRegressor`
7 participants