Design/implement forecast interval predictions #97
This is a can of worms... Why: this is a different "kind" of prediction, essentially a third option next to deterministic and probabilistic supervised prediction or forecasting. Prediction intervals can hence appear in any case where the predicted object has a continuous/numeric type. To my knowledge, no package has a clean interface for interval predictions in forecasting, but what comes closest is ... As a hack, one could make the PI boundaries part of a dict return object, similar to mean/var in sklearn, but that wouldn't account for:
As a slightly less hacky option, one could attach traits to each method describing which kinds of predictions it can make: PI, mean/var, quantile-based, fully probabilistic, etc. Anything "nicer" (e.g., along the lines of MLJ's ontology of det/prob/etc.) would require an interface redesign and/or further work on skpro, in my opinion.
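The trait-based option above could be sketched roughly as follows. All names here (`_prediction_capabilities`, `can_predict`, the forecaster classes) are hypothetical illustrations, not the actual sktime API:

```python
class BaseForecaster:
    """Sketch of a base class where each forecaster declares its prediction kinds."""

    # class-level trait listing supported prediction types; subclasses override it
    _prediction_capabilities = ["deterministic"]

    def can_predict(self, kind):
        """Check whether this forecaster supports a given prediction kind."""
        return kind in self._prediction_capabilities


class IntervalCapableForecaster(BaseForecaster):
    # this forecaster additionally supports prediction intervals
    _prediction_capabilities = ["deterministic", "interval"]


f = IntervalCapableForecaster()
assert f.can_predict("interval")
assert not f.can_predict("fully_probabilistic")
```

A dispatcher or evaluation framework could then query these traits up front instead of failing at predict time.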
@frthjf 's opinion might also be helpful, if he's still following GitHub discussions...
Maybe we should just adopt a quick, hacky statsmodels/pmdarima-style solution where one can optionally compute confidence intervals for a user-given alpha value (related to #104). The fully probabilistic interface requires much more work.
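A minimal sketch of that quick option, in the spirit of pmdarima's `predict(return_conf_int=True, alpha=...)`: an optional flag that returns intervals alongside the point forecast. The function name, signature, and the naive mean forecaster are all illustrative assumptions, not a real sktime method:

```python
import numpy as np
from statistics import NormalDist


def predict(y, fh, alpha=0.05, return_pred_int=False):
    """Toy mean forecaster with normal-approximation prediction intervals.

    Hypothetical statsmodels/pmdarima-style signature: by default only the
    point forecast is returned; intervals are computed on request for a
    user-given alpha.
    """
    y = np.asarray(y, dtype=float)
    point = np.full(len(fh), y.mean())  # naive point forecast: historical mean
    if not return_pred_int:
        return point
    # symmetric normal-approximation interval around the point forecast
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * y.std(ddof=1)
    intervals = np.column_stack([point - half_width, point + half_width])
    return point, intervals


point, pi = predict([1.0, 2.0, 3.0], fh=[1, 2], alpha=0.05, return_pred_int=True)
```

The appeal is that nothing changes for users who never ask for intervals; the cost is that the return type now depends on a flag, which is exactly the kind of speculative API discussed below.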
A bit late to the party, so please ignore if no longer applicable because already closed. I might lack context on the wider project here, but from what I understand it does seem like the type of problem that the skpro API tried to address. I'd like to think of it as a 'speculative API' problem: the issue is that we try to define an intermediate return object without knowing how it is going to be used. In skpro, the object was a distribution, and the issue is that the user might be interested in something as complex as the pdf() but might just as well only care about the mean. The return type somehow has to represent the mean, pdf(), and any other conceivable property of interest, but ideally only the ones that are actually used, to avoid a lot of wasted computation. Here, we have a similar problem where the general return object is an interval, ranging from simple boundary values to something like get_interval(confidence=0.9). In skpro, the solution paradigm for this issue was lazy execution; the computation of mean, pdf(), etc. is described but only executed when accessed and actually used. This has the problem that one still needs to define how all the potential properties of interest would be computed for each and every model, so the computational burden of full eager execution is transformed into an implementational burden that is not very user friendly.
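The lazy-execution paradigm can be illustrated with a small sketch. This is not skpro's actual return object, just a minimal stand-in showing the idea: each property is described by a zero-argument callable and only computed on first access:

```python
class LazyPrediction:
    """Illustrative lazily evaluated prediction object (not skpro's API).

    Properties are passed as thunks (zero-argument callables) and evaluated
    only when first accessed; results are cached as plain attributes.
    """

    def __init__(self, **thunks):
        self._thunks = thunks  # property name -> zero-argument callable

    def __getattr__(self, name):
        # only called when normal attribute lookup fails, i.e. on first access
        thunks = self.__dict__.get("_thunks", {})
        if name in thunks:
            value = thunks[name]()       # computed only now
            setattr(self, name, value)   # cache, so later access is a plain lookup
            return value
        raise AttributeError(name)


computed = []
pred = LazyPrediction(
    mean=lambda: computed.append("mean") or 2.0,
    interval=lambda: computed.append("interval") or (0.5, 3.5),  # "expensive"
)
assert pred.mean == 2.0
assert computed == ["mean"]  # the interval thunk never ran
```

As the comment above notes, the catch is that every model still has to describe how each property *would* be computed, so laziness trades eager computation for implementation effort.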
The other option is something like a predictive API where you say: well, we assume most people are only interested in alpha-value intervals, so let's hack that in, and if it turns out people actually care about other things, then we go back and implement something else. (I think that's the way to go as a quick fix.) Now, I believe there is a better way to solve this, because we shouldn't be forced to speculate about what is actually needed if we could look at the entire computation ahead of the execution of each step and truncate intermediate objects that are not needed for the end result (a bit like TensorFlow's compute graph). Unfortunately, as @fkiraly recognized, this is a true can of worms and would require a huge amount of work. In any case, maybe this perspective helps with further development.
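The graph-truncation idea above can be made concrete with a toy sketch: represent the prediction pipeline as a tiny dependency graph and prune everything not reachable from the outputs the user actually requested, before executing anything. The node names and helper are purely hypothetical:

```python
def prune_graph(graph, targets):
    """Keep only the nodes needed for the requested outputs.

    graph: dict mapping node -> list of dependency nodes (a toy compute DAG);
    targets: the outputs the user actually asked for. Illustrative only.
    """
    needed, stack = set(), list(targets)
    while stack:
        node = stack.pop()
        if node not in needed:
            needed.add(node)
            stack.extend(graph.get(node, []))  # walk dependencies transitively
    return {n: deps for n, deps in graph.items() if n in needed}


graph = {
    "pdf": ["distribution"],
    "mean": ["distribution"],
    "std": ["distribution"],
    "interval": ["mean", "std"],
    "distribution": [],
}
# user only asked for intervals: "pdf" is dropped before anything is executed
pruned = prune_graph(graph, ["interval"])
assert set(pruned) == {"interval", "mean", "std", "distribution"}
```

A real system would of course need the whole downstream computation to be expressible as such a graph, which is exactly why this is a can of worms.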
See #218 and the corresponding API design document.
Linked to the evaluation framework, #64.