Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Adds _predict_interval to SARIMAX to support predict_interval and predict_quantiles #4439

Merged
merged 8 commits into from Apr 13, 2023
Merged

Conversation

yarnabrina
Copy link
Collaborator

@yarnabrina yarnabrina commented Apr 6, 2023

Reference Issues/PRs

Fixes #4301.

What does this implement/fix? Explain your changes.

Before this PR, SARIMAX did not support prediction of confidence intervals and quantiles. This PR adds those functionalities, and tests to check that results match with underlying statsmodels estimator.

Does your contribution introduce a new dependency? If yes, which one?

No.

What should a reviewer concentrate their feedback on?

Did you add any tests for the change?

Yes.

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors.
  • Optionally, I've updated sktime's CODEOWNERS to receive notifications about future changes to these files.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG] indicating whether the PR topic is related to enhancement, maintenance, documentation, or bug.
For new estimators
  • I've added the estimator to the online documentation.
  • I've updated the existing example notebooks or provided a new one to showcase how my estimator works.

@yarnabrina yarnabrina requested a review from fkiraly as a code owner April 6, 2023 02:30
fkiraly
fkiraly previously approved these changes Apr 6, 2023
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, as usual!

Non-blocking comments:

  • you should add yourself as an author
  • the _predict_interval looks very similar to that in other statsmodels functions. It might make sense to see whether the duplications (or partial duplications) can be refactored to live in a single place. I would consider that out of scope though for this PR, just making a note.

@fkiraly
Copy link
Collaborator

fkiraly commented Apr 6, 2023

Regarding questions:

It can be considered to add handling of that

Yes, that would make sense.
If you look at that, it would be great if you also could add the missing fit args, see here: #4377

skipped generalisation attempt as of now as suggested #4301 (comment).

I suppose that's the same comment I just made in the review? Fair enough, so what you're saying, there are subtle interface inconsistencies?

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing I noticed, apologies for not letting you know first time round - I just fixed something similar as part of the pandas 2 compatibility patch.

lines 242, 243 look identical to lines in AutoETS.
These cause a failure on pandas 2 which no longer allows to write to index locations that do not exist.
Could you kindly follow the fix in #4427 (AutoETS) instead? (this replaces the offending lines by an alternative call that works on both pandas 1 and pandas 2)

I also notice, the entire _predict_interval looks identical, except the name of the method of prediction_results which is called.
Could a refactor work by adding an attribute or tag to the adapter parent class which has the name of the attribute (string) and is looked up inside _predict_interval? Or a private function with that as an argument, which is called from _predict_interval?
Would appreciate an issue with any observations you might have on similarities and/or how to refactor this (if you don't get to it, I will open one later when I'm done with the pandas 2 upgrade)

@yarnabrina
Copy link
Collaborator Author

what you're saying, there are subtle interface inconsistencies?

Yes. For example, DynamicFactor has a method predict_interval that can be used, while SARIMAX does not and one has to use get_prediction or similar first and on top that use conf_int. I can't think of a way to merge these.

@fkiraly
Copy link
Collaborator

fkiraly commented Apr 6, 2023

I can't think of a way to merge these.

Well, but there are at least three instances where the similarities are very high, see above.
We can deal with these, and override _predict_interval in cases of substantial difference, e.g., DynamicFactor.

(maybe we should move this discussion to a refactor issue, as it isn't really about #4439 anymore)

@yarnabrina yarnabrina requested a review from fkiraly April 8, 2023 02:11
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's exactly what I meant!

@fkiraly fkiraly merged commit f37b7ea into sktime:main Apr 13, 2023
22 checks passed
@fkiraly fkiraly added module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting enhancement Adding new functionality labels Apr 13, 2023
@yarnabrina yarnabrina deleted the enh-4301 branch April 14, 2023 16:35
fkiraly pushed a commit that referenced this pull request Apr 22, 2023
… `statsmodels` based forecasters to reduce code duplication (#4465)

Closes #4447. Depends on #4439 and #4452.

The implementation of `_predict_interval` in `AutoETS`, `SARIMAX` and
`UnobservedComponents` were very similar. This PR helps to define these
at a single place at base level which should help maintaining easier.

Observations on multivariate estimators as being discussed in #4447.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adding new functionality module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] SARIMAX - implement predict_interval or predict_quantiles (for statsmodels SARIMAX interface)
2 participants