Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] adds _predict_interval in _StatsModelsAdapter and inherits in other estimator to reduce code duplication #4465

Merged
merged 23 commits into from Apr 22, 2023

Conversation

yarnabrina
Copy link
Collaborator

@yarnabrina yarnabrina commented Apr 12, 2023

Reference Issues/PRs

Closes #4447. Depends on #4439 and #4452.

What does this implement/fix? Explain your changes.

The implementation of _predict_interval in AutoETS, SARIMAX and UnobservedComponents were very similar. This PR helps to define these at a single place at base level which should help maintaining easier.

Does your contribution introduce a new dependency? If yes, which one?

No.

What should a reviewer concentrate their feedback on?

All but ThetaForecaster implements _predict_interval and hence that is failing the tests. However, the _predict_quantiles of ThetaForecaster just adds Gaussian quantiles to point predictions. If it is a proper way to do interval prediction, a specific exception can be made for ThetaForecaster in the base method. If not, subject to deprecation, can it be considered to remove interval support for ThetaForecaster?

Did you add any tests for the change?

No.

Any other comments?

Observations on multivariate estimators as being discussed in #4447.

PR checklist

For all contributions
  • I've added myself to the list of contributors.
  • Optionally, I've updated sktime's CODEOWNERS to receive notifications about future changes to these files.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG] indicating whether the PR topic is related to enhancement, maintenance, documentation, or bug.
For new estimators
  • I've added the estimator to the online documentation.
  • I've updated the existing example notebooks or provided a new one to showcase how my estimator works.

Anirban Ray and others added 13 commits April 4, 2023 22:31
* origin/enh-4301:
  pandas 2 compatibility changes
  fix name of test
  added author
  fixed pydocstyle D412
  added test contibution
  added test cases
  added basic implementation
  added capability tag
1. all inheriting multivariate estimators must override `_predict_interval`
2. all inheriting univariate estimators need to override at least `_extract_conf_int`
sktime/forecasting/ets.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, design-wise, looks good!

Left some minor comments.

Also, you already noted the fiddly task of finding out what happens downstream with the failures and Theta.

* origin/main:
  [ENH] transformer that creates features from time index or hierarchy label (#4416)
  [ENH] Adds `_predict_interval` to `SARIMAX` to support `predict_interval` and `predict_quantiles` (#4439)
@yarnabrina
Copy link
Collaborator Author

Current failures:

  1. The new base method expects all subclasses to use _predict_interval to generate probabilistic forecasts. Since ThetaForecaster (inheriting from ExponentialSmoothing) implements _predict_quantiles, it is causing issues.
    • temporarily remove predict proba support from ThetaForecaster -> will break significantly, not an option
    • change predict proba support for ThetaForecaster by replacing quantiles with interval (reference) - effort/effects?
    • how to tackle other estimators inheriting from _StatsModelsAdapter relying on quantiles instead of interval?
  2. There is a test (test_pred_int_tag) that ensures absence of estimators with implementation of _predict_interval or _predict_quantiles or _predict_proba and without capability:pred_int tag. Base method in _StatsModelsAdapter is causing failures in all inheriting estimators without predict proba support to fail.
    • skip this test - effects?
    • modify the test to allow for abstract methods - how?

This is the current status, and planning on how to handle these. Any comments/suggestions are very much welcome.

(FYI: @fkiraly)

@fkiraly
Copy link
Collaborator

fkiraly commented Apr 15, 2023

  1. this is interesting, didn't think of this.

I can see option 2 as well - that might be the cleaner option long term, as all statsmodels models return prediction intervals afaik?

However, this seems like scope creep and unclear effort, so I would go for the following strategy:

  • aim for an end state where we have a pair of methods for _predict_interval and _predict_quantiles, i.e., _predict_quantiles and _predict_quantiles_adapt (does not exist), or _predict_interval and _predict_interval_adapt (currently called _extract_conf_int)
  • use _has_implementation_of - a base method present in all skbase objects - to check which of the adapter pair (_predict_interval_adapt, _predict_quantiles_adapt) has been implemented, in _predict_interval, and _predict_quantiles both. If the other one has been implemented, default to the parent _predict_interval. If the same has been implemented, run the adapter logic
  • for the start, only cover the _predict_interval side, at least in this PR. _predict_quantiles might not be worth dealing with. That is, in _predict_interval, use the check logic to dispatch to either _predict_interval_adapt or _predict_quantiles via the parent _predict_interval (via super).

What do you think?

@fkiraly
Copy link
Collaborator

fkiraly commented Apr 15, 2023

  1. the test test_pred_int_tag already has one exception, the _DelegatedForecaster. The simplest way would be to skip the impacted forecasters the same way, but that weakens test coverage.

A slightly more elaborate way would be, changing the test so it checks whether the functionality is there rather than an implementation. That would require a config option which turns off the error message and lets the method run. But I consider this scope creep for this PR, so out of scope.

@yarnabrina
Copy link
Collaborator Author

Agreed on out-of-scope point on both 1 and 2. Then, I'm thinking to skip the tag test for now, and mark as a TODO comment + GH issue to be addressed later (hopefully soon). Please let me know if that's not okay.


Is it possible to redirect to parent's _predict_interval if we are not able to check if extract_conf_int is implemented AND "functional"? There's a very high chance that I misinterpreted your suggestion, so can you please elaborate a little bit more?

@fkiraly
Copy link
Collaborator

fkiraly commented Apr 15, 2023

Is it possible to redirect to parent's _predict_interval if we are not able to check if extract_conf_int is implemented AND "functional"? There's a very high chance that I misinterpreted your suggestion, so can you please elaborate a little bit more?

_has_implementation_of should return True if and only if the method has been overridden at least once. That should be an if and only if with "functional", right?

It's important to note the difference between "implemented" (appears at least once in the inheritance resolution order), and "overridden at least once" (appears at least twice). _has_implementation_of checks the latter.

@yarnabrina
Copy link
Collaborator Author

That should be an if and only if with "functional", right?

Not sure I'm certain on this. Say the adapt example you gave, basically my implementation of _predict_interval in _StatsModelsAdapter. I'd have said it's non-functional as it can lead to NotImplementedError.

I'll give it a try tomorrow, but I've a feeling that super() may not be enough (or at least for me). Will update.

@fkiraly
Copy link
Collaborator

fkiraly commented Apr 15, 2023

Not sure I'm certain on this. Say the adapt example you gave, basically my implementation of _predict_interval in _StatsModelsAdapter. I'd have said it's non-functional as it can lead to NotImplementedError.

Ahh, I see - what I meant, you check the extract_conf_int with _has_implementation_of, not the _predict_interval! If that is overridden at least once, it is functional. If I understand the design correctly?

@yarnabrina
Copy link
Collaborator Author

Ahh, I see - what I meant, you check the extract_conf_int with _has_implementation_of, not the _predict_interval! If that is overridden at least once, it is functional. If I understand the design correctly?

As of now, definitely yes. But I suppose theoretically it's possible that in future one can inherit _StatsModelsAdapter and have a template extract_conf_int to be overridden by it's child classes afterwards. Admittedly a very stretched scenario, but possible I think.


As of now, I believe I managed to handle the first issue using your suggestion of dispatch to other methods. As I guessed last night, I failed to take benefit of super, as all inherited estimators (with no restriction on how low/high they are on hierarchy) need to call the method present in BaseForecaster. I'll highly appreciate a better way, probably with a smart super call, to replace L181.

https://github.com/yarnabrina/sktime-fork/blob/caadd66f083d6ec44c432b4323e66a9469aa3ee9/sktime/forecasting/base/adapters/_statsmodels.py#L177-L181

All tests on test_predict_intervaland test_predict_quantiles pass locally, and hopefully CI will report just tag failures. If that's the case, will add to _config.py for now referring to all failing statsmodels based estimators without the tag.

@fkiraly
Copy link
Collaborator

fkiraly commented Apr 16, 2023

CI will report just tag failures.

I would add it as an explicit exception in the test, i.e., pass the test if _has_implementation_of ("extract_conf_int") (and add a comment where it is coming from, i.e., the statsmodels adapter)

I would also recommend, let's open an issue to review the test, and test the tag against actual functionality (if called) rather than _has_implementation_of.

@yarnabrina yarnabrina marked this pull request as ready for review April 16, 2023 12:21
@fkiraly
Copy link
Collaborator

fkiraly commented Apr 21, 2023

I assume this is ready to review and possibly merge?

@yarnabrina
Copy link
Collaborator Author

yarnabrina commented Apr 21, 2023 via email

@fkiraly fkiraly added module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting refactor Restructuring without changing its external behavior. Neither fixing a bug nor adding a feature. enhancement Adding new functionality labels Apr 21, 2023
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great, thanks!

Seeing this, I would do the following:

  • in the "returns" of the docstring of _extract_conf_int, be crystal clear what the implementation contract is. I.e. that it should return a DataFrame with two columns "lower", "upper", in this order, and row index should be integer, relative to cutoff, and should contain the relative integer horizon of fh ( and can be strictly larger)
  • I would implement the following default logic for _extract_conf_int: if there is a unique method ending in _int and having argument alpha, use that

@fkiraly
Copy link
Collaborator

fkiraly commented Apr 21, 2023

I would implement the following default logic

that is, I would suggest [m for m in dir(prediction_results) if m.endswith("_int")], check that length equals 1 and has alpha as arg (via inspect.fullargspec), otherwise raise error

@fkiraly
Copy link
Collaborator

fkiraly commented Apr 21, 2023

none blockers for now, but for consideration. Will merge tomorrow in this state if no change.

@yarnabrina
Copy link
Collaborator Author

Updated the docstring as suggested, please take a look.

I'm not comfortable adding default method based on patterns, skipped it for now. Even if we add it, it's not going to be used in inherited classes unless someone specifically enabled the capability:pred_int tag. And I'm personally of the opinion that it's significantly better to explicitly use a method in inherited classes.

@yarnabrina yarnabrina requested a review from fkiraly April 22, 2023 06:56
@fkiraly
Copy link
Collaborator

fkiraly commented Apr 22, 2023

ok, makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adding new functionality module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting refactor Restructuring without changing its external behavior. Neither fixing a bug nor adding a feature.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] refactoring of _predict_interval inheritance in descendants of _StatsModelsAdapter
2 participants