Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] in BaseForecaster, move check for capability:insample to _check_fh boilerplate #6593

Merged
merged 12 commits into from
Jun 25, 2024

Conversation

Xinyu-Wu-0000
Copy link
Contributor

For issue [ENH] in sample fh should raise NotImplementedError in _check_fh #6579

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

This approach will not work, because at the time the code is added:

  • fh could be None, so the code will fail
  • cutoff may not have been seen yet, which is required to know whether fh is in or out of sample relative to the training data

My recommendation is, kindly test the code locally, together with some forecasters, some of which can forecast in-sample, some that cannot.

@fkiraly fkiraly changed the title [ENH] In sample check [ENH] in BaseForecaster, move check for capability:insample to _check_fh boilerplate Jun 14, 2024
@fkiraly fkiraly added module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting enhancement Adding new functionality labels Jun 14, 2024
@Xinyu-Wu-0000
Copy link
Contributor Author

I am adding more changes than I originally thought and I am not sure if there is a better way to implement it with less impact. @fkiraly could you take a look?

@@ -380,7 +380,7 @@ def fit(self, y, X=None, fh=None):
self._update_y_X(y_inner, X_inner)

# check forecasting horizon and coerce to ForecastingHorizon object
fh = self._check_fh(fh)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_check_fh is not a pure/static method, it is already using parameters from self. So instead of passing here, I would suggest to simply get self._cutoff at the start, _cutoff = self._cutoff, and add it in the docstring on the list of accessed attributes if not already present there. That way, you need to change much less of the lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

and not self._fh.is_all_out_of_sample(cutoff)
):
msg = (
f"in-sample prediction is currently not supported: "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put the forecaster name more early in the error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -616,7 +617,8 @@ def predict_quantiles(self, fh=None, X=None, alpha=None):
# input checks and conversions

# check fh and coerce to ForecastingHorizon, if not already passed in fit
fh = self._check_fh(fh)
cutoff = get_cutoff(self._y, self.cutoff, return_index=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain why you are recomputing the cutoff, can we not get self._cutoff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if self._cutoff is updated before _chech_fh. In some functions like update_predict_single, _check_cutoff is actully called before _update_y_X. I moved _check_cutoff and removed passing cutoff now.

else self.get_tag("capability:pred_int:insample")
)
if (
not in_sample_pred
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these conditions all needed?

E.g., is there an example in which we do know fh but do not know the cutoff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not sure so I put them all. I tested it without cutoff, it worked. cutoff is removed now.

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some review comments.

  • passing of self._cutoff could be replaced by inspection of self, is that correct?
  • why are we recomputing the cutoff in proba methods instead of looking it up? Explanation would be appreciated, if you think it's necessary
  • comment about the condition in the block added, is that too complex or is it necessary (please explain)

Further, we should pick one or two forecasters that raise this message manually currently, and remove the private raise there. Otherwise we have no example where the message would be correctly raised.

Short-term, possibly in this PR, we would like to do this for all forecasters that do not support in-sample predictions.

@@ -227,9 +227,12 @@ def test_predict_time_index(

# if estimator cannot forecast in-sample and fh is in-sample, terminate
# if the tag correctly states this, we consider this fine as per contract
# todo: check that estimator raises error message when fitting instead
# check that estimator raises error message when fitting
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use with pytest.raises instead, that is the idiomatic check for an exception being raised. Also check that the error message agrees.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@Xinyu-Wu-0000
Copy link
Contributor Author

Xinyu-Wu-0000 commented Jun 21, 2024

Further, we should pick one or two forecasters that raise this message manually currently, and remove the private raise there.

All forecasters with capability:insample=False or capability:pred_int:insample=False are checked to remove private raise.

                                         name                                             object
0                              CINNForecaster  <class 'sktime.forecasting.conditional_inverti... (cleared in parent class)
1           DirRecTabularRegressionForecaster  <class 'sktime.forecasting.compose._reduce.Dir... (cleared)
2        DirRecTimeSeriesRegressionForecaster  <class 'sktime.forecasting.compose._reduce.Dir... (cleared)
3           DirectTabularRegressionForecaster  <class 'sktime.forecasting.compose._reduce.Dir... (cleared)
4        DirectTimeSeriesRegressionForecaster  <class 'sktime.forecasting.compose._reduce.Dir... (cleared)
5                               DynamicFactor  <class 'sktime.forecasting.dynamic_factor.Dyna... (no NotImplementedError)
6                    HFTransformersForecaster  <class 'sktime.forecasting.hf_transformers_for... (cleared)
7                       LTSFDLinearForecaster  <class 'sktime.forecasting.ltsf.LTSFDLinearFor... (cleared in parent class)
8                        LTSFLinearForecaster  <class 'sktime.forecasting.ltsf.LTSFLinearFore... (cleared in parent classr)
9                       LTSFNLinearForecaster  <class 'sktime.forecasting.ltsf.LTSFNLinearFor... (cleared in parent class)
10     MultioutputTabularRegressionForecaster  <class 'sktime.forecasting.compose._reduce.Mul... (cleared)
11  MultioutputTimeSeriesRegressionForecaster  <class 'sktime.forecasting.compose._reduce.Mul... (cleared)
12                         NeuralForecastLSTM  <class 'sktime.forecasting.neuralforecast.Neur... (cleared)
13                          NeuralForecastRNN  <class 'sktime.forecasting.neuralforecast.Neur... (cleared)
14                            PyKANForecaster  <class 'sktime.forecasting.pykan_forecaster.Py... (no NotImplementedError)
15       RecursiveTabularRegressionForecaster  <class 'sktime.forecasting.compose._reduce.Rec... (cleared)
16    RecursiveTimeSeriesRegressionForecaster  <class 'sktime.forecasting.compose._reduce.Rec... (cleared)
17                          SquaringResiduals  <class 'sktime.forecasting.squaring_residuals.... (no NotImplementedError)
                        name                                             object
0             CINNForecaster  <class 'sktime.forecasting.conditional_inverti... (cleared in parent class)
1         ConformalIntervals  <class 'sktime.forecasting.conformal.Conformal... (no NotImplementedError)
2   HFTransformersForecaster  <class 'sktime.forecasting.hf_transformers_for... (cleared in parent class)
3      LTSFDLinearForecaster  <class 'sktime.forecasting.ltsf.LTSFDLinearFor... (cleared in parent class)
4       LTSFLinearForecaster  <class 'sktime.forecasting.ltsf.LTSFLinearFore... (cleared in parent class)
5      LTSFNLinearForecaster  <class 'sktime.forecasting.ltsf.LTSFNLinearFor... (cleared in parent class)
6            PyKANForecaster  <class 'sktime.forecasting.pykan_forecaster.Py... (no NotImplementedError)
7          SquaringResiduals  <class 'sktime.forecasting.squaring_residuals.... (no NotImplementedError)
8          StatsForecastMSTL  <class 'sktime.forecasting.statsforecast.Stats... (no NotImplementedError)
9                        VAR               <class 'sktime.forecasting.var.VAR'> (no NotImplementedError)
10                    VARMAX         <class 'sktime.forecasting.varmax.VARMAX'> (no NotImplementedError)
11                      VECM             <class 'sktime.forecasting.vecm.VECM'> (no NotImplementedError)

@Xinyu-Wu-0000
Copy link
Contributor Author

All change requests are addressed and no error in the CI now. @fkiraly could you take a look if they are resovled?

@fkiraly
Copy link
Collaborator

fkiraly commented Jun 21, 2024

Nice!

Quick question: why do VAR, VARMAX, VECM etc not fail currently?

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Points addressed, from my side.

We should understand why VAR etc are not failing currently or did not need action, before we merge. I do not understand that, any ideas, @Xinyu-Wu-0000?

@Xinyu-Wu-0000
Copy link
Contributor Author

Xinyu-Wu-0000 commented Jun 22, 2024

We should understand why VAR etc are not failing currently or did not need action

In test_predict_time_index, if a forecaster does not support insample predict, the test will simple return None and skip the test, so the forecaster actully doesn't see any insample input in the test, therefor no forecasters should fail.

In test_predict_time_index_in_sample_full, if a forecaster does not support probabilistic insample predict, the test will simple return None and skip the predict_interval and predict_quantiles, so the forecaster actully doesn't see any insample input for the predict_interval and predict_quantiles functions. ConformalIntervals, StatsForecastMSTL, VAR, VARMAX, VECM support insample point prediction but not insample probabilistic prediction, so they should not fail.

However I am not sure why DynamicFactor, PyKANForecaster and SquaringResiduals do not fail in test_predict_time_index_in_sample_full as they are taking insample input for point prediction.

Actully, DynamicFactor and PyKANForecaster in main branch failed in my local environment with check_estimator but I am not sure why they didn't fail in CI. SquaringResiduals in main branch passed in my local environment with check_estimator.

When I search "SquaringResiduals insample" with Google Search, I found potential bug with fh for in-sample prediction with forecasters that require fh in fit. In this implementation, "capability:insample": False, is actully remove from the tag list of SquaringResiduals.

@fkiraly
Copy link
Collaborator

fkiraly commented Jun 22, 2024

hm, odd. Could you open an issue with the failure so we do not forget? I think this can be merged, and that issue is separate.

@fkiraly fkiraly merged commit 04644f7 into sktime:main Jun 25, 2024
42 of 45 checks passed
Spinachboul pushed a commit to Spinachboul/sktime that referenced this pull request Jun 29, 2024
…check_fh` boilerplate (sktime#6593)

For issue [ENH] in sample fh should raise NotImplementedError in
_check_fh sktime#6579
Spinachboul pushed a commit to Spinachboul/sktime that referenced this pull request Jun 29, 2024
…check_fh` boilerplate (sktime#6593)

For issue [ENH] in sample fh should raise NotImplementedError in
_check_fh sktime#6579
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adding new functionality module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

2 participants