[BUG] FallbackForecaster fails with ForecastByLevel when nan_predict_policy='raise' #6231
Conversation
No major concerns, but a quick question:
why is the original line not working?
I'm concerned that the tests did not catch this.
None of the original tests were designed to handle MultiIndex DataFrames.
On a side note, I see a notification for changes requested, but I don't see any actual change requests to the code. Am I missing something?
Oh, I see! I understand now. The logic attached to `nan_predict_policy` required `pd.DataFrame` format.
Then this indicates a bigger, unreported bug, which we should try to fix; let me explain.
For `y_inner_mtype`, `X_inner_mtype`, we allowed `ALL_TIME_SERIES_MTYPES`, because the exception handling logic did not require any specific type. E.g., 3D numpy arrays, weird nested `pd.DataFrame`, and `dask` arrays were all fine.

When you added the nan handling policy, you assumed `pd.DataFrame`, and this was fine since the regular testing scenarios only resulted in a few types being passed.
There are a few ways to fix the issue:

1. If `nan_predict_policy` is not `"ignore"`, in `__init__`, we set the `X_inner_mtype` and `y_inner_mtype` via `self.set_tags` to the types supported, presumably `["pd.DataFrame", "pd-multiindex", "pd_multiindex_hier"]`. This ensures that in `_predict` and `_fit`, you only ever see these types.
2. Or, leave the types as they are, and ensure the logic is correct for all data types.

I would say that 2 is infeasible, so we ought to do 1?
Regarding test coverage, this shows a gap in our testing framework: estimators aren't tested with all data types they could get.

Are you certain that the hierarchical type causes the issue? Then we may have to add a forecaster test with a hierarchical example. I thought this was covered.
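For context, the kind of hierarchical frame at issue can be built with plain pandas. This is a minimal sketch (not using sktime's `_make_hierarchical` helper; the level and column names are made up for illustration):

```python
import pandas as pd

# minimal pd_multiindex_hier-style frame:
# 2 x 2 hierarchy instances, 3 monthly timepoints each
idx = pd.MultiIndex.from_product(
    [
        ["h0_a", "h0_b"],
        ["h1_a", "h1_b"],
        pd.period_range("2000-01", periods=3, freq="M"),
    ],
    names=["h0_level", "h1_level", "time"],
)
y = pd.DataFrame({"c0": [float(i) for i in range(len(idx))]}, index=idx)

# 3 index levels: this is a hierarchical frame, not a plain pd.DataFrame,
# so code that assumes a flat pd.DataFrame mtype can break on it
print(y.index.nlevels)  # → 3
```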
> On a side note, I see a notification for changes requested but I don't see any actual change requests to the code. Am I missing something?
No, it is possible to make change requests without reference to code lines, e.g., in general questions like these.
Could you either post the code that reproduces the issue (assuming you have it handy), or add it directly as a test case (perhaps with a smaller example, maybe only 2 x 2 x 2 levels)?
Unless I'm misunderstanding something here, I think the error is a pretty simple fix. This test reproduces the error:

```python
import pandas as pd

from sktime.forecasting.compose import FallbackForecaster, ForecastByLevel
from sktime.forecasting.naive import NaiveForecaster
from sktime.utils._testing.hierarchical import _make_hierarchical


def test_forecastbylevel_nan_predict():
    df = _make_hierarchical(
        hierarchy_levels=(2, 2, 2),
        max_timepoints=10,
        min_timepoints=10,
        same_cutoff=True,
        n_columns=1,
        all_positive=True,
        index_type="period",
        random_state=0,
        add_nan=False,
    )
    # DummyForecaster is a test helper that predicts NaNs on purpose
    forecaster1 = DummyForecaster(raise_at=None, predict_nans=True)
    forecaster2 = NaiveForecaster()
    forecaster = ForecastByLevel(
        FallbackForecaster(
            [
                ("forecaster1_pred_nans", forecaster1),
                ("forecaster2_expected_", forecaster2),
            ],
            nan_predict_policy="raise",
        )
    )
    fh = [1, 2, 3]
    forecaster.fit(y=df, fh=fh)
    y_pred_actual = forecaster.predict()
    forecaster2.fit(y=df, fh=[1, 2, 3])
    y_pred_expected = forecaster2.predict()
    pd.testing.assert_frame_equal(y_pred_expected, y_pred_actual)
```
I think you are; the issue is that users can pass any of 10 data formats to forecasters, and they are converted if not internally supported. This PR fixes the issue for one additional format; the programmatic way would be to set the tag.
Here's another option: instead of the custom nan checking logic, we could use `check_is_mtype` with `return_metadata=["has_nans"]`. The expected mtype is in `self._y_mtype_last_seen`. This might be the simplest way to fix this?
Please let me know if this sounds confusing and I can try to add the fix. It's a few lines, given the discussion.
There are a few suggestions here; let me start with the earlier one.

Just to clarify, we expected a series or a dataframe here; the hierarchical dataframe is where issues arose. If we were to set the mtypes in init, as in your initial suggestion, would it look something like this?

```python
def __init__(self, forecasters, verbose=False, nan_predict_policy="ignore"):
    super().__init__()
    self.forecasters = forecasters
    self.current_forecaster_ = None
    self.current_name_ = None
    self.verbose = verbose
    self.nan_predict_policy = _check_nan_policy_option(nan_predict_policy)
    if self.nan_predict_policy != "ignore":
        allowed_mtypes = ["pd.DataFrame", "pd-multiindex", "pd_multiindex_hier"]
        self.set_tags(
            **dict(
                y_inner_mtype=allowed_mtypes,
                X_inner_mtype=allowed_mtypes,
            )
        )
```

I think I'm missing something, because it doesn't seem to change the behavior from my point of view, e.g., if I remove
Yes, that's how it would look. I think my last solution is better though, using `check_is_mtype`.
Which code did you expect not to execute?
I think I understand now. I was assuming that somewhere in the code, an exception would be raised if the wrong data type was passed in, e.g., within
Yes, exactly.
I think I get it. Would it look something like this?

```python
from sktime.datatypes import check_is_mtype


def _validate_y_pred(self, y_pred):
    if self.nan_predict_policy in ("warn", "raise"):
        last_mtype = self._y_mtype_last_seen
        expected_mtypes = ["pd.DataFrame", "pd-multiindex", "pd_multiindex_hier"]
        passed_mcheck = check_is_mtype(y_pred, expected_mtypes)
        if not passed_mcheck:
            msg = (
                "`nan_predict_policy` expects mtype data to be one of "
                f"{expected_mtypes} but instead got {last_mtype}. If you think "
                "FallbackForecaster's nan_predict_policy should be able to "
                "handle this datatype, raise an issue on sktime"
            )
            raise NotImplementedError(msg)
```

I'm not sure what's the best way to add a test case though. What other types of data are available to forecast with? I didn't see any obvious resource online to read more about it.
Not exactly; I was thinking to use the check to get the metadata field:

```python
from sktime.datatypes import check_is_mtype


def _validate_y_pred(self, y_pred):
    if self.nan_predict_policy in ("warn", "raise"):
        last_mtype = self._y_mtype_last_seen
        _, _, metadata = check_is_mtype(
            y_pred,
            mtype=last_mtype,
            return_metadata=["has_nans"],
        )
        has_nans = metadata["has_nans"]
```

Now you have a variable `has_nans`. You can now use this in further logic in place of the manual nan check. Noting, we know that the mtype should be `last_mtype`.
Perhaps. Could you kindly check which forecaster returns the unexpected type?
I think there's a bug either in

UPDATE: Found the bug; it was because I had left the following programmed into the init:

```python
def __init__(self, forecasters, verbose=False, nan_predict_policy="ignore"):
    super().__init__()
    self.forecasters = forecasters
    self.current_forecaster_ = None
    self.current_name_ = None
    self.verbose = verbose
    self.nan_predict_policy = _check_nan_policy_option(nan_predict_policy)
    if self.nan_predict_policy in ("ignore", "warn"):
        allowed_mtypes = [
            "pd.DataFrame", "pd-multiindex", "pd_multiindex_hier",
            # "pd.Series"  # This line was missing from the init
        ]
        self.set_tags(
            **dict(
                y_inner_mtype=allowed_mtypes,
                X_inner_mtype=allowed_mtypes,
            )
        )
```

I'm not very familiar with the handling mechanisms here. @fkiraly, would it be appropriate to remove this line and maintain the logic as described in your suggestion, i.e., by adding the `check_is_mtype` check?
Ah, of course, that was it: the two fixes should not be applied at the same time.

Yes, that is imo indeed the fix to the secondary bug: simply remove any additional logic from `__init__`.
Fails when nan_predict_policy = 'raise' or 'warn'. Revised nan check using `check_is_mtype` PR = sktime#6231
@fkiraly Any more updates needed on my end or are we good to go? I saw that the check failed earlier, but I wasn't sure if it was an actual issue with the code.
Let's see - I'll restart the tests.
👍
Fixes error that occurs when `FallbackForecaster` is combined with `ForecastByLevel` and `nan_predict_policy='raise'` #6230