[BUG] `test_predict_proba` is expecting column name of `pred_dist` to be `0` always. #5349
Comments
The `test_predict_proba` function in `sktime.forecasting.tests.test_all_forecasters.py` is expecting the column name of `pred_dist` to be `0` always.
I think this is related to #4709 and #4780. @Vasudeva-bit I think the issue is:

```python
if isinstance(y_train, pd.Series) and y_train.name:
    assert (pred_cols == y_train.name).all()
elif isinstance(y_train, pd.Series):  # automatically implies empty name
    assert (pred_cols == pd.Index([0])).all()
else:
    assert (pred_cols == y_train.columns).all()
```
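To see what the branching in this check expects, here is a minimal sketch on plain pandas objects. The helper `expected_pred_cols` and the sample data are hypothetical and are not part of sktime; they only mirror the three cases above.

```python
import pandas as pd


def expected_pred_cols(y_train):
    """Hypothetical helper: column labels the check above would expect."""
    if isinstance(y_train, pd.Series) and y_train.name:
        # named series: the name should be preserved
        return pd.Index([y_train.name])
    elif isinstance(y_train, pd.Series):
        # unnamed series: fall back to the default integer label
        return pd.Index([0])
    else:
        # DataFrame: all column labels should be preserved
        return y_train.columns


named = pd.Series([1.0, 2.0, 3.0], name="sales")
unnamed = pd.Series([1.0, 2.0, 3.0])
frame = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

print(list(expected_pred_cols(named)))    # ['sales']
print(list(expected_pred_cols(unnamed)))  # [0]
print(list(expected_pred_cols(frame)))    # ['a', 'b']
```

Note that the first branch relies on truthiness of `y_train.name`, so any falsy name (not only `None`) ends up in the second branch.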
Hm, I think not as closely as one might suspect. The two PRs and related design problems were restricted to interval and quantile forecasts. I think it is more related to "should we use The root of evil is really that depending on how you convert or coerce Unfortunately this also needs to be consistent in many places, conversions, back-conversions, etc. While the original problem - I hope - has been resolved, it seems to have reappeared with the
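As an illustration of how a conversion can change the name, here is a small sketch in plain pandas. It shows one way the label `0` can appear when an unnamed `pd.Series` is coerced to a `pd.DataFrame` and back; whether this is the exact path taken inside sktime's converters is an assumption.

```python
import pandas as pd

unnamed = pd.Series([1.0, 2.0, 3.0])           # name is None
named = pd.Series([1.0, 2.0, 3.0], name="y")   # name is "y"

# Coercing an unnamed series to a frame assigns the default label 0.
cols_to_frame = list(unnamed.to_frame().columns)
cols_ctor = list(pd.DataFrame(unnamed).columns)

# A named series keeps its name as the column label.
cols_named = list(named.to_frame().columns)

# Converting back does not restore None: the series is now named 0.
roundtrip_name = unnamed.to_frame().iloc[:, 0].name

print(cols_to_frame, cols_ctor, cols_named)
print(unnamed.name, roundtrip_name)
```

The round trip is not lossless for the unnamed case, which is why the expected label has to be fixed by convention and applied consistently.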
Yes, @yarnabrina, I think your test is the correct expectation. But it seems to me that it would be violated by many current forecasters, as @Vasudeva-bit has observed by accident. Also, in case it is not the current behaviour, we need to run it through a deprecation period to move it to the one that seems more logical - people might have adapted to it downstream, and we do not want to break their code.
A few tests are expecting name to be preserved even if it is Could we return forecasts as This way we can test for only one single type. And, while returning the predicted forecasts from
That is an interface decision we have already discussed a number of times, but then the question becomes, what do we do with other data containers, such as Not saying that it would be a bad choice, but it would be an exception from the "support multiple formats" that we currently have.
Oh, I get it. It's better not to treat them specially.
Having said that, the issue you highlighted originally seems to be an inconsistency we may like to fix, although it requires deprecation. Could you perhaps open a PR which has only your originally proposed fix (changes to test and arch), so we can use this as a starting point? I would use that to look into the problem (domino effects on other estimators) and build a deprecation strategy around it.
I opened a PR #5384 with the originally proposed fix.
Fixes #5349. Changes an `if` condition in the `_check_predict_proba` function of `sktime/forecasting/tests/test_all_forecasters.py` to preserve the column name in univariate forecaster predictions. With this fix, `ARCH` no longer requires its own `_predict_proba`, as `BaseForecaster`'s `_predict_proba` is sufficient, so that function is removed.
Related to #5349 (issue), #5384 (PR).

> After debugging for a while,
>
> 1. Firstly, the function `_check_predict_proba` is only testing for `y_train` generated from `_make_data(n_columns=n_columns)`. If the forecaster is univariate, then the `y_train` generated would always have the column name `None` of type `str`. The `_check_predict_proba` checks whether `_predict_proba` is changing `None` to `0` internally. Since every forecaster's `_predict_proba` is doing it, all forecasters passed.
> 2. Actually `ARCH` should fail this test, because the `_predict_proba` function is deleted in this PR. But the `type(y_train.name)` is `str` (`'None'`), not `None`. Therefore, the `None` case is skipped and the test checks whether `(pred_cols == y_train.name).all()`, which compares `None` of type `str` with `None` of type `str`, so `ARCH` also passed.
>
> I think it's my bad, there is no bug in `_check_predict_proba` as it is not designed to check whether column names are preserved. Changes should only be in `ARCH`'s `_predict_proba` like below:
>
> 1. If the column name is `None` of type `None`/`str`, do `pred_dist.name = pd.Index([0])`.
> 2. Otherwise, just return the output of `super()._predict_proba()` as it is, i.e., column names are preserved by default.

I think there was some ambiguity in my explanation for #5349 (issue) in #5384 (PR), but what I meant was exactly the changes in this PR.
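The difference between a missing name and the string `'None'` described above can be reproduced in plain pandas. This is a minimal sketch: the helper `takes_named_branch` is hypothetical, and the claim that `_make_data` produces the string `'None'` comes from the comment above and is not verified here.

```python
import pandas as pd

name_missing = pd.Series([1.0, 2.0])              # name is the None object
name_string = pd.Series([1.0, 2.0], name="None")  # name is the string 'None'


def takes_named_branch(y):
    """Hypothetical helper: True if the 'named series' branch of the check runs."""
    return isinstance(y, pd.Series) and bool(y.name)


print(repr(name_missing.name), takes_named_branch(name_missing))
print(repr(name_string.name), takes_named_branch(name_string))
```

The string `'None'` is truthy, so such a series is treated as named and the fallback to `0` is never exercised.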
Describe the bug
All the tests in `check_estimator` are expecting column names to be preserved, while a single test, the `_check_predict_proba` function invoked inside `test_predict_proba` in `sktime.forecasting.tests.test_all_forecasters.py`, is expecting the column name of `pred_dist`, i.e., the output of `predict_proba`, to be `0` always.
To Reproduce
location: here.
Expected behavior
The function mentioned above has to be modified such that it checks the column names of `y_train` and `pred_dist`, like the rest of the tests.
Additional context
Any changes in this test will affect existing univariate probabilistic forecasters such as `NaiveForecaster`. Therefore, fixing this issue also requires the necessary changes in all the univariate probabilistic forecasters.