[ENH] forecasting `evaluate` utility failing with quantile forecasts #5336
oh, I remember what this is about - this is related to our earlier discussion about the pinball loss capturing the alpha or coverage parameter for the predict method. We have not resolved that question, have we? The problem is that previously, there was internal hacky logic that passed parameters attached to the metric on to the evaluator, and that logic has now been removed with the improved, streamlined interface. That is, a call like the following used to work and no longer does (a fuller runnable sketch is included further below):

```python
evaluate(
    stuff,
    scoring=PinballLoss(alpha=[0.1, 0.5, 0.9]),
    more_stuff,
)
```

There are two remaining problems:
A Solomonic option would be:
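For reference, below is a fuller, runnable sketch of the call pattern quoted above. It is illustrative only - the dataset, forecaster, and splitter choices are not from the issue - and on versions affected by this issue it is exactly the kind of call that fails.

```python
from sktime.datasets import load_airline
from sktime.forecasting.model_evaluation import evaluate
from sktime.forecasting.model_selection import ExpandingWindowSplitter
from sktime.forecasting.naive import NaiveForecaster
from sktime.performance_metrics.forecasting.probabilistic import PinballLoss

y = load_airline()

# forecaster with native probabilistic predictions, so quantile forecasts are available
forecaster = NaiveForecaster(strategy="last")
cv = ExpandingWindowSplitter(initial_window=24, step_length=12, fh=[1, 2, 3])

# the alpha values attached to the metric are what evaluate needs to pass on
# to predict_quantiles for the backtest to work
results = evaluate(
    forecaster=forecaster,
    y=y,
    cv=cv,
    scoring=PinballLoss(alpha=[0.1, 0.5, 0.9]),
)
print(results)
```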
…onditional testing (#5337) This PR manually links one test in `test_interval_wrappers` that relies on `evaluate` to changes in `evaluate`, i.e., the respective test is run whenever code in `evaluate` changes. This is to prevent a future occurrence of #5336, i.e., changes to `evaluate` breaking dependent tests without the CI noticing. Optimally, the tests in `test_evaluate` would cover the case here, but that does not seem to be the status quo.
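A rough idea of what such a manual link can look like is sketched below. This is an illustrative pattern only, not the actual diff in #5337; the `CHANGED_MODULES` environment variable and the test name are hypothetical stand-ins for whatever mechanism the CI uses to decide which modules have changed.

```python
"""Illustrative sketch of tying a test to changes in evaluate (not the #5337 diff)."""
import os

import pytest

# hypothetical stand-in: CI is assumed to export a comma-separated list of changed modules
CHANGED_MODULES = os.environ.get("CHANGED_MODULES", "").split(",")

EVALUATE_MODULE = "sktime.forecasting.model_evaluation"


@pytest.mark.skipif(
    EVALUATE_MODULE not in CHANGED_MODULES,
    reason="test depends on evaluate, so it runs only when evaluate changes",
)
def test_interval_wrapper_with_evaluate():
    # placeholder body; the real test lives in test_interval_wrappers
    assert True
```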
…trics to `evaluate` (#5354) This PR ensures that the pre-existing syntax for passing `alpha` and `coverage` via metrics to `evaluate` works again, fixing #5336. Not commenting here on whether the status quo is a good idea or not (I think removing it was, or is, cleaner in the long run), but such a change should not happen without deprecation. Depends on #5337, so this change should trigger the test that is failing on `main`.
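For intuition, here is a rough sketch of the kind of forwarding logic this restores. It is illustrative only, not sktime's actual implementation; the helper name `_predict_for_metric` is made up, and the dispatch simply inspects the `alpha` / `coverage` attributes that probabilistic metrics such as `PinballLoss` carry.

```python
def _predict_for_metric(forecaster, fh, X, scoring):
    """Illustrative dispatch: pick the prediction method matching the metric.

    Hypothetical helper, not sktime internals - it only shows how parameters
    attached to the metric could be forwarded to probabilistic predict methods.
    """
    alpha = getattr(scoring, "alpha", None)  # e.g. PinballLoss(alpha=[0.1, 0.5, 0.9])
    coverage = getattr(scoring, "coverage", None)  # e.g. interval coverage metrics

    if alpha is not None:
        # quantile metrics: forward the metric's alpha to predict_quantiles
        return forecaster.predict_quantiles(fh=fh, X=X, alpha=alpha)
    if coverage is not None:
        # interval metrics: forward the metric's coverage to predict_interval
        return forecaster.predict_interval(fh=fh, X=X, coverage=coverage)
    # point forecast metrics: plain predict
    return forecaster.predict(fh=fh, X=X)
```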
solved here: #5354
There is a failure on `main` related to `evaluate` failing with quantile forecasts, specifically `test_evaluate_with_window_splitters`. Full diagnostic output can be seen here: https://github.com/sktime/sktime/actions/runs/6379899253/job/17313415803?pr=5083

As the full suite ran through with the 0.23.0 release, the regression has most likely been introduced by the only PR since then that changed the `evaluate` logic: #5192 (the test depends on probabilistic metrics and `evaluate`, and only one of the two has changed since 0.23.0).

The CI for #5192 did not detect this because we have conditional testing, and the failing test is not registered as triggered by changes in `evaluate`, unlike the tests in `test_evaluate`; instead it is registered as specific to the interval forecasting wrappers.

This also means that the `evaluate`-specific tests in `test_evaluate` - which run if `evaluate` is changed - did not cover the failure, while they should.

FYI @hazrulakmal