[ENH] forecasting evaluate utility failing with quantile forecasts #5336

Closed
fkiraly opened this issue Oct 2, 2023 · 3 comments
Labels
bug Something isn't working module:metrics&benchmarking metrics and benchmarking modules

@fkiraly
Collaborator

fkiraly commented Oct 2, 2023

There is a failure on `main` where `evaluate` fails with quantile forecasts, specifically in `test_evaluate_with_window_splitters`. Full diagnostic output can be seen here: https://github.com/sktime/sktime/actions/runs/6379899253/job/17313415803?pr=5083

As the full suite ran through with the 0.23.0 release, the regression was most likely introduced by the only PR since then that changed the `evaluate` logic: #5192 (the test depends on probabilistic metrics and on `evaluate`, and only one of the two has changed since 0.23.0).

The CI for #5192 did not detect this because we use conditional testing: the failing test is not registered to be triggered by changes in `evaluate`, unlike the tests in `test_evaluate`; instead, it is registered as specific to the interval forecasting wrappers.

This also means that the `evaluate`-specific tests in `test_evaluate` - which do run when `evaluate` is changed - did not cover the failure, although they should.

FYI @hazrulakmal

@fkiraly fkiraly added enhancement Adding new functionality module:metrics&benchmarking metrics and benchmarking modules labels Oct 2, 2023
@fkiraly fkiraly changed the title [ENH] evaluate failing with quantile forecasts [ENH] forecasting evaluate utility failing with quantile forecasts Oct 2, 2023
@fkiraly fkiraly added bug Something isn't working and removed enhancement Adding new functionality labels Oct 2, 2023
@fkiraly fkiraly added this to Needs triage & validation in Bugfixing via automation Oct 2, 2023
@fkiraly fkiraly moved this from Needs triage & validation to Investigating in Bugfixing Oct 2, 2023
@fkiraly
Collaborator Author

fkiraly commented Oct 2, 2023

Oh, I remember what this is about - this is related to our earlier discussion about the pinball loss capturing the alpha or coverage parameter for the predict method. We have not resolved that question, have we?

The problem is that previously there was internal, hacky logic that passed parameters attached to the metric on to the evaluator; this has now been removed with the improved, streamlined interface.

That is, the following used to work and no longer does, as the alpha values are now ignored:

evaluate(
    stuff,
    scoring=PinballLoss(alpha=[0.1, 0.5, 0.9]),
    more_stuff,
)
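
For concreteness, a minimal runnable sketch of what such a call looks like, assuming the public API around sktime 0.23; the forecaster, splitter, and dataset below are illustrative stand-ins for `stuff` / `more_stuff`, not the setup from the failing test:

```python
from sktime.datasets import load_airline
from sktime.forecasting.model_evaluation import evaluate
from sktime.forecasting.model_selection import ExpandingWindowSplitter
from sktime.forecasting.theta import ThetaForecaster
from sktime.performance_metrics.forecasting.probabilistic import PinballLoss

y = load_airline()
cv = ExpandingWindowSplitter(initial_window=72, step_length=24, fh=[1, 2, 3])

results = evaluate(
    forecaster=ThetaForecaster(sp=12),
    cv=cv,
    y=y,
    # the alpha attached to the metric used to be forwarded to predict_quantiles
    scoring=PinballLoss(alpha=[0.1, 0.5, 0.9]),
)
print(results)
```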

There are two remaining problems:

  • we can now no longer pass alpha via the metric to `evaluate`; we need to find a syntax before the next release
  • the matter of deprecating the functionality, in case we decide to change the syntax

@fkiraly
Collaborator Author

fkiraly commented Oct 2, 2023

A Solomonic option would be:

  • add another argument to `evaluate`, or deal with the matter in another, sustainable way
  • if the metric has a suitable parameter or attribute, read it from there, until a possible deprecation (see the sketch after this list)
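
A rough illustration of the second bullet - this is not sktime's actual implementation, just one possible shape of the fallback, with hypothetical helper names:

```python
# Hypothetical helper, not sktime internals: read quantile/coverage levels off
# the metric if it carries them, so the old-style calls keep working until a
# deprecation decision is made.
def _pred_args_from_metric(scoring):
    """Collect kwargs for predict_quantiles/predict_interval from the metric."""
    pred_args = {}
    alpha = getattr(scoring, "alpha", None)
    coverage = getattr(scoring, "coverage", None)
    if alpha is not None:
        pred_args["alpha"] = alpha
    if coverage is not None:
        pred_args["coverage"] = coverage
    return pred_args


# inside evaluate, roughly:
# y_pred = forecaster.predict_quantiles(fh=fh, **_pred_args_from_metric(scoring))
```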

fkiraly added a commit that referenced this issue Oct 5, 2023
…onditional testing (#5337)

This PR manually links one test in `test_interval_wrappers` that relies on
`evaluate` to changes in `evaluate`, i.e., it ensures that the respective test
is run when code in `evaluate` changes.

This is to prevent a future occurrence of
#5336, i.e., regressions from improvements to
`evaluate` going undetected.

Optimally, the tests in `test_evaluate` would cover the case here, but
that does not seem to be the status quo.
fkiraly added a commit that referenced this issue Oct 5, 2023
…trics to `evaluate` (#5354)

This PR ensures that the pre-existing syntax for passing `alpha` and `coverage`
via metrics to `evaluate` works again, fixing
#5336.

I am not commenting here on whether the status quo is a good idea or not (I
think removing it was, or is, cleaner in the long run), but such a
change should not happen without deprecation.

Depends on #5337, so this change
should trigger the test that is failing on `main`.
@fkiraly
Collaborator Author

fkiraly commented Dec 29, 2023

solved here: #5354

@fkiraly fkiraly closed this as completed Dec 29, 2023
Bugfixing automation moved this from Investigating to Fixed/resolved Dec 29, 2023