[ENH] forecasting evaluate utility failing with quantile forecasts #5336

Closed
fkiraly opened this issue Oct 2, 2023 · 3 comments
Labels
bug Something isn't working module:metrics&benchmarking metrics and benchmarking modules

@fkiraly
Collaborator

fkiraly commented Oct 2, 2023

There is a failure on `main` where `evaluate` fails with quantile forecasts, specifically in `test_evaluate_with_window_splitters`. Full diagnostic output can be seen here: https://github.com/sktime/sktime/actions/runs/6379899253/job/17313415803?pr=5083

As the full suite ran through with the 0.23.0 release, the regression was most likely introduced by the only PR since then that changed the `evaluate` logic: #5192 (the test depends on probabilistic metrics and on `evaluate`, and only one of the two has changed since 0.23.0).

The CI for #5192 did not detect this because we use conditional testing: the failing test is not registered to be triggered by changes in `evaluate`, unlike the tests in `test_evaluate`; instead, it is registered as specific to the interval forecasting wrappers.

This also means that the `evaluate`-specific tests in `test_evaluate` - which do run when `evaluate` is changed - did not cover the failure, although they should.

FYI @hazrulakmal

@fkiraly fkiraly added enhancement Adding new functionality module:metrics&benchmarking metrics and benchmarking modules labels Oct 2, 2023
@fkiraly fkiraly changed the title [ENH] evaluate failing with quantile forecasts [ENH] forecasting evaluate utility failing with quantile forecasts Oct 2, 2023
@fkiraly fkiraly added bug Something isn't working and removed enhancement Adding new functionality labels Oct 2, 2023
@fkiraly fkiraly added this to Needs triage & validation in Bugfixing via automation Oct 2, 2023
@fkiraly fkiraly moved this from Needs triage & validation to Investigating in Bugfixing Oct 2, 2023
@fkiraly
Collaborator Author

fkiraly commented Oct 2, 2023

Oh, I remember what this is about - this is related to our earlier discussion about the pinball loss capturing the alpha or coverage parameter for the predict method. We have not resolved that question, have we?

The problem is that previously there was internal, hacky logic that passed parameters attached to the metric on to the evaluator; this has now been removed with the improved, streamlined interface.

That is, the following used to work and no longer does, as the alpha values are now ignored:

evaluate(
    stuff,
    scoring=PinballLoss(alpha=[0.1, 0.5, 0.9]),
    more_stuff,
)
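
For concreteness, a minimal runnable sketch of what such a call looks like, assuming the public API around sktime 0.23; the forecaster, splitter, and dataset below are illustrative stand-ins for `stuff` / `more_stuff`, not the setup from the failing test:

```python
from sktime.datasets import load_airline
from sktime.forecasting.model_evaluation import evaluate
from sktime.forecasting.model_selection import ExpandingWindowSplitter
from sktime.forecasting.theta import ThetaForecaster
from sktime.performance_metrics.forecasting.probabilistic import PinballLoss

y = load_airline()
cv = ExpandingWindowSplitter(initial_window=72, step_length=24, fh=[1, 2, 3])

results = evaluate(
    forecaster=ThetaForecaster(sp=12),
    cv=cv,
    y=y,
    # the alpha attached to the metric used to be forwarded to predict_quantiles
    scoring=PinballLoss(alpha=[0.1, 0.5, 0.9]),
)
print(results)
```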

There are two remaining problems:

  • we can now no longer pass alpha via the metric to `evaluate`; we need to find a syntax before the next release
  • the matter of deprecating the functionality, in case we decide to change the syntax

@fkiraly
Collaborator Author

fkiraly commented Oct 2, 2023

A Solomonic option would be:

  • add another argument to `evaluate`, or deal with the matter in another, sustainable way
  • if the metric has a suitable parameter or attribute, read it from there, until a possible deprecation (see the sketch after this list)
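
A rough illustration of the second bullet - this is not sktime's actual implementation, just one possible shape of the fallback, with hypothetical helper names:

```python
# Hypothetical helper, not sktime internals: read quantile/coverage levels off
# the metric if it carries them, so the old-style calls keep working until a
# deprecation decision is made.
def _pred_args_from_metric(scoring):
    """Collect kwargs for predict_quantiles/predict_interval from the metric."""
    pred_args = {}
    alpha = getattr(scoring, "alpha", None)
    coverage = getattr(scoring, "coverage", None)
    if alpha is not None:
        pred_args["alpha"] = alpha
    if coverage is not None:
        pred_args["coverage"] = coverage
    return pred_args


# inside evaluate, roughly:
# y_pred = forecaster.predict_quantiles(fh=fh, **_pred_args_from_metric(scoring))
```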

fkiraly added a commit that referenced this issue Oct 5, 2023
…onditional testing (#5337)

This PR manually links one test in `test_interval_wrappers` that relies on
`evaluate` to changes in `evaluate`, i.e., it ensures that the respective test
is run when code in `evaluate` changes.

This is to prevent a future occurrence of
#5336, i.e., regressions from improvements to
`evaluate` going undetected.

Optimally, the tests in `test_evaluate` would cover the case here, but
that does not seem to be the status quo.
fkiraly added a commit that referenced this issue Oct 5, 2023
…trics to `evaluate` (#5354)

This PR ensures that the pre-existing syntax for passing `alpha` and `coverage`
via metrics to `evaluate` works again, fixing
#5336.

I am not commenting here on whether the status quo is a good idea or not (I
think removing it was, or is, cleaner in the long run), but such a
change should not happen without deprecation.

Depends on #5337, so this change
should trigger the test that is failing on `main`.
@fkiraly
Collaborator Author

fkiraly commented Dec 29, 2023

solved here: #5354

@fkiraly fkiraly closed this as completed Dec 29, 2023
Bugfixing automation moved this from Investigating to Fixed/resolved Dec 29, 2023