[ENH] rewrite test_probabilistic_metrics using proper pytest fixtures #4946
Conversation
…tests. Reason: if all test combinations are run in the same function, the computation takes too long, since there are more than 150 combinations.
…edictions, and one for interval predictions. Reason: the original functions had 96 parameter combinations and hence 96 tests. Also, quantiles and intervals have their own dedicated metrics, so cross-combinations are unnecessary. The separation speeds up the test suite considerably.
NICE! This speeds up test collection time massively for me.
Minor but important thing: linting is failing.
See here for linting & precommit:
https://www.sktime.net/en/latest/developer_guide/coding_standards.html
Hm, I think they should work at least the one way round: quantile metrics should be applicable to interval forecasts. This makes mathematical sense, since interval upper/lower bounds define quantile forecasts. If this was tested before and didn't raise errors, it means it's actually implemented that way.
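To make the mathematical point concrete: a symmetric prediction interval with coverage c is exactly the pair of quantile forecasts at alpha = (1 - c)/2 and (1 + c)/2. A minimal sketch in plain pandas (the flat column layout here is an illustrative simplification, not sktime's actual MultiIndex prediction format):

```python
import pandas as pd

# A symmetric interval with coverage c corresponds to the quantile
# forecasts at alpha = (1 - c) / 2 and alpha = (1 + c) / 2.
coverage = 0.9
alpha_lower = round((1 - coverage) / 2, 3)  # 0.05
alpha_upper = round((1 + coverage) / 2, 3)  # 0.95

# interval forecast: one lower/upper bound per time point
interval_pred = pd.DataFrame({"lower": [1.0, 1.2], "upper": [3.0, 3.4]})

# the same numbers reinterpreted as quantile forecasts, so a quantile
# metric (e.g. a pinball loss) can be applied to them directly
quantile_pred = interval_pred.rename(
    columns={"lower": alpha_lower, "upper": alpha_upper}
)
print(quantile_pred.columns.tolist())  # [0.05, 0.95]
```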
ok, then I'd leave the original big function in the hope that the test server can handle it :)
Thanks!
On details regarding what exactly to test, I think we ought to revisit this anyway with a class inheriting from `TestAllObjects`, so I'd not spend too much effort on the details of what is tested.
As far as I can see, your code is a 1:1 translation.
@fkiraly The tests finish green on my machine. However, both take an hour to complete. This should be fine, but if you're looking for performance improvements in the future, that might be the place to look :)
There's a memory error on the test runner as well. I guess the vectorized tests from before (combining several data sets into one vector) were more memory efficient, though harder to read, debug, and change. Is there a way that we can slim down the test combinations?
This is unrelated to your changes - it is an instance of this sporadic error
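As an aside on the earlier question of slimming down the combinations, one option, sketched here as a suggestion rather than anything the PR implements, is to tag the heaviest parameter combinations with a custom marker so CI can deselect them:

```python
import pytest

# dataset names are hypothetical placeholders
COMBOS = [
    pytest.param("small_univariate_series", id="small"),
    # heavy cases stay available but can be skipped via `-m "not slow"`
    # (the `slow` marker must be registered in the pytest config)
    pytest.param("large_hierarchical_panel", marks=pytest.mark.slow, id="large"),
]


@pytest.mark.parametrize("dataset_name", COMBOS)
def test_metric_on_dataset(dataset_name):
    assert isinstance(dataset_name, str)
```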
Reference Issues/PRs
Fixes #4907
What does this implement/fix? Explain your changes.
Moved the sample data generation inside fixtures, so it doesn't run with each import, but only when testing is performed.
Moreover, I refactored the tests such that the output tests for interval predictions and quantile predictions are separated.
Reason: with only one parametrized test there were 96 parameter combinations, which made the test very slow and CPU-heavy.
Also, some parameter combinations are unnecessary (quantile metrics apply to quantile forecasts only, and interval metrics to interval forecasts only) - please correct me if I'm wrong.
Separating the tests speeds up the testing process considerably and makes debugging easier.
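A minimal sketch of the fixture pattern described above; the fixture name matches the PR's `sample_data`, but the body shown here (shapes, seed, series content) is illustrative, not the PR's actual code:

```python
import numpy as np
import pandas as pd
import pytest


@pytest.fixture(scope="module")
def sample_data():
    """Build sample series lazily, only when a test requests the fixture.

    Because this runs at test time (once per module, given the scope),
    importing the test module no longer triggers data generation.
    """
    rng = np.random.default_rng(42)
    return pd.Series(rng.normal(size=50))


def test_sample_data(sample_data):
    # sanity check on the fixture output, analogous to the PR's
    # test_sample_data()
    assert isinstance(sample_data, pd.Series)
    assert len(sample_data) == 50
```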
Does your contribution introduce a new dependency? If yes, which one?
No
What should a reviewer concentrate their feedback on?
I left the original `test_output` with the many parameter combinations commented out, in a working and refactored state. Decide if you still need it and, if not, delete the commented-out function.
Did you add any tests for the change?
Added the test `test_sample_data()` to check that the sample-data-generating fixture `sample_data()` works correctly.
Any other comments?
The output test function contains many nested if/else statements. In my opinion, these should be separate tests with separate parametrized inputs, instead of putting all inputs into one test and then distinguishing between them with if/else statements.
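For illustration, the suggested restructuring could look roughly like this; the metric lists are placeholders, not the actual test module contents:

```python
import pytest

# placeholder lists; the real suite would hold sktime metric instances
QUANTILE_METRICS = ["PinballLoss"]
INTERVAL_METRICS = ["EmpiricalCoverage", "ConstraintViolation"]


# Instead of one test that branches on the prediction type with nested
# if/else, each prediction type gets its own parametrized test.
@pytest.mark.parametrize("metric", QUANTILE_METRICS)
def test_quantile_metric_output(metric):
    # quantile metrics are only paired with quantile predictions
    assert metric in QUANTILE_METRICS


@pytest.mark.parametrize("metric", INTERVAL_METRICS)
def test_interval_metric_output(metric):
    # interval metrics are only paired with interval predictions
    assert metric in INTERVAL_METRICS
```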
PR checklist
For all contributions
How to: add yourself to the all-contributors file in the `sktime` root directory (not the `CONTRIBUTORS.md`). Common badges:
- `code` - fixing a bug, or adding code logic.
- `doc` - writing or improving documentation or docstrings.
- `bug` - reporting or diagnosing a bug (get this plus `code` if you also fixed the bug in the PR).
- `maintenance` - CI, test framework, release.
See here for full badge reference.
See here for further details on the algorithm maintainer role.
For new estimators
- Added to the API reference in `docs/source/api_reference/taskname.rst`, following the existing pattern.
- Added usage examples in the docstring's Examples section.
- Set the `python_dependencies` tag and ensured dependency isolation; see the estimator dependencies guide.