
[ENH] rewrite test_probabilistic_metrics using proper pytest fixtures #4946

Merged
merged 10 commits into sktime:main on Jul 24, 2023

Conversation

julia-kraus (Contributor) commented Jul 23, 2023

Reference Issues/PRs

Fixes #4907

What does this implement/fix? Explain your changes.

Moved the sample data generation inside fixtures, so it doesn't run on every import, but only when the tests are executed.
Moreover, I refactored the tests so that the output tests for interval predictions and quantile predictions are separated.

Reason: with a single parametrized test there were 96 parameter combinations, which made the test very slow and CPU heavy.
Also, some parameter combinations are not necessary (quantile metrics belong to quantile forecasts only and interval metrics only to interval forecasts) - please correct me if I'm wrong.
Separating the tests speeds up the testing process considerably and makes debugging easier.
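A hedged sketch of the fixture-based layout described above (the fixture body and test names here are illustrative placeholders, not the actual sktime test code):

```python
import numpy as np
import pandas as pd
import pytest


@pytest.fixture
def sample_data():
    """Build the sample ground-truth series lazily, only when a test requests it."""
    rng = np.random.default_rng(42)
    index = pd.period_range(start="2020-01", periods=12, freq="M")
    return pd.Series(rng.normal(size=12), index=index)


def test_output_interval_predictions(sample_data):
    """Output checks for interval predictions, kept separate from quantile checks."""
    ...  # evaluate interval metrics on interval forecasts of sample_data


def test_output_quantile_predictions(sample_data):
    """Output checks for quantile predictions, kept separate from interval checks."""
    ...  # evaluate quantile metrics on quantile forecasts of sample_data
```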

Does your contribution introduce a new dependency? If yes, which one?

No

What should a reviewer concentrate their feedback on?

I left the original test_output function, with its many parameter combinations, commented out in a working and refactored state.
Please decide whether you still need it; if not, delete the commented-out function.

Did you add any tests for the change?

Added the test test_sample_data() to check that the sample-data-generating fixture sample_data() works correctly.
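For illustration, a minimal hedged sketch of such a check, assuming a fixture shaped like the one sketched above (not the actual sktime code):

```python
import pandas as pd


def test_sample_data(sample_data):
    """Sanity-check the data-generating fixture before it feeds the metric tests."""
    assert isinstance(sample_data, pd.Series)
    assert len(sample_data) > 0
    assert sample_data.notna().all()
```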

Any other comments?

The output test function contains many nested if-else statements. In my opinion, these should be separate tests with separately parametrized inputs, instead of putting all inputs into one test and then distinguishing cases with if-else statements.
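As a hedged illustration of that suggestion (the metric names and parameter values below are placeholders, not the actual sktime test parametrization):

```python
import pytest

# Each former if-else branch becomes its own parametrized case, so a failure
# points directly at the offending metric/input combination.
CASES = [
    ("pinball_loss", True),        # placeholder quantile metric, score_average=True
    ("pinball_loss", False),       # placeholder quantile metric, score_average=False
    ("empirical_coverage", True),  # placeholder interval metric
]


@pytest.mark.parametrize("metric_name, score_average", CASES)
def test_metric_output(metric_name, score_average, sample_data):
    ...  # one assertion path per case, no nested if-else needed
```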

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the sktime root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
    See here for full badge reference
  • Optionally, I've added myself and possibly others to the CODEOWNERS file - do this if you want to become the owner or maintainer of an estimator you added.
    See here for further details on the algorithm maintainer role.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured dependency isolation, see the estimator dependencies guide.

…tests

reason: If all test combinations are run in the same function, the computation takes too long, because there are more than 150 combinations
…edictions, and one for interval predictions

Reason: The original functions had 96 parameter combinations and hence 96 tests. Also, quantiles and intervals have their dedicated metrics, so cross-combinations are unnecessary. The separation speeds up the test suite considerably.
fkiraly (Collaborator) left a comment


NICE! This speeds up test collection time massively for me.

Minor but important thing - linting is failing.

See here for linting & precommit:
https://www.sktime.net/en/latest/developer_guide/coding_standards.html

fkiraly (Collaborator) commented Jul 23, 2023

Also, some parameter combinations are not necessary (quantile metrics belong to quantile forecast only and interval metrics only to interval forecast) - please correct me if I'm wrong.

Hm, I think they should work at least one way round: quantile metrics should be applicable to interval forecasts - this makes mathematical sense, since the interval upper/lower bounds define quantile forecasts. If this was tested before and didn't raise errors, it means it's actually implemented that way.
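For context, a hedged sketch of that correspondence (plain Python arithmetic, not sktime code):

```python
# A symmetric prediction interval at nominal coverage c corresponds to the pair
# of quantile forecasts at alpha = (1 - c) / 2 and alpha = (1 + c) / 2, so the
# interval's lower/upper bounds can in principle be scored with quantile metrics.
coverage = 0.90
alpha_lower = (1 - coverage) / 2  # 0.05 -> lower interval bound
alpha_upper = (1 + coverage) / 2  # 0.95 -> upper interval bound
print(alpha_lower, alpha_upper)   # 0.05 0.95 (up to floating-point rounding)
```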

julia-kraus (Contributor, Author)

OK, then I'd leave the original big function, in the hope that the test server can handle it :)
Thumbs up?

fkiraly previously approved these changes Jul 24, 2023

fkiraly (Collaborator) left a comment


Thanks!

On details regarding what exactly to test, I think we ought to revisit this anyway with a class inheriting from TestAllObjects, so I'd not spend too much effort on the details of what is tested.

As far as I can see, your code is a 1:1 translation.

fkiraly added the labels module:tests (test framework functionality - only framework, excl specific tests), enhancement (Adding new functionality), and module:metrics&benchmarking (metrics and benchmarking modules) on Jul 24, 2023
run all_metrics for test of both interval and quantile data
julia-kraus (Contributor, Author)

@fkiraly
So I kept the two separate test functions for intervals and quantiles, but now run all metrics on both.

The tests finish green on my machine. However, both take an hour to complete. This should be fine, but if you're looking for performance improvements in the future, that might be the place to look :)

julia-kraus (Contributor, Author) commented Jul 24, 2023

There's a memory error on the test runner as well. I guess the vectorized tests from before (combining several data sets into one vector) were more memory efficient, though harder to read, debug, and change. Is there a way we can slim down the test combinations?

fkiraly (Collaborator) commented Jul 24, 2023

There's a memory error on the test runner as well.. I guess the vectorized tests from before

This is unrelated to your changes - it is an instance of the sporadic error #4610, which we haven't managed to properly diagnose yet.

fkiraly merged commit f4815d7 into sktime:main on Jul 24, 2023
24 checks passed