
[ENH] rewrite test_probabilistic_metrics using proper pytest fixtures #4946

Merged
merged 10 commits into sktime:main on Jul 24, 2023

Conversation

julia-kraus (Contributor) commented Jul 23, 2023

Reference Issues/PRs

Fixes #4907

What does this implement/fix? Explain your changes.

Moved the sample data generation inside fixtures, so it doesn't run on every import, but only when the tests are executed.
Moreover, I refactored the tests so that the output tests for interval predictions and quantile predictions are separated.

Reason: with a single parametrized test there were 96 parameter combinations, which made the test very slow and CPU heavy.
Also, some parameter combinations are not necessary (quantile metrics belong to quantile forecasts only and interval metrics only to interval forecasts) - please correct me if I'm wrong.
Separating the tests speeds up the testing process considerably and makes debugging easier.
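A hedged sketch of the fixture-based layout described above (the fixture body and test names here are illustrative placeholders, not the actual sktime test code):

```python
import numpy as np
import pandas as pd
import pytest


@pytest.fixture
def sample_data():
    """Build the sample ground-truth series lazily, only when a test requests it."""
    rng = np.random.default_rng(42)
    index = pd.period_range(start="2020-01", periods=12, freq="M")
    return pd.Series(rng.normal(size=12), index=index)


def test_output_interval_predictions(sample_data):
    """Output checks for interval predictions, kept separate from quantile checks."""
    ...  # evaluate interval metrics on interval forecasts of sample_data


def test_output_quantile_predictions(sample_data):
    """Output checks for quantile predictions, kept separate from interval checks."""
    ...  # evaluate quantile metrics on quantile forecasts of sample_data
```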

Does your contribution introduce a new dependency? If yes, which one?

No

What should a reviewer concentrate their feedback on?

I left the original test_output function, with its many parameter combinations, commented out in a working and refactored state.
Please decide whether you still need it; if not, delete the commented-out function.

Did you add any tests for the change?

Added the test test_sample_data() to check that the sample-data-generating fixture sample_data() works correctly.
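For illustration, a minimal hedged sketch of such a check, assuming a fixture shaped like the one sketched above (not the actual sktime code):

```python
import pandas as pd


def test_sample_data(sample_data):
    """Sanity-check the data-generating fixture before it feeds the metric tests."""
    assert isinstance(sample_data, pd.Series)
    assert len(sample_data) > 0
    assert sample_data.notna().all()
```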

Any other comments?

The output test function contains many nested if-else statements. In my opinion, these should be separate tests with separately parametrized inputs, instead of putting all inputs into one test and then distinguishing cases with if-else statements.
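As a hedged illustration of that suggestion (the metric names and parameter values below are placeholders, not the actual sktime test parametrization):

```python
import pytest

# Each former if-else branch becomes its own parametrized case, so a failure
# points directly at the offending metric/input combination.
CASES = [
    ("pinball_loss", True),        # placeholder quantile metric, score_average=True
    ("pinball_loss", False),       # placeholder quantile metric, score_average=False
    ("empirical_coverage", True),  # placeholder interval metric
]


@pytest.mark.parametrize("metric_name, score_average", CASES)
def test_metric_output(metric_name, score_average, sample_data):
    ...  # one assertion path per case, no nested if-else needed
```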

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the sktime root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
    See here for full badge reference
  • Optionally, I've added myself and possibly others to the CODEOWNERS file - do this if you want to become the owner or maintainer of an estimator you added.
    See here for further details on the algorithm maintainer role.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured dependency isolation, see the estimator dependencies guide.

…tests

reason: If all test combinations are run in the same function, the computation takes too long, because there are more than 150 combinations
…edictions, and one for interval predictions

Reason: The original functions had 96 parameter combinations and hence 96 tests. Also, quantiles and intervals have their dedicated metrics, so cross-combinations are unnecessary. The separation speeds up the test suite considerably.
fkiraly (Collaborator) left a comment


NICE! This speeds up test collection time massively for me.

Minor but important thing - linting is failing.

See here for linting & precommit:
https://www.sktime.net/en/latest/developer_guide/coding_standards.html

fkiraly (Collaborator) commented Jul 23, 2023

Also, some parameter combinations are not necessary (quantile metrics belong to quantile forecast only and interval metrics only to interval forecast) - please correct me if I'm wrong.

Hm, I think they should work at least one way round: quantile metrics should be applicable to interval forecasts - this makes mathematical sense, since the interval upper/lower bounds define quantile forecasts. If this was tested before and didn't raise errors, it means it's actually implemented that way.
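For context, a hedged sketch of that correspondence (plain Python arithmetic, not sktime code):

```python
# A symmetric prediction interval at nominal coverage c corresponds to the pair
# of quantile forecasts at alpha = (1 - c) / 2 and alpha = (1 + c) / 2, so the
# interval's lower/upper bounds can in principle be scored with quantile metrics.
coverage = 0.90
alpha_lower = (1 - coverage) / 2  # 0.05 -> lower interval bound
alpha_upper = (1 + coverage) / 2  # 0.95 -> upper interval bound
print(alpha_lower, alpha_upper)   # 0.05 0.95 (up to floating-point rounding)
```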

julia-kraus (Contributor, Author)

OK, then I'd leave the original big function, in the hope that the test server can handle it :)
Thumbs up?

fkiraly previously approved these changes Jul 24, 2023

fkiraly (Collaborator) left a comment


Thanks!

On details regarding what exactly to test, I think we ought to revisit this anyway with a class inheriting from TestAllObjects, so I'd not spend too much effort on the details of what is tested.

As far as I can see, your code is a 1:1 translation.

fkiraly added the labels module:tests (test framework functionality - only framework, excl specific tests), enhancement (Adding new functionality), and module:metrics&benchmarking (metrics and benchmarking modules) on Jul 24, 2023
run all_metrics for test of both interval and quantile data
julia-kraus (Contributor, Author)

@fkiraly
So I kept the two separate test functions for intervals and quantiles, but now run all metrics on both.

The tests finish green on my machine. However, both take an hour to complete. This should be fine, but if you're looking for performance improvements in the future, that might be the place to look :)

julia-kraus (Contributor, Author) commented Jul 24, 2023

There's a memory error on the test runner as well. I guess the vectorized tests from before (combining several data sets into one vector) were more memory efficient, though harder to read, debug, and change. Is there a way we can slim down the test combinations?

fkiraly (Collaborator) commented Jul 24, 2023

There's a memory error on the test runner as well.. I guess the vectorized tests from before

This is unrelated to your changes - it is an instance of the sporadic error #4610, which we haven't managed to properly diagnose yet.

fkiraly merged commit f4815d7 into sktime:main on Jul 24, 2023
24 checks passed