[BUG] logic problem with new CI - coverage of TestAllEstimators and TestAllObjects incorrect #6352

Closed
fkiraly opened this issue Apr 28, 2024 · 16 comments · Fixed by #6353
Labels: bug (Something isn't working), module:tests (test framework functionality - only framework, excl specific tests)

Comments

@fkiraly
Collaborator

fkiraly commented Apr 28, 2024

I think there is a bug in the new CI. The coverage is not affected since "old CI" is still running.

The issue occurs when, say, a forecaster changes.

The new CI will then trigger tests in sktime.forecasting, but not in sktime.tests, which contains TestAllEstimators.
So, we run the forecaster-specific tests for the changed forecaster, but not the generic object and estimator tests.

FYI @yarnabrina, is there a way to fix this?
The "one-liner" to "run all tests for estimator X" is check_estimator - however, it was intentionally avoided as an entry point in the original setup, since it distributes less well across workers via pytest-xdist.
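For context, this is the convenience entry point referred to above; a minimal usage sketch, assuming NaiveForecaster as the changed estimator (argument names may differ slightly across sktime versions):

from sktime.forecasting.naive import NaiveForecaster
from sktime.utils.estimator_checks import check_estimator

# runs the generic test suite (TestAllObjects, TestAllEstimators, forecaster tests)
# for this single estimator, in one process - no distribution across pytest-xdist workers
results = check_estimator(NaiveForecaster, raise_exceptions=False)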

@fkiraly added the labels bug (Something isn't working) and module:tests (test framework functionality - only framework, excl specific tests) on Apr 28, 2024
@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

The "opposite case" is where there are changes in "other" modules, as well as in a forecaster. The TestAllEstimators test for that forecaster then runs in the "other" test job, not in the "module/forecasters" bucket:
https://github.com/sktime/sktime/actions/runs/8861405811/job/24333294008?pr=6348

@yarnabrina
Member

I just commented on Discord as well - this job is not coming from the "other" case in the new CI. It is coming from the old CI, which relies on the Makefile; specifically, this line is called to run the tests:

run: make test_without_datasets
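
As a hypothetical approximation (not the actual Makefile content), that target expands to a plain pytest call over the package, excluding the dataset module:

# hypothetical approximation of the Makefile target, for illustration only
test_without_datasets:
	python -m pytest sktime --ignore sktime/datasets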

The new CI jobs are instead invoked by this line:

run: >-
python3
-m pytest
sktime
--ignore sktime/base
--ignore sktime/datasets
--ignore sktime/alignment
--ignore sktime/annotation
--ignore sktime/classification
--ignore sktime/clustering
--ignore sktime/forecasting
--ignore sktime/networks
--ignore sktime/param_est
--ignore sktime/regression
--ignore sktime/transformations
--matrixdesign True
--only_changed_modules True
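
For readers unfamiliar with the flags above: --matrixdesign and --only_changed_modules are not built-in pytest options but custom ones registered in sktime's conftest.py. A minimal sketch of how such flags are typically wired up (the helper _is_in_changed_module and the defaults are illustrative, not sktime's actual implementation):

# conftest.py - minimal sketch of registering custom pytest flags
def pytest_addoption(parser):
    parser.addoption("--matrixdesign", default=False, help="split tests across matrix jobs")
    parser.addoption("--only_changed_modules", default=False, help="restrict collection to changed modules")


def _is_in_changed_module(item):
    # hypothetical placeholder: real logic would compare the test's module against the PR diff
    return True


def pytest_collection_modifyitems(config, items):
    if config.getoption("--only_changed_modules"):
        # keep only tests whose module was touched in the PR
        items[:] = [item for item in items if _is_in_changed_module(item)]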

@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

Sorry, wrong link:
https://github.com/sktime/sktime/actions/runs/8861405811/job/24334091449?pr=6348

This is coming from "other" in new CI, screenshot:
[screenshot of the CI job overview]

@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

A solution could be to also run the sktime/tests module in all per-module test jobs; that would at least ensure that the estimator/object tests are run for changed forecasters.

We cannot turn them off in "other", as estimators may also live in "other" modules. But the overlap will only occur if an "other" estimator and a "module" estimator are changed in the same PR, so the incidence is probably low.
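
Concretely, each per-module job would then invoke something like the following (a sketch of the intended change, mirroring the flags of the existing jobs; the exact diff is in the linked PR):

python3 -m pytest sktime/forecasting sktime/tests --matrixdesign True --only_changed_modules True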

@yarnabrina
Member

I've created PR #6353, please take a look. But based on sample runs on my laptop, this is going to make the new CI significantly slower, even the test collection part. For example, the before and after times for forecasting with test collection only (--co flag) are 21.85 seconds (43669 tests) vs 314.01 seconds (79603 tests).

@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

Only test collection?

Before we merge, can we diagnose where this is coming from?

Collection should certainly not take so long.

@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

I just checked construction times with the code below. It looks like construction is most likely not the culprit.
Most estimators construct instantaneously; some need tens of milliseconds, which is probably too long (but does not explain the collection times). It could of course be that I am missing something in a soft dependency I do not have installed.

from sktime.utils.validation._dependencies import _check_estimator_deps
from sktime.registry import all_estimators

# all registered estimator classes, without names
ests = all_estimators(return_names=False)

def _measure_init_time(cls, params=None):
    """Time a single construction of cls with params, ignoring failures."""
    from time import time

    start = time()
    try:
        cls(**params)
    except Exception:
        pass
    end = time()
    return end - start

times = []

# measure construction time for every test parameter set of every estimator
# whose soft dependencies are available in the current environment
for est in ests:
    if _check_estimator_deps(est, severity="none"):
        params = est.get_test_params()
        if not isinstance(params, list):
            params = [params]
        for param in params:
            times.append((est, _measure_init_time(est, param)))
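
The slowest constructors can then be listed with a small addition, e.g.:

# sort by construction time, slowest first, and print the top 10
times.sort(key=lambda t: t[1], reverse=True)
for est, secs in times[:10]:
    print(f"{est.__name__}: {secs:.3f} s")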

@yarnabrina
Member

Just in case it's device- or OS-specific, what are your times for the cases reported above? I used these commands:

# pre-PR
python -m pytest sktime/forecasting --co

# post-PR
python -m pytest sktime/forecasting sktime/tests --co

@yarnabrina
Member

> I just checked construction times with the code below. It looks like construction is most likely not the culprit.
> Most estimators construct instantaneously; some need tens of milliseconds, which is probably too long (but does not explain the collection times). It could of course be that I am missing something in a soft dependency I do not have installed.

I don't think construction or soft dependencies affect the two commands I shared above, though I am not 100% confident.

@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

I ran the first command on main, Python 3.11, Windows - I cancelled it after it had been running for 10 minutes.

That is very odd. Last time we measured test collection time, it was 15 sec (minimal soft deps) or 1 min (many extras).

Update:

  • 1st command: 661 secs (11 min)
  • 2nd command: ran >20 min, terminated

my show_versions:

Python dependencies:
          pip: 23.3.1
       sktime: 0.28.1
      sklearn: 1.3.2
       skbase: 0.7.6
        numpy: 1.26.2
        scipy: 1.11.4
       pandas: 2.1.4
   matplotlib: 3.8.3
       joblib: 1.3.2
        numba: 0.59.1
  statsmodels: 0.14.0
     pmdarima: 2.0.4
statsforecast: None
      tsfresh: 0.20.2
      tslearn: None
        torch: None
   tensorflow: None
tensorflow_probability: None

@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

What are your timings?

I also note how strange this is - isn't 600s the standard timeout for pytest collection?

@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

I think we are getting closer - I updated issue #6344 with the problem description. I will now run a profiler on the test collection, which takes too long.
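
For reference, one way to profile collection only, using the standard library cProfile (the output file name collect.prof is arbitrary):

# profile test collection only, then print the 30 most expensive calls by cumulative time
python -m cProfile -o collect.prof -m pytest sktime/forecasting --co -q
python -c "import pstats; pstats.Stats('collect.prof').sort_stats('cumulative').print_stats(30)"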

@yarnabrina
Member

> For example, the before and after times for forecasting with test collection only (--co flag) are 21.85 seconds (43669 tests) vs 314.01 seconds (79603 tests).

These are my times. But I wonder why mine are so much faster than yours!

Idea: I don't have pytest-xdist or other plugins installed, so I disabled those in the configuration (setup.cfg). Can you try that once?
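
For a quick comparison without editing setup.cfg, plugins can also be disabled per invocation with pytest's -p no:<plugin> flag, e.g. for pytest-xdist:

python -m pytest sktime/forecasting --co -q -p no:xdist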

@yarnabrina
Member

> I also note how strange this is - isn't 600s the standard timeout for pytest collection?

Yes, it's configured in setup.cfg. My guess is that it applies to test execution, not to collection - I don't know how to check that.
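
For reference, assuming the timeout comes from the pytest-timeout plugin, the setting in setup.cfg looks roughly like this (the exact options in sktime's setup.cfg may differ); pytest-timeout documents its timeout as applying per test, which would be consistent with collection not being covered:

[tool:pytest]
addopts =
    --timeout 600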

@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

That is very odd. 20 sec is more in line with the times from this older issue: #4900

@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

Should we move the test times discussion to #6344?

yarnabrina added a commit that referenced this issue May 9, 2024
Fixes #6352

This PR adds coverage of `sktime/tests` for the per-module CI workflows.