
[ENH] idea: testing environments by estimator #5719

Open
fkiraly opened this issue Jan 9, 2024 · 3 comments
Labels
API design (API design & software architecture), enhancement (Adding new functionality), module:tests (test framework functionality - only framework, excl specific tests)

Comments

@fkiraly
Collaborator

fkiraly commented Jan 9, 2024

An orthogonal idea for testing, FYI @yarnabrina:

If I write you python code that retrieves (see the sketch after this list):

  1. all unique sets of dependencies, for individual estimators
  2. for each unique set in 1, the estimators giving rise to it
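
For illustration, a minimal sketch of what this could look like, assuming sktime's registry lookup via `all_estimators` and the `python_dependencies` / `python_version` tags (the function name is hypothetical):

```python
# minimal sketch, assuming sktime's registry and estimator tags;
# the function name is hypothetical
from collections import defaultdict

from sktime.registry import all_estimators


def estimators_by_dependency_set():
    """Map each unique dependency set to the estimators giving rise to it."""
    groups = defaultdict(list)
    for name, est in all_estimators():
        deps = est.get_class_tag("python_dependencies", None)
        if deps is None:
            deps = []
        elif isinstance(deps, str):
            deps = [deps]
        pyver = est.get_class_tag("python_version", None)
        # the unique key is the environment spec: python constraint plus soft deps
        groups[(pyver, frozenset(deps))].append(name)
    return dict(groups)
```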

Would it be easy to set up CI that runs tests specific to these estimators? Say, if this is controllable via a pytest flag?
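
One possible mechanism for such a flag, as a hedged sketch only (the `--estimator-subset` option name is made up, not an existing sktime flag): a `conftest.py` hook pair that deselects all test items not matching the requested estimators.

```python
# hypothetical conftest.py sketch; the --estimator-subset flag is an
# assumption for illustration, not an existing sktime option
def pytest_addoption(parser):
    parser.addoption(
        "--estimator-subset",
        default=None,
        help="comma-separated estimator names; only their tests are collected",
    )


def pytest_collection_modifyitems(config, items):
    subset = config.getoption("--estimator-subset")
    if subset is None:
        return
    names = set(subset.split(","))
    # keep only test items whose node id mentions a selected estimator
    items[:] = [item for item in items if any(n in item.nodeid for n in names)]
```

Invocation would then look like `pytest --estimator-subset=NaiveForecaster,ThetaForecaster`.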

I think this is the only setup that truly scales as the number of estimators goes to infinity, because ultimately task-specific modules will have the same problem of interacting dependency trees.

@fkiraly added the module:tests, enhancement, and API design labels on Jan 9, 2024
@yarnabrina
Collaborator

Can you please explain in a bit more detail? I want to understand it better first.

When I first read it, I interpreted that the following happens.

  1. PR is created by user
  2. Detect modified modules using git
  3. Extract the (possibly) modified estimators from the `__all__` of these modules (are these always present?)
  4. Loop over estimators and modify the list if any has dependencies through inheritance
  5. Loop over estimators and detect python version and python dependencies from tags (do they always exist??)
  6. Store mapping of estimator names to python and soft dependency requirements (as JSON??)
  7. CI will loop over this dynamic output and create one job for each estimator * supported python version for that estimator * 3 operating systems (sketched below)
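
To make step 7 concrete, here is a rough sketch of how I imagine the dynamic output could be expanded into a CI matrix; the file name, version lists, and matrix layout are assumptions for illustration, not existing sktime CI:

```python
# hypothetical sketch of steps 6-7; file name, version lists, and matrix
# layout are placeholders
import json

from packaging.specifiers import SpecifierSet

SUPPORTED_PYTHONS = ["3.8", "3.9", "3.10", "3.11"]
OSES = ["ubuntu-latest", "macos-latest", "windows-latest"]

# estimator name -> {"python_version": ..., "python_dependencies": ...}, from step 6
with open("estimator_requirements.json") as f:
    mapping = json.load(f)

matrix = []
for name, req in mapping.items():
    # an empty specifier set admits all python versions
    spec = SpecifierSet(req.get("python_version") or "")
    for py in SUPPORTED_PYTHONS:
        if py not in spec:  # skip python versions the estimator excludes
            continue
        for os_name in OSES:
            matrix.append({"estimator": name, "python": py, "os": os_name})

# a GitHub Actions workflow could consume this via fromJSON(...)
print(json.dumps({"include": matrix}))
```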

After a second read, I am not sure at all. Can you please share the steps you are planning (in python and in CI yaml)?

If possible, please tell me at what step my above understanding went wrong and it'll be easier for me to follow.

@fkiraly
Collaborator Author

fkiraly commented Jan 10, 2024

Yes, I think you got what I meant right, except for step 7.
Sorry for not explaining clearly.

The dynamic output should be:

Part 1: find all estimators that are affected by the change, e.g., via inheritance (a sketch follows below)
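
A rough sketch of part 1, assuming the changed classes are already known from the git-based module detection (the function name is made up):

```python
# hedged sketch of part 1; `changed_classes` would come from the
# git-based module detection, the function name is hypothetical
from sktime.registry import all_estimators


def affected_estimators(changed_classes):
    """Estimators affected by a change, e.g., by subclassing a changed class."""
    return [
        name
        for name, est in all_estimators()
        if any(issubclass(est, cls) for cls in changed_classes)
    ]
```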

Part 2: create environments and run tests

  • for each unique environment spec in play (python version, OS, packages installed),
  • collect all estimators that are affected,
  • and run the tests for these estimators in that environment.

In most cases, only one estimator is affected, and then it is run for the product of python version and OS, with the current primary satisfying environment, i.e., with package versions installed that satisfy the estimator's requirements.
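
For illustration, a minimal sketch of the part 2 loop, reusing the dependency-set grouping from the earlier sketch; the `-k` expression is one possible way to select only the matching tests, and the environment creation itself would happen in CI:

```python
# minimal sketch of part 2; environment creation itself would happen in CI
import subprocess


def run_tests_per_environment(groups):
    """groups: {(python_version, frozenset of deps): [estimator names]}."""
    for (pyver, deps), names in groups.items():
        # in CI, this is where the environment (python version, OS,
        # packages installed) would be created before invoking pytest
        expr = " or ".join(names)
        subprocess.run(["pytest", "-k", expr], check=True)
```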

@yarnabrina
Collaborator

2. Detect modified modules using git

Here's an idea to achieve this from a different discussion:

What I was thinking is very optimistic, and may have other problems. The idea is to do this:

  1. start CI with a python 3.11+ job, which has `tomllib`.
  2. read the current `pyproject.toml` and the one from `main`.
  3. specifically compare which sets of dependency specifications vary, which will be available as dictionaries if I am not mistaken.
  4. identify mismatched specifications.
  5. identify names of packages from the mismatches, and python requirements if any (a sketch of steps 1-5 follows after this quote).
  6. use it to find affected estimators, and affected environments if any.
  7. trigger CI only for those environment-estimator combinations (related to [ENH] idea: testing environments by estimator #5719)

It's very different from the current PR I think, probably not worth considering. We can close this conversation.

Originally posted by @yarnabrina in #5727 (comment)
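
For concreteness, a rough sketch of steps 1-5 of the quoted idea, assuming python 3.11+ for `tomllib` and using `git show` as one way to read the `main` version of the file:

```python
# sketch of steps 1-5 above, assuming python 3.11+ for tomllib
import subprocess
import tomllib


def changed_optional_dependency_sets():
    """Names of the optional-dependency sets that differ from main."""
    with open("pyproject.toml", "rb") as f:
        current = tomllib.load(f)
    main_text = subprocess.run(
        ["git", "show", "main:pyproject.toml"],
        capture_output=True, text=True, check=True,
    ).stdout
    main = tomllib.loads(main_text)

    cur_extras = current["project"].get("optional-dependencies", {})
    main_extras = main["project"].get("optional-dependencies", {})
    # parsed tables are plain dicts, so mismatched specifications compare directly
    return {
        extra
        for extra in set(cur_extras) | set(main_extras)
        if cur_extras.get(extra) != main_extras.get(extra)
    }
```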

This should ideally work with a definite guarantee, given correct parsing of only the dedicated blocks; the approach with a slight chance of false positives is already addressed by @fkiraly in #5727 using git diff.
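
For comparison, a rough sketch of what a git-diff-based detection could look like; this is not the actual utility from #5727, and the naive line parsing shows where the false positives could come from:

```python
# rough sketch of git-diff-based detection, not the actual #5727 utility
import re
import subprocess


def packages_in_pyproject_diff(base="main"):
    """Package-like names on lines added or removed in pyproject.toml."""
    diff = subprocess.run(
        ["git", "diff", base, "--", "pyproject.toml"],
        capture_output=True, text=True, check=True,
    ).stdout
    pkgs = set()
    for line in diff.splitlines():
        # only lines added/removed by the PR, not the +++/--- file headers
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---")):
            match = re.match(r"""[+-]\s*['"]?([A-Za-z0-9_.-]+)""", line)
            if match:
                pkgs.add(match.group(1))  # may include false positives
    return pkgs
```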

fkiraly added a commit that referenced this issue Feb 10, 2024
…yproject.toml` (#5727)

This PR adds a condition to differential testing, so classes whose
dependencies have been updated in `pyproject.toml` are always tested.

This logic is based on a utility that determines which package dependencies are changed by a pull request, and adds a condition based on its output.

The utility could further be useful in:

* hypothetical test environment setup per estimator, such as discussed
in #5719