
[ENH] idea: testing environments by estimator #5719

Open
fkiraly opened this issue Jan 9, 2024 · 3 comments
Labels
API design (API design & software architecture), enhancement (Adding new functionality), module:tests (test framework functionality - only framework, excl specific tests)

Comments

@fkiraly
Collaborator

fkiraly commented Jan 9, 2024

An orthogonal idea for testing, FYI @yarnabrina:

If I write you python code that retrieves (see the sketch after this list):

  1. all unique sets of dependencies, for individual estimators
  2. for each unique set in 1, the estimators giving rise to it
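
For illustration, a minimal sketch of what this could look like, assuming sktime's registry lookup via `all_estimators` and the `python_dependencies` / `python_version` tags (the function name is hypothetical):

```python
# minimal sketch, assuming sktime's registry and estimator tags;
# the function name is hypothetical
from collections import defaultdict

from sktime.registry import all_estimators


def estimators_by_dependency_set():
    """Map each unique dependency set to the estimators giving rise to it."""
    groups = defaultdict(list)
    for name, est in all_estimators():
        deps = est.get_class_tag("python_dependencies", None)
        if deps is None:
            deps = []
        elif isinstance(deps, str):
            deps = [deps]
        pyver = est.get_class_tag("python_version", None)
        # the unique key is the environment spec: python constraint plus soft deps
        groups[(pyver, frozenset(deps))].append(name)
    return dict(groups)
```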

Would it be easy to set up CI that runs tests specific to these estimators? Say, if this is controllable via a pytest flag?
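
One possible mechanism for such a flag, as a hedged sketch only (the `--estimator-subset` option name is made up, not an existing sktime flag): a `conftest.py` hook pair that deselects all test items not matching the requested estimators.

```python
# hypothetical conftest.py sketch; the --estimator-subset flag is an
# assumption for illustration, not an existing sktime option
def pytest_addoption(parser):
    parser.addoption(
        "--estimator-subset",
        default=None,
        help="comma-separated estimator names; only their tests are collected",
    )


def pytest_collection_modifyitems(config, items):
    subset = config.getoption("--estimator-subset")
    if subset is None:
        return
    names = set(subset.split(","))
    # keep only test items whose node id mentions a selected estimator
    items[:] = [item for item in items if any(n in item.nodeid for n in names)]
```

Invocation would then look like `pytest --estimator-subset=NaiveForecaster,ThetaForecaster`.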

I think this is the only setup that truly scales as the number of estimators goes to infinity, because ultimately task-specific modules will have the same problem of interacting dependency trees.

@fkiraly added the module:tests, enhancement, and API design labels on Jan 9, 2024
@yarnabrina
Collaborator

Can you please explain in a bit more detail? I want to understand it better first.

When I first read it, I interpreted that the following happens.

  1. PR is created by user
  2. Detect modified modules using git
  3. Extract the (possibly) modified estimators from the `__all__` of these modules (are these always present?)
  4. Loop over estimators and modify the list if any has dependencies through inheritance
  5. Loop over estimators and detect python version and python dependencies from tags (do they always exist??)
  6. Store mapping of estimator names to python and soft dependency requirements (as JSON??)
  7. CI will loop over this dynamic output and create one job for each estimator * supported python version for that estimator * 3 operating systems (sketched below)
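
To make step 7 concrete, here is a rough sketch of how I imagine the dynamic output could be expanded into a CI matrix; the file name, version lists, and matrix layout are assumptions for illustration, not existing sktime CI:

```python
# hypothetical sketch of steps 6-7; file name, version lists, and matrix
# layout are placeholders
import json

from packaging.specifiers import SpecifierSet

SUPPORTED_PYTHONS = ["3.8", "3.9", "3.10", "3.11"]
OSES = ["ubuntu-latest", "macos-latest", "windows-latest"]

# estimator name -> {"python_version": ..., "python_dependencies": ...}, from step 6
with open("estimator_requirements.json") as f:
    mapping = json.load(f)

matrix = []
for name, req in mapping.items():
    # an empty specifier set admits all python versions
    spec = SpecifierSet(req.get("python_version") or "")
    for py in SUPPORTED_PYTHONS:
        if py not in spec:  # skip python versions the estimator excludes
            continue
        for os_name in OSES:
            matrix.append({"estimator": name, "python": py, "os": os_name})

# a GitHub Actions workflow could consume this via fromJSON(...)
print(json.dumps({"include": matrix}))
```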

After a second read, I am not sure at all. Can you please share the steps you are planning (in python and in CI yaml)?

If possible, please tell me at what step my above understanding went wrong and it'll be easier for me to follow.

@fkiraly
Collaborator Author

fkiraly commented Jan 10, 2024

Yes, I think you got what I meant right, except for step 7.
Sorry for not explaining clearly.

The dynamic output should be:

Part 1: find all estimators that are affected by the change, e.g., via inheritance (a sketch follows below)
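
A rough sketch of part 1, assuming the changed classes are already known from the git-based module detection (the function name is made up):

```python
# hedged sketch of part 1; `changed_classes` would come from the
# git-based module detection, the function name is hypothetical
from sktime.registry import all_estimators


def affected_estimators(changed_classes):
    """Estimators affected by a change, e.g., by subclassing a changed class."""
    return [
        name
        for name, est in all_estimators()
        if any(issubclass(est, cls) for cls in changed_classes)
    ]
```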

Part 2: create environments and run tests

  • for each unique environment spec in play (python version, OS, packages installed),
  • collect all estimators that are affected,
  • and run the tests for these estimators in that environment.

In most cases, only one estimator is affected, and then it is run for the product of python version and OS, with the current primary satisfying environment, i.e., with package versions installed that satisfy the estimator's requirements.
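
For illustration, a minimal sketch of the part 2 loop, reusing the dependency-set grouping from the earlier sketch; the `-k` expression is one possible way to select only the matching tests, and the environment creation itself would happen in CI:

```python
# minimal sketch of part 2; environment creation itself would happen in CI
import subprocess


def run_tests_per_environment(groups):
    """groups: {(python_version, frozenset of deps): [estimator names]}."""
    for (pyver, deps), names in groups.items():
        # in CI, this is where the environment (python version, OS,
        # packages installed) would be created before invoking pytest
        expr = " or ".join(names)
        subprocess.run(["pytest", "-k", expr], check=True)
```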

@yarnabrina
Collaborator

2. Detect modified modules using git

Here's an idea to achieve this from a different discussion:

What I was thinking is very optimistic, and may have other problems. The idea is to do this:

  1. start CI with a python 3.11+ job, which has `tomllib`.
  2. read the current `pyproject.toml` and the one from `main`.
  3. specifically compare which sets of dependency specifications vary, which will be available as dictionaries if I am not mistaken.
  4. identify mismatched specifications.
  5. identify names of packages from the mismatches, and python requirements if any (a sketch of steps 1-5 follows after this quote).
  6. use it to find affected estimators, and affected environments if any.
  7. trigger CI only for those environment-estimator combinations (related to [ENH] idea: testing environments by estimator #5719)

It's very different from the current PR I think, probably not worth considering. We can close this conversation.

Originally posted by @yarnabrina in #5727 (comment)
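
For concreteness, a rough sketch of steps 1-5 of the quoted idea, assuming python 3.11+ for `tomllib` and using `git show` as one way to read the `main` version of the file:

```python
# sketch of steps 1-5 above, assuming python 3.11+ for tomllib
import subprocess
import tomllib


def changed_optional_dependency_sets():
    """Names of the optional-dependency sets that differ from main."""
    with open("pyproject.toml", "rb") as f:
        current = tomllib.load(f)
    main_text = subprocess.run(
        ["git", "show", "main:pyproject.toml"],
        capture_output=True, text=True, check=True,
    ).stdout
    main = tomllib.loads(main_text)

    cur_extras = current["project"].get("optional-dependencies", {})
    main_extras = main["project"].get("optional-dependencies", {})
    # parsed tables are plain dicts, so mismatched specifications compare directly
    return {
        extra
        for extra in set(cur_extras) | set(main_extras)
        if cur_extras.get(extra) != main_extras.get(extra)
    }
```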

This should ideally work with a definite guarantee, given correct parsing of only the dedicated blocks; the approach with a slight chance of false positives is already addressed by @fkiraly in #5727 using git diff.
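
For comparison, a rough sketch of what a git-diff-based detection could look like; this is not the actual utility from #5727, and the naive line parsing shows where the false positives could come from:

```python
# rough sketch of git-diff-based detection, not the actual #5727 utility
import re
import subprocess


def packages_in_pyproject_diff(base="main"):
    """Package-like names on lines added or removed in pyproject.toml."""
    diff = subprocess.run(
        ["git", "diff", base, "--", "pyproject.toml"],
        capture_output=True, text=True, check=True,
    ).stdout
    pkgs = set()
    for line in diff.splitlines():
        # only lines added/removed by the PR, not the +++/--- file headers
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---")):
            match = re.match(r"""[+-]\s*['"]?([A-Za-z0-9_.-]+)""", line)
            if match:
                pkgs.add(match.group(1))  # may include false positives
    return pkgs
```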

fkiraly added a commit that referenced this issue Feb 10, 2024
…yproject.toml` (#5727)

This PR adds a condition to differential testing, so classes whose
dependencies have been updated in `pyproject.toml` are always tested.

This logic is based on a utility that determines which package dependencies are changed by a pull request, and adds a condition based on its output.

The utility could further be useful in:

* hypothetical test environment setup per estimator, such as discussed
in #5719