
[MNT] split optional dependencies and CI testing per components #5101

Closed
4 tasks done
yarnabrina opened this issue Aug 15, 2023 · 20 comments
Labels: enhancement (Adding new functionality), maintenance (Continuous integration, unit testing & package distribution), module:tests (test framework functionality - only framework, excl specific tests)

Comments

yarnabrina (Collaborator) commented Aug 15, 2023

I want to suggest a continuation of the recently introduced per-module CI testing.

As of now, sktime offers only one optional dependency set: all_extras, which is massive and definitely contains some packages with overlapping (if not conflicting) dependencies. My suggestion is to introduce a few more optional dependency sets, one per component:

  1. forecasting
  2. classification
  3. transformation
  4. regression
  5. clustering
  6. etc.

What pip install sktime[forecasting] would do is install sktime and all soft dependencies used by forecasters in sktime, and similarly for the other components.

There can be further splits, for example forecasting-dl vs forecasting-non-dl, since deep learning packages (e.g. torch) often have heavy system dependencies that not all users can install in all environments.
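For illustration only, such extras could be declared in pyproject.toml roughly as below; the package lists are hypothetical placeholders, not the actual soft dependency sets:

```toml
[project.optional-dependencies]
# hypothetical grouping, for illustration only; the real lists would be
# collected from the soft dependencies declared by estimators in each module
forecasting = [
    "pmdarima",
    "statsforecast",
]
forecasting-dl = [
    "torch",
]
classification = [
    "tsfresh",
]
```

Extras could then also be combined at install time, e.g. pip install "sktime[forecasting,forecasting-dl]".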

After adding these, I want to propose splitting the CI test job into component-wise parts. For example, instead of using one single job on Python 3.9 and macOS, we can have multiple jobs, one for each component. Each component job runs only if a change is detected in that directory or in base (GitHub Actions supports this natively) and, if selected, runs the tests on the modified modules in that component after installing only the relevant sub-dependencies.

Proposed tasks

  • optional dependency specification in pyproject.toml per component
  • CI job split per component
  • CI job condition change per component
  • new CI job to be run manually before each release (or as required) for testing all components

Inviting @fkiraly, @benHeid, @achieveordie to consider this suggestion and share feedback.

fkiraly (Collaborator) commented Aug 15, 2023

  • agreed on dependency specifications, that might make it easier for users to get what they want
    • although I'm not sure about separating transformations. There are some dependencies there which seem more specific to one or the other learning task, such as holidays being specific to forecasting.
  • re splitting up CI elements, is that really necessary now that we have incremental testing? I think we'll get down to below 10 minutes with the decorators for non-suite tests without problems. Given this, what is the benefit from splitting the CI?
    • it will have no effect on the TestAll tests anymore, as long as only one estimator is changed
    • it will have no effect on non-suite tests that are decorated with run_test_for_class, which in the optimal end state are all estimator specific ones
    • and the remaining tests are of a kind that I would always execute, as they cover the framework (even if it is specific to an estimator type)

fkiraly added the maintenance, module:tests, and enhancement labels on Aug 15, 2023
yarnabrina (Collaborator, Author) commented:

re splitting up CI elements, is that really necessary now that we have incremental testing? I think we'll get down to below 10 minutes with the decorators for non-suite tests without problems. Given this, what is the benefit from splitting the CI?

Let's consider a case: I (any contributor) create a PR modifying AutoARIMA forecaster.

If I am not wrong, the way it currently works is that pytest first collects all tests across all components, which is ~77K tests and takes about 1 minute (numbers based on my local setup). Then it goes over all of these, decides for each whether to run it or not, and then goes over the remaining ones to decide whether to run them in this job or another one.

I think this is not required, as I (the contributor) and the reviewers will know that it is enough to look at the forecasting modules only. It is a much smaller list to loop through just sktime/forecasting instead of the full sktime. Also, if something fails unexpectedly on a full CI run on main, browsing the traceback corresponding to the failed test in the huge test log seems really difficult to me.

So, the way I see it, it will help us to have smaller, logically similar groups of tests, which are easier to debug. Also, I don't know how coverage currently gets combined from multiple jobs, but this would allow us to report coverage per component as well.

(To be honest, my suggestion was really based on avoiding the unnecessary search over all components and on smaller logical grouping. Coverage seemed like a nice possible side effect just now; I didn't think of it earlier.)


Even if you don't agree with the 2nd or 3rd tasks, I think the 4th task (a manually triggerable CI job to run all tests) is still relevant? Or do we already have a way to do this? I remember some CRON job discussion @hazrulakmal suggested somewhere, but I don't know whether it's done already or not.

fkiraly (Collaborator) commented Aug 15, 2023

Let's consider a case

I see - I consider this a valid and strong argument for a lookup architecture instead of the current filtering architecture, where lookup/filtering refers to the logic used to find the tests to execute.

Filtering would be: find all tests first, and then decide for each and every one whether to run it. An example would be an exhaustive search followed by filtering on a condition or property.

Lookup would be: determine a list of tests to run first, and then find them. Examples would be dispatch or list lookup.

Noting that the "split by module" architecture proposed in the OP moves from the current one, which is entirely filtering based, to a lookup/filtering hybrid. But also noting that the argument above is not specifically in favour of this solution (which I consider a bit over-complicated).

Following this line of analysis, would a pure lookup architecture not be best, hence?
The general question being, which entry points we would have to address. Is it all_estimators and the fixture generation in the TestAll classes, or is it also pytest's own crawling mechanisms?

fkiraly (Collaborator) commented Aug 15, 2023

I think the 4th task (a manually triggerable CI job to run all tests) is still relevant? Or do we already have a way to do this? I remember some CRON job discussion @hazrulakmal suggested somewhere, but I don't know whether it's done already or not.

I agree that this is desirable and we should do this.

The current solution is here: #5083 (manual trigger to run all tests), but a CRON-based, automated one seems better. I just don't know how to set it up off the top of my head. I could probably learn it, but then it becomes a time prioritization exercise.

yarnabrina (Collaborator, Author) commented:

I'm glad to have convinced you, but I got a bit worried by this:

But also noting that the argument above is not specifically in favour of this solution (which I consider a bit over-complicated).

Following this line of analysis, would a pure lookup architecture not be best, hence?

Just to confirm we're talking about the same thing, here's what I was trying to suggest:

current case:

old-all.yml
name: test workflow
on:
  pull_request:
    branches:
      - main
  push:
    branches:
      - main
jobs:
  test-job-id:
    name: test job name
    strategy:
      fail-fast: false
      matrix:
        python-version:
          - "3.10"
          - "3.11"
        os:
          - macos-latest
          - ubuntu-latest
          - windows-latest
    runs-on: ${{ matrix.os }}
    steps:
      - name: repository checkout step
        uses: actions/checkout@v3
      - name: python environment step
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: dependencies installation step
        run: python3 -m pip install .[all_extras]
      - name: unit test step
        run: python3 -m pytest
      - name: test coverage step
        uses: codecov/codecov-action@v3

proposed case:

new-all.yml
name: all test workflow
on:
  schedule:
    - cron: 0 0 * * 0
  workflow_dispatch:
jobs:
  all-test-job-id:
    name: all test job name
    strategy:
      fail-fast: false
      matrix:
        python-version:
          - "3.10"
          - "3.11"
        os:
          - macos-latest
          - ubuntu-latest
          - windows-latest
    runs-on: ${{ matrix.os }}
    steps:
      - name: repository checkout step
        uses: actions/checkout@v3
      - name: python environment step
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: dependencies installation step
        run: python3 -m pip install .[all_extras]
      - name: unit test step
        run: python3 -m pytest sktime
      - name: test coverage step
        uses: codecov/codecov-action@v3
new-base.yml
name: base test workflow
on:
  pull_request:
    branches:
      - main
    paths:
      - sktime/base/**
  push:
    branches:
      - main
    paths:
      - sktime/base/**
jobs:
  base-test-job-id:
    name: base test job name
    strategy:
      fail-fast: false
      matrix:
        python-version:
          - "3.10"
          - "3.11"
        os:
          - macos-latest
          - ubuntu-latest
          - windows-latest
    runs-on: ${{ matrix.os }}
    steps:
      - name: repository checkout step
        uses: actions/checkout@v3
      - name: python environment step
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: dependencies installation step
        run: python3 -m pip install .
      - name: unit test step
        run: python3 -m pytest sktime/base
      - name: test coverage step
        uses: codecov/codecov-action@v3
new-forecast.yml
name: forecast test workflow
on:
  pull_request:
    branches:
      - main
    paths:
      - sktime/forecasting/**
  push:
    branches:
      - main
    paths:
      - sktime/forecasting/**
jobs:
  forecast-test-job-id:
    name: forecast test job name
    strategy:
      fail-fast: false
      matrix:
        python-version:
          - "3.10"
          - "3.11"
        os:
          - macos-latest
          - ubuntu-latest
          - windows-latest
    runs-on: ${{ matrix.os }}
    steps:
      - name: repository checkout step
        uses: actions/checkout@v3
      - name: python environment step
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: dependencies installation step
        run: python3 -m pip install .[forecasting]
      - name: unit test step
        run: python3 -m pytest sktime/forecasting
      - name: test coverage step
        uses: codecov/codecov-action@v3

etc.

None of these workflows is tested; they are just for illustration. There is definitely a lot of duplication, which can hopefully be reduced with tools like nox in combination with published actions like dorny/paths-filter.
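As an equally untested sketch, dorny/paths-filter could gate a component job inside a single workflow instead of separate per-component workflow files (filter and job names below are placeholders):

```yaml
name: component test workflow (sketch)
on:
  pull_request:
    branches:
      - main
jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      forecasting: ${{ steps.filter.outputs.forecasting }}
    steps:
      - name: repository checkout step
        uses: actions/checkout@v3
      - name: change detection step
        id: filter
        uses: dorny/paths-filter@v2
        with:
          filters: |
            forecasting:
              - sktime/base/**
              - sktime/forecasting/**
  test-forecasting:
    needs: detect-changes
    if: ${{ needs.detect-changes.outputs.forecasting == 'true' }}
    runs-on: ubuntu-latest
    steps:
      - name: repository checkout step
        uses: actions/checkout@v3
      - name: python environment step
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - name: dependencies installation step
        run: python3 -m pip install .[forecasting]
      - name: unit test step
        run: python3 -m pytest sktime/forecasting
```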

So, my idea was that for a new PR, the forecast workflow is triggered only if there is a change in the sktime/forecasting sub-directory, and only the tests contained in it are executed. The unnecessary installations should also be skipped, which means this will be tested against the latest versions of the packages allowed by the other packages in this soft dependency set; since that is a lot smaller than the overall list of dependencies, the resolved versions will differ from, and likely be newer than, the current setup. Similarly for the other components, including the base framework, though that can be modified as required.

And for a manual/scheduled trigger of all tests, the new-all workflow can be used. It can be triggered manually anytime from the UI (though that may not be accessible to everyone) and will also run every Sunday at 00:00 UTC.

Given the above, can you please elaborate on these? I didn't understand those points.

The general question being, which entry points we would have to address. Is it all_estimators and the fixture generation in the TestAll classes, or is it also pytest's own crawling mechanisms?

fkiraly (Collaborator) commented Aug 15, 2023

The unnecessary installations should also be skipped, which means this will be tested against the latest versions of the packages allowed by the other packages in this soft dependency set; since that is a lot smaller than the overall list of dependencies, the resolved versions will differ from, and likely be newer than, the current setup. Similarly for the other components, including the base framework, though that can be modified as required.

Ah, I did not appreciate that this would also reduce requirement conflicts!

That's a new argument in favour (at least in my head, even if it might have been obvious to others earlier in the thread).

I now have a question about this - how would this play along with the pandas 2 vs pandas 1 testing? Imo we still need to support both for a while.

fkiraly (Collaborator) commented Aug 15, 2023

Given the above, can you please elaborate on these? I didn't understand those points.

Hm, I think I did not succeed in communicating the lookup model?

I mean, there is a third, not yet discussed option which your strongest (first) argument seems to favour, because it is a general argument in favour of a lookup-based architecture, rather than primarily in favour of your proposed chunking of dependency sets (which is closer to lookup than the status quo, but not the "pure" form of it).

Is it clear what I meant by lookup architecture?

yarnabrina (Collaborator, Author) commented:

Sorry, but I'm completely lost.

Lookup would be: determine a list of tests to run first, and then find them.

My understanding was this:

  1. determine - pass the path to pytest to tell it to ignore everything else
  2. find - the newly added module change detection logic

I'm guessing you meant something else, and I'm still struggling to identify the third, "pure form" option that is supposed to follow from my own arguments 😞

fkiraly (Collaborator) commented Aug 16, 2023

Hm, ok, let me maybe explain better.

The general context is the task of retrieval, "find all X with the properties Y".

High-level, there's a slow (but general) and a fast (but specific) way to do this.

The slow way is, first find all X; then, for each X check whether it has property Y. If there is a large number of X, and a small fraction of those with property Y, this can be arbitrarily inefficient.

A faster way would be to index the property Y, or more generally, specific properties. E.g., to prepare a list of X with property Y. That way, we only access that list, based on the "indexing" of Y.

In terms of abstract data structures, the "slow" way could be implemented by a list of X which we have to go through. Each list element points to tags/properties.

The "fast" way could be implemented by a lookup dictionary, i.e., properties pointing to sub-lists of X.

Currently, we are doing the first; what I was saying is that we should consider moving to smarter indexing.
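In (hypothetical) code, the contrast would look roughly like this; the names are made up for illustration and are not an existing sktime or pytest interface:

```python
# toy illustration of filtering vs lookup, not actual sktime code

tests = ["test_arima_fit", "test_theta_predict", "test_knn_classifier_fit"]


# filtering: enumerate all X, then check property Y for each one
def retrieve_by_filtering(all_items, has_property):
    return [x for x in all_items if has_property(x)]


forecaster_tests = retrieve_by_filtering(
    tests, lambda name: "arima" in name or "theta" in name
)

# lookup: build an index from property Y to items once, then read it directly
index = {
    "forecasting": ["test_arima_fit", "test_theta_predict"],
    "classification": ["test_knn_classifier_fit"],
}


def retrieve_by_lookup(test_index, property_key):
    return test_index.get(property_key, [])


forecaster_tests = retrieve_by_lookup(index, "forecasting")  # no full scan needed
```

The cost of filtering grows with the total number of tests on every query, while lookup pays the indexing cost once (e.g. at collection time) and each query only touches the relevant sub-list.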

yarnabrina (Collaborator, Author) commented:

Are you saying something along the lines of what I was saying here: #4762? With Y being the properties to test for an estimator, and X being the list of estimators?

I think I understand the two points you made above, but I'm still confused about how to relate this to the package/workflow split. Or maybe that's where I'm stuck: you're not talking about that at all, and are instead suggesting to replace pytest_generate_tests or some other way to modify how pytest lists available tests?

fkiraly (Collaborator) commented Aug 16, 2023

Are you saying something along the lines of what I was saying here: #4762?

Yes, I think so - not from a user perspective though, but as an architectural principle in test collection.

fkiraly (Collaborator) commented Aug 16, 2023

but I'm still confused about how to relate this to the package/workflow split.

The package/workflow split restricts the set of estimators X to a sub-list per CI element, namely by the property Y "is a forecaster" or "is a classifier", etc. So it introduces some indexing, from forecasters to all tests in the forecaster module; what I'm saying is that this is still a pretty wide net, as most of those tests may not be relevant in the context of the change.

fkiraly (Collaborator) commented Aug 19, 2023

@yarnabrina, I have thought about this and think the "lookup" would probably be much harder to realise than the "split by estimator type" that you propose.

I also think that splitting is better than the status quo.

So, I am now convinced of doing this!
Assuming it doesn't get too fiddly...

fkiraly pushed a commit that referenced this issue Sep 9, 2023
fkiraly pushed a commit that referenced this issue Oct 21, 2023
…5304)

Part 2, 3 and 4 of #5101
Depends on #5375

1. new extra with just test dependencies
2. action to run tests in `sktime/base` with only core and test
dependencies
3. action to run tests in `sktime/<component>` with only core, test and
component specific dependencies
4. workflow to run base framework tests only when `sktime/base` gets
modified
5. workflow to run component specific tests only when
`sktime/<component>` gets modified
6. workflow to run all (base framework and all components) tests every Sunday 00:00 and manually if desired
yarnabrina (Collaborator, Author) commented:

Based on my understanding, #5136 + #5304 implement the proposals in this issue. However, they do not yet replace the existing CI; they run alongside it. This is to reduce breaking changes, in the sense of not breaking what is working until we know for sure that the new CI has all the capabilities of the old CI and does something extra that is beneficial.

However, having both CIs run will increase CI time significantly, and there may be limits in the GitHub free plan as well. Therefore, I am requesting @fkiraly, @benHeid, @achieveordie, @hazrulakmal and other @sktime/core-developers to take a look at these pull requests and let me know what is missing in the new CI as of now. I'll add my own observations about what is missing below.

  1. I added these extras: alignment, annotation, classification, clustering, forecasting, networks, param_est, regression, transformations. All tests (if any) in sktime/<extra_name> will be run in the new CI, along with sktime/base. Another PR, [MNT] Dataset downloader testing workflow #5437, introduces an independent workflow for sktime/datasets as well. The other sub-directories, e.g. sktime/benchmarking, sktime/performance_metrics, sktime/registry, sktime/split, etc., are not tested as of now.

    • I am not familiar enough with all of these, but my idea is that not all of them are useful on their own, so creating a new extra for each may not make sense. A few may not even need any extra soft dependency. So, we can either create valid or dummy extras for all of these and test them as we are testing forecasting or classification now, or we can try to modify the base workflow and run their tests as part of it, installing no additional soft dependencies. This, however, needs confirmation from someone familiar with these modules on which of them will need extra dependencies, which do not, and which do not even need regular testing (probably an empty subset, but still).
  2. The old CI has a few other jobs in test.yml besides test-full. A few of these are very specific, e.g. test-unix-pandas1, test-cython-estimators, etc. There are a few that are more generic, e.g. run-notebook-examples, test-mlflow, etc.

    • I am not sure what to do with these, but I do not feel that they need to be tested for every single PR. One possibility is to run them only as part of scheduled CRON workflows (or specific manual triggers) and not as part of every PR.

fkiraly (Collaborator) commented Oct 22, 2023

However, they do not yet replace the existing CI; they run alongside it. This is to reduce breaking changes, in the sense of not breaking what is working until we know for sure that the new CI has all the capabilities of the old CI and does something extra that is beneficial.

I was of the opinion that the new tests should focus on testing estimators; we can then remove this from the old tests.
That is, we should replace the TestAll elements with the new CI, and leave the rest of the old CI unchanged.

I am not familiar enough with all of these

Then we should have the planned meeting where we ask each other about the new and old CI. I assumed you were familiar with these and had a plan for the mid-term state of the tests...

Imo we should clarify the mid-term plan before the next release (early Nov?) and then decide which state we leave the old tests in.

fkiraly (Collaborator) commented Oct 22, 2023

Answering further Q by @yarnabrina from #5424:

The new CI is not testing all directories, as I created extras for a few major ones only. Should we create new extras and/or plan how to integrate these in the new CI, if you and the other core developers agree to eventually replace the old CI?

Yes, and we should aim to reduce overall test time and do the math carefully. Let's try to plan in the next dev meeting (Oct 27), or is that inconvenient due to holidays?

yarnabrina (Collaborator, Author) commented:

Oct 27 UTC 3 pm is fine for me.

I was of the opinion that the new tests should focus on testing estimators; we can then remove this from the old tests.
That is, we should replace the TestAll elements with the new CI, and leave the rest of the old CI unchanged.

I'm not sure I follow, but we can discuss this, along with the other CI confusions, in the next dev call.

fkiraly pushed a commit that referenced this issue Nov 12, 2023
Related #5101, #5475

* this fixes the "lower coverage" issue by adding the `test_all` workflow, which tests everything minus the module-conditional tests and always runs
* now, if anything changes in `base`, it also runs all module jobs
yarnabrina (Collaborator, Author) commented:

@fkiraly hopefully this is the right issue to talk about old CI retirement/deprecation, as it initiated the new CI plans. All tasks from the main issue are already addressed across different PRs.

I suggest the following for old CI:

  1. disable the current workflows (test.yml and cancel.yml) from the CI settings -> I believe project maintainers will have this access, and it means they can be reactivated any time without any change (ref. https://docs.github.com/en/actions/using-workflows/disabling-and-enabling-a-workflow).
  2. create a new workflow copying test.yml except for the following sections, and let it run weekly on a schedule, with the option to trigger it manually anytime.
    • code-quality
    • test-full
  3. The following can also be removed, but I am not sure why they were split in the first place, so I am keeping them separate.
    • test-nodevdeps: not sure what this one is for.
    • test-nosoftdeps, test-nosoftdeps-full: are these not covered already? Not sure why these are tested separately in 2 more jobs.

(Note: if you and other @sktime/core-developers agree, 1+2 is basically equivalent to commenting out those 2 sections and adding a CRON job configuration, so that can be done as well.)

fkiraly (Collaborator) commented Jan 1, 2024

Sounds good, although I would like to ensure I can easily switch to the new workflows (test & release) before we completely remove the old CI.

There are four issues currently, for me:

  • names of jobs are too long; many jobs look the same in the GitHub CI GUI tab. I do not want to switch or complicate workflows, so we should shorten the names.
  • many jobs are spawned in some cases. Maybe this cannot be reduced, but it makes searching for a single failed job hard
  • new CI does not auto-cancel like the old CI, this creates additional clicks: [ENH] new CI not covered by cancel workflow #5525
  • new CI always shows as failed, even if all jobs pass (in the "actions" view)

yarnabrina (Collaborator, Author) commented:

Closing this as it's already implemented; further discussions on the new CI can take place in #5706.
