Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace pytest-test-groups by custom tests splitter #114381

Merged
merged 20 commits into from
Apr 3, 2024
Merged

Conversation

edenhaus
Copy link
Contributor

@edenhaus edenhaus commented Mar 28, 2024

Breaking change

Proposed change

The new script/split_tests.py script does the following:

  • Collecting all tests with pytest
  • Create a combined list for all test folders including their tests file and sort them afte the number of tests. (For each TestFolder, add itself but also all it's children to the list)
  • For each entry in the list, do:
    • If entry was already added to bucket, skip it
    • Get the smallest bucket
    • If it is a TestFolder, check if the folder fits into the bucket. If yes add it, otherwise skip it
    • If entry is a TestFile, it will always added to the smallest bucket
  • Add we don't split a single file, it can happen that the buckets difference slightly in size
  • Create the filepytest_buckets.txt, where each line is a different bucket.
    • The file is upload as pytest_bucket artifact and can be used reproduce test issues

This script is executed once and the produced file will be shared as an artifact to the test jobs.
The test jobs no longer need to discover all tests but only a subset specified by the bucket. This reduces a lot of overhead.

Comparison:

Savings 176 minutes -> ~ 3 hours on each full CI run

Additional explanation can be found in the comment #114381 (comment)

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Ruff (ruff format homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
  • Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

@home-assistant home-assistant bot added cla-signed small-pr PRs with less than 30 lines. labels Mar 28, 2024
@edenhaus edenhaus force-pushed the edenhaus-split-tests branch 2 times, most recently from c681aad to 787040e Compare March 28, 2024 16:39
@edenhaus edenhaus changed the title Use custom tests splitter Replace pytest-test-groups by custom tests splitter Mar 28, 2024
script/split_tests.py Outdated Show resolved Hide resolved
script/split_tests.py Outdated Show resolved Hide resolved
script/split_tests.py Outdated Show resolved Hide resolved
script/split_tests.py Outdated Show resolved Hide resolved
.github/workflows/ci.yaml Outdated Show resolved Hide resolved
script/split_tests.py Outdated Show resolved Hide resolved
script/split_tests.py Outdated Show resolved Hide resolved
script/split_tests.py Outdated Show resolved Hide resolved
script/split_tests.py Outdated Show resolved Hide resolved
script/split_tests.py Outdated Show resolved Hide resolved
script/split_tests.py Outdated Show resolved Hide resolved
script/split_tests.py Outdated Show resolved Hide resolved
script/split_tests.py Outdated Show resolved Hide resolved
script/split_tests.py Outdated Show resolved Hide resolved
script/split_tests.py Outdated Show resolved Hide resolved
script/split_tests.py Outdated Show resolved Hide resolved
script/split_tests.py Outdated Show resolved Hide resolved
edenhaus and others added 4 commits March 29, 2024 08:28
Co-authored-by: Martin Hjelmare <marhje52@gmail.com>
Co-authored-by: Paulus Schoutsen <balloob@gmail.com>
@edenhaus
Copy link
Contributor Author

edenhaus commented Mar 29, 2024

Update the script to sort the folders and files first (from most to least tests) and add them to the smallest bucket.
So the buckets are nearly equally sized.
I try to add the biggest folder first (if they are fitting) as it has shown that the pytest discovery is faster on a subset of folders compared to randomly adding files to a bucket. We can save around ~10-15 minutes with this approach compared to fully random. See #114434 Probably due less conftest-files and fixture analysis.

Nevertheless, I need to check the tests, as some fail if they are executed randomly.

script/split_tests.py Outdated Show resolved Hide resolved
script/split_tests.py Outdated Show resolved Hide resolved
script/split_tests.py Outdated Show resolved Hide resolved

def add_to_bucket(self) -> None:
"""Add test file to bucket."""
if self.added_to_backet:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When can this happen? we run over a flattened list of all files, and process each file exactly once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should never happen. It's a safety if the script gets modified and suddenly a test will be added twice

script/split_tests.py Outdated Show resolved Hide resolved
@edenhaus
Copy link
Contributor Author

edenhaus commented Mar 29, 2024

@balloob I try to explain it better.
Currently, we have 34231 tests, which results in a bucket size of ~3424.

The flatten function (renamed it to get_all_flatten) creates a list of TestFolder and TestFile.
After sorting it , the first items are the following: (Copied from the output)

34231 tests in tests (TestFolder)
29298 tests in tests/components  (TestFolder)
 2019 tests in tests/components/mqtt  (TestFolder)
 1958 tests in tests/util  (TestFolder)
 1816 tests in tests/helpers  (TestFolder)
 1352 tests in tests/util/test_unit_conversion.py  (TestFile)

The loop does the following:

  1. There is no space for 34231 tests in the smallest bucket (as the limit is 3424), skip the entry
  2. There is also no space for 29298 tests, skip the entry
  3. The smallest bucket (all are still empty) have space for 2019 tests, therefore the folder tests/components/mqtt will be added. The folder and all children will be marked as added to bucket.
  4. The smallest bucket (an empty one) have space for 1958 tests, therefore the folder tests/util will be added. The folder and all children will be marked as added to bucket.
  5. The smallest bucket (an empty one) have space for 1816 tests, therefore the folder tests/helpers will be added. The folder and all children will be marked as added to bucket.
  6. The tests of tests/util/test_unit_conversion.py were already added in step 4. Therefore we skip this one.
    And so one....
    If the current loop entry is an TestFile instance, we will add it regardless of the bucket size to the smallest one.

We need the added_to_bucket check as the list contains the TestFolder to group tests but also each file

.vscode/launch.json Outdated Show resolved Hide resolved
Copy link
Member

@balloob balloob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome

@edenhaus edenhaus marked this pull request as ready for review April 3, 2024 13:42
@edenhaus edenhaus requested a review from a team as a code owner April 3, 2024 13:42
@edenhaus edenhaus merged commit ed88c2a into dev Apr 3, 2024
33 of 34 checks passed
@edenhaus edenhaus deleted the edenhaus-split-tests branch April 3, 2024 13:43
@edenhaus edenhaus mentioned this pull request Apr 3, 2024
20 tasks
@github-actions github-actions bot locked and limited conversation to collaborators Apr 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cla-signed small-pr PRs with less than 30 lines.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants