
Testing cleanup #401

Open

Darlokt wants to merge 5 commits into scverse:main from Darlokt:testing-cleanup

Conversation


Darlokt commented May 16, 2026

Hej everyone,

This is the second patch series; this time it's a bit bigger. It focuses on restructuring and improving the testing infrastructure. It doesn't touch the current test cases, but it improves the infrastructure, provides a central place for dataset management (which also allows easier local dev setups), unifies skipping behavior, and adds proper pytest markers for the different readers to enable easy differential test runs.

Changes:

  • Split existing tests into unit and integration tests, with clear guidelines in CONTRIBUTING.md, for better organization and clarity.
    • Existing tests were split into unit and integration tests based on their scope.
    • Split tests were moved to their new locations under tests/unit/ and tests/integration/.
    • Moved reader-specific integration tests into subdirectories under tests/integration/readers/ (e.g. tests/integration/readers/xenium/).
    • Moved reader-specific unit tests into subdirectories under tests/unit/readers/ (e.g. tests/unit/readers/xenium/).
    • Updated test imports and references to reflect the new directory structure.
    • Added per-reader pytest markers (e.g. @pytest.mark.xenium) to allow for more flexible test selection.
    • Updated contributing documentation to clarify the distinction between unit and integration tests, and to specify the use of dataset keys for integration tests that require external datasets.
  • Unified handling of missing datasets
    • Tests that require external datasets now check for their presence and skip with a clear message if unavailable.
    • Previously only some test suites did so (macsima, visium hd, etc.), each in its own way, with no unified behavior.
  • Added unified dataset loading script/module under scripts/test_data_downloader
    • Provides a central list of the available datasets (datasets.toml)
    • Provides a general, zero-dependency script for local and CI data loading
  • Updated CI workflows to use the new downloader script, ensuring that the same datasets are used in both CI and local development.
    • This change ensures consistency between CI and local testing environments, reducing the likelihood of discrepancies due to missing or different datasets.
    • Improved CI caching and execution rules
  • Bumped CI actions versions to their latest major versions (redundant with previous patch set)
    • Updated actions/checkout to v6 and actions/setup-python to v6 in all workflows.
    • Updated actions/cache to v5 in the test workflow.
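The unified skipping behavior described above might look roughly like the following sketch. The helper name, data root, and dataset key are hypothetical illustrations, not the PR's actual implementation:

```python
import pytest
from pathlib import Path

# Hypothetical root where the downloader places datasets (an assumption, not from the PR).
DATA_ROOT = Path("data")

def require_dataset(key: str) -> Path:
    """Skip the calling test with a clear message if the dataset is absent locally."""
    path = DATA_ROOT / key
    if not path.exists():
        pytest.skip(f"dataset '{key}' not available; fetch it with scripts/test_data_downloader")
    return path

@pytest.mark.xenium  # per-reader marker allows `pytest -m xenium` style selection
def test_xenium_reader():
    path = require_dataset("xenium")  # skips (rather than fails) when the data is missing
    ...
```

With a pattern like this, every reader suite skips missing datasets in the same way and prints the same kind of message, instead of each suite rolling its own check.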

Darlokt added 5 commits May 16, 2026 07:14
…ling

This commit refactors the test infrastructure to unify the handling of optional datasets across CI and local development.

Changes:
- Split existing tests into unit and integration tests, with clear guidelines in CONTRIBUTING.md, for better organization and clarity.
    - Existing tests were split into unit and integration tests based on their scope.
    - Split tests were moved to their new locations under `tests/unit/` and `tests/integration/`.
- Unified handling of locally missing datasets
    - Tests that require external datasets now check for their presence and skip with a clear message if unavailable.
    - Previously only some test suites did so (macsima, visium hd, etc.), each in its own way, with no unified behaviour.
- Added a new script `scripts/download_test_data.py` to download all optional datasets used by CI, with a clear CLI and documentation in CONTRIBUTING.md.
    - This script centralizes the logic for downloading test datasets, making it easier for developers to set up their local environment with the same data used in CI.
- Added a new script `scripts/download_test_data_datasets.py` that defines the dataset keys and their metadata, which is used by both the downloader and the tests, to avoid duplication and ensure consistency.
    - This script serves as a single source of truth for available test datasets, their keys, and metadata, improving maintainability.
    - Allows for easier addition of new datasets in the future, as they only need to be added in one place.
    - Allows for easier developer experience, as they can easily download the datasets using the same script used in CI, without needing to manually find and download them from their sources.
- Updated CI workflows to use the new downloader script, ensuring that the same datasets are used in both CI and local development.
    - This change ensures consistency between CI and local testing environments, reducing the likelihood of discrepancies due to missing or different datasets.
- Bumped CI actions versions to their latest major versions
    - Updated `actions/checkout` to v6 and `actions/setup-python` to v6 in all workflows.
    - Updated `actions/cache` to v5 in the test workflow.
This commit refactors the test structure further into separate subdirectories for each reader under both `tests/integration/readers/` and `tests/unit/readers/`.
This prepares for further reader/test refactoring that will split the monolithic readers and tests into separate modules for better modularity, maintainability, testability, and clarity.

Changes:
- Move reader-specific integration tests into subdirectories under `tests/integration/readers/` (e.g. `tests/integration/readers/xenium/`).
- Move reader-specific unit tests into subdirectories under `tests/unit/readers/` (e.g. `tests/unit/readers/xenium/`).
- Update test imports and references to reflect the new directory structure.
- Add per-reader pytest markers (e.g. `@pytest.mark.xenium`) to allow for more flexible test selection.
- Update contributing documentation to clarify the distinction between unit and integration tests, and to specify the use of dataset keys for integration tests that require external datasets.
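To keep pytest from warning about unknown marks, per-reader markers like these are typically registered in the project configuration. A possible fragment (the marker names mirror the readers mentioned in this PR, but the actual list in the branch may differ):

```toml
# Hypothetical pyproject.toml fragment -- illustrative, not the PR's exact contents.
[tool.pytest.ini_options]
markers = [
    "xenium: tests for the Xenium reader",
    "macsima: tests for the MACSima reader",
    "visium_hd: tests for the Visium HD reader",
]
```

Once registered, `pytest -m xenium` runs only the Xenium tests and `pytest -m "not xenium"` excludes them, which is what enables the differential test runs mentioned in the description.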
This commit focuses on improvements to the CI workflow and test data management.

Changes:
- Updated the `prepare_test_data.yaml`
  - Adjusted the cron schedule to run on the first day of every other month at midnight, ensuring that test data is refreshed regularly while preventing unnecessary runs.
  - Increased the retention period for test data artifacts to 64 days, providing a longer window for access and reducing the likelihood of data expiration before it can be used.
  - Added a condition to update the artifact if the data handler script/dataset list has changed, ensuring that the test data is always up-to-date with the latest changes in the codebase.
- Added comments and docstrings to the data handler scripts to improve code readability and maintainability.
- Added a small checker to `DatasetList` to ensure that all datasets have full integrity before collection.
- Improved testing
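Based on the description above, the schedule and retention settings might look like this workflow fragment. The job names, paths, and trigger details are illustrative assumptions; the actual `prepare_test_data.yaml` may differ:

```yaml
# Illustrative fragment of prepare_test_data.yaml -- names and paths are assumptions.
on:
  schedule:
    - cron: "0 0 1 */2 *"        # midnight on the 1st of every other month
  push:
    paths:
      - "scripts/test_data_downloader/**"   # refresh when the data handler/list changes

jobs:
  prepare-data:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: actions/upload-artifact@v4
        with:
          name: test-data
          path: data/
          retention-days: 64     # longer window before the artifact expires
```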
This commit refactors the test data download scripts and splits the dataset list into a separate `.toml` file.

Changes:
- Moved the dataset list from `download_test_data.py` to `datasets.toml`.
- Updated the download script to read from the new `.toml` file.
- Updated the GitHub Actions workflow to reflect the changes in the download script.
- Restructured the test data downloader into a subdirectory/script package for better organization.
This commit updates the datasets.toml file to include comments for each dataset from the original CI.

Changes:
- Added comments for each dataset in the datasets.toml file to preserve the information from the original CI.

codecov-commenter commented May 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.47%. Comparing base (a63ca08) to head (ae792ee).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #401      +/-   ##
==========================================
+ Coverage   63.38%   63.47%   +0.09%     
==========================================
  Files          26       26              
  Lines        3217     3217              
==========================================
+ Hits         2039     2042       +3     
+ Misses       1178     1175       -3     

see 1 file with indirect coverage changes


Member

Zethson commented May 16, 2026

Thank you very much! Great work.

I would probably suggest what I suggested in #400 (comment) -> let's try to adhere to the template first and then maybe get this in?

Member

Zethson left a comment

Also, concerning downloads: We had great experiences with pooch and will likely slowly move all scverse packages to use it. I think this PR could also tackle this. Feel free to have a look at scirpy for example to learn how to use it well.
