feat(ci): two-tier test split with service classification #115140
Closed
mchen-sentry wants to merge 7 commits into
- service_classifier.py: hybrid static + runtime classification plugin that maps each test to its service dependencies (Snuba, Kafka, etc.)
- classify-services.yml: workflow to generate classification across 22 shards
- split-tests-by-tier.py: splits classification into tier1 (postgres-only) and tier2 (full Snuba stack) test lists
- backend.yml: add split-tiers + backend-light jobs, wire backend-test to use tier2 list when classification is available
- Selective testing (PRs) and tiers (master) are mutually exclusive
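The tier split described above can be sketched roughly as follows. This is only an illustration, not the contents of split-tests-by-tier.py: the classification shape (a mapping from test ID to a list of service names), the `TIER1_SERVICES` set, and the function name `split_by_tier` are all assumptions made for the example.

```python
import json

# Assumption: tier 1 runners provide only these services.
TIER1_SERVICES = {"postgres"}


def split_by_tier(classification):
    """Split {test_id: [service, ...]} into sorted (tier1, tier2) lists."""
    tier1, tier2 = [], []
    for test_id, services in classification.items():
        # A test qualifies for tier 1 only if every service it needs
        # is available there; anything else goes to the full stack.
        if set(services) <= TIER1_SERVICES:
            tier1.append(test_id)
        else:
            tier2.append(test_id)
    return sorted(tier1), sorted(tier2)


if __name__ == "__main__":
    # Hypothetical classification payload, for illustration only.
    sample = json.loads(
        '{"tests/a.py": ["postgres"],'
        ' "tests/b.py": ["postgres", "snuba"],'
        ' "tests/c.py": []}'
    )
    tier1, tier2 = split_by_tier(sample)
    print("tier1:", tier1)
    print("tier2:", tier2)
```

Using a subset check rather than looking for "snuba" specifically means any unrecognized service sends a test to tier 2, which is the conservative default.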
Hybrid distribution mode based on experiment data: --dist=load cuts the tier 2 shard-time spread by 54% (179s -> 82s) by load-balancing individual tests across workers, but hurts tier 1, where small fast tests benefit from fixture reuse via loadfile. Apply --dist=load to backend-test only when tiers are active. backend-test without tiers (selective PRs, master without classification) keeps --dist=loadfile for backwards compatibility.
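The selection rule in this commit message can be written as a small decision function. This is a sketch under assumptions: the helper names `pick_dist_mode` and `spread_reduction` are hypothetical, and in the PR the choice is presumably made in workflow configuration rather than in Python.

```python
def pick_dist_mode(tiers_active: bool, job: str) -> str:
    """Return the pytest-xdist --dist value for a job.

    Only backend-test with tiers active (i.e. running the tier 2 list)
    uses per-test load balancing; everything else keeps loadfile.
    """
    if tiers_active and job == "backend-test":
        return "load"
    return "loadfile"


def spread_reduction(before_s: float, after_s: float) -> float:
    """Fractional reduction in shard-time spread."""
    return (before_s - after_s) / before_s


if __name__ == "__main__":
    print(pick_dist_mode(True, "backend-test"))
    print(pick_dist_mode(True, "backend-light"))
    print(pick_dist_mode(False, "backend-test"))
    # Figures quoted in the commit message: 179s -> 82s.
    print(round(spread_reduction(179, 82) * 100))
```

The last line reproduces the quoted 54% figure from the 179s and 82s spreads.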
Backend CI currently runs all tests through the full Snuba stack regardless of whether a test actually uses Snuba. This wastes ~50s/shard on Snuba startup for tests that only need Postgres.
This adds a two-tier split: tier 1 (postgres-only, 5 shards, backend-light) and tier 2 (full Snuba stack, 17 shards, backend-test), totaling 22 shards, the same as today. The split is driven by a classify-services workflow that runs once per branch and maps each test file to its service dependencies.

- service_classifier.py: hybrid static + runtime plugin that maps tests to their service deps via marker inspection and Snuba/Redis import detection
- classify-services.yml: workflow that classifies all tests across 22 shards and uploads the results as an artifact
- split-tests-by-tier.py: splits the classification JSON into tier1/tier2 file lists for shard distribution
- backend-light job: 5 shards with only Postgres + Redis Cluster + Kafka (no Snuba), uses --dist=loadfile
- backend-test: reduced from 22 to 17 shards when tiers are active, uses --dist=load for tier2 to reduce variance
- When tiers are not active, backend-test runs the full suite as before

Measured -19% wall clock and -19% runner-minutes vs master baseline on the working branch.
Depends on: #114104 (parallel devservices), #114107 (collection pruning), #114108 (pg unix socket), #114215 (snuba pool cap) - but can land independently; those are additive optimizations.