feat(ci): two-tier test split with service classification#115140

Closed
mchen-sentry wants to merge 7 commits into master from mchen/ci-two-tier-split

Conversation

@mchen-sentry
Member

Backend CI currently runs all tests through the full Snuba stack regardless of whether a test actually uses Snuba. This wastes ~50s/shard on Snuba startup for tests that only need Postgres.

This adds a two-tier split: tier 1 (postgres-only, 5 shards, backend-light) and tier 2 (full Snuba stack, 17 shards, backend-test), totaling 22 shards - same as today. The split is driven by a classify-services workflow that runs once per branch and maps each test file to its service dependencies.

  • service_classifier.py: hybrid static + runtime plugin that maps tests to their service deps via marker inspection and Snuba/Redis import detection
  • classify-services.yml: workflow that classifies all tests across 22 shards, uploads results as an artifact
  • split-tests-by-tier.py: splits the classification JSON into tier1/tier2 file lists for shard distribution
  • backend-light job: 5 shards with only Postgres + Redis Cluster + Kafka (no Snuba), uses --dist=loadfile
  • backend-test reduced from 22 to 17 shards when tiers are active, uses --dist=load for tier2 to reduce variance
  • When no classification is available (e.g. first run, selective PR testing), backend-test runs the full suite as before
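
The static half of the classifier (import detection) could look roughly like the sketch below. This is not the code in service_classifier.py, which this description does not show; the `SERVICE_IMPORT_PREFIXES` table, the `detect_services` name, and the choice of prefixes are assumptions made for illustration only.

```python
from __future__ import annotations

import ast

# Hypothetical mapping from import prefixes to service names. The real
# classifier's rules are not part of this PR description.
SERVICE_IMPORT_PREFIXES = {
    "sentry.utils.snuba": "snuba",
    "snuba_sdk": "snuba",
    "redis": "redis",
}


def detect_services(source: str) -> set[str]:
    """Return the services implied by the import statements in `source`."""
    services: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module]
        else:
            continue
        for module in modules:
            for prefix, service in SERVICE_IMPORT_PREFIXES.items():
                # Match the package itself or any submodule, not lookalikes.
                if module == prefix or module.startswith(prefix + "."):
                    services.add(service)
    return services
```

A purely static pass like this would miss services reached through fixtures or base classes, which is presumably why the plugin also inspects markers at runtime.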

Measured on the working branch: 19% less wall-clock time and 19% fewer runner-minutes than the master baseline.

Builds on: #114104 (parallel devservices), #114107 (collection pruning), #114108 (pg unix socket), #114215 (snuba pool cap). This PR can still land independently, since those are additive optimizations.

- service_classifier.py: hybrid static + runtime classification plugin
  that maps each test to its service dependencies (Snuba, Kafka, etc.)
- classify-services.yml: workflow to generate classification across 22 shards
- split-tests-by-tier.py: splits classification into tier1 (postgres-only)
  and tier2 (full Snuba stack) test lists
- backend.yml: add split-tiers + backend-light jobs, wire backend-test
  to use tier2 list when classification is available
- Selective testing (PRs) and tiers (master) are mutually exclusive
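
As a rough illustration of the split step, the sketch below partitions a classification mapping into tier 1 and tier 2 file lists. The JSON shape (test file path to list of service names), the `TIER2_SERVICES` set, and the `split_by_tier` name are assumptions; the actual format consumed by split-tests-by-tier.py is not shown here.

```python
from __future__ import annotations

import json

# Assumption: only a Snuba dependency forces a test file into tier 2.
TIER2_SERVICES = {"snuba"}


def split_by_tier(classification: dict[str, list[str]]) -> tuple[list[str], list[str]]:
    """Split {test_file: [services]} into (tier1_files, tier2_files)."""
    tier1: list[str] = []
    tier2: list[str] = []
    # Sort so the output lists are stable across runs.
    for test_file, services in sorted(classification.items()):
        if TIER2_SERVICES & set(services):
            tier2.append(test_file)
        else:
            tier1.append(test_file)
    return tier1, tier2


# Hypothetical classification artifact, inlined for the example.
example = json.loads(
    '{"tests/test_models.py": ["postgres"],'
    ' "tests/test_events.py": ["postgres", "snuba"]}'
)
tier1_files, tier2_files = split_by_tier(example)
```

Every file lands in exactly one list, so the two tiers together always cover the full classified suite.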

Hybrid distribution mode based on experiment data: --dist=load cuts the tier 2
shard-time spread by 54% (179s -> 82s) by load-balancing individual tests
across workers, but it hurts tier 1, where small, fast tests benefit from
fixture reuse under --dist=loadfile. Apply --dist=load only to tier 2, and
only when tiers are active.

Backend-test without tiers (selective PRs, master without classification)
keeps --dist=loadfile for backwards compatibility.
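
The selection rule described in these commit messages can be summarized in a few lines. The sketch below is only a restatement of that rule, not the workflow's implementation (which lives in YAML); the `pytest_dist_args` helper and its parameters are hypothetical, while `--dist=load` and `--dist=loadfile` are the standard pytest-xdist modes named above.

```python
from typing import List, Optional


def pytest_dist_args(tiers_active: bool, tier: Optional[int] = None) -> List[str]:
    """Pick the pytest-xdist distribution flag for a job.

    Only tier 2 with tiers active uses per-test load balancing; tier 1 and
    the untiered fallback keep per-file distribution.
    """
    if tiers_active and tier == 2:
        return ["--dist=load"]
    return ["--dist=loadfile"]
```

For example, `pytest_dist_args(True, tier=2)` yields `["--dist=load"]`, while an untiered run (`pytest_dist_args(False)`) keeps `["--dist=loadfile"]`.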

@github-actions (bot) added the Scope: Backend label on May 7, 2026
@github-actions (bot) locked and limited conversation to collaborators on May 23, 2026