feat(ci): two-tier test split with service classification by mchen-sentry · Pull Request #115140 · getsentry/sentry

mchen-sentry · 2026-05-07T21:11:08Z

Backend CI currently runs all tests through the full Snuba stack regardless of whether a test actually uses Snuba. This wastes ~50s/shard on Snuba startup for tests that only need Postgres.

This adds a two-tier split: tier 1 (postgres-only, 5 shards, backend-light) and tier 2 (full Snuba stack, 17 shards, backend-test), totaling 22 shards - same as today. The split is driven by a classify-services workflow that runs once per branch and maps each test file to its service dependencies.

- service_classifier.py: hybrid static + runtime plugin that maps tests to their service deps via marker inspection and Snuba/Redis import detection
- classify-services.yml: workflow that classifies all tests across 22 shards and uploads the results as an artifact
- split-tests-by-tier.py: splits the classification JSON into tier1/tier2 file lists for shard distribution
- backend-light job: 5 shards with only Postgres + Redis Cluster + Kafka (no Snuba), uses --dist=loadfile
- backend-test: reduced from 22 to 17 shards when tiers are active, uses --dist=load for tier2 to reduce variance
- When no classification is available (e.g. first run, selective PR testing), backend-test runs the full suite as before
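
A minimal sketch of the tier split, assuming the classification artifact is a JSON object that maps each test file to a list of service names. The schema, the `TIER2_SERVICES` set, the file names, and `split_by_tier` are hypothetical and are not taken from split-tests-by-tier.py:

```python
import json

# Hypothetical: services whose presence forces a file into tier 2.
# The real classifier's service names may differ.
TIER2_SERVICES = {"snuba"}


def split_by_tier(classification):
    """Split {test_file: [services]} into (tier1, tier2) sorted file lists."""
    tier1, tier2 = [], []
    for test_file, services in sorted(classification.items()):
        if TIER2_SERVICES & set(services):
            tier2.append(test_file)  # needs the full Snuba stack
        else:
            tier1.append(test_file)  # can run on the light stack
    return tier1, tier2


# Illustrative input only; these file names are made up.
sample = json.loads("""
{
  "tests/test_a.py": ["postgres"],
  "tests/test_b.py": ["postgres", "snuba"],
  "tests/test_c.py": ["postgres", "redis"]
}
""")

tier1, tier2 = split_by_tier(sample)
print("tier1:", tier1)
print("tier2:", tier2)
```

Under this assumption, only a Snuba dependency pushes a file into tier 2, which matches the description of backend-light as carrying Postgres, Redis Cluster, and Kafka but no Snuba.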

Measured on the working branch: 19% less wall-clock time and 19% fewer runner-minutes than the master baseline.

Depends on: #114104 (parallel devservices), #114107 (collection pruning), #114108 (pg unix socket), #114215 (snuba pool cap) - but can land independently; those are additive optimizations.

- service_classifier.py: hybrid static + runtime classification plugin that maps each test to its service dependencies (Snuba, Kafka, etc.)
- classify-services.yml: workflow to generate classification across 22 shards
- split-tests-by-tier.py: splits classification into tier1 (postgres-only) and tier2 (full Snuba stack) test lists
- backend.yml: add split-tiers + backend-light jobs, wire backend-test to use tier2 list when classification is available
- Selective testing (PRs) and tiers (master) are mutually exclusive
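
The static half of the classification could look roughly like the following, assuming imports are inspected with the standard `ast` module. `SERVICE_KEYWORDS` and `detect_services` are hypothetical names, and the real service_classifier.py also uses marker inspection and runtime detection, which this sketch does not cover:

```python
import ast

# Hypothetical: a dotted-name component that implies a service dependency.
SERVICE_KEYWORDS = {"snuba": "snuba", "redis": "redis"}


def detect_services(source):
    """Return the set of services implied by import statements in `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Include the imported name so `from x import snuba` is caught.
            base = node.module or ""
            modules = [f"{base}.{alias.name}" for alias in node.names]
        else:
            continue
        for module in modules:
            for part in module.split("."):
                if part in SERVICE_KEYWORDS:
                    found.add(SERVICE_KEYWORDS[part])
    return found


print(detect_services("import os\nimport redis.cluster\n"))
```

Static detection of this kind misses dependencies reached indirectly through helpers or fixtures, which is presumably why the plugin is described as a hybrid with a runtime component.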

Hybrid distribution mode based on experiment data: --dist=load cuts tier 2 shard-time variance by 54% (179s -> 82s spread) by load-balancing individual tests across workers, but hurts tier 1 (where small fast tests benefit from fixture reuse via loadfile). Apply load only when tiers are active. Backend-test without tiers (selective PRs, master without classification) keeps --dist=loadfile for backwards compatibility.
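
The selection rule and the quoted variance figure can be checked with a short sketch. `pick_dist_mode` and `spread_reduction` are hypothetical helpers written for illustration; the actual logic lives in the workflow YAML and may be expressed differently:

```python
def pick_dist_mode(tiers_active, tier):
    """Return the pytest-xdist --dist value for a job.

    Assumed rule, per the commit message: `load` only for tier 2 when
    tiers are active, `loadfile` everywhere else.
    """
    if tiers_active and tier == 2:
        return "load"
    return "loadfile"


def spread_reduction(before_s, after_s):
    """Fractional reduction in shard-time spread."""
    return (before_s - after_s) / before_s


print(pick_dist_mode(tiers_active=True, tier=2))   # load
print(pick_dist_mode(tiers_active=True, tier=1))   # loadfile
print(pick_dist_mode(tiers_active=False, tier=2))  # loadfile
print(round(spread_reduction(179, 82) * 100))      # 54
```

The last line reproduces the 54% figure: (179 - 82) / 179 is about 0.542.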

mchen-sentry added 7 commits May 7, 2026 14:10

fix(ci): correct mypy type ignore codes in service_classifier

bd84f3e

fix(ci): broaden mypy ignores for socket monkey-patching

de1adf9

fix(ci): add redis-cluster/kafka service containers to backend-light

e4d8887

fix(ci): reduce backend-test to 17 shards when tiers active (5+17=22)

af90d34

fix(ci): filter classify runs by conclusion via jq, not --status flag

d40971b

github-actions Bot added the Scope: Backend label (automatically applied to PRs that change backend components) May 7, 2026

mchen-sentry closed this May 7, 2026

github-actions Bot locked and limited conversation to collaborators May 23, 2026

mchen-sentry wants to merge 7 commits into master from mchen/ci-two-tier-split
