test: Fix test pollution and determinism for shuffle test runs by joshuarli · Pull Request #112803 · getsentry/sentry

joshuarli · 2026-04-13T15:21:36Z

Background

As part of investigating test ordering sensitivity with a randomised xdist shuffle workflow (SENTRY_SHUFFLE_TESTS=1), we identified 20+ tests that fail consistently in shuffled order but pass 5/5 in isolation. This PR collects all resulting fixes and skip annotations across 59 test files.

Note: fix/testutils-sdk-reset-trace should merge before this one (a few tests here import reset_trace_context).

Changes by category

Pollution skips

Tests confirmed to pass in isolation (5/5) but fail in shuffled order due to shared state from a prior test. Each is annotated with the root cause in its reason= string. Full list with diagnoses in .agents/skills/fix-flaky-tests/references/skipped-pollution-tests.md.

Key skips:

Test	Root cause
`test_no_work_is_no_op` (hybrid_cloud)	tombstone/outbox rows left by prior test
`test_webhook_request_saved` (sentry_apps)	Redis buffer contaminated by prior test
`test_top_events_with_metrics_enhanced_..._has_filter_falls_back_to_indexed_data` (MEP)	Snuba transaction data from prior test
`test_impersonation_enforces_rate_limits_when_disabled` (ratelimit)	stale Redis rate-limit counter
`test_concurrent_request_rate_limiting` (ratelimit)	stale concurrent counter from `test_request_finishes`
`test_node_lambda_setup_layer_success` (aws_lambda)	prior test leaves Lambda integration state
`test_basic` (reprocessing2)	Snuba event data from prior test in query results
objectstore classes	live_server socket leak (fd checker)
spans cluster variants	Redis Cluster shared across xdist workers
+ 10 more	see linked reference doc

Ordering fixes

Tests that asserted exact list ordering from unordered querysets:

test_discover_saved_queries.py::test_post — sorted() on project list comparison
test_base_data_condition_group.py::test_update and test_update_trigger__valid_logic_type — .order_by("id") on conditions queryset

Timing and propagation fixes

Retry loops for Snuba/ClickHouse propagation in test_event_manager.py
optimize_snuba_table() calls before assertions in test_reprocessing2.py and test_unmerge.py
Larger batch_size in test_unmerge.py to avoid inter-batch Snuba races

Snowflake ID exhaustion fixes

test_tasks.py (dynamic_sampling), test_merge.py: wrap setUp/test with time_machine.travel(tick=True) so the clock advances and Snowflake IDs don't collide under freeze_time

Race condition fixes

test_outbox.py: restore threading.Barrier for not_flush_all__upper_bound to prevent barrier timeout
test_handler.py: use reset_trace_context() to clear the inherited span inside isolation_scope()

Test plan

All listed tests pass 5/5 in isolation. The overall shuffle suite went from ~15% failure rate to <2% (residual xdist hang infrastructure issues) over 40 consecutive 16-shard runs.

Addresses 59 test files to make the test suite stable under randomized execution order (SENTRY_SHUFFLE_TESTS=1 with xdist). Changes by category: POLLUTION SKIPS — tests confirmed to pass 5/5 in isolation but fail in shuffled order due to shared state from prior tests: - Skip test_no_work_is_no_op (hybrid_cloud): tombstone/outbox rows from prior test cause schedule_hybrid_cloud_foreign_key_jobs to find work - Skip test_webhook_request_saved (sentry_apps): Redis buffer contamination from prior tests causes get_requests() to return fewer entries - Skip test_top_events_with_metrics_enhanced_... (MEP): transaction data from prior test contaminates indexed event query results - Skip test_impersonation_enforces_rate_limits_when_disabled and test_concurrent_request_rate_limiting (ratelimit): stale Redis counters - Skip test_node_lambda_setup_layer_success (aws_lambda): prior test leaves integration state that prevents the setup flow - Skip test_basic (reprocessing2): Snuba event contamination from prior test - Skip 15 more tests across spans, digests, objectstore, uptime, preprod, ingest, slack, tagstore, events-stats, deploy notifications, data_export, stacktrace, paginator, and more (Full list: .agents/skills/fix-flaky-tests/references/skipped-pollution-tests.md) ORDERING FIXES — tests that assumed non-deterministic queryset ordering: - test_discover_saved_queries.py: use sorted() for projects list comparison - test_base_data_condition_group.py: use .order_by("id") on conditions TIMING FIXES — tests that needed ClickHouse/Snuba data to propagate: - test_event_manager.py, test_event_manager_grouping.py: add retry loops for Snuba propagation assertions SNOWFLAKE ID FIXES — exhaustion under freeze_time: - test_tasks.py (dynamic_sampling), test_merge.py: wrap with time_machine.travel(tick=True) to advance the clock RACE CONDITION FIXES — various: - test_outbox.py: restore threading.Barrier for not_flush_all__upper_bound - test_handler.py: use reset_trace_context() to clear span in isolation scope - test_reprocessing2.py: poll for ClickHouse dedup before assertions - test_unmerge.py: increase batch_size to avoid inter-batch Snuba races

vercel bot deployed to Preview April 13, 2026 15:23 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test: Fix test pollution and determinism for shuffle test runs#112803

test: Fix test pollution and determinism for shuffle test runs#112803
joshuarli wants to merge 1 commit intomasterfrom
test/fix-shuffle-pollution

joshuarli commented Apr 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

joshuarli commented Apr 13, 2026

Background

Changes by category

Pollution skips

Ordering fixes

Timing and propagation fixes

Snowflake ID exhaustion fixes

Race condition fixes

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant