test: Fix test pollution and determinism for shuffle test runs#112803
Draft
Addresses 59 test files to make the test suite stable under randomized
execution order (SENTRY_SHUFFLE_TESTS=1 with xdist).
Changes by category:
POLLUTION SKIPS — tests confirmed to pass 5/5 in isolation but fail in
shuffled order due to shared state from prior tests:
- Skip test_no_work_is_no_op (hybrid_cloud): tombstone/outbox rows from
prior test cause schedule_hybrid_cloud_foreign_key_jobs to find work
- Skip test_webhook_request_saved (sentry_apps): Redis buffer contamination
from prior tests causes get_requests() to return fewer entries
- Skip test_top_events_with_metrics_enhanced_... (MEP): transaction data
from prior test contaminates indexed event query results
- Skip test_impersonation_enforces_rate_limits_when_disabled and
test_concurrent_request_rate_limiting (ratelimit): stale Redis counters
- Skip test_node_lambda_setup_layer_success (aws_lambda): prior test leaves
integration state that prevents the setup flow
- Skip test_basic (reprocessing2): Snuba event contamination from prior test
- Skip 15 more tests across spans, digests, objectstore, uptime, preprod,
ingest, slack, tagstore, events-stats, deploy notifications, data_export,
stacktrace, paginator, and more
(Full list: .agents/skills/fix-flaky-tests/references/skipped-pollution-tests.md)
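The failure mode behind these skips can be reproduced in miniature. The sketch below is a hypothetical, self-contained model, not code from Sentry: the module-level `_BUFFER` list stands in for shared state such as a Redis buffer, and the `case_*` functions stand in for tests. A case that passes on clean state fails once an earlier case has left entries behind, and resetting state between cases removes the order dependence.

```python
# Hypothetical model of test pollution; none of these names exist in Sentry.
# _BUFFER stands in for shared state (e.g. a Redis buffer) that outlives a test.
_BUFFER: list[str] = []


def record_request(payload: str) -> None:
    _BUFFER.append(payload)


def get_requests(limit: int = 2) -> list[str]:
    # Return at most `limit` of the most recent entries.
    return _BUFFER[-limit:]


def case_fills_buffer() -> None:
    # An unrelated "test" that leaves entries behind and never cleans up.
    for i in range(5):
        record_request(f"other-{i}")


def case_order_sensitive() -> None:
    record_request("mine")
    # Holds on a clean buffer; fails if an earlier case left entries behind.
    assert get_requests() == ["mine"]


def run(order: list, reset_between: bool) -> dict[str, bool]:
    """Run cases in the given order and report pass (True) or fail (False)."""
    _BUFFER.clear()
    results: dict[str, bool] = {}
    for case in order:
        if reset_between:
            _BUFFER.clear()  # per-test isolation, like a teardown fixture
        try:
            case()
            results[case.__name__] = True
        except AssertionError:
            results[case.__name__] = False
    return results


# Same two cases, different orders: only order and cleanup change the outcome.
print(run([case_order_sensitive, case_fills_buffer], reset_between=False))
print(run([case_fills_buffer, case_order_sensitive], reset_between=False))
print(run([case_fills_buffer, case_order_sensitive], reset_between=True))
```

In the second run `case_order_sensitive` fails even though it passes in the first and third runs, which is the "passes 5/5 in isolation, fails when shuffled" signature described above.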
ORDERING FIXES — tests that assumed non-deterministic queryset ordering:
- test_discover_saved_queries.py: use sorted() for projects list comparison
- test_base_data_condition_group.py: use .order_by("id") on conditions
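Both ordering fixes follow one pattern: either impose an order in the query (`.order_by("id")`) or make the comparison order-insensitive (`sorted()`). A minimal pure-Python sketch, assuming a hypothetical `fetch_unordered` that plays the role of a queryset without an explicit ordering (reversal merely models "not insertion order"; a real database makes no guarantee either way):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    id: int
    type: str


def fetch_unordered(rows: list[Condition]) -> list[Condition]:
    # Hypothetical stand-in for a queryset without ORDER BY. SQL gives no
    # ordering guarantee there; reversing models "not insertion order".
    return list(reversed(rows))


def fetch_ordered(rows: list[Condition]) -> list[Condition]:
    # Analogue of `.order_by("id")`: impose an explicit, stable order.
    return sorted(fetch_unordered(rows), key=lambda c: c.id)


created = [Condition(1, "eq"), Condition(2, "gt"), Condition(3, "lt")]

# Brittle: depends on whatever order storage happens to return.
print(fetch_unordered(created) == created)  # False in this model
# Robust option 1: order in the query.
print(fetch_ordered(created) == created)  # True
# Robust option 2: compare order-insensitively, as with sorted() on projects.
project_ids = [c.id for c in fetch_unordered(created)]
print(sorted(project_ids) == [1, 2, 3])  # True
```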
TIMING FIXES — tests that needed ClickHouse/Snuba data to propagate:
- test_event_manager.py, test_event_manager_grouping.py: add retry loops
for Snuba propagation assertions
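A retry loop for eventually-consistent storage usually polls until a condition holds or a deadline passes. The sketch below is a generic version of that pattern; `wait_for` and `EventuallyConsistentStore` are hypothetical names, not the helpers used in this PR, and the fake store only simulates data becoming visible after a few reads.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for(
    fetch: Callable[[], T],
    predicate: Callable[[T], bool],
    timeout: float = 5.0,
    interval: float = 0.1,
) -> T:
    """Call `fetch` until `predicate(result)` is true or `timeout` seconds pass.

    Raises AssertionError on timeout so a test runner reports a normal failure.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = fetch()
        if predicate(value):
            return value
        if time.monotonic() >= deadline:
            raise AssertionError(f"condition not met within {timeout}s; last value: {value!r}")
        time.sleep(interval)


class EventuallyConsistentStore:
    """Hypothetical fake store whose rows become visible only after a few reads."""

    def __init__(self, visible_after: int) -> None:
        self.reads = 0
        self.visible_after = visible_after

    def query(self) -> list[str]:
        self.reads += 1
        return ["event-1"] if self.reads >= self.visible_after else []


store = EventuallyConsistentStore(visible_after=3)
rows = wait_for(store.query, lambda r: len(r) == 1, timeout=2.0, interval=0.01)
print(rows, store.reads)  # ['event-1'] 3
```

Bounding the loop with a deadline keeps a genuinely missing row from turning into a hung test; the failure message includes the last observed value to aid diagnosis.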
SNOWFLAKE ID FIXES — exhaustion under freeze_time:
- test_tasks.py (dynamic_sampling), test_merge.py: wrap with
time_machine.travel(tick=True) to advance the clock
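Why a frozen clock exhausts Snowflake IDs can be shown with a toy generator that packs a millisecond timestamp and a per-millisecond sequence number into one integer. This is a simplified model for illustration, not Sentry's implementation: `ToySnowflake`, its tiny `SEQ_BITS`, and `TickingClock` are all hypothetical. With the clock frozen every ID falls into the same millisecond, so the sequence runs out; with a clock that keeps advancing, each read starts a fresh sequence.

```python
from typing import Callable


class SequenceExhausted(Exception):
    pass


class ToySnowflake:
    """Toy Snowflake-style generator: id = (timestamp_ms << SEQ_BITS) | seq.

    Not Sentry's implementation; SEQ_BITS is tiny so exhaustion is easy to see.
    """

    SEQ_BITS = 4  # at most 16 IDs per millisecond

    def __init__(self, clock_ms: Callable[[], int]) -> None:
        self.clock_ms = clock_ms
        self.last_ts = -1
        self.seq = 0

    def next_id(self) -> int:
        ts = self.clock_ms()
        if ts == self.last_ts:
            # Same millisecond as the previous ID: consume the next sequence slot.
            self.seq += 1
            if self.seq >= 1 << self.SEQ_BITS:
                raise SequenceExhausted(f"no sequence slots left in ms {ts}")
        else:
            # Clock moved: start a fresh sequence for this millisecond.
            self.last_ts = ts
            self.seq = 0
        return (ts << self.SEQ_BITS) | self.seq


class TickingClock:
    """Hypothetical clock that advances 1 ms per read, i.e. one that keeps moving."""

    def __init__(self, start_ms: int = 1_000) -> None:
        self.now = start_ms

    def __call__(self) -> int:
        self.now += 1
        return self.now


frozen = ToySnowflake(clock_ms=lambda: 1_000)  # like freeze_time: clock never moves
ids = [frozen.next_id() for _ in range(16)]  # fills every slot in one ms
try:
    frozen.next_id()
except SequenceExhausted as exc:
    print("frozen clock:", exc)

ticking = ToySnowflake(clock_ms=TickingClock())
print(len({ticking.next_id() for _ in range(1_000)}))  # 1000
```

`time_machine.travel(..., tick=True)` achieves the same effect in real tests by letting time flow forward from the travel destination instead of pinning it.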
RACE CONDITION FIXES — various:
- test_outbox.py: restore threading.Barrier for not_flush_all__upper_bound
- test_handler.py: use reset_trace_context() to clear span in isolation scope
- test_reprocessing2.py: poll for ClickHouse dedup before assertions
- test_unmerge.py: increase batch_size to avoid inter-batch Snuba races
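A `threading.Barrier` forces several threads to reach the same point before any continues, which makes a concurrent interleaving happen on every run instead of by chance. The internals of `not_flush_all__upper_bound` are not described here, so the following is only a generic sketch (`run_with_barrier` is a hypothetical helper) showing the barrier guarantee and how a `timeout` turns a missing participant into an error rather than a hang.

```python
import threading


def run_with_barrier(workers: int, timeout: float = 5.0) -> list[str]:
    """Start `workers` threads that all block at a barrier before proceeding."""
    barrier = threading.Barrier(workers, timeout=timeout)
    events: list[str] = []
    lock = threading.Lock()

    def worker(n: int) -> None:
        with lock:
            events.append(f"arrived-{n}")
        # Nobody passes until all `workers` threads have called wait();
        # BrokenBarrierError is raised if that takes longer than `timeout`.
        barrier.wait()
        with lock:
            events.append(f"passed-{n}")

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


events = run_with_barrier(4)
# Every "arrived" entry precedes every "passed" entry.
print(events[:4], events[4:])

# A barrier that never fills times out instead of hanging the test forever.
lonely = threading.Barrier(2)
try:
    lonely.wait(timeout=0.05)
except threading.BrokenBarrierError:
    print("barrier timed out")
```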
Background
As part of investigating test ordering sensitivity with a randomised xdist shuffle workflow (SENTRY_SHUFFLE_TESTS=1), we identified 20+ tests that fail consistently in shuffled order but pass 5/5 in isolation. This PR collects all resulting fixes and skip annotations across 59 test files.
Changes by category
Pollution skips
Tests confirmed to pass in isolation (5/5) but fail in shuffled order due to shared state from a prior test. Each is annotated with the root cause in its reason= string. The full list with diagnoses is in .agents/skills/fix-flaky-tests/references/skipped-pollution-tests.md.
Key skips:
- test_no_work_is_no_op (hybrid_cloud)
- test_webhook_request_saved (sentry_apps)
- test_top_events_with_metrics_enhanced_..._has_filter_falls_back_to_indexed_data (MEP)
- test_impersonation_enforces_rate_limits_when_disabled (ratelimit)
- test_concurrent_request_rate_limiting (ratelimit)
- test_request_finishes
- test_node_lambda_setup_layer_success (aws_lambda)
- test_basic (reprocessing2)
Ordering fixes
Tests that asserted exact list ordering from unordered querysets:
- test_discover_saved_queries.py::test_post: sorted() on the project list comparison
- test_base_data_condition_group.py::test_update and test_update_trigger__valid_logic_type: .order_by("id") on the conditions queryset
Timing and propagation fixes
- test_event_manager.py: retry loops for Snuba propagation assertions
- optimize_snuba_table() calls before assertions in test_reprocessing2.py and test_unmerge.py
- Increased batch_size in test_unmerge.py to avoid inter-batch Snuba races
Snowflake ID exhaustion fixes
- test_tasks.py (dynamic_sampling), test_merge.py: wrap setUp/test with time_machine.travel(tick=True) so the clock advances and Snowflake IDs don't collide under freeze_time
Race condition fixes
- test_outbox.py: restore threading.Barrier for not_flush_all__upper_bound to prevent barrier timeout
- test_handler.py: use reset_trace_context() to clear the inherited span inside isolation_scope()
Test plan
All listed tests pass 5/5 in isolation. Over 40 consecutive 16-shard runs, the overall shuffle suite went from a ~15% failure rate to <2%; the residual failures are xdist hang infrastructure issues.