test: Fix test pollution and determinism for shuffle test runs #112803

Draft
joshuarli wants to merge 1 commit into master from test/fix-shuffle-pollution

@joshuarli
Member

Background

As part of investigating test ordering sensitivity with a randomised xdist shuffle workflow (SENTRY_SHUFFLE_TESTS=1), we identified 20+ tests that fail consistently in shuffled order but pass 5/5 in isolation. This PR collects all resulting fixes and skip annotations across 59 test files.

Note: fix/testutils-sdk-reset-trace should merge before this one (a few tests here import reset_trace_context).


Changes by category

Pollution skips

Tests confirmed to pass in isolation (5/5) but fail in shuffled order due to shared state from a prior test. Each is annotated with the root cause in its reason= string. Full list with diagnoses in .agents/skills/fix-flaky-tests/references/skipped-pollution-tests.md.

Key skips:

| Test | Root cause |
| --- | --- |
| test_no_work_is_no_op (hybrid_cloud) | tombstone/outbox rows left by prior test |
| test_webhook_request_saved (sentry_apps) | Redis buffer contaminated by prior test |
| test_top_events_with_metrics_enhanced_..._has_filter_falls_back_to_indexed_data (MEP) | Snuba transaction data from prior test |
| test_impersonation_enforces_rate_limits_when_disabled (ratelimit) | stale Redis rate-limit counter |
| test_concurrent_request_rate_limiting (ratelimit) | stale concurrent counter from test_request_finishes |
| test_node_lambda_setup_layer_success (aws_lambda) | prior test leaves Lambda integration state |
| test_basic (reprocessing2) | Snuba event data from prior test in query results |
| objectstore classes | live_server socket leak (fd checker) |
| spans cluster variants | Redis Cluster shared across xdist workers |
| + 10 more | see linked reference doc |

Ordering fixes

Tests that asserted exact list ordering from unordered querysets:

  • test_discover_saved_queries.py::test_post: sorted() on project list comparison
  • test_base_data_condition_group.py::test_update and test_update_trigger__valid_logic_type: .order_by("id") on conditions queryset
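
The ordering fixes above follow a common pattern, sketched here in plain Python with no Django dependency. `fetch_project_ids` is a hypothetical stand-in for an unordered queryset and is not a function from the Sentry codebase; it only simulates rows arriving in arbitrary order.

```python
import random


def fetch_project_ids(seed: int) -> list[int]:
    """Hypothetical stand-in for an unordered queryset: same rows, arbitrary order."""
    rows = [3, 1, 2]
    random.Random(seed).shuffle(rows)
    return rows


def same_members(actual: list[int], expected: list[int]) -> bool:
    """Order-insensitive comparison: sort both sides before comparing."""
    return sorted(actual) == sorted(expected)


# Whatever order the rows arrive in, the sorted comparison holds.
for seed in range(5):
    assert same_members(fetch_project_ids(seed), [1, 2, 3])
```

When the order itself is part of what the test checks, the alternative is to pin it at the query, which is what the `.order_by("id")` change does.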

Timing and propagation fixes

  • Retry loops for Snuba/ClickHouse propagation in test_event_manager.py
  • optimize_snuba_table() calls before assertions in test_reprocessing2.py and test_unmerge.py
  • Larger batch_size in test_unmerge.py to avoid inter-batch Snuba races
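
A retry loop for propagation delays might look roughly like the sketch below. `wait_until` and `EventuallyVisible` are hypothetical names used for illustration; the actual helpers in test_event_manager.py may be structured differently.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 5.0, interval: float = 0.1) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class EventuallyVisible:
    """Toy store whose row only becomes visible after a few reads."""

    def __init__(self, visible_after: int = 3) -> None:
        self.reads = 0
        self.visible_after = visible_after

    def has_row(self) -> bool:
        self.reads += 1
        return self.reads >= self.visible_after


# The assertion tolerates a short delay instead of failing on the first read.
store = EventuallyVisible()
assert wait_until(store.has_row, timeout=2.0, interval=0.01)
```

Bounding the loop with a deadline keeps a genuinely missing row from hanging the test run: the helper returns False and the assertion fails normally.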

Snowflake ID exhaustion fixes

  • test_tasks.py (dynamic_sampling), test_merge.py: wrap setUp/test with time_machine.travel(tick=True) so the clock advances and Snowflake IDs don't collide under freeze_time
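
To illustrate why a frozen clock can exhaust timestamp-based IDs, here is a toy model. It is not Sentry's Snowflake implementation: `ToySnowflake`, the 2-bit sequence, and the injected clocks are all assumptions made for the sketch, and the clock is passed in directly rather than patched with time_machine so the example stays dependency-free.

```python
from typing import Callable

SEQUENCE_BITS = 2  # toy value: only 4 IDs available per clock reading


class ToySnowflake:
    """Toy ID generator: (timestamp << SEQUENCE_BITS) | per-timestamp sequence."""

    def __init__(self, clock: Callable[[], int]) -> None:
        self.clock = clock
        self.last_ts = -1
        self.sequence = 0

    def next_id(self) -> int:
        ts = self.clock()
        if ts != self.last_ts:
            # The clock moved, so the sequence space is fresh again.
            self.last_ts, self.sequence = ts, 0
        if self.sequence >= 1 << SEQUENCE_BITS:
            raise RuntimeError("sequence exhausted for this timestamp")
        new_id = (ts << SEQUENCE_BITS) | self.sequence
        self.sequence += 1
        return new_id


def frozen_clock() -> int:
    """Always returns the same value, like a frozen test clock."""
    return 1000


class TickingClock:
    """Advances by one unit on every read, like a clock that keeps ticking."""

    def __init__(self, start: int = 1000) -> None:
        self.now = start

    def __call__(self) -> int:
        self.now += 1
        return self.now


def count_ids(gen: ToySnowflake, wanted: int) -> int:
    """Return how many of `wanted` IDs were issued before exhaustion."""
    issued = 0
    for _ in range(wanted):
        try:
            gen.next_id()
        except RuntimeError:
            break
        issued += 1
    return issued


print(count_ids(ToySnowflake(frozen_clock), 10))    # 4: sequence space runs out
print(count_ids(ToySnowflake(TickingClock()), 10))  # 10: each tick resets the sequence
```

In this model a ticking clock gives every ID a fresh timestamp, which is the effect the PR attributes to `time_machine.travel(tick=True)`.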

Race condition fixes

  • test_outbox.py: restore threading.Barrier for not_flush_all__upper_bound to prevent barrier timeout
  • test_handler.py: use reset_trace_context() to clear the inherited span inside isolation_scope()
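
For readers unfamiliar with `threading.Barrier`, the sketch below shows the general mechanism using only the standard library. It does not reproduce the test_outbox.py code; `run_in_lockstep` and the worker naming are hypothetical.

```python
import threading


def run_in_lockstep(n_workers: int, timeout: float = 5.0) -> list[str]:
    """Start n workers that all wait at a barrier before their critical step."""
    barrier = threading.Barrier(n_workers, timeout=timeout)
    results: list[str] = []
    lock = threading.Lock()

    def worker(idx: int) -> None:
        try:
            barrier.wait()  # blocks until all n_workers have arrived
            outcome = f"worker-{idx}: released"
        except threading.BrokenBarrierError:
            # Raised when the barrier times out or is otherwise broken.
            outcome = f"worker-{idx}: barrier broken"
        with lock:
            results.append(outcome)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


print(run_in_lockstep(3))
```

A barrier created for more parties than ever arrive breaks with `BrokenBarrierError` once its timeout expires, so the party count has to match the number of threads that actually reach `wait()`.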

Test plan

All listed tests pass 5/5 in isolation. Over 40 consecutive 16-shard runs, the overall shuffle suite went from a ~15% failure rate to <2%; the residual failures are xdist hang infrastructure issues.

Addresses 59 test files to make the test suite stable under randomized
execution order (SENTRY_SHUFFLE_TESTS=1 with xdist).

Changes by category:

POLLUTION SKIPS — tests confirmed to pass 5/5 in isolation but fail in
shuffled order due to shared state from prior tests:
- Skip test_no_work_is_no_op (hybrid_cloud): tombstone/outbox rows from
  prior test cause schedule_hybrid_cloud_foreign_key_jobs to find work
- Skip test_webhook_request_saved (sentry_apps): Redis buffer contamination
  from prior tests causes get_requests() to return fewer entries
- Skip test_top_events_with_metrics_enhanced_... (MEP): transaction data
  from prior test contaminates indexed event query results
- Skip test_impersonation_enforces_rate_limits_when_disabled and
  test_concurrent_request_rate_limiting (ratelimit): stale Redis counters
- Skip test_node_lambda_setup_layer_success (aws_lambda): prior test leaves
  integration state that prevents the setup flow
- Skip test_basic (reprocessing2): Snuba event contamination from prior test
- Skip 15 more tests across spans, digests, objectstore, uptime, preprod,
  ingest, slack, tagstore, events-stats, deploy notifications, data_export,
  stacktrace, paginator, and more
  (Full list: .agents/skills/fix-flaky-tests/references/skipped-pollution-tests.md)

ORDERING FIXES — tests that assumed a fixed order from unordered querysets:
- test_discover_saved_queries.py: use sorted() for projects list comparison
- test_base_data_condition_group.py: use .order_by("id") on conditions

TIMING FIXES — tests that needed ClickHouse/Snuba data to propagate:
- test_event_manager.py, test_event_manager_grouping.py: add retry loops
  for Snuba propagation assertions

SNOWFLAKE ID FIXES — exhaustion under freeze_time:
- test_tasks.py (dynamic_sampling), test_merge.py: wrap with
  time_machine.travel(tick=True) to advance the clock

RACE CONDITION FIXES — various:
- test_outbox.py: restore threading.Barrier for not_flush_all__upper_bound
- test_handler.py: use reset_trace_context() to clear span in isolation scope
- test_reprocessing2.py: poll for ClickHouse dedup before assertions
- test_unmerge.py: increase batch_size to avoid inter-batch Snuba races