feat(ci): xdist backend test workflow #110776

Merged: mchen-sentry merged 11 commits into master from mingchen/di-1713-xdist-workflow on Mar 18, 2026


mchen-sentry (Member) commented Mar 16, 2026

Enable pytest-xdist in the backend CI workflow with per-worker Snuba isolation. Depends on #DI-1712 (xdist infra) being merged first.

xdist stays disabled for PRs; it only affects merges to master in sentry.

mchen-sentry requested a review from a team as a code owner March 16, 2026 18:30

linear-code bot commented Mar 16, 2026

DI-1713 xdist workflow

github-actions bot added the Scope: Backend label Mar 16, 2026

cursor bot (Contributor) left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Missing pipefail lets bootstrap failure go undetected
    • Added pipefail option to set command (changed 'set -e' to 'set -eo pipefail') to ensure bootstrap failures in the pipeline are properly detected.
  • ✅ Fixed: Experimental branch triggers will disable production CI
    • Restored production CI triggers by replacing experimental branch names with 'master' and re-adding pull_request event triggers.

Or push these changes by commenting:

@cursor push 4594d2aaa4
Preview (4594d2aaa4)
diff --git a/.github/workflows/backend.yml b/.github/workflows/backend.yml
--- a/.github/workflows/backend.yml
+++ b/.github/workflows/backend.yml
@@ -3,8 +3,9 @@
 on:
   push:
     branches:
-      - mchen/tiered-xdist-v2
-      - mchen/tiered-xdist-v2-clean
+      - master
+  pull_request:
+    types: [opened, synchronize, reopened, labeled]
   workflow_dispatch:
 
 # Cancel in progress workflows on pull_requests.
@@ -292,7 +293,7 @@
       - name: Bootstrap per-worker Snuba instances
         if: env.XDIST_PER_WORKER_SNUBA == '1'
         run: |
-          set -e
+          set -eo pipefail
           XDIST_N=${XDIST_WORKERS:-3}
           SNUBA_IMAGE=$(docker inspect snuba-snuba-1 --format '{{.Config.Image}}')
           SNUBA_NETWORK=$(docker inspect snuba-snuba-1 --format '{{range $k, $v := .NetworkSettings.Networks}}{{$k}}{{end}}')
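
The effect of the `pipefail` fix in the preview above can be checked locally. The sketch below is not part of the PR: it assumes `bash` is on `PATH`, and `run_script` is a hypothetical helper. It runs two tiny scripts in which the first command of a pipeline fails.

```python
import subprocess


def run_script(script: str) -> tuple:
    """Run a bash snippet and return (exit code, stdout). Hypothetical helper."""
    proc = subprocess.run(
        ["bash", "-c", script], capture_output=True, text=True, check=False
    )
    return proc.returncode, proc.stdout.strip()


# `false | cat`: the pipeline's status is that of `cat`, so `set -e` sees success
# and the script carries on to the echo.
without_pipefail = run_script("set -e; false | cat; echo reached")

# With pipefail the pipeline reports the failure of `false`, so `set -e` aborts
# before the echo runs.
with_pipefail = run_script("set -eo pipefail; false | cat; echo reached")

print(without_pipefail)
print(with_pipefail)
```

In the first case the script exits 0 and still prints `reached`; in the second it stops with a non-zero status, which is what lets a CI step fail when a piped bootstrap command fails.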


github-actions bot (Contributor) commented Mar 16, 2026

Backend Test Failures

Failures on 6b3141a in this run:

tests/sentry/spans/test_buffer.py::test_schema_examples[cluster-nochunk-basic_span]
tests/sentry/spans/test_buffer.py:1616: in test_schema_examples
    span = Span(
E   TypeError: Span.__new__() got an unexpected keyword argument 'end_timestamp'
tests/sentry/spans/test_buffer.py::test_schema_examples[single-chunk1-null_fields]
tests/sentry/spans/test_buffer.py:1616: in test_schema_examples
    span = Span(
E   TypeError: Span.__new__() got an unexpected keyword argument 'end_timestamp'
tests/sentry/spans/test_buffer.py::test_schema_examples[cluster-nochunk-null_fields]
tests/sentry/spans/test_buffer.py:1616: in test_schema_examples
    span = Span(
E   TypeError: Span.__new__() got an unexpected keyword argument 'end_timestamp'
tests/sentry/spans/test_buffer.py::test_schema_examples[cluster-nochunk-span_with_accepted_outcome]
tests/sentry/spans/test_buffer.py:1616: in test_schema_examples
    span = Span(
E   TypeError: Span.__new__() got an unexpected keyword argument 'end_timestamp'
tests/sentry/spans/test_buffer.py::test_schema_examples[single-chunk1-basic_span]
tests/sentry/spans/test_buffer.py:1616: in test_schema_examples
    span = Span(
E   TypeError: Span.__new__() got an unexpected keyword argument 'end_timestamp'
tests/sentry/spans/test_buffer.py::test_schema_examples[single-nochunk-span_with_accepted_outcome]
tests/sentry/spans/test_buffer.py:1616: in test_schema_examples
    span = Span(
E   TypeError: Span.__new__() got an unexpected keyword argument 'end_timestamp'
tests/sentry/spans/test_buffer.py::test_schema_examples[single-chunk1-span_with_accepted_outcome]
tests/sentry/spans/test_buffer.py:1616: in test_schema_examples
    span = Span(
E   TypeError: Span.__new__() got an unexpected keyword argument 'end_timestamp'
tests/sentry/spans/test_buffer.py::test_schema_examples[single-nochunk-null_fields]
tests/sentry/spans/test_buffer.py:1616: in test_schema_examples
    span = Span(
E   TypeError: Span.__new__() got an unexpected keyword argument 'end_timestamp'
tests/sentry/spans/test_buffer.py::test_schema_examples[cluster-chunk1-basic_span]
tests/sentry/spans/test_buffer.py:1616: in test_schema_examples
    span = Span(
E   TypeError: Span.__new__() got an unexpected keyword argument 'end_timestamp'
tests/sentry/spans/test_buffer.py::test_schema_examples[single-nochunk-basic_span]
tests/sentry/spans/test_buffer.py:1616: in test_schema_examples
    span = Span(
E   TypeError: Span.__new__() got an unexpected keyword argument 'end_timestamp'
tests/sentry/spans/test_buffer.py::test_schema_examples[cluster-chunk1-span_with_accepted_outcome]
tests/sentry/spans/test_buffer.py:1616: in test_schema_examples
    span = Span(
E   TypeError: Span.__new__() got an unexpected keyword argument 'end_timestamp'
tests/sentry/spans/test_buffer.py::test_schema_examples[cluster-chunk1-null_fields]
tests/sentry/spans/test_buffer.py:1616: in test_schema_examples
    span = Span(
E   TypeError: Span.__new__() got an unexpected keyword argument 'end_timestamp'

joshuarli (Member) left a comment

it's a little messy (can't really help it tbh) but lgtm

Enable pytest-xdist on master pushes and workflow_dispatch while keeping
the existing single-process selective testing for pull requests.

Changes:
- Add XDIST_WORKERS, XDIST_PER_WORKER_SNUBA, PYTHONHASHSEED env vars
  gated on github.event_name != 'pull_request'
- Bootstrap per-worker Snuba instances with proper PID tracking and
  health checks for parallel job error detection
- Append xdist flags to PYTEST_ADDOPTS (preserving --reruns from
  setup-sentry) and run via make test-python-ci
- Add 20-minute timeout wrapper to catch xdist hangs
- Skip checkout/setup-sentry/shard-calculation when selective testing
  is not active (static 22-shard fast path)
- Dump per-worker Snuba logs on failure for debugging
- Rewrite Snuba bootstrap to use devservices compose network instead of
  --network host (which silently ignores -p port mappings)
- Inspect running snuba-snuba-1 for image and network instead of pulling latest
- Per-worker ClickHouse databases for full isolation
- Fix docker stop to target snuba-snuba-1 (correct devservices name)
- Change PYTHONHASHSEED from '1' to '0' for canonical deterministic hashing
- Add PostgreSQL lock_timeout=30s under xdist to prevent indefinite lock waits
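
The per-worker isolation described in the list above can be pictured as a mapping from an xdist worker id to a private Snuba port and ClickHouse database. The sketch below is only an illustration: pytest-xdist does expose the worker id through the `PYTEST_XDIST_WORKER` environment variable (`gw0`, `gw1`, ...), but the base port, the database prefix, and the names `WorkerResources` and `resources_for_worker` are assumptions, not values taken from this PR.

```python
import os
import re
from dataclasses import dataclass
from typing import Optional

# Assumed values for illustration only; the PR does not state them.
BASE_SNUBA_PORT = 1230
CLICKHOUSE_DB_PREFIX = "default"


@dataclass(frozen=True)
class WorkerResources:
    snuba_port: int
    clickhouse_database: str


def resources_for_worker(worker_id: Optional[str]) -> WorkerResources:
    """Map an xdist worker id such as "gw2" to its own port and database.

    Anything that is not a "gwN" id (for example a run without xdist)
    falls back to the shared defaults.
    """
    match = re.fullmatch(r"gw(\d+)", worker_id or "")
    if match is None:
        return WorkerResources(BASE_SNUBA_PORT, CLICKHOUSE_DB_PREFIX)
    index = int(match.group(1))
    return WorkerResources(
        snuba_port=BASE_SNUBA_PORT + 1 + index,
        clickhouse_database=f"{CLICKHOUSE_DB_PREFIX}_gw{index}",
    )


# pytest-xdist sets PYTEST_XDIST_WORKER in each worker process.
current = resources_for_worker(os.environ.get("PYTEST_XDIST_WORKER"))
print(current)
```

Deriving everything from the worker index keeps the mapping deterministic, so a bootstrap step that starts N instances and the test processes that connect to them can agree without any shared state.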

signal.signal() and signal.setitimer() only work in the main thread.
Under xdist, execnet workers run test code in a non-main thread,
causing ValueError. These tests still run on every PR (single-process).
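
The main-thread restriction on `signal.signal()` is easy to reproduce outside xdist. The snippet below is a minimal sketch, not code from the PR; `install_handler` and `errors` are hypothetical names, and it assumes the script itself starts in the main thread.

```python
import signal
import threading

errors = []


def install_handler() -> None:
    """Try to install a SIGINT handler from whichever thread calls this."""
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
    except ValueError as exc:
        # CPython refuses to install handlers outside the main thread.
        errors.append(str(exc))


install_handler()  # called from the main thread (assumed): no error expected

worker = threading.Thread(target=install_handler)
worker.start()
worker.join()  # called from a non-main thread: ValueError is recorded

print(errors)
```

A test that installs signal handlers therefore needs to be skipped, or written differently, whenever the test body runs in a non-main thread.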

setdefault() returns object from mypy's perspective; indexed
assignment on object is not supported.
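
The mypy complaint can be illustrated with a loosely typed mapping. This is a sketch under assumptions: `DATABASES`, its keys, and the assigned value are hypothetical, and the `cast` shown here is one possible way to satisfy the type checker, not necessarily what the PR does (the commit messages mention a type-ignore comment instead).

```python
from typing import Any, Dict, cast

# A settings-style mapping whose values are typed as `object` (hypothetical).
DATABASES: Dict[str, object] = {"default": {}}

# This works at runtime, but mypy rejects it because setdefault() is typed as
# returning `object`, which does not support indexed assignment:
#     DATABASES.setdefault("default", {})["OPTIONS"] = {}
# One fix is a type-ignore comment placed on that exact line; another is to
# narrow the type explicitly before assigning, as below.
default_db = cast(Dict[str, Any], DATABASES.setdefault("default", {}))
default_db["OPTIONS"] = {"example_key": "example_value"}

print(DATABASES)
```

Because `cast` returns the same object, the assignment through `default_db` is visible in `DATABASES`.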

The 30s PostgreSQL lock_timeout was too aggressive under xdist, causing
retries that inflated shard durations.

Ruff reformatted the line so the ignore ended up on the string
literal instead of the actual indexed assignment, making mypy
report both an error and an unused-ignore.

cursor bot (Contributor) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


PYTHONHASHSEED="" is parsed by CPython's strtoul as 0, which
disables hash randomization. Use 'random' to explicitly enable
the default random hash seed on PR runs where xdist is disabled.
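
The difference between a pinned seed and `random` can be observed by hashing the same string in fresh interpreters. The sketch below is not from the PR and `hash_in_child` is a hypothetical helper; it only compares `PYTHONHASHSEED=0` with `PYTHONHASHSEED=random` and makes no claim about other values.

```python
import os
import subprocess
import sys


def hash_in_child(seed: str) -> int:
    """Return hash("xdist") as computed by a fresh interpreter with this seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.run(
        [sys.executable, "-c", 'print(hash("xdist"))'],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(out.stdout)


# With a fixed seed every process computes the same string hash.
fixed = {hash_in_child("0") for _ in range(3)}

# With "random" each process picks its own seed, so the values normally differ.
randomized = {hash_in_child("random") for _ in range(3)}

print(len(fixed))
print(len(randomized))
```

A pinned seed gives every xdist worker identical string hashes, while `random` restores the per-process randomization that single-process PR runs are meant to keep.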

29 consecutive green runs on the stress-test branch; no longer
need the soft-failure guard.

mchen-sentry merged commit cdcfccd into master Mar 18, 2026 (80 checks passed)
mchen-sentry deleted the mingchen/di-1713-xdist-workflow branch March 18, 2026 21:45