
perf(ci): shard + parallelize integration tests for ~5x speedup #1263

Merged: danielmeppiel merged 3 commits into main from perf/integration-tests-fast on May 11, 2026

@danielmeppiel
Collaborator

TL;DR

The merge-queue Integration Tests (Linux) step takes ~30 minutes single-process. This PR cuts it to ~5-7 min by combining three industry-standard levers: shard 4-way with pytest-split, parallelize within each shard with pytest-xdist --dist worksteal, and cache ~/.cache/apm/ across runs. Race-safe: the four files that mutate os.environ['HOME'] are pinned to a single worker via xdist_group. Gate-required check name preserved via a fan-in job.

Why

The integration suite is the slowest gate on the merge queue. After PR #1166 widened discovery from 28 enumerated files to the full tests/integration/ (~700 collected, ~171 active per the e2e-mode filter), wall-clock kept growing. Each test is dominated by subprocess.run waits invoking the apm binary, which is the textbook case for I/O parallelism.
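As a rough illustration of why subprocess-bound work parallelizes well, the sketch below runs a stand-in child process (a short sleep, not the real apm binary) sequentially and then with two concurrent workers. Threads are used here only to keep the example small; pytest-xdist uses worker processes, and the names `CHILD`, `run_child`, `run_sequential`, and `run_parallel` are hypothetical.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for one integration test: a child process that mostly waits.
# (Assumption for illustration; the real tests invoke the apm binary.)
CHILD = [sys.executable, "-c", "import time; time.sleep(0.2)"]


def run_child() -> int:
    """Run the stand-in command and return its exit code."""
    return subprocess.run(CHILD, check=False).returncode


def run_sequential(n: int) -> tuple[list[int], float]:
    """Run n children one after another; return exit codes and elapsed seconds."""
    start = time.perf_counter()
    codes = [run_child() for _ in range(n)]
    return codes, time.perf_counter() - start


def run_parallel(n: int, workers: int) -> tuple[list[int], float]:
    """Run n children with `workers` concurrent waiters."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(lambda _: run_child(), range(n)))
    return codes, time.perf_counter() - start


if __name__ == "__main__":
    _, seq_time = run_sequential(4)
    _, par_time = run_parallel(4, workers=2)
    print(f"sequential: {seq_time:.2f}s, parallel (2 workers): {par_time:.2f}s")
```

Because the parent spends its time blocked on the child, a second worker can overlap most of that wait without competing for CPU.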

What changed

| Lever | Before | After | Expected speedup |
|---|---|---|---|
| Shards | 1 runner | 4 runners (matrix) | ~4x |
| In-shard parallelism | sequential | -n 2 --dist worksteal | ~1.5-2x |
| APM cache | cold every run | ~/.cache/apm cached weekly | reduces network/rate-limit jitter |
| Net | ~30 min | ~5-7 min | ~5x |
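The figures in the table can be sanity-checked with a back-of-the-envelope model. The sketch below assumes the two levers compose multiplicatively and ignores fixed per-runner overhead (checkout, dependency install, runtime setup), so it gives a lower bound rather than a prediction; `ideal_wall_clock` is a hypothetical helper.

```python
def ideal_wall_clock(baseline_min: float, shards: int, in_shard_speedup: float) -> float:
    """Wall-clock minutes if sharding and in-shard parallelism compose perfectly."""
    return baseline_min / (shards * in_shard_speedup)


baseline = 30.0  # minutes, the single-process figure quoted in the table

# Optimistic end of the ~1.5-2x in-shard range.
best = ideal_wall_clock(baseline, shards=4, in_shard_speedup=2.0)
# Pessimistic end of the same range.
worst = ideal_wall_clock(baseline, shards=4, in_shard_speedup=1.5)

print(f"ideal range: {best:.2f}-{worst:.2f} min")  # 3.75-5.00 min
```

The quoted ~5-7 min sits at or above this ideal range, which is consistent with per-shard setup cost and imperfect balance that the model leaves out.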

Race-safety audit

tempfile.mkdtemp() calls are xdist-safe (unique dirs per call). Module-scoped fixtures are per-worker (xdist re-imports). The only global state is os.environ['HOME'] — mutated by 4 files, all now marked:

import pytest

# Module-level marker list; the elided entries are each file's existing markers.
pytestmark = [
    ...,
    pytest.mark.xdist_group(name="home_env"),
]

Files: test_auto_install_e2e.py, test_golden_scenario_e2e.py, test_runtime_smoke.py, test_mcp_env_var_copilot_e2e.py. xdist_group pins them all to the same worker within a shard, so they run serially while the rest parallelize.
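The pinning idea can be modelled with a toy scheduler. This is not pytest-xdist's implementation; it is a minimal sketch of the grouping behavior that xdist provides for `xdist_group` under its `loadgroup` distribution mode, using a simple round-robin assignment. `assign_to_workers` and `test_other.py` are hypothetical.

```python
from collections import defaultdict
from itertools import cycle
from typing import Optional


def assign_to_workers(
    tests: list[tuple[str, Optional[str]]], num_workers: int
) -> dict[int, list[str]]:
    """Toy scheduler: tests that share a group name land on one worker.

    Each entry is (test_id, group_name_or_None). Ungrouped tests are
    scheduled individually; grouped tests travel together as one unit.
    """
    units: dict[tuple[str, str], list[str]] = defaultdict(list)
    for test_id, group in tests:
        key = ("group", group) if group is not None else ("test", test_id)
        units[key].append(test_id)

    workers: dict[int, list[str]] = {w: [] for w in range(num_workers)}
    # Round-robin over units (an assumption; real schedulers balance by load).
    for worker, ids in zip(cycle(range(num_workers)), units.values()):
        workers[worker].extend(ids)
    return workers


tests = [
    ("test_auto_install_e2e.py", "home_env"),
    ("test_apm_dependencies.py", None),
    ("test_golden_scenario_e2e.py", "home_env"),
    ("test_runtime_smoke.py", "home_env"),
    ("test_mcp_env_var_copilot_e2e.py", "home_env"),
    ("test_other.py", None),  # hypothetical ungrouped file
]
plan = assign_to_workers(tests, num_workers=2)
```

In this model the four `home_env` files always share a worker and therefore run one after another, while ungrouped files are free to land anywhere.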

Gate compatibility

merge-gate.yml requires a check named exactly Integration Tests (Linux). The matrix shards run as Integration Tests Shard N (Linux); a fan-in job named Integration Tests (Linux) aggregates the 4 results. No merge-gate.yml edits required.
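The aggregation a fan-in job performs can be expressed as a small predicate. The sketch below is a Python model of that logic, not the workflow's actual step; it assumes the job sees each shard's result as one of the strings GitHub Actions reports (`success`, `failure`, `cancelled`, `skipped`), and `fan_in` is a hypothetical name.

```python
def fan_in(shard_results: dict[str, str]) -> bool:
    """Return True only if every shard job concluded with 'success'.

    An empty mapping is treated as a failure so that a misconfigured
    matrix cannot pass the gate by producing no results at all.
    """
    return bool(shard_results) and all(
        result == "success" for result in shard_results.values()
    )


all_green = {f"shard-{n}": "success" for n in range(1, 5)}
one_cancelled = {**all_green, "shard-3": "cancelled"}

print(fan_in(all_green))      # True
print(fan_in(one_cancelled))  # False
```

Treating `cancelled` and `skipped` as failures keeps the single gate-required check from going green when a shard never actually ran.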

APM cache key

apm-cache-shard{N}-{ISO-week}-{hash(uv.lock)} with {shard}-{week}- and {shard}- restore-key fallbacks. Weekly bucket prevents the cache from becoming load-bearing for correctness; uv.lock hash invalidates on dependency moves.
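To make the key layout concrete, here is a minimal sketch that builds the primary key and the two restore-key prefixes. The exact rendering of the ISO week (`2026-W20` style) and the use of SHA-256 for the lockfile hash are assumptions for illustration; the workflow's real expressions may differ, and `iso_week` and `cache_keys` are hypothetical helpers.

```python
import hashlib
from datetime import date

PREFIX = "apm-cache-shard"


def iso_week(day: date) -> str:
    """Render the ISO year and week, e.g. '2026-W20' (format is an assumption)."""
    year, week, _ = day.isocalendar()
    return f"{year}-W{week:02d}"


def cache_keys(shard: int, day: date, lockfile_bytes: bytes) -> tuple[str, list[str]]:
    """Return (primary_key, restore_keys) following the shard/week/hash layout."""
    week = iso_week(day)
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    primary = f"{PREFIX}{shard}-{week}-{digest}"
    # Fallbacks go from most to least specific: same shard and week, then same shard.
    restore = [f"{PREFIX}{shard}-{week}-", f"{PREFIX}{shard}-"]
    return primary, restore


primary, restore = cache_keys(1, date(2026, 5, 11), b"example lockfile contents")
```

Each restore key is a prefix of the primary key, so a run whose exact key misses can still start from the newest cache for the same shard and week, or failing that, the same shard.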

How to verify

# Local sanity (no PAT needed for collect)
uv run pytest tests/integration/ --collect-only -q --splits 4 --group 1
# 171/700 tests collected -> shards balanced

CI will produce 4 Integration Tests Shard N (Linux) runs and one fan-in Integration Tests (Linux). The fan-in is the gate-required check.

Deferred (intentionally not in this PR)

  • pytest-recording / vcrpy cassettes — record-and-replay HTTP for the GitHub-PAT-bound tests. High effort, separate PR.
  • Test-impact analysis — run only tests touching changed code paths. Requires per-test coverage map.
  • In-process apm invocation — drop the subprocess fork. Requires CLI restructure.

Validation evidence

  • uv run --extra dev ruff check src/ tests/ -- silent
  • uv run --extra dev ruff format --check src/ tests/ -- silent
  • uv run pytest tests/integration/ --collect-only -q --splits 4 --group {1,2,3,4} -- 4 balanced shards (171/171/171/187)
  • uv run pytest tests/integration/test_apm_dependencies.py -n 2 --dist worksteal -- xdist healthy

Cuts the Integration Tests (Linux) merge-queue step from ~30 min single
process to ~5-7 min by combining three industry-standard levers:

1. Shard 4-way with pytest-split (matrix of 4 runners, ~171-187 tests
   each). Deterministic partitioning means a given test always lands on
   the same shard run-to-run, which keeps reruns and triage predictable.
2. xdist -n 2 --dist worksteal inside each shard. Most integration
   tests are subprocess-bound (apm CLI invocations), so a small worker
   count + work-stealing reaps the wait time without overloading the
   runner.
3. Cache ~/.cache/apm across runs (weekly bucket + uv.lock hash). APM
   re-resolves the same handful of upstream packages on every run; a
   warm cache short-circuits the network leg and reduces PAT
   rate-limit risk.
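The property that matters for triage in item 1 is determinism: the same set of tests must always produce the same groups. The sketch below shows one simple way to get that (sort, then cut into contiguous chunks). It is not pytest-split's algorithm, which can also weight tests by stored durations and so need not produce equal-sized groups; `split_into_groups` is a hypothetical helper.

```python
import random


def split_into_groups(test_ids: list[str], splits: int) -> list[list[str]]:
    """Partition test ids into `splits` contiguous groups, deterministically.

    Sorting first makes the result independent of collection order.
    """
    ordered = sorted(test_ids)
    base, extra = divmod(len(ordered), splits)
    groups: list[list[str]] = []
    start = 0
    for i in range(splits):
        size = base + (1 if i < extra else 0)  # spread the remainder over early groups
        groups.append(ordered[start:start + size])
        start += size
    return groups


ids = [f"tests/integration/test_{n:03d}.py::test_case" for n in range(700)]
groups = split_into_groups(ids, 4)

shuffled = ids[:]
random.shuffle(shuffled)
assert split_into_groups(shuffled, 4) == groups  # same shards regardless of input order
```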

Race safety: 4 integration files mutate os.environ['HOME'] globally
(test_auto_install_e2e, test_golden_scenario_e2e, test_runtime_smoke,
test_mcp_env_var_copilot_e2e). Each is now marked
xdist_group(name='home_env') so xdist serializes them onto a single
worker within the shard while the rest still parallelize.
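For context on the hazard itself: code that overrides `HOME` process-wide leaves the new value in place for whatever runs next in that process unless it is restored. The sketch below shows the save, override, restore pattern as a context manager. It is a hypothetical helper, not code from the four test files, and it complements serialization rather than replacing it.

```python
import contextlib
import os
import tempfile
from collections.abc import Iterator


@contextlib.contextmanager
def temporary_home() -> Iterator[str]:
    """Point HOME at a fresh temp dir, then restore the previous value on exit."""
    previous = os.environ.get("HOME")
    with tempfile.TemporaryDirectory() as tmp:
        os.environ["HOME"] = tmp
        try:
            yield tmp
        finally:
            # Restore exactly what was there before, including "unset".
            if previous is None:
                os.environ.pop("HOME", None)
            else:
                os.environ["HOME"] = previous


if __name__ == "__main__":
    with temporary_home() as home:
        print("HOME inside block:", home)
    print("HOME restored:", os.environ.get("HOME"))
```

Inside pytest, the built-in `monkeypatch.setenv` fixture method gives the same restore-on-teardown behavior.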

Gate compatibility: the gate-required check name 'Integration Tests
(Linux)' is preserved via a fan-in job that needs the 4 shard jobs.
No merge-gate.yml change required.

Deferred (separate PRs):
- pytest-recording / vcrpy cassettes (record-and-replay HTTP).
- Test-impact analysis (only run tests touching changed code).
- In-process apm invocation to drop the subprocess fork cost.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 11, 2026 01:56
Contributor

Copilot AI left a comment


Pull request overview

This PR targets merge-queue latency by restructuring the Tier-2 Linux integration workflow to run the integration suite faster via sharding, per-shard parallelism, and caching, while preserving the required check name Integration Tests (Linux) through a fan-in job.

Changes:

  • Add pytest-split to the dev dependency set (and lockfile) to enable deterministic 4-way sharding.
  • Mark HOME-mutating integration modules with pytest.mark.xdist_group(...) to attempt to serialize them under xdist.
  • Update .github/workflows/ci-integration.yml to run 4 shard jobs with xdist parallelism, add an APM cache step, and add a fan-in job that preserves the required check name.
| File | Description |
|---|---|
| uv.lock | Adds the locked pytest-split dependency for sharding support. |
| pyproject.toml | Adds pytest-split to the dev extras. |
| tests/integration/test_runtime_smoke.py | Adds xdist_group marker alongside existing E2E gating. |
| tests/integration/test_mcp_env_var_copilot_e2e.py | Adds xdist_group marker for HOME-mutating module. |
| tests/integration/test_golden_scenario_e2e.py | Adds xdist_group marker for HOME-mutating module. |
| tests/integration/test_auto_install_e2e.py | Adds xdist_group marker for HOME-mutating module. |
| .github/workflows/ci-integration.yml | Shards integration tests, parallelizes with xdist, adds cache, and fans in to preserve the required check name. |

Copilot's findings

  • Files reviewed: 6/7 changed files
  • Comments generated: 3

Comment thread .github/workflows/ci-integration.yml Outdated
Comment thread .github/workflows/ci-integration.yml Outdated
Comment thread .github/workflows/ci-integration.yml
Daniel Meppiel and others added 2 commits May 11, 2026 04:04
Three real issues caught in PR review:

1. Cache restore was effectively discarded. The previous run-step
   restored ~/.cache/apm via actions/cache, then immediately
   'rm -rf'd it before symlinking it to a workspace-relative XDG
   path. Every shard started cold even on a cache hit. Fix: drop
   the symlink dance entirely. APM defaults to ~/.cache/apm on
   Linux when XDG_CACHE_HOME is unset (src/apm_cli/cache/paths.py),
   so actions/cache restores straight into the path the binary
   reads from.

2. xdist_group marker was silently ignored. With --dist worksteal
   pytest-xdist does NOT honor pytest.mark.xdist_group; only
   --dist loadgroup does. The 4 HOME-mutating files would have
   raced across workers despite the marker. Fix: switch to
   --dist loadgroup, which honors the group marker and otherwise
   distributes by file. Update the marker comment in each of the
   4 test files to call out the scheduler dependency.

3. Runtime setup was skipped. The previous run-step invoked pytest
   directly, bypassing scripts/test-integration.sh. The script does
   'apm runtime setup copilot/codex/llm' before pytest; without it
   the conftest auto-skip would silently green-list every
   requires_runtime_* test (false-green). Fix: route the run
   through scripts/test-integration.sh and parameterize the pytest
   invocation via PYTEST_EXTRA_ARGS so the script still owns
   runtime + token setup.
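The cache fix in item 1 relies on the restored directory and the directory the binary reads being the same path. The sketch below models that lookup. It is not the contents of src/apm_cli/cache/paths.py; the `~/.cache/apm` default comes from the commit message, while the behavior when XDG_CACHE_HOME is set (`$XDG_CACHE_HOME/apm`) is an assumption based on the usual XDG convention, and `resolve_cache_dir` is a hypothetical name.

```python
from collections.abc import Mapping
from pathlib import Path


def resolve_cache_dir(env: Mapping[str, str], home: Path) -> Path:
    """Return the cache directory for a given environment and home directory.

    Falls back to <home>/.cache/apm when XDG_CACHE_HOME is unset or empty.
    """
    xdg = env.get("XDG_CACHE_HOME")
    base = Path(xdg) if xdg else home / ".cache"
    return base / "apm"


runner_home = Path("/home/runner")  # illustrative home directory

# With XDG_CACHE_HOME unset, the cache step and the binary agree on one path.
default_dir = resolve_cache_dir({}, runner_home)

# Overriding XDG_CACHE_HOME moves the lookup away from the restored path.
moved_dir = resolve_cache_dir({"XDG_CACHE_HOME": "/workspace/.xdg"}, runner_home)
```

Under this model, restoring into `~/.cache/apm` only helps if nothing in the job redirects the cache root afterwards.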

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel merged commit b109bb3 into main May 11, 2026
9 checks passed
@danielmeppiel danielmeppiel deleted the perf/integration-tests-fast branch May 11, 2026 02:07