ci: fix macOS release integration-test 20min timeout #1575

Merged
danielmeppiel merged 4 commits into main from fix-macos-release-integration-timeout on Jun 1, 2026

@danielmeppiel (Collaborator)

Problem

The v0.16.1 release pipeline (run 26764269738) failed on Build & Validate (macOS x86_64). The Run integration tests step hit its timeout-minutes: 20 ceiling at ~61% progress. The tests were passing -- the slower macos-15-intel runner simply could not finish the full suite serially within 20 minutes. The arm64 runner passed but was also near the edge (~22min job). Because the macOS build failed, all downstream publish jobs (GitHub Release, PyPI, Homebrew, Scoop) were skipped -- so v0.16.1 never actually shipped.

Root cause

Unlike ci-integration.yml (which shards the integration suite across 4 Linux runners with -n 2 --dist loadgroup), the release workflow runs the entire integration suite on a single scarce macOS runner, serially (no PYTEST_EXTRA_ARGS). Serial wall-time on the Intel runner extrapolates to ~33min -- well over the 20min step budget.

Fix

For both consolidated macOS integration steps (Intel + arm64):

  • Add PYTEST_EXTRA_ARGS: "-n 2 --dist loadgroup" to parallelise the suite across two pytest-xdist workers on the same runner.
    • -n 2 matches ci-integration.yml's proven per-shard width, bounding the shared-PAT API concurrency these E2E tests generate.
    • --dist loadgroup is required to honor the pytest.mark.xdist_group(name="home_env") markers, which keep HOME-mutating E2E tests serialized on a single worker (race-safe).
  • Raise timeout-minutes from 20 to 30 for headroom on the slow Intel runner.
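
For reference, the markers that --dist loadgroup honors look roughly like the sketch below. The test names and bodies are hypothetical and not taken from the suite; only the pytest.mark.xdist_group(name="home_env") marker comes from the description above. It assumes pytest and pytest-xdist are installed.

```python
import os

import pytest


# Hypothetical tests: only the group name "home_env" comes from this PR.
@pytest.mark.xdist_group(name="home_env")
def test_first_home_mutation(tmp_path, monkeypatch):
    # Tests that touch shared HOME state carry the same group name, so
    # --dist loadgroup sends them to one worker and they never overlap.
    monkeypatch.setenv("HOME", str(tmp_path))
    assert os.environ["HOME"] == str(tmp_path)


@pytest.mark.xdist_group(name="home_env")
def test_second_home_mutation(tmp_path, monkeypatch):
    monkeypatch.setenv("HOME", str(tmp_path))
    assert os.environ["HOME"] == str(tmp_path)


def test_unrelated():
    # No group marker: free to run on whichever worker is available.
    assert 1 + 1 == 2
```

Run with `pytest -n 2 --dist loadgroup`, the two grouped tests stay on a single worker while the unmarked test can be scheduled on either.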

-n 2 should give up to a ~2x speedup (~33min -> ~17-20min), comfortably under the new 30min ceiling. The Linux/Windows integration job is left untouched -- it passed on the failed run.
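
The timing figures can be reproduced with a back-of-the-envelope calculation. This is only a sketch of the arithmetic using the numbers quoted in this description; it assumes progress through the suite is roughly linear in time, which the real run may not satisfy.

```python
# Figures quoted in the PR description.
OLD_TIMEOUT_MIN = 20        # previous timeout-minutes on the step
PROGRESS_AT_TIMEOUT = 0.61  # ~61% of the suite done when the step was killed
NEW_TIMEOUT_MIN = 30        # new timeout-minutes
WORKERS = 2                 # -n 2

# Linear extrapolation of the serial wall-time on the Intel runner.
serial_estimate = OLD_TIMEOUT_MIN / PROGRESS_AT_TIMEOUT

# Ideal-case bound: assumes a perfectly even split and no xdist overhead.
ideal_parallel = serial_estimate / WORKERS

print(f"serial estimate: {serial_estimate:.1f} min")            # 32.8 min
print(f"ideal with {WORKERS} workers: {ideal_parallel:.1f} min")  # 16.4 min
print(f"headroom under new ceiling: {NEW_TIMEOUT_MIN - ideal_parallel:.1f} min")  # 13.6 min
```

The ~17-20min estimate above sits a little over the ideal 16.4min, which is what one would expect once worker startup and the serialized home_env group are taken into account.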

Validation

  • YAML parses cleanly (yaml.safe_load).
  • Scope limited to the two macOS Run integration tests steps; release-validation steps and the Linux/Windows job are unchanged.
  • Per cicd.instructions.md: this preserves the consolidated-macOS-job architecture (scarce runners, no extra sharding).

Recovery plan after merge

The v0.16.1 tag points at a commit without this fix, so gh run rerun would not include it. After this merges, the v0.16.1 tag will be re-created on the new main HEAD and pushed to re-trigger the full release pipeline (nothing shipped yet, so reusing the version is safe).

danielmeppiel and others added 4 commits June 1, 2026 16:13
Two unit tests asserted st_mode & 0o111 == 0o111, which fails on
Windows because NTFS does not honor POSIX execute bits. Guard both
with pytest.mark.skipif(sys.platform == 'win32'), matching the
existing convention used elsewhere in the suite.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
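
A minimal sketch of the guard described in this commit message follows. The test name and body are hypothetical (the real tests exercise the project's own helpers); only the st_mode & 0o111 assertion and the skipif condition come from the message. It assumes pytest is installed.

```python
import os
import shutil
import stat
import sys

import pytest

# All three execute bits (user, group, other), i.e. 0o111.
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


@pytest.mark.skipif(
    sys.platform == "win32",
    reason="NTFS does not honor POSIX execute bits",
)
def test_copy_preserves_execute_bits(tmp_path):
    # Hypothetical body: copy an executable file and check its mode.
    script = tmp_path / "tool.sh"
    script.write_text("#!/bin/sh\n")
    os.chmod(script, 0o755)

    copy = tmp_path / "tool-copy.sh"
    shutil.copy2(script, copy)  # copy2 also copies permission bits

    # In Python, & binds tighter than ==, so this reads (mode & bits) == bits.
    assert copy.stat().st_mode & EXEC_BITS == EXEC_BITS
```
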
scripts/test-integration.sh runs under `set -euo pipefail`. On macOS
the default /bin/bash is 3.2, where expanding an empty array with a
bare "${arr[@]}" raises an unbound-variable error. Local integration
runs (PYTEST_EXTRA_ARGS unset) aborted before pytest with
'extra_args[@]: unbound variable'. Use the ${arr[@]+"${arr[@]}"}
guard so the empty-array expansion is safe; CI behaviour (with
PYTEST_EXTRA_ARGS set) is unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
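
The failure mode is specific to bash 3.2, but the intent of the guard (treat an unset PYTEST_EXTRA_ARGS as an empty argument list and append it only when present) can be illustrated in Python. This is an analogue, not a translation of scripts/test-integration.sh; the function name and the test path are hypothetical.

```python
import os
import shlex


def build_pytest_command(env=None, test_path="tests/integration"):
    """Return a pytest argv, appending PYTEST_EXTRA_ARGS only if it is set."""
    env = os.environ if env is None else env
    # Unset or empty yields [], the case that aborted the script on bash 3.2.
    extra_args = shlex.split(env.get("PYTEST_EXTRA_ARGS", ""))
    return ["pytest", test_path, *extra_args]


# Local run: variable unset, no extra arguments are added.
print(build_pytest_command({}))
# CI run: the value set by the release workflow is split into separate args.
print(build_pytest_command({"PYTEST_EXTRA_ARGS": "-n 2 --dist loadgroup"}))
```
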
Roll the [Unreleased] changelog into a dated 0.16.1 block and bump
pyproject.toml + uv.lock. Adds the previously-missing user-facing
entries for #1539 (apm doctor), #1566, #1569, #1567, #1553, #1552,
and #1538 surfaced by enumerating merged PRs since v0.16.0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The v0.16.1 release pipeline failed on the macOS x86_64 (Intel) build:
the consolidated job's "Run integration tests" step hit its 20-minute
timeout at ~61% progress. The tests were passing -- the slower Intel
runner simply could not finish the full suite serially in time, and the
arm64 runner was also near the edge.

Unlike ci-integration.yml, which shards the suite across four runners,
the release workflow runs the whole integration suite on a single
scarce macOS runner. Parallelise it in-process with xdist (-n 2,
matching ci-integration's per-shard width to bound shared-PAT API
concurrency) using --dist loadgroup so the home_env xdist_group markers
keep HOME-mutating tests serialized on one worker. Also raise the step
timeout to 30 minutes for headroom on the slow Intel runner.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 1, 2026 16:53
Copilot AI (Contributor) left a comment

Pull request overview

Updates the release pipeline to avoid macOS integration tests timing out during tagged/scheduled/dispatch release runs, by enabling limited pytest-xdist parallelism and increasing the step timeout. It also rolls forward the project version/changelog for 0.16.1 and adjusts unit tests that assert POSIX executable bits to be skipped on Windows.

Changes:

  • Release workflow (macOS Intel + ARM) now sets PYTEST_EXTRA_ARGS="-n 2 --dist loadgroup" and increases the integration-test step timeout to 30 minutes.
  • Integration test runner script now uses a bash-safe empty-array expansion guard when passing through PYTEST_EXTRA_ARGS.
  • Version/changelog updates for 0.16.1, plus Windows skips for unit tests that rely on POSIX execute-bit preservation.

| File | Description |
| --- | --- |
| uv.lock | Bumps apm-cli version to 0.16.1. |
| pyproject.toml | Bumps project version to 0.16.1. |
| CHANGELOG.md | Creates 0.16.1 section and moves entries out of Unreleased. |
| .github/workflows/build-release.yml | Adds xdist args + increases macOS integration-test timeout to avoid release failures. |
| scripts/test-integration.sh | Makes PYTEST_EXTRA_ARGS pass-through safe under bash 3.2 + set -u. |
| tests/unit/test_file_ops.py | Skips execute-bit preservation assertion on Windows. |
| tests/unit/test_download_strategies.py | Skips execute-bit preservation assertion on Windows. |

Copilot's findings

  • Files reviewed: 6/7 changed files
  • Comments generated: 2

Comment on lines +208 to +217
# macOS runners are scarce, so this consolidated job runs the
# whole integration suite on one runner instead of sharding it
# across four like ci-integration.yml. Run it serially and the
# slower Intel runner overruns the step timeout, so parallelise
# in-process with xdist. --dist loadgroup is required: it is the
# only scheduler that honors pytest.mark.xdist_group, which keeps
# the HOME-mutating tests serialized on a single worker. Kept at
# -n 2 (matching ci-integration's per-shard width) to bound the
# shared-PAT API concurrency these E2E tests generate.
PYTEST_EXTRA_ARGS: "-n 2 --dist loadgroup"
Comment on lines +361 to +363
# The ${arr[@]+"${arr[@]}"} guard keeps an empty array expansion safe
# under `set -u` on bash 3.2 (the default /bin/bash on macOS), where a
# bare "${arr[@]}" on an empty array raises an unbound-variable error.
@danielmeppiel danielmeppiel merged commit 0825153 into main Jun 1, 2026
22 checks passed
@danielmeppiel danielmeppiel deleted the fix-macos-release-integration-timeout branch June 1, 2026 16:58