chore: fail-fast CI hangs + refresh README header #75

Merged: Sunrisepeak merged 2 commits into main from chore/ci-timeout-and-readme-refresh on May 24, 2026

@Sunrisepeak
Member

Summary

Recent main runs (notably the v0.0.28 bump) burned the full 60-min job budget on 10_env_command.sh hanging in xlings install on a fresh MCPP_HOME. The hang is intermittent and lives at the network boundary; no fix to xlings itself here — just shorten the feedback loop so the offending test is identified in 10 min instead of 60.

The PR also brings the README header up to v0.0.28 reality (Windows LLVM is now ✅, CI is green on all three platforms) and elevates the key external links above the fold so newcomers don't have to scroll to find docs / mcpp-index / mcpplibs / forum.

CI changes

  • tests/e2e/run_all.sh — wraps every test with timeout 600 (override via E2E_TEST_TIMEOUT). Distinguishes TIMEOUT (exit 124) from regular FAIL and surfaces the offending test name in the summary. Uses timeout if available, falls back to gtimeout (macOS coreutils), else skips wrapping and relies on the step-level guard.
  • ci.yml / ci-macos.yml / ci-windows.yml — add timeout-minutes: 25 to each E2E suite step. A hung suite now fails in ~25 min instead of eating the full 60-min job, freeing the downstream toolchain test steps to still run / be diagnosed.
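
The fallback chain described for `run_all.sh` can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the helper names `pick_timeout_cmd` and `run_with_timeout` and the variable `TEST_TIMEOUT` are hypothetical; only `E2E_TEST_TIMEOUT`, the 600s default, and the `timeout` / `gtimeout` order come from the description above.

```shell
# Hypothetical sketch; helper names are illustrative, not taken from run_all.sh.

# Per-test limit in seconds; E2E_TEST_TIMEOUT overrides the 600s default.
TEST_TIMEOUT="${E2E_TEST_TIMEOUT:-600}"

# Print the name of the first available timeout binary, or nothing if none exists.
pick_timeout_cmd() {
    if command -v timeout >/dev/null 2>&1; then
        echo timeout
    elif command -v gtimeout >/dev/null 2>&1; then
        echo gtimeout    # GNU coreutils as commonly installed on macOS
    fi
}

# Run a command under the per-test limit when a timeout tool exists;
# otherwise run it unwrapped and rely on the CI step-level guard.
run_with_timeout() {
    local tool
    tool="$(pick_timeout_cmd)"
    if [ -n "$tool" ]; then
        "$tool" "$TEST_TIMEOUT" "$@"
    else
        "$@"
    fi
}
```

With GNU coreutils, `timeout` exits with status 124 when it has to kill the command, which is what lets the runner tell a stall apart from an ordinary failure.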

Behaviour matrix

| Scenario | Before | After |
| --- | --- | --- |
| `10_env_command.sh` stalls in xlings download | 60 min wall-clock → step cancelled with no clue which test | Single test killed at 10 min → `TIMEOUT: 10_env_command.sh (exceeded 600s — likely network / xlings stall)` printed; rest of suite continues; step cap at 25 min |
| All tests pass | unchanged | unchanged (just adds `Per-test timeout: 600s (via timeout)` line at start) |
| Real test failure | unchanged | unchanged (`FAIL: NN_x.sh (exit N)`) |
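
One way to produce the per-test lines in the matrix is to branch on the exit status. The sketch below is an assumption about the shape of that logic: `classify_result` is a hypothetical name and the `PASS:` line is invented for illustration; only the `TIMEOUT:` and `FAIL:` formats are taken from the matrix.

```shell
# Hypothetical sketch: classify_result NAME EXIT_CODE LIMIT_SECONDS
classify_result() {
    local name="$1" rc="$2" limit="$3"
    case "$rc" in
        0)
            # Assumed wording; the PR does not show what a passing test prints.
            echo "PASS: $name"
            ;;
        124)
            # 124 is the status GNU timeout returns after killing the command.
            echo "TIMEOUT: $name (exceeded ${limit}s — likely network / xlings stall)"
            ;;
        *)
            echo "FAIL: $name (exit $rc)"
            ;;
    esac
}
```

A test that itself exits with 124 would be reported as a timeout under this scheme, so the classification is only as reliable as that convention.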

Local verification

```
$ E2E_TEST_TIMEOUT=3 bash run_all.sh
...
=== 99_hang.sh ===
synthetic hang test starting
TIMEOUT: 99_hang.sh (exceeded 3s — likely network / xlings stall)

E2E Summary: 1 passed, 1 failed, 0 skipped
Timed out: 99_hang.sh
Failed: 99_hang.sh (TIMEOUT)
$ echo $?
1
```
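
The summary in this transcript counts a timed-out test as a failure and also lists it separately. A minimal sketch of bookkeeping that would print the same three lines is shown here; the function and variable names are hypothetical and the real `run_all.sh` may track results differently.

```shell
# Hypothetical sketch of the summary bookkeeping; names are illustrative.
passed=0
failed=0
skipped=0
timed_out_names=""
failed_names=""

# record_result NAME EXIT_CODE
record_result() {
    local name="$1" rc="$2"
    if [ "$rc" -eq 0 ]; then
        passed=$((passed + 1))
        return 0
    fi
    failed=$((failed + 1))
    if [ "$rc" -eq 124 ]; then
        # A timeout is listed on its own line and also counted as a failure.
        timed_out_names="${timed_out_names:+$timed_out_names }$name"
        failed_names="${failed_names:+$failed_names }$name (TIMEOUT)"
    else
        failed_names="${failed_names:+$failed_names }$name (exit $rc)"
    fi
}

# Print the summary and return non-zero if any test failed or timed out.
print_summary() {
    echo "E2E Summary: $passed passed, $failed failed, $skipped skipped"
    if [ -n "$timed_out_names" ]; then echo "Timed out: $timed_out_names"; fi
    if [ -n "$failed_names" ]; then echo "Failed: $failed_names"; fi
    [ "$failed" -eq 0 ]
}
```

Recording one passing test and `99_hang.sh` with status 124, then calling `print_summary`, reproduces the summary lines and the exit status of 1 shown in the transcript.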

README changes

  • New centered links table directly under the badges (2 rows):
    • row 1: docs / quick start / mcpp.toml guide / examples / toolchains
    • row 2: mcpp-index / mcpplibs / forum / Issues / Releases
  • Add live CI status badges for Linux / macOS / Windows; drop the static Self-hosted badge that the CI badges now subsume.
  • Promote Windows x86_64 Clang/LLVM from 🔄 to ✅ (CI green since v0.0.27); add footnote noting the MSVC BuildTools dependency and the future llvm-mingw direction.

Test plan

  • ci.yml green (Linux self-host, E2E suite under 25 min)
  • ci-macos.yml green
  • ci-windows.yml green (incl. the stdin regression step from #74, "fix: seal child-process stdin on Windows (first-run hang)")
  • If 10_env_command.sh hangs in CI again, the failure log shows TIMEOUT: 10_env_command.sh instead of Error: The operation was canceled.

Recent main runs (notably the v0.0.28 bump) burned the full 60-min job
budget on `10_env_command.sh` hanging in xlings install on a fresh
MCPP_HOME. The hang is intermittent and lives at the network boundary;
no fix to xlings itself in this PR — just shorten the feedback loop.

CI:
- tests/e2e/run_all.sh wraps every test with `timeout 600` (override via
  E2E_TEST_TIMEOUT). Distinguishes TIMEOUT (exit 124) from regular FAIL
  and surfaces the offending test name in the summary. Uses `timeout` if
  available, falls back to `gtimeout` (macOS coreutils), else skips
  wrapping and relies on the step-level guard.
- ci.yml / ci-macos.yml / ci-windows.yml: add `timeout-minutes: 25` to
  each "E2E suite" step. A hung suite now fails in ~25 min instead of
  eating the full 60-min job, freeing the toolchain test steps to still
  run / be diagnosed.

README:
- Add a prominent links table directly under the badges: docs / quick
  start / mcpp.toml guide / examples / toolchains on row 1; package
  index, mcpplibs, forum, Issues, Releases on row 2.
- Add live CI status badges for Linux/macOS/Windows; drop the static
  "Self-hosted" badge that the CI badges now subsume.
- Promote Windows x86_64 LLVM/Clang from 🔄 to ✅ (CI is green since
  v0.0.27); add footnote noting the MSVC BuildTools dependency and the
  llvm-mingw direction.
- Workflow renamed (and `name:` updated) to match the platform-prefixed
  ci-macos / ci-windows convention. Comment cross-references updated.
- README: drop the 3 CI badges from the badges row and add them as the
  table's last row alongside the docs / community links — keeps the
  badges row focused on project metadata (release / language / license).
@Sunrisepeak merged commit a349d71 into main on May 24, 2026
3 checks passed