ops: self-hosted runner Docker intermittent — workspace-test checkout fails #725

@noahgift

Problem

The workspace-test CI job intermittently fails at actions/checkout with
docker: command not found on the self-hosted Intel runner. This blocks PR merges.

Five-Whys (from ci-infrastructure.md spec)

  1. Checkout fails → Docker not available
  2. Docker not available → docker CLI missing from PATH, or job routed to a runner
     without Docker (a stopped daemon would give a connection error, not command not found)
  3. Routed to wrong runner → possibly multiple self-hosted runners with different configs
  4. Different configs → no runner health check or Docker prerequisite enforcement
  5. No prerequisite → runner setup is manual, not infrastructure-as-code

Evidence

  • workspace-test PASSED on fix/stale-numbers (2026-04-10) but FAILED on
    fix/spec-infra (same day, same runner pool)
  • Transient — retrying the same commit sometimes passes
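
One way to test the "routed to wrong runner" hypothesis is to tally pass/fail results per runner. The sketch below assumes the job history has already been exported as (runner name, passed) pairs; the function name and the record shape are hypothetical, and no real runner names or results are implied.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def failures_by_runner(runs: Iterable[Tuple[str, bool]]) -> Dict[str, Tuple[int, int]]:
    """Map runner name -> (failed, total) for the given (runner, passed) records."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for runner, passed in runs:
        counts[runner][1] += 1          # total jobs seen on this runner
        if not passed:
            counts[runner][0] += 1      # failed jobs on this runner
    return {runner: (failed, total) for runner, (failed, total) in counts.items()}
```

If the failures cluster on one runner, that points at a per-runner Docker setup difference rather than a flaky daemon.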

Fix

  1. Verify Docker is installed and running on all self-hosted runners
  2. Ensure localhost:5000/sovereign-ci:stable image is available
  3. Add runner health check (Docker presence) before job dispatch
  4. Consider runner labels to route container jobs only to Docker-capable runners
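
Steps 1 to 3 could be covered by a small script run on each runner before it accepts jobs. This is a minimal sketch, not the project's actual tooling: the function names are hypothetical, `docker info` is used as a daemon reachability probe, and `docker image inspect` only checks that the image is present locally (it does not query the localhost:5000 registry).

```python
import shutil
import subprocess
from typing import Callable, List, Optional

IMAGE = "localhost:5000/sovereign-ci:stable"  # image name taken from the issue


def _succeeds(cmd: List[str]) -> bool:
    """Return True if the command runs and exits with status 0."""
    try:
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def check_runner(
    image: str = IMAGE,
    which: Callable[[str], Optional[str]] = shutil.which,
    succeeds: Callable[[List[str]], bool] = _succeeds,
) -> List[str]:
    """Return a list of problems; an empty list means the runner looks healthy."""
    # Step 1a: the CLI must be on PATH ("docker: command not found" otherwise).
    if which("docker") is None:
        return ["docker CLI not found on PATH"]
    # Step 1b: the daemon must answer.
    if not succeeds(["docker", "info"]):
        return ["docker daemon not reachable"]
    # Step 2: the CI image must be present locally (assumption: local check only).
    if not succeeds(["docker", "image", "inspect", image]):
        return [f"image {image} not present locally"]
    return []


def main() -> int:
    """Print any problems and return a shell-style exit code."""
    problems = check_runner()
    for problem in problems:
        print(f"runner unhealthy: {problem}")
    return 1 if problems else 0
```

Calling `sys.exit(main())` from a wrapper script gives a non-zero exit status that a runner startup hook or a first workflow step could use to fail fast. The `which` and `succeeds` parameters exist only so the logic can be exercised without Docker installed.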

Contract

contracts/ci-infra-v1.yaml — infrastructure reliability
docs/specifications/components/ci-infrastructure.md — five-whys analysis

Priority

P0 — blocks ALL PR merges when workspace-test is a required status check.
