Evergreen workflow startup_failure: apm lacks access to the 'ubuntu-slim' runner group #57

@mrjf

Symptom

The Evergreen — PR Health Keeper workflow fails immediately with startup_failure — no jobs run, no log is produced.

Root cause

Every support job in .github/workflows/evergreen.lock.yml (and autoloop.lock.yml) sets runs-on: ubuntu-slim; only the agent and detection jobs use ubuntu-latest:

$ grep -n "runs-on:" .github/workflows/evergreen.lock.yml
82:    runs-on: ubuntu-slim       ← activation
337:   runs-on: ubuntu-latest     ← agent
1011:  runs-on: ubuntu-slim       ← conclusion
1162:  runs-on: ubuntu-latest     ← detection
1364:  runs-on: ubuntu-slim       ← push_repo_memory
1441:  runs-on: ubuntu-slim       ← safe_outputs

ubuntu-slim is not a stock GitHub-hosted runner label. It is an org-level custom runner (a runner group or larger-runner) managed under the githubnext org. When GitHub Actions can't match a runs-on: label to a runner the calling repo is allowed to use, the workflow fails at dispatch with startup_failure and produces no log (which is why the run page just says "This run likely failed because of a workflow file issue").
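
To see at a glance which runner labels the compiled workflows request, the grep above can be turned into a per-label count. This is a minimal sketch; the function name list_runner_labels is hypothetical, and it assumes each runs-on: value is a single plain label on one line, as in the output above.

```shell
# Count how many jobs request each runner label across the lock files.
# $1: directory holding the compiled workflows (defaults to .github/workflows).
list_runner_labels() {
  dir="${1:-.github/workflows}"
  grep -h "runs-on:" "$dir"/*.lock.yml \
    | sed -E 's/^[[:space:]]*runs-on:[[:space:]]*//' \
    | sort | uniq -c | sort -rn
}
```

Any label in the output that the repo cannot use would be a candidate for the failure described here.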

Verification: the same lock-file output runs successfully in githubnext/tsb — meaning tsb is in the runner group's allowed-repositories list. apm isn't. (See the parallel issue githubnext/tsikit-learn#21, which hit the identical problem.)

Fix — pick one

Option A (simplest): grant apm access to the ubuntu-slim runner group.

Requires githubnext org admin. In Settings → Actions → Runner groups, locate the group exposing ubuntu-slim and add apm to its repository access list. No code change needed; matches what tsb has.
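
If the UI is inconvenient, the same change can probably be made through the REST API with gh api. The sketch below assumes ubuntu-slim really is exposed through an org runner group whose visibility is "selected", that the repo lives at githubnext/apm, and that the caller has org admin rights; the function names are hypothetical.

```shell
# List runner groups in an org as "id<TAB>name<TAB>visibility".
list_runner_groups() {
  org="$1"
  gh api "orgs/${org}/actions/runner-groups" \
    --jq '.runner_groups[] | "\(.id)\t\(.name)\t\(.visibility)"'
}

# Add a repository to a runner group's repository access list.
# $1: org, $2: runner group id (from list_runner_groups), $3: repo name.
grant_repo_access() {
  org="$1"
  group_id="$2"
  repo="$3"
  # The endpoint takes the numeric repository id, not the repo name.
  repo_id="$(gh api "repos/${org}/${repo}" --jq '.id')" || return 1
  gh api --method PUT \
    "orgs/${org}/actions/runner-groups/${group_id}/repositories/${repo_id}"
}
```

Usage would look like list_runner_groups githubnext, followed by grant_repo_access githubnext GROUP_ID apm with the id of whichever group turns out to expose the label.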

Option B: override the runner label in the workflow source.

In .github/workflows/evergreen.md (and .github/workflows/autoloop.md) frontmatter, add:

runs-on: ubuntu-latest

Then recompile with gh aw compile. Verify with:

grep "runs-on:" .github/workflows/*.lock.yml

All occurrences should be ubuntu-latest (or the chosen label) — no more ubuntu-slim.
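
The check can also be made to fail loudly instead of relying on a visual scan. A minimal sketch, with a hypothetical function name (check_no_label) and the assumption that each runs-on: value is a single label on one line:

```shell
# Return non-zero if any compiled workflow still targets the given label.
# $1: label to reject, $2: workflow directory (defaults to .github/workflows).
check_no_label() {
  label="$1"
  dir="${2:-.github/workflows}"
  set -- "$dir"/*.lock.yml
  if [ ! -e "$1" ]; then
    echo "no lock files found in $dir" >&2
    return 2
  fi
  # Match the label only as a whole value, so ubuntu-slim-x would not count.
  if grep -HnE "runs-on:[[:space:]]*${label}([[:space:]]|\$)" "$@"; then
    echo "error: the jobs above still target ${label}" >&2
    return 1
  fi
  echo "ok: no job targets ${label}"
}
```

Running check_no_label ubuntu-slim after gh aw compile should print the ok line once the override has taken effect.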

Confirmation

The run page should show a banner like:

No hosted runner is currently matching the labels: ubuntu-slim

That confirms the diagnosis. Once the access is granted (Option A) or the label is overridden (Option B), trigger the workflow manually and confirm jobs schedule and run.
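
The manual trigger and follow-up check could be scripted roughly as follows. This sketch assumes the compiled workflow declares a workflow_dispatch trigger and that gh is authenticated against the repo; trigger_and_check and POLL_DELAY are hypothetical names.

```shell
# Dispatch the workflow, wait briefly, then print "<status> <conclusion>"
# for the most recent run of that workflow.
trigger_and_check() {
  workflow="${1:-evergreen.lock.yml}"
  gh workflow run "$workflow" || return 1
  # Give GitHub a moment to register the new run before querying it.
  sleep "${POLL_DELAY:-10}"
  gh run list --workflow "$workflow" --limit 1 \
    --json status,conclusion \
    --jq '.[0] | "\(.status) \(.conclusion)"'
}
```

A status of queued or in_progress would indicate that jobs are now being scheduled; a conclusion of startup_failure would mean the problem persists.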

Notes

  • This is not related to gh-aw version. apm is on v0.74.2; tsikit-learn was on v0.74.4; both emit ubuntu-slim because that's the gh-aw template default.
  • The lock file also contains keys that actionlint flags (concurrency.queue: max, permissions.copilot-requests, permissions.vulnerability-alerts). GitHub Actions tolerates them, and they are not the cause of startup_failure. They are worth filing upstream against github/gh-aw separately for cleanup.
