Evergreen workflow startup_failure: tsikit-learn lacks access to the 'ubuntu-slim' runner group #21

@mrjf

Symptom

The Evergreen — PR Health Keeper workflow fails immediately with startup_failure — no jobs run, no log is produced.

Root cause

Most gh-aw-emitted jobs in .github/workflows/evergreen.lock.yml (and autoloop.lock.yml) target runs-on: ubuntu-slim; in evergreen.lock.yml it is four of the six:

$ grep -n "runs-on:" .github/workflows/evergreen.lock.yml
82:    runs-on: ubuntu-slim
338:   runs-on: ubuntu-latest
1013:  runs-on: ubuntu-slim
1165:  runs-on: ubuntu-latest
1368:  runs-on: ubuntu-slim
1446:  runs-on: ubuntu-slim
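
To see at a glance how many jobs use each label, the grep output can be tallied. This is a minimal sketch, not part of gh-aw: the function name count_runner_labels is hypothetical, and it assumes each runs-on line carries a single plain label (no arrays, expressions, or trailing comments).

```shell
# Tally runs-on labels across one or more workflow files.
# Hypothetical helper; assumes one plain label per "runs-on:" line.
count_runner_labels() {
  grep -h "runs-on:" "$@" \
    | sed -E 's/^[[:space:]]*runs-on:[[:space:]]*//' \
    | sort | uniq -c | sort -rn \
    | awk '{print $1, $2}'   # normalize uniq's padding to "count label"
}

# Usage:
#   count_runner_labels .github/workflows/evergreen.lock.yml
```

For the six runs-on lines listed above, this would print `4 ubuntu-slim` followed by `2 ubuntu-latest`.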

ubuntu-slim is not a stock GitHub-hosted runner label. It's a custom runner — almost certainly an org-level larger-runner or runner group managed under githubnext. When GitHub Actions can't match the label to a runner the calling repo is allowed to use, the workflow fails at dispatch with startup_failure and produces no log.

Verification: tsb has a byte-identical lock file (same v0.74.4 compiler output, same ubuntu-slim labels) and its runs succeed, which means tsb is in the runner group's allowed-repositories list and tsikit-learn is not.
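
The byte-for-byte comparison can be reproduced with cmp. This is a sketch under the assumption that both repositories are checked out locally; the function name same_lock_file and the sibling-directory paths in the usage comment are hypothetical.

```shell
# Report whether two files are byte-identical.
# Hypothetical helper; cmp -s is silent and only sets the exit status.
same_lock_file() {
  if cmp -s "$1" "$2"; then
    echo "identical"
  else
    echo "different"
  fi
}

# Usage (assumes sibling checkouts named tsb/ and tsikit-learn/):
#   same_lock_file tsb/.github/workflows/evergreen.lock.yml \
#                  tsikit-learn/.github/workflows/evergreen.lock.yml
```

If the files are identical while only one repository's runs start, the difference has to lie outside the workflow file, which is consistent with a repository-level runner access setting.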

Fix — pick one

Option A (simplest): grant tsikit-learn access to the ubuntu-slim runner group.

Requires org admin rights in githubnext. In Settings → Actions → Runner groups, locate the group that exposes ubuntu-slim and add tsikit-learn to its repository access list.

Option B: override the runner label in the workflow source so the lock file uses a stock runner.

In .github/workflows/evergreen.md (and .github/workflows/autoloop.md) frontmatter, set:

runs-on: ubuntu-latest

(or whichever stock label is appropriate). Then recompile with gh aw compile. Verify with:

grep "runs-on:" .github/workflows/*.lock.yml

All occurrences should be ubuntu-latest (or your chosen label) — no more ubuntu-slim.
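
The manual check above could also be wrapped so it fails loudly, for example in a pre-commit hook. This is a minimal sketch; the function name check_no_slim is hypothetical, and it assumes the lock files follow the *.lock.yml naming used in this repo.

```shell
# Return 0 only if no lock file in the given directory still targets ubuntu-slim.
# Hypothetical helper; defaults to .github/workflows when no directory is given.
check_no_slim() {
  dir="${1:-.github/workflows}"
  # -H prints the file name even when only one file matches the glob.
  if grep -Hn "runs-on: *ubuntu-slim" "$dir"/*.lock.yml; then
    echo "ubuntu-slim still referenced" >&2
    return 1
  fi
  echo "ok: no ubuntu-slim labels"
}

# Usage:
#   gh aw compile && check_no_slim
```

Note that if the directory contains no *.lock.yml files at all, grep reports an error on stderr and the function still prints the ok message, so it is only meaningful when run from the repository root.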

Confirmation in the run UI

The run pages (URLs above) should display a banner like:

No hosted runner is currently matching the labels: ubuntu-slim. The job requires a runner with these labels but no runner is currently online with them.

That confirms the diagnosis.

What this issue is NOT

An earlier draft of this issue (and the actionlint output) flagged queue: max, copilot-requests, and vulnerability-alerts keys in the lock file. Those are real lint warnings (worth reporting upstream to gh-aw separately), but GitHub Actions tolerates them — they are not the cause of startup_failure. tsb has the same keys in its lock file and runs fine.

Related (separate upstream concern)

The lock file emitted by gh-aw v0.74.4 contains keys actionlint flags as invalid:

  • concurrency.queue: max (line 1022) — only group/cancel-in-progress are valid keys
  • permissions.copilot-requests: write (lines 344, 1168) — not a valid permission scope
  • permissions.vulnerability-alerts: read (line 355) — not a valid permission scope

These don't break anything today (Actions silently ignores them) but should be filed upstream against github/gh-aw so the emitted output can be cleaned up.
