[aw-failures] Daily Issues Report Generator: 9-day chronic node-missing-in-chroot failure on aw-gpu-runner-T4 #31370

@github-actions

Problem statement

Daily Issues Report Generator has failed for 9 consecutive days (every scheduled run since 2026-05-02) with the same exit-code 127 startup failure. Today's run (§25631557245, 2026-05-10 14:42 UTC) and yesterday's (§25603788749, 2026-05-09 14:41 UTC) both fail in identical fashion, before the agent emits a single token. Auto-issues like #31350 are created daily and auto-expire daily, so the chronic pattern is invisible in the open-issue queue.

Affected workflows and run IDs

Other Copilot/Codex workflows pinned to the aw-gpu-runner-T4 runner appear to be failing in the same window:

  • Daily Fact About gh-aw (§25635982688 and many more push-triggered runs): fails before agent activation with 0 jobs recorded. This is a different failure mode, but it correlates with the same runner.
  • Daily News: chronic failures going back to at least 2026-05-04.

Probable root cause

agent-stdio.log for both today's and yesterday's runs ends with:

[entrypoint][ERROR] Copilot CLI requires Node.js, but 'node' is not available inside AWF chroot.
[entrypoint][ERROR] Ensure Node.js is installed on the runner and reachable from PATH inside the chroot.
[entrypoint][ERROR] If using setup-node or nvm, verify the install path is present and bind-mounted into /host.
[entrypoint][ERROR] Example locations include /opt/hostedtoolcache/... and /home/runner/.nvm/...
[WARN] Command completed with exit code: 127
Process exiting with code: 127
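To check other runs for the same signature, the log tail above can be matched mechanically. The following is a minimal sketch, assuming the log is available as plain text. The function name is hypothetical, and the only strings it relies on are the two quoted from agent-stdio.log above. Exit code 127 is the conventional shell status for "command not found", which is consistent with a missing node binary.

```python
import re

# Marker text copied from the entrypoint error quoted above.
NODE_MISSING_MARKER = "'node' is not available inside AWF chroot"
# Final status line as it appears in the quoted log.
EXIT_CODE_RE = re.compile(r"^Process exiting with code: (\d+)\s*$", re.MULTILINE)


def is_node_missing_failure(log_text):
    """Return True if the log shows the node-missing-in-chroot signature.

    Hypothetical helper: it requires both the entrypoint error text and a
    final exit code of 127.
    """
    codes = EXIT_CODE_RE.findall(log_text)
    exit_code = int(codes[-1]) if codes else None  # last reported code wins
    return NODE_MISSING_MARKER in log_text and exit_code == 127
```

Requiring both the marker and the exit code keeps unrelated 127 failures (some other missing command) from being counted as this issue.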

The workflow correctly declares runtimes.node: version: "24" in daily-issues-report.md:21-23, so setup-node should run. However, once the AWF chroot mounts /host, the Node binary that setup-node installed is not reachable through PATH inside the chroot. This is specific to the aw-gpu-runner-T4 self-hosted runner: the same workflow shape (Copilot + runtimes.node) works on standard ubuntu-latest runners (e.g. Auto-Triage Issues, PR Triage Agent, and Daily Safe Output Integrator all passed today on standard runners).

The runner appears to install Node in a location that the AWF entrypoint's search heuristic (find /opt/hostedtoolcache /home/runner/work/_tool -maxdepth 5 -type d -name bin) cannot find inside the chroot. Possibilities:

  1. Node toolchain is installed outside /opt/hostedtoolcache and /home/runner/work/_tool on the GPU runner image
  2. Bind-mount of the toolcache into /host is missing for this runner class
  3. Path is right, but symlinks resolve differently after the chroot
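The three possibilities above can be told apart with a small diagnostic run against the chroot root on the affected runner. The following is a rough sketch, not the entrypoint's actual code: it approximates the find heuristic in Python and then classifies the node entry in each bin directory. The function names and the "dangling-in-chroot" label are hypothetical, and only one level of symlink is re-interpreted against the chroot root.

```python
import os
from pathlib import Path

# Search roots taken from the entrypoint heuristic quoted above.
SEARCH_ROOTS = ("/opt/hostedtoolcache", "/home/runner/work/_tool")


def find_bin_dirs(chroot_root, search_roots=SEARCH_ROOTS, max_depth=5):
    """Approximate `find <roots> -maxdepth 5 -type d -name bin` under chroot_root."""
    found = []
    for root in search_roots:
        base = Path(chroot_root) / root.lstrip("/")
        if not base.is_dir():
            continue  # possibility 1 or 2: the directory is absent in the chroot
        for dirpath, dirnames, _files in os.walk(base):
            depth = len(Path(dirpath).relative_to(base).parts)
            if depth >= max_depth:
                dirnames[:] = []  # children would be deeper than max_depth
                continue
            for name in dirnames:
                child = Path(dirpath) / name
                if name == "bin" and not child.is_symlink():
                    found.append(child)
    return sorted(found)


def classify_node(bin_dir, chroot_root):
    """Return 'ok', 'missing', or 'dangling-in-chroot' for bin_dir/node."""
    node = Path(bin_dir) / "node"
    if node.is_symlink():
        target = os.readlink(node)
        if os.path.isabs(target):
            # Inside the chroot, an absolute target resolves against chroot_root.
            resolved = Path(chroot_root) / target.lstrip("/")
        else:
            resolved = node.parent / target
        return "ok" if resolved.is_file() else "dangling-in-chroot"  # possibility 3
    return "ok" if node.is_file() else "missing"
```

An empty result from find_bin_dirs points at possibility 1 or 2, while a "dangling-in-chroot" classification points at possibility 3.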

audit-diff between today's and yesterday's failing runs shows zero behavioral drift (same firewall posture, identical 0-token/0-turn signature) — this is a stable failure mode, not flaky infra.

Proposed remediation

Pick one of:

  1. Pin runs-on: back to a standard hosted runner (e.g. runs-on: ubuntu-latest) in .github/workflows/daily-issues-report.md and the other gpu-runner-T4 Copilot workflows until the runner image is fixed. Daily Issues Report has no documented GPU dependency in its prompt; the GPU pin appears optional.
  2. Fix the AWF chroot Node lookup for aw-gpu-runner-T4. Inspect the runner image to determine where setup-node lands the binary, then either add that path to the entrypoint's PATH-search heuristic, or update the runner image's bind-mount config so /host exposes the install location.
  3. Pre-bake Node into the aw-gpu-runner-T4 base image so runtimes.node doesn't have to install at runtime.
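Option 1 requires knowing which workflows are pinned to the runner label. The following is a minimal sketch, assuming each workflow is a .md file under .github/workflows whose frontmatter carries a single-line runs-on: entry. The function name is hypothetical, and a runs-on value spread over several lines (list form) would not be detected.

```python
import re
from pathlib import Path


def workflows_pinned_to(label, workflow_dir=".github/workflows"):
    """List workflow .md files with a single-line `runs-on:` naming label.

    Hypothetical helper: a plain text scan, not a YAML parse.
    """
    # The lookahead stops the label from matching a longer label that
    # merely starts with it.
    pattern = re.compile(r"^\s*runs-on:.*" + re.escape(label) + r"(?![\w-])")
    hits = []
    for path in sorted(Path(workflow_dir).glob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if any(pattern.search(line) for line in text.splitlines()):
            hits.append(path.name)
    return hits
```

Calling workflows_pinned_to("aw-gpu-runner-T4") from the repository root would give a candidate list to review before changing any runs-on: value.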

Success criteria / verification

  • Two consecutive successful scheduled runs of Daily Issues Report Generator with non-zero Turns and a posted report.
  • A passing run on aw-gpu-runner-T4 of a Copilot-engine workflow that declares runtimes.node, observed via agenticworkflows logs.
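The first criterion can be expressed as a simple check over recent run records. This is a sketch under stated assumptions: the record shape (dicts with event, conclusion, and turns keys, ordered newest first) is hypothetical and would need to be mapped from whatever the logs tooling actually returns. It does not verify that a report was posted.

```python
def meets_success_criteria(runs, required=2):
    """Check that the newest `required` scheduled runs succeeded with turns > 0.

    `runs` is a newest-first list of dicts with hypothetical keys
    'event', 'conclusion', and 'turns'.
    """
    scheduled = [r for r in runs if r.get("event") == "schedule"]
    latest = scheduled[:required]
    return len(latest) == required and all(
        r.get("conclusion") == "success" and r.get("turns", 0) > 0
        for r in latest
    )
```

Filtering on scheduled runs first keeps manual or push-triggered runs from satisfying the criterion by accident.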

Generated by [aw] Failure Investigator (6h) · ● 41.1M ·

  • expires on May 17, 2026, 7:25 PM UTC


6h investigator update — 2026-05-13 13:31 UTC

Daily News is also affected by the same root cause on the same runner (aw-gpu-runner-T4). New evidence within the 6h window:

  • Run §25790672304 (2026-05-13 09:31 UTC), exit code 127. Agent step log: [entrypoint][ERROR] Copilot CLI requires Node.js, but 'node' is not available inside AWF chroot.
  • Runner: aw-gpu-runner-t4-1187438201 (Requested labels: aw-gpu-runner-T4).
  • Daily News has failed every scheduled run since at least 2026-05-04 (the last 8 daily runs all failed). This adds Daily News to the affected-workflow list alongside the Daily Issues Report Generator originally reported here.

No behavior change vs. the parent issue's root cause — adding cross-reference so the fix scope captures Daily News too.

Generated by [aw] Failure Investigator (6h) · ● 28.7M ·
