Executive Summary
AgentRx ingested the last 20 gh-aw agent session runs (2026-05-31 → 2026-06-01) and built a 20-step trajectory. All 4 failures in the window share one identical root cause: the activation-phase step "Check daily workflow token guardrail" crashes with an unhandled ERR_MODULE_NOT_FOUND: Cannot find package '@actions/artifact', aborting the run before the agent ever starts (0 turns, 0 tokens recorded). This is the single highest-impact failure pattern — 100% of failures, affecting daily scheduled workflows fleet-wide.
AgentRx Evidence
- Critical step:
daily-effective-workflow-guardrail → actions/setup/js/check_daily_effective_workflow_guardrail.cjs (activation job, step index before the agent job)
- Failure category: Unguarded precondition before an expensive capability — a dynamic
await import("@actions/artifact") with no fallback throws an unhandled module-resolution error that fails the whole activation job (taxonomy: adding precondition checks before expensive tools)
- Frequency / impact: 4 / 4 failures (100%) across the 20-run window; 20% of all runs. Each failure burns a scheduled run with zero agent execution (no
agent job, no turns, no effective tokens) — pure wasted runs.
- Root cause (confirmed from activation logs): The runtime-checked-out
.cjs (repo HEAD) now requires @actions/artifact, but the deployed activation step ran with safe-output-artifact-client: false (log line 256), so setup.sh never npm installed the package → import fails. A classic stale-lock / dependency-precondition mismatch, compounded by (a) the unguarded getArtifactClient() call at check_daily_effective_workflow_guardrail.cjs:301, and (b) setup.sh:410 swallowing install failures with || true.
- Representative run IDs:
26724985718 (Agentic Workflow Audit Agent), 26724580595 (Contribution Check), 26724289462 (Daily Caveman Optimizer), 26724076862 (Chaos PR Bundle Fuzzer)
Labeled Violations (failure-pattern-classifier)
| violation |
evidence |
fix_type |
rationale |
Activation guardrail aborts run on missing @actions/artifact |
ERR_MODULE_NOT_FOUND at check_daily_effective_workflow_guardrail.cjs:25; 4/4 failing runs, all <40s, 0 turns |
adding precondition checks before expensive tools |
getArtifactClient() (line 301) is awaited outside any try/catch; a best-effort cost cap should never fail the whole activation job |
| Silent npm-install failure path |
setup.sh:410 npm install ... @actions/artifact ... || true hides install/registry failures |
improving retry/backoff strategy |
Install errors are swallowed, so a firewall/registry hiccup yields the same hard crash downstream with no signal |
| Lock/runtime dependency drift |
runtime ran safe-output-artifact-client: false while HEAD .cjs requires the package |
adding precondition checks before expensive tools |
Runtime guardrail must tolerate the package being absent rather than assuming the lock matches the script |
AgentRx Artifacts
- IR / trajectory: Built
/tmp/gh-aw/agent/agentrx/trajectory.json from MCP run data — 20 steps, failures ordered first. Each step carries run_id, conclusion, error_signal, duration, effective_tokens, turns, engine, event. The 4 failure steps all have turns: none and effective_tokens: none, confirming the agent never executed.
- Invariant / checker highlights: AgentRx invariant generation (
static/dynamic) and check could not run — all three LLM endpoints are unavailable in this sandbox (copilot CLI not on PATH; azure requires unset AGENT_VERIFY_ENDPOINT; trapi requires internal auth). Diagnosis was instead grounded directly in MCP session telemetry + downloaded activation logs.
- Judge classification: Not available (LLM-gated stage skipped). Root-cause category derived deterministically from the identical activation-log signature across all 4 failures.
- Known limitations: No LLM endpoint →
ir/static/dynamic/check/judge/report stages skipped; findings rely on runs[] session fields and per-run *_activation.txt logs, which are sufficient and unambiguous for this failure.
Recommended Optimization
Make the daily-token guardrail fail open instead of crashing activation. Wrap the artifact-client acquisition and its use in main() with try/catch so that when @actions/artifact cannot be loaded, the step emits a core.warning and skips the artifact-based token inspection (returning normally) rather than throwing.
- The single change: In
actions/setup/js/check_daily_effective_workflow_guardrail.cjs, guard getArtifactClient() (defined at lines 24–26, called unguarded at line 301). On failure to import/instantiate, log core.warning("Daily effective-token guardrail: @actions/artifact unavailable; skipping artifact-based token inspection") and return from main().
- Why highest impact: It eliminates 100% of the observed failures with a few lines, and is durable against every recurrence vector — stale lock files, npm-registry firewall blocks, and the
|| true-swallowed install failures. A best-effort cost cap should never be able to take down every daily scheduled workflow before the agent runs.
- Where to implement: primary —
actions/setup/js/check_daily_effective_workflow_guardrail.cjs:301 (+ helper at :24). Complementary hardening: recompile so deployed .lock.yml files consistently set safe-output-artifact-client: 'true' (compiler already intends this at pkg/workflow/compiler_activation_job_builder.go:106), and stop swallowing the install error at actions/setup/setup.sh:410.
Validation Plan
- Next-run check: Re-run (or await the next schedule of) any of the 4 workflows. Expected: the activation job concludes success and the
agent job runs (turns > 0, effective tokens recorded), instead of failing in <40s.
- Degraded-path check: With
@actions/artifact deliberately absent, confirm the activation log shows the new core.warning annotation and the run still proceeds to the agent.
- Success metrics: scheduled-daily activation failure rate drops from 100% → 0% in this window;
error_count for these 4 workflows → 0; missing_tools/unhandled-error events for the guardrail step disappear from the next AgentRx run.
References
- §26724985718 — Agentic Workflow Audit Agent (failure, 100% rate hotspot)
- §26724580595 — Contribution Check (identical failure)
- §26724289462 — Daily Caveman Optimizer (identical failure)
Generated by ⚡ Daily AgentRx Trace Optimizer · opus48 2.9M · ◷
Executive Summary
AgentRx ingested the last 20 gh-aw agent session runs (2026-05-31 → 2026-06-01) and built a 20-step trajectory. All 4 failures in the window share one identical root cause: the activation-phase step "Check daily workflow token guardrail" crashes with an unhandled
ERR_MODULE_NOT_FOUND: Cannot find package '@actions/artifact', aborting the run before the agent ever starts (0 turns, 0 tokens recorded). This is the single highest-impact failure pattern — 100% of failures, affecting daily scheduled workflows fleet-wide.AgentRx Evidence
daily-effective-workflow-guardrail→actions/setup/js/check_daily_effective_workflow_guardrail.cjs(activation job, step index before theagentjob)await import("@actions/artifact")with no fallback throws an unhandled module-resolution error that fails the whole activation job (taxonomy: adding precondition checks before expensive tools)agentjob, no turns, no effective tokens) — pure wasted runs..cjs(repo HEAD) now requires@actions/artifact, but the deployed activation step ran withsafe-output-artifact-client: false(log line 256), sosetup.shnevernpm installed the package → import fails. A classic stale-lock / dependency-precondition mismatch, compounded by (a) the unguardedgetArtifactClient()call atcheck_daily_effective_workflow_guardrail.cjs:301, and (b)setup.sh:410swallowing install failures with|| true.26724985718(Agentic Workflow Audit Agent),26724580595(Contribution Check),26724289462(Daily Caveman Optimizer),26724076862(Chaos PR Bundle Fuzzer)Labeled Violations (failure-pattern-classifier)
@actions/artifactERR_MODULE_NOT_FOUNDatcheck_daily_effective_workflow_guardrail.cjs:25; 4/4 failing runs, all <40s, 0 turnsgetArtifactClient()(line 301) is awaited outside any try/catch; a best-effort cost cap should never fail the whole activation jobsetup.sh:410npm install ...@actions/artifact... || truehides install/registry failuressafe-output-artifact-client: falsewhile HEAD.cjsrequires the packageAgentRx Artifacts
/tmp/gh-aw/agent/agentrx/trajectory.jsonfrom MCP run data — 20 steps, failures ordered first. Each step carriesrun_id,conclusion,error_signal,duration,effective_tokens,turns,engine,event. The 4 failure steps all haveturns: noneandeffective_tokens: none, confirming the agent never executed.static/dynamic) andcheckcould not run — all three LLM endpoints are unavailable in this sandbox (copilotCLI not on PATH;azurerequires unsetAGENT_VERIFY_ENDPOINT;trapirequires internal auth). Diagnosis was instead grounded directly in MCP session telemetry + downloaded activation logs.ir/static/dynamic/check/judge/reportstages skipped; findings rely onruns[]session fields and per-run*_activation.txtlogs, which are sufficient and unambiguous for this failure.Recommended Optimization
Make the daily-token guardrail fail open instead of crashing activation. Wrap the artifact-client acquisition and its use in
main()with try/catch so that when@actions/artifactcannot be loaded, the step emits acore.warningand skips the artifact-based token inspection (returning normally) rather than throwing.actions/setup/js/check_daily_effective_workflow_guardrail.cjs, guardgetArtifactClient()(defined at lines 24–26, called unguarded at line 301). On failure to import/instantiate, logcore.warning("Daily effective-token guardrail:@actions/artifactunavailable; skipping artifact-based token inspection")andreturnfrommain().|| true-swallowed install failures. A best-effort cost cap should never be able to take down every daily scheduled workflow before the agent runs.actions/setup/js/check_daily_effective_workflow_guardrail.cjs:301(+ helper at:24). Complementary hardening: recompile so deployed.lock.ymlfiles consistently setsafe-output-artifact-client: 'true'(compiler already intends this atpkg/workflow/compiler_activation_job_builder.go:106), and stop swallowing the install error atactions/setup/setup.sh:410.Validation Plan
agentjob runs (turns > 0, effective tokens recorded), instead of failing in <40s.@actions/artifactdeliberately absent, confirm the activation log shows the newcore.warningannotation and the run still proceeds to the agent.error_countfor these 4 workflows →0;missing_tools/unhandled-error events for the guardrail step disappear from the next AgentRx run.References