Skip to content

[agentrx-optimizer] Daily Workflow Optimization - 2026-06-01 #36159

@github-actions

Description

@github-actions

Executive Summary

AgentRx ingested the last 20 gh-aw agent session runs (2026-05-31 → 2026-06-01) and built a 20-step trajectory. All 4 failures in the window share one identical root cause: the activation-phase step "Check daily workflow token guardrail" crashes with an unhandled ERR_MODULE_NOT_FOUND: Cannot find package '@actions/artifact', aborting the run before the agent ever starts (0 turns, 0 tokens recorded). This is the single highest-impact failure pattern — 100% of failures, affecting daily scheduled workflows fleet-wide.

AgentRx Evidence

  • Critical step: daily-effective-workflow-guardrailactions/setup/js/check_daily_effective_workflow_guardrail.cjs (activation job, step index before the agent job)
  • Failure category: Unguarded precondition before an expensive capability — a dynamic await import("@actions/artifact") with no fallback throws an unhandled module-resolution error that fails the whole activation job (taxonomy: adding precondition checks before expensive tools)
  • Frequency / impact: 4 / 4 failures (100%) across the 20-run window; 20% of all runs. Each failure burns a scheduled run with zero agent execution (no agent job, no turns, no effective tokens) — pure wasted runs.
  • Root cause (confirmed from activation logs): The runtime-checked-out .cjs (repo HEAD) now requires @actions/artifact, but the deployed activation step ran with safe-output-artifact-client: false (log line 256), so setup.sh never npm installed the package → import fails. A classic stale-lock / dependency-precondition mismatch, compounded by (a) the unguarded getArtifactClient() call at check_daily_effective_workflow_guardrail.cjs:301, and (b) setup.sh:410 swallowing install failures with || true.
  • Representative run IDs: 26724985718 (Agentic Workflow Audit Agent), 26724580595 (Contribution Check), 26724289462 (Daily Caveman Optimizer), 26724076862 (Chaos PR Bundle Fuzzer)

Labeled Violations (failure-pattern-classifier)

violation evidence fix_type rationale
Activation guardrail aborts run on missing @actions/artifact ERR_MODULE_NOT_FOUND at check_daily_effective_workflow_guardrail.cjs:25; 4/4 failing runs, all <40s, 0 turns adding precondition checks before expensive tools getArtifactClient() (line 301) is awaited outside any try/catch; a best-effort cost cap should never fail the whole activation job
Silent npm-install failure path setup.sh:410 npm install ... @actions/artifact ... || true hides install/registry failures improving retry/backoff strategy Install errors are swallowed, so a firewall/registry hiccup yields the same hard crash downstream with no signal
Lock/runtime dependency drift runtime ran safe-output-artifact-client: false while HEAD .cjs requires the package adding precondition checks before expensive tools Runtime guardrail must tolerate the package being absent rather than assuming the lock matches the script
AgentRx Artifacts
  • IR / trajectory: Built /tmp/gh-aw/agent/agentrx/trajectory.json from MCP run data — 20 steps, failures ordered first. Each step carries run_id, conclusion, error_signal, duration, effective_tokens, turns, engine, event. The 4 failure steps all have turns: none and effective_tokens: none, confirming the agent never executed.
  • Invariant / checker highlights: AgentRx invariant generation (static/dynamic) and check could not run — all three LLM endpoints are unavailable in this sandbox (copilot CLI not on PATH; azure requires unset AGENT_VERIFY_ENDPOINT; trapi requires internal auth). Diagnosis was instead grounded directly in MCP session telemetry + downloaded activation logs.
  • Judge classification: Not available (LLM-gated stage skipped). Root-cause category derived deterministically from the identical activation-log signature across all 4 failures.
  • Known limitations: No LLM endpoint → ir/static/dynamic/check/judge/report stages skipped; findings rely on runs[] session fields and per-run *_activation.txt logs, which are sufficient and unambiguous for this failure.

Recommended Optimization

Make the daily-token guardrail fail open instead of crashing activation. Wrap the artifact-client acquisition and its use in main() with try/catch so that when @actions/artifact cannot be loaded, the step emits a core.warning and skips the artifact-based token inspection (returning normally) rather than throwing.

  • The single change: In actions/setup/js/check_daily_effective_workflow_guardrail.cjs, guard getArtifactClient() (defined at lines 24–26, called unguarded at line 301). On failure to import/instantiate, log core.warning("Daily effective-token guardrail: @actions/artifact unavailable; skipping artifact-based token inspection") and return from main().
  • Why highest impact: It eliminates 100% of the observed failures with a few lines, and is durable against every recurrence vector — stale lock files, npm-registry firewall blocks, and the || true-swallowed install failures. A best-effort cost cap should never be able to take down every daily scheduled workflow before the agent runs.
  • Where to implement: primary — actions/setup/js/check_daily_effective_workflow_guardrail.cjs:301 (+ helper at :24). Complementary hardening: recompile so deployed .lock.yml files consistently set safe-output-artifact-client: 'true' (compiler already intends this at pkg/workflow/compiler_activation_job_builder.go:106), and stop swallowing the install error at actions/setup/setup.sh:410.

Validation Plan

  • Next-run check: Re-run (or await the next schedule of) any of the 4 workflows. Expected: the activation job concludes success and the agent job runs (turns > 0, effective tokens recorded), instead of failing in <40s.
  • Degraded-path check: With @actions/artifact deliberately absent, confirm the activation log shows the new core.warning annotation and the run still proceeds to the agent.
  • Success metrics: scheduled-daily activation failure rate drops from 100% → 0% in this window; error_count for these 4 workflows → 0; missing_tools/unhandled-error events for the guardrail step disappear from the next AgentRx run.

References

  • §26724985718 — Agentic Workflow Audit Agent (failure, 100% rate hotspot)
  • §26724580595 — Contribution Check (identical failure)
  • §26724289462 — Daily Caveman Optimizer (identical failure)

Generated by ⚡ Daily AgentRx Trace Optimizer · opus48 2.9M ·

  • expires on Jun 8, 2026, 12:13 AM UTC

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions