[agentrx-optimizer] Daily Workflow Optimization - 2026-06-01

### Executive Summary

AgentRx ingested the last 20 gh-aw agent session runs (2026-05-31 → 2026-06-01) and built a 20-step trajectory. **All 4 failures in the window share one identical root cause**: the activation-phase step **"Check daily workflow token guardrail"** crashes with an unhandled `ERR_MODULE_NOT_FOUND: Cannot find package '`@actions/artifact`'`, aborting the run **before the agent ever starts** (0 turns, 0 tokens recorded). This is the single highest-impact failure pattern — 100% of failures, affecting daily *scheduled* workflows fleet-wide.

### AgentRx Evidence

- **Critical step:** `daily-effective-workflow-guardrail` → `actions/setup/js/check_daily_effective_workflow_guardrail.cjs` (activation job, step index before the `agent` job)
- **Failure category:** Unguarded precondition before an expensive capability — a dynamic `await import("`@actions/artifact`")` with no fallback throws an *unhandled* module-resolution error that fails the whole activation job (taxonomy: *adding precondition checks before expensive tools*)
- **Frequency / impact:** **4 / 4 failures (100%)** across the 20-run window; **20% of all runs**. Each failure burns a scheduled run with **zero agent execution** (no `agent` job, no turns, no effective tokens) — pure wasted runs.
- **Root cause (confirmed from activation logs):** The runtime-checked-out `.cjs` (repo HEAD) now requires `@actions/artifact`, but the **deployed** activation step ran with `safe-output-artifact-client: false` (log line 256), so `setup.sh` never `npm install`ed the package → import fails. A classic stale-lock / dependency-precondition mismatch, compounded by (a) the **unguarded** `getArtifactClient()` call at `check_daily_effective_workflow_guardrail.cjs:301`, and (b) `setup.sh:410` swallowing install failures with `|| true`.
- **Representative run IDs:** `26724985718` (Agentic Workflow Audit Agent), `26724580595` (Contribution Check), `26724289462` (Daily Caveman Optimizer), `26724076862` (Chaos PR Bundle Fuzzer)

#### Labeled Violations (failure-pattern-classifier)

| violation | evidence | fix_type | rationale |
|---|---|---|---|
| Activation guardrail aborts run on missing `@actions/artifact` | `ERR_MODULE_NOT_FOUND` at `check_daily_effective_workflow_guardrail.cjs:25`; 4/4 failing runs, all <40s, 0 turns | adding precondition checks before expensive tools | `getArtifactClient()` (line 301) is awaited outside any try/catch; a best-effort cost cap should never fail the whole activation job |
| Silent npm-install failure path | `setup.sh:410` `npm install ... `@actions/artifact` ... \|\| true` hides install/registry failures | improving retry/backoff strategy | Install errors are swallowed, so a firewall/registry hiccup yields the same hard crash downstream with no signal |
| Lock/runtime dependency drift | runtime ran `safe-output-artifact-client: false` while HEAD `.cjs` requires the package | adding precondition checks before expensive tools | Runtime guardrail must tolerate the package being absent rather than assuming the lock matches the script |

<details>
<summary>AgentRx Artifacts</summary>

- **IR / trajectory:** Built `/tmp/gh-aw/agent/agentrx/trajectory.json` from MCP run data — 20 steps, failures ordered first. Each step carries `run_id`, `conclusion`, `error_signal`, `duration`, `effective_tokens`, `turns`, `engine`, `event`. The 4 failure steps all have `turns: none` and `effective_tokens: none`, confirming the agent never executed.
- **Invariant / checker highlights:** AgentRx invariant generation (`static`/`dynamic`) and `check` could **not** run — all three LLM endpoints are unavailable in this sandbox (`copilot` CLI not on PATH; `azure` requires unset `AGENT_VERIFY_ENDPOINT`; `trapi` requires internal auth). Diagnosis was instead grounded directly in MCP session telemetry + downloaded activation logs.
- **Judge classification:** Not available (LLM-gated stage skipped). Root-cause category derived deterministically from the identical activation-log signature across all 4 failures.
- **Known limitations:** No LLM endpoint → `ir/static/dynamic/check/judge/report` stages skipped; findings rely on `runs[]` session fields and per-run `*_activation.txt` logs, which are sufficient and unambiguous for this failure.

</details>

### Recommended Optimization

**Make the daily-token guardrail fail *open* instead of crashing activation.** Wrap the artifact-client acquisition and its use in `main()` with try/catch so that when `@actions/artifact` cannot be loaded, the step emits a `core.warning` and **skips** the artifact-based token inspection (returning normally) rather than throwing.

- **The single change:** In `actions/setup/js/check_daily_effective_workflow_guardrail.cjs`, guard `getArtifactClient()` (defined at lines 24–26, called unguarded at line 301). On failure to import/instantiate, log `core.warning("Daily effective-token guardrail: `@actions/artifact` unavailable; skipping artifact-based token inspection")` and `return` from `main()`.
- **Why highest impact:** It eliminates 100% of the observed failures with a few lines, and is durable against *every* recurrence vector — stale lock files, npm-registry firewall blocks, and the `|| true`-swallowed install failures. A best-effort cost cap should never be able to take down every daily scheduled workflow before the agent runs.
- **Where to implement:** primary — `actions/setup/js/check_daily_effective_workflow_guardrail.cjs:301` (+ helper at `:24`). Complementary hardening: recompile so deployed `.lock.yml` files consistently set `safe-output-artifact-client: 'true'` (compiler already intends this at `pkg/workflow/compiler_activation_job_builder.go:106`), and stop swallowing the install error at `actions/setup/setup.sh:410`.

### Validation Plan

- **Next-run check:** Re-run (or await the next schedule of) any of the 4 workflows. Expected: the activation job concludes **success** and the **`agent` job runs** (turns > 0, effective tokens recorded), instead of failing in <40s.
- **Degraded-path check:** With `@actions/artifact` deliberately absent, confirm the activation log shows the new `core.warning` annotation and the run still proceeds to the agent.
- **Success metrics:** scheduled-daily activation failure rate drops from **100% → 0%** in this window; `error_count` for these 4 workflows → `0`; `missing_tools`/unhandled-error events for the guardrail step disappear from the next AgentRx run.

### References

- [§26724985718](https://github.com/github/gh-aw/actions/runs/26724985718) — Agentic Workflow Audit Agent (failure, 100% rate hotspot)
- [§26724580595](https://github.com/github/gh-aw/actions/runs/26724580595) — Contribution Check (identical failure)
- [§26724289462](https://github.com/github/gh-aw/actions/runs/26724289462) — Daily Caveman Optimizer (identical failure)







> Generated by [⚡ Daily AgentRx Trace Optimizer](https://github.com/github/gh-aw/actions/runs/26728247057) · opus48 2.9M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-agentrx-trace-optimizer%22&type=issues)
> - [x] expires  on Jun 8, 2026, 12:13 AM UTC

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[agentrx-optimizer] Daily Workflow Optimization - 2026-06-01 #36159

Executive Summary

AgentRx Evidence

Labeled Violations (failure-pattern-classifier)

Recommended Optimization

Validation Plan

References

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

violation	evidence	fix_type	rationale
Activation guardrail aborts run on missing `@actions/artifact`	`ERR_MODULE_NOT_FOUND` at `check_daily_effective_workflow_guardrail.cjs:25`; 4/4 failing runs, all <40s, 0 turns	adding precondition checks before expensive tools	`getArtifactClient()` (line 301) is awaited outside any try/catch; a best-effort cost cap should never fail the whole activation job
Silent npm-install failure path	`setup.sh:410` `npm install ...` @actions/artifact `... \|\| true` hides install/registry failures	improving retry/backoff strategy	Install errors are swallowed, so a firewall/registry hiccup yields the same hard crash downstream with no signal
Lock/runtime dependency drift	runtime ran `safe-output-artifact-client: false` while HEAD `.cjs` requires the package	adding precondition checks before expensive tools	Runtime guardrail must tolerate the package being absent rather than assuming the lock matches the script

[agentrx-optimizer] Daily Workflow Optimization - 2026-06-01 #36159

Description

Executive Summary

AgentRx Evidence

Labeled Violations (failure-pattern-classifier)

Recommended Optimization

Validation Plan

References

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions