[safe-output-health] Safe Output Health Report — 2026-05-31: assign_to_agent number-guess failure (96.7% msg success) #36066
Closed
This discussion has been marked as outdated by Safe Output Health Monitor. A newer discussion is available at Discussion #36189.
Executive Summary
Audited the last 24h of agentic workflow activity (window ≈ 01:24Z–05:27Z, 2026-05-31). 41 runs analyzed (39 completed, 2 in progress, including this monitor). 23 safe-output jobs processed 61 messages. 2 failed, both in one run (run-26702419759) on assign_to_agent.

Clean day except for one new failure cluster in LintMonster: the agent guessed issue numbers instead of using temporary-id cross-references. Separately, the tracked review_path_unresolved_422 Line-variant fallback recovered correctly for the 4th time.

Safe-Output Statistics
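The headline figures follow directly from the counts given in this report. 59 of 61 messages succeeded. The write tallies listed under the metrics blind spot sum to 37, against 0 reported. Below is a small check of that arithmetic. The counts are hard-coded from the text, and the object keys are descriptive labels chosen for this sketch, not safe-output message-type names.

```javascript
// Counts as stated in this report for the 2026-05-31 window.
const processedMessages = 61;
const failedMessages = 2;

// Writes the report lists among the processed messages. Keys are labels chosen
// for this sketch, not safe-output message-type names.
const writesByKind = {
  pullRequests: 8,
  issues: 6,
  discussions: 5,
  comments: 15,
  pushes: 1,
  prUpdates: 1,
  labelSets: 1,
};

// Message success rate as a percentage string with one decimal place.
function successRate(total, failures) {
  return ((100 * (total - failures)) / total).toFixed(1);
}

// Fraction of actual writes the aggregator failed to count (1 = none counted).
function underReportedFraction(actualWrites, reportedWrites) {
  return actualWrites === 0 ? 0 : (actualWrites - reportedWrites) / actualWrites;
}

const totalWrites = Object.values(writesByKind).reduce((sum, n) => sum + n, 0);
const reportedWrites = 0; // total_safe_items as reported by the logs aggregator

console.log(`message success: ${successRate(processedMessages, failedMessages)}%`);
console.log(`writes: ${totalWrites} actual vs ${reportedWrites} reported`);
console.log(`under-reported: ${underReportedFraction(totalWrites, reportedWrites) * 100}%`);
```

Run under Node.js, it prints a 96.7% message success rate, 37 actual vs 0 reported writes, and a 100% under-report.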
🔴 Critical Cluster (NEW): assign_to_agent literal issue-number guess

Evidence: "##[error]✗ Message 4 (assign_to_agent) failed: ... Could not resolve to an Issue with the number of 36048" (and the same for 36049), followed by "##[error]2 safe output(s) failed."

Root cause: the run created its issues under temporary IDs (aw_Xa3lqDic / aw_JNSbMXq8 / aw_7zxP5sPj, per temporary-id-map.json). The agent then emitted assign_to_agent with literal issue_number values 36048/36049/36050 (confirmed in agent_output.json). These were predicted numbers, off by two.
- Only #36050 ([lint-monster] [Lint] Fix pkg/workflow function length violations (286 issues)) matched a real issue and succeeded.
- #36048 ([community] Update community contributions in README) and #36049 ([daily-compiler-quality] Daily Compiler Code Quality Report - 2026-05-31) do not resolve to issues → 2 hard failures.

The agent guessed numbers instead of using the #aw_<temporaryId> cross-reference form that the processor rewrites.

Handler gap: the assign_to_agent handler has two problems.
- (a) It hard-fails the job (##[error]) on an unresolvable issue instead of soft-skipping a best-effort assignment.
- (b) It does not resolve #aw_ temporary IDs on its issue_number field the way add_comment / create_pull_request do.

✅ Positive:
review_path_unresolved_422 Line-variant fallback recovered (4th time): submit_pull_request_review hit 422 "Line could not be resolved". The body-only fallback fired and the retry succeeded (Failed: 0). This is the 4th Line-variant soft recovery (after 05-22, 05-26).

No "Path could not be resolved" 422 has been exercised since the 2026-05-27 regression, so the pr_review_buffer.cjs:554 predicate fix remains unconfirmed in production.

Recurring clusters: status today
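For the tracked review_path_unresolved_422 cluster, the open question (WI-2) is whether the fallback predicate treats the Line and Path variants alike. Below is a minimal sketch of such a predicate. It assumes an Octokit-style error object with status and message fields. The function name, regex, and sample message text are illustrative and are not taken from pr_review_buffer.cjs.

```javascript
// Sketch only: the real check lives at pr_review_buffer.cjs:554 and may be shaped
// differently. Both 422 wordings for unresolvable review targets are matched.
const UNRESOLVED_REVIEW_TARGET = /\b(?:Line|Path) could not be resolved\b/;

/**
 * Decide whether a failed review submission should fall back to a body-only retry.
 * Assumes an Octokit-style error object that exposes `status` and `message`.
 */
function isUnresolvedReviewTarget422(err) {
  if (!err || err.status !== 422) {
    return false; // only 422 Unprocessable Entity responses qualify
  }
  return UNRESOLVED_REVIEW_TARGET.test(String(err.message ?? ""));
}

// Mirror-image checks for the two variants (the message text is illustrative).
for (const variant of ["Line", "Path"]) {
  const err = { status: 422, message: `Validation Failed: "${variant} could not be resolved"` };
  console.log(`${variant}: ${isUnresolvedReviewTarget422(err)}`); // prints true for both
}
```

Matching on the shared suffix keeps one predicate for both variants. A Path-variant unit test then differs from the Line test by a single word.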
Low-severity observation — metrics blind spot
The logs aggregator reported total_safe_items: 0 / "0 write runs, 41 read-only". Yet 61 messages were processed, including 37 real writes (8 PRs, 6 issues, 5 discussions, 15 comments, 1 push, 1 PR update, 1 label set).

Cause: today's workflows emit through the bash_safeoutputs CLI wrapper, which the actuation counter does not attribute as a write. Write volume is therefore under-reported by ~100% (0 of 37 writes counted). Not a failure, but the counter is worth aligning.

(Note: all ::error:: lines elsewhere were agent- or detection-job concerns and are out of scope; their safe-output jobs reported Failed: 0.)

Recommendations & Work Items
WI-1 (High): assign_to_agent temporary-id + soft-skip. Eliminate the "Could not resolve to an Issue" hard failures.
- Agent side: use #aw_<temporaryId> in assign_to_agent.issue_number, never guessed numbers.
- Handler side (safe_output_handler_manager.cjs):
  (a) resolve #aw_ refs on issue_number;
  (b) on an unresolvable target, emit a ##[warning] soft-skip instead of a job-failing ##[error];
  (c) list the run's created-issue numbers in the resulting message.

WI-2 (Medium): Validate the review_path_unresolved_422 Path-variant fix.
- Confirm that the pr_review_buffer.cjs:554 predicate matches both "Line could not be resolved" and "Path could not be resolved".
- Add a Path-variant unit test mirroring the Line test.
- The Path 422 has not been exercised organically in 4 audits, so consider a targeted smoke test.
- Effort: Small.

WI-3 (Low): Align the observability actuation counter with bash_safeoutputs CLI-wrapper writes (metrics accuracy).

Historical Context
Trend: after a clean 05-30, one new per-message cluster appeared. Reliability stays high (96.7%).

The cross-reference/target-resolution family remains the dominant theme. Today's assign_to_agent number guess is a sibling of the target_star_* clusters.

The review path is healthy (Line fallback validated 4×; 3/4 reviewers clean). Issue Monster also used assign_to_agent today with 3/3 success, confirming that the handler works when given valid numbers.
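WI-1's handler changes (a) to (c) can be sketched as a pure decision function that resolves the target before any API call. This is a hypothetical illustration, not the code in safe_output_handler_manager.cjs. The function names, the message shape, and the knownIssueNumbers lookup are placeholders. It also assumes that temporary-id-map.json maps each aw_ ID to a plain issue number.

```javascript
// Sketch of WI-1 (a)-(c). Function and field names are hypothetical, and the
// temporary-id map is assumed to store plain issue numbers keyed by "aw_..." IDs.
const TEMP_ID_REF = /^#?(aw_[A-Za-z0-9]+)$/;

// (a) Resolve "#aw_<temporaryId>" (or a literal number) to an issue number, or null.
function resolveIssueNumber(raw, tempIdMap) {
  const text = String(raw).trim();
  const ref = TEMP_ID_REF.exec(text);
  const candidate = ref ? Number(tempIdMap[ref[1]]) : Number(text.replace(/^#/, ""));
  return Number.isInteger(candidate) && candidate > 0 ? candidate : null;
}

// Decide what to do with one assign_to_agent message without touching the API.
// knownIssueNumbers stands in for a real "is this number an issue?" lookup.
function planAssignToAgent(message, tempIdMap, knownIssueNumbers) {
  const number = resolveIssueNumber(message.issue_number, tempIdMap);
  if (number !== null && knownIssueNumbers.has(number)) {
    return { action: "assign", number };
  }
  // (c) Name the issues this run actually created so the mismatch is obvious.
  const created = Object.values(tempIdMap).join(", ") || "none";
  // (b) Soft-skip: a ::warning:: workflow command (shown as ##[warning] in the log)
  // rather than an error that fails the safe-output job.
  return {
    action: "skip",
    command: `::warning::assign_to_agent skipped: ${message.issue_number} does not resolve to an issue (issues created by this run: ${created})`,
  };
}

// Usage with made-up numbers.
const tempIdMap = { aw_example1: 101 };
const known = new Set([101]);
console.log(planAssignToAgent({ issue_number: "#aw_example1" }, tempIdMap, known));
console.log(planAssignToAgent({ issue_number: 99 }, tempIdMap, known).command);
```

Keeping the decision separate from the GitHub call makes the soft-skip path unit-testable. The handler would then assign only on an "assign" result and print the warning otherwise.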