[safe-output-health] Safe Output Health Report - 2026-05-22 (95.8% success, 1 new handler-inconsistency cluster) #33948
Closed
This discussion has been marked as outdated by Safe Output Health Monitor. A newer discussion is available at Discussion #34174.
Overview
Reviewed all 65 agentic workflow runs in github/gh-aw from the last 24 hours (2026-05-21 ~05:30 UTC → 2026-05-22 ~05:30 UTC). Of those, 24 runs reached the Process Safe Outputs step and collectively handled approximately 105 safe-output messages. 103 messages succeeded and 2 failed (both in a single run). One further run hit the same 422 review-path pattern as yesterday but recovered cleanly via the new body-only fallback. Overall run-level safe-output success rate: 95.8% (23 of 24 runs with zero failed messages).

The headline of yesterday's audit, review_path_unresolved_422, appears to be remediated: the handler now catches the 422 and retries as a body-only review. Today surfaces a new, narrower handler inconsistency that explains the single failed run.

Summary
- Run with failed messages: Smoke Claude run (§26269897290)
- Run that recovered from the 422: Smoke Copilot run (§26269860137)
- Error clusters: 1 new (target_star_review_comment_no_pr_number_fallback), 1 recurring-but-now-recovering (review_path_unresolved_422)

Safe-Output Job Statistics
Critical Issues
Cluster 1 (NEW): target_star_review_comment_no_pr_number_fallback
- Handler: create_pull_request_review_comment
- Log source: Process Safe Outputs.txt from §26269897290, step 9_Process Safe Outputs, lines 409-410
- Root cause: inconsistent target: "*" auto-resolution across handlers. In the same run, update_pull_request (msg 4) and submit_pull_request_review (msg 7) both auto-resolved target: "*" to the triggering PR #33886, "Add Pi inference request diagnostics to provider logging" (logged as "Resolved target pull request #33886 (target config: *)" and "Set review context from triggering PR: github/gh-aw#33886"). But create_pull_request_review_comment (msgs 5 and 6) rejected the items with no fallback attempt, even though that fallback does exist in this handler on another code path: it worked in §26269860246 and §26267820166, where the same handler logged "Fetched full pull request details for PR #xxxxx".
- Affects: target: "*" review comments without an explicit pull_request_number.

Recovered Patterns (no action required; keep monitoring)
Cluster 2 (REMEDIATED from 2026-05-21): review_path_unresolved_422
- Handler: submit_pull_request_review
- Log source: Process Safe Outputs.txt from §26269860137, step 9_Process Safe Outputs, lines 553-556
- Result: Failed: 0. Yesterday this same pattern hard-failed the run; today the handler catches the 422 and falls back to a body-only review. Yesterday's primary recommendation is implemented.

Root Cause Analysis
Handler-level inconsistency (target resolution)
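Three handlers all support target: "*", but their treatment of the case "item lacks pull_request_number" diverges:

| Handler | target: "*" + no pull_request_number on item |
| --- | --- |
| update_pull_request | Auto-resolves to the triggering PR (observed in §26269897290, msg 4) |
| submit_pull_request_review | Auto-resolves to the triggering PR (observed in §26269897290, msg 7) |
| create_pull_request_review_comment | Rejects when the item itself sets target: "*"; resolves to triggering PR when the item omits target and inherits the handler-level default. This branch-conditional behavior is the bug. |

API/permission warnings (informational; no escalation)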
- "Skipping resolve_pull_request_review_thread for PRRT_*: Resource not accessible by integration": recurring across Smoke Claude runs. Config issue: GITHUB_TOKEN does not have pull_requests: write GraphQL scope for resolving review threads. Surfaces as an informative skip, not a failure.
- "##[warning][renderMarkdownTemplate] Fence count mismatch: input had N fence marker(s), output has M": cosmetic; occurs whenever conditional blocks containing fences are stripped. No functional impact.

Recommendations
Immediate Actions
None — no production safe-output failures in the last 24 hours. The single failed run is a smoke test exercising a real handler gap.
Bug Fix Required
target: "*" resolution across safe-output handlers
- Location: actions/safe-outputs/safe_output_handler_manager.cjs (and individual handler modules)
- Problem: create_pull_request_review_comment has two code paths for target: "*"; only one falls back to the triggering PR.
- Proposed fix: a shared resolveTriggeringPullRequest({ item, handlerConfig, env }) utility used by create_pull_request_review_comment, submit_pull_request_review, update_pull_request, add_reviewer, push_to_pull_request_branch, etc.
- Handler needing the change: create_pull_request_review_comment (others already do the right thing)

Process Improvement
- Current rejection message: "Target is "*" but no pull_request_number specified". It carries no item index and no path, and it does not distinguish "agent emitted bad data" from "handler couldn't auto-resolve".
- Include message_index, path, and a hint like "Set explicit pull_request_number or remove the target field to use handler default" in the error so agents can self-correct on retry and humans can read the log faster.

Work Item Plans
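The shared resolver proposed under Bug Fix Required can be sketched roughly as follows. This is a minimal illustration, not the code in gh-aw: only the function name and its argument shape come from the report, while the field names (item.target, handlerConfig.target), the TRIGGERING_PR_NUMBER variable, and the return shape are assumptions made for the example. The target: "owner/repo#N" case is deliberately left out because the report does not state its expected outcome.

```javascript
"use strict";

// Hypothetical sketch of a single resolution path shared by all handlers.
// Field names, the env variable name, and the return shape are assumptions.
function resolveTriggeringPullRequest({ item, handlerConfig, env }) {
  // An explicit pull_request_number on the item always wins.
  if (Number.isInteger(item.pull_request_number) && item.pull_request_number > 0) {
    return { ok: true, pullRequestNumber: item.pull_request_number, source: "item" };
  }

  // The item-level target and the inherited handler-level default are treated
  // the same way, so an explicit "*" can no longer take a different branch.
  const target = item.target ?? handlerConfig.target ?? "triggering";

  // Fall back to the triggering pull request when the run has one.
  const triggering = Number.parseInt(env.TRIGGERING_PR_NUMBER ?? "", 10);
  if (Number.isInteger(triggering) && triggering > 0) {
    return { ok: true, pullRequestNumber: triggering, source: "triggering" };
  }

  // No number on the item and no triggering PR: reject with a fuller message.
  return {
    ok: false,
    error:
      `Target is "${target}" and no pull_request_number was given, ` +
      "and no triggering pull request is available in this run",
  };
}

// Usage: explicit "*" and inherited "*" now resolve identically.
const env = { TRIGGERING_PR_NUMBER: "33886" };
const handlerConfig = { target: "*" };
const explicitStar = resolveTriggeringPullRequest({ item: { target: "*" }, handlerConfig, env });
const inherited = resolveTriggeringPullRequest({ item: {}, handlerConfig, env });
console.log(explicitStar.pullRequestNumber === inherited.pullRequestNumber); // true
```

Returning a result object instead of throwing keeps the decision about failing, skipping, or retrying in each handler, which is one possible design; the real handlers may prefer exceptions.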
Work Item 1: Fix target: "*" fallback inconsistency in create_pull_request_review_comment
- Description: create_pull_request_review_comment rejects items with target: "*" and no pull_request_number even when the triggering PR context is available (and sibling handlers in the same run use it). Refactor to share one resolveTriggeringPullRequest utility across handlers.
- Acceptance criteria:
  - A test covers the failing case (target: "*", no pull_request_number) and asserts the handler resolves to the triggering PR
  - Failed: 0 on a representative sample
- Technical approach: in safe_output_handler_manager.cjs, identify per-handler resolution paths and extract a shared utility. Add unit tests covering: explicit pull_request_number, target: "*" + no number (should auto-resolve), target: "owner/repo#N" + no number, missing triggering-PR env (should reject with clearer error).

Work Item 2: Add per-item index to safe-output rejection errors
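As a rough picture of what this work item asks for, the sketch below builds a rejection line that carries the message index, the item-level fields, and a hint. The helper name formatRejection, its parameters, and the sample path and line values are hypothetical; only the "✗ Message N (handler) failed:" prefix, the field list, and the hint text are taken from this report.

```javascript
"use strict";

// Hypothetical helper: the name, parameters, and output layout are assumptions.
function formatRejection({ messageIndex, handler, item, reason, hint }) {
  // Surface only the item-level fields that are actually present.
  const fields = ["path", "line", "pull_request_number"]
    .filter((key) => item[key] !== undefined && item[key] !== null)
    .map((key) => `${key}=${JSON.stringify(item[key])}`);

  const context = fields.length > 0 ? ` [${fields.join(", ")}]` : "";
  const suffix = hint ? ` Hint: ${hint}` : "";
  return `✗ Message ${messageIndex} (${handler}) failed${context}: ${reason}.${suffix}`;
}

// Usage with illustrative values (the path and line are made up).
const rejectionLine = formatRejection({
  messageIndex: 5,
  handler: "create_pull_request_review_comment",
  item: { path: "src/example.js", line: 12 },
  reason: 'Target is "*" but no pull_request_number specified',
  hint: "Set explicit pull_request_number or remove the target field to use handler default",
});
console.log(rejectionLine);
```

Keeping the whole context on one line means the failing item can be identified from the error line alone, without scanning earlier log output.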
- Description: add the Message N index and any item-level fields (path, line, pull_request_number) to the error.
- Acceptance criteria: "##[error]✗ Message N (handler) failed:" lines include enough context for the user (or agent) to identify the bad item without reading earlier lines.

Historical Context
- 2026-05-21: review_path_unresolved_422 (hard fail)
- 2026-05-22: target_star_review_comment_no_pr_number_fallback (NEW)

Trend: Yesterday's headline pattern is now recovering automatically (one occurrence today, fully self-healed via body-only retry). Today's new pattern is a related but narrower handler-inconsistency cluster that is easy to fix at the source. The absolute count of failed safe-output runs holds at 1/day across both audits; the affected workflow in both cases is a smoke/test workflow exercising edge cases, not a production data-flow workflow.
Metrics and KPIs
- create_issue, add_comment, add_labels, update_pull_request, add_reviewer, push_to_pull_request_branch, create_code_scanning_alert, comment_memory: 100% across all observed runs
- create_pull_request_review_comment: 80% (2 failures of ~10 invocations, all in 1 smoke run)
- review_path_unresolved_422: soft recovery confirmed working

Next Steps
- Work Item 1 (target: "*" resolution)
- Monitor target_star_review_comment_no_pr_number_fallback over the next 1-2 daily audits to confirm whether the smoke test deterministically reproduces it (rather than being intermittent agent-output noise)
- Watch for review_path_unresolved_422 recurrences to ensure the body-only fallback continues to land cleanly

References:
- §26269897290: create_pull_request_review_comment failures