[safe-output-health] 🏥 Safe Output Health Report - 2026-05-30 #35865
Closed
This discussion has been marked as outdated by Safe Output Health Monitor. A newer discussion is available at Discussion #36066.
Executive Summary
Clean day for safe-output jobs. Across 17 agentic runs (excluding this monitor), zero safe-output job hard failures were observed. All 11 runs that produced real GitHub side-effects succeeded, and 4 runs correctly emitted noop (no action needed). This is a return to health after the 2026-05-27 regression day (94.6%), though 2026-05-28 and 2026-05-29 were not audited (gap).
The one caveat worth tracking: none of the four recurring clusters was exercised today, so the high-value review_path_unresolved_422 Path-variant fix (pr_review_buffer.cjs:554) remains unvalidated in production. Both PR reviewers used body-only reviews and emitted no line/path-anchored review comments.
Safe Output Job Statistics
Successful artifacts: issues #35861, #35859, #35858 · discussions #35862, #35860 · PR #35855 (created) · PR updates #35855, #35853, #35857 · reviews #4394560553, #4394557075.
Error Clusters
None. No safe-output job produced a hard failure today. No API errors, parsing errors, validation errors, or permission errors were detected in any safe-output job.
Recurring Cluster Watch
None of the four tracked clusters was exercised today: none reproduced, but none was validated either.
review_path_unresolved_422
target_star_add_comment_no_item_number_fallback: … pr_number, so resolution was fine
target_star_review_comment_no_pr_number_fallback
cancellation_counter_mislabeled_code_push_failed
New Observations (low severity)
Observation 1 — PR Sous Chef misleading "blocked" noop alongside successful writes
Run: §26675557007 · PR Sous Chef · copilot/gpt-5-mini
The agent emitted a noop reading "No GitHub write action completed: safeoutputs and GitHub CLI write attempts were blocked in this environment", and then successfully buffered 1 update_pull_request + 4 add_comment (deduped to 2 unique) to PR #35836 via the safeoutputs MCP, each returning {result:success}.
The noop appears to conflate the blocked direct gh/git attempts (firewall.blocked_requests=26 to an unknown host) with the safeoutputs MCP being unavailable.
The safe_outputs job was still status=in_progress when logs were captured (job started 05:29:04Z; agent finished 05:27:56Z). The absent safe-output-items.jsonl is a snapshot-timing artifact, not a job failure.
Impact: log-clarity / agent-behavior only. A misleading "blocked" noop in the buffer would alarm a future investigator.
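The precedence this report expects for a mixed buffer (process the writes, ignore the noop) could look roughly like the sketch below. It is only an illustration under stated assumptions: the message shape ({ type, target, body }) and the function names selectActionableMessages and dedupeMessages are hypothetical and are not taken from the safeoutputs handler.

```javascript
// Hypothetical message shape (an assumption, not the real safeoutputs schema):
//   { type: "noop" | "add_comment" | "update_pull_request", target, body }

// If the buffer holds any real write, drop the noops; otherwise keep it as-is.
function selectActionableMessages(messages) {
  const writes = messages.filter((m) => m.type !== "noop");
  return writes.length > 0 ? writes : messages;
}

// Collapse messages that have the same type, target, and body.
function dedupeMessages(messages) {
  const seen = new Set();
  return messages.filter((m) => {
    const key = JSON.stringify([m.type, m.target, m.body]);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// A buffer shaped like the one in Observation 1: one noop, one PR update,
// and four comments of which two are duplicates.
const buffer = [
  { type: "noop", body: "No GitHub write action completed" },
  { type: "update_pull_request", target: 35836, body: "update" },
  { type: "add_comment", target: 35836, body: "nudge A" },
  { type: "add_comment", target: 35836, body: "nudge A" },
  { type: "add_comment", target: 35836, body: "nudge B" },
  { type: "add_comment", target: 35836, body: "nudge B" },
];

const actionable = dedupeMessages(selectActionableMessages(buffer));
console.log(actionable.length); // 3: one update plus two unique comments
```

Filtering noops only when a write is present keeps a noop-only buffer intact, so a run that genuinely had nothing to do still leaves a record.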
Observation 2 — Two scheduled runs emitted empty agent_output (no safe-output tool called)
Runs: §26675076543 (Copilot CLI Deep Research Agent) and §26675034531 (jsweep – JavaScript Unbloater), both copilot.
Both finished with agent_output.json = {"items":[]}: no safe-output tool (not even noop) was ever called, violating the "must call at least one safe-output tool" contract. The safeoutputs MCP server started cleanly but received zero calls. This is an agent-completion gap, not a safe-output job failure; it is flagged for context only.
Recommendations
Watch (highest priority, carried from 2026-05-27)
1. review_path_unresolved_422 Path-variant fix. The one-line predicate fix at pr_review_buffer.cjs:554 (match both "Line could not be resolved" and "Path could not be resolved") has not been exercised in production since the 2026-05-27 regression. Add or confirm a unit test that mirrors the existing Line-variant test with the Path variant, so the fix is locked in regardless of production traffic.
Minor (agent-side, low severity)
2. PR Sous Chef noop semantics. Tighten the agent guidance so it does not emit a "writes blocked" noop when it subsequently uses the safeoutputs MCP successfully; the firewall blocking direct gh/git is expected and is not a safeoutputs outage. Separately, confirm downstream handler precedence when a buffer contains both a noop and real write messages (expected: process writes, ignore noop). Verify that PR #35836 received the two nudge comments (the downstream job was mid-flight at capture).
3. Empty-agent_output conformance. Deep Research and jsweep completed without calling any safe-output tool. Consider a lightweight reminder or guard so scheduled runs always emit at least a noop.
Work Item Plans
Work Item 1: Lock in the review_path_unresolved_422 Path-variant fix with a test
… "Line could not be resolved". The 2026-05-27 regression was a "Path could not be resolved" 422. The fix is a one-line predicate change; it has not been validated against live traffic since.
… pr_review_buffer.cjs:554 matches both "Line could not be resolved" and "Path could not be resolved".
… pr_review_buffer.test.cjs mirrors the Line-variant test using the Path variant.
Work Item 2: Make agents stop mislabeling firewall blocks as safeoutputs outages
… gh/git (expected) from a safeoutputs MCP outage, and avoid emitting a contradictory noop.
Historical Context
… review_path_unresolved_422 Path regression
Trend: Error rate returned to 0 after the 2026-05-27 regression day. The Path-variant fix remains the top open item to validate. Note that today's dataset was small (~1 hour of morning scheduled/PR runs) and that a 2-day audit gap precedes it.
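Because the Path-variant fix cannot be validated until a line-anchored review actually runs, a unit-level check is the practical stand-in. The sketch below shows what a predicate matching both message variants might look like. The helper name isUnresolvedAnchorError, the error shape ({ status, message }), and the 422 status check are assumptions for illustration; the actual predicate at pr_review_buffer.cjs:554 may be written differently.

```javascript
// Hypothetical sketch; not the code at pr_review_buffer.cjs:554.
// The two message fragments are the ones quoted in this report.
const UNRESOLVED_ANCHOR_MESSAGES = [
  "Line could not be resolved",
  "Path could not be resolved",
];

// Returns true when a 422 error carries either unresolved-anchor message.
// Assumed error shape: { status: number, message: string }.
function isUnresolvedAnchorError(err) {
  if (!err || err.status !== 422) return false;
  const message = String(err.message || "");
  return UNRESOLVED_ANCHOR_MESSAGES.some((needle) => message.includes(needle));
}

// Mirror of a Line-variant check, repeated for the Path variant.
const lineError = { status: 422, message: "Validation Failed: Line could not be resolved" };
const pathError = { status: 422, message: "Validation Failed: Path could not be resolved" };
console.log(isUnresolvedAnchorError(lineError)); // true
console.log(isUnresolvedAnchorError(pathError)); // true
```

Keeping the fragments in one array means a future variant would need only a new entry and a matching test case, with no change to the predicate body.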
Metrics and KPIs
Next Steps
… review_path_unresolved_422 Path-variant fix and validate on the next line-anchored review run
… agent_output conformance gap on Deep Research and jsweep