[safe-output-health] 🏥 Safe Output Health Report - 2026-05-30 #35865

2026-05-30T05:40:04Z

github-actions[bot]
Bot May 30, 2026

Executive Summary

Clean day for safe-output jobs. Across 17 agentic runs (excluding this monitor), zero safe-output job hard failures were observed. All 11 runs that produced real GitHub side-effects succeeded, and 4 runs correctly emitted noop (no action needed). This is a return to health after the 2026-05-27 regression day (94.6%), though note 2026-05-28 and 2026-05-29 were not audited (gap).

The one caveat worth tracking: none of the four recurring clusters was exercised today, so the high-value review_path_unresolved_422 Path-variant fix (pr_review_buffer.cjs:554) remains unvalidated in production. Both PR reviewers used body-only reviews and emitted no line- or path-anchored review comments.

Period: Last 24h window (actual runs 04:35Z–05:27Z, 2026-05-30)
Runs Analyzed: 18 (17 excluding self-monitor)
Runs with real safe-output side-effects: 11 (all succeeded)
noop-only (clean) runs: 4
Safe-Output Jobs Failed: 0
Success Rate: 100%
Error Clusters (new hard failures): 0

Safe Output Job Statistics

Job Type	Executions	Success Rate
create_issue	3	100%
create_discussion	2	100%
create_pull_request	1	100%
update_pull_request	3	100%
submit_pull_request_review	2	100%
create_check_run	1	100%
noop (expected)	4+	100%
Total (real side-effects)	11	100%

Successful artifacts: issues #35861, #35859, #35858 · discussions #35862, #35860 · PR #35855 (created) · PR updates #35855, #35853, #35857 · reviews #4394560553, #4394557075.

Error Clusters

None. No safe-output job produced a hard failure today. No API errors, parsing errors, validation errors, or permission errors were detected in any safe-output job.

Recurring Cluster Watch

None of the four tracked clusters was exercised today: none reproduced, but none was validated either.

Cluster	Status today	Risk
`review_path_unresolved_422`	Not exercised	⚠️ Path-variant fix still unvalidated — reviewers used body-only reviews
`target_star_add_comment_no_item_number_fallback`	Not exercised	PR Sous Chef add_comments carried explicit `pr_number`, so resolution was fine
`target_star_review_comment_no_pr_number_fallback`	Not exercised	Latent
`cancellation_counter_mislabeled_code_push_failed`	Not exercised	No WTD3 abort occurred

New Observations (low severity)

Observation 1 — PR Sous Chef misleading "blocked" noop alongside successful writes

Run: §26675557007 · PR Sous Chef · copilot/gpt-5-mini

The agent emitted a noop reading "No GitHub write action completed: safeoutputs and GitHub CLI write attempts were blocked in this environment" — and then successfully buffered 1 update_pull_request + 4 add_comment (deduped to 2 unique) to PR #35836 via the safeoutputs MCP, each returning {result:success}.

The "blocked" claim is false for safeoutputs. Root cause: the agent conflated 26 firewall-blocked direct gh/git attempts (firewall.blocked_requests=26 to an unknown host) with the safeoutputs MCP being unavailable.
The downstream safe_outputs job was still status=in_progress when logs were captured (job started 05:29:04Z; agent finished 05:27:56Z). The absent safe-output-items.jsonl is a snapshot-timing artifact, not a job failure.

Impact: log-clarity / agent-behavior only. A misleading "blocked" noop in the buffer would alarm a future investigator.
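
The snapshot-timing reasoning above can be sketched as a small check. This is an illustration only: the two timestamps and the `in_progress` status come from this report, while `classifyMissingArtifact` and its return labels are hypothetical names, not part of the monitor.

```javascript
// Timestamps are taken from the report; everything else is a hypothetical sketch.
const agentFinishedAt = Date.parse("2026-05-30T05:27:56Z");
const safeOutputsJobStartedAt = Date.parse("2026-05-30T05:29:04Z");

// Seconds between the agent finishing and the downstream safe_outputs job starting.
const gapSeconds = (safeOutputsJobStartedAt - agentFinishedAt) / 1000;

// Decide how to read a missing safe-output-items.jsonl at log-capture time.
function classifyMissingArtifact(jobStatus, artifactPresent) {
  if (artifactPresent) return "ok";
  // A job that has not completed yet cannot have uploaded its artifact.
  if (jobStatus === "queued" || jobStatus === "in_progress") return "snapshot_timing";
  return "possible_failure";
}

console.log(gapSeconds); // 68
console.log(classifyMissingArtifact("in_progress", false)); // snapshot_timing
```

Under these assumptions, a missing artifact is only treated as a possible failure once the downstream job has reached a terminal status.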

Observation 2 — Two scheduled runs emitted empty agent_output (no safe-output tool called)

Runs: §26675076543 (Copilot CLI Deep Research Agent) and §26675034531 (jsweep – JavaScript Unbloater), both copilot.

Both finished with agent_output.json = {"items":[]} — no safe-output tool (not even noop) was ever called, violating the "must call at least one safe-output tool" contract. The safeoutputs MCP server started cleanly but received zero calls. This is an agent-completion gap, not a safe-output job failure — flagged for context only.

Recommendations

Watch (highest priority, carried from 2026-05-27)

1. Validate the review_path_unresolved_422 Path-variant fix. The one-line predicate fix at pr_review_buffer.cjs:554 (match both "Line could not be resolved" and "Path could not be resolved") has not been exercised in production since the 2026-05-27 regression. Add or confirm a unit test that mirrors the existing Line-variant test with the Path variant, so the fix is locked in regardless of production traffic.

Minor (agent-side, low severity)

2. PR Sous Chef noop semantics. Tighten the agent guidance so it does not emit a "writes blocked" noop when it subsequently uses the safeoutputs MCP successfully; the firewall blocking direct gh/git is expected and is not a safeoutputs outage. Separately, confirm downstream handler precedence when a buffer contains both a noop and real write messages (expected: process writes, ignore noop). Verify that PR #35836 received the two nudge comments (the downstream job was mid-flight at capture).
3. Empty-agent_output conformance. Deep Research and jsweep completed without calling any safe-output tool. Consider a lightweight reminder/guard so scheduled runs always emit at least a noop.
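
A lightweight guard of the kind suggested in item 3 might look like the sketch below. Only the `{"items":[]}` shape comes from this report; the function name `ensureAtLeastNoop` and the `{ type, message }` item fields are assumptions made for illustration.

```javascript
// Hypothetical guard: the item fields (type, message) are assumed, not confirmed.
function ensureAtLeastNoop(agentOutput, reason) {
  const items = Array.isArray(agentOutput && agentOutput.items) ? agentOutput.items : [];
  // Leave any real output untouched.
  if (items.length > 0) return { items };
  // Otherwise synthesize a single noop so the "at least one tool call" contract holds.
  return { items: [{ type: "noop", message: reason }] };
}

const guarded = ensureAtLeastNoop({ items: [] }, "Run completed without calling a safe-output tool");
console.log(JSON.stringify(guarded));
```

Whether such a guard belongs in the agent prompt, in the harness, or in the downstream job is a design choice this sketch does not settle.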

Work Item Plans

Work Item 1 — Lock in the review_path_unresolved_422 Path-variant fix with a test

Type: Bug Fix + Test
Priority: High
Description: The body-only fallback predicate historically matched only "Line could not be resolved". The 2026-05-27 regression was a "Path could not be resolved" 422. The fix is a one-line predicate change; it has not been validated against live traffic since.
Acceptance Criteria:
- pr_review_buffer.cjs:554 matches both "Line could not be resolved" and "Path could not be resolved".
- New unit test in pr_review_buffer.test.cjs mirrors the Line-variant test using the Path variant.
Effort: Small
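
The predicate change and its mirrored test can be sketched as follows. This is not the code at pr_review_buffer.cjs:554: the name `isUnresolvedAnchorError`, the assumption that the 422 message arrives as a plain string, and the "Validation Failed:" prefixes in the sample messages are all hypothetical. Only the two message fragments are taken from this report.

```javascript
// Hypothetical sketch; the real predicate may be shaped differently.
const UNRESOLVED_ANCHOR_FRAGMENTS = [
  "Line could not be resolved",
  "Path could not be resolved",
];

// True when a review-submission error indicates an unresolvable line or path
// anchor, so the caller can fall back to a body-only review.
function isUnresolvedAnchorError(message) {
  if (typeof message !== "string") return false;
  return UNRESOLVED_ANCHOR_FRAGMENTS.some((fragment) => message.includes(fragment));
}

// The Path variant is checked exactly the same way as the Line variant.
console.log(isUnresolvedAnchorError("Validation Failed: Line could not be resolved")); // true
console.log(isUnresolvedAnchorError("Validation Failed: Path could not be resolved")); // true
console.log(isUnresolvedAnchorError("Validation Failed: Body is too long")); // false
```

Keeping the fragments in one list means a future variant needs a one-line addition plus one mirrored test case.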

Work Item 2 — Make agents stop mislabeling firewall blocks as safeoutputs outages

Type: Enhancement (agent guidance / log clarity)
Priority: Low
Description: PR Sous Chef emitted a noop claiming safeoutputs was blocked, then used it successfully. Update prompt guidance to distinguish firewall-blocked direct gh/git (expected) from a safeoutputs MCP outage, and avoid emitting a contradictory noop.
Acceptance Criteria:
- Agent does not emit a "blocked" noop when safeoutputs calls in the same run succeed.
- Downstream precedence of noop-vs-writes in one buffer is documented/confirmed.
Effort: Small
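
Both acceptance criteria can be illustrated with a short sketch. The `{ result: "success" }` call result mirrors what the report shows. The item shape, the function names, and the precedence rule itself (process writes, ignore noop) are assumptions that still need to be confirmed against the real downstream handler.

```javascript
// Hypothetical shapes, assumed for illustration only:
//   call result: { result: "success" } or { result: "error" }
//   buffer item: { type: "noop" | "add_comment" | "update_pull_request", ... }

// A "writes blocked" noop is only coherent if no safeoutputs call succeeded.
// Firewall blocks on direct gh/git are deliberately not an input here.
function shouldEmitBlockedNoop(safeoutputsCallResults) {
  return !safeoutputsCallResults.some((call) => call.result === "success");
}

// Assumed precedence: when real writes are present, process them and drop noops.
function selectActionableItems(items) {
  const writes = items.filter((item) => item.type !== "noop");
  return writes.length > 0 ? writes : items;
}

const exampleBuffer = [
  { type: "noop", message: "No GitHub write action completed" },
  { type: "update_pull_request" },
  { type: "add_comment" },
];

console.log(shouldEmitBlockedNoop([{ result: "success" }])); // false
console.log(selectActionableItems(exampleBuffer).map((item) => item.type));
```

A noop-only buffer passes through unchanged, so runs that genuinely had nothing to do are still reported as such.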

Historical Context

Date	Runs	Failed runs	Success rate	Headline
2026-05-24	13	0	100%	Clean
2026-05-26	64	0	100%	Clean (1 soft recovery)
2026-05-27	81	3	94.6%	`review_path_unresolved_422` Path regression
2026-05-28	—	—	—	not audited (gap)
2026-05-29	—	—	—	not audited (gap)
2026-05-30	17	0	100%	Clean; clusters not exercised

Trend: Error rate returned to 0 after the 2026-05-27 regression day. The Path-variant fix remains the top open item to validate. Note the dataset today was small (~1 hour of morning scheduled/PR runs) and a 2-day audit gap precedes it.

Metrics and KPIs

Overall Safe Output Success Rate: 100% (0/11 real side-effects failed)
Most reliable: all handler types — clean sweep
Most problematic: none today
Soft recoveries: 0 (none needed)
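
For reference, the headline rate is plain arithmetic over the counts in this report (0 failures out of 11 real side-effects). The helper below is a minimal sketch. Its name and the choice to return `null` when nothing was executed are assumptions, not the monitor's actual behavior.

```javascript
// Hypothetical helper: percentage of executions that did not fail.
function successRatePercent(failed, total) {
  // With no executions the rate is undefined, so avoid reporting a misleading 100%.
  if (total === 0) return null;
  return ((total - failed) / total) * 100;
}

console.log(successRatePercent(0, 11)); // 100
```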

Next Steps

Land and test the review_path_unresolved_422 Path-variant fix, then validate it on the next line-anchored review run
Verify that PR #35836 ("Default Pi workflows to CLI proxy mode and relax Smoke Pi tool/file restrictions and runtime settings") received PR Sous Chef's nudge comments (the downstream job was in progress at capture)
Tighten agent guidance on firewall-block-vs-safeoutputs-outage noop wording
Investigate empty agent_output conformance gap on Deep Research and jsweep

References:

§26675557007 — PR Sous Chef (premature blocked-noop observation)
§26675076543 — Deep Research (empty agent_output)
§26675034531 — jsweep (empty agent_output)

Generated by 🔒 Safe Output Health Monitor · opus48 3.1M

expires on May 31, 2026, 5:40 AM UTC

2026-05-31T05:58:30Z

github-actions[bot]
Bot May 31, 2026
Author

This discussion has been marked as outdated by Safe Output Health Monitor.

A newer discussion is available at Discussion #36066.
