Add safe-output-health workflow to monitor safe output job failures by Copilot · Pull Request #2525 · github/gh-aw

Copilot · 2025-10-26T10:31:29Z

Adds daily monitoring for safe output job health across all agentic workflows. Safe output jobs (create_discussion, create_issue, add_comment, create_pull_request, etc.) run after agent execution to write GitHub API outputs, and failures in these jobs can cause silent data loss.

Workflow Design

- Runs daily at midnight UTC or on manual dispatch
- Uses gh-aw MCP server to fetch 24h of workflow logs
- Scopes analysis exclusively to safe output job failures (not agent or detection failures)
- Clusters errors by type: API failures, parsing errors, validation issues, permission problems
- Stores patterns in cache-memory for historical trend analysis
- Outputs structured discussion reports in "audits" category with:
  - Job success/failure rates by type
  - Root cause analysis for error clusters
  - Actionable work item plans with priority assessment
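
To make the clustering step concrete, here is a minimal Python sketch of grouping error messages into the four categories listed above. The category names come from this description; the keyword patterns, the `classify_error` and `cluster_errors` helpers, and the sample messages are assumptions for illustration only. In the workflow itself the agent performs this analysis from its prompt, so this is not the actual implementation.

```python
import re
from collections import defaultdict

# Hypothetical keyword patterns per category. The category names follow the
# PR description; the regexes are illustrative guesses, checked in order.
CATEGORY_PATTERNS = [
    ("permission", re.compile(r"permission|forbidden|\b403\b|resource not accessible", re.I)),
    ("parsing", re.compile(r"parse|json|unexpected token|malformed", re.I)),
    ("validation", re.compile(r"validation|invalid|\b422\b|is required", re.I)),
    ("api", re.compile(r"rate limit|timed? ?out|\b5\d\d\b|api error", re.I)),
]


def classify_error(message: str) -> str:
    """Return the first category whose pattern matches, or 'other'."""
    for category, pattern in CATEGORY_PATTERNS:
        if pattern.search(message):
            return category
    return "other"


def cluster_errors(messages):
    """Group error messages by category, preserving input order per group."""
    clusters = defaultdict(list)
    for message in messages:
        clusters[classify_error(message)].append(message)
    return dict(clusters)


# Made-up sample messages, not taken from real workflow logs.
sample_errors = [
    "HTTP 403: Resource not accessible by integration",
    "Unexpected token } in JSON at position 12",
    "Validation failed: title is required",
    "API rate limit exceeded",
    "something odd happened",
]
clusters = cluster_errors(sample_errors)
```

Because patterns are checked in order, a message that matches several categories lands in the first one; a real report would probably want a more careful tie-break.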

Implementation

Created .github/workflows/safe-output-health.md with:

- Claude engine with 30min timeout
- Imports shared/mcp/gh-aw.md for MCP server setup
- Instructions to parse workflow-logs directories for job-specific errors
- Template for comprehensive audit reports with KPIs and recommendations

The workflow explicitly excludes agent/detection job analysis to avoid overlap with existing monitoring workflows (audit-workflows.md, ci-doctor.md).
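
The scoping rule can be sketched in Python as follows: keep only safe output jobs, skip agent and detection jobs, and compute a failure rate per job type. The job names are the ones mentioned in this description (the list there ends with "etc.", so the set is not exhaustive). The record shape with `name` and `conclusion` keys and the `safe_output_failure_rates` helper are assumptions for illustration, not the format gh-aw actually writes to its logs directory.

```python
from collections import Counter

# Safe output job names mentioned in the PR description (not exhaustive).
SAFE_OUTPUT_JOBS = {
    "create_discussion",
    "create_issue",
    "add_comment",
    "create_pull_request",
}


def safe_output_failure_rates(job_records):
    """Return {job name: failure rate} for safe output jobs only.

    Each record is assumed to be a dict with 'name' and 'conclusion' keys.
    Agent and detection jobs are ignored, as are jobs that neither
    succeeded nor failed (for example skipped ones).
    """
    totals, failures = Counter(), Counter()
    for record in job_records:
        name = record["name"]
        if name not in SAFE_OUTPUT_JOBS:
            continue  # out of scope: agent, detection, and anything else
        if record["conclusion"] not in ("success", "failure"):
            continue  # skipped or cancelled jobs say nothing about health
        totals[name] += 1
        if record["conclusion"] == "failure":
            failures[name] += 1
    return {name: failures[name] / totals[name] for name in totals}


# Made-up records for illustration.
records = [
    {"name": "agent", "conclusion": "failure"},
    {"name": "create_issue", "conclusion": "success"},
    {"name": "create_issue", "conclusion": "failure"},
    {"name": "add_comment", "conclusion": "success"},
    {"name": "detection", "conclusion": "failure"},
    {"name": "create_discussion", "conclusion": "skipped"},
]
rates = safe_output_failure_rates(records)
```

With the sample records, only `create_issue` and `add_comment` contribute to the result; the failing agent and detection jobs are left to the existing monitoring workflows.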

Original prompt

Add a new agentic workflow, "safe-output-health" that scans for safe output job errors and generates a report.

use shared workflow gh-aw to load agentic workflow MCP

get 24h of logs and look for errors in the safe output jobs (not agent or detection)

cluster and analyze potential root cause

suggest potential work item / issues plans to address problems

Runs daily or on dispatch. Creates discussion.


Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

github-actions · 2025-10-26T10:50:29Z

Agentic Changeset Generator triggered by this pull request.

Copilot

Pull Request Overview

This PR adds a new agentic workflow for daily monitoring of safe output job health across all workflows. The workflow addresses a gap in observability by specifically tracking failures in jobs that write GitHub API outputs (discussions, issues, comments, PRs) after agent execution completes. These failures can cause silent data loss if not monitored.

Key changes:

- Creates a daily automated health check that analyzes 24 hours of safe output job execution logs
- Implements error clustering and root cause analysis specifically scoped to output job failures
- Generates structured audit reports with actionable work items and recommendations


Copilot · 2025-10-26T10:50:53Z

.github/workflows/safe-output-health.md

+engine: claude
+tools:
+  cache-memory: true
+  timeout: 300


The timeout: 300 field is not a valid frontmatter field. This appears to be intended as a step-level timeout but is placed at the top-level. According to the schema, workflows use timeout_minutes at the top level (which is already set to 30 on line 22), and individual steps can have timeout configurations within their definition.

Suggested change

-  timeout: 300

Copilot · 2025-10-26T10:50:53Z

.github/workflows/safe-output-health.md

+    category: "audits"
+    max: 1
+timeout_minutes: 30
+strict: true


The strict: true field is not a valid frontmatter field according to the workflow schema. This field does not exist in the allowed frontmatter fields (on, permissions, engine, tools, steps, safe-outputs, timeout_minutes, imports, etc.) and will cause validation errors during compilation.

Suggested change

-strict: true

Copilot · 2025-10-26T10:50:53Z

.github/workflows/safe-output-health.md

+    env:
+      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+    run: ./gh-aw logs --start-date -1d -o /tmp/gh-aw/aw-mcp/logs


The step attempts to run ./gh-aw binary directly, but this conflicts with line 46 which states 'DO NOT ATTEMPT TO USE GH AW DIRECTLY, it is not authenticated. Use the MCP server instead.' Either remove this step and rely on the MCP server's logs tool, or update the documentation to clarify when direct binary usage is appropriate.

Suggested change

     env:
       GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-    run: ./gh-aw logs --start-date -1d -o /tmp/gh-aw/aw-mcp/logs
+    run: mcp logs --start-date -1d -o /tmp/gh-aw/aw-mcp/logs

Copilot · 2025-10-26T10:50:54Z

.github/workflows/safe-output-health.md

+permissions:
+  contents: read
+  actions: read
+engine: claude


[nitpick] The workflow uses engine: claude with tools.cache-memory: true, but according to the coding guidelines, cache-memory is only documented to work with Claude and Custom engines. While this is valid, it would be clearer to use the object notation engine: { id: claude } for consistency with other engine configurations in the codebase.

Suggested change

-engine: claude
+engine:
+  id: claude

Initial plan

1d8653c

Copilot AI assigned Copilot and pelikhan Oct 26, 2025

Copilot started work on behalf of pelikhan October 26, 2025 10:32

Add safe-output-health agentic workflow

36b1467

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add new agentic workflow for safe output health scanning~~ Add safe-output-health workflow to monitor safe output job failures Oct 26, 2025

Copilot AI requested a review from pelikhan October 26, 2025 10:45

Copilot finished work on behalf of pelikhan October 26, 2025 10:45

pelikhan marked this pull request as ready for review October 26, 2025 10:50

Copilot AI review requested due to automatic review settings October 26, 2025 10:50

pelikhan merged commit a354027 into main Oct 26, 2025
5 checks passed

pelikhan deleted the copilot/add-safe-output-health-workflow branch October 26, 2025 10:50

Copilot AI reviewed Oct 26, 2025


github-actions bot mentioned this pull request Mar 7, 2026

Workflow Health Dashboard - 2026-03-07 #19935

Closed

pelikhan merged 2 commits into main from copilot/add-safe-output-health-workflow
