Add 6-hour [aw] failure investigation workflow #26694
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/7dd1d687-03bb-4dfd-aff6-daf96101fee7
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contribution Check: 🟢 Aligned

Great work on this PR! The failure investigator workflow is well-structured: the markdown definition, auto-compiled lock file, and MCP config are all cohesive and follow the project's conventions exactly. Checklist summary:
One note on tests: The checklist flagged the absence of test file changes. For these three files (workflow markdown, compiled lock file, MCP config), traditional Go unit tests don't directly apply — these are infrastructure/workflow-level artifacts. That said, if there are integration or workflow-level test patterns in the project (e.g.,
Pull request overview
Adds a scheduled Agentic Workflow that periodically investigates the last 6 hours of [aw] failures and files a parent report issue with linked fix sub-issues.
Changes:
- Introduces a new workflow spec at `.github/workflows/aw-failure-investigator.md`, scheduled every ~6 hours and runnable via manual dispatch.
- Adds the compiled workflow lock file `.github/workflows/aw-failure-investigator.lock.yml`.
- Adds an MCP host configuration file `.github/mcp.json` for running `gh aw mcp-server`.
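For reference, a schedule like the one described above could be expressed with standard GitHub Actions trigger syntax in the workflow frontmatter. This is a sketch under assumptions: the cron expression is illustrative, and the "~6 hours" wording suggests the actual `aw-failure-investigator.md` may use a gh-aw schedule shorthand rather than a fixed cron line.

```yaml
# Hypothetical frontmatter excerpt; the real aw-failure-investigator.md may differ.
on:
  schedule:
    - cron: "0 */6 * * *"   # every 6 hours, on the hour (UTC)
  workflow_dispatch:         # allows manual runs from the Actions tab or `gh workflow run`
```

A fixed cron makes every run start at the same minute; the compiled `.lock.yml` is where the final trigger the runner actually sees can be checked.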
Summary per file:
| File | Description |
|---|---|
| .github/workflows/aw-failure-investigator.md | Defines the investigation workflow prompt, schedule, tools, and safe-output constraints. |
| .github/workflows/aw-failure-investigator.lock.yml | Generated compiled workflow YAML corresponding to the new workflow spec. |
| .github/mcp.json | Configures an MCP server entry to run gh aw mcp-server. |
Copilot's findings
Comments suppressed due to low confidence (1)
.github/workflows/aw-failure-investigator.md:29
- As written, the workflow instructs the agent to call `noop` when there are no actionable failures, and `noop` is configured with defaults (which, per the repo docs, means `report-as-issue: true`). On a 6-hour schedule this can create a steady stream of “no action needed” issues. Consider setting `safe-outputs.noop.report-as-issue: false` (or changing the decision rule to emit no safe outputs when nothing actionable is found) to avoid issue spam.
max: 20
noop:
timeout-minutes: 60
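Applied to the frontmatter, the first option from the suppressed comment might look like the fragment below. The key path `safe-outputs.noop.report-as-issue` comes from the comment itself; the nesting is assumed from that dotted path, and the other `safe-outputs` entries are omitted.

```yaml
# Sketch only: other safe-outputs entries (create-issue, etc.) omitted.
safe-outputs:
  noop:
    report-as-issue: false   # a run with nothing actionable does not open a "no action needed" issue
```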
- Files reviewed: 3/3 changed files
- Comments generated: 3
title-prefix: "[aw-failures] "
labels: [agentic-workflows, automation, cookie]
max: 8
group: true
`safe-outputs.create-issue.group: true` will cause created issues to be grouped under an auto-generated parent issue (the group identifier is the workflow ID). Since this workflow already instructs the agent to create an explicit per-run parent report issue (and then link sub-issues to it), keeping `group: true` is likely to create an extra, unintended parent grouping issue. Consider removing `group: true` (or setting it to `false`) and relying on `temporary_id` + `parent` (or `link-sub-issue`) to build the desired parent/sub-issue structure.
This issue also appears on line 27 of the same file.
Suggested change: delete the `group: true` line.
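With that suggestion applied, the `create-issue` block would keep only the explicit settings shown in the diff context. The values are copied from that context; the `safe-outputs.create-issue` nesting is inferred from the key path in the review comment, and the parent/sub-issue linking is left to the prompt, as the reviewer describes.

```yaml
# Sketch only: surrounding frontmatter omitted.
safe-outputs:
  create-issue:
    title-prefix: "[aw-failures] "
    labels: [agentic-workflows, automation, cookie]
    max: 8
    # no `group: true`: the prompt already creates one parent report issue per run
    # and links the fix sub-issues to it, so auto-grouping would add a second parent
```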
- **Repository**: `${{ github.repository }}`
- **Lookback window**: last 6 hours
- **Issue query to inspect first**: <https://github.com/github/gh-aw/issues?q=is%3Aissue%20state%3Aopen%20label%3Aagentic-workflows>
The “Issue query to inspect first” link is hard-coded to `github/gh-aw`, while the workflow otherwise references `${{ github.repository }}`. If this workflow is reused in another repo or run from a fork, the link will point to the wrong place. Consider building the URL from `${{ github.server_url }}` + `${{ github.repository }}` so it always targets the current repo.
Suggested change:
`- **Issue query to inspect first**: <https://github.com/github/gh-aw/issues?q=is%3Aissue%20state%3Aopen%20label%3Aagentic-workflows>`
becomes
`- **Issue query to inspect first**: <${{ github.server_url }}/${{ github.repository }}/issues?q=is%3Aissue%20state%3Aopen%20label%3Aagentic-workflows>`
{
  "mcpServers": {
    "github-agentic-workflows": {
      "command": "gh",
      "args": ["aw", "mcp-server"]
    }
  }
}
The PR description doesn’t mention adding `.github/mcp.json`. If this file is intentionally introduced (e.g., via `gh aw init` / MCP host configuration), it would help to call that out in the PR summary; otherwise, consider dropping it from this change to keep the scope aligned with the stated goals.
Summary
- `.github/workflows/aw-failure-investigator.md`: … `agentic-workflows` issues, analyze last-6h failures with `logs`, `audit`, and `audit-diff`, and create a parent report issue plus linked fix sub-issues.
- `.github/workflows/aw-failure-investigator.lock.yml`

Validation
- `make recompile`
- `make agent-finish` (fails due to pre-existing, unrelated testifylint issues in `pkg/stats/spec_test.go` and `pkg/testutil/spec_test.go`)
- `parallel_validation` (code review comments were unrelated to this change; CodeQL scan timed out)