Merged
95 changes: 75 additions & 20 deletions .github/workflows/aw-failure-investigator.lock.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

30 changes: 17 additions & 13 deletions .github/workflows/aw-failure-investigator.md
@@ -1,5 +1,5 @@
---
-description: Investigates [aw] failures from the last 6 hours, correlates with open agentic-workflows issues, and opens a parent report with fix sub-issues
+description: Investigates [aw] failures from the last 6 hours, correlates with open agentic-workflows issues, closes fixed issues, and opens focused fix sub-issues when needed
on:
schedule:
- cron: "every 6h"
@@ -22,10 +22,13 @@ safe-outputs:
expires: 7d
title-prefix: "[aw-failures] "
labels: [agentic-workflows, automation, cookie]
-max: 8
+max: 2
group: true
+update-issue:
+target: "*"
Copilot AI Apr 17, 2026

`update-issue` is intended to close fixed/stale issues, but the frontmatter configuration does not enable status changes (it only sets `target` and `max`). In compiled workflows, allowing closure typically requires declaring `status:` under `safe-outputs.update-issue`, which propagates to `allow_status: true` in the lock config. Add `status:` (and recompile the lock file) so the workflow can actually close issues as instructed.

Suggested change
-target: "*"
+target: "*"
+status: "*"

+max: 10
link-sub-issue:
-max: 20
+max: 10
noop:
timeout-minutes: 60
imports:
@@ -49,7 +52,7 @@ Investigate agentic workflow failures from the last 6 hours and produce actionab
1. Find recent failures from agentic workflows in the last 6 hours.
2. Correlate findings with currently open `agentic-workflows` issues.
3. Perform large-scale failure analysis using logs + audit + audit-diff.
-4. Create one parent report issue and linked sub-issues proposing concrete fixes.
+4. Close fixed/stale issues first, then create only the minimum necessary linked fix sub-issues.

## Required Investigation Steps

@@ -91,16 +94,15 @@ Use `agentic-workflows` MCP `audit-diff` to compare:

Identify regressions and deltas (metrics/tooling/firewall/MCP behavior) that support fix recommendations.

-### 5) Create parent report issue + sub-issues
+### 5) Close fixed issues first, then add focused sub-issues

-Create a **single parent report issue** with a temporary ID (format `aw_` + 3-8 alphanumeric characters) summarizing:
-- observed failure clusters in last 6h
-- links to analyzed run IDs
-- evidence from logs/audit/audit-diff
-- mapping to existing open issues (duplicate / related / new)
-- prioritized fix plan
+First, identify currently open `agentic-workflows` issues that are now fixed, stale, or no longer actionable based on fresh evidence, and close them using `update-issue`.
Copilot AI Apr 17, 2026

This step instructs closing issues via `update-issue`, but the safe-output schema for this workflow uses `issue_number` + `status: "closed"` (not `state`). Consider explicitly stating the required payload fields (and whether you want the body updated when closing) to avoid the agent emitting an invalid update payload.

Suggested change
-First, identify currently open `agentic-workflows` issues that are now fixed, stale, or no longer actionable based on fresh evidence, and close them using `update-issue`.
+First, identify currently open `agentic-workflows` issues that are now fixed, stale, or no longer actionable based on fresh evidence, and close them using `update-issue`.
+When closing an issue with `update-issue`, use the safe-output fields `issue_number` and `status: "closed"`. Do **not** use `state`. Do not update the issue body unless you are intentionally revising it as part of the close action.
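
To make the payload shape in this comment concrete, here is a minimal Python sketch of a close payload and a pre-flight check. It only uses the two field names mentioned above (`issue_number`, `status`). The helper names, and the assumption that `"open"` is the only other accepted status value, are hypothetical and not taken from the safe-output schema itself.

```python
def build_close_payload(issue_number: int) -> dict:
    """Build a hypothetical update-issue payload that closes one issue."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(issue_number, bool) or not isinstance(issue_number, int) or issue_number <= 0:
        raise ValueError("issue_number must be a positive integer")
    # No "body" key: the body is left untouched when closing.
    return {"issue_number": issue_number, "status": "closed"}


def validate_update_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks usable."""
    errors = []
    if "state" in payload:
        errors.append('use "status", not "state"')
    if "issue_number" not in payload:
        errors.append("missing issue_number")
    # Assumption: "open" and "closed" are the accepted status values.
    if payload.get("status") not in ("open", "closed"):
        errors.append('status must be "open" or "closed"')
    return errors
```

A payload that uses `state` instead of `status` fails two checks in this sketch: the forbidden key and the missing `status` value.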


-Then create **sub-issues** (linked to the parent) for concrete fixes. Each sub-issue must include:
+Then, if new uncovered work remains, add **sub-issues** for concrete fixes to the **most recent open parent report issue** instead of creating a new parent by default.
+
+Only create a new parent report issue (temporary ID format `aw_` + 3-8 alphanumeric characters) when **P0 failures have no existing tracking coverage**.
+
+Each new sub-issue must include:
- clear problem statement
- affected workflows and run IDs
- probable root cause
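
The temporary ID format named in this hunk (`aw_` + 3-8 alphanumeric characters) can be checked mechanically. A minimal Python sketch, assuming "alphanumeric" means ASCII letters and digits; the helper names `is_valid_temp_id` and `new_temp_id` are hypothetical and not part of the workflow tooling:

```python
import re
import secrets
import string

# Assumption: "alphanumeric" means ASCII letters (either case) and digits.
TEMP_ID_RE = re.compile(r"aw_[A-Za-z0-9]{3,8}")
_ALPHABET = string.ascii_letters + string.digits


def is_valid_temp_id(value: str) -> bool:
    """True when value is 'aw_' followed by 3 to 8 alphanumeric characters."""
    return TEMP_ID_RE.fullmatch(value) is not None


def new_temp_id(length: int = 6) -> str:
    """Generate a random temporary ID with the given suffix length (3 to 8)."""
    if not 3 <= length <= 8:
        raise ValueError("length must be between 3 and 8")
    return "aw_" + "".join(secrets.choice(_ALPHABET) for _ in range(length))
```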
@@ -128,7 +130,9 @@ Include these sections:
## Decision Rules

- If there are **no failures** in the last 6h, or no actionable delta vs existing issues, call `noop` with a concise reason.
-- If failures exist but are already fully tracked, update by creating a minimal parent report that links to existing issues and only create new sub-issues for uncovered gaps.
+- If failures exist but are already fully tracked, prefer closing stale/fixed issues and avoid creating new issues.
+- Only create a new parent report issue when P0 failures have no existing tracking coverage.
+- Prefer closing stale/fixed issues over creating new issues when issue volume is high.
- Always be explicit about confidence and unknowns.

**Important**: If no action is needed after completing your analysis, you **MUST** call the `noop` safe-output tool with a brief explanation.
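
One way to read the revised decision rules is as a small priority function. The Python sketch below is an interpretation, not the workflow's implementation: the `Findings` fields and `Action` names are hypothetical, and it follows the rules literally, so zero failures in the window yields `noop` even when stale issues exist.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NOOP = "noop"
    CLOSE_STALE = "close_stale_or_fixed_issues"
    ADD_SUB_ISSUES = "add_sub_issues_to_existing_parent"
    NEW_PARENT = "create_new_parent_report"


@dataclass
class Findings:
    failures: int          # failures observed in the last 6h
    stale_or_fixed: int    # open issues that fresh evidence shows are fixed/stale
    untracked_p0: int      # P0 failures with no existing tracking coverage
    untracked_other: int   # other uncovered gaps


def decide(f: Findings) -> list[Action]:
    # Rule 1: no failures in the window means noop.
    if f.failures == 0:
        return [Action.NOOP]
    actions = []
    # Closing fixed/stale issues comes before creating anything new.
    if f.stale_or_fixed > 0:
        actions.append(Action.CLOSE_STALE)
    # A new parent report is reserved for untracked P0 failures.
    if f.untracked_p0 > 0:
        actions.append(Action.NEW_PARENT)
    elif f.untracked_other > 0:
        actions.append(Action.ADD_SUB_ISSUES)
    # No actionable delta vs existing issues also means noop.
    return actions or [Action.NOOP]
```

For example, failures that are fully tracked alongside two fixed issues produce only the close action, which matches the "prefer closing over creating" rule.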