[aw-failures] Failure Analysis Report - 2026-05-04 (6h window) #30042

@github-actions

Executive Summary

This report covers 44 workflow runs from the 6-hour window 2026-05-03T19:18Z – 2026-05-04T01:18Z. It identifies 15 failures across 6 clusters; the sixth cluster is an expected Smoke CI cancellation and is not counted among the 15. The dominant pattern is the GitHub API Consumption Report Agent, which fails before the agent starts on every push (10 failures, no engine configured). Two genuine agentic failures require immediate action: Daily Model Inventory Checker crashes silently on startup (P0, 100% failure rate), and Smoke Gemini has 95% of its network requests blocked by the firewall (P1).

Failure Clusters

| Priority | Workflow | Failures | Root Cause | Status |
|----------|----------|----------|------------|--------|
| P0 | Daily Model Inventory Checker | 2/2 (100%) | Copilot CLI silent exit code 1 (no output) | Untracked |
| P1 | Smoke Gemini | 1 | 95% firewall block rate (localhost:8080) | Untracked |
| P1 | Smoke Claude | 1 | APM bundle unpack failure (PR-specific) | PR merged |
| P2 | GitHub API Consumption Report Agent | 10 | Pre-flight failure, no engine configured | Likely known |
| P2 | Design Decision Gate | 1 | push_to_pull_request_branch status unknown | Single occurrence |
| INFO | Smoke CI | 1 cancelled | Superseded by newer push (normal) | Expected |

Cost & Scale: 7 errors, $7.02 total cost, 17.9M tokens across 44 runs, 215 action-minutes consumed.

Evidence

P0: Daily Model Inventory Checker — Silent Startup Crash

Affected runs: §25294739769 (schedule), §25294350506 (workflow_dispatch)
Engine: GitHub Copilot CLI v1.0.40, model claude-sonnet-4.6

  • All data-collection jobs succeed (collect_anthropic_models, collect_openai_models, collect_gemini_models, collect_copilot_models)
  • models.json produced: only 54 bytes (effectively empty)
  • Agent job duration: ~1 minute, then exits with code 1, zero stdout/zero stderr
  • Harness message: "no output produced — not retrying (possible causes: binary not found, permission denied, auth failure, or silent startup crash)"
  • 2 network requests to api.githubcopilot.com:443 (allowed), 0 blocked — agent never made substantive calls
  • 0 turns, 0 tool calls

Pattern: Consistent across both schedule and workflow_dispatch triggers on main. No code changes between the two failures.
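
The signature described above (exit code 1, no stdout, no stderr, 0 turns) can be checked mechanically when triaging similar runs. The following is a minimal sketch, not the harness's actual logic: the `AgentResult` type and the `is_silent_startup_crash` helper are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgentResult:
    """Hypothetical summary of one agent job; not a real harness type."""
    exit_code: int
    stdout: str
    stderr: str
    turns: int


def is_silent_startup_crash(result: AgentResult) -> bool:
    """Return True for a non-zero exit with no output and no agent turns."""
    return (
        result.exit_code != 0
        and not result.stdout.strip()
        and not result.stderr.strip()
        and result.turns == 0
    )


# Values taken from the evidence above: exit code 1, empty output, 0 turns.
observed = AgentResult(exit_code=1, stdout="", stderr="", turns=0)
print(is_silent_startup_crash(observed))
```

A run that exits non-zero but writes anything to stderr would not match, which keeps ordinary error exits out of this bucket.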

P1: Smoke Gemini — Firewall Blocking MCP Bridge

Affected run: §25295890959 (pull_request on copilot/add-default-agent-harness)
Engine: Google Gemini CLI

  • 320 total network requests; 304 blocked (95% block rate)
  • localhost:8080: 288 blocked requests — Gemini harness attempting to reach a local MCP bridge/tool server
  • 172.30.0.30:10003: 15 blocked requests
  • play.googleapis.com:443: 16 allowed requests
  • Two Gemini client error JSON files captured (50 KB + 34 KB)
  • 0 agent turns; agent job ran 5.9m before failing

Pattern: Gemini engine requires localhost:8080 for MCP tooling. This endpoint is not in the firewall allowlist.
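
As a quick check on the figures above, the block rate can be recomputed from the reported counts. This is an illustrative sketch only; `block_rate` is a hypothetical helper, and the numbers are copied from the bullet list in this section.

```python
def block_rate(blocked: int, total: int) -> float:
    """Fraction of network requests that were blocked."""
    if total <= 0:
        raise ValueError("total must be positive")
    return blocked / total


# Reported totals for run 25295890959.
total_requests = 320
total_blocked = 304

# Per-host blocked counts as listed in the evidence.
blocked_by_host = {
    "localhost:8080": 288,
    "172.30.0.30:10003": 15,
}

rate = block_rate(total_blocked, total_requests)
listed_blocked = sum(blocked_by_host.values())

print(f"block rate: {rate:.0%}")
print(f"blocked requests attributed to listed hosts: {listed_blocked} of {total_blocked}")
```

The two listed hosts account for 303 of the 304 blocked requests; the report does not say where the remaining blocked request was directed.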

P1: Smoke Claude — APM Bundle Unpack Failure

Affected run: §25295890954 (pull_request on copilot/add-default-agent-harness, now merged as 3a4fe48)
Engine: Claude Code v2.1.126

Error: APM action failed: apm unpack failed for bundle 1 of 1 (path: /tmp/gh-aw/apm-bundles/apm-default.tar.gz, exit code: 1)
Step: "Restore APM packages (all bundles)" — agent never ran (0 turns, 0 tokens)

audit-diff vs last successful Smoke Claude run (25263690532, 2026-05-02):

  • Duration: 12m 20s → 3m 14s (agent never reached execution)
  • Tokens: 1,873,538 → 0 (100% regression)
  • All 21 MCP tools absent in failure (agent never started)

The apm-prep job succeeded: the bundle was fetched, but it failed to unpack in the agent job. The PR that triggered this run has since been merged, so verify whether the APM unpack issue persists on main.

P2: GitHub API Consumption Report Agent — Pre-Flight Failures (10 runs)

Pattern: Fires on every push to copilot/* and main branches. All 10 failures are instant (created_at = updated_at), no jobs ran, no engine configured, no artifacts preserved.
Sample runs: 25296428563, 25295436032, 25294865032, 25294795992, 25294764258

This workflow appears to be triggered broadly but fails before any job starts — likely missing required secrets, a misconfigured trigger condition, or intentionally disabled without removing the trigger. The high frequency (10 in 6h) generates noise and inflates the failure count.
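
The "instant failure" signature (created_at = updated_at) is easy to filter for when separating this noise from genuine agent failures. The sketch below assumes run records shaped like GitHub REST API workflow-run objects, which carry `conclusion`, `created_at`, and `updated_at` fields; the sample records are invented placeholders, not runs from this report.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_instant_failure(run: dict) -> bool:
    """True when a run failed and was never updated after it was created."""
    return (
        run.get("conclusion") == "failure"
        and parse_ts(run["created_at"]) == parse_ts(run["updated_at"])
    )


# Invented sample records: IDs and timestamps are placeholders.
runs = [
    {"id": 1, "conclusion": "failure",
     "created_at": "2026-05-03T20:00:00Z", "updated_at": "2026-05-03T20:00:00Z"},
    {"id": 2, "conclusion": "failure",
     "created_at": "2026-05-03T20:05:00Z", "updated_at": "2026-05-03T20:11:00Z"},
    {"id": 3, "conclusion": "success",
     "created_at": "2026-05-03T20:10:00Z", "updated_at": "2026-05-03T20:10:00Z"},
]

instant = [run["id"] for run in runs if is_instant_failure(run)]
print(instant)
```

Excluding runs that match this filter from the headline failure count would keep pre-flight noise from masking the agentic failures.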

P2: Design Decision Gate — Safe Outputs Push Unconfirmed

Affected run: §25293685460 (pull_request on copilot/mark-experiments-feature-as-experimental)
Engine: Claude Code v2.1.126, 21 turns, 899K tokens (~$0.52)

  • Agent completed reasoning and attempted 3× push_to_pull_request_branch MCP calls
  • All 3 calls returned status: unknown
  • safe_outputs job was skipped
  • Workflow concluded as failure despite agent believing it had written outputs
  • Full stdio log available: /tmp/gh-aw/aw-mcp/logs/run-25293685460/agent-stdio.log (169 KB)

This is an isolated occurrence and may be a transient MCP connectivity issue.

Existing Issue Correlation

GitHub issue list was unavailable (local API proxy returned 403 during this run). Issue correlation is based on failure pattern analysis only. Sub-issue #aw_fail504 created for P0 Daily Model Inventory Checker.

Proposed Fix Roadmap

| Priority | Item | Owner Signal | Effort |
|----------|------|--------------|--------|
| P0 | Daily Model Inventory Checker: investigate Copilot CLI silent crash | Infra/harness | Medium |
| P1 | Smoke Gemini: add localhost:8080 + 172.30.0.30:10003 to firewall allowlist | Platform/firewall | Low |
| P1 | Smoke Claude: verify APM unpack issue on main post-merge of 3a4fe48 | APM/harness | Low |
| P2 | GitHub API Consumption Report Agent: review trigger config, remove broad push triggers or add guard | Workflow owner | Low |
| P2 | Design Decision Gate: investigate transient safe_outputs MCP push failures | MCP/harness | Low |

Sub-Issues Created

  • #aw_fail504_p0 — Daily Model Inventory Checker: Copilot CLI silent startup crash

Generated by [aw] Failure Investigator (6h) · ● 594.5K

  • expires on May 11, 2026, 1:33 AM UTC
