
Workflow Health Dashboard — 2026-04-22 #27820

@github-actions

Overview

  • Total workflows: 197
  • Lock files present: 197/197 ✅
  • Stale lock files: 23 ⚠️ (require make recompile)
  • Today's success rate: 90% (27/30 scheduled runs)
  • Overall health score: 69/100 (↓1 from yesterday)
  • P0 issues: 0 | P1 issues: 9 | P2 issues: 6

Critical Issues 🚨

Stale Lock Files / Codex 401 (P1) — #27724 + #27731

Design Decision Gate (P1) — #27756 + #27470

  • Status: Two compounding failures: (1) max_turns=5 leaves too few turns for ADR generation to ever complete; (2) push_to_pull_request_branch fails with "Failed to apply bundle"
  • Impact: Design Decision Gate workflow cannot successfully complete; ~50% run failure rate
  • Fix: Increase max_turns to ≥6; investigate bundle apply failure

Safe Outputs Session Timeout (P1) — #27755 + #23153

  • Status: Safe outputs MCP server returns "session not found" after ~37 minutes (confirmed to be shorter than the previously assumed 1-hour threshold)
  • Impact: All workflows running longer than ~37 min cannot deliver safe outputs
  • Affected: Long-running analysis workflows, Design Decision Gate, lean-squad

Smoke Tests (P1) — #27030 + #27028

Warnings ⚠️

Daily Documentation Updater — Protected Files (P2) — #27801

  • Status: Workflow tried to push changes to agent instruction files under .github/aw/; the push was blocked by the protected-files policy
  • Fix: Add protected-files: fallback-to-issue to daily-doc-updater.md frontmatter

GPU Runner — node not found (P1) — #27534

  • Status: node: command not found on aw-gpu-runner-T4, causing a recurring Daily Issues Report failure
  • Fix: Add Node.js PATH setup to GPU runner workflows

GitHub App Rate Limit (P1) — #27251

  • Status: Multiple workflows are co-scheduled at 23:44 UTC and hit GitHub App rate limits
  • Fix: Stagger schedule times across Codex+MCP workflows
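
One way to stagger the co-scheduled runs is to give each workflow its own daily start time, a fixed number of minutes apart. The sketch below assumes a 10-minute spacing starting from 23:44 UTC and uses hypothetical workflow names; the report does not list which Codex+MCP workflows share that slot, and a suitable spacing depends on the rate-limit budget and typical run length.

```python
def stagger_daily_schedules(workflows, start_hour, start_minute, spacing_minutes):
    """Map each workflow name to a daily cron expression, spaced apart in time.

    Workflows are sorted by name so the assignment is deterministic for a
    given set of workflows. Offsets wrap past midnight (modulo 24 hours).
    """
    base = start_hour * 60 + start_minute
    schedule = {}
    for i, name in enumerate(sorted(workflows)):
        t = (base + i * spacing_minutes) % (24 * 60)
        hour, minute = divmod(t, 60)
        # Standard 5-field cron: minute hour day-of-month month day-of-week
        schedule[name] = f"{minute} {hour} * * *"
    return schedule


# Hypothetical workflow names, for illustration only.
print(stagger_daily_schedules(["codex-b", "codex-a", "mcp-c"], 23, 44, 10))
```

With these inputs the three workflows land at 23:44, 23:54 and 00:04 UTC instead of all firing at 23:44.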

Healthy Workflows ✅

174 workflows operating normally with no issues detected (based on today's run sample).

All Stale Lock Files (23 workflows)

Workflows whose .md source was modified after its last compilation. Non-Codex workflows may still run, but with outdated configuration:

  • copilot-agent-analysis (claude)
  • copilot-pr-merged-report (copilot)
  • daily-astrostylelite-markdown-spellcheck (claude)
  • daily-cli-tools-tester (copilot)
  • daily-doc-updater (claude)
  • daily-mcp-concurrency-analysis (copilot)
  • daily-news, daily-regulatory, daily-semgrep-scan
  • daily-workflow-updater (copilot)
  • dependabot-go-checker, issue-monster, repo-audit-analyzer, spec-enforcer
  • developer-docs-consolidator (claude)
  • example-workflow-analyzer (claude)
  • go-logger (claude)
  • org-health-report (copilot)
  • q (copilot)
  • semantic-function-refactor (claude)
  • smoke-claude
  • typist (claude)
  • video-analyzer (copilot)
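
The staleness rule above (a .md source modified after its last compilation) can be checked on a local working tree by comparing file modification times. This is a minimal sketch that assumes each <name>.md compiles to <name>.lock.yml in the same directory (the directory path is an assumption too); sources with no lock file are skipped, since lock-file presence is tracked separately. Because git does not preserve modification times on checkout, a CI gate would more reliably rerun make recompile and fail if git diff then reports changed lock files.

```python
from pathlib import Path


def find_stale_lock_files(workflows_dir):
    """Return sorted names of workflows whose .md is newer than its .lock.yml.

    Assumes (for this sketch) that `<name>.md` compiles to `<name>.lock.yml`
    in the same directory. Sources without a lock file are skipped.
    """
    stale = []
    for source in sorted(Path(workflows_dir).glob("*.md")):
        lock = source.with_name(source.stem + ".lock.yml")
        if not lock.exists():
            continue  # missing lock files are a separate check
        # Stale: the source was modified after the lock file was last written.
        if source.stat().st_mtime > lock.stat().st_mtime:
            stale.append(source.stem)
    return stale
```

For example, find_stale_lock_files(".github/workflows") would return the names that need make recompile, which a local pre-push hook could print before pushing.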

Systemic Issues

Codex openai-proxy Lock File Drift

  • Affected: All Codex-engine workflows (15+, including AI Moderator, Daily Observability Report, Duplicate Code Detector, Daily Fact)
  • Root cause: #27711 (Codex: inject openai-proxy provider in generated config when API proxy is enabled) added openai-proxy config to the compiler template, but make recompile was not run afterward, so the old lock files still route requests directly to api.openai.com and fail with 401
  • Recommendation: Add CI check to block merges when lock files are stale

Safe Outputs MCP Session Expiry

  • Affected: Any workflow running longer than ~37 minutes
  • Pattern: "session not found" errors on safe_outputs MCP calls after ~37 minutes
  • Recommendation: Investigate MCP server session TTL configuration; add session refresh logic
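
Session refresh logic could take the form of a retry wrapper: when a call fails because the server no longer recognizes the session, open a new session and retry a bounded number of times. The callables and SessionNotFoundError below are hypothetical stand-ins, not the safe outputs MCP server's actual client API, and retrying is only safe if the server rejects the call before performing any side effect.

```python
class SessionNotFoundError(Exception):
    """Hypothetical error raised when the server no longer knows the session."""


def call_with_session_refresh(open_session, call, max_refreshes=1):
    """Run call(session); on SessionNotFoundError, open a fresh session and retry.

    open_session: hypothetical callable that initializes and returns a session.
    call: hypothetical callable that performs one tool call with that session.
    max_refreshes: how many times a new session may be opened before giving up.
    """
    session = open_session()
    refreshes = 0
    while True:
        try:
            return call(session)
        except SessionNotFoundError:
            if refreshes >= max_refreshes:
                raise  # still failing after the allowed refreshes
            refreshes += 1
            session = open_session()
```

Bounding the number of refreshes keeps a persistently failing server from turning into an infinite retry loop; the TTL investigation is still needed to explain why sessions expire at ~37 minutes.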

Recommendations

High Priority

  1. Run make recompile to fix the 23 stale lock files and unblock the Codex 401 failures (P1, immediate)
  2. Fix Design Decision Gate max_turns + push bundle failure (P1)
  3. Investigate the safe outputs session timeout at ~37 minutes (P1)
  4. Fix GPU runner Node.js PATH for Daily Issues Report (P1)

Medium Priority

  1. Add protected-files: fallback-to-issue to daily-doc-updater.md (P2)
  2. Stagger 23:44 UTC co-scheduled workflows (P2)
  3. Add CI lint check for stale lock files (prevent future drift)

Low Priority

  1. Review SEC-004 safe outputs conformance (#27235: [Safe Outputs Conformance] SEC-004: Multiple handlers have body fields without content sanitization)
  2. Reduce Copilot reviewer fan-out (#27130: [copilot-opt] Reduce 6-reviewer fan-out per Copilot PR push to cut action_required noise by 50%+)

Trends

  • Overall health score: 69/100 (↓1 from 70 Apr 21, ↓4 from 73 Apr 20)
  • Stale lock files: 23 (↑8 from 15 yesterday; root cause: no make recompile after workflow edits)
  • Today's failure rate: 10% (↑ from ~5% on healthy days)
  • P0 issues: 0 (stable)

Actions Taken This Run

References:

Note

🔒 Integrity filter blocked 2 items

The following items were blocked because they don't meet the GitHub integrity level.

  • #24961 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #26069 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity under tools.github in your workflow frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

Generated by Workflow Health Manager - Meta-Orchestrator · ● 2.7M

  • expires on Apr 23, 2026, 12:21 PM UTC
