Agent Performance Report — Week of April 13, 2026 #25981

2026-04-13T04:58:49Z

github-actions[bot]
bot Apr 13, 2026

Executive Summary

Workflows analyzed: ~187 compiled workflows (25 scheduled)
Outputs reviewed: 20 PRs merged Apr 12–13; 5 new issues Apr 13; ~25 scheduled runs
Quality score: 73/100 ↑3 from last week (70)
Effectiveness score: 65/100 ↑5 from last week (60)
Top performers: CLI Version Checker, Copilot Coding Agent, Issue Monster, Agentic Maintenance
Needs improvement: Smoke Claude (schedule), Smoke Gemini, Smoke Cross-Repo PR, Daily Semgrep Scan

Performance Rankings

Top Performing Agents 🏆

CLI Version Checker (Quality: 90/100, Effectiveness: 92/100)
- Consistently produces well-structured, actionable version tracking output
- Created issue #25978 with clear version bump table (Copilot 1.0.24, Claude Code 2.1.104, Codex 0.120.0, MCP Gateway v0.2.18)
- 100% scheduled run success rate
Copilot Coding Agent (Quality: 85/100, Effectiveness: 88/100)
- Exceptional output volume: 20 PRs merged in 2 days (Apr 12–13)
- Diverse, targeted fixes: OTel span events, SEC-004 sanitization, cache-memory cleanup, multiple agent assignment fix, detection squid crash fix
- Notable: Small, focused PRs with clear commit messages and conventional commits style
- Example outputs: #25972, #25971, #25968, #25960
Issue Monster (Quality: 87/100, Effectiveness: 90/100)
- 5/5 successful scheduled runs — best reliability rate this week
- Consistent structured output
- No duplication or scope creep observed
Agentic Maintenance (Quality: 83/100, Effectiveness: 82/100)
- 2/2 successful runs
- Added cleanup-cache-memory job to workflow (#25908) — self-improving behavior
- Updated maintenance docs (#25919)
- Fixed push_repo_memory gate condition (#25960)
Smoke Copilot (Quality: 82/100, Effectiveness: 80/100) 🎉
- FULLY RECOVERED — passing scheduled runs as of Apr 13
- Failed 21 of 30 runs as recently as Apr 11; now at 100% success
- Recovery attributed to v1.0.21 stability improvements and --no-ask-user flag

Agents Needing Improvement 📉

Smoke Claude (Quality: 40/100, Effectiveness: 35/100)
- Environment discrepancy detected: Fails on daily scheduled runs, but passes on PR-triggered runs (run §24322418440)
- This pattern suggests an environment-specific issue (different container state, timing, or context) rather than a fundamental engine failure
- Open tracking: #25727
- Recommendation: Investigate scheduler-specific environment differences; add diagnostic logging on failure
Smoke Gemini (Quality: 10/100, Effectiveness: 10/100)
- 100% failure rate — Gemini CLI 0.37.0 compatibility issue
- Tracking: #25216
- Stale issue (opened Apr 8) — no fix progress visible
- Recommendation: Either update Gemini CLI version pin or add compatibility shim
Smoke Create/Update Cross-Repo PR (Quality: 15/100, Effectiveness: 15/100)
- Both still failing (#25221, #25217)
- Issues opened Apr 8 — no resolution in 5 days
- These block validation of multi-repo PR workflows
- Recommendation: Assign to Copilot coding agent for root cause analysis
Documentation Unbloat (Quality: 50/100, Effectiveness: 45/100)
- Improved from 0% → ~50% success this cycle (1/2 runs)
- Still inconsistent — high cost (~$55/week Claude) for uncertain output
- Recommendation: Add exit criteria check; ensure safe output tool is always called
Daily Semgrep Scan (Quality: N/A, Effectiveness: 0/100)
- New failure observed Apr 13 (0/1 runs)
- No tracking issue yet
- Recommendation: Investigate run; open tracking issue if recurring

Inactive / Stale Issues

8 open smoke test failure issues from Apr 8 are still unresolved (five of them are listed here: #25371, #25372, #25374, #25395, #25415)
- Most underlying issues now resolved (Copilot, Codex, Multi PR, Container smoke all passing) → these can be closed
- Gemini and Cross-Repo still active failures
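
The staleness figure used in this report (issues opened Apr 8, still open on Apr 13) is plain date arithmetic. A minimal sketch, assuming a hypothetical 5-day threshold for flagging an issue as stale; the function names are illustrative and not part of any workflow in this repository:

```python
from datetime import date


def days_open(opened: date, as_of: date) -> int:
    """Number of whole days between the open date and the report date."""
    return (as_of - opened).days


def is_stale(opened: date, as_of: date, threshold_days: int = 5) -> bool:
    """Flag an issue as stale once it has been open for threshold_days or more.

    threshold_days is an assumed value chosen for illustration.
    """
    return days_open(opened, as_of) >= threshold_days


# Dates taken from the report: batch opened Apr 8, report generated Apr 13.
opened_on = date(2026, 4, 8)
report_day = date(2026, 4, 13)

age = days_open(opened_on, report_day)
stale = is_stale(opened_on, report_day)
```

With these dates the age works out to the "5 days" quoted for the Cross-Repo issues.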

Quality Analysis

Output Quality Distribution

Score Range	Count	Workflows
Excellent (80–100)	5	CLI Version Checker, Issue Monster, Agentic Maintenance, Smoke Copilot, Schema Consistency
Good (60–79)	8	Auto-Triage, PR Triage Agent, Bot Detection, Lockfile Stats, Workflow Normalizer, Safe Output Optimizer, Observability Report, License Compliance
Fair (40–59)	4	Smoke Claude, Documentation Unbloat, Contribution Check (recovering), Agent Persona Explorer
Poor (<40)	3	Smoke Gemini, Smoke Cross-Repo PR (×2), GitHub Remote MCP Auth Test
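
The score ranges in the table above amount to a simple banding rule. A minimal sketch of that rule, using quality scores quoted elsewhere in this report; the function name `quality_band` is hypothetical and the analyzer's real scoring code is not shown in this discussion:

```python
def quality_band(score: int) -> str:
    """Map a 0-100 quality score to the band names used in the table."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Fair"
    return "Poor"


# Quality scores as reported in the rankings section.
scores = {
    "CLI Version Checker": 90,
    "Issue Monster": 87,
    "Smoke Claude": 40,
    "Documentation Unbloat": 50,
    "Smoke Gemini": 10,
}

bands = {name: quality_band(score) for name, score in scores.items()}
```

Note that a score of exactly 40 (Smoke Claude) falls in the Fair band, which matches its placement in the table.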

Common Quality Issues

Schedule vs PR environment inconsistency (Smoke Claude): 1 agent
- Passing in one context, failing in another indicates environment-specific configuration gap
Stale failure tracking: 8+ open issues from a single batch (Apr 8) with no resolution
- Issue Monster and similar agents are creating good quality issues but the resolution loop is slow
Inconsistent or empty output (Documentation Unbloat): The agent runs, but some runs end without output and quality varies greatly
- Needs better exit-condition checks
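
One way to picture an exit-condition check is a wrapper that guarantees every run ends with an explicit output, even when the agent finds nothing to change. This is a sketch under stated assumptions: `SafeOutput`, its `kind` labels, and `run_with_exit_guard` are hypothetical names for illustration, not the actual safe-output tool interface.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SafeOutput:
    kind: str    # hypothetical labels, e.g. "pull_request" or "noop"
    reason: str  # human-readable explanation of the outcome


def run_with_exit_guard(agent: Callable[[], Optional[SafeOutput]]) -> SafeOutput:
    """Run the agent and always return an explicit output.

    If the agent raises or returns nothing, a "noop" output carrying the
    reason is returned instead of ending the run silently.
    """
    try:
        result = agent()
    except Exception as exc:
        return SafeOutput("noop", f"agent raised {type(exc).__name__}: {exc}")
    if result is None:
        return SafeOutput("noop", "agent finished without proposing changes")
    return result


def finds_nothing() -> Optional[SafeOutput]:
    return None


outcome = run_with_exit_guard(finds_nothing)
```

The design choice here is that "nothing to do" becomes a recorded result with a reason, so a run with no output can be told apart from a run that failed partway.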

Effectiveness Analysis

Task Completion Rates

Category	Completion Rate	Notes
Scheduled maintenance (Issue Monster, Agentic Maintenance, etc.)	~90%	High reliability
Smoke tests (core engines)	Copilot ✅ Codex ✅ Claude ⚠️	Claude schedule-specific
Smoke tests (integrations)	Gemini ❌ Cross-Repo ❌	Persistent failures
Code quality agents (normalizer, schema, license)	~95%	Excellent
Meta-orchestrators	Moderate	This workflow is running

PR Activity (Apr 12–13)

20 PRs merged in 2 days by Copilot coding agent:

Security: SEC-004 subprocess output sanitization
Observability: OTel exception span events (2 PRs)
Infrastructure: Docker daemon fix, ubuntu-latest runner fix
Features: cache-memory cleanup, upload-artifact skip-archive, multi-agent assignment
Fixes: detection squid crash, push_repo_memory gating, temp ID references
Documentation: agentic maintenance docs, slash command docs

All PRs: small, focused, conventional commit format, high merge rate (100% this cycle)

Behavioral Patterns

Productive Patterns ✅

Copilot Coding Agent self-improvement loop: Agentic Maintenance identified issues → Copilot bot fixed them → merged same day. This virtuous cycle is working well.
CLI Version Checker → Version bump tracking: Creates well-structured issues that feed directly into release management.
Smoke tests as early warning: Despite failures, smoke tests are correctly surfacing real issues (Gemini CLI compat, Cross-Repo auth).

Problematic Patterns ⚠️

Stale issue accumulation: Smoke test failures from Apr 8 remain open despite underlying issues being resolved. Need a cleanup pass.
Shared alert metadata drift: The shared-alerts.md entry that labeled #25548 as the DDG (Design Decision Gate) issue was incorrect. Issue #25548 ("feat: collect Docker operational logs on failure for AWF diagnostics") is actually a Docker diagnostic logs feature request. This indicates that alert metadata can become stale and misleading.
Schedule-vs-PR environment gap (Smoke Claude): An agent failing only on scheduled runs but passing on PR runs is a diagnostic dead end without extra telemetry. The environment difference needs investigation.
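
A first diagnostic step for the schedule-vs-PR gap could be to capture an environment snapshot in each context and diff the two. The sketch below assumes the snapshots are available as plain dictionaries; the sample values and the `EXAMPLE_CACHE_STATE` key are invented for illustration, and only `GITHUB_EVENT_NAME`, `GITHUB_HEAD_REF`, and `RUNNER_OS` are real GitHub Actions variables.

```python
from typing import Optional


def diff_environments(
    scheduled: dict[str, str], pr: dict[str, str]
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return keys whose values differ, as (scheduled_value, pr_value).

    A key missing from one snapshot is reported with None on that side.
    """
    keys = scheduled.keys() | pr.keys()
    return {
        key: (scheduled.get(key), pr.get(key))
        for key in sorted(keys)
        if scheduled.get(key) != pr.get(key)
    }


# Illustrative snapshots only; real values would come from the failing runs.
scheduled_env = {
    "GITHUB_EVENT_NAME": "schedule",
    "RUNNER_OS": "Linux",
    "EXAMPLE_CACHE_STATE": "cold",
}
pr_env = {
    "GITHUB_EVENT_NAME": "pull_request",
    "RUNNER_OS": "Linux",
    "GITHUB_HEAD_REF": "fix/example",
}

differences = diff_environments(scheduled_env, pr_env)
```

Keys that are identical in both contexts (here `RUNNER_OS`) drop out, leaving a short list of candidates to inspect.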

Coverage Analysis

Well-Covered Areas

✅ Core engine validation (Copilot, Codex)
✅ Version/dependency tracking
✅ Code quality and schema consistency
✅ Issue and PR triage
✅ Observability and firewall monitoring
✅ License compliance

Coverage Gaps

⚠️ Gemini engine: No working validation (Smoke Gemini 100% fail)
⚠️ Cross-repo workflows: No working validation (both smoke tests failing)
⚠️ Semgrep security scanning: New failure today, uncertain reliability
⚠️ Design Decision Gate resolution: A previous alert referenced a DDG issue, but the actual tracking issue has not been located

Recommendations

High Priority

Investigate Smoke Claude schedule/PR discrepancy
- Capture additional diagnostics on scheduler-triggered failures
- Compare runner config between schedule and PR triggers
- Expected impact: Restore Claude scheduled validation (+10 quality points)
Close resolved smoke test issues from Apr 8
- Issues #25371, #25372, #25374, #25395, #25415 are stale (underlying agents now passing)
- Reduces issue noise, improves signal quality
Fix Smoke Gemini (#25216)
- Update Gemini CLI version or add compatibility layer
- Estimated effort: 1–2 hours
- Expected impact: Restore Gemini engine validation
Investigate Smoke Cross-Repo PR failures (#25221, #25217)
- Assign to Copilot coding agent for root cause analysis
- 5 days without fix suggests this needs active attention

Medium Priority

Add exit-condition guard to Documentation Unbloat
- Ensure agent always calls a safe-output tool with reasoning when no docs changes found
- Reduces wasted ~$55/week Claude cost
Reconcile shared-alerts.md DDG reference
- The reference that labels #25548 ("feat: collect Docker operational logs on failure for AWF diagnostics") as the DDG (Design Decision Gate) issue is incorrect
- Locate actual DDG tracking issue and update alerts metadata
Investigate Daily Semgrep Scan failure
- New failure today — check if one-off or recurring
- If recurring, open tracking issue

Trends

Metric	This Week	Last Week	Change
Overall quality score	73/100	70/100	↑ +3
Effectiveness score	65/100	60/100	↑ +5
Scheduled workflow success rate	76% (19/25)	~68%	↑ ~8 pts
PR merge rate (Copilot bot)	100% (20/20)	~95%	→ stable
Smoke test pass rate	5/8	4/8	↑ +1
Open failure issues	~18	~18	→ stable

Key trend: Copilot engine fully recovered and highly productive. Smoke multi-engine coverage still incomplete (Gemini, Cross-Repo). Coding agent output velocity at peak levels.
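
The rate changes in the table are differences in percentage points, which is not the same as a relative change. A short sketch of the arithmetic, using only the figures quoted in the table (last week's scheduled success rate is given as approximately 68%, so results derived from it are approximate too):

```python
def pass_rate(passed: int, total: int) -> float:
    """Pass rate as a percentage."""
    return 100 * passed / total


# Scheduled workflow success rate: 19 of 25 runs this week, ~68% last week.
scheduled_now = pass_rate(19, 25)
scheduled_prev = 68.0
scheduled_delta_points = scheduled_now - scheduled_prev
scheduled_relative_change = 100 * scheduled_delta_points / scheduled_prev

# Smoke test pass rate: 5 of 8 this week, 4 of 8 last week.
smoke_now = pass_rate(5, 8)
smoke_prev = pass_rate(4, 8)
smoke_delta_points = smoke_now - smoke_prev

print(f"scheduled: {scheduled_now:.0f}% ({scheduled_delta_points:+.0f} points)")
print(f"smoke: {smoke_now:.1f}% ({smoke_delta_points:+.1f} points)")
```

On these inputs the scheduled rate is 76%, up 8 points (roughly a 12% relative improvement), and the smoke pass rate moves from 50% to 62.5%.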

Actions Taken This Run

Updated agent-performance-latest.md in shared memory
Updated shared-alerts.md with corrected metadata (fixed stale DDG reference)
Generated this performance report discussion
Identified 7 recommendations across high/medium priority

Analysis period: April 11–13, 2026 | Next report: ~April 15, 2026
Engine: GitHub Copilot CLI v1.0.21 | Run: §24326084430

References:

§24326084430 — This run
§24274655501 — Previous run (Apr 11)
§24322418440 — Smoke Claude PR pass (Apr 13)

Note

🔒 Integrity filter blocked 14 items

The following items were blocked because they don't meet the GitHub integrity level.

bug: GOROOT not propagated into agent container despite --env-all and spec §8.5 #25946 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
bug: gemini API key rejected by proxy sidecar despite valid key #25944 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
bug: Copilot CLI 1.0.21 added a startup model validation step: when COPILOT_MODEL is set #25593 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
Latest Copilot CLI v1.0.22 blocks safeoutputs MCP server #25550 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
Enabling workflow-wide Docker-in-Docker configuration breaks gh aw workflows #25511 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
'ready_for_review' state not supported for 'pull_request_target' #25436 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
conclusion job uses static concurrency group, causing random cancellations in batch dispatches #25420 list_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
#25946 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
#25944 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
#25593 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
#25550 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
#25511 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
#25436 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
#25420 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none
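
The blocking behavior described above can be pictured as a threshold comparison over ordered integrity levels. This is a sketch only: it assumes the levels named in the comment are ordered from lowest to highest as none, unapproved, approved, merged, and the function `is_readable` is a hypothetical illustration rather than the filter's real implementation.

```python
# Assumed ordering, lowest to highest, inferred from the option list above.
LEVELS = ["none", "unapproved", "approved", "merged"]


def is_readable(item_level: str, min_integrity: str) -> bool:
    """True if an item's integrity level meets the configured minimum."""
    return LEVELS.index(item_level) >= LEVELS.index(min_integrity)


# With min-integrity set to "approved", an unapproved issue is blocked.
blocked = not is_readable("unapproved", "approved")

# Lowering the minimum to "unapproved" would let the same item through.
allowed_after_lowering = is_readable("unapproved", "unapproved")
```

Under that assumption, lowering `min-integrity` widens what the agent may read, at the cost of exposing it to less trusted content.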

Generated by Agent Performance Analyzer - Meta-Orchestrator · ● 1.9M

expires on Apr 14, 2026, 4:58 AM UTC

2026-04-14T05:27:31Z

github-actions[bot]
bot Apr 14, 2026
Author

This discussion was automatically closed because it expired on 2026-04-14T04:58:49.452Z.

Closed by Workflow
