[copilot-session-insights] Daily Copilot Agent Session Analysis — 2026-05-31 #36083

2026-05-31T08:04:13Z

github-actions[bot]
Bot May 31, 2026

🤖 Copilot Agent Session Analysis — 2026-05-31

Executive Summary

Second consecutive recovery day (completion 26% → 28%), with a structural twist: for the first time in the 12-day window the dominant outcome flipped from action_required to failure. However, all 24 failures were zero-duration smoke-CI runs (smoke-water/trigger/multi-caller, 8 each) bursting on one branch, which points to CI-gate breakage rather than agent failure. Decomposed by mode, the substantive agent work was clean: all 14 successes ran 3.15–14.05 min and there were zero agent-behavioral failures. Activity was the most concentrated seen to date: 47/50 sessions (94%) on copilot/agentic-token-optimizer. Orphan escalations: 0.

⚠️ Conversation transcripts empty for the 8th consecutive day — analysis is metadata-only. Loop detection, tool-usage, and context-confusion metrics are N/A.

Key Metrics

| Metric | Value | Trend (vs 05-30) |
|---|---|---|
| Total Sessions | 50 | → |
| Successful Completions | 14 (28%) | ↑ (was 26%) |
| Failures | 24 (48%) | ↑↑ (was 18%) |
| Action Required (awaiting approval) | 9 (18%) | ↓ (was 22%) |
| Skipped / Cancelled | 2 / 1 | ↓ |
| Average Duration | 2.70 min | ↑ (was 2.23 min) |
| Loop Detection / Context Issues | N/A (no transcripts) | — |
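
The percentages in the table follow directly from the raw outcome counts. A minimal Python sketch of that arithmetic, using the counts reported here; the dictionary keys are illustrative labels, not confirmed field names from the session data:

```python
# Outcome counts as reported in Key Metrics for 2026-05-31.
# The key names are illustrative labels, not confirmed field names.
counts = {
    "success": 14,
    "failure": 24,
    "action_required": 9,
    "skipped": 2,
    "cancelled": 1,
}

# The five outcome buckets should account for every session.
total = sum(counts.values())

# Share of each outcome as a whole-number percentage of all sessions.
shares = {name: round(100 * n / total) for name, n in counts.items()}

print(total)
print(shares)
```

Summing the buckets is also a cheap consistency check: if the total ever differs from the reported session count, an outcome category is missing from the breakdown.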

📈 Trends

Completion rate (blue) continues its rebound off the 05-25 trough to 28%. The red failure line spikes to 24 (every prior day ≤9) from the smoke-CI burst, while action_required recedes.

Median stays at 0 min (33/50 sessions finish in under 30 s: smoke failures plus skipped and action_required runs), while the average ticks up to 2.70 min. The 14 substantive (>5 min) sessions, the highest count since the 05-23/26 peaks, were all successes, cleanly separating fast CI gates from real agent work.
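
A 0 min median and a 2.70 min average can coexist because a small number of long sessions carry most of the total time. The sketch below works that out from the reported aggregates only. It assumes the 8.16 min average quoted under Success Factors covers the same 14 successful sessions and that both averages are taken over the same 50 sessions; the inputs are rounded in the source, so the results are approximate.

```python
# Aggregates reported in this analysis (rounded in the source).
total_sessions = 50
avg_all_min = 2.70          # average duration across all sessions
substantive_sessions = 14
avg_substantive_min = 8.16  # average duration of the 14 successful sessions

# Total minutes implied by each average.
total_min = total_sessions * avg_all_min
substantive_min = substantive_sessions * avg_substantive_min

# Minutes left over for the other 36 sessions.
remaining_min = total_min - substantive_min

# Fraction of all session time spent in the substantive sessions.
share = substantive_min / total_min

print(f"substantive share of session time: {share:.0%}")
print(f"remaining {total_sessions - substantive_sessions} sessions: {remaining_min:.1f} min")
```

Under these assumptions the 14 substantive sessions account for the large majority of session time, which is why the average moves with them while the median stays pinned at the instant runs.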

Success Factors ✅

Substantive sessions convert cleanly — all 14 sessions >5 min ended in success (3.15–14.05 min, avg 8.16).
Cloud-agent reliability holds — Running Copilot cloud agent succeeded at 14.05 min.
Review/quality workflows succeed — PR Code Quality Reviewer, Matt Pocock Skills Reviewer, Test Quality Sentinel, Design Decision Gate, Running Copilot Code Review all green.
Comment-addressing loop healthy — 3/3 Addressing comment on PR #36075 succeeded (6.1–13.1 min).

Failure Signals ⚠️

Smoke-CI gate burst (dominant) — 24 failures, all zero-duration, three smoke workflows on one branch. Instant + identical-triplet structure ⇒ CI config/gate breakage, not agent behavior.
Extreme branch monoculture — 94% on one branch; its CI health now dominates the whole fleet metric (fragility risk).
Lingering approval friction — 9 (18%) action_required (Q, Agentic Commands, PR Description Updater, Label Closed PRs) await first-party approval.

Prompt Quality Analysis 📝

Per-Prompt Breakdown

⚠️ No transcripts (8th day) ⇒ no interpretation-based scoring. Inferred from names/outcomes:

Higher-converting: review/quality gates and comment-addressing tasks (narrow, concrete) — ~100% success.
Lower-converting: smoke triplets (0/24, but CI breakage not prompt quality); command/approval workflows (Q, Agentic Commands) blocked on approval, not clarity.

Orphaned Branch Escalation Alerts 🚨

Orphaned Branches Today: 0 of 12 open PRs (0%)
Historical Baseline: ~40% — Status: ✅ NORMAL (well below 50% elevated threshold)

Escalation Candidate Details

✅ No orphaned branches exceed the threshold today. All 3 in-progress runs are on main (Failure Investigator, Agentic Workflow Audit Agent, Copilot Session Insights) — no PR branch has active gate firings, so zero branches meet gate-count ≥5.

7 of 12 open PRs are unassigned but idle (0 gate firings): 5× chaos/*, 1× docs/update-dictation-skill, 1× signed/jsweep/.... The 5 Copilot-assigned PRs all have an agent attached.

CI Waste Estimate: ~0 CI-minutes wasted to orphaning (no active gates on unassigned branches).
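
The escalation check described above can be sketched as follows. This is an illustration, not the workflow's actual logic: the `OpenPR` fields, the placeholder branch names, and the rule "unassigned and at or above the gate-count threshold" are assumptions inferred from the wording of this section. Only the two thresholds (gate-count ≥5, 50% elevated rate) come from the report.

```python
from dataclasses import dataclass


@dataclass
class OpenPR:
    branch: str
    assigned: bool      # hypothetical field: an agent or person is attached
    gate_firings: int   # hypothetical field: CI gate runs observed today


GATE_COUNT_MIN = 5           # gate-count threshold quoted in the report
ELEVATED_ORPHAN_RATE = 0.50  # "elevated" threshold quoted in the report


def is_orphan_candidate(pr: OpenPR) -> bool:
    # Assumed definition: nobody attached AND gates still firing at or above the threshold.
    return (not pr.assigned) and pr.gate_firings >= GATE_COUNT_MIN


# Today's shape: 7 unassigned idle PRs and 5 Copilot-assigned PRs, none with gate firings.
# Branch names here are placeholders.
prs = [OpenPR(f"unassigned-{i}", assigned=False, gate_firings=0) for i in range(7)]
prs += [OpenPR(f"assigned-{i}", assigned=True, gate_firings=0) for i in range(5)]

candidates = [pr for pr in prs if is_orphan_candidate(pr)]
orphan_rate = len(candidates) / len(prs)

# Whether the comparison is >= or > is not stated in the report; >= is assumed here.
status = "ELEVATED" if orphan_rate >= ELEVATED_ORPHAN_RATE else "NORMAL"

print(len(candidates), len(prs), status)
```

Requiring both conditions is what keeps the 7 unassigned PRs out of the alert: they have no owner, but with zero gate firings they are not consuming CI.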

Notable Observations

Loop Detection, Tool Usage & Failure-Mode Decomposition

Loop detection: N/A (no transcripts). Proxy: 0 sessions >15 min (max 14.05 min), so duration gives no evidence of runaway loops.
Tool usage: N/A (no transcripts).
Context issues: N/A; the 9 action_required are a gating signal, not confusion.

| Failure type | Count | Median duration | Interpretation |
|---|---|---|---|
| CI-gate (smoke-*) | 24 | 0 s | Pipeline/config breakage |
| Agent-behavioral | 0 | — | No reasoning failures |

Experimental Analysis

Strategy: Failure-Mode Decomposition (CI-gate vs agent-behavioral). Instead of reading all 24 failure conclusions as agent failures (a misleading 48% "failure day"), this run split failures by duration + signature: 100% were zero-duration smoke-CI gate failures on one branch; agent-behavioral failures were 0.

Raw failure rate badly misrepresents agent health amid CI-gate bursts.
Duration discriminates: 0 s ⇒ CI/config; multi-minute ⇒ genuine agent execution.
94% monoculture means one branch's CI can swing the whole fleet metric.

Effectiveness: High. Recommendation: Keep — fold duration-based failure decomposition into the standard metric set.
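
The duration-based split described in this section can be sketched in a few lines. This is a hedged illustration rather than the analysis workflow's code: the `Session` fields, the 1-second cutoff, and the `cloud-agent` label are assumptions, and the sample data only reproduces the shape reported today (three smoke workflows, 8 instant failures each).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    workflow: str      # hypothetical field: workflow name
    conclusion: str    # hypothetical field: e.g. "success", "failure"
    duration_s: float  # hypothetical field: run duration in seconds


def failure_mode(s: Session, instant_threshold_s: float = 1.0) -> Optional[str]:
    """Classify a failed session; return None for non-failures.

    Assumption: a failure that ends within `instant_threshold_s` never
    reached agent execution and is counted as a CI-gate failure.
    """
    if s.conclusion != "failure":
        return None
    if s.duration_s <= instant_threshold_s:
        return "ci_gate"
    return "agent_behavioral"


# Synthetic data mirroring today's report: 3 smoke workflows x 8 zero-duration failures.
sessions = [
    Session(name, "failure", 0.0)
    for name in ("smoke-water", "smoke-trigger", "smoke-multi-caller")
    for _ in range(8)
]
# One long successful run (14.05 min = 843 s) to show non-failures are ignored.
sessions.append(Session("cloud-agent", "success", 843.0))

modes = Counter(m for m in map(failure_mode, sessions) if m is not None)
print(modes)
```

A pure duration cutoff is deliberately crude. It would misfile an agent run that crashes immediately, so in practice the workflow-name signature (smoke-*) is a useful second signal alongside duration.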

Actionable Recommendations

For Users

Keep tasks narrow/concrete (review + comment-addressing hit ~100% today). Before: "improve the optimizer." After: "address review comment on PR #36075 (Optimize PR Sous Chef token usage with early-exit setup gating and tighter processing limits): cut token usage in the setup-gate path."
Check execution duration before reading smoke-red as agent failure.

For System

Tag zero-duration CI failures distinctly so completion metrics reflect agent health. (High)
Cap per-branch concentration in sampling — 94% share makes the metric hostage to one branch. (Medium)
Investigate smoke-* breakage on copilot/agentic-token-optimizer (24/24 instant fails). (High)
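
One simple way to limit per-branch concentration is a count cap applied before metrics are computed. The sketch below assumes sessions arrive as `(branch, outcome)` pairs; the function name, the cap of 10, and the `other-branch` placeholder for the 3 remaining sessions are all illustrative and not taken from the workflow.

```python
from collections import Counter


def cap_per_branch(sessions, max_per_branch):
    """Keep at most `max_per_branch` sessions from each branch, in input order."""
    kept = []
    seen = Counter()
    for branch, outcome in sessions:
        if seen[branch] < max_per_branch:
            kept.append((branch, outcome))
            seen[branch] += 1
    return kept


def share(rows, branch):
    """Fraction of rows that belong to `branch`."""
    return sum(1 for b, _ in rows if b == branch) / len(rows)


DOMINANT = "copilot/agentic-token-optimizer"

# Today's distribution: 47 sessions on one branch, 3 elsewhere (placeholder name).
sessions = [(DOMINANT, "n/a")] * 47 + [("other-branch", "n/a")] * 3
capped = cap_per_branch(sessions, max_per_branch=10)

print(f"{share(sessions, DOMINANT):.0%} -> {share(capped, DOMINANT):.0%}")
```

With these inputs the dominant branch drops from 47 of 50 sessions to 10 of 13. A count cap only partly reduces the share when few other branches are active; reporting per-branch rates alongside the fleet figure is an alternative that discards no data.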

For Tools

Restore conversation-transcript fetch — blocked 8 consecutive days; the single highest-value fix (unlocks loop/tool/context analysis). (Every run since 2026-05-24)

Historical Trends & Statistical Summary

Completion rate: 0% (05-25) → 46% → 22% → 28% → 14% → 26% → 28% — 2nd straight recovery day.
Outcome mix: first failure-dominant day; action_required at a recent low (18%).
Branch concentration: 68% (05-29) → 74% (05-30) → 94% (05-31).
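
The "2nd straight recovery day" reading can be checked mechanically by counting consecutive day-over-day increases at the end of a series. A minimal sketch, assuming the values listed above are consecutive daily readings ending on 05-31; the function name is illustrative:

```python
def trailing_rise_streak(series):
    """Count consecutive day-over-day increases ending at the last value."""
    streak = 0
    # Walk backwards over (previous, current) pairs, starting from the latest day.
    for prev, cur in zip(reversed(series[:-1]), reversed(series[1:])):
        if cur > prev:
            streak += 1
        else:
            break
    return streak


# Completion rate in percent, 05-25 through 05-31, as listed above.
completion_pct = [0, 46, 22, 28, 14, 26, 28]

# Branch concentration in percent, 05-29 through 05-31, as listed above.
concentration_pct = [68, 74, 94]

print(trailing_rise_streak(completion_pct))
print(trailing_rise_streak(concentration_pct))
```

The same helper applies to the branch-concentration series, where a growing streak is a warning sign rather than a recovery.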

Total Sessions:        50
Success:               14 (28%)
Failure:               24 (48%)  [100% zero-duration smoke-CI gates]
Action Required:        9 (18%)
Skipped / Cancelled:    2 / 1
Avg Duration:        2.70 min   Median: 0.00 min
Longest: 14.05 min (cloud agent)   Shortest: 0.00 min
Sessions >5 min:       14 (all succeeded)   >15 min: 0   <30 s: 33
Branch concentration:  47/50 (94%) copilot/agentic-token-optimizer
Orphan escalations:     0   Open PRs: 12 (7 unassigned but idle)
Loop / Context:        N/A (no transcripts, 8th day)
Agent-behavioral failures: 0

Next Steps

Investigate smoke-* CI gate breakage on copilot/agentic-token-optimizer
Restore conversation-transcript fetch (8 days blocked)
Tag CI-gate failures separately from agent outcomes
Monitor branch-concentration drift (now 94%)
Follow-up analysis 2026-06-01

References:

§26705509144: cloud agent (success, 14.05 min)
§26706406391: Addressing comment on PR #36075, "Optimize PR Sous Chef token usage with early-exit setup gating and tighter processing limits" (success)
§26706525425: smoke-water.yml (zero-duration CI failure)

Generated 2026-05-31 · Run ID 26706959577 · Workflow: Copilot Session Insights

Generated by 📊 Copilot Session Insights · opus48 2.4M · expires on Jun 1, 2026, 8:04 AM UTC
