[copilot-session-insights] Daily Copilot Agent Session Analysis — 2026-05-17 #32778

2026-05-17T07:42:40Z

github-actions[bot]
Bot May 17, 2026

Executive Summary

Sessions Analyzed: 50 (workflow runs sampled from the recent action queue)
Analysis Period: 2026-05-17 05:51:37Z → 06:17:40Z (~26 min wall window)
Completion Rate: 8% (4 success / 46 action_required)
Copilot Cloud Agent Runs: 4 — all success, average duration 13.5 min
Orphan Escalations: 0 (no PR branch has ≥5 simultaneous in-progress gates)
Experimental Strategy: none this run (standard analysis only)
Data Quality: infrastructure-only — no conversation transcripts available for the 9th consecutive run; reasoning-level analysis was not possible. See "Data Limitations" below.

Key Metrics

Metric	Value	Trend vs 2026-05-16
Total Sessions	50	→
Successful (`success`)	4 (8%)	→
Stuck on Approval (`action_required`)	46 (92%)	→
Failed / Abandoned	0 (0%)	→
Copilot agent runs	4	→
Copilot agent success rate	100%	→
Avg Copilot agent duration	13.5 min	↑ (was ~11.5 min)
Unique active branches	6	→
Top branch share of queue	22% (11/50)	↓ (was 34%)
Orphaned branches	0	→

📈 Session Trends Analysis

Completion Patterns

The "action_required" share has held at 92% for two consecutive runs (05-16 and 05-17), in line with the dominant pattern of the last week. The 05-13 zero-success day remains the visible anomaly. Today matches the 4-success / 8% completion-rate baseline seen on 05-10, 05-11, and 05-16.

Duration & Efficiency

Average Copilot agent duration climbed to 13.5 min, the highest in the 7-day window, with the four runs ranging from 8 to 20 minutes. Every run stayed in the >5-minute "high-success" duration band that prior analysis correlated with ~100% completion. Loop and retry patterns cannot be detected from infrastructure logs alone, so none are reported.

Success Factors ✅

Inferred from run metadata (transcripts unavailable):

Long, single-pass agent runs: All 4 successful agent runs (3 "Running Copilot cloud agent" and 1 "Addressing comment on PR") took 8–20 minutes. Per the historical-trend strategy in session-analysis-strategies, durations >5 min correlate with ~100% success.
PR-anchored agent invocations: The single "Addressing comment on PR feat: surface denied commands and fix prompt in agent failure reports #32759" run succeeded. This is consistent with comment-anchored runs completing reliably, but no open-ended invocations appear in this dataset, so the two completion rates cannot be compared directly.
Multiple parallel agent branches: Today 4 distinct copilot/* branches each completed an agent run successfully, with no cross-branch interference visible.

Failure Signals ⚠️

Stuck-on-approval is the dominant outcome (92%): 46 of 50 runs ended action_required, not failure. The bottleneck is the approval gate, not agent quality. Same dominant signal as 05-16.
Workflow concentration on approval-required workflows: Agentic Commands (12), Q (12), and Smoke CI (7) make up 62% of all queued runs and all of them currently require manual approval.
No fast-feedback path: The smallest action_required workflow waits in the same approval queue as the largest; no per-workflow, risk-tiered approval policy is visible in the data.

Prompt Quality Analysis 📝

Limitation: No conversation transcripts were delivered to /tmp/gh-aw/session-data/logs/ (9th consecutive run with this gap). Direct prompt-quality scoring was not possible. The observations below are inferred from branch names and PR titles only.

Inferred prompt patterns from branch naming

copilot/investigate-safe-output-issue and copilot/investigate-safe-output-issue-again — verb-led, specific subsystem named ("safe-output"); the "-again" suffix implies a retry from a prior partial outcome. Both ran to success today.
copilot/scan-repeated-permission-denied-issues — verb + specific error class; success on follow-up comment run.
copilot/add-shared-agentic-workflow — verb + clear deliverable. Success.
copilot/grafana-otel-advisor-otlp-export-failure-improveme (truncated branch name) — long, telemetry-flavored; remained in approval-required queue for 5 of today's gate runs. Suggests a complex task surface that may benefit from being scoped down.

These are surface signals only and should not be used to score individual sessions until transcripts are available.

Orphaned Branch Escalation Alerts 🚨

Summary

Orphaned Branches Today: 0 out of 10 open PRs (0%)
Historical Baseline: ~40% (per spec)
Status: ✅ NORMAL — orphan rate well below baseline (now 6 consecutive days)

Escalation Candidates

✅ No orphaned branches exceed the escalation threshold today.

Why 0 orphans? Detection-logic context

The orphan-escalation rule (≥5 in-progress gate firings + no copilot-swe-agent assignee + >1h wait) is purpose-built for the case where CI is actively burning on a branch that has no one to drive it. Today's snapshot of status=in_progress runs found only 2 — both on main and triggered by automation (this workflow and Failure Investigator). No PR branch has any in-progress gates right now.
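The three-part rule above can be read as a simple predicate. The following is a minimal sketch, not the workflow's actual implementation: the `BranchSnapshot` type and its field names are hypothetical, and "no copilot-swe-agent assignee" is interpreted here as "the assignee is anything other than copilot-swe-agent, including none".

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

# Thresholds quoted from the rule described in this report.
MIN_IN_PROGRESS_GATES = 5
MIN_WAIT = timedelta(hours=1)
AGENT_ASSIGNEE = "copilot-swe-agent"


@dataclass
class BranchSnapshot:
    """Hypothetical per-branch view; field names are illustrative only."""
    name: str
    in_progress_gates: int
    assignee: Optional[str]
    wait: timedelta


def is_orphan_escalation(branch: BranchSnapshot) -> bool:
    """A branch escalates only when all three conditions hold."""
    return (
        branch.in_progress_gates >= MIN_IN_PROGRESS_GATES
        and branch.assignee != AGENT_ASSIGNEE
        and branch.wait > MIN_WAIT
    )


# No assignee and a long wait, but zero in-progress gates: below the threshold.
unassigned_idle = BranchSnapshot("chaos/example", 0, None, timedelta(minutes=95))
print(is_orphan_escalation(unassigned_idle))  # False
```

Because the conditions are combined with `and`, a branch with no in-progress gates can never escalate, regardless of how long it has waited or who is assigned.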

Separately, the 92% action_required share is a different problem: completed runs queued at the approval gate. Those are tracked by the "Failure Signals" section above, not by the orphan-escalation rule. Consider adding an "approval-bottleneck" severity tier to capture this in future runs.

The 3 chaos/* PRs (created 05:54Z, ~95 min ago) and the signed/jsweep/* PR have no assignee, but they trigger zero in-progress gates and so are below the threshold.

CI Waste Estimate

Orphaned gate-hours today: 0
Recoverable capacity: N/A — no orphaned CI capacity to recover this cycle

Notable Observations

Branch Concentration

Per-branch run breakdown

Branch	Runs	action_required	success
`copilot/scan-repeated-permission-denied-issues`	11	10	1
`copilot/add-shared-agentic-workflow`	9	8	1
`copilot/investigate-safe-output-issue`	9	8	1
`copilot/investigate-safe-output-issue-again`	9	8	1
`copilot/sergo-adopt-ispermissionerror-helper`	7	7	0
`copilot/grafana-otel-advisor-otlp-export-failure-improveme`	5	5	0

Queue is flatter than 05-16 (top branch holds only 22% today vs 34% yesterday).
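As a cross-check, the headline figures can be recomputed from the per-branch rows. This is a small sketch that only re-adds the numbers in the table above; the variable names are illustrative.

```python
# (branch, runs, action_required, success) rows copied from the table above.
rows = [
    ("copilot/scan-repeated-permission-denied-issues", 11, 10, 1),
    ("copilot/add-shared-agentic-workflow", 9, 8, 1),
    ("copilot/investigate-safe-output-issue", 9, 8, 1),
    ("copilot/investigate-safe-output-issue-again", 9, 8, 1),
    ("copilot/sergo-adopt-ispermissionerror-helper", 7, 7, 0),
    ("copilot/grafana-otel-advisor-otlp-export-failure-improveme", 5, 5, 0),
]

total_runs = sum(r[1] for r in rows)
total_action_required = sum(r[2] for r in rows)
total_success = sum(r[3] for r in rows)

# Branch with the most runs, and its share of the whole queue.
top_branch, top_runs = max(((r[0], r[1]) for r in rows), key=lambda x: x[1])
completion_rate = total_success / total_runs
top_share = top_runs / total_runs

print(total_runs, total_action_required, total_success)  # 50 46 4
print(f"{completion_rate:.0%} {top_share:.0%}")          # 8% 22%
```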

Workflow Fingerprint

Most-fired workflows: Agentic Commands (12), Q (12), Smoke CI (7), CJS (4), AI Moderator (3), CGO (3), Content Moderation (3), Running Copilot cloud agent (3)
All 4 successful runs are agent workflows: 3 "Running Copilot cloud agent" and 1 "Addressing comment on PR"
"Running Copilot cloud agent" success rate: 3/3 = 100%

Sweep-After-Success Pattern (recurring)

Three of the four success runs are tightly co-located in time with the action_required gate bursts on the same branch (within ~6 min). This matches the sweep-after-success pattern recorded on 2026-05-16: the agent finishes, the full gate set fires, and every gate parks on approval.
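One way such a pattern could be detected from run metadata is sketched below. It assumes each run carries a branch, a conclusion, and a finish time; the `Run` type, its field names, and the sample timestamps are hypothetical and are not taken from the report's data. The 6-minute window mirrors the "~6 min" figure above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Run:
    """Hypothetical run record; field names are illustrative."""
    branch: str
    conclusion: str          # "success" or "action_required"
    finished_at: datetime


def sweep_after_success(runs: List[Run],
                        window: timedelta = timedelta(minutes=6)) -> Dict[str, int]:
    """Count action_required runs within `window` of a success on the same branch."""
    bursts: Dict[str, int] = {}
    for success in (r for r in runs if r.conclusion == "success"):
        nearby = [
            r for r in runs
            if r.branch == success.branch
            and r.conclusion == "action_required"
            and abs(r.finished_at - success.finished_at) <= window
        ]
        if nearby:
            bursts[success.branch] = bursts.get(success.branch, 0) + len(nearby)
    return bursts


# Made-up timestamps for illustration only.
t0 = datetime(2026, 5, 17, 6, 0)
sample = [
    Run("copilot/example-a", "success", t0),
    Run("copilot/example-a", "action_required", t0 + timedelta(minutes=2)),
    Run("copilot/example-a", "action_required", t0 + timedelta(minutes=5)),
    Run("copilot/example-b", "action_required", t0 + timedelta(minutes=3)),
]
print(sweep_after_success(sample))  # {'copilot/example-a': 2}
```

Branches with gate runs but no success (like `copilot/example-b` in the sample) are left out, so the result isolates gate bursts that follow an agent success.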

Experimental Analysis

This run did NOT include an experimental strategy — random roll = 80 (threshold <30). Standard analysis only.

Actionable Recommendations

For Users Writing Task Descriptions

Anchor agent runs to a concrete PR or comment when possible: every successful agent run today was tied to a specific PR branch. Untargeted invocations do not appear in this dataset, so this describes the observed flow rather than a measured comparison.
Keep duration > 5 min in mind: In historical data, tasks the agent finishes quickly correlate with lower success. If the agent finishes in under 5 minutes, treat the result as a partial answer pending verification rather than a completed task.

For System Improvements

Add an "approval-bottleneck" severity tier: The current orphan-detection rule misses today's dominant failure mode (92% of runs stuck at approval). Potential impact: High. Daily reports have flagged 0 orphans for 6 consecutive days while action_required dominated the queue (92% on both 05-16 and 05-17).
Per-workflow approval policy review: Smoke CI, CJS, and AI Moderator all enter the same approval queue as higher-risk workflows. Potential impact: Medium. Risk-tiering could let low-risk workflows run without manual approval while keeping security-sensitive ones gated.
Reduce sweep-after-success churn: After a Copilot agent success run, a full gate set re-fires on the same branch within minutes and parks on approval. Consider a short cool-down or merging the agent run + post-agent gates into a single approval. Potential impact: Medium — would directly reduce action_required volume.
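The proposed "approval-bottleneck" tier could be as simple as a threshold on the action_required share. The sketch below is illustrative only: the function name, the tier labels, and the 90% / 50% cut-offs are assumptions, not values from any existing spec.

```python
def approval_bottleneck_tier(action_required: int, total: int) -> str:
    """Classify the approval queue by the share of runs stuck on approval.

    The 0.90 and 0.50 cut-offs are hypothetical placeholders.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    share = action_required / total
    if share >= 0.90:
        return "high"
    if share >= 0.50:
        return "medium"
    return "normal"


# Today's figures: 46 of 50 runs ended action_required.
print(approval_bottleneck_tier(46, 50))  # high
```

A share-based tier is independent of the orphan rule, so it would still fire on days like today when no branch has in-progress gates.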

For Tool Development

Conversation transcript ingestion: For the 9th run in a row, /tmp/gh-aw/session-data/logs/ arrived empty. Frequency: every run since 2026-05-06. Without transcripts, behavioral analysis (loop detection, prompt-quality scoring, error-recovery analysis) cannot run. This is the highest-leverage tooling fix on the open list.

Trends Over Time

Completion rate: 4% → 8% → 8% → 16% → 0% → 8% → 8% across the 7-day window. The baseline is 8% (4 of the 7 data points); the 05-13 zero-success day is the largest deviation, with single data points at 4% and 16%.
Orphan rate: 0% every day for 6 consecutive runs. Baseline of 40% per spec is now clearly stale; recommend lowering it or replacing it with the approval-bottleneck metric.
Copilot agent quality: 100% success rate on the runs that surface in metadata. Average duration drifted up to 13.5 min today (was 10–12 in prior days), all within the high-success duration band.
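The 8% baseline can be checked against the completion-rate series listed above. This short sketch uses only the seven percentages from that line; treating the median and mode as the "baseline" is an interpretation, not something the report defines.

```python
import statistics

# Completion rates (percent) across the 7-day window, in the order reported.
rates = [4, 8, 8, 16, 0, 8, 8]

print(statistics.median(rates))  # 8
print(statistics.mode(rates))    # 8
print(rates.count(8))            # 4
```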

Statistical Summary

Total Sessions Analyzed:     50
Successful Completions:       4 (8.0%)
Action-required (gate-stuck): 46 (92.0%)
Failed Sessions:              0 (0.0%)
Abandoned Sessions:           0 (0.0%)
In-progress Sessions:         0 (0.0% — only main-branch runs are in-progress)

Copilot Cloud Agent Runs:     4
Copilot Agent Success Rate:   100%
Avg Copilot Agent Duration:   13.5 min
Median Copilot Agent Duration: 13.0 min
Longest Agent Run:            20 min
Shortest Agent Run:           8 min

Unique Active Branches:       6
Top Branch Share of Queue:    22% (11/50)
Orphaned Branches:            0
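The four duration figures above are mutually consistent, which can be verified without knowing the two middle run times. With four runs, the median is the mean of the two middle values, and those two values must account for whatever the shortest and longest runs leave of the total. The helper name below is illustrative.

```python
def implied_median_of_four(mean: float, shortest: float, longest: float) -> float:
    """For four values, derive the median from the mean and the two extremes."""
    middle_sum = 4 * mean - shortest - longest  # sum of the two middle values
    return middle_sum / 2


# Reported: mean 13.5 min, shortest 8 min, longest 20 min.
print(implied_median_of_four(13.5, 8, 20))  # 13.0
```

The result matches the reported median of 13.0 min.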

Data Limitations

Conversation transcripts (/tmp/gh-aw/session-data/logs/*-conversation.txt) are not present. As a result, every analysis section depending on agent internal monologue (loop detection, prompt clarity scoring, error-recovery analysis, planning quality) is inferred from run metadata only, not measured.
Orphan-escalation logic uses status=in_progress runs from the last 6 hours. Today's only in-progress runs were on main, so no PR branch could escalate. The 92% action_required share is the real waste signal but is outside the current escalation rule.

Next Steps

Track approval queue depth as a first-class metric in the next run.
Investigate why /tmp/gh-aw/session-data/logs/ is consistently empty (9 consecutive runs).
Consider adding an "approval-bottleneck" severity tier to the escalation rule.
Re-evaluate the 40% orphan baseline — historical data has been at 0% for 6 days running.

References:

§25984654289 — this workflow run
§25982925625 — longest successful Copilot agent run today (20 min, copilot/add-shared-agentic-workflow)
§25983029890 — Copilot agent run on copilot/investigate-safe-output-issue-again

Generated by 📊 Copilot Session Insights · ● 12.6M · ◷

expires on May 18, 2026, 7:42 AM UTC

2026-05-18T07:58:46Z

github-actions[bot]
Bot May 18, 2026
Author

This discussion was automatically closed because it expired on 2026-05-18T07:42:40.675Z.

Closed by Workflow
