[copilot-cli-research] Copilot CLI Deep Research - 2026-04-15 #26497

2026-04-15T21:20:40Z

github-actions[bot]
bot Apr 15, 2026

Analysis Date: 2026-04-15 | Repository: github/gh-aw | Run: §24478512391
Previous Analysis: 2026-04-14 (§24423068207)

📊 Executive Summary

Scope: 191 total workflows — 101 explicit Copilot (91 string form + 10 block form) + 22 default (no engine specified) = ~123 effective Copilot workflows

Key Finding: The repository exposes a rich set of Copilot CLI capabilities, but most advanced features have remained unused across all three days of tracking. Three persistent gaps stand out: zero version pinning (0% on every tracked day), minimal autopilot adoption (2%), and 8 of 11 custom agent files unwired. Two metrics improved day-over-day: playwright adoption rose from 4% to 10% and cache-memory usage from 19% to 29%, although both remain below their 2026-04-13 levels (12% and 41%).

Primary Recommendation: Enable engine.version pinning for at least critical production workflows to prevent silent regressions when the Copilot CLI auto-updates.

🔴 Critical Findings (High Priority)

1. Zero Version Pinning — Persistent 3-Day Gap

Every single Copilot workflow runs with version: latest (implicit default). With 101 Copilot workflows and no version pins, any breaking Copilot CLI release affects all workflows simultaneously.

```
# Recommended for critical workflows:
engine:
  id: copilot
  version: "1.0.21"  # Current default version
```

2. Bash Wildcard Overuse — Security Surface

37 of 191 workflows (19%) use bash: ["*"], which compiles to --allow-all-tools and grants the agent unrestricted shell access. Any of these workflows that needs only a few commands is more permissive than necessary.

🟡 Medium Priority Opportunities

3. `max-continuations` Autopilot at 2% (Copilot-Exclusive Feature)

Only 2 Copilot workflows use this feature (smoke-copilot with max-continuations: 2, test-quality-sentinel with max-continuations: 40). This is a Copilot-exclusive capability enabling multi-phase autonomous runs — ideal for complex daily analysis or multi-step code improvement workflows.

4. 8 of 11 Custom Agent Files Unused

.github/agents/ contains 11 specialized agent files, but only 2 are wired via engine.agent:

| Status | Agent File |
| --- | --- |
| ✅ Active | `ci-cleaner` (`hourly-ci-cleaner.md`) |
| ✅ Active | `technical-doc-writer` (`technical-doc-writer.md`) |
| ❌ Unused | `adr-writer` |
| ❌ Unused | `agentic-workflows` |
| ❌ Unused | `contribution-checker` |
| ❌ Unused | `create-safe-output-type` |
| ❌ Unused | `custom-engine-implementation` |
| ❌ Unused | `grumpy-reviewer` |
| ❌ Unused | `interactive-agent-designer` |
| ❌ Unused | `w3c-specification-writer` |
| ❌ Unused | `developer.instructions` |

Each unused agent file defines a specialized Copilot persona that could give a matching workflow more focused behavior.

1️⃣ Feature Inventory & Usage Matrix

| Feature | Available | Used | Usage Rate | vs Previous |
| --- | --- | --- | --- | --- |
| `engine.version` | ✅ | 0/101 | 0% | = (persistent) |
| `engine.model` | ✅ | ~9/191 | 5% | ↓ from 7% |
| `engine.agent` | ✅ | 1/101 | 1% | ↓ from 3% |
| `engine.api-target` | ✅ | 0/191 | 0% | = (persistent) |
| `engine.args` | ✅ | 0/191 | 0% | = |
| `engine.env` | ✅ | few | ~2% | - |
| `engine.token-weights` | ✅ | 0/191 | 0% | = |
| `engine.bare` | ✅ | 2/191 | 1% | = |
| `max-continuations` | ✅ (Copilot only) | 2/101 | 2% | = (persistent) |
| `sandbox.agent: awf` | ✅ | 14/191 | 7% | ↓ from 14% |
| `bash: ["*"]` wildcard | ✅ | 37/191 | 19% | new metric |
| `web-fetch` | ✅ | 20/191 | 10% | ↑ from 8% |
| `playwright` | ✅ | 20/191 | 10% | ↑↑↑ from 4% |
| `cache-memory` | ✅ | 55/191 | 29% | ↑ from 19% |
| `safe-outputs` | ✅ | 161/191 | 84% | = |
| `features.copilot-requests` | ✅ | 46/191 | 24% | - |
| `features.mcp-gateway` | ✅ | 0/191 | 0% | = |
| `features.copilot-integration-id` | ✅ | 0/191 | 0% | = |
| `tools.timeout` | ✅ | 5/191 | 3% | new metric |
| `strict: true` | ✅ | 111/191 | 58% | new metric |
| Custom MCP via `shared/mcp/` | ✅ | 28/191 | 15% | new metric |

2️⃣ Detailed Missed Opportunities

🔴 High Priority Opportunities

Opportunity 1: Engine Version Pinning

What: No workflow pins engine.version for Copilot CLI
Why It Matters: Workflows can break silently when the Copilot CLI auto-updates, and unpinned runs cannot be reproduced when debugging
Affected Workflows: All 101 Copilot workflows

How to Implement:

```
engine:
  id: copilot
  version: "1.0.21"  # Current stable version
```

Recommendation: At minimum pin smoke test workflows and high-traffic production workflows
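
The gap can be checked mechanically. The following is a minimal Python sketch, not the project's own tooling: `engine_config` and `needs_pin` are hypothetical helpers that scan a workflow's frontmatter text line by line (a real audit would more likely use a YAML parser). Treating a missing `engine:` key as Copilot is an assumption that mirrors the "default" counting in the scope above.

```python
import re
from typing import Optional, Tuple

def engine_config(frontmatter: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (engine id, engine version); None where a value is absent."""
    engine_id = version = None
    in_engine = False
    for line in frontmatter.splitlines():
        top = re.match(r"^engine:\s*([^\s#]*)", line)
        if top:
            if top.group(1):          # string form: `engine: copilot`
                engine_id = top.group(1)
            else:                     # block form: nested keys follow
                in_engine = True
            continue
        if in_engine:
            if line and not line[0].isspace():
                in_engine = False     # dedent: the engine block has ended
                continue
            kv = re.match(r"^\s+(id|version):\s*[\"']?([^\"'\s#]+)", line)
            if kv and kv.group(1) == "id":
                engine_id = kv.group(2)
            elif kv:
                version = kv.group(2)
    return engine_id, version

def needs_pin(frontmatter: str) -> bool:
    """True for Copilot (explicit or default) workflows with no version pin."""
    engine_id, version = engine_config(frontmatter)
    return engine_id in (None, "copilot") and version is None

print(needs_pin("engine: copilot"))  # True: the string form carries no version
```

Running `needs_pin` over the frontmatter of each file in .github/workflows/ would yield the list of candidates to pin first.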

Opportunity 2: Bash Wildcard Security Surface

What: 37 workflows use bash: ["*"] → compiles to --allow-all-tools (unrestricted shell)
Why It Matters: Violates principle of least privilege; any prompt injection gets full shell
Where: craft.md, smoke-copilot.md, smoke-copilot-arm.md, and 34 others

How to Implement: Replace "*" with specific commands:

```
tools:
  bash:
    - "git"        # compiles to shell(git:*)
    - "gh"         # compiles to shell(gh:*)
    - "jq *"       # specific command pattern
    - "cat *"      # specific command pattern
```
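
To find the affected workflows, a scan along these lines could be used. This is a hedged sketch: `frontmatter` and `uses_bash_wildcard` are hypothetical helper names, the parsing is a line-based approximation rather than a full YAML parse, and it assumes the `bash` list is written either inline (`bash: ["*"]`) or as a block list of `- item` lines.

```python
import re

def frontmatter(text: str) -> str:
    """Return the YAML between the leading '---' fences, or '' if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i])
    return ""

def _value(raw: str) -> str:
    """Drop a trailing comment and surrounding quotes from a scalar."""
    return raw.split(" #")[0].strip().strip("\"'")

def uses_bash_wildcard(text: str) -> bool:
    """True if the workflow's bash tool list contains a bare "*" entry."""
    lines = frontmatter(text).splitlines()
    for i, line in enumerate(lines):
        key = re.match(r"^(\s*)bash:\s*(.*)$", line)
        if not key:
            continue
        indent, rest = len(key.group(1)), key.group(2).split(" #")[0].strip()
        if rest.startswith("["):      # inline form: bash: ["*"]
            items = [_value(x) for x in rest.strip("[]").split(",")]
            if "*" in items:
                return True
            continue
        for follow in lines[i + 1:]:  # block form: one "- item" per line
            item = re.match(r"^(\s*)-\s*(.*)$", follow)
            if not item or len(item.group(1)) < indent:
                break
            if _value(item.group(2)) == "*":
                return True
    return False
```

Patterns such as "jq *" are deliberately not flagged, since they restrict the agent to a single command rather than opening the whole shell.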

🟡 Medium Priority Opportunities

Opportunity 3: Autopilot (`max-continuations`) for Multi-Phase Workflows

What: Only 2 of 101 Copilot workflows use the Copilot-exclusive autopilot mode
Why It Matters: Complex daily/weekly workflows could run more phases autonomously
Candidate Workflows:
- daily-repo-chronicle.md — multi-phase analysis workflow
- weekly-blog-post-writer.md — multi-step creative workflow
- repository-quality-improver.md — iterative improvement workflow

How to Implement:

```
engine:
  id: copilot
  max-continuations: 3   # up to 3 consecutive autopilot runs
timeout-minutes: 60       # increase budget accordingly
```

Opportunity 4: Wire Up Custom Agent Files

What: 8 of 11 .github/agents/ files never referenced via engine.agent
Why It Matters: These files define specialized agent personas and behaviors that could improve output quality

Specific Recommendations:

| Agent File | Best Matching Workflow(s) |
| --- | --- |
| `contribution-checker.agent.md` | `contribution-check.md` |
| `adr-writer.agent.md` | Any architecture workflow |
| `grumpy-reviewer.agent.md` | `pr-nitpick-reviewer.md` |
| `w3c-specification-writer.agent.md` | `spec-extractor.md`, `spec-librarian.md` |
| `agentic-workflows.agent.md` | `craft.md`, `workflow-generator.md` |

How to Implement:

```
engine:
  id: copilot
  agent: contribution-checker   # references .github/agents/contribution-checker.agent.md
```
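
The wiring check itself can be sketched in a few lines of Python. The helper names `referenced_agent` and `unwired_agents` are hypothetical, the input is assumed to be each workflow's frontmatter text, and the `.agent.md` suffix convention is taken from the comment in the snippet above. `str.removesuffix` requires Python 3.9 or later.

```python
import re
from typing import Dict, List, Optional

def referenced_agent(frontmatter: str) -> Optional[str]:
    """Return the name under engine.agent, or None if no agent is wired."""
    in_engine = False
    for line in frontmatter.splitlines():
        if re.match(r"^engine:\s*(#.*)?$", line):
            in_engine = True          # block form: nested keys follow
            continue
        if in_engine:
            if line and not line[0].isspace():
                in_engine = False     # dedent: the engine block has ended
                continue
            m = re.match(r"^\s+agent:\s*[\"']?([\w.-]+)", line)
            if m:
                return m.group(1)
    return None

def unwired_agents(agent_files: List[str], workflows: Dict[str, str]) -> List[str]:
    """Agent names (file name minus .agent.md) that no workflow references."""
    used = {referenced_agent(fm) for fm in workflows.values()}
    names = [name.removesuffix(".agent.md") for name in agent_files]
    return sorted(n for n in names if n not in used)

agents = ["contribution-checker.agent.md", "grumpy-reviewer.agent.md"]
workflows = {
    "contribution-check.md": "engine:\n  id: copilot\n  agent: contribution-checker\n",
    "craft.md": "engine: copilot\n",
}
print(unwired_agents(agents, workflows))  # ['grumpy-reviewer']
```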

Opportunity 5: Model Selection for Cost Optimization

What: Only ~9 of 191 workflows (5%) explicitly select a model; all others use org default
Why It Matters: Simple read-only workflows (e.g., daily summaries, health checks) could use cheaper/faster models; complex coding tasks could use more capable ones
Current Model Usage: gpt-5.1-codex-mini (×5), gpt-5 (×1), gpt-4.1-mini (×1), claude-haiku-4-5 (×1)
Recommendation:
- Light analysis workflows → gpt-4.1-mini or gpt-5.1-codex-mini
- Code generation, refactoring → default or gpt-5
```
engine:
  id: copilot
  model: gpt-4.1-mini   # faster/cheaper for read-only analysis
```

Opportunity 6: Sandbox (AWF) Adoption for Internet-Facing Workflows

What: Only 14/191 (7%) workflows enable network firewall via sandbox.agent: awf
Why It Matters: Workflows fetching external data (news, APIs, web search) run without network restrictions by default
Candidate Workflows: Any workflow with web-fetch, playwright, external API MCP servers (Tavily, Brave, DeepWiki)

How to Implement:

```
sandbox:
  agent: awf
network:
  allowed:
    - defaults
    - github
    - tavily.com   # only what's needed
```

🟢 Low Priority Opportunities

Opportunity 7: `token-weights` for Accurate Cost Tracking (0% adoption)

What: No workflow customizes token cost multipliers
Why It Matters: When using non-default models, effective token calculations may be inaccurate

How to Implement:

```
engine:
  id: copilot
  token-weights:
    multipliers:
      gpt-4.1-mini: 0.3   # cheaper model, lower weight
```

Opportunity 8: `tools.timeout` for Long-Running Tool Calls (3% adoption)

What: Only 5 workflows set per-tool-call timeout; most workflows have no per-call limit
Why It Matters: A single hung bash command can exhaust the entire timeout-minutes budget
Candidate Workflows: Any workflow running make build, make test, or complex scripts

How to Implement:

```
tools:
  timeout: 300   # 5 minutes max per tool call
```

Opportunity 9: `features.mcp-gateway` (0% adoption)

What: MCP gateway feature flag completely unused despite being available
Why It Matters: Provides a centralized proxy for MCP traffic with auth enforcement
How to Implement:
```
features:
  mcp-gateway: true
```

Opportunity 10: `bare: true` for Narrow-Scope Workflows

What: Only 2 workflows disable automatic custom instruction loading (--no-custom-instructions)
Why It Matters: Workflows with narrow, well-defined tasks don't need AGENTS.md context (reduces prompt tokens, prevents behavior drift)
Candidate Workflows: firewall.md, smoke-* tests, validation workflows
How to Implement:
```
engine:
  id: copilot
  bare: true
```

3️⃣ Workflow-Specific Recommendations

`contribution-check.md` — Wire Up Matching Agent File

Current State: Uses engine: copilot with GitHub MCP
Recommended Change: Add engine.agent: contribution-checker
Benefit: The contribution-checker.agent.md file in .github/agents/ is purpose-built for this workflow

`craft.md` — Tighten Tool Permissions

Current State: Uses bash: ["*"] (unrestricted shell)
Recommended Change: Replace with specific commands like git, gh, cat, find
Benefit: Reduce security surface while maintaining functionality

`daily-repo-chronicle.md` — Enable Autopilot

Current State: Single-run workflow that does complex multi-phase analysis
Recommended Change: Set max-continuations to 2 or 3 and raise timeout-minutes to 90
Benefit: Can complete more thorough analysis with continuation budget

`research.md` — Add Version Pin

Current State: Uses engine: copilot with no version pinning
Recommended Change: Pin engine.version (for example "1.0.21") so that research runs are reproducible

`daily-*` analysis workflows — Model Selection

Current State: All use default org model
Recommended Change: Use model: gpt-4.1-mini for simple read + report workflows
Benefit: Faster execution and lower token costs for routine daily runs

4️⃣ Trends & Historical Insights

| Metric | 2026-04-13 | 2026-04-14 | 2026-04-15 | Trend |
| --- | --- | --- | --- | --- |
| Total workflows | 190 | 191 | 191 | stable |
| Copilot workflows | ~92 | ~106 | 101 | fluctuating (counting method) |
| engine.version pinning | 0% | 0% | 0% | ❌ Persistent gap |
| max-continuations | 2% | 2% | 2% | ❌ Persistent gap |
| engine.agent used | 4% | 3% | 1% | ❌ Declining |
| web-fetch | 17% | 8% | 10% | 📈 Recovering |
| playwright | 12% | 4% | 10% | 📈 Recovering strongly |
| cache-memory | 41% | 19% | 29% | 📈 Recovering |
| safe-outputs | 81% | 84% | 84% | ✅ Stable high |
| AWF sandbox | 14% | 14% | 7% | 📉 Declining |
| copilot-requests | 49% | 49% | 24% | 📉 Declining (recounting) |

Observation: The fluctuations in some metrics suggest different counting methodologies across runs (total Copilot pool vs explicit-only). The persistent gaps (version pinning, autopilot, custom agents) are the most reliable signals requiring action.

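Notable positive trend: Playwright adoption recovered by 6 percentage points day-over-day (4% to 10%), which may reflect new visual testing workflows, although the counting differences noted above could account for part of the change.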
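
The percentages and day-over-day deltas in this report follow from simple arithmetic, sketched below in Python. The function names and the trend labels ("flat", "recovering", and so on) are illustrative assumptions, not the labels or logic the research agent actually uses; `trend` assumes at least two data points.

```python
from typing import Sequence

def usage_rate(used: int, total: int) -> int:
    """Whole-percent usage rate, e.g. 37 of 191 workflows."""
    return round(100 * used / total)

def pp_change(previous: int, current: int) -> int:
    """Day-over-day change in percentage points."""
    return current - previous

def trend(series: Sequence[int]) -> str:
    """Label a series of daily percentages (oldest first)."""
    first, prev, last = series[0], series[-2], series[-1]
    if len(set(series)) == 1:
        return "flat"
    if last > prev:
        # Rising again, but still short of where the series started.
        return "recovering" if last < first else "growing"
    if last < prev:
        return "declining"
    return "stable"

print(usage_rate(37, 191))   # 19
print(pp_change(4, 10))      # 6
print(trend([12, 4, 10]))    # recovering
print(trend([14, 14, 7]))    # declining
```

Distinguishing "recovering" from "growing" matters here because several metrics dipped on 2026-04-14 and have not yet returned to their 2026-04-13 values.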

5️⃣ Best Practice Guidelines

Based on 3 days of analysis, here are recommended best practices for Copilot workflows:

- Always specify a model for production workflows: don't leave model selection to org defaults; choose based on complexity
- Pin engine.version for smoke tests: catch Copilot CLI regressions before they affect production workflows
- Prefer specific bash commands over wildcards: bash: ["git", "gh"] instead of bash: ["*"]
- Use max-continuations for complex multi-phase workflows: this is a Copilot-exclusive feature that enables autonomous multi-run pipelines
- Wire up agent files: match .github/agents/*.agent.md files to their corresponding workflows
- Enable strict: true broadly: already at 58% adoption, it should be standard for all non-trivial workflows
- Add tools.timeout: prevent single hung commands from consuming the entire job budget

6️⃣ Action Items

Immediate (this week):

- Add engine.version: "1.0.21" to at least the 3 smoke test workflows
- Wire contribution-checker.agent.md to contribution-check.md
- Replace bash: ["*"] in craft.md with a specific command list

Short-term (this month):

- Audit all 37 workflows using the bash wildcard and tighten permissions
- Add max-continuations to 2-3 complex daily workflows
- Set tools.timeout: 300 on workflows that run builds/tests

Long-term (this quarter):

- Establish an org-wide model selection policy (light tasks → smaller models)
- Enable the AWF sandbox for all web-fetch/playwright workflows
- Create a shared import for standard Copilot configuration

Research Methodology

Source files: All *.md files in .github/workflows/ (191 total)
Engine files examined: pkg/workflow/copilot_engine.go, copilot_engine_execution.go, copilot_engine_tools.go, copilot_mcp.go
Feature constants: pkg/constants/engine_constants.go, pkg/constants/feature_constants.go
Documentation: docs/src/content/docs/reference/engines.md
Repo memory: Previous 2 days of analysis loaded for trend comparison
Analysis approach: grep-based feature counting, frontmatter pattern analysis, codebase review

References

Engine Documentation: docs/src/content/docs/reference/engines.md
Copilot Engine Go Source: pkg/workflow/copilot_engine*.go
Workflow Configuration Reference: .github/aw/github-agentic-workflows.md

References:

§24478512391 — This run
§24423068207 — Previous run (2026-04-14)
§24367037044 — First run (2026-04-13)

Generated by Copilot CLI Deep Research Agent · ● 4.9M · ◷

expires on Apr 16, 2026, 9:20 PM UTC

2026-04-16T21:15:02Z

github-actions[bot]
bot Apr 16, 2026
Author

This discussion has been marked as outdated by Copilot CLI Deep Research Agent.

A newer discussion is available at Discussion #26727.
