[copilot-cli-research] Copilot CLI Deep Research - 2026-04-21 #27681

2026-04-21T21:17:42Z

github-actions[bot]
Bot Apr 21, 2026

Analysis Date: 2026-04-21
Repository: github/gh-aw
Scope: 197 total workflows — 111 using Copilot engine (87 explicit + 24 via default), 46 Claude, 10 Codex
Previous Analysis: 2026-04-20 Run §24690376692 ← compare trends below

📊 Executive Summary

This is the 5th consecutive daily run of this research agent. The persistent gaps from previous analyses remain unchanged: engine.version is still at 0%, api-target at 0%, blocked-domains at 0%, and max-continuations at only 2 workflows (1%). Meanwhile, positive trends continue: mcp-cli feature is at 80%, copilot-requests is at 23%, and cache-memory usage is strong.

The biggest newly-surfaced finding this run: 45 Copilot workflows have no network configuration at all — neither network: restrictions nor sandbox: agent: awf. This means those workflows have unrestricted outbound network access, which is a security posture gap especially for workflows triggered by external events (issues, PR comments, slash commands).

Primary Recommendation: Address the 45 unrestricted workflows by adding at minimum network: defaults or upgrading them to use AWF sandbox. This is the clearest, highest-impact, actionable improvement available today.

Critical Findings

🔴 High Priority

1. 45 Copilot Workflows With Zero Network Restrictions
Workflows triggered by external events (issues, PRs, slash commands) that have no network: config and no sandbox: config have unrestricted outbound access. An adversary who can trigger such a workflow (e.g., via prompt injection in an issue body) could exfiltrate any context available to the agent.

2. engine.version Still 0% After 5 Days (Critical Stability Risk)
No production workflow pins the Copilot CLI version. A breaking release could simultaneously break all 111 Copilot-powered workflows with no rollback path. This has been flagged in every prior run.

🟡 Medium Priority

3. max-continuations Used by Only 2 of 111 Copilot Workflows (about 2%; 1% of all 197 workflows)
Eight workflows have timeout-minutes ≥ 60, suggesting complex long-running tasks — yet none use max-continuations (Copilot's unique autopilot mode). This feature allows iterative task continuation across multiple agent sessions.

4. 5 Custom Agent Files Completely Unused
grumpy-reviewer, w3c-specification-writer, create-safe-output-type, custom-engine-implementation, and interactive-agent-designer are available in .github/agents/ but referenced by zero workflows.

1️⃣ Current State Analysis

Copilot CLI Capabilities Inventory

Engine Configuration Options (engine: block):

Option	Description	Used in Repo
`engine.id: copilot`	Explicit engine selection	87 workflows
`engine.version`	Pin specific CLI version	0 workflows
`engine.model`	Override AI model	9 workflows
`engine.agent`	Custom agent file from `.github/agents/`	7 workflows
`engine.args`	Extra CLI flags	~10 workflows
`engine.env`	Custom environment variables	~3 workflows
`engine.bare`	Disable custom instructions (`--no-custom-instructions`)	7 workflows
`engine.command`	Custom executable path	0 workflows
`engine.api-target`	GHEC/GHES API endpoint	0 workflows
`engine.driver`	Custom Node.js driver script	0 (used by code scanning)
`max-continuations`	Autopilot with `--autopilot --max-autopilot-continues`	2 workflows
`engine.token-weights`	Custom model cost data	0 workflows
`engine.concurrency`	Job-level concurrency	0 explicitly

CLI Flags Automatically Applied (by gh-aw compiler):

--add-dir /tmp/gh-aw/ — always added
--disable-builtin-mcps — always added (built-in MCP servers disabled)
--no-ask-user — added for v1.0.19+ (fully autonomous mode)
--allow-all-tools — when bash: ["*"] or bash: [":*"]
--allow-tool <name> — per-tool permissions (granular)
--allow-all-paths — when edit: tool is enabled
--no-custom-instructions — when bare: true
--autopilot --max-autopilot-continues N — when max-continuations > 1
--agent <id> — when engine.agent is set
--prompt-file — always (prompt passed via file)
--log-level all --log-dir — always (structured logging)

Sandbox Options:

sandbox: agent: awf — Agent Workload Firewall (process isolation + network firewall)
sandbox: agent: srt — Sandbox Runtime (experimental)

Network Configuration:

network: defaults — Use default ecosystem domains
network: allowed: [defaults, github, node, python, go, ...] — Custom allowlist
network: {} — Deny all (except defaults implicitly)
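
Written out as frontmatter, the three network forms listed above might look like the following. This is a sketch based only on the key names in this report; the alternatives are commented out on the assumption that a workflow sets `network:` once.

```yaml
# Form 1: default ecosystem domains only
network: defaults

# Form 2: custom allowlist (use instead of Form 1)
# network:
#   allowed:
#     - defaults
#     - github
#     - node

# Form 3: empty mapping; per the list above, denies everything
# except the implicit defaults
# network: {}
```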

Feature Flags:

features.mcp-cli: true — Mount MCP servers as CLI commands
features.copilot-requests: true — Use github.token instead of COPILOT_GITHUB_TOKEN
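
As a rough illustration, both flags could be enabled together in workflow frontmatter as shown here. The nesting under a top-level `features:` key is inferred from the dotted names above rather than confirmed against the compiler.

```yaml
features:
  mcp-cli: true            # mount MCP servers as CLI commands
  copilot-requests: true   # use github.token instead of COPILOT_GITHUB_TOKEN
```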

Usage Statistics (2026-04-21 vs 2026-04-20)

Metric	Count	% of Total	Δ from Yesterday
Total workflows	197	100%	=
Copilot (explicit `engine: copilot`)	87	44%	-3
Copilot (default/implicit)	24	12%	—
Total Copilot-effective	111	56%	—
Claude	46	23%	—
Codex	10	5%	—
`engine.version` pinned	0	0%	=
`engine.model` override	9	5%	=
`engine.agent` custom file	7	4%	= (prev count was inflated by awf refs)
`engine.bare`	7	4%	=
`max-continuations`	2	1%	=
`sandbox: agent: awf`	14	7%	=
`mcp-scripts:`	1	1%	=
`features.mcp-cli: true`	157	80%	+1
`features.copilot-requests: true`	45	23%	—
`cache-memory:`	58	29%	—
`strict: true`	117	59%	-14
`network:` explicitly configured	87	44%	—
`web-fetch:` tool	19	10%	=
`web-search:` tool	2	1%	=
`toolsets: [default]` only	45	23%	—

Most Common GitHub Toolset Combinations:

[default] — 45 workflows (broadest, least specific)
[default, actions] — used for CI/log analysis
[default, discussions] — used for community workflows
[pull_requests, repos, issues] — used for PR review workflows
[context, pull_requests] — security/code scanning workflows

2️⃣ Feature Usage Matrix

Feature Category	Available	Used	Not Used	Usage Rate
Engine: basic	engine.id, version, model, bare	3/4	version	75%
Engine: Copilot-specific	agent, max-continuations, driver	2/3	driver (for copilot)	67%
Engine: advanced	command, api-target, env, args, token-weights, concurrency	2/6	api-target, command, token-weights, concurrency	33%
Sandbox	awf, srt	1/2	srt (production)	50%
Tools	bash, edit, github, web-fetch, web-search, playwright, custom-mcp	6/7	(web-search barely)	86%
Network	allowed, blocked	1/2	blocked	50%
Features	mcp-cli, copilot-requests	2/2	—	100%
Custom Agents	11 files available	6/11	5 agent files	55%
GitHub Toolsets	context, repos, issues, prs, actions, discussions, code_security...	partial	most granular sets	~40%

3️⃣ Missed Opportunities

High Priority Opportunities

🔴 Opportunity 1: Network Security for 45 Unrestricted Workflows

What: 45 Copilot workflows have neither network: config nor sandbox: agent: awf
Why It Matters: Without network restrictions, a prompt-injected agent could make arbitrary outbound calls. High-risk for event-triggered workflows.
Where: Any Copilot workflow triggered by slash_command, issues, pull_request, issue_comment

How to Implement: At minimum add network: defaults, or upgrade to AWF:

# Option A: Network allowlist (lightweight)
network:
  allowed:
    - defaults
    - github

# Option B: Full AWF sandbox (strongest isolation)
network:
  allowed:
    - defaults
    - github
sandbox:
  agent: awf

Expected Benefit: Reduces attack surface for prompt injection and data exfiltration

🔴 Opportunity 2: `engine.version` Pinning (Stability)

What: Zero production workflows pin the Copilot CLI version. Any breaking release simultaneously breaks all 111 Copilot workflows.
Why It Matters: A regression in the Copilot CLI (authentication, prompt parsing, tool calling) could cause mass workflow failures with no way to roll back.
Where: All critical production workflows (daily-*, weekly-*, hourly-*)

How to Implement:

engine:
  id: copilot
  version: "1.0.21"   # Pin to known-good version

Expected Benefit: Zero unplanned Copilot CLI upgrades; stable CI for production workflows
Note: This has been flagged in every prior research run (5 consecutive days)

Medium Priority Opportunities

🟡 Opportunity 3: `max-continuations` for Long-Running Tasks

What: Only 2 workflows use Copilot's autopilot mode (max-continuations). Eight workflows have 60+ minute timeouts — these could benefit.
Why It Matters: max-continuations enables iterative task completion where one session ends and another picks up, allowing complex tasks to complete reliably without hitting context limits.
Where:
- agent-persona-explorer.md (180 min timeout, currently on the Claude engine): a candidate only if migrated to Copilot
- aw-failure-investigator.md (60 min) — diagnostic tasks benefit from continuation
- daily-team-evolution-insights.md (90 min) — data analysis benefits from iteration
- org-health-report.md (60 min) — large report generation

How to Implement:

engine:
  id: copilot
  max-continuations: 5   # Allow up to 5 continuation sessions

Expected Benefit: Tasks that currently time out or only partially complete should finish more reliably

🟡 Opportunity 4: Unused Custom Agent Files

What: 5 custom agent files in .github/agents/ are unused.
Why It Matters: Custom agents encode specialized behavior, personality, and instructions that can improve workflow quality for specific use cases.
Unused Files:
- grumpy-reviewer.agent.md — could power a strict code review workflow
- w3c-specification-writer.agent.md — could power spec-writing workflows
- create-safe-output-type.agent.md — could automate new output type creation
- custom-engine-implementation.agent.md — could guide engine development
- interactive-agent-designer.agent.md — could power interactive agent design

How to Implement:

engine:
  id: copilot
  agent: grumpy-reviewer   # reference by filename without .agent.md

Example Use Case: Add grumpy-reviewer as an alternative agent option in code review workflows for more thorough critique.

🟡 Opportunity 5: Over-Provisioned GitHub Toolsets

What: 45 workflows use toolsets: [default] which provides broad GitHub access (repos, issues, pull_requests, context). Many only need a subset.
Why It Matters: Principle of least privilege — reducing tool access limits blast radius of a compromised or misbehaving agent.
Where: Read-only reporting workflows, single-domain workflows

How to Implement:

# Instead of:
tools:
  github:
    toolsets: [default]

# Use the minimal required toolsets (pick one):
tools:
  github:
    toolsets: [issues]           # for issue-only workflows
    # toolsets: [repos]          # for repo-only workflows
    # toolsets: [context]        # for context-only (repo metadata)

Low Priority Opportunities

🟢 Opportunity 6: `bare` Mode for Analytical/Creative Workflows

What: Only 7 workflows use bare: true (disables custom instructions loading). Many creative, analytical, and standalone workflows could use this.
Why It Matters: Prevents unintended cross-contamination from AGENTS.md or .github/copilot-instructions.md which are designed for development workflows, not analytical ones.
Candidates: poem-bot.md, daily-fact.md, agent-persona-explorer.md, constraint-solving-potd.md (already uses it), daily-news.md (already uses it)
Implementation: Add bare: true to workflow frontmatter
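
A minimal sketch of that change, assuming `bare` sits under the `engine:` block as the `engine.bare` entry in the capabilities inventory suggests:

```yaml
engine:
  id: copilot
  bare: true   # compiler then passes --no-custom-instructions
```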

🟢 Opportunity 7: `blocked-domains` for Defense in Depth

What: Zero workflows use network.blocked to explicitly deny specific domains.
Why It Matters: In addition to allowed lists, blocking known bad or unnecessary domains adds another security layer.

How to Implement:

network:
  allowed:
    - defaults
    - github
  blocked:
    - pastebin.com
    - webhook.site

🟢 Opportunity 8: `mcp-scripts` for Dynamic Tool Access

What: Only security-review.md uses mcp-scripts. This feature allows runtime-configurable MCP tools without pre-compiling new MCP server configs.
Why It Matters: Enables flexible tool configurations for workflows that need dynamic access patterns.
Where: Workflows that need conditional tool access based on input parameters

🟢 Opportunity 9: `web-search` vs `web-fetch` for Research

What: 19 workflows use web-fetch but only 2 use web-search. Research-oriented workflows using web-fetch to manually construct URLs may benefit from semantic search.
Candidates: research.md, blog-auditor.md, workflows that search documentation
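
One possible shape for enabling both tools in a research workflow is sketched here. The key names come from the usage table in this report; declaring them with empty values, with no further options, is an assumption.

```yaml
tools:
  web-fetch:    # retrieve pages at known URLs
  web-search:   # discover sources by query instead of hand-built URLs
```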

🟢 Opportunity 10: Model Override for Cost Optimization

What: Only 9 workflows use engine.model override. For simple/quick tasks, a smaller model (e.g., gpt-4.1-mini) can reduce costs significantly.
Candidates: auto-triage-issues.md (already uses gpt-4.1-mini ✅), classification workflows, simple generation tasks
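
For a classification-style workflow, the override might look like this. The model name is the one this report attributes to auto-triage-issues.md; whether it suits any other workflow is an assumption to verify per task.

```yaml
engine:
  id: copilot
  model: gpt-4.1-mini   # smaller model for simple/quick tasks
```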

4️⃣ Specific Workflow Recommendations

High-Value Workflow-Specific Recommendations

`agent-persona-explorer.md` (180 min timeout)

Current: 180-minute timeout, Claude engine, no max-continuations
Recommended: Consider max-continuations: 10 if migrating to Copilot, or verify Claude max-turns is set appropriately
Benefit: Reliability for very long exploration tasks

`aw-failure-investigator.md` (60 min timeout, Copilot)

Current: 60-minute timeout, engine: copilot, no max-continuations
Recommended: Add max-continuations: 3 to allow iterative investigation when initial investigation times out
Benefit: More thorough failure investigations

`org-health-report.md` (60 min timeout)

Current: 60-minute timeout, Copilot
Recommended: Add max-continuations: 3 for large organization analysis
Benefit: Complete reports even for large orgs

`daily-security-red-team.md` (60 min timeout, no network config)

Current: 60-minute timeout, no network: config, no sandbox:
Recommended: Urgently add AWF sandbox — security workflows should have the strictest isolation
```
network:
  allowed:
    - defaults
sandbox:
  agent: awf
```

`code-simplifier.md` (uses only `toolsets: [default]`)

Current: Uses broad [default] toolsets
Recommended: Reduce to toolsets: [repos] since it only needs file access
Benefit: Reduced API scope

`glossary-maintainer.md` and `technical-doc-writer.md`

Current: Use custom agent files ✅ — good examples
Recommended: Document these as canonical examples of custom agent usage for other workflow authors

5️⃣ Trends & Insights

Historical Trends (5-Day Analysis)

Feature	Apr 16	Apr 17	Apr 20	Apr 21	Trend
Total workflows	192	~195	197	197	Stable
engine.version (Copilot)	0%	0%	0%	0%	🔴 No change
engine.api-target	0%	0%	0%	0%	🔴 No change
max-continuations	1%	1%	1%	1%	🔴 No change
mcp-scripts	1%	1%	1%	1%	🔴 No change
AWF sandbox	7%	7%	7%	7%	⚠️ Flat
engine.args	0%	0%	5%	5%	✅ Adopted
engine.env	0%	0%	2%	2%	✅ Adopted
mcp-cli feature	~78%	~79%	79%	80%	✅ Growing
strict mode	58%	~62%	66%	59%	⚠️ Declined
custom agent files	4%	4%	4%	4%	Stable
cache-memory	~40%	~41%	50%	29%	⚠️ Metric variation

Key Observations:

engine.version has been at 0% for every single analysis run — this is the most persistent gap
max-continuations is used by only 2 workflows, so Copilot's most distinctive feature is barely used
strict mode declined from 66% to 59% since the previous run (14 fewer workflows), which needs investigation
engine.args and engine.env adoption started on Apr 20 and stabilized

6️⃣ Best Practice Guidelines

Based on 5 daily research runs, these are the recommended best practices for Copilot workflows:

Security-first: Every workflow triggered by external events should have network: allowed: [defaults, github] at minimum, and AWF sandbox for workflows processing untrusted content
Lean toolsets: Use the minimal required GitHub toolset ([context], [issues], [repos]) rather than [default]
Version stability: Pin engine.version for daily/weekly production workflows; use latest only for test/smoke workflows
Use bare: true for creative, analytical, or content-generation workflows that don't need dev-environment context
Custom agents for specialization: When a workflow has a clear persona or specialized behavior, create and reference a .github/agents/*.agent.md file
max-continuations for complex tasks: Any workflow with timeout-minutes: 60+ should consider max-continuations: 3-5
copilot-requests: true for internal/trusted workflows (uses github.token instead of COPILOT_GITHUB_TOKEN secret)
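
Taken together, the guidelines above could translate into frontmatter along these lines for an event-triggered, long-running Copilot workflow. This is a sketch assembled from the snippets elsewhere in this report, not a validated configuration; the version number is the example value used under Opportunity 2, and the chosen toolset is hypothetical.

```yaml
engine:
  id: copilot
  version: "1.0.21"       # example value; pin to a known-good release
  max-continuations: 3    # only for long-running tasks (timeout-minutes: 60+)
network:
  allowed:
    - defaults
    - github
sandbox:
  agent: awf              # for workflows processing untrusted content
tools:
  github:
    toolsets: [issues]    # hypothetical: narrowest toolset covering the task
features:
  copilot-requests: true  # internal/trusted workflows only
```

Creative or analytical workflows would additionally set `bare: true` under `engine:`, per the guideline above.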

7️⃣ Action Items

Immediate Actions (this week):

Add network: defaults to the 45 Copilot workflows without any network config
Pin engine.version for at least the most critical daily/weekly workflows
Urgently add AWF sandbox to daily-security-red-team.md (security workflow should not have unrestricted network)

Short-term (this month):

Add max-continuations: 3-5 to aw-failure-investigator.md, org-health-report.md, daily-team-evolution-insights.md
Explore use cases for grumpy-reviewer.agent.md and interactive-agent-designer.agent.md
Audit and tighten toolsets: [default] in the 45 over-provisioned workflows

Long-term (this quarter):

Investigate mcp-scripts for dynamic tool access patterns in complex research workflows
Document and socialize engine.version pinning as a required best practice
Build an automated check that flags new event-triggered workflows without network config

Supporting Evidence & Methodology

📚 References

Copilot Engine Implementation: pkg/workflow/copilot_engine.go
Copilot Execution Logic: pkg/workflow/copilot_engine_execution.go
Engine Documentation: docs/src/content/docs/reference/engines.md
Previous Research: Stored in repo-memory branch memory/copilot-cli-research

Research Methodology

Phase 1 (Capabilities): Read copilot_engine.go, copilot_engine_execution.go, copilot_engine_tools.go, copilot_mcp.go and engine.go to catalog all available Copilot CLI features and configuration options
Phase 2 (Usage): Scanned all 197 .md workflows in .github/workflows/ with grep patterns to count adoption of each feature
Phase 3 (Gaps): Cross-referenced available features with actual usage to find unused or underused capabilities
Phase 4 (Trends): Loaded previous analysis from repo-memory (/tmp/gh-aw/repo-memory/default/copilot-research-latest.json) to compare with today's findings
Phase 5 (Persistence): Updated repo-memory with current findings for future trend analysis

Data sources: Go source files, workflow markdown frontmatter, .github/agents/ directory, previous repo-memory JSON.

References:

§24746483988 — This workflow run
§24690376692 — Previous research run (2026-04-20)
§24639070790 — Run from 2026-04-19

Generated by Copilot CLI Deep Research Agent · ● 2.9M · ◷

expires on Apr 22, 2026, 9:17 PM UTC

ReadyMulugeta · 2026-04-22T13:04:46Z

ReadyMulugeta
Apr 22, 2026

Greetings all. I came across an interesting resource that can be used as a Google Scholar alternative when bulk access to scholarly literature is needed. ScholarAPI (scholarapi.net?via=mhsvo0) is essentially an API over a large corpus of academic literature (metadata + PDFs), aggregated from thousands of journals and repositories worldwide. If you're doing large-scale literature work (bibliography searches, text mining & AI, reviewer support), it might save a lot of time compared with manual searches or scraping Google Scholar. It also has guides to typical workflows like literature monitoring, AI fine-tuning, and plagiarism checking, which could be useful for research tooling or library/IT projects. scholarapi.net?via=mhsvo0

0 replies

2026-04-22T21:20:53Z

github-actions[bot]
Bot Apr 22, 2026
Author

This discussion has been marked as outdated by Copilot CLI Deep Research Agent.

A newer discussion is available at Discussion #27897.

0 replies
