Hosted-mode (premium request): agent loop does not autonomously call write/edit tools — even with permissions auto-allowed #3209

@davidatorres

Description

Summary

The Copilot CLI's hosted mode (no BYOK; the default Copilot
Enterprise premium-request path) authenticates and runs the agent
loop correctly, but does not autonomously call write / edit /
create_file tools, even when the prompt explicitly asks for code
to be generated and the workspace is empty. The agent makes
exploratory tool calls (view, glob, shell, grep) and emits
text responses, then closes the session without writing any files.

This makes the Copilot CLI unsuitable for unattended autonomous
codegen workloads (e.g. CI pipelines, factory tiers) regardless of
which model is selected.

Environment

  • copilot CLI: 0.0.353 / 1.0.36-0
  • github-copilot-sdk: 0.3.0 (Python wrapper)
  • Auth: GitHub fine-grained PAT (github_pat_*) with Copilot scope
    in COPILOT_GITHUB_TOKEN env var
  • Models tried (selected via SDK model= parameter):
    claude-sonnet-4.5, gpt-5

Reproducer

import asyncio
from copilot import CopilotClient, SubprocessConfig

# async_handler_returning_allow: callback that grants every
# permission request the CLI surfaces (definition elided).

async def main() -> None:
    client = CopilotClient(
        config=SubprocessConfig(
            github_token="<github_pat_with_copilot_scope>",
            use_logged_in_user=True,
        ),
    )
    session = await client.create_session(
        model="claude-sonnet-4.5",                    # also seen with gpt-5
        working_directory="/tmp/empty-dir",
        github_token="<github_pat_with_copilot_scope>",
        on_permission_request=async_handler_returning_allow,
    )
    # Realistic codegen prompt — full NLSpec/LangSpec inlined,
    # detailed task description, target package layout, acceptance
    # criteria, the works. Same prompt that produces 14 passing
    # files via Anthropic's Claude Agent SDK.
    prompt = "<full codegen prompt, elided>"
    response = await session.send_and_wait(prompt, timeout=60 * 30)
    print(await session.get_messages())  # zero write tool calls

asyncio.run(main())

Expected behavior

The agent — given an explicit prompt asking for code to be written
to a workspace, with permissions auto-granted — calls write_file
/ edit_file / create_file (or whatever tool name the CLI's tool
catalog uses) to produce the requested artifacts. Same prompt
produces working code via Anthropic's Claude Agent SDK
(claude_agent_sdk.query) and OpenAI's Codex SDK
(codex_app_server.Codex).

Actual behavior

  • 12 successful LLM round-trips against claude-sonnet-4.5 over
    ~110 seconds, ~12 premium requests consumed.
  • Tool calls observed in the Copilot CLI log: view ×4, glob
    ×3, shell ×2, grep ×1. Zero write / edit /
    create_file calls.
  • Session closes cleanly without errors. Workspace stays empty.
  • Same outcome with gpt-5 instead of Claude Sonnet 4.5
    (different scaffolding test under BYOK→OpenAI direct).
  • Same outcome with qwen3-coder-next:51b under BYOK→Ollama (where
    the CLI auto-allows but the model still doesn't commit writes).

The pattern is consistent: the Copilot CLI's agent loop appears
biased against autonomous file writes regardless of the underlying
model. Likely the CLI's system prompt or tool-use posture is tuned
for interactive developer assistance ("answer the question,
suggest, let the human review") rather than autonomous codegen.

Why this matters

We're standing up a code-agent abstraction layer above Microsoft
Agent Framework where each vendor's CLI / SDK is a peer adapter.
Anthropic's Claude Agent SDK (claude-sonnet-4.5) and OpenAI's
Codex SDK (gpt-5.4 via Foundry) both produce passing test suites
on the same prompt that yields zero files via Copilot CLI. The
Copilot CLI is the outlier; the model is fine.

For organizations with GitHub Copilot Enterprise quota who want to
use Copilot CLI for unattended autonomous codegen (CI runners,
factory tiers, batch jobs), this behavior gap is a hard block —
even though the auth, billing, and wire layers all work.

Suggested fixes

  1. Add a permission_mode: "bypassPermissions"-style option
    that biases the agent loop toward autonomous file writes when
    the calling environment is non-interactive, mirroring the Claude
    Agent SDK's permission_mode="bypassPermissions". The
    on_permission_request callback handles the wire-level
    approval; the agent loop's "should I write or just suggest"
    disposition is separate and currently leans "suggest."
  2. Document the autonomous-codegen posture clearly so operators
    know the CLI is interactive-first and shouldn't expect it to
    write files autonomously without explicit prompting tweaks.
  3. Surface a system-prompt override / agent-mode selector so
    the SDK caller can opt the agent loop into
    "implement-and-write" mode (the current behavior would remain
    the default for human-in-the-loop usage).
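To make fixes (1) and (3) concrete, this is roughly the shape our
adapter layer would want. Everything here is hypothetical — neither
permission_mode nor a system_prompt_suffix field exists in the
Copilot SDK today; the sketch only illustrates the requested API
surface, mirroring the Claude Agent SDK's permission_mode:

```python
# Hypothetical agent-mode selector for suggested fixes (1) and (3).
AGENT_DIRECTIVES = {
    # Current behavior: answer, suggest, let the human review.
    "interactive": "",
    # Requested behavior: implement-and-write, no human in the loop.
    "bypassPermissions": (
        "You are running unattended. Implement the requested changes "
        "by calling the write/edit/create_file tools directly."
    ),
}

def session_kwargs(model, workdir, permission_mode="interactive"):
    """Build create_session kwargs, folding the mode into a
    system-prompt override until a first-class selector exists."""
    kwargs = {"model": model, "working_directory": workdir}
    directive = AGENT_DIRECTIVES[permission_mode]
    if directive:
        kwargs["system_prompt_suffix"] = directive  # hypothetical field
    return kwargs

print(session_kwargs("claude-sonnet-4.5", "/tmp/empty-dir",
                     permission_mode="bypassPermissions"))
```

The key point is the split: on_permission_request already handles
wire-level approval, so the missing piece is purely the agent loop's
write-vs-suggest disposition.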

Workarounds we don't think work cleanly

  • Prompt engineering ("you MUST write the files using write_file
    tool, ABSOLUTELY DO NOT just describe what you would do"): the
    agent's posture appears system-prompt-rooted, not user-prompt
    steerable in our experiments.
  • Pre-creating empty stub files in the workspace: the agent calls
    view to inspect them but still doesn't edit to populate.
  • Different model selection: tested with claude-sonnet-4.5,
    gpt-5, and qwen3-coder-next:51b (BYOK Ollama). Same outcome
    across all three.
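For completeness, the stub pre-creation attempt from the second
bullet is easy to reproduce; this is roughly what we ran before
starting the session (the file layout is illustrative):

```python
from pathlib import Path

def precreate_stubs(workdir, relpaths):
    """Create empty stub files so the agent only needs to edit
    existing files, never create new ones."""
    for rel in relpaths:
        p = Path(workdir) / rel
        p.parent.mkdir(parents=True, exist_ok=True)
        p.touch()

precreate_stubs("/tmp/empty-dir", ["pkg/__init__.py", "pkg/core.py"])
```

The agent viewed the resulting stubs but never edited them to add
content.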


Companion issue: #

Related (different scope):

Metadata

Assignees

No one assigned

    Labels

    area:agents (Sub-agents, fleet, autopilot, plan mode, background agents, and custom agents)
    area:tools (Built-in tools: file editing, shell, search, LSP, git, and tool call behavior)
