System prompt labels shell commands as tools, causing invalid tool calls and autopilot cost #3043

@qoli

Describe the bug

In Copilot CLI non-interactive/autopilot sessions, the init system prompt labels shell commands as "Available tools":

<environment_context>
...
* Available tools: git, curl, gh
</environment_context>

In the observed session, curl was available as a shell command, but not as a callable Copilot tool/function. The model treated it as a callable tool and emitted:

tool= curl args= {'url': 'https://app.notion.com/p/...'}

The runtime then returned:

Tool 'curl' does not exist.

This looks like prompt-surface ambiguity rather than only a model mistake: the system prompt uses the word "tools" for shell commands while callable tools are also exposed to the model as tools.
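
The two namespaces can be illustrated with a small sketch of a dispatcher that keeps callable tools and shell commands apart. Everything here is hypothetical: `CALLABLE_TOOLS`, `SHELL_COMMANDS`, and `resolve_tool_call` are illustrative names, not Copilot CLI internals, and the extra hint is a suggestion, not what the runtime currently returns.

```python
# Hypothetical sketch; this is not Copilot CLI's actual dispatcher.
CALLABLE_TOOLS = {"bash", "task", "read_agent"}  # assumed callable tool names
SHELL_COMMANDS = {"git", "curl", "gh"}           # commands listed in the init prompt


def resolve_tool_call(name: str) -> str:
    """Return 'ok' for a callable tool, otherwise an error message."""
    if name in CALLABLE_TOOLS:
        return "ok"
    message = f"Tool '{name}' does not exist."
    if name in SHELL_COMMANDS:
        # Assumed extra hint that points the model back at the shell tool.
        message += f" '{name}' is a shell command; run it through the bash tool."
    return message


print(resolve_tool_call("curl"))
```

With a hint like this, the error from the observed `curl` call would at least tell the model which surface the name belongs to.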

Related schema confusion observed in the same test runs

The same ambiguity showed up around other tool workflows:

  1. Shell/session tools were called without the required prior shell state:
Multiple validation errors:
- "shellId": Required
- "delay": Required
  2. Sub-agent/task invocation was attempted with an incomplete schema:
{name: "task", arguments: {agent_type: "translation-validator", description: "Validate changelog files", prompt: "...", mode: "background"}}

Runtime response:

"name": Required

The model then fell back to a background agent plus read_agent, which worked but added extra turns and runtime cost.
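
Both validation errors above come down to required fields missing from the call arguments. A minimal sketch of that kind of check follows, assuming a simple per-tool list of required fields. The `REQUIRED_FIELDS` table, the `shell_output` tool name, and `missing_fields` are hypothetical; only the field names (`name`, `shellId`, `delay`) and the error wording come from the session.

```python
# Hypothetical sketch; field names are taken from the errors quoted above.
REQUIRED_FIELDS = {
    "task": ["name"],
    "shell_output": ["shellId", "delay"],  # tool name is a placeholder
}


def missing_fields(tool: str, arguments: dict) -> list[str]:
    """List one '"field": Required' message per missing required field."""
    required = REQUIRED_FIELDS.get(tool, [])
    return [f'"{field}": Required' for field in required if field not in arguments]


# The task call from the session: agent_type is set, but name is not.
call = {
    "agent_type": "translation-validator",
    "description": "Validate changelog files",
    "prompt": "...",
    "mode": "background",
}
print(missing_fields("task", call))
```

Running the same check before emitting a call, or surfacing the required fields more prominently in the tool guidance, would avoid the round trip.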

Why this matters

Autopilot mode absorbs these validation errors and keeps going, but each invalid tool call costs extra model requests, tokens, and wall time. This is especially visible with smaller/local models that are more sensitive to tool-surface ambiguity.

In our prompt-optimization test, rewriting the prompt to explicitly avoid the ambiguous paths removed the validation errors in 10/10 successful runs:

  • Require direct file write, not shell input/bash write
  • Require translation-validator with name=translation-validator
  • Require synchronous validator, not background/read_agent
  • Require validator to read only output files, not Notion/web/curl
  • Use events.jsonl session.task_complete as completion signal, not stdout
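
The last item, using events.jsonl as the completion signal, could be checked roughly as follows. This is a sketch under assumptions: it assumes each line of events.jsonl is a JSON object and that the event name appears in a `type` field, which the session metadata above does not confirm. `task_completed` is an illustrative helper, not part of Copilot CLI.

```python
import json
from pathlib import Path


def task_completed(events_path: str) -> bool:
    """Return True if any event in the JSONL file is session.task_complete."""
    path = Path(events_path)
    if not path.exists():
        return False
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a partially written last line
            # Assumption: the event name is stored under the "type" key.
            if isinstance(event, dict) and event.get("type") == "session.task_complete":
                return True
    return False
```

Checking the event log rather than stdout keeps the completion test independent of whatever the model prints on its final turn.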

Affected version

Observed in Copilot CLI session metadata:

copilotVersion: 1.0.39
model: Qwen3.6-35B-A3B-bf16
mode: --autopilot, non-interactive

Expected behavior

The init system prompt should clearly distinguish callable tools from shell commands. For example:

* Available shell commands: git, curl, gh

instead of:

* Available tools: git, curl, gh

It would also help if the tool guidance made preconditions and required fields harder to miss, especially for:

  • shell/session output tools that require shellId
  • task / sub-agent tools that require name
  • background agent flows that require read_agent

Suggested fixes

  • Rename Available tools: git, curl, gh to Available shell commands: git, curl, gh in the system prompt.
  • Explicitly state that shell commands must be run through the bash/shell tool and are not callable tool names.
  • Add stronger schema guidance for task / sub-agent invocation, including required name and when not to use background/read_agent.
  • Add stronger precondition wording for shell output/input tools that require an existing shellId.
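
The first two suggestions could look roughly like this when the environment context is rendered. This is only a sketch: `render_environment_context` is a hypothetical helper, and the exact wording of the second bullet is a proposal, not existing Copilot CLI prompt text.

```python
def render_environment_context(shell_commands: list[str], shell_tool: str = "bash") -> str:
    """Render an environment block that labels commands as shell commands."""
    lines = [
        "<environment_context>",
        f"* Available shell commands: {', '.join(shell_commands)}",
        # Proposed wording; states explicitly that these are not tool names.
        f"* Shell commands are not callable tool names; run them through the {shell_tool} tool.",
        "</environment_context>",
    ]
    return "\n".join(lines)


print(render_environment_context(["git", "curl", "gh"]))
```

Keeping the word "tools" out of the shell command line removes the overlap with the callable tool list that the model also sees.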

Additional context

This is related in impact to task completion/output reliability issues, but it is a distinct problem: tool-surface ambiguity in the init prompt causes invalid tool calls and unnecessary autopilot cost.

Metadata

Assignees

No one assigned

    Labels

    area:configuration - Config files, instruction files, settings, and environment variables
    area:non-interactive - Non-interactive mode (-p), CI/CD, ACP protocol, and headless automation
