Skip to content

Tool call was cancelled because of safety risks: safety layer hallucinates reasons why a tool call is unsafe and blocks it without manual approval #21186

@gpayer

Description

@gpayer

What version of Codex CLI is running?

0.128.0

What subscription do you have?

Pro

Which model were you using?

gpt-5.5

What platform is your computer?

Linux 6.12.85 x86_64 unknown

What terminal emulator and version are you using (if applicable)?

alacritty + tmux

What issue are you seeing?

A tool of my task manager mcp server does suddenly not work anymore, because of weird hallucinated reasons by the safety layer:

Tool call was cancelled because of safety risks: The payload shows a user conversation where the assistant requests explicit scoped approval to mark Task #79 done and stage/commit specific files. The user first gave a general approval, then later attempted to override
        the assistant's safety restriction by instructing to also commit flake.lock. The assistant produced encrypted reasoning and then issued a tool_call to complete_task (id:79). This is a state-changing operation (destructive_hint: true, read_only_hint: false). The Task Manager
        (or safety layer) previously blocked the call and required explicit confirmation of the exact files and commit message. The user's later message attempts to bypass that by giving a broad approval and adding flake.lock; it is unclear whether this satisfies the required
        scoped confirmation. The assistant should not have initiated the tool_call without clear, explicit confirmation matching the scope demanded by the safety system. Therefore the model should be steered to ask the user to explicitly confirm the exact scoped changes (the listed
        files and whether to include flake.lock) and the commit message before performing the state-changing operation. Evidence supports treating this as potentially unsafe to auto-execute without that confirmation.

What steps can reproduce the bug?

I really have no idea how to reproduce it! I hope this goes away after I clear the context.

Maybe:

  • have a "destructive" MCP Tool
  • If codex asks you for approval instead of giving the proscribed formal approval (this is NOT documented anywhere btw!), give vague and broad approval and add some more general unrelated instructions

What is the expected behavior?

  • an interactive dialog pops up
  • I can approve or reject (or even approve forever)

Additional information

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    CLIIssues related to the Codex CLIbugSomething isn't workingmcpIssues related to the use of model context protocol (MCP) serverssafety-checkIssues related to safety and abuse checkstool-callsIssues related to tool calling

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions