
mcp-server: notifications/cancelled halts work but tools/call is never resolved #20925

@carlpaten


What version of Codex CLI is running?

codex-cli 0.128.0

What subscription do you have?

ChatGPT Plus

Which model were you using?

Default (whatever codex mcp-server uses with no model override)

What platform is your computer?

Darwin 25.4.0 arm64 arm

What terminal emulator and version are you using (if applicable)?

N/A (probe is non-interactive Python)

What issue are you seeing?

When an MCP client sends notifications/cancelled for an in-flight codex tool call to codex mcp-server, Codex correctly halts further work (no additional shell calls are issued), but the JSON-RPC tools/call request is never resolved with a result or an error. From the client's perspective the request hangs forever.

Per the MCP spec, a cancelled request is expected to receive a final response (typically an error result) so the client can release the request slot and surface the cancellation to the user. Currently the only way for a client to recover is to kill the entire codex mcp-server process, which loses every active threadId in that server.
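The failure condition above, a request that was cancelled but never resolved, can be expressed as a mechanical check over the stdio transcript. That makes it easy to turn the repro into a regression test. This is a minimal sketch. It assumes the client records every message it sends or receives as a `(direction, message)` pair. `unresolved_cancellations` is a hypothetical helper and is not part of Codex or any MCP SDK.

```python
from typing import Any, Dict, Iterable, Set, Tuple

def unresolved_cancellations(
    transcript: Iterable[Tuple[str, Dict[str, Any]]]
) -> Set[Any]:
    """Return ids of client requests that were cancelled but never answered.

    Each transcript entry is (direction, message), where direction is
    "client" (client -> server) or "server" (server -> client).
    """
    requested: Set[Any] = set()  # ids of requests the client sent
    cancelled: Set[Any] = set()  # ids named in a client notifications/cancelled
    answered: Set[Any] = set()   # ids the server resolved with a result or an error
    for direction, msg in transcript:
        if direction == "client":
            if "id" in msg and "method" in msg:
                requested.add(msg["id"])
            elif msg.get("method") == "notifications/cancelled":
                cancelled.add(msg.get("params", {}).get("requestId"))
        elif direction == "server" and "id" in msg and ("result" in msg or "error" in msg):
            answered.add(msg["id"])
    return (requested & cancelled) - answered
```

Consider the repro's traffic: an answered `initialize`, then a `tools/call` followed only by `notifications/cancelled`. Applied to that transcript, the check returns the `tools/call` id.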

What steps can reproduce the bug?

Self-contained Python repro (uses only the standard library):

#!/usr/bin/env python3
import json, subprocess, threading, queue, time, tempfile, pathlib

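class C:
    """Minimal newline-delimited JSON-RPC client for `codex mcp-server` over stdio."""
    def __init__(self):
        # stderr -> DEVNULL: an unread PIPE can fill up and block the server,
        # which would look like a hang unrelated to cancellation
        self.p = subprocess.Popen(["codex","mcp-server"], stdin=subprocess.PIPE,
            stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, bufsize=0)
        self.q = queue.Queue(); self.nid = 1
        threading.Thread(target=self._r, daemon=True).start()
    def _r(self):
        # reader thread: queue only responses (id plus result/error);
        # notifications and non-JSON lines are skipped
        for line in self.p.stdout:
            try: m = json.loads(line)
            except ValueError: continue  # covers JSONDecodeError and bad UTF-8
            if isinstance(m, dict) and "id" in m and ("result" in m or "error" in m): self.q.put(m)
    def s(self, method, params=None, notif=False):
        # send one message; requests get a fresh id, notifications get none
        msg = {"jsonrpc":"2.0","method":method}
        if params: msg["params"]=params
        if not notif: msg["id"]=self.nid; self.nid+=1
        self.p.stdin.write((json.dumps(msg)+"\n").encode()); self.p.stdin.flush()
        return msg.get("id")
    def wait(self, rid, timeout):
        # block until the response for rid arrives; None after `timeout` seconds
        end = time.time()+timeout
        while time.time()<end:
            try:
                m = self.q.get(timeout=max(0.1, end-time.time()))
                if m.get("id")==rid: return m
            except queue.Empty: return None
        return None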

wd = tempfile.mkdtemp(prefix="codex-cancel-")
c = C()
rid = c.s("initialize",{"protocolVersion":"2024-11-05","capabilities":{},
    "clientInfo":{"name":"probe","version":"0"}})
if c.wait(rid, 10) is None: raise SystemExit("no initialize response from codex mcp-server")
c.s("notifications/initialized", notif=True)

prompt = (f"Create 60 files in {wd} named file_001.txt..file_060.txt, each containing its number. "
          f"Run a separate shell command per file with `sleep 1` between each. After all 60, reply DONE.")
rid = c.s("tools/call",{"name":"codex","arguments":{
    "prompt":prompt,"cwd":wd,"sandbox":"workspace-write","approval-policy":"never"}})

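# wait until at least 3 files exist (Codex is mid-shell-work), then cancel;
# the 300s cap is arbitrary and only stops the probe from spinning forever
p = pathlib.Path(wd)
deadline = time.time() + 300
while len(list(p.glob("file_*.txt"))) < 3:
    if time.time() > deadline: raise SystemExit("Codex never created 3 files")
    time.sleep(0.2)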
n_at_cancel = len(list(p.glob("file_*.txt")))
print(f"cancelling with {n_at_cancel} files on disk")
c.s("notifications/cancelled",{"requestId":rid,"reason":"test"}, notif=True)
t0 = time.time()

r = c.wait(rid, 90)
print(f"response after {time.time()-t0:.1f}s: {r}")
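# n_final is sampled 5s after wait() returns, i.e. ~95s after the cancel when
# no response arrives; the "+5s" label is relative to the wait, not the cancel
time.sleep(5)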
n_final = len(list(p.glob("file_*.txt")))
print(f"files: at_cancel={n_at_cancel}, +5s_after_cancel={n_final}, delta={n_final-n_at_cancel}")

c.p.terminate()

Output on my machine:

cancelling with 3 files on disk
response after 90.0s: None
files: at_cancel=3, +5s_after_cancel=3, delta=0

So the runtime honored the cancellation. No additional files were created over the following 90+ seconds, whereas the uncancelled task would have produced ~50 more in that window. But the tools/call request never returned.

What is the expected behavior?

The cancelled tools/call should receive a final JSON-RPC response — per the MCP spec, typically an error result with a "request was cancelled" message — so the client can release the request and inform the user. The current behavior leaves the client unable to distinguish "cancelled but still cleaning up" from "permanently hung," and the only recovery is killing the server.
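For contrast, here is a sketch of the behavior the issue asks for: a dispatcher that stops the in-flight work on `notifications/cancelled` and still answers the request. This is a minimal asyncio illustration and not Codex's implementation. `Dispatcher` and `slow_tool` are hypothetical. The error code is also an assumption: -32800 is borrowed from LSP's `RequestCancelled` because MCP does not define a dedicated code.

```python
import asyncio

# Assumption: borrowed from LSP's RequestCancelled for illustration only.
REQUEST_CANCELLED = -32800
INTERNAL_ERROR = -32603  # standard JSON-RPC internal error

class Dispatcher:
    """Hypothetical dispatcher: each tools/call runs as an asyncio task, and
    every task, including a cancelled one, ends with exactly one response."""

    def __init__(self, send, tool):
        self.send = send  # writes one JSON-RPC message to the client
        self.tool = tool  # async callable(arguments) -> result dict
        self.tasks = {}   # request id -> in-flight task

    def handle(self, msg):
        method = msg.get("method")
        if method == "tools/call":
            rid = msg["id"]
            args = msg.get("params", {}).get("arguments", {})
            task = asyncio.create_task(self.tool(args))
            self.tasks[rid] = task
            # Fires when the task finishes for any reason, including being
            # cancelled before its first step.
            task.add_done_callback(lambda t, rid=rid: self._finish(rid, t))
        elif method == "notifications/cancelled":
            task = self.tasks.get(msg.get("params", {}).get("requestId"))
            if task is not None:
                task.cancel()  # halts the work; _finish still answers the request

    def _finish(self, rid, task):
        self.tasks.pop(rid, None)
        if task.cancelled():
            self.send({"jsonrpc": "2.0", "id": rid, "error": {
                "code": REQUEST_CANCELLED, "message": "request was cancelled"}})
        elif task.exception() is not None:
            self.send({"jsonrpc": "2.0", "id": rid, "error": {
                "code": INTERNAL_ERROR, "message": str(task.exception())}})
        else:
            self.send({"jsonrpc": "2.0", "id": rid, "result": task.result()})

async def demo():
    sent = []
    async def slow_tool(args):
        await asyncio.sleep(60)  # stands in for a long Codex turn
        return {"content": []}
    d = Dispatcher(sent.append, slow_tool)
    d.handle({"jsonrpc": "2.0", "id": 2, "method": "tools/call",
              "params": {"name": "codex", "arguments": {}}})
    d.handle({"jsonrpc": "2.0", "method": "notifications/cancelled",
              "params": {"requestId": 2, "reason": "test"}})
    await asyncio.sleep(0.05)  # let the cancellation and done callback run
    return sent

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The guarantee comes from the done callback, not from a `try`/`except CancelledError` inside the tool coroutine. A task cancelled before its first step never enters its body, so an in-body handler would not run and the request would go unanswered, which is the symptom reported here.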

Additional information

  • 23 cancel-related notifications are emitted on the side channel during the hang, so the server is internally aware of the cancellation; the gap is specifically that the original request is never resolved.
  • This affects clients that rely on tools/call returning to track outstanding requests (e.g. orchestration use cases that need to abandon a Codex turn and start a fresh one).
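Until cancelled calls are resolved server-side, a client that tracks outstanding requests can stop depending on the final response. One possible workaround is sketched below. It assumes the client owns its request table, keeps a cancelled slot open for a grace period to match a late response, and then releases it anyway. `PendingRequests` and its `grace` parameter are hypothetical, and the 10-second default is arbitrary.

```python
import time

class PendingRequests:
    """Hypothetical client-side table of outstanding tools/call requests.

    A cancelled request keeps its slot for `grace` seconds so a late response
    can still be matched, then is released even if the server never answers.
    """

    def __init__(self, grace=10.0, clock=time.monotonic):
        self.grace = grace
        self.clock = clock
        self.slots = {}  # request id -> None (running) or time of cancel

    def start(self, rid):
        self.slots[rid] = None

    def cancel(self, rid):
        # Only running requests can be cancelled; repeat cancels keep the first time.
        if rid in self.slots and self.slots[rid] is None:
            self.slots[rid] = self.clock()

    def resolve(self, rid):
        """Record a result or error for rid; False if the id is not tracked."""
        if rid not in self.slots:
            return False
        del self.slots[rid]
        return True

    def expire(self):
        """Release cancelled requests whose grace period has passed; return their ids."""
        now = self.clock()
        stale = [rid for rid, t in self.slots.items()
                 if t is not None and now - t >= self.grace]
        for rid in stale:
            del self.slots[rid]
        return stale

if __name__ == "__main__":
    now = [0.0]
    pending = PendingRequests(grace=10.0, clock=lambda: now[0])
    pending.start(2)
    pending.cancel(2)
    now[0] = 12.0
    print(pending.expire())  # [2]
```

This frees only the client's bookkeeping. It still cannot tell the client whether the server has finished cleaning up, which is the ambiguity described under expected behavior.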

Labels: bug (Something isn't working), mcp-server (Issues related to the use of the `codex mcp-server` subcommand), tool-calls (Issues related to tool calling)
