[Bug]: Thread becomes permanently unusable if claudeAgent CLI session is killed before writing its first turn (dangling resume_cursor_json at turnCount=0) #2336

@shardmods

Before submitting

  • I searched existing issues and did not find a duplicate.
  • I included enough detail to reproduce or investigate the problem.

Area

apps/server

Steps to reproduce

  1. Open an existing thread that uses the claudeAgent provider (one with prior turns already stored in projection_thread_messages).
  2. Send a message. T3 spawns a new claude CLI process and passes --resume <new_session_id> to continue the thread. Immediately after startup T3 writes provider_session_runtime.resume_cursor_json = {"resume":"<new_session_id>","turnCount":0}.
  3. Kill the CLI process before it flushes the first turn to ~/.claude/projects/<project>/<new_session_id>.jsonl. Easy ways to hit this:
    • Force-quit T3 (or the server child) while the first assistant turn is still streaming.
    • Put the laptop to sleep mid-turn, or lose network on a remote MCP that blocks the first response.
    • CLI process dies for any reason (OOM, crash, user kills it) before the first write.
  4. Reopen T3 and try to use the thread.

Observed in the wild on 4 separate threads in my local DB, all with turnCount: 0, each pointing at a Claude CLI session ID whose .jsonl file never got created. The companion ~/.claude/session-env/<new_session_id>/ directory exists and is empty, and MCP logs confirm the CLI booted far enough to initialize MCP servers but never persisted a message.

Expected behavior

The thread should be usable again. Either:

  • Detect that the --resume target is missing and fall back to starting a fresh CLI session for the thread (T3 already has all the conversation text in projection_thread_messages, so no user-visible history is lost), or
  • Only persist resume_cursor_json after the CLI has successfully written its first turn to disk, so a crash before the first flush leaves the previous (working) cursor intact.
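
The first option can be sketched as a pre-flight check before spawning the CLI. This is a minimal sketch, not T3's actual code: the helper names `session_file_exists` and `build_cli_args` are hypothetical, and it assumes only the on-disk layout described above (`~/.claude/projects/<project>/<session_id>.jsonl`).

```python
from pathlib import Path
from typing import List, Optional


def session_file_exists(session_id: str, projects_root: Path) -> bool:
    """True if any project directory under projects_root holds <session_id>.jsonl."""
    return any(projects_root.glob(f"*/{session_id}.jsonl"))


def build_cli_args(resume_id: Optional[str], projects_root: Path) -> List[str]:
    """Pass --resume only when its target transcript is actually on disk."""
    if resume_id and session_file_exists(resume_id, projects_root):
        return ["claude", "--resume", resume_id]
    # Dangling or absent cursor: start a fresh session instead of exiting 1.
    return ["claude"]
```

With `projects_root = Path.home() / ".claude" / "projects"`, a cursor pointing at a session that never flushed would produce a plain `claude` invocation rather than a failing `--resume`.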

Actual behavior

The thread is permanently bricked. Every subsequent attempt to send a message runs claude --resume <dead_session_id>, which exits 1 with:

No conversation found with session ID: <dead_session_id>

T3 surfaces this via a toast:

Claude Code returned an error result: No conversation found with session ID: <dead_session_id>

projection_thread_sessions.status flips to stopped/error and provider_session_runtime.resume_cursor_json is never cleared, so the broken cursor is retried on every open. The thread's history is still visible in the UI, but it's read-only for the user — no way to recover from inside the app.

Impact

Blocks work completely

Version or commit

T3 Code (Alpha) 0.0.21

Environment

  • macOS 26.4.1 (25E253), Apple Silicon
  • T3 Code (Alpha) 0.0.21 (/Applications/T3 Code (Alpha).app)
  • Claude Code CLI 2.1.119 (~/.local/share/claude/versions/2.1.119)
  • Provider: claudeAgent, runtime_mode: full-access

Logs or stack traces

From `~/.t3/userdata/logs/server.trace.ndjson.5`, at the moment the thread was reopened (UUIDs are mine, reproduced verbatim):

    name: startSession
    attributes:
      provider.kind: claudeAgent
      provider.thread_id: b5344815-1a23-4b77-9f1e-c8110768a8c0
      claude.resume.source: resume-session
      claude.resume.thread_id: b5344815-1a23-4b77-9f1e-c8110768a8c0
      claude.resume.session_id: 693f907b-53d6-4cd6-8c21-e43234b9d922
      claude.resume.session_at: e7ae65c1-795d-42ad-b03b-58e0d0f14168
      claude.resume.turn_count: 0
      claude.query.resume: 693f907b-53d6-4cd6-8c21-e43234b9d922
    exit: Success

    name: sendTurn
    exit: Failure
    cause: ProviderAdapterRequestError: Provider adapter request failed (claudeAgent) for turn/setPermissionMode:
           Claude Code returned an error result: No conversation found with session ID: 693f907b-53d6-4cd6-8c21-e43234b9d922
             at toRequestError$1 (app.asar/apps/server/dist/bin.mjs:28770:9)
             at catch        (app.asar/apps/server/dist/bin.mjs:30273:22)

DB state at the time (redacted to the relevant columns):

    -- provider_session_runtime
    thread_id                              = b5344815-1a23-4b77-9f1e-c8110768a8c0
    status                                 = running
    resume_cursor_json                     = {"threadId":"b5344815-…","resume":"693f907b-…","resumeSessionAt":"e7ae65c1-…","turnCount":0}

    -- projection_thread_sessions
    provider_session_id                    = NULL
    last_error                             = Claude Code returned an error result: No conversation found with session ID: 693f907b-…

    -- projection_thread_messages
    SELECT COUNT(*) WHERE thread_id = 'b5344815-…';   -- 124   (T3-side history fully intact)

    -- filesystem
    ~/.claude/projects/-Users-henryw-Coding-notes/693f907b-….jsonl   -- does not exist
    ~/.claude/session-env/693f907b-…/                                -- exists, empty

Reproduction across my DB — every broken thread has the same signature (`turnCount: 0`, session_id not on disk):

    thread_id                               resume (missing cli session)        turnCount
    b5344815-1a23-4b77-9f1e-c8110768a8c0    693f907b-53d6-4cd6-8c21-e43234b9d922  0
    603c2806-0637-44fb-af3a-56a890020064    88036aa5-565a-4d34-834b-377d86b3174a  0
    c00a985b-a0d4-4e2f-92db-75e8af8c5d34    2d963278-b869-4f73-a727-565dcc2d644a  0
    7384c19e-2eff-4c90-8b08-22b2d13222f4    7eb55bb0-6ebf-4170-a6f4-9427b686d62e  0
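
A table like the one above can be produced with a read-only scan. The following is a rough sketch under stated assumptions: the function name `find_dangling_cursors` is hypothetical, the table and column names are the ones shown in the DB state above, and the transcript location is assumed to be `<projects_root>/<project>/<session_id>.jsonl`. It only reads from the database.

```python
import json
import sqlite3
from pathlib import Path
from typing import List, Tuple


def find_dangling_cursors(db_path: str, projects_root: Path) -> List[Tuple[str, str]]:
    """Return (thread_id, session_id) pairs whose resume target is not on disk."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT thread_id, resume_cursor_json FROM provider_session_runtime "
            "WHERE resume_cursor_json IS NOT NULL"
        ).fetchall()
    finally:
        conn.close()

    dangling = []
    for thread_id, cursor_json in rows:
        session_id = json.loads(cursor_json).get("resume")
        if not session_id:
            continue  # cursor carries no resume target, nothing to verify
        # Assumed layout: one .jsonl transcript per session inside a project dir.
        if not any(projects_root.glob(f"*/{session_id}.jsonl")):
            dangling.append((thread_id, session_id))
    return dangling
```

Run it with T3 closed, for example `find_dangling_cursors(str(Path.home() / ".t3/userdata/state.sqlite"), Path.home() / ".claude/projects")`.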

Screenshots, recordings, or supporting files

No response

Workaround

Quit T3, then null out the bad cursors directly. The chat history stored by T3 is unaffected, but the CLI-side continuation context is dropped, so Claude starts the thread's next turn with no memory of earlier messages:

    sqlite3 ~/.t3/userdata/state.sqlite \
      "UPDATE provider_session_runtime SET resume_cursor_json = NULL
       WHERE json_extract(resume_cursor_json, '$.turnCount') = 0
         AND json_extract(resume_cursor_json, '$.resume') IS NOT NULL;"

On next open T3 starts a fresh claudeAgent session for the thread and everything works again. "Restart provider session" from the UI does not fix it — it re-reads the same broken cursor.

Suggested real fix: write resume_cursor_json only after the CLI's first turn has been flushed to its .jsonl, or, on --resume failure, transparently retry without --resume and mark the thread as a fresh continuation.

Addendum: nulling resume_cursor_json unsticks the composer but Claude loses
all prior context — --resume is the only mechanism that hydrates the CLI's
conversation memory, and T3's projection_thread_messages is never fed back
into the new session. The UI still renders the history, so the regression is
invisible until you ask the model what it just said.

In my local DB three of four broken threads turned out to have fully intact
CLI sessions on disk under a different session ID than the one T3's cursor
was recorded with (e.g. thread c00a985b had 367 assistant turns preserved
under session 6565eb2d, but its cursor pointed at the dead 2d963278).
That strongly suggests T3 replaces the cursor every time a new CLI
subprocess spawns, even when the prior cursor was pointing at a still-valid
jsonl — so transient failures during session spin-up stomp a good resume
target with a bad one. A safer policy: only update resume_cursor_json
after the new session has confirmed at least one turn on disk.
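
The "safer policy" above amounts to a two-phase cursor: a pending session ID that is promoted only once its transcript has content on disk. Here is an in-memory sketch of that idea; `CursorState` and the three handler names are hypothetical and do not correspond to anything in T3's codebase.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class CursorState:
    committed: Optional[str] = None  # last session ID known to exist on disk
    pending: Optional[str] = None    # session ID of the newly spawned CLI


def on_session_spawned(state: CursorState, new_session_id: str) -> None:
    """Remember the new session without touching the committed cursor."""
    state.pending = new_session_id


def on_turn_completed(state: CursorState, session_file: Path) -> None:
    """Promote the pending session once its transcript has content on disk."""
    if state.pending and session_file.is_file() and session_file.stat().st_size > 0:
        state.committed = state.pending
        state.pending = None


def on_session_died(state: CursorState) -> None:
    """A crash before the first flush discards only the pending ID."""
    state.pending = None
```

Under this scheme a process killed during spin-up leaves `committed` pointing at the previous, still-valid session, which is the property the four broken threads lacked.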

Proper fix options, in order of preference:

  1. Don't overwrite a known-good cursor until the replacement has flushed ≥1 turn.
  2. On --resume failure, walk backwards through prior session IDs the thread
    has used (these are already in the event log) and try the newest one that
    still exists on disk before giving up.
  3. As a last resort, rehydrate a fresh CLI session from
    projection_thread_messages so Claude at least has the text context.
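
Options 2 and 3 could look roughly like the sketch below. Everything here is illustrative: `newest_resumable` and `rehydration_prompt` are hypothetical names, the list of prior session IDs is assumed to have been extracted from the event log already (newest first), and the `role: text` transcript format is an arbitrary choice, not something the CLI prescribes.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Tuple


def newest_resumable(session_ids_newest_first: Iterable[str],
                     projects_root: Path) -> Optional[str]:
    """Option 2: first prior session ID whose .jsonl transcript still exists."""
    for session_id in session_ids_newest_first:
        if any(projects_root.glob(f"*/{session_id}.jsonl")):
            return session_id
    return None


def rehydration_prompt(messages: List[Tuple[str, str]],
                       max_chars: int = 20000) -> str:
    """Option 3: render (role, text) pairs as a transcript for a fresh session.

    Whole messages are dropped from the oldest end until the text fits in
    max_chars; the most recent message is always kept.
    """
    kept: List[str] = []
    used = 0
    for role, text in reversed(messages):
        line = f"{role}: {text}"
        if kept and used + len(line) + 1 > max_chars:
            break
        kept.append(line)
        used += len(line) + 1
    return "\n".join(reversed(kept))
```

For thread c00a985b, the candidate walk would skip the dead 2d963278 and return 6565eb2d, so the text rehydration path would only be needed when no prior transcript survives.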

Written by Claude Opus 4.7

Metadata

Assignees

No one assigned

    Labels

    bug: Something is broken or behaving incorrectly.
    needs-triage: Issue needs maintainer review and initial categorization.
