[Bug]: Thread becomes permanently unusable if claudeAgent CLI session is killed before writing its first turn (dangling resume_cursor_json at turnCount=0) #2336

### Before submitting

- [x] I searched existing issues and did not find a duplicate.
- [x] I included enough detail to reproduce or investigate the problem.

### Area

apps/server

### Steps to reproduce

1. Open an existing thread that uses the `claudeAgent` provider (one with prior turns already stored in `projection_thread_messages`).
2. Send a message. T3 spawns a new `claude` CLI process and passes `--resume <new_session_id>` to continue the thread. Immediately after startup, T3 writes `provider_session_runtime.resume_cursor_json = {"resume":"<new_session_id>","turnCount":0}`.
3. Kill the CLI process before it flushes the first turn to `~/.claude/projects/<project>/<new_session_id>.jsonl`. Easy ways to hit this:
   - Force-quit T3 (or the server child) while the first assistant turn is still streaming.
   - Put the laptop to sleep mid-turn, or lose network access to a remote MCP server that is blocking the first response.
   - CLI process dies for any reason (OOM, crash, user kills it) before the first write.
4. Reopen T3 and try to use the thread.

Observed in the wild on 4 separate threads in my local DB, all with `turnCount: 0`, each pointing at a Claude CLI session ID whose `.jsonl` file was never created. In each case the companion `~/.claude/session-env/<new_session_id>/` directory exists but is empty, and MCP logs confirm the CLI booted far enough to initialize MCP servers but never persisted a message.


### Expected behavior

The thread should be usable again. Either:
- Detect that the `--resume` target is missing and fall back to starting a fresh CLI session for the thread (T3 already has all the conversation text in `projection_thread_messages`, so no user-visible history is lost), or
- Only persist `resume_cursor_json` *after* the CLI has successfully written its first turn to disk, so a crash before the first flush leaves the previous (working) cursor intact.
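A minimal sketch of the first option, as a pre-flight check run before spawning the CLI. It assumes the cursor shape shown in the DB dump under Logs and that a session's transcript lives at `<projectDir>/<sessionId>.jsonl`, mirroring the paths in this issue. `planStart`, `sessionFilePath`, `StartPlan`, and the `fileExists` callback are hypothetical names, not T3's actual code.

```typescript
// Hypothetical shape of provider_session_runtime.resume_cursor_json, using the
// fields visible in the DB dump in this issue.
interface ResumeCursor {
  threadId?: string;
  resume?: string;
  resumeSessionAt?: string;
  turnCount?: number;
}

type StartPlan =
  | { kind: "resume"; sessionId: string }
  | { kind: "fresh"; reason: string };

// Assumed transcript location: <projectDir>/<sessionId>.jsonl, mirroring the
// ~/.claude/projects/<project>/<session_id>.jsonl paths in this issue.
function sessionFilePath(projectDir: string, sessionId: string): string {
  return `${projectDir}/${sessionId}.jsonl`;
}

// Decide before spawning the CLI whether --resume can possibly succeed.
function planStart(
  cursor: ResumeCursor | null,
  projectDir: string,
  fileExists: (path: string) => boolean,
): StartPlan {
  if (cursor === null || cursor.resume === undefined) {
    return { kind: "fresh", reason: "no resume cursor" };
  }
  const path = sessionFilePath(projectDir, cursor.resume);
  if (!fileExists(path)) {
    // Dangling cursor: the CLI never flushed this session, so --resume would fail.
    return { kind: "fresh", reason: `resume target missing: ${path}` };
  }
  return { kind: "resume", sessionId: cursor.resume };
}
```

In Node, `fileExists` could be `existsSync` from `node:fs`. A file can still vanish between the check and the spawn, so this complements handling the CLI's error rather than replacing it.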

### Actual behavior

The thread is permanently bricked. Every subsequent attempt to send a message runs `claude --resume <dead_session_id>`, which exits 1 with:

    No conversation found with session ID: <dead_session_id>

T3 surfaces this via a toast:

    Claude Code returned an error result: No conversation found with session ID: <dead_session_id>

`projection_thread_sessions.status` flips to `stopped`/`error` and `provider_session_runtime.resume_cursor_json` is never cleared, so the broken cursor is retried on every open. The thread's history is still visible in the UI, but the thread is effectively read-only, and there is no way to recover from inside the app.

### Impact

Blocks work completely

### Version or commit

T3 Code (Alpha) 0.0.21

### Environment

- macOS 26.4.1 (25E253), Apple Silicon
- T3 Code (Alpha) 0.0.21 (`/Applications/T3 Code (Alpha).app`)
- Claude Code CLI 2.1.119 (`~/.local/share/claude/versions/2.1.119`)
- Provider: `claudeAgent`, runtime_mode: `full-access`

### Logs or stack traces

```text
From `~/.t3/userdata/logs/server.trace.ndjson.5`, at the moment the thread was reopened (UUIDs are mine, reproduced verbatim):

    name: startSession
    attributes:
      provider.kind: claudeAgent
      provider.thread_id: b5344815-1a23-4b77-9f1e-c8110768a8c0
      claude.resume.source: resume-session
      claude.resume.thread_id: b5344815-1a23-4b77-9f1e-c8110768a8c0
      claude.resume.session_id: 693f907b-53d6-4cd6-8c21-e43234b9d922
      claude.resume.session_at: e7ae65c1-795d-42ad-b03b-58e0d0f14168
      claude.resume.turn_count: 0
      claude.query.resume: 693f907b-53d6-4cd6-8c21-e43234b9d922
    exit: Success

    name: sendTurn
    exit: Failure
    cause: ProviderAdapterRequestError: Provider adapter request failed (claudeAgent) for turn/setPermissionMode:
           Claude Code returned an error result: No conversation found with session ID: 693f907b-53d6-4cd6-8c21-e43234b9d922
             at toRequestError$1 (app.asar/apps/server/dist/bin.mjs:28770:9)
             at catch        (app.asar/apps/server/dist/bin.mjs:30273:22)

DB state at the time (redacted to the relevant columns):

    -- provider_session_runtime
    thread_id                              = b5344815-1a23-4b77-9f1e-c8110768a8c0
    status                                 = running
    resume_cursor_json                     = {"threadId":"b5344815-…","resume":"693f907b-…","resumeSessionAt":"e7ae65c1-…","turnCount":0}

    -- projection_thread_sessions
    provider_session_id                    = NULL
    last_error                             = Claude Code returned an error result: No conversation found with session ID: 693f907b-…

    -- projection_thread_messages
    SELECT COUNT(*) FROM projection_thread_messages WHERE thread_id = 'b5344815-…';   -- 124   (T3-side history fully intact)

    -- filesystem
    ~/.claude/projects/-Users-henryw-Coding-notes/693f907b-….jsonl   -- does not exist
    ~/.claude/session-env/693f907b-…/                                -- exists, empty

Reproduction across my DB — every broken thread has the same signature (`turnCount: 0`, session_id not on disk):

    thread_id                               resume (missing cli session)        turnCount
    b5344815-1a23-4b77-9f1e-c8110768a8c0    693f907b-53d6-4cd6-8c21-e43234b9d922  0
    603c2806-0637-44fb-af3a-56a890020064    88036aa5-565a-4d34-834b-377d86b3174a  0
    c00a985b-a0d4-4e2f-92db-75e8af8c5d34    2d963278-b869-4f73-a727-565dcc2d644a  0
    7384c19e-2eff-4c90-8b08-22b2d13222f4    7eb55bb0-6ebf-4170-a6f4-9427b686d62e  0
```

### Screenshots, recordings, or supporting files

_No response_

### Workaround

Quit T3, then null out the bad cursors directly. The chat history is unaffected — only the CLI-side continuation context is dropped:

    sqlite3 ~/.t3/userdata/state.sqlite \
      "UPDATE provider_session_runtime SET resume_cursor_json = NULL
       WHERE json_extract(resume_cursor_json, '$.turnCount') = 0
         AND json_extract(resume_cursor_json, '$.resume') IS NOT NULL;"

On the next open, T3 starts a fresh claudeAgent session for the thread and everything works again. "Restart provider session" from the UI does **not** fix it: it re-reads the same broken cursor.
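The SQL above clears every cursor with `turnCount: 0`, whether or not its target session was actually flushed. A narrower selection, sketched under assumptions: parse each stored cursor, apply the same `turnCount: 0` filter, and additionally require that `<projectDir>/<resume>.jsonl` is missing, which is the signature listed under Logs. `RuntimeRow`, `projectDir`, and `findDanglingCursors` are hypothetical; loading rows and mapping each thread to its `~/.claude/projects/<project>` directory is left to the caller.

```typescript
// Hypothetical row shape: thread_id and resume_cursor_json from
// provider_session_runtime, plus the thread's Claude project directory.
interface RuntimeRow {
  threadId: string;
  resumeCursorJson: string | null;
  projectDir: string;
}

// Returns thread IDs whose cursor matches the broken signature:
// turnCount === 0 AND the referenced .jsonl does not exist.
function findDanglingCursors(
  rows: RuntimeRow[],
  fileExists: (path: string) => boolean,
): string[] {
  const dangling: string[] = [];
  for (const row of rows) {
    if (row.resumeCursorJson === null) continue;
    let cursor: unknown;
    try {
      cursor = JSON.parse(row.resumeCursorJson);
    } catch {
      continue; // unparseable cursor: leave it for manual inspection
    }
    if (typeof cursor !== "object" || cursor === null) continue;
    const { resume, turnCount } = cursor as { resume?: unknown; turnCount?: unknown };
    // Same filter as the SQL workaround...
    if (typeof resume !== "string" || turnCount !== 0) continue;
    // ...plus a check that the CLI transcript is really absent.
    if (!fileExists(`${row.projectDir}/${resume}.jsonl`)) {
      dangling.push(row.threadId);
    }
  }
  return dangling;
}
```

The returned IDs could drive an `UPDATE ... WHERE thread_id IN (...)` instead of the blanket `turnCount = 0` filter. Clearing a cursor still drops the CLI-side context.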

Suggested real fix: write `resume_cursor_json` only after the CLI's first turn has been flushed to its `.jsonl`, or, on `--resume` failure, transparently retry without `--resume` and mark the thread as a fresh continuation.
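A minimal sketch of the retry-without-`--resume` idea. It assumes the adapter can wrap whichever call first surfaces the error (in the trace above that is `sendTurn`, not `startSession`) as a single `attempt(resumeSessionId)` that rejects with the CLI's error text. `runWithResumeFallback`, `AttemptFn`, and `onStaleCursor` are hypothetical, and matching on the quoted message string is brittle and only illustrative.

```typescript
// Error text the CLI produces for a dangling --resume target, as quoted in this issue.
const MISSING_SESSION_TEXT = "No conversation found with session ID";

// Hypothetical adapter call: start the CLI (with --resume when an ID is given)
// and run the first turn, resolving to the session ID the CLI ends up using.
type AttemptFn = (resumeSessionId: string | undefined) => Promise<string>;

async function runWithResumeFallback(
  resumeSessionId: string | undefined,
  attempt: AttemptFn,
  onStaleCursor: (deadSessionId: string) => void,
): Promise<string> {
  try {
    return await attempt(resumeSessionId);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Only handle the specific "resume target missing" failure; anything else propagates.
    if (resumeSessionId === undefined || !message.includes(MISSING_SESSION_TEXT)) {
      throw err;
    }
    // Let the caller clear or quarantine the dead cursor before retrying fresh.
    onStaleCursor(resumeSessionId);
    return attempt(undefined);
  }
}
```

Retrying only once, and only for this one error, keeps unrelated failures (network, auth) visible instead of silently starting fresh sessions.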


Addendum: nulling `resume_cursor_json` unsticks the composer, but Claude loses
all prior context. `--resume` is the *only* mechanism that hydrates the CLI's
conversation memory, and T3's `projection_thread_messages` is never fed back
into the new session. The UI still renders the history, so the context loss is
invisible until you ask the model what it just said.

In my local DB, three of the four broken threads turned out to have fully intact
CLI sessions on disk under a *different* session ID from the one recorded in
T3's cursor (e.g. thread c00a985b had 367 assistant turns preserved under
session 6565eb2d, but its cursor pointed at the dead 2d963278).
That strongly suggests T3 *replaces* the cursor every time a new CLI
subprocess spawns, even when the prior cursor points at a still-valid
`.jsonl`, so a transient failure during session spin-up overwrites a good
resume target with a bad one. A safer policy: only update `resume_cursor_json`
after the new session has confirmed at least one turn on disk.
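That policy can be sketched as a small store that keeps the last committed cursor and persists a new session's cursor only once that session reports a flushed turn. This is a sketch under assumptions: `DeferredCursorStore`, `sessionStarted`, `turnFlushed`, and the `persist` callback are hypothetical, and it presumes the adapter can observe when a turn has actually reached the `.jsonl`.

```typescript
interface ResumeCursor {
  resume: string;
  turnCount: number;
}

// Hypothetical hook standing in for the provider_session_runtime.resume_cursor_json write.
type PersistFn = (cursor: ResumeCursor) => void;

class DeferredCursorStore {
  private committed: ResumeCursor | null;
  private pending: ResumeCursor | null = null;
  private readonly persist: PersistFn;

  constructor(initial: ResumeCursor | null, persist: PersistFn) {
    this.committed = initial;
    this.persist = persist;
  }

  // A new CLI subprocess reported its session ID. Nothing is written yet.
  sessionStarted(sessionId: string): void {
    this.pending = { resume: sessionId, turnCount: 0 };
  }

  // The CLI flushed a turn for sessionId to disk.
  turnFlushed(sessionId: string): void {
    if (this.pending !== null && this.pending.resume === sessionId) {
      // First confirmed turn: only now does the new session replace the old cursor.
      this.committed = { resume: sessionId, turnCount: 1 };
      this.pending = null;
      this.persist(this.committed);
    } else if (this.committed !== null && this.committed.resume === sessionId) {
      this.committed = { ...this.committed, turnCount: this.committed.turnCount + 1 };
      this.persist(this.committed);
    }
  }

  // Cursor to use for the next --resume; never points at an unflushed session.
  current(): ResumeCursor | null {
    return this.committed;
  }
}
```

If the CLI dies between `sessionStarted` and the first `turnFlushed`, nothing is written and `current()` still returns the previous, working cursor.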

Proper fix options, in order of preference:
  1. Don't overwrite a known-good cursor until the replacement has flushed ≥1 turn.
  2. On `--resume` failure, walk backwards through prior session IDs the thread
     has used (these are already in the event log) and try the newest one that
     still exists on disk before giving up.
  3. As a last resort, rehydrate a fresh CLI session from
     `projection_thread_messages` so Claude at least has the text context.
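Options 2 and 3 can be combined into one fallback chain. A sketch, assuming the thread's prior session IDs are available oldest-first (the issue notes they are in the event log) and that stored messages reduce to role/text pairs. `chooseRecovery`, `StoredMessage`, the preamble wording, and the character budget are hypothetical; how a preamble would be delivered to a fresh CLI session is not specified here.

```typescript
interface StoredMessage {
  role: "user" | "assistant";
  text: string;
}

type Recovery =
  | { kind: "resume"; sessionId: string }
  | { kind: "rehydrate"; preamble: string };

function chooseRecovery(
  priorSessionIds: string[], // oldest first
  sessionExists: (sessionId: string) => boolean,
  messages: StoredMessage[], // oldest first, e.g. from projection_thread_messages
  maxChars: number = 20000,
): Recovery {
  // Option 2: resume the newest prior session whose .jsonl is still on disk.
  for (const id of [...priorSessionIds].reverse()) {
    if (sessionExists(id)) return { kind: "resume", sessionId: id };
  }
  // Option 3: rebuild text context, keeping the most recent messages that fit the budget.
  const kept: string[] = [];
  let used = 0;
  for (const msg of [...messages].reverse()) {
    const line = `${msg.role}: ${msg.text}`;
    if (used + line.length > maxChars) break;
    kept.unshift(line);
    used += line.length;
  }
  return {
    kind: "rehydrate",
    preamble: ["Earlier conversation restored from T3 history:", ...kept].join("\n"),
  };
}
```

For thread c00a985b above, this would pick 6565eb2d over the dead 2d963278 without touching the text fallback.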


Written by Claude Opus 4.7
