Session.send/3 multi-turn broken on Codex 0.119+: looks for session_id, CLI emits thread_id #40

@joshrotenberg

Summary

CodexWrapper.Session.send/3 is meant to carry session continuity across turns, but against codex-cli ≥ 0.119.0 it never updates session_id and reports no error, so the second and subsequent turns are always dispatched as fresh sessions instead of resuming the previous one.

Discovered while fixing the separate (but related) dialyzer issue on the chore/add-dialyxir branch; see the commit "fix: Session.stream/3 returns the session unchanged". That fix was intentionally scoped around this bug, which is filed separately here so it gets a dedicated runtime fix.

Root cause

Session.execute_turn/3 (first-turn branch) calls extract_session_id/1 on the result events:

defp extract_session_id(events) do
  Enum.find_value(events, fn event ->
    JsonLineEvent.get(event, "session_id")
  end)
end

But Codex 0.119+ does not emit a "session_id" field anywhere in its stream-json output. It emits the thread identifier as "thread_id" in the first thread.started event:

{"type":"thread.started","thread_id":"019d799c-09f1-7ea0-8e5a-7b42a640c103"}
{"type":"turn.started"}
{"type":"item.completed","item":{"id":"item_0","type":"agent_message","text":"pong"}}
{"type":"turn.completed","usage":{"input_tokens":13957,"output_tokens":17}}

So extract_session_id/1 always returns nil, and:

new_session_id = extract_session_id(events) || session.session_id

falls through to session.session_id, which is also nil on the first turn. The returned session has session_id: nil. The next call to Session.send/3 hits the session_id: nil branch of execute_turn/3 again and dispatches a fresh Exec command (not an ExecResume), losing all conversation history.
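The fall-through can be reproduced without the CLI. The following is a minimal sketch, assuming the events arrive as already-decoded maps and using plain Map.get/2 as a stand-in for JsonLineEvent.get/2 (the real accessor may behave differently):

```elixir
# Events shaped like the Codex 0.119+ output above, as already-decoded maps.
# Map.get/2 stands in for JsonLineEvent.get/2 (an assumption for this sketch).
events = [
  %{"type" => "thread.started", "thread_id" => "019d799c-09f1-7ea0-8e5a-7b42a640c103"},
  %{"type" => "turn.started"},
  %{"type" => "turn.completed", "usage" => %{"input_tokens" => 13957, "output_tokens" => 17}}
]

# Current lookup: no event carries "session_id", so find_value returns nil.
extracted = Enum.find_value(events, fn event -> Map.get(event, "session_id") end)

# First turn: the session has no id yet, so the fallback is nil as well.
previous_session_id = nil
new_session_id = extracted || previous_session_id

IO.inspect(new_session_id)
# => nil
```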

Repro

config = CodexWrapper.Config.new([])
session = CodexWrapper.Session.new(config, sandbox: :read_only, skip_git_repo_check: true)

{:ok, session, _} = CodexWrapper.Session.send(session, "Remember the number 42. Respond with 'ok'.")
IO.inspect(session.session_id)  # => nil (should be the thread_id)

{:ok, _, result} = CodexWrapper.Session.send(session, "What number did I ask you to remember?")
IO.inspect(result.stdout)
# => Codex has no memory of the previous turn; it starts fresh.

Suggested fix

Update extract_session_id/1 to look for "thread_id" instead of (or in addition to) "session_id". Specifically, prefer thread_id from the thread.started event, since that's where Codex emits it:

defp extract_session_id(events) do
  Enum.find_value(events, fn event ->
    JsonLineEvent.get(event, "thread_id") || JsonLineEvent.get(event, "session_id")
  end)
end

Keeping the session_id fallback preserves compatibility with older Codex versions that may have used a different field name. The || order matters: thread_id is checked first because that is the field modern Codex emits.
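A stricter variant would pin the lookup to the thread.started event and only then fall back to the legacy field. This is a sketch rather than a drop-in patch: the module name ThreadIdSketch is hypothetical, and events are assumed to be plain decoded maps instead of JsonLineEvent structs.

```elixir
defmodule ThreadIdSketch do
  # Hypothetical module for illustration; events are assumed to be decoded maps.

  # Prefer the thread_id carried by a thread.started event. Only if no such
  # event exists, fall back to a legacy "session_id" field on any event.
  def extract_session_id(events) do
    thread_id =
      Enum.find_value(events, fn
        %{"type" => "thread.started", "thread_id" => id} when is_binary(id) -> id
        _other -> nil
      end)

    thread_id || Enum.find_value(events, &Map.get(&1, "session_id"))
  end
end
```

Doing the lookup in two passes means a thread.started event wins regardless of where it appears in the stream, at the cost of walking the event list twice when no thread_id is present.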

Workaround / background

A GenAgent backend adapter (gen_agent_codex) built on top of this package bypasses CodexWrapper.Session entirely and manages thread_id threading itself: it calls Exec.execute_json/2 on the first turn and ExecResume.execute_json/2 on subsequent turns, capturing thread_id from the thread.started event. That pattern works against 0.119+ and could be adapted here. The more minimal option is to fix the field-name lookup as above.
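The first-turn/resume split could look roughly like the sketch below. It is not the gen_agent_codex code: TurnDispatchSketch is a hypothetical module, and the first_turn and resume_turn callbacks are stand-ins for Exec.execute_json/2 and ExecResume.execute_json/2, whose real arguments and return shapes are not shown in this issue. The fake callbacks and the "resumed_from" field exist only to make the dispatch visible.

```elixir
defmodule TurnDispatchSketch do
  # `first_turn` and `resume_turn` are hypothetical callbacks standing in for
  # Exec.execute_json/2 and ExecResume.execute_json/2. Each is assumed to
  # return a list of decoded event maps.
  def send_turn(thread_id, prompt, first_turn, resume_turn) do
    events =
      case thread_id do
        nil -> first_turn.(prompt)
        id -> resume_turn.(id, prompt)
      end

    # Keep the existing id if this turn emitted no thread.started event.
    {capture_thread_id(events) || thread_id, events}
  end

  defp capture_thread_id(events) do
    Enum.find_value(events, fn
      %{"type" => "thread.started", "thread_id" => id} -> id
      _other -> nil
    end)
  end
end

# Fake callbacks so the sketch runs without the CLI.
fake_first = fn _prompt ->
  [%{"type" => "thread.started", "thread_id" => "t-1"}, %{"type" => "turn.completed"}]
end

# "resumed_from" is an invented marker, not a real Codex field.
fake_resume = fn id, _prompt -> [%{"type" => "turn.completed", "resumed_from" => id}] end

{id1, _events1} = TurnDispatchSketch.send_turn(nil, "Remember the number 42.", fake_first, fake_resume)
{id2, events2} = TurnDispatchSketch.send_turn(id1, "What number?", fake_first, fake_resume)
```

After both calls, id1 and id2 are the same captured id, and events2 comes from the resume callback rather than from a fresh first turn.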

Test coverage gap

There are no tests in test/codex_wrapper/session_test.exs that exercise an actual multi-turn roundtrip against a real CLI. A live integration test that sends turn 1 → inspects captured session_id → sends turn 2 → verifies the CLI actually resumed would have caught this.
