Thread navigation/loading slows from unbounded metadata and eager large-history hydration #21211

@clairernovotny

Supersedes #21154. The original root cause was threads.title becoming the full first user message, but the local diagnostics now point to a broader performance problem in thread navigation and thread loading:

  • unbounded thread metadata can bloat the SQLite thread-list/navigation path
  • a SQLite-only trim is temporary because reconciliation can repopulate titles from JSONL history
  • thread/list can still be huge when first_user_message is treated as a full prompt/history field instead of a bounded preview
  • opening a large thread can still be slow because the UI path eagerly hydrates/replays too much history

User impact

Codex Desktop becomes sluggish when switching chats and when opening/loading large threads. The visible UX impact is not just a database query delay: the app can appear stuck while the renderer parses, allocates, reconciles, and renders large thread payloads.

In local testing, trimming pathological titles made chat switching noticeably faster. Loading large threads remained slow afterward, which suggests at least two hot paths:

  1. list/navigation metadata payload size
  2. initial thread-open history hydration/rendering

Title bloat impact

I benchmarked an affected local SQLite DB backup against the same row set after shortening only pathological active titles. This isolates title length as the variable.

Active rows

| DB | Active rows | Active title chars | Active first_user_message chars | Max title chars | Active titles > 120 chars |
| --- | --- | --- | --- | --- | --- |
| Bad backup | 134 | 14,610,549 | 14,614,033 | 675,773 | 94 |
| Same rows after title repair | 134 | 3,588 | 14,614,033 | 120 | 0 |
| Current repaired DB | 158 | 5,154 | 14,614,486 | 120 | 0 |

Thread-list style query

Query approximating the active thread navigation/list path that needs titles:

SELECT id, title, source, cwd, updated_at_ms
FROM threads
WHERE COALESCE(archived,0)=0
ORDER BY updated_at_ms DESC, id DESC
LIMIT 200;

Measured over 80 iterations:

| DB | Rows | Result payload bytes | SQLite query median | SQLite query p95 | JSON encode median |
| --- | --- | --- | --- | --- | --- |
| Bad backup | 134 | 15,605,594 | 8.12 ms | 10.30 ms | 46.34 ms |
| Same rows after title repair | 134 | 25,074 | 0.15 ms | 0.19 ms | 0.14 ms |
| Current repaired DB | 158 | 34,343 | 0.17 ms | 0.19 ms | 0.17 ms |

Impact from title repair on the same rows:

  • Result payload: 15.6 MB -> 25 KB, about 622x smaller.
  • SQLite read median: 8.12 ms -> 0.15 ms, about 54x faster.
  • JSON encode median: 46.34 ms -> 0.14 ms, about 331x faster.

These numbers cover only the SQLite read and JSON encoding. They exclude Electron IPC, UI reconciliation, rendering, and any extra chat-switching work.
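
A minimal sketch of how a measurement like the one above could be reproduced. It is not the script used for the numbers in this issue. The table schema in `make_demo_db`, the helper names, and the nearest-rank p95 definition are all assumptions; only the query text comes from the issue.

```python
import json
import math
import sqlite3
import statistics
import time

# The thread-list style query from the issue, without the trailing semicolon.
LIST_QUERY = """
SELECT id, title, source, cwd, updated_at_ms
FROM threads
WHERE COALESCE(archived,0)=0
ORDER BY updated_at_ms DESC, id DESC
LIMIT 200
"""


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (an assumed definition)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def bench_thread_list(conn, iterations=80):
    """Time the list query and the JSON encoding of its result set."""
    query_ms, encode_ms = [], []
    rows, encoded = [], "[]"
    for _ in range(iterations):
        start = time.perf_counter()
        rows = conn.execute(LIST_QUERY).fetchall()
        query_ms.append((time.perf_counter() - start) * 1000)

        start = time.perf_counter()
        encoded = json.dumps(rows)
        encode_ms.append((time.perf_counter() - start) * 1000)
    return {
        "rows": len(rows),
        "payload_bytes": len(encoded.encode("utf-8")),
        "query_median_ms": statistics.median(query_ms),
        "query_p95_ms": percentile(query_ms, 95),
        "encode_median_ms": statistics.median(encode_ms),
    }


def make_demo_db(title_chars, rows=50):
    """Hypothetical in-memory stand-in for the real threads table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE threads (id TEXT PRIMARY KEY, title TEXT, source TEXT,"
        " cwd TEXT, updated_at_ms INTEGER, archived INTEGER)"
    )
    conn.executemany(
        "INSERT INTO threads VALUES (?, ?, ?, ?, ?, ?)",
        [(f"t{i}", "x" * title_chars, "vscode", "/tmp", i, 0) for i in range(rows)],
    )
    return conn
```

Running `bench_thread_list(make_demo_db(100_000))` next to `bench_thread_list(make_demo_db(120))` compares a bloated and a bounded title column on otherwise identical rows, which is the same isolation the tables above rely on.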

Full list item including preview

If the list path also includes first_user_message, the duplicated title roughly doubles the heavy payload:

SELECT id, title, first_user_message, source, cwd, updated_at_ms
FROM threads
WHERE COALESCE(archived,0)=0
ORDER BY updated_at_ms DESC, id DESC
LIMIT 200;

| DB | Rows | Result payload bytes | SQLite query median | SQLite query p95 | JSON encode median |
| --- | --- | --- | --- | --- | --- |
| Bad backup | 134 | 31,196,414 | 14.70 ms | 17.75 ms | 90.73 ms |
| Same rows after title repair | 134 | 15,615,894 | 7.12 ms | 8.55 ms | 46.22 ms |
| Current repaired DB | 158 | 15,626,192 | 7.08 ms | 8.50 ms | 45.40 ms |

So the title bug doubled the heavy list item payload: one copy in first_user_message, another copy in title.

Reconciliation can repopulate bad titles

SQLite is not the only source of truth. Codex can rebuild or reconcile thread metadata from rollout JSONL files and session_index.jsonl.

The local failure mode appears to be:

  • the state extractor reads rollout events
  • the first EventMsg::UserMessage can populate both first_user_message and fallback title
  • later ThreadNameUpdated events can overwrite the title
  • reconciliation/upsert writes the resulting metadata back into SQLite

That means a DB-only trim can be undone later if the underlying JSONL history has no later good ThreadNameUpdated event. In practice, the durable local repair needed both:

  • trim threads.title in SQLite
  • append a later thread-name update to the affected JSONL histories so reconciliation keeps the bounded title
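
The failure mode and the two-part repair can be modeled with a small sketch. This is not the Codex state extractor. The event dictionaries, the function names, the whitespace collapsing, and the 120-character bound (taken from the cutoff used in the tables above) are assumptions for illustration; only the event names `UserMessage` and `ThreadNameUpdated` come from the issue.

```python
MAX_TITLE_CHARS = 120  # assumed bound, matching the 120-char cutoff above


def bound_title(text, limit=MAX_TITLE_CHARS):
    """Collapse whitespace and truncate to at most `limit` characters."""
    collapsed = " ".join(text.split())
    if len(collapsed) <= limit:
        return collapsed
    return collapsed[: limit - 3].rstrip() + "..."


def resolve_title(events):
    """Model of reconciliation: the first user message is the fallback
    title, and any later name update overwrites it."""
    title = None
    for event in events:
        if event["type"] == "UserMessage" and title is None:
            title = event["text"]
        elif event["type"] == "ThreadNameUpdated":
            title = event["name"]
    return title


def append_bounded_name(events):
    """Return a history with a trailing name update so that replaying it
    yields a bounded title instead of the full first message."""
    current = resolve_title(events) or ""
    return events + [{"type": "ThreadNameUpdated", "name": bound_title(current)}]


history = [{"type": "UserMessage", "text": "x" * 5000}]
repaired = append_bounded_name(history)
```

In this model, `resolve_title(history)` returns the full 5,000-character message no matter what SQLite held before reconciliation, while `resolve_title(repaired)` stays within the bound. Applying `bound_title` at the write boundary as well would keep the invariant even for histories that were never repaired.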

thread/list can still be too large after title repair

Bounding titles helps, but thread/list can still pay for full prompt-sized first_user_message values when those are mapped into Thread.preview.

One local active DB snapshot after repopulation showed:

active rows: 158
active title chars: 13,980,313
active first_user_message chars: 14,614,486
active titles >120 chars: 67
active first_user_message >120 chars: 105
active first_user_message >10k chars: 74
max first_user_message: 675,773 chars

The oversized values did not come only from RepoPrompt. Grouped by source:

vscode rows=86 first_user_message_chars=13,981,761 max=675,773 gt120=75 gt10k=54
exec   rows=22 first_user_message_chars=627,261    max=44,430  gt120=22 gt10k=20
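
A sketch of a query that could produce a per-source breakdown like the one above. The column names `source`, `first_user_message`, and `archived` appear in the issue; the function name and the output keys are assumptions, and the real schema may differ.

```python
import sqlite3

STATS_SQL = """
SELECT
  source,
  COUNT(*) AS row_count,
  COALESCE(SUM(LENGTH(first_user_message)), 0) AS first_user_message_chars,
  COALESCE(MAX(LENGTH(first_user_message)), 0) AS max_chars,
  COALESCE(SUM(LENGTH(first_user_message) > 120), 0) AS gt120,
  COALESCE(SUM(LENGTH(first_user_message) > 10000), 0) AS gt10k
FROM threads
WHERE COALESCE(archived,0)=0
GROUP BY source
ORDER BY first_user_message_chars DESC
"""


def first_message_stats(conn):
    """Return one dict per source describing first_user_message sizes
    for active (non-archived) threads."""
    cur = conn.execute(STATS_SQL)
    columns = [description[0] for description in cur.description]
    return [dict(zip(columns, row)) for row in cur]
```

`LENGTH` counts characters for text values in SQLite, so the totals are character counts rather than byte counts, in line with the figures quoted above.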

Direct app-server probe using /Applications/Codex.app/Contents/Resources/codex app-server:

thread/list archived=false modelProviders=[] useStateDbOnly=true  limit=20   2.56s response=4,505,704 bytes
thread/list archived=false modelProviders=[] useStateDbOnly=true  limit=100  7.95s response=14,628,806 bytes
thread/list archived=false modelProviders=[] useStateDbOnly=false limit=100  8.26s response=14,627,363 bytes

This suggests first_user_message should either be bounded as a preview field for list/read summaries, or the full field should not be selected/sent in thread/list.
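
One way to bound the list payload on the read path is to truncate in the query itself. The sketch below assumes the same columns as the queries above. The 200-character preview length, the `preview` alias, and the function names are hypothetical and are not taken from the app-server.

```python
import json
import sqlite3

BOUNDED_LIST_SQL = """
SELECT id,
       substr(title, 1, :title_chars) AS title,
       substr(first_user_message, 1, :preview_chars) AS preview,
       source, cwd, updated_at_ms
FROM threads
WHERE COALESCE(archived,0)=0
ORDER BY updated_at_ms DESC, id DESC
LIMIT :limit
"""


def list_threads_bounded(conn, limit=200, title_chars=120, preview_chars=200):
    """Return list items whose title and preview are truncated by SQLite
    before they reach the process that serializes them."""
    cur = conn.execute(
        BOUNDED_LIST_SQL,
        {"limit": limit, "title_chars": title_chars, "preview_chars": preview_chars},
    )
    columns = [description[0] for description in cur.description]
    return [dict(zip(columns, row)) for row in cur]


def payload_bytes(items):
    """Size of the JSON-encoded list response in bytes."""
    return len(json.dumps(items).encode("utf-8"))
```

This only bounds what the list path reads and sends. The stored values stay unbounded, so it complements rather than replaces bounding at write time.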

Opening large threads is a separate hot path

Direct app-server timings show that metadata-only reads are fast, but full turn reads can be very expensive:

48.7 MB rollout, image-heavy:
thread/read includeTurns=false  62.8 ms     response=850 bytes
thread/read includeTurns=true   11.65 s     response=20,654,619 bytes

45.9 MB rollout, compaction-heavy:
thread/read includeTurns=false  27.8 ms     response=875 bytes
thread/read includeTurns=true   3.39 s      response=6,363,354 bytes

1.4 MB rollout with giant first user message/title/preview:
thread/read includeTurns=false  415.9 ms    response=704,374 bytes
thread/read includeTurns=true   814.0 ms    response=1,414,448 bytes

The large rollouts are not all prompt-import cases. Large contributors included image generation events, compaction payloads, function/tool outputs, MCP tool results, and exec output.
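
A rough way to see which event kinds dominate a rollout is to total the serialized bytes per kind. The sketch below assumes one JSON object per line with a top-level `type` key; that key name is a guess, and the real rollout JSONL layout may nest the event kind differently.

```python
import json
from collections import Counter


def bytes_by_event_type(lines, type_key="type"):
    """Sum UTF-8 line sizes per event kind, largest first.

    `lines` is any iterable of strings, for example an open JSONL file.
    """
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        size = len(line.encode("utf-8"))
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            totals["<unparseable>"] += size
            continue
        if isinstance(record, dict):
            kind = str(record.get(type_key, "<unknown>"))
        else:
            kind = "<non-object>"
        totals[kind] += size
    return totals.most_common()
```

Passing an open file works directly, for example `with open(path, encoding="utf-8") as f: bytes_by_event_type(f)`, where `path` points at a rollout file.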

Separate CDP (Chrome DevTools Protocol) profiling inside Codex Desktop showed the UI layer can still be the bottleneck even when app-server calls are quick:

tested thread switch: about 13.4s to settle in the UI
direct app-server calls for comparable tested thread data: tens of ms
renderer heap spike during switch: >200 MB transient
long tasks during switch: repeated ~1.8-2.0s tasks

So the remaining thread-open UX issue appears to be renderer hydration/reconciliation/allocation, not only DB or app-server I/O.

UX requirement

The fix should keep the current user experience or improve it:

  • switching to a thread should show usable content quickly
  • older turns should continue loading automatically
  • users should not have to click an extra "load older messages" control just to make opening a thread fast
  • full history should remain available for scrollback, search, copy, and context inspection
  • active assistant output and status should not be delayed behind background hydration

Suggested direction

I do not want to over-specify the implementation, but the fixes likely need to cover these areas:

  • enforce a single bounded display-title invariant across all title producers and SQLite write boundaries
  • make reconciliation preserve bounded titles instead of restoring the first-message fallback
  • keep thread/list payloads bounded by treating previews as previews, not full prompt/history fields
  • make initial thread open staged/paged: metadata plus newest visible turns first, older history loaded automatically afterward
  • avoid sending heavyweight replay payloads in the initial UI load when a placeholder/expand-on-demand representation would preserve the UX
  • add performance regression tests or fixtures for large titles, large previews, and large rollout histories
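
The staged/paged open suggested above can be sketched as a generator that yields the newest turns first and then walks backward through older pages. The page sizes and function names are hypothetical, and a real renderer would yield to its event loop between pages instead of looping synchronously.

```python
def staged_turn_pages(turns, first_page=20, page_size=100):
    """Yield slices of `turns` from newest to oldest.

    Each yielded page keeps chronological order internally, so a caller
    can prepend it to what is already displayed.
    """
    if first_page <= 0 or page_size <= 0:
        raise ValueError("page sizes must be positive")
    end = len(turns)
    size = first_page
    while end > 0:
        start = max(0, end - size)
        yield turns[start:end]
        end = start
        size = page_size


def open_thread(turns, first_page=20, page_size=100):
    """Assemble the full history the way a staged UI load would:
    newest page first, older pages prepended automatically."""
    visible = []
    for page in staged_turn_pages(turns, first_page, page_size):
        visible = page + visible
    return visible
```

With 250 turns and the default sizes, the first page holds the newest 20 turns, followed by pages of 100, 100, and 30 older turns. The user sees recent content after the first page, and the full history is still assembled without an extra "load older messages" click.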

Metadata

Labels

  • app: Issues related to the Codex desktop app
  • bug: Something isn't working
  • performance
  • session: Issues involving session (thread) management, resuming, forking, naming, archiving
