
Desktop: orphaned codex_chronicle after killed app causes lock-wait loop and high CPU #19516

@markmdev


What issue are you seeing?

After Codex Desktop was force-killed and restarted, an orphaned codex_chronicle process remained alive under launchd (PPID 1) and continued to hold the Chronicle lock. The restarted Desktop app repeatedly spawned a new Chronicle sidecar, each of which logged "Waiting to acquire lock (owned by pid=...)" until it was stopped, while the app and server processes showed high CPU usage and memory churn.

Killing only the orphaned codex_chronicle process immediately removed that lock-wait loop and substantially reduced the runaway CPU behavior. No user threads were archived or deleted.

This seems like a Desktop lifecycle cleanup bug: when the main app is killed or crashes, the Chronicle sidecar can survive as an orphan and block the next app instance.

Environment

Evidence

Before mitigation, process state included an orphaned Chronicle process from the previous killed app instance:

PID    PPID  STAT  %CPU  RSS    ELAPSED  COMMAND
3430   1     S     0.0   12832  48:59    /Applications/Codex.app/Contents/Resources/codex_chronicle
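
A minimal sketch of how a process listing like the one above could be checked for orphaned sidecars. It assumes text in the shape produced by `ps -axo pid,ppid,command` and treats PPID 1 (launchd) as the orphan signal, which is a heuristic rather than anything Codex is known to do. The names `Proc`, `parse_ps`, and `orphaned_sidecars` are hypothetical.

```python
import os
from typing import NamedTuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    command: str


def parse_ps(output: str) -> list[Proc]:
    """Parse text shaped like `ps -axo pid,ppid,command` output.

    Lines whose first two fields are not integers (such as the header)
    are skipped.
    """
    procs = []
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            procs.append(Proc(int(parts[0]), int(parts[1]), parts[2]))
    return procs


def orphaned_sidecars(procs: list[Proc], name: str = "codex_chronicle") -> list[Proc]:
    """Return processes with the given executable name that are parented to PID 1."""
    result = []
    for p in procs:
        executable = os.path.basename(p.command.split()[0])
        if p.ppid == 1 and executable == name:
            result.append(p)
    return result
```

The parser only reads text, so the listing can come from a saved diagnostic file as easily as from a live `ps` call.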

The restarted app was repeatedly spawning a new sidecar that could not acquire the lock:

[AppServerConnection] Starting local app-server sidecar argsCount=0 command=/Applications/Codex.app/Contents/Resources/codex_chronicle cwd=null hostId=local
[AppServerConnection] app_server_sidecar_stderr hostId=local message="codex_chronicle starting\nWaiting to acquire lock (owned by pid=3430)..."
[AppServerConnection] app_server_sidecar_stderr hostId=local message="Waiting to acquire lock (owned by pid=3430)..."
[AppServerConnection] Stopping local app-server sidecar hostId=local pid=...

The log repeated that lock-wait message approximately every 10 seconds until the orphan was killed.
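
For triage, the lock owner can be pulled out of log lines like those above and counted. This is a sketch only: the regular expression is derived from the message text quoted in this report, and the helper name `lock_wait_owners` is hypothetical.

```python
import re
from collections import Counter

# Pattern taken from the stderr message quoted in this report.
LOCK_WAIT = re.compile(r"Waiting to acquire lock \(owned by pid=(\d+)\)")


def lock_wait_owners(log_lines):
    """Count lock-wait messages per owning PID across an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        for match in LOCK_WAIT.finditer(line):
            counts[int(match.group(1))] += 1
    return counts
```

A single PID with a steadily growing count, as with pid=3430 here, points at one long-lived owner rather than ordinary contention between short-lived sidecars.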

Before mitigation, the app had also entered a heavy thread hydration loop in the same restart window:

~660 method=thread/read
~658 method=thread/resume
~649 method=thread/unsubscribe
~656 maybe_resume_success

The active workload included several very large local threads and failed memory backfill jobs. This may be a separate stressor, but it amplified the failure:

memory_stage1 ... error ... stream disconnected before completion: Incomplete response returned, reason: max_output_tokens
memory_stage1 ... error ... Codex ran out of room in the model's context window. Start a new thread or clear earlier history before retrying.

Mitigation performed

  1. Rotated/deleted large local Codex log files to reduce disk churn.
  2. Killed only the stale orphaned codex_chronicle process and the old orphaned Crashpad handler from the previous app instance.
  3. Marked failed memory_stage1 retries as exhausted so the background summarizer would not immediately retry oversized failed jobs.

After killing the orphaned Chronicle process:

  • No codex_chronicle process remained outside the current app tree.
  • The "Waiting to acquire lock" loop stopped.
  • CPU dropped from runaway levels to a much more normal range.
  • Threads remained available; nothing was archived or deleted.

Expected behavior

If Codex Desktop is killed or crashes, any owned codex_chronicle sidecar should either:

  • exit with the parent app,
  • release its lock reliably,
  • or be detected and replaced safely by the next app launch.

The next Desktop launch should not repeatedly spawn sidecars that wait on a lock owned by a stale orphaned process.
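
One way to get the "exit with the parent app" behavior is for the sidecar to watch its own parent PID and shut down once it has been reparented. The sketch below shows the technique in Python purely for illustration. It is not how codex_chronicle is implemented, and `watch_parent` and its parameters are hypothetical.

```python
import os
import threading


def watch_parent(original_ppid, on_orphaned, poll_seconds=1.0,
                 getppid=os.getppid, stop=None):
    """Poll the parent PID and call on_orphaned() once it changes.

    Returns True if the parent changed (for example, the process was
    reparented to launchd), or False if `stop` was set first.
    """
    if stop is None:
        stop = threading.Event()
    # Event.wait returns False on timeout, so the loop runs once per poll
    # interval until the stop event is set.
    while not stop.wait(poll_seconds):
        if getppid() != original_ppid:
            on_orphaned()
            return True
    return False
```

In a real sidecar, `on_orphaned` would release the lock and exit. Polling is the portable option; platform-specific parent-death notifications would avoid the polling delay.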

Actual behavior

The orphaned codex_chronicle survived the killed app, held the lock, and caused the restarted app to repeatedly start/stop a Chronicle sidecar that could not acquire the lock. This correlated with high CPU/log churn and made the app unstable until the orphan was manually killed.

Possible fix direction

  • Ensure codex_chronicle is tied to the parent app lifecycle on macOS.
  • Store enough metadata with the Chronicle lock to detect stale/orphaned owners.
  • On Desktop startup, if the lock owner is parented to launchd and no current Codex parent owns it, either reclaim the lock or terminate/restart the sidecar.
  • Consider adding a bounded retry/backoff for sidecar startup when the lock is held.
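
The lock-metadata and backoff ideas above can be sketched as follows. Everything here is an assumption for illustration: the metadata fields `owner_pid` and `app_pid`, the function names, and the delay values are hypothetical and do not describe the actual Chronicle lock format.

```python
import os
import time


def pid_alive(pid):
    """Return True if a process with this PID exists (signal 0 probes only)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def lock_is_stale(meta, current_app_pid, pid_alive=pid_alive):
    """Decide whether a lock described by `meta` can be reclaimed.

    `meta` is a hypothetical record: {"owner_pid": int, "app_pid": int}.
    """
    if not pid_alive(meta["owner_pid"]):
        return True
    # The owner is alive, but the app instance that launched it is gone.
    return meta["app_pid"] != current_app_pid and not pid_alive(meta["app_pid"])


def backoff_delays(base=0.5, factor=2.0, cap=30.0, max_attempts=6):
    """Yield a bounded series of exponentially growing delays in seconds."""
    delay = base
    for _ in range(max_attempts):
        yield min(delay, cap)
        delay *= factor


def acquire_with_backoff(try_acquire, delays, sleep=time.sleep):
    """Call try_acquire() until it succeeds or the delays run out."""
    for delay in delays:
        if try_acquire():
            return True
        sleep(delay)
    return False
```

PIDs can be reused, so a production version would likely also record a process start time or a per-launch token in the lock metadata. The bounded delay series means a held lock produces a small, finite number of attempts instead of an open-ended spawn loop.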

    Labels

    app: Issues related to the Codex desktop app
    app-server: Issues involving app server protocol or interfaces
    bug: Something isn't working
    performance
