What issue are you seeing?
After Codex Desktop was force-killed and restarted, an orphaned codex_chronicle process remained alive under launchd and continued holding the Chronicle lock. The restarted Desktop app repeatedly spawned a new Chronicle sidecar, which logged Waiting to acquire lock (owned by pid=...) over and over while the app/server processes showed high CPU and memory churn.
Killing only the orphaned codex_chronicle process immediately removed that lock-wait loop and substantially reduced the runaway CPU behavior. No user threads were archived or deleted.
This seems like a Desktop lifecycle cleanup bug: when the main app is killed or crashes, the Chronicle sidecar can survive as an orphan and block the next app instance.
Evidence
Before mitigation, process state included an orphaned Chronicle process from the previous killed app instance:
PID PPID STAT %CPU RSS ELAPSED COMMAND
3430 1 S 0.0 12832 48:59 /Applications/Codex.app/Contents/Resources/codex_chronicle
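A minimal sketch of how the orphan check above could be automated. It assumes the input is text from `ps -axo pid,ppid,comm` and that a `codex_chronicle` process whose parent is PID 1 (launchd) is an orphan. The function name `find_orphans` is hypothetical and is not part of Codex.

```python
def find_orphans(ps_output: str, name: str = "codex_chronicle") -> list[int]:
    """Return PIDs whose parent is PID 1 and whose command path ends with `name`.

    Expects three whitespace-separated columns per row: PID, PPID, COMMAND.
    Assumption: a sidecar reparented to PID 1 has outlived its app instance.
    """
    orphans = []
    for line in ps_output.splitlines():
        # Split into at most three fields so a command path with spaces survives.
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, ppid, command = parts
        # The header row ("PID PPID COMM") fails this check and is skipped.
        if not (pid.isdigit() and ppid.isdigit()):
            continue
        if int(ppid) == 1 and command.strip().endswith(name):
            orphans.append(int(pid))
    return orphans
```

Feeding it the process table from this report would flag PID 3430. A sidecar that is still parented to the running app is not reported.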
The restarted app was repeatedly spawning a new sidecar that could not acquire the lock:
[AppServerConnection] Starting local app-server sidecar argsCount=0 command=/Applications/Codex.app/Contents/Resources/codex_chronicle cwd=null hostId=local
[AppServerConnection] app_server_sidecar_stderr hostId=local message="codex_chronicle starting\nWaiting to acquire lock (owned by pid=3430)..."
[AppServerConnection] app_server_sidecar_stderr hostId=local message="Waiting to acquire lock (owned by pid=3430)..."
[AppServerConnection] Stopping local app-server sidecar hostId=local pid=...
The log repeated that lock-wait message approximately every 10 seconds until the orphan was killed.
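To confirm which process is blocking the sidecar, the lock-wait messages can be tallied per owner PID. This is a rough sketch that assumes the log lines look like the excerpts above. The helper name `lock_wait_owners` is hypothetical.

```python
import re
from collections import Counter

# Matches the message text shown in the sidecar stderr excerpts above.
LOCK_WAIT = re.compile(r"Waiting to acquire lock \(owned by pid=(\d+)\)")

def lock_wait_owners(log_lines):
    """Count lock-wait messages per owning PID across the given log lines."""
    counts = Counter()
    for line in log_lines:
        for pid in LOCK_WAIT.findall(line):
            counts[int(pid)] += 1
    return counts
```

A single PID with a steadily growing count, as with pid=3430 here, points to one stale owner rather than ordinary contention between live sidecars.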
Before mitigation, the app had also entered a heavy thread hydration loop in the same restart window:
~660 method=thread/read
~658 method=thread/resume
~649 method=thread/unsubscribe
~656 maybe_resume_success
The active workload included several very large local threads and failed memory backfill jobs. This may be a separate stressor, but it amplified the failure:
memory_stage1 ... error ... stream disconnected before completion: Incomplete response returned, reason: max_output_tokens
memory_stage1 ... error ... Codex ran out of room in the model's context window. Start a new thread or clear earlier history before retrying.
Mitigation performed
- Rotated/deleted large local Codex log files to reduce disk churn.
- Killed only the stale orphaned codex_chronicle process and the old orphaned Crashpad handler from the previous app instance.
- Marked failed memory_stage1 retries as exhausted so the background summarizer would not immediately retry oversized failed jobs.
After killing the orphaned Chronicle process:
- No codex_chronicle process remained outside the current app tree.
- The Waiting to acquire lock loop stopped.
- CPU dropped from runaway levels to a much more normal range.
- Threads remained available; nothing was archived or deleted.
Expected behavior
If Codex Desktop is killed or crashes, any owned codex_chronicle sidecar should either:
- exit with the parent app,
- release its lock reliably,
- or be detected and replaced safely by the next app launch.
The next Desktop launch should not repeatedly spawn sidecars that wait on a lock owned by a stale orphaned process.
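One way a sidecar could "exit with the parent app" is to poll its own parent PID and shut down once it changes. The sketch below only illustrates that idea and is not how codex_chronicle is implemented. `watch_parent` and its parameters are hypothetical, and the callbacks are injectable so the loop can be exercised without real processes.

```python
import os
import time

def watch_parent(original_ppid, on_orphaned, poll_seconds=1.0,
                 get_ppid=os.getppid, sleep=time.sleep):
    """Block until the parent PID differs from `original_ppid`, then call `on_orphaned`.

    On POSIX systems a process whose parent dies is reparented (to PID 1 on
    macOS), so a changed parent PID is treated here as "the app is gone".
    """
    while get_ppid() == original_ppid:
        sleep(poll_seconds)
    # The parent changed: release the lock and exit in a real sidecar.
    on_orphaned()
```

A sidecar would record `os.getppid()` at startup and run this loop in a background thread. The `on_orphaned` callback would release the lock before exiting.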
Actual behavior
The orphaned codex_chronicle survived the killed app, held the lock, and caused the restarted app to repeatedly start/stop a Chronicle sidecar that could not acquire the lock. This correlated with high CPU/log churn and made the app unstable until the orphan was manually killed.
Possible fix direction
- Ensure codex_chronicle is tied to the parent app lifecycle on macOS.
- Store enough metadata with the Chronicle lock to detect stale/orphaned owners.
- On Desktop startup, if the lock owner is parented to launchd and no current Codex parent owns it, either reclaim the lock or terminate/restart the sidecar.
- Consider adding a bounded retry/backoff for sidecar startup when the lock is held.
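The lock-metadata and backoff ideas above could look roughly like this. Everything in the sketch is an assumption made for illustration: the `LockInfo` fields, the helper names, and the choice to record the spawning app's PID instead of inspecting the owner's current parent. It does not describe the actual Chronicle lock format.

```python
import os
from dataclasses import dataclass

@dataclass
class LockInfo:
    owner_pid: int  # sidecar process holding the lock
    app_pid: int    # Desktop instance that spawned that sidecar (hypothetical field)

def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX signal-0 probe)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks existence without delivering a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

def owner_is_stale(info: LockInfo, alive=pid_alive) -> bool:
    """Treat the lock as stale if the owner, or the app that spawned it, is gone."""
    return not alive(info.owner_pid) or not alive(info.app_pid)

def backoff_delays(base: float = 1.0, cap: float = 60.0, attempts: int = 6) -> list[float]:
    """Exponential delays in seconds, capped at `cap`, for a bounded number of attempts."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

With the defaults, `backoff_delays()` yields 1, 2, 4, 8, 16, and 32 seconds, after which the launcher would stop retrying and report the failure. PIDs can be reused, so a real check would likely also compare the process start time.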
Environment
- 26.422.30944
- build 2080
- codex-cli 0.117.0
- 26.0.1 (25A362)
- arm64
- Array buffer allocation failed, but this report is specifically about the orphaned Chronicle lock holder after restart.