Desktop: orphaned codex_chronicle after killed app causes lock-wait loop and high CPU #19516

### What issue are you seeing?

After Codex Desktop was force-killed and restarted, an orphaned `codex_chronicle` process remained alive under `launchd` and continued holding the Chronicle lock. The restarted Desktop app repeatedly spawned a new Chronicle sidecar, which logged `Waiting to acquire lock (owned by pid=...)` over and over while the app/server processes showed high CPU and memory churn.

Killing only the orphaned `codex_chronicle` process immediately removed that lock-wait loop and substantially reduced the runaway CPU behavior. No user threads were archived or deleted.

This seems like a Desktop lifecycle cleanup bug: when the main app is killed or crashes, the Chronicle sidecar can survive as an orphan and block the next app instance.

### Environment

- Codex Desktop: `26.422.30944` build `2080`
- Codex CLI: `codex-cli 0.117.0`
- macOS: `26.0.1 (25A362)`
- Architecture: `arm64`
- Related issue: #19166 appears related to large-thread instability / `Array buffer allocation failed`, but this report is specifically about the orphaned Chronicle lock holder after restart.

### Evidence

Before mitigation, process state included an orphaned Chronicle process from the previous killed app instance:

```text
PID    PPID  STAT  %CPU  RSS    ELAPSED  COMMAND
3430   1     S     0.0   12832  48:59    /Applications/Codex.app/Contents/Resources/codex_chronicle
```
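
A check like the one above can be scripted. The following is a minimal sketch, assuming POSIX `ps` with `-A -o pid=,ppid=,args=` and assuming that "parent PID 1" is a good enough signal for an orphan on this machine. The function names `parse_orphans` and `list_orphans` are hypothetical helpers, not part of Codex.

```python
import subprocess


def parse_orphans(ps_output, name="codex_chronicle"):
    """Return PIDs whose parent is PID 1 and whose executable path ends with `name`.

    Expects lines of the form "<pid> <ppid> <command line>".
    """
    orphans = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, ppid, full command line
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        pid, ppid, command = int(parts[0]), int(parts[1]), parts[2]
        executable = command.split()[0]
        # PPID 1 means the process was reparented to launchd (or init).
        if ppid == 1 and executable.endswith("/" + name):
            orphans.append(pid)
    return orphans


def list_orphans(name="codex_chronicle"):
    """Run `ps` and return candidate orphan PIDs for `name`."""
    result = subprocess.run(
        ["ps", "-A", "-o", "pid=,ppid=,args="],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_orphans(result.stdout, name)
```

Calling `list_orphans()` before restarting the app would list candidate PIDs. A parent PID of 1 alone does not prove the process belongs to a dead app instance, so the result is best treated as a hint to inspect manually.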

The restarted app was repeatedly spawning a new sidecar that could not acquire the lock:

```text
[AppServerConnection] Starting local app-server sidecar argsCount=0 command=/Applications/Codex.app/Contents/Resources/codex_chronicle cwd=null hostId=local
[AppServerConnection] app_server_sidecar_stderr hostId=local message="codex_chronicle starting\nWaiting to acquire lock (owned by pid=3430)..."
[AppServerConnection] app_server_sidecar_stderr hostId=local message="Waiting to acquire lock (owned by pid=3430)..."
[AppServerConnection] Stopping local app-server sidecar hostId=local pid=...
```

The log repeated that lock-wait message approximately every 10 seconds until the orphan was killed.
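
To confirm which PID the sidecars were waiting on, the lock-wait messages can be tallied per owner. This is a rough sketch that only assumes the message text shown in the log excerpt above; `lock_wait_owners` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the stderr message shown in the log excerpt.
LOCK_WAIT = re.compile(r"Waiting to acquire lock \(owned by pid=(\d+)\)")


def lock_wait_owners(log_lines):
    """Count lock-wait messages per owning PID across an iterable of log lines."""
    owners = Counter()
    for line in log_lines:
        for pid in LOCK_WAIT.findall(line):
            owners[int(pid)] += 1
    return owners
```

Passing an open log file to `lock_wait_owners` returns a `Counter` keyed by owner PID. A single PID with a steadily growing count, matching a process parented to `launchd`, would point at the same condition described here.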

Before mitigation, the app had also entered a heavy thread hydration loop in the same restart window:

```text
~660 method=thread/read
~658 method=thread/resume
~649 method=thread/unsubscribe
~656 maybe_resume_success
```

The active workload included several very large local threads and failed memory backfill jobs. This may be a separate stressor, but it amplified the failure:

```text
memory_stage1 ... error ... stream disconnected before completion: Incomplete response returned, reason: max_output_tokens
memory_stage1 ... error ... Codex ran out of room in the model's context window. Start a new thread or clear earlier history before retrying.
```

### Mitigation performed

1. Rotated/deleted large local Codex log files to reduce disk churn.
2. Killed only the stale orphaned `codex_chronicle` process and the old orphaned Crashpad handler from the previous app instance.
3. Marked failed `memory_stage1` retries as exhausted so the background summarizer would not immediately retry oversized failed jobs.

After killing the orphaned Chronicle process:

- No `codex_chronicle` process remained outside the current app tree.
- The `Waiting to acquire lock` loop stopped.
- CPU dropped from runaway levels to a much more normal range.
- Threads remained available; nothing was archived or deleted.

### Expected behavior

If Codex Desktop is killed or crashes, any owned `codex_chronicle` sidecar should either:

- exit with the parent app,
- release its lock reliably,
- or be detected and replaced safely by the next app launch.

The next Desktop launch should not repeatedly spawn sidecars that wait on a lock owned by a stale orphaned process.
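
One way to get the first behavior ("exit with the parent app") is for the sidecar to watch its own parent PID and shut down once it changes. The sketch below illustrates that idea in Python for brevity. It is not how `codex_chronicle` is implemented, and `start_parent_watchdog` and `on_orphaned` are hypothetical names.

```python
import os
import threading


def parent_changed(original_ppid, getppid=os.getppid):
    """True once the process has been reparented (for example to launchd, PID 1)."""
    return getppid() != original_ppid


def start_parent_watchdog(on_orphaned, interval=1.0, getppid=os.getppid):
    """Poll the parent PID in a daemon thread; call on_orphaned() once if it changes.

    Returns (stop_event, thread). Set stop_event to end the watchdog early.
    """
    original_ppid = getppid()
    stop = threading.Event()

    def watch():
        # Event.wait returns False on timeout, True when stop is set.
        while not stop.wait(interval):
            if parent_changed(original_ppid, getppid):
                on_orphaned()
                return

    thread = threading.Thread(target=watch, daemon=True)
    thread.start()
    return stop, thread
```

In a sidecar, `on_orphaned` would release the lock and exit the process. Polling is a simple, portable choice; it trades up to one `interval` of delay for not depending on any platform-specific parent-death notification.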

### Actual behavior

The orphaned `codex_chronicle` survived the killed app, held the lock, and caused the restarted app to repeatedly start/stop a Chronicle sidecar that could not acquire the lock. This correlated with high CPU/log churn and made the app unstable until the orphan was manually killed.

### Possible fix direction

- Ensure `codex_chronicle` is tied to the parent app lifecycle on macOS.
- Store enough metadata with the Chronicle lock to detect stale/orphaned owners.
- On Desktop startup, if the lock owner is parented to `launchd` and no current Codex parent owns it, either reclaim the lock or terminate/restart the sidecar.
- Consider adding a bounded retry/backoff for sidecar startup when the lock is held.
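
The last two suggestions can be combined into a bounded retry that also bails out when the lock owner looks stale. The following is only a sketch of the idea under assumptions: `try_acquire` and `owner_is_stale` are hypothetical callables supplied by the caller, the delays and attempt count are arbitrary, and `pid_alive` relies on POSIX signal semantics.

```python
import os
import time


def pid_alive(pid):
    """True if a process with this PID exists (POSIX: signal 0 only checks existence)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def backoff_delays(base=0.5, cap=30.0, max_attempts=6):
    """Yield exponentially growing delays, capped, for a bounded number of attempts."""
    for attempt in range(max_attempts):
        yield min(cap, base * (2 ** attempt))


def acquire_with_backoff(try_acquire, owner_is_stale, sleep=time.sleep, **backoff):
    """Retry try_acquire() with bounded backoff.

    Returns "acquired", "owner-stale" (caller should reclaim the lock or
    terminate the owner), or "gave-up" once the attempts are exhausted.
    """
    for delay in backoff_delays(**backoff):
        if try_acquire():
            return "acquired"
        if owner_is_stale():
            return "owner-stale"
        sleep(delay)
    return "gave-up"
```

In this report the orphan was still alive, so a liveness check such as `pid_alive` would not have flagged it on its own. An `owner_is_stale` callable would also need the parent-PID check from the third bullet, for example "owner is parented to `launchd` and is not a child of the current app".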
