
draft: choose fixed SQLite dependency path for WAL-reset corruption #24664

Closed
btraut-openai wants to merge 3 commits into main from btraut/fix-sqlite-wal-reset-corruption

@btraut-openai btraut-openai commented May 26, 2026

Status: Blocked, Do Not Merge

Codex runtime state databases use pooled SQLite connections in WAL mode. SQLite documents a rare WAL-reset corruption bug, affecting releases through 3.51.2, that can occur when separate connections write or checkpoint concurrently. Shipped Codex Nightly/Alpha binaries bundle SQLite 3.46.0, which is vulnerable.

Earlier today, my ~/.codex/state_5.sqlite encountered structural B-tree corruption after thread and spawned-agent state persistence, eventually preventing startup. A fixed engine is needed to prevent future exposure; installing one will not make an already-corrupted database trustworthy.

This draft intentionally does not implement the fix. Its only code diff is a placeholder at the intended dependency-patch location while ownership of a pinned external dependency fork is decided.

Dependency Decision Needed

The narrow implementation path is a pinned external patch of sqlx-sqlite 0.8.6 that widens its libsqlite3-sys dependency constraint to admit official libsqlite3-sys 0.37.0, which bundles fixed SQLite 3.51.3. Upstream SQLx made this dependency relaxation in commit f5cdf3316d12ba0530486b4722a4114608fa1c84, but it is only published on the SQLx 0.9 line.

Unfortunately, there is no non-alpha sqlx-sqlite release that depends on a libsqlite3-sys bundling SQLite 3.51.3 or later, so this draft is paused while I find the cleanest place to patch that constraint. I'm open to alternatives if there's something cleaner.

Decision needed: which trusted repository should own the minimal sqlx-sqlite 0.8.6 fork and pinned revision used by Codex?

Rejected options:

  • Vendoring libsqlite3-sys or the SQLite amalgamation in Codex: it puts roughly 20 MB of third-party generated source in this repository and requires large-blob exceptions.
  • Updating to SQLx 0.9.0: it requires Rust 1.94.0, while Codex's current Bazel toolchain is Rust 1.93.0, and it broadens the change beyond the SQLite fix.

Implementation After Decision

Once the fork owner is chosen, this draft should be replaced with the real change:

  • Add the pinned external sqlx-sqlite 0.8.6 override and resolve official fixed libsqlite3-sys.
  • Add a codex-state regression guard that queries the actually linked sqlite_version() and rejects vulnerable releases.
  • Refresh Cargo and Bazel locks and verify built codex and codex-app-server artifacts embed fixed SQLite.
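The regression guard described above could be built around a small version comparison. The following is only a sketch under stated assumptions: the function names are hypothetical, the version string is assumed to come from running `SELECT sqlite_version()` on a pooled connection (that query is not shown here), and the threshold follows this PR's statement that 3.51.3 is the fixed release. It does not account for any back-ported patch releases on older branches.

```rust
/// First SQLite release this PR treats as fixed (assumption taken from the PR text).
const FIRST_FIXED_SQLITE: (u32, u32, u32) = (3, 51, 3);

/// Parse a "major.minor.patch" string, as returned by SQLite's
/// `sqlite_version()`, into a comparable tuple. Extra components are ignored.
fn parse_sqlite_version(version: &str) -> Option<(u32, u32, u32)> {
    let mut parts = version.trim().split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    Some((major, minor, patch))
}

/// Return true only when the version parses and is at or above the fixed
/// release. Unparseable strings are rejected so the guard fails closed.
fn is_fixed_sqlite(version: &str) -> bool {
    match parse_sqlite_version(version) {
        // Tuples compare lexicographically: major, then minor, then patch.
        Some(v) => v >= FIRST_FIXED_SQLITE,
        None => false,
    }
}
```

A codex-state test would pass the string returned by the linked engine to `is_fixed_sqlite` and fail the build when it returns false, so the check exercises the SQLite that is actually linked rather than the version named in a lockfile.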

Recovery And Rollout

Healthy existing databases require no schema migration; this is an engine fix, not a Codex schema change.

Already-corrupted databases require backup-and-rebuild recovery. The CLI flow preserves the runtime SQLite files plus their WAL/SHM sidecars before rebuilding queryable thread state from rollout JSONL. State that exists only in the database may not be recoverable. The standalone app-server/Desktop startup path does not currently expose an equivalent visible recovery flow, so that UX requires focused follow-up alongside the Nightly/Alpha rollout and corruption telemetry.
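As an illustration of the "preserve the database plus WAL/SHM sidecars" step, here is a minimal sketch. It is not the CLI's actual recovery code: the function names and the backup directory layout are hypothetical, and it assumes no connection has the database open while the files are copied. The only fact relied on is SQLite's naming convention, in which the sidecars are the database filename with `-wal` and `-shm` appended.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Append a suffix to the full file name (not the extension), e.g.
/// "state_5.sqlite" + "-wal" -> "state_5.sqlite-wal".
fn with_suffix(db: &Path, suffix: &str) -> PathBuf {
    let mut name: OsString = db.as_os_str().to_os_string();
    name.push(suffix);
    PathBuf::from(name)
}

/// The database file followed by its WAL and shared-memory sidecars.
fn sidecar_paths(db: &Path) -> [PathBuf; 3] {
    [db.to_path_buf(), with_suffix(db, "-wal"), with_suffix(db, "-shm")]
}

/// Copy the database and whichever sidecars exist into `backup_dir`,
/// returning the destination paths that were written.
fn backup_with_sidecars(db: &Path, backup_dir: &Path) -> io::Result<Vec<PathBuf>> {
    fs::create_dir_all(backup_dir)?;
    let mut copied = Vec::new();
    for src in sidecar_paths(db) {
        // Sidecars are absent after a clean shutdown; skip missing files.
        if !src.exists() {
            continue;
        }
        if let Some(file_name) = src.file_name() {
            let dest = backup_dir.join(file_name);
            fs::copy(&src, &dest)?;
            copied.push(dest);
        }
    }
    Ok(copied)
}
```

Copying all three files together matters because a WAL file can hold committed pages that have not yet been checkpointed into the main database file.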

Placeholder Verification

  • Confirmed the proposed merge diff contains no vendored SQLite or libsqlite3-sys source.
  • Ran just bazel-lock-check.
  • Ran python3 .github/scripts/verify_cargo_workspace_manifests.py.
  • Ran the repository blob-size policy against the proposed tree.

Codex uses pooled WAL-mode SQLite state databases, and the bundled SQLite 3.46.0 is affected by the documented WAL-reset corruption race. Bundle fixed SQLite 3.51.3 through a pinned libsqlite3-sys override and guard the linked runtime version in codex-state tests.
@btraut-openai btraut-openai marked this pull request as ready for review May 26, 2026 23:45
The vendored dependency source is not a Codex workspace crate, and its official amalgamation assets necessarily exceed the default blob budget. Keep the pinned source auditable under third_party and allowlist the required large upstream artifacts.
@btraut-openai btraut-openai marked this pull request as draft May 26, 2026 23:49
Do not carry SQLite amalgamation source in Codex. Leave the draft blocked on selecting an owner for a pinned external sqlx-sqlite 0.8.6 patch that can admit fixed libsqlite3-sys releases.
@btraut-openai btraut-openai changed the title from "fix(state): bundle SQLite with WAL-reset corruption fix" to "draft: choose fixed SQLite dependency path for WAL-reset corruption" May 26, 2026
@btraut-openai
Contributor Author

Rather than patching, it likely makes more sense to try to bump SQLx and Rust versions.
