
fix: ship fixed SQLite via SQLx 0.9 #24670

Closed
btraut-openai wants to merge 3 commits into main from btraut/sqlx-0.9-sqlite-wal-reset

@btraut-openai
Contributor

Why

Codex uses pooled SQLite connections in WAL mode for runtime state databases. SQLite documents a rare WAL-reset race that can corrupt a database when concurrent writers or checkpointers operate across connections, and the bundled SQLite 3.46.0 currently shipped by Codex is in the affected version range.

This was observed in practice on May 26, 2026: ~/.codex/state_5.sqlite developed structural B-tree corruption after thread and spawned-agent state persistence, eventually preventing Nightly startup. Shipping a fixed SQLite engine prevents new exposure; merely upgrading a binary does not repair a database that is already malformed.

This is an alternative implementation to #24664. It explores taking the upstream SQLx release path rather than vendoring SQLite or maintaining a patched SQLx fork.

What Changed

  • Upgrade the Rust workspace toolchain and the Bazel toolchain selection from 1.93.0 to 1.94.0, which SQLx 0.9.0 requires.
  • Upgrade SQLx from 0.8.6 to 0.9.0 and use sqlite-bundled rather than SQLx 0.9's broader sqlite aggregate feature.
  • Resolve libsqlite3-sys to 0.37.0, whose bundled amalgamation is SQLite 3.51.3, outside the affected WAL-reset range.
  • Adapt the existing state runtime to SQLx 0.9 API changes: preserve new migrator settings, remove obsolete QueryBuilder lifetimes, and explicitly audit fixed-fragment dynamic SQL construction via AssertSqlSafe.
  • Add an assertion on the runtime-linked SQLite version that fails if a vulnerable SQLite implementation is bundled again.
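
The SQLx 0.9 adaptation above audits fixed-fragment dynamic SQL via AssertSqlSafe. The invariant such an audit relies on can be sketched in std-only Rust, without SQLx: SQL text is assembled only from compile-time constant fragments selected through an enum, and every runtime value becomes a ? placeholder to be bound separately. The table and column names (threads, archived, created_at, updated_at) and the build_thread_query helper are hypothetical, not Codex's actual schema, and the sketch does not show AssertSqlSafe's own API.

```rust
/// Hypothetical sort options; each maps to a fixed SQL fragment, never to caller text.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SortColumn {
    CreatedAt,
    UpdatedAt,
}

impl SortColumn {
    /// Returns a compile-time constant column name for this variant.
    fn as_sql(self) -> &'static str {
        match self {
            SortColumn::CreatedAt => "created_at",
            SortColumn::UpdatedAt => "updated_at",
        }
    }
}

/// Builds SQL text from fixed fragments only (hypothetical schema).
/// Runtime values are returned separately as bind parameters for `?` placeholders.
fn build_thread_query(sort: SortColumn, archived_filter: Option<bool>) -> (String, Vec<i64>) {
    let mut sql = String::from("SELECT id FROM threads");
    let mut binds = Vec::new();
    if let Some(archived) = archived_filter {
        // The value is never spliced into the text; only the placeholder is.
        sql.push_str(" WHERE archived = ?");
        binds.push(i64::from(archived));
    }
    sql.push_str(" ORDER BY ");
    sql.push_str(sort.as_sql());
    sql.push_str(" DESC");
    (sql, binds)
}

fn main() {
    let (sql, binds) = build_thread_query(SortColumn::UpdatedAt, Some(false));
    println!("{sql}");
    println!("{binds:?}");
}
```

Because the only caller-controlled inputs are an enum variant and a bound value, an audit only has to confirm that every string appended to the query is a literal, which is the kind of property an AssertSqlSafe marking is meant to document.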

Data And Recovery

This change does not introduce a Codex schema migration or SQLite file-format migration. A healthy existing state database can be opened by the fixed build and remains readable by an older Codex build; rolling back to an older build would, however, expose subsequent writes to the vulnerable SQLite version again.

An already-corrupted state_5.sqlite is not made trustworthy by installing this build. Recovery should continue to back up the database, WAL, and SHM files before rebuilding queryable state from durable rollout JSONL history. The existing CLI recovery flow provides backup-and-rebuild behavior; desktop recovery UX remains separate follow-up work.
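
The backup step described above can be sketched in std-only Rust. This is a hypothetical helper, not the existing CLI recovery flow: it copies the database file plus its -wal and -shm sidecars (SQLite names them by appending those suffixes to the database file name) into a backup directory, skipping sidecars that do not exist. It assumes no Codex process has the database open while copying, and the rebuild from rollout JSONL history is out of scope.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not Codex's recovery code): copies a SQLite database and its
/// "-wal" and "-shm" sidecar files into `backup_dir` before any rebuild touches them.
/// Assumes no process currently has the database open.
fn backup_sqlite_files(db_path: &Path, backup_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let file_name = db_path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?
        .to_string_lossy()
        .into_owned();
    fs::create_dir_all(backup_dir)?;

    let mut copied = Vec::new();
    for suffix in ["", "-wal", "-shm"] {
        let name = format!("{file_name}{suffix}");
        let src = db_path.with_file_name(&name);
        // Sidecars may legitimately be absent (e.g. after a clean shutdown);
        // a missing main database file is left to fs::copy, which reports NotFound.
        if !suffix.is_empty() && !src.exists() {
            continue;
        }
        let dst = backup_dir.join(&name);
        fs::copy(&src, &dst)?;
        copied.push(dst);
    }
    Ok(copied)
}

fn main() -> io::Result<()> {
    // Demo on throwaway files in a per-process temp directory.
    let dir = std::env::temp_dir().join(format!("sqlite-backup-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let db = dir.join("state_5.sqlite");
    fs::write(&db, b"db")?;
    fs::write(dir.join("state_5.sqlite-wal"), b"wal")?;

    let copied = backup_sqlite_files(&db, &dir.join("backup"))?;
    println!("backed up {} file(s): {copied:?}", copied.len());

    fs::remove_dir_all(&dir)?;
    Ok(())
}
```

Copying all three files together matters because committed transactions may still live only in the WAL; backing up the main file alone could lose them.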

Draft Caveat

This branch intentionally does not update .github/workflows/*: GitHub push protection rejects changes to those paths from this branch. If the SQLx 0.9 approach is chosen, CI and release workflow Rust toolchain pins must be updated to 1.94.0 through the authorized workflow-change path before merging.

Verification

  • just test -p codex-state
  • just test -p codex-cli
  • just fix -p codex-state
  • just bazel-lock-update
  • just bazel-lock-check
  • Verified, via the new linked_sqlite_has_wal_reset_bug_fix test, that the SQLite version linked at runtime is not vulnerable.
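
The linked_sqlite_has_wal_reset_bug_fix test itself is not shown in this PR description. As a hedged sketch of the kind of comparison such a check might make, the std-only Rust below parses a SQLite version string and compares it against 3.51.3. That floor is an assumption chosen only because it is the version this PR ships; it is not SQLite's published affected range, and parse_sqlite_version and has_wal_reset_fix are hypothetical names.

```rust
/// Assumed floor for illustration: SQLite 3.51.3, the version bundled by
/// libsqlite3-sys 0.37.0 per this PR. A plain ">=" check like this would also
/// reject any patched release on an older SQLite branch.
const MIN_FIXED_SQLITE: (u32, u32, u32) = (3, 51, 3);

/// Parses a "MAJOR.MINOR.PATCH" string, the format returned by SQLite's
/// sqlite_version() SQL function, into a tuple that compares lexicographically.
fn parse_sqlite_version(version: &str) -> Option<(u32, u32, u32)> {
    let mut parts = version.trim().split('.');
    let major = parts.next()?.parse::<u32>().ok()?;
    let minor = parts.next()?.parse::<u32>().ok()?;
    let patch = parts.next()?.parse::<u32>().ok()?;
    // Anything other than exactly three numeric components is rejected.
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// True only for a parseable version at or above the assumed floor;
/// unparseable input is treated as not fixed.
fn has_wal_reset_fix(version: &str) -> bool {
    parse_sqlite_version(version).is_some_and(|v| v >= MIN_FIXED_SQLITE)
}

fn main() {
    // A real test would read the version from the linked library, for example by
    // running `SELECT sqlite_version()` on a connection; literals stand in here.
    for version in ["3.46.0", "3.51.3"] {
        println!("{version}: has fix = {}", has_wal_reset_fix(version));
    }
}
```

Treating unparseable input as unfixed makes the assertion fail closed, which suits a guard whose job is to catch a regression in the bundled library.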

just fmt completed its Rust formatting step; its unrelated Python SDK uv stage could not write to the sandboxed global cache in the local environment.

@btraut-openai
Contributor Author

This is being handled by other folks who have a better idea of how to test.
