Skip to content

fix(depot): fix migrating v1 -> v2 databases#5183

Open
NathanFlurry wants to merge 1 commit into
stack/docs-rivetkit-add-rust-quickstart-preview-lpzwyzmwfrom
stack/fix-depot-fix-migrating-v1-v2-databases-mxwxqyss
Open

fix(depot): fix migrating v1 -> v2 databases#5183
NathanFlurry wants to merge 1 commit into
stack/docs-rivetkit-add-rust-quickstart-preview-lpzwyzmwfrom
stack/fix-depot-fix-migrating-v1-v2-databases-mxwxqyss

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

No description provided.

@NathanFlurry
Copy link
Copy Markdown
Member Author

Stack for rivet-dev/rivet

Check out this stack: jj-stack get https://github.com/rivet-dev/rivet/pull/5183
Pull/update this stack: jj-stack sync
Publish local edits: jj-stack submit
Merge when ready: jj-stack merge

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Jun 5, 2026

Code Review

Summary

This PR changes the v1->v2 SQLite migration from a hard-fail-on-sidecar strategy to a best-effort approach that logs and abandons sidecars. The change unblocks actors that crashed mid-transaction from being permanently stuck on v1.

Data integrity consideration for crash-mid-commit

The journal test case exercises the scenario where the actor is killed before committing — in that case the main file still holds the last committed state, and abandoning the journal is safe.

However, DELETE-mode SQLite also creates a journal when committing: it backs up original pages to the journal first, then writes new pages to the main file, then deletes the journal. If the actor process crashes between those two steps (journal exists, main file has partially new pages), the main file is in a torn state. Importing it as-is would produce a corrupted v2 database with a mix of old and new pages.

The original code blocked migration in this case rather than risk importing a corrupt file. The new code accepts it. This is probably the right trade-off operationally (stuck actors are worse than a rare corrupt-on-disk state), but the existing comment in read_v1_main says "best-effort" without calling out this specific risk. A more explicit note would help future readers:

// Best-effort: import the main file as-is, discarding any sidecar. For
// the pre-commit crash case the main file reflects the last committed
// state. For the rare mid-commit crash the main file may be partially
// torn; the actor restarts with a corrupted database rather than being
// permanently blocked on v1.

Orphaned sidecar data

abandon_v1_sidecar_if_exists logs and increments a metric but does not delete the sidecar KV entries. After migration the sidecar bytes remain in the actor's KV namespace alongside the new v2 keys. If downstream cleanup already sweeps v1 KV after migration this is fine, but if not, these keys persist indefinitely. Worth a follow-up task if cleanup is not handled elsewhere.

Redundant get in read_v1_file

After the new v1_file_exists fast-path (meta get -> list fallback), read_v1_file issues its own get on the same meta key and then re-calls v1_file_exists when that get returns empty. For sidecars that exist this means two get calls on the same key. Pre-existing pattern and not a bug, but worth noting if the migration path is ever on a hot path.

Minor: WAL test fixture uses empty bytes

The updated WAL test passes &[] where the old test used b"unexpected wal bytes". Using empty bytes still triggers the code path, but the non-empty bytes were a slightly more realistic fixture. Not a correctness issue.

Positives

  • Replacing the three ensure! / bail! guards with a single abandon_v1_sidecar_if_exists helper is a clean refactor.
  • Renaming the metric from _rejected_journal_total (single-type IntCounter) to _abandoned_sidecar_total (IntCounterVec with a sidecar label) is the right shape — one counter now covers all three sidecar types.
  • Updated tests verify the actual migrated content instead of just checking for an error, which is a meaningfully better assertion.
  • Tracing actor_id and sidecar type at warn level gives enough signal to triage orphaned sidecars in production.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant