fix(seidb-tool): addresses two issues against the FlatKV seidb tooling#3337
Conversation
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d824dc6bce
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3337 +/- ##
==========================================
+ Coverage 59.07% 59.13% +0.05%
==========================================
Files 2099 2097 -2
Lines 172988 172326 -662
==========================================
- Hits 102195 101903 -292
+ Misses 61922 61556 -366
+ Partials 8871 8867 -4
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
| // errSourceChurning marks transient races where the source FlatKV directory | ||
| // mutates (snapshot pruned, WAL truncated) between our reads. It is the | ||
| // sentinel that prepareFlatKVToolingCloneWith uses to decide whether to | ||
| // retry instead of bailing out. | ||
| var errSourceChurning = errors.New("flatkv source kept churning during clone") |
There was a problem hiding this comment.
Not a blocker for this PR. As a future task, we should evaluate and update the flatKV snapshot threading model so that such races are not possible. Even if we're safe now, having to think about potential races makes safety a lot harder to reason about.
Summary
This PR addresses two auditor findings against the FlatKV seidb tooling: snapshot clones could silently degrade to multi-GB byte-copies on tmpfs, and a snapshot/WAL race let
catchupadvance over missing versions without complaint. The fixes pin the tool clone to the source filesystem, validate WAL coverage after the clone, and makecatchupenforce strict continuity fromcommittedVersion + 1.sei-db/tools/cmd/seidb/operations/flatkv_open.go: places the temp clone as a sibling ofdbDirsoos.Linkkeeps working when$TMPDIRis tmpfs; replaces the EXDEV byte-copy fallback for snapshot files with a fatal error so a misconfigured deployment cannot silently RAM-copy multi-GB snapshots; introduceserrSourceChurningandverifyClonedWALCoversto detect the race where a live writer rolls a new snapshot and front-truncates the WAL between our snapshot and changelog clone steps; routes that race through the existing retry loop alongsideos.ErrNotExist. Changelog files keep their byte-copy semantics (WAL recovery may rewrite a corrupted tail).sei-db/state_db/sc/flatkv/store_catchup.go: makescatchupreject WAL gaps loudly. WhencommittedVersion > 0,walFirstVermust be<= committedVersion + 1or catchup returns an error instead of silently jumping forward and corruptingcommittedLtHash. Adds an innerexpectedNextcheck inside the replay callback so an internal hole in the WAL (corruption or external surgery) is also rejected. Returns clean no-op when the WAL is entirely behindcommittedVersion.Test plan
sei-db/tools/cmd/seidb/operations/flatkv_open_test.go: covers same-filesystem placement of the tooling clone (TestPrepareFlatKVToolingClonePlacesTempDirOnSameFilesystem) and the snapshot/WAL truncation race (TestPrepareFlatKVToolingCloneDetectsWALTruncationRace), asserting that the latter surfaceserrSourceChurningso the retry loop can re-select the snapshot. Existing retry-on-ENOENT, current-symlink-missing, and historical-height tests continue to pass.sei-db/state_db/sc/flatkv/store_catchup_test.go:TestCatchupRejectsWALGapbuilds a v1..v5 store, front-truncates the WAL to start at v4, rewindscommittedVersionto v2, and asserts catchup returns aWAL gaperror instead of silently advancing.TestCatchupNoOpWhenWALBehindCommittedVersionguards the steady-state case where the WAL is entirely behindcommittedVersion.gofmt -s -l(clean) andgo test ./sei-db/state_db/... ./sei-db/tools/cmd/seidb/... ./sei-cosmos/storev2/rootmulti/... -run "Flatkv|FlatKV"(all green).