
fix(seidb-tool): addresses two issues against the FlatKV seidb tooling #3337

Merged
blindchaser merged 5 commits into main from yiren/flatkv-tool
May 5, 2026

Conversation

@blindchaser

Contributor

Summary

This PR addresses two auditor findings against the FlatKV seidb tooling: snapshot clones could silently degrade to multi-GB byte-copies on tmpfs, and a snapshot/WAL race let catchup advance over missing versions without complaint. The fixes pin the tool clone to the source filesystem, validate WAL coverage after the clone, and make catchup enforce strict continuity from committedVersion + 1.

  • sei-db/tools/cmd/seidb/operations/flatkv_open.go: places the temp clone as a sibling of dbDir so os.Link keeps working when $TMPDIR is tmpfs; replaces the EXDEV byte-copy fallback for snapshot files with a fatal error so a misconfigured deployment cannot silently RAM-copy multi-GB snapshots; introduces errSourceChurning and verifyClonedWALCovers to detect the race where a live writer rolls a new snapshot and front-truncates the WAL between our snapshot and changelog clone steps; routes that race through the existing retry loop alongside os.ErrNotExist. Changelog files keep their byte-copy semantics (WAL recovery may rewrite a corrupted tail).
  • sei-db/state_db/sc/flatkv/store_catchup.go: makes catchup reject WAL gaps loudly. When committedVersion > 0, walFirstVer must be <= committedVersion + 1; otherwise catchup returns an error instead of silently jumping forward and corrupting committedLtHash. Adds an inner expectedNext check inside the replay callback so that an internal hole in the WAL (corruption or external surgery) is also rejected. Returns a clean no-op when the WAL is entirely behind committedVersion.

Test plan

  • sei-db/tools/cmd/seidb/operations/flatkv_open_test.go: covers same-filesystem placement of the tooling clone (TestPrepareFlatKVToolingClonePlacesTempDirOnSameFilesystem) and the snapshot/WAL truncation race (TestPrepareFlatKVToolingCloneDetectsWALTruncationRace), asserting that the latter surfaces errSourceChurning so the retry loop can re-select the snapshot. Existing retry-on-ENOENT, current-symlink-missing, and historical-height tests continue to pass.
  • sei-db/state_db/sc/flatkv/store_catchup_test.go: TestCatchupRejectsWALGap builds a v1..v5 store, front-truncates the WAL to start at v4, rewinds committedVersion to v2, and asserts catchup returns a WAL gap error instead of silently advancing. TestCatchupNoOpWhenWALBehindCommittedVersion guards the steady-state case where the WAL is entirely behind committedVersion.
  • Verified with gofmt -s -l (clean) and go test ./sei-db/state_db/... ./sei-db/tools/cmd/seidb/... ./sei-cosmos/storev2/rootmulti/... -run "Flatkv|FlatKV" (all green).

@github-actions

github-actions Bot commented Apr 29, 2026


The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build | Format | Lint | Breaking | Updated (UTC)
✅ passed | ✅ passed | ✅ passed | ✅ passed | May 4, 2026, 5:51 PM

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d824dc6bce

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread sei-db/tools/cmd/seidb/operations/flatkv_open.go Outdated
@codecov

codecov Bot commented May 4, 2026


Codecov Report

❌ Patch coverage is 61.76471% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.13%. Comparing base (4ad975b) to head (6a4388b).

Files with missing lines | Patch % | Lines
sei-db/tools/cmd/seidb/operations/flatkv_open.go | 62.26% | 12 Missing and 8 partials ⚠️
sei-db/state_db/sc/flatkv/store_catchup.go | 60.00% | 3 Missing and 3 partials ⚠️
Additional details and impacted files


@@            Coverage Diff             @@
##             main    #3337      +/-   ##
==========================================
+ Coverage   59.07%   59.13%   +0.05%     
==========================================
  Files        2099     2097       -2     
  Lines      172988   172326     -662     
==========================================
- Hits       102195   101903     -292     
+ Misses      61922    61556     -366     
+ Partials     8871     8867       -4     
Flag | Coverage Δ
sei-chain-pr | 55.00% <61.76%> (?)
sei-db | 70.41% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
sei-db/state_db/sc/flatkv/store_catchup.go | 60.36% <60.00%> (-0.06%) ⬇️
sei-db/tools/cmd/seidb/operations/flatkv_open.go | 65.97% <62.26%> (-1.58%) ⬇️

... and 62 files with indirect coverage changes


Comment on lines +32 to +36
// errSourceChurning marks transient races where the source FlatKV directory
// mutates (snapshot pruned, WAL truncated) between our reads. It is the
// sentinel that prepareFlatKVToolingCloneWith uses to decide whether to
// retry instead of bailing out.
var errSourceChurning = errors.New("flatkv source kept churning during clone")

Contributor


Not a blocker for this PR. As a future task, we should evaluate and update the flatKV snapshot threading model so that such races are not possible. Even if we're safe now, having to think about potential races makes safety a lot harder to reason about.

@blindchaser blindchaser added this pull request to the merge queue May 5, 2026
Merged via the queue into main with commit 3b7e0d3 May 5, 2026
45 of 54 checks passed
@blindchaser blindchaser deleted the yiren/flatkv-tool branch May 5, 2026 14:48

3 participants