feat: make snapshot prune async#2814

Merged
blindchaser merged 3 commits into main from yiren/main-prune-async
Feb 5, 2026
Conversation

@blindchaser
@blindchaser blindchaser commented Feb 5, 2026

  • Run pruneSnapshots() asynchronously (go db.pruneSnapshots()) to avoid blocking Commit
  • Use TryLock instead of Lock so overlapping prunes are skipped rather than queued
  • Add logging for prune start/skip/completion with duration
  • Add idempotent Close() guard to prevent double-close
  • Update test to use Eventually on directory state
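
The skip-instead-of-queue behaviour listed above can be sketched as follows. This is a minimal illustration, not the code from this PR: the `DB` type, the `pruneAsync` helper, the counters, and the `work` callback are hypothetical stand-ins, and only `sync.Mutex.TryLock` (Go 1.18+) is the real primitive being demonstrated.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// DB is a hypothetical stand-in for the real database type; it carries only
// what is needed to illustrate the locking pattern.
type DB struct {
	pruneSnapshotLock sync.Mutex
	ran               atomic.Int64   // prunes that actually executed
	skipped           atomic.Int64   // prunes skipped because one was running
	wg                sync.WaitGroup // demo only: lets the caller wait for goroutines
}

// pruneSnapshots runs at most one prune at a time. If another prune holds
// the lock, the call returns immediately instead of waiting for it.
func (db *DB) pruneSnapshots(work func()) {
	if !db.pruneSnapshotLock.TryLock() {
		db.skipped.Add(1)
		return
	}
	defer db.pruneSnapshotLock.Unlock()
	work()
	db.ran.Add(1)
}

// pruneAsync starts a prune in its own goroutine so the caller is not blocked.
func (db *DB) pruneAsync(work func()) {
	db.wg.Add(1)
	go func() {
		defer db.wg.Done()
		db.pruneSnapshots(work)
	}()
}

// runDemo starts one slow prune and attempts a second while the first still
// holds the lock, then reports how many prunes ran and how many were skipped.
func runDemo() (ran, skipped int64) {
	db := &DB{}
	started := make(chan struct{})
	release := make(chan struct{})
	db.pruneAsync(func() {
		close(started) // the lock is held at this point
		<-release
	})
	<-started
	db.pruneSnapshots(func() {}) // overlaps with the first prune, so it is skipped
	close(release)
	db.wg.Wait()
	return db.ran.Load(), db.skipped.Load()
}

func main() {
	ran, skipped := runDemo()
	fmt.Printf("ran=%d skipped=%d\n", ran, skipped)
}
```

The trade-off is that a skipped prune is simply dropped; nothing in this sketch schedules a retry.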

Describe your changes and provide context

Testing performed to validate your change

- Change pruneSnapshots() to run asynchronously (go db.pruneSnapshots())
- Replace pruneSnapshotLock mutex with atomic.Bool using CompareAndSwap
- Add logging for prune start/completion with duration
- Update Close() to wait for ongoing prune before shutdown
- Update test to use Eventually with pruningInProgress.Load()
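
For comparison, the CAS-based variant in the list above might look roughly like this. It is a sketch under assumptions: the `pruner` type and the hand-rolled `eventually` helper are hypothetical (the helper only imitates the polling style of an Eventually assertion using the standard library), while `atomic.Bool` and `CompareAndSwap` are real `sync/atomic` APIs (Go 1.19+).

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// eventually polls cond every tick until it returns true or waitFor elapses.
// Hypothetical helper that imitates an Eventually-style test assertion.
func eventually(cond func() bool, waitFor, tick time.Duration) bool {
	deadline := time.Now().Add(waitFor)
	for {
		if cond() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(tick)
	}
}

// pruner is a hypothetical type holding only the in-progress flag.
type pruner struct {
	pruningInProgress atomic.Bool
}

// prune runs work only if no other prune is in progress.
// It reports whether the work was executed.
func (p *pruner) prune(work func()) bool {
	// CAS: flip false -> true; failure means another prune is running.
	if !p.pruningInProgress.CompareAndSwap(false, true) {
		return false
	}
	defer p.pruningInProgress.Store(false)
	work()
	return true
}

// runCASDemo starts a background prune and polls until the flag clears.
func runCASDemo() bool {
	p := &pruner{}
	started := make(chan struct{})
	go p.prune(func() {
		close(started)
		time.Sleep(20 * time.Millisecond) // simulated prune work
	})
	<-started
	return eventually(func() bool {
		return !p.pruningInProgress.Load()
	}, 2*time.Second, 5*time.Millisecond)
}

func main() {
	fmt.Println("prune finished within timeout:", runCASDemo())
}
```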
@github-actions
github-actions bot commented Feb 5, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Feb 5, 2026, 9:54 PM |

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 353c42d82f

Comment on lines +517 to +520
    // CAS: only one prune can run at a time
    if !db.pruningInProgress.CompareAndSwap(false, true) {
        db.logger.Info("pruneSnapshots skipped, previous prune still in progress")
        return

P2: Ensure prune requests aren’t dropped

The new CAS guard returns early when a prune is already running. That means a snapshot rewrite that finishes while a prune is in progress will log “skipped” and never schedule a follow‑up prune. Because pruneSnapshots() bases its retention on currentVersion at start, any snapshots created after it begins won’t be considered, leaving more than snapshotKeepRecent snapshots on disk until the next rewrite happens. If no further rewrite occurs, stale snapshots are never pruned (regression versus the previous mutex, which queued prunes). Consider queuing a rerun (e.g., loop until no more pending work) or scheduling another prune after the current one completes.
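
One way to realise the "queue a rerun" idea from this comment is a pending flag that the running prune re-checks before it gives up the lock. The sketch below is an assumption-laden illustration, not what the PR merged: the `pruner` type, `requestPrune`, and the `work` callback are all hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pruner is a hypothetical type: mu serialises prunes, pending records that
// at least one prune request arrived and has not been served yet.
type pruner struct {
	mu      sync.Mutex
	pending atomic.Bool
}

// requestPrune never blocks behind a running prune, but it also never drops
// a request: the lock holder keeps rerunning work while pending is set.
func (p *pruner) requestPrune(work func()) {
	p.pending.Store(true)
	for {
		if !p.mu.TryLock() {
			return // the current holder will observe pending and rerun
		}
		for p.pending.Swap(false) {
			work()
		}
		p.mu.Unlock()
		// A request may have arrived between the last Swap and Unlock;
		// if so, try to take the lock again and serve it.
		if !p.pending.Load() {
			return
		}
	}
}

// runPendingDemo issues a second request while the first prune is running
// and returns how many times the work function executed.
func runPendingDemo() int64 {
	p := &pruner{}
	started := make(chan struct{})
	release := make(chan struct{})
	var once sync.Once
	var runs atomic.Int64
	work := func() {
		runs.Add(1)
		once.Do(func() { // only the first run blocks
			close(started)
			<-release
		})
	}
	done := make(chan struct{})
	go func() {
		defer close(done)
		p.requestPrune(work)
	}()
	<-started
	p.requestPrune(work) // lock is held: returns at once, leaving pending set
	close(release)
	<-done
	return runs.Load()
}

func main() {
	fmt.Println("prune runs:", runPendingDemo())
}
```

Requests that arrive during a single run are coalesced into one follow-up run, which matches a prune that recomputes its retention window each time it starts.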

@codecov
codecov bot commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 84.61538% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.61%. Comparing base (7e01541) to head (a55512a).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| sei-db/state_db/sc/memiavl/db.go | 84.61% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2814      +/-   ##
==========================================
+ Coverage   46.95%   56.61%   +9.65%     
==========================================
  Files        1967     2033      +66     
  Lines      160924   166263    +5339     
==========================================
+ Hits        75567    94131   +18564     
+ Misses      78790    63864   -14926     
- Partials     6567     8268    +1701     

| Flag | Coverage Δ |
| --- | --- |
| sei-chain | 41.49% <84.61%> (-0.05%) ⬇️ |
| sei-cosmos | 48.13% <ø> (+<0.01%) ⬆️ |
| sei-db | 68.72% <ø> (ø) |
| sei-tendermint | 58.11% <ø> (?) |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| sei-db/state_db/sc/memiavl/db.go | 66.00% <84.61%> (+0.21%) ⬆️ |

... and 314 files with indirect coverage changes

    db.pruneSnapshotLock.Lock()
    defer db.pruneSnapshotLock.Unlock()
    // CAS: only one prune can run at a time
    if !db.pruningInProgress.CompareAndSwap(false, true) {

Contributor

this works, but fwiw, there's also a TryLock option (fyi):

    if !db.pruningMu.TryLock() {
        db.logger.Info("pruneSnapshots skipped, previous prune still in progress")
        return
    }
    defer db.pruningMu.Unlock()

Contributor Author

this seems even simpler than cas way

    db.logger.Info("pruneSnapshots started")
    startTime := time.Now()
    defer func() {
        db.logger.Info("pruneSnapshots completed", "duration", fmt.Sprintf("%.2fs", time.Since(startTime).Seconds()))

Contributor

duration_sec

Contributor Author

updated
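
One plausible reading of the `duration_sec` suggestion is to put the unit in the key and log the value as a plain number. The sketch below assumes that reading and uses the standard `log/slog` package (Go 1.21+) as a stand-in for the project's logger, which may behave differently; `logPruneDuration`, `renderLog`, and `hasDurationKey` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
	"time"
)

// logPruneDuration logs the elapsed time as a float64 number of seconds
// under the key "duration_sec", so the unit lives in the key name.
func logPruneDuration(logger *slog.Logger, start time.Time) {
	logger.Info("pruneSnapshots completed", "duration_sec", time.Since(start).Seconds())
}

// renderLog captures one log line in memory for a simulated prune that
// started roughly 1.5 seconds ago.
func renderLog() string {
	var buf bytes.Buffer
	logger := slog.New(slog.NewTextHandler(&buf, nil))
	logPruneDuration(logger, time.Now().Add(-1500*time.Millisecond))
	return buf.String()
}

// hasDurationKey reports whether the rendered line carries the expected key.
func hasDurationKey() bool {
	return strings.Contains(renderLog(), "duration_sec=")
}

func main() {
	fmt.Print(renderLog())
}
```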

    }
    db.closed = true
    // Wait for any ongoing prune to finish, then block new prunes
    for !db.pruningInProgress.CompareAndSwap(false, true) {

Contributor

and if we used a mutex, this could be

   db.pruningMu.Lock() // blocks until pruning finishes, no spinning needed
   db.pruningMu.Unlock()

Contributor Author

will take this approach
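
The mutex-based wait suggested in this thread, combined with an idempotent `Close()` guard, might be sketched like this. The `DB` type, its field names, and the `closeCount` counter are hypothetical and exist only for illustration. Note that locking and immediately unlocking only waits for a prune that is already running; this sketch does not by itself stop a new prune from starting afterwards.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// DB is a hypothetical stand-in containing only the fields the pattern needs.
type DB struct {
	mtx               sync.Mutex // guards closed and closeCount
	closed            bool
	closeCount        int        // demo only: how many times shutdown work ran
	pruneSnapshotLock sync.Mutex // held for the duration of a prune
}

// Close is idempotent: only the first call performs shutdown work.
func (db *DB) Close() error {
	db.mtx.Lock()
	defer db.mtx.Unlock()
	if db.closed {
		return nil
	}
	db.closed = true
	// Blocks until any in-flight prune releases the lock; no spinning needed.
	db.pruneSnapshotLock.Lock()
	db.pruneSnapshotLock.Unlock()
	db.closeCount++
	return nil
}

// runCloseDemo calls Close while a simulated prune holds the lock. It reports
// whether the prune had finished when Close returned, and how many times the
// shutdown work ran after two Close calls.
func runCloseDemo() (pruneFinishedFirst bool, closes int) {
	db := &DB{}
	started := make(chan struct{})
	release := make(chan struct{})
	var pruneDone atomic.Bool
	go func() { // simulated prune
		db.pruneSnapshotLock.Lock()
		close(started)
		<-release
		pruneDone.Store(true)
		db.pruneSnapshotLock.Unlock()
	}()
	<-started
	result := make(chan bool)
	go func() {
		_ = db.Close()
		result <- pruneDone.Load()
	}()
	close(release)
	pruneFinishedFirst = <-result
	_ = db.Close() // second call is a no-op
	return pruneFinishedFirst, db.closeCount
}

func main() {
	finished, closes := runCloseDemo()
	fmt.Printf("prune finished before Close returned: %v, shutdown ran %d time(s)\n", finished, closes)
}
```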

@stevenlanders stevenlanders left a comment

added some comments but lgtm

@blindchaser blindchaser enabled auto-merge (squash) February 5, 2026 21:56

    defer db.pruneSnapshotLock.Unlock()

    db.logger.Info("pruneSnapshots started")
    startTime := time.Now()

Check warning (Code scanning / CodeQL): Calling the system time

Calling the system time may be a possible source of non-determinism

@blindchaser blindchaser changed the title from "feat: make snapshot prune async with CAS-based concurrency control" to "feat: make snapshot prune async" Feb 5, 2026
@blindchaser blindchaser merged commit 5c26262 into main Feb 5, 2026
47 of 48 checks passed
@blindchaser blindchaser deleted the yiren/main-prune-async branch February 5, 2026 23:24
@seidroid
seidroid bot commented Feb 5, 2026

Created backport PR for release/v6.3:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-2814-to-release/v6.3
git worktree add --checkout .worktree/backport-2814-to-release/v6.3 backport-2814-to-release/v6.3
cd .worktree/backport-2814-to-release/v6.3
git reset --hard HEAD^
git cherry-pick -x 5c26262c6e31fb0f25a8f26a59de8cbad2ca43d4
git push --force-with-lease

blindchaser added a commit that referenced this pull request Feb 5, 2026
…2814)

- Run pruneSnapshots() asynchronously (go db.pruneSnapshots()) to avoid
blocking Commit
- Use TryLock instead of Lock so overlapping prunes are skipped rather
than queued
- Add logging for prune start/skip/completion with duration
- Add idempotent Close() guard to prevent double-close
- Update test to use Eventually on directory state

(cherry picked from commit 5c26262)
blindchaser added a commit that referenced this pull request Feb 6, 2026
Backport of #2814 to `release/v6.3`.

Co-authored-by: yirenz <blindchaser@users.noreply.github.com>
yzang2019 pushed a commit that referenced this pull request Feb 25, 2026
…2814)

- Run pruneSnapshots() asynchronously (go db.pruneSnapshots()) to avoid
blocking Commit
- Use TryLock instead of Lock so overlapping prunes are skipped rather
than queued
- Add logging for prune start/skip/completion with duration
- Add idempotent Close() guard to prevent double-close
- Update test to use Eventually on directory state

## Describe your changes and provide context

## Testing performed to validate your change
4 participants