fix: pruning goroutine lifecycle and prune failure snapshot #2800

Merged
blindchaser merged 7 commits into main from yiren/fix-prune on Feb 18, 2026
blindchaser (Contributor) commented Feb 4, 2026

Describe your changes and provide context

  • Wrap StateStore with PrunableStateStore that stops the pruning goroutine before closing the underlying database
  • Remove redundant stateStore.Close() from HandleClose() since cms.Close() already handles it
  • Add nil checks in Prune() methods for pebbledb/rocksdb as an additional safety layer
  • Use sync.Once to ensure Start/Stop/Close operations are safe to call multiple times
  • Prune old memiavl snapshots even when snapshot rewrite fails

Testing performed to validate your change

github-actions bot commented Feb 4, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Feb 21, 2026, 10:44 PM |

codecov bot commented Feb 4, 2026

Codecov Report

❌ Patch coverage is 71.42857% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.28%. Comparing base (1bcbb0e) to head (f5da788).
⚠️ Report is 20 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| sei-db/state_db/ss/composite/store.go | 60.00% | 4 Missing and 2 partials ⚠️ |
| sei-db/state_db/ss/pruning/manager.go | 88.57% | 2 Missing and 2 partials ⚠️ |
| sei-db/db_engine/pebbledb/mvcc/db.go | 0.00% | 1 Missing and 1 partial ⚠️ |
| sei-db/db_engine/rocksdb/mvcc/db.go | 0.00% | 1 Missing and 1 partial ⚠️ |
| sei-db/state_db/sc/memiavl/db.go | 0.00% | 2 Missing ⚠️ |

@@            Coverage Diff             @@
##             main    #2800      +/-   ##
==========================================
- Coverage   57.30%   57.28%   -0.03%     
==========================================
  Files        2089     2089              
  Lines      172183   172210      +27     
==========================================
- Hits        98669    98649      -20     
- Misses      64709    64742      +33     
- Partials     8805     8819      +14     
| Flag | Coverage Δ |
| --- | --- |
| sei-chain | 52.77% <74.07%> (-0.02%) ⬇️ |
| sei-cosmos | 48.15% <ø> (ø) |
| sei-db | 68.42% <0.00%> (-0.31%) ⬇️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| app/app.go | 71.24% <ø> (+0.14%) ⬆️ |
| sei-db/db_engine/pebbledb/mvcc/db.go | 63.92% <0.00%> (-0.23%) ⬇️ |
| sei-db/db_engine/rocksdb/mvcc/db.go | 57.31% <0.00%> (-0.47%) ⬇️ |
| sei-db/state_db/sc/memiavl/db.go | 65.63% <0.00%> (-0.21%) ⬇️ |
| sei-db/state_db/ss/pruning/manager.go | 90.69% <88.57%> (+8.87%) ⬆️ |
| sei-db/state_db/ss/composite/store.go | 49.69% <60.00%> (+0.62%) ⬆️ |

... and 31 files with indirect coverage changes

    default:
    }

    pruneStartTime := time.Now()

Code scanning / CodeQL warning (Calling the system time): Calling the system time may be a possible source of non-determinism.
    }
    m.startOnce.Do(func() {
        m.wg.Add(1)
        go m.pruneLoop()

Code scanning / CodeQL notice (Spawning a Go routine): Spawning a Go routine may be a possible source of non-determinism.
    time.Sleep(time.Duration(m.pruneInterval+randomDelay) * time.Second)
    // Generate a random percentage (between 0% and 100%) of the fixed interval as a delay
    randomPercentage := rand.Float64()
    randomDelay := int64(float64(m.pruneInterval) * randomPercentage)

Code scanning / CodeQL notice (Floating point arithmetic): Floating point arithmetic operations are not associative and a possible source of non-determinism.

@blindchaser blindchaser force-pushed the yiren/fix-prune branch 3 times, most recently from b1ef012 to 9dba675 Compare February 17, 2026 15:49
… Close

- Add proper Stop() with stopCh/WaitGroup to pruning Manager for graceful shutdown
- Save pruning manager in CompositeStateStore and stop it on Close (idempotent)
- Add defensive nil checks in pebbledb/rocksdb Prune() for closed databases
- Remove duplicate stateStore.Close() in app HandleClose (already closed by BaseApp)

Co-authored-by: Cursor <cursoragent@cursor.com>
blindchaser and others added 2 commits February 17, 2026 11:06
Previously, pruneSnapshots() was only called after a successful snapshot
rewrite. If snapshot creation kept failing, old snapshots would accumulate
indefinitely. Now we prune old snapshots regardless of rewrite outcome.

Co-authored-by: Cursor <cursoragent@cursor.com>
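
As a rough illustration of the behavior change described in this commit message, the sketch below prunes old snapshots whether or not the rewrite succeeds. The `snapshotDB` type, its fields, and the `rewrite` helper are hypothetical; only the name `pruneSnapshots` and the "prune regardless of outcome" rule come from the message.

```go
package main

import (
	"errors"
	"fmt"
)

// snapshotDB is a hypothetical stand-in for the memiavl DB.
type snapshotDB struct {
	snapshots   []int64 // snapshot versions on disk, oldest first
	keep        int     // number of most recent snapshots to retain
	failRewrite bool    // simulates a failing snapshot rewrite
}

func (db *snapshotDB) rewrite(version int64) error {
	if db.failRewrite {
		return errors.New("rewrite failed")
	}
	db.snapshots = append(db.snapshots, version)
	return nil
}

// pruneSnapshots drops everything except the newest keep snapshots.
func (db *snapshotDB) pruneSnapshots() {
	if extra := len(db.snapshots) - db.keep; extra > 0 {
		db.snapshots = db.snapshots[extra:]
	}
}

// RewriteSnapshot prunes after the rewrite attempt on both the
// success path and the failure path, then reports the rewrite error.
func (db *snapshotDB) RewriteSnapshot(version int64) error {
	err := db.rewrite(version)
	db.pruneSnapshots()
	return err
}

func main() {
	db := &snapshotDB{snapshots: []int64{1, 2, 3, 4}, keep: 2, failRewrite: true}
	err := db.RewriteSnapshot(5)
	fmt.Println(err, db.snapshots)
}
```

With the earlier "prune only on success" ordering, the failing call above would have left all four snapshots in place.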
@blindchaser blindchaser changed the title from "fix: pruning goroutine lifecycle management to prevent nil pointer on Close" to "fix: pruning goroutine lifecycle and prune failure snapshot" Feb 18, 2026
cody-littley (Contributor) left a comment

LGTM.

What's the reason for the random delay in the pruning loop? Is the idea to ensure that multiple DBs don't always wake up and prune at the same time?

blindchaser (Contributor, Author) commented

> LGTM.
>
> What's the reason for the random delay in the pruning loop? Is the idea to ensure that multiple DBs don't always wake up and prune at the same time?

Exactly. Without this, they'd all wake up and hit the disk at the same time, which may cause I/O spikes.

@blindchaser blindchaser merged commit c6621a3 into main Feb 18, 2026
55 of 58 checks passed
@blindchaser blindchaser deleted the yiren/fix-prune branch February 18, 2026 20:54
blindchaser (Contributor, Author) commented

/backport

seidroid bot commented Feb 21, 2026

Created backport PR for release/v6.3:

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-2800-to-release/v6.3
git worktree add --checkout .worktree/backport-2800-to-release/v6.3 backport-2800-to-release/v6.3
cd .worktree/backport-2800-to-release/v6.3
git reset --hard HEAD^
git cherry-pick -x c6621a327deee051f79f44fbce328c624ea80ccf
git push --force-with-lease

blindchaser added a commit that referenced this pull request Feb 21, 2026
Co-authored-by: Cursor <cursoragent@cursor.com>
(cherry picked from commit c6621a3)
blindchaser added a commit that referenced this pull request Feb 22, 2026
…ailure snapshot (#2947)

Backport of #2800 to `release/v6.3`.

---------

Co-authored-by: yirenz <blindchaser@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: blindchaser <zengyiren0@gmail.com>
yzang2019 pushed a commit that referenced this pull request Feb 25, 2026
Co-authored-by: Cursor <cursoragent@cursor.com>