fix: pruning goroutine lifecycle and prune failure snapshot #2800
blindchaser merged 7 commits into main
Codecov Report
@@            Coverage Diff             @@
##             main    #2800      +/-   ##
==========================================
- Coverage   57.30%   57.28%   -0.03%
==========================================
  Files        2089     2089
  Lines      172183   172210      +27
==========================================
- Hits        98669    98649      -20
- Misses      64709    64742      +33
- Partials     8805     8819      +14
default:
}

pruneStartTime := time.Now()
Check warning (Code scanning / CodeQL): Calling the system time
}
m.startOnce.Do(func() {
m.wg.Add(1)
go m.pruneLoop()
Check notice (Code scanning / CodeQL): Spawning a Go routine
time.Sleep(time.Duration(m.pruneInterval+randomDelay) * time.Second)
// Generate a random percentage (between 0% and 100%) of the fixed interval as a delay
randomPercentage := rand.Float64()
randomDelay := int64(float64(m.pruneInterval) * randomPercentage)
Check notice (Code scanning / CodeQL): Floating point arithmetic
b1ef012 to 9dba675
… Close

- Add proper Stop() with stopCh/WaitGroup to pruning Manager for graceful shutdown
- Save pruning manager in CompositeStateStore and stop it on Close (idempotent)
- Add defensive nil checks in pebbledb/rocksdb Prune() for closed databases
- Remove duplicate stateStore.Close() in app HandleClose (already closed by BaseApp)

Co-authored-by: Cursor <cursoragent@cursor.com>
9dba675 to c4fa85d
Previously, pruneSnapshots() was only called after a successful snapshot rewrite. If snapshot creation kept failing, old snapshots would accumulate indefinitely. Now we prune old snapshots regardless of rewrite outcome.

Co-authored-by: Cursor <cursoragent@cursor.com>
cody-littley
left a comment
LGTM.
What's the reason for the random delay in the pruning loop? Is the idea to ensure that multiple DBs don't always wake up and prune at the same time?
Exactly. Without this, they'd all wake up and hit the disk at the same time, which may cause I/O spikes.
/backport
Created backport PR for
Please cherry-pick the changes locally and resolve any conflicts.
git fetch origin backport-2800-to-release/v6.3
git worktree add --checkout .worktree/backport-2800-to-release/v6.3 backport-2800-to-release/v6.3
cd .worktree/backport-2800-to-release/v6.3
git reset --hard HEAD^
git cherry-pick -x c6621a327deee051f79f44fbce328c624ea80ccf
git push --force-with-lease
- Wrap StateStore with PrunableStateStore that stops the pruning goroutine before closing the underlying database
- Remove redundant stateStore.Close() from HandleClose() since cms.Close() already handles it
- Add nil checks in Prune() methods for pebbledb/rocksdb as an additional safety layer
- Use sync.Once to ensure Start/Stop/Close operations are safe to call multiple times
- Prune old memiavl snapshots even when snapshot rewrite fails

Co-authored-by: Cursor <cursoragent@cursor.com>
(cherry picked from commit c6621a3)