Backport release/v6.3: fix: pruning goroutine lifecycle and prune failure snapshot#2947
Conversation
|
Please cherry-pick the changes locally and resolve any conflicts. git fetch origin backport-2800-to-release/v6.3
git worktree add --checkout .worktree/backport-2800-to-release/v6.3 backport-2800-to-release/v6.3
cd .worktree/backport-2800-to-release/v6.3
git reset --hard HEAD^
git cherry-pick -x c6621a327deee051f79f44fbce328c624ea80ccf
git push --force-with-lease |
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
Codecov Report❌ Patch coverage is ❌ Your project status has failed because the head coverage (38.45%) is below the target coverage (40.00%). You can increase the head coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## release/v6.3 #2947 +/- ##
=============================================
Coverage 43.46% 43.46%
=============================================
Files 1866 1866
Lines 155380 155420 +40
=============================================
+ Hits 67537 67560 +23
- Misses 81801 81808 +7
- Partials 6042 6052 +10
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
- Wrap StateStore with PrunableStateStore that stops the pruning goroutine before closing the underlying database - Remove redundant stateStore.Close() from HandleClose() since cms.Close() already handles it - Add nil checks in Prune() methods for pebbledb/rocksdb as an additional safety layer - Use sync.Once to ensure Start/Stop/Close operations are safe to call multiple times - Prune old memiavl snapshots even when snapshot rewrite fails --------- Co-authored-by: Cursor <cursoragent@cursor.com> (cherry picked from commit c6621a3)
7d6e36c to
2efb7e4
Compare
| default: | ||
| } | ||
|
|
||
| pruneStartTime := time.Now() |
Check warning
Code scanning / CodeQL
Calling the system time Warning
| m.logger.Info(fmt.Sprintf("Pruned state store till version %d took %s", pruneVersion, time.Since(pruneStartTime))) | ||
| m.startOnce.Do(func() { | ||
| m.wg.Add(1) | ||
| go m.pruneLoop() |
Check notice
Code scanning / CodeQL
Spawning a Go routine Note
| time.Sleep(time.Duration(m.pruneInterval+randomDelay) * time.Second) | ||
| // Generate a random percentage (between 0% and 100%) of the fixed interval as a delay | ||
| randomPercentage := rand.Float64() | ||
| randomDelay := int64(float64(m.pruneInterval) * randomPercentage) |
Check notice
Code scanning / CodeQL
Floating point arithmetic Note
|
|
||
| // Stop gracefully stops the pruning goroutine and waits for it to exit. | ||
| // Safe to call multiple times (idempotent). | ||
| func (m *Manager) Stop() { |
There was a problem hiding this comment.
We are not calling this at all in non-test codepath execution.
There was a problem hiding this comment.
Stop now wired into non-test close path,second comment addressed by lifecycle guarantee
sei-db/ss/rocksdb/db.go
Outdated
| // and trim history when possible. | ||
| func (db *Database) Prune(version int64) error { | ||
| // Defensive check: ensure database is not closed | ||
| if db.storage == nil { |
There was a problem hiding this comment.
This needs to be guarded; concurrent goroutines can still hit nil pointer dereference.
There was a problem hiding this comment.
This is managed by the upper lifecycle, but yeah I added an atomic check defensively.
Wrap the state store with a prunable wrapper so Close() always stops the pruning goroutine before closing the underlying DB, preventing prune/close races during shutdown. Co-authored-by: Cursor <cursoragent@cursor.com>
…eam handler nil dereference to align with main
Backport of #2800 to
release/v6.3.