WAL utility tears itself down after the first error#3006
Conversation
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3006 +/- ##
==========================================
- Coverage 58.26% 58.13% -0.14%
==========================================
Files 2108 2113 +5
Lines 173664 174000 +336
==========================================
- Hits 101181 101147 -34
- Misses 63456 63798 +342
- Partials 9027 9055 +28
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
| @@ -341,13 +372,23 @@ func (walLog *WAL[T]) handleTruncate(req *truncateRequest) { | |||
| err = walLog.log.TruncateBack(req.index) | |||
| } | |||
| if err != nil { | |||
| req.errChan <- fmt.Errorf("failed to truncate: %w", err) | |||
| err = fmt.Errorf("failed to truncate: %w", err) | |||
| if strings.Contains(err.Error(), "out of range") { | |||
There was a problem hiding this comment.
in case tidwall/wal library changes its error message wording in the future, how about:
if strings.Contains(err.Error(), "out of range") {
req.errChan <- fmt.Errorf("failed to truncate: %w", err)
return
}
walLog.reportFatalError(fmt.Errorf("failed to truncate: %w", err), req.errChan)
return
There was a problem hiding this comment.
I wrapped the error inside the "out of range" block, but I'm not sure if that is what you were asking me to do. New code is below:
if err != nil {
err = fmt.Errorf("failed to truncate: %w", err)
if strings.Contains(err.Error(), "out of range") {
err = fmt.Errorf("out of range truncate error: %w", err)
req.errChan <- err
return
}
walLog.reportFatalError(err, req.errChan)
return
}
| // Store on heap so the pointer remains valid after this function returns. | ||
| p := new(error) | ||
| *p = err | ||
| walLog.asyncError.Store(p) |
There was a problem hiding this comment.
should we call walLog.cancel() in the reportFatalError()?
There was a problem hiding this comment.
Explicitly calling cancel() is not required. When asyncError is set, the loop exits, and immediately after the loop exits the context is cancelled.
for running && walLog.asyncError.Load() == nil {
select {
case <-walLog.ctx.Done():
running = false
case req := <-walLog.writeChan:
walLog.handleWrite(req)
case req := <-walLog.truncateChan:
walLog.handleTruncate(req)
case <-pruneChan:
walLog.prune()
case <-walLog.closeReqChan:
running = false
}
}
walLog.cancel()
| // Store on heap so the pointer remains valid after this function returns. | ||
| p := new(error) | ||
| *p = err | ||
| walLog.asyncError.Store(p) |
There was a problem hiding this comment.
If we have multiple errors, will the later errors replace the previous one here? And are we OK with that?
There was a problem hiding this comment.
Went back and checked, and there was one edge case where this was possible (i.e. during drain() when we are tearing down the system). I fixed this issue, and so now it should never be possible for asyncError to be set more than once.
## Describe your changes and provide context https://linear.app/seilabs/issue/STO-397/address-wal-feedback Per feedback, the desired behavior of the WAL utility is that it should tear itself down after it encounters the first error. ## Testing performed to validate your change Unit test coverage. --------- Co-authored-by: Cody Littley <cody.littley@seinetwork.io>
Describe your changes and provide context
https://linear.app/seilabs/issue/STO-397/address-wal-feedback
Per feedback, the desired behavior of the WAL utility is that it should tear itself down after it encounters the first error.
Testing performed to validate your change
Unit test coverage.