fix(core): fix WAL segment housekeeping edge case leading to a failure to write to WAL table #3381
It's a little difficult to tell from the diff what actually changed in the logic.

The old logic accumulated state about what walId/segmentId is next to be applied in a […]

The bug was in this code:

    // If the current segment is locked, scroll to next WAL id.
    final int walIdLimit = segmentLocked ? walId + 1 : walId; // <--- actual bug
    while (committedWalId < walIdLimit && ++committedWalSegmentIndex < committedSegmentSize) {
        long sequencerPair = sequencerWalIDSegmentIDPairs.get(committedWalSegmentIndex);
        committedWalId = Numbers.decodeHighInt(sequencerPair);
        committedSegmentId = Numbers.decodeLowInt(sequencerPair);
    }

In the scenario from the test case it ended up holding […]

This bad state was set up in the previous iteration of the loop and caused the wrong branch in the logic to be executed.

    if (walId == committedWalId) {
        ...
    } else if (!segmentLocked) {
        ... // this code got executed when it should not have been.
    }

Ultimately it was easier to replace the […]
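For readers unfamiliar with the `sequencerPair` value in the loop above: it carries a WAL id in the high 32 bits of a `long` and a segment id in the low 32 bits. The following is a minimal sketch of that packing. The class `WalSegmentPair` and its method names are hypothetical and written for illustration only; this is not QuestDB's `Numbers` implementation.

```java
// Hypothetical helper, for illustration only (not QuestDB's Numbers class).
final class WalSegmentPair {
    private WalSegmentPair() {
    }

    // Pack walId into the high 32 bits and segmentId into the low 32 bits.
    static long encode(int walId, int segmentId) {
        // Mask the low int so a negative segmentId does not sign-extend
        // into the high half and overwrite the walId.
        return ((long) walId << 32) | (segmentId & 0xFFFFFFFFL);
    }

    // Recover the high 32 bits (the WAL id).
    static int walId(long pair) {
        return (int) (pair >>> 32);
    }

    // Recover the low 32 bits (the segment id).
    static int segmentId(long pair) {
        return (int) pair;
    }

    public static void main(String[] args) {
        long pair = encode(1, 7);
        System.out.println(walId(pair) + "," + segmentId(pair)); // prints "1,7"
    }
}
```

Keeping both ids in one `long` lets a list of pairs be stored as a flat list of primitives, at the cost of having to decode each entry before comparing it.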
[PR Coverage check] 😍 pass: 169 / 173 (97.69%)
Fixed a possible but unlikely scenario where the WalPurgeJob could delete a WAL segment before it was applied.

The scenario consists of the following:

- […] `(1,6)`.
- `(1,7)` is created and then closed and unlocked.
- […] `(1,7)` as it finds it to be unlocked.

The faulty logic assumed that a locked segment would always be the last segment for a WAL, which is not the case.

The logic has now been simplified and extracted into a separate class for testability, and the bug has been fixed.
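The point about lock state can be illustrated with a small sketch. It assumes a hypothetical state for one WAL in which segment 6 is locked, segment 7 is closed and unlocked, and segment 6 is the next segment to be applied. The class `SegmentPurgeSketch`, its methods, and the `nextSegmentToApply` parameter are all invented for illustration; the lock-only check is a deliberately simplified stand-in for the faulty assumption, not the code that was removed, and the second check is not the class introduced by this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, for illustration only (not the class added by this PR).
final class SegmentPurgeSketch {
    // One on-disk segment of a single WAL.
    static final class Segment {
        final int segmentId;
        final boolean locked;

        Segment(int segmentId, boolean locked) {
            this.segmentId = segmentId;
            this.locked = locked;
        }
    }

    // Simplified stand-in for the faulty assumption: any unlocked segment
    // is treated as finished and therefore safe to delete.
    static List<Integer> purgeableByLockOnly(List<Segment> segments) {
        List<Integer> out = new ArrayList<>();
        for (Segment s : segments) {
            if (!s.locked) {
                out.add(s.segmentId);
            }
        }
        return out;
    }

    // Safer rule: a segment must be unlocked AND already applied,
    // i.e. strictly before the next segment to apply.
    static List<Integer> purgeableWhenApplied(List<Segment> segments, int nextSegmentToApply) {
        List<Integer> out = new ArrayList<>();
        for (Segment s : segments) {
            if (!s.locked && s.segmentId < nextSegmentToApply) {
                out.add(s.segmentId);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Segment> wal1 = List.of(
                new Segment(5, false),
                new Segment(6, true),
                new Segment(7, false));
        System.out.println(purgeableByLockOnly(wal1));     // prints "[5, 7]"
        System.out.println(purgeableWhenApplied(wal1, 6)); // prints "[5]"
    }
}
```

Under these assumptions the lock-only check would delete segment 7 even though it has not been applied, while the second check keeps it until the apply position has moved past it.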