Fix recovery issue in the code finding the latest check point #8418
In the case where there are only log files with versions greater than zero and no checkpoint in any of them, we mistakenly report that no recovery is needed.
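To make the failure mode concrete, here is a minimal Java sketch of the decision under simplifying assumptions. Log files are modelled as a map from log version to a list of entries. The names `LatestCheckPointSketch`, `LogEntry` and `recoveryNeeded` are hypothetical and are not Neo4j's `LatestCheckPointFinder` API. The sketch scans backwards from the newest version. If no check point is found and the oldest remaining version is greater than zero, it conservatively reports that recovery is needed rather than treating the store as clean.

```java
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical, simplified model; not Neo4j's LatestCheckPointFinder. */
final class LatestCheckPointSketch
{
    enum LogEntry { TX_START, CHECK_POINT }

    static final long INITIAL_LOG_VERSION = 0;

    /** logs: log version -> entries in the order they were written. */
    static boolean recoveryNeeded( NavigableMap<Long,List<LogEntry>> logs )
    {
        if ( logs.isEmpty() )
        {
            return false; // assumption: no log files at all means nothing to replay
        }
        boolean commitsAfterCheckPoint = false;
        // Walk from the newest log version towards the oldest one still on disk
        for ( Map.Entry<Long,List<LogEntry>> log : logs.descendingMap().entrySet() )
        {
            List<LogEntry> entries = log.getValue();
            // Scan each file from its end, so the first check point seen is the latest one
            for ( int i = entries.size() - 1; i >= 0; i-- )
            {
                if ( entries.get( i ) == LogEntry.CHECK_POINT )
                {
                    return commitsAfterCheckPoint;
                }
                commitsAfterCheckPoint = true; // a transaction written after any check point
            }
        }
        // No check point in any remaining log file
        long oldestVersionFound = logs.firstKey();
        if ( oldestVersionFound > INITIAL_LOG_VERSION )
        {
            // Earlier logs are gone, so a clean state cannot be proven: the case this PR is about
            return true;
        }
        // Logs start at the initial version: recover iff any transaction was written
        return commitsAfterCheckPoint;
    }
}
```

For example, log versions 1 and 2 with no check point make `recoveryNeeded` return `true`, so a missing check point is never read as proof that the store is consistent.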
Looks like an improvement to me.
@@ -62,7 +62,7 @@ public LatestCheckPoint find( long fromVersionBackwards ) throws IOException
LogVersionedStoreChannel channel = PhysicalLogFile.tryOpenForVersion( logFiles, fileSystem, version );
For the future: it would be better if this actually threw the IOException and we caught only FileNotFoundException specifically.
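A minimal sketch of that suggestion, assuming plain `java.io` files rather than Neo4j's file system abstraction. `TryOpenSketch` is a hypothetical name, not the real `PhysicalLogFile.tryOpenForVersion`. Only `FileNotFoundException` is mapped to "this log version does not exist"; any other `IOException` propagates to the caller.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;

/** Hypothetical helper; not Neo4j's PhysicalLogFile.tryOpenForVersion. */
final class TryOpenSketch
{
    /**
     * Returns a read-only channel for the log file, or null if it cannot be found.
     * Other I/O failures are thrown instead of being hidden behind a null.
     */
    static FileChannel tryOpenForVersion( File logFile ) throws IOException
    {
        try
        {
            // Closing the returned channel also closes the underlying RandomAccessFile
            return new RandomAccessFile( logFile, "r" ).getChannel();
        }
        catch ( FileNotFoundException e )
        {
            // Note: RandomAccessFile also uses FileNotFoundException for directories
            // and unreadable files, so those are treated as "missing" here too
            return null;
        }
    }
}
```

With this shape, an unexpected read error surfaces as an exception during recovery instead of silently looking like a missing log version.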
It makes sense; shall we investigate that in a separate PR?
Perhaps not at this point in time?
Whenever we have some time.
While I think this is a good and correct change, one has to wonder how the db can get into a state where there are only tx log versions > 0 AND none of them contains a checkpoint. That would mean there have been rotations and pruning, and that pruning has actually removed the last checkpoint. Or is this an issue with store copy/backup only? I could see it happening there. In any case, this makes recovery safer than it was before, whether or not there is an additional log pruning issue somewhere.
@tinwelint This has been seen after a store copy, when pulling txs. By mistake, no checkpoint was written into the log, but the recovery code failed to detect that recovery was needed, which is unsafe.
changelog: Fix a bug that could prevent recovery from finding the latest check point record in the logs, and thus prevent adequate recovery of the store.