Fix recovery issue in the code finding the latest check point #8418

Merged
merged 1 commit into from Nov 23, 2016

Contributor

@davidegrohmann davidegrohmann commented Nov 22, 2016

When there are only log files with versions greater than zero
and none of them contains a checkpoint, we mistakenly report that no
recovery is needed.

changelog: Fix a bug that could prevent recovery from finding the latest check point record in the logs, preventing adequate recovery of the store.
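
To make the scenario above concrete, here is a minimal sketch of a backwards scan for the latest checkpoint. It is a simplified model, not the code changed by this PR: the class `CheckPointScanSketch`, its `Result` type, and the representation of log files as an in-memory map of version to entry names are all assumptions made for illustration. The point it shows is that when the scan runs out of log files without seeing a checkpoint, any commits seen along the way still mean recovery is needed.

```java
import java.util.List;
import java.util.Map;

public class CheckPointScanSketch {

    /** Hypothetical result type; it only mirrors the idea of the real one. */
    static final class Result {
        final long checkPointVersion;  // -1 when no checkpoint was found
        final boolean recoveryNeeded;  // true when commits exist after the checkpoint

        Result(long checkPointVersion, boolean recoveryNeeded) {
            this.checkPointVersion = checkPointVersion;
            this.recoveryNeeded = recoveryNeeded;
        }
    }

    /**
     * Scans log versions from fromVersionBackwards down to 0.
     * Each log is modelled as a list of entries, oldest first,
     * where an entry is either "COMMIT" or "CHECKPOINT" (an assumption).
     */
    static Result find(Map<Long, List<String>> logs, long fromVersionBackwards) {
        boolean commitsSeen = false;
        for (long version = fromVersionBackwards; version >= 0; version--) {
            List<String> entries = logs.get(version);
            if (entries == null) {
                break; // no file for this version, e.g. pruned or never copied
            }
            // Walk the entries newest to oldest, so every commit seen before
            // reaching a checkpoint was written after that checkpoint.
            for (int i = entries.size() - 1; i >= 0; i--) {
                String entry = entries.get(i);
                if ("CHECKPOINT".equals(entry)) {
                    return new Result(version, commitsSeen);
                }
                if ("COMMIT".equals(entry)) {
                    commitsSeen = true;
                }
            }
        }
        // No checkpoint anywhere: recovery is still needed if any commit exists,
        // regardless of whether the oldest file found was version 0.
        return new Result(-1, commitsSeen);
    }
}
```

With logs for versions 2 and 3 only, each holding commits and no checkpoint, `find(logs, 3)` in this model returns no checkpoint and `recoveryNeeded == true`, which is the case described above.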

Contributor

@martinfurmanski martinfurmanski left a comment

Looks like an improvement to me.

@@ -62,7 +62,7 @@ public LatestCheckPoint find( long fromVersionBackwards ) throws IOException
LogVersionedStoreChannel channel = PhysicalLogFile.tryOpenForVersion( logFiles, fileSystem, version );
Contributor

For the future: Would be better if this actually threw the IOException and we caught only FileNotFoundException specifically.

Contributor Author

It makes sense. Shall we investigate that in a separate PR?

Contributor

Perhaps not at this point in time?

Contributor Author

Whenever we get some time.
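
The suggestion in this thread (let `IOException` propagate and catch only `FileNotFoundException`) could look roughly like the sketch below. It does not reproduce `PhysicalLogFile.tryOpenForVersion`, whose implementation is not shown in this conversation; the class `TryOpenSketch` and the method `tryOpen` are hypothetical names, and plain `java.io` is used in place of the project's file system abstraction.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;

public class TryOpenSketch {

    /**
     * Returns a channel for the given log file, or null when the file
     * cannot be found. The method is declared to throw IOException so that
     * callers treat any other I/O failure as an error instead of as
     * "this log version does not exist".
     */
    static FileChannel tryOpen(File logFile) throws IOException {
        RandomAccessFile file;
        try {
            file = new RandomAccessFile(logFile, "r");
        } catch (FileNotFoundException e) {
            // Only this case means "stop scanning backwards".
            // Caveat: java.io also reports some other open failures
            // (for example a directory or a permission problem) this way.
            return null;
        }
        // Failures past this point, such as reading a header from the
        // channel, would surface as IOException and are not swallowed.
        return file.getChannel();
    }
}
```

The design choice is that a `null` return stays reserved for the one condition the backwards scan treats as normal, so a genuine I/O error cannot be mistaken for the end of the log chain.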

Member

tinwelint commented Nov 23, 2016

While I think this is a good and correct change, one has to wonder how the db can get into a state where there are only tx logs with version > 0 AND none of them contains a checkpoint. This would mean that rotation and pruning have taken place and that pruning has actually pruned the last checkpoint.

Or, is this an issue with store copy/backup only? I could see that happening there perhaps.

In any case this makes recovery safer than it was before, whether or not there's an additional log pruning issue somewhere.

@davidegrohmann
Contributor Author

@tinwelint This has been seen after store copy when pulling txs. Mistakenly no checkpoint was written into the log, but the recovery code failed to figure out that recovery was needed, which is unsafe.

@tinwelint tinwelint merged commit acc5c40 into neo4j:2.3 Nov 23, 2016
@davidegrohmann davidegrohmann deleted the 2.3-fix-recovery-bug branch November 24, 2016 10:46