Fix recovery issue in the code finding the latest check point #8418

davidegrohmann · 2016-11-22T14:52:55Z

In the case there are only log files with versions greater than zero
and no checkpoints in any of those, we mistakenly report that no
recovery is needed.

changelog: Fix a bug that could prevent recovery from finding the latest check point record in the logs, preventing adequate recovery of the store.
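To make the failure mode concrete, here is a minimal sketch of a backwards scan over log versions. It is not the code in `LatestCheckPointFinder`; the class, enum, and method names below are hypothetical, and the log files are modelled as an in-memory map rather than real channels. The point it illustrates is that when the scan stops at a missing file before reaching version zero and no check point has been seen, any commits found must still be reported as requiring recovery.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names are illustrative and not taken from the Neo4j code base.
final class CheckPointScanSketch {
    enum Entry { COMMIT, CHECK_POINT }

    static final class Result {
        final long checkPointVersion; // -1 when no check point was found
        final boolean recoveryNeeded; // true when commits exist after the check point (or with none at all)

        Result(long checkPointVersion, boolean recoveryNeeded) {
            this.checkPointVersion = checkPointVersion;
            this.recoveryNeeded = recoveryNeeded;
        }
    }

    // logs maps a log version to its entries, ordered oldest to newest.
    static Result scan(Map<Long, List<Entry>> logs, long fromVersionBackwards) {
        boolean commitsSeen = false;
        for (long version = fromVersionBackwards; version >= 0; version--) {
            List<Entry> entries = logs.get(version);
            if (entries == null) {
                break; // file missing, e.g. pruned or never copied: stop scanning
            }
            // Walk this file from the newest entry to the oldest.
            for (int i = entries.size() - 1; i >= 0; i--) {
                if (entries.get(i) == Entry.CHECK_POINT) {
                    return new Result(version, commitsSeen);
                }
                commitsSeen = true;
            }
        }
        // No check point in any existing file: recovery is needed if any commit was seen,
        // regardless of whether the scan reached version zero.
        return new Result(-1, commitsSeen);
    }
}
```

With only versions 3 and 4 present and no check point in either, `scan(logs, 4)` returns a result with `checkPointVersion == -1` and `recoveryNeeded == true`.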


martinfurmanski

Looks like an improvement to me.

martinfurmanski · 2016-11-22T20:04:34Z

community/kernel/src/main/java/org/neo4j/kernel/recovery/LatestCheckPointFinder.java

@@ -62,7 +62,7 @@ public LatestCheckPoint find( long fromVersionBackwards ) throws IOException
            LogVersionedStoreChannel channel = PhysicalLogFile.tryOpenForVersion( logFiles, fileSystem, version );


For the future: Would be better if this actually threw the IOException and we caught only FileNotFoundException specifically.

It makes sense. Shall we investigate that in a separate PR?

Perhaps not at this point in time?

Whenever we get some time.
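The follow-up suggested in this thread could look roughly like the sketch below. It uses only `java.io` and is not the signature or behaviour of `PhysicalLogFile.tryOpenForVersion`; the class and method names are hypothetical. The idea is that only the "file does not exist" case is turned into a `null` result, while the method still declares `IOException` so that other I/O failures are not silently swallowed.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: names are illustrative and not taken from the Neo4j code base.
final class OpenLogSketch {
    // Returns an open stream, or null if the file cannot be found.
    // Declared to throw IOException so callers see any other I/O failure.
    static InputStream tryOpen(File logFile) throws IOException {
        try {
            return new FileInputStream(logFile);
        } catch (FileNotFoundException e) {
            // Only this specific case is treated as an expected outcome.
            return null;
        }
    }
}
```

A caller scanning versions backwards would then treat `null` as "no more log files" and let every other exception abort the scan.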

tinwelint · 2016-11-23T09:49:20Z

While I think this is a good and correct change, one has to wonder how the db can get into a state where there are only tx logs with version > 0 AND none of them contains a checkpoint. That would mean that rotations and pruning have taken place and that pruning has actually pruned the last checkpoint.

Or, is this an issue with store copy/backup only? I could see that happening there perhaps.

In any case this makes recovery safer than it was before, whether or not there's an additional log pruning issue somewhere.

davidegrohmann · 2016-11-23T10:00:09Z

@tinwelint This has been seen after store copy when pulling txs. Mistakenly no checkpoint was written into the log, but the recovery code failed to figure out that recovery was needed, which is unsafe.

Fix recovery issue in the code finding the latest check point

ad30989

In the case there are only log files with versions greater than zero and no checkpoints in any of those, we mistakenly report that no recovery is needed.

davidegrohmann added 2.3 bug kernel labels Nov 22, 2016

martinfurmanski approved these changes Nov 22, 2016


lutovich assigned tinwelint Nov 23, 2016

tinwelint merged commit acc5c40 into neo4j:2.3 Nov 23, 2016

davidegrohmann deleted the 2.3-fix-recovery-bug branch November 24, 2016 10:46

chrisvest added the changelog label Jan 23, 2017

tinwelint added the team-kernel label Aug 25, 2017
