
Fsync directory after cleanup #28604

Merged 1 commit into elastic:master on Feb 9, 2018

Conversation

@ywelsch (Contributor) commented on Feb 9, 2018

Relates to #28435

After copying over the Lucene segments during peer recovery, we call `cleanupAndVerify`, which removes all other files in the directory and then calls `getMetadata` to check that the resulting files form a proper index. There are two issues with this (see the sketch after the list):

  • the directory is not fsynced after the deletions, so the call to `getMetadata`, which lists the files in the directory, can get a stale view, possibly seeing a deleted corruption marker (which leads to the exception seen in #28435, "[CI] CorruptedFileIT#testReplicaCorruption fails on Windows");
  • failing to delete a corruption marker should result in a hard failure, as the shard is otherwise unusable.
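For illustration, here is a minimal sketch of the fixed flow, assuming a Lucene `Directory` handle. The method name `cleanupAndSync`, the `filesToKeep` parameter, and the `corrupted_` prefix are illustrative assumptions rather than the actual `Store` implementation; only `listAll`, `deleteFile`, and `syncMetaData` are real Lucene `Directory` APIs:

```java
import java.io.IOException;
import java.util.Set;

import org.apache.lucene.store.Directory;

/**
 * Sketch of the cleanup flow described above: delete everything not
 * belonging to the copied index, fail hard if a corruption marker
 * cannot be removed, and fsync the directory before re-listing it.
 */
public final class RecoveryCleanupSketch {

    // assumed marker prefix, for illustration only
    private static final String CORRUPTION_MARKER_PREFIX = "corrupted_";

    public static void cleanupAndSync(Directory directory, Set<String> filesToKeep) throws IOException {
        for (String file : directory.listAll()) {
            if (filesToKeep.contains(file)) {
                continue;
            }
            try {
                directory.deleteFile(file);
            } catch (IOException e) {
                if (file.startsWith(CORRUPTION_MARKER_PREFIX)) {
                    // a corruption marker that survives cleanup leaves the
                    // shard unusable, so this must be a hard failure
                    throw e;
                }
                // other leftover files may be handled best-effort,
                // depending on policy
            }
        }
        // fsync the directory metadata so that a subsequent listing
        // (e.g. by getMetadata) cannot observe a stale view that still
        // contains the deleted corruption marker
        directory.syncMetaData();
    }
}
```

The key point is the `syncMetaData()` call after the deletions: without it, a later directory listing may still report the deleted marker, which is what #28435 observed on Windows.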


@dnhatn (Member) left a comment


LGTM.

@ywelsch merged commit 5735e08 into elastic:master on Feb 9, 2018
ywelsch added a commit that referenced this pull request Feb 9, 2018
ywelsch added a commit that referenced this pull request Feb 15, 2018
@ywelsch added the v6.2.3 label on Feb 15, 2018
@clintongormley added the v6.2.2, >bug, and :Distributed/Recovery (anything around constructing a new shard, either from a local or a remote source) labels, and removed the v6.2.3 label, on Feb 17, 2018
Labels
>bug, :Distributed/Recovery, v6.2.2, v6.3.0, v7.0.0-beta1