Skip to content

Commit

Permalink
Fsync directory after cleanup (#28604)
Browse files Browse the repository at this point in the history
After copying over the Lucene segments during peer recovery, we call cleanupAndVerify which removes all other files in the directory and which then calls getMetadata to check if the resulting files are a proper index. There are two issues with this:

- the directory is not fsynced after the deletions, so that the call to getMetadata, which lists files in the directory, can get a stale view, possibly seeing a deleted corruption marker (which leads to the exception seen in #28435)
- failing to delete a corruption marker should result in a hard failure, as the shard is otherwise unusable.
  • Loading branch information
ywelsch committed Feb 9, 2018
1 parent 231fd3c commit 5735e08
Showing 1 changed file with 5 additions and 4 deletions.
9 changes: 5 additions & 4 deletions server/src/main/java/org/elasticsearch/index/store/Store.java
Original file line number Diff line number Diff line change
Expand Up @@ -661,17 +661,17 @@ private static void failIfCorrupted(Directory directory, ShardId shardId) throws
public void cleanupAndVerify(String reason, MetadataSnapshot sourceMetaData) throws IOException {
metadataLock.writeLock().lock();
try (Lock writeLock = directory.obtainLock(IndexWriter.WRITE_LOCK_NAME)) {
final StoreDirectory dir = directory;
for (String existingFile : dir.listAll()) {
for (String existingFile : directory.listAll()) {
if (Store.isAutogenerated(existingFile) || sourceMetaData.contains(existingFile)) {
continue; // don't delete snapshot file, or the checksums file (note, this is extra protection since the Store won't delete checksum)
}
try {
dir.deleteFile(reason, existingFile);
directory.deleteFile(reason, existingFile);
// FNF should not happen since we hold a write lock?
} catch (IOException ex) {
if (existingFile.startsWith(IndexFileNames.SEGMENTS)
|| existingFile.equals(IndexFileNames.OLD_SEGMENTS_GEN)) {
|| existingFile.equals(IndexFileNames.OLD_SEGMENTS_GEN)
|| existingFile.startsWith(CORRUPTED)) {
// TODO do we need to also fail this if we can't delete the pending commit file?
// if one of those files can't be deleted we better fail the cleanup otherwise we might leave an old commit point around?
throw new IllegalStateException("Can't delete " + existingFile + " - cleanup failed", ex);
Expand All @@ -680,6 +680,7 @@ public void cleanupAndVerify(String reason, MetadataSnapshot sourceMetaData) thr
// ignore, we don't really care, will get deleted later on
}
}
directory.syncMetaData();
final Store.MetadataSnapshot metadataOrEmpty = getMetadata(null);
verifyAfterCleanup(sourceMetaData, metadataOrEmpty);
} finally {
Expand Down

0 comments on commit 5735e08

Please sign in to comment.