
Snapshot restore should first remove unused files #20148

Closed
mikemccand opened this issue Aug 24, 2016 · 18 comments
Assignees
Labels
blocker :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement v5.0.0-beta1

Comments

@mikemccand
Contributor

Spinoff from Lucene 6.2.0 upgrade: https://github.com/elastic/elasticsearch/pull/20147/files/0ccfe6978918b9a756b3c882613e8e392aafc7e1#r76148215

As of Lucene 6.2.0, Directory.createOutput requires that the specified file does not already exist. This caused failures in the snapshot/restore tests, because restore sometimes overwrites an existing file.

I think ES should instead prune the unused files up front, before attempting the restore?
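The new contract can be sketched in plain java.nio terms (a hypothetical stand-in for Lucene's Directory abstraction, not the actual Lucene code; CREATE_NEW gives the same fail-if-exists semantics):

```java
import java.io.IOException;
import java.nio.file.*;

public class CreateNewDemo {
    // Returns true if writing with CREATE_NEW failed because the file exists,
    // mirroring the Lucene 6.2.0 Directory.createOutput contract.
    static boolean createNewFails(Path target) throws IOException {
        try {
            Files.newOutputStream(target, StandardOpenOption.CREATE_NEW).close();
            return false;
        } catch (FileAlreadyExistsException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index");
        Path seg = dir.resolve("_0.cfs");
        Files.write(seg, new byte[] {1, 2, 3}); // a pre-existing segment file

        // Before 6.2.0 this was a silent overwrite; now it fails.
        System.out.println(createNewFails(seg)); // true

        // Deleting first, as discussed in this issue, makes the write legal again.
        Files.delete(seg);
        System.out.println(createNewFails(seg)); // false
    }
}
```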

@mikemccand mikemccand added >enhancement :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Aug 24, 2016
@abeyad

abeyad commented Aug 25, 2016

It seems dangerous to prune files up front before writing to the same file. Given @imotov's comment here about how this could happen, I believe we should just throw an exception and not allow the restore to proceed. Pruning the files would amount to deleting data that the user added to the index and we probably shouldn't try to do anything too cute with the restore process trying to rename those files to get them restored. It seems dangerous in general to restore to an index already written to outside of the restore process. For example, what would happen right now if snapshot A was restored, which restored idx containing v1 of document 1, then we index v2 of document 1, then we try to restore snapshot B that already contained v2 of document 1 but with a different JSON source?

@imotov
Contributor

imotov commented Aug 25, 2016

What I described is a rare but possible scenario. And in this scenario throwing an exception and notifying a user about diverging lifelines might make sense. However, there are two other more common scenarios that could lead to the same issue:

  1. the index into which we restore is a completely different index
  2. it's the same index, but we lost the primary some time after we took the snapshot, and are trying to restore into one of the former replicas.

I think throwing an exception in these 2 scenarios would lead to a bad user experience. Both scenarios are easy to detect in 5.0 since we now have both index uuids and shard allocation uuids. Unfortunately, we didn't store this information previously, so handling old snapshots might still be an issue.

@abeyad

abeyad commented Aug 25, 2016

  1. the index into which we restore is a completely different index

@imotov in this scenario, isn't it still dangerous to overwrite existing index files? It still means we are overwriting some data. Shouldn't the user be required to delete the index first and then proceed with a restore?

  2. it's the same index, but we lost the primary some time after we took the snapshot, and are trying to restore into one of the former replicas.

Not sure I follow this scenario; if we lost the primary, shouldn't we have promoted one of the replicas to primary? I'm sure you meant something else by it, I just didn't get it.

@abeyad

abeyad commented Aug 25, 2016

Maybe the restore operation could have an optional force or overwrite option, so we only overwrite existing files if that is set? That way at least it's clear that the directive came from the user.

@ywelsch
Contributor

ywelsch commented Aug 25, 2016

We currently only allow restoring into an existing index if that index is closed. If no index with the same name already exists in the cluster, we select a fresh index uuid (and thus restore into a fresh folder on disk). I see restoring into a closed index as a feature for when a shard has been corrupted on disk but a recent snapshot is available. By only restoring the corrupt/missing files, this might get you running again faster. As this is a really exceptional scenario, I think it justifies having to specify an explicit "override" option in that case.

@abeyad

abeyad commented Aug 26, 2016

I discussed this with @s1monw. There are several ways we could end up with the same Lucene file name. For example: we index to a primary and a replica, so both have the same documents but different segment files; we snapshot the primary; the primary goes down; the replica gets promoted; now when we try to restore into the index, we could end up trying to overwrite segments with different checksums.

Given the whole idea with restore is to forget about what's already there and restore the index to the state saved in the snapshot, what @mikemccand did in deleting the file if it already exists before restoring it is correct. I will remove the TODO comment in the code.

A longer term solution would be to use the recovery logic for restore, because that would provide us, amongst other things, the ability to be more robust in handling file restores and restoring them to temporary files first, then moving the older ones to backups and the restored files to the active ones. But implementing this is a much larger scope and not necessary here.

If there are no other concerns, I think this doesn't need any work currently and can be closed.

@clintongormley

thanks @abeyad

@imotov
Contributor

imotov commented Aug 29, 2016

I thought a bit more about it over the weekend and discussed it with @abeyad. It looks like we just made restore non-incremental. If this is indeed what we want, we should probably document it as a breaking change in 5.0 and remove the code that deals with incremental restore in the restore service. Alternatively, we can modify the logic in restore to delete only files with non-matching checksums instead of wiping out the entire directory.

@clintongormley

Good point. Reopening

@abeyad abeyad self-assigned this Aug 29, 2016
@abeyad

abeyad commented Aug 29, 2016

Actually, restores are still incremental. The reason is that we only try to restore a file (by calling restoreFile) if the file is missing or different in the Lucene store. For files that exist in the store and are identical (same length and checksum), we never call restoreFile (see https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java#L1559). Hence we would never delete a good file, because we would never attempt a restore of that file in the first place.
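The classification step can be sketched roughly like this (FileMeta is a hypothetical stand-in for the store metadata carrying name, length, and checksum; this is not the actual BlobStoreRepository code):

```java
import java.util.*;

public class RestoreDiffDemo {
    // Hypothetical stand-in for the per-file store metadata.
    record FileMeta(String name, long length, String checksum) {}

    // Compare the snapshot's file list against what is in the local store.
    // Identical files are skipped (this is what keeps restores incremental);
    // different files must be deleted first, then restored; missing files
    // are simply restored.
    static Map<String, List<FileMeta>> diff(List<FileMeta> snapshot,
                                            Map<String, FileMeta> store) {
        List<FileMeta> identical = new ArrayList<>();
        List<FileMeta> different = new ArrayList<>();
        List<FileMeta> missing = new ArrayList<>();
        for (FileMeta f : snapshot) {
            FileMeta local = store.get(f.name());
            if (local == null) {
                missing.add(f);
            } else if (local.length() == f.length()
                    && local.checksum().equals(f.checksum())) {
                identical.add(f); // no restoreFile call for these
            } else {
                different.add(f); // same name, different content: delete + restore
            }
        }
        return Map.of("identical", identical, "different", different, "missing", missing);
    }

    public static void main(String[] args) {
        List<FileMeta> snapshot = List.of(
            new FileMeta("_0.cfs", 100, "abc"),
            new FileMeta("_1.cfs", 200, "def"),
            new FileMeta("_2.cfs", 300, "ghi"));
        Map<String, FileMeta> store = Map.of(
            "_0.cfs", new FileMeta("_0.cfs", 100, "abc"),  // unchanged -> skip
            "_1.cfs", new FileMeta("_1.cfs", 200, "zzz")); // checksum differs
        Map<String, List<FileMeta>> d = diff(snapshot, store);
        System.out.println(d.get("identical").size() + " "
            + d.get("different").size() + " " + d.get("missing").size());
        // prints "1 1 1"
    }
}
```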

@mikemccand
Contributor Author

I'm very confused :) I could swear I saw a test failure because we were in fact attempting to overwrite a pre-existing file here. Lucene no longer allows that: as of 6.2.0 it throws an exception instead of silently overwriting the pre-existing file.

@abeyad

abeyad commented Aug 29, 2016

@mikemccand no, you are absolutely correct. Basically, @imotov raised the concern that your addition of IOUtils.deleteFilesIgnoringExceptions would delete all files already existing in the store before restoring them, even good copies, thereby rendering restores non-incremental. However, that is not the case, because we only execute restoreFile for files that are missing or different from what is currently in the store. So we have not lost incremental restores. And your added delete invocation is needed to first remove those "different" files, for which a file of the same name already exists but with a different checksum and/or length.

@mikemccand
Contributor Author

Ahh OK I understand; thanks @abeyad.

I think this low level is the wrong place to delete the pre-existing file. Can't we remove the file higher up the stack, at the place that determined the file is different?

@imotov
Contributor

imotov commented Aug 29, 2016

Yep, sorry for the confusion. I guess we can re-close it then.

@mikemccand
Contributor Author

Hmm, do we have a test that would fail if I had in fact broken incremental restore?

@abeyad

abeyad commented Aug 29, 2016

I don't believe we do, other than noticing a slowdown in restore times. It's a good idea to add one, though it will require some work to set up the test.

@mikemccand I created this small PR that moves the file deletion to before we try to restore: #20220. I think we don't want to do the deletes at the point where we determine which files are different, because we haven't necessarily kicked off the restore process upon making that determination. This PR instead lets us first delete only the different files, then proceed with the restore, as opposed to deleting any file that comes our way before restoring it. I would appreciate your feedback on it.

@s1monw
Contributor

s1monw commented Aug 30, 2016

@abeyad we can close this again right?

@abeyad

abeyad commented Aug 30, 2016

@s1monw yes, I'm closing; we enhanced the implementation of this delete in #20220.

@abeyad abeyad closed this as completed Aug 30, 2016