
Snapshot restore should first remove unused files #20148

Closed
mikemccand opened this issue Aug 24, 2016 · 18 comments
Assignees
Labels
blocker :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement v5.0.0-beta1

Comments

@mikemccand
Contributor

Spinoff from Lucene 6.2.0 upgrade: https://github.com/elastic/elasticsearch/pull/20147/files/0ccfe6978918b9a756b3c882613e8e392aafc7e1#r76148215

As of Lucene 6.2.0, Directory.createOutput requires that the specified file does not already exist. This caused failures in the snapshot/restore tests, because restore sometimes overwrites an existing file.

I think ES should instead prune the unused files up front, before attempting the restore?
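The new contract can be sketched in plain java.nio terms (a hypothetical stand-in for Lucene's Directory abstraction, not the actual Lucene code; CREATE_NEW gives the same fail-if-exists semantics):

```java
import java.io.IOException;
import java.nio.file.*;

public class CreateNewDemo {
    // Returns true if writing with CREATE_NEW failed because the file exists,
    // mirroring the Lucene 6.2.0 Directory.createOutput contract.
    static boolean createNewFails(Path target) throws IOException {
        try {
            Files.newOutputStream(target, StandardOpenOption.CREATE_NEW).close();
            return false;
        } catch (FileAlreadyExistsException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index");
        Path seg = dir.resolve("_0.cfs");
        Files.write(seg, new byte[] {1, 2, 3}); // a pre-existing segment file

        // Before 6.2.0 this was a silent overwrite; now it fails.
        System.out.println(createNewFails(seg)); // true

        // Deleting first, as discussed in this issue, makes the write legal again.
        Files.delete(seg);
        System.out.println(createNewFails(seg)); // false
    }
}
```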

@mikemccand mikemccand added >enhancement :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Aug 24, 2016
@abeyad

abeyad commented Aug 25, 2016

It seems dangerous to prune files up front before writing to the same file. Given @imotov's comment here about how this could happen, I believe we should just throw an exception and not allow the restore to proceed. Pruning the files would amount to deleting data that the user added to the index and we probably shouldn't try to do anything too cute with the restore process trying to rename those files to get them restored. It seems dangerous in general to restore to an index already written to outside of the restore process. For example, what would happen right now if snapshot A was restored, which restored idx containing v1 of document 1, then we index v2 of document 1, then we try to restore snapshot B that already contained v2 of document 1 but with a different JSON source?

@imotov
Contributor

imotov commented Aug 25, 2016

What I described is a rare but possible scenario. And in this scenario throwing an exception and notifying a user about diverging lifelines might make sense. However, there are two other more common scenarios that could lead to the same issue:

  1. the index into which we restore is a completely different index
  2. it's the same index, but we lost the primary some time after we took the snapshot, and are trying to restore into one of the former replicas.

I think throwing an exception in these 2 scenarios would lead to a bad user experience. Both scenarios are easy to detect in 5.0 since we now have both index uuids and shard allocation uuids. Unfortunately, we didn't store this information previously, so handling old snapshots might still be an issue.

@abeyad

abeyad commented Aug 25, 2016

  1. the index into which we restore is a completely different index

@imotov in this scenario, isn't it still dangerous to overwrite existing index files? It still means we are overwriting some data. Shouldn't the user be required to delete the index first and then proceed with a restore?

  2. it's the same index, but we lost the primary some time after we took the snapshot, and are trying to restore into one of the former replicas.

Not sure I follow this scenario; if we lost the primary, shouldn't we have promoted one of the replicas to primary? I'm sure you meant something else by it, I just didn't get it.

@abeyad

abeyad commented Aug 25, 2016

Maybe the restore operation could have an optional force or overwrite option, so we only overwrite existing files if that is set? That way at least it's clear that the directive came from the user.

@ywelsch
Contributor

ywelsch commented Aug 25, 2016

We currently only allow restoring into an existing index if that index is closed. If no index with the same name already exists in the cluster, we select a fresh index uuid (and thus restore into a fresh folder on disk). I see restoring into a closed index as a feature for when a shard has been corrupted on disk but a recent snapshot is available. By only restoring the corrupt/missing files, this might get you running again faster. As this is a really exceptional scenario, I think it justifies having to specify an explicit "override" option in that case.

@abeyad

abeyad commented Aug 26, 2016

I discussed this with @s1monw. There are several ways we could end up with the same Lucene file name. For example: we index to a primary and a replica, so both have the same documents but different segment files; we snapshot the primary; the primary goes down; the replica gets promoted; now when we try to restore into the index, we could end up trying to overwrite segments with different checksums.

Given the whole idea with restore is to forget about what's already there and restore the index to the state saved in the snapshot, what @mikemccand did in deleting the file if it already exists before restoring it is correct. I will remove the TODO comment in the code.

A longer term solution would be to use the recovery logic for restore, because that would provide us, amongst other things, the ability to be more robust in handling file restores and restoring them to temporary files first, then moving the older ones to backups and the restored files to the active ones. But implementing this is a much larger scope and not necessary here.

If there are no other concerns, I think this doesn't need any work currently and can be closed.

@clintongormley

thanks @abeyad

@imotov
Contributor

imotov commented Aug 29, 2016

I thought a bit more about it over the weekend and discussed it with @abeyad. It looks like we just made restore non-incremental. If this is indeed what we want, we should probably document it as a breaking change in 5.0 and remove the code that deals with incremental restore in the restore service. Alternatively, we can modify the logic in restore to delete only files with non-matching checksums instead of wiping out the entire directory.

@clintongormley

Good point. Reopening

@abeyad abeyad self-assigned this Aug 29, 2016
@abeyad

abeyad commented Aug 29, 2016

Actually, restores are still incremental. The reason is that we only try to restore a file (by calling restoreFile) if the file is missing or different in the Lucene store. For files that exist in the store and are identical (same length and checksum), we never call restoreFile (see https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java#L1559). Hence we would never delete a good file, because we would never attempt a restore of that file in the first place.
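The classification step can be sketched roughly like this (FileMeta is a hypothetical stand-in for the store metadata carrying name, length, and checksum; this is not the actual BlobStoreRepository code):

```java
import java.util.*;

public class RestoreDiffDemo {
    // Hypothetical stand-in for the per-file store metadata.
    record FileMeta(String name, long length, String checksum) {}

    // Compare the snapshot's file list against what is in the local store.
    // Identical files are skipped (this is what keeps restores incremental);
    // different files must be deleted first, then restored; missing files
    // are simply restored.
    static Map<String, List<FileMeta>> diff(List<FileMeta> snapshot,
                                            Map<String, FileMeta> store) {
        List<FileMeta> identical = new ArrayList<>();
        List<FileMeta> different = new ArrayList<>();
        List<FileMeta> missing = new ArrayList<>();
        for (FileMeta f : snapshot) {
            FileMeta local = store.get(f.name());
            if (local == null) {
                missing.add(f);
            } else if (local.length() == f.length()
                    && local.checksum().equals(f.checksum())) {
                identical.add(f); // no restoreFile call for these
            } else {
                different.add(f); // same name, different content: delete + restore
            }
        }
        return Map.of("identical", identical, "different", different, "missing", missing);
    }

    public static void main(String[] args) {
        List<FileMeta> snapshot = List.of(
            new FileMeta("_0.cfs", 100, "abc"),
            new FileMeta("_1.cfs", 200, "def"),
            new FileMeta("_2.cfs", 300, "ghi"));
        Map<String, FileMeta> store = Map.of(
            "_0.cfs", new FileMeta("_0.cfs", 100, "abc"),  // unchanged -> skip
            "_1.cfs", new FileMeta("_1.cfs", 200, "zzz")); // checksum differs
        Map<String, List<FileMeta>> d = diff(snapshot, store);
        System.out.println(d.get("identical").size() + " "
            + d.get("different").size() + " " + d.get("missing").size());
        // prints "1 1 1"
    }
}
```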

@mikemccand
Contributor Author

I'm very confused :) I could swear I saw a test failure because we were in fact attempting to overwrite a pre-existing file here. Lucene no longer allows that: as of 6.2.0 it throws an exception instead of silently overwriting the pre-existing file.

@abeyad

abeyad commented Aug 29, 2016

@mikemccand no, you are absolutely correct. Basically, @imotov raised the concern that your addition of IOUtils.deleteFilesIgnoringExceptions would delete all files already existing in the store before restoring them, even good copies, thereby rendering restores non-incremental. However, that is not the case, because we only execute restoreFile for files that are missing or different from what is currently in the store. So we have not lost incremental restores. And your added delete invocation is needed to first remove those "different" files, for which a file of the same name already exists but with a different checksum and/or length.

@mikemccand
Contributor Author

Ahh OK I understand; thanks @abeyad.

I think this low level is the wrong place to delete the pre-existing file. Can't we remove the file higher up the stack, at the place that determined the file is different?

@imotov
Contributor

imotov commented Aug 29, 2016

Yep, sorry for the confusion. I guess we can re-close it then.

@mikemccand
Contributor Author

Hmm, do we have a test that would fail if I had in fact broken incremental restore?

@abeyad

abeyad commented Aug 29, 2016

I don't believe we do, other than noticing a slowdown in restore times. It's a good idea to add one, though it will require some work to set up the test.

@mikemccand I created this small PR that moves the file deletion to before we try to restore: #20220. I think we don't want to do the deletes at the point where we determine which files are different, because we haven't necessarily kicked off the restore process upon making that determination. This PR instead lets us first delete only the different files, then proceed with the restore, as opposed to deleting any file that comes our way before restoring it. I would appreciate your feedback on it.

@s1monw
Contributor

s1monw commented Aug 30, 2016

@abeyad we can close this again right?

@abeyad

abeyad commented Aug 30, 2016

@s1monw yes, I'm closing; we enhanced the implementation of this delete in #20220.

@abeyad abeyad closed this as completed Aug 30, 2016