Snapshot restore should first remove unused files #20148
It seems dangerous to prune files up front before writing to the same file. Given @imotov's comment here about how this could happen, I believe we should just throw an exception and not allow the restore to proceed. Pruning the files would amount to deleting data that the user added to the index, and we probably shouldn't try to do anything too cute with the restore process trying to rename those files to get them restored. It seems dangerous in general to restore to an index already written to outside of the restore process. For example, what would happen right now if snapshot …
What I described is a rare but possible scenario. And in this scenario throwing an exception and notifying a user about diverging lifelines might make sense. However, there are two other more common scenarios that could lead to the same issue:
I think throwing an exception in these 2 scenarios would lead to a bad user experience. Both scenarios are easy to detect in 5.0 since we now have both index uuids and shard allocation uuids. Unfortunately, we didn't store this information before, so handling old snapshots might still be an issue.
@imotov in this scenario, isn't it still dangerous to overwrite existing index files? It still means we are overwriting some data. Shouldn't the user be required to delete the index first and then proceed with a restore?
Not sure I follow this scenario; if we lost the primary, then we should have promoted one of the replicas to primary? I'm sure you meant something else by it, I just didn't get it.
Maybe the restore operation could have an optional …
We currently only allow restoring into an existing index if that index is closed. If no index with the same name already exists in the cluster, we select a fresh index uuid (and thus restore into a fresh folder on disk). I see restoring into a closed index as a feature for the case where a shard has been corrupted on disk but a recent snapshot is available. By only restoring the corrupt/missing files, this might get you running again faster. As this is a really exceptional scenario, I think it justifies having to specify an explicit "override" option in that case.
I discussed this with @s1monw. There are several ways we could end up with the same Lucene file name. For example: we index to a primary and a replica, so both have the same documents but different segment files; we snapshot the primary; the primary goes down and the replica gets promoted; now, on trying to restore into the index, we could end up in the same situation of trying to overwrite segments with different checksums. Given that the whole idea of restore is to forget about what's already there and restore the index to the state saved in the snapshot, what @mikemccand did in deleting the file if it already exists before restoring it is correct. I will remove the TODO comment in the code. A longer-term solution would be to use the recovery logic for restore, because that would give us, amongst other things, the ability to be more robust in handling file restores: restoring to temporary files first, then moving the older files to backups and the restored files to the active ones. But implementing this is a much larger scope and not necessary here. If there are no other concerns, I think this doesn't need any work currently and can be closed.
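The "delete the file if it already exists before restoring it" approach described above can be sketched with plain java.nio calls. This is an illustration only, not the actual Elasticsearch code — restoreFile and its parameters are hypothetical stand-ins:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class RestoreSketch {
    /**
     * Copy snapshotData into shardDir/fileName, first deleting any
     * pre-existing file with the same name. This mirrors the "forget what's
     * already there" semantics of restore: the snapshot's copy always wins,
     * even if a local file has the same name but a different checksum.
     */
    static void restoreFile(Path shardDir, String fileName, InputStream snapshotData)
            throws IOException {
        Path target = shardDir.resolve(fileName);
        // Remove the conflicting copy so the exclusive create below cannot fail
        // (Files.copy, like Lucene 6.2's createOutput, refuses to overwrite).
        Files.deleteIfExists(target);
        Files.copy(snapshotData, target);
    }
}
```

The delete-then-create order preserves Lucene 6.2's no-silent-overwrite guarantee while still letting the restore replace a diverged file.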
thanks @abeyad |
I thought a bit more about it over the weekend and discussed it with @abeyad. It looks like we just made restore non-incremental. If this is indeed what we want, we should probably document this as a breaking change in 5.0 and remove the code that deals with incremental restore in the restore service. Alternatively, we can modify the logic in restore to just delete files with non-matching checksums instead of wiping out the entire directory.
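The incremental alternative suggested above — delete and re-fetch only the files whose checksums differ from the snapshot's metadata, leaving identical files untouched — could look roughly like the following. This is a sketch under assumptions: the checksum-per-file map and the CRC32 stand-in are illustrative, not Elasticsearch's actual StoreFileMetaData machinery:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

public class IncrementalRestorePlan {
    /** Checksum a local file; CRC32 stands in for Lucene's footer checksum. */
    static long checksum(Path file) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(Files.readAllBytes(file));
        return crc.getValue();
    }

    /**
     * Decide which files actually need restoring: anything missing locally,
     * or present with a different checksum. Files whose checksums match are
     * skipped entirely, which is what keeps the restore incremental.
     */
    static List<String> filesToRestore(Path shardDir, Map<String, Long> snapshotChecksums)
            throws IOException {
        List<String> needed = new ArrayList<>();
        for (Map.Entry<String, Long> e : snapshotChecksums.entrySet()) {
            Path local = shardDir.resolve(e.getKey());
            if (!Files.exists(local) || checksum(local) != e.getValue()) {
                needed.add(e.getKey()); // delete (if present) and re-fetch this one
            }
        }
        return needed;
    }
}
```

Only the files returned by filesToRestore would be deleted and re-downloaded; everything else stays on disk, avoiding the full-directory wipe.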
Good point. Reopening |
Actually, restores are still incremental. The reason is, we only try to restore a file (by calling …
I'm very confused :) I could swear I saw a test failure because we were in fact attempting to overwrite a pre-existing file here. Lucene no longer allows that: as of 6.2.0 it throws an exception, where previously it silently overwrote the pre-existing file.
@mikemccand no, you are absolutely correct. Basically, @imotov raised the concern that with your addition of …
Ahh OK, I understand; thanks @abeyad. I think this low level is the wrong place to delete the pre-existing file. Can't we remove the file at the higher-up-in-the-stack place that determined that the file is different?
Yep. Sorry for the confusion. I guess we can re-close it then.
Hmm do we have a test that would fail if I had in fact broken incremental restore? |
I don't believe we do, other than noticing a slowdown in restore times. It's a good idea to add one, though it will require some work to set up the test. @mikemccand I created this small PR that moves the file deletion before trying to restore: #20220. I think we don't want to do the deletes at the point of knowing which files are different, because we haven't necessarily kicked off the restore process upon making that determination; but this PR allows us to first delete only the files that differ, then proceed with the restore, as opposed to just trying to delete any file that comes our way before restoring. I would appreciate your feedback on it.
@abeyad we can close this again right? |
Spinoff from Lucene 6.2.0 upgrade: https://github.com/elastic/elasticsearch/pull/20147/files/0ccfe6978918b9a756b3c882613e8e392aafc7e1#r76148215
As of Lucene 6.2.0, Directory.createOutput now requires that the specified file does not already exist. This caused test failures in the snapshot/restore tests, because we do sometimes overwrite a file during restore in some cases. I think ES should instead prune unused files up front before doing this?