Gracefully handles pre 2.x compressed snapshots #22267
In pre 2.x versions, if the repository was set to compress snapshots,
This commit gracefully handles the situation by introducing a new
I am not really sure what the right solution is here, but this approach seems dangerous to me. Since we are not adding this snapshot to the list of snapshots in 5.x, we will not consider the files that belong to this snapshot as needed, which means that we might remove these files during subsequent snapshots, right?
I am thinking about a scenario where a 1.x index was upgraded to 2.x, so we have both 1.x and 2.x snapshots pointing to the same data directory. In that case, we might make this snapshot non-restorable even in earlier versions of Elasticsearch.
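To make the concern concrete, here is a toy model of reference-based cleanup, assuming (as a simplification, not the actual Elasticsearch code) that a cleanup pass keeps exactly the data blobs referenced by snapshots listed in the repository index and deletes the rest. The blob and snapshot names are hypothetical.

```python
from typing import Dict, Set


def unreferenced_blobs(all_blobs: Set[str],
                       listed_snapshots: Dict[str, Set[str]]) -> Set[str]:
    """Return the blobs that no *listed* snapshot references.

    A cleanup that trusts only the listed snapshots treats everything else as
    garbage, including blobs still needed by a snapshot left out of the index.
    """
    live: Set[str] = set()
    for files in listed_snapshots.values():
        live |= files
    return all_blobs - live


# Hypothetical shard: the 1.x snapshot shares __1 with a later 2.x snapshot.
all_blobs = {"__0", "__1", "__2"}
snap_1x = {"__0", "__1"}
snap_2x = {"__1", "__2"}

# The 1.x snapshot was dropped from the 5.x index, so only snap-2x is listed.
doomed = unreferenced_blobs(all_blobs, {"snap-2x": snap_2x})
print(sorted(doomed))                  # ['__0']
print(snap_1x <= all_blobs - doomed)   # False: snap-1x is no longer restorable
```

Files shared with a still-listed snapshot survive, but anything referenced only by the omitted snapshot is removed, which is why older versions could no longer restore it either.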
I think a better approach might be to identify the list of snapshots that cannot be brought over to 5.x and throw an exception listing them, asking the user to either delete these snapshots from the existing repository or switch to a new repository.
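A minimal sketch of that fail-fast check, assuming the snapshot names can still be read from the legacy index even when their metadata cannot, and that some `can_read` probe exists for each snapshot. `check_upgradable`, `IncompatibleSnapshotsError` and the message text are hypothetical, not Elasticsearch APIs.

```python
from typing import Callable, Iterable, List


class IncompatibleSnapshotsError(Exception):
    """Hypothetical error that names every snapshot blocking the upgrade."""

    def __init__(self, repository: str, snapshots: List[str]):
        self.snapshots = snapshots
        super().__init__(
            f"repository [{repository}] contains snapshots that cannot be read: "
            f"{', '.join(snapshots)}; delete them or switch to a new repository"
        )


def check_upgradable(repository: str,
                     snapshot_names: Iterable[str],
                     can_read: Callable[[str], bool]) -> None:
    """Refuse the upgrade, without writing anything, if any snapshot is unreadable."""
    blocked = sorted(name for name in snapshot_names if not can_read(name))
    if blocked:
        raise IncompatibleSnapshotsError(repository, blocked)


legacy = {"snap-2015-01", "snap-2015-02"}
try:
    check_upgradable("my_backup",
                     ["snap-2016-06", "snap-2015-02", "snap-2015-01"],
                     lambda name: name not in legacy)
except IncompatibleSnapshotsError as err:
    print(err.snapshots)  # ['snap-2015-01', 'snap-2015-02']
```

The key property is that the check runs before the index file is rewritten, so nothing is silently dropped and the user decides what to delete.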
Here is what happens when there are no compressed snapshots: when a 5.x cluster touches the repository for the first time, it updates the snapshots index file to the 5.x format. In doing so, it is able to read the legacy pre 2.x format, get the snapshot metadata, and proceed accordingly. When you GET all the snapshots, the list includes the pre 2.x ones. Only when you try to restore such a snapshot do you get an error saying "snapshot contains index data that is too old" (or something to that effect). The problem arises only with compressed snapshots. The reason is that we can't even read the snapshot metadata via
The upgrade process will only remove the snapshot from the new
The issue is that we can't even list the snapshots to delete until we are able to load the repository in 5.x. We can, and indeed do, list them this way for uncompressed snapshots, as mentioned above, but the same strategy does not work for snapshots compressed in 1.x. The reason I considered it acceptable to remove the snapshot from the
Not sure if this makes sense, but this is the rationale for this approach, and I'm not sure it can be done differently without adding a new API that operates on a "non-updated" repository so one can pick and choose which snapshots to keep.
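The distinction drawn above between uncompressed and compressed pre 2.x snapshots can be modeled roughly as follows. This is a simplified sketch, not the 5.x code path: versions are reduced to a single major number, 2 is assumed to be the oldest restorable major in 5.x, and the error text paraphrases the message quoted in the thread. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


class LegacyCompressedMetadataError(Exception):
    """Hypothetical stand-in for 5.x failing to decode 1.x-compressed metadata."""


@dataclass(frozen=True)
class SnapshotMeta:
    name: str
    written_by_major: int  # simplified: major version that wrote the snapshot
    compressed: bool


def read_metadata(meta: SnapshotMeta) -> SnapshotMeta:
    # Uncompressed legacy metadata can still be parsed; compressed 1.x metadata cannot.
    if meta.compressed and meta.written_by_major < 2:
        raise LegacyCompressedMetadataError(meta.name)
    return meta


def list_snapshots(metas: List[SnapshotMeta]) -> List[str]:
    # Listing needs every snapshot's metadata, so one compressed 1.x snapshot breaks it.
    return [read_metadata(m).name for m in metas]


def restore(meta: SnapshotMeta, oldest_restorable_major: int = 2) -> str:
    m = read_metadata(meta)
    # Readable but too old: for uncompressed legacy snapshots the failure only
    # surfaces at restore time.
    if m.written_by_major < oldest_restorable_major:
        raise ValueError(f"snapshot [{m.name}] contains index data that is too old")
    return m.name


snaps = [SnapshotMeta("snap-1x", 1, False), SnapshotMeta("snap-2x", 2, False)]
print(list_snapshots(snaps))  # ['snap-1x', 'snap-2x']
```

Under this model an uncompressed 1.x snapshot can be listed (and therefore deleted) but not restored, while a compressed one prevents even the listing, which is the gap a "non-updated repository" API would have to fill.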
@abeyad I was trying to describe a scenario where this might lead to losing data in the 1.x snapshots without a warning (I don't consider logging on the master a fair warning in this case). Maybe it's me who is missing something here, though. Can we chat on Zoom about it tomorrow morning? I will try to explain the scenario in more detail.
I think this is much better, but I have a couple of questions. How are we going to handle this in 6.x? Would it be possible to improve the delete-snapshot experience for the unsupported snapshots, or are they pretty much stuck in this repository forever since we cannot really clean them up?
I will forward port this code to master, with the only omission being the actual creation of the
It's impossible to know from the repository data which data files can be deleted, because the
One (not so great) option is to iterate over all the data files in the repository; if a file throws a similar compression exception, we would know it is outdated (belonging to 1.x) and could delete it. But the overhead of iterating over all repository files like that could be huge.
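A rough sketch of that scan over a filesystem repository. Instead of attempting a full read and catching the compression exception, it peeks at each file's header, assuming (unverified here) that legacy 1.x compressed blobs start with the LZF chunk signature `ZV`; `find_legacy_blobs` and `LEGACY_MAGIC` are hypothetical names, and the predicate should be swapped for whatever check the real reader performs.

```python
from pathlib import Path
from typing import List

# Assumption: legacy 1.x compressed blobs begin with the LZF signature b"ZV".
LEGACY_MAGIC = b"ZV"


def looks_legacy_compressed(path: Path) -> bool:
    # Read only the header bytes instead of decoding the whole blob.
    with path.open("rb") as f:
        return f.read(len(LEGACY_MAGIC)) == LEGACY_MAGIC


def find_legacy_blobs(repo_root: Path) -> List[Path]:
    """Visit every file under repo_root; cost grows with the total file count."""
    return sorted(p for p in repo_root.rglob("*")
                  if p.is_file() and looks_legacy_compressed(p))
```

Even with a two-byte peek, the scan still touches every file, and on a remote blob store that likely means at least one request per blob, so the overhead scales with repository size rather than with the number of outdated snapshots.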