New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gracefully handles pre 2.x compressed snapshots #22267
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not really sure what the right solution here, but this approach seems dangerous to me. Since we are not adding this snapshot to the list of snapshots in 5.x we will not consider files that belong to this snapshot as needed, which means that we might remove these files during consecutive snapshots, right?
I am thinking about a scenario where 1.x index was upgraded to 2.x so we have both 1.x and 2.x snapshots pointing to the same data directory. So, in a sense we might make this snapshot non-restorable even in earlier versions of elasticsearch.
I think a better approach might be to identify a list of snapshots that cannot be brought over to 5.x and throw an exception with the list of these snapshots asking user to either delete these snapshots from the existing repository or switch to a new repository.
What happens in the case where there are no compressed snapshots, when a 5.x cluster is touching the repository for the first time, it updates the snapshots index file to the 5.x version. In doing so, it is able to read the legacy pre 2.x format and get the snapshot metadata and proceed accordingly. When you GET all the snapshots, it lists all the snapshots including the pre 2.x ones. It is only when you try to restore the snapshot that you get an error saying "snapshot contains index data that is too old" (something to that effect). It is only when dealing with compressed snapshots that we have an issue. The reason is we can't even read the snapshot metadata via
The upgrade process will only remove the snapshot from the new
The issue is that we can't even list the snapshots to delete until we are able to load the repository in 5.x We can, and indeed do, do this for uncompressed snapshots, as mentioned above, but the same strategy does not work for snapshots compressed in 1.x. The reason I considered it acceptable to remove the snapshot from the Not sure if this makes sense, but this is the rationale for this approach and I'm not sure if it can be done differently without adding a new API that operates on a "non-updated" repository so one can pick and chose what snapshots they want to keep. |
@abeyad I was trying to describe a scenario, where this might lead to loosing data in the 1.x snapshots without a warning (I don't consider logging on master a fair warning in this case). Might be it's me how is missing something here though. Can we chat on zoom about it tomorrow morning, I will try to explain the scenario in more details. |
64a0ba6
to
2a3faf7
Compare
@imotov I pushed 2a3faf7daf723b6c5ab52862aecc9277979f8dbc that implements the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is much better, but I have a couple of questions. How are we going to handle this in 6.x? Would it be possible to improve the delete snapshot experience for the unsupported snapshots or they are pretty much stuck in this repository forever since we cannot really clean them?
I will forward port this code to master, with the only omission being the actual creation of the
Its impossible to know from the repository data which data files can be deleted, because the One (not so great) option is that we could iterate over all the data files in the repository and if they throw a similar compression exception, we can know they are outdated (belonging to 1.x) and delete them. But the overhead of iterating over all repository files like that could be huge. WDYT? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I guess this is the best we can do in this case. LGTM.
In pre 2.x versions, if the repository was set to compress snapshots, then snapshots would be compressed with the LZF algorithm. In 5.x, Elasticsearch no longer supports the LZF compression algorithm. This presents an issue when retrieving snapshots in a repository or upgrading repository data to the 5.x version, because Elasticsearch throws an exception when it tries to read the snapshot metadata because it was compressed using LZF. This commit gracefully handles the situation by introducing a new incompatible-snapshots blob to the repository. For any pre-2.x snapshot that cannot be read, that snapshot is removed from the list of active snapshots, because the snapshot could not be restored anyway. Instead, the snapshot is recorded in the incompatible-snapshots blob. When listing snapshots, both active snapshots and incompatible snapshots will be listed, with incompatible snapshots showing a `INCOMPATIBLE` state. Any attempt to restore an incompatible snapshot will result in an exception.
2a3faf7
to
959ad35
Compare
retest this please |
thanks for the review @imotov |
master: b0c009a |
In pre 2.x versions, if the repository was set to compress snapshots, then snapshots would be compressed with the LZF algorithm. In 5.x, Elasticsearch no longer supports the LZF compression algorithm. This presents an issue when retrieving snapshots in a repository or upgrading repository data to the 5.x version, because Elasticsearch throws an exception when it tries to read the snapshot metadata because it was compressed using LZF. This commit gracefully handles the situation by introducing a new incompatible-snapshots blob to the repository. For any pre-2.x snapshot that cannot be read, that snapshot is removed from the list of active snapshots, because the snapshot could not be restored anyway. Instead, the snapshot is recorded in the incompatible-snapshots blob. When listing snapshots, both active snapshots and incompatible snapshots will be listed, with incompatible snapshots showing a `INCOMPATIBLE` state. Any attempt to restore an incompatible snapshot will result in an exception.
5.2 commit: 29b777a |
In elastic#22267, we introduced the notion of incompatible snapshots in a repository, and they were stored in a root-level blob named `incompatible-snapshots`. If there were no incompatible snapshots in the repository, then there was no `incompatible-snapshots` blob. However, this causes a problem for some cloud-based repositories, because if the blob does not exist, the cloud-based repositories may attempt to keep retrying the read of a non-existent blob with expontential backoff until giving up. This causes performance issues (and potential timeouts) on snapshot operations because getting the `incompatible-snapshots` is part of getting the repository data (see RepositoryData#getRepositoryData()). This commit fixes the issue by creating an empty `incompatible-snapshots` blob in the repository if one does not exist.
In #22267, we introduced the notion of incompatible snapshots in a repository, and they were stored in a root-level blob named `incompatible-snapshots`. If there were no incompatible snapshots in the repository, then there was no `incompatible-snapshots` blob. However, this causes a problem for some cloud-based repositories, because if the blob does not exist, the cloud-based repositories may attempt to keep retrying the read of a non-existent blob with expontential backoff until giving up. This causes performance issues (and potential timeouts) on snapshot operations because getting the `incompatible-snapshots` is part of getting the repository data (see RepositoryData#getRepositoryData()). This commit fixes the issue by creating an empty `incompatible-snapshots` blob in the repository if one does not exist.
In #22267, we introduced the notion of incompatible snapshots in a repository, and they were stored in a root-level blob named `incompatible-snapshots`. If there were no incompatible snapshots in the repository, then there was no `incompatible-snapshots` blob. However, this causes a problem for some cloud-based repositories, because if the blob does not exist, the cloud-based repositories may attempt to keep retrying the read of a non-existent blob with expontential backoff until giving up. This causes performance issues (and potential timeouts) on snapshot operations because getting the `incompatible-snapshots` is part of getting the repository data (see RepositoryData#getRepositoryData()). This commit fixes the issue by creating an empty `incompatible-snapshots` blob in the repository if one does not exist.
In #22267, we introduced the notion of incompatible snapshots in a repository, and they were stored in a root-level blob named `incompatible-snapshots`. If there were no incompatible snapshots in the repository, then there was no `incompatible-snapshots` blob. However, this causes a problem for some cloud-based repositories, because if the blob does not exist, the cloud-based repositories may attempt to keep retrying the read of a non-existent blob with expontential backoff until giving up. This causes performance issues (and potential timeouts) on snapshot operations because getting the `incompatible-snapshots` is part of getting the repository data (see RepositoryData#getRepositoryData()). This commit fixes the issue by creating an empty `incompatible-snapshots` blob in the repository if one does not exist.
This commit fixes inefficient (worst case exponential) loading of snapshot repository data when checking for incompatible snapshots, that was introduced in #22267. When getting snapshot information, getRepositoryData() was called on every snapshot, so if there are a large number of snapshots in the repository and _all snapshots were requested, the performance degraded exponentially. This commit fixes the issue by only calling getRepositoryData once and using the data from it in all subsequent calls to get snapshot information. Closes #24509
This commit fixes inefficient (worst case exponential) loading of snapshot repository data when checking for incompatible snapshots, that was introduced in #22267. When getting snapshot information, getRepositoryData() was called on every snapshot, so if there are a large number of snapshots in the repository and _all snapshots were requested, the performance degraded exponentially. This commit fixes the issue by only calling getRepositoryData once and using the data from it in all subsequent calls to get snapshot information. Closes #24509
In pre 2.x versions, if the repository was set to compress snapshots,
then snapshots would be compressed with the LZF algorithm. In 5.x,
Elasticsearch no longer supports the LZF compression algorithm. This
presents an issue when retrieving snapshots in a repository or upgrading
repository data to the 5.x version, because Elasticsearch throws an
exception when it tries to read the snapshot metadata because it was
compressed using LZF.
This commit gracefully handles the situation by introducing a new
incompatible-snapshots blob to the repository. For any pre-2.x snapshot
that cannot be read, that snapshot is removed from the list of active
snapshots, because the snapshot could not be restored anyway. Instead,
the snapshot is recorded in the incompatible-snapshots blob. When
listing snapshots, both active snapshots and incompatible snapshots will
be listed, with incompatible snapshots showing a
INCOMPATIBLE
state.Any attempt to restore an incompatible snapshot will result in an
exception.