Improve snapshot creation and deletion performance on repositories with large number of snapshots #8969
Conversation
Force-pushed 39a6e57 to ee3c3d1
try (InputStream stream = blobContainer.openInput(SNAPSHOT_INDEX_PREFIX + latest)) {
    try (XContentParser parser = XContentFactory.xContent(XContentType.JSON).createParser(stream)) {
        parser.nextToken();
        return new Tuple<>(BlobStoreIndexShardSnapshots.fromXContent(parser), latest);
Maybe use the method readSnapshot(stream) again?
I cannot reuse it here because this method reads multiple snapshots. But I can definitely add a new method readSnapshots(), similar to readSnapshot(), and use it here.
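The split the reply describes, a single-record readSnapshot() reused by a multi-record readSnapshots(), can be sketched as below. This is an illustrative toy format (length-prefixed UTF records), not Elasticsearch's actual XContent-based parsing; the class and method names here are assumptions for demonstration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the readSnapshot()/readSnapshots() pair discussed above.
public class SnapshotReader {
    // Reads a single snapshot record from the stream.
    public static String readSnapshot(DataInputStream in) {
        try {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads a record count followed by that many records, delegating
    // each record to readSnapshot() instead of duplicating the parsing.
    public static List<String> readSnapshots(DataInputStream in) {
        try {
            int count = in.readInt();
            List<String> snapshots = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                snapshots.add(readSnapshot(in));
            }
            return snapshots;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Encodes names in the same toy format, for round-trip testing.
    public static byte[] encode(List<String> names) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(names.size());
            for (String name : names) {
                out.writeUTF(name);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the refactoring is that the per-record parsing lives in one place, so a format change only has to be made once.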
LGTM, left some comments.
@tlrx Thanks! I have updated the PR with your suggestions.
LGTM, just a typo in documentation.
@@ -250,13 +266,37 @@ public static void writeSnapshot(BlobStoreIndexShardSnapshot snapshot, OutputStr
 * @throws IOException if an IOException occurs
 * */
public static BlobStoreIndexShardSnapshot readSnapshot(InputStream stream) throws IOException {
-    try (XContentParser parser = XContentFactory.xContent(XContentType.JSON).createParser(stream)) {
+    byte[] data = ByteStreams.toByteArray(stream);
Do you need to close the stream after reading it here?
It's the responsibility of whoever opened it (the caller).
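The ownership rule in that reply, the code that opens a stream is the code that closes it, can be sketched as follows. The names are illustrative, not Elasticsearch's API: the reader consumes the stream but deliberately does not close it, while the caller wraps the stream it opened in try-with-resources.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Sketch of stream ownership: the reader does not close, the opener does.
public class StreamOwnership {
    // Reads the whole stream into memory; intentionally does NOT close it,
    // because this method did not open it.
    public static byte[] readFully(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The caller opened the stream, so the caller closes it:
    // try-with-resources guarantees close() even if readFully throws.
    public static int lengthOf(byte[] blob) {
        try (InputStream in = new ByteArrayInputStream(blob)) {
            return readFully(in).length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping close() at the open site avoids double-close bugs and makes resource lifetimes obvious at a glance.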
Force-pushed 6310866 to f138b6b
Force-pushed f138b6b to abb7193
if (currentGen > generation) {
    generation = currentGen;
}
} catch (NumberFormatException e) {
-    logger.warn("file [{}] does not conform to the '__' schema");
+    logger.warn("file [{}] does not conform to the '" + DATA_BLOB_PREFIX + "' schema");
Can use the '{}' syntax here for DATA_BLOB_PREFIX
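The suggestion is to pass DATA_BLOB_PREFIX as a placeholder argument rather than concatenating it into the message. Elasticsearch's logger substitutes '{}' placeholders; the stdlib analogue below uses java.text.MessageFormat's {0}-style placeholders to show the same idea. The DATA_BLOB_PREFIX value "__" is taken from the diff above; the class name is illustrative.

```java
import java.text.MessageFormat;

// Placeholder-style message formatting, as the review suggests, instead
// of building the message with string concatenation.
public class PlaceholderDemo {
    static final String DATA_BLOB_PREFIX = "__"; // value from the quoted diff

    // Analogous intent to:
    //   logger.warn("file [{}] does not conform to the '{}' schema", file, DATA_BLOB_PREFIX)
    public static String format(String file) {
        // In MessageFormat patterns, '' is an escaped literal single quote.
        return MessageFormat.format("file [{0}] does not conform to the ''{1}'' schema",
                file, DATA_BLOB_PREFIX);
    }
}
```

Besides readability, placeholder logging defers string assembly until the message is actually emitted, which matters on hot paths where the log level may be disabled.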
@imotov left some more comments on this
List<SnapshotFiles> snapshots = Lists.newArrayList();
This is a fallback, right? For older repos that don't have the new snapshot index file?
Correct. Added a comment.
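The fallback confirmed above, prefer the combined index blob, and only rebuild the list from per-snapshot blobs when the index is absent (older repositories), can be sketched like this. A Map stands in for the blob container, and all names here are illustrative assumptions, not Elasticsearch's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the index-first read with a per-snapshot-blob fallback.
public class SnapshotIndexFallback {
    static final String SNAPSHOT_INDEX_PREFIX = "index-";
    static final String SNAPSHOT_PREFIX = "snapshot-";

    // Returns snapshot names: one combined read when the index blob
    // exists, otherwise one read per snapshot blob (older repos).
    public static List<String> listSnapshots(Map<String, List<String>> blobs) {
        for (Map.Entry<String, List<String>> e : blobs.entrySet()) {
            if (e.getKey().startsWith(SNAPSHOT_INDEX_PREFIX)) {
                return e.getValue(); // fast path: single combined read
            }
        }
        // Fallback: scan every per-snapshot blob by naming convention.
        List<String> snapshots = new ArrayList<>();
        for (String name : blobs.keySet()) {
            if (name.startsWith(SNAPSHOT_PREFIX)) {
                snapshots.add(name.substring(SNAPSHOT_PREFIX.length()));
            }
        }
        return snapshots;
    }
}
```

Keeping the fallback path means the new code can read repositories written by older versions without a migration step.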
@imotov left some more minor comments, mostly around documentation, exceptions, and renaming a var
@dakrone thanks! pushed an update.
Thanks @imotov, LGTM!
Improve snapshot creation and deletion performance on repositories with large number of snapshots

Each shard repository contains a snapshot file for each snapshot - this file contains a map between each original physical file that is snapshotted and its representation in the repository. This data includes the original filename, checksum, and length. When a new snapshot is created, elasticsearch needs to read all these snapshot files to figure out which files are already present in the repository and which files still have to be copied there. This change adds a new index file that contains all this information combined into a single file. So, if a repository has 1000 snapshots with 1000 shards, elasticsearch will only need to read 1000 blobs (one per shard) instead of 1,000,000 to delete a snapshot. This change should also improve snapshot creation speed on repositories with a large number of snapshots and high latency.

Fixes elastic#8958
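The commit message's arithmetic can be made explicit: without the combined index, deleting a snapshot reads one blob per (snapshot, shard) pair; with it, one blob per shard. A minimal sketch (class and method names are illustrative):

```java
// Blob-read counts before and after the combined per-shard index file.
public class BlobReadMath {
    // Old behavior: every snapshot file in every shard must be read.
    public static long readsWithoutIndex(long snapshots, long shards) {
        return snapshots * shards;
    }

    // New behavior: one combined index blob per shard, regardless of
    // how many snapshots the repository holds.
    public static long readsWithIndex(long snapshots, long shards) {
        return shards;
    }
}
```

For the commit message's example of 1000 snapshots and 1000 shards, that is 1,000,000 reads reduced to 1000, which is why the improvement is most visible on high-latency repositories where each blob read is expensive.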
Force-pushed 0e8834e to 59d9f7e
Will this be backported to 1.x?
@niemyjski this was a significant change that required changing the snapshot file format, and it was too big a change for a patch-level release. So we didn't backport it to 1.x, and there are no current plans to do so.
Fixes #8958