Improve snapshot creation and deletion performance on repositories with large number of snapshots #8969
Conversation
Force-pushed 39a6e57 to ee3c3d1
try (InputStream stream = blobContainer.openInput(SNAPSHOT_INDEX_PREFIX + latest)) {
    try (XContentParser parser = XContentFactory.xContent(XContentType.JSON).createParser(stream)) {
        parser.nextToken();
        return new Tuple<>(BlobStoreIndexShardSnapshots.fromXContent(parser), latest);
Maybe use the method readSnapshot(stream) again?
I cannot reuse it here because this method reads multiple snapshots. But I can definitely add a new method readSnapshots(), similar to readSnapshot(), and use it here.
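The split the reply describes, a single-record readSnapshot() reused by a multi-record readSnapshots(), can be sketched as below. This is an illustrative toy format (length-prefixed UTF records), not Elasticsearch's actual XContent-based parsing; the class and method names here are assumptions for demonstration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the readSnapshot()/readSnapshots() pair discussed above.
public class SnapshotReader {
    // Reads a single snapshot record from the stream.
    public static String readSnapshot(DataInputStream in) {
        try {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads a record count followed by that many records, delegating
    // each record to readSnapshot() instead of duplicating the parsing.
    public static List<String> readSnapshots(DataInputStream in) {
        try {
            int count = in.readInt();
            List<String> snapshots = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                snapshots.add(readSnapshot(in));
            }
            return snapshots;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Encodes names in the same toy format, for round-trip testing.
    public static byte[] encode(List<String> names) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(names.size());
            for (String name : names) {
                out.writeUTF(name);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the refactoring is that the per-record parsing lives in one place, so a format change only has to be made once.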
LGTM, left some comments.
@tlrx Thanks! I have updated the PR with your suggestions.
LGTM, just a typo in documentation.
@@ -250,13 +266,37 @@ public static void writeSnapshot(BlobStoreIndexShardSnapshot snapshot, OutputStr
 * @throws IOException if an IOException occurs
 * */
public static BlobStoreIndexShardSnapshot readSnapshot(InputStream stream) throws IOException {
-    try (XContentParser parser = XContentFactory.xContent(XContentType.JSON).createParser(stream)) {
+    byte[] data = ByteStreams.toByteArray(stream);
Do you need to close the stream after reading it here?
It's the responsibility of whoever opened it (the caller).
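The ownership rule in that reply, the code that opens a stream is the code that closes it, can be sketched as follows. The names are illustrative, not Elasticsearch's API: the reader consumes the stream but deliberately does not close it, while the caller wraps the stream it opened in try-with-resources.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Sketch of stream ownership: the reader does not close, the opener does.
public class StreamOwnership {
    // Reads the whole stream into memory; intentionally does NOT close it,
    // because this method did not open it.
    public static byte[] readFully(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The caller opened the stream, so the caller closes it:
    // try-with-resources guarantees close() even if readFully throws.
    public static int lengthOf(byte[] blob) {
        try (InputStream in = new ByteArrayInputStream(blob)) {
            return readFully(in).length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping close() at the open site avoids double-close bugs and makes resource lifetimes obvious at a glance.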
Force-pushed 6310866 to f138b6b
Force-pushed f138b6b to abb7193
if (currentGen > generation) {
    generation = currentGen;
}
} catch (NumberFormatException e) {
-    logger.warn("file [{}] does not conform to the '__' schema");
+    logger.warn("file [{}] does not conform to the '" + DATA_BLOB_PREFIX + "' schema");
Can use the '{}' syntax here for DATA_BLOB_PREFIX
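The suggestion is to pass DATA_BLOB_PREFIX as a placeholder argument rather than concatenating it into the message. Elasticsearch's logger substitutes '{}' placeholders; the stdlib analogue below uses java.text.MessageFormat's {0}-style placeholders to show the same idea. The DATA_BLOB_PREFIX value "__" is taken from the diff above; the class name is illustrative.

```java
import java.text.MessageFormat;

// Placeholder-style message formatting, as the review suggests, instead
// of building the message with string concatenation.
public class PlaceholderDemo {
    static final String DATA_BLOB_PREFIX = "__"; // value from the quoted diff

    // Analogous intent to:
    //   logger.warn("file [{}] does not conform to the '{}' schema", file, DATA_BLOB_PREFIX)
    public static String format(String file) {
        // In MessageFormat patterns, '' is an escaped literal single quote.
        return MessageFormat.format("file [{0}] does not conform to the ''{1}'' schema",
                file, DATA_BLOB_PREFIX);
    }
}
```

Besides readability, placeholder logging defers string assembly until the message is actually emitted, which matters on hot paths where the log level may be disabled.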
@imotov left some more comments on this
List<SnapshotFiles> snapshots = Lists.newArrayList();
This is a fallback, right? For older repos that don't have the new snapshot index file?
Correct. Added a comment.
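The fallback confirmed above, prefer the combined index blob, and only rebuild the list from per-snapshot blobs when the index is absent (older repositories), can be sketched like this. A Map stands in for the blob container, and all names here are illustrative assumptions, not Elasticsearch's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the index-first read with a per-snapshot-blob fallback.
public class SnapshotIndexFallback {
    static final String SNAPSHOT_INDEX_PREFIX = "index-";
    static final String SNAPSHOT_PREFIX = "snapshot-";

    // Returns snapshot names: one combined read when the index blob
    // exists, otherwise one read per snapshot blob (older repos).
    public static List<String> listSnapshots(Map<String, List<String>> blobs) {
        for (Map.Entry<String, List<String>> e : blobs.entrySet()) {
            if (e.getKey().startsWith(SNAPSHOT_INDEX_PREFIX)) {
                return e.getValue(); // fast path: single combined read
            }
        }
        // Fallback: scan every per-snapshot blob by naming convention.
        List<String> snapshots = new ArrayList<>();
        for (String name : blobs.keySet()) {
            if (name.startsWith(SNAPSHOT_PREFIX)) {
                snapshots.add(name.substring(SNAPSHOT_PREFIX.length()));
            }
        }
        return snapshots;
    }
}
```

Keeping the fallback path means the new code can read repositories written by older versions without a migration step.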
@imotov left some more minor comments, mostly around documentation, exceptions, and renaming a var
@dakrone thanks! pushed an update.
Thanks @imotov, LGTM!
Improve snapshot creation and deletion performance on repositories with large number of snapshots

Each shard repository contains a snapshot file for each snapshot - this file contains a map between each original physical file that is snapshotted and its representation in the repository. This data includes the original filename, checksum, and length. When a new snapshot is created, elasticsearch needs to read all these snapshot files to figure out which files are already present in the repository and which files still have to be copied there. This change adds a new index file that contains all this information combined into a single file. So, if a repository has 1000 snapshots with 1000 shards, elasticsearch will only need to read 1000 blobs (one per shard) instead of 1,000,000 to delete a snapshot. This change should also improve snapshot creation speed on repositories with a large number of snapshots and high latency.

Fixes elastic#8958
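The commit message's arithmetic can be made explicit: without the combined index, deleting a snapshot reads one blob per (snapshot, shard) pair; with it, one blob per shard. A minimal sketch (class and method names are illustrative):

```java
// Blob-read counts before and after the combined per-shard index file.
public class BlobReadMath {
    // Old behavior: every snapshot file in every shard must be read.
    public static long readsWithoutIndex(long snapshots, long shards) {
        return snapshots * shards;
    }

    // New behavior: one combined index blob per shard, regardless of
    // how many snapshots the repository holds.
    public static long readsWithIndex(long snapshots, long shards) {
        return shards;
    }
}
```

For the commit message's example of 1000 snapshots and 1000 shards, that is 1,000,000 reads reduced to 1000, which is why the improvement is most visible on high-latency repositories where each blob read is expensive.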
Force-pushed 0e8834e to 59d9f7e
Will this be backported to 1.x?
@niemyjski this was a significant change that required changing the snapshot file format, and it was too big a change for a patch-level release. So we didn't backport it to 1.x, and there are no current plans to do so.
Fixes #8958