-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't Upload Redundant Shard Files #51729
Don't Upload Redundant Shard Files #51729
Conversation
Segment(s) info blobs are already stored with their full content in the "hash" field in the shard snapshot metadata as long as they are smaller than 1MB. We can make use of this fact and never upload them physically to the repo. This saves a non-trivial number of uploads and downloads when restoring and might also lower the latency of searchable snapshots since they can save phyiscally loading this information as well.
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore) |
@@ -1035,7 +1035,8 @@ public void testUnrestorableFilesDuringRestore() throws Exception { | |||
final String indexName = "unrestorable-files"; | |||
final int maxRetries = randomIntBetween(1, 10); | |||
|
|||
Settings createIndexSettings = Settings.builder().put(SETTING_ALLOCATION_MAX_RETRY.getKey(), maxRetries).build(); | |||
Settings createIndexSettings = Settings.builder().put(SETTING_ALLOCATION_MAX_RETRY.getKey(), maxRetries) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little annoying but if we run this test with more than a single shard, then it might be that we get shards that aren't corrupted because they contain no documents now as well as no physical data files because only the .si
and segments_N
are uploaded for the empty shards.
@@ -1132,11 +1132,11 @@ public void testSnapshotTotalAndIncrementalSizes() throws IOException { | |||
|
|||
SnapshotStats stats = snapshots.get(0).getStats(); | |||
|
|||
assertThat(stats.getTotalFileCount(), is(snapshot0FileCount)); | |||
assertThat(stats.getTotalSize(), is(snapshot0FileSize)); | |||
assertThat(stats.getTotalFileCount(), greaterThanOrEqualTo(snapshot0FileCount)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This simply doesn't match up now and I figured what really matters in these tests is the consistency of the numbers. I didn't want to adjust the results for the file counts to filter out the files that weren't physically uploaded since they are still uploaded as part of the metadata technically.
We do have a bunch of other tests that verify all the files are there and incrementality works as expected so I figured this adjustment is good enough.
private static final String DATA_BLOB_PREFIX = "__"; | ||
private static final String UPLOADED_DATA_BLOB_PREFIX = "__"; | ||
|
||
private static final String VIRTUAL_DATA_BLOB_PREFIX = "v__"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to have virtual blob here so that the logic/assertions in BlobStoreIndexShardSnapshot(s)
works without format change and things stay BwC. Technically we could do without this kind of "ID", but then we'd have to go through the whole dance of only updating the format and not uploading the blob once all the snapshots are newer than version X. If we do it this way, we get the benefit of faster restores and snapshots right away since the meta hash works the same way in 6.x
already hence even in 7.7
all possible restores should work out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering why we have that hash at all (which blows up the size of the snap-*.dat
)? Why is the checksum not sufficient?
My thoughts exactly as well. But now that we have it, I'd actually rather keep it than remove the |
I'm mostly concerned that it blows up the shard index file quite a bit (up to 1MB per snapshot), and that it puts a stricter limitation on the maximum number of snapshots that we can have. Can we leave the hash out of that file? |
True, but so far this has not been an issue for anyone to my knowledge (EDIT: I've also confirmed this with Paul just now, he's never seen any error that would suggest this is a problem either). In the end, these blobs are in almost all cases <0.5kb and thus somewhat close to the size of other data we keep per snapshot anyway.
I'm wondering if that's a good move strategically. We could nicely use this data to speed up restores, searchable snapshots and have an efficient way of tackling #50231 at no extra cost and without BwC-annoying change to the format of the shard level metadata. Why not take the free win instead of fixing an issue no one seems to experience? Generally, I'd then rather make a BwC breaking change to the meta if it has some real win to it, given how involved it tends to be and we have #45736 on the roadmap anyway which will require an adjustment to the shard level metadata. |
Note that I was wondering whether we could leave the hash out of the |
Yea that's a fair point. That's something we could do, though it would mess with my plan for #50231 because it would take away the single spot we currently have for all the segment infos :D |
@ywelsch given the discussion just now on incrementality I take it we keep the segments data in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
private static final String DATA_BLOB_PREFIX = "__"; | ||
private static final String UPLOADED_DATA_BLOB_PREFIX = "__"; | ||
|
||
private static final String VIRTUAL_DATA_BLOB_PREFIX = "v__"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some Javadocs here explaining what virtual data blobs are?
@@ -1516,6 +1519,9 @@ public void snapshotShard(Store store, MapperService mapperService, SnapshotId s | |||
} | |||
} | |||
|
|||
// We can skip writing blobs where the metadata hash is equal to the blob's contents because we store the hash/contents | |||
// directly in the shard level metadata in this case | |||
final boolean needsWrite = md.hash().length != md.length(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you assert here that the content is indeed the same? EDIT: happens below. This might be too subtle otherwise. Perhaps also add a method to StoreFileMetaData
that marks these files specially, so that it's more obvious to identify the files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 8794541 ... actually found a way to make this even safer via the checksum test just in case some will use this method elsewhere or something changes in the future. Maybe take another look before I merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks Yannick! |
Segment(s) info blobs are already stored with their full content in the "hash" field in the shard snapshot metadata as long as they are smaller than 1MB. We can make use of this fact and never upload them physically to the repo. This saves a non-trivial number of uploads and downloads when restoring and might also lower the latency of searchable snapshots since they can save phyiscally loading this information as well.
Segment(s) info blobs are already stored with their full content in the "hash" field in the shard snapshot metadata as long as they are smaller than 1MB. We can make use of this fact and never upload them physically to the repo. This saves a non-trivial number of uploads and downloads when restoring and might also lower the latency of searchable snapshots since they can save phyiscally loading this information as well.
This commit adapts searchable snapshots to the latest changes from master. The REST handlers declare their routes differently since #51950 and SearchableSnapshotIndexInput need to account for segment infos files that are stored in shard snapshot metadata hash and not uploaded to the blob store anymore since #51729.
Segment(s) info blobs are already stored with their full content
in the "hash" field in the shard snapshot metadata as long as they are
smaller than 1MB. We can make use of this fact and never upload them
physically to the repo.
This saves a non-trivial number of uploads and downloads when restoring
and might also lower the latency of searchable snapshots since they can save
physically loading this information as well.