Improve recovery / snapshot restoring file identity handling #7351

Merged
merged 1 commit into from Aug 21, 2014

6 participants

@s1monw
Contributor
s1monw commented Aug 20, 2014

This commit changes how files are selected for retransmission
on recovery / restore. Today this happens on a per-file basis, where the
rather weak checksum and the file length in bytes are compared to check whether
a file is identical. This is prone to fail in the case of a checksum collision,
which can happen under certain circumstances.
The changes in this commit move the identity comparison to a per-commit / per-segment
level, where files are only treated as identical iff all the other files in the
commit / segment are the same. This all-or-nothing strategy dramatically reduces the chance of
a collision, since we also use a strong hash to identify commits / segments
based on the content of the .si / segments_N file.
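As a rough illustration of the all-or-nothing comparison described above, here is a minimal sketch. It is not the code from this PR: `FileMeta`, `groupKey`, and `identicalFiles` are hypothetical names, the grouping rule is a simplified guess at how file names map to segments, and the strong hash is modelled as an optional string that only the .si / segments_N files carry.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class SegmentIdentitySketch {

    // Hypothetical stand-in for per-file metadata. "hash" is null for files
    // that carry no strong hash in this sketch.
    static final class FileMeta {
        final String name;
        final long length;
        final String checksum;
        final String hash;

        FileMeta(String name, long length, String checksum, String hash) {
            this.name = name;
            this.length = length;
            this.checksum = checksum;
            this.hash = hash;
        }

        // Per-file comparison: length, weak checksum, and strong hash (if any) must all match.
        boolean sameAs(FileMeta other) {
            return other != null
                    && length == other.length
                    && checksum.equals(other.checksum)
                    && Objects.equals(hash, other.hash);
        }
    }

    // Assumed grouping rule: commit files go to one "segments" group, everything
    // else is grouped by the leading segment name ("_0.cfs" -> "_0").
    static String groupKey(String fileName) {
        if (fileName.startsWith("segments")) {
            return "segments";
        }
        int end = fileName.indexOf('_', 1);
        if (end < 0) {
            end = fileName.indexOf('.');
        }
        return end < 0 ? fileName : fileName.substring(0, end);
    }

    // Returns the names of source files that may be reused: a file counts as
    // identical only if every file in its group matches the target copy.
    static Set<String> identicalFiles(Collection<FileMeta> source, Map<String, FileMeta> target) {
        Map<String, List<FileMeta>> groups = new TreeMap<>();
        for (FileMeta file : source) {
            groups.computeIfAbsent(groupKey(file.name), k -> new ArrayList<>()).add(file);
        }
        Set<String> identical = new TreeSet<>();
        for (List<FileMeta> group : groups.values()) {
            boolean allSame = true;
            for (FileMeta file : group) {
                if (!file.sameAs(target.get(file.name))) {
                    allSame = false;
                    break;
                }
            }
            if (allSame) {
                for (FileMeta file : group) {
                    identical.add(file.name);
                }
            }
        }
        return identical;
    }
}
```

In this sketch, a segment whose .si hash differs is retransmitted in full, even if its other files happen to match on length and checksum.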

@kimchy kimchy added the resiliency label Aug 20, 2014
@s1monw
Contributor
s1monw commented Aug 20, 2014

@imotov @rmuir can you guys do a review here? I am not sure about the XContent changes in Backup/Restore; it would be good to get some ideas here.

@rmuir
Contributor
rmuir commented Aug 20, 2014

The diffing logic here etc looks great to me.

@imotov imotov commented on an outdated diff Aug 20, 2014
.../snapshots/blobstore/BlobStoreIndexShardSnapshot.java
@@ -221,6 +224,12 @@ public static void toXContent(FileInfo file, XContentBuilder builder, ToXContent
if (file.metadata.writtenBy() != null) {
builder.field(Fields.WRITTEN_BY, file.metadata.writtenBy());
}
+
+ if (file.metadata.hash() != null && file.metadata().hash().length > 0) {
+ BytesRef hash = file.metadata.hash();
+ // TODO - not sure if this is the right way to do this - can't we pass the bytes ref directly?
+ builder.field(Fields.META_HASH, Base64.encodeBytes(hash.bytes, hash.offset, hash.length));
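The hunk above writes only the `offset` / `length` slice of the hash bytes as a Base64 string. A minimal round-trip sketch of that idea follows. It uses the JDK's `java.util.Base64` rather than the `Base64` helper shown in the diff, and `HashFieldSketch`, `encode`, and `decode` are hypothetical names for illustration only.

```java
import java.util.Arrays;
import java.util.Base64;

final class HashFieldSketch {

    // Encodes only the [offset, offset + length) slice, mirroring how a
    // BytesRef-style (bytes, offset, length) triple addresses its payload.
    static String encode(byte[] bytes, int offset, int length) {
        byte[] slice = Arrays.copyOfRange(bytes, offset, offset + length);
        return Base64.getEncoder().encodeToString(slice);
    }

    // Decodes the stored field back into a standalone byte array.
    static byte[] decode(String field) {
        return Base64.getDecoder().decode(field);
    }
}
```

Encoding the slice rather than the whole backing array matters because the backing array may be larger than the hash it holds.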
@imotov imotov commented on an outdated diff Aug 20, 2014
...napshots/blobstore/BlobStoreIndexShardRepository.java
@@ -716,9 +714,10 @@ public void restore() {
long totalSize = 0;
int numberOfReusedFiles = 0;
long reusedTotalSize = 0;
- Map<String, StoreFileMetaData> metadata = Collections.emptyMap();
+ Map<String, StoreFileMetaData> metaMap = new HashMap<>();
@imotov
imotov Aug 20, 2014 Member

It looks like it was replaced by snapshotMetaData below and is not needed anymore.

@imotov
Member
imotov commented Aug 20, 2014

I left a couple of minor comments. Otherwise, looks good to me.

@s1monw
Contributor
s1monw commented Aug 21, 2014

@imotov I pushed a new commit including a test for the FileInfo serialization

@imotov
Member
imotov commented Aug 21, 2014

LGTM

@s1monw
Contributor
s1monw commented Aug 21, 2014

I think we have a small regression here for snapshot and restore, since we don't have the hash for the segments in already existing snapshots. I think we can read the hashes for those from the snapshot and calculate them on the fly if necessary. I will open a follow-up for this, as I already discussed it with @imotov.

@s1monw s1monw [STORE] Improve recovery / snapshot restoring file identity handling
This commit changes how files are selected for retransmission
on recovery / restore. Today this happens on a per-file basis, where the
rather weak checksum and the file length in bytes are compared to check whether
a file is identical. This is prone to fail in the case of a checksum collision,
which can happen under certain circumstances.
The changes in this commit move the identity comparison to a per-commit / per-segment
level, where files are only treated as identical iff all the other files in the
commit / segment are the same. This "all or nothing" strategy dramatically reduces the chance of
a collision, since we also use a strong hash to identify commits / segments
based on the content of the ".si" / "segments_N" file.

Closes #7351
058a02b
@s1monw s1monw merged commit 058a02b into elastic:master Aug 21, 2014
@s1monw s1monw added a commit that referenced this pull request Aug 21, 2014
@s1monw s1monw [STORE] Improve recovery / snapshot restoring file identity handling
8263de9
@s1monw s1monw deleted the s1monw:fix_checksums branch Aug 25, 2014
@jpountz jpountz removed the review label Aug 26, 2014
@s1monw s1monw added a commit to s1monw/elasticsearch that referenced this pull request Aug 26, 2014
@s1monw s1monw [SNAPSHOT] Add BWC layer to .si / segments_N hashing
Due to the additional safety added in #7351, we now compute a strong hash for
.si and segments_N files, which is compared during snapshot / restore.
Old snapshots don't have this hash, which can cause unnecessary copying
of large amounts of data. This commit adds the ability to fetch this
hash from the blob store if needed.

Closes #7434
c63626b
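The fallback described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: the blob store is modelled as a plain `Map` from file name to content, SHA-256 stands in for whatever strong hash the real implementation uses, and `LegacyHashSketch`, `needsHash`, and `resolveHash` are hypothetical names.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

final class LegacyHashSketch {

    // Only .si and segments_N files carry a strong hash in this sketch.
    static boolean needsHash(String fileName) {
        return fileName.endsWith(".si") || fileName.startsWith("segments_");
    }

    // Returns the stored hash when the snapshot has one; otherwise, for files
    // that need a hash, derives it from the content held in the (assumed) blob store.
    static byte[] resolveHash(String fileName, byte[] storedHash, Map<String, byte[]> blobStore) {
        if (storedHash != null && storedHash.length > 0) {
            return storedHash; // newer snapshot: nothing to fetch
        }
        if (!needsHash(fileName)) {
            return new byte[0];
        }
        byte[] content = blobStore.get(fileName);
        if (content == null) {
            return new byte[0]; // blob missing: caller must treat the file as different
        }
        return sha256(content);
    }

    // SHA-256 is an assumption made for this sketch only.
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Fetching only the small .si / segments_N blobs is cheap compared with re-copying every segment file of an old snapshot.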
@s1monw s1monw added a commit that referenced this pull request Aug 26, 2014
@s1monw s1monw [SNAPSHOT] Add BWC layer to .si / segments_N hashing
c8d2e48
@s1monw s1monw added a commit that referenced this pull request Sep 8, 2014
@s1monw @areek s1monw + areek [STORE] Improve recovery / snapshot restoring file identity handling
7ce41fd
@s1monw s1monw added a commit that referenced this pull request Sep 8, 2014
@s1monw @areek s1monw + areek [SNAPSHOT] Add BWC layer to .si / segments_N hashing
3a72d4e
@clintongormley clintongormley changed the title from Improve recovery / snapshot restoring file identity handling to Resiliency: Improve recovery / snapshot restoring file identity handling Sep 8, 2014
@s1monw s1monw added a commit to s1monw/elasticsearch that referenced this pull request Sep 24, 2014
@s1monw s1monw [STORE] Improve recovery / snapshot restoring file identity handling
7ff308a
@s1monw s1monw added a commit to s1monw/elasticsearch that referenced this pull request Sep 24, 2014
@s1monw s1monw [SNAPSHOT] Add BWC layer to .si / segments_N hashing
5e735de
@s1monw s1monw added v1.3.3 bug labels Sep 24, 2014
@clintongormley clintongormley changed the title from Resiliency: Improve recovery / snapshot restoring file identity handling to Improve recovery / snapshot restoring file identity handling Jun 7, 2015
@mute mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
@s1monw s1monw [STORE] Improve recovery / snapshot restoring file identity handling
cb22210
@mute mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
@s1monw s1monw [SNAPSHOT] Add BWC layer to .si / segments_N hashing
839f0a5