-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some checksum code refactoring #9113
Conversation
Summary: To prepare for adding checksum to footer and "context aware" checksums. This also brings closely related code much closer together. Also now checking for unrecognized checksum type on reading footer, rather than later on use. Also removed an obsolete function delcaration Test Plan: existing tests
@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
|
||
uint32_t ComputeBuiltinChecksumWithLastByte(ChecksumType type, const char* data, | ||
size_t data_size, char last_byte) { | ||
switch (type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this code care if data_size == 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works as expected for size 0. This function does not implement the behavior of ComputeBuiltinChecksum
on zero-length input because the logical input length is always >= 1 for this function.
table/format.cc
Outdated
case kXXH3: { | ||
// See corresponding code in ComputeBuiltinChecksumWithLastByte | ||
uint32_t v = Lower32of64(XXH3_64bits(data, data_size - 1)); | ||
return ModifyChecksumForLastByte(v, data[data_size - 1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why in the "normal" case you are not including the last byte with the original checksum. I understand in the other case where it is not part of the "stream" but not necessarily here. Can you elaborate more on the reasoning in the comment please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment on function declaration:
// If data_size >= 1, then
// ComputeBuiltinChecksum(type, data, size)
// ==
// ComputeBuiltinChecksumWithLastByte(type, data, size - 1, data[size - 1])
I will push some updated unit tests, though |
@pdillinger has updated the pull request. You must reimport the pull request before landing. |
This push also changes the handling of zero-length checksums by ComputeBuiltinChecksum to hopefully be more "natural" |
@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
// See corresponding code in ComputeBuiltinChecksumWithLastByte | ||
uint32_t v = Lower32of64(XXH3_64bits(data, data_size - 1)); | ||
return ModifyChecksumForLastByte(v, data[data_size - 1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code seems "different" than the old code, but I cannot seem to find where the old code was actually used. If you are convinced that this works the same as the old code and does not cause any issues (or there was no old code and adding the last byte this way is the "right thing to do", I guess it is okay (just feels funny)....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's equivalent and visually similar to the old VerifyBlockChecksum in reader_common.cc. The old code used some mind tricks to hide the - 1
and didn't have to deal with the case of zero checksummed bytes.
VerifyBlockChecksum is/was used in block_based_table_reader.cc
EXPECT_EQ(TrailerAsString(b1, ct1, t), "0000000000"); | ||
EXPECT_EQ(TrailerAsString(b1, ct2, t), "0100000000"); | ||
EXPECT_EQ(TrailerAsString(b1, ct3, t), "0700000000"); | ||
EXPECT_EQ(TrailerAsString(b2, ct1, t), "0000000000"); | ||
EXPECT_EQ(TrailerAsString(b2, ct2, t), "0100000000"); | ||
EXPECT_EQ(TrailerAsString(b2, ct3, t), "0700000000"); | ||
EXPECT_EQ(ChecksumAsString(empty, t), "00000000"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the expected checksum value is changing, doesn't that mean the test is now broken? I might be misreading the test but it sounds like those constants being compared to should never be changed or compatibility will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't some of the later values actually different though?
kCRC32c: b1, ct2, t -- Was "012F9B0A57", now "2F9B0A57";
Some of the values appear to be different in the first non-zero character?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The leading byte for "trailer" is the compression type. ChecksumAsString doesn't include this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. Not obvious!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@pdillinger merged this pull request in dfedc74. |
Summary: To prepare for adding checksum to footer and "context aware"
checksums. This also brings closely related code much closer together.
Recently added
BlockBasedTableBuilder::ComputeBlockTrailer
for testingis made obsolete in the refactoring, as testing the checksums can happen
at a lower level of abstraction.
Also now checking for unrecognized checksum type on reading footer,
rather than later on use.
Also removed an obsolete function delcaration.
Test Plan: existing tests worked before refactoring to remove
ComputeBlockTrailer
. And then refactored+improved tests using it.