Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some checksum code refactoring #9113

Closed
wants to merge 2 commits into from

Conversation

pdillinger
Copy link
Contributor

@pdillinger pdillinger commented Nov 2, 2021

Summary: To prepare for adding checksum to footer and "context aware"
checksums. This also brings closely related code much closer together.

Recently added BlockBasedTableBuilder::ComputeBlockTrailer for testing
is made obsolete in the refactoring, as testing the checksums can happen
at a lower level of abstraction.

Also now checking for unrecognized checksum type on reading footer,
rather than later on use.

Also removed an obsolete function delcaration.

Test Plan: existing tests worked before refactoring to remove
ComputeBlockTrailer. And then refactored+improved tests using it.

Summary: To prepare for adding checksum to footer and "context aware"
checksums. This also brings closely related code much closer together.

Also now checking for unrecognized checksum type on reading footer,
rather than later on use.

Also removed an obsolete function delcaration

Test Plan: existing tests
@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


uint32_t ComputeBuiltinChecksumWithLastByte(ChecksumType type, const char* data,
size_t data_size, char last_byte) {
switch (type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this code care if data_size == 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works as expected for size 0. This function does not implement the behavior of ComputeBuiltinChecksum on zero-length input because the logical input length is always >= 1 for this function.

table/format.cc Outdated
case kXXH3: {
// See corresponding code in ComputeBuiltinChecksumWithLastByte
uint32_t v = Lower32of64(XXH3_64bits(data, data_size - 1));
return ModifyChecksumForLastByte(v, data[data_size - 1]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why in the "normal" case you are not including the last byte with the original checksum. I understand in the other case where it is not part of the "stream" but not necessarily here. Can you elaborate more on the reasoning in the comment please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment on function declaration:

// If data_size >= 1, then
// ComputeBuiltinChecksum(type, data, size)
// ==
// ComputeBuiltinChecksumWithLastByte(type, data, size - 1, data[size - 1])

table/format.cc Show resolved Hide resolved
@pdillinger
Copy link
Contributor Author

I will push some updated unit tests, though

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@pdillinger
Copy link
Contributor Author

This push also changes the handling of zero-length checksums by ComputeBuiltinChecksum to hopefully be more "natural"

@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Comment on lines +384 to +386
// See corresponding code in ComputeBuiltinChecksumWithLastByte
uint32_t v = Lower32of64(XXH3_64bits(data, data_size - 1));
return ModifyChecksumForLastByte(v, data[data_size - 1]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code seems "different" than the old code, but I cannot seem to find where the old code was actually used. If you are convinced that this works the same as the old code and does not cause any issues (or there was no old code and adding the last byte this way is the "right thing to do", I guess it is okay (just feels funny)....

Copy link
Contributor Author

@pdillinger pdillinger Nov 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's equivalent and visually similar to the old VerifyBlockChecksum in reader_common.cc. The old code used some mind tricks to hide the - 1 and didn't have to deal with the case of zero checksummed bytes.

VerifyBlockChecksum is/was used in block_based_table_reader.cc

Comment on lines -2263 to +2279
EXPECT_EQ(TrailerAsString(b1, ct1, t), "0000000000");
EXPECT_EQ(TrailerAsString(b1, ct2, t), "0100000000");
EXPECT_EQ(TrailerAsString(b1, ct3, t), "0700000000");
EXPECT_EQ(TrailerAsString(b2, ct1, t), "0000000000");
EXPECT_EQ(TrailerAsString(b2, ct2, t), "0100000000");
EXPECT_EQ(TrailerAsString(b2, ct3, t), "0700000000");
EXPECT_EQ(ChecksumAsString(empty, t), "00000000");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the expected checksum value is changing, doesn't that mean the test is now broken? I might be misreading the test but it sounds like those constants being compared to should never be changed or compatibility will fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it's hard to see in the diff
Screen Shot 2021-11-02 at 3 45 46 PM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't some of the later values actually different though?

kCRC32c: b1, ct2, t -- Was "012F9B0A57", now "2F9B0A57";

Some of the values appear to be different in the first non-zero character?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The leading byte for "trailer" is the compression type. ChecksumAsString doesn't include this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. Not obvious!

Copy link
Contributor

@mrambacher mrambacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@facebook-github-bot
Copy link
Contributor

@pdillinger merged this pull request in dfedc74.

facebook-github-bot pushed a commit that referenced this pull request Nov 4, 2021
Summary:
Work around annoying compiler warning-as-error from #9113

Pull Request resolved: #9128

Test Plan: CI

Reviewed By: jay-zhuang

Differential Revision: D32181499

Pulled By: pdillinger

fbshipit-source-id: d7e5f7857a29f7ba47c49c3aee7150b5763b65d9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants