Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Endianness was incorrectly assumed for GroupWord #20

Merged
merged 2 commits into from
Oct 28, 2021

Conversation

Erk-
Copy link
Contributor

@Erk- Erk- commented Oct 25, 2021

This assumption caused incorrect slots to chosen which would lead to it being unusable.

Likely also the cause for rust-lang/rust#90123, though I have not tested if it resolves it.

Tested on Linux on Z which is big-endian, I do not know if this happen on other BE systems.

@@ -29,7 +29,7 @@ impl GroupQuery {
// has pretty much the same effect as a hash collision, something
// that we need to deal with in any case anyway.

let group = GroupWord::from_le_bytes(*group);
let group = GroupWord::from_ne_bytes(*group);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the place where the original bytes are written be changed from native endian to little endian instead? This change asserts that the serialization format is endianness dependent, when I think it should use a fixed endianness.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell with some smaller tests this does not change how the hash table is serialized. That said there may be a better solution, as far as I can tell this issues comes from some assumptions of layout of arrays in memory. The most simple test that is failing is probably this one https://github.com/rust-lang/odht/blob/main/src/swisstable_group_query/mod.rs#L58..L73

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the underlying format is byte oriented, so endianness doesn't matter in the serialization format.

However, this GroupWord is batching the u8 control words to try to efficiently scan for matches or empties. It's using the bit-oriented trailing_zeros() to produce byte-oriented usize indexes into the control words. It makes sense that we would want from_le_bytes for that, to ensure low bytes are loaded at the correct "trailing" end. So I don't really understand what's happening here, why from_ne_bytes is fixing anything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha! The values eq_mask and empty_mask also have a to_le() on their construction, so they're effectively flipped twice. We can either flip with from_le_bytes(..) or these to_le(), but should not do both.

@michaelwoerister
Copy link
Member

That's interesting. Thanks for the PR and bug report, @Erk-!

I'll need to take some time to actually understand what's going on here.

@cuviper
Copy link
Member

cuviper commented Oct 26, 2021

I can confirm via qemu that a lot of tests fail before this change:

failures:
    raw_table::tests::stress
    swisstable_group_query::tests::match_iter
    swisstable_group_query::tests::match_iter_with_empty
    swisstable_group_query::tests::partially_filled
    tests::from_iterator
    tests::hash_table_at_different_alignments
    tests::init_in_place
    tests::quickchecks::k15_v0::lookup
    tests::quickchecks::k16_v0::lookup
    tests::quickchecks::k16_v16::lookup
    tests::quickchecks::k16_v17::lookup
    tests::quickchecks::k16_v1::lookup
    tests::quickchecks::k16_v2::lookup
    tests::quickchecks::k16_v3::lookup
    tests::quickchecks::k16_v4::lookup
    tests::quickchecks::k16_v8::lookup
    tests::quickchecks::k17_v0::lookup
    tests::quickchecks::k17_v4::lookup
    tests::quickchecks::k1_v0::insert_with_duplicates
    tests::quickchecks::k1_v0::lookup
    tests::quickchecks::k20_v4::lookup
    tests::quickchecks::k2_v0::insert_with_duplicates
    tests::quickchecks::k2_v0::lookup
    tests::quickchecks::k2_v4::insert_with_duplicates
    tests::quickchecks::k2_v4::lookup
    tests::quickchecks::k3_v0::lookup
    tests::quickchecks::k4_v0::lookup
    tests::quickchecks::k4_v4::lookup
    tests::quickchecks::k63_v0::lookup
    tests::quickchecks::k64_v0::lookup
    tests::quickchecks::k64_v4::lookup
    tests::quickchecks::k8_v0::lookup
    tests::quickchecks::k8_v4::lookup

test result: FAILED. 102 passed; 33 failed; 0 ignored; 0 measured; 0 filtered out; finished in 7.20s

and all tests pass with this pull request.
(I'll try to find a native machine to test on as well.)

@cuviper
Copy link
Member

cuviper commented Oct 26, 2021

I've now tested and confirmed the fix on native s390x hardware as well.

It also works if we leave the from_le_bytes alone and instead remove the to_le calls, per #20 (comment). I think that's preferable, but I'll leave it to @michaelwoerister as the reviewer.

@michaelwoerister
Copy link
Member

I finally had some time to take a closer look. I concur with @cuviper's diagnosis that this is a mismatch between working with addresses (which dependent on endianess) and bit-offsets within a u64 (which don't depend on endianess).

I think the correct solution is to keep GroupWord::from_le_bytes(*group) and instead remove the to_le() calls when computing eq_mask and empty_mask. We keep from_le_bytes because we want to construct a u64 where the first byte we read ends up in the lowest 8-bits, the second byte in the second lowest, and so on. This is exactly what from_le_bytes does.

Once we have converted the raw bytes into a u64 we are not working with anything endian-dependent anymore. We just work with bit-offsets within integer values -- so the to_le() calls should not be there.

@Erk-, would you mind updating the PR accordingly?

@michaelwoerister
Copy link
Member

Once this is merged, I'll add a Miri-based regression tests (see #19).

@michaelwoerister michaelwoerister merged commit 1ae6bb2 into rust-lang:main Oct 28, 2021
@michaelwoerister
Copy link
Member

Thanks, @Erk-!

@Erk- Erk- deleted the fix/endianess-groupword branch October 28, 2021 11:19
@cuviper
Copy link
Member

cuviper commented Oct 28, 2021

@michaelwoerister will you be publishing this fix? We should get this updated in rust master and beta to avoid shipping a regression in rust-lang/rust#90123.

bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 30, 2021
…imulacrum

Update odht crate to 0.3.1 (big-endian bugfix)

Update `odht` to 0.3.1 in order to get rust-lang/odht#20 which fixes issue rust-lang#90123.
@michaelwoerister
Copy link
Member

@cuviper, the fix was merged into rustc in rust-lang/rust#90403.

@cuviper
Copy link
Member

cuviper commented Nov 1, 2021

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants