Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use SSE2 SIMD in IndexTable::find_entry #176

Merged
merged 4 commits into from
Feb 3, 2023

Conversation

Tpt
Copy link
Contributor

@Tpt Tpt commented Jan 31, 2023

No description provided.

@Tpt Tpt force-pushed the simd_find_entry branch 2 times, most recently from 5427684 to 54d8e85 Compare January 31, 2023 21:49
src/index.rs Outdated Show resolved Hide resolved
src/index.rs Outdated
let current = _mm_unpacklo_epi32(first_two, last_two);
let cmp = _mm_movemask_epi8(_mm_cmpeq_epi32(current, target));
if cmp != 0 {
// We found a match, let's validate it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here we can simply return entry at i + cmp.leading_zeroes() / 4

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tricky thing is that the shift might be of 30 bits i.e. are left with a 34 bit number that we cast to 32 bits. So, this means we need to check again, sadly. But we can definitely be smarter and skip not useful comparison.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the idea to just skip two bits of check in this case? (but then need to shift 32 bits instead of 30, so we check the 32 bit of the start, the 2 last unchecked bits will be checked with the rest of the key on value access).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly for the skipping. Currently we check the last 32 bits, and check again the remaining 2 bits before returning. We can indeed simplfy the operations by just checking the first 32 bits (i.e. the high weight part of the entry if I am not mistaken).

Now, there is the question of which two bits are the most discriminative, the higher 2 or the lower 2?) Because the partial_key is build with u64::from_be_bytes I would tend to think the higher two might be more relevant but I am not sure. What do you think about it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spend time remembering/reunderstanding this today. might still be wrong, but my conclusion was the higher 2 so we in case of index 16 we would need to move >> 32 instead of 30. (the other way we would miss checking key content as the part of the key stored and check in index is 26 bytes containing the last two bits, then we indeed got 16 bit check from the access to the index, 32 from the index key content and 26 * 8 from the value table, should be 256).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I'm going to update the PR to use the high 32 bits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous version was actually more correct. We can't assume that partial key is always 32 bits, so shifts are still needed. We can ignore the last two bits for the cases when it is 33 or 34 bytes however. A bit of explanation:

Keys are always 256 bits. Key bits [48..256] are stored in the value table. So if find_entry returns a match, these bits are compared to make sure the rest of the key matches . Therefore find_entry must only match the first 48 bits of the key. Current version actually matches 50, so that's a bit of an overkill.

The first index_bits of the key are simply the offset of the chunk in the file. All entries in the chunk share the same first index_bits of the key. They are not stored in the entries. Bits [index_bits..50] is what's actually stored in the entry and that's called partial_key. The rest of the entry bits is address. As index_bits grows, so does address_bits, and partial_key shrinks. But the chunk index bits grows as well, so we can always match 50 bits of the key in the index file.

Indexes start with index_bits==16. With index_bits==16 partial_key is 34, but the last two bits can be ignored and we only need to match 48 bits.
With index_bits==21 however partial_key is only 29 bits, so the last 3 of the first 32 entry bits should be excluded from comparison.

Copy link
Contributor Author

@Tpt Tpt Feb 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your detailed explanation. I indeed forgot that the index_bit can grow further. I have applied @cheme comments to fix the implementation.

I also added a test to cover this case because it was not already done by the existing tests.

@arkpar
Copy link
Member

arkpar commented Jan 31, 2023

Great stuff. Curious to see if it has any affect on the query performance.

src/index.rs Outdated Show resolved Hide resolved
@arkpar arkpar requested a review from cheme February 1, 2023 10:03
src/index.rs Outdated Show resolved Hide resolved
src/index.rs Outdated Show resolved Hide resolved
src/index.rs Show resolved Hide resolved
@Tpt
Copy link
Contributor Author

Tpt commented Feb 1, 2023

@arkpar @cheme Thank you for your reviews. I have written a version 2 using the assumption that the high 32bits are part of the partial key.

I ran a quick benchmark on my laptop. The SSE2 version seems to be slightly faster but the numbers are not stable enough to tell more than that. We might also try an AVX version.

src/index.rs Outdated Show resolved Hide resolved
src/index.rs Outdated Show resolved Hide resolved
src/index.rs Outdated Show resolved Hide resolved
src/index.rs Outdated Show resolved Hide resolved
src/index.rs Show resolved Hide resolved
src/index.rs Outdated Show resolved Hide resolved
src/index.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@cheme cheme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit, could do :

diff --git a/src/index.rs b/src/index.rs
index 5bd78b9..f5e94bf 100644
--- a/src/index.rs
+++ b/src/index.rs
@@ -235,12 +235,14 @@ impl IndexTable {
                Ok(try_io!(Ok(&map[offset..offset + CHUNK_LEN])))
        }
 
+       #[cfg(target_feature = "sse2")]
        fn find_entry(&self, key_prefix: u64, sub_index: usize, chunk: &[u8]) -> (Entry, usize) {
-               if cfg!(target_feature = "sse2") {
-                       self.find_entry_sse2(key_prefix, sub_index, chunk)
-               } else {
-                       self.find_entry_base(key_prefix, sub_index, chunk)
-               }
+               self.find_entry_sse2(key_prefix, sub_index, chunk)
+       }
+
+       #[cfg(not(target_feature = "sse2"))]
+       fn find_entry(&self, key_prefix: u64, sub_index: usize, chunk: &[u8]) -> (Entry, usize) {
+               self.find_entry_base(key_prefix, sub_index, chunk)
        }
 
        #[cfg(target_feature = "sse2")]
@@ -284,6 +286,7 @@ impl IndexTable {
                (Entry::empty(), 0)
        }
 
+       #[cfg(any(not(target_feature = "sse2"), test))]
        fn find_entry_base(&self, key_prefix: u64, sub_index: usize, chunk: &[u8]) -> (Entry, usize) {
                assert!(chunk.len() >= CHUNK_ENTRIES * 8);
                let partial_key = Entry::extract_key(key_prefix, self.id.index_bits());
@@ -608,19 +611,19 @@ mod test {
 
        #[test]
        fn test_find_entries() {

or expose a dummy find_entry_sse2 function when no feature sse2 (method may be missing if no traget).

@arkpar arkpar merged commit fc1250f into paritytech:master Feb 3, 2023
@arkpar arkpar mentioned this pull request Feb 6, 2023
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants