
[MISC] simplify enumerating all bins in a binning_bitvector #2930

Merged — 2 commits merged into master on Jan 25, 2022

Conversation

marehr
Member

@marehr marehr commented Jan 24, 2022

This PR simplifies how the bins of a binning_bitvector are enumerated.

It removes the special case for a bit_sequence of 1 << 63 (only the most significant bit set) by observing that the aforementioned UB (undefined behaviour) only occurs when shifting a 64-bit value by 64 or more; any smaller shift amount is fine.

That can't happen here: std::countr_zero only returns 64 for an input of 0, and that case is already excluded by the for loop, so the number of trailing zeros is at most 63. The second insight is to do an additional tmp >>= 1 instead of incrementing step before the actual shift. This ensures that the shift amount is at most 63, eliminating the previous edge case.

bin += step++;
tmp >>= step;

// is the same as
bin += step;
step++;
tmp >>= step;

// is the same as
bin += step;
tmp >>= step; // step is <= 63
tmp >>= 1;

Idea moved to seqan/product_backlog#412

@marehr marehr requested review from a team and Irallia and removed request for a team January 24, 2022 00:04

@codecov

codecov bot commented Jan 24, 2022

Codecov Report

Merging #2930 (dcc4f95) into master (58c7109) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master    #2930   +/-   ##
=======================================
  Coverage   98.28%   98.28%           
=======================================
  Files         267      267           
  Lines       11462    11466    +4     
=======================================
+ Hits        11265    11269    +4     
  Misses        197      197           
| Impacted Files | Coverage Δ |
|---|---|
| ...n3/search/dream_index/interleaved_bloom_filter.hpp | 100.00% <100.00%> (ø) |
| include/seqan3/search/views/minimiser.hpp | 100.00% <0.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 58c7109...dcc4f95.

Contributor

@Irallia Irallia left a comment


LGTM, thank you!

@Irallia Irallia requested review from a team and eseiler and removed request for a team January 24, 2022 10:00
Member

@eseiler eseiler left a comment


Very nice!
Just one 💅

Please merge when satisfied

Even though there are far fewer (and easier to understand) LOC, as well as fewer instructions in the assembly, the benchmark isn't always faster. However, the benchmark parameters are quite small.


Master

$ interleaved_bloom_filter_benchmark --benchmark_enable_random_interleaving=true --benchmark_repetitions=100 --benchmark_min_time=0.1 --benchmark_filter=bulk_count_benchmark --benchmark_display_aggregates_only=yes | grep --color=auto mean | sort
2022-01-25T17:40:17+01:00
Running interleaved_bloom_filter_benchmark
Run on (16 X 3892.69 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 0.17, 0.35, 0.28
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::compressed>>/64/16384/2/1000_mean          16095 ns        16095 ns          100 hashes/sec=62.1336M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::compressed>>/64/512/2/1000_mean            16109 ns        16109 ns          100 hashes/sec=62.0838M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::compressed>>/8192/128/2/1000_mean        1209346 ns      1209351 ns          100 hashes/sec=826.976k/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::compressed>>/8192/4/2/1000_mean          1208471 ns      1208476 ns          100 hashes/sec=827.572k/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::uncompressed>>/64/16384/2/1000_mean         7562 ns         7562 ns          100 hashes/sec=132.262M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::uncompressed>>/64/512/2/1000_mean           7052 ns         7052 ns          100 hashes/sec=141.822M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::uncompressed>>/8192/128/2/1000_mean       294799 ns       294800 ns          100 hashes/sec=3.39248M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::uncompressed>>/8192/4/2/1000_mean         292890 ns       292891 ns          100 hashes/sec=3.41471M/s

PR

$ interleaved_bloom_filter_benchmark --benchmark_enable_random_interleaving=true --benchmark_repetitions=100 --benchmark_min_time=0.1 --benchmark_filter=bulk_count_benchmark --benchmark_display_aggregates_only=yes | grep mean | sort
2022-01-25T17:36:40+01:00
Running interleaved_bloom_filter_benchmark
Run on (16 X 3892.69 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 0.24, 0.25, 0.22
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::compressed>>/64/16384/2/1000_mean          16028 ns        16028 ns          100 hashes/sec=62.3961M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::compressed>>/64/512/2/1000_mean            16024 ns        16024 ns          100 hashes/sec=62.413M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::compressed>>/8192/128/2/1000_mean        1257416 ns      1257420 ns          100 hashes/sec=795.442k/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::compressed>>/8192/4/2/1000_mean          1258726 ns      1258733 ns          100 hashes/sec=794.643k/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::uncompressed>>/64/16384/2/1000_mean         7330 ns         7330 ns          100 hashes/sec=136.466M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::uncompressed>>/64/512/2/1000_mean           6815 ns         6815 ns          100 hashes/sec=146.759M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::uncompressed>>/8192/128/2/1000_mean       324572 ns       324574 ns          100 hashes/sec=3.08282M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::uncompressed>>/8192/4/2/1000_mean         321442 ns       321444 ns          100 hashes/sec=3.11216M/s
| uncompressed count | master | PR |
|---|---|---|
| 64/16384/2/1000_mean | 132.262 M/s | 136.466 M/s |
| 64/512/2/1000_mean | 141.822 M/s | 146.759 M/s |
| 8192/128/2/1000_mean | 3.39248 M/s | 3.08282 M/s |
| 8192/4/2/1000_mean | 3.41471 M/s | 3.11216 M/s |

| compressed count | master | PR |
|---|---|---|
| 64/16384/2/1000_mean | 62.1336 M/s | 62.3961 M/s |
| 64/512/2/1000_mean | 62.0838 M/s | 62.4130 M/s |
| 8192/128/2/1000_mean | 826.976 k/s | 795.442 k/s |
| 8192/4/2/1000_mean | 827.572 k/s | 794.643 k/s |

// Each iteration can handle 64 bits, so we need to iterate `(rhs.size() + 63) >> 6` many times
for (size_t batch = 0, bin = 0; batch < ((rhs.size() + 63) >> 6); bin = 64 * ++batch)
// Jump to the next 1 and return the number of places jumped in the bit_sequence
auto jump_to_next_1bit = [](size_t & x)
Member


Suggested change
auto jump_to_next_1bit = [](size_t & x)
auto jump_to_next_1bit = [] (size_t & x)
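As an aside, the `(rhs.size() + 63) >> 6` bound in the loop above is the standard bit-trick for ceil(size / 64), since 2^6 == 64. A small self-contained check (the helper name `batches_needed` is illustrative, not part of the PR):

```cpp
#include <cassert>
#include <cstddef>

// Number of 64-bit batches needed to cover `size` bits,
// i.e. ceil(size / 64) computed as (size + 63) >> 6.
constexpr std::size_t batches_needed(std::size_t size)
{
    return (size + 63u) >> 6;
}

static_assert(batches_needed(0) == 0);
static_assert(batches_needed(1) == 1);
static_assert(batches_needed(64) == 1);
static_assert(batches_needed(65) == 2);
static_assert(batches_needed(16384) == 256);
```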

@marehr
Member Author

marehr commented Jan 25, 2022

> Even though there are far fewer (and easier to understand) LOC, as well as fewer instructions in the assembly, the benchmark isn't always faster. However, the benchmark parameters are quite small.

Honestly, I wasn't expecting this. Thank you! I think you misunderstood me: the old code wasn't any more suboptimal than the new one, it was just harder to read. As you can see at https://godbolt.org/z/o64srTo38, the instructions are pretty much the same (the compiler seems to optimise the additional if conditional away completely). What I meant by double the instructions was the result of converting this nested double loop into an iterator approach. So all the differences should just be measurement variance.

@eseiler
Member

eseiler commented Jan 26, 2022

> Even though there are far fewer (and easier to understand) LOC, as well as fewer instructions in the assembly, the benchmark isn't always faster. However, the benchmark parameters are quite small.

> Honestly, I wasn't expecting this. Thank you! I think you misunderstood me: the old code wasn't any more suboptimal than the new one, it was just harder to read. As you can see at https://godbolt.org/z/o64srTo38, the instructions are pretty much the same (the compiler seems to optimise the additional if conditional away completely). What I meant by double the instructions was the result of converting this nested double loop into an iterator approach. So all the differences should just be measurement variance.

But the iterator stuff is a follow-up?
For this PR, I just plugged in some std::vectors
https://godbolt.org/z/ebG5EE4bK

@marehr
Member Author

marehr commented Jan 26, 2022

> But the iterator stuff is a follow-up?

No, this was the reason for me to drop the iterator idea completely and switch to a callback approach, should we want to expose a way to enumerate the bins in the binning_bitvector.

@eseiler
Member

eseiler commented Jan 26, 2022

Ah alright, so
