[MISC] simplify enumerating all bins in a binning_bitvector #2930

marehr · 2022-01-24T00:04:31Z

This PR simplifies how the bins of a binning_bitvector are enumerated.

It removes the special case if the bit_sequence is 1 << 63 (most significant bit set) by realizing that the before mentioned UB (undefined behaviour) only happens when shifting by 1 << 64, any smaller shift value is fine.

This isn't the case here, as std::countr_zero will only return 64 for 0, but this case is checked by the for loop. So we know that the number of zeros is at most <= 63. The second insight is to do a second tmp >>= 1 instead of incrementing step++ before the actual shift, this will ensure that the shifted by value will be at most <= 63 and thus eliminating the previous edge case.

bin += step++;
tmp >>= step;

// is the same as
bin += step;
step++;
tmp >>= step;

// is the same as
bin += step;
tmp >>= step; // step is <= 63
tmp >>= 1;

Idea moved to seqan/product_backlog#412

vercel · 2022-01-24T00:04:36Z

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/seqan/seqan3/4zhzTzrv7FZKp44KQDnEHN1PLwcN
✅ Preview: https://seqan3-git-fork-marehr-ibfbins-seqan.vercel.app

codecov · 2022-01-24T00:10:49Z

Codecov Report

Merging #2930 (dcc4f95) into master (58c7109) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #2930   +/-   ##
=======================================
  Coverage   98.28%   98.28%           
=======================================
  Files         267      267           
  Lines       11462    11466    +4     
=======================================
+ Hits        11265    11269    +4     
  Misses        197      197

Impacted Files	Coverage Δ
...n3/search/dream_index/interleaved_bloom_filter.hpp	`100.00% <100.00%> (ø)`
include/seqan3/search/views/minimiser.hpp	`100.00% <0.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 58c7109...dcc4f95. Read the comment docs.

Irallia

LGFM, thank you!

eseiler

Very nice!
Just one 💅

Please merge when satisfied

Even though there are much fewer and easier to understand LOC as well as fewer instructions in the assembly, the benchmark isn't always faster. However, the benchmark parameters are quite small.

Click for details

Master

$ interleaved_bloom_filter_benchmark --benchmark_enable_random_interleaving=true --benchmark_repetitions=100 --benchmark_min_time=0.1 --benchmark_filter=bulk_count_benchmark --benchmark_display_aggregates_only=yes | grep --color=auto mean | sort
2022-01-25T17:40:17+01:00
Running interleaved_bloom_filter_benchmark
Run on (16 X 3892.69 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 0.17, 0.35, 0.28
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::compressed>>/64/16384/2/1000_mean          16095 ns        16095 ns          100 hashes/sec=62.1336M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::compressed>>/64/512/2/1000_mean            16109 ns        16109 ns          100 hashes/sec=62.0838M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::compressed>>/8192/128/2/1000_mean        1209346 ns      1209351 ns          100 hashes/sec=826.976k/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::compressed>>/8192/4/2/1000_mean          1208471 ns      1208476 ns          100 hashes/sec=827.572k/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::uncompressed>>/64/16384/2/1000_mean         7562 ns         7562 ns          100 hashes/sec=132.262M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::uncompressed>>/64/512/2/1000_mean           7052 ns         7052 ns          100 hashes/sec=141.822M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::uncompressed>>/8192/128/2/1000_mean       294799 ns       294800 ns          100 hashes/sec=3.39248M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::uncompressed>>/8192/4/2/1000_mean         292890 ns       292891 ns          100 hashes/sec=3.41471M/s

PR

$ interleaved_bloom_filter_benchmark --benchmark_enable_random_interleaving=true --benchmark_repetitions=100 --benchmark_min_time=0.1 --benchmark_filter=bulk_count_benchmark --benchmark_display_aggregates_only=yes | grep mean | sort
2022-01-25T17:36:40+01:00
Running interleaved_bloom_filter_benchmark
Run on (16 X 3892.69 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 0.24, 0.25, 0.22
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::compressed>>/64/16384/2/1000_mean          16028 ns        16028 ns          100 hashes/sec=62.3961M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::compressed>>/64/512/2/1000_mean            16024 ns        16024 ns          100 hashes/sec=62.413M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::compressed>>/8192/128/2/1000_mean        1257416 ns      1257420 ns          100 hashes/sec=795.442k/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::compressed>>/8192/4/2/1000_mean          1258726 ns      1258733 ns          100 hashes/sec=794.643k/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::uncompressed>>/64/16384/2/1000_mean         7330 ns         7330 ns          100 hashes/sec=136.466M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::uncompressed>>/64/512/2/1000_mean           6815 ns         6815 ns          100 hashes/sec=146.759M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::uncompressed>>/8192/128/2/1000_mean       324572 ns       324574 ns          100 hashes/sec=3.08282M/s
bulk_count_benchmark<seqan3::interleaved_bloom_filter<seqan3::data_layout::uncompressed>>/8192/4/2/1000_mean         321442 ns       321444 ns          100 hashes/sec=3.11216M/s

uncompressed count	master	PR
64/16384/2/1000_mean	132.262 M/s	136.466 M/s
64/512/2/1000_mean	141.822 M/s	146.759 M/s
8192/128/2/1000_mean	3.39248 M/s	3.08282 M/s
8192/4/2/1000_mean	3.41471 M/s	3.11216 M/s

compressed count	master	PR
64/16384/2/1000_mean	62.1336 M/s	62.3961 M/s
64/512/2/1000_mean	62.0838 M/s	62.4130 M/s
8192/128/2/1000_mean	826.976 k/s	795.442 k/s
8192/4/2/1000_mean	827.572 k/s	794.643 k/s

eseiler · 2022-01-24T10:39:48Z

include/seqan3/search/dream_index/interleaved_bloom_filter.hpp

-        // Each iteration can handle 64 bits, so we need to iterate `((rhs.size() + 63) >> 6` many times
-        for (size_t batch = 0, bin = 0; batch < ((rhs.size() + 63) >> 6); bin = 64 * ++batch)
+        // Jump to the next 1 and return the number of places jumped in the bit_sequence
+        auto jump_to_next_1bit = [](size_t & x)


Suggested change

auto jump_to_next_1bit = [](size_t & x)

auto jump_to_next_1bit = [] (size_t & x)

…ry friend declaration

marehr · 2022-01-25T22:27:28Z

Even though there are much fewer and easier to understand LOC as well as fewer instructions in the assembly, the benchmark isn't always faster. However, the benchmark parameters are quite small.

Honestly, wasn't expecting this. Thank you! I think you misunderstood me, the old code wasn't any more suboptimal than the current one, it was just harder to read. As you can see here, the https://godbolt.org/z/o64srTo38 instructions are pretty much the same (the compiler seems to optimise the additional if conditional completely out). What I meant with double the instructions was when converting this double, nested loop into an iterator approach. So all the differences should just be measurement variances.

eseiler · 2022-01-26T08:36:26Z

Even though there are much fewer and easier to understand LOC as well as fewer instructions in the assembly, the benchmark isn't always faster. However, the benchmark parameters are quite small.

Honestly, wasn't expecting this. Thank you! I think you misunderstood me, the old code wasn't any more suboptimal than the current one, it was just harder to read. As you can see here, the https://godbolt.org/z/o64srTo38 instructions are pretty much the same (the compiler seems to optimise the additional if conditional completely out). What I meant with double the instructions was when converting this double, nested loop into an iterator approach. So all the differences should just be measurement variances.

But the iterator-stuff is follow-up?
For this PR, I just plugged in some std::vectors
https://godbolt.org/z/ebG5EE4bK

marehr · 2022-01-26T13:30:25Z

But the iterator-stuff is follow-up?

No, this was the reason for me to drop the iterator idea completely and make it a callback approach, if we want to expose a way to enumerate the bins in the binning_vector.

eseiler · 2022-01-26T14:11:42Z

Ah alright, so

iterators are a bad way to access/enumerate the binning_vector (https://godbolt.org/z/o64srTo38)
the changes introduced in this PR make the code more readable
the changes introduced in this PR seem to be more efficient/have fewer instructions (https://godbolt.org/z/ebG5EE4bK)

marehr requested review from a team and Irallia and removed request for a team January 24, 2022 00:04

vercel bot deployed to Preview January 24, 2022 00:04 View deployment

vercel bot deployed to Preview January 24, 2022 00:16 View deployment

eseiler mentioned this pull request Jan 24, 2022

Enumerating binning_bitvector bins seqan/product_backlog#412

Open

Irallia approved these changes Jan 24, 2022

View reviewed changes

Irallia requested review from a team and eseiler and removed request for a team January 24, 2022 10:00

eseiler approved these changes Jan 25, 2022

View reviewed changes

marehr added 2 commits January 25, 2022 23:17

[MISC] simplify enumerating all bins in a binning_bitvector

9d15c31

[MISC] use public API in seqan3::counting_vector and remove unnecessa…

dcc4f95

…ry friend declaration

marehr force-pushed the ibf_bins branch from f4ae6e0 to dcc4f95 Compare January 25, 2022 22:32

vercel bot deployed to Preview January 25, 2022 22:32 View deployment

marehr enabled auto-merge January 25, 2022 22:33

marehr merged commit 25606c4 into seqan:master Jan 25, 2022

marehr deleted the ibf_bins branch January 25, 2022 22:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MISC] simplify enumerating all bins in a binning_bitvector #2930

[MISC] simplify enumerating all bins in a binning_bitvector #2930

marehr commented Jan 24, 2022 •

edited by eseiler

Loading

vercel bot commented Jan 24, 2022 •

edited

Loading

codecov bot commented Jan 24, 2022 •

edited

Loading

Irallia left a comment

eseiler left a comment

eseiler Jan 24, 2022

marehr commented Jan 25, 2022 •

edited

Loading

eseiler commented Jan 26, 2022

marehr commented Jan 26, 2022

eseiler commented Jan 26, 2022 •

edited

Loading

	auto jump_to_next_1bit = [](size_t & x)
	auto jump_to_next_1bit = [] (size_t & x)

[MISC] simplify enumerating all bins in a binning_bitvector #2930

[MISC] simplify enumerating all bins in a binning_bitvector #2930

Conversation

marehr commented Jan 24, 2022 • edited by eseiler Loading

vercel bot commented Jan 24, 2022 • edited Loading

codecov bot commented Jan 24, 2022 • edited Loading

Codecov Report

Irallia left a comment

Choose a reason for hiding this comment

eseiler left a comment

Choose a reason for hiding this comment

Master

PR

eseiler Jan 24, 2022

Choose a reason for hiding this comment

marehr commented Jan 25, 2022 • edited Loading

eseiler commented Jan 26, 2022

marehr commented Jan 26, 2022

eseiler commented Jan 26, 2022 • edited Loading

marehr commented Jan 24, 2022 •

edited by eseiler

Loading

vercel bot commented Jan 24, 2022 •

edited

Loading

codecov bot commented Jan 24, 2022 •

edited

Loading

marehr commented Jan 25, 2022 •

edited

Loading

eseiler commented Jan 26, 2022 •

edited

Loading