-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TEST] Index Benchmark #960
Conversation
Codecov Report
@@ Coverage Diff @@
## master #960 +/- ##
=======================================
Coverage 97.63% 97.63%
=======================================
Files 235 235
Lines 8904 8904
=======================================
Hits 8693 8693
Misses 211 211
Continue to review full report at Codecov.
|
Let me know if I should change anything, I want to get these off of my backlog I'm not sure if there's an error in my code or if these results reflect reality but it seems strange that seqan3 has so much trouble only for small texts but not longer ones
|
@Clemapfel thank you for this work. Please let us know if you can finish this PR, otherwise we will take it over. |
b748b17
to
446fb14
Compare
Size = 50
Size = 50'000
Size = 500'000
SeqAn3 seems to be asymptocially faster than SeqAn2 😁 |
446fb14
to
f55d8ae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some questions and ideas for improvement...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thank you! 👍
b->Args({500, 1'000}); | ||
} | ||
|
||
struct fm_index_seqan2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use an enum?
class enum index_tag
{
fm_index_seqan2,
fm_index_seqan3,
bi_fm_index_seqan2,
bi_fm_index_seqan3
}
|
||
inner_rng_t make_single_seqan3(size_t seed = 0) | ||
{ | ||
// The characters with rank 254 and 255 are reserved for the indices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually test a text with an alphabet of such a large size? Does our index (the SDSL) even work?
I cannot see a respective test in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::string
? 😁
|
||
#if SEQAN3_HAS_SEQAN2 | ||
// Map the SeqAn3 alphabet to its SeqAn2 equivalent. | ||
using seqan2_alphabet_t = std::conditional_t<std::same_as<alphabet_t, seqan3::dna4>, seqan::Dna, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need to do this here?
Can't you write directly into the benchmark template:
BENCHMARK_TEMPLATE(index_benchmark, bi_fm_index_seqan3, std::vector<seqan3::dna4>)->Apply(arguments);
#if SEQAN3_HAS_SEQAN2
- BENCHMARK_TEMPLATE(index_benchmark, bi_fm_index_seqan2, std::vector<seqan3::dna4>)->Apply(arguments);
+ BENCHMARK_TEMPLATE(index_benchmark, bi_fm_index_seqan2, seqan::String<seqan::Dna>)->Apply(arguments);
#endif // SEQAN3_HAS_SEQAN2
This also seems error-prone if change the alphabet to dna5
for testing, this function will fall back to char
for seqan2 wihtout any error
using fm_index_seqan2_t = seqan::Index<seqan2_rng_t, seqan::FMIndex<void, index_cfg>>; | ||
using bi_fm_index_seqan2_t = seqan::Index<seqan2_rng_t, seqan::BidirectionalIndex<seqan::FMIndex<void, index_cfg>>>; | ||
|
||
seqan2_rng_t make_sequence_seqan2() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should introduce a seqan3::test::generate_seqan2_sequence
into the global test include header since this is something we will need more often.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seqan3/test/include/seqan3/test/performance/sequence_generator.hpp
Lines 77 to 92 in 1c2b720
auto generate_sequence_seqan2(size_t const len = 500, | |
size_t const variance = 0, | |
size_t const seed = 0) | |
{ | |
std::mt19937 gen(seed); | |
std::uniform_int_distribution<uint8_t> dis_alpha(0, seqan::ValueSize<alphabet_t>::VALUE - 1); | |
std::uniform_int_distribution<size_t> dis_length(len - variance, len + variance); | |
seqan::String<alphabet_t> sequence; | |
size_t length = dis_length(gen); | |
for (size_t l = 0; l < length; ++l) | |
appendValue(sequence, alphabet_t{dis_alpha(gen)}); | |
return sequence; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe templatize over the range_type?
#if SEQAN3_HAS_SEQAN2 | ||
if constexpr (std::same_as<index_tag_t, fm_index_seqan2> || std::same_as<index_tag_t, bi_fm_index_seqan2>) | ||
{ | ||
typename sequence_generator<index_tag_t, rng_t>::index_t index{generator.seqan2}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you store this in a member?
typename sequence_generator<index_tag_t, rng_t>::index_t index{generator.seqan2}; | |
typename sequence_generator<index_tag_t, rng_t>::index_t index{generator.make_seqan2_sequence()}; |
} | ||
} | ||
|
||
BENCHMARK_TEMPLATE(index_benchmark, fm_index_seqan3, std::vector<seqan3::dna4>)->Apply(arguments); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are generating the same sequences for every new benchmark right? This is quite inefficient for these many benchmarks :/ make we could store them somehow?
Sorry that this will require some more refactoring but we don't want the benchmark to take too much time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sequence generation isn't actually taking long, but I can refactor the tests.
The whole point of the class was to use the same sequences for seqan2 and seqan3, but I can just generate them individually and store them.
In this case, I would also block this PR by #920 such that I have the numeric sequence generator for generating char-sequences for seqan3 (numeric-sequence -> to_char?), because I think this is easier than generating a char sequence and then filtering out the ranks that are too high.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could just store a long enough sequence, and then do | seqan3::views::take(length)
on it. the view should produce overhead, but then again I'm not entirely sure ... but it would be a quite elegant solution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, I would also block this PR by #920 such that I have the numeric sequence generator for generating char-sequences for seqan3 (numeric-sequence -> to_char?), because I think this is easier than generating a char sequence and then filtering out the ranks that are too high.
👍
I could just store a long enough sequence, and then do | seqan3::views::take(length) on it. the view should produce overhead, but then again I'm not entirely sure ... but it would be a quite elegant solution?
Indeed this would be elegant :) And I think it would be comparable if you use
seqan3::views::take(length) | seqan3::views::to<std::vector>
? This, of course, involves copying but this should not be a major performance bottleneck.. Or you could use vies::take for seqan3 and Infix<>
for seqan2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, something like this. I'll try it out and then we can talk about it on monday 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💅
#include <seqan3/search/fm_index/all.hpp> | ||
#include <seqan3/test/performance/sequence_generator.hpp> | ||
#include <seqan3/test/seqan2.hpp> | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove newline
| seqan3::views::to<std::string>}; | ||
}; | ||
|
||
sequence_store_seqan3 store3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sequence_store_seqan3 store3; | |
sequence_store_seqan3 store{}; |
I like the design :)
if constexpr (index_tag == tag::fm_index) | ||
{ | ||
seqan::Index<rng_t, seqan::FMIndex<void, index_cfg>> index{sequence}; | ||
seqan::indexCreate(index, seqan::FibreSALF()); | ||
} | ||
else | ||
{ | ||
seqan::Index<rng_t, seqan::BidirectionalIndex<seqan::FMIndex<void, index_cfg>>> index{sequence}; | ||
seqan::indexCreate(index, seqan::FibreSALF()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using index_type = std::conditional_t<index_tag == tag::fm_index, seqan::FMIndex<void, index_cfg>, seqan::BidirectionalIndex<seqan::FMIndex<void, index_cfg>>>;
seqan::Index<rng_t, index_type> index{sequence};
seqan::indexCreate(index, seqan::FibreSALF());
instead of the if else clause
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and clean commit history
if constexpr (dimension == 1) | ||
sequence = std::move(inner_sequence); | ||
else | ||
for (int32_t i = 0; i < state.range(1); ++i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
multi line else block with braces
81cc4f0
to
a1866a7
Compare
benchmark construction of fm- and bi-fm-index and compare with seqan2
resolves #1566