[FEATURE] Make the final chunk only as long as the longest sequence in the collection. #1378
Codecov Report
@@ Coverage Diff @@
## master #1378 +/- ##
==========================================
+ Coverage 97.54% 97.56% +0.02%
==========================================
Files 231 231
Lines 8747 8755 +8
==========================================
+ Hits 8532 8542 +10
+ Misses 215 213 -2
First round of review; thank you for the PR. I think we should sit down together next week and look at this in person.
- uint8_t pos = j * chunk_size + i; // matrix entry to fill
- if (cached_sentinel[i] - cached_iter[i] >= max_size) // not in final block
+ uint8_t pos = chunk_pos * chunk_size + sequence_pos; // matrix entry to fill
+ size_t current_chunk_size = cached_sentinel[sequence_pos] - cached_iter[sequence_pos];
Isn't this remaining_elements, and can't it be bigger than chunk_size?
- if (cached_sentinel[i] - cached_iter[i] >= max_size) // not in final block
+ uint8_t pos = chunk_pos * chunk_size + sequence_pos; // matrix entry to fill
+ size_t current_chunk_size = cached_sentinel[sequence_pos] - cached_iter[sequence_pos];
+ max_size_of_last_chunk = std::max(max_size_of_last_chunk, current_chunk_size);
Can't this be computed ahead of time? remaining_elements only decreases, so max_size_of_last_chunk is effectively set the first time this inner loop is executed. In all further iterations of the outer loop the result no longer changes, and we waste computation.
Actually, it feels like the entire loop could be improved.
But that was not my primary goal here, so I didn't spend much time on making it really tight; I just wanted to get it working for now. If benchmarking shows a measurable gain from optimizing this, it can be done later. But if you see immediate improvements, we can discuss them and improve it.
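For reference, the reviewer's suggestion to compute the maximum ahead of time could look roughly like the sketch below. The helper name max_remaining_elements and its signature are hypothetical and not taken from the PR; plain vectors of iterators stand in for cached_iter and cached_sentinel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not part of the PR): computes the largest number of
// remaining elements over all sequences once, before the chunk-filling loops,
// instead of updating a running maximum inside the inner loop.
template <typename iterator_t>
std::size_t max_remaining_elements(std::vector<iterator_t> const & iterators,
                                   std::vector<iterator_t> const & sentinels)
{
    assert(iterators.size() == sentinels.size()); // one sentinel per iterator

    std::size_t max_distance = 0;
    for (std::size_t i = 0; i < iterators.size(); ++i)
        max_distance = std::max<std::size_t>(std::distance(iterators[i], sentinels[i]), max_distance);

    return max_distance;
}
```

Called once per chunk before the iterators are advanced, this keeps the inner loop free of the max bookkeeping; whether that gives a measurable gain would still need benchmarking.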
if (final_chunk)
{ // Store the size of the last chunk.
    size_t size_of_last_chunk = max_size_of_last_chunk % chunk_size;
max_size_of_last_chunk is misleading, because it can be bigger than chunk_size. Or are you referring to a different chunk? I think max_remaining_elements would be better.
@marehr polite ping
@@ -357,25 +350,25 @@ class view_to_simd<urng_t, simd_t>::iterator_type
  if constexpr (chunk_size == simd_traits<max_simd_type>::length / 2) // upcast into 2 vectors.
  {
  return std::array{simd::upcast<simd_t>(extract_halve<0>(row)), // 1. halve
- simd::upcast<simd_t>(extract_halve<1>(row))}; // 2. halve
+ simd::upcast<simd_t>(extract_halve<1>(row))}; // 2. halve
- simd::upcast<simd_t>(extract_halve<1>(row))}; // 2. halve
+ simd::upcast<simd_t>(extract_halve<1>(row))}; // 2. half
"halve" is the verb ;)
size_t max_distance = 0;
for (auto && [it, sent] : views::zip(iterators_before_update, cached_sentinel))
max_distance = std::max<size_t>(std::ranges::distance(it, sent), max_distance);

remove extra newline 💅
Before this change, the last chunk always had a size of simd_traits<simd_t>::length. This caused issues when handling the last chunk: if the size of the longest sequence was not a multiple of the simd size, the trailing elements of the transformed sequence no longer carried any information. The new code instead tracks the end of the last column in the final chunk and returns a span that covers only the sequences.
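To make the arithmetic concrete, here is a minimal sketch of how the length of the final chunk follows from the longest sequence. The function final_chunk_size is hypothetical and not the PR's implementation; in particular, treating a remainder of zero as a full chunk is an assumption that the discussion above does not confirm.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration (not the PR's code): number of meaningful columns
// in the final chunk, given the length of the longest sequence and the chunk
// size (e.g. simd_traits<simd_t>::length).
inline std::size_t final_chunk_size(std::size_t longest_sequence, std::size_t chunk_size)
{
    assert(chunk_size > 0); // a chunk must hold at least one column

    if (longest_sequence == 0) // empty collection: nothing to cover
        return 0;

    std::size_t const remainder = longest_sequence % chunk_size;

    // Assumption: if the longest sequence fills the last chunk exactly,
    // the final chunk spans the full chunk_size rather than zero columns.
    return remainder == 0 ? chunk_size : remainder;
}
```

For example, with a longest sequence of 10 elements and a chunk size of 8, the final chunk would cover 2 columns instead of 8, so the 6 padding columns are not exposed to the caller.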