Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Make the final chunk only as long as the longest sequence in the collection. #1378

Merged
merged 2 commits into from
Dec 15, 2019

Conversation

rrahn
Copy link
Contributor

@rrahn rrahn commented Nov 23, 2019

Before this change the last chunk always had a size of simd_traits<simd_t>::length, which caused some issues when dealing with the last chunk, since the last elements in the transformed sequence might not have carried any information anymore if the size of the longest sequence was not a multiple of the simd size. Instead, the new code tracks the end of the last column in the final chunk and returns a span that only covers the sequences.

@rrahn rrahn requested a review from marehr November 23, 2019 15:59
@rrahn rrahn force-pushed the feature/simd_view_stops_at_longest_sequence branch from fddce63 to e93dcfc Compare November 23, 2019 19:36
@rrahn rrahn force-pushed the feature/simd_view_stops_at_longest_sequence branch from e93dcfc to 483c96d Compare November 23, 2019 20:07
@codecov
Copy link

codecov bot commented Nov 23, 2019

Codecov Report

Merging #1378 into master will increase coverage by 0.02%.
The diff coverage is 87.17%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1378      +/-   ##
==========================================
+ Coverage   97.54%   97.56%   +0.02%     
==========================================
  Files         231      231              
  Lines        8747     8755       +8     
==========================================
+ Hits         8532     8542      +10     
+ Misses        215      213       -2
Impacted Files Coverage Δ
include/seqan3/core/simd/view_to_simd.hpp 92.92% <87.17%> (+2.44%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e4fa221...3c06b17. Read the comment docs.

@marehr
Copy link
Member

marehr commented Nov 29, 2019

@rrahn I looked at the code cov and there are some untested places. Can you verify if it is intended that those places are not tested yet?

Copy link
Member

@marehr marehr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round of review, thank you for the pr. I think we should sit together next week and should look at that together in person.

include/seqan3/core/simd/view_to_simd.hpp Show resolved Hide resolved
uint8_t pos = j * chunk_size + i; // matrix entry to fill
if (cached_sentinel[i] - cached_iter[i] >= max_size) // not in final block
uint8_t pos = chunk_pos * chunk_size + sequence_pos; // matrix entry to fill
size_t current_chunk_size = cached_sentinel[sequence_pos] - cached_iter[sequence_pos];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this remaining_elements and it can be bigger than chunk_size?

if (cached_sentinel[i] - cached_iter[i] >= max_size) // not in final block
uint8_t pos = chunk_pos * chunk_size + sequence_pos; // matrix entry to fill
size_t current_chunk_size = cached_sentinel[sequence_pos] - cached_iter[sequence_pos];
max_size_of_last_chunk = std::max(max_size_of_last_chunk, current_chunk_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this computed ahead of time? The remaining_elements will only decrease so only the first time this inner loop is executed, max_size_of_last_chunk will be set. All further execution of the outer loop the result will not change anymore and we waste computation.

if (cached_sentinel[i] - cached_iter[i] >= max_size) // not in final block
uint8_t pos = chunk_pos * chunk_size + sequence_pos; // matrix entry to fill
size_t current_chunk_size = cached_sentinel[sequence_pos] - cached_iter[sequence_pos];
max_size_of_last_chunk = std::max(max_size_of_last_chunk, current_chunk_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this computed a head of time? The remaining_elements will only decrease so only the first time this inner loop is executed, max_size_of_last_chunk will be set. All other times we waste computation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, it feels that the entire loop could be improved.
But that was not my primary goal here. So I didn't spent much time on making it real tight. Just wanted to get it working for now. When benchmarking suggests that there would be a measurable gain in doing this more optimal one can do it at a later point. But if you see immediate improvements we can discuss this and improve it.

include/seqan3/core/simd/view_to_simd.hpp Outdated Show resolved Hide resolved

if (final_chunk)
{ // Store the size of the last chunk.
size_t size_of_last_chunk = max_size_of_last_chunk % chunk_size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_size_of_last_chunk is misleading, because it can be bigger than chunk_size. Or are you referring to a different chunk? If think max_remaining_elements would be better.

@rrahn rrahn requested a review from marehr December 4, 2019 11:05
include/seqan3/core/simd/view_to_simd.hpp Outdated Show resolved Hide resolved
include/seqan3/core/simd/view_to_simd.hpp Outdated Show resolved Hide resolved
@rrahn rrahn requested a review from marehr December 6, 2019 09:41
@rrahn rrahn requested a review from marehr December 9, 2019 22:53
@rrahn
Copy link
Contributor Author

rrahn commented Dec 12, 2019

@marehr polite ping

@@ -357,25 +350,25 @@ class view_to_simd<urng_t, simd_t>::iterator_type
if constexpr (chunk_size == simd_traits<max_simd_type>::length / 2) // upcast into 2 vectors.
{
return std::array{simd::upcast<simd_t>(extract_halve<0>(row)), // 1. halve
simd::upcast<simd_t>(extract_halve<1>(row))}; // 2. halve
simd::upcast<simd_t>(extract_halve<1>(row))}; // 2. halve
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
simd::upcast<simd_t>(extract_halve<1>(row))}; // 2. halve
simd::upcast<simd_t>(extract_halve<1>(row))}; // 2. half

halve is the verb ;)

size_t max_distance = 0;
for (auto && [it, sent] : views::zip(iterators_before_update, cached_sentinel))
max_distance = std::max<size_t>(std::ranges::distance(it, sent), max_distance);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove extra newline 💅

@rrahn rrahn force-pushed the feature/simd_view_stops_at_longest_sequence branch from 10d0f78 to 3c06b17 Compare December 15, 2019 09:27
@rrahn rrahn merged commit 6f7f3c2 into seqan:master Dec 15, 2019
@rrahn rrahn deleted the feature/simd_view_stops_at_longest_sequence branch January 16, 2020 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants