Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOC, FEATURE] Improve dna4 complement performance and add documentation. #3026

Merged
merged 4 commits into from
Oct 21, 2022

Conversation

smehringer
Copy link
Member

@smehringer smehringer commented Jun 21, 2022

(Aims to) Resolves #1970

@vercel
Copy link

vercel bot commented Jun 21, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated
seqan3 ✅ Ready (Inspect) Visit Preview Oct 21, 2022 at 8:18AM (UTC)

@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Jun 21, 2022
@codecov
Copy link

codecov bot commented Jun 21, 2022

Codecov Report

Base: 98.24% // Head: 98.25% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (68c7bd4) compared to base (e723338).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3026   +/-   ##
=======================================
  Coverage   98.24%   98.25%           
=======================================
  Files         275      276    +1     
  Lines       12318    12344   +26     
=======================================
+ Hits        12102    12128   +26     
  Misses        216      216           
Impacted Files Coverage Δ
include/seqan3/alphabet/nucleotide/dna4.hpp 100.00% <100.00%> (ø)
test/include/seqan3/test/performance/simd_dna4.hpp 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Jun 21, 2022
@eseiler
Copy link
Member

eseiler commented Jun 21, 2022

I played around a bit:

https://quick-bench.com/q/RtiW9XqNDG_O9BzGo8gUzKAZQNo

constexpr uint8_t rank(char c) noexcept
{
    uint8_t rank = static_cast<uint8_t>(c);
    rank |= (rank & 0b0001'0000) >> 1;
    rank += 1u;
    rank >>= 2;
    rank &= 0b0000'0011;
    return rank;
}
A = 0100'0001 
C = 0100'0011
G = 0100'0011
T = 0101'1011
a = 0110'0001
c = 0110'0011
g = 0110'0111
t = 0111'0100
rank |= (rank & 0b0001'0000) >> 1; // Set 4th bit to one if 5th bit is set (t and T)
rank += 1u; // Luckily flips the bits we need to flip
rank >>= 2; // Drop lowest 2 bits
rank &= 0b0000'0011; // Mask other bits
return rank;

Works for both lower and upper case chars.
Microbenchmark looks promising.

I still need to test it with this PR and auto-vectorization.

@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Jun 21, 2022
@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Jun 21, 2022
@eseiler
Copy link
Member

eseiler commented Jun 21, 2022

assign_char<seqan3::dna4>                             80.6 ns         7183289
assign_char<seqan3::rna4>                             79.1 ns         8674274
assign_char<seqan3::simdifyable_dna4>                  184 ns         3756637

to_char<seqan3::dna4>                                 66.0 ns         10194749
to_char<seqan3::rna4>                                 65.8 ns         10030062
to_char<seqan3::simdifyable_dna4>                      100 ns          6781066

====

complement<tag::revcomp_dna4_inline>                 24971 ns         26504
complement<tag::seqan3_dna4>                        303294 ns          2297
complement<tag::seqan3_dna4_vector>                 450026 ns          1557
complement<tag::seqan3_dna4_simdifyable>           2025158 ns           347
complement<tag::seqan3_dna4_simdifyable_vector>      31200 ns         22344

^^^ current assign_char
vvv other assign_char

complement<tag::revcomp_dna4_inline>                 24949 ns         27054
complement<tag::seqan3_dna4>                        302653 ns          2299
complement<tag::seqan3_dna4_vector>                 466759 ns          1490
complement<tag::seqan3_dna4_simdifyable>            710485 ns           961
complement<tag::seqan3_dna4_simdifyable_vector>      38617 ns         18205

Still needs a few things to be fully dna4-compatible.

As a side note, if we were to do the complement in assign_char, we would get a bit closer:

    constexpr simdifyable_dna4 & assign_char(char_type const c) noexcept
    {
        char_type const upper_char = c & 0b0101'1111; // to_upper
        rank_type rank = (upper_char == 'A') * 3 + (upper_char == 'C') * 2 + (upper_char == 'G');
        return assign_rank(rank);
    }
complement<tag::revcomp_dna4_inline>                 25064 ns        27042
complement<tag::seqan3_dna4>                        303093 ns         2147
complement<tag::seqan3_dna4_vector>                 451256 ns         1552
complement<tag::seqan3_dna4_simdifyable>           2029439 ns          340
complement<tag::seqan3_dna4_simdifyable_vector>      28431 ns        24158

@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Jun 21, 2022
@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Oct 18, 2022
@eseiler eseiler marked this pull request as ready for review October 19, 2022 13:16
@seqan-actions seqan-actions added the lint [INTERNAL] signal for linting label Oct 19, 2022
@seqan-actions seqan-actions removed the lint [INTERNAL] signal for linting label Oct 19, 2022
@seqan-actions seqan-actions added lint [INTERNAL] signal for linting and removed lint [INTERNAL] signal for linting labels Oct 19, 2022
Copy link
Member

@eseiler eseiler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-organized the files such that we have the simd_dna4 only in one place

Comment on lines 36 to 37
allocator(allocator &&) = default;
allocator & operator=(allocator &&) = default;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you delete these?
Their usage will lead to double free of forward and reverse

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? I get it for the copy ctor/assignment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the default implementation of the move operator looks something like this:

allocator & operator=(allocator && other) {
    forward = std::move(other.forward);
    reverse = std::move(other.reverse);
}

If I have to allocator a1 and a2 and now move one into the other a1 = std::move(a2)
the original memory of a1.forward and a1.reverse is lost and not freed.
But when a1 and a2 go out of scope, they point to the same memory and there will be a double free on the same memory.

smehringer and others added 4 commits October 21, 2022 10:14
Co-authored-by: Enrico Seiler <enrico.seiler@hotmail.de>
Co-authored-by: Enrico Seiler <enrico.seiler@hotmail.de>
Co-authored-by: Enrico Seiler <enrico.seiler@hotmail.de>
Co-authored-by: Enrico Seiler <enrico.seiler@hotmail.de>
@seqan-actions seqan-actions added the lint [INTERNAL] signal for linting label Oct 21, 2022
@eseiler eseiler requested a review from SGSSGene October 21, 2022 08:15
@seqan-actions seqan-actions removed the lint [INTERNAL] signal for linting label Oct 21, 2022
@eseiler eseiler merged commit 70734be into seqan:master Oct 21, 2022
@smehringer smehringer deleted the dna4_perf_bug branch November 27, 2023 07:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Alphabet] dna4 complement and SIMD
4 participants