Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concepts for sequence and alignment #60

Closed
3 of 4 tasks
marehr opened this issue Apr 16, 2020 · 3 comments · Fixed by seqan/seqan3#2359
Closed
3 of 4 tasks

Concepts for sequence and alignment #60

marehr opened this issue Apr 16, 2020 · 3 comments · Fixed by seqan/seqan3#2359
Labels
needs refinement A story that was not discussed and/or estimated by the team yet but is planned for upcoming sprints.
Milestone

Comments

@marehr
Copy link
Member

marehr commented Apr 16, 2020

We currently have a concept for aligned_sequence that does not build upon a sequence.

Tasks from resolution

  • Introduce seqan3::sequence concept (Range sequence concept seqan3#1898)
  • Change seqan3::aligned_sequence concept to not include writable requirements
  • Introduce the seqan3::writable_aligned_sequence
  • Introduce the seqan3::detail::pairwise_alignment concept

Sequence Concept

We talked and decided(?) some time ago that a sequence is a biological sequence and means an range over an alphabet type.

Only caveat is, that std::vector<int> is a sequence too. (we could also exclude builtin types and or just int-types).

We use that definition of a sequence implicitly in the aligned_sequence

template <typename t>
SEQAN3_CONCEPT aligned_sequence =
    std::ranges::forward_range<t> && // here
    alphabet<reference_t<t>> && // here
    weakly_assignable_from<reference_t<t>, gap const &> &&
    requires { typename detail::unaligned_seq_t<t>; } &&
    requires (t v, detail::unaligned_seq_t<t> unaligned)
    {
        { insert_gap(v, std::ranges::begin(v)) } -> std::ranges::iterator_t<t>; // global functions for generic usability
        { insert_gap(v, std::ranges::begin(v), 2) } -> std::ranges::iterator_t<t>;
        { erase_gap(v, std::ranges::begin(v)) } -> std::ranges::iterator_t<t>;
        { erase_gap(v, std::ranges::begin(v), std::ranges::end(v)) } -> std::ranges::iterator_t<t>;
        { assign_unaligned(v, unaligned) } -> void;
    };

https://github.com/seqan/seqan3/blob/70b4e57575ae1f92be2f0f5e70e0ba7d090e492d/include/seqan3/alignment/aligned_sequence/aligned_sequence_concept.hpp#L203-L216

and in the debugstream overload for ranges

template <std::ranges::input_range rng_t, typename char_t> // here
inline debug_stream_type<char_t> & operator<<(debug_stream_type<char_t> & s, rng_t && r)
// ...
    if constexpr (alphabet<reference_t<rng_t>> && // here
                  !detail::is_uint_adaptation_v<remove_cvref_t<reference_t<rng_t>>>)
    {
        for (auto && l : r)
            s << l;
    }

https://github.com/seqan/seqan3/blob/70b4e57575ae1f92be2f0f5e70e0ba7d090e492d/include/seqan3/core/detail/debug_stream_range.hpp#L97-L102

Sequence Concept Proposal

template <typename sequence_t>
concept sequence = std::ranges::input_range<sequence_t> &&
                   alphabet<std::ranges::range_reference_t<sequence_t>>;

This conflicts a bit with a aligned_sequence that is defined like

template <typename t>
SEQAN3_CONCEPT aligned_sequence =
    std::ranges::forward_range<t> &&
    alphabet<reference_t<t>> &&
    weakly_assignable_from<reference_t<t>, gap const &> &&
    requires { typename detail::unaligned_seq_t<t>; } &&
    requires (t v, detail::unaligned_seq_t<t> unaligned)
    {
        { insert_gap(v, std::ranges::begin(v)) } -> std::ranges::iterator_t<t>; // global functions for generic usability
        { insert_gap(v, std::ranges::begin(v), 2) } -> std::ranges::iterator_t<t>;
        { erase_gap(v, std::ranges::begin(v)) } -> std::ranges::iterator_t<t>;
        { erase_gap(v, std::ranges::begin(v), std::ranges::end(v)) } -> std::ranges::iterator_t<t>;
        { assign_unaligned(v, unaligned) } -> void;
    };

https://github.com/seqan/seqan3/blob/70b4e57575ae1f92be2f0f5e70e0ba7d090e492d/include/seqan3/alignment/aligned_sequence/aligned_sequence_concept.hpp#L203-L216

and requires a forward_range (why?).

Aligned Sequence Concept Proposal

I would propose this to change to

template <typename t>
SEQAN3_CONCEPT aligned_sequence =
    sequence<t> &&
    std::ranges::forward_range<t> &&
    //...

Another problem with aligned_sequence is that it requires to be writable. We should split this, like alphabet and writable_alphabet into two concepts.

template <typename t>
SEQAN3_CONCEPT aligned_sequence =
    sequence<t> &&
    // std::ranges::forward_range<t> && // << does not need this anymore, right? because we don't use interfaces with iterators anymore
    // allow static_cast<std::ranges::range_value_t<t>>(gap{}) // or range_reference_t?
    // i.e. a gap should be convertible to the value type of the range. that means that the underlying alphabet can handle and supports the gap type, like seqan3::gapped.
    std::convertible_to<gap, std::ranges::range_value_t<t>>
template <typename t>
SEQAN3_CONCEPT writable_aligned_sequence =
    aligned_sequence<t> &&
    std::ranges::forward_range<t> && // this is needed to use the writable functions
    // allow std::declval<std::ranges::range_reference_t<t>>() = std::declval<gap const &>() 
    // i.e. the value of the range can be assigned with a gap symbol
    weakly_assignable_from<reference_t<t>, gap const &> &&
    requires { typename detail::unaligned_seq_t<t>; } && // this is only needed for writable, right?
    requires (t v, detail::unaligned_seq_t<t> unaligned)
    {
        { insert_gap(v, std::ranges::begin(v)) } -> std::ranges::iterator_t<t>; // global functions for generic usability
        { insert_gap(v, std::ranges::begin(v), 2) } -> std::ranges::iterator_t<t>;
        { erase_gap(v, std::ranges::begin(v)) } -> std::ranges::iterator_t<t>;
        { erase_gap(v, std::ranges::begin(v), std::ranges::end(v)) } -> std::ranges::iterator_t<t>;
        { assign_unaligned(v, unaligned) } -> void;
    };

Hannes also had a propasal

aligned_range.pdf

Alignment Concept

We currently have an implicit definition what an alignment is in the debugstream

    requires !std::ranges::input_range<tuple_t> && // stupid copy'n'paste error that is unnecessary
             !alphabet<tuple_t> && // exclude alphabet_tuple_base  // stupid copy'n'paste error that is unnecessary
             tuple_like<remove_cvref_t<tuple_t>> &&
             detail::all_satisfy_aligned_seq<detail::tuple_type_list_t<remove_cvref_t<tuple_t>>>

https://github.com/seqan/seqan3/blob/70b4e57575ae1f92be2f0f5e70e0ba7d090e492d/include/seqan3/alignment/aligned_sequence/aligned_sequence_concept.hpp#L556-L559

In this definition we assume that an alignment is always a tuple of aligned_sequences.

template <tuple_like alignment_type>
//!\cond
    requires std::tuple_size_v<remove_cvref_t<alignment_type>> == 2 &&
             detail::all_satisfy_aligned_seq<detail::tuple_type_list_t<alignment_type>>
//!\endcond
inline void alignment_from_cigar(alignment_type & alignment, std::vector<cigar> const & cigar_vector)

https://github.com/seqan/seqan3/blob/0ff2e117952743f081735df9956be4c512f4ccba/include/seqan3/io/alignment_file/detail.hpp#L331-L336

Uses also tuple_like, but is clearly for pairwise_alignment. This also an instance of a writable alignment.

template <
    tuple_like alignment_t,
    typename database_t,
    typename query_t,
    typename trace_matrix_t>
//!\cond
    requires matrix<remove_cvref_t<trace_matrix_t>> &&
             std::same_as<typename remove_cvref_t<trace_matrix_t>::value_type, trace_directions> &&
             detail::all_satisfy_aligned_seq<detail::tuple_type_list_t<alignment_t>>
//!\endcond

https://github.com/seqan/seqan3/blob/0ff2e117952743f081735df9956be4c512f4ccba/include/seqan3/alignment/matrix/alignment_trace_algorithms.hpp#L95-L104

This also an instance of a writable alignment. (This interface is deprecated and will be removed)


My question is if a tuple-like is always the right data structure?
I thought a multiple sequence alignment is a alignment over multiple sequences. Are the number of those multiple sequences always known at compile time? If not a multiple alignment should not be defined over a tuple, or at least not exclusively.

For a pairwise_alignment this makes sense because we know that we always have 2 sequences and I find it right to make sure at compile time that we have two sequences.

Alignment Concept Proposal

(1) Make pairwise alignment special and only allow ranges of aligned sequences

template <typename pair_t>
concept pair_like = tuple_like<pair_t> && std::tuple_size_v<pair_t> == 2;

template <typename pairwise_alignment_t>
concept pairwise_alignment = pair_like<pairwise_alignment_t> && aligned_sequence<std::tuple_element<0, pairwise_alignment_t>> && aligned_sequence<std::tuple_element<1, pairwise_alignment_t>>
// ???
template <typename ...aligned_sequences_t>
concept multiple_alignment = (aligned_sequence<aligned_sequences_t> && ... && true)
template <typename alignment_t>
concept alignment = pairwise_alignment<alignment_t> ||
                    requires() // range of aligned_sequences
                    {
                        requires aligned_sequence<std::ranges::range_reference_t<alignment_t>>
                    };

(2) Keep tuple as alignment (currently indirectly used)

// maybe fixed_size_multiple_alignment, n_ary_alignment, tuple_alignment, multiple_n_ary_alignment, multiple_tuple_alignment?
template <typename alignment_t>
concept alignment = tuple_like<alignment_t> && detail::all_satisfy_aligned_seq<detail::tuple_type_list_t<remove_cvref_t<alignment_t>>>;
template <typename alignment_t>
concept pairwise_alignment = alignment<alignment_t> && std::tuple_size_t<alignment_t> == 2;

(3) Combine (1) and (2); allow tuple and ranges of aligned_sequences.

But what would that look like?

@marehr marehr added the needs refinement A story that was not discussed and/or estimated by the team yet but is planned for upcoming sprints. label Apr 16, 2020
@rrahn
Copy link
Contributor

rrahn commented Apr 30, 2020

I agree with the sequence concept and the aligned_sequence concept as well as the writable_aligned_sequence concept.
I am unsure about the alignment concepts though. In general a pairwise sequence alignment is always a multiple sequence alignment as well. So I would be confused if I do compute a multiple_alignment with two sequences and the result is either a pairwise sequence alignment but not a multiple sequence alignment (if we know that we can return it as pair) or it is a multiple alignment but not a pairwise even though it is one. The question is where do we need to know whether we have a multiple or a pariwise alignment? What can we tell about the differnet semantics? Maybe we need to have a common interface around an alignment irrespective of it's underlying implementation?

@smehringer
Copy link
Member

I agree with the sequence concepts (and I would like to insist on not excluding built-in types because I actually use integer sequences in seqan2).
I also agree with introducing an alignment concept, I wanted to do that when first working with sam bam but back then Hannes or Rene argued we don't need it. I forgot the rational though.
If we use the concept only in internal requires clauses (do you know?) I would propose that it will remain in the detail scope for now. This way we don't need to make a perfect decision on the multiple sequence alignment issue now, where we do not have a use case to answer some of Renes questions (at least I don't).

@rrahn
Copy link
Contributor

rrahn commented May 11, 2020

Core Meeting Notes:

https://github.com/seqan/seqan3/wiki/Core-Meeting-notes#2020-05-11-marehr

We discussed about writable_aligned_sequence and it's name; We stayed with the name, because we already have the term writable_alphabet.

@rrahn rrahn added this to the Sprint 6 milestone Jun 8, 2020
@smehringer smehringer self-assigned this Jun 12, 2020
@smehringer smehringer removed their assignment Aug 10, 2020
marehr added a commit to marehr/seqan3 that referenced this issue Jan 28, 2021
…gnment

This PR finishes task seqan/product_backlog#60
by implementing seqan3::detail::(writable_)pairwise_alignment.
marehr added a commit to marehr/seqan3 that referenced this issue Jan 28, 2021
…gnment

This PR finishes task seqan/product_backlog#60
by implementing seqan3::detail::(writable_)pairwise_alignment.
marehr added a commit to marehr/seqan3 that referenced this issue Feb 2, 2021
…alignment

This PR finishes task seqan/product_backlog#60
by implementing seqan3::detail::(writable_)pairwise_alignment.
marehr added a commit to marehr/seqan3 that referenced this issue Feb 3, 2021
…alignment

This PR finishes task seqan/product_backlog#60
by implementing seqan3::detail::(writable_)pairwise_alignment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs refinement A story that was not discussed and/or estimated by the team yet but is planned for upcoming sprints.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants