[feature] views::char_strictly_to #2898

h-2 · 2021-11-22T14:32:44Z

No description provided.

vercel · 2021-11-22T14:32:49Z

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/seqan/seqan3/3vgWuKpASAkbALqQj1iZXPwNpifK
✅ Preview: https://seqan3-git-fork-h-2-viewscharsstrictlyto-seqan.vercel.app

codecov · 2021-11-22T14:56:45Z

Codecov Report

Merging #2898 (915dedb) into master (b30f4c7) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 915dedb differs from pull request most recent head 9263bf5. Consider uploading reports for the commit 9263bf5 to get more accurate results

@@            Coverage Diff             @@
##           master    #2898      +/-   ##
==========================================
- Coverage   98.28%   98.28%   -0.01%     
==========================================
  Files         266      267       +1     
  Lines       11455    11459       +4     
==========================================
+ Hits        11259    11262       +3     
- Misses        196      197       +1

Impacted Files	Coverage Δ
...nclude/seqan3/alphabet/views/validate_char_for.hpp	`100.00% <100.00%> (ø)`
...de/seqan3/argument_parser/detail/version_check.hpp	`93.12% <0.00%> (-0.77%)`	⬇️
include/seqan3/argument_parser/validators.hpp	`100.00% <0.00%> (ø)`
...nclude/seqan3/io/sam_file/input_format_concept.hpp	`100.00% <0.00%> (ø)`
...clude/seqan3/io/sam_file/output_format_concept.hpp	`100.00% <0.00%> (ø)`
...e/seqan3/alphabet/container/bitpacked_sequence.hpp	`100.00% <0.00%> (ø)`
...qan3/alphabet/container/concatenated_sequences.hpp	`97.23% <0.00%> (ø)`
...an3/alignment/pairwise/edit_distance_algorithm.hpp	`100.00% <0.00%> (ø)`
...n3/alignment/pairwise/policy/affine_gap_policy.hpp	`100.00% <0.00%> (ø)`
...ignment/pairwise/policy/simd_affine_gap_policy.hpp	`100.00% <0.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b30f4c7...9263bf5. Read the comment docs.

h-2 · 2021-11-22T17:09:54Z

I don't know why Codecov claims that coverage is reduced.

Irallia

LGFM, just some spelling and style things.

include/seqan3/alphabet/views/char_strictly_to.hpp

marehr · 2021-11-26T13:49:15Z

I don't know if this was discussed, but is this REALLY necessary? What's the use case, what's the semantic?

This view mangles input validation with transformation, and it should be a niche use-case for most users. Most users will get their sequences by reading in a file by using our IO. Of course our IO should provide some level of diagnostics if the read in characters are wrong. But should this API really be public in the alphabet module?

If you want to use it in IO, please just make it detail for now.

I would even argue that the current functionality is kind of lacking for a user. It just throws an exception, but the user won't be able to recover from that. They don't know where the offending character is, they don't know how the context around the error looks like, they just know that there was somewhere an invalid character.

From a user POV the same can also be achieved by iterating with std::ranges::all_of over the sequence and test if the characters are all valid.

I'm pro for some diagnostic "library" functionality to give a user a good reason why a char-sequence is invalid, like gcc or clang does for parsing a file, but I don't think a VIEW is the right place for that.

h-2 · 2021-11-26T16:33:39Z

I don't know if this was discussed, but is this REALLY necessary?

Yes, from my point of view it is. I don't want to have page-long arguments on Github for adding effectively 6 lines of code to the library. It has got one approval already, and another team member has been automatically assigned for a second review. If you feel that this change is so fundamental that Enrico is not qualified to approve it and that it needs to be discussed by the entire core-team, please bring it up at the next meeting of the core-team.
In that case, please give me a heads-up, so I can send you a detailed reasoning.

eseiler · 2021-11-28T10:32:48Z

Yes, from my point of view it is. I don't want to have page-long arguments on Github for adding effectively 6 lines of code to the library. It has got one approval already, and another team member has been automatically assigned for a second review. If you feel that this change is so fundamental that Enrico is not qualified to approve it and that it needs to be discussed by the entire core-team, please bring it up at the next meeting of the core-team. In that case, please give me a heads-up, so I can send you a detailed reasoning.

My first question when I saw the PR was (Why) Do we want this? - and this answer did not add anything useful to ... well, anything.

I wanted to discuss this PR in last week's core meeting, but there were only two members present, so we cancelled it.

I think it is very valid to question the usefulness: "Oh wow, there is a wrong character somewhere in my 20 GiB file"- which also translates to whether this should be API or detail. What are even the use cases? Apparently you have some, but decided not to share them.
However, this point is not crucial to me. That what's \experimentalapi is for.

Since from my current understanding, both assign_char_strictly_to and views::char_strictly_to are at least experimental, and hence open for debate in the next minor release, it might not hurt to provide some insight from your point of view.

In my opinion, this view should either be \experimentalapi with some explicit statement of uncertainty, or just in detail for now.
It's OK for me to add this view as-is (either experimental or detail), after providing an iota of reasoning. However, if you have more to say about the whole strictly_to mechanism, I would be delighted to read more than No description provided.

h-2 · 2021-11-29T11:52:15Z

My first question when I saw the PR was (Why) Do we want this? - and this answer did not add anything useful to ... well, anything.

Why didn't you ask me this question?

I think it is very valid to question the usefulness: "Oh wow, there is a wrong character somewhere in my 20 GiB file"- which also translates to whether this should be API or detail.

That is not how I am using it.

What are even the use cases? Apparently you have some, but decided not to share them.

My IO-Code returns records with views into the stream-buffer. The IO does not verify the characters and instead returns a view to the user that converts to the requested alphabet type. There needs to be a user-friendly mechanism of notifying the user when there is an illegal character in the data (user-friendly means not telling the user to run some extra algorithms on the data). Since this is a type that end-users see, it should not be detail, because it means they don't know what is going. Also, this is not some "hidden implementation", it is a very simple view with a 5-line implementation that works exactly like all the other views.

Since from my current understanding, both assign_char_strictly_to and views::char_strictly_to are at least experimental, and hence open for debate in the next minor release, it might not hurt to provide some insight from your point of view.

I was not aware that you removed these requirements from the concept. I thought that it was rather self-explanatory that one might want to check which characters "are in" the alphabet?

In my opinion, this view should either be \experimentalapi with some explicit statement of uncertainty, or just in detail for now.

I write external library and application code, so things in detail, NOAPI or experimental are not very helpful to me. In that case, it is easier to just have things in my own code.

eseiler · 2021-11-29T12:17:12Z

Why didn't you ask me this question?

I wanted to ask in the Core-Meeting, and afterwards it slipped my mind :)
I didn't "read" the PR, I just glanced over it.

That is not how I am using it.

My IO-Code returns records with views into the stream-buffer. The IO does not verify the characters and instead returns a view to the user that converts to the requested alphabet type. There needs to be a user-friendly mechanism of notifying the user when there is an illegal character in the data (user-friendly means not telling the user to run some extra algorithms on the data). Since this is a type that end-users see, it should not be detail, because it means they don't know what is going. Also, this is not some "hidden implementation", it is a very simple view with a 5-line implementation that works exactly like all the other views.

That was my main problem with the PR. There was no description/reasoning, so I was left to wonder what that view might be used for. And as I suspected, you have a use case where this view is useful.

I was not aware that you removed these requirements from the concept. I thought that it was rather self-explanatory that one might want to check which characters "are in" the alphabet?

The "use" of the view itself is self-explanatory. What I was missing was a use case that highlights how the view solves some problem. Without any use case, we will fall back to implementing "everything that might seem useful", and this did not work out too well.

I write external library and application code, so things in detail, NOAPI or experimental are not very helpful to me. In that case, it is easier to just have things in my own code.

The only thing I was missing was some context, which you provided.

In general:

Not marking an entity means \noapi (See preview of docs)
Everything new is \experimentalapi{Experimental since version 3.2.}

In this case, assign_char_strictly_to is currently experimental, so the view cannot be stable. But that also means that one could argue that if the CPO is stable, the view instantly follows (i.e. stable with 3.2).
Our main problem with assign_char_strictly_to was, IIRC, that we didn't have use cases for this and therefore weren't sure if it really should be stable - or even part of the API - just now (3.1).
Now that you provide use cases, it's (from my POV) unlikely that assign_char_strictly_to changes (will remain as is and will be stable with 3.2).

eseiler

Basically just documentation 💅

Can you add a snippet, please?

Something like this(untested)?

test/snippet/alphabet/views/char_strictly_to.cpp

#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/alphabet/views/char_strictly_to.hpp>
#include <seqan3/core/debug_stream.hpp>

int main()
{
    std::string str{"ACTTTGATAN"};
    try
    {
        seqan3::debug_stream << (str | seqan3::views::char_strictly_to<seqan3::dna4>); // ACTTTGATA
    }
    catch (seqan3::invalid_char_assignment)
    {
       seqan3::debug_stream << "\n[ERROR] Invalid char!\n"; // Will throw on parsing 'N'
    }
}

test/snippet/alphabet/views/char_strictly_to.err

ACTTTGATA
[ERROR] Invalid char!

eseiler · 2021-11-29T12:27:49Z

include/seqan3/alphabet/views/char_strictly_to.hpp

+// -----------------------------------------------------------------------------------------------------
+// Copyright (c) 2006-2020, Knut Reinert & Freie Universität Berlin
+// Copyright (c) 2016-2020, Knut Reinert & MPI für molekulare Genetik
+// This file may be used, modified and/or redistributed under the terms of the 3-clause BSD-License
+// shipped with this file and also available at: https://github.com/seqan/seqan3/blob/master/LICENSE.md
+// -----------------------------------------------------------------------------------------------------


Suggested change

// -----------------------------------------------------------------------------------------------------

// Copyright (c) 2006-2020, Knut Reinert & Freie Universität Berlin

// Copyright (c) 2016-2020, Knut Reinert & MPI für molekulare Genetik

// This file may be used, modified and/or redistributed under the terms of the 3-clause BSD-License

// shipped with this file and also available at: https://github.com/seqan/seqan3/blob/master/LICENSE.md

// -----------------------------------------------------------------------------------------------------

// -----------------------------------------------------------------------------------------------------

// Copyright (c) 2006-2021, Knut Reinert & Freie Universität Berlin

// Copyright (c) 2016-2021, Knut Reinert & MPI für molekulare Genetik

// This file may be used, modified and/or redistributed under the terms of the 3-clause BSD-License

// shipped with this file and also available at: https://github.com/seqan/seqan3/blob/master/LICENSE.md

// -----------------------------------------------------------------------------------------------------

eseiler · 2021-11-29T12:28:27Z

include/seqan3/alphabet/views/char_strictly_to.hpp

+
+/*!\name Alphabet related views
+ * \{
+ */
+


Suggested change

/*!\name Alphabet related views

* \{

*/

include/seqan3/alphabet/views/char_strictly_to.hpp

eseiler · 2021-11-29T12:31:03Z

include/seqan3/alphabet/views/char_strictly_to.hpp

+ * \tparam alphabet_t   The alphabet to convert to; must satisfy seqan3::alphabet.
+ * \param[in] urange    The range being processed. [parameter is omitted in pipe notation]
+ * \returns             A range of converted elements. See below for the properties of the returned range.
+ * \ingroup views


Suggested change

* \ingroup views

* \ingroup alphabet_views

include/seqan3/alphabet/views/char_strictly_to.hpp

h-2 · 2021-11-30T16:21:49Z

The only thing I was missing was some context, which you provided.

Yes, I had thought that assign_char_strictly was still in the concept, so I thought missing this view was an oversight. I hope the use-case is now clear. If you have any questions, please let me know.

marehr · 2021-11-30T16:39:08Z

@h-2 would it be also an alternative to have a view that just checks whether a character is part of the "domain", as we already have a view that has the converting effect?

Something like this:

auto assign_char_strictly = views::char_is_valid | views::assign_char;

name would be up-to-discussion (throw_on_invalid_char, ...)

h-2 · 2021-11-30T20:21:56Z

would it be also an alternative to have a view that just checks whether a character is part of the "domain", as we already have a view that has the converting effect?

Let me think about it!

h-2 · 2021-12-01T11:14:45Z

I wouldn't mind splitting the implementation into two separate views, but I would still like to have the combined view (defined as you proposed above) for usability. Would that be agreeable?

eseiler · 2021-12-02T11:27:39Z

I like both proposals, i.e. having a view that checks for validity, and then having the combined view.

I just noticed that we probably need to restrict the view to not just take alphabet.
We need to check for (depending on implementation) either:

assign_char_strictly_to
assign_char_to + char_is_valid_for

I think a static_assert would be enough.

assign_char_strictly_to looks like this:

seqan3/include/seqan3/alphabet/concept.hpp

Lines 729 to 750 in 67873e4

    
           //!\brief Function object definition for seqan3::assign_char_strictly_to. 
        
           //!\ingroup alphabet 
        
           struct assign_char_strictly_to_fn 
        
           { 
        
               //!\brief Operator overload for rvalues. 
        
               template <typename alphabet_t> 
        
               constexpr decltype(auto) operator()(seqan3::alphabet_char_t<alphabet_t> const chr, alphabet_t && alphabet) const 
        
               //!\cond 
        
                   requires requires () 
        
                   { 
        
                       SEQAN3_RETURN_TYPE_CONSTRAINT(seqan3::assign_char_to(chr, std::forward<alphabet_t>(alphabet)), 
        
                                                     std::convertible_to, alphabet_t); 
        
                       SEQAN3_RETURN_TYPE_CONSTRAINT(seqan3::char_is_valid_for<alphabet_t>(chr), std::same_as, bool); 
        
                   } 
        
               //!\endcond 
        
               { 
        
                   if (!seqan3::char_is_valid_for<alphabet_t>(chr)) 
        
                       throw seqan3::invalid_char_assignment{detail::type_name_as_string<alphabet_t>, chr}; 
        
                   return seqan3::assign_char_to(chr, std::forward<alphabet_t>(alphabet)); 
        
               } 
        
           };

h-2 · 2021-12-06T11:16:19Z

I just noticed that we probably need to restrict the view to not just take alphabet

Unless someone changed it, char_is_valid_for has a works-by-default implementation for all alphabets that don't customise it. So no additional requirements are needed from my POV.

eseiler · 2021-12-06T11:45:33Z

I just noticed that we probably need to restrict the view to not just take alphabet

Unless someone changed it, char_is_valid_for has a works-by-default implementation for all alphabets that don't customise it. So no additional requirements are needed from my POV.

You're right, I missed the default implementation in the CPO.

eseiler

One filename is wrong.

Only snippets are failing.
GitHub seems to have trouble serving the Cmake download today, that's why CI sometimes fails.

eseiler · 2021-12-08T13:14:02Z

test/snippet/alphabet/views/valid_char_for.err

@@ -0,0 +1,2 @@
+ACTTTGATA


Can you rename this file to validate_char_for.err?

vercel bot temporarily deployed to Preview November 22, 2021 14:32 Inactive

h-2 requested a review from Irallia November 23, 2021 09:53

Irallia requested changes Nov 23, 2021

View reviewed changes

include/seqan3/alphabet/views/char_strictly_to.hpp Outdated Show resolved Hide resolved

include/seqan3/alphabet/views/char_strictly_to.hpp Outdated Show resolved Hide resolved

include/seqan3/alphabet/views/char_strictly_to.hpp Outdated Show resolved Hide resolved

h-2 force-pushed the views_chars_strictly_to branch from 01b434a to 4174626 Compare November 23, 2021 12:04

vercel bot temporarily deployed to Preview November 23, 2021 12:04 Inactive

h-2 requested a review from Irallia November 23, 2021 12:04

Irallia approved these changes Nov 23, 2021

View reviewed changes

Irallia requested review from a team and eseiler and removed request for a team November 23, 2021 15:21

eseiler requested changes Nov 29, 2021

View reviewed changes

h-2 force-pushed the views_chars_strictly_to branch from 4174626 to a4b431b Compare November 30, 2021 16:17

vercel bot temporarily deployed to Preview November 30, 2021 16:17 Inactive

h-2 force-pushed the views_chars_strictly_to branch from a4b431b to 35000bb Compare November 30, 2021 20:19

vercel bot temporarily deployed to Preview November 30, 2021 20:19 Inactive

h-2 force-pushed the views_chars_strictly_to branch from 35000bb to 732d106 Compare December 8, 2021 12:20

vercel bot temporarily deployed to Preview December 8, 2021 12:20 Inactive

h-2 requested a review from eseiler December 8, 2021 12:20

h-2 force-pushed the views_chars_strictly_to branch from 732d106 to 915dedb Compare December 8, 2021 12:28

vercel bot temporarily deployed to Preview December 8, 2021 12:28 Inactive

eseiler requested changes Dec 8, 2021

View reviewed changes

test/snippet/alphabet/views/valid_char_for.err Outdated

@@ -0,0 +1,2 @@

ACTTTGATA

Copy link

Member

eseiler Dec 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename this file to validate_char_for.err?

[feature] views::validate_char_for and views::char_strictly_to

9263bf5

h-2 force-pushed the views_chars_strictly_to branch from 915dedb to 9263bf5 Compare December 8, 2021 13:43

vercel bot temporarily deployed to Preview December 8, 2021 13:43 Inactive

eseiler approved these changes Dec 8, 2021

View reviewed changes

eseiler enabled auto-merge December 8, 2021 13:45

eseiler merged commit 999352e into seqan:master Dec 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[feature] views::char_strictly_to #2898

[feature] views::char_strictly_to #2898

h-2 commented Nov 22, 2021

vercel bot commented Nov 22, 2021 •

edited

Loading

codecov bot commented Nov 22, 2021 •

edited

Loading

h-2 commented Nov 22, 2021

Irallia left a comment

marehr commented Nov 26, 2021 •

edited

Loading

h-2 commented Nov 26, 2021

eseiler commented Nov 28, 2021 •

edited

Loading

h-2 commented Nov 29, 2021

eseiler commented Nov 29, 2021 •

edited

Loading

eseiler left a comment •

edited

Loading

eseiler Nov 29, 2021

eseiler Nov 29, 2021

eseiler Nov 29, 2021

h-2 commented Nov 30, 2021

marehr commented Nov 30, 2021

h-2 commented Nov 30, 2021

h-2 commented Dec 1, 2021

eseiler commented Dec 2, 2021

h-2 commented Dec 6, 2021

eseiler commented Dec 6, 2021

eseiler left a comment

eseiler Dec 8, 2021

[feature] views::char_strictly_to #2898

[feature] views::char_strictly_to #2898

Conversation

h-2 commented Nov 22, 2021

vercel bot commented Nov 22, 2021 • edited Loading

codecov bot commented Nov 22, 2021 • edited Loading

Codecov Report

h-2 commented Nov 22, 2021

Irallia left a comment

Choose a reason for hiding this comment

marehr commented Nov 26, 2021 • edited Loading

h-2 commented Nov 26, 2021

eseiler commented Nov 28, 2021 • edited Loading

h-2 commented Nov 29, 2021

eseiler commented Nov 29, 2021 • edited Loading

eseiler left a comment • edited Loading

Choose a reason for hiding this comment

eseiler Nov 29, 2021

Choose a reason for hiding this comment

eseiler Nov 29, 2021

Choose a reason for hiding this comment

eseiler Nov 29, 2021

Choose a reason for hiding this comment

h-2 commented Nov 30, 2021

marehr commented Nov 30, 2021

h-2 commented Nov 30, 2021

h-2 commented Dec 1, 2021

eseiler commented Dec 2, 2021

h-2 commented Dec 6, 2021

eseiler commented Dec 6, 2021

eseiler left a comment

Choose a reason for hiding this comment

eseiler Dec 8, 2021

Choose a reason for hiding this comment

vercel bot commented Nov 22, 2021 •

edited

Loading

codecov bot commented Nov 22, 2021 •

edited

Loading

marehr commented Nov 26, 2021 •

edited

Loading

eseiler commented Nov 28, 2021 •

edited

Loading

eseiler commented Nov 29, 2021 •

edited

Loading

eseiler left a comment •

edited

Loading