Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature] views::char_strictly_to #2898

Merged
merged 1 commit into from
Dec 8, 2021
Merged

Conversation

h-2
Copy link
Member

@h-2 h-2 commented Nov 22, 2021

No description provided.

@vercel
Copy link

vercel bot commented Nov 22, 2021

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/seqan/seqan3/3vgWuKpASAkbALqQj1iZXPwNpifK
✅ Preview: https://seqan3-git-fork-h-2-viewscharsstrictlyto-seqan.vercel.app

@codecov
Copy link

codecov bot commented Nov 22, 2021

Codecov Report

Merging #2898 (915dedb) into master (b30f4c7) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 915dedb differs from pull request most recent head 9263bf5. Consider uploading reports for the commit 9263bf5 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2898      +/-   ##
==========================================
- Coverage   98.28%   98.28%   -0.01%     
==========================================
  Files         266      267       +1     
  Lines       11455    11459       +4     
==========================================
+ Hits        11259    11262       +3     
- Misses        196      197       +1     
Impacted Files Coverage Δ
...nclude/seqan3/alphabet/views/validate_char_for.hpp 100.00% <100.00%> (ø)
...de/seqan3/argument_parser/detail/version_check.hpp 93.12% <0.00%> (-0.77%) ⬇️
include/seqan3/argument_parser/validators.hpp 100.00% <0.00%> (ø)
...nclude/seqan3/io/sam_file/input_format_concept.hpp 100.00% <0.00%> (ø)
...clude/seqan3/io/sam_file/output_format_concept.hpp 100.00% <0.00%> (ø)
...e/seqan3/alphabet/container/bitpacked_sequence.hpp 100.00% <0.00%> (ø)
...qan3/alphabet/container/concatenated_sequences.hpp 97.23% <0.00%> (ø)
...an3/alignment/pairwise/edit_distance_algorithm.hpp 100.00% <0.00%> (ø)
...n3/alignment/pairwise/policy/affine_gap_policy.hpp 100.00% <0.00%> (ø)
...ignment/pairwise/policy/simd_affine_gap_policy.hpp 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b30f4c7...9263bf5. Read the comment docs.

@h-2
Copy link
Member Author

h-2 commented Nov 22, 2021

I don't know why Codecov claims that coverage is reduced.

@h-2 h-2 requested a review from Irallia November 23, 2021 09:53
Copy link
Contributor

@Irallia Irallia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGFM, just some spelling and style things.

include/seqan3/alphabet/views/char_strictly_to.hpp Outdated Show resolved Hide resolved
include/seqan3/alphabet/views/char_strictly_to.hpp Outdated Show resolved Hide resolved
include/seqan3/alphabet/views/char_strictly_to.hpp Outdated Show resolved Hide resolved
@Irallia Irallia requested review from a team and eseiler and removed request for a team November 23, 2021 15:21
@marehr
Copy link
Member

marehr commented Nov 26, 2021

I don't know if this was discussed, but is this REALLY necessary? What's the use case, what's the semantic?

This view mangles input validation with transformation, and it should be a niche use-case for most users. Most users will get their sequences by reading in a file by using our IO. Of course our IO should provide some level of diagnostics if the read in characters are wrong. But should this API really be public in the alphabet module?

If you want to use it in IO, please just make it detail for now.

I would even argue that the current functionality is kind of lacking for a user. It just throws an exception, but the user won't be able to recover from that. They don't know where the offending character is, they don't know how the context around the error looks like, they just know that there was somewhere an invalid character.

From a user POV the same can also be achieved by iterating with std::ranges::all_of over the sequence and test if the characters are all valid.

I'm pro for some diagnostic "library" functionality to give a user a good reason why a char-sequence is invalid, like gcc or clang does for parsing a file, but I don't think a VIEW is the right place for that.

@h-2
Copy link
Member Author

h-2 commented Nov 26, 2021

I don't know if this was discussed, but is this REALLY necessary?

Yes, from my point of view it is. I don't want to have page-long arguments on Github for adding effectively 6 lines of code to the library. It has got one approval already, and another team member has been automatically assigned for a second review. If you feel that this change is so fundamental that Enrico is not qualified to approve it and that it needs to be discussed by the entire core-team, please bring it up at the next meeting of the core-team.
In that case, please give me a heads-up, so I can send you a detailed reasoning.

@eseiler
Copy link
Member

eseiler commented Nov 28, 2021

Yes, from my point of view it is. I don't want to have page-long arguments on Github for adding effectively 6 lines of code to the library. It has got one approval already, and another team member has been automatically assigned for a second review. If you feel that this change is so fundamental that Enrico is not qualified to approve it and that it needs to be discussed by the entire core-team, please bring it up at the next meeting of the core-team. In that case, please give me a heads-up, so I can send you a detailed reasoning.

My first question when I saw the PR was (Why) Do we want this? - and this answer did not add anything useful to ... well, anything.

I wanted to discuss this PR in last week's core meeting, but there were only two members present, so we cancelled it.

I think it is very valid to question the usefulness: "Oh wow, there is a wrong character somewhere in my 20 GiB file"- which also translates to whether this should be API or detail. What are even the use cases? Apparently you have some, but decided not to share them.
However, this point is not crucial to me. That what's \experimentalapi is for.

Since from my current understanding, both assign_char_strictly_to and views::char_strictly_to are at least experimental, and hence open for debate in the next minor release, it might not hurt to provide some insight from your point of view.

In my opinion, this view should either be \experimentalapi with some explicit statement of uncertainty, or just in detail for now.
It's OK for me to add this view as-is (either experimental or detail), after providing an iota of reasoning. However, if you have more to say about the whole strictly_to mechanism, I would be delighted to read more than No description provided.

@h-2
Copy link
Member Author

h-2 commented Nov 29, 2021

My first question when I saw the PR was (Why) Do we want this? - and this answer did not add anything useful to ... well, anything.

Why didn't you ask me this question?

I think it is very valid to question the usefulness: "Oh wow, there is a wrong character somewhere in my 20 GiB file"- which also translates to whether this should be API or detail.

That is not how I am using it.

What are even the use cases? Apparently you have some, but decided not to share them.

My IO-Code returns records with views into the stream-buffer. The IO does not verify the characters and instead returns a view to the user that converts to the requested alphabet type. There needs to be a user-friendly mechanism of notifying the user when there is an illegal character in the data (user-friendly means not telling the user to run some extra algorithms on the data). Since this is a type that end-users see, it should not be detail, because it means they don't know what is going. Also, this is not some "hidden implementation", it is a very simple view with a 5-line implementation that works exactly like all the other views.

Since from my current understanding, both assign_char_strictly_to and views::char_strictly_to are at least experimental, and hence open for debate in the next minor release, it might not hurt to provide some insight from your point of view.

I was not aware that you removed these requirements from the concept. I thought that it was rather self-explanatory that one might want to check which characters "are in" the alphabet?

In my opinion, this view should either be \experimentalapi with some explicit statement of uncertainty, or just in detail for now.

I write external library and application code, so things in detail, NOAPI or experimental are not very helpful to me. In that case, it is easier to just have things in my own code.

@eseiler
Copy link
Member

eseiler commented Nov 29, 2021

Why didn't you ask me this question?

I wanted to ask in the Core-Meeting, and afterwards it slipped my mind :)
I didn't "read" the PR, I just glanced over it.

That is not how I am using it.

My IO-Code returns records with views into the stream-buffer. The IO does not verify the characters and instead returns a view to the user that converts to the requested alphabet type. There needs to be a user-friendly mechanism of notifying the user when there is an illegal character in the data (user-friendly means not telling the user to run some extra algorithms on the data). Since this is a type that end-users see, it should not be detail, because it means they don't know what is going. Also, this is not some "hidden implementation", it is a very simple view with a 5-line implementation that works exactly like all the other views.

That was my main problem with the PR. There was no description/reasoning, so I was left to wonder what that view might be used for. And as I suspected, you have a use case where this view is useful.

I was not aware that you removed these requirements from the concept. I thought that it was rather self-explanatory that one might want to check which characters "are in" the alphabet?

The "use" of the view itself is self-explanatory. What I was missing was a use case that highlights how the view solves some problem. Without any use case, we will fall back to implementing "everything that might seem useful", and this did not work out too well.

I write external library and application code, so things in detail, NOAPI or experimental are not very helpful to me. In that case, it is easier to just have things in my own code.

The only thing I was missing was some context, which you provided.

In general:

  • Not marking an entity means \noapi (See preview of docs)
  • Everything new is \experimentalapi{Experimental since version 3.2.}

In this case, assign_char_strictly_to is currently experimental, so the view cannot be stable. But that also means that one could argue that if the CPO is stable, the view instantly follows (i.e. stable with 3.2).
Our main problem with assign_char_strictly_to was, IIRC, that we didn't have use cases for this and therefore weren't sure if it really should be stable - or even part of the API - just now (3.1).
Now that you provide use cases, it's (from my POV) unlikely that assign_char_strictly_to changes (will remain as is and will be stable with 3.2).

Copy link
Member

@eseiler eseiler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically just documentation 💅

Can you add a snippet, please?

Something like this(untested)?

test/snippet/alphabet/views/char_strictly_to.cpp

#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/alphabet/views/char_strictly_to.hpp>
#include <seqan3/core/debug_stream.hpp>

int main()
{
    std::string str{"ACTTTGATAN"};
    try
    {
        seqan3::debug_stream << (str | seqan3::views::char_strictly_to<seqan3::dna4>); // ACTTTGATA
    }
    catch (seqan3::invalid_char_assignment)
    {
       seqan3::debug_stream << "\n[ERROR] Invalid char!\n"; // Will throw on parsing 'N'
    }
}

test/snippet/alphabet/views/char_strictly_to.err

ACTTTGATA
[ERROR] Invalid char!

Comment on lines 1 to 6
// -----------------------------------------------------------------------------------------------------
// Copyright (c) 2006-2020, Knut Reinert & Freie Universität Berlin
// Copyright (c) 2016-2020, Knut Reinert & MPI für molekulare Genetik
// This file may be used, modified and/or redistributed under the terms of the 3-clause BSD-License
// shipped with this file and also available at: https://github.com/seqan/seqan3/blob/master/LICENSE.md
// -----------------------------------------------------------------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// -----------------------------------------------------------------------------------------------------
// Copyright (c) 2006-2020, Knut Reinert & Freie Universität Berlin
// Copyright (c) 2016-2020, Knut Reinert & MPI für molekulare Genetik
// This file may be used, modified and/or redistributed under the terms of the 3-clause BSD-License
// shipped with this file and also available at: https://github.com/seqan/seqan3/blob/master/LICENSE.md
// -----------------------------------------------------------------------------------------------------
// -----------------------------------------------------------------------------------------------------
// Copyright (c) 2006-2021, Knut Reinert & Freie Universität Berlin
// Copyright (c) 2016-2021, Knut Reinert & MPI für molekulare Genetik
// This file may be used, modified and/or redistributed under the terms of the 3-clause BSD-License
// shipped with this file and also available at: https://github.com/seqan/seqan3/blob/master/LICENSE.md
// -----------------------------------------------------------------------------------------------------

Comment on lines 23 to 27

/*!\name Alphabet related views
* \{
*/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/*!\name Alphabet related views
* \{
*/

* \tparam alphabet_t The alphabet to convert to; must satisfy seqan3::alphabet.
* \param[in] urange The range being processed. [parameter is omitted in pipe notation]
* \returns A range of converted elements. See below for the properties of the returned range.
* \ingroup views
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* \ingroup views
* \ingroup alphabet_views

@h-2
Copy link
Member Author

h-2 commented Nov 30, 2021

The only thing I was missing was some context, which you provided.

Yes, I had thought that assign_char_strictly was still in the concept, so I thought missing this view was an oversight. I hope the use-case is now clear. If you have any questions, please let me know.

@marehr
Copy link
Member

marehr commented Nov 30, 2021

@h-2 would it be also an alternative to have a view that just checks whether a character is part of the "domain", as we already have a view that has the converting effect?

Something like this:

auto assign_char_strictly = views::char_is_valid | views::assign_char;

name would be up-to-discussion (throw_on_invalid_char, ...)

@h-2
Copy link
Member Author

h-2 commented Nov 30, 2021

would it be also an alternative to have a view that just checks whether a character is part of the "domain", as we already have a view that has the converting effect?

Let me think about it!

@h-2
Copy link
Member Author

h-2 commented Dec 1, 2021

I wouldn't mind splitting the implementation into two separate views, but I would still like to have the combined view (defined as you proposed above) for usability. Would that be agreeable?

@eseiler
Copy link
Member

eseiler commented Dec 2, 2021

I like both proposals, i.e. having a view that checks for validity, and then having the combined view.

I just noticed that we probably need to restrict the view to not just take alphabet.
We need to check for (depending on implementation) either:

  • assign_char_strictly_to
  • assign_char_to + char_is_valid_for

I think a static_assert would be enough.

assign_char_strictly_to looks like this:

//!\brief Function object definition for seqan3::assign_char_strictly_to.
//!\ingroup alphabet
struct assign_char_strictly_to_fn
{
//!\brief Operator overload for rvalues.
template <typename alphabet_t>
constexpr decltype(auto) operator()(seqan3::alphabet_char_t<alphabet_t> const chr, alphabet_t && alphabet) const
//!\cond
requires requires ()
{
SEQAN3_RETURN_TYPE_CONSTRAINT(seqan3::assign_char_to(chr, std::forward<alphabet_t>(alphabet)),
std::convertible_to, alphabet_t);
SEQAN3_RETURN_TYPE_CONSTRAINT(seqan3::char_is_valid_for<alphabet_t>(chr), std::same_as, bool);
}
//!\endcond
{
if (!seqan3::char_is_valid_for<alphabet_t>(chr))
throw seqan3::invalid_char_assignment{detail::type_name_as_string<alphabet_t>, chr};
return seqan3::assign_char_to(chr, std::forward<alphabet_t>(alphabet));
}
};

@h-2
Copy link
Member Author

h-2 commented Dec 6, 2021

I just noticed that we probably need to restrict the view to not just take alphabet

Unless someone changed it, char_is_valid_for has a works-by-default implementation for all alphabets that don't customise it. So no additional requirements are needed from my POV.

@eseiler
Copy link
Member

eseiler commented Dec 6, 2021

I just noticed that we probably need to restrict the view to not just take alphabet

Unless someone changed it, char_is_valid_for has a works-by-default implementation for all alphabets that don't customise it. So no additional requirements are needed from my POV.

You're right, I missed the default implementation in the CPO.

Copy link
Member

@eseiler eseiler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One filename is wrong.

Only snippets are failing.
GitHub seems to have trouble serving the Cmake download today, that's why CI sometimes fails.

@@ -0,0 +1,2 @@
ACTTTGATA
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename this file to validate_char_for.err?

@eseiler eseiler merged commit 999352e into seqan:master Dec 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants