[I/O] Remove column interface for sequence files #1412

Merged
merged 3 commits into from
Dec 19, 2019

Conversation

joergi-w
Member

I have to extend the record's get_or_ignore interface for common_pair, which is the result of ranges::zip with two arguments. Therefore, I use a tuple_like template template parameter to cover both std::tuple and common_pair.

Resolves #1411.

@joergi-w
Member Author

Rebased on master because the CHANGELOG caused a merge conflict.

@joergi-w
Member Author

I had to change the solution of an alignment I/O tutorial exercise, because it was using the column interface. It is now realised with a for-loop over the file.
Additionally there was an error with the snippet output: The reference and read were both taken from index 0 of the alignment. I fixed this in the code as well as in the expected output.

Irallia added a commit to Irallia/seqan3 that referenced this pull request Dec 10, 2019
@codecov
codecov bot commented Dec 10, 2019

Codecov Report

Merging #1412 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1412      +/-   ##
==========================================
- Coverage   97.59%   97.58%   -0.02%     
==========================================
  Files         232      232              
  Lines        8831     8783      -48     
==========================================
- Hits         8619     8571      -48     
  Misses        212      212

| Impacted Files | Coverage Δ |
| --- | --- |
| include/seqan3/io/sequence_file/input.hpp | 100% <ø> (ø) ⬆️ |
| include/seqan3/io/structure_file/output.hpp | 100% <ø> (ø) ⬆️ |
| include/seqan3/io/sequence_file/output.hpp | 100% <ø> (ø) ⬆️ |
| include/seqan3/io/detail/record.hpp | 100% <100%> (ø) ⬆️ |
| .../seqan3/range/container/concatenated_sequences.hpp | 96.84% <0%> (-0.02%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 36e09b3...78ab4d2.

Irallia added a commit to Irallia/seqan3 that referenced this pull request Dec 11, 2019
@joergi-w joergi-w requested review from eseiler and removed request for marehr December 12, 2019 12:02

- auto mapq_filter = std::views::filter([] (auto & rec) { return get<field::MAPQ>(rec) >= 30; });
+ auto mapq_filter = std::views::filter([] (auto & rec) { return seqan3::get<seqan3::field::MAPQ>(rec) >= 30; });
Member

This seqan3::get works?

Member Author

yes, I see no error...

Member

Apparently, but I'm wondering why :)

Member Author

I guess because this get has a seqan3::record as argument...

Contributor

Because this get is not a hidden friend of records
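
To illustrate the point about hidden friends, here is a minimal sketch that does not use seqan3 at all. The namespace `demo`, the type `demo::record`, and the function `mapq_of` are hypothetical names made up for this example. A function defined as a friend inside the class body is only found by argument-dependent lookup, whereas a function template declared at namespace scope can also be named with a qualified call such as `demo::get<0>(rec)`.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>

namespace demo
{

// Hypothetical stand-in for a record type; not the seqan3 implementation.
struct record
{
    std::tuple<int, double> data{30, 1.5};

    // Hidden friend: defined inside the class and not declared at namespace
    // scope, so it is found only by argument-dependent lookup.
    // A qualified call demo::mapq_of(rec) would not compile.
    friend int mapq_of(record const & r)
    {
        return std::get<0>(r.data);
    }
};

// Namespace-scope function template: visible to qualified lookup,
// so demo::get<0>(rec) is well-formed.
template <std::size_t i>
auto const & get(record const & r)
{
    return std::get<i>(r.data);
}

} // namespace demo
```

With these definitions, `demo::get<0>(rec)` and the unqualified `mapq_of(rec)` both compile, while `demo::mapq_of(rec)` is rejected.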

doc/tutorial/alignment_file/index.md (resolved)
@joergi-w joergi-w requested review from smehringer and rrahn and removed request for smehringer December 12, 2019 14:27
Contributor

@rrahn rrahn left a comment

Thanks for this work. Some questions and improvements that I have.


for (auto const & record : reference_file)
{
ref_ids.push_back(seqan3::get<seqan3::field::ID>(record));
Contributor

Please use std::move and capture the record by &&
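
A rough sketch of what this request amounts to, using only the standard library: `record_t` and `collect_ids` are hypothetical stand-ins (a `std::vector` of tuples plays the role of the file), so this is not the tutorial code itself. Binding each record with `auto &&` and moving the field out avoids copying the ID string.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for a record: (id, sequence).
using record_t = std::tuple<std::string, std::string>;

// Collects all IDs, moving each one out of its record instead of copying it.
// The records' ID fields are left in a valid but unspecified state afterwards.
std::vector<std::string> collect_ids(std::vector<record_t> & file)
{
    std::vector<std::string> ids;
    for (auto && record : file)                          // forwarding reference
        ids.push_back(std::move(std::get<0>(record)));   // move the ID out
    return ids;
}
```

This only makes sense when the record is not used again after the loop body, which is the case when iterating once over an input file.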

std::vector<std::vector<dna5>> ref_seqs = get<field::SEQ>(reference_file);
seqan3::sequence_file_input reference_file{tmp_dir/"reference.fasta"};
seqan3::concatenated_sequences<std::string> ref_ids{};
std::vector<std::vector<seqan3::dna5>> ref_seqs{};
Contributor

Why are the sequences not stored in a concatenated set? It would make more sense here than for the ids.

Member Author

Yes, I swap them!

seqan3::alignment_file_input mapping_file{tmp_dir/"mapping.sam",
ref_ids,
ref_seqs,
seqan3::fields<seqan3::field::ID,
Contributor

Please define the fields type before.


auto & ref = get<0>(alignment);
size_t sum_ref{};
- std::ranges::for_each(ref.begin(), ref.end(), [&sum_ref] (auto c) { if (c == gap{}) ++sum_ref; });
+ std::ranges::for_each(ref.begin(), ref.end(), [&sum_ref] (auto c) { if (c == seqan3::gap{}) ++sum_ref; });
Contributor

Use std::ranges::begin/end

Member Author

Then I also use the range based for loop here.

size_t sum_read{};
- std::ranges::for_each(read.begin(), read.end(), [&sum_read] (auto c) { if (c == gap{}) ++sum_read; });
+ std::ranges::for_each(read.begin(), read.end(), [&sum_read] (auto c) { if (c == seqan3::gap{}) ++sum_read; });
Contributor

Please use range based for loop here
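
As a hedged illustration of the range-based variant, the sketch below counts gaps with a plain loop. It uses hypothetical stand-ins rather than seqan3 types: an alignment is modelled as a tuple of two `std::string`s and the character `'-'` plays the role of `seqan3::gap{}`. It also shows the reference and the read being taken from index 0 and index 1 respectively, which is the fix mentioned earlier in this conversation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>

// Hypothetical stand-in: an alignment as (gapped reference, gapped read),
// with '-' representing a gap.
using alignment_t = std::tuple<std::string, std::string>;

// Counts the gap characters in one gapped sequence with a range-based for loop.
std::size_t count_gaps(std::string const & gapped)
{
    std::size_t sum{};
    for (char const c : gapped)
        if (c == '-')
            ++sum;
    return sum;
}
```

Usage would look like `count_gaps(std::get<0>(alignment))` for the reference and `count_gaps(std::get<1>(alignment))` for the read.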

@@ -135,8 +136,8 @@ auto const & get_or_ignore(record<field_types, field_ids> const & r)
}

//!\copydoc seqan3::detail::get_or_ignore
template <size_t i, typename ...types>
auto & get_or_ignore(std::tuple<types...> & t)
template <size_t i, template <tuple_like ...types_> typename tuple_like_t, typename ...types>
Contributor

So this should be a tuple of tuples?

Member Author

No, it is not a tuple of tuples. The function was only valid for std::tuple as an input argument, and I want to extend it to seqan3::common_pair. This is needed when we call, for instance,

fout = seqan3::views::zip(ids, seq_quals);

Then parameter t has type seqan3::common_pair<string, vector<qualified>>.

Instead of writing two additional overloads I generalised it to tuple_like.
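
The generalisation described here can be sketched with the standard library alone. This is an assumption-laden illustration, not the code merged in this pull request: it uses an unconstrained template template parameter and `std::pair` as a stand-in for `seqan3::common_pair`, and it relies on `std::get`, which only works for standard tuple-like types. It requires C++17 for `if constexpr`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Returns a reference to element i of a tuple-like object,
// or a reference to std::ignore if i is out of range.
// tuple_like_t is deduced as std::tuple, std::pair, ...
template <std::size_t i, template <typename...> typename tuple_like_t, typename... types>
auto & get_or_ignore(tuple_like_t<types...> & t)
{
    if constexpr (i < sizeof...(types))
        return std::get<i>(t);
    else
        return std::ignore;
}
```

One function template then serves both `std::tuple<...> &` and `std::pair<...> &` arguments, so no additional overloads are needed for the pair case.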

@@ -152,7 +152,7 @@ namespace seqan3
* The record-based interface treats the file as a range of tuples (the records), but in certain situations
* you might have the data as columns, i.e. a tuple-of-ranges, instead of range-of-tuples.
*
- * You can use column-based writing in that case, it uses operator=() :
+ * You can use column-based writing in that case, it uses operator=() and views::zip():
Contributor

Suggested change
- * You can use column-based writing in that case, it uses operator=() and views::zip():
+ * You can use column-based writing in that case, it uses operator=() and seqan3::views::zip():

@joergi-w joergi-w requested a review from rrahn December 17, 2019 10:55
@joergi-w
Member Author

I rebased on master to solve merge conflicts with the fields renaming (to lowercase).

Contributor

@rrahn rrahn left a comment

One more change 💅 . Thank you.

seqan3::field::ref_id,
seqan3::field::mapq,
seqan3::field::alignment>{}};
auto const fields = seqan3::fields<seqan3::field::id,
Contributor

I would only declare a type here with using, then construct it in place as the constructor argument.
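
A small sketch of that suggestion, with hypothetical stand-ins for the seqan3 types (`field`, `fields`, and `file_input` below are simplified placeholders invented for this example, not the library definitions): the alias names the field selection once, and a temporary of that type is created directly in the constructor call instead of being kept as a named variable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-ins; these are not the seqan3 definitions.
enum class field { id, ref_id, mapq, alignment };

// Tag type that carries a selection of fields in its template arguments.
template <field... fs>
struct fields
{
    static constexpr std::size_t size = sizeof...(fs);
};

// Placeholder for a file type whose constructor takes a fields tag.
struct file_input
{
    std::string path;
    std::size_t field_count;

    template <field... fs>
    file_input(std::string p, fields<fs...>) : path{std::move(p)}, field_count{sizeof...(fs)}
    {}
};

// Declare only the type with using ...
using selected_fields = fields<field::id, field::ref_id, field::mapq, field::alignment>;
```

The object is then constructed in place at the call site, for example `file_input mapping_file{"mapping.sam", selected_fields{}};`.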

@joergi-w joergi-w requested a review from rrahn December 17, 2019 14:35
Contributor

@rrahn rrahn left a comment

lgtm

@rrahn
Contributor

rrahn commented Dec 18, 2019

Can you please squash the last two commits.

@joergi-w
Member Author

Can you please squash the last two commits.

Sure!

@rrahn rrahn merged commit 48e0461 into seqan:master Dec 19, 2019
@joergi-w joergi-w deleted the remove_column_interface branch March 13, 2020 12:44
Successfully merging this pull request may close these issues.

[Sequence IO] Remove file column interface
3 participants