Skip to content

Commit

Permalink
[FIX] I/O: Add seqan::field::cigar to the default fields.
Browse files Browse the repository at this point in the history
Set for seqan3::alignment_file_input and seqan3::alignment_file_output.
  • Loading branch information
smehringer committed Apr 14, 2020
1 parent 045b11a commit bbb9796
Show file tree
Hide file tree
Showing 8 changed files with 110 additions and 45 deletions.
10 changes: 10 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,16 @@ Note that 3.1.0 will be the first API stable release and interfaces in this rele

## Notable Bug-fixes

### I/O

* The `seqan3::field::cigar` was added to the default fields for reading and writing alignment files.
This has the following impact:
1. Reading and writing in one line is now possible without additional reference information:
`seqan3::alignment_file_output{"foo.sam"} = seqan3::alignment_file_input{"bar.sam"};`
2. The `seqan3::alignment_file_output` now accepts `seqan3::field::cigar` and `seqan3::field::alignment`
although they store redundant information. For the SAM/BAM format this ambiguity is handled by favoring the CIGAR
information at all times if present.

# 3.0.1

Note that 3.1.0 will be the first API stable release and interfaces in this release might still change.
Expand Down
6 changes: 3 additions & 3 deletions doc/tutorial/alignment_file/alignment_file_read_cigar.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -34,9 +34,9 @@ int main()
{
std::filesystem::path tmp_dir = std::filesystem::temp_directory_path(); // get the temp directory

seqan3::alignment_file_input fin{tmp_dir/"my.sam", seqan3::fields<seqan3::field::cigar>{}};
seqan3::alignment_file_input fin{tmp_dir/"my.sam"}; // default fields

for (auto & [cigar_vector] : fin)
seqan3::debug_stream << cigar_vector << '\n';
for (auto & rec : fin)
seqan3::debug_stream << seqan3::get<seqan3::field::cigar>(rec) << '\n'; // access cigar vector
}
//![code]
19 changes: 9 additions & 10 deletions doc/tutorial/alignment_file/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -188,10 +188,16 @@ If you want to read up more about cigar strings,
take a look at the [SAM specifications](https://samtools.github.io/hts-specs/SAMv1.pdf)
or the [SAMtools paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2723002/).

## Completing reference information
## Reading the CIGAR string

By default, the `seqan3::alignment_file_input` will always read the `seqan3::field::cigar` and store it
into a `std::vector<seqan3::cigar>`:

\snippet doc/tutorial/alignment_file/alignment_file_read_cigar.cpp code

## Reading the CIGAR information into an actual alignment

In SeqAn the conversion from a CIGAR string to an alignment (two *aligned_sequences*) is done automatically for you.
You can just pass the alignment object to the alignment file:
In SeqAn, the conversion from a CIGAR string to an alignment (two *aligned_sequences*) is done automatically for you. You can access it by querying `seqan3::field::alignment` from the record:

\snippet doc/tutorial/alignment_file/alignment_file_snippets.cpp main
\snippet doc/tutorial/alignment_file/alignment_file_snippets.cpp alignments_without_ref
Expand Down Expand Up @@ -246,13 +252,6 @@ r004 mapped against 1 with 14 gaps in the read sequence and 0 gaps in the refere

\endsolution

## Reading the CIGAR string

If you are accustomed to the raw CIGAR information, we also provide reading the cigar information into a
`std::vector<seqan3::cigar>` if you specify the `seqan3::field::cigar`.

\snippet doc/tutorial/alignment_file/alignment_file_read_cigar.cpp code

# Writing alignment files

## Writing records
Expand Down
1 change: 1 addition & 0 deletions include/seqan3/io/alignment_file/input.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -339,6 +339,7 @@ template <
field::ref_id,
field::ref_offset,
field::alignment,
field::cigar,
field::mapq,
field::qual,
field::flag,
Expand Down
8 changes: 1 addition & 7 deletions include/seqan3/io/alignment_file/output.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -148,6 +148,7 @@ template <detail::fields_specialisation selected_field_ids_ =
field::ref_id,
field::ref_offset,
field::alignment,
field::cigar,
field::mapq,
field::qual,
field::flag,
Expand Down Expand Up @@ -202,13 +203,6 @@ class alignment_file_output
"please refer to the documentation of "
"seqan3::alignment_file_output::field_ids for the accepted values.");

static_assert([] () constexpr
{
return !(selected_field_ids::contains(field::alignment) &&
selected_field_ids::contains(field::cigar));
}(),
"You may not select field::alignment and field::cigar at the same time.");

/*!\name Range associated types
* \brief Most of the range associated types are `void` for output ranges.
* \{
Expand Down
66 changes: 50 additions & 16 deletions test/unit/io/alignment_file/alignment_file_format_test_template.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -22,19 +22,6 @@ using seqan3::operator""_dna5;
using seqan3::operator""_phred42;
using seqan3::operator""_tag;

using sam_fields = seqan3::fields<seqan3::field::header_ptr,
seqan3::field::id,
seqan3::field::flag,
seqan3::field::ref_id,
seqan3::field::ref_offset,
seqan3::field::mapq,
seqan3::field::alignment,
seqan3::field::offset,
seqan3::field::mate,
seqan3::field::seq,
seqan3::field::qual,
seqan3::field::tags>;

// global variables for reuse
seqan3::alignment_file_input_options<seqan3::dna5> input_options;
seqan3::alignment_file_output_options output_options;
Expand Down Expand Up @@ -392,6 +379,23 @@ TYPED_TEST_P(alignment_file_read, format_error_ref_id_not_in_reference_informati
// alignment_file_write
// ----------------------------------------------------------------------------

// Note that these differ from the alignment_file_output default fields:
// 1. They don't contain field::bit_score and field::evalue since these belong to the BLAST format.
// 2. field::alignment and field::cigar are redundant. Since field::alignment is the more complex one it is chosen here.
// The behaviour if both are given is tested in a separate test.
using sam_fields = seqan3::fields<seqan3::field::header_ptr,
seqan3::field::id,
seqan3::field::flag,
seqan3::field::ref_id,
seqan3::field::ref_offset,
seqan3::field::mapq,
seqan3::field::alignment,
seqan3::field::offset,
seqan3::field::mate,
seqan3::field::seq,
seqan3::field::qual,
seqan3::field::tags>;

template <typename format_type>
struct alignment_file_write : public alignment_file_read<format_type>
{
Expand Down Expand Up @@ -541,11 +545,41 @@ TYPED_TEST_P(alignment_file_write, cigar_vector)
{1, 'M'_cigar_op}, {1, 'I'_cigar_op}, {1, 'D'_cigar_op}, {1, 'M'_cigar_op}, {1, 'S'_cigar_op}}
};

this->tag_dicts[0]["NM"_tag] = 7;
this->tag_dicts[0]["AS"_tag] = 2;
this->tag_dicts[1]["xy"_tag] = std::vector<uint16_t>{3,4,5};

{
this->tag_dicts[0]["NM"_tag] = 7;
this->tag_dicts[0]["AS"_tag] = 2;
this->tag_dicts[1]["xy"_tag] = std::vector<uint16_t>{3,4,5};
seqan3::alignment_file_output fout{this->ostream, TypeParam{}}; // default fields contain CIGAR and alignment
for (size_t i = 0ul; i < 3ul; ++i)
{
ASSERT_NO_THROW(fout.emplace_back(this->seqs[i],
this->ids[i],
this->offsets[i],
this->ref_seq,
0/*ref_id*/,
this->ref_offsets[i],
this->alignments[i],
cigar_v[i],
this->mapqs[i],
this->quals[i],
this->flags[i],
this->mates[i],
this->tag_dicts[i],
0/*evalue*/,
0/*bitscore*/,
&(this->header)));
}
}

this->ostream.flush();
// compare to original input because hard clipping is preserved when writing the cigar vector directly
EXPECT_EQ(this->ostream.str(), this->simple_three_reads_input);

this->ostream = std::ostringstream{}; // clear

// 2. Write only the cigar, not the alignment
{
seqan3::alignment_file_output fout{this->ostream, TypeParam{}, seqan3::fields<seqan3::field::header_ptr,
seqan3::field::id,
seqan3::field::flag,
Expand Down
19 changes: 15 additions & 4 deletions test/unit/io/alignment_file/alignment_file_input_test.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -147,10 +147,21 @@ TEST_F(alignment_file_input_f, construct_from_stream)
TEST_F(alignment_file_input_f, default_template_args_and_deduction_guides)
{
using comp0 = seqan3::alignment_file_input_default_traits<>;
using comp1 = seqan3::fields<seqan3::field::seq, seqan3::field::id, seqan3::field::offset, seqan3::field::ref_seq,
seqan3::field::ref_id, seqan3::field::ref_offset, seqan3::field::alignment,
seqan3::field::mapq, seqan3::field::qual, seqan3::field::flag, seqan3::field::mate,
seqan3::field::tags, seqan3::field::evalue, seqan3::field::bit_score,
using comp1 = seqan3::fields<seqan3::field::seq,
seqan3::field::id,
seqan3::field::offset,
seqan3::field::ref_seq,
seqan3::field::ref_id,
seqan3::field::ref_offset,
seqan3::field::alignment,
seqan3::field::cigar,
seqan3::field::mapq,
seqan3::field::qual,
seqan3::field::flag,
seqan3::field::mate,
seqan3::field::tags,
seqan3::field::evalue,
seqan3::field::bit_score,
seqan3::field::header_ptr>;
using comp2 = seqan3::type_list<seqan3::format_sam, seqan3::format_bam>;
using comp3 = char;
Expand Down
26 changes: 21 additions & 5 deletions test/unit/io/alignment_file/alignment_file_output_test.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -122,6 +122,7 @@ TEST(general, default_template_args_and_deduction_guides)
seqan3::field::ref_id,
seqan3::field::ref_offset,
seqan3::field::alignment,
seqan3::field::cigar,
seqan3::field::mapq,
seqan3::field::qual,
seqan3::field::flag,
Expand Down Expand Up @@ -486,14 +487,29 @@ read2 42 ref 2 62 7M1D1M1S ref 10 300 AGGCTGNAG !##$&'()* xy:B:S,3,4,5
read3 43 ref 3 63 1S1M1D1M1I1M1I1D1M1S ref 10 300 GGAGTATA !!*+,-./
)";

seqan3::alignment_file_input fin{std::istringstream{comp}, ref_ids, ref_seqs, seqan3::format_sam{}};
seqan3::alignment_file_output fout{std::ostringstream{}, seqan3::format_sam{}};
// with reference information
{
seqan3::alignment_file_input fin{std::istringstream{comp}, ref_ids, ref_seqs, seqan3::format_sam{}};
seqan3::alignment_file_output fout{std::ostringstream{}, seqan3::format_sam{}};

fout = fin;
fout = fin;

fout.get_stream().flush();
fout.get_stream().flush();

EXPECT_EQ(reinterpret_cast<std::ostringstream &>(fout.get_stream()).str(), comp);
EXPECT_EQ(reinterpret_cast<std::ostringstream &>(fout.get_stream()).str(), comp);
}

// without reference information
{
seqan3::alignment_file_input fin{std::istringstream{comp}, seqan3::format_sam{}};
seqan3::alignment_file_output fout{std::ostringstream{}, seqan3::format_sam{}};

fout = fin;

fout.get_stream().flush();

EXPECT_EQ(reinterpret_cast<std::ostringstream &>(fout.get_stream()).str(), comp);
}
}

TEST(rows, assign_alignment_file_pipes)
Expand Down

0 comments on commit bbb9796

Please sign in to comment.