[FIX] not accepting BAM with empty header
eseiler committed Apr 20, 2021
1 parent 50af89f commit 3007ff5
Showing 3 changed files with 33 additions and 1 deletion.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -100,6 +100,7 @@ If possible, provide tooling that performs the changes, e.g. a shell-script.
 * Requesting the alignment without also requesting the sequence for BAM files containing empty CIGAR strings does now
   not result in erroneous parsing ([\#2418](https://github.com/seqan/seqan3/pull/2418)).
 * BAM files with 64 references are now parsed correctly ([\#2423](https://github.com/seqan/seqan3/pull/2423)).
+* BAM files not containing a plain text header are now accepted ([\#2536](https://github.com/seqan/seqan3/pull/2536)).
 * Writing `gz`-compressed output no longer results in `bgzf`-compressed output. This change may have following effects
   ([\#2458](https://github.com/seqan/seqan3/pull/2458)):
   * A noticeable slowdown when writing `gz`-compressed content since, in contrast to `bgzf`, `gz` does not feature
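In BAM the plain text header is optional. The SAM/BAM specification lays out the binary header as the magic `BAM\1`, a little-endian `int32_t l_text`, `l_text` bytes of SAM header text, an `int32_t n_ref`, and then `n_ref` reference entries of (`l_name`, NUL-terminated name, `l_ref`). An `l_text` of 0 is therefore legal, and reference names and lengths are still available from the binary block. The following is a minimal standalone sketch of reading that layout, assuming uncompressed (already BGZF-decoded) input; `bam_header`, `read_int32` and `read_bam_header` are hypothetical names, not seqan3 API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified header representation (not seqan3's sam_file_header).
struct bam_header
{
    std::string text;                                             // plain text SAM header, may be empty
    std::vector<std::pair<std::string, std::int32_t>> references; // (name, length) from the binary block
};

// Reads a little-endian int32_t byte by byte, independent of host endianness.
std::int32_t read_int32(std::istream & in)
{
    unsigned char bytes[4];
    if (!in.read(reinterpret_cast<char *>(bytes), 4))
        throw std::runtime_error{"Unexpected end of BAM header."};
    std::uint32_t const value = std::uint32_t{bytes[0]} | (std::uint32_t{bytes[1]} << 8) |
                                (std::uint32_t{bytes[2]} << 16) | (std::uint32_t{bytes[3]} << 24);
    return static_cast<std::int32_t>(value);
}

bam_header read_bam_header(std::istream & in)
{
    char magic[4];
    if (!in.read(magic, 4) || std::memcmp(magic, "BAM\1", 4) != 0)
        throw std::runtime_error{"File is not in BAM format."};

    bam_header header;

    std::int32_t const l_text = read_int32(in);
    if (l_text > 0) // the text header is optional: l_text == 0 must be accepted
    {
        header.text.resize(static_cast<std::size_t>(l_text));
        if (!in.read(header.text.data(), l_text))
            throw std::runtime_error{"Unexpected end of BAM header text."};
    }

    std::int32_t const n_ref = read_int32(in);
    for (std::int32_t i = 0; i < n_ref; ++i)
    {
        std::int32_t const l_name = read_int32(in); // length including the trailing '\0'
        if (l_name <= 0)
            throw std::runtime_error{"Invalid reference name length."};

        std::string name(static_cast<std::size_t>(l_name), '\0');
        if (!in.read(name.data(), l_name))
            throw std::runtime_error{"Unexpected end of BAM header."};
        name.pop_back(); // drop the trailing '\0'

        std::int32_t const l_ref = read_int32(in);
        header.references.emplace_back(std::move(name), l_ref);
    }
    return header;
}
```

Fed the first 24 bytes of the `issue1201` test input below, this sketch should yield an empty `text` and a single reference `ref` of length 1904 (`'\x70', '\x07'`).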
13 changes: 12 additions & 1 deletion include/seqan3/io/sam_file/format_bam.hpp
@@ -336,7 +336,9 @@ inline void format_bam::read_alignment_record(stream_type & stream,
     int32_t tmp32{};
     read_field(stream_view, tmp32);
 
-    if (tmp32 > 0) // header text is present
+    bool const header_present{tmp32 > 0};
+
+    if (header_present) // header text is present
         read_header(stream_view | views::take_exactly_or_throw(tmp32), header, ref_seqs);
 
     int32_t n_ref;
@@ -352,6 +354,15 @@ inline void format_bam::read_alignment_record(stream_type & stream,
 
         read_field(stream_view, tmp32); // l_ref (length of reference sequence)
 
+        // If there was no header text, we parse reference sequences block as header information
+        if (!header_present)
+        {
+            header.ref_ids().push_back(string_buffer);
+            header.ref_id_info.emplace_back(tmp32, "");
+            header.ref_dict[(header.ref_ids())[(header.ref_ids()).size() - 1]] = (header.ref_ids()).size() - 1;
+            continue;
+        }
+
         auto id_it = header.ref_dict.find(string_buffer);
 
         // sanity checks of reference information to existing header object:
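When `l_text` is 0 there are no `@SQ` lines to check the binary reference block against, so the commit registers each binary entry directly as header information. The two code paths can be sketched with a simplified, hypothetical `reference_table` (not seqan3's `sam_file_header`): without a text header, each entry is appended and indexed; with one, the entry is looked up and compared. The length comparison in the second path is an assumption about what the existing sanity checks do, since the diff cuts off right after their opening comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a header's reference bookkeeping (not seqan3's type).
struct reference_table
{
    std::vector<std::string> ids;                           // reference names in BAM order
    std::vector<std::int32_t> lengths;                      // l_ref of each reference
    std::unordered_map<std::string, std::size_t> index_of;  // name -> position in ids
};

// Processes one entry of the binary reference block.
// Without a text header, the entry itself defines the reference;
// otherwise it must match what the text header declared (assumed checks).
void process_reference_entry(reference_table & table,
                             std::string const & name,
                             std::int32_t l_ref,
                             bool header_present)
{
    if (!header_present)
    {
        std::size_t const idx = table.ids.size(); // index the new entry will receive
        table.ids.push_back(name);
        table.lengths.push_back(l_ref);
        table.index_of[name] = idx;
        return;
    }

    auto it = table.index_of.find(name);
    if (it == table.index_of.end())
        throw std::runtime_error{"Unknown reference name '" + name + "' in BAM reference block."};
    if (table.lengths[it->second] != l_ref)
        throw std::runtime_error{"Length of reference '" + name + "' differs from the text header."};
}
```

Taking the new index from `ids.size()` before `push_back` gives the same value as the commit's `(header.ref_ids()).size() - 1` after the push, without indexing back into the vector to fetch the name again.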
20 changes: 20 additions & 0 deletions test/unit/io/sam_file/format_bam_test.cpp
@@ -666,3 +666,23 @@ TEST_F(bam_format, issue2417)
 
     EXPECT_EQ(num_records, 1u);
 }
+
+// https://github.com/seqan/seqan3/issues/1201
+TEST_F(bam_format, issue1201)
+{
+    // A BAM file with an l_text of 0
+    std::string const input{
+        // read1 117 ref 1 0 * = 1 0 ACGTA IIIII
+        '\x42', '\x41', '\x4d', '\x01', '\x00', '\x00', '\x00', '\x00', '\x01', '\x00', '\x00', '\x00', '\x04', '\x00',
+        '\x00', '\x00', '\x72', '\x65', '\x66', '\x00', '\x70', '\x07', '\x00', '\x00', '\x2e', '\x00', '\x00', '\x00',
+        '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x06', '\x00', '\x49', '\x12', '\x00', '\x00',
+        '\x75', '\x00', '\x05', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00', '\x00',
+        '\x00', '\x00', '\x00', '\x00', '\x72', '\x65', '\x61', '\x64', '\x31', '\x00', '\x12', '\x48', '\x10', '\x28',
+        '\x28', '\x28', '\x28', '\x28'
+    };
+
+    std::istringstream stream{input};
+    seqan3::sam_file_input fin{stream, seqan3::format_bam{}};
+
+    fin.header().format_version;
+}
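The byte literal in `issue1201` encodes the SAM record from its comment, `read1 117 ref 1 0 * = 1 0 ACGTA IIIII`, following the BAM record layout of the SAM/BAM specification. A few fields are not obvious from the bytes alone. The sketch below reproduces them with the specification's `reg2bin` function, its 4-bit base encoding (`=ACMGRSVTWYHKDBN`, two bases per byte, high nibble first) and raw Phred qualities (SAM character minus 33). `pack_sequence` and `to_bam_qualities` are hypothetical helper names, not seqan3 API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// reg2bin as given in the SAM/BAM specification:
// bin of the 0-based, half-open interval [beg, end).
int reg2bin(int beg, int end)
{
    --end;
    if (beg >> 14 == end >> 14) return ((1 << 15) - 1) / 7 + (beg >> 14);
    if (beg >> 17 == end >> 17) return ((1 << 12) - 1) / 7 + (beg >> 17);
    if (beg >> 20 == end >> 20) return ((1 << 9) - 1) / 7 + (beg >> 20);
    if (beg >> 23 == end >> 23) return ((1 << 6) - 1) / 7 + (beg >> 23);
    if (beg >> 26 == end >> 26) return ((1 << 3) - 1) / 7 + (beg >> 26);
    return 0;
}

// Packs bases into 4-bit codes, two per byte, high nibble first;
// an odd-length sequence leaves the last low nibble as 0.
std::vector<std::uint8_t> pack_sequence(std::string_view seq)
{
    constexpr std::string_view alphabet{"=ACMGRSVTWYHKDBN"};
    std::vector<std::uint8_t> packed((seq.size() + 1) / 2, 0);
    for (std::size_t i = 0; i < seq.size(); ++i)
    {
        std::size_t code = alphabet.find(seq[i]);
        if (code == std::string_view::npos)
            code = 15; // map unknown characters to N
        packed[i / 2] |= static_cast<std::uint8_t>(i % 2 == 0 ? code << 4 : code);
    }
    return packed;
}

// Converts a SAM quality string (Phred+33) to the raw Phred values stored in BAM.
std::vector<std::uint8_t> to_bam_qualities(std::string_view qual)
{
    std::vector<std::uint8_t> raw;
    raw.reserve(qual.size());
    for (char c : qual)
        raw.push_back(static_cast<std::uint8_t>(c - 33));
    return raw;
}
```

Under these rules, the `bin` bytes `'\x49', '\x12'` read as 4681 = `reg2bin(0, 1)`, which matches the specification's convention of binning an unmapped but placed read as if it had length 1. The bytes `'\x12', '\x48', '\x10'` decode to `ACGTA`, and each `'\x28'` is quality 40, printed as `I` in SAM.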
