Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

l_qname overflows for QNAMEs of 252 or more characters #520

Closed
jmarshall opened this issue Apr 22, 2017 · 2 comments · Fixed by #900
Closed

l_qname overflows for QNAMEs of 252 or more characters #520

jmarshall opened this issue Apr 22, 2017 · 2 comments · Fixed by #900
Labels

Comments

@jmarshall
Copy link
Member

5d114eb added extra NULs in memory to ensure that CIGAR data is 4-byte aligned. To avoid breaking bad code that does not use the accessor macros, these extra bytes were included in in-memory l_qname. For QNAMES of length ≥252 characters, this means that l_qname needs to have the value 256, but it is a uint8_t.

There are various ways of fixing this, though it's less straightforward to do so while minimising binary compatibility implications.

This is the cause of pysam-developers/pysam#447.

@jmarshall jmarshall added the bug label Apr 22, 2017
@dpryan79
Copy link
Contributor

@jmarshall: Thanks for so quickly noticing this issue on the pysam repo!

jmarshall added a commit to jmarshall/htslib that referenced this issue Apr 24, 2017
Since 5d114eb, l_qname includes between
1 and 4 NULs so that the following data is aligned.  When QNAME has 252 or
more characters, this means that l_qname is 256, which overflows uint8_t.

Change l_qname to be a uint16_t.  Fixes samtools#520, at the cost of wasting most
of the bits previously in unused1.  Continue to set l_qname_old, so that
code built against previous (e.g., 1.4 release) headers and library
continues to work (for length <252 QNAMEs) when using a fixed libhts.so.
TODO When the libhts sover is next bumped, l_qname_old can be removed
entirely and whether the extra NULs should be counted in l_qname can
be revisited.
@jmarshall
Copy link
Member Author

PR #523 suggests a fix for this that I believe maximises binary compatibility. If the libhts .so-version were to be bumped, less convoluted fixes become available.

jmarshall added a commit to jmarshall/htslib that referenced this issue Jul 24, 2019
So that CIGAR data is 32-bit aligned, there are extra padding NULs
in memory after qname (see 5d114eb).
For simplicity in the bam_get_cigar() et al macros, these NULs are
counted in both l_qname and l_extranul -- but this means that the
maximum value for l_qname is 256 (namely a QNAME of length >= 252,
plus one NUL terminator, plus more NULs to get to a multiple of 4).

Make l_qname a uint16_t and (to retain the simplicity) leave the
l_qname/l_extranul representation as it is. Rearrange the two fields
so that the larger field takes over the previous unused1's space, so
that sizeof(bam1_core_t) is unchanged.

Revert the checks added by 079c7b8,
which are no longer needed. In bam_construct_seq(), just trust the
length obtained from the CRAM record; in bam_read1(), the l_qname
value has come from a 1-byte raw field, so there is no longer any
overflow risk; in sam_parse1(), revert the check to check against
the SAM spec's exact 254 maximum length (+ 1 for the following TAB).

Add test cases with long read names.

Fixes samtools#520.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants