-
Notifications
You must be signed in to change notification settings - Fork 446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
l_qname overflows for QNAMEs of 252 or more characters #520
Labels
Comments
@jmarshall: Thanks for so quickly noticing this issue on the pysam repo! |
jmarshall
added a commit
to jmarshall/htslib
that referenced
this issue
Apr 24, 2017
Since 5d114eb, l_qname includes between 1 and 4 NULs so that the following data is aligned. When QNAME has 252 or more characters, this means that l_qname is 256, which overflows uint8_t. Change l_qname to be a uint16_t. Fixes samtools#520, at the cost of wasting most of the bits previously in unused1. Continue to set l_qname_old, so that code built against previous (e.g., 1.4 release) headers and library continues to work (for length <252 QNAMEs) when using a fixed libhts.so. TODO When the libhts sover is next bumped, l_qname_old can be removed entirely and whether the extra NULs should be counted in l_qname can be revisited.
Closed
PR #523 suggests a fix for this that I believe maximises binary compatibility. If the libhts .so-version were to be bumped, less convoluted fixes become available. |
jmarshall
added a commit
to jmarshall/htslib
that referenced
this issue
Jul 24, 2019
So that CIGAR data is 32-bit aligned, there are extra padding NULs in memory after qname (see 5d114eb). For simplicity in the bam_get_cigar() et al macros, these NULs are counted in both l_qname and l_extranul -- but this means that the maximum value for l_qname is 256 (namely a QNAME of length >= 252, plus one NUL terminator, plus more NULs to get to a multiple of 4). Make l_qname a uint16_t and (to retain the simplicity) leave the l_qname/l_extranul representation as it is. Rearrange the two fields so that the larger field takes over the previous unused1's space, so that sizeof(bam1_core_t) is unchanged. Revert the checks added by 079c7b8, which are no longer needed. In bam_construct_seq(), just trust the length obtained from the CRAM record; in bam_read1(), the l_qname value has come from a 1-byte raw field, so there is no longer any overflow risk; in sam_parse1(), revert the check to check against the SAM spec's exact 254 maximum length (+ 1 for the following TAB). Add test cases with long read names. Fixes samtools#520.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
5d114eb added extra NULs in memory to ensure that CIGAR data is 4-byte aligned. To avoid breaking bad code that does not use the accessor macros, these extra bytes were included in in-memory
l_qname
. For QNAMES of length ≥252 characters, this means thatl_qname
needs to have the value 256, but it is auint8_t
.There are various ways of fixing this, though it's less straightforward to do so while minimising binary compatibility implications.
This is the cause of pysam-developers/pysam#447.
The text was updated successfully, but these errors were encountered: