You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I've noticed a possible issue with psyam with BAM file. I noticed for alignments that did not have query sequences stored (e.g., "*", according to the BAM standard, query_sequence is not required) that query_alignment_end (qend) and query_alignment_length (qlen) return an exception:
SystemError: error return without exception set
It seems that soft clipping of 1 (e.g., 124=1S) with an alignment without a query sequence will cause this exception. Changing the cigar string to have soft clipping of 2 (e.g., 123=2S), results in -2 being returned for qend.
I think what is happening here is that since the query sequence is not present, the length = 0. cdef uint32_t end_offset = src.core.l_qseq
Then later when end offset is modified for soft clipping: elif op == BAM_CSOFT_CLIP: end_offset -= cigar_p[k] >> BAM_CIGAR_SHIFT
end_offset is being set to -1, returning the exception. The result of -2 for qend when soft clipping is 2, seems to confirm this.
The behavior I expected was for pysam to calculate the qend using the cigar string when the query sequence is not present.
Here's a representative portion of the sam file that caused be to discover the issue. NODE_666_length_125_cov_612.438_ID_49376^I0^INC_001141_chr_09^I433573^I60^I1S4=1X6=1X10=1X5=1I44=1X50=^I*^I0^I0^IGGAAACGGGTTTCATAAGGCACCCCGGCACCCCTATAATTGGCATTCCCACATTCTGCGCATACGAATACACATAAGTGCCATAAAACAATACTCCACATACGAAGTCTATGAAAGATGTATGAT^I*^INM:i:5^IAS:i:108^IXB:f:200.558^IXE:Z:5.35e-52$ NODE_666_length_125_cov_612.438_ID_49376^I272^INC_001141_chr_09^I9389^I60^I13=1X36=1X26=1X23=1X8=3X6=1X3=2S^I*^I0^I0^I*^I*^INM:i:8^IAS:i:99^IXB:f:183.939^IXE:Z:5.38e-47$ NODE_666_length_125_cov_612.438_ID_49376^I272^INC_001139_chr_07^I6886^I60^I39=1X10=1X41=1I8=1X10=1X6=1X4=1S^I*^I0^I0^I*^I*^INM:i:6^IAS:i:105^IXB:f:195.019^IXE:Z:2.49e-50$ NODE_666_length_125_cov_612.438_ID_49376^I256^INC_001135_chr_03^I307444^I60^I2S3=1X6=3X59=1X30=2X18=^I*^I0^I0^I*^I*^INM:i:7^IAS:i:102^IXB:f:189.479^IXE:Z:1.16e-48$ NODE_666_length_125_cov_612.438_ID_49376^I272^INC_001133_chr_01^I2400^I60^I6=1X11=2X15=1X11=1X53=1X10=1X6=1X4=1S^I*^I0^I0^I*^I*^INM:i:8^IAS:i:100^IXB:f:185.785^IXE:Z:1.5e-47$ NODE_666_length_125_cov_612.438_ID_49376^I272^INC_001144_chr_12^I13680^I60^I3=1X14=2X30=1X59=3X6=1X3=2S^I*^I0^I0^I*^I*^INM:i:8^IAS:i:99^IXB:f:183.939^IXE:Z:5.38e-47$ NODE_666_length_125_cov_612.438_ID_49376^I256^INC_001144_chr_12^I1062563^I60^I2S3=1X6=3X4=1X9=1X9=1X34=1X50=^I*^I0^I0^I*^I*^INM:i:8^IAS:i:99^IXB:f:183.939^IXE:Z:5.38e-47$ NODE_666_length_125_cov_612.438_ID_49376^I272^INC_001142_chr_10^I9372^I60^I13=1X36=1X26=1X23=1X8=3X6=1X3=2S^I*^I0^I0^I*^I*^INM:i:8^IAS:i:99^IXB:f:183.939^IXE:Z:5.38e-47$ NODE_666_length_125_cov_612.438_ID_49376^I272^INC_001143_chr_11^I2415^I60^I35=1X14=1X17=1X8=1X14=1I8=1X10=1X6=1X4=1S^I*^I0^I0^I*^I*^INM:i:8^IAS:i:99^IXB:f:183.939^IXE:Z:5.38e-47$ NODE_666_length_125_cov_612.438_ID_49376^I272^INC_001134_chr_02^I8203^I60^I18=2X30=1X34=1X18=2D6=3X6=1X3=2S^I*^I0^I0^I*^I*^INM:i:10^IAS:i:94^IXB:f:174.705^IXE:Z:3.24e-44$ NODE_666_length_125_cov_612.438_ID_49376^I272^INC_001140_chr_08^I12530^I60^I2S3=1X7=1I5=1X1=1X6=1X1=1X1=1D10=1X3=1D2X3=1I3=1X29=1X14=1X1=1X9=1X6=1X5=^I*^I0^I0^I*^I*^INM:i:18^IAS:i:69^IXB:f:128.539^IXE:Z:2.56e-30$
The text was updated successfully, but these errors were encountered:
Hello !
I've noticed a possible issue with psyam with BAM file. I noticed for alignments that did not have query sequences stored (e.g., "*", according to the BAM standard, query_sequence is not required) that
query_alignment_end
(qend
) andquery_alignment_length
(qlen
) return an exception:SystemError: error return without exception set
It seems that soft clipping of 1 (e.g., 124=1S) with an alignment without a query sequence will cause this exception. Changing the cigar string to have soft clipping of 2 (e.g., 123=2S), results in -2 being returned for
qend
.In https://github.com/pysam-developers/pysam/blob/master/pysam/calignedsegment.pyx:
The function definition specifies an exception is raised when -1 is returned.
cdef inline int32_t getQueryEnd(bam1_t *src) except -1:
I think what is happening here is that since the query sequence is not present, the length = 0.
cdef uint32_t end_offset = src.core.l_qseq
Then later when end offset is modified for soft clipping:
elif op == BAM_CSOFT_CLIP: end_offset -= cigar_p[k] >> BAM_CIGAR_SHIFT
end_offset is being set to -1, returning the exception. The result of -2 for
qend
when soft clipping is 2, seems to confirm this.The behavior I expected was for pysam to calculate the
qend
using the cigar string when the query sequence is not present.Here's a representative portion of the sam file that caused be to discover the issue.
NODE_666_length_125_cov_612.438_ID_49376^I0^INC_001141_chr_09^I433573^I60^I1S4=1X6=1X10=1X5=1I44=1X50=^I*^I0^I0^IGGAAACGGGTTTCATAAGGCACCCCGGCACCCCTATAATTGGCATTCCCACATTCTGCGCATACGAATACACATAAGTGCCATAAAACAATACTCCACATACGAAGTCTATGAAAGATGTATGAT^I*^INM:i:5^IAS:i:108^IXB:f:200.558^IXE:Z:5.35e-52$ NODE_666_length_125_cov_612.438_ID_49376^I272^INC_001141_chr_09^I9389^I60^I13=1X36=1X26=1X23=1X8=3X6=1X3=2S^I*^I0^I0^I*^I*^INM:i:8^IAS:i:99^IXB:f:183.939^IXE:Z:5.38e-47$ NODE_666_length_125_cov_612.438_ID_49376^I272^INC_001139_chr_07^I6886^I60^I39=1X10=1X41=1I8=1X10=1X6=1X4=1S^I*^I0^I0^I*^I*^INM:i:6^IAS:i:105^IXB:f:195.019^IXE:Z:2.49e-50$ NODE_666_length_125_cov_612.438_ID_49376^I256^INC_001135_chr_03^I307444^I60^I2S3=1X6=3X59=1X30=2X18=^I*^I0^I0^I*^I*^INM:i:7^IAS:i:102^IXB:f:189.479^IXE:Z:1.16e-48$ NODE_666_length_125_cov_612.438_ID_49376^I272^INC_001133_chr_01^I2400^I60^I6=1X11=2X15=1X11=1X53=1X10=1X6=1X4=1S^I*^I0^I0^I*^I*^INM:i:8^IAS:i:100^IXB:f:185.785^IXE:Z:1.5e-47$ NODE_666_length_125_cov_612.438_ID_49376^I272^INC_001144_chr_12^I13680^I60^I3=1X14=2X30=1X59=3X6=1X3=2S^I*^I0^I0^I*^I*^INM:i:8^IAS:i:99^IXB:f:183.939^IXE:Z:5.38e-47$ NODE_666_length_125_cov_612.438_ID_49376^I256^INC_001144_chr_12^I1062563^I60^I2S3=1X6=3X4=1X9=1X9=1X34=1X50=^I*^I0^I0^I*^I*^INM:i:8^IAS:i:99^IXB:f:183.939^IXE:Z:5.38e-47$ NODE_666_length_125_cov_612.438_ID_49376^I272^INC_001142_chr_10^I9372^I60^I13=1X36=1X26=1X23=1X8=3X6=1X3=2S^I*^I0^I0^I*^I*^INM:i:8^IAS:i:99^IXB:f:183.939^IXE:Z:5.38e-47$ NODE_666_length_125_cov_612.438_ID_49376^I272^INC_001143_chr_11^I2415^I60^I35=1X14=1X17=1X8=1X14=1I8=1X10=1X6=1X4=1S^I*^I0^I0^I*^I*^INM:i:8^IAS:i:99^IXB:f:183.939^IXE:Z:5.38e-47$ NODE_666_length_125_cov_612.438_ID_49376^I272^INC_001134_chr_02^I8203^I60^I18=2X30=1X34=1X18=2D6=3X6=1X3=2S^I*^I0^I0^I*^I*^INM:i:10^IAS:i:94^IXB:f:174.705^IXE:Z:3.24e-44$ NODE_666_length_125_cov_612.438_ID_49376^I272^INC_001140_chr_08^I12530^I60^I2S3=1X7=1I5=1X1=1X6=1X1=1X1=1D10=1X3=1D2X3=1I3=1X29=1X14=1X1=1X9=1X6=1X5=^I*^I0^I0^I*^I*^INM:i:18^IAS:i:69^IXB:f:128.539^IXE:Z:2.56e-30$
The text was updated successfully, but these errors were encountered: