Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange alignment (and seemingly incorrect score) for ONT cDNA alignment #1257

Closed
rob-p opened this issue Nov 12, 2024 · 11 comments
Closed

Strange alignment (and seemingly incorrect score) for ONT cDNA alignment #1257

rob-p opened this issue Nov 12, 2024 · 11 comments

Comments

@rob-p
Copy link
Contributor

rob-p commented Nov 12, 2024

Hi Heng,

I was trying to track down a number of small differences I encounter between standard alignment using minimap2 and using the minimap2-rs crate in our oarfish tool. My working assumption is that command-line minimap2 is the "ground truth", and so any differences are strange (since minimap2-rs just calls to minimap2 as a library).

Anyway, I found a minimal reproducible example, and when I investigated, much to my surprise, I can't even really understand the minimap2 output itself. I'm aligning against a transcriptome reference (which can be obtained here). A file containing the read is here.

I map with the following command:

minimap2 --eqx -N 100 -ax map-ont assembled_txps.fa read.fq | samtools view -b -o read.bam

then, if I look at the resulting bam file, I get:

6cccc3e4-b1b3-4da4-9b62-b663c91f85cc    16      ENST00000450931 1941    31      2798S6=1X2=2I6=1I1=1X4=1X1=2X10=5D1=1D2=1X18=1I17=2I1=2X10=4D5=1X3=1X1=1X6=1I2=3
D2=3D2=1X1=3D3=2I2=1I1=1I6=2D1X4=1X2=9D1=1X5=2I11=2D8=1D4=2I2X3=3I1=1X2=1X2=1I2=2I2=3X2=1I3=1X4=9I2=1X1=3D2=2X1=2I3=1X2=1X1=2I1=1X1=1X2=2X1=1X2=2I4=1X1=2I3=4D5=
1D1=1X3=1X1=1I7=1I1=1X1=1X2=86I6=1X1=1D1X1=1X10=2D4=1D7=1D1X3=11D3=9D6=2X1=2X4=1D6=1D1X2=1X1=1I3=1X4=8D1=2X3=1D1=1D1=1X1=1D3=1D3=3D1X5=3X10=8D1=1D4=1X12=3X11=35
D2=1X6=3D2=2X18=4D9=3D39=1D5=4D1=2X4=1I3=1D21=1X1=1X1=2X1=1X2=1D1=1X9=1D7=1I5=1X2=1X2=1X3=5D5=1D11=2D6=2X4=1D5=1I2=2D7=2I1=5D9=2D4=2X1=1D8=11D3=3D13=7D5=2D10=1I
2=1X2=2D12=33S  *       0       0       AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGACGACGATATCGACCGCCATTGGTCGACGATAACAAGAACGGATTGGTTGACATCGAAGACCCAGAT
TGTTTGATGGCATTACGACACTTTCTCAGAAACCTTCACATCTTGTCAAGGTAACCAACTGTTTGATTATTTAATTCATCGAAAGCGTTTACCTAACAACGTAAAGACGAACAAAAAGATCCAAGATCTCCATGAGCTTTGCGGATCGAACCTGATTGTC
TTTGTGGCCATAATTTGGCAGGCGATCGATCACAGATGAGCCTGTCGTTCGATCACAGCACGACGAAGAATTTAAGCATCTGACATCATTACACCGAAATGTCGGTTGAATTTACTGTTTATCCCGATACCTGATGTTGACAATTTAATGACCAGATTAA
AAGCTGCGACATTACAAAATTGCTGAATCGAGCTTTAATTGTTTTGAAAAACTGATGACGACTTGTTATGGATCCGACACCCAGCACAATTCCATGGTAACAAACAGCTCGATTGACTGTTCTTATAAACAAGAGAAAATGTGAAGCGATCAGTTGAACT
ATCTGAAGACGTTCAATGTTGACATCATCGATTCCAAATGATATGTATTAATGATGACCCGAAAACATCGATTGAATCACCATTAGACTTTGTTATCACAGGAATTTAACATTGACAAAAACACTACAACGACATCGAGAACTGATCGAATAGTAAAACT
TGTTTGTTCCGTTAAGCACACCCAGTCAGAATGGCAATAAAGACCAGATGAAATGAGGAAGAAGCAGGTCTTGAAAACTCGGAATTTCGCGACGAAATACGATTGAAAATTCCTTATGTCTGAAAAGTTTGCGAATATCGACCATCCAATCCTGAACAAA
TGAACGAATGAAGATTCATAACAAAATCAAATCGACACAATCTGAAAGTTATGATTCCACTTAACTGTTCCATGTCAGCGTTTGTTCAAGATCATCGTCTTGACGCAATGACTACAATTGAAGAAAACTGGCTACTTGAAGATTCAGAACGATACACAAC
GAAGAGAGCCATTGAACAGGGAGTCACAAATGTCACTTTGCAAGATGGCAGAAGATTAAAACGACCAGCGGCGATGACGAACGGCTATCAAGTTGAACCCCACGACCAACTAACTTGATGAACCTTCAGTCCAACCGCTGAGTCACAACGACAGATGAAC
GAACAATTTTAAGAAACAAACAAAGTTGAAGGATGAATTTTGCTTGATCCACACCGGTTTGACGTATTGACACCGATTATCCAAAGAGTTCTGTTATTTGAAGACTTTGAATCATAACTTCTTGCAGACCACTGAAAGCTCAATTCGCAGAGTCGACGGA
TCCCATTATTTGAATCGACAGGATCATCCATCTGTCTGATCCGTCTACAGCGCTGGAATGAGTTTTGACCACTGAAATTAATGAACAATCGAAGACCTGTTCTATCTTGGCGCAGACGCAAAAGAAATTTTATTGAACAGAAATGCAGTTGAAGTTTATG
AGATAAGCAGGTTCATCCAGACAATTACCCATTTATCAAAAGACATCCAAGACATTATCTACAGAATGAAGCACCAGTTTGAAATTCGAAGAGGTGGTTCATGGTCGCCCATTTTTTGACCGATCAACGATCTTCCAGAACTACATTGATTGAACATATT
TGATTATTACCATTGAATGGTTCCATCATAAACAGGCGAGTCTGACGAATTATCTTTATTGAGGTCACATGAATGATTCACTGACTTTATTGAAAAGTTCCACCGCATTAGAAAACGTAACAACGAAGAGCGACACGTAACCGAACGAAGAAA      "
""##""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""#""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""###"""#""""""##""#""""#"""###""######"###"#######""""#"#""##""""##"""""""""""""""
""#""""""""##""""#"""""""##""""""""""""""""""""""""""""""""""#"""#####"##########""#####"###""########""#""####""##"##""########"""#"#"#""##""""#"#"""#"""""""#"
""""#""""##"""""""#"""#"""#"""""""""##""""""""""#""""""""#"##""#""""##"""#"#""##""""#####$$""##""##""####"#################""##"#$#""###"######$$###"""#####$#$$
#""####"#"#######$$$#""######"##""###"##$$#####"###""##""#$$#"##""##$$#""###$$#"###""#####""###"##$#"##""""####"####"""##"######$#$##$$$$$###$##"##############"
####"###"""""#"#""""#"#"""""#"""""""""""""""""""""#"""""""#""""""""""""""#""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
""""$#""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""##""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""#"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""#"""""""""""
"""""""""""""""""""""""""""""""""""""""$"""""""""""""""""""""#$#++%$$#$"#$&$&&')$''(+$#("(,#%###,&*#+-*)(%##%&%*'/()%(#%$$&#)(*('"%&/*,--**&&))%&'"'+0+'&'$$&(--
*2(.-)(%%#"#'#%&/$&%$'$$&,*'($$"(""&'&$$$%$""$),&&)'"%'"""##"$-/+-&(--&.189/999*3)+(-)'(/*&#)*&"+.*%"#%""#$$(#%"#"$$,-(-/*####""(%*$"""""""$#"#%"%$&#%%$##&&'"""
#"#$"('$$*(&$$"""""""%$"#*#*##%&"#""))+&721.+**)%$#'%"$%'&"+%%%%"''*$&%"##"#"%$%%&""""###""##$))-&%"&%*(##%"#/''$%%"#$(,)'#''#'%"&""%"$&&'$#&"')*%"%'$(""""#&&%.
523*(.441($$.2.76543*+%#%&&)62,')*'*%&%+''(&)&&)#%+%(%$#$('#&+#""""#""""&#%$%)+%&%&%$*9-()%$&%)72,#%"#""""#%$%&'$$&)#""&%%"$%%#'$#$#)*("""$"&""#"#$%#$$')&'+'$'&
##"#"&$&&%#,+(+(('$$$%)+"%"%'$%'$&/(*%&#%$%$(&%37-&%$#"#"""'&,'+,'&)#&))&(#+56:/43($%"%(#"""""""""#%$#$&#%&'$"###%&(,%+--('%$&(,&$#*"$""+$(')'''&&$#)+$5+'')$%.*
')$#"##"$#(&$,,-021'()%$$).(&&$$().%'&&'&0,(*'$##''"""$#"$$"""""""$("$'$&$$"/*-%.341*')&$('()($)$%%&&#&&#($'"%*'+-&"(5)'&+"'$'+%'(*$*,$+*(-0'2=4*"#%+0"""#"%"&/"
#%$%-+""""""%#)&((&%##'""""")&+*,('"""%#%$%$%&*+(--*+)%()'"""""$"%+"%*$"&&*.)*##""#%&+""##"'%$$%&40+*$$""$.'+0+&%"$"$&"#%,&+*%#""&()'(*%$&-/,0/,)&*-$%,./.*,1&#"
#")-.*&##$-2--*/&08-*+-)''""'+$'&"$$$#&$#"#"%,-,('(##&++%-+#$"#%&*&'$"'*#1---(&'$"$#%$%##())-#"$""'$"""""$("$""$""#""'""$*#*/*21(*0(:;1)1:/%'/+%##"###$*&('("#&(
""%&$&'&$&"%#%#&&(&&&"%#(&&"$"(##$%$#((&'$$""""")#%#"$$&,&$$$%'#%'&#(%$#&%&)"$%#&'+&,%%"%'&%%((&"""#"(""##"$##"$""""$'&'*3,2913*2"""$"$$""""#('&%#'%)+**11((('((
&)*%#$"&'$&$"$&%$$"""""*(")$#*'(((-+%'"&")'%""'"#$$$%%$&"$"#&#'$#$#%$"$""'(*&#"$%%&/1,'$$##"$$,'$$#'22*,*&#"$%(*)%''""#&""""""""$%-$"%.01-,,%&(%%,()%"""#(('')*'
&&%(-&%%(%'%,/+/0&',(+),%"('-%#)'*%()+$#%%$"$"%*)##$($*('$&%&*'*...-(')-..+$$&%%"$%$$$%"%'%$(-+''-,&"&$#$%###$%(,#&#"#)("#""%"",,*)(*%'(&%&)()"%&&(",-*"#%&#&#''
%$"#"#$+')+/%*)%$$'"$$(&'&"$#$&'"&$$$(.+()'####"%*&(.-)%%,%+,#%$"'&)#%'*0*&#"',$/-("$$"$$#%"""""$(+(&++&((%#""""        NM:i:402        ms:i:271        AS:i:-19
nn:i:0  tp:A:P  cm:i:14 s1:i:122        s2:i:59 de:f:0.2115     SA:Z:ENST00000447232,1428,-,2509S91M21D1033S,0,21;      rl:i:0
6cccc3e4-b1b3-4da4-9b62-b663c91f85cc    2064    ENST00000447232 1428    0       2509H16=5D24=2D16=4D5=10D30=1033H       *       0       0       ATAGTAAAACTTGTTT
GTTCCGTTAAGCACACCCAGTCAGAATGGCAATAAAGACCAGATGAAATGAGGAAGAAGCAGGTCTTGAAAACTC     $"###%&(,%+--('%$&(,&$#*"$""+$(')'''&&$#)+$5+'')$%.*')$#"##"$#(&$,,-021'()%$$).(
&&$$().%'&&     NM:i:21 ms:i:146        AS:i:124        nn:i:0  tp:A:P  cm:i:7  s1:i:68 s2:i:127        de:f:0.0421     zd:i:1  SA:Z:ENST00000450931,1941,-,2798
S802M55D33S,31,402;     rl:i:0
6cccc3e4-b1b3-4da4-9b62-b663c91f85cc    272     ENST00000418224 1770    0       2509S16=5D24=2D16=4D5=10D30=1033S       *       0       0       *       *      N
M:i:21  ms:i:146        AS:i:124        nn:i:0  tp:A:S  cm:i:13 s1:i:124        de:f:0.0421     rl:i:0
6cccc3e4-b1b3-4da4-9b62-b663c91f85cc    272     ENST00000409593 713     0       2509S16=5D24=2D16=4D5=10D30=1033S       *       0       0       *       *      N
M:i:21  ms:i:146        AS:i:124        nn:i:0  tp:A:S  cm:i:7  s1:i:68 de:f:0.0421     zd:i:1  rl:i:0

While there may be several strange things, it's the first alignment that most confuses me (the one marked as primary)

6cccc3e4-b1b3-4da4-9b62-b663c91f85cc    16      ENST00000450931 1941    31      2798S6=1X2=2I6=1I1=1X4=1X1=2X10=5D1=1D2=1X18=1I17=2I1=2X10=4D5=1X3=1X1=1X6=1I2=3
D2=3D2=1X1=3D3=2I2=1I1=1I6=2D1X4=1X2=9D1=1X5=2I11=2D8=1D4=2I2X3=3I1=1X2=1X2=1I2=2I2=3X2=1I3=1X4=9I2=1X1=3D2=2X1=2I3=1X2=1X1=2I1=1X1=1X2=2X1=1X2=2I4=1X1=2I3=4D5=
1D1=1X3=1X1=1I7=1I1=1X1=1X2=86I6=1X1=1D1X1=1X10=2D4=1D7=1D1X3=11D3=9D6=2X1=2X4=1D6=1D1X2=1X1=1I3=1X4=8D1=2X3=1D1=1D1=1X1=1D3=1D3=3D1X5=3X10=8D1=1D4=1X12=3X11=35
D2=1X6=3D2=2X18=4D9=3D39=1D5=4D1=2X4=1I3=1D21=1X1=1X1=2X1=1X2=1D1=1X9=1D7=1I5=1X2=1X2=1X3=5D5=1D11=2D6=2X4=1D5=1I2=2D7=2I1=5D9=2D4=2X1=1D8=11D3=3D13=7D5=2D10=1I
2=1X2=2D12=33S  *       0       0       AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGACGACGATATCGACCGCCATTGGTCGACGATAACAAGAACGGATTGGTTGACATCGAAGACCCAGAT
TGTTTGATGGCATTACGACACTTTCTCAGAAACCTTCACATCTTGTCAAGGTAACCAACTGTTTGATTATTTAATTCATCGAAAGCGTTTACCTAACAACGTAAAGACGAACAAAAAGATCCAAGATCTCCATGAGCTTTGCGGATCGAACCTGATTGTC
TTTGTGGCCATAATTTGGCAGGCGATCGATCACAGATGAGCCTGTCGTTCGATCACAGCACGACGAAGAATTTAAGCATCTGACATCATTACACCGAAATGTCGGTTGAATTTACTGTTTATCCCGATACCTGATGTTGACAATTTAATGACCAGATTAA
AAGCTGCGACATTACAAAATTGCTGAATCGAGCTTTAATTGTTTTGAAAAACTGATGACGACTTGTTATGGATCCGACACCCAGCACAATTCCATGGTAACAAACAGCTCGATTGACTGTTCTTATAAACAAGAGAAAATGTGAAGCGATCAGTTGAACT
ATCTGAAGACGTTCAATGTTGACATCATCGATTCCAAATGATATGTATTAATGATGACCCGAAAACATCGATTGAATCACCATTAGACTTTGTTATCACAGGAATTTAACATTGACAAAAACACTACAACGACATCGAGAACTGATCGAATAGTAAAACT
TGTTTGTTCCGTTAAGCACACCCAGTCAGAATGGCAATAAAGACCAGATGAAATGAGGAAGAAGCAGGTCTTGAAAACTCGGAATTTCGCGACGAAATACGATTGAAAATTCCTTATGTCTGAAAAGTTTGCGAATATCGACCATCCAATCCTGAACAAA
TGAACGAATGAAGATTCATAACAAAATCAAATCGACACAATCTGAAAGTTATGATTCCACTTAACTGTTCCATGTCAGCGTTTGTTCAAGATCATCGTCTTGACGCAATGACTACAATTGAAGAAAACTGGCTACTTGAAGATTCAGAACGATACACAAC
GAAGAGAGCCATTGAACAGGGAGTCACAAATGTCACTTTGCAAGATGGCAGAAGATTAAAACGACCAGCGGCGATGACGAACGGCTATCAAGTTGAACCCCACGACCAACTAACTTGATGAACCTTCAGTCCAACCGCTGAGTCACAACGACAGATGAAC
GAACAATTTTAAGAAACAAACAAAGTTGAAGGATGAATTTTGCTTGATCCACACCGGTTTGACGTATTGACACCGATTATCCAAAGAGTTCTGTTATTTGAAGACTTTGAATCATAACTTCTTGCAGACCACTGAAAGCTCAATTCGCAGAGTCGACGGA
TCCCATTATTTGAATCGACAGGATCATCCATCTGTCTGATCCGTCTACAGCGCTGGAATGAGTTTTGACCACTGAAATTAATGAACAATCGAAGACCTGTTCTATCTTGGCGCAGACGCAAAAGAAATTTTATTGAACAGAAATGCAGTTGAAGTTTATG
AGATAAGCAGGTTCATCCAGACAATTACCCATTTATCAAAAGACATCCAAGACATTATCTACAGAATGAAGCACCAGTTTGAAATTCGAAGAGGTGGTTCATGGTCGCCCATTTTTTGACCGATCAACGATCTTCCAGAACTACATTGATTGAACATATT
TGATTATTACCATTGAATGGTTCCATCATAAACAGGCGAGTCTGACGAATTATCTTTATTGAGGTCACATGAATGATTCACTGACTTTATTGAAAAGTTCCACCGCATTAGAAAACGTAACAACGAAGAGCGACACGTAACCGAACGAAGAAA      "
""##""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""#""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""###"""#""""""##""#""""#"""###""######"###"#######""""#"#""##""""##"""""""""""""""
""#""""""""##""""#"""""""##""""""""""""""""""""""""""""""""""#"""#####"##########""#####"###""########""#""####""##"##""########"""#"#"#""##""""#"#"""#"""""""#"
""""#""""##"""""""#"""#"""#"""""""""##""""""""""#""""""""#"##""#""""##"""#"#""##""""#####$$""##""##""####"#################""##"#$#""###"######$$###"""#####$#$$
#""####"#"#######$$$#""######"##""###"##$$#####"###""##""#$$#"##""##$$#""###$$#"###""#####""###"##$#"##""""####"####"""##"######$#$##$$$$$###$##"##############"
####"###"""""#"#""""#"#"""""#"""""""""""""""""""""#"""""""#""""""""""""""#""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
""""$#""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""##""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""#"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""#"""""""""""
"""""""""""""""""""""""""""""""""""""""$"""""""""""""""""""""#$#++%$$#$"#$&$&&')$''(+$#("(,#%###,&*#+-*)(%##%&%*'/()%(#%$$&#)(*('"%&/*,--**&&))%&'"'+0+'&'$$&(--
*2(.-)(%%#"#'#%&/$&%$'$$&,*'($$"(""&'&$$$%$""$),&&)'"%'"""##"$-/+-&(--&.189/999*3)+(-)'(/*&#)*&"+.*%"#%""#$$(#%"#"$$,-(-/*####""(%*$"""""""$#"#%"%$&#%%$##&&'"""
#"#$"('$$*(&$$"""""""%$"#*#*##%&"#""))+&721.+**)%$#'%"$%'&"+%%%%"''*$&%"##"#"%$%%&""""###""##$))-&%"&%*(##%"#/''$%%"#$(,)'#''#'%"&""%"$&&'$#&"')*%"%'$(""""#&&%.
523*(.441($$.2.76543*+%#%&&)62,')*'*%&%+''(&)&&)#%+%(%$#$('#&+#""""#""""&#%$%)+%&%&%$*9-()%$&%)72,#%"#""""#%$%&'$$&)#""&%%"$%%#'$#$#)*("""$"&""#"#$%#$$')&'+'$'&
##"#"&$&&%#,+(+(('$$$%)+"%"%'$%'$&/(*%&#%$%$(&%37-&%$#"#"""'&,'+,'&)#&))&(#+56:/43($%"%(#"""""""""#%$#$&#%&'$"###%&(,%+--('%$&(,&$#*"$""+$(')'''&&$#)+$5+'')$%.*
')$#"##"$#(&$,,-021'()%$$).(&&$$().%'&&'&0,(*'$##''"""$#"$$"""""""$("$'$&$$"/*-%.341*')&$('()($)$%%&&#&&#($'"%*'+-&"(5)'&+"'$'+%'(*$*,$+*(-0'2=4*"#%+0"""#"%"&/"
#%$%-+""""""%#)&((&%##'""""")&+*,('"""%#%$%$%&*+(--*+)%()'"""""$"%+"%*$"&&*.)*##""#%&+""##"'%$$%&40+*$$""$.'+0+&%"$"$&"#%,&+*%#""&()'(*%$&-/,0/,)&*-$%,./.*,1&#"
#")-.*&##$-2--*/&08-*+-)''""'+$'&"$$$#&$#"#"%,-,('(##&++%-+#$"#%&*&'$"'*#1---(&'$"$#%$%##())-#"$""'$"""""$("$""$""#""'""$*#*/*21(*0(:;1)1:/%'/+%##"###$*&('("#&(
""%&$&'&$&"%#%#&&(&&&"%#(&&"$"(##$%$#((&'$$""""")#%#"$$&,&$$$%'#%'&#(%$#&%&)"$%#&'+&,%%"%'&%%((&"""#"(""##"$##"$""""$'&'*3,2913*2"""$"$$""""#('&%#'%)+**11((('((
&)*%#$"&'$&$"$&%$$"""""*(")$#*'(((-+%'"&")'%""'"#$$$%%$&"$"#&#'$#$#%$"$""'(*&#"$%%&/1,'$$##"$$,'$$#'22*,*&#"$%(*)%''""#&""""""""$%-$"%.01-,,%&(%%,()%"""#(('')*'
&&%(-&%%(%'%,/+/0&',(+),%"('-%#)'*%()+$#%%$"$"%*)##$($*('$&%&*'*...-(')-..+$$&%%"$%$$$%"%'%$(-+''-,&"&$#$%###$%(,#&#"#)("#""%"",,*)(*%'(&%&)()"%&&(",-*"#%&#&#''
%$"#"#$+')+/%*)%$$'"$$(&'&"$#$&'"&$$$(.+()'####"%*&(.-)%%,%+,#%$"'&)#%'*0*&#"',$/-("$$"$$#%"""""$(+(&++&((%#""""        NM:i:402        ms:i:271        AS:i:-19
nn:i:0  tp:A:P  cm:i:14 s1:i:122        s2:i:59 de:f:0.2115     SA:Z:ENST00000447232,1428,-,2509S91M21D1033S,0,21;      rl:i:0

The alignment score is -19. In minimap2-rs this shows up as u32::MAX - 18 because the score is treated as an unsigned integer when it is returned (which obviously is not right). However, I don't see how the reported score of -19 makes sense here. If I process the CIGAR string itself, I get a much higher score.

Any thoughts about why the reported alignment score is what it is, or how to fix this? Seemingly, an alignment with a score of -19 shouldn't even be reported in this context.

Thanks!
Rob

@lh3
Copy link
Owner

lh3 commented Nov 12, 2024

If the sequences between two anchors can't be aligned well, the score can be negative. This is expected.

@rob-p
Copy link
Contributor Author

rob-p commented Nov 12, 2024

Thanks Heng,

One more related query if it's not too much trouble. I can't seem to get to -19 using the CIGAR string, using the default match, mismatch, and gap penalties listed in the manpage. Is the gap cost gap open + (gap len * gap extend) (which seems to underestimate the score) or gap open + ((gap len - 1) * gap extend) (which seems to overestimate the score)?

Thanks,
Rob

@lh3
Copy link
Owner

lh3 commented Nov 12, 2024

It is open+len*ext. Nonetheless, the longest gap length is 86. This will trigger the dual gap penalty.

@rob-p
Copy link
Contributor Author

rob-p commented Nov 12, 2024

Thank you, @lh3! I can confirm that I, in fact, get the appropriate score (-19) when properly accounting for the dual gap penalty.

While I realize that you're not in anyway affiliated with minimap2-rs, I noticed that mappy also fails to return 2 of the same mappings as command line minimap2 for this read.

Specifically, mappy gives me the following output:

33      835     -       ENST00000450931 2817    1940    2797    589     991     31      tp:A:P  ts:A:.  cg:Z:9M2I6M1I20M5D1M1D21M1I17M2I13M4D18M1I2M3D2M3D4M3D3M2I2M1I1M1I6M2D8M9D7M2I11M2D8M1D4M2I5M3I7M1I2M2I7M1I8M9I4M3D5M2I8M2I12M2I6M2I3M4D5M1D7M1I7M1I6M86I8M1D13M2D4M1D7M1D4M11D3M9D15M1D
6M1D5M1I8M8D6M1D1M1D3M1D3M1D3M3D19M8D1M1D31M35D9M3D22M4D9M3D39M1D5M4D7M1I3M1D31M1D11M1D7M1I15M5D5M1D11M2D12M1D5M1I2M2D7M2I1M5D9M2D7M1D8M11D3M3D13M7D5M2D10M1I5M2D12M
1033    1124    -       ENST00000418224 3018    1769    1881    91      112     0       tp:A:P  ts:A:.  cg:Z:16M5D24M2D16M4D5M10D30M

Do you have any idea why mappy may be missing the two additional (non-primary) alignments returned by minimap2?

Edit: Additionally, I observe the same behavior (only 2 mappings) if I map the read using minimap2-lite. So, I presume this must have to do with some particular option being set in "real" minimap2, but I have no idea what it might be.

@lh3
Copy link
Owner

lh3 commented Nov 12, 2024

The other two alignments are secondary.

@rob-p
Copy link
Contributor Author

rob-p commented Nov 13, 2024

Hi Heng,

But is there a reason they would not be reported? Is there a way I can get them (via mappy, or minimap2-lite, or via minimap2 as a library more generally)? I am not interested in supplementary alignments in this case (though I am getting one), but do want secondary alignments. In mappy and minimap2-rs I am requesting up to 100 alignments for each query.

@lh3
Copy link
Owner

lh3 commented Nov 13, 2024

What is your mappy script?

@lh3
Copy link
Owner

lh3 commented Nov 13, 2024

Note you added -N100 on the minimap2 command line. Have you applied the same setting to mappy?

@rob-p
Copy link
Contributor Author

rob-p commented Nov 13, 2024

Hi Heng,

My mappy script is very minimal:

In [71]: import mappy

In [72]: a = mappy.Aligner('assembled_txps.mmi', preset='ont', best_n=100)

In [73]: r = list(mappy.fastx_read('read.fq'))[0]

In [74]: m = list(a.map(r[1], MD=True))

In [75]: for aln in m:
    ...:     print(aln)
    ...:
33      835     -       ENST00000450931 2817    1940    2797    589     991     31      tp:A:P  ts:A:.  cg:Z:9M2I6M1I20M5D1M1D21M1I17M2I13M4D18M1I2M3D2M3D4M3D3M
2I2M1I1M1I6M2D8M9D7M2I11M2D8M1D4M2I5M3I7M1I2M2I7M1I8M9I4M3D5M2I8M2I12M2I6M2I3M4D5M1D7M1I7M1I6M86I8M1D13M2D4M1D7M1D4M11D3M9D15M1D6M1D5M1I8M8D6M1D1M1D3M1D3M1D3M3D
19M8D1M1D31M35D9M3D22M4D9M3D39M1D5M4D7M1I3M1D31M1D11M1D7M1I15M5D5M1D11M2D12M1D5M1I2M2D7M2I1M5D9M2D7M1D8M11D3M3D13M7D5M2D10M1I5M2D12M
1033    1124    -       ENST00000418224 3018    1769    1881    91      112     0       tp:A:P  ts:A:.  cg:Z:16M5D24M2D16M4D5M10D30M

Where, above the mmi is the minimap2 index built on the same reference I shared with the parameter -x ont. I realize it's therefore redundant to set the ont preset when I load this index, but the above output is identical either way.

@rob-p
Copy link
Contributor Author

rob-p commented Nov 14, 2024

Hi @lh3,

Thanks for your help on this. I debugged it a bit more and I discovered that the differences here are due to the fact that both mappy and minimap2-lite don't allow passing the name of the read into the mapping function. Since the read name is used to break ties, this can actually affect subsequent processing and even which / if certain mappings are reported.

This accounts for the issue I was observing. In order to allow matching the output of command-line minimap2 if the user desires, I've made some modifications to mappy to allow (optionally) passing the read name, and propagating it to the underlying mapping function. That change is present in this PR if you're open to it. I'm happy to discuss that here, or, if you'd prefer, feel free to close this issue and make any comments on that PR directly.

Thanks!
Rob

@lh3
Copy link
Owner

lh3 commented Nov 15, 2024

Closing as #1260 has been merged.

@lh3 lh3 closed this as completed Nov 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants