Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird traceback/cigar #22

Closed
ksahlin opened this issue May 23, 2018 · 3 comments
Closed

Weird traceback/cigar #22

ksahlin opened this issue May 23, 2018 · 3 comments

Comments

@ksahlin
Copy link

ksahlin commented May 23, 2018

Hi, I'm observing strange cigars on some of my alignments. They usually have a long deletion at the end and matches the very last base pair. I've attached one of the examples below and have a handful more if you want additional cases.

>>> import parasail
>>> parasail.__version__
'1.1.11'
>>> user_matrix = parasail.matrix_create("ACGT", 2, -3)
>>> result = parasail.nw_trace_scan_16(s1, s2, 10, 1, user_matrix)
>>> s2 = "AGTTCGTCAGTTTATGATCTTCACACAATCTAGTGAAAAAACAGCAGTAAGTTGTCTTTCTCAAAATGACTGGAAGTTAGATGTTGCAACAGATAATTTTTCCAAAATCCTGAACTTTATATACGAGAGAGTGTAAAAGGATCATTGGACAGGAAGAAGTTAGAACAGCTGTACAATAGATACAAAGACCCTCAAGATGAGAATAAAATTGGAATAGATGGCATACAGCAGTTCTGTGATGACCTGGCACTCGATCCAGCCAGCATTAGTGTGTTGATTATTGCATGGAAGTTCAGAGCAGCAACACAGTGCGGTTTCTCCAAACAGGAGTTCATGGATGGCATGACAGAATTAGGATGTGACAGCATAGAAAAACTAAAGGCCCAGATACCCAAGATGGAACAAGAATTGAAGAACCAGGACGATTTAAGGATTTTACCAGTTTACTTTTAATTTTGCAAAGAATCCAGGACAAAAAGGATTAGATCTAGAAATGGCCATTGCCTACTGGAACTTAGTGCTTAATGGAAGATTTAAATTCTTAGACTTATGGAATAAATTTTTGTTGGAACATCATAAACGATCAATACCAAAAGACACTTGGAATCTTCTTTTAGACTTCAGTACGATGATTGCAGATGACATGTCTAATTATGATGAAGAAGGAGCATGGCCTGTTCTTATTGATGACTTTGTGGAATTTGCACGCCCTCAAATTGCTGGGACAAAAAGTACAACAGTGTAGCACTAAAGGAACCTTCTAGAATGTACATAGTCTGTACAATAAATACAACAGAAAATTGCACAGTCAATTTCTGCTGGCTGGACTGAACTGAAGATCAATCCTCACAATTCAGACTGAGGGTTGAGACAAAACTTTAAGGATACATCTTGGACCATATCGTATTTCATTCTTCTAATGGTGGTTTGGGCTTGTCTTCTAGTCTGGGCCGCTCTAAACATTTATAATTCCAACATTGTGGATTTCATCTTATATCTGTGGACCATCCTAGTTTATTCTCCCATAAGTCTTAGAAGCTTTATGGTGATTATTTTGAGGTTTTCATTCTCGCATAAAGCACAATGCTGTCTTCATCAGAAAACAGTTGGCATAAGAATTAAACATATG"
>>> s1 = "AGCTGTTACTGCAAGTCCATGCCTCTACCATGAGGGTGGAGGAGTTAAGATCAACAGATCCACATGTACCTTGAGGTGACAGACTGGCTCTGAACAAGTTGAAATCATCGCAGAAGGATAAAGTTCGTCAGTTTATGATCTTCACACAATCTAGTGAAAAAACAGCAGTAAGTTGTCTTTCTCAAAATGACTGGAAGTTAGATGTTGCAACAGATAATTTTTTCCAAAATCCTGAACTTTATATACGAGAGAGTGTAAAAGGATCATTGGACAGGAAGAAGTTAGAACAGCTGTACAATAGATACAAAGACCCTCAAGATGAGAATAAAATTGGAATAGATGGCATACAGCAGTTCTGTGATGACCTGGCACTCGATCCAGCCAGCATTAGTGTGTTGATTATTGCATGGAAGTTCAGAGCAGCAACACAGTGCGAGTTCTCCAAACAGGAGTTCATGGATGGCATGACAGAATTAGGATGTGACAGCATAGAAAAACTAAAGGCCCAGATACCCAAGATGGAACAAGAATTGAAAGAACCAGGACGATTTAAGGATTTTTACCAGTTTACTTTTAATTTTGCAAAGAATCCAGGACAAAAAGGATTAGATCTAGAAATGGCCATTGCCTACTGGAACTTAGTGCTTAATGGAAGATTTAAATTCTTAGACTTATGGAATAAATTTTTGTTGGAACATCATAAACGATCAATACCAAAAGACACTTGGAATCTTCTTTTAGACTTCAGTACGATGATTGCAGATGACATGTCTAATTATGATGAAGAAGGAGCATGGCCTGTTCTTATTGATGACTTTGTGGAATTTGCACGCCCTCAAATTGCTGGGACAAAAAGTACAACAGTGTAGCACTAAAGGAACCTTCTAGAATGTACATAGTCTGTACAATAAATACAACAGAAAATTGCACAG"
>>> result = parasail.nw_trace_scan_16(s1, s2, 10, 1, user_matrix)
>>> cigar_string = str(result.cigar.decode).decode('utf-8')
>>> cigar_string
u'121I96=1I217=2X96=1I22=1I374=321D1='

I would not expect the last 321D stretch as s1 and s2 matches perfectly in the ends. Here is the result:

AGCTGTTACTGCAAGTCCATGCCTCTACCATGAGGGTGGAGGAGTTAAGATCAACAGATCCACATGTACCTTGAGGTGACAGACTGGCTCTGAACAAGTTGAAATCATCGCAGAAGGATAAAGTTCGTCAGTTTATGATCTTCACACAATCTAGTGAAAAAACAGCAGTAAGTTGTCTTTCTCAAAATGACTGGAAGTTAGATGTTGCAACAGATAATTTTTTCCAAAATCCTGAACTTTATATACGAGAGAGTGTAAAAGGATCATTGGACAGGAAGAAGTTAGAACAGCTGTACAATAGATACAAAGACCCTCAAGATGAGAATAAAATTGGAATAGATGGCATACAGCAGTTCTGTGATGACCTGGCACTCGATCCAGCCAGCATTAGTGTGTTGATTATTGCATGGAAGTTCAGAGCAGCAACACAGTGCGAG-TTCTCCAAACAGGAGTTCATGGATGGCATGACAGAATTAGGATGTGACAGCATAGAAAAACTAAAGGCCCAGATACCCAAGATGGAACAAGAATTGAAAGAACCAGGACGATTTAAGGATTTTTACCAGTTTACTTTTAATTTTGCAAAGAATCCAGGACAAAAAGGATTAGATCTAGAAATGGCCATTGCCTACTGGAACTTAGTGCTTAATGGAAGATTTAAATTCTTAGACTTATGGAATAAATTTTTGTTGGAACATCATAAACGATCAATACCAAAAGACACTTGGAATCTTCTTTTAGACTTCAGTACGATGATTGCAGATGACATGTCTAATTATGATGAAGAAGGAGCATGGCCTGTTCTTATTGATGACTTTGTGGAATTTGCACGCCCTCAAATTGCTGGGACAAAAAGTACAACAGTGTAGCACTAAAGGAACCTTCTAGAATGTACATAGTCTGTACAATAAATACAACAGAAAATTGCACA---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------G
-------------------------------------------------------------------------------------------------------------------------AGTTCGTCAGTTTATGATCTTCACACAATCTAGTGAAAAAACAGCAGTAAGTTGTCTTTCTCAAAATGACTGGAAGTTAGATGTTGCAACAGATAA-TTTTTCCAAAATCCTGAACTTTATATACGAGAGAGTGTAAAAGGATCATTGGACAGGAAGAAGTTAGAACAGCTGTACAATAGATACAAAGACCCTCAAGATGAGAATAAAATTGGAATAGATGGCATACAGCAGTTCTGTGATGACCTGGCACTCGATCCAGCCAGCATTAGTGTGTTGATTATTGCATGGAAGTTCAGAGCAGCAACACAGTGCG-GTTTCTCCAAACAGGAGTTCATGGATGGCATGACAGAATTAGGATGTGACAGCATAGAAAAACTAAAGGCCCAGATACCCAAGATGGAACAAGAATTG-AAGAACCAGGACGATTTAAGGA-TTTTACCAGTTTACTTTTAATTTTGCAAAGAATCCAGGACAAAAAGGATTAGATCTAGAAATGGCCATTGCCTACTGGAACTTAGTGCTTAATGGAAGATTTAAATTCTTAGACTTATGGAATAAATTTTTGTTGGAACATCATAAACGATCAATACCAAAAGACACTTGGAATCTTCTTTTAGACTTCAGTACGATGATTGCAGATGACATGTCTAATTATGATGAAGAAGGAGCATGGCCTGTTCTTATTGATGACTTTGTGGAATTTGCACGCCCTCAAATTGCTGGGACAAAAAGTACAACAGTGTAGCACTAAAGGAACCTTCTAGAATGTACATAGTCTGTACAATAAATACAACAGAAAATTGCACAGTCAATTTCTGCTGGCTGGACTGAACTGAAGATCAATCCTCACAATTCAGACTGAGGGTTGAGACAAAACTTTAAGGATACATCTTGGACCATATCGTATTTCATTCTTCTAATGGTGGTTTGGGCTTGTCTTCTAGTCTGGGCCGCTCTAAACATTTATAATTCCAACATTGTGGATTTCATCTTATATCTGTGGACCATCCTAGTTTATTCTCCCATAAGTCTTAGAAGCTTTATGGTGATTATTTTGAGGTTTTCATTCTCGCATAAAGCACAATGCTGTCTTCATCAGAAAACAGTTGGCATAAGAATTAAACATATG

Any idea what causes this? Thanks for your time!

@ksahlin
Copy link
Author

ksahlin commented May 23, 2018

To help troubleshooting, I can add that when I simply reverse the strings, i.e., aligning s1[::-1] to s2[::-1], the one base pair match now has the correct placement, but the other end instead has the problem. That is, the problem seems to happen in the end of the cigar.

>>> result = parasail.nw_trace_scan_16(s1[::-1], s2[::-1], 10, 1, user_matrix)
>>> cigar_string = str(result.cigar.decode, 'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: str() takes at most 1 argument (2 given)
>>> cigar_string = str(result.cigar.decode).decode('utf-8')
>>> cigar_string
u'321D371=1I24=1I98=2X212=1I99=121I2='

@ksahlin
Copy link
Author

ksahlin commented May 24, 2018

oh, my bad! I just realized that I need to use semi-global to get the results that I want..

@ksahlin ksahlin closed this as completed May 24, 2018
@jeffdaily
Copy link
Owner

jeffdaily commented May 24, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants