Alignment coordinates out of range #2

Closed
kingralph80 opened this issue May 2, 2020 · 7 comments

Comments

@kingralph80

kingralph80 commented May 2, 2020

Hi,

We continued using GSAlign, but when converting the maf output to axt or psl we got the following error:
Coordinates out of range line 3034523 of B73v5.CML322.50.100.full.maf

The alignment in question was:

a score=111
s ref.scaf_395 59559 111 + 59635 ttttcataaaaaatgggggttgtgtggccatttatcatcgactagaggctcataaacctcaccccacatatgtttccacattcttggatttctggtggagaccatttcttg
s qry.scaf_63 310118 111 + 315365 ttttcataaaaaatgggggttgtgtggccatttatcatcgactagaggctcataaacctcaccccacatatgtttccacattcttggatttctggtggagaccatttcttg

( s ref.scaf_395 59559 111 + 59635 ) does indeed seem out of range, as 59559 + 111 = 59670 is larger than 59635. Is it possible that this is a bug in choosing the coordinates to print?
We used the latest commit, GenAlign v1.0.16.
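
For anyone who wants to locate such blocks before running a converter, the range check described above can be sketched in a few lines of Python. This is only an illustration, not the check used by any particular conversion tool: it assumes the standard MAF `s` line layout (`s src start size strand srcSize text`), and the names `MafSeqLine`, `parse_s_line`, `overshoot`, and `find_out_of_range` are made up for this example.

```python
from typing import Iterable, Iterator, NamedTuple


class MafSeqLine(NamedTuple):
    """Fields of one MAF 's' line (hypothetical helper type)."""
    src: str
    start: int
    size: int
    strand: str
    src_size: int
    text: str


def parse_s_line(line: str) -> MafSeqLine:
    """Parse one MAF 's' line: s src start size strand srcSize text."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "s":
        raise ValueError(f"not a MAF 's' line: {line!r}")
    _, src, start, size, strand, src_size, text = fields
    return MafSeqLine(src, int(start), int(size), strand, int(src_size), text)


def overshoot(rec: MafSeqLine) -> int:
    """Number of bases by which start + size exceeds srcSize (0 if in range)."""
    return max(0, rec.start + rec.size - rec.src_size)


def find_out_of_range(lines: Iterable[str]) -> Iterator[tuple[int, MafSeqLine, int]]:
    """Yield (1-based line number, record, overshoot) for each out-of-range 's' line."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("s "):
            rec = parse_s_line(line)
            extra = overshoot(rec)
            if extra > 0:
                yield lineno, rec, extra
```

Passing an open file handle for the MAF file to `find_out_of_range` would list every offending line. For the `ref.scaf_395` line quoted above, `overshoot` gives 35, while the `qry.scaf_63` line is in range.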

Cheers.

@kingralph80
Author

kingralph80 commented May 2, 2020

In case it helps, here is a second alignment that caused the same error:

a score=274
s ref.scaf_62 133360 274 + 133633 tattattgaaaatggtcgctcatggctattttcaaggtcgctcatggctattttcataaaaaatgggggttgtgtggccatttatcatcgactagaggctcataaacctcaccccacatatgtttccttgccatagattacattcttggatttctggtggaaaccatttcttggttaaaaactcgtacgtgttagccttcggtattattgaaaatggtcattcatggctattttcggcaaaatgggggttgtgtggccattgatcgtcgaccaa
s qry.scaf_462 1559 274 + 67495 tattattgaaaatggtcgctcatggctattttcaaggtcgctcatggctattttcataaaaaatgggggttgtgtggccatttatcatcgactagaggctcataaacctcaccccacatatgtttccttgccatagattacattcttggatttctggtggaaaccatttcttggttaaaaactcgtacgtgttagccttcggtattattgaaaatggtcattcatggctattttcggcaaaatgggggttgtgtggccattgatcgtcgaccaa

Here it seems to be out of range by 1 bp, since 133360 + 274 = 133634 is larger than 133633. (GenAlign v1.0.16)

@hsinnan75
Owner

Thank you for reporting the problematic cases. Could you show me where I can download the two genome sequences? It'd be better if I could have the two genomes for debugging. Thank you!

@kingralph80
Author

I sent them to you via email. Let me know if you did not get it or if the download from Google Drive gives an error.

@kingralph80
Author

kingralph80 commented May 2, 2020 via email

@hsinnan75
Owner

hsinnan75 commented May 2, 2020

Thank you for the test data. I've found and fixed a bug. Please update GSAlign to 1.0.18. The bug occurred because matching strings could mistakenly span multiple reference sequences. Thank you for letting me know about this bug.
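
To illustrate how a match spanning two reference sequences can produce the out-of-range coordinates reported above, here is a rough Python sketch. It is not GSAlign's code: it assumes, purely for illustration, that the reference sequences are indexed as one concatenated string, and `build_offsets`, `locate`, and `spans_boundary` are hypothetical helper names.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(lengths):
    """Start offset of each sequence in the concatenation, followed by the total length."""
    return [0, *accumulate(lengths)]


def locate(offsets, pos):
    """Map a position in the concatenation to (sequence index, local position)."""
    if not 0 <= pos < offsets[-1]:
        raise ValueError("position outside the concatenated sequence")
    idx = bisect_right(offsets, pos) - 1
    return idx, pos - offsets[idx]


def spans_boundary(offsets, pos, length):
    """True if the interval [pos, pos + length) runs past the end of its sequence."""
    idx, local = locate(offsets, pos)
    seq_len = offsets[idx + 1] - offsets[idx]
    return local + length > seq_len


# The neighbouring lengths (100 and 200) are invented; 59635 is the reported
# size of ref.scaf_395.
offsets = build_offsets([100, 59635, 200])
# A 111-base match starting at local position 59559 of the middle sequence
# runs into the next sequence.
crosses = spans_boundary(offsets, 100 + 59559, 111)
```

A match for which `spans_boundary` is true would need to be clipped or discarded before its coordinates are written, otherwise `start + size` ends up larger than the sequence length, as in the two reported alignments.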

@hsinnan75
Owner

Hi, GSAlign was previously designed to perform one-on-one alignment; that is, it only aligned a query sequence to the most similar reference sequence. In the updated version (1.0.18), I removed this strategy and let GSAlign align a query sequence to all locally similar sequences. However, this takes much longer if the two genomes have many duplicons (repetitive sequences). Thus, in the latest version (1.0.19), I added an option (-one) to let the user decide which alignment mode GSAlign performs. If "-one" is set, GSAlign performs one-on-one alignment; otherwise, it performs all-against-all alignment.

@kingralph80
Author

Thanks a lot! The new output from 1.0.19 did not cause any errors when converting it to chain. The addition of the new alignment mode is also very much appreciated. The alignment step did not take much longer, maybe a few extra minutes, but the mapping length increased by ~25-30%.

When I now compare the uplift of variants between GSAlign and Progressive Cactus, both lifted almost the same number of variants! This is great, as Cactus is very good but took us almost 2 weeks of run time.

There are still some bugs with the VCF output, but I will open a new thread for those, as this problem has been fixed and the issue can be closed.
