sometimes quick merge does not merge contigs? #67

Open
zmz1988 opened this issue Oct 21, 2021 · 3 comments

Comments

zmz1988 commented Oct 21, 2021

Dear developers, thanks a lot for writing this nice tool! I use quickmerge frequently to merge my assemblies from PacBio and Nanopore. Most of the time I see a big improvement in NG values, but sometimes quickmerge doesn't seem to merge the query contigs, even though no errors were reported and all output files were generated.

For example, I used a Nanopore assembly (NG50 ~ 10M) as the reference to merge contigs from PacBio assemblies (NG50 ~ 4M). In the failed cases, the resulting merged assembly has the same NG50 value (or differs by only a few kb) and the same number of contigs as the query assembly, even when the parameter -l was set to the N50 value of the reference (Nanopore). In these cases, if I lower the -l value significantly, say to 2M, the contiguity of the resulting assembly improves. But I'm hesitant to use the merged assemblies generated with a lower -l value...
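For anyone who wants to run the same check, here is a minimal sketch of how contig count and N50/NG50 can be computed from a FASTA file. It is not part of quickmerge; the function names `nx0` and `read_contig_lengths` are made up for illustration, and the genome size needed for NG50 is something you have to supply yourself.

```python
def nx0(lengths, fraction=0.5, genome_size=None):
    """Return N50 of the contig lengths, or NG50 when genome_size is given.

    genome_size is an assumed/estimated value supplied by the caller.
    """
    total = genome_size if genome_size is not None else sum(lengths)
    target = total * fraction
    running = 0
    # Walk contigs from longest to shortest until the target is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return 0  # the assembly covers less than the target fraction


def read_contig_lengths(fasta_path):
    """Collect one length per record from a (possibly multi-line) FASTA file."""
    lengths = []
    current = 0
    seen_header = False
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                if seen_header:
                    lengths.append(current)
                seen_header = True
                current = 0
            else:
                current += len(line.strip())
    if seen_header:
        lengths.append(current)
    return lengths
```

Running `read_contig_lengths` on the query assembly and on the merged output, then comparing `len(...)` and `nx0(..., genome_size=...)` for both, shows quickly whether any contigs were actually joined: identical contig counts and near-identical NG50 values point to the "no merge" case described above.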

I have merged around 8 genomes, and three of them failed in this way. I don't know where the problem could be, as the contigs seem to align well between the Nanopore and PacBio assemblies when I align them with MUMmer outside of quickmerge. Could you please give me some hints about where the problem could be, or how I should deal with it?

Thanks a lot in advance!

mahulchak (Owner) commented Oct 22, 2021 via email

zmz1988 (Author) commented Feb 22, 2022

Hi,
Thanks for replying to me. Yes, I found that lowering the cutoff to 2M doesn't introduce more duplicated sequence, so it's fine. But I recently realised that the places that can't be merged are mostly heterozygous regions. For example, the reference assembly has seq1 (haplotype A) + seq2 (haplotype A) in one contig, whereas the query assembly has contig1 (haplotype A) and contig2 (haplotype B). The reference assembly does retain the alternative allele of seq2 as a small contig in the whole-genome file, and it can be aligned to contig2 (haplotype B) in the query assembly. Even so, contig1 and contig2 from the query assembly will still not be merged, as there is no hint about where haplotype B should be placed.

I'm not sure how I can solve this problem without generating a phased assembly (our species is highly inbred). But I do have quite a few gaps for this reason, even though the reference genome is pretty much gapless, albeit without a high QV. Do you think we could make use of a GFA file in this case?

mahulchak (Owner) commented

I have not really experimented with GFA files in this context. I will have to think about it.
