aligning HiFi data to ONT reads #801

Closed
dcopetti opened this issue Aug 9, 2021 · 3 comments

dcopetti commented Aug 9, 2021

Hello,

I would like to align PacBio HiFi reads to ONT reads with the goal of making a consensus of the alignment against the long read backbone (to correct the errors).
First, I wonder whether Winnowmap is more suitable than minimap2, or if any other aligner (lra?) will work best.

Then, in my case the reference is of lower quality (median Q score 11) and the query has much higher accuracy (Q 31): will this affect the choice of the alignment parameters? Can I use -x map-ont or asm20 as presets?
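For context, the two quality scores quoted above can be turned into expected per-base error rates with the standard Phred relation p = 10^(-Q/10). The snippet below is only an illustrative sketch; the function name is made up for this example.

```python
def phred_to_error_rate(q: float) -> float:
    """Convert a Phred quality score into an expected per-base error probability."""
    return 10 ** (-q / 10)


ont_error = phred_to_error_rate(11)   # median ONT read quality quoted in the post
hifi_error = phred_to_error_rate(31)  # HiFi read quality quoted in the post

print(f"ONT  Q11: {ont_error:.2%} expected errors per base")
print(f"HiFi Q31: {hifi_error:.3%} expected errors per base")
print(f"ratio: {ont_error / hifi_error:.0f}x")
```

A 20-point Phred difference corresponds to a 100-fold difference in error rate, which is why the "reference" here is so much noisier than the query.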

Lastly, I have more than 200 Gb of raw ONT reads that I would like to error correct with ~60 Gb HiFi data: I thought of splitting the "reference" to have small jobs and shorter time to compute the index. Is this a good approach or will splitting affect the representativeness of the minimizers?

Thanks,
Dario

Owner

lh3 commented Aug 9, 2021

If the HiFi coverage is high enough, the better approach is to map nanopore reads to HiFi unitigs. 200G vs 60G will take a very long time. Splitting files will complicate downstream analysis.

@lh3 lh3 added the question label Aug 9, 2021
Author

dcopetti commented Aug 9, 2021

good point!
this will work well for ONT reads that are contained in a unitig, but how will it work for an ONT read whose two ends align to two different unitigs? Those are the most important reads for me, since they are the ones that would scaffold unitigs in a phased manner. It sounds like a scaffolding exercise at this point, but it may lose allele specificity.

The bigger picture is: I am working with a highly heterozygous plant, and I want to:

  • get an assembly with as many bases as possible in haplotigs and as few as possible in diplotigs
  • have two sets of within- and between-phased haplotigs
  • get a total assembly size as close as possible to the 2n genome size

I wonder if by going through HiFi unitigs I will lose bases of the second (though highly similar = homozygous) allele.

Regarding splitting the "reference": where do you think the complicated part would be?
I am thinking of e.g. splitting the 200 Gb of ONT data into 100 files (the issue of k-mers and index will persist though), getting 100 BAM outputs, then error-correcting the 100 references individually, and finally producing a FASTA to feed to an assembler.
What part will be problematic?
Thanks.
Dario

Owner

lh3 commented Aug 9, 2021

If the heterozygosity is a few percent, most contigs will be haplotigs with HiFi reads only. HiFi+Hi-C is the best automated solution you can use now. HiFi+Nanopore should work better in theory but there are no good tools yet.

> but how will it work for an ONT read whose two ends align to two different unitigs?

minimap2 does local alignment, so it can find partial matches. You should add an option such as -p.3. This helps when the right haplotype is broken but the wrong haplotype is complete. It may be better to use the option --qstrand. This is a new option in v2.22 but it is not advertised.
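As a rough illustration of how such partial matches could be picked out afterwards, the Python sketch below scans PAF records (the format minimap2 writes by default) and reports reads with alignments to at least two different unitigs. The function names, the `min_aln` cutoff and the example records are all made up for illustration; this is not part of minimap2.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    read: str
    read_len: int
    q_start: int
    q_end: int
    strand: str
    unitig: str


def parse_paf(lines: Iterable[str]) -> List[Hit]:
    """Keep the first six of the 12 mandatory PAF columns for each record."""
    hits = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:  # skip anything that is not a complete PAF record
            continue
        hits.append(Hit(f[0], int(f[1]), int(f[2]), int(f[3]), f[4], f[5]))
    return hits


def bridging_reads(hits: Iterable[Hit], min_aln: int = 1000) -> Dict[str, List[Hit]]:
    """Return reads whose alignments touch at least two different unitigs.

    min_aln is an arbitrary, illustrative cutoff on the aligned query length.
    """
    by_read = defaultdict(list)
    for h in hits:
        if h.q_end - h.q_start >= min_aln:
            by_read[h.read].append(h)
    return {
        read: sorted(hs, key=lambda h: h.q_start)
        for read, hs in by_read.items()
        if len({h.unitig for h in hs}) >= 2
    }


# Made-up records: read1 spans utg1 and utg2, read2 is contained in utg1.
example = [
    "read1\t20000\t0\t9000\t+\tutg1\t50000\t41000\t50000\t8500\t9000\t60",
    "read1\t20000\t9500\t20000\t+\tutg2\t80000\t0\t10500\t10000\t10500\t60",
    "read2\t15000\t0\t15000\t-\tutg1\t50000\t1000\t16000\t14000\t15000\t60",
]
for read, hs in bridging_reads(parse_paf(example)).items():
    print(read, [(h.unitig, h.q_start, h.q_end) for h in hs])
```

Whether such a read really links two unitigs of the same haplotype still has to be judged from the alignments themselves; the sketch only lists candidates.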

> scaffold unitigs in a phased manner

Then you should map nanopore reads to unitigs because, compared with HiFi reads, unitigs carry phasing information over a longer range. If you correct Nanopore reads with HiFi reads, you will need to connect the phases of HiFi reads, which is effectively an assembly, and that is hard.

> Regarding splitting the "reference": where do you think the complicated part would be?

First, the phasing issue above. Second, you need to merge the results properly. When you split the reference, a HiFi read may hit irrelevant Nanopore reads. You need to filter them out.
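To make the merging problem concrete, here is one possible way to filter hits after pooling the per-chunk outputs, sketched in Python. It assumes PAF output and keeps, for each HiFi read, only hits whose number of matching bases (PAF column 10) is within a fraction of that read's best hit across all chunks. The `min_ratio` threshold and the function names are assumptions made for this example, not something prescribed in this thread.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (hifi_read, ont_read, n_matches), taken from PAF columns 1, 6 and 10
Record = Tuple[str, str, int]


def load_records(paf_lines: Iterable[str]) -> List[Record]:
    """Extract query name, target name and matching-base count from PAF lines."""
    records = []
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) >= 12:  # ignore incomplete records
            records.append((f[0], f[5], int(f[9])))
    return records


def filter_merged_hits(records: Iterable[Record], min_ratio: float = 0.8) -> List[Record]:
    """Keep hits with at least min_ratio of the best match count for the same HiFi read.

    The best hit is taken over ALL chunks, so a hit that looked best within
    one chunk is dropped if another chunk holds a clearly better one.
    """
    records = list(records)
    best: Dict[str, int] = defaultdict(int)
    for query, _, matches in records:
        best[query] = max(best[query], matches)
    return [r for r in records if r[2] >= min_ratio * best[r[0]]]


# Made-up output of two chunks: hifi1 has a strong hit in chunk A
# and only a weak, probably irrelevant one in chunk B.
chunk_a = ["hifi1\t15000\t0\t15000\t+\tont_7\t40000\t100\t15100\t14800\t15000\t60"]
chunk_b = ["hifi1\t15000\t2000\t6000\t+\tont_93\t30000\t500\t4500\t3000\t4000\t5"]

merged = load_records(chunk_a + chunk_b)
print(filter_merged_hits(merged))
```

A real pipeline would likely need a more careful criterion (for example alignment score or overlap on the HiFi read), but the point stands that the comparison has to happen across chunks, not within each one.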

Based on your description, I strongly recommend mapping nanopore reads to unitigs. You may start with the options --qstrand --cs -cx map-ont -p.3.
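A minimal sketch of how that suggestion might be wired into a script is shown below. The file names are hypothetical placeholders, and the argument order (unitigs as target, ONT reads as query) follows minimap2's usual `minimap2 [options] <target> <query>` convention; the snippet only assembles and prints the command line.

```python
import shlex
from typing import List


def minimap2_command(unitigs: str, ont_reads: str) -> List[str]:
    """Build the argument list for mapping ONT reads to HiFi unitigs."""
    return [
        "minimap2",
        "--qstrand",          # option mentioned above as new in v2.22
        "--cs",               # add the cs tag to each alignment
        "-cx", "map-ont",     # -c: output CIGAR in PAF; -x map-ont: ONT preset
        "-p.3",               # same as -p 0.3, the value suggested above
        unitigs,              # target: HiFi unitigs (hypothetical file name)
        ont_reads,            # query: ONT reads (hypothetical file name)
    ]


cmd = minimap2_command("hifi_unitigs.fa", "ont_reads.fq.gz")
print(shlex.join(cmd))

# To actually run it (requires minimap2 on PATH), something like:
#   import subprocess
#   with open("ont_to_unitigs.paf", "w") as out:
#       subprocess.run(cmd, stdout=out, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if file names contain spaces.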
