aligning HiFi data to ONT reads #801

dcopetti · 2021-08-09T20:51:27Z

Hello,

I would like to align PacBio HiFi reads to ONT reads with the goal of making a consensus of the alignment against the long read backbone (to correct the errors).
First, I wonder whether Winnowmap is more suitable than minimap2, or whether another aligner (lra?) would work better.

Then, in my case the reference is of lower quality (median Q score 11) and the query has much higher accuracy (Q 31): will this affect the choice of the alignment parameters? Can I use -x map-ont or asm20 as presets?
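For scale, the two Q scores quoted above can be turned into per-base error rates with the standard Phred relation Q = -10 * log10(p). The sketch below is only an illustration: the function name is made up here, and a median read Q score is a rough summary of a read set, not an exact divergence between any two reads.

```python
def phred_to_error(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


# Q values taken from the question; labels are for display only.
for label, q in [("ONT reads (Q11)", 11), ("HiFi reads (Q31)", 31)]:
    p = phred_to_error(q)
    print(f"{label}: error rate {p:.4%}, identity {1 - p:.2%}")
```

Under this relation Q11 corresponds to roughly 8% errors per base and Q31 to roughly 0.08%, so almost all of the divergence in a HiFi-to-ONT alignment would come from the ONT side.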

Lastly, I have more than 200 Gb of raw ONT reads that I would like to error correct with ~60 Gb HiFi data: I thought of splitting the "reference" to have small jobs and shorter time to compute the index. Is this a good approach or will splitting affect the representativeness of the minimizers?

Thanks,
Dario

lh3 · 2021-08-09T21:40:21Z

If the HiFi coverage is high enough, the better approach is to map nanopore reads to HiFi unitigs. 200G vs 60G will take a very long time. Splitting files will complicate downstream analysis.

dcopetti · 2021-08-09T22:13:52Z

Good point!
This will work well for ONT reads that are contained in a unitig, but how will it work for an ONT read whose two ends align to two different unitigs? Those are the most important reads for me, since they are the ones that would scaffold unitigs in a phased manner. It sounds like a scaffolding exercise at this point, but it may lose allele specificity.

The bigger picture is: I am working with a highly heterozygous plant, and I want to:

- get an assembly with as many bases as possible in haplotigs and as few as possible in diplotigs
- have two sets of within- and between-phased haplotigs
- get a total assembly size as close as possible to the 2n genome size

I wonder if by going through HiFi unitigs I will lose bases of the second (though highly similar = homozygous) allele.

Regarding splitting the "reference": where do you think the complicated part would be?
I am thinking of, e.g., splitting the 200 Gb of ONT data into 100 files (the issue of k-mers and the index will persist, though), getting 100 BAM outputs, error-correcting the 100 references individually, and then producing a FASTA to feed to an assembler.
What part will be problematic?
Thanks.
Dario

lh3 · 2021-08-09T23:29:42Z

If the heterozygosity is a few percent, most contigs will be haplotigs with HiFi reads only. HiFi+Hi-C is the best automated solution you can use now. HiFi+Nanopore should work better in theory but there are no good tools yet.

> but how will it work for an ONT read whose two ends align to two different unitigs?

minimap2 does local alignment, so it can find partial matches. You should add an option such as -p.3. This helps when the right haplotype is broken but the wrong haplotype is complete. It may also be better to use the option --qstrand. This is a new option in v2.22 but is not advertised.
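To make the "two ends on two different unitigs" case concrete, here is a minimal sketch that scans PAF output for such reads. It relies only on the first nine standard PAF columns; the function names, the `end_window` threshold, and the demo records are hypothetical, and this is not a procedure taken from minimap2 itself.

```python
from collections import defaultdict


def parse_paf_line(line):
    """Parse the first nine mandatory PAF columns into a dict."""
    f = line.rstrip("\n").split("\t")
    return {
        "qname": f[0], "qlen": int(f[1]), "qstart": int(f[2]), "qend": int(f[3]),
        "strand": f[4], "tname": f[5], "tlen": int(f[6]),
        "tstart": int(f[7]), "tend": int(f[8]),
    }


def bridging_reads(paf_lines, end_window=1000):
    """Return {read: (left_unitig, right_unitig)} for reads whose longest
    left-anchored and right-anchored hits fall on different unitigs.

    end_window (hypothetical threshold): how close to a read end, in bases,
    an alignment must reach to count as anchoring that end.
    """
    hits = defaultdict(list)
    for line in paf_lines:
        if line.strip():
            rec = parse_paf_line(line)
            hits[rec["qname"]].append(rec)

    def span(rec):
        return rec["qend"] - rec["qstart"]

    bridges = {}
    for qname, recs in hits.items():
        left = [r for r in recs if r["qstart"] <= end_window]
        right = [r for r in recs if r["qlen"] - r["qend"] <= end_window]
        if not left or not right:
            continue
        best_left, best_right = max(left, key=span), max(right, key=span)
        if best_left["tname"] != best_right["tname"]:
            bridges[qname] = (best_left["tname"], best_right["tname"])
    return bridges


# Made-up records: read1 spans utg1 -> utg2, read2 sits inside utg1.
demo = [
    "read1\t10000\t0\t4800\t+\tutg1\t50000\t45200\t50000\t4700\t4800\t60",
    "read1\t10000\t5100\t10000\t+\tutg2\t60000\t0\t4900\t4800\t4900\t60",
    "read2\t8000\t0\t8000\t-\tutg1\t50000\t1000\t9000\t7900\t8000\t60",
]
print(bridging_reads(demo))
```

A read fully contained in one unitig has the same record as its best left and right hit, so it is not reported; only reads like `read1` in the demo are.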

> scaffold unitigs in a phased manner

Then you should map nanopore reads to unitigs because, in comparison to HiFi reads, unitigs carry phasing information over a longer range. If you correct Nanopore reads with HiFi reads, you will need to connect the phases of HiFi reads, which is effectively an assembly, and that is hard.

> Regarding splitting the "reference": where do you think the complicated part would be?

First, the phasing issue above. Second, you need to merge results properly. When you split the reference, a HiFi read may hit irrelevant Nanopore reads in a chunk that does not contain its true match. You need to filter those hits out.
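As an illustration of the merging problem, the sketch below pools PAF hits from several reference chunks and, for each HiFi read, keeps only hits whose number of matching bases (PAF column 10) is close to that read's best hit across all chunks. The filter criterion, the `min_ratio` value, and all names are assumptions made for this example; the thread does not prescribe a specific filter.

```python
from collections import defaultdict


def merge_split_hits(paf_chunks, min_ratio=0.9):
    """Merge PAF lines from several reference chunks.

    paf_chunks: iterable of iterables of PAF lines, one per reference chunk.
    min_ratio (hypothetical): keep a hit only if its matching-base count is
    at least this fraction of the best hit for the same query in any chunk.
    """
    by_query = defaultdict(list)
    for chunk in paf_chunks:
        for line in chunk:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 12:  # skip malformed or empty lines
                continue
            by_query[fields[0]].append((int(fields[9]), line.rstrip("\n")))

    kept = []
    for hits in by_query.values():
        best = max(n_match for n_match, _ in hits)
        kept.extend(line for n_match, line in hits if n_match >= min_ratio * best)
    return kept


# Made-up example: hifi1 matches ont_a well (chunk 1) and ont_z poorly (chunk 2).
chunk1 = ["hifi1\t15000\t0\t15000\t+\tont_a\t40000\t2000\t17100\t14800\t15100\t60"]
chunk2 = ["hifi1\t15000\t3000\t6200\t+\tont_z\t35000\t500\t3800\t3000\t3300\t12"]
for line in merge_split_hits([chunk1, chunk2]):
    print(line.split("\t")[5])
```

When each chunk is mapped in isolation, the weak `ont_z` hit is the best that chunk can offer and would be reported; it only looks irrelevant once all chunks are compared together, which is why the merge step cannot be skipped.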

Based on your description, I strongly recommend mapping nanopore reads to unitigs. You may start with the options --qstrand --cs -cx map-ont -p.3.
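For readers less familiar with the compact flag syntax, this sketch spells out the suggested options as an argument list: -cx map-ont is -c plus -x map-ont, and -p.3 is -p 0.3. The helper name and the file names are placeholders invented for the example; the command is only assembled and printed, not run.

```python
import shlex


def minimap2_command(unitigs, ont_reads):
    """Build the suggested minimap2 call with the flags written out in full.

    unitigs is the reference (target) and ont_reads the query; both paths
    are hypothetical placeholders.
    """
    return [
        "minimap2",
        "--qstrand",       # option suggested in the thread (new in v2.22)
        "--cs",            # emit the cs tag
        "-c",              # output CIGAR in PAF
        "-x", "map-ont",   # Nanopore read preset
        "-p", "0.3",       # min secondary-to-primary score ratio
        unitigs,
        ont_reads,
    ]


print(shlex.join(minimap2_command("unitigs.fa", "ont_reads.fq.gz")))
```

The resulting list could be passed to `subprocess.run` with standard output redirected to a PAF file.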

lh3 added the question label Aug 9, 2021

dcopetti mentioned this issue Aug 10, 2021

aligning HiFi reads to single pass long reads marbl/Winnowmap#23

Open

dcopetti mentioned this issue Sep 30, 2021

optimizing unitig size and amount of allelic sequences chhylp123/hifiasm#190

Open

lh3 closed this as completed Dec 3, 2021
