No -I option #40

Open
soisa001 opened this issue Aug 19, 2023 · 7 comments

@soisa001

When running winnowmap, the -I option is not recognized.
For example, after generating repetitive_k15.txt with meryl, the following command:

winnowmap -W repetitive_k15.txt -a -x map-pb -Y -L --eqx --cs -I 32G ref.fa.gz reads.fastq.gz | samtools view -hb | samtools sort -@8 > alignment_sorted.bam

Yields the following error:

[ERROR] unknown option in "-I"

The -I option is needed for a multi-part index.
Thanks.

@cjain7
Contributor

cjain7 commented Aug 21, 2023

Sorry, multi-part indexing is not supported yet.

@diego-rt

Hi @cjain7

Does this mean that it's not possible to map to genomes larger than 4G while getting accurate mapQs? For minimap2 this would be the case in the absence of the -I flag.

Thanks!

@skoren
Member

skoren commented May 23, 2024

I saw that this change was made after the last release (v2.0.3), so conda-installed versions still accept the -I option. I do see slight differences in alignments when increasing -I on genomes larger than 4 Gb. I wanted to confirm: is it safe to use this option, assuming no saved index is used, or was it removed because it was not working correctly in v2.0.3 either?

@cjain7
Contributor

cjain7 commented Jun 9, 2024

Hi Sergey,
I looked at this now; sorry for the delay in responding. Your question is best answered at the minimap2 help page https://lh3.github.io/minimap2/minimap2.html

Increasing the -I value will give you slightly more accurate alignments, because having the entire reference in a single index helps both to identify the best alignment for a read and to compute mapping qualities. In my view, the -I option should not be exposed to the user for read-to-genome mapping. If it is provided, it is best to ensure that its value is larger than the genome size. Most likely, this is why I omitted -I from the development code.

My guess is that minimap2 has the -I parameter because it is also used as a read overlapper and for mapping reads to very large reference databases. Even then, having -I is sub-optimal, but it is necessary to control RAM usage.
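
The advice to keep -I above the genome size can be sketched in shell. This is a minimal illustration, not part of winnowmap: the helper names `count_ref_bases` and `index_batch_size` are hypothetical, GNU gzip and a 64-bit shell are assumed, and the `G` suffix is assumed to mean 10^9 bases.

```shell
# Hypothetical helper: total number of bases in a FASTA file (plain or gzipped).
# Assumes GNU gzip, where `-cdf` passes non-gzipped input through unchanged.
count_ref_bases() {
    gzip -cdf "$1" | awk '!/^>/ { n += length($0) } END { printf "%.0f\n", n }'
}

# Hypothetical helper: smallest whole-gigabase value strictly above the count.
# Assumes the G suffix means 10^9 bases.
index_batch_size() {
    echo "$(( $1 / 1000000000 + 1 ))G"
}

# Usage (file names taken from the original report):
#   ref_bases=$(count_ref_bases ref.fa.gz)
#   winnowmap -W repetitive_k15.txt -ax map-pb \
#       -I "$(index_batch_size "$ref_bases")" ref.fa.gz reads.fastq.gz > out.sam
```

Rounding up to the next whole gigabase keeps the value strictly larger than the reference, so the whole genome should land in one index part, at the cost of holding the full index in RAM.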

@skoren
Member

skoren commented Jun 10, 2024

The issue is that the default -I is only 4 Gb, so even a diploid human genome is too big and we'd want to increase -I (in fact, we do so when mapping to both haplotypes for all our T2T analyses: https://github.com/arangrhie/T2T-Polish/blob/master/winnowmap/map.sh). There are also much larger genomes (see marbl/verkko#252), which is what made me start looking into this. For such large references, it sounds like it would be important to set -I larger than the genome size, so it would be nice to keep the option available in future releases, since it is being used.
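
To make the size comparison concrete, here is a small shell sketch that checks whether a reference fits under a given -I value. The helper names `size_to_bases` and `fits_single_part` are hypothetical, the suffixes are assumed to be decimal (K = 10^3, M = 10^6, G = 10^9), and the 6 Gbp figure for a diploid human genome is only an approximate illustration.

```shell
# Hypothetical helper: convert a size such as 4G, 500M or 32000K to a base count.
# Assumes decimal multipliers and an integer prefix.
size_to_bases() {
    case "$1" in
        *[Gg]) echo $(( ${1%?} * 1000000000 )) ;;
        *[Mm]) echo $(( ${1%?} * 1000000 )) ;;
        *[Kk]) echo $(( ${1%?} * 1000 )) ;;
        *)     echo "$1" ;;
    esac
}

# Succeeds (exit status 0) when the reference is smaller than the -I value,
# i.e. when the whole reference should fit in a single index part.
fits_single_part() {
    [ "$1" -lt "$(size_to_bases "$2")" ]
}

# Usage: a roughly 6 Gbp diploid reference against a 4G default and against 32G.
#   fits_single_part 6000000000 4G  || echo "4G is too small; raise -I"
#   fits_single_part 6000000000 32G && echo "32G covers the whole reference"
```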

@diego-rt

Sorry to jump in, but as a heavy user of giant genomes (30 Gbp and more), I think it is absolutely indispensable to have the -I option enabled.

cjain7 added a commit that referenced this issue Jun 13, 2024
@cjain7
Contributor

cjain7 commented Jun 13, 2024

Understood, thank you!
The -I option is now back :)
