Multiple empty hits? #164

Closed
mw55309 opened this issue May 14, 2018 · 5 comments

@mw55309

mw55309 commented May 14, 2018

Hello

I'm using minimap2 to map against NCBI nt. The SAM output contains multiple entries for the same query that are basically "no hit", i.e. column 3 (RNAME) is *.

The last read I checked had 27 such entries (the run hadn't finished) which were identical.

Is this a feature? What am I missing? Why would minimap2 output multiple "no hit" lines for the same read?

I have checked and the input only has one copy of this read.

It's not unique to this read; it happens to all of them.

Cheers
Mick

@lh3 lh3 added the question label May 14, 2018
@lh3
Owner

lh3 commented May 14, 2018

When you have a huge database like "nt", minimap2 will split it into multiple parts and align all queries against each part independently. For most parts, minimap2 will print unmapped records. This is a limitation of minimap2.
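To make this mechanism concrete, here is a toy Python model; it is not minimap2 code. The reference is grouped into parts of roughly `part_size` bases, each part is searched independently with a crude shared-k-mer test, and every part produces its own record. The function names and the k-mer matching are hypothetical simplifications; real minimap2 uses minimizer indexing and chaining.

```python
from typing import Dict, List, Tuple

def split_reference(refs: Dict[str, str], part_size: int) -> List[Dict[str, str]]:
    """Group whole reference sequences into parts of roughly part_size bases (toy multi-part index)."""
    parts: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    total = 0
    for name, seq in refs.items():
        current[name] = seq
        total += len(seq)
        if total >= part_size:  # close this part once it holds enough bases
            parts.append(current)
            current, total = {}, 0
    if current:
        parts.append(current)
    return parts

def search_part(read: str, part: Dict[str, str], k: int = 5) -> str:
    """Toy 'aligner': return the first sequence sharing a k-mer with the read, or '*' if none."""
    read_kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
    for name, seq in part.items():
        if any(seq[i:i + k] in read_kmers for i in range(len(seq) - k + 1)):
            return name
    return "*"

def map_read(qname: str, read: str, refs: Dict[str, str], part_size: int) -> List[Tuple[str, str]]:
    # Each part is searched on its own, so every part emits one (QNAME, RNAME) record,
    # including an unmapped '*' record whenever the read has no hit in that part.
    return [(qname, search_part(read, part)) for part in split_reference(refs, part_size)]

refs = {"chrA": "ACGTACGTGGCCTTAA", "chrB": "TTTTGGGGCCCCAAAA", "chrC": "GATTACAGATTACA"}
print(map_read("r1", "GATTACAG", refs, part_size=10))     # [('r1', '*'), ('r1', '*'), ('r1', 'chrC')]
print(map_read("r1", "GATTACAG", refs, part_size=10**9))  # [('r1', 'chrC')]
```

A read that hits nothing gets a `*` record from every part, which matches the pattern described in the question.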

I think @hasindu2008 has a preliminary solution to this issue. See #141.

@mw55309
Author

mw55309 commented May 14, 2018

Thanks Heng. I specifically gave the job 4x as much RAM as the size of nt (uncompressed), so technically there is no reason for it to split the index.

Is the creation of a multi-part index (in memory) turned on by default?

@lh3
Owner

lh3 commented May 14, 2018

Yes, multi-part is turned on by default. You can increase option -I to 500G (a number larger than the total size of nt) to force minimap2 to generate a single index. Depending on the -k and -w values in use, 4x as much RAM might not be enough.
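The effect of -I can be estimated with simple arithmetic: the number of index parts is roughly the total reference size divided by the -I value, rounded up, and a read with no hit anywhere gets about one unmapped record per part. The sketch below assumes the K/M/G suffixes are decimal multipliers and uses a made-up 300 Gb total; it is not the real size of nt. Real parts end at sequence boundaries, so the count is approximate.

```python
import math

# Assumed decimal multipliers for the K/M/G suffixes accepted by -I.
MULTIPLIERS = {"K": 10**3, "M": 10**6, "G": 10**9}

def parse_size(value: str) -> int:
    """Convert a size such as '4G' or '500G' into a number of bases."""
    value = value.strip().upper()
    if value and value[-1] in MULTIPLIERS:
        return int(float(value[:-1]) * MULTIPLIERS[value[-1]])
    return int(value)

def estimated_parts(total_bases: int, batch: str) -> int:
    """Rough number of index parts: ceil(total / batch), with at least one part."""
    return max(1, math.ceil(total_bases / parse_size(batch)))

total = 300 * 10**9  # hypothetical database size in bases
print(estimated_parts(total, "4G"))    # 75
print(estimated_parts(total, "500G"))  # 1
```

Setting -I above the total size collapses everything into one part, at the cost of holding the whole index in memory.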

@hasindu2008
Contributor

@mw55309

You can try the merging solution I implemented, which can be built as follows. It is still preliminary and only works for single-end reads at the moment. It merges the results from a multi-part index to produce output very similar to that of a single-part index. I have only tested it on some simulated PacBio reads.
git clone https://github.com/hasindu2008/minimap2 && cd minimap2 && git checkout multipart-merge-tmp && make

Make sure that you use the option --multi-prefix to enable merging, e.g.:
./minimap2 -x <profile> -I<partsize> --multi-prefix tmp ref.fa read.fastq
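For SAM output already produced with a multi-part index, a much cruder workaround than the merge described here is to post-process the file: keep every mapped record and emit a single unmapped record only for reads that mapped nowhere. This is a hypothetical sketch, not the code in the multipart-merge-tmp branch. It assumes single-end reads (one read per QNAME), holds the whole file in memory because records for one read are spread across the per-part output, and does not reconcile primary/secondary flags or MAPQ across parts, which a real merge has to do.

```python
from typing import List, Set

FLAG_UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"

def is_unmapped(record: str) -> bool:
    return (int(record.split("\t")[1]) & FLAG_UNMAPPED) != 0

def collapse_unmapped(lines: List[str]) -> List[str]:
    """Keep headers and all mapped records; keep one unmapped record per read that never mapped."""
    records = [l for l in lines if l and not l.startswith("@")]
    mapped: Set[str] = {r.split("\t")[0] for r in records if not is_unmapped(r)}
    out: List[str] = []
    seen_unmapped: Set[str] = set()
    for line in lines:
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            out.append(line)  # header line, kept as-is
            continue
        qname = line.split("\t")[0]
        if not is_unmapped(line):
            out.append(line)
        elif qname not in mapped and qname not in seen_unmapped:
            seen_unmapped.add(qname)  # first unmapped record for a read with no hit anywhere
            out.append(line)
    return out
```

It could be applied as `collapse_unmapped(open("out.sam").read().splitlines())`, where `out.sam` is a hypothetical file name.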

@lh3
Owner

lh3 commented Aug 6, 2018

The latest minimap2 has a --split-prefix option:

minimap2 -a --split-prefix tmp ref.fa query.fa

With this option, minimap2 won't write an unmapped record multiple times.
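To check that a run with --split-prefix no longer repeats unmapped records, one could count unmapped records per QNAME. A minimal sketch, assuming single-end reads and standard SAM text (FLAG in column 2); the function name is hypothetical:

```python
from collections import Counter
from typing import List

def repeated_unmapped(sam_text: str) -> List[str]:
    """Return QNAMEs that have more than one unmapped (FLAG & 0x4) record."""
    counts: Counter = Counter()
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank and header lines
        fields = line.split("\t")
        if int(fields[1]) & 0x4:
            counts[fields[0]] += 1
    return sorted(q for q, n in counts.items() if n > 1)

example = (
    "@HD\tVN:1.6\n"
    "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*\n"
    "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*\n"
    "r2\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\t*\n"
)
print(repeated_unmapped(example))  # ['r1']
```

On output written with --split-prefix, the expectation is an empty list.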

@lh3 lh3 closed this as completed Aug 6, 2018