[WARNING] For a multi-part index, no @SQ lines will be outputted. Please use --split-prefix. #301

Open
dangchenyuan opened this issue Dec 27, 2018 · 11 comments

@dangchenyuan

When I mapped my raw Illumina reads (150 bp) to the nanopore consensus, minimap2 gave the following warning:
[WARNING] For a multi-part index, no @SQ lines will be outputted. Please use --split-prefix.
My command is:
minimap2 -ax sr all_nanopore_0-10reads.racon.consensus.fasta ../4-h-0-10_R1.fastq ../4-h-0-10_R2.fastq > Nano_NGS_aln.sam

@lh3 lh3 added the question label Dec 27, 2018
@lh3
Owner

lh3 commented Dec 27, 2018

As the warning message says, apply the option --split-prefix.

@lh3 lh3 closed this as completed Dec 27, 2018
@dangchenyuan
Author

Thank you!
But I have another question: putting "--split-prefix" in different positions gives different results.
For example, if I use the command:
minimap2 -ax sr --split-prefix 4pf_0-10.clean.fasta 4PF-0-10_R1_prinseq_good_4pFy.fastq 4PF-0-10_R2_prinseq_good_M_b_.fastq > Nano4pfclean_NGS_aln.sam
I get a 53G SAM file.

But if I use the command:
minimap2 -ax sr 4pf_0-10.clean.fasta --split-prefix 4PF-0-10_R1_prinseq_good_4pFy.fastq 4PF-0-10_R2_prinseq_good_M_b_.fastq > Nano4pfclean_NGS_aln_smp.sam
I only get a 41G SAM file.

The two commands differ only in the position of "--split-prefix", so why do I get different results?

@lh3
Owner

lh3 commented Jan 12, 2019

Read the manual. --split-prefix requires an argument.

@hasindu2008
Contributor

@dangchenyuan

For example:

minimap2 -ax sr --split-prefix temp_name 4pf_0-10.clean.fasta 4PF-0-10_R1_prinseq_good_4pFy.fastq 4PF-0-10_R2_prinseq_good_M_b_.fastq > Nano4pfclean_NGS_aln_smp.sam

This will create temporary files in the current directory with the prefix temp_name.
To put the temporary files in /tmp instead, give --split-prefix /tmp/temp_name.
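If the command is generated from a script, building it as an argument list keeps --split-prefix glued to its value. Below is a minimal Python sketch (3.8+ for shlex.join); the helper name build_minimap2_cmd is hypothetical and not part of minimap2, and "temp_name" is just the illustrative prefix used in this thread.

```python
import os
import shlex
import tempfile


# Hypothetical helper (not part of minimap2): builds the argv for a multi-part index.
def build_minimap2_cmd(ref, reads, preset="sr", tmp_dir=None):
    """Return a minimap2 argv list with --split-prefix immediately followed by
    its value (the temporary-file prefix); the reference and read files come
    after all options."""
    tmp_dir = tmp_dir if tmp_dir is not None else tempfile.gettempdir()
    prefix = os.path.join(tmp_dir, "temp_name")  # illustrative prefix name
    return ["minimap2", "-ax", preset, "--split-prefix", prefix, ref, *reads]


argv = build_minimap2_cmd(
    "4pf_0-10.clean.fasta",
    ["4PF-0-10_R1_prinseq_good_4pFy.fastq", "4PF-0-10_R2_prinseq_good_M_b_.fastq"],
    tmp_dir="/tmp",
)
# Print a shell-ready command line, redirecting the SAM output to a file.
print(shlex.join(argv) + " > Nano4pfclean_NGS_aln_smp.sam")
```

Passing the list straight to subprocess.run (with stdout opened on the output file) would avoid shell quoting altogether.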

The problems in your commands are as follows:

minimap2 -ax sr --split-prefix 4pf_0-10.clean.fasta 4PF-0-10_R1_prinseq_good_4pFy.fastq 4PF-0-10_R2_prinseq_good_M_b_.fastq > Nano4pfclean_NGS_aln.sam : 4pf_0-10.clean.fasta is taken as the temporary file prefix, and 4PF-0-10_R2_prinseq_good_M_b_.fastq will be mapped against 4PF-0-10_R1_prinseq_good_4pFy.fastq.

minimap2 -ax sr 4pf_0-10.clean.fasta --split-prefix 4PF-0-10_R1_prinseq_good_4pFy.fastq 4PF-0-10_R2_prinseq_good_M_b_.fastq > Nano4pfclean_NGS_aln_smp.sam : 4PF-0-10_R1_prinseq_good_4pFy.fastq will be the temporary file prefix and 4PF-0-10_R2_prinseq_good_M_b_.fastq will be mapped against 4pf_0-10.clean.fasta.
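The behaviour above follows from how command-line parsers treat an option that takes an argument: it consumes the next token, whatever that token is, and the remaining non-option tokens are collected as positionals. The Python sketch below models this with a simplified getopt_long-style parser that permutes positionals; it is an assumption about the general mechanism, not minimap2's actual parsing code, and the file names are shortened.

```python
# Hypothetical, simplified getopt_long-style parser (not minimap2's code).
def parse_like_getopt(argv, short_with_arg, long_with_arg):
    """Split argv into (options, positionals). An option that requires an
    argument consumes the next token, whatever that token is."""
    opts, positionals = {}, []
    i = 0
    while i < len(argv):
        tok = argv[i]
        if tok.startswith("--"):
            name = tok[2:]
            if name in long_with_arg:
                if i + 1 >= len(argv):
                    raise ValueError(f"--{name} requires an argument")
                opts[name] = argv[i + 1]  # swallowed even if it is a FASTA/FASTQ file
                i += 1
            else:
                opts[name] = True
        elif tok.startswith("-") and len(tok) > 1:
            # Clustered short options such as "-ax": flags until one needs an argument.
            for j in range(1, len(tok)):
                ch = tok[j]
                if ch in short_with_arg:
                    rest = tok[j + 1:]
                    if rest:  # argument glued on, e.g. "-xsr"
                        opts[ch] = rest
                    else:  # argument is the next token, e.g. "-x sr"
                        if i + 1 >= len(argv):
                            raise ValueError(f"-{ch} requires an argument")
                        opts[ch] = argv[i + 1]
                        i += 1
                    break
                opts[ch] = True
        else:
            positionals.append(tok)
        i += 1
    return opts, positionals


SHORT_WITH_ARG = {"x"}  # only the options used in these two commands
LONG_WITH_ARG = {"split-prefix"}

commands = {
    "first": "-ax sr --split-prefix 4pf.fasta R1.fastq R2.fastq".split(),
    "second": "-ax sr 4pf.fasta --split-prefix R1.fastq R2.fastq".split(),
}
results = {}
for label, cmd in commands.items():
    opts, pos = parse_like_getopt(cmd, SHORT_WITH_ARG, LONG_WITH_ARG)
    # minimap2 takes the first positional as the target and the rest as queries.
    results[label] = (opts["split-prefix"], pos[0], pos[1:])
    print(label, "prefix:", results[label][0], "target:", results[label][1], "queries:", results[label][2])
```

In both cases R1 is never mapped as a query: in the first command it becomes the target, and in the second it becomes the temporary-file prefix.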

@dangchenyuan
Author

@hasindu2008 @lh3 thank you very much

@rsharris

I'm trying to map some HiFi reads and am getting the same "For a multi-part index ... use --split-prefix" message.

What confuses me is that I am using a modification of a pipeline that I used before. The modification is that I changed "-x map-pb" to "-x map-hifi". I changed that both in my index creation command and my mapping command.

Ahh ... It took me quite a while to figure out what a "multi-part index" is. The man page doesn't explicitly describe this concept (not by name anyway). Eventually I figured out that this is what the "-I NUM" option does. But, I thought, I didn't use that option. Then I realized that the default for NUM is 4G, and my target is larger than that. So I guess I have a multi-part index. Looking back through my logs from the index build, I find that minimap2 didn't inform me that I have a multi-part index.
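The link between target size, -I and the number of index parts can be sketched as a rough estimate. The Python model below is a simplification and an assumption, not minimap2's exact logic. It assumes sequences are loaded in file order, a sequence is never split between parts, a part is closed once its running total reaches the -I value, and the default 4G means 4×10^9 bases. The function name is hypothetical.

```python
# Hypothetical estimator under a simplified model (not minimap2's exact logic).
def count_index_parts(seq_lengths, batch_bases=4_000_000_000):
    """Estimate the number of index parts: sequences are taken in file order,
    never split, and a part is closed once its running total of bases reaches
    batch_bases (standing in for -I, default 4G read as 4 * 10**9)."""
    parts, current = 0, 0
    for length in seq_lengths:
        current += length
        if current >= batch_bases:
            parts += 1
            current = 0
    if current > 0:  # leftover sequences form a final, smaller part
        parts += 1
    return parts


print(count_index_parts([3_100_000_000]))  # → 1 (one ~3.1 Gbp target)
print(count_index_parts([1_000_000_000] * 6))  # → 2 (six 1 Gbp sequences, default -I)
print(count_index_parts([1_000_000_000] * 6, batch_bases=7_000_000_000))  # → 1
```

Any result above 1 means a multi-part index, which is the case the warning is about.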

So how would a user know she has a multi-part index?

Suggestions:
(1) In the man page, under indexing options, for the -I option, consider adding some text that says this option will create a multi-part index. I mean, that seems like the main reason that option exists.
(1a) The same text would be even more helpful if it noted that targets larger than the 4G default will produce a multi-part index. Yes, the user can eventually figure this out, but stating it explicitly will save people time.
(2) A warning, during index construction, that a multi-part index has been created, would be most helpful.
(3) Consider checking up front for the absence of the --split-prefix option when the user has a multi-part index. As near as I can tell, my ten-minute run was useless (a ≈150-byte BAM file). At this point I don't know whether that reflected the true alignments or resulted from this warning. I won't know until I fix the command line and rerun it.
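Until such a check exists in minimap2 itself, a wrapper script could approximate the up-front check in (3) by summing the sequence lengths from a samtools faidx `.fai` file, whose second column is the sequence length. The Python sketch below is hypothetical. It only compares the total against a fixed -I value and does not parse -I from the command line. Because part boundaries fall between sequences, it can only say the index *may* be multi-part.

```python
# Hypothetical pre-flight check for a wrapper script (not part of minimap2).
def total_bases_from_fai(fai_text):
    """Sum the LENGTH column (2nd tab-separated field) of a samtools faidx .fai file."""
    return sum(int(line.split("\t")[1]) for line in fai_text.splitlines() if line.strip())


def preflight_warning(fai_text, argv, batch_bases=4_000_000_000):
    """Return a warning if the target may need a multi-part index and argv has
    no --split-prefix, else None. batch_bases stands in for -I (assumed 4G)."""
    total = total_bases_from_fai(fai_text)
    if total > batch_bases and "--split-prefix" not in argv:
        return (f"target has {total} bases, more than -I ({batch_bases}); the index may be "
                "multi-part: add --split-prefix or increase -I")
    return None


# Toy .fai content: only the LENGTH column matters; the other columns are placeholders.
fai = "ctg1\t2500000000\t6\t80\t81\nctg2\t2000000000\t2531250012\t80\t81\n"
print(preflight_warning(fai, ["minimap2", "-ax", "map-hifi", "ref.fa", "reads.fq"]))
print(preflight_warning(fai, ["minimap2", "-ax", "map-hifi", "--split-prefix", "tmp", "ref.fa", "reads.fq"]))
```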

@lh3
Owner

lh3 commented Jun 13, 2022

(1) Done via 767556b. (2) If the output format is SAM, minimap2 will warn you about a multi-part index. (3) Not easy to implement, as minimap2 doesn't know whether the reference or index is multi-part until it reaches the next part.

See also FAQ.

@rsharris

Thanks,

Consider including the term "multi-part index" somewhere in FAQ item 3. Part of my difficulty was that the warning message uses that term, and I found very little explaining what it meant. (Thinking out loud) Maybe some of the item 3 text could be moved to a new FAQ item titled "references larger than 4Gbp".

For (2), what I meant was a command like "minimap2 -x map-hifi -d orange.idx orange.fasta", which builds the index, could let the user know that a multi-part index had been created. AFAIK there would be no way for me to set the output format to SAM when building the index.

For (3) it sounds like the index file format must be such that there isn't any way to find out how many parts it has without reading the entire first part. If that's true I guess there's nothing that can be done to report the error up front.

Thanks again.

@Dmitry-Antipov

I had exactly the same index-size issue as @rsharris.

For clarity, I'd suggest changing the warning "For a multi-part index, no @SQ lines will be outputted. Please use --split-prefix"
to something like "For a multi-part index, no @SQ lines will be outputted. Please use --split-prefix or increase the index size (-I)". After I saw this warning, I simply used --split-prefix as instructed, which was totally wrong in my case, since I rely on mapping quality, which does not work with a multi-part index.
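The alternative of increasing -I amounts to choosing a value at least as large as the whole target, so that it fits in a single index part. A minimal Python sketch follows. The function name is hypothetical, and it assumes "G" in -I means 10^9 bases.

```python
# Hypothetical helper: pick a whole-G -I value large enough for a single index part.
def suggest_index_batch(total_bases):
    """Return the smallest whole-G value for -I that is at least total_bases,
    assuming "G" means 10**9 bases."""
    gigabases = -(-total_bases // 1_000_000_000)  # integer ceiling division
    return f"{max(gigabases, 1)}G"


print(suggest_index_batch(3_100_000_000))  # → 4G
print(suggest_index_batch(6_200_000_000))  # → 7G
```

A larger -I means the whole index is held in memory at once, so expect higher RAM use. --split-prefix instead trades that memory for temporary disk space.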

@hasindu2008
Contributor

@Dmitry-Antipov Some time ago, when I implemented part of this --split-prefix strategy and tested it, the MAPQ values from --split-prefix were close to those from using an increased -I size [see https://www.nature.com/articles/s41598-019-40739-8/figures/3].
Are they quite different now? I haven't run with this option recently.

@Dmitry-Antipov

@hasindu2008 I did not actually check, since minimap2's temporary files used up all my disk space. I have a huge dataset, and the majority of the alignments are multimappers that I planned to filter by piping minimap2 into samtools view -q. That is what helped me realize I was doing something wrong.
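The filtering step described here keeps only alignments whose MAPQ (the 5th SAM column) reaches a threshold. The Python sketch below shows that logic. It is a hypothetical stand-in for `samtools view -h -q`, not a replacement for it. The filter is only meaningful if MAPQ was computed against the whole reference, i.e. with a single-part index or with --split-prefix.

```python
# Hypothetical MAPQ filter, roughly what `samtools view -h -q MIN` keeps.
def filter_sam_by_mapq(lines, min_mapq):
    """Yield SAM header lines unchanged and alignment lines whose MAPQ
    (5th tab-separated column) is at least min_mapq."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[4]) >= min_mapq:
            yield line


sam = [
    "@HD\tVN:1.6\n",
    "read1\t0\tctg1\t100\t60\t150M\t*\t0\t0\t*\t*\n",
    "read2\t256\tctg1\t5000\t0\t150M\t*\t0\t0\t*\t*\n",  # a multimapper with MAPQ 0
]
kept = list(filter_sam_by_mapq(sam, 10))
print("".join(kept), end="")
```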

5 participants