Erroneous input parsing with paired-reads + singletons #92
Comments
With that command line, minimap2 assumes each fragment has three reads. The correct way to achieve what you want is:
(seqtk mergepe read1.fq.gz read2.fq.gz; cat single.fq.gz) | minimap2 -ax sr db.mmi -
Anyway, I agree that minimap2 should throw a warning if one file has fewer reads than the others. I will implement it at some point.
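For anyone wondering what the merged stream looks like, here is a rough Python sketch of the idea: paired records alternate (R1, R2, R1, R2, ...) and the singletons follow at the end. This is not seqtk's or minimap2's code; `fastq_records` and `interleave_with_singletons` are made-up names for illustration, and the sketch assumes plain 4-line FASTQ records.

```python
import io
from itertools import islice


def fastq_records(handle):
    """Yield 4-line FASTQ records from a text handle, one string per record."""
    while True:
        lines = list(islice(handle, 4))
        if not lines:
            return
        if len(lines) != 4:
            raise ValueError("truncated FASTQ record")
        # Normalise line endings so records can be concatenated safely.
        yield "".join(line.rstrip("\n") + "\n" for line in lines)


def interleave_with_singletons(read1, read2, singles):
    """Yield R1/R2 records alternately, then all singleton records."""
    second = fastq_records(read2)
    for rec1 in fastq_records(read1):
        rec2 = next(second, None)
        if rec2 is None:
            raise ValueError("read2 has fewer records than read1")
        yield rec1
        yield rec2
    if next(second, None) is not None:
        raise ValueError("read1 has fewer records than read2")
    yield from fastq_records(singles)


if __name__ == "__main__":
    r1 = io.StringIO("@a/1\nAC\n+\nII\n")
    r2 = io.StringIO("@a/2\nGT\n+\nII\n")
    se = io.StringIO("@c\nGG\n+\nII\n")
    for record in interleave_with_singletons(r1, r2, se):
        print(record, end="")
```

The point of the layout is that the paired part of the stream stays strictly alternating, so a length mismatch between the two paired files is caught instead of silently shifting the pairing.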
When reading from a pipe and mapping against a chunked index, does minimap2 still expect to read the input more than once or does it buffer the content in memory?
Indeed I can confirm that using a chunked index (32GB) and piped streams results in undefined behavior. With:
minimap2 segfaults once it reaches the second chunk of |
I didn't notice you were using a multi-part index. For such an index, you can't feed data through a pipe, because the query is read again for each part of the index. Minimap2 is designed this way to reduce peak memory. It is often impractical to hold either all target or all query sequences in memory. If you have enough memory, build a uni-part index by increasing |
Redirecting the output of seqtk to a file and using that as input is a workable solution.
master now reports an error if you specify three query files. It also warns if one file contains fewer records than others.
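Until you are on a build with that warning, a quick pre-flight check outside minimap2 can catch mismatched inputs. The sketch below is only an illustration of that idea, not what master does internally; `count_fastq_records` and `check_equal_record_counts` are hypothetical helpers, and the count assumes 4-line FASTQ records.

```python
import gzip
import warnings


def count_fastq_records(path):
    """Count records in a (possibly gzipped) FASTQ file, assuming 4 lines each."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def check_equal_record_counts(paths):
    """Warn if the given FASTQ files do not all hold the same number of records."""
    counts = {path: count_fastq_records(path) for path in paths}
    if len(set(counts.values())) > 1:
        warnings.warn(f"query files differ in record count: {counts}")
    return counts
```

Calling `check_equal_record_counts(["read1.fq.gz", "read2.fq.gz"])` before mapping would flag a truncated mate file early, at the cost of one extra decompression pass.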
It might also be useful to abort early if the database is a split one and minimap2 is being executed with piped input. Thanks for the fix.
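To make the suggestion concrete, this is roughly the kind of check I have in mind, written as a Python sketch rather than a patch against minimap2's C code. The names `is_rereadable` and `check_inputs` are invented for illustration, and treating `-` as standard input follows the command line used earlier in this thread.

```python
import os
import stat


def is_rereadable(path):
    """Return True if path is a regular file, i.e. it can be reopened and re-read."""
    if path == "-":
        # "-" means standard input, which can only be consumed once.
        return False
    return stat.S_ISREG(os.stat(path).st_mode)


def check_inputs(query_paths, n_index_parts):
    """Refuse pipe-like query inputs when the index has more than one part."""
    if n_index_parts <= 1:
        return
    bad = [path for path in query_paths if not is_rereadable(path)]
    if bad:
        raise ValueError(
            "multi-part index requires re-readable query files, got: "
            + ", ".join(bad)
        )
```

A named pipe or a `/dev/fd/N` path from process substitution would also fail the regular-file test, which is the desired outcome here since neither can be rewound.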
While running:
and monitoring I/O using pipe viewer (pv -d pid) I noticed a somewhat strange behavior.
db.mmi is read in chunks and for every chunk read1.fq.gz, read2.fq.gz and single.fq.gz are re-read. This is a bit wasteful since it has to decompress the inputs several times, but not necessarily a problem.

However, single.fq.gz is much smaller than the other 2 files. minimap2 seems to stop reading all inputs as soon as it reaches the end of one of them. Because of this, single.fq.gz is read in its entirety while only the first 5-10% of read1.fq.gz and read2.fq.gz seem to be read, at which point minimap2 moves on to the next index chunk.

The documentation doesn't explicitly mention support for single.fq, but minimap2 didn't complain either and a result is still produced.
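The behavior I am seeing looks like what you get when several inputs are read in lockstep, one record from each per round, and reading stops when the shortest input runs out. The toy Python sketch below only models that pattern to show how most of the larger files would go unread; it is a guess at the mechanism, not minimap2's actual code, and the helper names are made up.

```python
def lockstep_fragments(*record_lists):
    """Group one record from each input per round; stop at the shortest input."""
    return list(zip(*record_lists))


def fraction_read(record_lists, n_fragments):
    """Fraction of each input consumed after n_fragments lockstep rounds."""
    return [n_fragments / len(records) for records in record_lists]


if __name__ == "__main__":
    read1 = [f"r1_{i}" for i in range(10)]
    read2 = [f"r2_{i}" for i in range(10)]
    single = ["s_0"]  # much smaller than the paired files

    fragments = lockstep_fragments(read1, read2, single)
    print(fragments)
    print(fraction_read([read1, read2, single], len(fragments)))
```

In this toy case the singleton list is consumed completely while only a tenth of each paired list is touched, which matches the rough proportions I observed with pv.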