state of streaming I/O in khmer, mark 2 #700
Comments
Also see: https://twitter.com/lh3lh3/status/545364380963966977 from @lh3
Updated to point out that screed will also support streaming of zipped sequence files (unlike SeqAn, which doesn't support zipped files at all). Not likely to be of common use, but I wanted to be complete.
zip has now been removed from screed, correct? dib-lab/screed#11
Yep, zip is no more. On the documentation front: should each script's epilog mention whether it supports streaming, or should we leave that for the Sphinx docs only?
On Sun, Dec 21, 2014 at 07:20:55AM -0800, Michael R. Crusoe wrote:
Let's put in comments for the scripts that DON'T support streaming :)
+1 There is also a difference in which compression types are supported with On Sun, Dec 21, 2014, 10:23 C. Titus Brown notifications@github.com wrote:
Let's soft launch the streaming support as it is now and work on improved docs for the next release.
+1
Now that SeqAn has landed, we support reading streamed FASTQ or FASTA through the ReadParser interface, both uncompressed and gzipped. Streaming a bzip2-compressed file through ReadParser does not work natively, though a single file can be piped through a decompressor and into our scripts. The bzip2 issue is fixed in the version of SeqAn currently under development (2.0); see seqan/seqan#707 (comment) for a discussion.
Screed supports streaming from uncompressed FASTQ and FASTA files in dib-lab/screed#11, which is waiting for review and merging. Bzip2-compressed files are also supported in streaming mode, but due to deficiencies in Python 2.x, gzip files are not natively streamable. We could backport code from Python 3.x to work around that: http://bugs.python.org/file15619/gzip_7471_py27.diff
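To illustrate the Python 3.x behavior mentioned above, here is a minimal sketch of reading a possibly compressed stream that cannot seek (a pipe, for example). It is not screed or khmer code: `open_stream` is a hypothetical helper, and it assumes Python 3, where `gzip.GzipFile` and `bz2.BZ2File` accept an already-open binary file object. The format is guessed from the leading magic bytes using `peek`, which does not consume input.

```python
import bz2
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of every gzip member
BZIP2_MAGIC = b"BZh"       # first three bytes of a bzip2 stream


def open_stream(raw):
    """Hypothetical helper: wrap a binary stream in a matching decompressor.

    Works on non-seekable input because the magic bytes are inspected with
    peek() rather than read() followed by seek().
    """
    if isinstance(raw, io.BufferedReader):
        buffered = raw
    else:
        buffered = io.BufferedReader(raw)
    # peek() may return more or fewer bytes than requested; keep at most 3.
    head = buffered.peek(3)[:3]
    if head.startswith(GZIP_MAGIC):
        return gzip.GzipFile(fileobj=buffered, mode="rb")
    if head.startswith(BZIP2_MAGIC):
        return bz2.BZ2File(buffered, mode="rb")
    return buffered  # assume uncompressed data
```

On a real pipe this would be called as `open_stream(sys.stdin.buffer)`. A very short first read from the pipe could defeat the magic-byte check, so treat this as a sketch rather than a robust detector.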
How to stream: specify `/dev/stdin` as the input filename and/or `/dev/stdout` as the output filename. Scripts that use ReadParser can abbreviate `/dev/stdin` as `-`. This could be ported to screed as well as supported as the output filename.

[edited to remove zip archive support as that has been dropped]
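As a rough sketch of what the `-` convention could look like on the Python side, the following treats `-` as standard input or output and parses FASTA one record at a time, so nothing needs to be held in memory or seeked. The names `open_input`, `open_output`, and `read_fasta` are hypothetical and are not part of screed or khmer.

```python
import sys


def open_input(filename):
    """Return a text handle for reading; "-" means standard input."""
    if filename == "-":
        return sys.stdin
    return open(filename, "r")


def open_output(filename):
    """Return a text handle for writing; "-" means standard output."""
    if filename == "-":
        return sys.stdout
    return open(filename, "w")


def read_fasta(handle):
    """Yield (name, sequence) pairs from any iterable of FASTA lines."""
    name, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def copy_records(in_name, out_name):
    """Stream records from in_name to out_name, one sequence per line."""
    out = open_output(out_name)
    for name, seq in read_fasta(open_input(in_name)):
        out.write(">%s\n%s\n" % (name, seq))
```

With these helpers, `copy_records("-", "-")` behaves as a filter in a shell pipeline, and `/dev/stdin` or `/dev/stdout` still work as ordinary filenames on systems that provide them.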