state of streaming I/O in khmer, mark 2 #700

Closed
mr-c opened this issue Dec 17, 2014 · 9 comments · Fixed by #1186

Comments

@mr-c
Contributor

mr-c commented Dec 17, 2014

Now that SeqAn has landed, we support reading from a streamed FASTQ or FASTA file through the ReadParser interface, both uncompressed and gzipped. Streaming a bzip2-compressed file through ReadParser does not work natively, though a single file can be piped through a decompressor and into our scripts. The bzip2 issue is fixed in the version of SeqAn currently under development (2.0); see seqan/seqan#707 (comment) for a discussion.

Screed supports streaming from uncompressed FASTQ and FASTA files in dib-lab/screed#11, which is waiting for review and merging. Bzip2-compressed files are also supported in streaming mode, but due to deficiencies in Python 2.x, gzip files are not natively streamable. We could backport code from Python 3.x to work around that: http://bugs.python.org/file15619/gzip_7471_py27.diff
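
For illustration only, here is a minimal sketch of one way to decompress gzip data from a stream without seeking on the input. It uses the standard zlib module directly, which is a different technique from the backport patch linked above, and it is not screed's code. The function name iter_gunzip and the chunk size are assumptions made for this example.

```python
import zlib


def iter_gunzip(stream, chunk_size=64 * 1024):
    """Yield decompressed byte chunks from a gzip stream.

    Hypothetical helper, not part of screed or khmer. It only calls
    stream.read(), so the input does not need to be seekable.
    Limitation: only the first gzip member is decoded; concatenated
    multi-member files would need extra handling.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data = decomp.decompress(chunk)
        if data:
            yield data
    # Emit anything still buffered inside the decompressor.
    tail = decomp.flush()
    if tail:
        yield tail


# Possible use on Python 3 (sys.stdin.buffer is the binary stdin stream):
#   import sys
#   for block in iter_gunzip(sys.stdin.buffer):
#       ...  # hand the bytes to a FASTA/FASTQ parser
```

Because the helper works on raw bytes, a sequence parser would still need to sit on top of it; that part is left out here.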

How to stream: specify /dev/stdin as the input filename and/or /dev/stdout as the output filename.

Scripts that use ReadParser can abbreviate /dev/stdin as -. This abbreviation could be ported to screed, and could also be supported for the output filename.
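
As a rough sketch of the - convention described above, the helper below maps - to /dev/stdin for input and, as proposed, to /dev/stdout for output. The name resolve_stream_name and its for_output parameter are hypothetical and are not part of khmer, ReadParser, or screed.

```python
def resolve_stream_name(filename, for_output=False):
    """Expand the '-' shorthand into the matching device path.

    Hypothetical helper for illustration; any other filename is
    returned unchanged.
    """
    if filename == "-":
        return "/dev/stdout" if for_output else "/dev/stdin"
    return filename


# Example: a script could normalize its arguments before opening them.
#   infile = resolve_stream_name(args.input)
#   outfile = resolve_stream_name(args.output, for_output=True)
```

Keeping the mapping in one small function would let both the input and output paths share the same rule.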

[edited to remove zip archive support as that has been dropped]

@mr-c
Contributor Author

mr-c commented Dec 17, 2014

This update was requested by @ctb and is a follow-up to #654.

@ctb
Member

ctb commented Dec 18, 2014

@mr-c
Contributor Author

mr-c commented Dec 18, 2014

Updated to point out that screed will also support streaming of zip'd sequence files (unlike SeqAn, which doesn't support zipped files at all). This is not likely to be commonly used, but I wanted to be complete.

@ctb
Member

ctb commented Dec 21, 2014

zip has now been removed from screed, correct? dib-lab/screed#11

@mr-c
Contributor Author

mr-c commented Dec 21, 2014

Yep, zip is no more.

On the documentation front: should we mention in the epilog for each script their support for streaming, or leave that for the Sphinx docs only?

@ctb
Member

ctb commented Dec 21, 2014

On Sun, Dec 21, 2014 at 07:20:55AM -0800, Michael R. Crusoe wrote:

Yep, zip is no more.

On the documentation front: should we mention in the epilog for each script their support for streaming, or leave that for the Sphinx docs only?

Let's put in comments for the scripts that DON'T support streaming :)

@mr-c
Contributor Author

mr-c commented Dec 21, 2014

+1

The compression types supported for streaming also differ depending on
whether a script is Screed-based or ReadParser-based. This may call for a
brief writeup in the docs to explain and demonstrate the differences
(including workarounds that pipe a decompressor's output into our
scripts).
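
To make the decompressor-pipe workaround concrete, here is a small sketch of the decompressor side written in Python with the standard bz2 module. It reads bzip2 data from one stream and writes the plain bytes to another, so its output could be piped into a script that reads /dev/stdin. The function name bunzip2_stream is an assumption for this example and is not part of khmer or screed; the bzip2 command-line tool would fill the same role.

```python
import bz2


def bunzip2_stream(src, dst, chunk_size=64 * 1024):
    """Copy bzip2-compressed bytes from src to dst, decompressed.

    Hypothetical helper for illustration. Only read() and write() are
    used, so neither stream has to be seekable. Concatenated bzip2
    streams are handled by starting a fresh decompressor whenever the
    current one reaches its end-of-stream marker.
    """
    decomp = bz2.BZ2Decompressor()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        while chunk:
            dst.write(decomp.decompress(chunk))
            if decomp.eof:
                # Leftover bytes belong to the next bzip2 stream, if any.
                chunk = decomp.unused_data
                decomp = bz2.BZ2Decompressor()
            else:
                chunk = b""


# Possible use on Python 3, writing to binary stdout so the result can
# be piped into a script that is given /dev/stdin as its input filename:
#   import sys
#   with open(sys.argv[1], "rb") as src:
#       bunzip2_stream(src, sys.stdout.buffer)
```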


@mr-c
Contributor Author

mr-c commented Dec 21, 2014

Let's soft launch the streaming support as it is now and work on improved docs for the next release.

@ctb
Member

ctb commented Dec 21, 2014

+1

@ctb changed the title from "state of streaming in khmer" to "state of streaming I/O in khmer" on Jan 18, 2015
@ctb changed the title from "state of streaming I/O in khmer" to "state of streaming I/O in khmer, mark 2" on Jan 18, 2015