Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BasicFastqWriter.write() flushes every call and writes inconsistent newlines #1105

Open
d-cameron opened this issue Apr 16, 2018 · 2 comments

Comments

@d-cameron
Copy link
Contributor

The writer.checkError() in BasicFastqWriter.write() actually causes the underlying PrintStream to flush thus resulting in terrible fastq write performance. This performance hit can be avoided by directly using the underlying OutputStream instead of wrapping with a PrintStream that swallows the IOException that needs to be raised in BasicFastqWriter.write().

Additionally, the current implementation writes inconsisent newlines on Windows systems. The call to println() writes CRLF which results in a FASTQ file with \n within each 4-line record, and \r\n between records.

I've written an implementation that fixes the above two issues at https://github.com/PapenfussLab/gridss/tree/dev/src/main/java/htsjdk/samtools/fastq/NonFlushingBasicFastqWriter.java. Let me know if you want me to adapt this into a PR for htsjdk.

@magicDGS
Copy link
Member

I am using BasicFastqWriter in downstream and improving performance sounds really good. But I would recommend to implement it in a different way:

  • FastqEncoder using a OutputStream to encode. This will allow to re-use the code
  • Support for java.nio.Path to allow other filesystems (e.g., HDFS/GCS)
  • If maintainers allow a change in the contract, do not wrap IOExceptions with SAMException, to get clearer stack-traces, messages and causes of the errors (quite important when debugging other filesystems)

@james-d
Copy link
Contributor

james-d commented Apr 24, 2018

I'm fairly new to using this library, but it seems like the more appropriate underlying object to use would be a Writer, rather than either a PrintStream or OutputStream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants