The call to writer.checkError() in BasicFastqWriter.write() flushes the underlying PrintStream, which results in terrible FASTQ write performance. This performance hit can be avoided by writing directly to the underlying OutputStream instead of wrapping it in a PrintStream, which swallows the IOException that needs to be raised in BasicFastqWriter.write().
Additionally, the current implementation writes inconsistent newlines on Windows systems. The call to println() writes CRLF, which results in a FASTQ file with \n within each 4-line record and \r\n between records.
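A minimal sketch of the approach described above, assuming records can be represented as plain strings. The class name PlainFastqWriter and its method signatures are hypothetical and are not part of htsjdk; the sketch only illustrates writing through a buffered OutputStream with an explicit \n terminator, so that no per-record flush occurs and any IOException reaches the caller.

```java
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch, not htsjdk's BasicFastqWriter. */
final class PlainFastqWriter implements Closeable {
    private final OutputStream out;

    PlainFastqWriter(OutputStream out) {
        // Buffer the stream so each record does not trigger a write to the sink.
        this.out = new BufferedOutputStream(out);
    }

    /** Writes one 4-line record; every line ends with '\n' on all platforms. */
    void write(String name, String bases, String qualities) throws IOException {
        String record = "@" + name + "\n" + bases + "\n+\n" + qualities + "\n";
        // No PrintStream and no checkError(): an IOException propagates directly.
        out.write(record.getBytes(StandardCharsets.US_ASCII));
    }

    @Override
    public void close() throws IOException {
        // Closing the BufferedOutputStream flushes it once, at the end.
        out.close();
    }
}
```

Because the line terminator is hard-coded rather than taken from the platform line separator, the output is byte-identical on Windows and Unix-like systems.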
I am using BasicFastqWriter downstream, and improving its performance sounds really good. But I would recommend implementing it in a different way:
FastqEncoder using an OutputStream to encode. This will allow the code to be reused.
Support for java.nio.file.Path to allow other filesystems (e.g., HDFS/GCS)
If the maintainers allow a change in the contract, do not wrap IOExceptions in SAMException, to get clearer stack traces, messages, and causes of errors (quite important when debugging other filesystems)
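The three suggestions above could be combined roughly as follows. This is only a sketch under stated assumptions: the class FastqEncodingSketch and its methods encode and open are hypothetical names, records are passed as plain strings rather than htsjdk's record type, and nothing here reflects the actual FastqEncoder API.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of a reusable, stream-based FASTQ encoder. */
final class FastqEncodingSketch {
    private FastqEncodingSketch() {}

    /** Encodes one record to any OutputStream; IOException is not wrapped. */
    static void encode(OutputStream out, String name, String bases, String qualities)
            throws IOException {
        String record = "@" + name + "\n" + bases + "\n+\n" + qualities + "\n";
        out.write(record.getBytes(StandardCharsets.US_ASCII));
    }

    /**
     * Opens a buffered stream for a java.nio.file.Path. Files.newOutputStream
     * delegates to the Path's filesystem provider, so a non-default
     * filesystem works if a provider for it is installed.
     */
    static OutputStream open(Path path) throws IOException {
        return new BufferedOutputStream(Files.newOutputStream(path));
    }
}
```

A caller would use try-with-resources around open(path) and call encode once per record, so a failure on a remote filesystem surfaces as the original IOException with its own message and cause.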
I'm fairly new to using this library, but it seems like the more appropriate underlying object to use would be a Writer, rather than either a PrintStream or OutputStream.
I've written an implementation that fixes the above two issues at https://github.com/PapenfussLab/gridss/tree/dev/src/main/java/htsjdk/samtools/fastq/NonFlushingBasicFastqWriter.java. Let me know if you want me to adapt this into a PR for htsjdk.