Compress temporary files #22

Closed
luispedro opened this issue Mar 27, 2017 · 3 comments

Comments

@luispedro
Member

commented Mar 27, 2017

Temporary files should be compressed with a fast codec such as lz4 or zstandard, so that saving disk space does not come at a noticeable cost in speed.

@unode

Member

commented Mar 19, 2018

Initial steps to support lz4 in https://github.com/unode/conduit-algorithms/tree/lz4.
Currently waiting on bigmac2k/lz4-conduit#2 for conduit-1.3.0 compatibility.

@luispedro

Member Author

commented Sep 27, 2018

Somewhat related: conduit-zstd is now on Stackage, so it should be available in the next major version: commercialhaskell/stackage#3993

@luispedro

Member Author

commented Jan 12, 2019

Another step: conduit-algorithms now supports zstd, which should make it into the next stackage LTS in a few days to a week. Then, it'll be trivial to finally close this issue: luispedro/conduit-algorithms@72b27f9

I also ran into this issue myself this week, while profiling very large samples, which were failing due to running out of temporary disk space for the large SAM files.

Finally, functions like count quickly become I/O bound if you give them a handful of threads, so it's even possible that compressing and decompressing with zstd will make the process faster.
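
To illustrate the idea discussed in this thread (this is not the code from the actual fix), here is a minimal Python sketch of writing an intermediate file through a streaming compressor and reading it back without ever storing the uncompressed data on disk. Everything in it is an assumption made for illustration: the helper names `write_compressed_temp`, `read_compressed_temp` and `demo` are hypothetical, the SAM-like records are synthetic, and `gzip` at a low compression level from the standard library stands in for zstd or lz4.

```python
import gzip
import os
import tempfile


def write_compressed_temp(lines, directory=None, level=1):
    """Write text lines to a gzip-compressed temporary file and return its path.

    gzip at a low level is a stand-in here for a fast codec such as zstd or lz4.
    """
    fd, path = tempfile.mkstemp(suffix=".sam.gz", dir=directory)
    os.close(fd)  # the file is reopened through gzip below
    with gzip.open(path, "wt", compresslevel=level, encoding="ascii") as out:
        for line in lines:
            out.write(line + "\n")
    return path


def read_compressed_temp(path):
    """Stream lines back; the uncompressed data never touches the disk."""
    with gzip.open(path, "rt", encoding="ascii") as inp:
        for line in inp:
            yield line.rstrip("\n")


def demo():
    # Synthetic, highly repetitive SAM-like records (hypothetical, not real data).
    records = [
        f"read{i}\t0\tref1\t{i + 1}\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII"
        for i in range(10000)
    ]
    path = write_compressed_temp(records)
    try:
        raw_size = sum(len(r) + 1 for r in records)  # bytes if stored uncompressed
        disk_size = os.path.getsize(path)            # bytes actually on disk
        restored = list(read_compressed_temp(path))
    finally:
        os.remove(path)
    return raw_size, disk_size, restored == records
```

Calling `demo()` returns the uncompressed size, the on-disk size, and whether the round trip preserved the records. When a step is I/O bound, as described above for `count`, writing and reading fewer bytes is what could offset the CPU time spent on compression; whether it does in practice depends on the codec, the data, and the storage.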

@luispedro luispedro closed this in 8331066 Jan 16, 2019

luispedro added a commit that referenced this issue Feb 22, 2019
RLS Release 0.11.0
A collection of several bugfixes and performance improvements over the
last few months.

Full ChangeLog:

    * Switch to diagrams package for plotting
    * Update minimap2 version to 2.14
    * Module samtools (version 0.1) now includes samtools_view
    * Update to LTS-13 (GHC 8.6)
    * Fix bug with orf_find & prots_out argument
    * Call bwa/minimap2 with interleaved fastq files
    * Add --verbose flag to check-install mode
    * Avoid leaving open file descriptors after FastQ encoding detection
    * Fix bug in garbage collection
    * Compress intermediate SAM files (#22)
    * Tar extraction uses much less memory (#77)
    * Add early checks for input files in more situations (#33)
    * Support compression in collect() output (#42)
    * Fix CIGAR (#92) for select() blocks
luispedro added a commit that referenced this issue Mar 15, 2019
RLS Release 0.11.0