omics-rust/rsomics-bgzip

rsomics-bgzip

Block-compress or decompress a file in BGZF — Rust port of htslib bgzip.

```sh
rsomics-bgzip ref.fa                     # → ref.fa.gz, removes ref.fa
rsomics-bgzip -d ref.fa.gz               # → ref.fa, removes ref.fa.gz
rsomics-bgzip -c ref.fa > ref.fa.gz      # compress to stdout, keep ref.fa
rsomics-bgzip -d -c ref.fa.gz            # decompress to stdout
rsomics-bgzip --test ref.fa.gz           # verify integrity
cat ref.fa | rsomics-bgzip > ref.fa.gz   # stdin → stdout
```

BGZF is a concatenation of independent gzip blocks, each at most 64 KiB, so the result is both a valid gzip file and randomly seekable. The output ends with the canonical 28-byte BGZF EOF marker. The framing is byte-compatible with htslib, so samtools, tabix, bgzip, and any other BGZF reader accept the result.
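The block framing described above can be sketched with the standard library alone. This is a minimal illustration that assumes the header layout from SAMv1 §4.1; the names `BGZF_EOF`, `bgzf_block_len`, `bgzf_block_lens`, and `ends_with_bgzf_eof` are hypothetical and are not taken from this crate. It only walks block headers and does not inflate or checksum anything.

```rust
/// The 28-byte empty BGZF block that terminates a well-formed file (SAMv1 §4.1).
pub const BGZF_EOF: [u8; 28] = [
    0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x06, 0x00, 0x42, 0x43,
    0x02, 0x00, 0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
];

/// Total size in bytes of the BGZF block that starts at `block[0]`, or `None`
/// if the bytes do not begin with a BGZF gzip header.
pub fn bgzf_block_len(block: &[u8]) -> Option<usize> {
    // Fixed gzip header: magic 1f 8b, CM = 8 (DEFLATE), FLG with FEXTRA (bit 2) set.
    if block.len() < 12
        || block[0] != 0x1f
        || block[1] != 0x8b
        || block[2] != 8
        || (block[3] & 4) == 0
    {
        return None;
    }
    let xlen = u16::from_le_bytes([block[10], block[11]]) as usize;
    let extra = block.get(12..12 + xlen)?;
    // Walk the extra subfields looking for SI1 = 'B', SI2 = 'C', SLEN = 2.
    let mut i = 0;
    while i + 4 <= extra.len() {
        let slen = u16::from_le_bytes([extra[i + 2], extra[i + 3]]) as usize;
        if extra[i] == b'B' && extra[i + 1] == b'C' && slen == 2 {
            let bsize = extra.get(i + 4..i + 6)?;
            // BSIZE stores the total block size minus one.
            return Some(u16::from_le_bytes([bsize[0], bsize[1]]) as usize + 1);
        }
        i += 4 + slen;
    }
    None
}

/// Split a BGZF byte stream into its block lengths. Returns `None` on a
/// malformed header or a block that runs past the end of the data.
pub fn bgzf_block_lens(mut data: &[u8]) -> Option<Vec<usize>> {
    let mut lens = Vec::new();
    while !data.is_empty() {
        let len = bgzf_block_len(data)?;
        data = data.get(len..)?;
        lens.push(len);
    }
    Some(lens)
}

/// True when the stream ends with the canonical EOF marker.
pub fn ends_with_bgzf_eof(data: &[u8]) -> bool {
    data.ends_with(&BGZF_EOF)
}
```

Because every block records its own length in the `BC` extra subfield, a reader can hop from header to header without inflating anything, which is what makes the format seekable.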

Options

| Flag | Meaning |
| --- | --- |
| -d, --decompress | Decompress instead of compress. |
| --test | Verify integrity (decompress, discard). htslib spells this -t; -t is --threads here, so it is long-only. |
| -c, --stdout | Write to stdout, keeping the input file. |
| -k, --keep | Keep (do not delete) the input file. |
| -f, --force | Overwrite the output file if it exists. |
| -l, --compress-level INT | DEFLATE level 0-9 (default 6). |
| -t, --threads INT | Worker threads (rsomics convention; htslib spells this -@). |

When given a file argument without -c, the input is removed on success unless -k is passed, matching htslib. With no file argument (or with -), it reads stdin and writes stdout.
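The naming and clean-up rules above can be written down as two small pure functions. This is a sketch under stated assumptions, not the crate's code: `output_name` and `removes_input` are hypothetical names, the suffix list is the one this README gives for -d (.gz, .bgz, .bgzf), and returning `None` for an unrecognised suffix is this sketch's choice rather than documented behaviour.

```rust
/// Suffixes that `-d` strips, as listed in this README.
const BGZF_SUFFIXES: [&str; 3] = [".gz", ".bgz", ".bgzf"];

/// Output file name for an in-place run on `input`.
/// Compression appends `.gz`; decompression strips a known suffix.
/// `None` means the sketch cannot derive a name (assumption: treated as an error).
pub fn output_name(input: &str, decompress: bool) -> Option<String> {
    if !decompress {
        return Some(format!("{input}.gz"));
    }
    BGZF_SUFFIXES.iter().find_map(|suffix| {
        input
            .strip_suffix(*suffix)
            .filter(|stem| !stem.is_empty()) // refuse a bare ".gz"
            .map(str::to_owned)
    })
}

/// Whether the input file is deleted after a successful run:
/// only for a real file argument, without -c and without -k.
pub fn removes_input(has_file_arg: bool, to_stdout: bool, keep: bool) -> bool {
    has_file_arg && !to_stdout && !keep
}
```

Keeping these decisions free of I/O makes the flag interactions easy to unit-test before any file is touched.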

Scope

This crate covers only the compress/decompress operation. GZI-indexed random access (bgzip -b/-s/-I), reindexing (-r), and rebgzip (-g) form a distinct index-building operation and live in their own crate, per the one-operation-per-crate rule.

How it is fast

Compression is the hot path. Each block (up to 64 KiB) deflates independently, so the work parallelises across -t workers with no inter-block dependency; the writer thread frames and emits blocks in input order while the worker pool keeps deflating. The DEFLATE backend is libdeflate (the same library htslib bgzip uses when built with libdeflate support), so the per-byte deflate cost matches upstream and the win comes from the pipeline overlap and a zero-copy io::copy feed. Single-threaded (-t1) runs compete directly against bgzip -@1; multi-threaded runs scale past it.
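The pipeline shape described above, independent blocks handed to a worker pool with a single writer restoring input order, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's implementation: `process_blocks_in_order` is a hypothetical name, and the `per_block` closure stands in for the libdeflate call plus BGZF framing, since std has no DEFLATE encoder.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc;
use std::thread;

/// Run `per_block` over fixed-size chunks of `input` on `workers` threads and
/// concatenate the results in input order.
pub fn process_blocks_in_order<F>(
    input: &[u8],
    block_size: usize,
    workers: usize,
    per_block: F,
) -> Vec<u8>
where
    F: Fn(&[u8]) -> Vec<u8> + Sync,
{
    let chunks: Vec<&[u8]> = input.chunks(block_size.max(1)).collect();
    let next = AtomicUsize::new(0); // index of the next unclaimed block
    let (tx, rx) = mpsc::channel::<(usize, Vec<u8>)>();
    let mut out = Vec::new();

    thread::scope(|s| {
        for _ in 0..workers.max(1) {
            let tx = tx.clone();
            let (chunks, next, per_block) = (&chunks, &next, &per_block);
            s.spawn(move || loop {
                // Claim a block; blocks share no state, so no further locking is needed.
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= chunks.len() {
                    break;
                }
                if tx.send((i, per_block(chunks[i]))).is_err() {
                    break;
                }
            });
        }
        drop(tx); // the receive loop ends once every worker has finished

        // Writer: results may arrive out of order, so park them until their turn.
        let mut pending = BTreeMap::new();
        let mut want = 0usize;
        for (i, block) in rx {
            pending.insert(i, block);
            while let Some(ready) = pending.remove(&want) {
                out.extend_from_slice(&ready);
                want += 1;
            }
        }
    });
    out
}
```

The writer runs on the calling thread while workers are still busy, so emitting finished blocks overlaps with processing later ones. A production version would also bound the number of in-flight blocks to cap memory, which this sketch leaves out.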

Origin

This crate is an independent Rust reimplementation of htslib bgzip, informed by the upstream MIT-licensed source (bgzip.c): the .gz/.bgz/.bgzf suffix handling for -d, the in-place compress-then-unlink default, the -c/-k/-f semantics, and the 28-byte BGZF EOF marker. BGZF block framing follows the SAMv1 spec (§4.1).

License: MIT OR Apache-2.0. Upstream credit: htslib (MIT/Expat).
