Block-compress or decompress a file in BGZF — Rust port of htslib bgzip.
rsomics-bgzip ref.fa # → ref.fa.gz, removes ref.fa
rsomics-bgzip -d ref.fa.gz # → ref.fa, removes ref.fa.gz
rsomics-bgzip -c ref.fa > ref.fa.gz # compress to stdout, keep ref.fa
rsomics-bgzip -d -c ref.fa.gz # decompress to stdout
rsomics-bgzip --test ref.fa.gz # verify integrity
cat ref.fa | rsomics-bgzip > ref.fa.gz # stdin → stdout

BGZF is a concatenation of independent gzip blocks of at most 64 KiB each, so
it is both a valid gzip file and randomly seekable. The output ends with the
canonical 28-byte BGZF EOF marker; the framing is byte-compatible with htslib,
so samtools, tabix, bgzip, and any other BGZF reader accept the result.
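
The block framing described above can be sketched in a few lines of Rust. This is an illustrative reader written against the SAMv1 BGZF layout, not this crate's code: the names `block_size`, `block_offsets`, and `ends_with_eof_marker` are hypothetical. It reads the `BC` extra subfield of each gzip header to find the block length, which is what makes block boundaries discoverable without inflating anything.

```rust
/// The canonical 28-byte BGZF EOF marker: an empty BGZF block.
const BGZF_EOF: [u8; 28] = [
    0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x06, 0x00,
    0x42, 0x43, 0x02, 0x00, 0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00,
];

/// Total size in bytes of the BGZF block starting at `buf[0]`,
/// or `None` if the header is not a BGZF header.
/// (Hypothetical helper for this sketch.)
fn block_size(buf: &[u8]) -> Option<usize> {
    // Fixed gzip header: magic 1f 8b, CM = 8 (deflate), FLG = 4 (FEXTRA).
    if buf.len() < 12 || !buf.starts_with(&[0x1f, 0x8b, 0x08, 0x04]) {
        return None;
    }
    // XLEN sits at offset 10; the extra field follows at offset 12.
    let xlen = u16::from_le_bytes([buf[10], buf[11]]) as usize;
    let extra = buf.get(12..12 + xlen)?;
    let mut i = 0;
    while i + 4 <= extra.len() {
        let slen = u16::from_le_bytes([extra[i + 2], extra[i + 3]]) as usize;
        // The BGZF subfield is tagged 'B','C' and carries a 2-byte BSIZE.
        if extra[i] == b'B' && extra[i + 1] == b'C' && slen == 2 {
            let lo = *extra.get(i + 4)?;
            let hi = *extra.get(i + 5)?;
            // BSIZE stores the total block size minus one.
            return Some(u16::from_le_bytes([lo, hi]) as usize + 1);
        }
        i += 4 + slen;
    }
    None
}

/// Byte offset of every block in `data`, or `None` if the stream is
/// malformed or truncated.
fn block_offsets(data: &[u8]) -> Option<Vec<usize>> {
    let mut offsets = Vec::new();
    let mut pos = 0;
    while pos < data.len() {
        let size = block_size(&data[pos..])?;
        if pos + size > data.len() {
            return None; // last block is cut short
        }
        offsets.push(pos);
        pos += size;
    }
    Some(offsets)
}

/// True if `data` ends with the canonical EOF marker.
fn ends_with_eof_marker(data: &[u8]) -> bool {
    data.ends_with(&BGZF_EOF)
}
```

Because BSIZE is a 16-bit field holding the size minus one, no block can exceed 65536 bytes, which is where the 64 KiB ceiling comes from.
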

| Flag | Meaning |
|---|---|
| -d, --decompress | Decompress instead of compress. |
| --test | Verify integrity (decompress, discard). htslib spells this -t; -t is --threads here, so it is long-only. |
| -c, --stdout | Write to stdout, keeping the input file. |
| -k, --keep | Keep (do not delete) the input file. |
| -f, --force | Overwrite the output file if it exists. |
| -l, --compress-level INT | DEFLATE level 0-9 (default 6). |
| -t, --threads INT | Worker threads (rsomics convention; htslib spells this -@). |

When given a file without -c, the input is removed on success unless -k is
passed, matching htslib. With no file argument (or -), it reads stdin and
writes stdout.

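
The rule above fits in one small decision function. The following is a minimal sketch of that rule as stated in this README, not the crate's actual argument handling; `Sink`, `Plan`, and `plan` are hypothetical names introduced for illustration.

```rust
/// Where the output goes. (Hypothetical type for this sketch.)
#[derive(Debug, PartialEq)]
enum Sink {
    Stdout,
    /// A file written next to the input (e.g. ref.fa -> ref.fa.gz).
    SiblingFile,
}

/// Outcome of the flag combination. (Hypothetical type for this sketch.)
#[derive(Debug, PartialEq)]
struct Plan {
    sink: Sink,
    /// Whether the input file is deleted after a successful run.
    remove_input: bool,
}

/// `file` is the positional argument, `to_stdout` models -c, `keep` models -k.
fn plan(file: Option<&str>, to_stdout: bool, keep: bool) -> Plan {
    // No file argument, or "-", means stdin -> stdout; nothing to remove.
    let reads_stdin = matches!(file, None | Some("-"));
    if reads_stdin || to_stdout {
        Plan { sink: Sink::Stdout, remove_input: false }
    } else {
        Plan { sink: Sink::SiblingFile, remove_input: !keep }
    }
}
```

For example, `plan(Some("ref.fa"), false, false)` yields a sibling file with the input removed, while adding -k or -c leaves the input in place.
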
This crate covers only the compress/decompress operation. BGZF GZI random
access (bgzip -b/-s/-I), reindex (-r), and rebgzip (-g) form a distinct
index-building operation and live in their own crate, per the
one-operation-per-crate rule.

Compression is the hot path. Each block (at most 64 KiB) deflates
independently, so the work parallelises across -t workers with no inter-block
dependency; the writer thread frames and emits blocks while the worker pool
keeps deflating. The DEFLATE backend is libdeflate (the same library htslib
bgzip uses when built with libdeflate support), so the per-byte deflate cost
matches upstream and the win comes from the pipeline overlap and a zero-copy
io::copy feed. Single-thread (-t1) competes directly against bgzip -@1;
multi-thread scales past it.

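
The worker-pool-plus-writer shape described above can be sketched with the standard library alone. This is an illustration of the pattern under stated assumptions, not the crate's implementation: `compress_block` is a hypothetical stand-in that copies bytes where the real code would call DEFLATE and add BGZF framing, `BLOCK_PAYLOAD` is an assumed chunk size, and the unbounded channels ignore the memory limits a production pipeline would need.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Uncompressed bytes per block. An assumption for this sketch, chosen just
/// below 64 KiB so a framed block can stay within the BGZF limit.
const BLOCK_PAYLOAD: usize = 0xff00;

/// Hypothetical stand-in for "deflate + frame one block".
fn compress_block(raw: &[u8]) -> Vec<u8> {
    raw.to_vec()
}

fn compress_parallel(input: &[u8], threads: usize) -> Vec<u8> {
    let threads = threads.max(1);
    let (job_tx, job_rx) = mpsc::channel::<(usize, Vec<u8>)>();
    let (out_tx, out_rx) = mpsc::channel::<(usize, Vec<u8>)>();
    // std's Receiver is single-consumer, so workers share it behind a mutex.
    let job_rx = Arc::new(Mutex::new(job_rx));

    let workers: Vec<_> = (0..threads)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let out_tx = out_tx.clone();
            thread::spawn(move || loop {
                // The lock guard is dropped at the end of this statement,
                // so compression below runs without holding the lock.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok((idx, raw)) => {
                        out_tx.send((idx, compress_block(&raw))).unwrap()
                    }
                    Err(_) => break, // job queue closed and drained
                }
            })
        })
        .collect();
    drop(out_tx); // only the workers hold senders now

    // Feed: blocks are independent, so each chunk is a self-contained job.
    for (idx, chunk) in input.chunks(BLOCK_PAYLOAD).enumerate() {
        job_tx.send((idx, chunk.to_vec())).unwrap();
    }
    drop(job_tx);

    // Writer: blocks may finish out of order; emit them in index order.
    let mut pending = BTreeMap::new();
    let mut next = 0usize;
    let mut out = Vec::new();
    for (idx, block) in out_rx {
        pending.insert(idx, block);
        while let Some(ready) = pending.remove(&next) {
            out.extend_from_slice(&ready);
            next += 1;
        }
    }
    for w in workers {
        w.join().unwrap();
    }
    out
}
```

The reorder buffer is the only coordination the writer needs: because no block depends on another, workers never wait on each other, only on the job queue.
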
This crate is an independent Rust reimplementation of htslib bgzip, informed
by the upstream MIT-licensed source (bgzip.c): the .gz/.bgz/.bgzf suffix
handling for -d, the in-place compress-then-unlink default, the -c/-k/-f
semantics, and the 28-byte BGZF EOF marker. BGZF block framing follows the
SAMv1 spec (§4.1).
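
As a rough illustration of the suffix handling mentioned above, the sketch below maps input names to output names. It assumes that compression appends `.gz` (as in the usage examples) and that decompression strips one of the three listed suffixes; the function names are hypothetical, and the behaviour for a name without a recognised suffix is left to the caller because this README does not specify it.

```rust
/// Suffixes recognised when decompressing, as listed in this README.
const SUFFIXES: [&str; 3] = [".gz", ".bgz", ".bgzf"];

/// Output name for `-d`: the input with a recognised suffix removed.
/// Returns `None` when no suffix matches or nothing would remain.
/// (Hypothetical helper for this sketch.)
fn decompressed_name(input: &str) -> Option<&str> {
    SUFFIXES
        .iter()
        .find_map(|suffix| input.strip_suffix(*suffix))
        .filter(|stem| !stem.is_empty())
}

/// Output name for compression: the input with `.gz` appended.
/// (Hypothetical helper for this sketch.)
fn compressed_name(input: &str) -> String {
    format!("{}.gz", input)
}
```

Under these assumptions, `decompressed_name("ref.fa.gz")` gives `Some("ref.fa")` and `compressed_name("ref.fa")` gives `"ref.fa.gz"`, so the two directions round-trip for the default suffix.
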
License: MIT OR Apache-2.0. Upstream credit: htslib (MIT/Expat).