Releases: kowallus/mbgc

mbgc v2.0.1

07 Dec 08:01

MBGC 2.0.1: Copyright (c) 2023 Tomasz Kowalski, Szymon Grabowski : 2023-12-06

What's new in this revision:

  • macOS and ARM builds are now supported
  • faster decompression when using the option to decompress to gz archives

NOTE: macOS binaries are unsigned. To run them, you need to open them using the Finder application, following the steps described
here. This procedure needs to be repeated twice, because the binaries are bundled with the OpenMP dynamic library.

mbgc v2.0

31 Oct 14:13

MBGC 2.0: Copyright (c) 2023 Tomasz Kowalski, Szymon Grabowski : 2023-10-31

MBGC: Multiple Bacteria Genome Compressor is a tool for compressing genomes in FASTA (or gzipped FASTA) input format.

What's new in this version (major changes):

  • improved compression ratio by 14% (without sacrificing speed)
  • roughly 2x faster decompression (due to multithreaded decoding)
  • smaller memory footprint (~10% less for compression and ~18% less for decompression)
  • preserved DNA line length in "well-formed" FASTA*
  • decompression to gz archives option
  • archive appending and repacking
  • more flexible partial decompression (using list of patterns)
  • listing contents (filenames & headers)
  • no external dependencies (integrated libdeflate library)
  • redesigned repo compression mode (ratio only ~2% worse than the MBGC 2 max mode, in ~40% less time)
  • fixed issues concerning extra large collections reported in 'Sebastian Deorowicz, Agnieszka Danek, Heng Li, AGC: compact representation of assembled genomes with fast queries and updates, Bioinformatics, Volume 39, Issue 3, March 2023, btad097, https://doi.org/10.1093/bioinformatics/btad097'

MBGC 2 is backward compatible and can handle decompression and repacking of MBGC 1 archives.

Note: The performance results are average values (given relative to MBGC 1.2.2) from experiments performed on the bacteria collections used in 'Szymon Grabowski, Tomasz M Kowalski, MBGC: Multiple Bacteria Genome Compressor, GigaScience, Volume 11, 2022, giab099, https://doi.org/10.1093/gigascience/giab099'
Test platform: Intel Core i9-10940X (14 cores) 3.3 GHz CPU, 128 GB of DDR4-RAM (2666 MHz, CL 16) and a fast SSD (ADATA 2 TB M.2 PCIe NVMe XPG SX8200 Pro), running Linux (Debian 11) OS.

*The term "well-formed" FASTA follows the definition in the Nucleotide Archival Format specification (cf. section 2.6, Line length).
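
As an illustration of the line-length property mentioned above, here is a minimal Python sketch. It is not taken from MBGC or from the Nucleotide Archival Format specification. It assumes that "well-formed" means every sequence line has the same length, except that the last line of each record may be shorter; that reading, the choice to ignore blank lines, and the function names are assumptions made for this example.

```python
from typing import Iterable, List


def record_line_lengths(lines: Iterable[str]) -> List[List[int]]:
    """Group sequence-line lengths by FASTA record (hypothetical helper)."""
    records: List[List[int]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            records.append([])              # a header starts a new record
        elif line and records:
            records[-1].append(len(line))   # blank lines are ignored (assumption)
    return records


def is_well_formed(lines: Iterable[str]) -> bool:
    """True if all sequence lines share one length, except that the last
    line of each record may be shorter (assumed definition)."""
    records = record_line_lengths(lines)
    # The shared line length is taken to be the longest line in the file.
    width = max((n for rec in records for n in rec), default=0)
    # Every line but the last of each record must have exactly that length.
    return all(n == width for rec in records for n in rec[:-1])
```

Under this assumed definition, a single line length is enough to restore the original line breaks of a whole file, which is presumably why the property matters for lossless decompression.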

mbgc v1.2.2 Palindromic Edition

22 Feb 13:23

MBGC 1.2.2 (Palindromic Edition): Copyright (c) 2022 Szymon Grabowski, Tomasz Kowalski : 22-02-2022

MBGC: Multiple Bacteria Genome Compressor is a tool for compressing genomes in FASTA (or gzipped FASTA) input format.

What's new in this revision:

  • fixes for MS Windows builds (thanks to Eugene Shelwien for reporting the bug)
  • better ratio in the default mode (mainly for heterogeneous datasets)
  • reduced memory usage (mainly for larger datasets or larger FASTA files in the collection)
  • higher reference buffer size limit (2^40 bytes, i.e. 1 TiB) in the default mode
  • improved memory management during encoding (to enable compression with less RAM)
  • improved error messaging

mbgc v1.2.1

02 Dec 12:41

MBGC 1.2.1: Copyright (c) 2021 Szymon Grabowski, Tomasz Kowalski : 2021-12-02

MBGC: Multiple Bacteria Genome Compressor is a tool for compressing genomes in FASTA (or gzipped FASTA) input format.

What's new in this revision:

  • added new compression modes (speed and repo) aimed mostly at decompression speed
  • updated command-line help

Note: Contains lists of samples used in tests downloaded from US National Center for Biotechnology Information: https://www.ncbi.nlm.nih.gov/pathogens

Exceptions are the lists of S. cerevisiae and S. paradoxus genomes, which were taken from:
ftp://ftp.sanger.ac.uk/pub/users/dmc/yeast/latest/cere_assemblies.tgz
ftp://ftp.sanger.ac.uk/pub/users/dmc/yeast/latest/para_assemblies.tgz

and H. sapiens assemblies, which were taken from:
ftp://hgdownload.soe.ucsc.edu/goldenPath/hg16/chromosomes
ftp://hgdownload.soe.ucsc.edu/goldenPath/hg17/chromosomes
ftp://hgdownload.soe.ucsc.edu/goldenPath/hg18/chromosomes
ftp://hgdownload.soe.ucsc.edu/goldenPath/hg19/chromosomes

mbgc v1.2.0

23 Oct 15:38

MBGC 1.2: Copyright (c) 2021 Szymon Grabowski, Tomasz Kowalski : 2021-10-22

MBGC: Multiple Bacteria Genome Compressor is a tool for compressing genomes in FASTA (or gzipped FASTA) input format.

What's new in this release:

  • supports standard input and output during (de)compression,
  • option to compress collection of genomes stored in a single FASTA file,
  • higher reference buffer size limit (2^40 bytes, i.e. 1 TiB) in max mode.

Known bugs:

  • during decompression, MBGC may report an incorrect number of extracted files (one too many) for collections compressed in max mode.

Note: Contains lists of samples used in tests downloaded from US National Center for Biotechnology Information: https://www.ncbi.nlm.nih.gov/pathogens

Exceptions are the lists of S. cerevisiae and S. paradoxus genomes, which were taken from:
ftp://ftp.sanger.ac.uk/pub/users/dmc/yeast/latest/cere_assemblies.tgz
ftp://ftp.sanger.ac.uk/pub/users/dmc/yeast/latest/para_assemblies.tgz

and H. sapiens assemblies, which were taken from:
ftp://hgdownload.soe.ucsc.edu/goldenPath/hg16/chromosomes
ftp://hgdownload.soe.ucsc.edu/goldenPath/hg17/chromosomes
ftp://hgdownload.soe.ucsc.edu/goldenPath/hg18/chromosomes
ftp://hgdownload.soe.ucsc.edu/goldenPath/hg19/chromosomes

mbgc v1.1.1

15 Jul 11:03

MBGC 1.1: Copyright (c) 2021 Szymon Grabowski, Tomasz Kowalski : 2021-07-15

MBGC: Multiple Bacteria Genome Compressor is a tool for compressing genomes in FASTA (or gzipped FASTA) input format.

What's new in this release:

  • improved matching algorithm (resulting in around 30% compression ratio improvement on average),
  • faster compression and decompression (which is related to the improved compression ratio),
  • reference string is now a circular buffer,
  • added partial decompression option.
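
To make the "circular buffer" item above concrete, here is a minimal Python sketch of a fixed-capacity reference that overwrites its oldest content once it is full. It is an illustration only and not MBGC's implementation; the class name, its methods, and the use of absolute positions are assumptions made for this example.

```python
class CircularReference:
    """Fixed-capacity byte buffer that overwrites the oldest data when full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._buf = bytearray(capacity)
        self._capacity = capacity
        self._written = 0  # total number of bytes ever appended

    def append(self, data: bytes) -> None:
        """Append data, wrapping around and overwriting the oldest bytes."""
        for b in data:
            self._buf[self._written % self._capacity] = b
            self._written += 1

    def oldest_position(self) -> int:
        """Absolute position of the oldest byte still held in the buffer."""
        return max(0, self._written - self._capacity)

    def read(self, pos: int, length: int) -> bytes:
        """Read `length` bytes starting at absolute position `pos`."""
        if pos < self.oldest_position() or pos + length > self._written:
            raise IndexError("range is not (or no longer) in the buffer")
        return bytes(self._buf[(pos + i) % self._capacity] for i in range(length))
```

In a sketch like this, memory use stays bounded by the capacity regardless of collection size, at the cost that matches can only point to the most recently appended data.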

Note: Contains lists of samples used in tests downloaded from US National Center for Biotechnology Information: https://www.ncbi.nlm.nih.gov/pathogens

Exceptions are the lists of S. cerevisiae and S. paradoxus genomes, which were taken from:
ftp://ftp.sanger.ac.uk/pub/users/dmc/yeast/latest/cere_assemblies.tgz
ftp://ftp.sanger.ac.uk/pub/users/dmc/yeast/latest/para_assemblies.tgz

and H. sapiens assemblies, which were taken from:
ftp://hgdownload.soe.ucsc.edu/goldenPath/hg16/chromosomes
ftp://hgdownload.soe.ucsc.edu/goldenPath/hg17/chromosomes
ftp://hgdownload.soe.ucsc.edu/goldenPath/hg18/chromosomes
ftp://hgdownload.soe.ucsc.edu/goldenPath/hg19/chromosomes

mbgc v1.0.2

27 Nov 00:08

MBGC 1.0: Copyright (c) 2020 Szymon Grabowski, Tomasz Kowalski : 2020-11-20

MBGC: Multiple Bacteria Genome Compressor is a tool for compressing genomes in FASTA (or gzipped FASTA) input format.

Note: Contains lists of samples used in tests downloaded from US National Center for Biotechnology Information: https://www.ncbi.nlm.nih.gov/pathogens

Exceptions are the lists of S. cerevisiae and S. paradoxus genomes, which were taken from:
ftp://ftp.sanger.ac.uk/pub/users/dmc/yeast/latest/cere_assemblies.tgz
ftp://ftp.sanger.ac.uk/pub/users/dmc/yeast/latest/para_assemblies.tgz