Blocked gzip (bgzf) vs igzip #89

Christian Fufezan edited this page Jun 2, 2018 · 1 revision

hroest commented I am trying to understand the difference between blocked gzip [1], [2] used in genomics and the igzip format used here. Is the implementation the same but named differently, or is this a completely different implementation? Will the tools developed for bgzf also work on igzip-compressed mzML files? Given that the format is pretty common in genomics, I wonder whether it would make sense to support it as well.

[1] http://www.htslib.org/doc/bgzip.html

[2] https://blastedbio.blogspot.ca/2011/11/bgzf-blocked-bigger-better-gzip.html

fu commented Hi Hannes,

It is not quite the same, for two major reasons.

  1. Our index is not an additional file. We were using distributed filesystems, and the allocation block is so large compared to the index file size that a separate index file was really a waste of disk space. I am aware of alternatives that solve that problem, but we had that problem in house at the time :)

  2. Block size is not limited to 64 KB (an average MS1 on a newer machine is, as you most probably know, 100k+). The blocks themselves can be variable in size, as defined by the user during indexing.
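
To make the second point concrete, here is a minimal sketch using only the Python standard library. It is not the igzip implementation; `compress_blocks` and `read_block` are hypothetical helper names, and the payloads are made up. It only shows that independently compressed gzip members can have any size (including well over 64 KB) and can still be concatenated into one valid gzip stream.

```python
import gzip
import zlib


def compress_blocks(blocks):
    """Compress each payload as its own gzip member; return the stream and start offsets."""
    out = bytearray()
    offsets = []
    for payload in blocks:
        offsets.append(len(out))       # where this member starts in the stream
        out += gzip.compress(payload)  # one independent gzip member per block
    return bytes(out), offsets


def read_block(stream, offset):
    """Decompress only the gzip member that starts at `offset`."""
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS selects gzip framing
    return decomp.decompress(stream[offset:])         # stops at the end of that member


# Made-up payloads: one small block and one block far larger than 64 KB.
small = b"<spectrum id='1'/>" * 10
large = b"<spectrum id='2'>" + b"x" * 200_000 + b"</spectrum>"

stream, offsets = compress_blocks([small, large])
second = read_block(stream, offsets[1])
```

Because each block is a complete gzip member, the concatenated stream is still readable by ordinary gzip tools as a whole, while a single block can be decompressed on its own once its start offset is known.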

The spirit of igzip is data-centric: it does not involve file-position bookkeeping by the user. In other words, with alternative solutions, an interface has to be created that converts the chunks of data one is interested in into file positions and then concatenates the right blocks. igzip stores the data as the data blocks one is interested in and removes the need to tinker with file positions. The index can be any string.
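
The data-centric idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the igzip API: `KeyedGzipStore`, `add`, and `get` are hypothetical names, the keys and payloads are invented, and the index is simply held in memory, which does not model how igzip stores its index inside the file.

```python
import gzip
import io


class KeyedGzipStore:
    """Hypothetical illustration: fetch compressed blocks by string key, not by offset."""

    def __init__(self):
        self._buf = io.BytesIO()  # stands in for the compressed file
        self._index = {}          # key -> (offset, compressed length)

    def add(self, key, payload):
        member = gzip.compress(payload)           # one gzip member per data block
        offset = self._buf.seek(0, io.SEEK_END)   # append at the current end
        self._buf.write(member)
        self._index[key] = (offset, len(member))  # bookkeeping stays inside the store

    def get(self, key):
        offset, length = self._index[key]
        self._buf.seek(offset)
        return gzip.decompress(self._buf.read(length))


store = KeyedGzipStore()
store.add("spectrum_1", b"<spectrum id='1'>first</spectrum>")
store.add("TIC", b"<chromatogram id='TIC'>total ion current</chromatogram>")

tic = store.get("TIC")  # the caller never sees an offset
```

The caller asks for `"TIC"` or `"spectrum_1"` and gets the decompressed block back; the mapping from key to file position is an internal detail, which is the distinction the paragraph above draws against schemes where the user has to translate data ranges into block positions.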

Hope that helps

Cheers

.c