Pixz (pronounced 'pixie') is a parallel, indexing version of XZ.


The existing XZ Utils (http://tukaani.org/xz/) provide great compression in the .xz file format, but they have two significant problems:

* They are single-threaded, while most users nowadays have multi-core computers.
* The .xz files they produce are just one big block of compressed data, rather than a collection of smaller blocks. This makes random access to the original data impossible.


With pixz, both these problems are solved. The most useful commands:

$ pixz foo.tar foo.tpxz         # Compress and index a tarball, multi-core
$ pixz -l foo.tpxz              # Very quickly list the contents of the compressed tarball
$ pixz -d foo.tpxz foo.tar      # Decompress it, multi-core
$ pixz -x dir/file < foo.tpxz | tar x   # Very quickly extract a file, multi-core.
                                        # Also verifies that contents match index.

$ tar -Ipixz -cf foo.tpxz foo           # Create a tarball using pixz for multi-core compression

$ pixz bar bar.xz           # Compress a non-tarball, multi-core
$ pixz -d bar.xz bar        # Decompress it, multi-core
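
Unlike the single-block files described above, archives produced by pixz are
still ordinary .xz files, just made of many independently compressed blocks
plus an index. As a rough sanity check (this uses the regular xz tool, not
pixz, and assumes a recent xz release that supports --list):

$ xz --list foo.xz               # made by plain xz: typically reports 1 block
$ xz --list foo.tpxz             # made by pixz: reports many blocks
$ xz --list --verbose foo.tpxz   # per-block sizes and offsets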


Specifying input and output:

$ pixz < foo.tar > foo.tpxz     # Same as 'pixz foo.tar foo.tpxz'
$ pixz -i foo.tar -o foo.tpxz   # Ditto. These both work for -x, -d and -l too, e.g.:

$ pixz -x -i foo.tpxz -o foo.tar file1 file2 ... # Extract the files from foo.tpxz into foo.tar

$ pixz foo.tar                  # Compress it to foo.tpxz, removing the original
$ pixz -d foo.tpxz              # Extract it to foo.tar, removing the original


Other flags:

$ pixz -1 foo.tar           # Faster, worse compression
$ pixz -9 foo.tar           # Better, slower compression

$ pixz -t foo.tar           # Compress but don't treat it as a tarball (don't index it)
$ pixz -d -t foo.tpxz       # Decompress foo, don't check that contents match index
$ pixz -l -t foo.tpxz       # List the xz blocks instead of files

Thread count limiting:

pixz defaults to one thread per detected processor. You can limit the
number of threads with -p. If the count specified is greater than the
number of processors available, pixz will complain and die.

$ pg_dumpall ... | pixz -t -p 7 > db_dump.sql.pxz # Limit to 7 worker threads.

WARNING: Running pixz without the -t flag will cause it to treat the input as a tarball, as long as it looks vaguely tarball-like. This means that if the file starts with at least 1024 zero bytes, pixz will assume it's empty and truncate the output! If your input files aren't tarballs, run with -t or face possible data loss.
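
A hypothetical illustration of the warning above (the file name and contents
are made up for this example):

$ { head -c 2048 /dev/zero; echo "important data"; } > notatar.bin
$ pixz notatar.bin      # DANGER: the leading zeros look like an empty tarball,
                        # so the data after them can be silently lost
$ pixz -t notatar.bin   # safe alternative: compress the bytes as-is, no tarball handling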


Compare to:
    plzip
        * About equally complex, efficient
        * lzip format seems less-used
        * Version 1 is theoretically indexable...I think
    ChopZip
        * Python, much simpler
        * More flexible, supports arbitrary compression programs
        * Uses streams instead of blocks, not indexable
        * Splits input and then combines output, much higher disk usage
    pxz
        * Simpler code
        * Uses OpenMP instead of pthreads
        * Uses streams instead of blocks, not indexable
        * Uses temp files and doesn't combine them until the whole file is compressed, high disk/memory usage

Comparable tools for other compression algorithms:
    pbzip2
        * Not indexable
        * Appears slow
        * bzip2 algorithm is non-ideal
    pigz
        * Not indexable
    dictzip
        * Not parallel


Requirements:
    * libarchive 2.8 or later
    * liblzma 4.999.9-beta-212 or later (from the xz distribution)
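
A possible way to get the dependencies and build on a Debian/Ubuntu-like
system (package names and build steps are assumptions; check the checkout
for a Makefile or configure script and adjust accordingly):

$ sudo apt-get install build-essential libarchive-dev liblzma-dev
$ make                              # if the checkout ships a plain Makefile
$ sudo cp pixz /usr/local/bin/      # or install however your system prefers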
