Pixz (pronounced 'pixie') is a parallel, indexing version of XZ.


The existing XZ Utils (http://tukaani.org/xz/) provide great compression in the .xz file format, but they have two significant problems:

* They are single-threaded, while most users nowadays have multi-core computers.
* The .xz files they produce are just one big block of compressed data, rather than a collection of smaller blocks. This makes random access to the original data impossible.


With pixz, both these problems are solved. The most useful commands:

$ pixz foo.tar foo.tpxz         # Compress and index a tarball, multi-core
$ pixz -l foo.tpxz              # Very quickly list the contents of the compressed tarball
$ pixz -d foo.tpxz foo.tar      # Decompress it, multi-core
$ pixz -x dir/file < foo.tpxz | tar x   # Very quickly extract a file, multi-core.
                                        # Also verifies that contents match index.

$ tar -Ipixz -cf foo.tpxz foo           # Create a tarball using pixz for multi-core compression

$ pixz bar bar.xz           # Compress a non-tarball, multi-core
$ pixz -d bar.xz bar        # Decompress it, multi-core
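
Unlike the single-block files described above, archives produced by pixz are
still ordinary .xz files, just made of many independently compressed blocks
plus an index. As a rough sanity check (this uses the regular xz tool, not
pixz, and assumes a recent xz release that supports --list):

$ xz --list foo.xz               # made by plain xz: typically reports 1 block
$ xz --list foo.tpxz             # made by pixz: reports many blocks
$ xz --list --verbose foo.tpxz   # per-block sizes and offsets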


Specifying input and output:

$ pixz < foo.tar > foo.tpxz     # Same as 'pixz foo.tar foo.tpxz'
$ pixz -i foo.tar -o foo.tpxz   # Ditto. These both work for -x, -d and -l too, e.g.:

$ pixz -x -i foo.tpxz -o foo.tar file1 file2 ... # Extract the files from foo.tpxz into foo.tar

$ pixz foo.tar                  # Compress it to foo.tpxz, removing the original
$ pixz -d foo.tpxz              # Extract it to foo.tar, removing the original


Other flags:

$ pixz -1 foo.tar           # Faster, worse compression
$ pixz -9 foo.tar           # Better, slower compression

$ pixz -t foo.tar           # Compress but don't treat it as a tarball (don't index it)
$ pixz -d -t foo.tpxz       # Decompress foo, don't check that contents match index
$ pixz -l -t foo.tpxz       # List the xz blocks instead of files

Thread count limiting:

pixz defaults to one thread per detected processor. You can limit the
number of threads with -p. If the count specified is greater than the
number of processors available, pixz will complain and die.

$ pg_dumpall ... | pixz -t -p 7 > db_dump.sql.pxz # Limit to 7 worker threads.

WARNING: Running pixz without the -t flag will cause it to treat the input as a tarball, as long as it looks vaguely tarball-like. This means that if the file starts with at least 1024 zero bytes, pixz will assume it's empty and truncate the output! If your input files aren't tarballs, run with -t or face possible data loss.
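
A hypothetical illustration of the warning above (the file name and contents
are made up for this example):

$ { head -c 2048 /dev/zero; echo "important data"; } > notatar.bin
$ pixz notatar.bin      # DANGER: the leading zeros look like an empty tarball,
                        # so the data after them can be silently lost
$ pixz -t notatar.bin   # safe alternative: compress the bytes as-is, no tarball handling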


Compare to:
    plzip
        * About equally complex, efficient
        * lzip format seems less-used
        * Version 1 is theoretically indexable...I think
    ChopZip
        * Python, much simpler
        * More flexible, supports arbitrary compression programs
        * Uses streams instead of blocks, not indexable
        * Splits input and then combines output, much higher disk usage
    pxz
        * Simpler code
        * Uses OpenMP instead of pthreads
        * Uses streams instead of blocks, not indexable
        * Uses temp files and doesn't combine them until the whole file is compressed, high disk/memory usage

Comparable tools for other compression algorithms:
    pbzip2
        * Not indexable
        * Appears slow
        * bzip2 algorithm is non-ideal
    pigz
        * Not indexable
    dictzip
        * Not parallel


Requirements:
    * libarchive 2.8 or later
    * liblzma 4.999.9-beta-212 or later (from the xz distribution)
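
A possible way to get the dependencies and build on a Debian/Ubuntu-like
system (package names and build steps are assumptions; check the checkout
for a Makefile or configure script and adjust accordingly):

$ sudo apt-get install build-essential libarchive-dev liblzma-dev
$ make                              # if the checkout ships a plain Makefile
$ sudo cp pixz /usr/local/bin/      # or install however your system prefers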
