PAQ8PX compression archiver
Switch branches/tags
Nothing to show
Clone or download
Latest commit f8826c1 Nov 11, 2018
Permalink
Failed to load latest commit information.
.gitignore Add AVX implementation of dot_product and train Feb 3, 2018
CHANGELOG paq8px_v171 Nov 11, 2018
DOC DOC Apr 23, 2018
README paq8px_v170 Nov 3, 2018
paq8px.cpp paq8px_v171 Nov 11, 2018

README

                              PAQ8PX README

---------
COPYRIGHT
---------

    Copyright (C) 2008 Matt Mahoney, Serge Osnach, Alexander Ratushnyak,
    Bill Pettis, Przemyslaw Skibinski, Matthew Fite, wowtiger, Andrew Paterson,
    Jan Ondrus, Andreas Morphis, Pavel L. Holoborodko, Kaido Orav, Simon Berger,
    Neill Corlett, Márcio Pais, Andrew Epstein, Mauro Vezzosi, Zoltán Gotthardt,
    Sebastian Lehmann

    We would like to express our gratitude for the endless support of many 
    contributors who encouraged PAQ8PX development with ideas, testing, 
    compiling, debugging:
    LovePimple, Skymmer, Darek,  Stephan Busch, m^2, Christian Schneider,
    pat357, Rugxulo, Gonzalo, a902cd23, pinguin2 - and many others.

-------
LICENCE
-------

    This program is free software; you can redistribute it and/or
    modify it under the terms of the GNU General Public License as
    published by the Free Software Foundation; either version 2 of
    the License, or (at your option) any later version.

    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
    General Public License for more details at
    Visit <http://www.gnu.org/copyleft/gpl.html>.


    For a summary of your rights and obligations in plain language visit 
    https://tldrlegal.com/license/gnu-general-public-license-v2

-----
ABOUT
-----

PAQ is a series of experimental lossless data compression software that have gone 
through collaborative development to top rankings on several benchmarks measuring 
compression ratio (although at the expense of speed and memory usage).

PAQ8PX (this branch) is one of the longest living branch of the PAQ series started 
by Jan Ondrus in 2009.
For history and current state of development visit https://encode.ru/threads/342-paq8px

-----------------
HOW DOES IT WORK?
-----------------

Data compression is carried out by compressing the input file(s) bit by bit using 
context mixing i.e. mixing predictions from many models. For a longer (technical) 
explanation see the DOC file.

---------------
PAQ8PX ARCHIVES
---------------

A PAQ8PX archive contains files in a highly compressed format.

You can recognize a PAQ8PX archive from its file extension. The file extension reflects 
the exact PAQ8PX version that created the archive. You may also examine the first few
bytes of the file. If it reads "paq8px" than it is (most probably) a PAQ8PX archive. 
Exact version information cannot be inferred from the archive content.

A PAQ8PX archive may contain multiple files, but once created, you cannot
add to or remove files from the archive.

In single file mode (i.e. when compressing just one file) only file contents are 
stored. File paths, file names, creation or modification dates, file attributes, 
permissions and other metadata are not preserved. In multiple file mode however, 
you may store such metadata in the archive (see the help screen for more details).

Notes on pre-training:
(1)
 The exe pre-training function (see the command line flag "e") uses the paq8px.exe 
 file itself to pre-train the exe prediction model, so expect the resulting archives 
 to be different created by two different paq8px.exe executables - even when built
 with the same command line options from the same source. When the exe is different, 
 the predictions will be slightly different, thus the archives will be (completely) 
 different. As a result you must have the very executable that created an archive in
 order to extract its contents when you use this pre-training option.
 
(2)
 If you use the text pre-training function (see the command line flag "t"), the 
 word list (english.dic) and expression list (english.exp) files are only used for 
 pre-training the prediction model before the actual file compression would begin.
 However their content is not stored in the archive. Since these files are required 
 to pre-train the model before decompression as well, you will need to have them
 when extracting files from your archive.

More notes:
Files and archives larger than 2 GB are not (yet) supported.
PAQ8PX archives are not compatible with previous or future PAQ8PX releases.

----------
HOW TO USE
----------

This software is portable, you don't need to install it.

You may copy paq8px.exe to the folder where your files to be compressed are located
and launch PAQ8PX from the command line (cmd.exe in Windows, a terminal of your choice 
in Linux and macOS). 

----------------------
COMMAND LINE INTERFACE
----------------------

A graphical user interface is not provided, PAQ8PX runs from the command line.
See the help screen for command line options.
For the help screen just launch PAQ8PX from the command line without any options.

--------------
HOW TO COMPILE
--------------

If you are a Windows user you don't need to compile the source. Just grab the 
executable bundled with the source from the https://encode.ru/threads/342-paq8px thread.

If you are a Linux or macOS user, you will need to build the binary from source.
You will need:
  - the PAQ8PX source file (paq8px*.cpp),
  - zlib
  - a compiler.

The following compilers were tested and verified to work correctly in this release 
or in an earlier release:

  - Visual Studio 2017 Community Edition (Windows)
  - MinGW-w64 7.1.0 and 7.2.0 (Windows)
  - GCC 7.3.0 (macOS)
  - GCC 7.2.0 (Ubuntu)
  - clang 4.0.1-6 on (Ubuntu)

Note:

 We build and test 64-bit releases, 32-bit releases are seldom built or tested. A known
 limitation of 32-bit releases is the 2 GB memory barrier. As a consequence, compression 
 and decompression will not work on level 9 and you may experience "out of memory" errors 
 on level 8 where memory consumption is near the 2 GB limit.

gcc users on Linux/macOS may use the following commands to make a quick build (albeit with 
optimal parameters) targeted for their system (note the "-march=native" parameter):

 sudo apt-get install build-essential
 sudo apt-get install zlib1g-dev
 g++ -s -fno-rtti -fwhole-program -static -std=gnu++1z -O3 -m64 -march=native -Wall -floop-strip-mine -funroll-loops -ftree-vectorize -fgcse-sm -fprofile-use paq8px.cpp -opaq8px.exe -lz 

Remark: if the paq8px source file name includes version information (e.g. paq8px_v142.cpp) 
then rename it to "paq8px.cpp" before invoking gcc or change the file name in the 
gcc parameters to include the version information.

For detailed building instructions visit:
https://encode.ru/threads/2885-PAQ8PX-building-instructions

--------------
FOR DEVELOPERS
--------------

When you make a new release:

  - Please update the version number in the "Versioning" section in the source file.
  - Please append a short description of your modifications to the CHANGELOG file.
  - Always publish a static build (link the required dll files statically).
  - Always publish a build for the general public (e.g. don't use -march=native).
  - Before publishing your build, please carry out some basic tests. Run your tests
    with asserts on (see the "#define NDEBUG" line in the "Debug options" section 
    in the source file).