Canonical Huffman Coding

A program that compresses and decompresses ASCII files using canonical Huffman coding.

Getting Started

  • In general, use the following to run the archiver program:

    $ make build \
      && make test \
      && make run
    $ cd build/src \
      && ./archiver --compress test.huff ../../tests/test_1.txt ../../tests/test_2.txt \
      && ./archiver --decompress test.huff
  • For local development, you can try:

    $ make local-init && make conan-build

Commands

  • ./archiver -h displays help for using the program.
  • ./archiver -c archive_name file1 [file2 ...] encodes the files file1, file2, ... and saves the result to the file archive_name.
  • ./archiver -d archive_name decodes the files from the archive archive_name and puts them in the current directory.

File format

Nine-bit values are written from the lowest-order bit to the highest (analogous to little-endian, but for bits). That is, the bit corresponding to 2^0 comes first, followed by 2^1, and so on, up to the bit corresponding to 2^8.
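
As an illustration of this bit order, here is a minimal Python sketch. It is not taken from the archiver's source, and the helper names `to_bits_lsb_first` and `from_bits_lsb_first` are hypothetical. It works on a plain list of bits, because how those bits are packed into bytes is not specified here.

```python
def to_bits_lsb_first(value: int, width: int = 9) -> list[int]:
    """Return the bits of `value`, lowest-order bit (2^0) first."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return [(value >> i) & 1 for i in range(width)]


def from_bits_lsb_first(bits: list[int]) -> int:
    """Inverse operation: the i-th bit in the list carries weight 2^i."""
    return sum(bit << i for i, bit in enumerate(bits))


print(to_bits_lsb_first(5))    # [1, 0, 1, 0, 0, 0, 0, 0, 0]
print(to_bits_lsb_first(256))  # [0, 0, 0, 0, 0, 0, 0, 0, 1]
```

Nine bits cover the values 0 to 511, so every byte value fits with room left over for extra symbols.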

The archive file has the following format:

  1. A 9-bit number SYMBOLS_COUNT, the number of characters in the alphabet.

  2. Data block for recovering the canonical code:

    1. SYMBOLS_COUNT values of 9 bits (alphabet characters in the order of canonical codes).
    2. A list of MAX_SYMBOL_CODE_SIZE values of 9 bits, the i-th (when numbered from 0) element of which is the number of characters with the code length i + 1. MAX_SYMBOL_CODE_SIZE, the maximum code length in the current encoding, is not explicitly written to the file because it can be deduced from the available data.
  3. The encoded file name.

  4. The encoded content of the file.

  5. The encoded service symbol FILENAME_END.

  6. If there are additional files in the archive, the encoded service symbol ONE_MORE_FILE is written, and encoding continues with the next file.

  7. The encoded service symbol ARCHIVE_END.
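
The data block in item 2 is enough to rebuild the code table. The following Python sketch shows one way a decoder might do it. It is not the archiver's actual implementation, and the function names are hypothetical. It assumes the common canonical convention: the first code is all zeros, each next code is the previous code plus one, and the code is shifted left by one bit whenever the length increases. It also assumes the 9-bit values have already been decoded into integers.

```python
def read_code_length_counts(values, symbols_count):
    """Read per-length counts until they add up to symbols_count.

    `values` is an iterator over the 9-bit integers that follow the
    alphabet. The number of counts consumed is MAX_SYMBOL_CODE_SIZE.
    """
    counts = []
    total = 0
    while total < symbols_count:
        count = next(values)
        counts.append(count)
        total += count
    if total != symbols_count:
        raise ValueError("code length counts do not match SYMBOLS_COUNT")
    return counts


def assign_canonical_codes(symbols, counts):
    """Map each symbol to its code, written as a string of '0' and '1'.

    `symbols` is the alphabet in canonical order; counts[i] is the
    number of symbols whose code length is i + 1.
    """
    codes = {}
    code = 0
    index = 0
    for length, count in enumerate(counts, start=1):
        for _ in range(count):
            codes[symbols[index]] = format(code, f"0{length}b")
            code += 1
            index += 1
        code <<= 1  # codes of the next length are one bit longer
    return codes


# Toy alphabet of four symbols: one code of length 1, one of length 2,
# two of length 3. The trailing 42 stands for whatever follows the block.
symbols = [101, 97, 98, 99]
stream = iter([1, 1, 2, 42])
counts = read_code_length_counts(stream, len(symbols))
print(counts)                                   # [1, 1, 2]
print(assign_canonical_codes(symbols, counts))
# {101: '0', 97: '10', 98: '110', 99: '111'}
```

Reading stops as soon as the counts add up to SYMBOLS_COUNT, which is how MAX_SYMBOL_CODE_SIZE can be recovered without being stored. The code strings above are shown with the most significant bit first; the order in which the archiver writes code bits to the file is not covered by this sketch.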