Skip to content

ish7o/huffman-coding

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

huff

CLI tool for file compression and decompression using Huffman coding, written in Go. Tested on Pan Tadeusz (466KB -> 265KB) (43% compression)

Build

git clone https://github.com/ish7o/huffman-coding.git
cd huffman-coding
make build

Usage

./bin/huff < file.txt > file.huff
./bin/unhuff < file.huff > file.txt

My notes

Prologue:

You can only write full bytes to a file, so if the compressed output isn't divisible by 8 there will be trailing 0s appended. To fix this, i add padding bits at the start, according to the table below. When decompressing, I read 1s until a 0s is found.

Padding bits at end Prefix bits you write Meaning
0 11111110 no padding (8 valid bits)
1 1111110 7 valid bits
2 111110 6 valid bits
3 11110 5 valid bits
4 1110 4 valid bits
5 110 3 valid bits
6 10 2 valid bits
7 0 1 valid bit

Important resources

How it works (stolen from wiki)

Given string:

"A_DEAD_DAD_CEDED_A_BAD_BABE_A_BEADED_ABACA_BED""
  1. Characters sorted by freq:
C: 2
B: 6
E: 7
_: 10
D: 10
A: 11
  1. Construct tree from first 2 (C, B):
  CB 8
 /0   \1
C 2  B: 6

And move it back to it's place:

E: 7
CB: 8
_: 10
D: 10
A: 11
  1. Construct tree from first 2 (E, CB)
  ECB 15
 /0    \1
E 7   CB 8
      /0 \1
    C 2   B 6
  1. Continue till it's:
      _DAECB 46
     /0        \1
   _D 20        AECB 26
  /0   \1      /0     \1
_ 10   D 10   A 11    ECB 15
                      /0    \1
                     E 7    CB 8
                           /0   \1
                         C 2     B 6

And you get:

_: 00
D: 01
A: 10
E: 110
C: 1110
B: 1111

brilliant!

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors