# Signal and Image Processing (SIP_SS23)

### Research Group Neuroinformatics, Faculty of Computer Science,
### University of Vienna


###  Huffman coding

Lecturer: Prof. Moritz GROSSE-WENTRUP

Tutorial by: Sadiq A. ADEDAYO <sadiq.adedayo@univie.ac.at> <br> 
$\quad\quad\quad\quad$ Jakob PRAGER <jakob.prager@univie.ac.at>

In [98]:
# Use huffman.py from Moodle
import numpy as np
from huffman import *

We are given a string `text` whose characters each have a known prior probability.

In [99]:
text = 'Fischers Fritz fischt frische Fische.'

First, we define a function that counts how often each character occurs in the text and stores the counts in a dictionary.


In [100]:
# Map each distinct character to its number of occurrences in the string
char_cnt_func = lambda string: {char: string.count(char) for char in set(string)}

In [101]:
cnt_dict = char_cnt_func(text)
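
As a cross-check, the same counts can be obtained with `collections.Counter` from the standard library, which scans the string once instead of calling `str.count` for every distinct character. This sketch is not part of the original tutorial; the name `cnt_dict_alt` is made up for illustration.

```python
from collections import Counter

text = 'Fischers Fritz fischt frische Fische.'

# Approach used in this notebook: one str.count call per distinct character
cnt_dict = {char: text.count(char) for char in set(text)}

# Alternative (assumption, not from the tutorial): Counter tallies in a single pass
cnt_dict_alt = dict(Counter(text))

print(cnt_dict_alt == cnt_dict)      # both approaches should agree
print(sum(cnt_dict_alt.values()))    # total number of characters in the text
```

For a text this short the difference is irrelevant; for long inputs with many distinct symbols the single pass is likely to be noticeably faster.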

Now we calculate the probability of each character and pass the resulting table to the function `huffman`.

In [107]:
n_char = sum(cnt_dict.values())
char_prob = {char: cnt_dict[char] / n_char for char in sorted(cnt_dict, key=cnt_dict.get)}
char_prob

{'.': 0.02702702702702703,
 'z': 0.02702702702702703,
 'f': 0.05405405405405406,
 't': 0.05405405405405406,
 'e': 0.08108108108108109,
 'F': 0.08108108108108109,
 'r': 0.08108108108108109,
 'c': 0.10810810810810811,
 ' ': 0.10810810810810811,
 'h': 0.10810810810810811,
 's': 0.13513513513513514,
 'i': 0.13513513513513514}

## Creating the Huffman tree

We create a function that follows the algorithm shown in class and constructs the Huffman tree.

It follows this pseudocode:

function huffman(probability table): <br>
• 1. Convergence? I.e., only one pair of strings is left in the table → assign 0 and 1. <br>
• 2. Find the pair of strings with the lowest probabilities, e.g., s_1 and s_2. <br>
• 3. Merge them into one new string and compute the corresponding probability, i.e., s_new = s_1 + s_2 with p(s_new) = p(s_1) + p(s_2). <br>
• 4. Update the table with the new string s_new. <br>
• 5. Further merge this new table until convergence (i.e., repeat 1-4). <br>
• 6. Retrieve the Huffman code for the new string s_new. <br>
• 7. Append 0 and 1 to the code for s_new to obtain the Huffman codes for s_1 and s_2, respectively. <br>
• return the Huffman code of this probability table

In [103]:
def huffman(str_dict):

    # str_dict = {string: probability}
    assert abs(sum(str_dict.values())-1.0) < 1e-6,  "The sum of probabilities is not equal to 1."

    assert len(str_dict) >= 2, "The table contains fewer than two strings."
    
    # 1. Convergence?
    if len(str_dict) == 2:
        return dict(zip(str_dict.keys(), ['0', '1']))

    merged_str_dict = str_dict.copy()

    # 2. Find the pair of strings with the lowest probabilities
    str_1, str_2 = sorted(str_dict, key=str_dict.get)[:2]
    

    # 3. Merge into new string and compute the probability
    # 4. Update the dictionary (table)
    prob_str_1 = merged_str_dict.pop(str_1)
    prob_str_2 = merged_str_dict.pop(str_2)

    merged_str = str_1 + str_2
    merged_str_dict[merged_str] = prob_str_1 + prob_str_2

    # 5&6. Further construct the huffman tree and retrieve the huffman code for
    # this updated table

    huffman_code = huffman(merged_str_dict)
    
    # 7. Append 0 and 1 to the Huffman code of the merged string
    code_merged_str = huffman_code.pop(merged_str)
    huffman_code[str_1] = code_merged_str + '0'
    huffman_code[str_2] = code_merged_str + '1'

    return huffman_code
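
The recursive function above re-sorts the whole table at every level. The same merge steps can also be written as a loop over a min-heap, which is a common way to implement Huffman coding. The sketch below is one possible variant, not the version shown in class; `huffman_iterative` and `demo_table` are hypothetical names. Ties may be broken differently than in the recursive version, so the individual code words can differ while the code lengths remain optimal.

```python
import heapq
from itertools import count


def huffman_iterative(prob_table):
    """Build a Huffman codebook with a min-heap (illustrative sketch)."""
    if len(prob_table) < 2:
        raise ValueError("The table contains fewer than two symbols.")

    tie = count()  # running index so that equal probabilities never compare the lists
    heap = [(p, next(tie), [sym]) for sym, p in prob_table.items()]
    heapq.heapify(heap)

    codes = {sym: '' for sym in prob_table}
    while len(heap) > 1:
        # Steps 2-4: pop the two least probable groups and merge them
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        # Step 7, done bottom-up: prepend the branch bit to every symbol in the group
        for sym in group1:
            codes[sym] = '0' + codes[sym]
        for sym in group2:
            codes[sym] = '1' + codes[sym]
        heapq.heappush(heap, (p1 + p2, next(tie), group1 + group2))
    return codes


# Small made-up table: the most probable symbol should get the shortest code
demo_table = {'a': 0.5, 'b': 0.25, 'c': 0.25}
demo_codes = huffman_iterative(demo_table)
print(demo_codes)
```

Because the bits are prepended while merging upwards, no recursion is needed and the final dictionary already contains the complete code words.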

In [104]:
codebook = huffman(char_prob)
codebook

{'i': '100',
 's': '101',
 'h': '010',
 ' ': '000',
 'c': '001',
 'e': '1110',
 'r': '1111',
 'F': '1101',
 'f': '0110',
 't': '0111',
 '.': '11000',
 'z': '11001'}
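
To judge the quality of this codebook, one can compare its average code length with the entropy of the character distribution and check that no code word is a prefix of another. The sketch below does this with the codebook printed above and the character counts implied by the probabilities (each probability times the 37 characters of the text). The variable names are chosen for illustration only.

```python
import math

# Character counts implied by the probabilities above (probability * 37)
counts = {'.': 1, 'z': 1, 'f': 2, 't': 2, 'e': 3, 'F': 3, 'r': 3,
          'c': 4, ' ': 4, 'h': 4, 's': 5, 'i': 5}

# Codebook copied from the output above
codebook = {'i': '100', 's': '101', 'h': '010', ' ': '000', 'c': '001',
            'e': '1110', 'r': '1111', 'F': '1101', 'f': '0110', 't': '0111',
            '.': '11000', 'z': '11001'}

n_char = sum(counts.values())

# Total and average number of bits needed to encode the text
total_bits = sum(counts[c] * len(codebook[c]) for c in counts)
avg_len = total_bits / n_char

# Shannon entropy of the character distribution in bits per character
entropy = -sum((k / n_char) * math.log2(k / n_char) for k in counts.values())

# A code is prefix-free if no code word is the beginning of another one
codes = list(codebook.values())
prefix_free = not any(a != b and b.startswith(a) for a in codes for b in codes)

print(f"total bits: {total_bits}")
print(f"average length: {avg_len:.3f} bits/char, entropy: {entropy:.3f} bits/char")
print(f"prefix-free: {prefix_free}")
```

For a Huffman code the average length lies between the entropy and the entropy plus one bit. A fixed-length code for 12 distinct characters would need 4 bits per character.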

To encode a message, we now only need to look up the code of each character and collect the codes in order.

In [105]:
def encode(text, codebook):
    code = []
    for x in text:
        code.append(codebook[x])
    return code

In [106]:
encode(text, codebook)

['1101',
 '100',
 '101',
 '001',
 '010',
 '1110',
 '1111',
 '101',
 '000',
 '1101',
 '1111',
 '100',
 '0111',
 '11001',
 '000',
 '0110',
 '100',
 '101',
 '001',
 '010',
 '0111',
 '000',
 '0110',
 '1111',
 '100',
 '101',
 '001',
 '010',
 '1110',
 '000',
 '1101',
 '100',
 '101',
 '001',
 '010',
 '1110',
 '11000']
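
Because the code is prefix-free, the code words can be concatenated into a single bit string and still be decoded unambiguously. The sketch below illustrates this round trip. The `decode` function is a hypothetical helper that is not part of the original tutorial, and the codebook is copied from the output above.

```python
codebook = {'i': '100', 's': '101', 'h': '010', ' ': '000', 'c': '001',
            'e': '1110', 'r': '1111', 'F': '1101', 'f': '0110', 't': '0111',
            '.': '11000', 'z': '11001'}


def encode(text, codebook):
    # Same behaviour as the encode function above: one code word per character
    return [codebook[x] for x in text]


def decode(bits, codebook):
    """Decode a bit string by matching code words from left to right (sketch)."""
    inverse = {code: char for char, code in codebook.items()}
    decoded, buffer = [], ''
    for bit in bits:
        buffer += bit
        # In a prefix-free code the first match is the only possible one
        if buffer in inverse:
            decoded.append(inverse[buffer])
            buffer = ''
    if buffer:
        raise ValueError("Bit stream ends in the middle of a code word.")
    return ''.join(decoded)


text = 'Fischers Fritz fischt frische Fische.'
bitstream = ''.join(encode(text, codebook))

print(len(bitstream))                      # number of bits in the encoded message
print(decode(bitstream, codebook) == text) # round trip should recover the text
```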

With these three functions we can now encode any text we want.