> ### Digital Communications Systems (3C) - Huffman Coding - Practical Assignment
>
> - Student Name: *Ibrahem Mouhamad*  
> - Student ID: *22PGR0004*

This program uses a priority queue to construct the Huffman tree, then a recursive depth-first traversal to generate the Huffman code for each character. Finally, it uses the generated codes to compress the original string and fills in the results table from the program's output.

It can compress any string, not just the one given in the assignment. To use a different string, assign it to the `input` variable in the `if __name__ == '__main__':` block.
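
The priority queue stores tuples, and Python compares tuples element by element. The short sketch below illustrates why the program places a unique counter between the frequency and the node payload: without it, two entries with equal frequencies would force a comparison between a character and a tuple of child nodes. The names `leaf_payload`, `internal_payload` and `tie_breaker` are made up for this illustration and do not appear in the program.

```python
from itertools import count
from queue import PriorityQueue

# Two entries with the same frequency: a leaf (payload is a str) and an
# internal node (payload is a tuple). Both payloads are illustrative only.
leaf_payload = 'D'
internal_payload = ('A', 'B')

# Without a tie-breaker, equal frequencies make Python compare the payloads,
# and a str cannot be ordered against a tuple.
try:
    (3, leaf_payload) < (3, internal_payload)
    comparison_failed = False
except TypeError:
    comparison_failed = True

# With a unique counter in second position, ties are settled by the counter
# and the payloads are never compared.
tie_breaker = count()
pq = PriorityQueue()
pq.put((3, next(tie_breaker), internal_payload))
pq.put((3, next(tie_breaker), leaf_payload))
first = pq.get()

print(comparison_failed)  # True
print(first)              # (3, 0, ('A', 'B'))
```

A side effect of this design is that entries with equal frequencies leave the queue in insertion order, which makes the generated codes deterministic for a given input.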

In [88]:
# Copyright 2023 Ibrahem Mouhamad
#

from collections import Counter
from itertools import count
from queue import PriorityQueue

# unique counter used as a tie-breaker so PriorityQueue never has to compare nodes
unique = count()

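# Generate the Huffman codes with a depth-first traversal of the tree
def generate_codes(node, code, codes: dict = None) -> dict:
    # create a fresh dictionary on the first call (avoids a shared mutable default)
    if codes is None:
        codes = {}
    # if it is a leaf node then store its code
    if isinstance(node[2], str):
        # a tree made of a single leaf would get an empty code, so use '0' instead
        codes[node[2]] = code or '0'
        return codes
    # visit the left node
    generate_codes(node[2], code + '0', codes)
    # visit the right node
    generate_codes(node[3], code + '1', codes)

    return codes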

# Huffman Coding Algorithm
# takes the frequency table as input
# to get the frequency table use: frequency = Counter(input)
def huffman_coding(frequency) -> dict:
    # an empty input has nothing to encode (and pq.get() would block forever)
    if not frequency:
        return {}
    # Create a priority queue to store the characters and their frequencies
    pq = PriorityQueue()
    # each leaf is (frequency, tie-breaker, character)
    for char, freq in frequency.items():
        pq.put((freq, next(unique), char))

    # Build the Huffman tree
    while pq.qsize() > 1:
        # extract the node with the smallest frequency
        left_node = pq.get()
        # extract the node with the next smallest frequency
        right_node = pq.get()
        # the parent's frequency is the sum of its two children's frequencies
        parent_node = (left_node[0] + right_node[0], next(unique), left_node, right_node)
        # insert the parent node back into the queue
        pq.put(parent_node)

    # the last remaining node is the root of the tree
    root = pq.get()
    # collect the codes
    codes = generate_codes(root, '')
    return codes

# helper function to print the results table
def print_table(table) -> None:
    row_format = '| {:<15} | {:<9} | {:<9} | {:<9} |'
    separator = '-------------------------------------------------------'
    print(row_format.format('Character', 'Frequency', 'Code', 'Size'))
    print(separator)
    for row in table:
        print(row_format.format(*row))
        print(separator)

# build the compressed bit string by concatenating the code of each character
def compress(text, huffman_codes) -> str:
    return ''.join(huffman_codes[char] for char in text)

if __name__ == '__main__':
    input = 'DABBCAACCCDABBDCCA'
    print('\nThe input string is: \'{}\'\n{} bits are required to send this string.\n'.format(input, 8*len(input)))
    # Count the frequency of each character in the string
    frequency = Counter(input)
    # get Huffman coding
    codes = huffman_coding(frequency)

    # fill the table
    table = []
    compressed_size = 0
    for element in sorted(frequency):
        table.append([element, frequency[element], codes[element], frequency[element]*len(codes[element])])
        compressed_size += frequency[element]*len(codes[element])

    table.append([
        '{} bits'.format(len(frequency)*8),
        '{} bits'.format(len(input)*8),
        '',
        '{} bits'.format(compressed_size)
    ])

    # print the table
    print('Table:')
    print_table(table)
    # Compress the string
    compressed = compress(input, codes)
    print('\nThe compressed string is: \'{}\'\n{} bits are required to send the compressed string.\n'.format(compressed, len(compressed)))

    compression_ratio = 8*len(input)/len(compressed)
    print('The compression ratio = {}'.format(compression_ratio))
    print('The size of the compressed string is 100/{} = {}% of the original input'.format(compression_ratio, 100/compression_ratio))



The input string is: 'DABBCAACCCDABBDCCA'
144 bits are required to send this string.

Table:
| Character       | Frequency | Code      | Size      |
-------------------------------------------------------
| A               | 5         | 10        | 10        |
-------------------------------------------------------
| B               | 4         | 01        | 8         |
-------------------------------------------------------
| C               | 6         | 11        | 12        |
-------------------------------------------------------
| D               | 3         | 00        | 6         |
-------------------------------------------------------
| 32 bits         | 144 bits  |           | 36 bits   |
-------------------------------------------------------

The compressed string is: '001001011110101111110010010100111110'
36 bits are required to send the compressed string.

The compression ratio = 4.0
The size of the compressed string is 100/4.0 = 25.0% of the original input
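
As a sanity check on the output above, the following minimal sketch decodes the compressed bit string back into the input. The `decompress` helper is hypothetical (it is not part of the assignment program), and the code table and bit string are copied from the printed output. It relies on Huffman codes being prefix-free, so a character can be emitted as soon as the accumulated bits match an entry in the table.

```python
def decompress(bits: str, codes: dict) -> str:
    # Invert the table: code word -> character
    decode_table = {code: char for char, code in codes.items()}
    decoded = []
    buffer = ''
    for bit in bits:
        buffer += bit
        # Prefix-free codes guarantee the first match is the right one
        if buffer in decode_table:
            decoded.append(decode_table[buffer])
            buffer = ''
    if buffer:
        raise ValueError('trailing bits do not form a complete code word')
    return ''.join(decoded)


# Code table and bit string taken from the program output
codes = {'A': '10', 'B': '01', 'C': '11', 'D': '00'}
compressed = '001001011110101111110010010100111110'

original = decompress(compressed, codes)
print(original)  # DABBCAACCCDABBDCCA
```

Note that the receiver needs the code table (or the frequency table) to decode, so a real transmission would carry that overhead in addition to the 36 payload bits.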
