# 1  Example A: Optimal Binary Encoding
Consider a discrete information source X with the alphabet and probabilities given on the next page, together with three different binary encodings. Answer the following questions for each of the provided encodings. You may use either a software script or explain your reasoning manually.

In [1]:
symbols = ['a', 'b', 'c', 'd', 'e']
ps = [0.36, 0.24, 0.18, 0.12, 0.10]
code_1 = ['0', '10', '110', '111', '101']
code_2 = ['10', '01', '001', '000', '111']
code_3 = ['00', '01', '11', '100', '101']

1.  Determine the amount of information carried by each symbol.

In [9]:
from math import log2
Hs = [- p * log2(p) for p in ps]
Hs

[0.5306152277996684,
 0.4941344853728565,
 0.4453076138998342,
 0.3670672426864282,
 0.33219280948873625]
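
The list above holds the per-symbol entropy contributions -p * log2(p). If "information carried by each symbol" is read as the self-information I(x) = -log2 p(x), it could be computed as in the following sketch. The name `self_information` is an assumption introduced here, and the probabilities are copied from the first cell so the block runs on its own.

```python
from math import log2

# Alphabet and probabilities copied from the first cell.
symbols = ['a', 'b', 'c', 'd', 'e']
ps = [0.36, 0.24, 0.18, 0.12, 0.10]

# Self-information I(x) = -log2 p(x) in bits (hypothetical name, not from the original).
self_information = {s: -log2(p) for s, p in zip(symbols, ps)}

for s, info in self_information.items():
    print(f"{s}: {info:.3f} bits")
```

Weighting each of these values by its probability and summing gives back the source entropy, so the two readings are consistent with each other.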

2.  Determine the information content of the source.

In [10]:
print(f"{sum(Hs)} bits/symbol")

2.1693173792475235 bits/symbol


3.  Determine the expected length of each of the three encodings.

In [17]:
n_1 = sum(len(c) * p for c, p in zip(code_1, ps))
print(f"code 1: {n_1}")
n_2 = sum(len(c) * p for c, p in zip(code_2, ps))
print(f"code 2: {n_2}")
n_3 = sum(len(c) * p for c, p in zip(code_3, ps))
print(f"code 3: {n_3}")

code 1: 2.04
code 2: 2.4
code 3: 2.2199999999999998


4.  Compare the encodings in terms of their performance and discuss their efficiency.

5.  Identify which of the encodings can be used as prefix codes.  Motivate your answer.

* Code 1 is not a valid prefix code: the codeword 10 (for b) is a prefix of the codeword 101 (for e).
* Code 2 is a valid prefix code: no codeword is a prefix of another codeword.
* Code 3 is also a valid prefix code, for the same reason.
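
The prefix property argued above can also be checked mechanically. The sketch below compares every ordered pair of codewords; `is_prefix_free` is a hypothetical helper name, and the three codes are copied from the first cell.

```python
from itertools import permutations

# Encodings copied from the first cell.
code_1 = ['0', '10', '110', '111', '101']
code_2 = ['10', '01', '001', '000', '111']
code_3 = ['00', '01', '11', '100', '101']

def is_prefix_free(code):
    """Return True if no codeword is a prefix of a codeword at another position."""
    return not any(v.startswith(u) for u, v in permutations(code, 2))

for name, code in [("code 1", code_1), ("code 2", code_2), ("code 3", code_3)]:
    print(f"{name}: prefix-free = {is_prefix_free(code)}")
```

The pairwise comparison is quadratic in the number of codewords, which is fine for an alphabet of five symbols.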

6.  Determine the symbol lengths assigned by Shannon's binary encoding to this source.

In [21]:
from math import ceil
N = [ceil(-log2(pi)) for pi in ps]
print(f"<N>: {N}")

<N>: [2, 3, 3, 4, 4]
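
As a sanity check on these lengths, the sketch below verifies two standard properties of the lengths ceil(-log2 p): they satisfy the Kraft inequality, and their expected length lies between H(X) and H(X) + 1. The variable names are assumptions introduced for illustration.

```python
from math import ceil, log2

# Probabilities copied from the first cell.
ps = [0.36, 0.24, 0.18, 0.12, 0.10]

# Shannon code lengths: n_i = ceil(-log2 p_i).
lengths = [ceil(-log2(p)) for p in ps]

# Kraft sum: at most 1 for any set of lengths that admits a prefix code.
kraft_sum = sum(2 ** -n for n in lengths)

entropy = -sum(p * log2(p) for p in ps)
expected_length = sum(n * p for n, p in zip(lengths, ps))

print(f"lengths: {lengths}")
print(f"Kraft sum: {kraft_sum}")
print(f"H(X) = {entropy:.4f}, expected length = {expected_length:.4f}")
```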


7.  Construct a binary prefix code for the source that cannot be further improved under the given constraints.  Compare it to the three provided encodings.

In [18]:
class Node:
    def __init__(self, symbol, p, left, right, code):
        self.symbol = symbol
        self.p = p
        self.left = left
        self.right = right
        self.code = code
    def toString(self):
        return f"{self.symbol}: p = {self.p}, code = {self.code}"

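def CalculateCodes(node, code):
    # Assign `code` to this node, then recurse: the left child appends "0", the right child "1".
    node.code = code
    if node.left is not None:
        CalculateCodes(node.left, code + "0")
    if node.right is not None:
        CalculateCodes(node.right, code + "1")

def ReturnLeafNodes(node):
    # Print every leaf (source symbol) with its codeword; despite the name, nothing is returned.
    if node is None:
        return None
    if node.left is None and node.right is None:
        print(node.toString())
    else:
        ReturnLeafNodes(node.left)
        ReturnLeafNodes(node.right)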

def helper():
    x = symbols
    p = ps
    priority_list = [Node(xi, pi, None, None, "") for xi, pi in zip(x, p)]    

    def ConstructTree():
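        # Sort by descending probability so the two least probable nodes sit at the end.
        priority_list.sort(key = lambda x: x.p, reverse = True)
        node_2 = priority_list.pop()  # least probable node
        node_1 = priority_list.pop()  # second least probable node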
        node_new = Node(
            symbol = node_1.symbol + node_2.symbol,
            p = node_1.p + node_2.p,
            left = node_1,
            right = node_2,
            code = "")
        if len(priority_list) == 0:
            print(node_new.toString())
            return node_new
        else:
            priority_list.append(node_new)
            return ConstructTree()
            
    return ConstructTree()

root = helper()
CalculateCodes(root, "")
ReturnLeafNodes(root)


abdec: p = 1.0, code = 
a: p = 0.36, code = 00
b: p = 0.24, code = 01
d: p = 0.12, code = 100
e: p = 0.1, code = 101
c: p = 0.18, code = 11
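
The construction above re-sorts the list before every merge. A common alternative is a min-heap, sketched below with Python's `heapq`. This is not the method used in the cell above, only a compact cross-check of the resulting codeword lengths; `huffman_lengths` is a hypothetical name.

```python
import heapq
from itertools import count

# Alphabet and probabilities copied from the first cell.
symbols = ['a', 'b', 'c', 'd', 'e']
ps = [0.36, 0.24, 0.18, 0.12, 0.10]

def huffman_lengths(symbols, ps):
    """Return {symbol: codeword length} for a binary Huffman code."""
    tie = count()  # tie-breaker so heapq never has to compare the symbol lists
    heap = [(p, next(tie), [s]) for s, p in zip(symbols, ps)]
    heapq.heapify(heap)
    lengths = {s: 0 for s in symbols}
    while len(heap) > 1:
        # Merge the two least probable nodes.
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper in the tree.
        for s in group1 + group2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), group1 + group2))
    return lengths

lengths = huffman_lengths(symbols, ps)
print(lengths)
```

Only the lengths are tracked here, since they determine the expected code length. The actual bit patterns depend on how the 0 and 1 branches are labelled.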


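* Code 1 violates the Kraft-McMillan inequality: its codeword lengths 1, 2, 3, 3, 3 give 2^-1 + 2^-2 + 3 * 2^-3 = 1.125 > 1, so it is not uniquely decodable. This is consistent with <N> = 2.04 < H(X) ≈ 2.17 bits/symbol, which no uniquely decodable code can achieve.
* Code 2 is a valid prefix code but still has redundancy: its expected length of 2.4 bits/symbol exceeds the 2.22 bits/symbol of Code 3.
* Code 3 is the Huffman encoding: it matches the codewords constructed above, so no other prefix code for this source has a shorter expected length.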
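
The Kraft-McMillan inequality mentioned above can be evaluated directly by summing 2^(-length) over the codewords of each encoding. The following is a minimal sketch; `kraft_sum` is a hypothetical helper name, and the codes are copied from the first cell.

```python
# Encodings copied from the first cell.
code_1 = ['0', '10', '110', '111', '101']
code_2 = ['10', '01', '001', '000', '111']
code_3 = ['00', '01', '11', '100', '101']

def kraft_sum(code):
    """Sum of 2^(-length) over all codewords of a binary code."""
    return sum(2 ** -len(c) for c in code)

for name, code in [("code 1", code_1), ("code 2", code_2), ("code 3", code_3)]:
    s = kraft_sum(code)
    # A sum above 1 rules out unique decodability.
    print(f"{name}: Kraft sum = {s}, satisfies inequality = {s <= 1}")
```

A sum above 1 rules out unique decodability, a sum below 1 indicates unused codeword space, and a sum of exactly 1 means the code tree is complete.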

In [23]:
e_1 = sum(Hs) / n_1
print(e_1)
e_2 = sum(Hs) / n_2
print(e_2)
e_3 = sum(Hs) / n_3
print(e_3)

1.0633908721801586
0.9038822413531349
0.9771699906520377
