**CS560 - Algorithms and Their Analysis**
<br>
Date: **19 March 2021**
<br>

Title: **Seminar 8**
<br>
Speaker: **Dr. Shota Tsiskaridze**

Bibliography:
<br> 
 **Section 16.3**. Thomas H. Cormen, Charles Eric Leiserson, Ronald Linn Rivest, and Clifford Seth Stein, *Introduction to Algorithms, 3rd Edition*, MIT Press, 2009
 


<h2 align="center">Huffman Codes</h2>

- **Huffman codes** are a **good example** of **using a greedy algorithm**.


- **Huffman codes compress data very effectively**: savings of 20% to 90% are typical, depending on the characteristics of the data being compressed.


- We **consider the data** to be a **sequence of characters**.


- **Huffman’s greedy algorithm** uses a **table** giving **how often each character occurs**, i.e., its **frequency**.


- Suppose we have a $100$-character data file that we wish to store compactly.

  We observe that the **characters** in the file **occur with the frequencies** given below:
  
|           |  a |  b |  c |  d | e | f |
|:---------:|:--:|:--:|:--:|:--:|:-:|:-:|
| Frequency | 45 | 13 | 12 | 16 | 9 | 5 |


- There are **many options** for how to represent such a file of information.


- We consider the problem of designing a **binary character code**, in which **each character** is represented by a **unique binary string**, which we call a **codeword**.
 
  For example, if we use a **fixed-length code**, we need 3 bits to represent each of the 6 characters:

|                       |  a  |  b  |  c  |  d  |  e  |  f  |
|:---------------------:|:---:|:---:|:---:|:---:|:---:|:---:|
|       Frequency       |  45 |  13 |  12 |  16 |  9  |  5  |
| Fixed-length codeword | 000 | 001 | 010 | 011 | 100 | 101 |


- This method requires **300 bits** to code the entire file.


- Can we do better?

  A **variable-length code** can do considerably better than a fixed-length code:
  
|                          |  a |  b  |  c  |  d  |   e  |   f  |
|:------------------------:|:--:|:---:|:---:|:---:|:----:|:----:|
|         Frequency        | 45 |  13 |  12 |  16 |   9  |   5  |
| Variable-length codeword | 0  | 101 | 100 | 111 | 1101 | 1100 |


- This code requires

  $$45 \cdot 1 + 13 \cdot 3 + 12 \cdot 3 + 16 \cdot 3 + 9 \cdot 4 + 5 \cdot 4 = 224 \text{ bits}$$ 
  
  to represent the file, a saving of approximately 25% compared with the 300 bits of the fixed-length code. 
  
  In fact, this is an **optimal character code** for this file, as we shall see.
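
The bit counts above can be checked with a few lines of Python. This is only a minimal sketch: the names `freq`, `fixed_code`, `variable_code`, and `encoded_length` are illustrative and are not part of the seminar's own code.

```python
# Character frequencies and the two codes from the tables above
freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
fixed_code = {'a': '000', 'b': '001', 'c': '010',
              'd': '011', 'e': '100', 'f': '101'}
variable_code = {'a': '0', 'b': '101', 'c': '100',
                 'd': '111', 'e': '1101', 'f': '1100'}


def encoded_length(freq, code):
    """Total bits: frequency times codeword length, summed over all characters."""
    return sum(freq[ch] * len(code[ch]) for ch in freq)


fixed_bits = encoded_length(freq, fixed_code)
variable_bits = encoded_length(freq, variable_code)
savings = 1 - variable_bits / fixed_bits

print(fixed_bits)                # 300
print(variable_bits)             # 224
print(round(100 * savings, 1))   # 25.3
```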
  
 

<h3 align="center">Prefix Codes</h3>

- We consider only **codes** in which **no codeword** is **also a prefix** of **some other codeword**.


- Such codes are called **prefix codes**.



- Although we won’t prove it, a **prefix code** can **always achieve** the **optimal data compression** among **all character codes**, so restricting attention to prefix codes loses no generality.


- The **decoding process** needs a **convenient representation** for the **prefix code** so that we can **easily pick off** the **initial codeword**.


- We interpret the **binary codeword** for a character as the **simple path** from the **root** to that **character**, where $0$ means **go to the left child** and $1$ means **go to the right child**.

  <center><img src="images/S8_Hufmann_Tree.png" width="900" alt="Example" /></center>
  

- One can prove that an **optimal code** for a file is **always represented** by a **full binary tree**, in which **every nonleaf node** has **two children**.


- Thus, the **fixed-length code** in our example **is not optimal**, since its tree is **not a full binary tree**: it contains codewords beginning with 10, but none beginning with 11.


- If $C$ is the **alphabet** from which the characters are drawn and **all character frequencies are positive**, then the **tree for an optimal prefix code has exactly $|C|$ leaves**, one for each letter of the alphabet, and exactly $|C|-1$ **internal nodes**.


- If $c_{freq}$ denotes the frequency of character $c$ in the file and $d_T(c)$ denotes the depth of $c$'s leaf in the tree $T$ (which is also the length of the codeword for $c$), then the number of bits required to encode the file is:

  $$B(T) = \sum_{c \in C} c_{freq} \cdot d_T(c),$$
  
  which we define as the **cost** of the tree $T$.
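
Why the prefix property makes decoding easy can be illustrated with a short sketch. It uses the variable-length code from the table above; the helper names `is_prefix_code`, `encode`, and `decode` are assumptions made for this illustration. Instead of walking an explicit tree, `decode` collects bits until they match a codeword, which is equivalent to following the path from the root to a leaf.

```python
variable_code = {'a': '0', 'b': '101', 'c': '100',
                 'd': '111', 'e': '1101', 'f': '1100'}


def is_prefix_code(code):
    """Return True if no codeword is a prefix of another codeword."""
    words = sorted(code.values())
    # After sorting, a codeword that is a prefix of another one
    # is immediately followed by a word that starts with it
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def encode(text, code):
    """Concatenate the codewords of the characters of text."""
    return ''.join(code[ch] for ch in text)


def decode(bits, code):
    """Decode a bit string by repeatedly picking off the initial codeword."""
    inverse = {word: ch for ch, word in code.items()}
    out, current = [], ''
    for bit in bits:
        current += bit
        # In a prefix code the first match is the only possible match
        if current in inverse:
            out.append(inverse[current])
            current = ''
    if current:
        raise ValueError('trailing bits do not form a codeword')
    return ''.join(out)


print(is_prefix_code(variable_code))        # True
print(decode('001011101', variable_code))   # aabe
```

The bit string `001011101` parses uniquely as `0`, `0`, `101`, `1101`, so no separators between codewords are needed.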

<h3 align="center">Constructing a Huffman Code</h3>

- Huffman invented a **greedy algorithm** that constructs an **optimal prefix code** called a **Huffman code**.


- The **algorithm builds** the **tree** $T$ corresponding to the optimal code in a **bottom-up manner**.


- It **begins** with a set of $|C|$ **leaves** and performs a **sequence** of $|C|-1$ **merging operations** to create the final tree.

  <center><img src="images/S8_Hufmann_Steps.png" width="900" alt="Example" /></center>

In [37]:
# Creating tree
class Tree(object):

    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

    def children(self):
        return (self.left, self.right)

    def nodes(self):
        return (self.left, self.right)

    def __str__(self):
        return '%s_%s' % (self.left, self.right)


# Walk the finished tree and assign a codeword to every character
def huffman_code_tree(node, binString=''):
    # A leaf is a single character: the path accumulated so far is its codeword
    if isinstance(node, str):
        return {node: binString}
    (l, r) = node.children()
    d = dict()
    # 0 labels the edge to the left child, 1 the edge to the right child
    d.update(huffman_code_tree(l, binString + '0'))
    d.update(huffman_code_tree(r, binString + '1'))
    return d


string = 'To be, or not to be: that is the question'
print(string)

# Calculating frequency
freq = {}
for c in string:
    if c in freq:
        freq[c] += 1
    else:
        freq[c] = 1

freq = sorted(freq.items(), key=lambda x: x[1], reverse=True)

print("\n")
for i in range(0,len(freq)):
    print(' %-4r |%12s' % (freq[i][0], freq[i][1]))

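nodes = freq

# Greedy step: repeatedly merge the two least frequent nodes
while len(nodes) > 1:
    n = len(nodes)
    # The list is sorted in decreasing order, so the two rarest nodes are last
    (key1, c1) = nodes[n-1]
    (key2, c2) = nodes[n-2]
    nodes = nodes[0:n-2]
    node = Tree(key1, key2)
    nodes.append((node, c1 + c2))

    # Re-sort so that the two rarest nodes are again at the end of the list
    nodes = sorted(nodes, key=lambda x: x[1], reverse=True)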

huffmanCode = huffman_code_tree(nodes[0][0])

print("\n")
for (char, frequency) in freq:
    print(' %-4r |%12s' % (char, huffmanCode[char]))

To be, or not to be: that is the question


 ' '  |           9
 't'  |           6
 'o'  |           5
 'e'  |           4
 'b'  |           2
 'n'  |           2
 'h'  |           2
 'i'  |           2
 's'  |           2
 'T'  |           1
 ','  |           1
 'r'  |           1
 ':'  |           1
 'a'  |           1
 'q'  |           1
 'u'  |           1


 ' '  |          01
 't'  |         110
 'o'  |         101
 'e'  |        1111
 'b'  |        1000
 'n'  |        0001
 'h'  |        0000
 'i'  |        0011
 's'  |        0010
 'T'  |       10010
 ','  |      100111
 'r'  |      100110
 ':'  |      111001
 'a'  |      111000
 'q'  |      111011
 'u'  |      111010
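
As a cross-check, the merging procedure can also be sketched with a min-priority queue and run on the six-character example from the beginning of the seminar. This is a hedged variant, not the code of the cell above: the function name `huffman_code`, the use of `heapq`, and the tie-breaking counter are choices made for this illustration, and the input is assumed to be a non-empty dictionary of positive frequencies.

```python
import heapq
from itertools import count


def huffman_code(freq):
    """Build a Huffman code for a non-empty {character: frequency} dict."""
    tie = count()  # unique tie-breaker, so tree nodes are never compared
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)

    # |C| - 1 merges: always join the two least frequent nodes
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

    code = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + '0')
            walk(node[1], prefix + '1')
        else:                            # leaf: a single character
            code[node] = prefix or '0'   # lone character gets the codeword '0'

    walk(heap[0][2], '')
    return code


freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}
code = huffman_code(freq)
cost = sum(freq[ch] * len(code[ch]) for ch in freq)
print(cost)   # 224
```

For these frequencies the sketch reproduces the variable-length codewords of the table exactly, with cost $B(T) = 224$. In general, different tie-breaking rules can produce different codewords with the same cost.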


<h2 align="center">Egyptian Fraction</h2>

- **Problem**:

  A fraction is called a **unit fraction** if its **numerator is** $1$ and its **denominator** is a **positive integer**.

  For example, $\frac{1}{3}$ is a **unit fraction**.

  Every **positive fraction** $\frac{m}{n}$ can be represented as a **sum of distinct unit fractions**: 

  $$\frac{m}{n} = \sum_{i \in I} \frac{1}{i},$$

  where $I$ is a finite set of positive integers.

  Such a representation is called an **Egyptian Fraction**, as it was used by the ancient Egyptians.


- **Solution**:
 
  We can use a **greedy algorithm** to find the **unit fraction representation** of a **given positive fraction**.

  For a **given number** of the form $\frac{m_0}{n_0}$ (assume $m_0 < n_0$, so that the unit fractions found are distinct), first **find the greatest possible unit fraction** not exceeding it, then recurse on the remaining part. 
  
  For example, consider $\frac{m_0}{n_0} = \frac{6}{14}$.
  
  We first find the ceiling of $\frac{n_0}{m_0} = \frac{14}{6}$, i.e., $3$. 
  
  So the **first unit fraction** is $\frac{1}{3}$, and we then recurse on $\frac{m_1}{n_1} = \frac{6}{14} - \frac{1}{3} = \frac{4}{42}$.
  

In [56]:
def egyptianFraction(m, n): 
    
    # list of denominators of the unit fractions
    uf = [] 
  
    # loop until the remaining fraction becomes 0
    while m != 0: 
  
        # ceiling of n / m, computed with integer arithmetic
        # to avoid floating-point rounding for large denominators
        x = -(-n // m) 
  
        # storing value in uf list 
        uf.append(x) 
  
        # updating new m and n 
        m = x * m - n 
        n = n * x 
  
    # printing the unit fractions 
    for i in range(len(uf)): 
        if i != len(uf) - 1: 
            print("1/{0} +".format(uf[i]), end = " ") 
        else: 
            print("1/{0}".format(uf[i]), end = " ") 

egyptianFraction(6, 14) 

1/3 + 1/11 + 1/231 
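
The result can be verified with exact rational arithmetic. The following is a minimal sketch, assuming $0 < \frac{m}{n} < 1$; the function name `egyptian_denominators` is hypothetical, and `fractions.Fraction` from the standard library is used in place of the manual numerator and denominator updates above.

```python
from fractions import Fraction


def egyptian_denominators(m, n):
    """Greedy expansion of m/n (assumed 0 < m/n < 1) into unit fractions.

    Returns the list of denominators.
    """
    remainder = Fraction(m, n)   # automatically reduced to lowest terms
    denominators = []
    while remainder > 0:
        # Smallest x with 1/x <= remainder, i.e. ceil(denominator / numerator)
        x = -(-remainder.denominator // remainder.numerator)
        denominators.append(x)
        remainder -= Fraction(1, x)
    return denominators


dens = egyptian_denominators(6, 14)
print(dens)                                                   # [3, 11, 231]
print(sum(Fraction(1, d) for d in dens) == Fraction(6, 14))   # True
```

The loop terminates because the numerator of the reduced remainder strictly decreases with every step.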

<h2 align="center">Fitting Shelves Problem</h2>

- **Problem**:

  We are given a **wall of length** $w$ and **shelves** of **two lengths**, $m$ and $n$.

  Find the **number** of each type of **shelf** to be used and the **remaining empty space** in the **optimal solution**, i.e., the one in which the **empty space is minimum**. 

  The **larger** of the **two shelves** is **cheaper**, so it is **preferred**. 

  However, cost is secondary: the **first priority** is to **minimize the empty space** on the wall.
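
One possible way to approach this problem is to try every feasible number of larger shelves and fill the rest of the wall with smaller ones. This is only a sketch under stated assumptions and not necessarily the solution intended in the seminar: the function name `fit_shelves` and the order of the returned values are hypothetical, and $w$, $m$, $n$ are assumed to be positive integers.

```python
def fit_shelves(wall, m, n):
    """Return (shorter_shelves, longer_shelves, empty_space).

    Minimizes the empty space first; among solutions with equal empty
    space, prefers the one that uses more of the longer (cheaper) shelves.
    """
    small, large = min(m, n), max(m, n)
    best = None
    for num_large in range(wall // large + 1):
        rest = wall - num_large * large
        # Fill what is left with as many shorter shelves as possible
        num_small, empty = divmod(rest, small)
        # Tuples compare lexicographically: less empty space wins,
        # then a larger number of longer shelves wins
        key = (empty, -num_large)
        if best is None or key < best[0]:
            best = (key, num_small, num_large)
    (empty, _), num_small, num_large = best
    return num_small, num_large, empty


print(fit_shelves(24, 3, 5))   # (3, 3, 0)
print(fit_shelves(29, 3, 9))   # (0, 3, 2)
```

For a wall of length 24 and shelves of lengths 3 and 5, both eight short shelves and three of each kind leave no empty space; the second option is chosen because it uses more of the longer shelves.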

<h1 align="center">End of Seminar</h1>