# Assignment 4 - Greedy Algorithms
## Part 2 - Huffman algorithm 
We want to create a version of the Huffman algorithm using Python. 

### 1 - Data structures set up
We want to sort our characters using a binary heap. You can implement a binary heap in Python using the [heapq library](https://docs.python.org/3/library/heapq.html). We will create a `HeapNode` class that serves as the basis for storing our Huffman tree. To use our `HeapNode` class with the `heapq` library, we need to override some of its comparison operators: the `<` and `==` operators should compare the values of the nodes.
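
To see why the comparison methods matter, here is a minimal sketch using a hypothetical `Item` class (a stand-in for illustration, not part of the assignment). In CPython, `heapq` orders elements with `<`, so defining `__lt__` is enough for it to keep the smallest value at the top of the heap.

```python
import heapq


class Item:
    # Hypothetical stand-in for HeapNode, used only for illustration
    def __init__(self, label, value):
        self.label = label
        self.value = value

    def __lt__(self, other):
        # heapq calls this whenever it compares two elements
        return self.value < other.value


items = []
for label, value in [("x", 5), ("y", 1), ("z", 3)]:
    heapq.heappush(items, Item(label, value))

# Popping repeatedly yields the items in ascending order of value
order = [heapq.heappop(items).label for _ in range(3)]
print(order)  # ['y', 'z', 'x']
```

Without `__lt__`, the pushes above would raise a `TypeError` as soon as two `Item` objects had to be compared.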

In [24]:
import heapq
class HeapNode:
    def __init__(self, character, value):
        # Store the character and its frequency (value).
        # left_child and right_child start as None and are set
        # when nodes are merged into the Huffman tree.
        self.character = character
        self.value = value
        self.left_child = None
        self.right_child = None


    def __lt__(self, other):
        # Only HeapNode objects are comparable; return False for anything else,
        # otherwise return a less-than (<) comparison between the two values
        if not isinstance(other, HeapNode):
            return False
        return self.value < other.value

    def __eq__(self, other):
        # Only HeapNode objects are comparable; return False for anything else,
        # otherwise return an equals (==) comparison between the two values
        if not isinstance(other, HeapNode):
            return False
        return self.value == other.value

    def __str__(self):
        # Readable representation of the node, useful for testing along the way
        return (
            f"Character: {self.character} Value: {self.value} "
            f"Left child: {self.left_child} Right child: {self.right_child}"
        )

    def __repr__(self) -> str:
        return self.__str__()


### 2 - Making the frequency heap 

Let's start by constructing our frequency heap. We want to take a string as input and count its characters using a dictionary. We then want to insert the characters into our heap using the [heapq library](https://docs.python.org/3/library/heapq.html).


In [25]:
def make_frequency_heap(string: str) -> list:
    # Initialize freq dictionary and heap array
    freq = {}
    heap = []
    # Count how often each character occurs, with the character
    # being the key and the frequency being the value
    for character in string:
        freq[character] = freq.get(character, 0) + 1
    # Wrap each (character, frequency) pair in a HeapNode
    # and push it onto the heap using the heapq library
    for character, count in freq.items():
        heapq.heappush(heap, HeapNode(character, count))
    return heap


teststring = "ABBCCCDDDD"
heap = make_frequency_heap(teststring)
for node in heap:
    print(node)


Character: A Value: 1 Left child: None Right child: None
Character: B Value: 2 Left child: None Right child: None
Character: C Value: 3 Left child: None Right child: None
Character: D Value: 4 Left child: None Right child: None


The expected output should resemble the following (the exact format depends on your `__str__` implementation):
```sh
Character: 'A' Value: '1' Left child: 'None' Right child: 'None' 
Character: 'B' Value: '2' Left child: 'None' Right child: 'None' 
Character: 'C' Value: '3' Left child: 'None' Right child: 'None' 
Character: 'D' Value: '4' Left child: 'None' Right child: 'None' 
```
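
As an aside, the counting step can also be written with `collections.Counter` from the standard library. The sketch below is an alternative for comparison, not the assignment's required approach; the helper name `count_characters` is hypothetical, and plain `(frequency, character)` tuples stand in for `HeapNode` so that the block runs on its own.

```python
import heapq
from collections import Counter


def count_characters(string):
    # Counter builds the same character -> frequency mapping as a manual loop
    return Counter(string)


freq = count_characters("ABBCCCDDDD")

# Tuples compare element by element, so (frequency, character) pairs
# are ordered by frequency first
pairs = [(count, character) for character, count in freq.items()]
heapq.heapify(pairs)

print(dict(freq))  # {'A': 1, 'B': 2, 'C': 3, 'D': 4}
print(pairs[0])    # (1, 'A')
```

The smallest frequency ends up at index 0, which is the property the merging step relies on.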


### 3 - Merging the codes

Next, we want to merge the nodes in the heap, combining the two lowest-frequency nodes at each step. Follow the instructions in the pseudocode below to merge the heap into a single Huffman tree.


In [26]:
def merge_code(heap: list) -> HeapNode:
    # Repeat while there is more than one node in the heap
    while len(heap) > 1:
        # Extract the two nodes with the lowest frequencies
        # (the lowest-frequency node is always at the top of the heap)
        node1 = heapq.heappop(heap)
        node2 = heapq.heappop(heap)

        # Create an internal node whose value is the sum of the two values,
        # with the two extracted nodes as its left and right child
        node3 = HeapNode(None, node1.value + node2.value)
        node3.left_child = node1
        node3.right_child = node2

        # Push the new node back onto the heap
        heapq.heappush(heap, node3)
    # The single remaining node is the root of the tree
    return heap[0]
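
To make the greedy merging order concrete, the sketch below traces the same loop on the frequencies of `"ABBCCCDDDD"`. It is a simplified illustration under stated assumptions: `(weight, label)` tuples replace `HeapNode`, the helper `trace_merges` is hypothetical, and ties are broken alphabetically by label, which may differ from how `HeapNode` ties are resolved.

```python
import heapq


def trace_merges(freq):
    # Each heap entry is (weight, label); labels exist only for display
    heap = [(count, character) for character, count in freq.items()]
    heapq.heapify(heap)
    steps = []
    while len(heap) > 1:
        # Take the two lightest entries and combine them
        w1, l1 = heapq.heappop(heap)
        w2, l2 = heapq.heappop(heap)
        steps.append((l1, l2, w1 + w2))
        heapq.heappush(heap, (w1 + w2, l1 + l2))
    # The remaining entry carries the total weight of the tree
    return steps, heap[0][0]


steps, total = trace_merges({"A": 1, "B": 2, "C": 3, "D": 4})
for left, right, weight in steps:
    print(f"merge {left} + {right} -> {weight}")
print("root weight:", total)  # root weight: 10
```

The root weight always equals the length of the input string, which is a quick sanity check for `merge_code`.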


### 4 - Traversing the tree

We've now made a program that can construct a Huffman tree using Huffman's algorithm. Next, we want to traverse the tree and find the Huffman encoding for each letter. We will do this recursively. We have created the main function `traverse_huffman()` to set up the variables for you. Your job is to finish the implementation of `traverse_huffman_recursive()`.


In [27]:
def traverse_huffman(root: HeapNode) -> dict:
    # Stores the codes for each letter
    codes = {}
    # Keeps track of the current code
    current_code = ""
    # traverse recursively
    traverse_huffman_recursive(root, current_code, codes)
    # return finished encoding
    return codes


def traverse_huffman_recursive(node: HeapNode, current_code: str, codes: dict) -> None:
    # A node with a character is a leaf: store current_code under
    # that character in codes and stop recursing
    if node.character is not None:
        codes[node.character] = current_code
        return
    # Internal node: recurse into both subtrees, appending "0" to the
    # current code when going left and "1" when going right
    traverse_huffman_recursive(node.left_child, current_code + "0", codes)
    traverse_huffman_recursive(node.right_child, current_code + "1", codes)


### Running the program

Here we have a main function to run the whole program. Use it to test whether you get the correct output:


In [28]:
def main():
    text = "ABBBBCCCDDEEEEAAAEEBBBCC"
    heap = make_frequency_heap(text)
    root = merge_code(heap)
    encoding = traverse_huffman(root)
    print(encoding)


main()


{'C': '00', 'E': '01', 'D': '100', 'A': '101', 'B': '11'}


Expected output: `{'C': '00', 'E': '01', 'D': '100', 'A': '101', 'B': '11'}`
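
As a final check, the sketch below applies the expected encoding to the test string and decodes it again. The `encode` and `decode` helpers are hypothetical additions for illustration and are not part of the assignment; the decoder assumes the codes are prefix-free, which holds for any code read off the leaves of a Huffman tree.

```python
codes = {'C': '00', 'E': '01', 'D': '100', 'A': '101', 'B': '11'}
text = "ABBBBCCCDDEEEEAAAEEBBBCC"


def encode(text, codes):
    # Concatenate the code of every character
    return "".join(codes[character] for character in text)


def decode(bits, codes):
    # Because no code is a prefix of another, the first match is always correct
    reverse = {code: character for character, code in codes.items()}
    decoded = []
    current = ""
    for bit in bits:
        current += bit
        if current in reverse:
            decoded.append(reverse[current])
            current = ""
    return "".join(decoded)


encoded = encode(text, codes)
print(len(encoded))                    # 54
print(decode(encoded, codes) == text)  # True
```

A fixed-length code for five distinct characters needs 3 bits per character, or 72 bits for this 24-character string, so the Huffman encoding saves 18 bits here.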
