# HW 11

**Upload one file** to Gradescope: 
* `HW11.py` (which will be autograded)

___

In [1]:
from collections import Counter

### Counting Frequencies
An inefficient way to count the letter frequencies in a string is to call `.count()` for each letter of the alphabet.

A more efficient method is to use a `Counter`, which is a subclass of `dict`. Documentation can be found here: https://docs.python.org/3/library/collections.html#collections.Counter.

Example:
```
from collections import Counter
ct = Counter()
ct.update('banana')
ct.update('bun')
ct
```
returns 
```
Counter({'a': 3, 'n': 3, 'b': 2, 'u': 1})
```
which can be used like a dictionary. To sort from most frequent to least frequent, call
```
ct.most_common()
```
which will return the list
```
[('a', 3), ('n', 3), ('b', 2), ('u', 1)]
```
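As a quick check of the two approaches described above, the sketch below counts letters both ways on the example strings and confirms that they agree. The `.count()` version rescans the text once per letter, while `Counter.update` makes a single pass. The helper name `count_with_str_count` is illustrative and not part of the assignment.

```python
from collections import Counter
import string

def count_with_str_count(text):
    """Count lowercase letters by scanning the whole text once per letter."""
    counts = {}
    for ch in string.ascii_lowercase:
        n = text.count(ch)  # one full pass over text for every letter
        if n > 0:
            counts[ch] = n
    return counts

text = 'banana' + 'bun'

ct = Counter()
ct.update('banana')
ct.update('bun')

slow = count_with_str_count(text)
print(slow == dict(ct))    # the two approaches should agree
print(ct.most_common(2))   # the two most frequent letters
```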

### Flatland

The file `'flatland.txt'` contains the text of the book *Flatland* by Edwin A. Abbott, which is a satire about Victorian England. Its main characters are geometric shapes. **Calculate the frequencies** of the 26 letters of the alphabet in the text using a `Counter`. Save the result in **`flatland_freq`**.

* For space efficiency, **read the file line by line**. For example:
```
with open('flatland.txt') as fp:
    for line in fp:
        ...
```
* Use `.isalpha()` to distinguish letters from non-alphabetic characters.
* Use `.lower()` to convert upper case characters to lower case.
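The three hints above can be combined as in the following sketch. It is only an illustration: `letter_frequencies` is a hypothetical helper, and an `io.StringIO` object with a short stand-in string takes the place of the open file so that the block runs without `flatland.txt`. Note that `.isalpha()` also accepts non-ASCII letters, so a text with accented characters could produce more than 26 keys.

```python
from collections import Counter
import io

def letter_frequencies(lines):
    """Count letters (folded to lower case) from an iterable of text lines."""
    freq = Counter()
    for line in lines:
        # Keep only alphabetic characters and fold them to lower case
        freq.update(ch.lower() for ch in line if ch.isalpha())
    return freq

# io.StringIO stands in for the open file; iterating it yields one line at a time
sample = io.StringIO("I call our world Flatland,\nnot because we call it so.\n")
sample_freq = letter_frequencies(sample)
print(sample_freq['l'], sample_freq['a'])
```

Because the counter is updated once per line, only one line of the file needs to be held in memory at a time.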

In [2]:
flatland_freq = Counter()

with open('flatland.txt') as fp:
    for line in fp:
        # Count only alphabetic characters, folded to lower case,
        # one line at a time so the whole text is never held in memory
        flatland_freq.update(char.lower() for char in line if char.isalpha())

### Flatland Fixed-Length Encodings
Suppose the letters in the alphabet are represented using fixed-length ternary (base 3) codes. **Calculate the total number of ternary digits needed** to encode the 26 letters in *Flatland* (converting upper case letters to lower case). Store the result in `flatland_digit_ct_fixed`.

For example, the first 5 letters of the alphabet can be represented as two-digit base 3 numbers: `a=00`, `b=01`, `c=02`, `d=10`, and `e=11`. Then the encoding for the word `aced` would require 8 digits: `00021110`.
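The arithmetic behind a fixed-length code can be sketched as follows: find the smallest code length `k` with `3**k` at least the number of symbols, then multiply by the number of letters to encode. The function names here are hypothetical and chosen only for this illustration.

```python
def ternary_code_length(num_symbols):
    """Smallest k such that 3**k >= num_symbols."""
    k = 1
    while 3 ** k < num_symbols:
        k += 1
    return k

def fixed_length_digit_count(text, num_symbols):
    """Total ternary digits needed to encode the letters of text."""
    num_letters = sum(1 for ch in text if ch.isalpha())
    return ternary_code_length(num_symbols) * num_letters

print(ternary_code_length(5))               # digits per letter for a 5-letter alphabet
print(ternary_code_length(26))              # digits per letter for 26 letters
print(fixed_length_digit_count('aced', 5))  # total digits for the word 'aced'
```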

In [3]:
flatland_digit_ct_fixed = 445932
#a=000 b=001 c=002 d=010 e=011 f=012 g=020 h=021 i=022 j=100 k=101 l=102 m=110 n=111 o=112 p=120 q=121 r=122 s=200 t=201 u=202 v=210 w=211 x=212 y=220 z=221

In [4]:
flatland_freq.values()

dict_values([3694, 6402, 11460, 13582, 10814, 5686, 8706, 11790, 3986, 4438, 18666, 3125, 11154, 9785, 2665, 2148, 2810, 7955, 4488, 2588, 193, 502, 1418, 85, 379, 125])

In [5]:
# Every letter uses a 3-digit code, so the total is 3 times the letter count
sum1 = 3 * sum(flatland_freq.values())
sum1

445932

### Huffman Code

Write a function **`huffman(char_freq)`** that takes a Counter containing `ch: freq` key-value pairs representing letter frequencies, and **returns a dictionary** containing the ternary encodings for the characters. The dictionary keys will be the characters, and the values will be the base 3 encodings in string format. Assume that there are at least 3 characters in `char_freq`.

The algorithm will use a **ternary tree** (instead of a binary tree) composed of `HuffNode`s (defined below) with each node having up to 3 children. The children should be arranged from left to right in order of increasing frequency. (It is not necessary for the function to implement an efficient min-priority queue; it may call `sorted()`.)
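Calling `sorted()` on every round is enough for this assignment. For comparison, here is a hedged sketch of pulling the three least frequent entries with the standard-library `heapq` module instead. The tuple layout `(freq, tie_breaker, payload)` and the helper name `pop_three_smallest` are illustrative choices, not requirements of the assignment.

```python
import heapq
from itertools import count

def pop_three_smallest(heap):
    """Remove and return the three lowest-frequency entries, smallest first."""
    return [heapq.heappop(heap) for _ in range(3)]

# A unique sequence number ensures that equal frequencies never compare payloads
tie_breaker = count()
freqs = {'a': 45, 'b': 10, 'c': 18, 'd': 48, 'e': 22, 'f': 33, '@': 0}
heap = [(freq, next(tie_breaker), ch) for ch, freq in freqs.items()]
heapq.heapify(heap)

smallest = pop_three_smallest(heap)
print([(ch, freq) for freq, _, ch in smallest])

# Push the merged weight back, as one round of the Huffman algorithm would
merged_freq = sum(freq for freq, _, _ in smallest)
heapq.heappush(heap, (merged_freq, next(tie_breaker), 'internal'))
print(heap[0][0])  # lowest frequency now at the top of the heap
```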

**Note**: An optimal encoding can be found if the number of characters is odd. If there is an even number of characters, add a dummy character `'@'` with frequency 0. This will ensure that the root will have 3 children.
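The parity rule follows from counting nodes: each merge removes three nodes and adds one parent, so the pool shrinks by two per round. Starting from an odd count, the pool ends with exactly one node (the root); starting from an even count, it would end with two. A small sketch, using the hypothetical helper `nodes_left_after_merging`:

```python
def nodes_left_after_merging(num_chars):
    """Simulate 3-way merges; return how many nodes remain when no full merge is possible."""
    remaining = num_chars
    while remaining >= 3:
        remaining -= 2  # three children out, one parent in
    return remaining

print(nodes_left_after_merging(7))       # odd start
print(nodes_left_after_merging(6))       # even start
print(nodes_left_after_merging(26 + 1))  # 26 letters plus the dummy '@'
```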

**Example**: 
```
char_freq = Counter({'a': 45, 'b': 10, 'c': 18, 
                     'd': 48, 'e': 22, 'f': 33})
huffman(char_freq)

```
returns (in some order)
```
{'b': '211', 'f': '22', 'a': '0', 'd': '1', 'c': '212', 'e': '20'}

```
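One way the algorithm described above could be put together is sketched below. This is a minimal illustration under stated assumptions, not the reference solution: the function is named `huffman_sketch` to keep it separate from the graded `huffman`, the `HuffNode` class is repeated so the block runs on its own, and ties between equal frequencies are broken by list order, which may differ from what an autograder expects.

```python
from collections import Counter

class HuffNode:
    # Same fields as the HuffNode class given in the assignment
    def __init__(self, ch, freq):
        self.char = ch  # '' for internal nodes
        self.freq = freq
        self.parent = None
        self.left = None
        self.middle = None
        self.right = None

def huffman_sketch(char_freq):
    """Return {char: ternary code string} for a Counter of frequencies."""
    nodes = [HuffNode(ch, freq) for ch, freq in char_freq.items()]
    dummy = None
    if len(nodes) % 2 == 0:
        dummy = HuffNode('@', 0)  # padding so every merge has three children
        nodes.append(dummy)

    while len(nodes) > 1:
        nodes.sort(key=lambda node: node.freq)  # simple stand-in for a priority queue
        left, middle, right = nodes[:3]         # increasing frequency, left to right
        parent = HuffNode('', left.freq + middle.freq + right.freq)
        parent.left, parent.middle, parent.right = left, middle, right
        left.parent = middle.parent = right.parent = parent
        nodes = nodes[3:] + [parent]

    # Walk the tree from the root, appending one digit per level
    codes = {}
    stack = [(nodes[0], '')]
    while stack:
        node, code = stack.pop()
        if node.left is None:        # leaf node
            if node is not dummy:    # the padding character gets no entry
                codes[node.char] = code
        else:
            stack.append((node.left, code + '0'))
            stack.append((node.middle, code + '1'))
            stack.append((node.right, code + '2'))
    return codes

char_freq = Counter({'a': 45, 'b': 10, 'c': 18,
                     'd': 48, 'e': 22, 'f': 33})
print(huffman_sketch(char_freq))
```

Keeping the codes as strings matters: a code such as `'00'` would lose its leading zero if it were stored as an integer.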

In [6]:
class HuffNode:
    def __init__(self, ch, freq):
        self.char = ch  # set to '' if internal node
        self.freq = freq
        self.parent = None
        self.left = None
        self.middle = None
        self.right = None

In [107]:
Q = [n[1] for n in flatland_freq.most_common() + [('@', 0)]]
C = [n[0] for n in flatland_freq.most_common() + [('@', 0)]]


In [99]:
x = 1
def test1(x):
    x +=1
test1(x)


1

In [87]:
def create_parent_node(sorted_vals, node_lst, node_ct):
    '''Merge the last three entries of sorted_vals under a new parent node.

    Each entry is a (char_or_node, freq) pair. An entry that already holds a
    HuffNode is reused; otherwise a new leaf node is created for the character.
    '''
    children = []
    for item, freq in sorted_vals[-3:]:
        if isinstance(item, HuffNode):
            children.append(item)
        else:
            children.append(HuffNode(item, freq))

    new_parent_node = HuffNode(node_ct, sum(child.freq for child in children))

    # Same left/middle/right order as the last three entries of sorted_vals
    new_parent_node.left, new_parent_node.middle, new_parent_node.right = children

    # Point each child back at its parent
    for child in children:
        child.parent = new_parent_node

    node_lst.append(new_parent_node)
    return new_parent_node


In [117]:
def huffman(char_freq):
    node_lst = []
    node_ct = 0
    if len(char_freq) % 2 == 0:
        sorted_vals = char_freq.most_common() + [('@',0)]
    else:
        sorted_vals = char_freq.most_common()

    #First three leaf nodes: the three least frequent characters
    node1 = HuffNode(sorted_vals[-1][0], sorted_vals[-1][1])
    node2 = HuffNode(sorted_vals[-2][0], sorted_vals[-2][1])
    node3 = HuffNode(sorted_vals[-3][0], sorted_vals[-3][1])

    for i in range(3):
        sorted_vals.pop(-1)

    new_parent = HuffNode(node_ct, node1.freq + node2.freq + node3.freq)
    node_ct +=1

    node_lst.append(new_parent)

    #Attach the three leaves to the new parent
    new_parent.left = node3
    new_parent.middle = node2
    new_parent.right = node1

    #Point each leaf back at its parent
    node1.parent = node2.parent = node3.parent = new_parent
    
    sorted_vals.append((new_parent, new_parent.freq))
    sorted_vals.sort(key = lambda a: a[1], reverse = True)

    while len(sorted_vals) > 1:
        new_parent = create_parent_node(sorted_vals, node_lst, node_ct)
        node_ct += 1
        for i in range(3):
            sorted_vals.pop(-1)

        sorted_vals.append((new_parent, new_parent.freq))
        sorted_vals.sort(key = lambda a: a[1], reverse = True)
        

    return traverse_tree_helper(node_lst)
            

  

    

In [123]:
#char_freq = Counter({'a': 45, 'b': 10, 'c': 18, 'd': 48, 'e': 22, 'f': 33})
char_freq = flatland_freq
node_lst = []
node_ct = 0
if len(char_freq) % 2 == 0:
    sorted_vals = char_freq.most_common() + [('@',0)]
else:
    sorted_vals = char_freq.most_common()

#First three leaf nodes: the three least frequent characters
node1 = HuffNode(sorted_vals[-1][0], sorted_vals[-1][1])
node2 = HuffNode(sorted_vals[-2][0], sorted_vals[-2][1])
node3 = HuffNode(sorted_vals[-3][0], sorted_vals[-3][1])

for i in range(3):
    sorted_vals.pop(-1)

new_parent = HuffNode(node_ct, node1.freq + node2.freq + node3.freq)
node_ct +=1

node_lst.append(new_parent)

#Update parent children
new_parent.left = node3
new_parent.middle = node2
new_parent.right = node1

#Update leaf node parent
node1.parent = node2.parent = node3.parent = new_parent

#Initialize variable equal to the new parent node
curr_parent = new_parent

sorted_vals.append((new_parent, new_parent.freq))
sorted_vals.sort(key = lambda a: a[1], reverse = True)

#Repeat the merge step until only the root remains
while len(sorted_vals) > 1:
    new_parent = create_parent_node(sorted_vals, node_lst, node_ct)
    node_ct +=1

    for i in range(3):
        sorted_vals.pop(-1)

    sorted_vals.append((new_parent, new_parent.freq))
    sorted_vals.sort(key = lambda a: a[1], reverse = True)




for i in range(0, len(node_lst)):
    print(node_lst[i].freq, node_lst[i].left.char, node_lst[i].middle.char, node_lst[i].right.char)


210 j z @
782 x 0 q
2702 v 1 k
7401 w p b
8637 y g 2
12118 c m f
16576 l d u
23993 4 h 3
29305 n s r
34404 o a i
42276 6 t 5
71964 8 7 e
148644 11 10 9


In [9]:
node_lst = []
sorted_vals = flatland_freq.most_common() + [('@',0)]
sorted_vals
new_parent = create_parent_node(sorted_vals, node_lst)
for i in range(3):
    sorted_vals.pop(-1)

sorted_vals.append((new_parent, new_parent.freq))
sorted_vals.sort(key = lambda a: a[1], reverse = True)
sorted_vals[-3:]
create_parent_node(sorted_vals, node_lst)
node_lst[1].right.char

AttributeError: 'NoneType' object has no attribute 'char'

In [118]:
huffman(flatland_freq)

[(11, 0),
 (10, 1),
 (9, 2),
 (8, 20),
 (7, 21),
 ('e', 22),
 (6, 220),
 ('t', 221),
 (5, 222),
 ('o', 2220),
 ('a', 2221),
 ('i', 2222),
 ('n', 22220),
 ('s', 22221),
 ('r', 22222),
 (4, 222220),
 ('h', 222221),
 (3, 222222),
 ('l', 2222220),
 ('d', 2222221),
 ('u', 2222222),
 ('c', 22222220),
 ('m', 22222221),
 ('f', 22222222),
 ('y', 222222220),
 ('g', 222222221),
 (2, 222222222),
 ('w', 2222222220),
 ('p', 2222222221),
 ('b', 2222222222),
 ('v', 22222222220),
 (1, 22222222221),
 ('k', 22222222222),
 ('x', 222222222220),
 (0, 222222222221),
 ('q', 222222222222),
 ('j', 2222222222220),
 ('z', 2222222222221),
 ('@', 2222222222222)]

In [20]:
flatland_freq.most_common()[0]

('e', 18666)

In [21]:
sum2 = 0
m=0
for i,j in huffman(flatland_freq):
    sum2 += len(str(j)) * flatland_freq.most_common()[0][1]
    m+=1
sum2

AttributeError: 'NoneType' object has no attribute 'char'

In [22]:
create_parent_node(huffman(char_freq)).right.char

NameError: name 'char_freq' is not defined

In [119]:
char_freq = Counter({'a': 45, 'b': 10, 'c': 18, 'd': 48, 'e': 22, 'f': 33})
huffman(char_freq)


[(1, 0),
 ('d', 1),
 ('a', 2),
 ('f', 20),
 (0, 21),
 ('e', 22),
 ('c', 220),
 ('b', 221),
 ('@', 222)]

In [61]:
def traverse_tree_helper(node_lst):
    k = [node_lst[i] for i in range(0,len(node_lst))]


    dict_lst = []
    encoding_count = '0'
    for i in k[::-1]:
        #print(f'Left = {i.left.char}, Middle = {i.middle.char}, Right = {i.right.char}')
        if i.left.char =='' and i.right.char == '' and i.middle.char == '':
            pass
        elif i.left.char == '':
            encoding_count = str(int(encoding_count)+1)
            if i.middle.char == '':
                encoding_count = str(int(encoding_count)+1)
                dict_lst.append((i.right.char, int(encoding_count)))
            elif i.right.char == '':
                dict_lst.append((i.middle.char, int(encoding_count)))
            else:
                dict_lst.append((i.middle.char, int(encoding_count)))
                encoding_count = str(int(encoding_count)+1)
                dict_lst.append((i.right.char, int(encoding_count)))
                
        elif i.middle.char == '':
            if i.left.char == '':
                encoding_count = str(int(encoding_count)+1)
                encoding_count = str(int(encoding_count)+1)
                dict_lst.append((i.right.char, int(encoding_count)))
            elif i.right.char == '':
                dict_lst.append((i.left.char, int(encoding_count)))
            else:
                dict_lst.append((i.left.char, int(encoding_count)))
                encoding_count = str(int(encoding_count)+1)
                encoding_count = str(int(encoding_count)+1)
                dict_lst.append((i.right.char, int(encoding_count)))
        
        elif i.right.char == '':
            if i.left.char == '':
                encoding_count = str(int(encoding_count)+1)
                dict_lst.append((i.middle.char, int(encoding_count)))
            elif i.middle.char == '':
                dict_lst.append((i.left.char, int(encoding_count)))
            else:
                dict_lst.append((i.left.char, int(encoding_count)))
                encoding_count = str(int(encoding_count)+1)
                dict_lst.append((i.middle.char, int(encoding_count)))
        
        else:
            dict_lst.append((i.left.char, int(encoding_count)))
            encoding_count = str(int(encoding_count)+1)

            dict_lst.append((i.middle.char, int(encoding_count)))
            encoding_count = str(int(encoding_count)+1)
            dict_lst.append((i.right.char, int(encoding_count)))

        
        encoding_count += '0'
    
    return dict_lst



In [386]:
def traverse_tree_helper(parent_node, encoding_count = '0'):
    dict_freq = []
    curr_parent = parent_node
    while curr_parent.left != None and curr_parent.middle != None and curr_parent.right != None:
        #Reset each pass so a parent with three leaf children ends the loop
        next_node = None
        #If parent left node is leaf update list to include char and encoding
        if curr_parent.left.left == None and curr_parent.left.middle == None and curr_parent.left.right == None:
            dict_freq.append((curr_parent.left.char, int(encoding_count)))
            encoding_count = str(int(encoding_count)+1)
        else:
            next_node = 'left'

        #If parent middle node is leaf update list to include char and encoding
        if curr_parent.middle.left == None and curr_parent.middle.middle == None and curr_parent.middle.right == None:
            dict_freq.append((curr_parent.middle.char, int(encoding_count)))
            encoding_count = str(int(encoding_count)+1)
        else:
            next_node = 'mid'
    

        #If parent right node is leaf update list to include char and encoding
        if curr_parent.right.left == None and curr_parent.right.middle == None and curr_parent.right.right == None:
            dict_freq.append((curr_parent.right.char, int(encoding_count)))
            encoding_count = str(int(encoding_count)+1)
        else:
            next_node = 'right'
        
        encoding_count += '0'
        
        if next_node == 'left':
            curr_parent = curr_parent.left
    
        elif next_node == 'mid':
            curr_parent = curr_parent.middle
        elif next_node == 'right':
            curr_parent = curr_parent.right
        else:
            break
    
    return_dict = {}
    for i in dict_freq[:-1]:
        return_dict[i[0]] = i[1]   

    
    return return_dict

        
    


In [354]:
traverse_tree_helper(huffman(char_freq)[0][0])

{'a': 0, 'd': 1, 'e': 20, 'f': 22, 'c': 220, 'b': 221}

In [71]:
def huffman(char_freq):
    #Test usage
    #new_heap = []

    if len(char_freq) % 2 == 0:
        sorted_vals = char_freq.most_common() + [('@',0)]
    else:
        sorted_vals = char_freq.most_common()

    ## Create list of frequencies
    Q = [n[1] for n in sorted_vals]

    ##Create list of characters
    C = [n[0] for n in sorted_vals]

    #First three leaf nodes: the three least frequent characters
    new_node = HuffNode(C[-1], Q[-1])
    new_node2 = HuffNode(C[-2], Q[-2])
    new_node3 = HuffNode(C[-3], Q[-3])

    C.pop(-1)
    C.pop(-1)
    C.pop(-1)

    Q.pop(-1)
    Q.pop(-1)
    Q.pop(-1)
    #Parent node for three least frequent characters
    new_parent = HuffNode(new_node.char +new_node2.char + new_node3.char, new_node.freq + new_node2.freq + new_node3.freq)

    #Update parent children
    new_parent.left = new_node3
    new_parent.middle = new_node2
    new_parent.right = new_node

    #Update leaf node parent
    new_node.parent = new_node2.parent = new_node3.parent = new_parent

    #Initialize variable equal to the new parent node
    curr_parent = new_parent


    C.append(new_parent.char)
    Q.append(new_parent.freq)
    #C.sort(reverse=True)
    Q.sort(reverse = True)
    
    #Test usage
    #new_heap.append((new_node.char, new_node2.char, new_node3.char, new_parent.char))

    for i in range(len(C)-4,0,-1):
        #Create new nodes for the next two leaf nodes
        leaf_node1 = HuffNode(C[i], Q[i])
        leaf_node2 = HuffNode(C[i-1], Q[i-1])

        C.pop(-1)

        #Create new parent node for the next 2 least frequent leaf nodes and the previous parent node
        new_parent1 = HuffNode(leaf_node1.char + leaf_node2.char + curr_parent.char, leaf_node1.freq + leaf_node2.freq + curr_parent.freq)

        #Update parent children 
        new_parent1.left = leaf_node2
        new_parent1.middle = leaf_node1
        new_parent1.right = curr_parent

        #Update parent for children 
        leaf_node1.parent = leaf_node2.parent = curr_parent.parent = new_parent1

        #Set current parent for next iteration
        curr_parent = new_parent1

        C.append(new_parent1.char)
        Q.append(new_parent1.freq)
        #C.sort(reverse=True)
        Q.sort(reverse = True)
        #Test usage
        # new_heap.append((leaf_node1.char, leaf_node2.char, new_parent1.char, curr_parent.char))

    #Initialize encoding value
    encoding_count = '0'

    #list to store char and encoding
    dict_freq = []

    #loop to traverse tree
    while curr_parent.left != None and curr_parent.middle != None and curr_parent.right.right != None:

        #If parent left node is leaf update list to include char and encoding
        if curr_parent.left.left == None and curr_parent.left.middle == None and curr_parent.left.right == None:
            dict_freq.append((curr_parent.left.char, int(encoding_count)))

        #After left node, middle node encoding +1
        encoding_count = str(int(encoding_count)+1)

        #If parent middle node is leaf update list to include char and encoding
        if curr_parent.middle.left == None and curr_parent.middle.middle == None and curr_parent.middle.right == None:
            dict_freq.append((curr_parent.middle.char, int(encoding_count)))

        #After middle node, right node encoding +1
        encoding_count = str(int(encoding_count)+1)

        #If parent right node is leaf update list to include char and encoding
        if curr_parent.right.left == None and curr_parent.right.middle == None and curr_parent.right.right == None:
            dict_freq.append((curr_parent.right.char, int(encoding_count)))
    
        #Format of tree means right node always next parent
        curr_parent = curr_parent.right

        #Each level of the tree adds one ternary digit: '0', '1', '2', '20',...
        encoding_count += '0'

    #Hard-coded fix: the loop exits before the last parent node, so handle it here
    dict_freq.append((curr_parent.left.char, int(encoding_count)))
    encoding_count = str(int(encoding_count)+1)
    dict_freq.append((curr_parent.middle.char, int(encoding_count)))
    encoding_count = str(int(encoding_count)+1)
    
    #Dictionary to return
    return_dict = {}

    #Loop to create keys of char and values of encodings
    for i in dict_freq:
        return_dict[i[0]] = i[1]

    #Return dictionary
    return return_dict

In [178]:
def huffman(char_freq):
    #Test usage
    #new_heap = []

    if len(char_freq) % 2 == 0:
        sorted_vals = char_freq.most_common() + [('@',0)]
    else:
        sorted_vals = char_freq.most_common()

    ## Create list of frequencies
    Q = [n[1] for n in sorted_vals]

    ##Create list of characters
    C = [n[0] for n in sorted_vals]

    #First three leaf nodes: the three least frequent characters
    new_node = HuffNode(C[-1], Q[-1])
    new_node2 = HuffNode(C[-2], Q[-2])
    new_node3 = HuffNode(C[-3], Q[-3])

    C.pop(-1)
    C.pop(-1)
    C.pop(-1)

    Q.pop(-1)
    Q.pop(-1)
    Q.pop(-1)
    #Parent node for three least frequent characters
    new_parent = HuffNode('', new_node.freq + new_node2.freq + new_node3.freq)

    #Update parent children
    new_parent.left = new_node3
    new_parent.middle = new_node2
    new_parent.right = new_node

    #Update leaf node parent
    new_node.parent = new_node2.parent = new_node3.parent = new_parent

    #Initialize variable equal to the new parent node
    curr_parent = new_parent


    Q.append(new_parent.freq)
    Q.sort(reverse = True)

    i=0
    while Q[i] != new_parent.freq:
        i+=1
    C.insert(i, new_parent)
    
    #Test usage
    #new_heap.append((new_node.char, new_node2.char, new_node3.char, new_parent.char))

    while len(C) != 1:
        #Create new nodes for the next two leaf nodes
        node_lst = []
        for i in range(-1,-4,-1):
            if type(C[i]) != str:
                node_lst.insert(-1,C[i])
            else:
                node_lst.append(HuffNode(C[i],Q[i]))
        if type(C[-1]) != str:
            leaf_node1 = C[-1]
            leaf_node2 = HuffNode(C[-2], Q[-2])
            leaf_node3 = HuffNode(C[-3], Q[-3])
        elif type(C[-2]) != str:
            leaf_node1 = C[-2]
            leaf_node2 = HuffNode(C[-1], Q[-1])
            leaf_node3 = HuffNode(C[-3], Q[-3])
        elif type(C[-3]) != str:
            leaf_node1 = C[-3]
            leaf_node2 = HuffNode(C[-2], Q[-2])
            leaf_node3 = HuffNode(C[-1], Q[-1])
        else:
            leaf_node1 = HuffNode(C[-1], Q[-1])
            leaf_node2 = HuffNode(C[-2], Q[-2])
            leaf_node3 = HuffNode(C[-3], Q[-3])

        C.pop(-1)
        C.pop(-1)
        C.pop(-1)
        
        Q.pop(-1)
        Q.pop(-1)
        Q.pop(-1)

        #Create new parent node for the three least frequent entries
        new_parent1 = HuffNode('', leaf_node1.freq + leaf_node2.freq + leaf_node3.freq)

        #Update parent children 
        new_parent1.right = leaf_node1
        new_parent1.left = leaf_node3
        new_parent1.middle = leaf_node2

        #Update parent for children 
        leaf_node1.parent = leaf_node2.parent = leaf_node3.parent = new_parent1

        #Set current parent for next iteration
        curr_parent = new_parent1

        
        Q.append(new_parent1.freq)
        #C.sort(reverse=True)
        Q.sort(reverse = True)

        z=0
        while Q[z] != new_parent1.freq:
            z+=1
        C.insert(z, new_parent1)
        #Test usage
        # new_heap.append((leaf_node1.char, leaf_node2.char, new_parent1.char, curr_parent.char))

    #Initialize encoding value
    encoding_count = '0'

    #list to store char and encoding
    dict_freq = []

    #loop to traverse tree
    while curr_parent.left != None and curr_parent.middle != None and curr_parent.right.right != None:

        #If parent left node is leaf update list to include char and encoding
        if curr_parent.left.left == None and curr_parent.left.middle == None and curr_parent.left.right == None:
            dict_freq.append((curr_parent.left.char, int(encoding_count)))

        #After left node, middle node encoding +1
        encoding_count = str(int(encoding_count)+1)

        #If parent middle node is leaf update list to include char and encoding
        if curr_parent.middle.left == None and curr_parent.middle.middle == None and curr_parent.middle.right == None:
            dict_freq.append((curr_parent.middle.char, int(encoding_count)))

        #After middle node, right node encoding +1
        encoding_count = str(int(encoding_count)+1)

        #If parent right node is leaf update list to include char and encoding
        if curr_parent.right.left == None and curr_parent.right.middle == None and curr_parent.right.right == None:
            dict_freq.append((curr_parent.right.char, int(encoding_count)))
    
        #Format of tree means right node always next parent
        curr_parent = curr_parent.right

        #Each level of the tree adds one ternary digit: '0', '1', '2', '20',...
        encoding_count += '0'

    #Hard-coded fix: the loop exits before the last parent node, so handle it here
    dict_freq.append((curr_parent.left.char, int(encoding_count)))
    encoding_count = str(int(encoding_count)+1)
    dict_freq.append((curr_parent.middle.char, int(encoding_count)))
    encoding_count = str(int(encoding_count)+1)
    
    #Dictionary to return
    return_dict = {}

    #Loop to create keys of char and values of encodings
    for i in dict_freq:
        return_dict[i[0]] = i[1]

    #Return dictionary
    return return_dict

In [212]:
# Q = [48,45,33,22,18,10,0]
# C = ['d','a','f','e','c','b','@']
Q = [n[1] for n in flatland_freq.most_common() + [('@', 0)]]
C = [n[0] for n in flatland_freq.most_common() + [('@', 0)]]

new_node = HuffNode(C[-1], Q[-1])
new_node2 = HuffNode(C[-2], Q[-2])
new_node3 = HuffNode(C[-3], Q[-3])

C.pop(-1)
C.pop(-1)
C.pop(-1)

Q.pop(-1)
Q.pop(-1)
Q.pop(-1)

#Parent node for three least frequent characters
new_parent = HuffNode('', new_node.freq + new_node2.freq + new_node3.freq)

#Update parent children
new_parent.left = new_node3
new_parent.middle = new_node2
new_parent.right = new_node

#Update leaf node parent
new_node.parent = new_node2.parent = new_node3.parent = new_parent

#Initialize variable equal to the new parent node
curr_parent = new_parent

Q.append(new_parent.freq)
Q.sort(reverse = True)

i=0
while Q[i] != new_parent.freq:
    i+=1
C.insert(i, new_parent)

if type(C[-1]) != str:
    leaf_node1 = C[-1]
    leaf_node2 = HuffNode(C[-2], Q[-2])
    leaf_node3 = HuffNode(C[-3], Q[-3])
elif type(C[-2]) != str:
    leaf_node1 = C[-2]
    leaf_node2 = HuffNode(C[-1], Q[-1])
    leaf_node3 = HuffNode(C[-3], Q[-3])
elif type(C[-3]) != str:
    leaf_node1 = C[-3]
    leaf_node2 = HuffNode(C[-2], Q[-2])
    leaf_node3 = HuffNode(C[-1], Q[-1])
else:
    leaf_node1 = HuffNode(C[-1], Q[-1])
    leaf_node2 = HuffNode(C[-2], Q[-2])
    leaf_node3 = HuffNode(C[-3], Q[-3])

C.pop(-1)
C.pop(-1)
C.pop(-1)

Q.pop(-1)
Q.pop(-1)
Q.pop(-1)

#Create new parent node for the three least frequent entries
new_parent1 = HuffNode('', leaf_node1.freq + leaf_node2.freq + leaf_node3.freq)

#Update parent children 
new_parent1.right = leaf_node1
new_parent1.left = leaf_node3
new_parent1.middle = leaf_node2

#Update parent for children 
leaf_node1.parent = leaf_node2.parent = leaf_node3.parent = new_parent1

#Set current parent for next iteration
curr_parent = new_parent1


Q.append(new_parent1.freq)
#C.sort(reverse=True)
Q.sort(reverse = True)

z=0
while Q[z] != new_parent1.freq:
    z+=1
C.insert(z, new_parent1)
        #Test usage
        
if type(C[-1]) != str:
    leaf_node1 = C[-1]
    leaf_node2 = HuffNode(C[-2], Q[-2])
    leaf_node3 = HuffNode(C[-3], Q[-3])
elif type(C[-2]) != str:
    leaf_node1 = C[-2]
    leaf_node2 = HuffNode(C[-1], Q[-1])
    leaf_node3 = HuffNode(C[-3], Q[-3])
elif type(C[-3]) != str:
    leaf_node1 = C[-3]
    leaf_node2 = HuffNode(C[-2], Q[-2])
    leaf_node3 = HuffNode(C[-1], Q[-1])
else:
    leaf_node1 = HuffNode(C[-1], Q[-1])
    leaf_node2 = HuffNode(C[-2], Q[-2])
    leaf_node3 = HuffNode(C[-3], Q[-3])

C.pop(-1)
C.pop(-1)
C.pop(-1)

Q.pop(-1)
Q.pop(-1)
Q.pop(-1)

#Create new parent node for the three least frequent entries
new_parent1 = HuffNode('', leaf_node1.freq + leaf_node2.freq + leaf_node3.freq)

#Update parent children 
new_parent1.right = leaf_node1
new_parent1.left = leaf_node3
new_parent1.middle = leaf_node2

#Update parent for children 
leaf_node1.parent = leaf_node2.parent = leaf_node3.parent = new_parent1

#Set current parent for next iteration
curr_parent = new_parent1


Q.append(new_parent1.freq)
#C.sort(reverse=True)
Q.sort(reverse = True)

z=0
while Q[z] != new_parent1.freq:
    z+=1
C.insert(z, new_parent1)
        #Test usage
Q,C


([18666,
  13582,
  11790,
  11460,
  11154,
  10814,
  9785,
  8706,
  7955,
  6402,
  5686,
  4488,
  4438,
  3986,
  3694,
  3125,
  2810,
  2702,
  2665,
  2588,
  2148],
 ['e',
  't',
  'o',
  'a',
  'i',
  'n',
  's',
  'r',
  'h',
  'l',
  'd',
  'u',
  'c',
  'm',
  'f',
  'y',
  'g',
  <__main__.HuffNode at 0x1142a1390>,
  'w',
  'p',
  'b'])

In [217]:
if type(C[-1]) != str:
    leaf_node1 = C[-1]
    leaf_node2 = HuffNode(C[-2], Q[-2])
    leaf_node3 = HuffNode(C[-3], Q[-3])
elif type(C[-2]) != str:
    leaf_node1 = C[-2]
    leaf_node2 = HuffNode(C[-1], Q[-1])
    leaf_node3 = HuffNode(C[-3], Q[-3])
elif type(C[-3]) != str:
    leaf_node1 = C[-3]
    leaf_node2 = HuffNode(C[-2], Q[-2])
    leaf_node3 = HuffNode(C[-1], Q[-1])
else:
    leaf_node1 = HuffNode(C[-1], Q[-1])
    leaf_node2 = HuffNode(C[-2], Q[-2])
    leaf_node3 = HuffNode(C[-3], Q[-3])

C.pop(-1)
C.pop(-1)
C.pop(-1)

Q.pop(-1)
Q.pop(-1)
Q.pop(-1)

#Create new parent node for the three least frequent entries
new_parent1 = HuffNode('', leaf_node1.freq + leaf_node2.freq + leaf_node3.freq)

#Update parent children 
new_parent1.right = leaf_node1
new_parent1.left = leaf_node3
new_parent1.middle = leaf_node2

#Update parent for children 
leaf_node1.parent = leaf_node2.parent = leaf_node3.parent = new_parent1

#Set current parent for next iteration
curr_parent = new_parent1


Q.append(new_parent1.freq)
#C.sort(reverse=True)
Q.sort(reverse = True)

z=0
while Q[z] != new_parent1.freq:
    z+=1
C.insert(z, new_parent1)
        #Test usage
Q,C

([18666,
  16576,
  13582,
  12118,
  11790,
  11460,
  11154,
  10814,
  9785,
  8706,
  8637,
  7955,
  7401],
 ['e',
  <__main__.HuffNode at 0x113fa5210>,
  't',
  <__main__.HuffNode at 0x113fa6e50>,
  'o',
  'a',
  'i',
  'n',
  's',
  'r',
  <__main__.HuffNode at 0x113fa7090>,
  'h',
  <__main__.HuffNode at 0x113f51690>])

In [196]:
dict_freq = []
encoding_count = '0'

In [199]:

#If parent left node is leaf update list to include char and encoding
if curr_parent.left.left == None and curr_parent.left.middle == None and curr_parent.left.right == None:
    dict_freq.append((curr_parent.left.char, int(encoding_count)))

#After left node, middle node encoding +1
encoding_count = str(int(encoding_count)+1)

#If parent middle node is leaf update list to include char and encoding
if curr_parent.middle.left == None and curr_parent.middle.middle == None and curr_parent.middle.right == None:
    dict_freq.append((curr_parent.middle.char, int(encoding_count)))

#After middle node, right node encoding +1
encoding_count = str(int(encoding_count)+1)

#If parent right node is leaf update list to include char and encoding
if curr_parent.right.left == None and curr_parent.right.middle == None and curr_parent.right.right == None:
    dict_freq.append((curr_parent.right.char, int(encoding_count)))

#Format of tree means right node always next parent
curr_parent = curr_parent.right

#Each level of the tree adds one ternary digit: '0', '1', '2', '20',...
encoding_count += '0'

dict_freq, encoding_count, curr_parent.char
# #Last parent node not looping hard fix
# dict_freq.append((curr_parent.left.char, int(encoding_count)))
# encoding_count = str(int(encoding_count)+1)
# dict_freq.append((curr_parent.middle.char, int(encoding_count)))
# encoding_count = str(int(encoding_count)+1)

# #Dictionary to return
# return_dict = {}

# #loop to create keys of char and values of frequencies
# for i in dict_freq:
#     return_dict[i[0]] = i[1]

([('a', 0),
  ('d', 1),
  ('f', 20),
  ('e', 21),
  ('c', 220),
  ('b', 221),
  ('@', 222)],
 '2220',
 '@')

In [219]:
char_freq = Counter({'a': 45, 'b': 10, 'c': 18, 'd': 48, 'e': 22, 'f': 33})
huffman(char_freq)


{'a': 0, 'd': 1, 'f': 20, 'e': 21, 'c': 220, 'b': 221}

In [104]:
flatland_freq

Counter({'e': 18666,
         't': 13582,
         'o': 11790,
         'a': 11460,
         'i': 11154,
         'n': 10814,
         's': 9785,
         'r': 8706,
         'h': 7955,
         'l': 6402,
         'd': 5686,
         'u': 4488,
         'c': 4438,
         'm': 3986,
         'f': 3694,
         'y': 3125,
         'g': 2810,
         'w': 2665,
         'p': 2588,
         'b': 2148,
         'v': 1418,
         'k': 502,
         'x': 379,
         'q': 193,
         'j': 125,
         'z': 85})

In [218]:
huffman(flatland_freq)

{<__main__.HuffNode at 0x113d5c890>: 0,
 <__main__.HuffNode at 0x113d5da50>: 1,
 'o': 20,
 'a': 21}

### Flatland Encodings
Call `huffman(flatland_freq)` and store the result in `flatland_huffman_codes`.

In [31]:
dict1 = huffman(flatland_freq)
sum2 = 0
for i in range(len(dict1)):
    sum2 += len(str(dict1[C[i]])) * Q[i]
sum2
    

592314

In [23]:
len(str(dict1[C[-2]]))

13

In [153]:
dict1[C[0]]

0

In [170]:
flatland_huffman_codes = huffman(flatland_freq)
flatland_huffman_codes

{'e': 0,
 't': 1,
 'o': 20,
 'a': 21,
 'i': 220,
 'n': 221,
 's': 2220,
 'r': 2221,
 'h': 22220,
 'l': 22221,
 'd': 222220,
 'u': 222221,
 'c': 2222220,
 'm': 2222221,
 'f': 22222220,
 'y': 22222221,
 'g': 222222220,
 'w': 222222221,
 'p': 2222222220,
 'b': 2222222221,
 'v': 22222222220,
 'k': 22222222221,
 'x': 222222222220,
 'q': 222222222221,
 'j': 2222222222220,
 'z': 2222222222221}

**Calculate the number of ternary digits needed** to encode the letters in *Flatland* (converted to lower case) using `flatland_huffman_codes`. Store the result in `flatland_digit_ct_huffman`.

In [116]:
flatland_digit_ct_huffman = 592314
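
The digit count for a variable-length code is the sum, over all characters, of frequency times code length. The sketch below applies that formula to the six-letter example from the Huffman Code section, using the string codes given there; `encoded_digit_count` is a hypothetical helper name. Because a Huffman code is optimal among prefix codes, its total should not exceed the fixed-length total for the same text, which makes a convenient sanity check.

```python
from collections import Counter

def encoded_digit_count(char_freq, codes):
    """Total digits = sum over characters of frequency * code length."""
    return sum(freq * len(codes[ch]) for ch, freq in char_freq.items())

example_freq = Counter({'a': 45, 'b': 10, 'c': 18, 'd': 48, 'e': 22, 'f': 33})
example_codes = {'b': '211', 'f': '22', 'a': '0', 'd': '1', 'c': '212', 'e': '20'}

huffman_total = encoded_digit_count(example_freq, example_codes)

# Six letters need two ternary digits each in a fixed-length code
fixed_total = 2 * sum(example_freq.values())

print(huffman_total, fixed_total)
print(huffman_total <= fixed_total)  # sanity check on the Huffman result
```

The codes must be strings for `len()` to count every digit, including leading zeros.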