## Problem 1: Least Recently Used (LRU) Cache

We have briefly discussed caching as part of a practice problem while studying hash maps.

The lookup operation (i.e., `get()`) and `put()` / `set()` is supposed to be fast for a cache memory.

While doing the `get()` operation, if the entry is found in the cache, it is known as a `cache hit`. If, however, the entry is not found, it is known as a `cache miss`.

When designing a cache, we also place an upper bound on the size of the cache. If the cache is full and we want to add a new entry to the cache, we use some criteria to remove an element. After removing an element, we use the `put()` operation to insert the new element. The remove operation should also be fast.

For our first problem, the goal will be to design a data structure known as a **Least Recently Used (LRU) cache**. An LRU cache is a type of cache in which we remove the least recently used entry when the cache memory reaches its limit. For the current problem, consider both `get` and `set` operations as an `use operation`.

Your job is to use an appropriate data structure(s) to implement the cache.

- In case of a `cache hit`, your `get()` operation should return the appropriate value.
- In case of a `cache miss`, your `get()` should return -1.
- While putting an element in the cache, your `put()` / `set()` operation must insert the element. If the cache is full, you must write code that removes the least recently used entry first and then insert the element.

All operations must take `O(1)` time.

For the current problem, you can consider the `size of cache = 5`.

Here is some boiler plate code and some example test cases to get you started on this problem:

In [17]:
from typing import Tuple
import itertools
import warnings


class Node:
    _id_generator = itertools.count()

    def __init__(self, prev: 'Node', next: 'Node', key: int, value: int):
        self.id: int = self._id_generator.__next__()
        self.prev = prev
        self.next = next
        self.key = key
        self.value = value

    def __repr__(self):
        return f'Node (id: {self.id}, prev: {self.prev.id if type(self.prev) is type(self) else None}, next: {self.next.id if type(self.next) is type(self) else None}, {{{self.key}: {self.value}}})'

    def __eq__(self, other):
        return (
            self.prev is other.prev and
            self.next is other.next and
            self.key == other.key and
            self.value == other.value
        )

class DoublyLinkedCircularList:

    def __init__(self):
        self.size: int = 0
        self.root: Node = Node(None, None, None, None)
        self.root.prev = self.root
        self.root.next = self.root

    def __repr__(self):
        s = '['
        link = self.root
        while True:
            s += f'{link}'
            link = link.next
            if link is self.root:
                break
            else:
                s += ',\n'
        s += ']'
        return s

    def __len__(self):
        return self.size

    def add_item(self, key: int, value: int) -> Node:
        if self.size == 0:
            self.root.key = key
            self.root.value = value
            link = self.root
        else:
            root = self.root
            last = root.prev
            link = Node(last, root, key, value)
            last.next = root.prev = link
        self.size += 1

        return link

    def replace_oldest_node(self, key: int, value: int) -> Tuple[Node, Node]:
        old_link, last, self.root = self.root, self.root.prev, self.root.next
        new_link = Node(last, self.root, key, value)
        last.next, self.root.prev = new_link, new_link
        return old_link, new_link

    def update_node(self, link: Node):
        # in a circular linked list the last item precedes root
        last = self.root.prev
        # no need to update
        if link is last:
            return

        # edge case -> new root node
        if link is self.root:
            self.root = self.root.next

        # connect old neighbours of the link (bridge gap)
        # necessary if updated link is NOT root
        link.prev.next = link.next
        link.next.prev = link.prev
        
        # put updated node at the end of the list
        # update new neighbours to point to link (insert link)
        last.next = link
        self.root.prev = link

        # update link to point to new neighbours
        link.next = self.root
        link.prev = last
    

class LRU_Cache:

    def _cache_capacity_warning(self):
        warnings.warn('Cache capacity is 0. No items will be stored.', UserWarning)

    def __init__(self, capacity: int):
        self.cache = {}
        self.list = DoublyLinkedCircularList()
        self.cap = capacity

        if self.cap == 0:
            self._cache_capacity_warning()

    def get(self, key: int) -> int:
        # Retrieve item from provided key. Return -1 if nonexistent.
        if self.cap == 0:
            self._cache_capacity_warning() 
        result = self.cache.get(key, None)
        if result is not None:
            self.list.update_node(result)
            return result.value
        return -1

    def set(self, key: int, value: int):
        # Set the value if the key is not present in the cache. If the cache is at capacity remove the oldest item. 
        if self.cap == 0:
            self._cache_capacity_warning()
            return

        if key in self.cache:
            link = self.cache.get(key)
            self.list.update_node(link)
            link.value = value
        elif self.cache_full:
            old_link, new_link = self.list.replace_oldest_node(key, value)
            del self.cache[old_link.key]
            del old_link
            self.cache[key] = new_link
        else:
            link = self.list.add_item(key, value)
            self.cache[key] = link

    @property
    def cache_full(self):
        return self.list.size == self.cap


In [18]:
import unittest


class TestNode(unittest.TestCase):

    def setUp(self):
        self.key = 1
        self.value = 2

        self.node1 = Node(None, None, None, None)
        self.node2 = Node(self.node1, None, None, None)
        self.node3 = Node(self.node2, self.node1, self.key, self.value)

    def test_create_empty_node(self):
        # require arguments for Node()
        self.assertRaises(TypeError, Node)
        
    def test_node_link(self):
        self.node1.prev = self.node3
        self.node1.next = self.node2
        self.node2.next = self.node3

        self.assertIs(self.node2.next, self.node3)
        self.assertIs(self.node1.next, self.node2)
        self.assertIs(self.node3.next, self.node1)

        self.assertIs(self.node2.prev, self.node1)
        self.assertIs(self.node1.prev, self.node3)
        self.assertIs(self.node3.prev, self.node2)

    def test_node_key_value(self):
        self.assertEqual(self.node3.key, self.key)
        self.assertEqual(self.node3.value, self.value)

    def test_node_repr(self):
        self.assertEqual(repr(self.node1), f"Node (id: {self.node1.id}, prev: None, next: None, {{None: None}})")
        self.assertEqual(repr(self.node2), f"Node (id: {self.node2.id}, prev: {self.node1.id}, next: None, {{None: None}})")
        self.assertEqual(repr(self.node3), f"Node (id: {self.node3.id}, prev: {self.node2.id}, next: {self.node1.id}, {{{self.key}: {self.value}}})")


class TestCircularDoublyLinkedList(unittest.TestCase):

    def setUp(self):
        self.list = DoublyLinkedCircularList()
        self.list.add_item(1, 1)
        self.list.add_item(2, 2)
        self.list.add_item(3, 3)
    
    def test_init(self):
        list = DoublyLinkedCircularList()
        self.assertIs(list.root, list.root.prev)
        self.assertIs(list.root, list.root.next)
        self.assertEqual(list.size, 0)

    def test_add_item(self):
        self.assertEqual(self.list.size, 3)
        node = self.list.add_item(4, 4)
        self.assertIs(type(node), Node)
        self.assertEqual(self.list.size, 4)

    def test_replace_oldest_node(self):
        size = self.list.size

        oldest = self.list.root
        old, new = self.list.replace_oldest_node(4, 4)
        newest = self.list.root.prev
        # check that the removed node was really the oldest
        self.assertIs(old, oldest)
        # check that the new node is really at the "end" of the list
        self.assertIs(new, newest)

        # check that the old node was removed from the list
        node = self.list.root
        while True:
            self.assertIsNot(old, node)
            node = node.next
            if node is self.list.root:
                break
        
        # size should not have changed
        self.assertEqual(self.list.size, size)

    def test_update_node(self):
        """Test node update procedure."""
        root = self.list.root
        last = root.prev
        node = root.next

        node_prev = node.prev
        node_next = node.next

        self.list.update_node(node)

        # Updated node moved to end of list
        self.assertIs(node, root.prev)
        self.assertIs(node, last.next)
        # References of node were updated correctly
        self.assertIs(node.prev, last)
        self.assertIs(node.next, root)
        # Old neighbours were updated correctly
        # Node was inserted at the end, leaving a gap that had to be filled
        self.assertIs(node_prev.next, node_next)
        self.assertIs(node_next.prev, node_prev)


    def test_update_root_node(self):
        """Test edge case where the root node gets updated."""
        root = self.list.root
        last = root.prev
        node = root

        node_prev = node.prev
        node_next = node.next

        self.list.update_node(node)

        # Reference the new root
        self.assertIsNot(root, self.list.root)

        # reassign root
        root = self.list.root

        # Updated node moved to end of list
        self.assertIs(node, root.prev)
        self.assertIs(node, last.next)
        # References of node were updated correctly
        self.assertIs(node.prev, last)
        self.assertIs(node.next, root)
        # Old neighbours were updated correctly
        # Order did not change, all nodes just got rotated
        self.assertIs(node_prev.next, node)
        self.assertIs(node_next.prev, node)

        self.assertIsNot(node, root)

    def test_update_last_node(self):
        """Test the edge case where the last node gets updated."""
        root = self.list.root
        last = root.prev
        node = last

        node_prev = node.prev
        node_next = node.next

        self.list.update_node(node)

        # Updated node did not move
        self.assertIs(node, node_prev.next)
        self.assertIs(node, node_next.prev)
        self.assertIs(node, self.list.root.prev)
        self.assertIs(node.next, self.list.root)
        # References of node were not changed
        self.assertIs(node.prev, node_prev)
        self.assertIs(node.next, node_next)


class TestLRU_Cache(unittest.TestCase):

    def setUp(self):
        self.c = LRU_Cache(5)
        self.c.set(1, 1)
        self.c.set(2, 2)
        self.c.set(3, 3)
        self.c.set(4, 4)
        self.c.set(5, 5)
    
    def test_zero_capacity(self):
        """Test edge case with zero cache capacity.
        
        No erros, only warnings should occur.
        """
        c = LRU_Cache(0)
        c.set(1, 1)

        self.assertWarns(UserWarning, LRU_Cache, 0)
        self.assertWarns(UserWarning, c.get, 1)
        self.assertWarns(UserWarning, c.set, 1, 1)
        self.assertEqual(c.get(1), -1)

    def test_get(self):
        self.assertEqual(self.c.get(1), 1)
        self.assertEqual(self.c.get(2), 2)
        self.assertEqual(self.c.get(3), 3)
        self.assertEqual(self.c.get(4), 4)
        self.assertEqual(self.c.get(5), 5)

    def test_get_changes_lru_order(self):
        self.c.get(1)
        self.c.get(2)
        order = [3, 4, 5, 1, 2]
        lru = []
        node = self.c.list.root
        for i in range(5):
            lru.append(node.value)
            node = node.next
        self.assertEqual(lru, order)

    def test_lru_remove_least_used(self):
        self.assertEqual(self.c.cache.get(1).value, 1)
        self.c.set(6, 6)
        self.assertEqual(self.c.cache.get(1), None)
        self.assertEqual(self.c.get(1), -1)

    def test_set(self):
        self.c.set(7, 7)
        self.assertEqual(self.c.get(7), 7)
        self.c.set(7, 8)
        self.assertEqual(self.c.get(7), 8)

    def test_set_existing_key_use_operation(self):
        self.c.set(1, 1)
        last_used_key = self.c.list.root.prev.key
        self.assertEqual(last_used_key, 1)
    
    def test_get_existing_key_use_operation(self):
        self.c.get(1)
        last_used_key = self.c.list.root.prev.key
        self.assertEqual(last_used_key, 1)


# if __name__ == '__main__':
#     unittest.main()

# Run unittest in Jupyterlab
unittest.main(argv=[''], verbosity=2, exit=False)

test_add_item (__main__.TestCircularDoublyLinkedList) ... ok
test_init (__main__.TestCircularDoublyLinkedList) ... ok
test_replace_oldest_node (__main__.TestCircularDoublyLinkedList) ... ok
test_update_last_node (__main__.TestCircularDoublyLinkedList)
Test the edge case where the last node gets updated. ... ok
test_update_node (__main__.TestCircularDoublyLinkedList)
Test node update procedure. ... ok
test_update_root_node (__main__.TestCircularDoublyLinkedList)
Test edge case where the root node gets updated. ... ok
test_get (__main__.TestLRU_Cache) ... ok
test_get_changes_lru_order (__main__.TestLRU_Cache) ... ok
test_get_existing_key_use_operation (__main__.TestLRU_Cache) ... ok
test_lru_remove_least_used (__main__.TestLRU_Cache) ... ok
test_set (__main__.TestLRU_Cache) ... ok
test_set_existing_key_use_operation (__main__.TestLRU_Cache) ... ok
test_zero_capacity (__main__.TestLRU_Cache)
ok
test_create_empty_node (__main__.TestNode) ... ok
test_node_key_value (__main__.TestNode) ... o

<unittest.main.TestProgram at 0x7f1ec4470e50>

In [157]:
our_cache = LRU_Cache(5)

our_cache.set(1, 1)
our_cache.set(2, 2)
our_cache.set(3, 3)
our_cache.set(4, 4)
print('Initial Cache:', our_cache.list, sep='\n')

print(our_cache.get(1))       # returns 1
print(our_cache.get(2))       # returns 2
print(our_cache.get(9))       # returns -1 because 9 is not present in the cache

our_cache.set(5, 5)
our_cache.set(6, 6)

print(our_cache.get(3))      # returns -1 because the cache reached it's capacity and 3 was the least recently used entry
print(our_cache.get(6))      # returns 6
print(our_cache.get(5))      # returns 5
our_cache.set(6, 6)          # update node 6
print('Final Cache:', our_cache.list, sep='\n')

Initial Cache:
[Node (id: 35, prev: 38, next: 36, {1: 1}),
Node (id: 36, prev: 35, next: 37, {2: 2}),
Node (id: 37, prev: 36, next: 38, {3: 3}),
Node (id: 38, prev: 37, next: 35, {4: 4})]
1
2
-1
-1
6
5
Final Cache:
[Node (id: 38, prev: 40, next: 35, {4: 4}),
Node (id: 35, prev: 38, next: 36, {1: 1}),
Node (id: 36, prev: 35, next: 39, {2: 2}),
Node (id: 39, prev: 36, next: 40, {5: 5}),
Node (id: 40, prev: 39, next: 38, {6: 6})]


## Problem 2: File Recursion

For this problem, the goal is to write code for finding all files under a directory (and all directories beneath it) that end with `.c`

Here is an example of a test directory listing, which can be downloaded [here](https://s3.amazonaws.com/udacity-dsand/testdir.zip):

```shell
./testdir
./testdir/subdir1
./testdir/subdir1/a.c
./testdir/subdir1/a.h
./testdir/subdir2
./testdir/subdir2/.gitkeep
./testdir/subdir3
./testdir/subdir3/subsubdir1
./testdir/subdir3/subsubdir1/b.c
./testdir/subdir3/subsubdir1/b.h
./testdir/subdir4
./testdir/subdir4/.gitkeep
./testdir/subdir5
./testdir/subdir5/a.c
./testdir/subdir5/a.h
./testdir/t1.c
./testdir/t1.h
```

Note: `os.walk()` is a handy Python method which can achieve this task very easily. However, for this problem you are not allowed to use `os.walk()`.


In [66]:
import os


def find_files(suffix, path):
    """
    Find all files beneath path with file name suffix.

    Note that a path may contain further subdirectories
    and those subdirectories may also contain further subdirectories.

    There are no limit to the depth of the subdirectories can be.

    Args:
      suffix(str): suffix of the file name to be found
        valid forms are e.g. '.doc' and 'doc'
      path(str): path of the file system

    Returns:
       a list of paths
    """
    matched_files = []

    match suffix:
        case "":
            return "ValueError: suffix can't be an empty string!"
        case None:
            return "ValueError: suffix can't be None!"
        case ".":
            return "ValueError: '.' is not a valid suffix!"
        case str(p) if p.find(".") > 0:
            return f"ValueError: '{suffix}' is not a valid suffix!"

    match path:
        case "":
            return "ValueError: path can't be an empty string!"
        case None:
            return "ValueError: path can't be  None!"
        case str(p) if not os.path.isdir(p):
            return "ValueError: path must be a directory!"

    # remove '.' at the beginning
    suffix = suffix[1:] if suffix[0] == '.' else suffix

    for child in os.listdir(path):
        child_path = os.path.join(path, child)

        if os.path.isfile(child_path):
            
            if child_path.split('.')[-1] == suffix:
                matched_files.append(child_path)

        if os.path.isdir(child_path):
            matched_files.extend(find_files(suffix, child_path))

    return matched_files


print(find_files('.c', './'))
# ['./testdir/subdir1/a.c', './testdir/t1.c', './testdir/subdir5/a.c', './testdir/subdir3/subsubdir1/b.c']
print(find_files('.', './'))
# ValueError: '.' is not a valid suffix!
print(find_files('.c', './dir-does-not-exist'))
# ValueError: path must be a directory!
print(find_files('.gitkeep', './'))
# ['./testdir/subdir2/.gitkeep', './testdir/subdir4/.gitkeep']
print(find_files('keep', './'))
# []
print(find_files('a.c', './'))
# ValueError: 'a.c' is not a valid suffix!


['./testdir/subdir1/a.c', './testdir/t1.c', './testdir/subdir5/a.c', './testdir/subdir3/subsubdir1/b.c']
ValueError: '.' is not a valid suffix!
ValueError: path must be a directory!
['./testdir/subdir2/.gitkeep', './testdir/subdir4/.gitkeep']
[]
ValueError: 'a.c' is not a valid suffix!


## Problem 3: Huffmann Coding

### Overview - Data Compression

In general, a data compression algorithm reduces the amount of memory (bits) required to represent a message (data). The compressed data, in turn, helps to reduce the transmission time from a sender to receiver. The sender encodes the data, and the receiver decodes the encoded data. As part of this problem, you have to implement the logic for both encoding and decoding.

A data compression algorithm could be either **_lossy_** or **_lossless_**, meaning that when compressing the data, there is a loss (lossy) or no loss (lossless) of information. The **Huffman Coding** is a _lossless_ data compression algorithm. Let us understand the two phases - encoding and decoding with the help of an example.

### A. Huffman Encoding

Assume that we have a string message `AAAAAAABBBCCCCCCCDDEEEEEE` comprising of 25 characters to be encoded. The string message can be an unsorted one as well. We will have two phases in encoding - building the Huffman tree (a binary tree), and generating the encoded data. The following steps illustrate the Huffman encoding:

#### **Phase I - Build the Huffman Tree**  

A Huffman tree is built in a bottom-up approach.

1.  First, determine the frequency of each character in the message. In our example, the following table presents the frequency of each character.

| (Unique) Character | Frequency |
|---|---|
| A | 7 |
| B | 3 |
| C | 7 |
| D | 2 |
| E | 6 |


2.  Each row in the table above can be represented as a _node_ having a character, frequency, left child, and right child. In the next step, we will repeatedly require to pop-out the node having the lowest frequency. Therefore, build and sort a _list_ of nodes in the order lowest to highest frequencies. Remember that a _list_ preserves the order of elements in which they are appended.
    
    We would need our _list_ to work as a **[priority queue](https://en.wikipedia.org/wiki/Priority_queue)**, where a node that has lower frequency should have a higher priority to be popped-out. The following snapshot will help you visualize the example considered above:
    

![](img/screenshot-2020-04-27-at-5.15.56-pm.png)

> Can you come up with other data structures to create a priority queue? How about using a _min-heap_ instead of a list? You are free to choose from anyone.

3.  Pop-out two nodes with the minimum frequency from the _priority queue_ created in the above step.

4.  Create a new node with a frequency equal to the sum of the two nodes picked in the above step. This new node would become an _internal node_ in the Huffman tree, and the two nodes would become the children. The lower frequency node becomes a left child, and the higher frequency node becomes the right child. Reinsert the newly created node back into the priority queue.  
    
    **Do you think that this reinsertion requires the sorting of priority queue again?** If yes, then a _min-heap_ could be a better choice due to the lower complexity of sorting the elements, every time there is an insertion.
    

5.  Repeat steps #3 and #4 until there is a single element left in the priority queue. The snapshots below present the building of a Huffman tree.

![](img/huffman-tree-1.png)

![](img/huffman-tree-2.png)

6.  For each node, in the Huffman tree, assign a bit `0` for left child and a `1` for right child. See the final Huffman tree for our example:

![](img/huffman-tree-3.png)

#### **Phase II - Generate the Encoded Data**  

7.  Based on the Huffman tree, generate unique binary code for each character of our string message. For this purpose, you'd have to traverse the path from root to the leaf node.

| (Unique) Character | Frequency | Huffman Code |
|---|---|---|
| D | 2 | 000 |
| B | 3 | 001 |
| E | 6 | 01 |
| A | 7 | 10 |
| C | 7 | 11 |

> **Points to Notice**  
> 
> -   Notice that the whole code for any character is **_not_** a prefix of any other code. Hence, the Huffman code is called a **_[Prefix code](https://en.wikipedia.org/wiki/Prefix_code)_**.
> -   Notice that the binary code is shorter for the more frequent character, and vice-versa.
> -   The Huffman code is generated in such a way that the entire string message would now require a much lesser amount of memory in binary form.
> -   Notice that each node present in the original _priority queue_ has become a _leaf node_ in the final Huffman tree.

This way, our encoded data would be `1010101010101000100100111111111111111000000010101010101`

### B. Huffman Decoding

Once we have the encoded data, and the (pointer to the root of) Huffman tree, we can easily decode the encoded data using the following steps:

1.  Declare a blank decoded string
2.  Pick a bit from the encoded data, traversing from left to right.
3.  Start traversing the Huffman tree from the root.
    -   If the current bit of encoded data is `0`, move to the left child, else move to the right child of the tree if the current bit is `1`.
    -   If a leaf node is encountered, append the (alphabetical) character of the leaf node to the decoded string.
4.  Repeat steps #2 and #3 until the encoded data is completely traversed.

You will have to implement the logic for both encoding and decoding in the following template. Also, you will need to create the sizing schemas to present a summary.

---

### Visualization Resource

Check this website to visualize the Huffman encoding for any string message - [Huffman Visualization!](https://people.ok.ubc.ca/ylucet/DS/Huffman.html)

In [3]:
#!/usr/bin/env python3.10

import heapq
import sys
from collections import Counter
from typing import Tuple


class Node:
    def __init__(self, char, freq):
        self.char: str | None = char
        self.freq: int = freq
        self.left: Node | None = None
        self.right: Node | None = None

    def __lt__(self, other: "Node") -> bool:
        return other.freq > self.freq


def text_to_heap(text: str) -> list:
    """Convert text to a min-heap."""
    frequencies = dict(Counter(text))
    heap = [Node(key, value) for key, value in frequencies.items()]
    heapq.heapify(heap)
    return heap


def heap_to_tree(heap: list) -> Node:
    """Recursively merge a min-heap into a tree."""
    # edge case empty text:
    if heap == []:
        return Node("", 0)
    if len(heap) > 1:
        child1, child2 = heapq.heappop(heap), heapq.heappop(heap)
        parent = Node(None, child1.freq + child2.freq)
        parent.left, parent.right = child1, child2
        heapq.heappush(heap, parent)
        heap_to_tree(heap)
    return heap[0]


def create_codes(node: Node, code: str = "") -> dict:
    """Create huffman codes."""
    # edge case: single character in text
    if code == "" and node.left is None:
        code = "0"

    codes = {}
    if node.char is not None:
        codes[node.char] = code
    else:
        codes |= create_codes(node.left, code + str(0))
        codes |= create_codes(node.right, code + str(1))
    return codes


def huffman_encoding(data: str) -> Tuple[str, Node]:
    """Encode data with huffman codes."""
    heap = text_to_heap(data)
    tree = heap_to_tree(heap)
    codes = create_codes(tree)
    code = "".join([codes[char] for char in data])
    return code, tree


def huffman_decoding(data: str, tree: Node) -> str:
    """Decode huffman codes to text."""
    # edge case: single character in text
    if tree.left is None:
        return "".join([tree.char for _ in data])

    decoded = ""
    node = tree
    for pos, c in enumerate(data):
        match c:
            case "0":
                node = node.left
            case "1":
                node = node.right
        if node.char:
            decoded += node.char
            decoded += huffman_decoding(data[pos + 1 :], tree)
            break
    return decoded


def huffmann_test(text):
    print("-------------------------------------------------------")
    print(f"----- {text}")
    print("-------------------------------------------------------")
    print("The size of the data is: {}".format(sys.getsizeof(text)))

    encoded_data, tree = huffman_encoding(text)

    if len(encoded_data) == 0:
        print(
            "The size of the encoded data is: {}".format(
                sys.getsizeof(encoded_data)
            )
        )
    else:
        print(
            "The size of the encoded data is: {}".format(
                sys.getsizeof(int(encoded_data, base=2))
            )
        )
    print("The content of the encoded data is: {}".format(encoded_data))

    decoded_data = huffman_decoding(encoded_data, tree)

    print(
        "The size of the decoded data is: {}".format(
            sys.getsizeof(decoded_data)
        )
    )
    print("The content of the encoded data is: {}".format(decoded_data))


if __name__ == "__main__":
    huffmann_test("The bird is the word")
    huffmann_test("")
    huffmann_test("12304905ndylköxykcv 1234uiopxyn.asdf")
    huffmann_test(
        "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
    )


-------------------------------------------------------
----- The bird is the word
-------------------------------------------------------
The size of the data is: 69
The size of the encoded data is: 36
The content of the encoded data is: 1110111111101010001100110000101100101101101011111101010000111001100001
The size of the decoded data is: 69
The content of the encoded data is: The bird is the word
-------------------------------------------------------
----- 
-------------------------------------------------------
The size of the data is: 49
The size of the encoded data is: 49
The content of the encoded data is: 
The size of the decoded data is: 49
The content of the encoded data is: 
-------------------------------------------------------
----- 12304905ndylköxykcv 1234uiopxyn.asdf
-------------------------------------------------------
The size of the data is: 109
The size of the encoded data is: 48
The content of the encoded data is: 101001101110100110100101110011010111100001011010

## Problem 4: Active Directory

In Windows Active Directory, a group can consist of user(s) and group(s) themselves. We can construct this hierarchy as such. Where User is represented by str representing their ids.

Write a function that provides an efficient look up of whether the user is in a group.


In [174]:
class Group(object):
    def __init__(self, _name):
        self.name = _name
        self.groups = []
        self.users = []

    def add_group(self, group):
        self.groups.append(group)

    def add_user(self, user):
        self.users.append(user)

    def get_groups(self):
        return self.groups

    def get_users(self):
        return self.users

    def get_name(self):
        return self.name


def is_user_in_group(user, group):
    """
    Return True if user is in the group, False otherwise.

    Args:
      user(str): user name/id
      group(class:Group): group to check user membership against
    """
    if user in group.get_users():
        return True
    else:
        for group in group.get_groups():
            return is_user_in_group(user, group)
    return False


parent = Group("parent")
child = Group("child")
sub_child = Group("subchild")

sub_child_user = "sub_child_user"
sub_child.add_user(sub_child_user)

child.add_group(sub_child)
parent.add_group(child)

print(is_user_in_group(sub_child_user, child))
# True
print(is_user_in_group(sub_child_user, parent))
# True
print(is_user_in_group("Some User", sub_child))
# False
print(is_user_in_group("", parent))
# False
print(is_user_in_group(None, parent))
# False

True
True
False
False
False


## Problem 5: Blockchain

A [Blockchain](https://en.wikipedia.org/wiki/Blockchain) is a sequential chain of records, similar to a linked list. Each block contains some information and how it is connected related to the other blocks in the chain. Each block contains a cryptographic hash of the previous block, a timestamp, and transaction data. For our blockchain we will be using a [SHA-256](https://en.wikipedia.org/wiki/SHA-2) hash, the [Greenwich Mean Time](https://en.wikipedia.org/wiki/Greenwich_Mean_Time) when the block was created, and text strings as the data.

Use your knowledge of linked lists and hashing to create a blockchain implementation.

![](./img/blockchain.png)

We can break the blockchain down into three main parts.

First is the information hash.

We do this for the information we want to store in the block chain such as transaction time, data, and information like the previous chain.

The next main component is the block on the blockchain.

Finally you need to link all of this together in a block chain, which you will be doing by implementing it in a linked list. All of this will help you build up to a simple but full blockchain implementation!

In [2]:
#!/usr/bin/env python3.10

import hashlib
from datetime import datetime


class Block:
    def __init__(self, data, previous_hash):
        self.timestamp = datetime.utcnow().isoformat()
        self.data = data
        self.previous_hash = previous_hash
        self.hash = self.calc_hash(str(self))
        self.next = None

    def __str__(self):
        return (
            f"{str(self.timestamp)} {str(self.data)} {str(self.previous_hash)}"
        )

    def calc_hash(self, data):
        sha = hashlib.sha256()
        sha.update(data.encode("utf-8"))
        return sha.hexdigest()


class Blockchain:
    def __init__(self):
        self.size = 0
        self.root = None
        self.tail = None

    def __len__(self):
        return self.size

    def __str__(self):
        if self.root is None:
            return "No Blocks."

        blocks = []
        current = self.root
        while current:
            blocks.append(str(current))
            current = current.next

        return f"\n--- Blockchain Size: {self.size} ---\n" + "\n".join(blocks)

    def add_block(self, data):
        self.size += 1

        if self.root is None:
            self.root = Block(data, 0)
            self.tail = self.root
            return

        last = self.tail
        last.next = Block(data, last.hash)


if __name__ == "__main__":
    b = Blockchain()
    print(b)
    # No Blocks.
    b.add_block("Dummy Data1")
    print(b)
    # --- Blockchain Size: 1 ---
    # ...
    b.add_block("Dummy Data2")
    print(b)
    # --- Blockchain Size: 2 ---
    # ...
    print(b.root.hash == b.root.next.previous_hash)
    # True
    print(b.root.previous_hash)
    # 0
    b = Blockchain()
    for x in range(1000000):
        b.add_block(f"Dummy Data{x}")
    print("Created long blockchain (1M). Size:", len(b))
    # Created long blockchain (1M). Size: 1000000


No Blocks.

--- Blockchain Size: 1 ---
2022-07-08T11:18:47.043583 Dummy Data1 0

--- Blockchain Size: 2 ---
2022-07-08T11:18:47.043583 Dummy Data1 0
2022-07-08T11:18:47.043619 Dummy Data2 d6cdfa9cb49ee52b8433cd6a7107655cba5a6a89690dc13725e2c5f77db63926
True
0
Created long blockchain (1M). Size: 1000000


## Problem 6: Union and Intersection

Your task for this problem is to fill out the union and intersection functions. The union of two sets A and B is the set of elements which are in A, in B, or in both A and B. The intersection of two sets A and B, denoted by A ∩ B, is the set of all objects that are members of both the sets A and B.

You will take in two linked lists and return a linked list that is composed of either the union or intersection, respectively. Once you have completed the problem you will create your own test cases and perform your own run time analysis on the code.

We have provided a code template below, you are not required to use it:

In [1]:
#!/usr/bin/env python3.10


class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

    def __repr__(self):
        return str(self.value)

    def __eq__(self, other):
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)


class LinkedList:
    def __init__(self):
        self.head = None
        self.size = 0

    def __str__(self):
        cur_head = self.head
        out_string = ""
        while cur_head:
            out_string += str(cur_head.value) + " -> "
            cur_head = cur_head.next
        return out_string

    def __len__(self):
        return self.size

    def __iter__(self):
        self.current_index = 0
        self.current_node = self.head
        return self

    def __next__(self):
        if self.current_index < len(self):
            node = self.current_node
            self.current_node = self.current_node.next
            self.current_index += 1
            return node
        raise StopIteration

    def append(self, value):
        self.size += 1

        if self.head is None:
            self.head = Node(value)
            return

        node = self.head
        while node.next:
            node = node.next

        node.next = Node(value)


def union(l1: LinkedList, l2: LinkedList) -> LinkedList:
    seen = set()
    result = LinkedList()

    for lst in [l1, l2]:
        for elem in lst:
            if elem not in seen:
                result.append(elem)
                seen.add(elem)

    return result


def intersection(l1: LinkedList, l2: LinkedList) -> LinkedList:
    seen = set()
    result = LinkedList()

    for elem in l1:
        seen.add(elem)
    for elem in l2:
        if elem in seen:
            result.append(elem)
            seen.remove(elem)

    return result


def test(elements1, elements2):
    l1, l2 = LinkedList(), LinkedList()
    for i in elements1:
        l1.append(i)
    for i in elements2:
        l2.append(i)
    print("u: ", union(l1, l2))
    print("i: ", intersection(l1, l2))


if __name__ == "__main__":

    test_case1 = [
        [3, 2, 4, 35, 6, 65, 6, 4, 3, 21],
        [6, 32, 4, 9, 6, 1, 11, 21, 1],
    ]
    # u:  3 -> 2 -> 4 -> 35 -> 6 -> 65 -> 21 -> 32 -> 9 -> 1 -> 11 ->
    # i:  4 -> 6 -> 21 ->

    test_case2 = [[3, 2, 4, 35, 6, 65, 6, 4, 3, 23], [1, 7, 8, 9, 11, 21, 1]]
    # u:  3 -> 2 -> 4 -> 35 -> 6 -> 65 -> 23 -> 1 -> 7 -> 8 -> 9 -> 11 -> 21 ->
    # i:

    test_case3 = [[], [1, 7, 8, 9, 11, 21, 1]]
    # u:  1 -> 7 -> 8 -> 9 -> 11 -> 21 ->
    # i:

    test_case4 = [[3, 2, 4, 35, 6, 65, 6, 4, 3, 23], []]
    # u:  3 -> 2 -> 4 -> 35 -> 6 -> 65 -> 23 ->
    # i:

    test_case5 = [[1, 1, 1], [2, 2, 2]]
    # u:  1 -> 2 ->
    # i:

    test_case6 = [[], []]
    # u:
    # i:

    test(*test_case1)
    test(*test_case2)
    test(*test_case3)
    test(*test_case4)
    test(*test_case5)
    test(*test_case6)


u:  3 -> 2 -> 4 -> 35 -> 6 -> 65 -> 21 -> 32 -> 9 -> 1 -> 11 -> 
i:  6 -> 4 -> 21 -> 
u:  3 -> 2 -> 4 -> 35 -> 6 -> 65 -> 23 -> 1 -> 7 -> 8 -> 9 -> 11 -> 21 -> 
i:  
u:  1 -> 7 -> 8 -> 9 -> 11 -> 21 -> 
i:  
u:  3 -> 2 -> 4 -> 35 -> 6 -> 65 -> 23 -> 
i:  
u:  1 -> 2 -> 
i:  
u:  
i:  
