(data_structures)=
# Data Structures
``` {index} Data Structures
```

*Data structures* can speed up our algorithms, often by many orders of magnitude. They arrange the data in a way that suits the problem being solved and usually simplify the workings of the algorithm. In this notebook we will implement some of the most popular data structures in computer science.

## Array

Arrays are ordered collections of objects. In older programming languages, arrays had a fixed length and element type (e.g. an array of 10 integers). In Python, arrays can be implemented using `list`s, which provide a little more flexibility by hosting multiple types and having variable length. Arrays/lists support the following operations with the given time complexity:

* Set item (at index i smaller than the length): \\(O(1)\\)
* Get item: \\(O(1)\\)
* Deletion is not really supported, as we assume that the array is of fixed length. For Python lists, deleting at an arbitrary index is \\(O(n)\\).

Arrays are used in many programming problems where collections of data are used.

In [15]:
arr = [1,2,3]
# constant
arr[2] = 4
# constant
print(arr[1])

2
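
The costs listed above can be seen side by side in a short sketch. The list contents are arbitrary example values, and the comments state the usual costs for Python lists rather than measured timings.

```python
# Hypothetical example list; the values are arbitrary.
arr = [10, 20, 30, 40]

arr[1] = 25          # set item by index: O(1)
second = arr[1]      # get item by index: O(1)

arr.append(50)       # add at the end: amortized O(1)
arr.insert(0, 5)     # add at the front: O(n), every element shifts right
del arr[0]           # delete at the front: O(n), every element shifts left
last = arr.pop()     # remove from the end: O(1)

print(second, last, arr)
```

Operations at the end of the list are cheap, while anything that changes the front forces the remaining elements to move.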


## Stack

A stack is a structure that follows the **LIFO** (last in, first out) rule. Elements can be `push`ed on top of the stack or `pop`ped off it. Those are the only two operations that this structure supports:

In [18]:
class Stack:
    def __init__(self):
        self.contents = []
    def pop(self):
        if len(self.contents):
            # list.pop() removes and returns the last element in O(1);
            # slicing with [:-1] would copy the whole list in O(n)
            return self.contents.pop()
        else:
            print("ERROR: Stack is empty")
    def push(self, elem):
        self.contents.append(elem)

st = Stack()
st.push(1)
print(st.pop())
st.pop()

# Python lists also support these operations, push is called append!

l = [1,2]
print(l.pop())
l.append(3)
print(l)

1
ERROR: Stack is empty
2
[1, 3]


Operations/complexity:
* push: \\(O(1)\\)
* pop: \\(O(1)\\)
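
To make the LIFO rule concrete, here is a minimal sketch that reverses a sequence by pushing every item and then popping until the stack is empty. The helper name `reverse_with_stack` is made up for this illustration.

```python
def reverse_with_stack(items):
    """Return a list with the items in reverse order (hypothetical helper)."""
    stack = []
    for item in items:
        stack.append(item)                    # push
    reversed_items = []
    while stack:
        reversed_items.append(stack.pop())    # pop returns the most recent push
    return reversed_items


print(reverse_with_stack([1, 2, 3]))
print("".join(reverse_with_stack("stack")))
```

Since the last element pushed is the first one popped, the items come out in the opposite order to which they went in.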

One of the popular problems utilizing stacks is checking whether an expression has balanced brackets. Let us limit ourselves to the brackets `{`, `}`, `(` and `)`. We will push the opening brackets on the stack and pop them when a closing bracket is encountered. Of course, the types of brackets must match:

In [19]:
def checkIfValidBracketing(expr):
    # our stack :)
    st = []
    for i in expr:
        # not a bracket
        if i not in ["{","}","(",")"]:
            continue
        # push the opening bracket on the stack
        elif i in ["{","("]:
            st.append(i)
        # check when closing bracket approached    
        elif i in ["}",")"]:
            try:
                temp = st.pop()
                if (i == "}" and temp == "{") or (i == ")" and temp == "("):
                    continue
                # invalid bracketing    
                else:
                    return False
            except IndexError:
                return False
    return not len(st)

# Basic testing
print(checkIfValidBracketing("{}()"))    
print(checkIfValidBracketing("{(")) 
print(checkIfValidBracketing("({()})")) 
print(checkIfValidBracketing(")")) 
print(checkIfValidBracketing("")) 

True
False
True
False
True


The algorithm above has \\(O(n)\\) time and space complexity.
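
The same idea extends to more bracket types if the matching pairs are kept in a dictionary. The following is a sketch of one possible variant, not the implementation used above; the name `is_balanced` and the inclusion of square brackets are assumptions made for illustration.

```python
def is_balanced(expr):
    """Check bracket balance for (), {} and [] (hypothetical variant)."""
    # map every closing bracket to the opening bracket it must match
    pairs = {")": "(", "}": "{", "]": "["}
    openers = set(pairs.values())
    stack = []
    for ch in expr:
        if ch in openers:
            stack.append(ch)
        elif ch in pairs:
            # a closer with nothing open, or with the wrong opener on top
            if not stack or stack.pop() != pairs[ch]:
                return False
    # leftover openers mean the expression is unbalanced
    return not stack


print(is_balanced("{[()]}"))
print(is_balanced("([)]"))
```

Supporting another bracket type then only requires one more entry in `pairs`.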

## Linked list

Linked lists are the simplest node-based structures. They are inherently recursive in nature: a linked list consists of a `head` node that points to the `next`, which is itself a linked list. The base case is an empty list:

```{figure} algo_images/LinkedList.png
:width: 60%
```

A simple implementation would be:

In [39]:
class Node:
    def __init__(self,val):
        self.val = val
        self.next = None
        
class LinkedList:
    def __init__(self):
        self.head = None
        
    # insert x at position index
    # returns False if index is out of range, True otherwise
    def insert(self,x, index):
        assert index >= 0
        newNode = Node(x)
        if not index:
            newNode.next = self.head
            self.head = newNode
            return True
        else:
            curr = self.head 
            for _ in range(index-1):
                if curr is None:
                    return False
                curr = curr.next
            if curr is None:
                return False
            newNode.next = curr.next
            curr.next = newNode
            return True
            
    # remove item at index position  
    # return true if the element was actually removed, 
    # false otherwise
    def remove(self,index):
        assert index >= 0
        # Removing head
        if not index:
            if self.head is not None:
                self.head = self.head.next
                return True
            else:
                return False
        # index > 0
        else:
            curr = self.head
            for _ in range(index-1):
                if curr is None:
                    return False
                curr = curr.next
            if curr is None:
                return False
            else:
                if curr.next is None:
                    return False
                else:
                    curr.next = curr.next.next
                    return True
                
    # returns true if x in the list, false otherwise            
    def search(self,x):
        curr = self.head
        while curr is not None:
            if curr.val == x:
                return True
            curr = curr.next
        return False
    
    # for testing and nice representation
    def __str__(self):
        res = ""
        curr = self.head
        while curr is not None:
            res += str(curr.val) + "->"
            curr = curr.next
        return res
        
    
ll = LinkedList()
print(ll)
print(ll.insert(0,0))
print(ll)
print(ll.insert(1,1))
print(ll)
print(ll.insert(2,1))
print(ll)
print(ll.remove(2))
print(ll)
print(ll.search(2))
print(ll.remove(0))
print(ll)
print(ll.search(0))
print(ll.remove(0))
print(ll.remove(0))


True
0->
True
0->1->
True
0->2->1->
True
0->2->
True
True
2->
False
True
False


All of these operations (`insert`, `remove`, `search`) take \\(O(n)\\) time in the worst case, which is usually not optimal; only operations at the head are \\(O(1)\\). There are numerous versions of linked lists, e.g. the doubly linked list, whose nodes point to both the previous and the next node. Linked lists are often used to implement other data structures such as queues.
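
As a rough illustration of the doubly linked variant, the sketch below keeps `prev` and `next` pointers so that a node we already hold a reference to can be unlinked in constant time. The names `DNode`, `DoublyLinkedList`, `remove_node` and `to_list` are hypothetical and not part of the implementation above.

```python
class DNode:
    def __init__(self, val):
        self.val = val
        self.prev = None
        self.next = None


class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, val):
        """Add a value at the tail in O(1) and return its node."""
        node = DNode(val)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        return node

    def remove_node(self, node):
        """Unlink a node of this list in O(1), without traversing the list."""
        if node.prev is None:
            self.head = node.next
        else:
            node.prev.next = node.next
        if node.next is None:
            self.tail = node.prev
        else:
            node.next.prev = node.prev
        node.prev = node.next = None

    def to_list(self):
        out = []
        curr = self.head
        while curr is not None:
            out.append(curr.val)
            curr = curr.next
        return out


dll = DoublyLinkedList()
nodes = [dll.append(v) for v in (1, 2, 3)]
dll.remove_node(nodes[1])
print(dll.to_list())
```

The price for the constant-time removal is one extra pointer per node and more bookkeeping on every update.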

## Queue

Queues are similar to stacks but utilize the **FIFO** (first in, first out) rule. Basic queues support `enqueue` and `dequeue` operations. A linked list implementation would be:

In [48]:
class Node:
    def __init__(self, item):
        self.item = item
        self.next = None
        self.previous = None


class Queue:
    def __init__(self):
        self.length = 0
        self.head = None
        self.tail = None

    def enqueue(self, x):
        newNode = Node(x)
        if self.head is None:
            self.head = self.tail = newNode
        else:
            self.tail.next = newNode
            newNode.previous = self.tail
            self.tail = newNode
        self.length += 1


    def dequeue(self):
        if not self.length:
            return None
        item = self.head.item
        self.head = self.head.next
        self.length -= 1
        if self.length == 0:
            self.tail = None
        else:
            # drop the back-reference to the removed node
            self.head.previous = None
        return item
    
    def __str__(self):
        res = ""
        curr = self.head
        while curr is not None:
            res += str(curr.item) + "->"
            curr = curr.next
        return res

queue = Queue()
print(queue)
queue.enqueue(1)
queue.enqueue(2)
queue.enqueue(3)
print(queue)
print(queue.dequeue())
print(queue)
print(queue.dequeue())
print(queue)
print(queue.dequeue())
print(queue)
print(queue.dequeue())


1->2->3->
1
2->3->
2
3->
3

None


With this implementation `enqueue` and `dequeue` are \\(O(1)\\). Queues are used in numerous problems, such as graph traversals (BFS).
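
The BFS use case can be sketched as follows. This version uses `collections.deque` from the standard library as the queue instead of the class above, and the adjacency-dictionary graph and the name `bfs_order` are assumptions made for the example.

```python
from collections import deque


def bfs_order(graph, start):
    """Return the nodes reachable from start in breadth-first order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()              # dequeue: O(1)
        order.append(node)
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)     # enqueue: O(1)
    return order


# hypothetical example graph given as an adjacency dictionary
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs_order(graph, "A"))
```

Because nodes leave the queue in the order they entered it, all nodes at distance one from the start are visited before any node at distance two.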

## Binary search tree (BST)

A *binary tree* is another node-based structure that is also recursive in nature. The tree consists of a *root* which points to at most two children, which are themselves binary trees. *Binary search trees* are binary trees with the property that every key in the left subtree is no greater than the root key and every key in the right subtree is greater than the root key. An example binary search tree is as follows:

```{figure} algo_images/BST.png
:width: 80%
```

An example implementation for BST with integers is as follows:

In [None]:
# implemented as a set, will not contain duplicates
class BST:
    def __init__(self,val=None):
        self.val = val
        self.left = None
        self.right = None
        
    # true if the insert changes the BST    
    def insert(self,x):
        if self.val is None:
            self.val = x
            return True
        # already in the set
        elif self.val == x:
            return False
        elif self.val > x:
            if self.left is None:
                self.left = BST(x)
                return True
            else:
                return self.left.insert(x)
        else:
            if self.right is None:
                self.right = BST(x)
                return True
            else:
                return self.right.insert(x)
            
    # search for x in BST
    def search(self,x):
        if self.val is None:
            return False
        elif self.val == x:
            return True
        elif self.val > x:
            if self.left is None:
                return False
            else:
                return self.left.search(x)
        else:
            if self.right is None:
                return False
            else:
                return self.right.search(x)            

Node removal is a bit more involved and will be the subject of an exercise. All basic operations take time proportional to the height of the tree, which is \\(O(log(n))\\) on average when keys are inserted in random order. BSTs can be used to speed up many problems that require repetitive searching of the data. Consider the following problem:

**Numbers smaller than the given number in an array** You are given an array of integers. You should provide a data structure that returns, for any chosen element of the array, the number of other elements that are less than or equal to it (its *rank*). If the array contains no duplicates, this is simply the number of elements smaller than the chosen one.

We will solve the problem by constructing a BST out of the array in which every node stores the size of its left subtree:

In [50]:
class newNode:
    def __init__(self, data):
        self.data = data 
        self.left = self.right = None
        self.leftSize = 0
        
# Inserting a new Node. 
def insert(root, data):
    if root is None: 
        return newNode(data) 
 
    # Updating size of left subtree. 
    if data <= root.data: 
        root.left = insert(root.left, data) 
        root.leftSize += 1
    else:
        root.right = insert(root.right, data)
    return root
 
# Return the rank of x, or -1 if x is not in the tree.
def getRank(root, x):
    # an empty tree contains no elements
    if root is None:
        return -1

    if root.data == x:
        return root.leftSize 
 

    if x < root.data: 
        if root.left is None: 
            return -1
        else:
            return getRank(root.left, x)
 

    else: 
        if root.right is None: 
            return -1
        else: 
            rightSize = getRank(root.right, x)
            if rightSize == -1:
                # x not found in right sub tree, i.e. not found in stream
                return -1
            else:
                return root.leftSize + 1 + rightSize
 
arr = [5, 1, 4, 4, 5, 9, 7, 13, 3] 
root = None
for i in range(len(arr)):
    root = insert(root, arr[i])
    
print(getRank(root, 4))
print(getRank(root, 13))
print(getRank(root, 8))

3
8
-1
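
As a sanity check on the outputs above, the rank can also be computed directly in \\(O(n)\\) per query. The helper `rank_brute_force` below is a hypothetical reference implementation: it counts the elements less than or equal to `x` and leaves out one copy of `x` itself.

```python
def rank_brute_force(values, x):
    """O(n) reference: elements <= x, not counting one copy of x itself."""
    if x not in values:
        return -1
    return sum(1 for v in values if v <= x) - 1


arr = [5, 1, 4, 4, 5, 9, 7, 13, 3]
print(rank_brute_force(arr, 4))
print(rank_brute_force(arr, 13))
print(rank_brute_force(arr, 8))
```

For this array the brute-force results agree with the tree-based ones, including the duplicate of 4 being counted.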


This algorithm will work in \\(O(log(n))\\) time on average. However, if the array was initially sorted, the resulting tree will be degenerate (it will basically form a linked list) and each operation will take \\(O(n)\\) time. We will learn how to handle such cases with AVL trees.
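
The effect of the insertion order can be observed by comparing tree heights. The sketch below uses its own small, iterative BST insert so that it is self-contained; `TreeNode`, `bst_insert`, `height` and `build` are hypothetical names, and the choice of 200 keys and the random seed are arbitrary.

```python
import random


class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_insert(root, key):
    """Iterative insert into an unbalanced BST; returns the root."""
    if root is None:
        return TreeNode(key)
    curr = root
    while True:
        if key <= curr.key:
            if curr.left is None:
                curr.left = TreeNode(key)
                return root
            curr = curr.left
        else:
            if curr.right is None:
                curr.right = TreeNode(key)
                return root
            curr = curr.right


def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    best = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return best


def build(keys):
    root = None
    for k in keys:
        root = bst_insert(root, k)
    return root


keys = list(range(200))
sorted_height = height(build(keys))      # every node has only a right child

random.seed(0)
random.shuffle(keys)
shuffled_height = height(build(keys))    # same keys, random insertion order

print(sorted_height, shuffled_height)
```

With sorted input the height equals the number of keys, whereas the shuffled input should give a much shallower tree.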

## Heap

*Heaps* (especially *binary heaps*) are tree-like structures that have two properties:

1) They are complete trees (all levels of the tree are completely filled except possibly the last level, which is filled from left to right).
2) They are either min heaps or max heaps. In a min heap every node is smaller than or equal to its children. The opposite is true for a max heap.

Let us implement a min heap storing integers. It will support taking the top element (which is the minimum) as well as inserting new elements. Interestingly, we will implement it using a Python list. The key observation is that, when positions are counted from 1, the parent of the element at position `i` is at position `i//2`; with 0-based list indices, the children of index `i` are at indices `2*i + 1` and `2*i + 2`.
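
The index arithmetic behind the list representation can be sketched as follows. This sketch assumes 0-based list indices (the `i//2` rule corresponds to counting positions from 1), and the helper names `parent`, `left_child`, `right_child` and `is_min_heap` are made up for illustration.

```python
def parent(i):
    """Index of the parent of index i (0-based, i > 0)."""
    return (i - 1) // 2


def left_child(i):
    return 2 * i + 1


def right_child(i):
    return 2 * i + 2


def is_min_heap(arr):
    """Check that every element is at least as large as its parent."""
    return all(arr[parent(i)] <= arr[i] for i in range(1, len(arr)))


# hypothetical example values
print(is_min_heap([1, 2, 2, 5, 3, 3, 4]))
print(is_min_heap([3, 1, 2]))
```

Because the tree is complete, no explicit pointers are needed: the position of an element in the list fully determines its parent and children.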

In [65]:
# utility functions
def swap(arr, x, y):
    temp = arr[x]
    arr[x] = arr[y]
    arr[y] = temp
    
class MinHeap:
    def __init__(self):
        self.contents = []
        self.size = 0
        
    # utility function    
    def heapifyUp(self, i):
        while i // 2 > 0:
            if self.contents[i-1] < self.contents[i // 2 - 1]:
                swap(self.contents,  i-1, i // 2 - 1)
            i //= 2
    
    def insert(self,val):
        self.size += 1
        self.contents.append(val)
        self.heapifyUp(self.size)
        
    def heapifyDown(self,i):
        while (2 * i + 1) < self.size:
            mc = self.minChild(i)
            if self.contents[i] > self.contents[mc]:
                swap(self.contents, i, mc)
            i = mc
            
    def minChild(self, i):
        if 2 * i + 2 >= self.size:
            return 2 * i + 1
        else:
            if self.contents[2 * i + 1] < self.contents[2 * i + 2]:
                return 2 * i + 1
            else:
                return 2 * i + 2
        
    def getMin(self):
        assert self.size > 0
        self.size -= 1
        res = self.contents[0]
        swap(self.contents,0, self.size)
        self.contents.pop()
        self.heapifyDown(0)
        return res
        
# Heapsort - an O(nlog(n)) sorting algorithm.

arr = [1,3,2,6,8,3,4,5,6,7,2,3,8]
heap = MinHeap()

for j in range(len(arr)):
    heap.insert(arr[j])

print(heap.contents)   
arr = []
while heap.size:
    arr.append(heap.getMin())
print(arr)    

[1, 2, 2, 5, 3, 3, 4, 6, 6, 8, 7, 3, 8]
[1, 2, 2, 3, 3, 3, 4, 5, 6, 6, 7, 8, 8]


Heaps have many uses. One of them is HeapSort, an \\(O(nlog(n))\\) sorting algorithm (see above). Heaps are also used in *priority queues*: queues whose elements are served in order of some *priority*, such as objects ordered by their value.
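
A priority queue can be sketched with the standard library module `heapq`, which maintains a min heap inside a plain list. The task names and priority numbers below are invented for the example; lower numbers are served first.

```python
import heapq

# (priority, task) tuples are compared by priority first
tasks = []
heapq.heappush(tasks, (2, "write tests"))
heapq.heappush(tasks, (1, "fix bug"))
heapq.heappush(tasks, (3, "update docs"))

order = []
while tasks:
    priority, name = heapq.heappop(tasks)   # always the smallest priority
    order.append(name)

print(order)
```

Both `heappush` and `heappop` take \\(O(log(n))\\) time, which matches the `insert` and `getMin` operations of a binary heap.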

## AVL Trees

## Tries

## Hash Tables

https://www.geeksforgeeks.org/rank-element-stream/
https://runestone.academy/runestone/books/published/pythonds/Trees/BinaryHeapImplementation.html