## Data Structure Notes

In [3]:
# Data types

print("\n Numeric")
print("1", type(1))
print("1.", type(1.))
print("1.0", type(1.0))
print("3j", type(3j))

print("\n Boolean")
print("bool(1)", type(bool(1)))
print("bool(0)", type(bool(0)))
print("False", type(False))

print("\n Sequence") 
print("'1'", type('1'))
print("'a'", type('a'))
print("range(3)", type(range(3)))
print("[1,2,3]", type([1,2,3]))
print("(1,2,3)", type( (1,2,3)))

print("\n Mapping")
print("{'a':1,'b':2,'c':3}" , type({'a':1,'b':2,'c':3}))

print("\n Set")
print("{1,2,3}", type({1,2,3}))
print("frozenset((1,2,3))", type(frozenset((1,2,3))))



 Numeric
1 <class 'int'>
1. <class 'float'>
1.0 <class 'float'>
3j <class 'complex'>

 Boolean
bool(1) <class 'bool'>
bool(0) <class 'bool'>
False <class 'bool'>

 Sequence
'1' <class 'str'>
'a' <class 'str'>
range(3) <class 'range'>
[1,2,3] <class 'list'>
(1,2,3) <class 'tuple'>

 Mapping
{'a':1,'b':2,'c':3} <class 'dict'>

 Set
{1,2,3} <class 'set'>
frozenset((1,2,3)) <class 'frozenset'>


#### Other details
* Immutable objects:  String (Unicode text in Python 3), Integer, Float, Range, Tuple, Frozenset
* Mutable objects:  List, Dictionary, Set
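
A small sketch of what the mutable/immutable split means in practice. The variable names are made up for illustration: a list can be changed in place through any name that refers to it, while a tuple rejects item assignment and a string "change" really builds a new object.

```python
# Mutable: both names refer to the same list, so a change is visible through either.
nums = [1, 2, 3]
alias = nums
alias.append(4)
print(nums)          # [1, 2, 3, 4]

# Immutable: item assignment on a tuple raises TypeError.
point = (1, 2, 3)
try:
    point[0] = 99
except TypeError as exc:
    print("TypeError:", exc)

# Immutable: += on a string rebinds the name to a new string object.
word = "data"
other = word
word += "!"
print(word, other)   # data! data
```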

#### Membership
* in
* not in

#### Identity
* is
* is not

#### Logical Operators
* and
* or
* not

#### Comparison Operators
* `==` (equal) and `!=` (not equal)
* `<` and `<=`
* `>` and `>=`
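
The operator groups above can be tried side by side. This is only a quick sketch with made-up variable names; the point worth noticing is that equality compares values while identity compares objects.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal contents, but a separate list object
c = a           # a second name for the same object

# Membership
print(2 in a)        # True
print(5 not in a)    # True

# Equality compares values; identity compares objects
print(a == b)        # True
print(a is b)        # False
print(a is c)        # True

# Logical operators and chained comparisons
print(True and not False)   # True
print(1 < 2 <= 2)           # True
```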



#### Common Dictionary Methods
* The dictionary data type is mutable and dynamic.  

mydict.clear()
mydict.get(<key>)
mydict.items()
mydict.keys()
mydict.values()
mydict.pop(<key>)
mydict.popitem()
mydict.update(<obj>)
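
A short sketch exercising the methods listed above on a throwaway dictionary (the keys and values are arbitrary):

```python
mydict = {'a': 1, 'b': 2, 'c': 3}

print(mydict.get('a'))         # 1
print(mydict.get('z'))         # None: a missing key does not raise KeyError
print(mydict.get('z', 0))      # 0: the optional second argument is the default
print(list(mydict.keys()))     # ['a', 'b', 'c']
print(list(mydict.values()))   # [1, 2, 3]
print(list(mydict.items()))    # [('a', 1), ('b', 2), ('c', 3)]

mydict.update({'c': 30, 'd': 4})   # overwrites 'c' and adds 'd'
removed = mydict.pop('a')          # removes 'a' and returns its value
last = mydict.popitem()            # removes and returns the most recently added pair
remaining = dict(mydict)           # copy of what is left

mydict.clear()                     # empties the dictionary in place
print(removed, last, remaining, mydict)
```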

#### Common Sets Methods
* A set is an unordered, mutable, iterable collection of unique elements; adding an item that is already present has no effect. Items can be added and removed, but each item must itself be hashable, which in practice means immutable. Sets support membership tests (in, not in) and the operations union, intersection, difference, and symmetric difference.  

set1.union(set2)  
set1.intersection(set2)  
set1.difference(set2)  
set1.symmetric_difference(set2)
set1.issubset(set2)
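
A quick sketch of the methods above, using two small example sets chosen for illustration:

```python
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5}

union = set1.union(set2)                       # items in either set
intersection = set1.intersection(set2)         # items in both sets
difference = set1.difference(set2)             # items in set1 but not in set2
symmetric = set1.symmetric_difference(set2)    # items in exactly one of the sets

print(union, intersection, difference, symmetric)
print({3, 4}.issubset(set1))   # True
print(set1.issubset(set2))     # False

# Duplicates collapse automatically
print(set([1, 1, 2, 2, 3]) == {1, 2, 3})   # True
```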


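**Note:** Immutable sets are known as frozensets.  A frozenset supports the same read-only operations as a set (membership tests, union, intersection, and so on) but has no methods that modify it, which is what makes it hashable.  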

Here is a good example that shows how sets can only hold hashable (immutable) objects.  A set of sets does not work because the individual sets are mutable, but a set of frozensets works because frozensets are immutable.  


In [4]:
# Using just a set
a11 = set(['data'])
a21 = set(['structure'])
a31 = set(['python'])
x1 = {a11, a21, a31}


TypeError: unhashable type: 'set'

In [None]:
# Using a frozenset
a1 = frozenset(['data'])
a2 = frozenset(['structure'])
a3 = frozenset(['python'])
x = {a1, a2, a3}
print(x)


#### Collections
The collections module provides different types of containers, which are objects that are used to store different objects and provide a way to access them.  

Access this module via `from collections import _____`  

Types:  
namedtuple  
deque  
defaultdict  
ChainMap  
Counter  
UserDict, UserList, UserString  
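
A brief tour of some of these containers. The data and names such as `Point` and `settings` are invented for the example; only the `collections` classes themselves come from the standard library.

```python
from collections import namedtuple, deque, defaultdict, ChainMap, Counter

# namedtuple: a tuple whose fields can also be read by name
Point = namedtuple('Point', ['x', 'y'])
p = Point(x=1, y=2)
print(p.x, p[1])             # 1 2

# deque: fast appends and pops at both ends
d = deque([1, 2, 3])
d.appendleft(0)
d.append(4)
first = d.popleft()          # removes from the left end
print(first, list(d))        # 0 [1, 2, 3, 4]

# defaultdict: missing keys get a default value created on first access
groups = defaultdict(list)
for word in ['apple', 'avocado', 'banana']:
    groups[word[0]].append(word)
print(dict(groups))          # {'a': ['apple', 'avocado'], 'b': ['banana']}

# ChainMap: looks keys up in several dicts, first match wins
defaults = {'color': 'red', 'size': 'M'}
overrides = {'size': 'L'}
settings = ChainMap(overrides, defaults)
print(settings['size'], settings['color'])   # L red

# Counter: counts hashable items
counts = Counter('mississippi')
print(counts['s'], counts['p'], counts['z'])  # 4 2 0
```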

### Algorithm Development

Important Factors:
- Time Complexity
- Space Complexity (Memory Usage)
- Asymptotic Efficiency (Rate of Growth)
- Amortized Analysis

Types of Notation:
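- $\Theta$ (Theta) - tight bound: the running time grows at exactly this rate, bounded from above and below.
- $O$ (Big O) - upper bound: the running time grows no faster than this rate; most often quoted for the worst case.
- $\Omega$ (Omega) - lower bound: the running time grows at least this fast.  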

Common Big O classes:
- O(1) : Constant
- O(log n)  :  Logarithmic
- O(n)   :  Linear
- O(n log n)   :  Linear-logarithmic
- O(n^2)   :  Quadratic
- O(n^3)   :  Cubic
- O(2^n)    :  Exponential

Total Time Complexity
- Simplified notation: add up the cost of every step, then keep only the fastest-growing term and drop constant factors
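
As a worked illustration (the cost function is made up for the example, not measured from any particular algorithm), suppose the steps of an algorithm add up to:

$$
T(n) = 3n^2 + 5n + 2
$$

For $n \ge 1$ each lower-order term is at most its coefficient times $n^2$, so

$$
T(n) \le 3n^2 + 5n^2 + 2n^2 = 10n^2 \quad\Rightarrow\quad T(n) = O(n^2)
$$

The constant 10 and the terms $5n$ and $2$ disappear from the simplified notation; only the dominant growth rate $n^2$ remains.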


### Algorithms  
- Brute-force - try all possible solutions

#### Recursion
- A recursive function calls itself on a smaller version of the problem. Each call combines its own work with the result of the next call, and a base case stops the chain of calls.


In [None]:
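# Recursive with an accumulator (tail-recursive style) - Factorial
# The running total is passed along in tot, so no work is left after the recursive call returns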

def fact(n, tot=1):
    if n == 0:
        return tot
    else:
        tot *= n
        return fact(n-1, tot)
    
print(fact(4))

In [None]:
# Recursive - Factorial

def fact(n):
    if n == 0:
        return 1
    else:
        return n*fact(n-1)
    
print(fact(4))

#### Divide and conquer

In [None]:
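# - binary search - O(log n)

def binary_search(arr, start, end, key):
    # arr must be sorted in ascending order; start and end are inclusive indices
    loop = 0  # counts how many times the search range was halved
    while start <= end:
        loop += 1
        mid = start + (end - start) // 2  # integer midpoint
        if arr[mid] == key:
            return (mid, loop)
        elif arr[mid] < key:
            start = mid + 1  # key can only be in the right half
        else:
            end = mid - 1  # key can only be in the left half
    return (None, loop)  # key not found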

arr = [4, 6, 9, 13, 14, 18, 21, 24, 38] 
x = 6
result = binary_search(arr, 0, len(arr)-1, x)  
print(f'Binary Search: Index of {x}: {result[0]}.  Evaluated in {result[1]} loops')

In [None]:
# - merge sort - O(nlogn)

# merge
def merge(first_sublist, second_sublist): 
    i = j = 0
    merged_list = []
    while i < len(first_sublist) and j < len(second_sublist):
        if first_sublist[i] <= second_sublist[j]:  # <= keeps equal items in their original order (stable)
            merged_list.append(first_sublist[i]) 
            i += 1 
        else:
            merged_list.append(second_sublist[j]) 
            j += 1
    while i < len(first_sublist): 
        merged_list.append(first_sublist[i]) 
        i += 1 
    while j < len(second_sublist):
        merged_list.append(second_sublist[j]) 
        j += 1
    return merged_list 

# merge sort
def merge_sort(unsorted_list): 
    
    # stop trigger: an empty or one-item list is already sorted
    if len(unsorted_list) <= 1: 
        return unsorted_list
    
    # split list
    mid_point = len(unsorted_list) // 2
    first_half = unsorted_list[:mid_point] 
    second_half = unsorted_list[mid_point:] 
    
    # recursive function
    half_a = merge_sort(first_half) 
    half_b = merge_sort(second_half) 
    
    # merge the two sorted halves, always taking the smaller front value first
    return merge(half_a, half_b) 

print("Merge Sort of List: ", merge_sort([10,9,3,5,2,1]))

#### Other Algorithms
- quicksort
- algorithm for fast multiplication
- Strassen's matrix multiplication
- closest pair of points

### Dynamic programming

In [None]:
# Fibonacci Sequence - Recursive
def fib(n):
    # base cases: fib(0) = 0 and fib(1) = 1
    if n <= 1:
        return n
    else:
        return fib(n-1) + fib(n-2)

for i in range(5):
    print(fib(i))

In [None]:
# Fibonacci Sequence - Dynamic
def dyna_fib(n):
    if n == 0:
        return 0
    if n == 1:
        return 1  
    # stored calculation lookup
    if lookup[n] is not None:
        return lookup[n]
  
    lookup[n] = dyna_fib(n-1) + dyna_fib(n-2)
    return lookup[n]
lookup = [None]*(1000)
 
for i in range(6): 
    print(dyna_fib(i))

### Greedy algorithms

#### Shortest distance

In [None]:
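# Dijkstra's Algorithm
# This version finds the next node with a linear scan, so it runs in O(|Vertices|^2).
# The O(|Edges| + |Vertices|log|Vertices|) bound needs a Fibonacci-heap priority queue.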
def get_shortest_distance(table, vertex): 
    shortest_distance = table[vertex][DISTANCE] 
    return shortest_distance 

def set_shortest_distance(table, vertex, new_distance): 
    table[vertex][DISTANCE] = new_distance 

def set_previous_node(table, vertex, previous_node): 
    table[vertex][PREVIOUS_NODE] = previous_node 
    
def get_distance(graph, first_vertex, second_vertex): 
    return graph[first_vertex][second_vertex] 

def get_next_node(table, visited_nodes): 
    unvisited_nodes = list(set(table.keys()).difference(set(visited_nodes))) 
    assumed_min = table[unvisited_nodes[0]][DISTANCE] 
    min_vertex = unvisited_nodes[0] 
    for node in unvisited_nodes: 
        if table[node][DISTANCE] < assumed_min: 
            assumed_min = table[node][DISTANCE] 
            min_vertex = node 
    return min_vertex 


def find_shortest_path(graph, table, origin): 
    visited_nodes = [] 
    current_node = origin 
    starting_node = origin 
    while True: 
        adjacent_nodes = graph[current_node] 
        if set(adjacent_nodes).issubset(set(visited_nodes)): 
            # Nothing here to do. All adjacent nodes have been visited. 
            pass 
        else: 
            unvisited_nodes = set(adjacent_nodes).difference(set(visited_nodes)) 
            for vertex in unvisited_nodes: 
                distance_from_starting_node = get_shortest_distance(table, vertex) 
                if distance_from_starting_node == INFINITY and current_node == starting_node: 
                    total_distance = get_distance(graph, vertex, 
                                                  current_node) 
                else: 
                    total_distance = get_shortest_distance (table, 
                    current_node) + get_distance(graph, current_node, 
                                                 vertex) 
                if total_distance < distance_from_starting_node: 
                    set_shortest_distance(table, vertex, 
                                          total_distance) 
                    set_previous_node(table, vertex, current_node) 
        visited_nodes.append(current_node)
        #print(visited_nodes)
        if len(visited_nodes) == len(table.keys()): 
            break 
        current_node = get_next_node(table,visited_nodes) 
    return (table)


# ------------------------------------------

graph = dict() 
graph['A'] = {'B': 5, 'D': 9, 'E': 2} 
graph['B'] = {'A': 5, 'C': 2} 
graph['C'] = {'B': 2, 'D': 3} 
graph['D'] = {'A': 9, 'F': 2, 'C': 3} 
graph['E'] = {'A': 2, 'F': 3} 
graph['F'] = {'E': 3, 'D': 2} 


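# Column positions within each table row, plus the "not yet reached" marker.
# The functions above read these names, so they must be defined before the call below.
DISTANCE = 0 
PREVIOUS_NODE = 1 
INFINITY = float('inf')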

table = { 
    'A': [0, None], 
    'B': [float("inf"), None], 
    'C': [float("inf"), None], 
    'D': [float("inf"), None], 
    'E': [float("inf"), None], 
    'F': [float("inf"), None], 
}

shortest_distance_table = find_shortest_path(graph, table, 'A') 
for k in sorted(shortest_distance_table): 
    print("{} - {}".format(k, shortest_distance_table[k])) 

#### Other Algorithms
-  Kruskal’s minimum spanning tree
-  Dijkstra’s shortest path problem
-  The Knapsack problem
-  Prim’s minimal spanning tree algorithm
-  The traveling salesperson problem

### Linked Lists 
- the basic form is also known as a singly linked list
- good when the list changes often, because inserting or removing a node only rewires pointers instead of shifting elements
- each node holds a value and a reference to the next node; on the last node, node.next is `None`
- slower than arrays for reading, since reaching an item means walking the list from the head, but it doesn't require contiguous memory allocation  
- a doubly linked list gives each node both a next link and a previous link  
- a circular linked list has its last node reference the first node instead of `None`, closing the loop
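
A minimal sketch of a singly linked list along the lines described above. The class and method names (`ListNode`, `SinglyLinkedList`, `append`, `prepend`) are illustrative choices, not taken from any library.

```python
class ListNode:
    """One element of the list: a value plus a reference to the next node."""
    def __init__(self, data):
        self.data = data
        self.next = None   # None marks the end of the list


class SinglyLinkedList:
    def __init__(self):
        self.head = None
        self.size = 0

    def prepend(self, data):
        # O(1): the new node simply points at the old head
        node = ListNode(data)
        node.next = self.head
        self.head = node
        self.size += 1

    def append(self, data):
        # O(n): walk to the last node, then link the new node after it
        node = ListNode(data)
        if self.head is None:
            self.head = node
        else:
            current = self.head
            while current.next is not None:
                current = current.next
            current.next = node
        self.size += 1

    def __iter__(self):
        current = self.head
        while current is not None:
            yield current.data
            current = current.next


words = SinglyLinkedList()
words.append("b")
words.append("c")
words.prepend("a")
print(list(words), words.size)   # ['a', 'b', 'c'] 3
```

Keeping a reference to the tail node as well would make `append` O(1); it is left out here to keep the sketch short.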



### Stacks 
- data is stored sequentially like lists and arrays, but access is restricted by how data is added and removed
- a stack works on the last in, first out (LIFO) principle, equivalently first in, last out (FILO)
- Key methods:  Stack() (constructor), push(item), peek(), pop()

In [None]:
size = 3
data = [0]*(size)   #Initialize the stack
top = -1
def push(x):
    global top
    if top >= size - 1:
        print("Stack Overflow")
    else:
        top = top + 1
        data[top] = x

print(data)
push(5)
print(data)
push(2)
print(data)
push(8)
print(data)
# not able to push last value - overflow
push(4)
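
The cell above covers only push. As a sketch of the remaining key methods, here is a small class-based variant; the name `BoundedStack` and the choice to raise exceptions rather than print a message are assumptions made for this example.

```python
class BoundedStack:
    """Fixed-capacity stack; the top of the stack is the end of the list."""

    def __init__(self, size):
        self.size = size
        self.data = []

    def push(self, x):
        if len(self.data) >= self.size:
            raise OverflowError("Stack Overflow")
        self.data.append(x)

    def pop(self):
        # remove and return the most recently pushed item
        if not self.data:
            raise IndexError("Stack Underflow")
        return self.data.pop()

    def peek(self):
        # look at the top item without removing it
        if not self.data:
            raise IndexError("Stack is empty")
        return self.data[-1]


stack = BoundedStack(3)
for value in (5, 2, 8):
    stack.push(value)

print(stack.peek())  # 8
print(stack.pop())   # 8
print(stack.pop())   # 2
```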

### Queues  
- data is stored sequentially like lists and arrays, but access is restricted by how data is added and removed
- a queue works on the first in, first out (FIFO) principle
- Key methods:  Queue() (constructor), enqueue(item), size(), dequeue()
- a queue can be built on top of a list or a linked list
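
A minimal queue sketch with the key methods above. It is built on `collections.deque` rather than a plain list, which is a substitution made here because removing from the front of a list is O(n); the class name `Queue` is illustrative.

```python
from collections import deque


class Queue:
    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        # new items join at the back
        self._items.append(item)

    def dequeue(self):
        # the oldest item leaves from the front
        if not self._items:
            raise IndexError("dequeue from an empty queue")
        return self._items.popleft()

    def size(self):
        return len(self._items)


q = Queue()
q.enqueue("packt")
q.enqueue("publishing")
print(q.size())      # 2
print(q.dequeue())   # packt
print(q.size())      # 1
```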
 

### Trees  
- hierarchical, so very different from the linear layout of lists, queues, and stacks
- non-linear, with parent-child relationships between nodes
- a tree can be traversed depth-first (in-order, pre-order, or post-order) or breadth-first (level order)

In [None]:
class Node:
    def __init__(self, data):
        self.data = data
        self.right_child = None
        self.left_child = None

In [None]:
n1 = Node("root node")
n2 = Node("left child node")
n3 = Node("right child node")
n4 = Node("left grandchild node")

n1.left_child = n2
n1.right_child = n3
n2.left_child = n4

In [None]:
current = n1
while current:
    print(current.data)
    current = current.left_child

In [None]:
# recursive inorder traverse
def inorder(root_node):
    current = root_node
    if current is None:
        return
    inorder(current.left_child)
    print(current.data)
    inorder(current.right_child)
inorder(n1)

In [None]:
# recursive pre-order traverse
def preorder(root_node):
    current = root_node
    if current is None:
        return
    print(current.data)
    preorder(current.left_child)
    preorder(current.right_child)
preorder(n1)

In [None]:
# recursive post-order traverse
def postorder(root_node):
    current = root_node
    if current is None:
        return
    postorder(current.left_child)
    postorder(current.right_child)
    print(current.data)
postorder(n1)

In [None]:
# breadth-first traversal
from collections import deque
class Node:
    def __init__(self, data):
        self.data = data
        self.right_child = None
        self.left_child = None
        
n1 = Node("root node")
n2 = Node("left child node")
n3 = Node("right child node")
n4 = Node("left grandchild node")
n1.left_child = n2
n1.right_child = n3
n2.left_child = n4
 
def level_order_traversal(root_node):
    list_of_nodes = []
    traversal_queue = deque([root_node])
    while len(traversal_queue) > 0:
        node = traversal_queue.popleft()
        list_of_nodes.append(node.data)
        if node.left_child:
            traversal_queue.append(node.left_child)
        # checked independently, so a right child is visited even when there is no left child
        if node.right_child:
            traversal_queue.append(node.right_child)
    return list_of_nodes
print(level_order_traversal(n1))

#### Expression Trees

#### Binary search trees

#### Heaps
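- A heap is a tree-based data structure in which every node satisfies the heap property relative to its children: in a min heap each parent is less than or equal to its children, and in a max heap each parent is greater than or equal to its children. The root therefore always holds the smallest (or largest) value.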
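
A short sketch using the standard-library `heapq` module, which keeps a min heap inside an ordinary list. The input values are arbitrary. Popping every item in turn returns them in ascending order, which is the core idea behind heap sort.

```python
import heapq

values = [5, 7, 9, 1, 3]

heap = []
for v in values:
    heapq.heappush(heap, v)   # each push restores the heap property

smallest = heap[0]            # the root of a min heap is the smallest item
print(smallest)               # 1

# In the list layout, the parent of index i sits at index (i - 1) // 2
heap_ok = all(heap[(i - 1) // 2] <= heap[i] for i in range(1, len(heap)))
print(heap_ok)                # True

# Repeatedly removing the root yields the items in sorted order
ordered = [heapq.heappop(heap) for _ in range(len(heap))]
print(ordered)                # [1, 3, 5, 7, 9]
```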

#### Heap Sort

#### Priority Queues
- a priority queue is similar to a queue, but each data element carries a priority and retrieval follows priority rather than arrival order: elements with the highest priority are retrieved before lower-priority elements, and elements with equal priority are retrieved in First In, First Out (FIFO) order.
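
One possible sketch of this behavior on top of `heapq`. Two assumptions are baked in: a smaller number means a higher priority, and an insertion counter breaks ties so that equal priorities come out in FIFO order. The class name and the task strings are invented for the example.

```python
import heapq
import itertools


class PriorityQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # increasing sequence number for tie-breaking

    def push(self, item, priority):
        # Entries compare by priority first, then by arrival order
        heapq.heappush(self._heap, (priority, next(self._counter), item))

    def pop(self):
        priority, _, item = heapq.heappop(self._heap)
        return item

    def __len__(self):
        return len(self._heap)


pq = PriorityQueue()
pq.push("write report", priority=2)
pq.push("fix outage", priority=1)
pq.push("reply to email", priority=2)

served = [pq.pop() for _ in range(len(pq))]
print(served)   # ['fix outage', 'write report', 'reply to email']
```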

#### Hash Tables

#### Graphs and Algorithms
- breadth-first search (BFS)
- depth-first search (DFS)
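
A compact sketch of both traversals on a small adjacency-list graph. The graph and the function names are made up for illustration; BFS uses a queue and DFS uses a stack, and that is the only structural difference between them.

```python
from collections import deque


def bfs(graph, start):
    """Visit nodes level by level, nearest to the start first."""
    order = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


def dfs(graph, start):
    """Follow one branch as deep as possible before backtracking."""
    order = []
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # push in reverse so the first-listed neighbor is explored first
        for neighbor in reversed(graph[node]):
            if neighbor not in seen:
                stack.append(neighbor)
    return order


graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D'],
    'C': ['A', 'D'],
    'D': ['B', 'C', 'E'],
    'E': ['D'],
}

print(bfs(graph, 'A'))   # ['A', 'B', 'C', 'D', 'E']
print(dfs(graph, 'A'))   # ['A', 'B', 'D', 'C', 'E']
```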

- Minimum Spanning Tree (MST)
- Kruskal's Minimum Spanning Tree
- Prim's Minimum Spanning Tree

#### Searching
- Linear search
- Unordered linear search
- Jump Search
- Interpolation search
- Exponential search
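
A sketch of two entries from this list. Linear search works on any list, sorted or not, while jump search assumes the list is sorted in ascending order. The block size of roughly the square root of the length is the usual choice, and the function names are illustrative.

```python
import math


def linear_search(items, key):
    """Check every item in turn: O(n), works on unordered data."""
    for index, value in enumerate(items):
        if value == key:
            return index
    return None


def jump_search(sorted_items, key):
    """Jump ahead block by block, then scan one block: O(sqrt(n))."""
    n = len(sorted_items)
    if n == 0:
        return None
    step = max(1, math.isqrt(n))
    start = 0
    # skip whole blocks whose last element is still smaller than the key
    while start < n and sorted_items[min(start + step, n) - 1] < key:
        start += step
    # linear scan inside the one block that could contain the key
    for index in range(start, min(start + step, n)):
        if sorted_items[index] == key:
            return index
    return None


arr = [4, 6, 9, 13, 14, 18, 21, 24, 38]
print(linear_search(arr, 18))   # 5
print(jump_search(arr, 18))     # 5
print(jump_search(arr, 5))      # None
```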


Choosing a search algorithm

#### Sorting Algorithms
- bubble sort algorithm
- selection sort algorithm
- quicksort algorithm
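
Minimal sketches of the three algorithms, each returning a new sorted list and leaving its input untouched. The quicksort shown is a simplified list-building variant chosen for readability, not the in-place partitioning version.

```python
def bubble_sort(items):
    """Repeatedly swap adjacent out-of-order pairs: O(n^2)."""
    result = list(items)
    for end in range(len(result) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if result[i] > result[i + 1]:
                result[i], result[i + 1] = result[i + 1], result[i]
                swapped = True
        if not swapped:   # no swaps means the list is already sorted
            break
    return result


def selection_sort(items):
    """Move the smallest remaining item to the front each pass: O(n^2)."""
    result = list(items)
    for i in range(len(result) - 1):
        smallest = i
        for j in range(i + 1, len(result)):
            if result[j] < result[smallest]:
                smallest = j
        result[i], result[smallest] = result[smallest], result[i]
    return result


def quicksort(items):
    """Partition around a pivot, then sort each side: O(n log n) on average."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quicksort(smaller) + equal + quicksort(larger)


data = [10, 9, 3, 5, 2, 1]
print(bubble_sort(data))      # [1, 2, 3, 5, 9, 10]
print(selection_sort(data))   # [1, 2, 3, 5, 9, 10]
print(quicksort(data))        # [1, 2, 3, 5, 9, 10]
```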

#### Selection Algorithms
- deterministic selection

#### String Matching Algorithms
- Brute force
- Knuth-Morris-Pratt (KMP) algorithm
- Rabin-Karp algorithm
- Boyer-Moore algorithm
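
A sketch of the first two approaches, each returning the index of the first match or -1. The function names are illustrative. The brute-force version re-compares from scratch at every position, while KMP precomputes a prefix table so it never moves backwards in the text.

```python
def brute_force_search(text, pattern):
    """Try every starting position: O(n * m) in the worst case."""
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):
        if text[start:start + m] == pattern:
            return start
    return -1


def build_prefix_table(pattern):
    """table[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_search(text, pattern):
    """Knuth-Morris-Pratt: O(n + m), reusing what was already matched."""
    if not pattern:
        return 0
    table = build_prefix_table(pattern)
    k = 0   # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]   # fall back to the longest reusable prefix
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


text = "abxabcabcaby"
print(brute_force_search(text, "abcaby"))   # 6
print(kmp_search(text, "abcaby"))           # 6
print(build_prefix_table("abcaby"))         # [0, 0, 0, 1, 2, 0]
```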