# *Data Structures and Algorithms in Python*, Goodrich, Tamassia, Goldwasser

## 1. Python Primer

### Miscellaneous notes:

* The keys of ```dict``` preserve insertion order: an implementation detail of CPython 3.6, guaranteed by the language since Python 3.7
* > The ```and``` and ```or``` operators *short-circuit*, in that they do not evaluate the second
operand if the result can be determined based on the value of the first operand
* ```a is b``` checks whether identifiers ```a``` and ```b``` are aliases of the *same* object, whereas ```a == b``` checks whether the objects identified by both identifiers are deemed equivalent
* Operators can be used with sets: ```<```/```<=``` (proper subset/subset), ```|``` (union), ```&``` (intersection), ```^``` (symmetric difference) and ```-``` (difference)
* ```list += [4, 5]``` extends ```list``` in place, whereas ```list = list + [4, 5]``` reassigns ```list``` to a new list
* Iterable vs iterators vs generators
    * Iterable: objects with an ```__iter__()``` method that returns an iterator (or a ```__getitem__()``` method implementing the sequence protocol)
    * Iterators: objects with a current state + both an ```__iter__()``` (generally returning self) and a ```__next__()``` method
    * Generators: functions returning a generator iterator (with keyword ```yield```)
* Create 2-dimensional arrays: `[[0]*3 for _ in range(3)]`
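
A few of the notes above can be checked directly in an interpreter. The snippet below is only a minimal sketch; all variable names are illustrative.

```python
# Identity vs. equality
a = [1, 2, 3]
b = a            # alias: same object
c = list(a)      # copy: equal, but a distinct object
print(a is b, a == c, a is c)   # True True False

# In-place extension vs. rebinding
alias = a
a += [4, 5]          # extends the existing list, so `alias` sees the change
print(alias)         # [1, 2, 3, 4, 5]
a = a + [6]          # builds a new list and rebinds `a`; `alias` is untouched
print(alias, a is alias)   # [1, 2, 3, 4, 5] False

# Set operators
s, t = {1, 2, 3}, {3, 4}
print(s | t == {1, 2, 3, 4}, s & t == {3}, s - t == {1, 2}, s ^ t == {1, 2, 4})
print({1, 2} <= s, {1, 2} < {1, 2})   # True False
```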

### Generators

An interesting example with multiple ```yield``` statements:

In [5]:
def factors(n):
    k=1
    while k * k < n:
        if n % k ==0:
            yield k
            yield n//k
        k += 1
    if k * k == n:
        yield k

list(factors(100))

[1, 100, 2, 50, 4, 25, 5, 20, 10]

### Comprehension

Not for lists only!

In [6]:
# Comprehensions producing the squares of 1..n:

n = 100

lc = [k*k for k in range(1, n+1)] # List comprehension
sc = {k*k for k in range(1, n+1)} # Set comprehension
gc = (k*k for k in range(1, n+1)) # Generator expression (lazy, a.k.a. generator comprehension)
dc = {k: k*k for k in range(1, n+1)}  # Dictionary comprehension

### ```dict``` tips

* ```get()```: ```a_dict.get(key, value)``` returns ```a_dict[key]``` if key exists, else it returns ```value```

* ```setdefault()```: ```a_dict.setdefault(key, value)``` returns ```a_dict[key]``` if key exists, else it sets ```a_dict[key]``` to ```value``` and returns ```value```. 
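
The difference between the two methods is easiest to see side by side. A minimal sketch, where the `inventory` dictionary is made up for illustration:

```python
inventory = {"apple": 3}

# get() never modifies the dictionary
print(inventory.get("apple", 0))   # 3
print(inventory.get("pear", 0))    # 0
print("pear" in inventory)         # False

# setdefault() inserts the default when the key is missing
print(inventory.setdefault("pear", 10))    # 10
print(inventory.setdefault("apple", 99))   # 3 (the existing value is kept)
print(inventory)                           # {'apple': 3, 'pear': 10}
```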

### ```collections.defaultdict```

Useful to set default values to *any* key, e.g. ```defaultdict(list)```
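
As an illustration, grouping words by their first letter becomes a one-liner inside the loop. This is a sketch with made-up data; note that merely reading a missing key also inserts it.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]
by_initial = defaultdict(list)   # missing keys start as an empty list
for word in words:
    by_initial[word[0]].append(word)

print(dict(by_initial))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```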

### ```collections.Counter```

Counter is a subclass of dict that uses 0 as the default value for any missing element.

```python
from collections import Counter
words = "Count these words and words".split()
counts = Counter(words)
counts.most_common(2)
```

### ```itertools```

* ```itertools.permutations(say_a_list, r=2)```: returns all ordered pairs of elements of ```say_a_list``` (order matters)
* ```itertools.combinations(say_a_list, r=3)```: returns all triplets of elements of ```say_a_list```, regardless of order
* Cartesian product: ```itertools.product```
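
A small sketch of the three functions on a toy list (pairs are used throughout to keep the output short):

```python
from itertools import combinations, permutations, product

letters = ["a", "b", "c"]
print(list(permutations(letters, 2)))
# [('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]
print(list(combinations(letters, 2)))
# [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(list(product([0, 1], repeat=2)))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
```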

# 2. OOP

General principles:
* Modularity (decomposition in separate functional units)
* Abstraction, e.g. Abstract Data Types (ADT), which specify **what** operations do, not **how**
* Encapsulation

*Getters*/*Setters* are not really considered pythonic. Use ```@property```, ```@attribute.setter``` and ```@attribute.deleter``` instead:

```python

# circle.py
class Circle:
    
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        """The radius property."""
        print("Get radius")
        return self._radius

    @radius.setter
    def radius(self, value):
        print("Set radius")
        self._radius = value

    @radius.deleter
    def radius(self):
        print("Delete radius")
        del self._radius
```

* 'Magic' a.k.a *dunder* methods: if ```__str__``` is not overridden, then it is the same as ```__repr__``` by default
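
A quick way to see this fallback, using a hypothetical `Point` class that only defines ```__repr__```:

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return f"Point({self.x}, {self.y})"

p = Point(1, 2)
print(repr(p))  # Point(1, 2)
print(str(p))   # Point(1, 2), because str() falls back to __repr__
```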

## Testing:

* *Stubbing*: in a top-down approach, replace the output of a function B called inside function A by a fixed value.
* Unit testing is actually a bottom-up strategy
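
One possible way to stub a dependency is to pass it as a parameter. The sketch below is an assumption about how this could look, not the textbook's code: `fetch_temperature` plays the role of function B and `describe_weather` the role of function A, both hypothetical.

```python
def fetch_temperature(city):
    """Hypothetical dependency (function B), not written yet."""
    raise NotImplementedError("not written yet")

def describe_weather(city, fetch=fetch_temperature):
    """Function A: its logic can be tested before B exists."""
    temperature = fetch(city)
    return "warm" if temperature >= 20 else "cold"

# Stubs: stand-ins for B that return fixed values
assert describe_weather("Paris", fetch=lambda city: 25) == "warm"
assert describe_weather("Oslo", fetch=lambda city: 5) == "cold"
```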

## Miscellaneous

```__slots__```: see [here](https://stackoverflow.com/questions/472000/usage-of-slots)
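
In short, ```__slots__``` replaces the per-instance ```__dict__``` with a fixed set of attributes. A minimal sketch with two hypothetical classes:

```python
class PointWithDict:
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointWithSlots:
    __slots__ = ("x", "y")   # only these attributes may exist

    def __init__(self, x, y):
        self.x, self.y = x, y

p, q = PointWithDict(1, 2), PointWithSlots(1, 2)
p.z = 3                      # fine: stored in p.__dict__
try:
    q.z = 3                  # no __dict__, so unknown attributes are rejected
except AttributeError as error:
    print("AttributeError:", error)
print(hasattr(p, "__dict__"), hasattr(q, "__dict__"))  # True False
```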

# 3. Complexity

## Prefix averages

In [7]:
# Quadratic complexity

def prefix_average_1_0(S):
    averages = []
    for j in range(len(S)):
        total = 0
        for i in range(j+1):
            total += S[i]
        averages.append(total/(j+1))
    return averages

def prefix_average_1_1(S):
    averages = []
    for j in range(len(S)):
        averages.append(sum(S[:j+1])/(j+1))
    return averages

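# Linear complexity: the running total is reused from one prefix to the next
def prefix_average_2(S):
    averages = []
    total = 0  # Avoid shadowing the built-in `sum`
    for j in range(len(S)):
        total += S[j]
        averages.append(total/(j+1))
    return averages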

prefix_average_1_0([1, 1, 2]), prefix_average_1_1([1, 1, 2]), prefix_average_2([1, 1, 2]), 

([1.0, 1.0, 1.3333333333333333],
 [1.0, 1.0, 1.3333333333333333],
 [1.0, 1.0, 1.3333333333333333])

In [8]:
# Three-way set disjointness, cubic complexity: every (a, b, c) triple is tested
def disjoint_1(A,B, C):
    for a in A:
        for b in B:
            for c in C:
                if a == b == c:
                    return False
    return True

# Quadratic complexity (if each sequence has distinct elements): C is only scanned when a == b
def disjoint_2(A,B, C):
    for a in A:
        for b in B:
            if a == b:
                for c in C:
                    if a == c:
                        return False
    return True

disjoint_1((1,2,3), (4,1,6), (1,8,9)), disjoint_2((1,2,3), (4,1,6), (1,8,9))

(False, False)

## Amortized complexity vs average-case complexity

* Amortized complexity considers the total cost of a sequence of n operations and averages it over the sequence, rather than multiplying the worst-case cost of a single operation by n: see [Wikipedia](https://en.wikipedia.org/wiki/Amortized_analysis) or [Stack Overflow](https://stackoverflow.com/questions/7333376/difference-between-average-case-and-amortized-analysis)

* Average-case complexity considers all possible inputs and makes an assumption about their distribution
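
To make the amortized idea concrete, the sketch below counts the element copies performed by n appends to a dynamic array that doubles its capacity when full. This is a simplified model, not CPython's actual growth policy, and `total_copy_cost` is a hypothetical helper. A single append can cost O(n) copies, yet the total stays below 2n, i.e. O(1) amortized per append.

```python
def total_copy_cost(n, initial_capacity=1):
    """Count element copies made by n appends to a doubling array."""
    capacity, size, copies = initial_capacity, 0, 0
    for _ in range(n):
        if size == capacity:      # full: allocate double and copy everything
            copies += size
            capacity *= 2
        size += 1
    return copies

for n in (10, 1_000, 100_000):
    copies = total_copy_cost(n)
    print(f"n={n}: {copies} copies, {copies / n:.2f} per append")
```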

# 4. Recursion

It is important to ```return``` the result of the recursive call, otherwise the first call ends up returning ```None```.
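
A minimal illustration of this pitfall, with two hypothetical functions that differ only by the ```return``` keyword:

```python
def countdown_broken(n):
    if n == 0:
        return "done"
    countdown_broken(n - 1)          # result is discarded

def countdown_fixed(n):
    if n == 0:
        return "done"
    return countdown_fixed(n - 1)    # result is propagated to the caller

print(countdown_broken(3))  # None
print(countdown_fixed(3))   # done
```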

In [9]:
# Sequential Search vs Binary Search

def sequential_search(data, target):
    i = 0
    while i < len(data):
        if data[i] == target:
            return i
        i += 1
    return -1

def binary_search(data_ordered, target, low = None, high = None):
    '''
    Return -1 if `target` is not in the ordered list of unique elements `data_ordered`, else the index of `target` in `data_ordered`
    '''
    
    if low is None: # no need to test both `low` and `high`, neither are None or they are both None
        low, high = 0, len(data_ordered)-1
    if low > high: # Works for empty lists as well
        return -1
    else:
        mid = (low + high) // 2
        if data_ordered[mid] == target:
            return mid
        elif data_ordered[mid] < target:
            return binary_search(data_ordered, target, low = mid + 1, high = high)
        else:
            return binary_search(data_ordered, target, low = low, high = mid - 1)

binary_search([1,2,3], 1), binary_search([1,2,3], 2), binary_search([1,2,3], 3), binary_search([1,2,3], 4), binary_search([1,2,3], -2), binary_search([42], 42), binary_search([], 42)

(0, 1, 2, -1, -1, 0, -1)

In [10]:
import os

def disk_usage(path=None):
    if path is None:  # Resolve the default at call time, not at definition time
        path = os.getcwd()
    total = os.path.getsize(path)
    if os.path.isdir(path):
        for filename in os.listdir(path):
            total += disk_usage(os.path.join(path, filename))
    print(f'Size: {total:0.0f} | {path}')
    return total

## Tail recursion to non-recursive

*Not covered*

### Binary search

*Not covered*

# 5. Arrays

In [11]:
# Insertion sort

def insertion_sort(A):
    for i in range(1, len(A)):
        current = A[i] # A[i] is "free" now
        j = i-1
        while j >= 0 and A[j] > current:
            A[j+1] = A[j] # Shift A[j] on the "right", A[i] is saved in `current` anyway
            j -= 1
        # Exit the loop when A[j] <= current
        # `current` should now be at A[j+1]
        # What about value at A[j+1]? It was pushed to A[j+2] in the while loop
        A[j+1] = current 
    return A

In [12]:
# Caesar cipher

class CaesarCipher:

    LENGTH_ALPHABET = ord('Z') - ord('A') + 1

    def __init__(self, key=0):
        self.key = key    

    # Helper functions to encrypt/decrypt characters
    @staticmethod
    def encrypt_character(char, key):
        return chr(ord('A') + (ord(char) - ord('A') + key) % CaesarCipher.LENGTH_ALPHABET )
    @staticmethod
    def decrypt_character(char, key):
        return chr(ord('A') + (ord(char) - ord('A') - key) % CaesarCipher.LENGTH_ALPHABET )

    # String encryption
    def encrypt(self, string):
        return ''.join([CaesarCipher.encrypt_character(char, self.key) for char in string])
    
    def decrypt(self, string):
        return ''.join([CaesarCipher.decrypt_character(char, self.key) for char in string])

test = CaesarCipher(3)
test.encrypt('ABCYZ'), test.decrypt(test.encrypt('ABCYZ'))



('DEFBC', 'ABCYZ')

## Multidimensional arrays

In [13]:
# Don't
n = 4 # 4 rows
p = 3 # 3 columns

flawed_2d_list = [[0] * p] * n

# Do

correct_2d_list = [[0] * p for i in range(n)]
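
The reason the first form is flawed is that the outer multiplication copies *references* to a single row. A short sketch that makes the aliasing visible:

```python
n, p = 4, 3
flawed = [[0] * p] * n                    # n references to ONE row
correct = [[0] * p for _ in range(n)]     # n independent rows

flawed[0][0] = 1
correct[0][0] = 1
print(flawed)   # [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]
print(correct)  # [[1, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
print(flawed[0] is flawed[1], correct[0] is correct[1])  # True False
```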

# Stacks

## Implementation

Use Python's existing ```list``` class to implement a stack class with the **adapter** design pattern.

In [14]:
class Empty(Exception):
    '''Error: attempting to access an element from an empty data structure.'''
    pass

class Stack:
    def __init__(self):
        self._stack = []

    def is_empty(self):
        return len(self._stack) == 0

    def __len__(self):
        return len(self._stack)

    def __repr__(self):
        return ''.join(
            [str(self._stack[i])+'\n' for i in range(self.__len__()-1, -1, -1)]
            )
    
    def push(self, element):
        self._stack.append(element)

    def pop(self):
        if self.is_empty():
            raise Empty('Empty stack')
        return self._stack.pop()

    def top(self):
        if self.is_empty():
            raise Empty('Empty stack')
        return self._stack[-1]


In [15]:
# Matching delimiters with a stack

def consistent_delim(string):
    '''Returns True if parentheses/brackets/curly braces are consistently opened and closed in ``string``, False otherwise'''

    left_delim = '([{'
    right_delim = ')]}'

    stacked_delim = Stack()
    for char in string:
        if char in left_delim:
            stacked_delim.push(char)
        elif char in right_delim:
            try:
                if left_delim.index(stacked_delim.pop()) != right_delim.index(char):
                    return False
            except Empty:
                return False
    return stacked_delim.is_empty()

In [16]:
consistent_delim('(){}')

True

# Queues

## Implementation

Using ```.pop(0)``` is not an option because it has O(n) complexity: every single element except the one at index 0 has to be shifted to the left individually.

Another option would be to replace dequeued elements with ```None```, to append enqueued elements on 'the right' and to keep track of the index of the current first element. But the underlying list then keeps growing as enqueue/dequeue operations accumulate, even for a not-that-long queue.

The recommended implementation is a circular list: indices wrap around modulo the length of the underlying list.

In [17]:
class Empty(Exception):
    '''Error: attempting to access an element from an empty data structure.'''
    pass

class Queue:

    INITIAL_CAPACITY = 10

    def __init__(self):
        self._data = [None] * Queue.INITIAL_CAPACITY
        self._size = 0
        self._front = 0

    def __len__(self):
        return self._size
    
    def __repr__(self):
        return ', '.join([str(self._data[(self._front + i) % len(self._data)]) for i in range(self._size)])

    def is_empty(self):
        return self._size == 0

    def first(self):
        if self.is_empty():
            raise Empty('Empty queue.')
        else:
            return self._data[self._front]

    def dequeue(self):
        if self.is_empty():
            raise Empty('Empty queue.')
        first, self._data[self._front] = self._data[self._front], None
        self._front = (self._front + 1) % len(self._data)
        self._size -= 1        
        return first
            

    def enqueue(self, element):
        if self._size == len(self._data):
            print('Resizing underlying list...')
            self.resize()
        back = (self._front + self._size) % len(self._data)
        self._data[back] = element
        self._size += 1

    def resize(self):
        new_data = [None] * (2 * len(self._data))
        for i in range(self._size):
            new_data[i] = self._data[(self._front + i) % len(self._data)]
        self._data = new_data
        self._front = 0

test = Queue()
test.enqueue(11)
print(test)
test.enqueue(33)
print(test)
test.enqueue(55)
print(test)
test.dequeue()
print(test)
test.dequeue()
print(test)

for i in range(2, 22, 2):
    test.enqueue(i)
    print(f'Enqueueing {i}: {test}')

test.dequeue()
test.dequeue()
print(f'Finally: {test}')

11
11, 33
11, 33, 55
33, 55
55
Enqueueing 2: 55, 2
Enqueueing 4: 55, 2, 4
Enqueueing 6: 55, 2, 4, 6
Enqueueing 8: 55, 2, 4, 6, 8
Enqueueing 10: 55, 2, 4, 6, 8, 10
Enqueueing 12: 55, 2, 4, 6, 8, 10, 12
Enqueueing 14: 55, 2, 4, 6, 8, 10, 12, 14
Enqueueing 16: 55, 2, 4, 6, 8, 10, 12, 14, 16
Enqueueing 18: 55, 2, 4, 6, 8, 10, 12, 14, 16, 18
Resizing underlying list...
Enqueueing 20: 55, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20
Finally: 4, 6, 8, 10, 12, 14, 16, 18, 20


# Linked List

## Singly Linked Lists

### Implementation

In [18]:
class LinkedList:

    # Node 'private' class:
    class _Node:
        def __init__(self, element, next = None):
            self._element = element
            self._next = next

        @property
        def element(self):
            return self._element

        @property
        def next_node(self):
            return self._next

        @next_node.setter
        def next_node(self, node):
            self._next = node
        
        def __repr__(self):
            return str(self.element)

    # Linked list:
    def __init__(self, values = None, side = 'head'):
        self._size = 0
        self._head = None
        self._tail = None

        if values is not None:
            self.add(values, side=side)

    def __iter__(self):
        node = self._head
        while node:
            yield node.element
            node = node.next_node
    
    def __len__(self):
        return self._size

    def __repr__(self):
        return ' -> '.join([str(element) for element in self])

    def is_empty(self):
        return self._size == 0

    @property
    def values(self):
        return [element for element in self]

    def _add(self, element, side = 'head'):
        if side == 'head' or side == 'tail':
            if self.is_empty():
                self._head = LinkedList._Node(element)
                self._tail = self._head
                self._size += 1
            else:
                if side == 'head':
                    self._head = LinkedList._Node(element, self._head)
                    self._size += 1            
                elif side == 'tail':
                    self._tail.next_node = LinkedList._Node(element)
                    self._tail = self._tail.next_node
                    self._size += 1
        else:
            raise Exception("""Value of `side` parameter is either 'tail' or 'head'""")
    
    def add(self, elements, side = 'head'):
        if not isinstance(elements, (list, tuple, dict, set)):
            elements = elements,
        for e in elements:
            self._add(e, side = side)
    
    def remove(self, side = 'head'):
        if self.is_empty():
            raise Exception("""The linked list is empty: no head to remove.""")
        if side == 'head':
            self._head = self._head.next_node
            self._size -= 1
            if self.is_empty(): # Handle special case: removing from a 1-element list
                self._tail = None
        elif side == 'tail':
            print('Tail not removed!\
                Removing the tail is not implemented in this singly linked list class,\
                because this operation would require the traversal of the entire singly linked list.\
                \nSee the doubly linked lists implementation')
        else:
            raise Exception("""Value of `side` parameter is either 'tail' or 'head'""")
    
    def head(self):
        return self._head.element

test = LinkedList()
test.add(1)
test.add([2, 3])
test.add(0)
test.remove()
test

3 -> 2 -> 1

### Implementing a stack with a Singly Linked List

In this case, all operations have a worst-case O(1) running time, whereas the first implementation relying on the standard ```list``` class has amortized O(1) complexity.

In [19]:
class Empty(Exception):
    '''Error: attempting to access an element from an empty data structure.'''
    pass

class Stack2:
    def __init__(self):
        self._stack = LinkedList()

    def is_empty(self):
        return self._stack.is_empty()

    def __len__(self):
        return len(self._stack)

    def __repr__(self):
        return ''.join(
            [str(value)+'\n' for value in self._stack.values]
            )
    
    def push(self, element):
        self._stack.add(element)

    def pop(self):
        if self.is_empty():
            raise Empty('Empty stack')
        top = self.top()
        self._stack.remove()
        return top

    def top(self):
        if self.is_empty():
            raise Empty('Empty stack')
        return self._stack.head()

In [20]:
test = Stack2()
len(test)
test.push(3)
test.push(2)
test.push(1)
print(f'Stack:\n{test}')
print(f'.pop(): {test.pop()}, .top(): {test.top()}\n')
print(f'Stack:\n{test}')

Stack:
1
2
3

.pop(): 1, .top(): 2

Stack:
2
3



Similarly, a queue with worst-case O(1) time complexity for all operations can be implemented with a Singly Linked List.
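
A possible sketch of such a queue is given below. It is written from scratch rather than on top of the `LinkedList` class above, and the names `LinkedQueue` and `_Node` are choices made for this example: elements are enqueued at the tail and dequeued at the head, so no traversal is ever needed.

```python
class Empty(Exception):
    """Raised when accessing an element of an empty container."""

class LinkedQueue:
    """FIFO queue on a singly linked list: enqueue at the tail, dequeue at the head."""

    class _Node:
        __slots__ = ("element", "next")

        def __init__(self, element, next=None):
            self.element = element
            self.next = next

    def __init__(self):
        self._head = None
        self._tail = None
        self._size = 0

    def __len__(self):
        return self._size

    def is_empty(self):
        return self._size == 0

    def first(self):
        if self.is_empty():
            raise Empty("Queue is empty")
        return self._head.element

    def enqueue(self, element):
        node = self._Node(element)
        if self.is_empty():
            self._head = node          # first node is both head and tail
        else:
            self._tail.next = node
        self._tail = node
        self._size += 1

    def dequeue(self):
        if self.is_empty():
            raise Empty("Queue is empty")
        element = self._head.element
        self._head = self._head.next
        self._size -= 1
        if self.is_empty():            # removed the last node: reset the tail
            self._tail = None
        return element

q = LinkedQueue()
for value in (1, 2, 3):
    q.enqueue(value)
print(q.dequeue(), q.dequeue(), len(q))  # 1 2 1
```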

## Circular Linked Lists

This data structure is used, for example, in [round-robin schedulers](https://en.wikipedia.org/wiki/Round-robin_scheduling), where each process in a queue is given a fixed time slice in turn. In this case, the point is to limit the number of operations:

* With a queue:
    * e = Q.dequeue()
    * Process element e
    * Q.enqueue(e)

&nbsp;

* With a circular linked list:
    * Process element C.front()
    * C.rotate()
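
The rotate-based loop can be tried out with ```collections.deque``` from the standard library, used here only as a stand-in for a circular linked list; the process names are made up.

```python
from collections import deque

processes = deque(["A", "B", "C"])   # hypothetical process names
schedule = []
for _ in range(5):                   # five time slices
    schedule.append(processes[0])    # "process" the element at the front
    processes.rotate(-1)             # the front element moves to the back
print(schedule)  # ['A', 'B', 'C', 'A', 'B']
```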

In [21]:
class CircularQueue:

    class _Node:
        def __init__(self, element = None, next = None) -> None:
            self._element = element
            self._next = next
    
    def __init__(self):
        self._tail = None
        self._size = 0

    def __len__(self):
        return self._size

    def is_empty(self):
        return self._size == 0

    def first(self):
        if self.is_empty():
            raise Empty('Queue is empty')
        return self._tail._next._element

    def dequeue(self):
        if self.is_empty():
            raise Empty('Queue is empty')

        head = self._tail._next
        if self._size == 1:
            self._tail = None
        else:
            self._tail._next = head._next
        self._size -= 1
        return head._element

    def enqueue(self, e):
        new_tail = CircularQueue._Node(e)
        if self.is_empty():
            new_tail._next = new_tail
        else:
            new_tail._next = self._tail._next
            self._tail._next = new_tail
        self._tail = new_tail
        self._size += 1

    def rotate(self):
        if self._size > 1:
            self._tail = self._tail._next

test = CircularQueue()
test.enqueue(1)
test.enqueue(2)
print(test.first())
print(test.dequeue())
test.enqueue(3)
print(test.dequeue())
print(test.first())
print(len(test))

1
1
2
3
1


## Doubly Linked Lists

> We can efficiently insert a node at either end of a singly linked list, and can delete a node at the head of a list, but we are unable to efficiently delete a node at the tail of the list. More generally, we cannot efficiently delete an arbitrary node from an interior position of the list if only given a reference to that node, because we cannot determine the node that immediately precedes the node to be deleted (yet, that node needs to have its next reference updated).

> Doubly Linked Lists allow a greater variety of O(1)-time update operations, including insertions/deletions at arbitrary positions.

### Implementation

Use *dummy nodes* a.k.a *sentinels* a.k.a *header*/*trailer* nodes to avoid special cases at the boundaries of the doubly linked list.

In [22]:
class _DoublyLinkedList:
    class _Node:
        def __init__(self, element, prev = None, next = None):
            self._element = element
            self._prev = prev
            self._next = next
        
    def __init__(self):
        self._header = self._Node(None)
        self._trailer = self._Node(None)
        self._header._next = self._trailer
        self._trailer._prev = self._header

        self._size = 0

    def __len__(self):
        return self._size

    def is_empty(self):
        return self._size == 0

    def _insert_between(self, e, predecessor, successor):
        new_node = self._Node(e, predecessor, successor)
        predecessor._next = new_node
        successor._prev = new_node
        self._size += 1
        return new_node

    def _delete_node(self, node):
        node._prev._next = node._next
        node._next._prev = node._prev
        self._size -= 1
        element = node._element
        node._prev = node._next = node._element = None
        return element

Just as the Singly Linked List was a relevant base for implementing a queue, Doubly Linked Lists are a good base to implement **Deques**.

In [23]:
class LinkedDeque(_DoublyLinkedList):
    def first(self):
        if self.is_empty():
            raise Empty('Deque is empty.')
        return self._header._next._element

    def last(self):
        if self.is_empty():
            raise Empty('Deque is empty.')
        return self._trailer._prev._element

    def insert_first(self, e):
        self._insert_between(e, self._header, self._header._next)

    def insert_last(self, e):
        self._insert_between(e, self._trailer._prev, self._trailer)

    def delete_first(self):
        if self.is_empty():
            raise Empty('Deque is empty.')
        return self._delete_node(self._header._next)  # Return the removed element

    def delete_last(self):
        if self.is_empty():
            raise Empty('Deque is empty.')
        return self._delete_node(self._trailer._prev)  # Return the removed element

Doubly Linked Lists are also used to implement **Positional Lists**:

In [24]:
# _Node attributes: element, prev, next
# _DoublyLinkedList attributes: header, trailer, size
# _DoublyLinkedList methods: len, is_empty, insert_between(e, pred, succ), delete_node(node)

class PositionalList(_DoublyLinkedList):

    class Position:
        def __init__(self, container, node):
            self._container = container
            self._node = node

        def element(self):
            return self._node._element
        
        def __eq__(self, other):
            # Checking type of arguments ensures a position and a node cannot be equal
            return type(other) is type(self) and other._node is self._node
        
        def __ne__(self, other):
            return not (self == other)
    
    def _validate(self, p):
        if not isinstance(p, self.Position):
            raise TypeError('Argument p must be of type Position.')
        if p._container is not self:
            raise ValueError('Passed argument p does not belong to current container.')
        if p._node._next is None:
            raise ValueError('Passed argument p is no longer valid.')
        return p._node

    def _make_position(self, node):
        if node is self._header or node is self._trailer:
            return None
        else:
            return self.Position(self, node)

    def first(self):
        return self._make_position(self._header._next)

    def last(self):
        return self._make_position(self._trailer._prev)

    def before(self, p):
        input_node = self._validate(p)
        return self._make_position(input_node._prev)

    def after(self, p):
        input_node = self._validate(p)
        return self._make_position(input_node._next)

    def __iter__(self):
        cursor = self.first()
        while cursor is not None:
            yield cursor.element()
            cursor = self.after(cursor)
    
    def __repr__(self):
        elements = []
        for e in self.__iter__():
            elements.append(str(e))
        return '\n'.join(elements)

    # Override inherited version to return Position, rather than Node
    def _insert_between(self, e, predecessor, successor):
        node = super()._insert_between(e, predecessor, successor)
        return self._make_position(node)

    def add_first(self, e):
        return self._insert_between(e, self._header, self._header._next)

    def add_last(self, e):
        return self._insert_between(e, self._trailer._prev, self._trailer)

    def add_before(self, p, e):
        node = self._validate(p)
        return self._insert_between(e, node._prev, node)

    def add_after(self, p, e):
        node = self._validate(p)
        return self._insert_between(e, node, node._next)

    def delete(self, p):
        return self._delete_node(self._validate(p))

    def replace(self, p, e):
        original = self._validate(p)
        old_value = original._element
        original._element = e
        return old_value

In [25]:
L = PositionalList()
L.add_last(7)
L.add_last(2)
L.add_last(3)
L

7
2
3

In [26]:
L.before(L.first())

## Insertion sort with a Positional List

In [27]:
def insertion_sort_vlg(L): # 15 lines
    if not L.is_empty(): # Edge case: empty list
        current_position = L.after(L.first()) # Initialize sorting on second element of the list
        while current_position is not None: # Loop till end of positional list (Edge case ok: 1-element list)
            current_element = current_position.element()
            walk = current_position
            if L.before(walk).element() <= current_element: # No need to move current element
                current_position = L.after(current_position) # Go to next position
            else:
                while walk != L.first() and L.before(walk).element() > current_element: # Shift leftward
                    walk = L.before(walk)
                L.add_before(walk, current_element)
                # The marker/pivot variables below avoid the following lines
                next_position = L.after(current_position)# Delete current position and shift to the next one
                L.delete(current_position)
                current_position = next_position

L = PositionalList()
L.add_last(7)
L.add_last(2)
L.add_last(3)
print('Not sorted:')
print(L)
insertion_sort_vlg(L)
print('\nSorted:')
print(L)           


Not sorted:
7
2
3

Sorted:
2
3
7


In [28]:
# The textbook uses 3 positions instead of 2: marker, walk and pivot
# Resulting code is slightly more concise and readable

def insertion_sort_book(L): # 13 lines
    # Edge case: empty list
    if len(L) > 1:
        marker = L.first()
        while marker != L.last(): # marker is not 'moved': It becomes the pivot on its right, or it's the pivot that moves leftward
            pivot = L.after(marker)
            if pivot.element() > marker.element():
                marker = pivot
            else:
                walk = marker
                while walk != L.first() and L.before(walk).element() > pivot.element():
                    walk = L.before(walk)
                L.add_before(walk, L.delete(pivot)) # pretty cool

L = PositionalList()
L.add_last(7)
L.add_last(2)
L.add_last(3)
print('Not sorted:')
print(L)
insertion_sort_book(L)
print('\nSorted:')
print(L) 

Not sorted:
7
2
3

Sorted:
2
3
7


## Linked lists *vs* Array lists

Array lists' advantages:

* Access to specific elements in O(1) time, thanks to the index (whereas O(k)/O(n-k) for linked lists)
* At most *2n* object references (if the array has just been resized), at least *2n* for linked lists (*3n* if doubly)

Linked lists' advantages:

* O(1) insertion/deletion at any position, given a reference to the node (inserting at the first position has O(n) complexity with arrays)

* Provides worst-case bounds instead of amortized bounds

# Trees

*Ordered trees*: a tree is ordered when the children of every node are ordered.

## Depth/Height

### Depth

The depth of a node is the number of its ancestors (excluding the node itself).

<pre>
a
├── b
├── c
    ├── d
    ├── e
</pre>  

*The depth of node d is 2.*

The depth can be computed recursively:

In [29]:
def depth(node):
    if node.is_root():
        return 0
    else:
        return 1 + depth(node.parent())

```depth()``` has an O(depth(node)) time complexity for a given node. Hence the worst-case time complexity is O(n), since a one-branch tree would yield a leaf with depth n-1.

### Height

The height of a **node** is defined as:
* height(node) = 1 + max({height(child) | child ∈ children of node}) if the node is not a leaf
* height(node) = 0 if the node is a leaf

The height of a **tree** is the height of its root. Note that height(T) = max({depth(L) | L ∈ Leaves of T})

In [33]:
def height_suboptimal(T):
    return max((depth(node) for node in T.nodes() if node.is_leaf()))  # height(T) = max depth of the leaves

The above algorithm runs in O(n<sup>2</sup>) worst-case time: listing the nodes can be done in O(n) worst-case time and computing the depth of all the leaves has an O(n<sup>2</sup>) worst-case time complexity (see page 309 and [here](https://cs.stackexchange.com/questions/87336/maximum-sum-of-depths-of-all-external-nodes-in-a-binary-tree) for details).

Example of a tree for which computing the depths of all the leaves takes n<sup>2</sup>/4 = O(n<sup>2</sup>) steps, since each of the n/2 leaves b<sub>i</sub> has depth n/2:

<pre>
a<sub>1</sub>
├── a<sub>2</sub>
    ...
        ├── a<sub>n/2</sub>
            ├── b<sub>1</sub>
            ├── b<sub>2</sub>
            ...
            ├── b<sub>n/2</sub>
</pre>

On the other hand, using the recursive definition of the height yields a O(n) worst-case running time:

In [31]:
def height_optimal(node):
    if node.is_leaf():
        return 0
    else:
        return 1 + max((height_optimal(child) for child in node.children()))
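
The two definitions of the height can be compared on the small example tree from the Depth section. The `Node` class below is a hypothetical minimal stand-in for the tree API assumed in these notes:

```python
class Node:
    """Hypothetical minimal tree node, just enough to run the definitions."""
    def __init__(self, name, parent=None):
        self.name, self.parent, self.children = name, parent, []
        if parent is not None:
            parent.children.append(self)

def depth(node):
    return 0 if node.parent is None else 1 + depth(node.parent)

def height(node):
    if not node.children:                       # leaf
        return 0
    return 1 + max(height(child) for child in node.children)

def leaves(node):
    if not node.children:
        yield node
    for child in node.children:
        yield from leaves(child)

# Example tree: a has children b and c; c has children d and e
a = Node("a")
b = Node("b", a)
c = Node("c", a)
d = Node("d", c)
e = Node("e", c)

print(depth(d))                                 # 2
print(height(a))                                # 2
print(max(depth(leaf) for leaf in leaves(a)))   # 2
```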


## Binary trees

A binary tree is an **ordered** tree with **at most** 2 children per node, labeled left child/right child. The order is left child then right child.

A binary tree is said to be **proper** if each node has either 0 or 2 children.

A **level** d is the set of all nodes that have a depth d. A given level d has at most 2<sup>d</sup> nodes.

## Tree Traversal Algorithms

In [32]:
### Preorder traversal
def preorder(T, p):
    T.visit(p)
    for c in T.children(p):
        preorder(T, c)

### Postorder traversal
def postorder(T, p):
    for c in T.children(p):
        postorder(T, c)
    T.visit(p)

### Breadth-first traversal
def breadthfirst(T):
    Q = Queue()
    Q.enqueue(T.root())
    while not Q.is_empty():
        p = Q.dequeue()
        T.visit(p)
        for c in T.children(p):
            Q.enqueue(c)

### Inorder traversal
# Specific to binary trees
def inorder(T, p):
    if T.left(p) is not None:
        inorder(T, T.left(p))
    T.visit(p)
    if T.right(p) is not None:
        inorder(T, T.right(p))
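
To see the four visiting orders on concrete data, here is a self-contained sketch. It does not use the tree API assumed above: `BinaryNode` is a hypothetical minimal class, and the traversals are written as generators over nodes.

```python
from collections import deque

class BinaryNode:
    """Hypothetical node type, just enough for the demonstration."""
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def preorder(node):
    if node is not None:
        yield node.value
        yield from preorder(node.left)
        yield from preorder(node.right)

def inorder(node):
    if node is not None:
        yield from inorder(node.left)
        yield node.value
        yield from inorder(node.right)

def postorder(node):
    if node is not None:
        yield from postorder(node.left)
        yield from postorder(node.right)
        yield node.value

def breadth_first(root):
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node.value
        queue.extend(child for child in (node.left, node.right) if child is not None)

#       4
#      / \
#     2   6
#    / \   \
#   1   3   7
root = BinaryNode(
    4,
    BinaryNode(2, BinaryNode(1), BinaryNode(3)),
    BinaryNode(6, right=BinaryNode(7)),
)
print(list(preorder(root)))       # [4, 2, 1, 3, 6, 7]
print(list(inorder(root)))        # [1, 2, 3, 4, 6, 7]
print(list(postorder(root)))      # [1, 3, 2, 7, 6, 4]
print(list(breadth_first(root)))  # [4, 2, 6, 1, 3, 7]
```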

# Priority Queues

Priority queues deal with objects consisting of a key *k* and a value *v*. The top priority objects have the minimal keys. They should be dequeued first.

## Implementation with lists

* Unsorted: fast insertion (O(1)) but linear dequeuing
* Sorted: linear insertion but fast dequeuing
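
The trade-off between the two list-based variants can be sketched as follows. The class and method names (`UnsortedListPQ`, `SortedListPQ`, `add`, `remove_min`) are assumptions made for this example, and error handling for empty queues is left out for brevity.

```python
class UnsortedListPQ:
    """O(1) insertion, O(n) removal of the minimum."""
    def __init__(self):
        self._items = []

    def add(self, key, value):
        self._items.append((key, value))

    def remove_min(self):
        # Linear scan for the index of the smallest key
        index = min(range(len(self._items)), key=lambda i: self._items[i][0])
        return self._items.pop(index)

class SortedListPQ:
    """O(n) insertion, O(1) removal of the minimum."""
    def __init__(self):
        self._items = []   # kept in decreasing key order: the minimum is last

    def add(self, key, value):
        position = len(self._items)
        # Walk left past every entry with a smaller key (linear scan)
        while position > 0 and self._items[position - 1][0] < key:
            position -= 1
        self._items.insert(position, (key, value))

    def remove_min(self):
        return self._items.pop()   # the smallest key sits at the end

for pq in (UnsortedListPQ(), SortedListPQ()):
    for key in (5, 1, 4, 2):
        pq.add(key, f"task{key}")
    print(type(pq).__name__, [pq.remove_min()[0] for _ in range(4)])
# Both lines end with [1, 2, 4, 5]
```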

## A compromise: Binary heaps

A binary heap is a binary tree with 2 additional properties:

1) **Heap order**: the key stored at a given node p is greater than or equal to the key stored at p's parent (except for the root)

Consequences:
* The keys on a path from the root are in nondecreasing order
* The root key is minimal

2) **Complete Binary Tree**: a heap T with height h has its levels 0, 1, ..., h-1 "full" (level i has 2<sup>i</sup> nodes) and the nodes on level h occupy the leftmost positions

[Detailed comparison of heaps and binary search trees](https://stackoverflow.com/questions/6147242/heap-vs-binary-search-tree-bst)

### Fundamental characteristic of a heap

The height of a heap with n entries is ⌊log<sub>2</sub>(n)⌋, i.e. the integer part of log<sub>2</sub>(n).

### Insertion

Up-heap bubbling: insert new value at the bottom of the heap, then swap values with parent node if necessary. Respecting the **heap order** may take up to h swaps, that is O(log(n)).

### Remove min key

After removing the root (with minimal key by definition), replace it with the "last" node (bottom-right) and use **down-heap bubbling** to respect heap order.

Conclusion: insertions and removal both have O(log(n)) time complexity, whereas with list-based priority queues, one is always O(n) and the other one O(1).
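
Both bubbling procedures can be sketched on an array-based heap, where the children of index j are stored at indices 2j+1 and 2j+2. This is an illustrative implementation under that assumption, not the textbook's code; the class name `MinHeap` and its methods are choices made for this example.

```python
class MinHeap:
    """Array-based binary min-heap storing (key, value) pairs."""

    def __init__(self):
        self._data = []

    def __len__(self):
        return len(self._data)

    def _swap(self, i, j):
        self._data[i], self._data[j] = self._data[j], self._data[i]

    def _upheap(self, j):
        # Swap with the parent while the heap order is violated
        while j > 0:
            parent = (j - 1) // 2
            if self._data[j][0] >= self._data[parent][0]:
                break
            self._swap(j, parent)
            j = parent

    def _downheap(self, j):
        # Swap with the smaller child while the heap order is violated
        n = len(self._data)
        while True:
            left, right, smallest = 2 * j + 1, 2 * j + 2, j
            if left < n and self._data[left][0] < self._data[smallest][0]:
                smallest = left
            if right < n and self._data[right][0] < self._data[smallest][0]:
                smallest = right
            if smallest == j:
                break
            self._swap(j, smallest)
            j = smallest

    def add(self, key, value):
        self._data.append((key, value))      # new entry at the bottom
        self._upheap(len(self._data) - 1)

    def remove_min(self):
        if not self._data:
            raise IndexError("remove_min from an empty heap")
        self._swap(0, len(self._data) - 1)   # move the last entry to the root
        item = self._data.pop()
        self._downheap(0)
        return item

heap = MinHeap()
for key in (5, 3, 8, 1, 9, 2):
    heap.add(key, str(key))
print([heap.remove_min()[0] for _ in range(len(heap))])  # [1, 2, 3, 5, 8, 9]
```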

# Maps, hash tables

*Not covered*

Hash functions, collision-handling schemes...

# Search trees

*Not covered* - See binary_tree.ipynb

# Graphs


