
#9.1 The Priority Queue Abstract Data Type

##9.1.1 Priorities
A **priority queue** is a collection of prioritized elements that allows arbitrary element insertion and allows the removal of the element that has first priority. When an element is
added to a priority queue, the user designates its priority by providing an associated
**key**. The element with the minimum key will be the next to be removed from the
queue.

##9.1.2 Priority Queue ADT
For a priority queue `P`, the priority queue ADT supports the following methods:
* `P.add(k, v)`: Insert an item with key `k` and value `v` into priority queue `P`.
* `P.min()`: Return a tuple, `(k,v)`, representing the key and value of an
item in priority queue `P` with minimum key (but do not remove
the item); an error occurs if the priority queue is empty.
* `P.remove_min()`: Remove an item with minimum key from priority queue `P`,
and return a tuple, `(k,v)`, representing the key and value of the
removed item; an error occurs if the priority queue is empty.
* `P.is_empty()`: Return `True` if priority queue `P` does not contain any items.
* `len(P)`: Return the number of items in priority queue `P`.
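The following sketch illustrates how the ADT behaves. It stores `(k, v)` tuples in a plain Python list and scans for the minimum key. The class name `ListPriorityQueue` and the use of `IndexError` for the empty case are assumptions made for illustration. The classes in Section 9.2 are the implementations this chapter actually develops.

```python
class ListPriorityQueue:
    """Hypothetical minimal priority queue, used only to illustrate the ADT."""

    def __init__(self):
        self._items = []  # unordered list of (key, value) tuples

    def __len__(self):
        return len(self._items)

    def is_empty(self):
        return len(self) == 0

    def add(self, k, v):
        self._items.append((k, v))

    def _index_of_min(self):
        if self.is_empty():
            raise IndexError("priority queue is empty")  # assumed error type
        best = 0
        for j in range(1, len(self._items)):
            if self._items[j][0] < self._items[best][0]:  # compare keys only
                best = j
        return best

    def min(self):
        return self._items[self._index_of_min()]

    def remove_min(self):
        return self._items.pop(self._index_of_min())


P = ListPriorityQueue()
for k, v in [(5, "A"), (9, "C"), (3, "B"), (7, "D")]:
    P.add(k, v)
print(P.min())               # (3, 'B')
print(P.remove_min())        # (3, 'B')
print(P.remove_min())        # (5, 'A')
print(len(P), P.is_empty())  # 2 False
```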

#9.2 Implementing a Priority Queue
We first implement a priority queue by storing its elements in a positional list $L$.

##9.2.1 The Composition Design Pattern
For priority queues, the **composition design pattern** is applied to store items internally as pairs consisting of a key $k$ and a value $v$. The `PriorityQueueBase` class provides this concept for all priority queue implementations; it includes a definition of a nested `_Item` class.

In [0]:
class Empty(Exception): # assumed from earlier chapters; defined here so the cells run standalone
  '''Error attempting to access an element from an empty container'''
  pass

class PriorityQueueBase:
  '''Abstract base class for a priority queue'''

  class _Item:
    '''Lightweight composite to store priority queue items'''
    __slots__ = '_key', '_value'

    def __init__(self, k, v):
      self._key = k
      self._value = v

    def __lt__(self, other):
      return self._key < other._key # compare items based on their keys

  def is_empty(self): # concrete method assuming abstract len
    '''Return True if the priority queue is empty'''
    return len(self) == 0
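`_Item` defines `__lt__` using keys only. As a result, Python's built-in `min` and `sorted` can order items without ever comparing their values. The sketch below repeats the nested class as a standalone `Item`, a hypothetical name chosen so that the snippet runs on its own. The values are deliberately of mutually incomparable types.

```python
class Item:
    """Standalone copy of the _Item composite, for illustration."""
    __slots__ = "_key", "_value"

    def __init__(self, k, v):
        self._key = k
        self._value = v

    def __lt__(self, other):
        return self._key < other._key  # only keys participate in comparisons


items = [Item(4, "apple"), Item(1, ["not", "comparable"]), Item(3, {"x": 1})]
smallest = min(items)                     # uses Item.__lt__
print(smallest._key, smallest._value)     # 1 ['not', 'comparable']
print([it._key for it in sorted(items)])  # [1, 3, 4]
```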

##9.2.2 Implementation with an Unsorted List

In [0]:
class UnsortedPriorityQueue(PriorityQueueBase): # base class defines _Item
  '''A min-oriented priority queue implemented with an unsorted list'''
  
  def _find_min(self): # nonpublic utility
    '''Return Position of item with minimum key'''
    if self.is_empty(): # is_empty inherited from base class
      raise Empty('Priority queue is empty')
    small = self._data.first()
    walk = self._data.after(small)
    while walk is not None:
      if walk.element() < small.element():
        small = walk
      walk = self._data.after(walk)
    return small
  
  def __init__(self):
    '''Create a new empty Priority Queue'''
    self._data = PositionalList() # doubly linked positional list, defined elsewhere

  def __len__(self):
    '''Return the number of items in the priority queue'''
    return len(self._data)

  def add(self, key, value):
    '''Add a key-value pair'''
    self._data.add_last(self._Item(key, value)) # O(1): append at the end
    
  def min(self):
    '''Return but do not remove (k,v) tuple with minimum key'''
    p = self._find_min()
    item = p.element()
    return (item._key, item._value)
  
  def remove_min(self):
    '''Remove and return (k,v) tuple with minimum key'''
    p = self._find_min()
    item = self._data.delete(p)
    return (item._key, item._value)

The `UnsortedPriorityQueue` class inherits from the `PriorityQueueBase` class. The key-value pairs are represented as composites, using instances of the inherited `_Item` class. These items are stored in a `PositionalList`, referenced by the `_data` member. The positional list is assumed to be implemented with a doubly linked list, so each of its operations used here runs in $O(1)$ time.
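The list-based classes rely on a `PositionalList` that provides `first`, `last`, `before`, `after`, `add_first`, `add_last`, `add_after` and `delete`, where each position offers `element()`. The class may not already be defined in your session. If it is not, the following is a minimal doubly linked sketch of just those methods, using header and trailer sentinels. It is a simplified stand-in and not the full positional list. For example, it does not validate that a position belongs to the list it is passed to.

```python
class PositionalList:
    """Minimal doubly linked positional list (illustrative stand-in)."""

    class _Node:
        __slots__ = "_element", "_prev", "_next"

        def __init__(self, element, prev, nxt):
            self._element = element
            self._prev = prev
            self._next = nxt

    class Position:
        def __init__(self, node):
            self._node = node

        def element(self):
            return self._node._element

    def __init__(self):
        self._header = self._Node(None, None, None)   # sentinel before first
        self._trailer = self._Node(None, None, None)  # sentinel after last
        self._header._next = self._trailer
        self._trailer._prev = self._header
        self._size = 0

    def __len__(self):
        return self._size

    def _make_position(self, node):
        # Sentinels are never exposed to callers; return None instead
        if node is self._header or node is self._trailer:
            return None
        return self.Position(node)

    def first(self):
        return self._make_position(self._header._next)

    def last(self):
        return self._make_position(self._trailer._prev)

    def before(self, p):
        return self._make_position(p._node._prev)

    def after(self, p):
        return self._make_position(p._node._next)

    def _insert_between(self, e, predecessor, successor):
        node = self._Node(e, predecessor, successor)
        predecessor._next = node
        successor._prev = node
        self._size += 1
        return self.Position(node)

    def add_first(self, e):
        return self._insert_between(e, self._header, self._header._next)

    def add_last(self, e):
        return self._insert_between(e, self._trailer._prev, self._trailer)

    def add_after(self, p, e):
        return self._insert_between(e, p._node, p._node._next)

    def delete(self, p):
        node = p._node
        node._prev._next = node._next  # unlink the node in O(1) time
        node._next._prev = node._prev
        self._size -= 1
        return node._element

    def __iter__(self):
        walk = self.first()
        while walk is not None:
            yield walk.element()
            walk = self.after(walk)


L = PositionalList()
L.add_last(5)
L.add_first(2)
L.add_after(L.first(), 3)
print(list(L))  # [2, 3, 5]
```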

##9.2.3 Implementation with a Sorted List
This implementation stores entries in a positional list sorted by nondecreasing keys. This ensures that the first element of the list is an entry with the smallest key.

In [0]:
class SortedPriorityQueue(PriorityQueueBase): # base class defines _Item
  '''A min-oriented priority queue implemented with a sorted list'''
  
  def __init__(self):
    '''Create a new empty Priority Queue'''
    self._data = PositionalList()
    
  def __len__(self):
    '''Return the number of items in the priority queue'''
    return len(self._data)
  
  def add(self, key, value):
    '''Add a key-value pair'''
    newest = self._Item(key, value) # make new item instance
    walk = self._data.last() # walk backward looking for smaller key
    while walk is not None and newest < walk.element():
      walk = self._data.before(walk)
    if walk is None:
      self._data.add_first(newest) # new key is smallest
    else:
      self._data.add_after(walk, newest) # newest goes after walk
      
  def min(self):
    '''Return but do not remove (k,v) tuple with minimum key'''
    if self.is_empty():
      raise Empty('Priority queue is empty')
    p = self._data.first()
    item = p.element()
    return (item._key, item._value)
  
  def remove_min(self):
    '''Remove and return (k,v) tuple with minimum key'''
    if self.is_empty():
      raise Empty('Priority queue is empty')
    item = self._data.delete(self._data.first()) # first item has the minimum key
    return (item._key, item._value)

The `min` and `remove_min` methods in the `SortedPriorityQueue` class are straightforward, since the first element of the list always has a minimum key.

###Comparison between two list-based implementations
Operation | Unsorted List | Sorted List
-- | -- | --
`len` | $O(1)$ | $O(1)$
`is_empty`|$O(1)$|$O(1)$
`add`|$O(1)$|$O(n)$
`min`|$O(n)$|$O(1)$
`remove_min`|$O(n)$|$O(1)$

When using an unsorted list to store entries,
we can perform insertions in $O(1)$ time, but finding or removing an element with
minimum key requires an $O(n)$-time loop through the entire collection. In contrast,
if using a sorted list, we can trivially find or remove the minimum element in $O(1)$
time, but adding a new element to the queue may require $O(n)$ time to restore the
sorted order.
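The sketch below makes the trade-off in the table concrete. It counts key comparisons while inserting `n` keys and then removing them all, using two strategies. The first uses an unsorted Python list and scans on each removal. The second uses a sorted Python list and walks backward on each insertion, as `SortedPriorityQueue.add` does.

The function names are hypothetical, and plain Python lists stand in for the positional list. Only key comparisons are counted, not the cost of shifting list elements.

```python
def unsorted_costs(keys):
    """Comparisons for add (none) and remove_min (full scan) on an unsorted list."""
    data, comparisons = [], 0
    for k in keys:
        data.append(k)                 # add: no comparisons
    while data:
        best = 0
        for j in range(1, len(data)):  # remove_min: scan all remaining keys
            comparisons += 1
            if data[j] < data[best]:
                best = j
        data.pop(best)
    return comparisons


def sorted_costs(keys):
    """Comparisons for add (backward walk) and remove_min (none) on a sorted list."""
    data, comparisons = [], 0
    for k in keys:
        j = len(data)
        while j > 0:                   # walk backward looking for a smaller key
            comparisons += 1
            if k < data[j - 1]:
                j -= 1
            else:
                break
        data.insert(j, k)
    while data:
        data.pop(0)                    # remove_min: the first element is the minimum
    return comparisons


n = 100
print(unsorted_costs(range(n)))       # n(n-1)/2 = 4950 regardless of input order
print(sorted_costs(range(n)))         # n-1 = 99: increasing input is the best case
print(sorted_costs(range(n, 0, -1)))  # n(n-1)/2 = 4950: decreasing input is the worst case
```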

#9.3 Heaps
The **binary heap** data structure allows users to perform both insertions and removals in logarithmic time, which is a significant improvement over the list-based implementations.

##9.3.1 The Heap Data Structure
A **heap** is a binary tree $T$ that stores a collection of items at its positions and that satisfies two additional properties: a relational property defined in terms of the way keys are stored in $T$ and a structural property defined in terms of the shape of $T$.

***Heap-Order Property***: In a heap $T$, for every position $p$ other than the root, the key stored at $p$ is greater than or equal to the key stored at $p$'s parent.

Thus, the keys encountered on a path from the root to a leaf of $T$ are in nondecreasing order. Also, a minimum key is always stored at the root of $T$, so it is easy to locate such an item when `min` or `remove_min` is called; this item is informally said to be "at the top of the heap".

For efficiency, we want the heap $T$ to have as small a height as possible. This requirement is captured by the **complete binary tree property**.

***Complete Binary Tree Property***: A heap $T$ with height $h$ is a **complete** binary tree if levels $0, 1, 2, \ldots, h-1$ of $T$ have the maximum number of nodes possible (namely, level $i$ has $2^i$ nodes, for $0\leq i \leq h-1$) and the remaining nodes at level $h$ reside in the leftmost possible positions at that level.

###The Height of a Heap
Let $h$ denote the height of $T$. *A heap $T$ storing $n$ entries has height $h=\lfloor\log n\rfloor$.*
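Here is one way to see why the height is $\lfloor\log n\rfloor$, with the logarithm taken base 2. In a complete tree the last node lies on the deepest level, so the height equals the number of parent steps from that node back to the root.

The sketch below counts those steps using the level numbering of Section 9.3.3, where the parent of index $j$ is $(j-1)//2$. It compares the result with `int.bit_length() - 1`, which equals $\lfloor\log_2 n\rfloor$ for positive $n$. The function name is hypothetical.

```python
def complete_tree_height(n):
    """Height of a complete binary tree with n >= 1 nodes stored in level order."""
    j, h = n - 1, 0            # start at the last position, index n-1
    while j > 0:
        j = (j - 1) // 2       # move to the parent
        h += 1
    return h


for n in (1, 2, 3, 4, 7, 8, 15, 16, 1000):
    print(n, complete_tree_height(n), n.bit_length() - 1)  # the last two columns agree
```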

##9.3.2 Implementing a Priority Queue with a Heap

###Adding an Item to the Heap
To perform `add(k, v)` on a priority queue implemented with a heap $T$, we store the pair $(k,v)$ as an item at a new node of the tree. To maintain
the **complete binary tree property**, that new node should be placed at a position $p$
just beyond the rightmost node at the bottom level of the tree, or as the leftmost
position of a new level if the bottom level is already full (or if the heap is empty).

###Up-Heap Bubbling After an Insertion
After adding the item, the tree $T$ is complete, but it may violate the **heap-order property**. Thus, unless position $p$ is the root of $T$, we compare the key at position $p$ to that of $p$'s parent, which we denote as $q$. If $k_p\geq k_q$, the heap-order property is satisfied and the algorithm terminates. If instead $k_p < k_q$, then we need to restore the heap-order property, which can be locally achieved by swapping the entries stored at positions $p$ and $q$. This swap causes the new item to move up one level. The process then repeats at the new item's position until no violation of the heap-order property occurs or the new item reaches the root.

**Up-heap bubbling**: the upward movement of the newly inserted entry by means of swaps
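Below is a minimal iterative sketch of insertion with up-heap bubbling. It works directly on a Python list of keys laid out in the level order of Section 9.3.3, so the parent of index $j$ is $(j-1)//2$. The function name `heap_push` is hypothetical; Section 9.3.4 gives the class-based version.

```python
def heap_push(heap, key):
    """Append key at the last position, then up-heap bubble; return the swap count."""
    heap.append(key)           # new node goes just past the last position
    j = len(heap) - 1
    swaps = 0
    while j > 0:
        parent = (j - 1) // 2
        if heap[j] < heap[parent]:   # heap order violated: k_p < k_q
            heap[j], heap[parent] = heap[parent], heap[j]
            j = parent               # continue one level up
            swaps += 1
        else:
            break                    # k_p >= k_q: done
    return swaps


h = [4, 5, 6, 15, 9, 7, 20, 16, 25, 14, 12, 11, 13]
print(heap_push(h, 2), h[0])  # 3 2: key 2 bubbles from a leaf to the root
```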

###Removing the Item with Minimum Key
The entry with the smallest key is stored at the root $r$ of $T$. We cannot simply delete $r$. To keep the shape of the heap consistent with the **complete binary tree property**, we instead delete the leaf at the *last* position $p$ of $T$, defined as the rightmost position at the bottommost level of the tree. To preserve the item from that last position, we copy it to the root $r$ (in place of the item with minimum key that is being removed by the operation).

###Down-Heap Bubbling After a Removal
Even though $T$ is now complete, it likely violates the heap-order property. If $T$ has only one node (the root), then the heap-order property is trivially satisfied and the algorithm terminates. Otherwise, we distinguish two cases, where $p$ initially denotes the root of $T$:
* If $p$ has no right child, let $c$ be the left child of $p$.
* Otherwise ($p$ has both children), let $c$ be a child of $p$ with minimal key.

If key $k_p \leq k_c$, the heap-order property is satisfied and the algorithm terminates. If
instead $k_p >k_c$, then we need to restore the heap-order property. This can be locally
achieved by swapping the entries stored at $p$ and $c$.

**Down-heap bubbling**: Having restored the heap-order property for node $p$ relative to its children, there
may be a violation of this property at $c$; hence, we may have to continue swapping
down $T$ until no violation of the heap-order property occurs.
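Continuing the list-of-keys sketch, removal saves the root key, deletes the last position, copies its key to the root and then applies down-heap bubbling. The name `heap_pop` is hypothetical. As in the text, it swaps with the smaller child when a node has two children.

```python
def heap_pop(heap):
    """Remove and return the minimum key from a nonempty list-based min-heap."""
    smallest = heap[0]
    last = heap.pop()          # delete the last position
    if heap:                   # if anything remains, move the last key to the root
        heap[0] = last
        j, n = 0, len(heap)
        while 2 * j + 1 < n:                     # p has at least a left child
            c = 2 * j + 1
            if c + 1 < n and heap[c + 1] < heap[c]:
                c += 1                           # right child has the smaller key
            if heap[c] < heap[j]:                # k_p > k_c: swap and continue
                heap[j], heap[c] = heap[c], heap[j]
                j = c
            else:
                break                            # k_p <= k_c: heap order restored
    return smallest


h = [4, 5, 6, 15, 9, 7, 20, 16, 25, 14, 12, 11, 13]
print([heap_pop(h) for _ in range(len(h))])
# [4, 5, 6, 7, 9, 11, 12, 13, 14, 15, 16, 20, 25]
```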

##9.3.3 Array-Based Representation of a Complete Binary Tree
The elements of $T$ are stored in an array-based list $A$ such that the element at position $p$ in $T$ is stored in $A$ with index equal to the level number $f(p)$ of $p$, defined as follows:
* If $p$ is the root of $T$, then $f(p)=0$
* If $p$ is the left child of position $q$, then $f(p)=2f(q)+1$
* If $p$ is the right child of position $q$, then $f(p)=2f(q)+2$

With this implementation, the elements of $T$ have contiguous indices in the range $[0, n-1]$ and the last position of $T$ is always at index $n-1$, where $n$ is the number of positions of $T$.
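The level numbering translates directly into three index formulas. The sketch below checks that these formulas invert each other and uses them to test the heap-order property on an array. The helper names are hypothetical; they mirror `_parent`, `_left` and `_right` in Section 9.3.4.

```python
def left(j):
    return 2 * j + 1           # f(p) = 2 f(q) + 1


def right(j):
    return 2 * j + 2           # f(p) = 2 f(q) + 2


def parent(j):
    return (j - 1) // 2        # inverse of both formulas for j >= 1


def is_min_heap(a):
    """Heap-order property: every non-root key is >= its parent's key."""
    return all(a[parent(j)] <= a[j] for j in range(1, len(a)))


assert all(parent(left(j)) == j and parent(right(j)) == j for j in range(1000))
print(is_min_heap([4, 5, 6, 15, 9, 7, 20]))   # True
print(is_min_heap([4, 5, 6, 15, 3, 7, 20]))   # False: 3 sits below its parent 5
```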

Implementing a priority queue using an array-based heap representation allows
us to avoid some complexities of a node-based tree structure; for example, the last position is always found at index $n-1$.

##9.3.4 Python Heap Implementation

In [0]:
class HeapPriorityQueue(PriorityQueueBase): # base class defines _Item
  '''A min-oriented priority queue implemented with a binary heap'''
  #-------------------nonpublic behaviors---------------------------
  def _parent(self, j):
    return (j-1) // 2
  
  def _left(self, j):
    return 2*j + 1
  
  def _right(self, j):
    return 2*j + 2
  
  def _has_left(self, j):
    return self._left(j) < len(self._data) # index beyond end of list?
  
  def _has_right(self, j):
    return self._right(j) < len(self._data) # index beyond end of list?
  
  def _swap(self, i, j):
    '''Swap the elements at indices i and j of array'''
    self._data[i], self._data[j] = self._data[j], self._data[i]
    
  def _upheap(self, j):
    parent = self._parent(j)
    if j>0 and self._data[j] < self._data[parent]:
      self._swap(j, parent)
      self._upheap(parent) # recur at position of parent
      
  def _downheap(self, j):
    if self._has_left(j):
      left = self._left(j)
      small_child = left # although right may be smaller
      if self._has_right(j):
        right = self._right(j)
        if self._data[right] < self._data[left]:
          small_child = right
      if self._data[small_child] < self._data[j]:
        self._swap(j, small_child)
        self._downheap(small_child) # recur at position of small child
        
  #-----------------------public behaviors-------------------------------
  def __init__(self):
    '''Create a new empty Priority Queue'''
    self._data = []
    
  def __len__(self):
    '''Return the number of items in the priority queue'''
    return len(self._data)
  
  def add(self, key, value):
    '''Add a key-value pair to the priority queue'''
    self._data.append(self._Item(key, value))
    self._upheap(len(self._data) - 1) # upheap newly added position
    
  def min(self):
    '''Return but do not remove (k,v) tuple with minimum key
    
    Raise Empty exception if empty
    '''
    if self.is_empty():
      raise Empty('Priority queue is empty')
    item = self._data[0]
    return (item._key, item._value)
  
  def remove_min(self):
    '''Remove and return (k,v) tuple with minimum key
    
    Raise Empty exception if empty
    '''
    if self.is_empty():
      raise Empty('Priority queue is empty')
    self._swap(0, len(self._data) - 1) # put minimum item at the end
    item = self._data.pop() # and remove it from the list;
    self._downheap(0) # then fix new root
    return (item._key, item._value)

##9.3.5 Analysis of a Heap-Based Priority Queue
Each of the priority queue ADT methods can be performed in $O(1)$ or
in $O(\log n)$ time, where $n$ is the number of entries at the time the method is executed. This follows from these observations:

* The heap $T$ has $n$ nodes, each storing a reference to a key-value pair.
* The height of heap $T$ is $O(\log n)$, since $T$ is complete.
* The `min` operation runs in $O(1)$ because the root of the tree contains such an
element.
* Locating the last position of a heap, as required for `add` and `remove_min`,
can be performed in $O(1)$ time for an array-based representation, or $O(\log n)$
time for a linked-tree representation.
* In the worst case, up-heap and down-heap bubbling perform a number of
swaps equal to the height of $T$.

##9.3.6 Bottom-Up Heap Construction
If we start with an initially empty heap, $n$ successive calls to the `add` operation will run in $O(n\log n)$ time in the worst case.

If all $n$ key-value pairs to be stored in the heap are given in advance, such as during the first phase of the heap-sort algorithm, there is an alternative **bottom-up** construction method that runs in $O(n)$ time. Heap-sort still requires $O(n\log n)$ time because of the second phase, in which we repeatedly remove the remaining element with the smallest key.

Assume the number of keys, $n$, is an integer such that $n=2^{h+1}-1$. That is, the heap is a complete binary tree with every level full, so the heap has height $h=\log(n+1)-1$.

Viewed nonrecursively, bottom-up heap construction consists of the following $h+1=\log(n+1)$ steps:
1. Construct $(n+1)/2$ elementary heaps storing one entry each.
2. Form $(n+1)/4$ heaps, each storing three entries, by joining pairs of elementary heaps and adding a new entry. The new entry is placed at the root and may have to be swapped with the entry stored at a child to preserve the heap-order property.
3. Form $(n+1)/8$ heaps, each storing 7 entries, by joining pairs of 3-entry heaps (constructed in the previous step) and adding a new entry. The new entry is placed initially at the root, but may have to move down with a down-heap bubbling to preserve the heap-order property.
4. In the generic $i$th  step, $2\leq i\leq h$, form $(n+1)/2^i$ heaps, each storing $2^i-1$ entries, by joining pairs of heaps storing $(2^{i-1}-1)$ entries (constructed in the previous step) and adding a new entry. The new entry is placed initially at the root but may have to move down with a down-heap bubbling to preserve the heap-order property.
5. In the last step, $h+1$, form the final heap, storing all the
$n$ entries, by joining two heaps storing $(n−1)/2$ entries (constructed in the
previous step) and adding a new entry. The new entry is placed initially at
the root, but may have to move down with a down-heap bubbling to preserve
the heap-order property.

In [0]:
class HeapPriorityQueue(PriorityQueueBase): # base class defines _Item
  '''A min-oriented priority queue implemented with a binary heap,
  supporting bottom-up construction from initial contents
  '''
  #-------------------nonpublic behaviors---------------------------
  def _parent(self, j):
    return (j-1) // 2
  
  def _left(self, j):
    return 2*j + 1
  
  def _right(self, j):
    return 2*j + 2
  
  def _has_left(self, j):
    return self._left(j) < len(self._data) # index beyond end of list?
  
  def _has_right(self, j):
    return self._right(j) < len(self._data) # index beyond end of list?
  
  def _swap(self, i, j):
    '''Swap the elements at indices i and j of array'''
    self._data[i], self._data[j] = self._data[j], self._data[i]
    
  def _upheap(self, j):
    parent = self._parent(j)
    if j>0 and self._data[j] < self._data[parent]:
      self._swap(j, parent)
      self._upheap(parent) # recur at position of parent
      
  def _downheap(self, j):
    if self._has_left(j):
      left = self._left(j)
      small_child = left # although right may be smaller
      if self._has_right(j):
        right = self._right(j)
        if self._data[right] < self._data[left]:
          small_child = right
      if self._data[small_child] < self._data[j]:
        self._swap(j, small_child)
        self._downheap(small_child) # recur at position of small child
        
  #-----------------------public behaviors-------------------------------
  def __init__(self, contents=()):
    '''Create a new Priority Queue

    By default, queue will be empty. If contents is given, it should be an
    iterable sequence of (k,v) tuples specifying the initial contents
    '''
    self._data = [ self._Item(k,v) for k,v in contents ] # empty by default
    if len(self._data) > 1:
      self._heapify()
      
  def _heapify(self):
    start = self._parent(len(self) - 1) # start at PARENT of last leaf
    for j in range(start, -1, -1): # going to and including the root
      self._downheap(j)
    
  def __len__(self):
    '''Return the number of items in the priority queue'''
    return len(self._data)
  
  def add(self, key, value):
    '''Add a key-value pair to the priority queue'''
    self._data.append(self._Item(key, value))
    self._upheap(len(self._data) - 1) # upheap newly added position
    
  def min(self):
    '''Return but do not remove (k,v) tuple with minimum key
    
    Raise Empty exception if empty
    '''
    if self.is_empty():
      raise Empty('Priority queue is empty')
    item = self._data[0]
    return (item._key, item._value)
  
  def remove_min(self):
    '''Remove and return (k,v) tuple with minimum key
    
    Raise Empty exception if empty
    '''
    if self.is_empty():
      raise Empty('Priority queue is empty')
    self._swap(0, len(self._data) - 1) # put minimum item at the end
    item = self._data.pop() # and remove it from the list;
    self._downheap(0) # then fix new root
    return (item._key, item._value)

With the array-based representation of a heap, if we initially store all $n$ items in
arbitrary order within the array, we can implement the bottom-up heap construction
process with a single loop that calls `_downheap` from each position of
the tree, as long as those calls are ordered starting with the deepest level and ending
with the root of the tree. Leaves need no call, which is why `_heapify` starts at the parent of the last leaf.
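Here is a minimal sketch of that single loop on a plain Python list of keys. All function names are hypothetical. The code counts swaps so that bottom-up construction can be compared with $n$ successive insertions. These counts illustrate one input and are not a proof of the bounds.

```python
def downheap(a, j, n):
    """Down-heap bubble a[j] within a[:n]; return the number of swaps made."""
    swaps = 0
    while 2 * j + 1 < n:
        c = 2 * j + 1
        if c + 1 < n and a[c + 1] < a[c]:
            c += 1                         # pick the child with the smaller key
        if a[c] < a[j]:
            a[j], a[c] = a[c], a[j]
            j = c
            swaps += 1
        else:
            break
    return swaps


def heapify(a):
    """Bottom-up construction: down-heap from the parent of the last leaf to the root."""
    swaps = 0
    for j in range((len(a) - 2) // 2, -1, -1):
        swaps += downheap(a, j, len(a))
    return swaps


def insert_all(keys):
    """Build a heap by n successive insertions with up-heap bubbling."""
    a, swaps = [], 0
    for k in keys:
        a.append(k)
        j = len(a) - 1
        while j > 0:
            p = (j - 1) // 2
            if a[j] < a[p]:
                a[j], a[p] = a[p], a[j]
                j = p
                swaps += 1
            else:
                break
    return a, swaps


keys = list(range(1023, 0, -1))   # decreasing input: every insertion bubbles to the root
a = keys[:]
print(heapify(a), insert_all(keys)[1])  # 1013 8194
```

With $n = 1023$, the bottom-up count equals the sum of node heights. The insertion count equals the sum of node depths, which grows like $n\log n$.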

Bottom-up heap construction is asymptotically faster than incrementally inserting
$n$ keys into an initially empty heap.

*Bottom-up construction of a heap with $n$ entries takes $O(n)$ time, assuming two keys can be compared in $O(1)$ time.*
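Here is a short justification for the full-tree case $n = 2^{h+1}-1$ assumed above. Define the height of a position as its distance to the bottom level. A `_downheap` call at a position of height $i$ performs at most $i$ swaps, and a full tree has $2^{h-i}$ positions of height $i$. Summing over all positions gives

$$
\sum_{i=0}^{h} i\,2^{h-i} \;=\; 2^{h}\sum_{i=0}^{h}\frac{i}{2^{i}} \;<\; 2^{h}\sum_{i=0}^{\infty}\frac{i}{2^{i}} \;=\; 2^{h}\cdot 2 \;=\; 2^{h+1} \;=\; n+1 .
$$

Each call does $O(1)$ work per level it descends, plus $O(1)$ to stop, and there are at most $n$ calls. The total is therefore $O(n)$ time.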

##9.3.7 Python's `heapq` Module
Python's `heapq` module provides support for heap-based priority queues. It does not provide a priority queue class; instead, it provides functions that allow a standard Python list to be managed as a heap.
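The following sketch uses `heapq` as a min-oriented priority queue over key-value pairs. `heappush`, `heappop` and `heapify` are standard functions of the module. Entries are stored as `(key, counter, value)` tuples, a pattern suggested in the `heapq` documentation. The counter breaks ties between equal keys by insertion order, so values that may not be comparable are never compared.

```python
import heapq
from itertools import count

heap = []
tiebreak = count()                       # tie-breaker: insertion order
for key, value in [(5, "A"), (9, "C"), (3, "B"), (7, "D"), (3, "E")]:
    heapq.heappush(heap, (key, next(tiebreak), value))   # add

key, _, value = heap[0]                  # min: the smallest entry is always heap[0]
print(key, value)                        # 3 B

order = []
while heap:
    key, _, value = heapq.heappop(heap)  # remove_min
    order.append((key, value))
print(order)  # [(3, 'B'), (3, 'E'), (5, 'A'), (7, 'D'), (9, 'C')]

data = [(4, "x"), (1, "y"), (2, "z")]
heapq.heapify(data)                      # in-place, linear-time heap construction
print(data[0])                           # (1, 'y')
```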


#9.4 Sorting with a Priority Queue
We assume the collection $C$ is a positional list, and we use each original element of the collection as both the key and the value when calling `P.add(element, element)`.

In [0]:
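def pq_sort(C, pq_class=HeapPriorityQueue):
  '''Sort a collection of elements stored in a positional list

  pq_class may be any class implementing the priority queue ADT, such as
  UnsortedPriorityQueue, SortedPriorityQueue, or HeapPriorityQueue
  '''
  n = len(C)
  P = pq_class()
  for j in range(n): # Phase 1: move every element of C into P
    element = C.delete(C.first())
    P.add(element, element) # use element as key and value
  for j in range(n): # Phase 2: move elements back in nondecreasing order
    (k,v) = P.remove_min()
    C.add_last(v) # store smallest remaining element in C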

##9.4.1 Selection-Sort and Insertion-Sort

###Selection-Sort
If we implement $P$ with an unsorted list, then Phase 1 of `pq_sort` takes $O(n)$ time,
for we can add each element in $O(1)$ time. In Phase 2, the running time of each
`remove_min` operation is proportional to the current size of $P$, so Phase 2 performs work proportional to $n + (n-1) + \cdots + 1 = n(n+1)/2$. Thus, the bottleneck computation
is the repeated “selection” of the minimum element in Phase 2, which takes $O(n^2)$ time, as does the entire selection-sort algorithm.
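Selection-sort is usually written in place rather than with an explicit priority queue. In the minimal sketch below, the unsorted suffix of a Python list plays the role of $P$, and each pass "selects" the minimum of that suffix. It is a standard array formulation shown for illustration and is not the `pq_sort` code above.

```python
def selection_sort(a):
    """In-place selection-sort: repeatedly select the minimum of the unsorted suffix."""
    n = len(a)
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):   # O(n - i) scan, like remove_min on an unsorted list
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return a


print(selection_sort([7, 4, 8, 2, 5, 3, 9]))  # [2, 3, 4, 5, 7, 8, 9]
```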

###Insertion-Sort
If we implement the priority queue $P$ using a sorted list, then we improve the running
time of Phase 2 to $O(n)$, for each `remove_min` operation on $P$ now takes $O(1)$
time. Now, Phase 1 becomes the bottleneck for the running time, since, in the worst case, each `add` operation takes time proportional to the current size of $P$. Phase 1 takes $O(n^2)$ time in the worst case. However, unlike selection-sort, insertion-sort
has a *best-case* running time of $O(n)$: when the input is already in nondecreasing order, each `add` stops after a single comparison with the last element of the sorted list.
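Insertion-sort can also be written in place. In the minimal sketch below, the sorted prefix of a Python list plays the role of $P$. Each new element walks backward through the prefix, mirroring `SortedPriorityQueue.add`. The comparison counter is added only to show the best case.

```python
def insertion_sort(a):
    """In-place insertion-sort; returns (sorted list, number of key comparisons)."""
    comparisons = 0
    for i in range(1, len(a)):
        cur = a[i]
        j = i
        while j > 0:                # walk backward through the sorted prefix
            comparisons += 1
            if cur < a[j - 1]:
                a[j] = a[j - 1]     # shift the larger key one slot right
                j -= 1
            else:
                break
        a[j] = cur
    return a, comparisons


print(insertion_sort([7, 4, 8, 2, 5, 3, 9])[0])  # [2, 3, 4, 5, 7, 8, 9]
print(insertion_sort(list(range(10)))[1])        # 9: sorted input is the O(n) best case
```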

##9.4.2 Heap-Sort
During Phase 1, the $i$th `add` operation takes $O(\log i)$ time, since the heap has $i$
entries after the operation is performed. Therefore this phase takes $O(n\log n)$ time.
(This phase could be improved to $O(n)$ with bottom-up heap construction.)
During the second phase of `pq_sort`, the $j$th `remove_min` operation runs in
$O(\log(n-j+1))$ time, since the heap has $n-j+1$ entries at the time the operation
is performed. Summing over all $j$, this phase takes $O(n\log n)$ time, so the entire
priority-queue sorting algorithm runs in $O(n\log n)$ time when we use a heap to implement
the priority queue.

*The heap-sort algorithm sorts a collection $C$ of $n$ elements in $O(n\log n)$ time, assuming two elements of $C$ can be compared in $O(1)$ time.*
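The following minimal sketch uses Python's `heapq` in place of the heap-based priority queue. With that substitution, the two phases of `pq_sort` become a bottom-up `heapify` (Phase 1, $O(n)$) followed by $n$ calls to `heappop` (Phase 2, $O(n\log n)$). As a simplification, the sketch returns a new Python list rather than refilling a positional list.

```python
import heapq


def heap_sort(iterable):
    """Heap-sort via heapq: bottom-up heap construction, then repeated remove-min."""
    heap = list(iterable)
    heapq.heapify(heap)                                      # Phase 1: O(n)
    return [heapq.heappop(heap) for _ in range(len(heap))]   # Phase 2: O(n log n)


print(heap_sort([7, 4, 8, 2, 5, 3, 9]))  # [2, 3, 4, 5, 7, 8, 9]
```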