# Heaps

Heap property: every node is greater than or equal to its children (max-heap). Heaps are useful for implementing priority queues and for finding the max or the N largest elements.
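
As a quick illustration of the "N largest elements" use case, here is a minimal sketch using Python's standard `heapq` module. Note that `heapq` is a min-heap, unlike the max-heap described in this notebook. The helper name `n_largest` is made up for this example; it keeps a min-heap of at most N candidates, so the smallest candidate is always at index 0 and is cheap to evict.

```python
import heapq

def n_largest(iterable, n):
    """Returns the n largest items in descending order, using a size-n min-heap."""
    if n <= 0:
        return []
    heap = []
    for x in iterable:
        if len(heap) < n:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # x beats the smallest candidate: replace it in O(log n).
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)

values = [3, 5, 7, 1, 19]
print(n_largest(values, 3))       # [19, 7, 5]
print(heapq.nlargest(3, values))  # [19, 7, 5], the standard library equivalent
```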

## Binary heap

A binary heap is a complete binary tree with the heap property. Complete means every level is full except possibly the last one, which is filled from left to right.

It's commonly implemented using an array, where the left and right children of the element at index n are stored at indices 2n+1 and 2n+2. The parent is at index floor((n-1) / 2).
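
The index arithmetic can be sketched on its own, before it gets wrapped in a class. The function names `parent` and `children` are chosen for this example only.

```python
def parent(i):
    """Index of the parent of node i, or None for the root."""
    return (i - 1) // 2 if i > 0 else None

def children(i, n):
    """Indices of the children of node i that exist in a heap of n elements."""
    return [c for c in (2 * i + 1, 2 * i + 2) if c < n]

heap = [19, 7, 5, 1, 3]
print(children(0, len(heap)))  # [1, 2]
print(children(1, len(heap)))  # [3, 4]
print(children(2, len(heap)))  # []
print(parent(4))               # 1
```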

Element 0 is the max.

Inserting is done by appending to the back of the array and sifting up, i.e. swapping the new element with its parent until the heap property is restored. The worst-case time is O(log n).

Deleting the max is done by replacing the first element with the last element, and sifting down, i.e. swapping it with its larger child until the heap property is restored. The worst-case time is O(log n).

In [1]:
class Heap:
    """A max-heap."""
    
    def __init__(self):
        self.array = []
        self.len = 0
        
    def insert(self, key):
        """Inserts an element."""
        self.array.append(key)
        self.len += 1
        idx = self.len - 1
        while True:
            p = self._parent(idx)
            # Stop at the root, or once the parent is at least as large.
            if p is None or self.array[p] >= key:
                return
            self.array[idx], self.array[p] = self.array[p], self.array[idx]
            idx = p

    def pop(self):
        """Removes and returns max element."""
        if self.len == 0:
            raise ValueError('Empty heap')
        if self.len == 1:
            self.len = 0
            return self.array.pop()
        top = self.array[0]
        last = self.array.pop()
        self.array[0] = last
        self.len -= 1
        idx = 0
        while True:
            left = self._left(idx)
            right = self._right(idx)
            # No child, done.
            if left is None and right is None:
                break
            # One child, necessarily on the left.
            elif right is None:
                val = self.array[left]
                if val > last:
                    self.array[idx], self.array[left] = self.array[left], self.array[idx]
                    idx = left
                    continue
                else:
                    break
            # Two children. If necessary, sift down with the larger one.
            else:
                a = self.array[left]
                b = self.array[right]
                if a > b:
                    if a > last:
                        self.array[idx], self.array[left] = self.array[left], self.array[idx]
                        idx = left
                        continue
                    else:
                        break
                else:
                    if b > last:
                        self.array[idx], self.array[right] = self.array[right], self.array[idx]
                        idx = right
                        continue
                    else:
                        break
        return top
        
    def display(self):
        """Displays the tree."""
        if self.len == 0:
            print()
        else:
            lines, _, _, _ = self._display_aux(0)
            for line in lines:
                print(line)

    def display_array(self):
        """Displays the array used internally."""
        print('%r (%d)' % (self.array, self.len))

    def _parent(self, i):
        if i == 0:
            return None
        return (i - 1) // 2

    def _left(self, i):
        idx = 2 * i + 1
        if idx >= self.len:
            return None
        return idx

    def _right(self, i):
        idx = 2 * i + 2
        if idx >= self.len:
            return None
        return idx
        
    def _display_aux(self, idx):
        """Returns list of strings, width, height, and horizontal coordinate of the root."""
        left = self._left(idx)
        right = self._right(idx)
        
        # No child.
        if left is None and right is None:
            line = '%s' % self.array[idx]
            width = len(line)
            height = 1
            middle = width // 2
            return [line], width, height, middle
        
        # Only left child.
        if right is None:
            lines, n, p, x = self._display_aux(left)
            s = '%s' % self.array[idx]
            u = len(s)
            first_line = (x + 1) * ' ' + (n - x - 1) * '_' + s
            second_line = x * ' ' + '/' + (n - x - 1 + u) * ' '
            shifted_lines = [line + u * ' ' for line in lines]
            return [first_line, second_line] + shifted_lines, n + u, p + 2, n + u // 2
        
        # Only right child.
        if left is None:
            lines, n, p, x = self._display_aux(right)
            s = '%s' % self.array[idx]
            u = len(s)
            first_line = s + x * '_' + (n - x) * ' '
            second_line = (u + x) * ' ' + '\\' + (n - x - 1) * ' '
            shifted_lines = [u * ' ' + line for line in lines]
            return [first_line, second_line] + shifted_lines, n + u, p + 2, u // 2
        
        # Two children.
        left_lines, n, p, x = self._display_aux(left)
        right_lines, m, q, y = self._display_aux(right)
        s = '%s' % self.array[idx]
        u = len(s)
        first_line = (x + 1) * ' ' + (n - x - 1) * '_' + s + y * '_' + (m - y) * ' '
        second_line = x * ' ' + '/' + (n - x - 1 + u + y) * ' ' + '\\' + (m - y - 1) * ' '
        if p < q:
            left_lines += [n * ' '] * (q - p)
        elif q < p:
            right_lines += [m * ' '] * (p - q)
        zipped_lines = zip(left_lines, right_lines)
        lines = [first_line, second_line] + [a + u * ' ' + b for a, b in zipped_lines]
        return lines, n + m + u, max(p, q) + 2, n + u // 2

In [2]:
h = Heap()
h.display()

h.insert(3)
h.display()

h.insert(5)
h.display()

h.insert(7)
h.display()

h.insert(1)
h.display()

h.insert(19)
h.display()

h.display_array()

print('Popping %r' % h.pop())
h.display()

print('Popping %r' % h.pop())
h.display()

print('Popping %r' % h.pop())
h.display()

print('Popping %r' % h.pop())
h.display()

print('Popping %r' % h.pop())
h.display()

h.display_array()


3
 5
/ 
3 
 7 
/ \
3 5
  7 
 / \
 3 5
/   
1   
  _19 
 /   \
 7   5
/ \   
1 3   
[19, 7, 5, 1, 3] (5)
Popping 19
  7 
 / \
 3 5
/   
1   
Popping 7
 5 
/ \
3 1
Popping 5
 3
/ 
1 
Popping 3
1
Popping 1

[] (0)


In [3]:
import random

h = Heap()
for _ in range(50):
    h.insert(random.randint(0, 100))
h.display()

for _ in range(20):
    h.pop()
h.display()

                              ____________________________99____________________               
                             /                                                  \              
              ______________99_______________                             _____92_______       
             /                               \                           /              \      
       _____88_______                 ______93______                ____87__         __79___   
      /              \               /              \              /        \       /       \  
   __68__         __78___         __84___        __87__         __84___    79_     64_     24_ 
  /      \       /       \       /       \      /      \       /       \  /   \   /   \   /   \
 61_    63_     72_     75_     81_     79_    69_    74_     65_     80  8  43  60  56  19  13
/   \  /   \   /   \   /   \   /   \   /   \  /   \  /   \   /   \   /                         
9  19  7  24  27  66  37  17  69  65  39

## Application: Heapsort

We build a heap from the list, then pop the elements one by one. Because this is a max-heap, the result is sorted in descending order.

In [4]:
def heap_sort(lst):
    h = Heap()
    for x in lst:
        h.insert(x)
    return [h.pop() for _ in range(h.len)]

In [5]:
import random

values = []
for _ in range(100):
    values.append(random.randint(1, 1000))
print(values)

[266, 266, 587, 596, 819, 527, 532, 314, 556, 691, 514, 384, 685, 99, 49, 907, 796, 602, 292, 747, 212, 676, 439, 53, 438, 572, 857, 704, 635, 84, 138, 6, 196, 479, 351, 748, 833, 871, 35, 992, 736, 128, 604, 755, 989, 84, 539, 971, 775, 120, 997, 241, 670, 763, 815, 97, 256, 287, 698, 510, 230, 450, 203, 64, 833, 596, 736, 66, 533, 700, 497, 263, 281, 864, 923, 666, 951, 267, 245, 760, 668, 472, 823, 287, 125, 620, 544, 320, 793, 310, 336, 838, 649, 902, 443, 633, 482, 684, 662, 637]


In [6]:
print(heap_sort(values))

[997, 992, 989, 971, 951, 923, 907, 902, 871, 864, 857, 838, 833, 833, 823, 819, 815, 796, 793, 775, 763, 760, 755, 748, 747, 736, 736, 704, 700, 698, 691, 685, 684, 676, 670, 668, 666, 662, 649, 637, 635, 633, 620, 604, 602, 596, 596, 587, 572, 556, 544, 539, 533, 532, 527, 514, 510, 497, 482, 479, 472, 450, 443, 439, 438, 384, 351, 336, 320, 314, 310, 292, 287, 287, 281, 267, 266, 266, 263, 256, 245, 241, 230, 212, 203, 196, 138, 128, 125, 120, 99, 97, 84, 84, 66, 64, 53, 49, 35, 6]


## Application: Priority queues

Priority queues support the following operations:

* pop_element_with_highest_priority()
* insert_element_with_priority(element, priority)
* is_empty()

A stack can be seen as a priority queue where each inserted element gets a higher priority than the previous one.

A queue can be seen as a priority queue where each inserted element gets a lower priority than the previous one.
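
The stack and queue analogy can be made concrete with a small sketch. The `PriorityQueue` class below is a hypothetical wrapper written for this example, built on the standard `heapq` module; since `heapq` is a min-heap, priorities are negated so that the highest priority pops first. It assumes priorities are unique, as they are here.

```python
import heapq
import itertools

class PriorityQueue:
    """Max-priority queue on top of heapq (a min-heap), so priorities are negated."""

    def __init__(self):
        self._heap = []

    def insert(self, element, priority):
        heapq.heappush(self._heap, (-priority, element))

    def pop(self):
        """Removes and returns the element with the highest priority."""
        return heapq.heappop(self._heap)[1]

    def is_empty(self):
        return not self._heap

# Stack: each new element gets a higher priority than the previous one.
counter = itertools.count()
stack = PriorityQueue()
for x in 'abc':
    stack.insert(x, next(counter))
print([stack.pop() for _ in range(3)])  # ['c', 'b', 'a']

# Queue: each new element gets a lower priority than the previous one.
counter = itertools.count()
queue = PriorityQueue()
for x in 'abc':
    queue.insert(x, -next(counter))
print([queue.pop() for _ in range(3)])  # ['a', 'b', 'c']
```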

## Heapify in linear time

The above implementation of heapsort builds the heap by inserting, i.e. sifting new values up, which is O(n log n). We can heapify in O(n) instead, by treating the unsorted array as a complete binary tree and sifting each node down, working from the bottom level up to the root.
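
A rough sketch of why the bottom-up approach is linear: let h be the number of levels, so that 2^(h-1) ≤ n. Level k (counting from 0 at the root) holds at most 2^k nodes, and a node on that level can sink at most h-1-k levels. Summing the worst case over all levels, and substituting j = h-1-k:

$$
\sum_{k=0}^{h-1} 2^k \,(h-1-k) \;=\; 2^{h-1} \sum_{j=0}^{h-1} \frac{j}{2^j} \;\le\; 2^{h-1} \cdot 2 \;=\; 2^h \;\le\; 2n
$$

Most nodes sit near the bottom, where sifting down is cheap. Sifting up is the opposite: it is most expensive for exactly those nodes, which is where the extra log n factor comes from.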

In [7]:
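def heapify(array):
    """Rearranges array in place into a max-heap and returns it."""
    n = len(array)
    # Number of levels in the tree: floor(log2(n)) + 1 for n >= 1, 0 when empty.
    h = n.bit_length()
    
    # Start at the level above the leaves: leaves are already valid heaps.
    for k in range(h - 2, -1, -1):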
        for i in range(2 ** k):
            idx = 2 ** k + i - 1
            done = False
            while not done:
                left = 2 * idx + 1
                right = 2 * idx + 2
                if left >= n and right >= n:
                    done = True
                    continue
                elif right >= n:
                    if array[left] > array[idx]:
                        array[left], array[idx] = array[idx], array[left]
                        idx = left
                        continue
                    else:
                        done = True
                        continue
                else:
                    if array[left] > array[right]:
                        if array[left] > array[idx]:
                            array[left], array[idx] = array[idx], array[left]
                            idx = left
                            continue
                        else:
                            done = True
                            continue
                    else:
                        if array[right] > array[idx]:
                            array[right], array[idx] = array[idx], array[right]
                            idx = right
                            continue
                        else:
                            done = True
                            continue
    
    return array

In [8]:
import random

values = []
for _ in range(50):
    values.append(random.randint(1, 100))

h = Heap()
h.array = values # not a heap
h.len = len(values)
h.display()

h.array = heapify(values)
h.display()

                              ______________________________86____________________               
                             /                                                    \              
                _____________7_______________                               ______7_______       
               /                             \                             /              \      
        ______26_______               ______90_______                 ____17___        __81___   
       /               \             /               \               /         \      /       \  
    __11___         __96__        __66___         __12___         __37___     67_    74_     52_ 
   /       \       /      \      /       \       /       \       /       \   /   \  /   \   /   \
  70_     28_     37_    28_    39_     89_     40_     40_     18_     80  67  19 45  68  13  76
 /   \   /   \   /   \  /   \  /   \   /   \   /   \   /   \   /   \   /                         
18  15  42  64  89  
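
Besides displaying the tree, the result of a heapify routine can be checked directly against the heap property. The function below is a minimal sketch; the name `is_max_heap` is made up for this example, and it works on any plain list.

```python
def is_max_heap(array):
    """Returns True if every element is <= its parent at index (i - 1) // 2."""
    return all(array[(i - 1) // 2] >= array[i] for i in range(1, len(array)))

print(is_max_heap([19, 7, 5, 1, 3]))  # True
print(is_max_heap([3, 5, 7, 1, 19]))  # False
```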

## Variants

A 2-3 heap is like a 2-3 tree, but with the heap property. A 2-3 tree has 2-nodes, which have 1 value and 2 children (unless they're leaves), and 3-nodes, which have 2 values and 3 children. Searching in a 2-3 tree is similar to searching in a BST.

A beap, or bi-parental heap, is a heap where most nodes have two parents and two children. Its height is O(√n), so in a min-beap operations such as finding the max or searching for a value are O(√n). Finding the min is still O(1).

Binomial heaps (named after binomial coefficients, because a binomial tree of order n has C(n, k) nodes at depth k) have O(1) amortized insert.

Fibonacci heaps allow O(1) merge, find min, and insert, O(1) amortized decrease key, and O(log n) amortized delete min.