# Heaps

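Heaps are binary trees (more precisely, complete binary trees, usually stored in an array) that satisfy a specific ordering criterion, the heap property:
- in min-heaps, every child node is greater than or equal to its parent, so the root holds the minimum
- in max-heaps, every child node is less than or equal to its parent, so the root holds the maximum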
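To make the ordering criterion concrete: Python's `heapq` stores a min-heap in a plain list, where the children of the node at index `i` sit at indices `2*i + 1` and `2*i + 2`. The helper below, `is_min_heap`, is a hypothetical name and not part of `heapq`. It is a minimal sketch that checks the min-heap property on such a list.

```python
from heapq import heapify

def is_min_heap(a):
    """Return True if list `a` satisfies the min-heap property.

    Hypothetical helper using heapq's array layout: the children of
    index i live at indices 2*i + 1 and 2*i + 2.
    """
    n = len(a)
    for i in range(n):
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and a[child] < a[i]:
                return False
    return True

values = [6, 2, 9, 4, 7, 1]
print(is_min_heap(values))  # False: 2 (index 1) is smaller than its parent 6
heapify(values)
print(is_min_heap(values))  # True
print(values[0])            # 1, the minimum sits at the root
```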

The heap property makes this data structure useful whenever we need to keep track of the minimum or maximum of a collection that changes over time. For a heap of N elements, the operations cost:

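- push: O(log N)
- pop: O(log N)
- get min/max (peek at the root, `heap[0]` in Python): O(1)
- heapify (i.e. turn an array into a heap in place): O(N)
- nlargest (the n largest of N items): O(N log n)
- nsmallest (the n smallest of N items): O(N log n)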
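The push, pop and peek costs listed above come from the tree shape. A complete binary tree with N nodes has height of about log2(N), and push or pop only walks a single root-to-leaf path. The class below, `MinHeap`, is a hypothetical name. It is an illustrative sketch of an array-backed min-heap using the same layout as `heapq`, not the `heapq` source itself. Peek reads index 0. Push appends a leaf and sifts it up. Pop moves the last leaf to the root and sifts it down.

```python
class MinHeap:
    """Minimal array-backed min-heap (illustrative sketch, not heapq's code)."""

    def __init__(self):
        self.a = []

    def __len__(self):
        return len(self.a)

    def peek(self):
        # O(1): the minimum always sits at the root, index 0
        return self.a[0]

    def push(self, x):
        # O(log N): append as a leaf, then sift it up toward the root
        a = self.a
        a.append(x)
        i = len(a) - 1
        while i > 0:
            parent = (i - 1) // 2
            if a[i] >= a[parent]:
                break
            a[i], a[parent] = a[parent], a[i]
            i = parent

    def pop(self):
        # O(log N): move the last leaf to the root, then sift it down
        a = self.a
        last = a.pop()  # raises IndexError if the heap is empty
        if not a:
            return last
        smallest, a[0] = a[0], last
        i, n = 0, len(a)
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            m = i
            if left < n and a[left] < a[m]:
                m = left
            if right < n and a[right] < a[m]:
                m = right
            if m == i:
                break
            a[i], a[m] = a[m], a[i]
            i = m
        return smallest


h = MinHeap()
for x in [6, 2, 9, 4, 7, 1]:
    h.push(x)
print(h.peek())                          # 1
print([h.pop() for _ in range(len(h))])  # [1, 2, 4, 6, 7, 9]
```

In practice you would use `heapq`. The class only makes the sift-up and sift-down steps visible.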

```python
from heapq import heapify, heappush, heappop, nlargest, nsmallest
```

```python
heap = [6, 2, 9, 4, 7, 1]
heapify(heap)
for _ in range(len(heap)):
    print(heappop(heap))
```

```
1
2
4
6
7
9
```


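```python
# Same output, but built by pushing one element at a time.
# N pushes cost O(N log N) in total, versus O(N) for heapify().
heap = []
for i in [6, 2, 9, 4, 7, 1]:
    heappush(heap, i)

for _ in range(len(heap)):
    print(heappop(heap))
```

```
1
2
4
6
7
9
```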
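Popping every element in turn, as both cells above do, is heapsort. The function below is a minimal sketch that wraps this pattern. `heap_sort` is a hypothetical name. It works on a copy, so the caller's list is left untouched. Building the heap with `heapify` is O(N) and the N pops cost O(N log N), so the whole sort is O(N log N).

```python
from heapq import heapify, heappop

def heap_sort(values):
    """Return a new ascending list built by heapifying and popping (heapsort)."""
    heap = list(values)  # copy so the input list is not modified
    heapify(heap)        # O(N)
    # N pops at O(log N) each: O(N log N) overall
    return [heappop(heap) for _ in range(len(heap))]

data = [6, 2, 9, 4, 7, 1]
print(heap_sort(data))  # [1, 2, 4, 6, 7, 9]
print(data)             # [6, 2, 9, 4, 7, 1], unchanged
```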


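Note that the `heapq` functions used here (`heapify`, `heappush`, `heappop`) only implement a min-heap. A simple workaround is to negate the values. The smallest negated value corresponds to the largest original value, so a min-heap of negated numbers behaves like a max-heap. Just remember to negate again when popping: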

```python
heap = [6, 2, 9, 4, 7, 1]
heap = [-n for n in heap]
heapify(heap)
for _ in range(len(heap)):
    print(-heappop(heap))
```

```
9
7
6
4
2
1
```
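Negation only works for numbers. For other comparable values, such as strings, one option is a small wrapper whose `<` is reversed, since `heapq` orders items using `<`. The `MaxItem` class below is a hypothetical helper, sketched here under that assumption rather than taken from any standard API.

```python
from dataclasses import dataclass
from heapq import heapify, heappop

@dataclass
class MaxItem:
    """Wrapper that inverts ordering so heapq's min-heap acts as a max-heap."""
    value: object

    def __lt__(self, other):
        # Reversed comparison: the "smallest" wrapper holds the largest value
        return self.value > other.value

words = ["pear", "apple", "fig", "kiwi"]
heap = [MaxItem(w) for w in words]
heapify(heap)
result = [heappop(heap).value for _ in range(len(heap))]
print(result)  # ['pear', 'kiwi', 'fig', 'apple']
```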


## Using nsmallest and nlargest
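In Python, we can simply call `nsmallest()` and `nlargest()` on a list (or any iterable). Under the hood, CPython does not heapify the whole input. Instead, `nlargest(n, ...)` keeps a min-heap of the n largest items seen so far and replaces its root whenever a larger item arrives. At the end it sorts those n items. `nsmallest` mirrors this with a max-heap. This single pass costs O(N log n) rather than the O(N log N) of a full sort. For n == 1 the functions fall back to `min()`/`max()`, and when n is at least the input size they simply sort.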

```python
nums = [6, 2, 9, 4, 7, 1]
print(nsmallest(3, nums))
print(nlargest(3, nums))
```

```
[1, 2, 4]
[9, 7, 6]
```
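The function below sketches the size-n heap idea from the paragraph on `nsmallest` and `nlargest`. It assumes no `key` argument and skips CPython's special cases for small and large n. `my_nlargest` is a hypothetical name, and this is not the CPython source. The root of the size-n min-heap is the weakest of the current top-n, so each new item only needs to be compared against it.

```python
from heapq import heapify, heapreplace, nlargest

def my_nlargest(n, iterable):
    """Return the n largest items in descending order using a size-n min-heap."""
    if n <= 0:
        return []
    it = iter(iterable)
    heap = []
    # Seed the heap with the first n items
    for x in it:
        heap.append(x)
        if len(heap) == n:
            break
    heapify(heap)  # heap[0] is now the smallest of the current top-n
    # Scan the rest: each replacement costs O(log n)
    for x in it:
        if x > heap[0]:
            heapreplace(heap, x)  # pop the root and push x in one step
    return sorted(heap, reverse=True)

nums = [6, 2, 9, 4, 7, 1]
print(my_nlargest(3, nums))  # [9, 7, 6]
print(nlargest(3, nums))     # [9, 7, 6]
```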


## Problem: get the top-k most frequent elements in a list

We could solve this problem with frequency counting + sorting. Counting is O(N), where N is the length of the list, and sorting the U unique elements by count is O(U log U). However, we don't need to sort everything, because we only need the top-k elements. That's why a heap-based solution wins here. The solution shown below counts in O(N) and then selects the top k with `nlargest` in O(U log k), for O(N + U log k) overall.

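```python
from collections import Counter
from heapq import nlargest

def topk(nums, k):
    counts = Counter(nums)  # O(N) frequency count
    # Select the k keys with the highest counts: O(U log k)
    return nlargest(k, counts.keys(), key=lambda n: counts[n])

topk([1, 1, 1, 2, 2, 3], 2)
```

```
[1, 2]
```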
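The same selection can be spelled out with `heappush`/`heappop` on a heap capped at k entries, which makes the O(U log k) bound visible because the heap never grows past k + 1 pairs. `topk_explicit` is a hypothetical name. It is a sketch that assumes the elements are mutually comparable, since tied counts fall back to comparing elements inside the `(count, element)` tuples. Tie-breaking may therefore differ from `nlargest`. For comparison, `Counter.most_common(k)` also returns the top k, as `(element, count)` pairs.

```python
from collections import Counter
from heapq import heappush, heappop

def topk_explicit(nums, k):
    """Return the k most frequent elements, most frequent first."""
    counts = Counter(nums)  # O(N)
    heap = []               # min-heap of (count, element), kept at size <= k
    for elem, cnt in counts.items():
        heappush(heap, (cnt, elem))
        if len(heap) > k:
            heappop(heap)   # drop the least frequent candidate: O(log k)
    ordered = [heappop(heap) for _ in range(len(heap))]  # ascending by count
    return [elem for _, elem in reversed(ordered)]

print(topk_explicit([1, 1, 1, 2, 2, 3], 2))        # [1, 2]
print(Counter([1, 1, 1, 2, 2, 3]).most_common(2))  # [(1, 3), (2, 2)]
```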