## Chapter 10 / Problem 5 - Online Median

You want to compute the running median of a sequence of numbers. The sequence is presented to you in a streaming fashion: you cannot back up to read an earlier value, and you need to output the median after reading in each new element. For example:


| Stream Input | Median | Stream so Far       | Sorted Values          |
|  ---         |  ---   | ---                 | ---                    |
|   1          |   1    | 1                   | **1**                  |
|   0          |  0.5   | 1, 0                | **0, 1**               |
|   3          |   1    | 1, 0, 3             | 0, **1**, 3            |
|   5          |   2    | 1, 0, 3, 5          | 0, **1, 3**, 5         |
|   2          |   2    | 1, 0, 3, 5, 2       | 0, 1, **2**, 3, 5      |
|   0          |   1.5  | 1, 0, 3, 5, 2, 0    | 0, 0, **1, 2**, 3, 5   |
|   1          |   1    | 1, 0, 3, 5, 2, 0, 1 | 0, 0, 1, **1**, 2, 3, 5 |

Design an algorithm for computing the running median of a sequence.

***Hint***: *Avoid looking at all values each time you read a new value.*
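
For contrast with the hint, here is a minimal brute-force baseline that does look at every value on every step. It is only a reference for checking the Median column above, not the intended solution; the function name `brute_force_running_median` is made up for this sketch.

```python
import statistics


def brute_force_running_median(sequence):
    """Baseline: recompute the median from scratch after every element."""
    seen = []
    medians = []
    for x in sequence:
        seen.append(x)
        # statistics.median sorts a copy of `seen`, so every step touches
        # all values read so far.
        medians.append(statistics.median(seen))
    return medians


print(brute_force_running_median([1, 0, 3, 5, 2, 0, 1]))
```

Each step re-sorts the whole prefix, so the cost per element grows with the length of the stream. That growth is what the hint asks you to avoid.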


### Solution

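It's important to remember that the median is the **middle** of the sorted stream. We don't want to re-sort the stream every time it grows, so we need to maintain the **middle** incrementally. Two heaps make this easy: a min-heap holding the larger half of the values and a max-heap holding the smaller half. The max-heap is implemented as a min-heap of negated values, which is why the Max Heap column below shows negated numbers. The tops of the two heaps are then the middle values, and each new element costs O(log n) heap work instead of a full sort. Let's take a look at the same stream, only with two heaps included: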

| Stream Input | Median | Stream so Far       | Sorted Values           | Min Heap       | Max Heap (negated) |
|  ---         |  ---   | ---                 | ---                     | ---            | ---                |
|   1          |   1    | 1                   | **1**                   | **1**          | none               |
|   0          |  0.5   | 1, 0                | **0, 1**                | **1**          | **0**              |
|   3          |   1    | 1, 0, 3             | 0, **1**, 3             | **1**, 3       | 0                  |
|   5          |   2    | 1, 0, 3, 5          | 0, **1, 3**, 5          | **3**, 5       | **-1**, 0          |
|   2          |   2    | 1, 0, 3, 5, 2       | 0, 1, **2**, 3, 5       | **2**, 3, 5    | -1, 0              |
|   0          |   1.5  | 1, 0, 3, 5, 2, 0    | 0, 0, **1, 2**, 3, 5    | **2**, 3, 5    | **-1**, 0, 0       |
|   1          |   1    | 1, 0, 3, 5, 2, 0, 1 | 0, 0, 1, **1**, 2, 3, 5 | **1**, 2, 3, 5 | -1, 0, 0           |
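
The Max Heap column holds negated values because Python's `heapq` module only provides a min-heap. A minimal sketch of that trick in isolation follows; the sample values and variable names are made up for illustration and are not part of the solution.

```python
import heapq

values = [4, 9, 2]  # arbitrary sample values for illustration

# Store each value negated so the min-heap's root corresponds to the
# largest original value.
max_heap = []
for value in values:
    heapq.heappush(max_heap, -value)

largest = -max_heap[0]             # peek at the maximum without removing it
popped = -heapq.heappop(max_heap)  # remove and return the maximum

print(largest, popped, sorted(max_heap))
```

Negating on the way in and again on the way out is the only bookkeeping needed. The stored numbers stay negated, which is why subtracting the max-heap's root from the min-heap's root yields the sum of the two middle values.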

```python
import heapq


def online_median(sequence):
    """Return a list with the running median after each element of sequence."""
    # min_heap stores the larger half of the values seen so far.
    min_heap = []
    # max_heap stores the smaller half, negated, because heapq is a min-heap.
    max_heap = []
    result = []
    for x in sequence:
        # Push x into min_heap, then move the smallest value to max_heap.
        heapq.heappush(max_heap, -heapq.heappushpop(min_heap, x))
        # Ensure min_heap and max_heap have equal number of elements if an even
        # number of elements is read; otherwise, min_heap must have one more
        # element than max_heap.
        if len(max_heap) > len(min_heap):
            heapq.heappush(min_heap, -heapq.heappop(max_heap))
        if len(min_heap) == len(max_heap):
            # Even count: average the two middle values (max_heap[0] is negated).
            result.append((min_heap[0] - max_heap[0]) / 2)
        else:
            # Odd count: the extra element in min_heap is the median.
            result.append(min_heap[0])
    return result


def run_tests():
    tests = [
        ([1, 0, 3, 5, 2, 0, 1], [1, 0.5, 1, 2, 2, 1.5, 1]),
        ([5, 4, 3, 2, 1], [5, 4.5, 4, 3.5, 3]),
        ([1, 2, 3, 4, 5], [1, 1.5, 2, 2.5, 3])
    ]
    for stream, answer in tests:
        print(f"stream: {stream}, answer: {answer}")
        assert online_median(iter(stream)) == answer


run_tests()
```

```text
stream: [1, 0, 3, 5, 2, 0, 1], answer: [1, 0.5, 1, 2, 2, 1.5, 1]
stream: [5, 4, 3, 2, 1], answer: [5, 4.5, 4, 3.5, 3]
stream: [1, 2, 3, 4, 5], answer: [1, 1.5, 2, 2.5, 3]
```
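
Because the problem asks for a median after every element, it can be convenient to wrap the same two-heap technique in an object that accepts one value at a time. The sketch below is one possible arrangement, not the notebook's own code: the class name `RunningMedian`, its `add` method, and the attribute names are all hypothetical. It also cross-checks the result against `statistics.median` on seeded random data.

```python
import heapq
import random
import statistics


class RunningMedian:
    """Hypothetical incremental wrapper around the two-heap technique."""

    def __init__(self):
        self._upper = []  # min-heap: larger half of the values
        self._lower = []  # max-heap via negated values: smaller half

    def add(self, x):
        """Insert x and return the median of everything added so far."""
        heapq.heappush(self._lower, -heapq.heappushpop(self._upper, x))
        # Keep the sizes equal, or give _upper exactly one extra element.
        if len(self._lower) > len(self._upper):
            heapq.heappush(self._upper, -heapq.heappop(self._lower))
        if len(self._upper) == len(self._lower):
            return (self._upper[0] - self._lower[0]) / 2
        return self._upper[0]


# Cross-check against statistics.median on random integers
# (seeded so the run is repeatable).
rng = random.Random(0)
values = [rng.randint(-50, 50) for _ in range(200)]
tracker = RunningMedian()
for i, v in enumerate(values, start=1):
    assert tracker.add(v) == statistics.median(values[:i])
print("all running medians match")
```

Each call to `add` does a constant number of heap operations, so the per-element cost stays logarithmic in the number of values seen, and the caller never needs the full stream in hand.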
