## RUNNING MEDIAN

Compute the running median of a sequence of numbers. That is, given a stream of numbers, print out the median of the list so far on each new element. Recall that the median of an even-length list is the average of the two middle numbers. For example, given the sequence [2, 1, 5, 7, 2, 0, 5], your algorithm should print out:

- 2 [2] --> the median of a single number is that number
- 1.5 [2,1] --> the median is (1+2)/2 = 3/2 = 1.5
- 2 [2,1,5] --> the median is 2
- 3.5 [2,1,5,7] --> the median is (2+5)/2 = 7/2 = 3.5
- 2 [2,1,5,7,2] --> the median is 2, and so on for the rest of the stream

I guess the idea that first comes to mind is to sort the list, and then take the sorted list and find the median, of course without using any built-in median command from Python. To do this manually, it's probably about looking at the length of the list and then finding the index position of the median, or averaging the two middle numbers if the length is even. OK, let's try that and see where this problem goes.


In [12]:
t_list = [2, 1, 5, 7, 2, 0, 5]

# a small note on the difference between list.sort() and sorted( list )
print(sorted(t_list)) # returns a new, sorted list
print(t_list) # so you can see that t_list is not changed

[0, 1, 2, 2, 5, 5, 7]
[2, 1, 5, 7, 2, 0, 5]


In [14]:
# list.sort() doesn't return a new list, it sorts the list in place and returns None
t_list = [2, 1, 5, 7, 2, 0, 5]

print(t_list)
print(t_list.sort()) # returns None
print(t_list) 

[2, 1, 5, 7, 2, 0, 5]
None
[0, 1, 2, 2, 5, 5, 7]


In [33]:
# OK let's find the median of this ... odd length list
t_list = [2, 1, 5, 7, 2, 0, 5]

# odd-length list... so len(t_list) is 7 ... and we are looking for index position 3

index_median = len(t_list) // 2 # floor division rounds down: 7 // 2 = 3
median = sorted(t_list)[index_median]
print(median) 

2


In [47]:
# Let's look at the median of an even-length list... should be straightforward
t_list_even = [2,1,5,6] # the median here is (2+5)/2 = 3.5

index_median = len(t_list_even) // 2 - 1 # this gets us to 4 // 2 - 1 = index position 1
ordered = sorted(t_list_even) # sort once and reuse
median = (ordered[index_median] + ordered[index_median + 1])/2
print(median) 

3.5


In [49]:
# Great, now combine the two ways ..
# sort once up front so both branches can reuse the sorted list

def running_median(numeric_list):
    ordered = sorted(numeric_list)
    middle = len(ordered) // 2 # floor division gives the upper-middle index

    if len(ordered) % 2 == 0: # then this is even: average the two middle numbers
        return (ordered[middle - 1] + ordered[middle])/2
    else: # then this is odd: take the single middle number
        return ordered[middle]


In [53]:
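# a notebook cell only displays its last expression, so print each result
print(running_median([2,1,5,6])) # should return 3.5
print(running_median([2, 1, 5, 7, 2, 0, 5])) # should return 2
print(running_median([5])) # should return 5

3.5
2
5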

### Now that we've finished the simple version, we need to figure out how to run this through a list and deliver sequential results as it works its way through the list


In [110]:
def running_median_hard(hard_list):
    '''
    - the hard_list[: run_index] slice is the trickiest part of running_median_hard
    - remember that if you have [1,8,9] then [1,8,9][: 2] => [1,8]
    - slice the list in arrival order (not a pre-sorted copy), otherwise the
      prefixes are not what the stream has actually delivered so far
    '''
    for i in range(len(hard_list)):
        run_index = i + 1
        mini_call = hard_list[: run_index] # everything seen so far
        median = running_median(mini_call) # running_median sorts the prefix itself
        print(str(median) + " ... on list " + str(mini_call))


In [111]:
running_median_hard([15,1,8,9]) # the final median here is (8+9)/2 = 17/2 = 8.5

15 ... on list [15]
8.0 ... on list [15, 1]
8 ... on list [15, 1, 8]
8.5 ... on list [15, 1, 8, 9]
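
Each call to running_median sorts its prefix from scratch, which repeats a lot of work. One possible variation, sketched here rather than taken from the original notebook, keeps a single sorted list and inserts each new number at its sorted position with `bisect.insort`. The name `running_median_sorted` is made up for this sketch.

```python
from bisect import insort


def running_median_sorted(stream):
    """Print the running median while keeping one sorted list up to date."""
    ordered = []  # sorted copy of everything seen so far
    medians = []  # collected so the results can also be inspected afterwards
    for num in stream:
        insort(ordered, num)  # insert num at its sorted position
        middle = len(ordered) // 2
        if len(ordered) % 2 == 1:
            median = ordered[middle]  # odd length: single middle number
        else:
            median = (ordered[middle - 1] + ordered[middle]) / 2  # even length
        medians.append(median)
        print(median)
    return medians


running_median_sorted([2, 1, 5, 7, 2, 0, 5])
```

The insert still has to shift list elements, so this is not as fast as the heap approach, but it avoids a full sort on every new element.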


### Let's use heaps to get this done as well (alternative solution)
The max-heap holds the smaller half of the numbers seen so far and the min-heap holds the larger half. When we encounter a new element from the stream, we first add it to one of the heaps: the max-heap if the element is smaller than the current median, or the min-heap if it's bigger. The max-heap is the default heap if the element equals the median or if there are no elements yet.

We then rebalance if necessary by moving the root of the larger heap to the smaller one. This is only necessary if one heap is larger than the other by more than 1 element.

The median is just the root of the larger heap, or the average of the two roots if the heaps are of equal size.

In [1]:
import heapq

# Both heaps are plain lists managed by heapq. heapq only provides a min-heap,
# so max_heap stores negated values and its root is read back as -max_heap[0].

def get_median(min_heap, max_heap):
    if len(min_heap) > len(max_heap):
        return min_heap[0] # root of the min-heap
    elif len(min_heap) < len(max_heap):
        return -max_heap[0] # root of the max-heap (undo the negation)
    else:
        min_root = min_heap[0]
        max_root = -max_heap[0]
        return (min_root + max_root)/2


def add(num, min_heap, max_heap):
    # if both heaps are empty, then just add it to the max heap
    if not min_heap and not max_heap:
        heapq.heappush(max_heap, -num)
        return

    median = get_median(min_heap, max_heap)
    if num > median:
        # bigger than the median: add it to the min heap
        heapq.heappush(min_heap, num)
    else:
        heapq.heappush(max_heap, -num)

def rebalance(min_heap, max_heap):
    # move one root across when the sizes differ by more than 1
    if len(min_heap) > len(max_heap) + 1:
        root = heapq.heappop(min_heap)
        heapq.heappush(max_heap, -root)
    elif len(max_heap) > len(min_heap) + 1:
        root = -heapq.heappop(max_heap)
        heapq.heappush(min_heap, root)

def print_median(min_heap, max_heap):
    print(get_median(min_heap, max_heap))

def running_median(stream):
    min_heap = []
    max_heap = []
    for num in stream:
        add(num, min_heap, max_heap)
        rebalance(min_heap, max_heap)
        print_median(min_heap, max_heap)
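
To check the two-heap idea against the example at the top, here is a compact, self-contained sketch that yields the medians instead of printing them. The names `running_medians`, `low` and `high` are invented for this sketch, and it assumes Python's `heapq` module with negated values standing in for a max-heap. `statistics.median` is used only as a brute-force reference for the comparison, not as part of the solution.

```python
import heapq
from statistics import median


def running_medians(stream):
    """Yield the median after each element, following the two-heap rules."""
    low = []   # max-heap of the smaller half, stored as negated values
    high = []  # min-heap of the larger half
    current = None  # median of everything seen so far
    for num in stream:
        # Max-heap is the default: used when empty or when num <= median.
        if current is None or num <= current:
            heapq.heappush(low, -num)
        else:
            heapq.heappush(high, num)
        # Rebalance when one heap is more than one element larger.
        if len(low) > len(high) + 1:
            heapq.heappush(high, -heapq.heappop(low))
        elif len(high) > len(low) + 1:
            heapq.heappush(low, -heapq.heappop(high))
        # Median: root of the larger heap, or the average of both roots.
        if len(low) > len(high):
            current = -low[0]
        elif len(high) > len(low):
            current = high[0]
        else:
            current = (high[0] - low[0]) / 2
        yield current


data = [2, 1, 5, 7, 2, 0, 5]
heap_result = list(running_medians(data))
brute_force = [median(data[: i + 1]) for i in range(len(data))]
print(heap_result)
print(heap_result == brute_force)  # the two approaches should agree
```

Each push or pop on a heap costs O(log n), so the whole stream takes O(n log n), compared with re-sorting every prefix in the simple version.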
        

    
