# Running Median

## Problem

The median of a dataset of integers is the midpoint value of the dataset for which an equal number of integers are less than and greater than the value. To find the median, you must first sort your dataset of integers in non-decreasing order, then:

- If your dataset contains an odd number of elements, the median is the middle element of the sorted sample. In the sorted dataset $\{1, 2, 3\}$, $2$ is the median.
- If your dataset contains an even number of elements, the median is the average of the two middle elements of the sorted sample. In the sorted dataset $\{1, 2, 3, 4\}$, $\frac{2 + 3}{2} = 2.5$ is the median.

Given an input stream of $n$ integers, you must perform the following task for each $i^{th}$ integer:

1. Add the $i^{th}$ integer to a running list of integers.
2. Find the median of the updated list (i.e., for the first element through the $i^{th}$ element).
3. Print the list's updated median on a new line. The printed value must be a double-precision number scaled to 1 decimal place (i.e., $12.3$ format).
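
As a baseline for the three steps above, here is a minimal sketch that recomputes the median from scratch after every insertion. The function name `running_medians_naive` is hypothetical and not part of the original notebook. It relies on `statistics.median` from the standard library, which sorts the data on every call, so it is only meant to make the expected output concrete, not to be efficient.

```python
import statistics


def running_medians_naive(stream):
    """Return the median after each insertion, recomputed from scratch."""
    seen = []
    medians = []
    for value in stream:
        # Step 1: add the integer to the running list
        seen.append(value)
        # Step 2: find the median of the updated list (sorts internally)
        medians.append(float(statistics.median(seen)))
    return medians


# Step 3: print each median with 1 decimal place
for m in running_medians_naive([12, 4, 5, 3, 8, 7]):
    print(f"{m:.1f}")
```

For the stream `12, 4, 5, 3, 8, 7` this prints `12.0`, `8.0`, `5.0`, `4.5`, `5.0`, `6.0`, one value per line.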

## Solution 1

This solution uses two binary heaps, a well-known data structure.

In [1]:
import operator

In [2]:
class Heap:
    ''' A simple binary heap that supports insertion and root removal.
        Items are stored from index 1; index 0 is an unused placeholder.
    '''

    def __init__(self, comparator=operator.lt):
        '''
            comparator(p, c): comparator function, returns True if
            parent p and child c violate the heap order and must be
            swapped (operator.lt gives a max-heap, operator.gt a min-heap)
        '''
        self.heapList = [None]
        self.compare = comparator

    def swim(self, k):
        heapList = self.heapList
        while k > 1 and self.compare(heapList[k // 2], heapList[k]):
            heapList[k // 2], heapList[k] = heapList[k], heapList[k // 2]
            k = k // 2

    def sink(self, k):
        heapList = self.heapList
        N = len(heapList) - 1
        while 2 * k <= N:
            j = 2 * k
            if j < N and self.compare(heapList[j], heapList[j + 1]):
                j += 1
            if not self.compare(heapList[k], heapList[j]):
                break
            heapList[k], heapList[j] = heapList[j], heapList[k]
            k = j

    def insert(self, item):
        self.heapList.append(item)
        self.swim(len(self.heapList) - 1)

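    def removeRoot(self):
        heapList = self.heapList
        if len(heapList) < 2:
            return None  # empty heap: nothing to remove
        old_root = heapList[1]
        last = heapList.pop()
        if len(heapList) > 1:
            # Move the last leaf to the root and restore the heap order
            heapList[1] = last
            self.sink(1)
        return old_root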

    def getRoot(self):
        if len(self.heapList) > 1:
            return self.heapList[1]
        else: 
            return None

    def getSize(self):
        # Number of stored items; index 0 holds an unused placeholder
        return len(self.heapList) - 1

The idea is to split the list in half and maintain each half in a heap. The heap holding the lower half keeps its maximum at the root, and the heap holding the upper half keeps its minimum at the root. Two invariants are maintained: the root of the lower-half heap is less than or equal to the root of the upper-half heap, and the sizes of the two heaps differ by at most 1 element. The median is then the mean of the two roots when both heaps have the same number of elements, or otherwise the root of the heap with more elements.
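
To make the invariant concrete, the sketch below splits an already sorted list into a lower and an upper half and reads the median off the two values that the heap roots would hold. The function `median_from_halves` is a hypothetical helper for illustration only. It uses plain lists with `max` and `min` instead of heaps, assumes a non-empty input, and always gives the extra element to the upper half, whereas in the heap solution either half may hold it.

```python
def median_from_halves(values):
    """Median of a non-empty list, read from the edges of its two halves."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    # lower plays the role of the max-heap, upper the role of the min-heap;
    # with an odd count, upper receives the extra element
    lower, upper = ordered[:mid], ordered[mid:]
    if len(lower) == len(upper):
        # Same size: mean of the largest lower value and the smallest upper value
        return (max(lower) + min(upper)) / 2
    # Sizes differ by one: the "root" of the larger half is the median
    return float(min(upper))


print(median_from_halves([12, 4, 5, 3]))     # halves [3, 4] and [5, 12]
print(median_from_halves([12, 4, 5, 3, 8]))  # halves [3, 4] and [5, 8, 12]
```

The heaps give access to the same two boundary values without keeping the whole list sorted.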

In [3]:
class ListWithMedian:
    """
        A list that provides the running median as you insert elements in it
    """

    def __init__(self):
        self.maxHeap = Heap(operator.lt)
        self.minHeap = Heap(operator.gt)
        self.N = 0

    def balanceHeaps(self):
        """
        Auxiliary method that keeps the sizes of the two heaps
        within 1 element of each other
        """
        maxHeap = self.maxHeap
        minHeap = self.minHeap
        if (maxHeap.getSize() + 1) < minHeap.getSize():
            maxHeap.insert(minHeap.removeRoot())
        elif maxHeap.getSize() > (minHeap.getSize() + 1):
            minHeap.insert(maxHeap.removeRoot())

    def insert(self, item):
        """
        Inserts an item in the list of numbers and returns the median
        """
        maxHeap = self.maxHeap
        minHeap = self.minHeap

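        # Item is strictly below the lower half's maximum
        if (maxHeap.getRoot() is not None) and item < maxHeap.getRoot():
            maxHeap.insert(item)
        # Item is strictly above the upper half's minimum
        elif (minHeap.getRoot() is not None) and item > minHeap.getRoot():
            minHeap.insert(item)
        # Otherwise the item lies between the two roots (or the heaps are empty)
        elif maxHeap.getSize() == minHeap.getSize():
            # Equal sizes: the new item is itself the median
            median = float(item)
            maxHeap.insert(item)
            return median
        elif maxHeap.getSize() < minHeap.getSize():
            # Fill the smaller heap so both end up with the same size
            maxHeap.insert(item)
            median = (maxHeap.getRoot() + minHeap.getRoot()) / 2
            return median
        elif maxHeap.getSize() > minHeap.getSize():
            minHeap.insert(item)
            median = (maxHeap.getRoot() + minHeap.getRoot()) / 2
            return median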

        self.balanceHeaps()

        if maxHeap.getSize() == minHeap.getSize():
            median = (maxHeap.getRoot() + minHeap.getRoot()) / 2
            return median
        elif maxHeap.getSize() > minHeap.getSize():
            # Return a float so every branch yields the same type
            median = float(maxHeap.getRoot())
            return median
        elif maxHeap.getSize() < minHeap.getSize():
            median = float(minHeap.getRoot())
            return median

In [4]:
import unittest

class TestRunningMedian(unittest.TestCase):    

    def test_RunningMedian_example_1(self):
        """Test Running median."""
        testlist = ListWithMedian()
        actual = []
        actual.append(testlist.insert(12))
        actual.append(testlist.insert(4))
        actual.append(testlist.insert(5))
        actual.append(testlist.insert(3))
        actual.append(testlist.insert(8))
        actual.append(testlist.insert(7))
        
        expected = [12.0, 8.0, 5.0, 4.5, 5.0, 6.0]
        self.assertEqual(expected, actual)

    
if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False)

.
----------------------------------------------------------------------
Ran 1 test in 0.003s

OK


## Discussion

I chose to implement the heap myself as an exercise. It would be possible to use Python's `heapq` module as an alternative, although it does not provide functions as flexible as the object I implemented.
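
For comparison, here is a minimal sketch of the `heapq` alternative. The class name `HeapqRunningMedian` and its attributes are hypothetical and not part of the solution above. It uses only `heapq.heappush` and `heapq.heappop`, which operate on a min-heap, and stores negated values in the lower half to get max-heap behavior. Unlike `ListWithMedian`, it always routes a new item through the lower half and keeps any extra element there.

```python
import heapq


class HeapqRunningMedian:
    """Running median with two heapq-managed lists (illustrative sketch)."""

    def __init__(self):
        self._lower = []  # lower half, stored negated so the root is its maximum
        self._upper = []  # upper half, a regular min-heap

    def insert(self, item):
        # Push onto the lower half, then move its largest value to the
        # upper half so that every lower value <= every upper value
        heapq.heappush(self._lower, -item)
        heapq.heappush(self._upper, -heapq.heappop(self._lower))
        # Rebalance: the lower half holds the same count or one more
        if len(self._upper) > len(self._lower):
            heapq.heappush(self._lower, -heapq.heappop(self._upper))
        if len(self._lower) > len(self._upper):
            return float(-self._lower[0])
        return (-self._lower[0] + self._upper[0]) / 2


medians = HeapqRunningMedian()
for value in [12, 4, 5, 3, 8, 7]:
    print(f"{medians.insert(value):.1f}")
```

On the test stream used earlier this produces the same medians as `ListWithMedian`: `12.0`, `8.0`, `5.0`, `4.5`, `5.0`, `6.0`.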

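Inserting into a heap takes O(log n) time in the worst case, when the item has to swim all the way up. Removing the root costs the same, when the last element has to sink all the way down. Each call to `insert` on `ListWithMedian` performs a constant number of these operations, and reading the median from the roots takes O(1), so computing the running median over n elements is an O(n log n) algorithm overall.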