## [Find Median from Running Data Stream](https://www.geeksforgeeks.org/median-of-stream-of-integers-running-integers/)

#### Given integers read from a data stream, find the median of the elements read so far efficiently.

The median depends on whether the number of elements read so far is odd or even.

If the count is odd, the median is the middle element in sorted order.
If the count is even, there is no single middle value, and the median is the arithmetic mean of the two middle values.
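
The two cases can be sketched as a small helper. The name `median_of_sorted` is hypothetical, and the sketch assumes its input is a non-empty list that is already sorted.

```python
def median_of_sorted(values):
    """Return the median of a non-empty, already sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        # odd count: the single middle element
        return values[mid]
    # even count: arithmetic mean of the two middle elements
    return (values[mid - 1] + values[mid]) / 2


print(median_of_sorted([1, 5, 15]))     # odd count
print(median_of_sorted([1, 3, 5, 15]))  # even count
```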

- Example 1:
    - Input Data Stream: 5, 15, 1, 3
    - Output: 5, 10, 5, 4
    - Explanation:
        - After reading 1st element of stream – 5 -> median = 5
        - After reading 2nd element of stream – 5, 15 -> median = (5+15)/2 = 10
        - After reading 3rd element of stream – 5, 15, 1 -> median = 5
        - After reading 4th element of stream – 5, 15, 1, 3 -> median = (3+5)/2 = 4
- Example 2:
    - Input Data Stream: 2, 2, 2, 2
    - Output: 2, 2, 2, 2
    - Explanation:
        - After reading 1st element of stream – 2 -> median = 2
        - After reading 2nd element of stream – 2, 2 -> median = (2+2)/2 = 2
        - After reading 3rd element of stream – 2, 2, 2 -> median = 2
        - After reading 4th element of stream – 2, 2, 2, 2 -> median = (2+2)/2 = 2
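
As a baseline for checking the examples above, the running medians can be recomputed by taking the median of every prefix with the standard library's `statistics.median`. This is a sketch for verification only. The function name `running_medians_naive` is an assumption, not from the original, and because each prefix is re-sorted it is far too slow for a real stream.

```python
from statistics import median


def running_medians_naive(stream):
    """Median of each prefix stream[:1], stream[:2], ... (reference only)."""
    seen = []
    result = []
    for value in stream:
        seen.append(value)
        # statistics.median sorts a copy of the data on every call
        result.append(median(seen))
    return result


print(running_medians_naive([5, 15, 1, 3]))  # Example 1
print(running_medians_naive([2, 2, 2, 2]))   # Example 2
```

The printed values should agree with the outputs listed in Example 1 and Example 2, with even-length prefixes reported as floats.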

**Method #1:** Binary Search + Shifting Elements
- Time Complexity: `O(n^2)`
    - `binary_search` runs in O(log n) because it halves the search range on every call. `print_median` is O(n^2) overall: for each of the n elements it spends O(log n) finding the insertion position and then up to O(n) shifting larger elements one slot to the right, so the shifting dominates.
- Space Complexity: `O(1)` auxiliary
    - The array is sorted in place, so no extra storage grows with the input size. The recursive `binary_search` does use O(log n) call-stack space; an iterative version would avoid it.
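
The same idea can be sketched with the standard library: `bisect.insort` performs the binary search and the shifting insert in one call. The function name `running_medians_insort` is hypothetical, and unlike the in-place approach described above, this sketch keeps a separate sorted list (so it uses `O(n)` extra space) and leaves the input untouched.

```python
from bisect import insort


def running_medians_insort(stream):
    """Running medians via a sorted list kept up to date with bisect.insort."""
    sorted_seen = []
    medians = []
    for value in stream:
        # O(log n) search for the slot, then O(n) shift to make room
        insort(sorted_seen, value)
        n = len(sorted_seen)
        mid = n // 2
        if n % 2 == 1:
            medians.append(float(sorted_seen[mid]))
        else:
            medians.append((sorted_seen[mid - 1] + sorted_seen[mid]) / 2)
    return medians


stream = [5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4]
print(running_medians_insort(stream))
```

The overall cost is still `O(n^2)` in the worst case, because the list insert has to shift elements just like the manual loop.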

In [6]:
# Function to find position to insert current element of
# stream using binary search
def binary_search(arr, item, low, high):

	if (low >= high):
		return (low + 1) if (item > arr[low]) else low

	mid = (low + high) // 2

	if (item == arr[mid]):
		return mid + 1

	if (item > arr[mid]):
		return binary_search(arr, item, mid + 1, high)

	return binary_search(arr, item, low, mid - 1)

# Function to print median of stream of integers
def print_median(arr, n):
	# Note: arr is sorted in place as the elements are processed

	# number of elements in the sorted prefix of arr
	count = 1

	print(f"Median after reading 1 element is {float(arr[0])}")

	for i in range(1, n):
		median = 0
		j = i - 1
		num = arr[i]

		# find position to insert current element in sorted
		# part of array
		pos = binary_search(arr, num, 0, j)

		# move elements to right to create space to insert
		# the current element
		while (j >= pos):
			arr[j + 1] = arr[j]
			j -= 1

		arr[j + 1] = num

		# increment count of sorted elements in array
		count += 1

		# if odd number of integers are read from stream
		# then middle element in sorted order is median
		# else average of middle elements is median
		if (count % 2 != 0):
			median = float(arr[count // 2])

		else:
			median = (arr[(count // 2) - 1] + arr[count // 2]) / 2

		print(f"Median after reading {i + 1} elements is {median} ")


In [7]:
arr = [5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4]
n = len(arr)

print_median(arr, n)

Median after reading 1 element is 5.0
Median after reading 2 elements is 10.0 
Median after reading 3 elements is 5.0 
Median after reading 4 elements is 4.0 
Median after reading 5 elements is 3.0 
Median after reading 6 elements is 4.0 
Median after reading 7 elements is 5.0 
Median after reading 8 elements is 6.0 
Median after reading 9 elements is 7.0 
Median after reading 10 elements is 6.5 
Median after reading 11 elements is 7.0 
Median after reading 12 elements is 6.5 


**Method #2:** Using Heap
- Time Complexity: `O(n * log n)`
    - Each of the n elements in the stream triggers a constant number of heap pushes and pops, and each of those costs O(log n), giving O(n log n) in total.
- Space Complexity: `O(n)`
    - All elements read so far are stored across the two heaps, a max-heap for the smaller half and a min-heap for the larger half, each holding about n/2 elements.
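
For a true stream, where values arrive one at a time, the two-heap idea can be wrapped in a small class. This is a minimal sketch: the class name `RunningMedian` and its methods `add` and `median` are assumptions for illustration, not part of the original.

```python
import heapq


class RunningMedian:
    """Tracks the median of the values added so far using two heaps."""

    def __init__(self):
        self._lower = []  # max-heap of the smaller half (values stored negated)
        self._upper = []  # min-heap of the larger half

    def add(self, value):
        # Route the value through the lower half so that its largest
        # element moves to the upper half; every lower <= every upper.
        heapq.heappush(self._lower, -value)
        heapq.heappush(self._upper, -heapq.heappop(self._lower))
        # Rebalance so the lower half is never the smaller one.
        if len(self._upper) > len(self._lower):
            heapq.heappush(self._lower, -heapq.heappop(self._upper))

    def median(self):
        if not self._lower:
            raise ValueError("no values added yet")
        if len(self._lower) > len(self._upper):
            # odd count: the top of the lower half is the middle element
            return float(-self._lower[0])
        # even count: average the two middle elements
        return (-self._lower[0] + self._upper[0]) / 2


rm = RunningMedian()
for value in [5, 15, 1, 3]:
    rm.add(value)
    print(rm.median())
```

Keeping the state in an object means the median can be queried at any point without storing the stream separately or replaying it.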

In [8]:
import heapq

def find_median_stream_heap(arr):
    max_heap = []  # stores the smaller half of numbers
    min_heap = []  # stores the larger half of numbers
    medians = []
    
    for num in arr:
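        # heapq only provides a min-heap, so max_heap stores negated values
        heapq.heappush(max_heap, -num)

        # Move the largest of the smaller half into min_heap, then rebalance
        # so that max_heap holds the same number of elements or one more
        heapq.heappush(min_heap, -heapq.heappop(max_heap))
        if len(min_heap) > len(max_heap):
            heapq.heappush(max_heap, -heapq.heappop(min_heap))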
        
        # Calculate median
        if len(max_heap) > len(min_heap):
            median = -max_heap[0]
        else:
            median = (-max_heap[0] + min_heap[0]) / 2
        medians.append(median)
    
    return medians

In [9]:
# # heapq always implements min-heap. To make it max-heap, we negate the numbers before inserting
# from heapq import heappush, heappop

# # Function to find the median of stream of data
# def find_median_stream_heap(arr):
#     # Declaring two heaps:
#     max_heap = []  # Max-heap (negated) for the smaller half (left side)
#     min_heap = []  # Min-heap for the larger half (right side)
    
#     for i in range(len(arr)):
#         # Add new element to the max-heap (negated to act as max-heap)
#         heappush(max_heap, -arr[i])                                                         # check the '-' sign

#         # Balance the heaps by moving the largest in max_heap to min_heap
#         heappush(min_heap, -heappop(max_heap))                                              # check the '-' sign

#         # If the min-heap has more elements, move the smallest from min_heap to max_heap
#         if len(min_heap) > len(max_heap):
#             heappush(max_heap, -heappop(min_heap))                                          # check the '-' sign

#         # Print the median
#         if len(max_heap) > len(min_heap):  # Max-heap has the median
#             print(-max_heap[0])
#         else:  # Even number of elements, median is the average of tops of both heaps
#             print((-max_heap[0] + min_heap[0]) / 2)

In [10]:
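# print_median above sorted arr in place, so pass the original stream again
find_median_stream_heap([5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4])

[5, 10.0, 5, 4.0, 3, 4.0, 5, 6.0, 7, 6.5, 7, 6.5]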