# Question 377

## Description

Given an array of numbers `arr` and a window of size `k`, print the median of each window of size `k`, starting from the left and moving right by one position each time.

For example, given the following array and `k = 3`:

`[-1, 5, 13, 8, 2, 3, 3, 1]`

Your function should print out the following:

```text
5 <- median of [-1, 5, 13]
8 <- median of [5, 13, 8]
8 <- median of [13, 8, 2]
3 <- median of [8, 2, 3]
3 <- median of [2, 3, 3]
3 <- median of [3, 3, 1]
```
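As a baseline for checking the expected output above, a naive sketch can simply sort every window. The function name `naive_window_medians` is hypothetical, the sketch assumes `k >= 1`, and averaging the two middle values for an even `k` is an assumption, since the example only shows an odd `k`.

```python
def naive_window_medians(arr, k):
    """Return the median of every length-k window by sorting each window."""
    medians = []
    for start in range(len(arr) - k + 1):
        window = sorted(arr[start:start + k])
        mid = k // 2
        if k % 2 == 1:
            medians.append(window[mid])  # odd k: the middle element
        else:
            # even k: average of the two middle elements (assumed convention)
            medians.append((window[mid - 1] + window[mid]) / 2)
    return medians


print(naive_window_medians([-1, 5, 13, 8, 2, 3, 3, 1], 3))
# prints [5, 8, 8, 3, 3, 3]
```

Sorting each of the \(n - k + 1\) windows costs \(O(k \log k)\), so this baseline is mainly useful for validating a faster solution on small inputs.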


To solve this problem, we slide a window of size `k` over `arr` and compute the median of each window. Re-sorting every window would work but repeats a lot of effort; maintaining two heaps lets us update the median incrementally as the window moves.

The two heaps approach consists of maintaining two heaps:

- A max heap for the numbers less than the median, and
- A min heap for the numbers greater than the median.

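The heaps are kept balanced so that the max heap holds either the same number of elements as the min heap or exactly one more. The median is then either the top of the max heap (if the max heap has more elements) or the average of the tops of the two heaps (if the heaps have equal size).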
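The two-heap invariant can be sketched on a simpler problem first: a running median with insertions only, so no element ever leaves. The function name `running_medians` is hypothetical and this is an illustration of the idea, not the sliding-window solution itself.

```python
import heapq


def running_medians(stream):
    """Yield the median after each value of `stream` is added (no removals)."""
    lowers = []   # max-heap of the smaller half, stored as negated values
    highers = []  # min-heap of the larger half
    for num in stream:
        # Route the value to the half it belongs to.
        if not lowers or num <= -lowers[0]:
            heapq.heappush(lowers, -num)
        else:
            heapq.heappush(highers, num)
        # Restore the size invariant: len(lowers) is len(highers) or one more.
        if len(lowers) > len(highers) + 1:
            heapq.heappush(highers, -heapq.heappop(lowers))
        elif len(highers) > len(lowers):
            heapq.heappush(lowers, -heapq.heappop(highers))
        # Read the median off the heap tops.
        if len(lowers) > len(highers):
            yield -lowers[0]
        else:
            yield (-lowers[0] + highers[0]) / 2


print(list(running_medians([5, 1, 9, 3])))
# prints [5, 3.0, 5, 4.0]
```

The sliding-window version needs one more ingredient, removing the number that leaves the window, which a plain heap does not support efficiently.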

Below is a Python implementation of the two heaps approach. It prints the median of each window of size `k` in the input array `arr`.


```python
import heapq  # heapq is a min-heap; negate values to emulate a max-heap
from collections import defaultdict, deque


def get_medians(arr, k):
    if k <= 0:
        raise ValueError("k must be a positive integer")

    lowers = []  # max-heap of the lower half (values stored negated)
    highers = []  # min-heap of the upper half
    delayed = defaultdict(int)  # value -> copies awaiting lazy deletion
    window = deque()  # the current window, oldest number first
    lower_size = higher_size = 0  # numbers still in the window, per heap

    def prune(heap, sign):
        # Lazy deletion: pop stale numbers only once they reach the top.
        # `sign` is -1 for lowers (negated storage) and 1 for highers.
        while heap and delayed[sign * heap[0]] > 0:
            delayed[sign * heap[0]] -= 1
            heapq.heappop(heap)

    def rebalance():
        # Invariant: lower_size equals higher_size or exceeds it by one.
        nonlocal lower_size, higher_size
        if lower_size > higher_size + 1:
            heapq.heappush(highers, -heapq.heappop(lowers))
            lower_size -= 1
            higher_size += 1
            prune(lowers, -1)
        elif lower_size < higher_size:
            heapq.heappush(lowers, -heapq.heappop(highers))
            lower_size += 1
            higher_size -= 1
            prune(highers, 1)

    def insert(num):
        # Numbers no larger than the top of the lower half go to the max-heap;
        # everything else goes to the min-heap.
        nonlocal lower_size, higher_size
        if not lowers or num <= -lowers[0]:
            heapq.heappush(lowers, -num)
            lower_size += 1
        else:
            heapq.heappush(highers, num)
            higher_size += 1
        rebalance()

    def erase(num):
        # Mark num as deleted and update the size of the half that holds it.
        nonlocal lower_size, higher_size
        delayed[num] += 1
        if num <= -lowers[0]:
            lower_size -= 1
            if num == -lowers[0]:
                prune(lowers, -1)
        else:
            higher_size -= 1
            if num == highers[0]:
                prune(highers, 1)
        rebalance()

    def get_median():
        # Odd window: the lower half holds the extra number, so its top is the
        # median. Even window: average the two tops.
        if lower_size > higher_size:
            return float(-lowers[0])
        return (-lowers[0] + highers[0]) / 2.0

    for num in arr:
        window.append(num)
        insert(num)
        # If the window has grown beyond k, remove the leftmost number.
        if len(window) > k:
            erase(window.popleft())
        # Once the first k numbers have been processed, print each median.
        if len(window) == k:
            print(get_median(), "<- median of", list(window))


# Example usage
arr = [-1, 5, 13, 8, 2, 3, 3, 1]
k = 3
get_medians(arr, k)
```

Running the example prints:

```text
5.0 <- median of [-1, 5, 13]
8.0 <- median of [5, 13, 8]
8.0 <- median of [13, 8, 2]
3.0 <- median of [8, 2, 3]
3.0 <- median of [2, 3, 3]
3.0 <- median of [3, 3, 1]
```


## Complexity Analysis

For **time complexity**:

1. **Insertion into heaps**: Lazy deletion leaves stale entries in the heaps until they reach the top, so a heap (either `lowers` or `highers`) can hold up to \(n\) entries rather than \(k\). Each insertion is therefore \(O(\log n)\) in the worst case.

2. **Removal from heaps**: Marking a number for deletion is \(O(1)\). The physical removal (via `heapq.heappop`) is \(O(\log n)\), and a single step may pop several stale entries in a row. However, each number is pushed once when it arrives and popped as stale at most once, so all lazy removals together cost \(O(n \log n)\).

3. **Balancing heaps**: Moving an element from one heap to the other is one pop and one push, which is \(O(\log n)\).

4. **Overall loop**: We loop through `arr` once, and for each of its \(n\) elements we perform the operations above.

Combining everything, the worst-case time complexity is:

\[O(n \log n)\]

A bound of \(O(n \log k)\) would require removing departing numbers eagerly so that the heaps never exceed \(k\) entries, which lazy deletion does not guarantee.

For **space complexity**:

1. The two heaps hold the \(k\) numbers of the current window plus any stale entries not yet popped. In the worst case (for example, a strictly increasing input, where the departing number is always the smallest and stays buried in the max-heap) that is \(O(n)\).

2. The sliding window (`window`) uses \(O(k)\) space.

3. The dictionary used for lazy-deletion bookkeeping has at most one entry per distinct value, so it is \(O(n)\) in the worst case.

Combining these, the overall space complexity is \(O(n)\) in the worst case.
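
Lazy deletion, as used by the solution above, can be sketched in isolation on a single min-heap. The class `LazyMinHeap` and its method names are hypothetical, and the sketch assumes callers only remove values that are currently present and never call `top()` on an empty heap.

```python
import heapq
from collections import Counter


class LazyMinHeap:
    """Min-heap with deletion by value, deferred until the value surfaces."""

    def __init__(self):
        self._heap = []
        self._pending = Counter()  # value -> copies marked as deleted
        self._live = 0             # number of values not marked as deleted

    def push(self, value):
        heapq.heappush(self._heap, value)
        self._live += 1

    def remove(self, value):
        # O(1) marking; nothing is physically removed yet.
        self._pending[value] += 1
        self._live -= 1

    def top(self):
        # Physically pop stale values only when they reach the top.
        while self._heap and self._pending[self._heap[0]] > 0:
            self._pending[self._heap[0]] -= 1
            heapq.heappop(self._heap)
        return self._heap[0]

    def __len__(self):
        return self._live

    def stored(self):
        return len(self._heap)  # live plus stale entries still in the list


heap = LazyMinHeap()
for value in [4, 1, 7, 3]:
    heap.push(value)
heap.remove(7)  # stale, but buried below the top
print(heap.top(), len(heap), heap.stored())  # prints 1 3 4
heap.remove(1)
print(heap.top(), len(heap), heap.stored())  # prints 3 2 3
```

The gap between `len(heap)` and `heap.stored()` is the stale residue: the removed `7` stays in the underlying list until everything smaller has been popped.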
