# The Ultimate Guide to Heaps in Python

Welcome! This notebook is your all-in-one guide to understanding the **Heap** data structure. We'll start from the absolute basics (why we even need heaps) and build up to solving complex interview problems that stump many developers.

Our journey will cover:
1.  **The "Why":** Understanding the Priority Queue.
2.  **The "How":** Deconstructing the Binary Heap and its array-based power.
3.  **The Core Mechanics:** Mastering Insertion, Deletion, and efficient heap construction.
4.  **The Applications:** Learning when, where, and why to use heaps.
5.  **The Masterclass:** Solving classic LeetCode problems step-by-step.

In [2]:
import heapq
import numpy as np

## Chapter 1: The "Why" - Introducing the Priority Queue

### Motivation: Beyond First-In, First-Out

Most of us are familiar with a standard **Queue**, which operates on a "First-In, First-Out" (FIFO) basis. Think of it as a line at a coffee shop; the first person to get in line is the first person to get coffee. The priority is simply the arrival time.

But what if we need a more flexible system? Imagine an emergency room. Patients aren't treated in the order they arrive; they're treated based on the *severity* of their condition. A patient with a critical injury (high priority) will be seen before someone with a minor scrape (low priority), even if the person with the scrape arrived first.

This is the exact idea behind a **Priority Queue (PQ)**. It's an abstract data type where each element has an associated priority, and elements with higher priority are served before elements with lower priority.

### The Abstract Data Type (ADT)

A Priority Queue is defined by its operations, not its implementation. The two core operations are:

- `insert(item, priority)`: Adds a new item to the queue with its priority.
- `extract_min()` or `extract_max()`: Removes and returns the item with the highest priority (which could be the smallest or largest value, depending on the setup).
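
To make the two operations concrete, here is a minimal sketch of the ADT backed by Python's `heapq` module (covered later in this notebook). The class name `PriorityQueue`, the tie-breaking counter, and the emergency-room data are illustrative assumptions, not a standard API. In this sketch a lower number means a higher priority.

```python
import heapq
import itertools


class PriorityQueue:
    """Illustrative min-priority queue: lower priority number is served first."""

    def __init__(self):
        self._heap = []
        # Tie-breaker: items with equal priority come out in arrival order,
        # and the items themselves never need to be comparable.
        self._counter = itertools.count()

    def insert(self, item, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), item))

    def extract_min(self):
        _priority, _order, item = heapq.heappop(self._heap)
        return item

    def __len__(self):
        return len(self._heap)


# Hypothetical emergency-room arrivals, in arrival order
er = PriorityQueue()
er.insert("minor scrape", priority=5)
er.insert("critical injury", priority=1)
er.insert("broken arm", priority=3)

served = [er.extract_min() for _ in range(len(er))]
print(served)  # ['critical injury', 'broken arm', 'minor scrape']
```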

### Naive Implementations & The Need for Speed

How could we build a PQ? Let's consider two simple approaches using arrays.

1.  **Unsorted Array/List**
    -   `insert()`: Just add the new element to the end of the array. This is super fast: $O(1)$.
    -   `extract_min()`: You have to scan the *entire* array to find the minimum element. This is slow: $O(n)$.

2.  **Sorted Array/List**
    -   `insert()`: To keep the array sorted, you must find the correct position and shift elements over. This is slow: $O(n)$.
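    -   `extract_min()`: If the array is kept in descending order, the minimum element is always at the end and can be removed without shifting anything. This is super fast: $O(1)$. (Removing from the *front* of an array would itself cost $O(n)$.)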

Here's the trade-off:

| Implementation      | `insert()` | `extract_min()` |
|---------------------|------------|-----------------|
| Unsorted Array/List | $O(1)$     | $O(n)$          |
| Sorted Array/List   | $O(n)$     | $O(1)$          |
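
The trade-off in the table can be sketched with two small classes. The names `UnsortedListPQ` and `SortedListPQ` are hypothetical, and the sorted variant stores values in descending order so the minimum can be popped from the end in $O(1)$; that ordering is a choice made for this sketch.

```python
class UnsortedListPQ:
    """Fast insert, slow extract."""

    def __init__(self):
        self._items = []

    def insert(self, value):
        self._items.append(value)  # O(1) amortized

    def extract_min(self):
        # Must scan the whole list to locate the minimum: O(n)
        i = self._items.index(min(self._items))
        return self._items.pop(i)


class SortedListPQ:
    """Slow insert, fast extract. Values are kept in descending order."""

    def __init__(self):
        self._items = []

    def insert(self, value):
        # Walk left from the end to find the slot, then shift elements: O(n)
        i = len(self._items)
        while i > 0 and self._items[i - 1] < value:
            i -= 1
        self._items.insert(i, value)

    def extract_min(self):
        return self._items.pop()  # minimum sits at the end: O(1)


unsorted_pq, sorted_pq = UnsortedListPQ(), SortedListPQ()
for v in [3, 1, 4, 1, 5]:
    unsorted_pq.insert(v)
    sorted_pq.insert(v)

out_unsorted = [unsorted_pq.extract_min() for _ in range(5)]
out_sorted = [sorted_pq.extract_min() for _ in range(5)]
print(out_unsorted, out_sorted)  # both: [1, 1, 3, 4, 5]
```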

Neither is ideal. We want an implementation that is fast for *both* operations. This is where the **Binary Heap** comes in. It gives up the $O(1)$ extreme on either side in exchange for logarithmic time on both operations, which makes it the standard underlying structure for a Priority Queue:

| Implementation | `insert()`   | `extract_min()` |
|----------------|--------------|-----------------|
| **Binary Heap** | **$O(\log n)$** | **$O(\log n)$** |

## Chapter 2: Under the Hood - Deconstructing the Binary Heap

A Binary Heap is a specific type of binary tree that satisfies two fundamental properties. Think of them as the two golden rules of heaps.

### Rule 1: Structural Property - It's a Complete Binary Tree

A binary tree is **complete** if all levels are completely filled, except possibly for the last level, which must be filled from left to right.

**Analogy:** Imagine seating people in a movie theater. You fill the first row completely, then the second, and so on. In the current row you're filling, you seat people from left to right without leaving any gaps. That's a complete tree.

### Rule 2: Heap Property - The Parent Rules All

This rule dictates the relationship between a parent node and its children. There are two flavors:

-   **Min-Heap:** The value of each node is *less than or equal to* the value of its children. This means the smallest element in the entire heap is always at the root.
-   **Max-Heap:** The value of each node is *greater than or equal to* the value of its children. This means the largest element is always at the root.

**Throughout this notebook, we'll primarily focus on Min-Heaps, but the concepts are identical for Max-Heaps.**

### The Key Insight: Representing a Tree with an Array

Here's the most brilliant part of heaps: while we *conceptualize* them as trees, we almost never implement them with node objects and pointers. Instead, we use a simple array.

Because the tree is *complete*, there's a direct and predictable mapping between a node and its position in an array. We can simply read the tree's nodes level by level, from left to right, and store them sequentially in an array.

This array representation allows us to navigate the tree using simple arithmetic formulas. For any node at index `i` in the array:

-   `parent(i) = floor((i - 1) / 2)` (for `i > 0`; the root at index 0 has no parent)
-   `left_child(i) = 2*i + 1`
-   `right_child(i) = 2*i + 2`

This is incredibly efficient! No pointers, no node objects, just an array and some math. This minimizes memory overhead and improves cache performance.
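
A small sketch of that index arithmetic, together with a checker for the min-heap property. The helper names and the sample arrays are illustrative assumptions.

```python
def parent(i):
    # Only meaningful for i > 0; the root has no parent.
    return (i - 1) // 2


def left_child(i):
    return 2 * i + 1


def right_child(i):
    return 2 * i + 2


def is_min_heap(a):
    """Return True if every non-root element is >= its parent."""
    return all(a[parent(i)] <= a[i] for i in range(1, len(a)))


heap = [8, 18, 29, 20, 28, 39, 66]  # level-order layout of a small tree

print(parent(4), left_child(1), right_child(1))  # 1 3 4
print(is_min_heap(heap))                         # True
print(is_min_heap([8, 18, 29, 5]))               # False: 5 is below its parent 18
```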

## Chapter 3: The Core Operations - Building and Maintaining the Heap

The core of the heap data structure lies in two algorithms that maintain the heap property after an insertion or deletion: `percolate_up` and `percolate_down`.

### Insertion: The "Percolate Up" (or Sift Up) Algorithm

When we add a new element to the heap, we must maintain both the structural and heap properties. The algorithm is simple:

1.  **Maintain Structure:** Add the new element to the first available spot in the tree. In our array, this simply means appending it to the end. The tree is still complete, but the heap property might be violated.
2.  **Restore Heap Property:** The new element might be smaller than its parent. We fix this by repeatedly comparing the new element to its parent and swapping them if it's smaller. This process of "swimming" the element up the tree is called **percolating up**. We stop when the new element is no longer smaller than its parent, or when it reaches the root.

This operation has a time complexity of **$O(\log n)$** because the height of a complete binary tree with $n$ nodes is $\lfloor \log_2 n \rfloor$, and in the worst case, we travel the full height of the tree.

#### Visualizing Insertion
Let's trace the insertion of the value `11` into the min-heap from the slides.

**Initial State:**
```
Tree:                     Array:
        8                 [8, 18, 29, 20, 28, 39, 66, 37, 26, 76, 32, 74, 89]
      /   \
    18     29
   / \   /  \
 20  28 39   66
 ...
```

**Step 1:** Add `11` to the end of the array (the next available spot in the tree).
```
Array: [..., 89, 11]  (at index 13)
Parent of index 13 is floor((13-1)/2) = 6. The parent value is 66.
```

**Step 2:** Percolate up. `11` is smaller than its parent `66` (at index 6). Swap them.
```
Tree:                     Array:
        8                 [..., 11, ..., 66]
      /   \
    18     29
   / \   /  \ 
 20  28 39   11        <-- 11 is now here
             / 
            66
New parent of index 6 is floor((6-1)/2) = 2. Parent value is 29.
```

**Step 3:** Percolate up again. `11` is smaller than its new parent `29` (at index 2). Swap them.
```
Tree:                     Array:
        8                 [8, 18, 11, ..., 29, ...]
      /   \
    18     11           <-- 11 is now here
   / \   /  \ 
 20  28 39   29
New parent of index 2 is floor((2-1)/2) = 0. Parent value is 8.
```

**Step 4:** Stop. `11` is greater than its new parent `8`. The heap property is restored.
```
Final Array: [8, 18, 11, 20, 28, 39, 29, 37, 26, 76, 32, 74, 89, 66]
```
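
The trace above can be reproduced with a short sketch. The function names `percolate_up` and `heap_insert` are chosen here for illustration; this is a hand-rolled min-heap on a plain list, not the `heapq` internals.

```python
def percolate_up(heap, i):
    """Swap heap[i] upward until its parent is no larger (min-heap)."""
    while i > 0:
        p = (i - 1) // 2
        if heap[i] >= heap[p]:
            break  # heap property holds, stop
        heap[i], heap[p] = heap[p], heap[i]
        i = p


def heap_insert(heap, value):
    heap.append(value)                 # keep the tree complete
    percolate_up(heap, len(heap) - 1)  # restore the heap property


heap = [8, 18, 29, 20, 28, 39, 66, 37, 26, 76, 32, 74, 89]
heap_insert(heap, 11)
print(heap)
# [8, 18, 11, 20, 28, 39, 29, 37, 26, 76, 32, 74, 89, 66]
```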

### Deletion: The "Percolate Down" (or Sift Down) Algorithm

In a priority queue, the only element we can remove is the one with the highest priority: the root. The `extract_min()` operation works as follows:

1.  **Save the Minimum:** Store the value of the root (at index 0) to be returned later.
2.  **Maintain Structure:** To fill the hole at the root, we take the *last* element in the heap and move it to the root's position. We then shrink the heap size by one. The tree is now complete again, but the heap property is almost certainly violated at the root.
3.  **Restore Heap Property:** The new root is likely larger than one or both of its children. We fix this by repeatedly comparing it to its children and swapping it with the *smaller* of the two. This process of "sinking" the element down the tree is called **percolating down**. We stop when the element is no larger than either of its children, or when it becomes a leaf node.

Like insertion, this operation has a time complexity of **$O(\log n)$** because the element travels at most the height of the tree.

#### Visualizing Deletion (Extract-Min)

Let's trace `extract_min()` on the heap we just built.

**Initial State:** Root is `8`.
```
Tree:                     Array:
        8                 [8, 18, 11, 20, 28, 39, 29, ...]
      /   \
    18     11
   / \   /  \
 20  28 39   29
```

**Step 1:** Remove `8`. Replace it with the last element, `66`. The heap is now smaller.
```
Tree:                     Array:
       66                 [66, 18, 11, ...]
      /   \
    18     11
```

**Step 2:** Percolate down. Compare `66` to its children `18` and `11`. The smaller child is `11`. Since `66 > 11`, we swap them.
```
Tree:                     Array:
       11                 [11, 18, 66, ...]
      /   \
    18     66
```

**Step 3:** Percolate down again. `66` is now at index 2. Its children are `39` (index 5) and `29` (index 6). The smaller child is `29`. Since `66 > 29`, we swap them.
```
Tree:                     Array:
       11                 [11, 18, 29, 20, 28, 39, 66, ...]
      /   \
    18     29
         /   \
        39    66 
```

**Step 4:** Stop. `66` is now a leaf node. The heap property is restored. The final time complexity is **$O(\log n)$**.
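
A minimal sketch of `extract_min()` that reproduces this trace. The function names are illustrative, and raising `IndexError` on an empty heap is an assumption of the sketch.

```python
def percolate_down(heap, i):
    """Sink heap[i] until it is no larger than either child (min-heap)."""
    n = len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < n and heap[left] < heap[smallest]:
            smallest = left
        if right < n and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:
            return  # heap property holds, or i is a leaf
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest


def extract_min(heap):
    if not heap:
        raise IndexError("extract_min from an empty heap")
    minimum = heap[0]
    last = heap.pop()        # shrink the heap by one
    if heap:
        heap[0] = last       # move the last element into the hole at the root
        percolate_down(heap, 0)
    return minimum


heap = [8, 18, 11, 20, 28, 39, 29, 37, 26, 76, 32, 74, 89, 66]
smallest = extract_min(heap)
print(smallest)  # 8
print(heap)
# [11, 18, 29, 20, 28, 39, 66, 37, 26, 76, 32, 74, 89]
```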

### Building a Heap in O(n) - Floyd's Algorithm

What if we're given an array of unsorted elements and want to turn it into a valid heap?

-   **The Obvious Way:** We could create an empty heap and insert each of the $n$ elements one by one. Since each insertion costs $O(\log n)$, the total time would be $O(n \log n)$.
-   **The Clever Way (`buildHeap`):** A much faster, in-place algorithm exists that runs in linear time: **$O(n)$**.

The algorithm works backwards from the last non-leaf node up to the root, performing a `percolate_down` on each one. The last non-leaf node is at index `floor(n/2) - 1`.
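
A sketch of that bottom-up construction, assuming a plain Python list as storage. The name `build_heap` mirrors the `buildHeap` label used above; the sample data is arbitrary.

```python
def percolate_down(heap, i):
    """Sink heap[i] until it is no larger than either child (min-heap)."""
    n = len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < n and heap[left] < heap[smallest]:
            smallest = left
        if right < n and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:
            return
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest


def build_heap(a):
    """Turn list `a` into a min-heap in place."""
    # Start at the last non-leaf node and walk back to the root.
    for i in range(len(a) // 2 - 1, -1, -1):
        percolate_down(a, i)


data = [3, 1, 4, 1, 5, 9, 2, 6]
build_heap(data)
print(data)  # [1, 1, 2, 3, 5, 9, 4, 6]
```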

**Why is it $O(n)$?** The intuition is that most of the nodes in a heap are near the bottom (the leaves). The cost of `percolate_down` is proportional to the height of the node. Nodes at the bottom have a tiny height, so their operations are very cheap. Nodes at the top are expensive, but there are very few of them. When you sum up the work, it converges to a linear $O(n)$ complexity, not $O(n \log n)$.
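
That intuition can be written as a sum. This is a sketch under two standard assumptions: a heap with $n$ nodes has at most $\lceil n / 2^{h+1} \rceil$ nodes of height $h$ (leaves have height 0), and `percolate_down` from height $h$ costs $O(h)$.

$$
\sum_{h=0}^{\lfloor \log_2 n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil \cdot O(h)
\;=\; O\!\left( n \sum_{h=0}^{\infty} \frac{h}{2^{h}} \right)
\;=\; O(2n) \;=\; O(n)
$$

The last step uses the fact that the series $\sum_{h \ge 0} h / 2^{h}$ converges to $2$.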

## Chapter 4: Practical Applications and Problem Patterns

### The "When to Use a Heap" Checklist

Heaps are your go-to data structure when you encounter problems involving:

- **Finding Min/Max:** Anytime a problem asks for the smallest, largest, highest, or lowest item from a collection.
- **Top 'K' Elements:** Finding the 'K' largest or smallest items in a large dataset (e.g., "Top 10 scores").
- **Streaming Data:** When you have a stream of incoming data and can't store it all, but need to maintain some aggregate like the K largest elements seen so far.
- **Dynamic Medians:** Finding the median of a dataset that is constantly changing.
- **Famous Algorithms:** Heaps are critical components in many famous algorithms, including:
   - Dijkstra's Shortest Path Algorithm
   - Prim's Minimum Spanning Tree Algorithm
   - Huffman Coding for data compression
   - Heap Sort
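
As one illustration of the algorithms listed above, here is a hedged sketch of Dijkstra's algorithm that uses `heapq` as its priority queue. The adjacency-list format, the function name `dijkstra`, and the tiny sample graph are assumptions made for this example. It also assumes non-negative edge weights.

```python
import heapq


def dijkstra(graph, source):
    """graph maps node -> list of (neighbor, non-negative weight) pairs."""
    dist = {source: 0}
    frontier = [(0, source)]  # min-heap of (distance so far, node)
    while frontier:
        d, node = heapq.heappop(frontier)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(frontier, (nd, neighbor))
    return dist


# Hypothetical graph for illustration
graph = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 5)],
    "D": [],
}
distances = dijkstra(graph, "A")
print(distances)  # shortest distances: A=0, C=1, B=3, D=4
```

Instead of decreasing a key in place, the sketch pushes a new entry and skips outdated ones when they are popped, which is a common way to work with `heapq`.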

### Heap Sort

We can use a heap to sort an array in-place.

1.  **Build a Max-Heap:** First, convert the unsorted array into a max-heap in $O(n)$ time using the `buildHeap` algorithm.
2.  **Repeatedly Extract Max:** The largest element is now at the root (index 0). Swap it with the last element of the heap. Now the largest element is in its final sorted position.
3.  **Heapify Down:** The heap is now one element smaller. The new root may violate the heap property, so perform a `percolate_down` from the root to fix it.
4.  **Repeat:** Repeat steps 2 and 3 until the heap is empty.

The result is a sorted array. The total time complexity is dominated by the $n$ extractions, each costing $O(\log n)$, so Heap Sort is an **$O(n \log n)$** sorting algorithm. Because it's done in-place, it has an **$O(1)$** space complexity (if you don't count the input array).
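
The four steps can be sketched as follows. The helper name `sift_down_max` and the sample data are illustrative; the helper takes an explicit `size` so the sorted tail of the array is left untouched.

```python
def sift_down_max(a, i, size):
    """Sink a[i] within a[:size] until it is no smaller than either child."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def heap_sort(a):
    """Sort list `a` in place in ascending order."""
    n = len(a)
    # Step 1: build a max-heap bottom-up.
    for i in range(n // 2 - 1, -1, -1):
        sift_down_max(a, i, n)
    # Steps 2-4: move the current max to the end, shrink the heap, repair it.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down_max(a, 0, end)


values = [3, 1, 4, 1, 5, 9, 2, 6]
heap_sort(values)
print(values)  # [1, 1, 2, 3, 4, 5, 6, 9]
```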

### Implementation in Python with `heapq`

Python's standard library provides the `heapq` module, which is an efficient implementation of a **min-heap**. It cleverly operates directly on standard Python lists.

Here are the key functions:
- `heapq.heappush(heap, item)`: Pushes an item onto the heap (a plain list). $O(\log n)$.
- `heapq.heappop(heap)`: Pops and returns the smallest item. $O(\log n)$.
- `heapq.heapify(x)`: Transforms the list `x` into a min-heap in-place. $O(n)$.
- `heap[0]`: Accessing the smallest item without popping it. $O(1)$.

Let's see it in action.

In [3]:
# Create a list of numbers
data = [3, 1, 4, 1, 5, 9, 2, 6]

# Turn it into a min-heap in O(n) time
heapq.heapify(data)

print(f"Heapified list (min-heap): {data}")
print(f"The smallest element is: {data[0]}")

# Add a new element
heapq.heappush(data, 0)
print(f"After pushing 0: {data}")

# Pop the smallest element
smallest = heapq.heappop(data)
print(f"Popped smallest element: {smallest}")
print(f"Heap after popping: {data}")

Heapified list (min-heap): [1, 1, 2, 3, 5, 9, 4, 6]
The smallest element is: 1
After pushing 0: [0, 1, 2, 1, 5, 9, 4, 6, 3]
Popped smallest element: 0
Heap after popping: [1, 1, 2, 3, 5, 9, 4, 6]


#### The Max-Heap Simulation Trick

`heapq` only provides a min-heap. What if you need a max-heap? The standard trick is to **store the negative of the numbers**. 

The smallest negative number corresponds to the largest positive number. When you pop from the min-heap, you just negate the result to get your original maximum value back.

In [4]:
data = [3, 1, 4, 1, 5, 9, 2, 6]

# Create a max-heap by pushing the negative of each value
max_heap = []
for item in data:
    heapq.heappush(max_heap, -item)

print(f"Max-heap (internal representation): {max_heap}")

# The largest element is the negative of the root
largest = -max_heap[0]
print(f"The largest element is: {largest}")

# Pop the largest element
popped_largest = -heapq.heappop(max_heap)
print(f"Popped largest element: {popped_largest}")
print(f"Max-heap after popping: {max_heap}")

Max-heap (internal representation): [-9, -6, -5, -4, -1, -3, -2, -1]
The largest element is: 9
Popped largest element: 9
Max-heap after popping: [-6, -4, -5, -1, -1, -3, -2]
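
A small variation on the same trick: when all values are available up front, negating them in one pass and calling `heapq.heapify` builds the simulated max-heap in $O(n)$ rather than with $n$ separate pushes. This is a sketch on the same sample data; the variable names are illustrative.

```python
import heapq

data = [3, 1, 4, 1, 5, 9, 2, 6]

# Negate once, then heapify the whole list in linear time
max_heap = [-x for x in data]
heapq.heapify(max_heap)

# Pop the three largest values, negating each back to its original sign
top_three = [-heapq.heappop(max_heap) for _ in range(3)]
print(top_three)  # [9, 6, 5]
```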


## Chapter 5: LeetCode Masterclass - Solving Problems with Heaps

### Case Study 1: Kth Largest Element in a Stream (LeetCode 703/215)

**Problem:** Find the $k^{th}$ largest element in an unsorted array or a stream of numbers. For example, in `[3,2,1,5,6,4]` the 2nd largest is `5`.

**Suboptimal Approaches:**
1.  **Sort:** Sort the array in descending order and pick the element at index `k-1`. Time: $O(N \log N)$, Space: $O(1)$ or $O(N)$ depending on sort implementation.
2.  **Max-Heap:** Build a max-heap from all $N$ elements ($O(N)$) and then pop $k$ times ($O(k \log N)$). Total Time: $O(N + k \log N)$, Space: $O(N)$. This is better if $k$ is small, but still requires storing all $N$ items.

**The Optimal Solution: A Min-Heap of Size K**

The most efficient approach uses a clever inversion of logic. Instead of tracking all $N$ items, we only track the **top K largest items seen so far**. And the perfect tool to do this is a **min-heap of size exactly K**.

**Algorithm:**
1.  Create an empty min-heap.
2.  Iterate through the numbers. For each number `num`:
    -   Push `num` onto the min-heap.
    -   If the heap's size grows larger than $k$, pop the smallest element (the root).
3.  After iterating through all numbers, the heap contains the $k$ largest elements. The root of the min-heap is the smallest of these, which is precisely the $k^{th}$ largest element overall.

**Analysis:**
-   **Time:** We process $N$ elements. Each heap operation (push or pop) costs $O(\log k)$ because the heap never grows larger than size $k$. Total time is **$O(N \log k)$**.
-   **Space:** The heap stores at most $k$ elements. Total space is **$O(k)$**.

This is a massive improvement, especially for streaming data where $N$ is huge and you can't store everything!

Let's implement this optimal solution.

In [7]:
def findKthLargest(nums, k):
    """Finds the Kth largest element using a min-heap of size K."""
    min_heap = []
    for num in nums:
        heapq.heappush(min_heap, num)
        # If the heap exceeds size k, remove the smallest element
        if len(min_heap) > k:
            heapq.heappop(min_heap)
    
    # The root of the heap is the Kth largest element
    return min_heap[0]

# Example
nums = [3, 2, 1, 5, 6, 4]
k = 2
result = findKthLargest(nums, k)
print(f"The {k}nd largest element in {nums} is: {result}")

nums = [3, 2, 3, 1, 2, 4, 5, 5, 6, 7]
k = 4
result = findKthLargest(nums, k)
print(f"The {k}th largest element in {nums} is: {result}")

The 2nd largest element in [3, 2, 1, 5, 6, 4] is: 5
The 4th largest element in [3, 2, 3, 1, 2, 4, 5, 5, 6, 7] is: 5
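
For the streaming variant of the problem, the same size-K min-heap can live inside a small class. This is a sketch: the class name `KthLargest` and its `add` method follow the shape of the LeetCode 703 interface as an assumption, and the sample numbers are illustrative.

```python
import heapq


class KthLargest:
    """Tracks the K largest values seen so far in a min-heap of size <= K."""

    def __init__(self, k, nums=()):
        self.k = k
        self.heap = []
        for num in nums:
            self.add(num)

    def add(self, num):
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, num)
        elif num > self.heap[0]:
            # Pop the current root and push num in a single operation
            heapq.heapreplace(self.heap, num)
        # Once at least k values have arrived, the root is the Kth largest.
        return self.heap[0]


tracker = KthLargest(3, [4, 5, 8, 2])
results = [tracker.add(x) for x in [3, 5, 10, 9, 4]]
print(results)  # [4, 5, 5, 8, 8]

# Cross-check for the array version using the standard library
print(heapq.nlargest(2, [3, 2, 1, 5, 6, 4])[-1])  # 5
```

Skipping values that are not larger than the root avoids a push followed by an immediate pop, while keeping the same $O(\log k)$ bound per element.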


### Case Study 2: Find Median from Data Stream (LeetCode 295)

**Problem:** Design a data structure that supports adding numbers from a stream and, at any time, finding the median of all numbers added so far.

The median is the middle value in an ordered list. If the list size is even, it's the average of the two middle values.

This problem is challenging because the data is constantly growing. Re-sorting the entire list every time we need the median would be too slow ($O(n \log n)$ per query).

**The Optimal Solution: The Two-Heap Strategy**

The brilliant solution is to maintain two heaps that divide the number stream into two halves:

1.  A **Max-Heap (`lowers`)**: Stores the smaller half of all numbers seen so far. The largest number in this heap (its root) is the largest of the small numbers.
2.  A **Min-Heap (`highers`)**: Stores the larger half of all numbers seen so far. The smallest number in this heap (its root) is the smallest of the large numbers.

We also enforce a **balancing constraint**: the sizes of the two heaps must never differ by more than 1.

![Diagram of the two-heap strategy](https://i.imgur.com/1Yf6E78.png)

With this structure, the median is always available at the top of the heaps!

#### The Algorithm in Detail

**1. Adding a Number `num`**
- First, push the number onto `lowers` (the max-heap).
- To maintain the property that everything in `lowers` is less than or equal to everything in `highers`, compare the two roots. If the root of `lowers` is now larger than the root of `highers`, move it from `lowers` to `highers`.

**2. Rebalancing the Heaps**
- After the add-and-move step, the heaps might be unbalanced (size difference > 1).
- If `lowers` has at least two more elements than `highers`, move the root of `lowers` to `highers`. If `highers` has at least two more elements than `lowers`, move the root of `highers` to `lowers`.
- This ensures the sizes never differ by more than 1. Either heap may end up as the larger one.

**3. Finding the Median**
- If the heaps have the **same size**, it means we have an even number of total elements. The median is the average of the two middle elements: `(root of lowers + root of highers) / 2`.
- If the heaps have **different sizes**, we have an odd number of total elements, and the median is simply the root of the larger heap.

Here is the full Python implementation of the `MedianFinder` class.

In [None]:
class MedianFinder:

    def __init__(self):
        # `lowers` is a max-heap, storing the smaller half of the numbers.
        # We simulate it by storing negative values in a min-heap.
        self.lowers = []  # max-heap

        # `highers` is a min-heap, storing the larger half of the numbers.
        self.highers = [] # min-heap

    def addNum(self, num: int) -> None:
        # 1. Add to max-heap (lowers)
        heapq.heappush(self.lowers, -1 * num)
        
        # 2. Ensure every element in lowers is <= every element in highers
        # Move the largest element from lowers to highers.
        if self.lowers and self.highers and (-1 * self.lowers[0]) > self.highers[0]:
            val = -1 * heapq.heappop(self.lowers)
            heapq.heappush(self.highers, val)
        
        # 3. Rebalance the heaps if sizes differ by more than 1
        if len(self.lowers) > len(self.highers) + 1:
            val = -1 * heapq.heappop(self.lowers)
            heapq.heappush(self.highers, val)
            
        if len(self.highers) > len(self.lowers) + 1:
            val = heapq.heappop(self.highers)
            heapq.heappush(self.lowers, -1 * val)

    def findMedian(self) -> float:
        # If both heaps are empty there is no median; 0.0 is a placeholder.
        if not self.lowers and not self.highers:
            return 0.0

        # Odd count: the root of the larger heap is the median
        if len(self.lowers) > len(self.highers):
            return float(-1 * self.lowers[0])

        if len(self.highers) > len(self.lowers):
            return float(self.highers[0])

        # Even count: the median is the average of the two roots
        return (-1 * self.lowers[0] + self.highers[0]) / 2.0


# Let's test it
finder = MedianFinder()
stream = [5, 11, 22, 0, 2, 54, 8, 9]
print(f"Processing stream: {stream}\n")
for number in stream:
    finder.addNum(number)
    print(f"Added {number:2d}, current median is: {finder.findMedian()}")
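
As a sanity check on the two-heap idea, the sketch below applies the same steps in a standalone function and compares every running median against `statistics.median` on the prefix seen so far. The function name `running_medians` is hypothetical; the stream is the one used in the test above.

```python
import heapq
import statistics


def running_medians(stream):
    lowers, highers = [], []  # lowers stores negated values (simulated max-heap)
    medians = []
    for num in stream:
        heapq.heappush(lowers, -num)
        # Keep every value in lowers <= every value in highers
        if highers and -lowers[0] > highers[0]:
            heapq.heappush(highers, -heapq.heappop(lowers))
        # Keep the sizes within 1 of each other
        if len(lowers) > len(highers) + 1:
            heapq.heappush(highers, -heapq.heappop(lowers))
        elif len(highers) > len(lowers) + 1:
            heapq.heappush(lowers, -heapq.heappop(highers))
        # Read the median off the roots
        if len(lowers) > len(highers):
            medians.append(float(-lowers[0]))
        elif len(highers) > len(lowers):
            medians.append(float(highers[0]))
        else:
            medians.append((-lowers[0] + highers[0]) / 2.0)
    return medians


stream = [5, 11, 22, 0, 2, 54, 8, 9]
medians = running_medians(stream)
expected = [float(statistics.median(stream[: i + 1])) for i in range(len(stream))]
print(medians == expected)  # True
```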

## Conclusion

Congratulations on making it through this deep dive into heaps!

**Key Takeaways:**
-   A **Priority Queue** is an abstract concept; a **Binary Heap** is its most common efficient implementation, offering $O(\log n)$ inserts and extracts.
-   The magic of heaps comes from their **complete binary tree structure** which allows for a super-efficient **array representation**.
-   The core mechanics are **percolating up** (for insertion) and **percolating down** (for deletion).
-   When you see a problem asking for "Top K", "smallest/largest", or "running median", your mind should immediately jump to heaps.
-   The **two-heap pattern** for finding medians is a powerful technique that elegantly splits a dataset into two manageable halves.

Heaps are a fundamental tool in a programmer's toolkit. By understanding them deeply, you've unlocked efficient solutions to a wide class of important problems.