**⭐ 1. What This Pattern Solves**

Efficiently find the top-K largest or smallest elements in a large dataset.

Useful in analytics pipelines for trending items, high-value transactions, and the most active users or events.

Avoids a full sort (O(n log n)) when only K results are needed.

Handles streaming or incremental data efficiently.

**⭐ 2. SQL Equivalent**

In [0]:
%sql
SELECT user_id, SUM(amount) AS total
FROM transactions
GROUP BY user_id
ORDER BY total DESC
LIMIT K; -- K is a placeholder: substitute a literal such as 10


**⭐ 3. Core Idea**

Maintain a fixed-size heap of K elements.

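For the K largest, use a min-heap: push each new candidate, then pop the root (the smallest value kept so far) whenever the heap would grow past K. For the K smallest, do the same with a max-heap, which in Python means storing negated values in a min-heap.

Guarantees top-K at the end without sorting the entire dataset.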

**⭐ 4. Template Code (MEMORIZE THIS)**

In [0]:
import heapq

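def top_k_elements(iterable, k, reverse=False):
    """Return the k largest numbers (or the k smallest if reverse=True), best first."""
    sign = -1 if reverse else 1  # negating lets the same min-heap find the k smallest
    heap = []  # min-heap holding at most k signed values
    for x in iterable:
        if len(heap) < k:
            heapq.heappush(heap, sign * x)
        else:
            # Push the new value, then evict the smallest, so the size stays at k.
            heapq.heappushpop(heap, sign * x)
    # Undo the negation; k largest come back descending, k smallest ascending.
    return [sign * x for x in sorted(heap, reverse=True)]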
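
For data that arrives incrementally, the same idea can be wrapped in a small object that is fed one value at a time. The following is only a sketch: the class name `TopKTracker` and its `add` and `result` methods are hypothetical, and it assumes plain comparable values and k >= 1.

```python
import heapq


class TopKTracker:
    """Hypothetical helper: tracks the k largest values seen so far."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self._heap = []  # min-heap; its root is the smallest of the current top k

    def add(self, value):
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, value)
        elif value > self._heap[0]:
            # Only values that beat the current k-th largest change the heap.
            heapq.heapreplace(self._heap, value)

    def result(self):
        # Sort a copy so the internal heap is left untouched.
        return sorted(self._heap, reverse=True)


tracker = TopKTracker(3)
for value in [5, 1, 8, 3, 10, 7]:
    tracker.add(value)
snapshot = tracker.result()  # can be called at any point in the stream
print(snapshot)  # [10, 8, 7]
```

Comparing against the root before touching the heap means most values in a long stream are rejected in O(1).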

**⭐ 5. Detailed Example**

In [0]:
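import heapq

data = [5, 1, 8, 3, 10, 7]
k = 3

def top_k_elements(data, k):
    heap = []  # min-heap of the k largest values seen so far
    for x in data:
        if len(heap) < k:
            heapq.heappush(heap, x)
        else:
            # Add x, then drop the smallest value so only k remain.
            heapq.heappushpop(heap, x)
    return sorted(heap, reverse=True)  # largest first

top_k_elements(data, k)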
# [10, 8, 7]
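
To see why the result is correct, it can help to watch the heap root (the smallest value currently kept) after each input. The sketch below uses a hypothetical helper, `trace_top_k`, and assumes k >= 1.

```python
import heapq


def trace_top_k(data, k):
    """Return (value, heap root after processing value) for each input value."""
    heap, steps = [], []
    for x in data:
        if len(heap) < k:
            heapq.heappush(heap, x)
        else:
            heapq.heappushpop(heap, x)
        steps.append((x, heap[0]))  # heap[0] is the smallest value currently kept
    return steps


for value, root in trace_top_k([5, 1, 8, 3, 10, 7], 3):
    print(f"after {value:>2}: smallest kept value = {root}")
# after  5: smallest kept value = 5
# after  1: smallest kept value = 1
# after  8: smallest kept value = 1
# after  3: smallest kept value = 3
# after 10: smallest kept value = 5
# after  7: smallest kept value = 7
```

The root only rises once the heap is full, and the final root (7) is the K-th largest value in the input.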

**⭐ 6. Mini Practice Problems**

Find the top 5 most expensive transactions from a stream of 10,000 transactions.

Return the 3 smallest response times from a log file of 1 million entries.

Find top-K trending hashtags from a live Twitter stream using a min-heap.
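
As a starting point for the hashtag problem, here is one possible sketch. It assumes a finite batch of hashtags stands in for the live stream, the function name `top_k_hashtags` and the sample data are hypothetical, and ties are broken by string order of the tag.

```python
import heapq
from collections import Counter


def top_k_hashtags(hashtags, k):
    """Return the k most frequent hashtags as (tag, count) pairs, most frequent first."""
    counts = Counter(hashtags)  # one pass to count occurrences
    heap = []  # min-heap of (count, tag); the root is the weakest of the current top k
    for tag, count in counts.items():
        if len(heap) < k:
            heapq.heappush(heap, (count, tag))
        else:
            heapq.heappushpop(heap, (count, tag))
    return [(tag, count) for count, tag in sorted(heap, reverse=True)]


# Hypothetical sample standing in for a live stream.
sample = ["#ai", "#data", "#ai", "#python", "#data", "#ai", "#sql"]
top_two = top_k_hashtags(sample, 2)
print(top_two)  # [('#ai', 3), ('#data', 2)]
```

Putting the count first in each tuple makes the heap order by frequency, since Python compares tuples element by element.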

**⭐ 7. Full Data Engineering Scenario**

Problem: Compute the 10 highest-value customers from a transaction log file with millions of rows.

Expected Output:

customer_id | total_amount
--------------------------
1023        | 98500
4501        | 97200
...

In [0]:
import heapq
from collections import defaultdict

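# Assumes transactions_stream is an iterable of (customer_id, amount) pairs
# defined elsewhere, e.g. rows parsed from the log file.
totals = defaultdict(int)  # customer_id -> running total
for customer_id, amount in transactions_stream:
    totals[customer_id] += amount

# Pick the 10 (customer_id, total) pairs with the highest totals, largest first.
top_customers = heapq.nlargest(10, totals.items(), key=lambda x: x[1])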
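
A self-contained version of the same pipeline is sketched below so it can be run end to end. The sample rows and their amounts are hypothetical, and the top 2 is used instead of the top 10 to keep the output short.

```python
import heapq
from collections import defaultdict

# Hypothetical sample rows: (customer_id, amount). A real pipeline would read
# these from the transaction log instead.
transactions_stream = [
    (1023, 500), (4501, 300), (1023, 250), (7788, 100), (4501, 50),
]

totals = defaultdict(int)
for customer_id, amount in transactions_stream:
    totals[customer_id] += amount

top_customers = heapq.nlargest(2, totals.items(), key=lambda item: item[1])

print("customer_id | total_amount")
print("--------------------------")
for customer_id, total in top_customers:
    print(f"{customer_id:<11} | {total}")
```

Note that the aggregation step keeps one dictionary entry per distinct customer; only the selection step is bounded by K.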


**⭐ 8. Time & Space Complexity**

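Time: O(n log K). Each of the n elements triggers at most one push/pop on a heap of size K, and each such operation costs O(log K).

Space: O(K) for the heap. If values are aggregated first, as in the scenario in section 7, the totals dictionary adds O(U) for U distinct keys.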
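
To get a feel for the gap between the two bounds, the snippet below plugs in illustrative numbers. It is rough arithmetic on the formulas only: constants are ignored, and real timings depend on the implementation.

```python
import math

n = 1_000_000  # number of input elements (illustrative)
k = 10         # number of results wanted (illustrative)

full_sort_cost = n * math.log2(n)  # ~ n log n
heap_cost = n * math.log2(k)       # ~ n log K
ratio = full_sort_cost / heap_cost

print(f"n log n ~ {full_sort_cost:,.0f}")
print(f"n log K ~ {heap_cost:,.0f}")
print(f"ratio   ~ {ratio:.1f}")
```

With these numbers the ratio is log(n) / log(K) = 6, and it grows as n increases while K stays fixed.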

**⭐ 9. Common Pitfalls & Mistakes**

❌ Using sorted(data)[-K:] on large datasets — uses O(n log n) instead of O(n log K).
❌ Forgetting to invert values for max-heap behavior using Python’s min-heap.
❌ Heap size not capped at K — memory usage grows unnecessarily.
✔ Correct approach: once the heap holds K elements, heappushpop adds the new value and evicts the smallest in one step, so the size stays at K.
✔ Use heapq.nlargest / heapq.nsmallest for built-in efficiency when possible.
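
The points above can be checked on a small list. This is a minimal sketch using only the standard-library `heapq` module; the sample data is the same list used in the detailed example.

```python
import heapq

data = [5, 1, 8, 3, 10, 7]
k = 3

# Built-ins: no manual heap management needed.
largest = heapq.nlargest(k, data)
smallest = heapq.nsmallest(k, data)

# Same answers as a full sort, which does more work than necessary for small k.
assert largest == sorted(data, reverse=True)[:k]
assert smallest == sorted(data)[:k]

# Max-heap behavior with heapq: store negated values, negate again on the way out.
max_heap = [-x for x in data]
heapq.heapify(max_heap)
biggest = -heapq.heappop(max_heap)

print(largest, smallest, biggest)  # [10, 8, 7] [1, 3, 5] 10
```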