**Pattern 1** — Two-Pointer Technique

**Problem:** Remove consecutive duplicate log timestamps (sorted list)

**Task:** Remove duplicates in-place so each timestamp appears only once, and return the new length.



In [0]:
## You’re given a sorted list of timestamps (strings or ints) that may have duplicates:
timestamps = [
    "2025-11-19T10:01:00",
    "2025-11-19T10:01:00",
    "2025-11-19T10:02:00",
    "2025-11-19T10:05:00",
    "2025-11-19T10:05:00",
]


In [0]:
"""
Idea (Two Pointers: slow and fast)
fast scans the list.
slow marks the position of the last unique element.
When timestamps[fast] != timestamps[slow], we move slow forward and copy the new unique value.

"""

def remove_consecutive_duplicates(timestamps):
    if not timestamps:
        return 0
    slow = 0
    for fast in range(1, len(timestamps)):
        if timestamps[fast] != timestamps[slow]:
            slow += 1
            timestamps[slow] = timestamps[fast]
    return slow + 1

**Why this is Two-Pointer**
1. Both pointers only move forward (no backtracking).
2. fast reads, slow writes.

_Works great on sorted or grouped data (logs ordered by time, IDs, etc.)._

**Time & Space Complexity**

Time: We make a single pass over the list → O(n).
Space: We modify in-place and use O(1) extra variables → O(1).

In [0]:
n = remove_consecutive_duplicates(timestamps)
print(timestamps[:n])
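
The function returns the new length but leaves stale values past that index. If the caller wants the list itself shortened, one option is to delete the tail in place. The sketch below assumes mutating the caller's list is acceptable; `dedup_and_trim` is a hypothetical helper name, not part of the notebook.

```python
def dedup_and_trim(items):
    """Two-pointer dedup of a sorted list, then drop the stale tail in place."""
    if not items:
        return 0
    slow = 0
    for fast in range(1, len(items)):
        if items[fast] != items[slow]:
            slow += 1
            items[slow] = items[fast]
    del items[slow + 1:]  # remove leftover values beyond the unique prefix
    return slow + 1


sample = [1, 1, 2, 5, 5]
new_len = dedup_and_trim(sample)
print(new_len, sample)
```

The `del items[slow + 1:]` slice deletion keeps the same list object, so any other reference to the list sees the shortened result.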

**Pattern 2** — Sliding Window

**Problem:** Moving average of last k latencies

**Task:** Compute the moving average over every window of k consecutive requests.


In [0]:
## You have a list of request latencies in milliseconds:
"""
Expected:
Window [120, 90, 100] → avg = 310/3
Window [90, 100, 130] → avg = 320/3
"""
latencies = [120, 90, 100, 130, 80, 110]  # in ms
k = 3


In [0]:
"""
Idea (Fixed-size Sliding Window)
Maintain:
1. window_sum = sum of current window of size k
2. Move window one step at a time:
2.1 Add new element entering the window
2.2 Subtract element leaving the window
"""

def moving_avaerage(latencies, k):
    if k <= 0 or k > len(latencies):
        return []
    
    result = []
    window_sum = sum(latencies[:k])
    result.append(window_sum / k)
    
    for i in range(k, len(latencies)):
        window_sum += latencies[i]          # add new element
        window_sum -= latencies[i - k]      # remove old element
        result.append(window_sum / k)
        
    return result
    


**Why this is Sliding Window**
1. You keep a window of size k over the list.
2. At each step, you “slide” that window one element to the right.
3. You reuse previous work instead of recomputing the sum from scratch.

**Time & Space Complexity**

Time:
Initial sum of first k elements: O(k)
Each of the remaining n - k steps does O(1) work (add one, subtract one).

Total: O(n).

Space:
result has n - k + 1 elements → O(n) output space.
Extra variables (window_sum, indices) → O(1) auxiliary.

If the interviewer explicitly counts only auxiliary space, you can say O(1).
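
To make the "reuse previous work" point concrete, the sketch below compares a naive version that re-sums every window with a running-sum version. Both function names are hypothetical and the sample data is the latency list from the problem statement.

```python
def moving_average_naive(values, k):
    """Recompute every window sum from scratch: O(n * k) time."""
    if k <= 0 or k > len(values):
        return []
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]


def moving_average_sliding(values, k):
    """Update a running sum as the window slides: O(n) time."""
    if k <= 0 or k > len(values):
        return []
    window_sum = sum(values[:k])
    averages = [window_sum / k]
    for i in range(k, len(values)):
        window_sum += values[i] - values[i - k]  # add entering, drop leaving
        averages.append(window_sum / k)
    return averages


sample = [120, 90, 100, 130, 80, 110]
print(moving_average_naive(sample, 3) == moving_average_sliding(sample, 3))
```

With integer inputs the two versions agree exactly; with floats the running sum can accumulate small rounding differences.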

In [0]:
print(moving_avaerage(latencies, k))

**Pattern 3** —  List Comprehensions

**Problem:**  Clean and filter raw records

**Task:** Produce a clean list of lowercased, trimmed, non-empty records.

In [0]:
## You have raw string records from a log file:
records = ["  SUCCESS  ", "", None, " error ", "SUCCESS", "  timeout "]

In [0]:
clean = [r.strip().lower() for r in records if r and r.strip()]  # skip None, empty, and whitespace-only records

In [0]:
success = [
    r.strip().lower()
    for r in records
    if r and r.strip().lower() == "success"
]

**Why this is a Pattern** It’s a compact map/filter operation.

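_A comprehension still runs as Python bytecode, but it uses a dedicated append instruction and avoids the repeated method lookup and call of result.append(...), so it is usually somewhat faster than an explicit Python for loop with append._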
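
For comparison, here is the same filter-and-transform written both ways. This is a minimal sketch using a copy of the sample records; the variable names are illustrative.

```python
raw = ["  SUCCESS  ", "", None, " error ", "SUCCESS", "  timeout "]

# Explicit loop: filter, transform, append.
cleaned_loop = []
for r in raw:
    if r:
        cleaned_loop.append(r.strip().lower())

# Same logic as a comprehension.
cleaned_comp = [r.strip().lower() for r in raw if r]

print(cleaned_loop == cleaned_comp)
print(cleaned_comp)
```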

**Time & Space Complexity**

Let n = len(records).

Time: We visit each element once and do O(1) work per record (treating string length as bounded) → O(n).

Space: We build a new list whose size ≤ n → O(n).

In [0]:
print(clean)
print(success)

**Pattern 4** — Batching / Chunking

**Problem:** Send user IDs to an API in batches of 500

**Task:** Process all IDs in batches of up to batch_size.


In [0]:
## You have a list of user IDs:
user_ids = list(range(1, 2300))  # 2299 user IDs
batch_size = 500

In [0]:
## Use a Chunking Helper
def chunk(lst,size):
    for i in range(0,len(lst),size):
        yield lst[i:i+size]

def send_to_api(batch):
    print(f"Sending {len(batch)} IDs ({batch[0]}..{batch[-1]}) to API")  # summarize instead of printing every ID

**This yields batches:**
First: IDs 1–500
Second: 501–1000
…

Last: the remaining IDs (at most 500; here IDs 2001–2299)

**Why this is a Pattern**

Very common in ETL:

1. Sending bulk inserts to Snowflake/BigQuery
2. Batch-writing to Kafka, S3, or REST APIs
3. Limiting memory when processing huge lists

Here, chunk is a generator: it yields one slice at a time, so beyond the input list itself only one batch is held in memory at a time.

**Time & Space Complexity**

Let n = number of elements, b = batch size.

Time:

We iterate through the whole list once → O(n).

Space:

Each individual chunk has size ≤ b.

The generator holds only the index i and one slice lst[i:i+b] at a time, so peak additional space is O(b).

If b is much smaller than n, this is memory efficient.
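
The slicing helper needs the whole list in memory first. When the source is an arbitrary iterable (a file, a cursor, a generator), a variant built on `itertools.islice` can batch without materializing the input. This is a sketch; `chunk_iterable` is a hypothetical name.

```python
from itertools import islice


def chunk_iterable(iterable, size):
    """Yield lists of up to `size` items from any iterable, one batch at a time."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))  # pull the next `size` items
        if not batch:
            return
        yield batch


batch_lengths = [len(b) for b in chunk_iterable(range(1, 2300), 500)]
print(batch_lengths)
```

With the 2299 IDs from the example this gives four full batches and a final batch of 299.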

In [0]:
for batch in chunk(user_ids,batch_size):
    send_to_api(batch)

**Pattern 5** — Sorting + Key Functions

**Problem:** Sort events by timestamp, then deduplicate by event_id keeping the latest

**Task:** For each id, keep only the latest event (highest timestamp).

In [0]:
## You have a list of event dicts:
"""
Expected: keep
A @ 7
B @ 3
C @ 2
"""
events = [
    {"id": "A", "timestamp": 5, "value": 10},
    {"id": "B", "timestamp": 3, "value": 20},
    {"id": "A", "timestamp": 7, "value": 15},  # newer A
    {"id": "C", "timestamp": 2, "value": 30},
]


In [0]:
## Step 1: Sort by (id, timestamp)
## so that for the same id, newer events come later:
events.sort(key=lambda e:(e["id"],e["timestamp"]))

In [0]:
print(events)

In [0]:
## Step 2: Scan and keep only last per id (Two-pointer / sweep)

deduped = []
prev_id = None

for e in events:
    if e["id"] != prev_id:
        deduped.append(e)
        prev_id = e["id"]
    else:
        deduped[-1] = e

**Why this is a Pattern**

Use sort(key=...) to:

1. Order logs by timestamp
2. Group by some key (like id, service)
3. Prepare data for windowing or deduplication

Python’s key function lets you sort by:

1. Multiple fields ((id, timestamp))
2. Derived values (e.g., len(x), x["timestamp"], a parsed datetime)

**Time & Space Complexity**

Let n = number of events.

**Sorting:**

Python’s Timsort: average and worst case O(n log n); best case O(n) on already-sorted input.

The key function is called once per element, O(1) each → O(n) overhead, dominated by sort.

**Sweep (dedup):**

Single pass over sorted events → O(n).

So:

**Total Time:**

O(n log n) (due to sort).

**Space:**

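events.sort(...) sorts the list in place, but Timsort can allocate a temporary merge buffer of up to n/2 references, and the computed keys are cached during the sort → O(n) auxiliary in the worst case.

deduped list stores up to n items → O(n) for result.

If they ask only about auxiliary space besides inputs/outputs, the sweep itself is O(1), while the sort is O(n) in the worst case.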
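
If the sorted order is not needed afterwards, a dictionary keyed by id can keep the latest event in a single O(n) pass. Note that this is a different technique from the sort-and-sweep above, shown only for comparison; `latest_per_id` is a hypothetical name.

```python
def latest_per_id(events):
    """Keep the event with the highest timestamp for each id, in one pass."""
    latest = {}
    for e in events:
        current = latest.get(e["id"])
        if current is None or e["timestamp"] > current["timestamp"]:
            latest[e["id"]] = e
    return list(latest.values())


sample_events = [
    {"id": "A", "timestamp": 5, "value": 10},
    {"id": "B", "timestamp": 3, "value": 20},
    {"id": "A", "timestamp": 7, "value": 15},
    {"id": "C", "timestamp": 2, "value": 30},
]
result = latest_per_id(sample_events)
print([(e["id"], e["timestamp"]) for e in result])
```

On equal timestamps this version keeps the first event seen, whereas the sort-and-sweep keeps the last one, so pick whichever tie-break the task requires.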

In [0]:
print(deduped)

In [0]:
## Scenario A — Stream Buffer Before Kafka Write
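## Pattern: Append + reset, then flush the remainder.
## stream and send_to_kafka are placeholders for a real event source and producer call.
stream = range(2500)
def send_to_kafka(batch):
    print(f"Sending {len(batch)} events")

buffer = []
for event in stream:
    buffer.append(event)
    if len(buffer) >= 1000:
        send_to_kafka(list(buffer))  # pass a copy in case the producer keeps a reference
        buffer.clear()
if buffer:  # flush the final partial batch
    send_to_kafka(buffer)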


In [0]:
## Scenario B — Flatten JSON with Recursion
## List operations dominate here.

def flatten_json(data, parent_key='', sep='.'):
    items = []
    for k, v in data.items():
        new_key = parent_key + sep + k if parent_key else k
        if isinstance(v, dict):
            items.extend(flatten_json(v, new_key, sep))
        else:
            items.append((new_key, v))
    return items


In [0]:
## Scenario C — Deduplicate Ordered Logs
def dedup(seq):
    out = []
    seen = set()
    for item in seq:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out

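**Avoid These List Anti-Patterns (Interview Traps)**

❌ Using a list as a queue

list.pop(0) is O(n) because every remaining element shifts left → use collections.deque (O(1) popleft).

❌ Using a list for membership tests

x in list is an O(n) scan. Use a set instead (average O(1) membership).

❌ Using excessive append in loops where a comprehension is enough

Comprehensions are usually somewhat faster and easier to read.

❌ Using a list for grouping

Use defaultdict(list).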
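
The replacements named above can be sketched in a few lines. The sample values (job names, IDs, services) are made up for illustration.

```python
from collections import defaultdict, deque

# Queue: popleft() is O(1), unlike list.pop(0), which shifts every element.
queue = deque(["job1", "job2", "job3"])
first_job = queue.popleft()

# Membership: a set gives average O(1) lookups instead of an O(n) list scan.
blocked_ids = {101, 205, 309}
is_blocked = 205 in blocked_ids

# Grouping: defaultdict(list) creates the empty list on first access.
by_service = defaultdict(list)
for service, status in [("api", "ok"), ("db", "error"), ("api", "timeout")]:
    by_service[service].append(status)

print(first_job, is_blocked, dict(by_service))
```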

In [0]:
## Problem 1 — Remove consecutive duplicates (logs)
def remove_consecutive(seq):
    if not seq:  # guard: seq[0] would raise IndexError on an empty input
        return []
    out = [seq[0]]
    for x in seq[1:]:
        if x != out[-1]:
            out.append(x)
    return out

In [0]:
## Problem 2 — Sliding window average
def moving_avg(nums, k):
    if k <= 0 or k > len(nums):  # no full window exists
        return []
    out = []
    s = sum(nums[:k])
    out.append(s / k)
    for i in range(k, len(nums)):
        s += nums[i] - nums[i-k]
        out.append(s / k)
    return out

In [0]:
## Problem 3 — Chunk list
def chunk(lst, n):
    return [lst[i:i+n] for i in range(0, len(lst), n)]

In [0]:
## Problem 4 — Second largest
def second_largest(nums):
    first = second = float('-inf')
    for x in nums:
        if x > first:
            second = first
            first = x
        elif first > x > second:
            second = x
    return second  # stays -inf if there are fewer than two distinct values


In [0]:
## Problem 5 — Merge k sorted streams

import heapq

def merge_streams(lists):
    heap = []
    for i, lst in enumerate(lists):
        if lst:
            heapq.heappush(heap, (lst[0], i, 0))
    out = []
    while heap:
        val, li, idx = heapq.heappop(heap)
        out.append(val)
        if idx + 1 < len(lists[li]):
            heapq.heappush(heap, (lists[li][idx+1], li, idx+1))
    return out
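
The standard library also ships this merge as `heapq.merge`, which returns a lazy iterator over already-sorted inputs. A short usage sketch with made-up sample streams:

```python
import heapq

streams = [[1, 4, 9], [2, 3, 10], [], [5]]

# heapq.merge yields items one at a time instead of building the full list.
merged = list(heapq.merge(*streams))
print(merged)
```

In an interview the hand-written heap version is usually what is being asked for; `heapq.merge` is worth mentioning as the production shortcut.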


In [0]:
## Problem 6 — Validate sorted timestamps
def is_sorted(records):
    return all(records[i] <= records[i+1] for i in range(len(records)-1))

In [0]:
## Problem 7 — Flatten nested list

def flatten(lst):
    out = []
    for x in lst:
        if isinstance(x, list):
            out.extend(flatten(x))
        else:
            out.append(x)
    return out

In [0]:
## Problem 8 — Deduplicate but keep order

def dedup_ordered(seq):
    seen = set()
    out = []
    for x in seq:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out
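
A more compact alternative relies on dictionaries preserving insertion order (guaranteed since Python 3.7). This is a sketch; `dedup_ordered_compact` is a hypothetical name, and the items must be hashable, as in the set-based version.

```python
def dedup_ordered_compact(seq):
    """Order-preserving dedup using dict keys; items must be hashable."""
    return list(dict.fromkeys(seq))


print(dedup_ordered_compact(["b", "a", "b", "c", "a"]))
```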

In [0]:
## Problem 9 — Find missing number (0..n)

def missing(nums):
    n = len(nums)
    return n*(n+1)//2 - sum(nums)

In [0]:
## Problem 10 — Pair sums (watch for duplicates)
def pair_sum(nums, target):
    seen = set()
    for x in nums:  # x is only checked against earlier elements, so it never pairs with itself
        y = target - x
        if y in seen:
            return True
        seen.add(x)
    return False


**Summary**

1. Lists are dynamic arrays that store references to objects.
2. Append is amortized O(1).
3. Insert/remove in the middle is O(n).
4. Use cases: streaming buffers, flattened JSON, ETL batch processing, ordered logs.
5. Don't use lists for membership tests → use set.
6. Don't use lists as queues → use deque.
7. Prefer list comprehensions for simple map/filter loops; they are usually somewhat faster and more readable.
8. Know the sliding window, batching, sorting, and two-pointer patterns.
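
The amortized O(1) append comes from over-allocation: the list reserves spare capacity so that most appends do not need to resize. The sketch below is one rough way to observe this, assuming CPython, where `sys.getsizeof` reports the list's allocated size; exact numbers vary by version and platform.

```python
import sys

items = []
sizes = set()
for i in range(1000):
    items.append(i)
    sizes.add(sys.getsizeof(items))

# Far fewer distinct sizes than appends means most appends did not reallocate.
print(len(sizes), "distinct sizes across", len(items), "appends")
```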