**1) Nested loops used as a join (quadratic join)**

**Problem example (real):** you have `users` (100k dicts) and `orders` (1M dicts) and want to attach each order's user profile by matching `user_id`. A naive nested loop scans every user for every order.

```python
## Bad approach (conceptual): scans all of `users` for every order
for order in orders:
    for user in users:
        if order['user_id'] == user['user_id']:
            order['user'] = user
```

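**Why this hurts:** it compares every pair. With n = len(orders) and m = len(users) you get O(n * m) comparisons, and the inner loop keeps scanning even after it finds a match. Here that is 1M × 100k = 10^11 comparisons, which at typical CPython loop speeds means hours of CPU time rather than seconds.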

```python
## Better (hash-based lookup):
users_by_id = {u['user_id']: u for u in users}          # O(m) time, O(m) extra space; duplicate ids: last one wins
for order in orders:                                    # O(n)
    user = users_by_id.get(order['user_id'])            # average O(1) dict lookup
    if user is not None:
        order['user'] = user
```

**Complexities**

Bad (nested loops):

Time: O(n * m)

Extra space: O(1) (ignoring input storage)

Good (hash join):

Time: O(n + m) on average (dict lookups are average-case O(1))

Extra space: O(m) for the index

**Trade-offs / notes**

The hash join needs extra memory for the dictionary. If m is huge and memory is constrained, consider:

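Sort both lists on the id and do a merge-join: O(n log n + m log m) for the sorts plus O(n + m) for the merge, with less extra RAM than a dict index. If the inputs already arrive sorted (e.g., from a database `ORDER BY`), only the O(n + m) merge remains.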

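Use disk-backed or distributed joins (e.g., SQLite or Spark) for very large datasets. pandas `merge` is fast but runs in memory, so it does not help once the data no longer fits in RAM.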
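The merge-join option can be sketched as follows. This is a minimal sketch under stated assumptions: both inputs are lists of dicts keyed by `'user_id'`, user ids are unique (each order matches at most one user), and the ids are mutually comparable. `merge_join` and the sample data are hypothetical names for illustration.

```python
from operator import itemgetter


def merge_join(orders, users, key="user_id"):
    """Attach each order's matching user via sort + two-pointer merge.

    Assumes user keys are unique (many-to-one join). Sorting is done
    in place, so the caller's lists are reordered.
    """
    get_key = itemgetter(key)
    orders.sort(key=get_key)   # O(n log n)
    users.sort(key=get_key)    # O(m log m)

    i = j = 0
    while i < len(orders) and j < len(users):
        ok, uk = get_key(orders[i]), get_key(users[j])
        if ok == uk:
            orders[i]["user"] = users[j]
            i += 1             # several orders may share this user, so keep j
        elif ok < uk:
            i += 1             # no user with this id: leave the order unmatched
        else:
            j += 1             # this user has no (more) orders
    return orders


# Hypothetical sample data; order 4 refers to a user that does not exist.
orders = [{"id": 1, "user_id": 7}, {"id": 2, "user_id": 3},
          {"id": 3, "user_id": 7}, {"id": 4, "user_id": 5}]
users = [{"user_id": 3, "name": "ann"}, {"user_id": 7, "name": "bob"}]
merge_join(orders, users)
```

Each list is walked once after sorting, so the merge step is O(n + m). The only extra memory is the temporary workspace `list.sort` uses internally.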

If keys are not unique (one-to-many), store lists in the index: {id: [rows...]}.
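A minimal sketch of the list-valued index, assuming the "many" side is `orders` (one user, many orders). The sample data is hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: user 1 has two orders, user 3 has none.
users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"},
         {"user_id": 3, "name": "cy"}]
orders = [
    {"order_id": 10, "user_id": 1},
    {"order_id": 11, "user_id": 1},
    {"order_id": 12, "user_id": 2},
]

# Index the "many" side: user_id -> list of that user's orders. O(n) time and space.
orders_by_user = defaultdict(list)
for order in orders:
    orders_by_user[order["user_id"]].append(order)

# Probe once per user. .get(..., []) does not insert keys into the defaultdict.
for user in users:
    user["orders"] = orders_by_user.get(user["user_id"], [])
```

The total cost is still O(n + m). Each bucket holds references to the original rows, not copies.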

**2) Using a list for membership testing**

**Problem example (real):** deduplicating incoming event IDs in a high-throughput loop:

```python
## Bad approach:
seen = []
for eid in event_stream:
    if eid not in seen:     # linear search across seen
        seen.append(eid)
        handle(eid)
```

**Why this hurts:** `in` on a list is a linear scan, O(k) where k is the current length of `seen`. Processing n events therefore costs O(n · k). When most events are unique, k grows toward n and the total becomes O(n²).

```python
## Better: use a set
seen = set()
for eid in event_stream:
    if eid not in seen:     # average O(1) hash lookup
        seen.add(eid)
        handle(eid)
```


**Complexities**

Bad (list membership):

Time: O(n · k), where k = number of distinct IDs; O(n²) in the worst case, when nearly every ID is distinct

Space: O(k) where k = number of distinct ids

Good (set):

Time: O(n) on average (average O(1) per membership test and insert)

Space: O(k), with a larger constant factor due to hash-table overhead

**Trade-offs / notes**

Sets use more memory per item than lists (hash-table overhead). If k is tiny, a list may be fine for simplicity.

If memory is critical and IDs are integers in a small, known range, consider a bitmap (e.g., the third-party `bitarray` package or a plain `bytearray`) to reduce space.
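A minimal bitmap sketch using only the standard library. It assumes event IDs are non-negative integers below a known bound `max_id`. `BitmapSet` is a hypothetical name, not a library class. It stores one bit per possible ID, so 10 million possible IDs take 1.25 MB no matter how many have been seen.

```python
class BitmapSet:
    """Membership set for integers in range(max_id), one bit per possible id."""

    def __init__(self, max_id):
        self.max_id = max_id
        self.bits = bytearray((max_id + 7) // 8)   # all bits start at 0

    def add(self, i):
        self._check(i)
        self.bits[i >> 3] |= 1 << (i & 7)           # byte i // 8, bit i % 8

    def __contains__(self, i):
        self._check(i)
        return bool(self.bits[i >> 3] & (1 << (i & 7)))

    def _check(self, i):
        if not 0 <= i < self.max_id:
            raise ValueError(f"id {i} outside range(0, {self.max_id})")


seen = BitmapSet(max_id=10_000_000)   # 1,250,000 bytes of storage
handled = []
for eid in [42, 7, 42, 9_999_999, 7]:  # hypothetical event stream
    if eid not in seen:
        seen.add(eid)
        handled.append(eid)
```

The trade-off is that memory scales with the ID range rather than with k. If the range is huge but sparse, a `set` is smaller.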

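If items are unhashable (e.g., dicts or lists), convert each one to a hashable key first: a tuple, a `frozenset` of a flat dict's items, or a canonical serialization such as `json.dumps(item, sort_keys=True)`.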
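Here is one way to dedupe dict-shaped events through a serialized key. It is a sketch that assumes every value is JSON-serializable. `event_key` and the sample events are hypothetical. One caveat: JSON keeps `1` and `1.0` distinct even though Python treats them as equal.

```python
import json


def event_key(event):
    """Hashable, key-order-insensitive key for a dict-shaped event.

    sort_keys=True makes dicts with the same content serialize identically
    regardless of insertion order.
    """
    return json.dumps(event, sort_keys=True)


events = [
    {"id": 1, "tags": ["a", "b"]},
    {"tags": ["a", "b"], "id": 1},   # same content, different key order
    {"id": 2, "tags": []},
]

seen = set()
unique = []
for ev in events:
    k = event_key(ev)
    if k not in seen:                # set membership on the string key
        seen.add(k)
        unique.append(ev)
```

For flat dicts whose values are already hashable, `frozenset(ev.items())` avoids the serialization cost.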

**3) Repeated string concatenation inside a loop**

**Problem example (real):** assembling a huge log or CSV line by line with `s += line`:

```python
## Bad approach:
out = ""
for chunk in chunks:
    out = out + chunk      # creates a new string each iteration
```


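**Why this hurts:** Python strings are immutable, so each `+` allocates a new string and copies the accumulated contents into it. Over n chunks the growing prefix is copied again and again, which adds up to O(n²) copying work. CPython can sometimes resize the string in place when nothing else references it. PEP 8 warns against relying on this: the optimization is fragile and isn't present in implementations that don't use reference counting.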

```python
## Better approach 1: collect pieces and join once
parts = []
for chunk in chunks:
    parts.append(chunk)
out = "".join(parts)       # sizes the result, allocates once, copies each piece once
# (if `chunks` is already an iterable of str, "".join(chunks) does the same)

## Better approach 2: io.StringIO (streamed construction) for many small writes
from io import StringIO
buf = StringIO()
for chunk in chunks:
    buf.write(chunk)
out = buf.getvalue()
```

**Complexities**

Bad (repeated +):

Time: O(n · total_length) in the worst case, since each step may copy the whole prefix; for similar-sized pieces this is O(n²) in the number of pieces n

Space: O(total_length) but with lots of temporary allocations

Good (join or StringIO):

Time: O(total_length) (single pass)

Space: O(total_length). The pieces and the final string coexist briefly, so peak memory is roughly twice the output size, but there are no repeated temporary copies.

**Trade-offs / notes**

For bytes, prefer `b"".join(...)`, a `bytearray`, or `io.BytesIO`.
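Here is a sketch of the three bytes-friendly options side by side, built on made-up CSV chunks. `io.BytesIO` is the bytes counterpart of `io.StringIO`.

```python
import io

chunks = [b"id,value\n", b"1,foo\n", b"2,bar\n"]   # hypothetical input

# Option 1: collect, then join once (same idea as "".join for str).
out1 = b"".join(chunks)

# Option 2: bytearray is mutable, so += extends it in place
# (amortized O(len(chunk)) per append thanks to over-allocation).
buf = bytearray()
for chunk in chunks:
    buf += chunk
out2 = bytes(buf)          # copy to an immutable bytes object if one is needed

# Option 3: io.BytesIO, a file-like in-memory buffer.
bio = io.BytesIO()
for chunk in chunks:
    bio.write(chunk)
out3 = bio.getvalue()
```

All three produce the same bytes. `bytearray` is handy when you also need to modify the buffer in place.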

If pieces are streamed and you cannot hold all parts, write directly to a file or stream (avoid building a huge in-memory string).
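Writing straight to a file might look like the following. This is a minimal sketch in which `write_log` and the generator `generate_lines` are hypothetical, and the temporary path exists only so the example is self-contained.

```python
import os
import tempfile


def write_log(lines, path):
    """Stream lines to a file instead of building one big string in memory.

    Only the file object's buffer is held, so memory stays bounded however
    many lines the iterable produces.
    """
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.writelines(lines)     # writelines adds no newlines; each item carries its own


def generate_lines(n):
    # Hypothetical producer: yields one CSV row at a time.
    for i in range(n):
        yield f"{i},value-{i}\n"


path = os.path.join(tempfile.mkdtemp(), "out.csv")
write_log(generate_lines(100_000), path)
```

Because `generate_lines` is a generator, at no point do all 100,000 rows exist in memory at once.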

`join` needs references to all pieces at once (the list). If there are very many small pieces, `StringIO` may use less peak memory because it does not have to keep a separate object alive for every piece.

**4) Using exceptions for normal control-flow (heavy exception cost)**

**Problem example (real):** processing thousands of records and using try/except to check key presence in a dict every time:

```python
## Bad approach:
for r in rows:
    try:
        val = mydict[r['k']]
        use(val)
    except KeyError:
        handle_missing(r)
```

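**Why this hurts:** raising and catching an exception is much more expensive than an ordinary branch. Entering a `try` block is cheap, and since Python 3.11 essentially free when nothing is raised. If missing keys are common, the interpreter pays the raise-and-catch cost on every miss. Using exceptions for routine logic also obscures intent.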

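```python
## Better approach: if missing keys are expected often, use .get()
_MISSING = object()                       # private sentinel: safe even if None is a stored value
for r in rows:
    val = mydict.get(r['k'], _MISSING)
    if val is _MISSING:
        handle_missing(r)
    else:
        use(val)

## Or, if a missing key should simply get a default, use defaultdict or setdefault.
## Unlike .get(), both insert the default into the dict.
from collections import defaultdict
mydict = defaultdict(lambda: default_value)   # default_value: your chosen default
```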

**Complexities**

Both methods do an average O(1) dict lookup.

Bad (frequent exceptions): still an O(1) lookup, but every miss pays for raising and catching `KeyError`, so the per-iteration constant factor is much larger.

Good (get): O(1) average with much smaller constant overhead.

**Trade-offs / notes**

If missing keys are rare, `try`/`except` can be slightly faster than checking `if key in d` and then indexing, because it avoids a second hash lookup. In microbenchmarks this "easier to ask forgiveness than permission" (EAFP) pattern sometimes wins, but only when misses are truly rare. Benchmark if you think this matters.
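A rough way to run that benchmark yourself with `timeit`. This is a sketch in which `mydict`, `probe_keys`, `with_try` and `with_get` are hypothetical names and the data is synthetic. No timings are shown here because they depend on Python version and hardware.

```python
import timeit

# Hypothetical data: 1,000 present keys. Values are ints, never None.
mydict = {i: i for i in range(1000)}


def probe_keys(miss_rate, n=10_000):
    """Deterministic probe keys where `miss_rate` of them are absent."""
    n_miss = int(n * miss_rate)
    return [-1 - i for i in range(n_miss)] + [i % 1000 for i in range(n - n_miss)]


def with_try(keys):
    total = 0
    for k in keys:
        try:
            total += mydict[k]
        except KeyError:          # cost paid only on a miss
            total -= 1
    return total


def with_get(keys):
    total = 0
    for k in keys:
        v = mydict.get(k)         # None is safe as a miss marker here
        if v is None:
            total -= 1
        else:
            total += v
    return total


results = {}
for miss_rate in (0.0, 0.01, 0.5):
    keys = probe_keys(miss_rate)
    assert with_try(keys) == with_get(keys)   # both strategies must agree
    results[miss_rate] = (
        timeit.timeit(lambda: with_try(keys), number=20),
        timeit.timeit(lambda: with_get(keys), number=20),
    )

for miss_rate, (t_try, t_get) in results.items():
    print(f"miss rate {miss_rate:>4}: try/except {t_try:.4f}s  .get() {t_get:.4f}s")
```

Comparing the rows should show where the crossover falls on your interpreter.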

Keep code readable: prefer .get() when it expresses intent (check or provide default).

**5) Doing heavy compute or blocking I/O per item instead of batching**

**Problem example (real):** for each incoming record you call an external API or write to a database immediately:

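```python
## Bad approach (compute_expensive and db are placeholders for your own code/client):
for record in stream:
    result = compute_expensive(record)   # CPU- or I/O-heavy work per record
    db.write(row=result)                 # one DB round-trip per record
```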

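**Why this hurts:** per-item calls pay fixed overheads (network round-trip, DB transaction cost, context switches) once per item instead of once per batch. Throughput is then capped by latency: if each round-trip takes 5 ms, a sequential loop can handle at most about 200 records per second.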

```python
## Better (buffer & batch); compute_batch and db.write_many are placeholders:
BATCH_SIZE = 1000
batch = []
for record in stream:
    batch.append(record)
    if len(batch) >= BATCH_SIZE:
        results = compute_batch(batch)        # vectorized / bulk operation
        db.write_many(results)                # one round-trip per batch
        batch.clear()

# flush remaining
if batch:
    results = compute_batch(batch)
    db.write_many(results)
```

**Complexities**

Bad (per-item expensive I/O):

Time: O(n * (latency + processing_per_item)) — latency dominates

Space: minimal (O(1) buffer)

Good (batching):

Time: O(n · processing_per_item + (n / BATCH_SIZE) · latency). This is still O(n), but the fixed per-call overhead is paid once per batch instead of once per item.

Space: O(BATCH_SIZE) extra memory for the buffer

**Trade-offs / notes**

Batching uses extra memory proportional to `BATCH_SIZE`. Tune it for both memory limits and throughput, since databases and APIs often have a sweet spot.
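A reusable chunking helper folds the "flush remaining" step into the loop itself. This sketch uses `itertools.islice`. The function is named `batched` to echo `itertools.batched` (added in Python 3.12, which yields tuples), but it is our own helper, and `write_many` below is a hypothetical stand-in for `db.write_many`.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable, including a stream.

    The last batch may be shorter, so no separate flush step is needed.
    """
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(iterable)
    while batch := list(islice(it, size)):   # empty list ends the loop
        yield batch


written = []


def write_many(rows):          # hypothetical stand-in for db.write_many
    written.append(len(rows))


for batch in batched(range(2500), 1000):
    write_many([x * 2 for x in batch])
# written == [1000, 1000, 500]
```

Only one batch is held in memory at a time, so memory use is bounded by `size`.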

For CPU-bound work, vectorized operations (NumPy, pandas) or parallelism (multiprocessing) are better than naive loops.

If per-item latency matters (real-time constraints), use small batches or an async pipeline that overlaps compute with I/O.

Consider backpressure: if downstream is slower, use a bounded queue and apply flow control rather than unbounded buffering.
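A bounded-queue pipeline with backpressure can be sketched with `asyncio.Queue(maxsize=...)`. When the queue is full, the producer's `await queue.put(...)` suspends until the consumer catches up, so buffering stays bounded. While the consumer waits on a bulk write, the producer keeps filling the queue, so compute and I/O overlap. `slow_bulk_write`, the sizes, and the `None` sentinel convention are assumptions for illustration.

```python
import asyncio


async def producer(queue, n_items):
    for i in range(n_items):
        await queue.put(i)       # suspends while the queue is full: backpressure
    await queue.put(None)        # sentinel: no more items


async def slow_bulk_write(batch, sink):
    # Hypothetical stand-in for a bulk DB/API call.
    await asyncio.sleep(0.001)
    sink.append(batch)


async def consumer(queue, batch_size, sink):
    batch = []
    while True:
        item = await queue.get()
        if item is None:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            await slow_bulk_write(batch, sink)
            batch = []           # new list, since sink keeps a reference to the old one
    if batch:                    # flush the remainder
        await slow_bulk_write(batch, sink)


async def main(n_items=2_500, batch_size=1_000, max_buffered=2_000):
    queue = asyncio.Queue(maxsize=max_buffered)   # at most max_buffered items waiting
    sink = []
    await asyncio.gather(producer(queue, n_items), consumer(queue, batch_size, sink))
    return sink


batches = asyncio.run(main())
```

The same shape works with threads and `queue.Queue(maxsize=...)` when the producer or consumer is blocking code.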

| Pattern                     | Bad time                                    | Bad space                 | Improved time                          | Improved space |
| --------------------------- | ------------------------------------------: | ------------------------: | -------------------------------------: | -------------: |
| Nested-loop join            | O(n·m)                                      | O(1)                      | O(n + m) average (hash join)           | O(m)           |
| List membership             | O(n·k), O(n²) worst                         | O(k)                      | O(n) average (set)                     | O(k)           |
| Repeated string `+=`        | ~O(n²) copying                              | O(total) plus temporaries | O(total)                               | O(total)       |
| Exceptions for control flow | O(n), high constant when misses are common  | O(1)                      | O(n), small constant                   | O(1)           |
| Per-item I/O (no batching)  | O(n · latency)                              | O(1)                      | O(n), latency paid once per batch      | O(BATCH_SIZE)  |
