**✔ Principle 1** — Use Built-Ins Over Python Loops
**Why it matters**

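Built-ins such as sum(), max(), and sorted() run their loops in optimized C code, and list comprehensions compile to specialized bytecode that skips the repeated out.append lookup and method call → far fewer Python-level operations → faster.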

In [0]:
## Double every transaction amount.
## Slow (Python loop):
out = []
for x in arr:
    out.append(x * 2)

## Fast
out = [x * 2 for x in arr]

| Operation          | Time | Space |
| ------------------ | ---- | ----- |
| Loop               | O(n) | O(n)  |
| List comprehension | O(n) | O(n)  |

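Time complexity is the same, but the comprehension does less interpreter work per element, so it is faster by a constant factor. The exact speedup depends on the Python version and on how much work each element needs, so measure before relying on a specific number.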
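
To check the speedup on a particular machine, a small timeit harness along these lines can help. This is a sketch: the sample data and the helper names double_with_loop and double_with_comprehension are made up for illustration, and the measured ratio will vary with Python version and hardware.

```python
import timeit

arr = list(range(10_000))  # hypothetical sample data


def double_with_loop(values):
    out = []
    for x in values:
        out.append(x * 2)
    return out


def double_with_comprehension(values):
    return [x * 2 for x in values]


# Both approaches must agree before a timing comparison means anything.
assert double_with_loop(arr) == double_with_comprehension(arr)

loop_time = timeit.timeit(lambda: double_with_loop(arr), number=100)
comp_time = timeit.timeit(lambda: double_with_comprehension(arr), number=100)
print(f"loop: {loop_time:.4f}s  comprehension: {comp_time:.4f}s")
```

Wrapping each variant in a function keeps the comparison fair, since both then use local variables.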

**✔ Principle 2** — Avoid Repeated String Concatenation
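**Why it matters**

Strings are immutable, so s += part may allocate a new string and copy everything accumulated so far → O(n²) total copying in the worst case. CPython can sometimes resize the string in place, but that is an implementation detail and should not be relied on.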

In [0]:
## Slow O(n²):
s = ""
for part in parts:
    s += part

## Fast O(n):
s = "".join(parts)

| Operation       | Time  | Space |
| --------------- | ----- | ----- |
| Repeated concat | O(n²) | O(n)  |
| Join            | O(n)  | O(n)  |

**Where this shows up**

log aggregation

building SQL queries

constructing large JSON blobs
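
For the log-aggregation case, the two approaches can be compared side by side. A minimal sketch, assuming a small made-up list of fragments named parts; both variants build the same string, and only the amount of copying differs.

```python
parts = [f"event-{i}" for i in range(5)]  # hypothetical log fragments

# Repeated concatenation: each += may copy the string built so far.
s_concat = ""
for part in parts:
    s_concat += part + "\n"

# join: the pieces are collected first and copied into the result once.
s_join = "".join(f"{part}\n" for part in parts)

assert s_concat == s_join
```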

**✔ Principle 3** — Avoid Linear Membership Tests on Lists
**Why it matters**

List membership is O(n).
Set/dict membership is O(1) avg case (hash lookup).

In [0]:
## Slow:
is_fraud = user_id in fraud_list   # O(n): scans the list element by element

## Fast:
is_fraud = user_id in fraud_set    # O(1) average: single hash lookup

| Structure | Membership Time | Space |
| --------- | --------------- | ----- |
| list      | O(n)            | O(n)  |
| set       | O(1) average    | O(n)  |
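
Applied to record filtering, the pattern is to convert the list to a set once and then test each record against it. The sketch below uses made-up (user_id, amount) records and a made-up fraud_list purely for illustration.

```python
# Hypothetical data: (user_id, amount) records and a list of flagged IDs.
records = [(1, 10.0), (2, 25.5), (3, 7.25), (2, 3.0)]
fraud_list = [2, 99]

fraud_set = set(fraud_list)  # one O(n) conversion up front

# Each membership test below is an O(1) average hash lookup.
clean = [(uid, amt) for uid, amt in records if uid not in fraud_set]
flagged = [(uid, amt) for uid, amt in records if uid in fraud_set]
```

The conversion cost is paid once, so it only pays off when there is more than a handful of lookups.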

**Interview scenario**

"How do you optimize filtering millions of records?"

"Why use a set instead of a list?"

**✔ Principle 4** — Use Local Variables
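**Why it matters**

Inside a function, local variables are read from a fixed slot in the frame. A global name needs a dictionary lookup, and a module attribute such as math.sqrt needs a second lookup on top of that. In a tight loop those extra lookups are repeated on every iteration. Recent CPython versions (3.11+) cache many of these lookups, so the gain is smaller than it used to be.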

In [0]:
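import math

## Slow: looks up the global math, then its sqrt attribute, on every iteration
total = 0.0
for x in arr:
    total += math.sqrt(x)

## Fast: resolve the attribute once (inside a function, sqrt becomes a true local)
sqrt = math.sqrt
total = 0.0
for x in arr:
    total += sqrt(x)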

| Operation                 | Time                                         | Space |
| ------------------------- | -------------------------------------------- | ----- |
| Global + attribute lookup | O(n), extra dictionary lookups per iteration | O(1)  |
| Local caching             | O(n), smaller constant factor                | O(1)  |
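
The caching trick works best inside a function, because only there does the alias become a real local variable. A minimal sketch, with the function names total_sqrt_global and total_sqrt_local invented for illustration:

```python
import math


def total_sqrt_global(values):
    total = 0.0
    for x in values:
        total += math.sqrt(x)  # global lookup plus attribute lookup each time
    return total


def total_sqrt_local(values):
    sqrt = math.sqrt  # resolved once; sqrt is now a local variable
    total = 0.0
    for x in values:
        total += sqrt(x)
    return total


assert total_sqrt_global([1, 4, 9]) == total_sqrt_local([1, 4, 9])
```

At module level the alias is still a global name, so it saves only the attribute lookup.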

**Where this matters**

Tight loops over millions of values

Pandas UDFs

Data parsing

**✔ Principle 5** — Avoid Python Function Calls Inside Hot Loops

**Why it matters**

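Every Python function call has a fixed overhead: stack frame creation, argument binding, name lookups.

When the function body is as small as a single subtraction, that overhead can dominate the loop, so inlining the expression removes most of the per-element cost.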

In [0]:
## Slow

def normalize(x, m):
    return x - m

mn = min(arr)
out = []
for x in arr:
    out.append(normalize(x, mn))

## Fast (inline):
mn = min(arr)
out = [x - mn for x in arr]

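| Operation                 | Time                               | Space |
| ------------------------- | ---------------------------------- | ----- |
| Function call inside loop | O(n), plus overhead f per element  | O(n)  |
| Inline expression         | O(n)                               | O(n)  |

f = constant overhead of one Python function call. Both versions are O(n); inlining only removes the constant factor.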
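
How large f is in practice can be estimated with timeit. The sketch below reuses normalize from the example above on made-up data; the numbers it prints are machine-dependent and are not a benchmark result.

```python
import timeit

arr = [float(i) for i in range(10_000)]  # hypothetical sample data
mn = min(arr)


def normalize(x, m):
    return x - m


# Same computation, with and without a function call per element.
with_call = timeit.timeit(lambda: [normalize(x, mn) for x in arr], number=50)
inline = timeit.timeit(lambda: [x - mn for x in arr], number=50)
print(f"call: {with_call:.4f}s  inline: {inline:.4f}s")
```

If the helper does substantial work per call, the overhead becomes negligible and keeping the function is usually the more readable choice.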

**When it appears**

ETL row-by-row transformations

Parsing logs

Data cleaning code in take-homes


**✔ Principle 6** — Use Generators for Streaming
**Why it matters**

Iterators/generators don't allocate memory for entire datasets → process streams efficiently.

In [0]:
## Process 5GB log file line by line.

## Fast / low memory (streaming):
def read_lines():
    with open("big.log") as f:
        for line in f:
            yield process(line)

for row in read_lines():
    write(row)

## Slow / memory explosion (loading full file):
rows = [process(line) for line in open("big.log")]

| Operation | Time | Space |
| --------- | ---- | ----- |
| List load | O(n) | O(n)  |
| Generator | O(n) | O(1)  |
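
Generators also compose with built-ins that consume iterables, so an aggregate can be computed without ever building a list. A self-contained sketch, assuming a made-up file with one amount per line (read_amounts is a hypothetical helper, and the temporary file stands in for a real log):

```python
import os
import tempfile


def read_amounts(path):
    """Yield one parsed amount per line without loading the whole file."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield float(line)


# Write a small stand-in file (a real log would be far larger).
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("10.5\n20.0\n\n4.5\n")
    path = tmp.name

try:
    # sum() pulls values one at a time from the generator.
    total = sum(read_amounts(path))
finally:
    os.remove(path)
```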


**✔ Principle 7** — Choose the Right Data Structure for the Job
**Why it matters**

The right data structure can turn an O(n) operation into O(1).
The wrong one can turn a linear pass into a quadratic one.

In [0]:
## 1. Need fast lookup? → dict / set

fraud_users = set(fraud_ids)     # build once: O(n)
is_fraud = uid in fraud_users    # O(1) average per lookup

## ✔ Time: O(1) lookups
## ✔ Space: O(n)

In [0]:
## 2. Need ordering? → list
sorted_events = sorted(events)   # returns a new sorted list; lists keep element order

## ✔ Best for sequences
## ✔ Time: O(n log n) sort
## ✔ Space: O(n)

In [0]:
## 3. Need sliding window? → deque

from collections import deque

window = deque(maxlen=1000)

## ✔ append / popleft = O(1); with maxlen set, the oldest item is dropped automatically
## ✔ ideal for moving averages, rate-limits
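
A moving average is one way to put the bounded deque to work. The sketch below is illustrative only: moving_average and its parameters are invented names, and the running sum is one possible design rather than the only one.

```python
from collections import deque


def moving_average(values, window_size):
    """Yield the mean of the last window_size values seen so far."""
    window = deque(maxlen=window_size)
    running_sum = 0.0
    for v in values:
        if len(window) == window_size:
            running_sum -= window[0]  # value about to be evicted
        window.append(v)              # deque drops the oldest automatically
        running_sum += v
        yield running_sum / len(window)


averages = list(moving_average([1, 2, 3, 4, 5], window_size=3))
```

Keeping a running sum avoids re-summing the window on every step, so each update stays O(1).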

In [0]:
## 4. Need grouping? → defaultdict(list)

from collections import defaultdict
groups = defaultdict(list)
for user, txn in records:
    groups[user].append(txn)
## ✔ grouping = O(n)
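
Once records are grouped, per-key aggregates follow in one more pass. A small sketch with made-up (user, amount) records:

```python
from collections import defaultdict

records = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)]  # hypothetical data

groups = defaultdict(list)
for user, txn in records:
    groups[user].append(txn)  # missing keys start as an empty list

# One total per user, computed from the grouped transactions.
totals = {user: sum(txns) for user, txns in groups.items()}
```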

In [0]:
## 5. Need counting? → Counter
from collections import Counter
counts = Counter(event_types)

## ✔ O(n)
## ✔ optimized C implementation
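
Two Counter features tend to be useful right after counting: most_common for a ranked view, and zero for missing keys. A brief sketch with a made-up event_types list:

```python
from collections import Counter

event_types = ["click", "view", "click", "purchase", "view", "click"]  # hypothetical
counts = Counter(event_types)

top_two = counts.most_common(2)  # (item, count) pairs, highest count first
missing = counts["refund"]       # missing keys count as 0, no KeyError
```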

In [0]:
## 6. Need schema? → dataclass

from dataclasses import dataclass

@dataclass
class Event:
    user: str
    amount: float

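## ✔ Field access speed is comparable to dict lookups (a plain dataclass still stores fields in a per-instance __dict__)
## ✔ More memory-efficient, especially with @dataclass(slots=True) (Python 3.10+)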
## ✔ Clearer in take-home assignments
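
Part of the clarity comes from the methods a dataclass generates. The sketch below assumes the same Event fields as above; frozen=True is an addition here to make instances immutable, not something the example above requires.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Event:
    user: str
    amount: float


e1 = Event("alice", 10.0)
e2 = Event(user="alice", amount=10.0)

same = e1 == e2      # field-by-field equality is generated automatically
as_row = asdict(e1)  # plain dict, convenient for serialization
```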

| Task       | Best DS     | Time            | Space |
| ---------- | ----------- | --------------- | ----- |
| Lookup     | set/dict    | O(1) average    | O(n)  |
| FIFO queue | deque       | O(1)            | O(n)  |
| Counting   | Counter     | O(n)            | O(n)  |
| Grouping   | defaultdict | O(n)            | O(n)  |
| Ordered    | list        | O(n log n) sort | O(n)  |
| Schema     | dataclass   | O(1) access     | O(1)  |


**Prefer generators over lists**

Use streaming:

In [0]:
import json

def records():
    with open("logs") as f:          # file is closed when the generator finishes
        for line in f:
            yield json.loads(line)

**Avoid holding entire datasets when unnecessary**

Process in chunks:

In [0]:
def read_chunks(file, size=10000):
    chunk = []
    for line in file:
        chunk.append(line)
        if len(chunk) == size:
            yield chunk
            chunk = []  # start a new list; clear() would empty the chunk the caller just received
    if chunk:
        yield chunk
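
An alternative way to chunk any iterable is itertools.islice, which hands back a fresh list each time. A sketch under stated assumptions: iter_chunks is a hypothetical name, and io.StringIO stands in for a real file handle.

```python
import io
from itertools import islice


def iter_chunks(lines, size=10000):
    """Yield successive lists of up to size items from any iterable."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))  # new list on every iteration
        if not chunk:
            return
        yield chunk


fake_file = io.StringIO("a\nb\nc\nd\ne\n")  # stand-in for a real file handle
chunks = list(iter_chunks(fake_file, size=2))
```

Because every chunk is a separate list, callers can safely keep references to earlier chunks.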

**Use tuples for immutable, lightweight rows**

In [0]:
records = [(txn_id, amt, ts) for txn_id, amt, ts in data]  # txn_id avoids shadowing the built-in id
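
When positional indexes become hard to read, typing.NamedTuple keeps the tuple's immutability and adds field names. A minimal sketch; the Txn class and the sample rows are invented for illustration.

```python
from typing import NamedTuple


class Txn(NamedTuple):
    txn_id: int
    amount: float
    ts: int


data = [(1, 10.0, 1700000000), (2, 25.0, 1700000060)]  # hypothetical rows
records = [Txn(*row) for row in data]

total = sum(r.amount for r in records)  # access by name instead of r[1]
```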

**Use __slots__ in Classes**

In [0]:
## Regular class
class Event:
    def __init__(self, ts, user_id, amount):
        self.ts = ts
        self.user_id = user_id
        self.amount = amount

In [0]:
## Same class with __slots__: no per-instance __dict__, so less memory per object
class Event:
    __slots__ = ("ts", "user_id", "amount")

    def __init__(self, ts, user_id, amount):
        self.ts = ts
        self.user_id = user_id
        self.amount = amount
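
The practical difference between the two classes can be observed directly. In the sketch below the classes are renamed PlainEvent and SlottedEvent (hypothetical names) so both can live in one cell.

```python
class PlainEvent:
    def __init__(self, ts, user_id, amount):
        self.ts = ts
        self.user_id = user_id
        self.amount = amount


class SlottedEvent:
    __slots__ = ("ts", "user_id", "amount")

    def __init__(self, ts, user_id, amount):
        self.ts = ts
        self.user_id = user_id
        self.amount = amount


plain = PlainEvent(1, "u1", 9.5)
slotted = SlottedEvent(1, "u1", 9.5)

has_dict_plain = hasattr(plain, "__dict__")      # regular instances carry a dict
has_dict_slotted = hasattr(slotted, "__dict__")  # slotted instances do not

# Attributes outside __slots__ are rejected instead of silently created.
try:
    slotted.note = "not declared in __slots__"
    rejected = False
except AttributeError:
    rejected = True
```

The trade-off is flexibility: slotted instances cannot gain new attributes at runtime, which is usually acceptable for fixed-schema rows.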