
# From Python to Production: Day 3  
## Mastering Iteration & Comprehensions  
By **Prerna Joshi** | #25DaysOfDataTech | #PythonToProduction

**Focus:** Real data tasks using `enumerate`, `zip`, and concise list/dict/set comprehensions.



### 🎯 Learning Objectives
- Replace error-prone index loops with `enumerate()` and multi-sequence loops with `zip()`  
- Write clean, performant *comprehensions* (list, dict, set, generator) with filters & inline conditionals  
- Apply these patterns to realistic data tasks (logs, CSV-like rows, API responses)  
- Understand pitfalls (unequal lengths with `zip`, readability vs. cleverness, memory costs)



### 🧩 Setup: Tiny Realistic Datasets
We'll use lightweight, realistic structures you often see in data work.


In [18]:

# Sample data: orders and users
orders = [
    {"order_id": "O-1001", "user_id": 1, "amount": 49.99, "currency": "USD"},
    {"order_id": "O-1002", "user_id": 2, "amount": 14.50, "currency": "USD"},
    {"order_id": "O-1003", "user_id": 1, "amount": 5.00,  "currency": "USD"},
    {"order_id": "O-1004", "user_id": 3, "amount": None,   "currency": "USD"},  # bad row
]

users = [
    {"user_id": 1, "name": "Ava"},
    {"user_id": 2, "name": "Ben"},
    {"user_id": 3, "name": "Chen"},
]

# CSV-like rows where position matters
rows = [
    ["2025-12-01", "INFO",  "ETL job started"],
    ["2025-12-01", "WARN",  "Missing value in column: amount"],
    ["2025-12-01", "ERROR", "Failed to write record: O-1004"],
]
print(len(orders), len(users), len(rows))


4 3 3



## 1) `enumerate()`: Ditch `range(len(...))`
**Why:** It couples the index with the item, making loops safer and more readable.


In [19]:

# Anti-pattern: manual index management
bad = []
for i in range(len(rows)):
    ts, level, msg = rows[i]   # brittle if shape changes
    bad.append(f"{i}:{level}:{msg}")
bad[:2]


['0:INFO:ETL job started', '1:WARN:Missing value in column: amount']

In [20]:

# Preferred: enumerate couples index + row in one step
good = [f"{i}:{level}:{msg}" for i, (ts, level, msg) in enumerate(rows, start=1)]
good[:2]


['1:INFO:ETL job started', '2:WARN:Missing value in column: amount']


**Tip:** Use `start=1` when you want human-friendly indexing (e.g., line numbers).
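
As a small illustration of the tip, the sketch below reports 1-based line numbers for blank lines in a block of text. The `raw_lines` sample and the `blank_line_numbers` name are made up for this example.

```python
# Hypothetical sample: three text lines, one of them empty
raw_lines = ["id,amount", "", "O-1001,49.99"]

# start=1 makes the reported positions match what a text editor shows
blank_line_numbers = [
    lineno
    for lineno, line in enumerate(raw_lines, start=1)
    if not line.strip()
]
print(blank_line_numbers)  # [2]
```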



## 2) `zip()`: Iterate Sequences in Lockstep
**Why:** Pairs elements from multiple iterables for clean, aligned iteration.


In [21]:

dates   = [r[0] for r in rows]
levels  = [r[1] for r in rows]
messages= [r[2] for r in rows]

# Produce compact log lines by zipping aligned lists
zipped_lines = [f"[{d}] {lvl}: {m}" for d, lvl, m in zip(dates, levels, messages)]
zipped_lines


['[2025-12-01] INFO: ETL job started',
 '[2025-12-01] WARN: Missing value in column: amount',
 '[2025-12-01] ERROR: Failed to write record: O-1004']


**Important:** `zip()` stops at the shortest iterable. Use `itertools.zip_longest` if you need to fill missing values.


In [22]:

from itertools import zip_longest

a = [1, 2, 3]
b = ["x", "y"]
list(zip(a, b)), list(zip_longest(a, b, fillvalue=None))


([(1, 'x'), (2, 'y')], [(1, 'x'), (2, 'y'), (3, None)])


## 3) Comprehensions: Concise, Expressive, Fast Enough
Types you'll use constantly:
- **List**: `[expr for x in xs if cond]`
- **Dict**: `{key_expr: val_expr for x in xs if cond}`
- **Set**: `{expr for x in xs if cond}` (deduplicates)
- **Generator** *(lazy)*: `(expr for x in xs if cond)`
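
To make the four shapes concrete, here is a minimal sketch that applies the same transform and filter in each form, next to the plain loop the list version replaces. The `xs` sample and all variable names are invented for this example.

```python
xs = [3, 1, 3, 2]  # hypothetical sample input

squares_list = [x * x for x in xs if x > 1]      # [9, 9, 4]
square_by_x = {x: x * x for x in xs if x > 1}    # {3: 9, 2: 4}
squares_set = {x * x for x in xs if x > 1}       # {9, 4} (order not guaranteed)
squares_gen = (x * x for x in xs if x > 1)       # nothing computed yet

# The loop that the list comprehension replaces
squares_loop = []
for x in xs:
    if x > 1:
        squares_loop.append(x * x)

print(squares_list == squares_loop)  # True
```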



### 3.1 List Comprehensions (transform + filter)


In [23]:

# Extract clean amounts (ignore None), convert to cents as int
amount_cents = [int(round(o["amount"] * 100)) 
                for o in orders 
                if o["amount"] is not None]
amount_cents


[4999, 1450, 500]

In [24]:

# Flag high-value orders in a compact way (ternary expression)
flags = [("HIGH" if o["amount"] and o["amount"] >= 20 else "LOW") 
         for o in orders]
flags


['HIGH', 'LOW', 'LOW', 'LOW']


### 3.2 Dict Comprehensions (construct mapping tables)


In [25]:

# user_id -> name lookup
user_name_by_id = {u["user_id"]: u["name"] for u in users}
user_name_by_id


{1: 'Ava', 2: 'Ben', 3: 'Chen'}

In [26]:

# order_id -> (user_name, amount) only for valid amounts
order_summary = {
    o["order_id"]: (user_name_by_id.get(o["user_id"], "UNKNOWN"), o["amount"])
    for o in orders
    if o["amount"] is not None
}
order_summary


{'O-1001': ('Ava', 49.99), 'O-1002': ('Ben', 14.5), 'O-1003': ('Ava', 5.0)}
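
One thing to watch with dict comprehensions: if two items produce the same key, the later one silently wins. The sketch below uses a trimmed-down, hypothetical `sample_orders` list to show the effect when keying by `user_id` instead of `order_id`.

```python
# Hypothetical sample: user 1 has two orders
sample_orders = [
    {"order_id": "O-1001", "user_id": 1},
    {"order_id": "O-1002", "user_id": 2},
    {"order_id": "O-1003", "user_id": 1},
]

# Keyed by user_id: O-1003 overwrites O-1001 for user 1
last_order_by_user = {o["user_id"]: o["order_id"] for o in sample_orders}
print(last_order_by_user)  # {1: 'O-1003', 2: 'O-1002'}
```

If every order must be kept, map each key to a list of values instead of a single value.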


### 3.3 Set Comprehensions (unique, unordered)


In [27]:

# Unique log levels from rows
unique_levels = {lvl for _, lvl, _ in rows}
unique_levels


{'ERROR', 'INFO', 'WARN'}
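
Because sets are unordered, the display order above is not something to rely on in reports or tests. A minimal sketch, using a hypothetical `sample_rows` list, that wraps the set comprehension in `sorted()` to get a stable list:

```python
sample_rows = [
    ["2025-12-01", "INFO",  "ETL job started"],
    ["2025-12-01", "WARN",  "Missing value in column: amount"],
    ["2025-12-01", "ERROR", "Failed to write record: O-1004"],
    ["2025-12-01", "INFO",  "ETL job finished"],  # hypothetical extra row
]

# Deduplicate with a set, then sort for a deterministic result
levels_sorted = sorted({lvl for _, lvl, _ in sample_rows})
print(levels_sorted)  # ['ERROR', 'INFO', 'WARN']
```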


### 3.4 Generator Expressions (lazy pipelines)
Use when you want **streamed** processing to reduce memory spikes.


In [28]:

# Sum amounts lazily (ignores None)
total_amount = sum(
    o["amount"] for o in orders
    if o["amount"] is not None
)
total_amount


69.49000000000001
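
Two properties of generators are worth seeing once: the generator object stays small no matter how many items it will yield, and it can be consumed only a single time. The sketch below uses made-up names (`big_list`, `big_gen`) and `sys.getsizeof`, which measures only the container itself, not the items inside it.

```python
import sys

n = 100_000
big_list = [x * x for x in range(n)]  # all values held in memory at once
big_gen = (x * x for x in range(n))   # values produced on demand

# The list object grows with n; the generator object does not
print(sys.getsizeof(big_list) > sys.getsizeof(big_gen))  # True

first_total = sum(big_gen)   # consumes the generator
second_total = sum(big_gen)  # already exhausted, so the sum is 0
print(first_total == sum(big_list), second_total)  # True 0
```

If the values are needed more than once, either rebuild the generator or materialize it into a list deliberately.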


## 4) Real Tasks: Clean, Join, and Report


In [29]:

# 4.1 Clean orders and attach user names
cleaned_orders = [
    {
        "order_id": o["order_id"],
        "user": user_name_by_id.get(o["user_id"], "UNKNOWN"),
        "amount": o["amount"],
        "currency": o["currency"],
        "rownum": i,  # from enumerate
    }
    for i, o in enumerate(orders, start=1)
    if o["amount"] is not None
]
cleaned_orders


[{'order_id': 'O-1001',
  'user': 'Ava',
  'amount': 49.99,
  'currency': 'USD',
  'rownum': 1},
 {'order_id': 'O-1002',
  'user': 'Ben',
  'amount': 14.5,
  'currency': 'USD',
  'rownum': 2},
 {'order_id': 'O-1003',
  'user': 'Ava',
  'amount': 5.0,
  'currency': 'USD',
  'rownum': 3}]

In [30]:

# 4.2 Simple aggregation with a dict comp + set comp
amount_by_user = {
    name: sum(o["amount"] for o in orders if o["amount"] is not None and user_name_by_id.get(o["user_id"]) == name)
    for name in {u["name"] for u in users}
}
amount_by_user


{'Ben': 14.5, 'Chen': 0, 'Ava': 54.99}
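
The comprehension above rescans every order once per user, which is fine for a handful of rows but grows with users times orders. As an alternative sketch (not the notebook's approach), a plain loop can build the same totals in a single pass. The `sample_*` data mirrors the setup cell, and the code assumes every `user_id` in the orders exists in the users list.

```python
sample_users = [
    {"user_id": 1, "name": "Ava"},
    {"user_id": 2, "name": "Ben"},
    {"user_id": 3, "name": "Chen"},
]
sample_orders = [
    {"order_id": "O-1001", "user_id": 1, "amount": 49.99},
    {"order_id": "O-1002", "user_id": 2, "amount": 14.50},
    {"order_id": "O-1003", "user_id": 1, "amount": 5.00},
    {"order_id": "O-1004", "user_id": 3, "amount": None},  # bad row
]

name_by_id = {u["user_id"]: u["name"] for u in sample_users}

# Start every user at zero so users without valid orders still appear
totals = {u["name"]: 0.0 for u in sample_users}
for o in sample_orders:
    if o["amount"] is not None:
        # Assumes the user_id is known; an unknown id would raise KeyError
        totals[name_by_id[o["user_id"]]] += o["amount"]

print({name: round(total, 2) for name, total in totals.items()})
```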

In [31]:

# 4.3 Aligning columns produced from different passes (use zip)
order_ids   = [o["order_id"] for o in cleaned_orders]
user_names  = [o["user"]     for o in cleaned_orders]
amounts     = [o["amount"]   for o in cleaned_orders]

report_lines = [f"{oid} | {uname:<5} | ${amt:,.2f}"
                for oid, uname, amt in zip(order_ids, user_names, amounts)]
report_lines


['O-1001 | Ava   | $49.99',
 'O-1002 | Ben   | $14.50',
 'O-1003 | Ava   | $5.00']


### Bonus: Unzip Trick
Use `zip(*pairs)` to transpose/unwrap.


In [32]:

pairs = list(zip(order_ids, amounts))
ids2, amts2 = zip(*pairs)  # unzip
pairs[:2], ids2[:2], amts2[:2]


([('O-1001', 49.99), ('O-1002', 14.5)], ('O-1001', 'O-1002'), (49.99, 14.5))
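
One caveat with the unzip trick: when the list of pairs is empty, `zip(*pairs)` yields nothing, so unpacking into two names raises `ValueError`. A minimal sketch of a guard, with `empty_pairs` standing in for a result where every row was filtered out:

```python
empty_pairs = []  # e.g. every row was filtered out upstream

try:
    ids, amts = zip(*empty_pairs)
except ValueError:
    # Nothing to unpack: fall back to two empty tuples
    ids, amts = (), ()

print(ids, amts)  # () ()
```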


## 5) Performance Notes (Rule of Thumb)
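- Comprehensions are often **faster** than equivalent `for` loops that call `.append()`, because the append compiles to a dedicated bytecode instruction instead of a method lookup and call on every iteration.  
- Prefer **generator expressions** when the full materialized list isn't required. The payoff is lower peak memory, not necessarily speed: in the timing below, the `sum(...)` generator is the slowest of the three (it also does the extra work of summing).  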
- Don't over-optimize prematurely; optimize hotspots guided by profiling.


In [33]:

# Quick micro-benchmark (illustrative, not rigorous)
import timeit

setup = "data = list(range(10000))"

stmt_loop = """\
out = []
for x in data:
    out.append(x*x)
"""

stmt_comp = "out = [x*x for x in data]"
stmt_gen  = "total = sum(x*x for x in data)"

loop_time = timeit.timeit(stmt_loop, setup=setup, number=500)
comp_time = timeit.timeit(stmt_comp, setup=setup, number=500)
gen_time  = timeit.timeit(stmt_gen,  setup=setup, number=500)

loop_time, comp_time, gen_time


(0.284898100013379, 0.21554319997085258, 0.48656500002834946)


## 6) Pitfalls & Best Practices
- `zip()` truncates to shortest iterable. Use `itertools.zip_longest` if you need padding.  
- Avoid *overly clever* nested comprehensions; prefer a small, named helper function for readability.  
- For large data, a list comprehension can spike memory; switch to a generator expression `( ... )` or stream in chunks.  
- Keep expressions **pure** inside comprehensions (avoid mutating external state).
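
To illustrate the readability point, the sketch below writes the same extraction twice: once as a dense nested comprehension and once with two small helpers. The task (collect long words from non-INFO messages), the helper names `is_problem` and `long_words`, and the `sample_rows` data are all invented for this example.

```python
sample_rows = [
    ["2025-12-01", "INFO",  "ETL job started"],
    ["2025-12-01", "WARN",  "Missing value in column: amount"],
    ["2025-12-01", "ERROR", "Failed to write record: O-1004"],
]

# Dense version: correct, but the reader has to untangle two loops and two filters
dense = [w.lower() for _, lvl, msg in sample_rows if lvl != "INFO" for w in msg.split() if len(w) > 5]

def is_problem(level):
    """True for log levels that need attention."""
    return level in {"WARN", "ERROR"}

def long_words(msg, min_len=6):
    """Lowercased words of at least min_len characters."""
    return [w.lower() for w in msg.split() if len(w) >= min_len]

# Readable version: each piece has a name and can be tested on its own
readable = [w for _, lvl, msg in sample_rows if is_problem(lvl) for w in long_words(msg)]

print(dense == readable)  # True
```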



## 7) Practice
1. Using `enumerate`, add a `rownum` (starting from 101) to each `rows` item as a new list of dicts: `{"rownum": ..., "level": ..., "message": ...}`.  
2. Build a **dict** mapping user name → list of that user's valid order IDs.  
3. From `orders` and `user_name_by_id`, create a **set** of user names who have any order ≥ \$20.  
4. Use a **generator** to compute the average valid order amount.



<details>
<summary><strong>Show Solutions</strong></summary>

```python
# 1) enumerate with custom start
row_dicts = [
    {"rownum": i, "level": lvl, "message": msg}
    for i, (_, lvl, msg) in enumerate(rows, start=101)
]

# 2) dict comp: name -> list of order_ids
name_to_orders = {
    u["name"]: [o["order_id"] for o in orders if o["amount"] is not None and o["user_id"] == u["user_id"]]
    for u in users
}

# 3) set comp with condition
names_ge_20 = {
    user_name_by_id[o["user_id"]]
    for o in orders
    if o["amount"] is not None and o["amount"] >= 20
}

# 4) generator for average (single pass, no intermediate list)
valid_amounts = (o["amount"] for o in orders if o["amount"] is not None)
total, count = 0.0, 0
for amt in valid_amounts:
    total += amt
    count += 1
# Protect against division by zero:
avg = total / count if count else 0.0
```
</details>



---  
**Next:** Handling Data Efficiently - Working with CSV, JSON, APIs, and file systems; automation examples. 
