# Lab 0: Python Best Practices for Deep Learning

## Learning Objectives

By the end of this lab, you will be able to:
1. Write type-annotated Python functions using modern syntax
2. Apply Pythonic patterns (comprehensions, context managers, enumerate/zip)
3. Perform NumPy array operations including broadcasting and reshaping
4. Understand OOP conventions used in PyTorch (classes, `__call__`, etc.)
5. Write generators for memory-efficient data processing
6. Debug Python code effectively

## Prerequisites

- Basic Python programming (variables, loops, functions, classes)
- Familiarity with importing packages

## Why This Lab?

Deep learning code has conventions that may be unfamiliar:
- **Type hints** make code self-documenting and let static checkers such as mypy catch bugs before the code runs
- **NumPy broadcasting** is essential for understanding tensor operations
- **Generators** provide the lazy, batch-at-a-time iteration pattern that PyTorch DataLoaders follow
- **OOP patterns** like `__call__` are central to `nn.Module`

This lab ensures you have the Python foundations needed for Labs 6-10.

In [None]:
# ==== Environment Setup ====
import os
import sys

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print("Running on Google Colab")
else:
    print("Running locally")

In [None]:
# ==== Device Setup ====
import torch

def get_device():
    """Get best available device: CUDA > MPS > CPU."""
    if torch.cuda.is_available():
        device = torch.device('cuda')
        print(f"Using CUDA GPU: {torch.cuda.get_device_name(0)}")
    elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
        device = torch.device('mps')
        print("Using Apple MPS (Metal)")
    else:
        device = torch.device('cpu')
        print("Using CPU")
    return device

DEVICE = get_device()

---

# Part 2: Python Foundations for Deep Learning

---

## 2.1 Type Hints

Type hints make code self-documenting and enable better IDE support.

### Basic Syntax (Python 3.10+)

In [None]:
# Basic type hints
def greet(name: str) -> str:
    return f"Hello, {name}!"

# Collections - use lowercase built-in generics (available since Python 3.9)
def average(values: list[float]) -> float:
    return sum(values) / len(values)

# Dictionaries
def word_count(text: str) -> dict[str, int]:
    words = text.lower().split()
    return {word: words.count(word) for word in set(words)}

# Optional values (can be None)
def find_index(items: list[str], target: str) -> int | None:
    try:
        return items.index(target)
    except ValueError:
        return None

# Test
print(f"greet('World'): {greet('World')}")
print(f"average([1, 2, 3, 4, 5]): {average([1, 2, 3, 4, 5])}")
print(f"word_count('the cat and the dog'): {word_count('the cat and the dog')}")
print(f"find_index(['a', 'b', 'c'], 'b'): {find_index(['a', 'b', 'c'], 'b')}")
print(f"find_index(['a', 'b', 'c'], 'x'): {find_index(['a', 'b', 'c'], 'x')}")

<details>
<summary><b>Q: Why use `int | None` instead of `Optional[int]`?</b></summary>

**A:** `int | None` is the modern Python 3.10+ syntax. It's more readable and doesn't require importing from `typing`. The older `Optional[int]` still works but is more verbose.

```python
# Old style (required before Python 3.9 / 3.10)
from typing import Optional, List
def f(x: Optional[int]) -> List[str]: ...

# Modern style (list[str] needs 3.9+, int | None needs 3.10+)
def f(x: int | None) -> list[str]: ...
```
</details>
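
One caveat is worth seeing in action: Python does not enforce type hints at runtime. They are metadata for readers, IDEs, and static checkers. The sketch below uses a hypothetical `scale` function (not part of this lab) to show that a wrongly typed call still runs, and that the annotations can be inspected with `typing.get_type_hints`.

```python
from typing import get_type_hints


def scale(value: float, factor: float = 2.0) -> float:
    """Multiply value by factor (hypothetical helper for illustration)."""
    return value * factor


# The annotations are stored as metadata on the function object.
hints = get_type_hints(scale)
print(hints)

# Nothing stops a badly typed call at runtime: str * int is valid Python,
# so this returns a string even though the hint promises a float.
wrong = scale("ab", 2)
print(repr(wrong))  # 'abab'
```

A static checker would flag the second call; the interpreter will not.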

### Exercise: Add Type Hints

Add type hints to the following functions:

In [None]:
# Exercise: Add type hints to these functions

def calculate_loss(predictions, targets):
    """Calculate mean squared error loss."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

def get_batch(data, batch_idx, batch_size):
    """Get a batch from data. Returns None if batch_idx out of range."""
    start = batch_idx * batch_size
    if start >= len(data):
        return None
    return data[start:start + batch_size]

def create_optimizer_config(lr, momentum, weight_decay):
    """Create optimizer configuration dictionary."""
    return {"lr": lr, "momentum": momentum, "weight_decay": weight_decay}

# Test (uncomment after adding hints):
# print(calculate_loss([1.0, 2.0], [1.1, 2.2]))
# print(get_batch([1,2,3,4,5], 0, 2))
# print(create_optimizer_config(0.01, 0.9, 1e-4))

## 2.2 Docstrings

Use Google-style docstrings for complex functions:

In [None]:
def train_model(
    model,
    train_loader,
    epochs: int = 10,
    learning_rate: float = 0.001,
    verbose: bool = True
) -> dict[str, list[float]]:
    """
    Train a PyTorch model.
    
    Args:
        model: PyTorch model to train (nn.Module)
        train_loader: DataLoader with training data
        epochs: Number of training epochs
        learning_rate: Learning rate for optimizer
        verbose: Whether to print progress
    
    Returns:
        Dictionary with 'train_loss' history
    
    Raises:
        ValueError: If epochs < 1
    
    Example:
        >>> history = train_model(model, loader, epochs=5)
        >>> plt.plot(history['train_loss'])
    """
    if epochs < 1:
        raise ValueError("epochs must be >= 1")
    # ... training code ...
    return {"train_loss": []}

# For simple/obvious functions, a one-liner is fine:
def relu(x: float) -> float:
    """Return max(0, x)."""
    return max(0, x)

## 2.3 Pythonic Patterns

### List Comprehensions

In [None]:
# Instead of:
squares_loop = []
for i in range(10):
    squares_loop.append(i ** 2)

# Use:
squares = [i ** 2 for i in range(10)]
print(f"Squares: {squares}")

# With condition
evens = [i for i in range(20) if i % 2 == 0]
print(f"Evens: {evens}")

# Dict comprehension
word_lengths = {word: len(word) for word in ["cat", "elephant", "dog"]}
print(f"Word lengths: {word_lengths}")

# Set comprehension (removes duplicates)
unique_lengths = {len(word) for word in ["cat", "bat", "elephant", "ant"]}
print(f"Unique lengths: {unique_lengths}")

### enumerate, zip, sorted

In [None]:
# enumerate - get index and value
fruits = ["apple", "banana", "cherry"]
for i, fruit in enumerate(fruits):
    print(f"{i}: {fruit}")

# zip - iterate multiple sequences together
names = ["Alice", "Bob", "Charlie"]
scores = [85, 92, 78]
for name, score in zip(names, scores):
    print(f"{name}: {score}")

# sorted with key function
students = [("Alice", 85), ("Bob", 92), ("Charlie", 78)]
by_score = sorted(students, key=lambda x: x[1], reverse=True)
print(f"By score (desc): {by_score}")

<details>
<summary><b>Q: When should you use a list comprehension vs a regular loop?</b></summary>

**A:** Use comprehensions when:
- Building a new list/dict/set from an iterable
- The logic fits on one readable line

Use regular loops when:
- You need complex logic or multiple statements
- You're modifying in place rather than creating new
- Readability suffers from one-liner

**Rule of thumb:** If you can't understand it in 5 seconds, use a loop.
</details>

### Exercise: Pythonic Refactoring

Refactor this verbose code to use Pythonic patterns:

In [None]:
# VERBOSE VERSION - refactor this to be Pythonic!

# Task 1: Create list of (name, score) tuples where score > 80
names = ["Alice", "Bob", "Charlie", "Diana"]
scores = [95, 72, 88, 65]
high_scorers = []
for i in range(len(names)):
    if scores[i] > 80:
        high_scorers.append((names[i], scores[i]))

# Task 2: Create dict mapping filename -> extension
files = ["data.csv", "model.pt", "config.json", "README.md"]
extensions = {}
for f in files:
    parts = f.split(".")
    name = parts[0]
    ext = parts[1]
    extensions[name] = ext

# Task 3: Read file, count non-empty lines (use context manager!)
f = open("test_file.txt", "w")
f.write("line1\n\nline2\nline3\n")
f.close()

f = open("test_file.txt", "r")
lines = f.readlines()
f.close()
count = 0
for line in lines:
    if line.strip() != "":
        count = count + 1

# Cleanup
import os
os.remove("test_file.txt")

print(f"High scorers: {high_scorers}")
print(f"Extensions: {extensions}")
print(f"Non-empty lines: {count}")

<details>
<summary><b>Solution: Pythonic Refactoring</b></summary>

```python
# Task 1: zip + list comprehension with filter
high_scorers = [(n, s) for n, s in zip(names, scores) if s > 80]

# Task 2: dict comprehension
extensions = {f.split(".")[0]: f.split(".")[1] for f in files}
# Or cleaner, with pathlib:
from pathlib import Path
extensions = {Path(f).stem: Path(f).suffix[1:] for f in files}

# Task 3: context manager + sum with generator
with open("test_file.txt", "w") as f:
    f.write("line1\n\nline2\nline3\n")

with open("test_file.txt", "r") as f:
    count = sum(1 for line in f if line.strip())
```
</details>

### Context Managers

In [None]:
# Context managers ensure cleanup (files close, locks release, etc.)

# File I/O - always use 'with'
from pathlib import Path

# Write
with open("test.txt", "w") as f:
    f.write("Hello, World!")

# Read
with open("test.txt", "r") as f:
    content = f.read()
print(f"File content: {content}")

# Clean up
Path("test.txt").unlink()

# PyTorch example: disable gradients for inference
import torch
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

with torch.no_grad():
    y = x * 2  # No gradient tracking here
print(f"y.requires_grad: {y.requires_grad}")
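
The `with` statement works with any object that implements the context-manager protocol, so you can write your own. Here is a minimal sketch, using a hypothetical `timer` helper built on the standard library's `contextlib.contextmanager`; the name and the `results` dictionary are illustrative assumptions.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label: str, results: dict[str, float]):
    """Time the body of a `with` block and store the elapsed seconds."""
    start = time.perf_counter()
    try:
        yield  # the body of the with-block runs here
    finally:
        # Runs even if the body raises, just like closing a file.
        results[label] = time.perf_counter() - start


timings: dict[str, float] = {}
with timer("sum", timings):
    total = sum(range(100_000))

print(f"sum took {timings['sum']:.6f}s")
```

The `try`/`finally` around `yield` is what guarantees the cleanup step, which is the same promise `open()` makes about closing the file.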

### Exception Handling

In [None]:
# Basic try/except
def safe_divide(a: float, b: float) -> float | None:
    try:
        return a / b
    except ZeroDivisionError:
        print("Warning: Division by zero")
        return None

print(safe_divide(10, 2))
print(safe_divide(10, 0))

# Multiple exception types with proper chaining
def parse_int(s: str) -> int:
    try:
        return int(s)
    except ValueError as e:
        raise ValueError(f"Cannot parse '{s}' as integer") from e  # Chain exceptions!
    except TypeError as e:
        raise TypeError(f"Expected string, got {type(s)}") from e

# finally - always runs (cleanup)
def read_with_cleanup(filename: str) -> str:
    f = None
    try:
        f = open(filename, "r")
        return f.read()
    except FileNotFoundError:
        return ""
    finally:
        if f:
            f.close()
            print("File closed")

# Demo exception chaining
try:
    parse_int("abc")
except ValueError as e:
    print(f"Caught: {e}")
    print(f"Original cause: {e.__cause__}")

<details>
<summary><b>Q: When should you catch exceptions vs let them propagate?</b></summary>

**A:** 
- **Catch** when you can handle it meaningfully (retry, default value, cleanup)
- **Propagate** when the caller should decide how to handle it

**Bad:** Catching everything and hiding errors
```python
try:
    result = do_something()
except:  # Never do this!
    pass
```

**Good:** Catch specific exceptions you can handle
```python
try:
    data = load_file(path)
except FileNotFoundError:
    data = default_data
```
</details>

---

# Part 3: NumPy Essentials

NumPy's array model is the template for the tensor APIs of deep learning frameworks such as PyTorch. Understanding it is essential.

---

## 3.1 Why NumPy Matters for Deep Learning

In [None]:
import numpy as np
import time

# Vectorization is MUCH faster than loops
size = 1_000_000

# Loop version
a_list = list(range(size))
b_list = list(range(size))

start = time.perf_counter()  # perf_counter is the clock intended for timing short intervals
c_list = [a + b for a, b in zip(a_list, b_list)]
loop_time = time.perf_counter() - start

# NumPy version
a_np = np.arange(size)
b_np = np.arange(size)

start = time.perf_counter()
c_np = a_np + b_np
numpy_time = time.perf_counter() - start

print(f"Loop time: {loop_time:.4f}s")
print(f"NumPy time: {numpy_time:.4f}s")
print(f"NumPy is {loop_time/numpy_time:.1f}x faster")

<details>
<summary><b>Deep Dive: Why is NumPy so fast?</b></summary>

NumPy achieves 10-100x speedups through several mechanisms:

1. **Contiguous Memory Layout**: Arrays store data in continuous memory blocks, enabling efficient CPU cache utilization. Python lists store pointers to scattered objects.

2. **Compiled C/Fortran Backend**: Core operations are implemented in optimized C code, not interpreted Python.

3. **SIMD Vectorization**: Modern CPUs can process multiple numbers per instruction (Single Instruction, Multiple Data). NumPy operations leverage this automatically.

4. **No Type Checking Per Element**: Python lists check types dynamically for each element. NumPy arrays have uniform dtype - no per-element overhead.

5. **No Python Object Overhead**: Every Python number is a full object. On 64-bit CPython a `float` takes 24 bytes (reference count, type pointer, value), and the list stores an additional 8-byte pointer to it. NumPy stores only the raw 8-byte values.

```python
# Memory comparison
import sys
import numpy as np
py_list = [float(i) for i in range(1000)]  # 1000 distinct float objects
np_array = np.arange(1000, dtype=np.float64)
print(f"Python list: {sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)} bytes")
print(f"NumPy array: {np_array.nbytes} bytes")  # Just 8000 bytes (8 bytes per float64)
```

**Rule**: If you're looping over array elements in Python, you're probably doing it wrong.
</details>
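
To make the vectorization rule concrete, here is a small sketch that replaces an element-by-element Python loop with one vectorized expression. The function names `relu_loop` and `relu_vectorized` are illustrative only.

```python
import numpy as np


def relu_loop(x: np.ndarray) -> np.ndarray:
    """Element-by-element ReLU: every iteration goes through the interpreter."""
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        out[i] = x[i] if x[i] > 0 else 0.0
    return out


def relu_vectorized(x: np.ndarray) -> np.ndarray:
    """Same result with a single call into NumPy's compiled code."""
    return np.maximum(x, 0.0)


x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu_loop(x).tolist())        # [0.0, 0.0, 0.0, 1.5, 3.0]
print(relu_vectorized(x).tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]
```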

## 3.2 Array Creation & Indexing

In [None]:
import numpy as np

# Creating arrays
a = np.array([1, 2, 3, 4, 5])          # From list
b = np.zeros((3, 4))                     # 3x4 zeros
c = np.ones((2, 3))                      # 2x3 ones
d = np.arange(0, 10, 2)                  # [0, 2, 4, 6, 8]
e = np.linspace(0, 1, 5)                 # 5 points from 0 to 1
f = np.random.randn(3, 3)                # 3x3 standard normal

print(f"zeros shape: {b.shape}")
print(f"arange: {d}")
print(f"linspace: {e}")

# Indexing
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"\narr:\n{arr}")
print(f"arr[0, 1]: {arr[0, 1]}")         # Single element
print(f"arr[0, :]: {arr[0, :]}")         # First row
print(f"arr[:, 1]: {arr[:, 1]}")         # Second column
print(f"arr[0:2, 1:3]:\n{arr[0:2, 1:3]}")  # Subarray

# Boolean indexing
print(f"\narr > 5: {arr[arr > 5]}")
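
Integer-array ("fancy") indexing is a further indexing mode that shows up often in deep learning code, for example when picking the predicted probability of the correct class for each sample. A small sketch with made-up numbers:

```python
import numpy as np

# Made-up class probabilities for 3 samples and 4 classes.
probs = np.array([
    [0.1, 0.7, 0.1, 0.1],
    [0.3, 0.2, 0.4, 0.1],
    [0.25, 0.25, 0.25, 0.25],
])
labels = np.array([1, 2, 0])  # correct class index for each sample

# Pair row i with column labels[i]: picks probs[0, 1], probs[1, 2], probs[2, 0].
picked = probs[np.arange(len(labels)), labels]
print(picked.tolist())  # [0.7, 0.4, 0.25]

# Unlike slicing, integer-array indexing returns a copy, not a view.
picked[0] = -1.0
print(probs[0, 1])  # still 0.7
```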

## 3.3 Broadcasting

Broadcasting allows operations between arrays of different shapes.

### Rules:
1. Compare shapes from right to left
2. Dimensions match if they're equal OR one of them is 1
3. Missing dimensions are treated as 1

**Before running each cell below, predict the output shape!**

In [None]:
import numpy as np

# Scalar broadcasts to any shape
a = np.array([[1, 2, 3], [4, 5, 6]])  # Shape: (2, 3)
print(f"a + 10:\n{a + 10}")  # 10 broadcasts to (2, 3)

# Row vector broadcasts across rows
row = np.array([100, 200, 300])  # Shape: (3,)
print(f"\na + row:\n{a + row}")  # (3,) -> (2, 3)

# Column vector broadcasts across columns
col = np.array([[10], [20]])  # Shape: (2, 1)
print(f"\na + col:\n{a + col}")  # (2, 1) -> (2, 3)

# Outer product via broadcasting
x = np.array([1, 2, 3])[:, np.newaxis]  # Shape: (3, 1)
y = np.array([10, 20])                   # Shape: (2,)
print(f"\nOuter product (x * y):\n{x * y}")  # (3, 1) * (2,) -> (3, 2)

<details>
<summary><b>Q: Why does `np.array([1,2]) + np.array([[1],[2],[3]])` work?</b></summary>

**A:** Let's trace the broadcasting:
- Left: shape (2,)
- Right: shape (3, 1)

Align from right:
```
     (2,)  ->  (1, 2)  [add dimension]
  (3, 1)   ->  (3, 1)
  Result:      (3, 2)  [both expand]
```

Each expands where it has size 1:
```python
[[1, 2],      [[1, 1],     [[2, 3],
 [1, 2],  +    [2, 2],  =   [3, 4],
 [1, 2]]       [3, 3]]      [4, 5]]
```
</details>

In [None]:
# Broadcasting Debugger - useful helper function
def broadcast_shapes(*shapes):
    """Visualize how shapes align and what the result will be."""
    max_dims = max(len(s) for s in shapes)
    
    # Pad shapes with 1s on the left
    padded = [((1,) * (max_dims - len(s))) + s for s in shapes]
    
    print("Shape alignment (right-aligned):")
    for i, (orig, pad) in enumerate(zip(shapes, padded)):
        print(f"  Array {i+1}: {str(orig):>15} -> {pad}")
    
    # Compute result shape
    result = []
    for dims in zip(*padded):
        if len(set(d for d in dims if d != 1)) > 1:
            print(f"\n❌ INCOMPATIBLE: dimension has {dims} (multiple non-1 values)")
            return None
        result.append(max(dims))
    
    print(f"\n✓ Result shape: {tuple(result)}")
    return tuple(result)

# Test it
print("Example 1: (2,3) + (3,)")
broadcast_shapes((2, 3), (3,))

print("\nExample 2: (3,1) + (1,4)")
broadcast_shapes((3, 1), (1, 4))

print("\nExample 3: Incompatible shapes")
broadcast_shapes((3, 4), (5,))
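
NumPy also provides a built-in check, `np.broadcast_shapes` (available in NumPy 1.20 and later), which returns the result shape or raises `ValueError` for incompatible shapes. A short sketch using the same three examples:

```python
import numpy as np

shape_a = np.broadcast_shapes((2, 3), (3,))
shape_b = np.broadcast_shapes((3, 1), (1, 4))
print(shape_a)  # (2, 3)
print(shape_b)  # (3, 4)

try:
    np.broadcast_shapes((3, 4), (5,))
    compatible = True
except ValueError as err:
    compatible = False
    print(f"Incompatible: {err}")
```

The hand-written helper is still useful for learning because it prints the alignment step by step; the built-in is handy for quick checks and assertions.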

### Exercise: Broadcasting

Predict whether the code below adds the bias to each sample, then run it to check:

In [None]:
import numpy as np

# Data: 100 samples, 784 features (like MNIST flattened)
X = np.random.randn(100, 784)
bias = np.random.randn(784)

# This should add bias to each row
result = X + bias  # Does this work?
print(f"X shape: {X.shape}")
print(f"bias shape: {bias.shape}")
print(f"result shape: {result.shape}")
assert result.shape == (100, 784), "Shape mismatch!"
print("Broadcasting worked!")
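
Broadcasting can also succeed when you did not intend it to. A common case is subtracting a flat target vector from predictions that carry a trailing size-1 axis. The sketch below uses made-up random data and illustrative variable names to show the silent shape blow-up and one way to avoid it.

```python
import numpy as np

rng = np.random.default_rng(0)
predictions = rng.normal(size=(100, 1))  # e.g. model output with a trailing axis
targets = rng.normal(size=(100,))        # labels stored as a flat vector

# (100, 1) - (100,) broadcasts to (100, 100): every prediction is compared
# with every target, which is almost never what was intended.
wrong = predictions - targets
print(wrong.shape)  # (100, 100)

# Make the shapes agree explicitly before subtracting.
right = predictions.squeeze(axis=1) - targets
print(right.shape)  # (100,)

mse_wrong = (wrong ** 2).mean()
mse_right = (right ** 2).mean()
print(f"silently wrong MSE: {mse_wrong:.3f}, intended MSE: {mse_right:.3f}")
```

No error is raised in the first case, so a shape assertion on the difference is a cheap safeguard.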

## 3.4 Common Operations

In [None]:
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
print(f"Original shape: {a.shape}")

# Reshape
b = a.reshape(3, 2)
print(f"Reshaped to (3,2):\n{b}")

# Transpose
print(f"Transposed:\n{a.T}")

# Flatten
print(f"Flattened: {a.flatten()}")

# Concatenate
c = np.array([[7, 8, 9]])
print(f"\nVertical concat:\n{np.concatenate([a, c], axis=0)}")

d = np.array([[10], [20]])
print(f"\nHorizontal concat:\n{np.concatenate([a, d], axis=1)}")

In [None]:
# Reductions along axes
a = np.array([[1, 2, 3], [4, 5, 6]])
print(f"Array:\n{a}")

print(f"\nSum all: {a.sum()}")
print(f"Sum rows (axis=1): {a.sum(axis=1)}")     # Sum each row
print(f"Sum cols (axis=0): {a.sum(axis=0)}")     # Sum each column

print(f"\nMean all: {a.mean():.2f}")
print(f"Mean rows: {a.mean(axis=1)}")

# Matrix multiplication
W = np.random.randn(3, 4)  # 3x4
x = np.random.randn(4, 2)  # 4x2
y = W @ x                   # 3x2
print(f"\nW @ x: {W.shape} @ {x.shape} = {y.shape}")

<details>
<summary><b>Q: What's the difference between `axis=0` and `axis=1` in reductions?</b></summary>

**A:** The axis parameter specifies which dimension to "collapse":
- `axis=0`: Collapse the row axis → one result per column (shape `(3,)` for a `(2, 3)` array)
- `axis=1`: Collapse the column axis → one result per row (shape `(2,)` for a `(2, 3)` array)

Think of it as: "sum **along** this axis" or "reduce **this** dimension"

```python
a = [[1, 2, 3],
     [4, 5, 6]]  # Shape (2, 3)

a.sum(axis=0)  # [5, 7, 9]   - summed down columns, shape (3,)
a.sum(axis=1)  # [6, 15]     - summed across rows, shape (2,)
```
</details>
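
Reductions drop the reduced axis by default, which can get in the way when the result has to broadcast back against the original array. The `keepdims=True` argument keeps that axis with size 1. Here is a sketch using a row-wise softmax on made-up logits:

```python
import numpy as np

logits = np.array([[1.0, 2.0, 3.0],
                   [1.0, 1.0, 1.0]])  # shape (2, 3): 2 samples, 3 classes

# Without keepdims the row sums have shape (2,), which does not line up
# with the rows of a (2, 3) array for broadcasting.
row_sum_flat = logits.sum(axis=1)
print(row_sum_flat.shape)  # (2,)

# keepdims=True keeps the reduced axis as size 1 -> shape (2, 1),
# so each row is divided by its own sum.
exp = np.exp(logits - logits.max(axis=1, keepdims=True))  # subtract max for stability
softmax = exp / exp.sum(axis=1, keepdims=True)
print(softmax.shape)        # (2, 3)
print(softmax.sum(axis=1))  # each row sums to 1
```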

### Exercise: Shape Prediction

**Predict the output shapes before running!** Write your predictions, then verify.

## 3.5 Views vs Copies (Critical!)

Understanding when NumPy creates a view vs a copy prevents subtle bugs.

In [None]:
import numpy as np

# VIEWS: Slicing creates a view (shares memory!)
original = np.array([1, 2, 3, 4, 5])
view = original[1:4]  # This is a VIEW

print(f"Original: {original}")
print(f"View: {view}")

# Modifying the view changes the original!
view[0] = 999
print(f"After modifying view[0]:")
print(f"  Original: {original}")  # Also changed!
print(f"  View: {view}")

# COPIES: Use .copy() to get independent data
original = np.array([1, 2, 3, 4, 5])
copy = original[1:4].copy()  # Explicit copy

copy[0] = 999
print(f"\nWith .copy():")
print(f"  Original: {original}")  # Unchanged!
print(f"  Copy: {copy}")

# How to check: views share memory
a = np.array([1, 2, 3])
b = a[:]      # View
c = a.copy()  # Copy

print(f"\nShares memory?")
print(f"  a and b: {np.shares_memory(a, b)}")  # True
print(f"  a and c: {np.shares_memory(a, c)}")  # False

<details>
<summary><b>Q: Is `arr.reshape(3, 4)` a view or a copy?</b></summary>

**A:** It depends! Reshape returns a **view** when possible (for example, when the array is contiguous in memory), but returns a **copy** when the memory layout doesn't allow a view.

```python
a = np.arange(12).reshape(3, 4)  # A view: the arange data is contiguous
b = a.T.reshape(6, 2)            # Must be a copy (transpose breaks contiguity)
```

**Safe approach:** If you need independent data, call `.copy()` explicitly. If you need to know whether a reshape produced a view, check with `np.shares_memory(original, reshaped)`.
</details>

---

# Part 4: OOP for Deep Learning

PyTorch heavily uses OOP. Understanding these patterns is essential.

---

## 4.1 Classes Review

In [None]:
class NeuralNetwork:
    """A simple neural network class demonstrating OOP patterns."""
    
    # Class attribute (shared by all instances)
    default_activation = "relu"
    
    def __init__(self, input_size: int, hidden_size: int, output_size: int):
        """Initialize the network."""
        # Instance attributes (unique to each instance)
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        
        # Simulated weights
        self.weights = {
            "W1": np.random.randn(input_size, hidden_size) * 0.01,
            "W2": np.random.randn(hidden_size, output_size) * 0.01,
        }
    
    def forward(self, x: np.ndarray) -> np.ndarray:
        """Forward pass."""
        h = x @ self.weights["W1"]
        h = np.maximum(0, h)  # ReLU
        return h @ self.weights["W2"]

# Usage
net = NeuralNetwork(784, 128, 10)
x = np.random.randn(32, 784)  # Batch of 32
output = net.forward(x)
print(f"Input: {x.shape} -> Output: {output.shape}")

## 4.2 Naming Conventions

In [None]:
class DataProcessor:
    """Demonstrates Python naming conventions."""
    
    def __init__(self, data: list):
        self.data = data              # Public: anyone can access
        self._cache = {}              # Protected: internal use, but accessible
        self.__secret = "hidden"      # Private: name-mangled to _DataProcessor__secret
    
    def process(self):
        """Public method - part of the API."""
        return self._preprocess()
    
    def _preprocess(self):
        """Protected method - internal helper, but subclasses can override."""
        return [x * 2 for x in self.data]
    
    def __validate(self):
        """Private method - truly internal, not for subclasses."""
        return all(isinstance(x, (int, float)) for x in self.data)

dp = DataProcessor([1, 2, 3])
print(f"Public data: {dp.data}")
print(f"Protected _cache: {dp._cache}")  # Works but discouraged
# print(dp.__secret)  # AttributeError!
print(f"Mangled name: {dp._DataProcessor__secret}")  # How to access if needed

<details>
<summary><b>Q: When should you use `_protected` vs `__private`?</b></summary>

**A:**
- **`_protected`**: Use for internal methods that subclasses might need to override. It's a convention saying "internal, but accessible."

- **`__private`**: Use when you truly want to prevent accidental override in subclasses. Python mangles the name to `_ClassName__method`, making it harder (but not impossible) to access.

**In practice:** Most Python code uses `_protected`. Use `__private` sparingly.
</details>

## 4.3 Dunder Methods

Dunder (double underscore) methods let you customize how objects behave.

In [None]:
class Tensor:
    """A simple tensor class demonstrating dunder methods."""
    
    def __init__(self, data: list):
        self.data = np.array(data)
    
    def __repr__(self) -> str:
        """For developers - unambiguous representation."""
        return f"Tensor(shape={self.data.shape}, dtype={self.data.dtype})"
    
    def __str__(self) -> str:
        """For users - readable representation."""
        return f"Tensor with shape {self.data.shape}"
    
    def __len__(self) -> int:
        """Enable len(tensor)."""
        return len(self.data)
    
    def __getitem__(self, idx):
        """Enable tensor[idx]."""
        return self.data[idx]
    
    def __call__(self, x):
        """Enable tensor(x) - used heavily in PyTorch!"""
        return self.data @ x

t = Tensor([[1, 2], [3, 4]])
print(f"repr: {repr(t)}")
print(f"str: {str(t)}")
print(f"len: {len(t)}")
print(f"t[0]: {t[0]}")
print(f"t([1, 1]): {t(np.array([1, 1]))}")  # Callable!

<details>
<summary><b>Q: Why does PyTorch use `__call__` for the forward pass?</b></summary>

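**A:** In PyTorch, `model(x)` calls `model.__call__(x)`, which internally calls `model.forward(x)` but also runs any hooks registered on the module:
- Forward pre-hooks (callbacks that run before `forward`)
- Forward hooks (callbacks that run after `forward`)
- Backward-hook setup for modules that registered one

Calling `model.forward(x)` directly skips those hooks. This is why you define `forward()` but call `model(x)`, not `model.forward(x)`.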
</details>
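
The effect of routing calls through `__call__` can be sketched with a toy base class. This is an illustration of the idea, not PyTorch's implementation; `HookedModule`, `Double`, and the simplified hook signature are assumptions made for the example.

```python
class HookedModule:
    """Toy base class (not PyTorch's implementation) with forward hooks."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        """Store a callback hook(module, x, output) to run after forward."""
        self._forward_hooks.append(hook)

    def __call__(self, x):
        output = self.forward(x)
        for hook in self._forward_hooks:
            hook(self, x, output)
        return output

    def forward(self, x):
        raise NotImplementedError


class Double(HookedModule):
    def forward(self, x):
        return 2 * x


seen = []
layer = Double()
layer.register_forward_hook(lambda module, x, out: seen.append(out))

layer(3)          # goes through __call__, so the hook fires
layer.forward(4)  # bypasses __call__, so the hook is skipped
print(seen)       # [6]
```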

## 4.5 Inheritance (Essential for PyTorch)

PyTorch's `nn.Module` uses inheritance heavily. You'll subclass it for every model.

<details>
<summary><b>Q: Why do we call `super().__init__()` in subclasses?</b></summary>

**A:** `super().__init__()` calls the parent class's `__init__` method, ensuring proper initialization of inherited attributes. Without it:

```python
class Linear(Module):
    def __init__(self, in_features, out_features):
        # WRONG: forgot super().__init__()
        self.weight = ...
        
layer = Linear(10, 5)
print(layer.training)  # AttributeError! .training was never set
```

In PyTorch, forgetting `super().__init__()` is a common bug that breaks module registration, parameter tracking, and device movement.
</details>

In [None]:
# Simplified nn.Module-like base class
class Module:
    """Base class demonstrating PyTorch's Module pattern."""
    
    def __init__(self):
        self._modules = {}
        self.training = True
    
    def __call__(self, x):
        """When you call model(x), this runs."""
        return self.forward(x)
    
    def forward(self, x):
        """Subclasses MUST override this."""
        raise NotImplementedError("Subclasses must implement forward()")
    
    def train(self, mode: bool = True):
        self.training = mode
        return self
    
    def eval(self):
        return self.train(False)


# Subclass: A simple linear layer
class Linear(Module):
    """Linear layer: y = x @ W + b"""
    
    def __init__(self, in_features: int, out_features: int):
        super().__init__()  # Call parent's __init__
        self.weight = np.random.randn(in_features, out_features) * 0.01
        self.bias = np.zeros(out_features)
    
    def forward(self, x: np.ndarray) -> np.ndarray:
        return x @ self.weight + self.bias  # Broadcasting!


# Subclass: A two-layer network
class TwoLayerNet(Module):
    """Network that composes multiple layers."""
    
    def __init__(self, input_size: int, hidden_size: int, output_size: int):
        super().__init__()
        self.fc1 = Linear(input_size, hidden_size)
        self.fc2 = Linear(hidden_size, output_size)
    
    def forward(self, x: np.ndarray) -> np.ndarray:
        x = self.fc1(x)           # Note: uses __call__, not .forward()
        x = np.maximum(0, x)      # ReLU activation
        x = self.fc2(x)
        return x


# Usage - this is exactly how you'll use PyTorch!
model = TwoLayerNet(784, 128, 10)
x = np.random.randn(32, 784)
output = model(x)  # Calls __call__ -> forward
print(f"Input: {x.shape} -> Output: {output.shape}")
print(f"Training mode: {model.training}")
model.eval()
print(f"After eval(): {model.training}")
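
Why forgetting `super().__init__()` breaks things becomes clearer once the base class tracks its submodules. The following is a toy sketch of attribute-based registration, loosely modeled on the idea behind `nn.Module` but not its actual code; `ToyModule`, `Leaf`, `Good`, and `Bad` are hypothetical names.

```python
class ToyModule:
    """Toy base class that records submodules assigned as attributes."""

    def __init__(self):
        # Create the registry directly, bypassing our own __setattr__.
        object.__setattr__(self, "_modules", {})

    def __setattr__(self, name, value):
        if isinstance(value, ToyModule):
            if "_modules" not in self.__dict__:
                raise AttributeError(
                    "cannot assign module before ToyModule.__init__() call"
                )
            self._modules[name] = value  # register the child
        object.__setattr__(self, name, value)

    def children(self):
        return list(self._modules.values())


class Leaf(ToyModule):
    pass


class Good(ToyModule):
    def __init__(self):
        super().__init__()  # registry exists before children are assigned
        self.a = Leaf()
        self.b = Leaf()


class Bad(ToyModule):
    def __init__(self):
        self.a = Leaf()  # no super().__init__(): the registry is missing


print(len(Good().children()))  # 2

try:
    Bad()
    failed = False
except AttributeError as err:
    failed = True
    print(f"Bad() failed: {err}")
```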

## 4.4 Decorators

In [None]:
class Model:
    def __init__(self, name: str):
        self._name = name
        self._is_training = True
    
    @property
    def name(self) -> str:
        """Property decorator - access like an attribute."""
        return self._name
    
    @property
    def is_training(self) -> bool:
        return self._is_training
    
    @is_training.setter
    def is_training(self, value: bool):
        """Setter for property."""
        self._is_training = value
        print(f"Training mode: {value}")
    
    @staticmethod
    def count_parameters(weights: dict) -> int:
        """Static method - doesn't need self."""
        return sum(w.size for w in weights.values())
    
    @classmethod
    def from_config(cls, config: dict):
        """Class method - alternative constructor."""
        return cls(name=config.get("name", "unnamed"))

# Usage
m = Model("MyModel")
print(f"Name: {m.name}")  # Property access
m.is_training = False     # Property setter

m2 = Model.from_config({"name": "ConfigModel"})  # Classmethod
print(f"From config: {m2.name}")
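
The decorators above are built in, but you can also write your own: a decorator is a function that takes a function and returns a replacement. Here is a minimal sketch of a timing decorator; `timed`, `train_step`, and the `last_elapsed` attribute are hypothetical names chosen for illustration.

```python
import functools
import time


def timed(func):
    """Decorator that records how long each call to func takes."""
    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_elapsed = time.perf_counter() - start
        return result

    wrapper.last_elapsed = None
    return wrapper


@timed
def train_step(batch_size: int) -> int:
    """Pretend to do some work (placeholder for a real training step)."""
    return sum(i * i for i in range(batch_size))


loss = train_step(1000)
print(train_step.__name__)  # train_step
print(f"took {train_step.last_elapsed:.6f}s")
```

Writing `@timed` above a function is shorthand for `train_step = timed(train_step)`.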

---

# Part 5: Practical Patterns

---

## 5.1 Generators & Iterators

Generators are crucial for memory-efficient data loading.

In [None]:
# Generator function - uses yield
def count_up_to(n: int):
    """Generate numbers from 0 to n-1."""
    i = 0
    while i < n:
        yield i  # Pauses here, returns value
        i += 1

# Usage
for num in count_up_to(5):
    print(num, end=" ")
print()

# Generator expression (like list comprehension but lazy)
squares_gen = (x**2 for x in range(1000000))  # Lazy: no squares computed yet!
print(f"Generator: {squares_gen}")
print(f"First 5: {[next(squares_gen) for _ in range(5)]}")

In [None]:
# Why generators matter for DL: memory efficiency
import sys

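# List stores all values in memory
# (sys.getsizeof counts only the list's pointer array, not the int objects it refers to)
big_list = [i**2 for i in range(1000000)]
print(f"List size (pointers only): {sys.getsizeof(big_list) / 1e6:.1f} MB")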

# Generator computes on-demand
def big_gen():
    for i in range(1000000):
        yield i**2

gen = big_gen()
print(f"Generator size: {sys.getsizeof(gen)} bytes")

# DataLoader-style batching
def batch_generator(data: list, batch_size: int):
    """Yield batches from data."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

data = list(range(100))
for batch in batch_generator(data, batch_size=32):
    print(f"Batch: {batch[:3]}... (size {len(batch)})")

<details>
<summary><b>Q: When should you use a generator vs a list?</b></summary>

**A:**
- **Generator**: When data is large, you only need one pass, or values are computed on-demand
- **List**: When you need random access, multiple passes, or the data is small

**DataLoaders iterate lazily, like generators,** because:
1. Training data is often huge (can't fit in RAM)
2. You only need one batch at a time
3. Data can be augmented on-the-fly
</details>
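
The "one pass" limitation is easy to trip over when training for several epochs. The sketch below shows a generator running dry after one pass, and a small re-iterable wrapper whose `__iter__` creates a fresh generator each time. `batches` and `BatchIterable` are illustrative names, not PyTorch APIs.

```python
def batches(data: list[int], batch_size: int):
    """Generator function: each call returns a fresh one-shot generator."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]


gen = batches(list(range(5)), batch_size=2)
first_pass = list(gen)   # [[0, 1], [2, 3], [4]]
second_pass = list(gen)  # [] because the generator is already exhausted


class BatchIterable:
    """Re-iterable wrapper: __iter__ builds a new generator for every pass."""

    def __init__(self, data: list[int], batch_size: int):
        self.data = data
        self.batch_size = batch_size

    def __iter__(self):
        return batches(self.data, self.batch_size)


loader = BatchIterable(list(range(5)), batch_size=2)
epoch_1 = list(loader)
epoch_2 = list(loader)  # works again: a fresh generator per epoch
print(first_pass, second_pass, epoch_1 == epoch_2)
```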

## 5.2 File I/O with Pathlib

In [None]:
from pathlib import Path

# Create paths (cross-platform!)
data_dir = Path("data")
model_path = data_dir / "models" / "best.pt"

print(f"Path: {model_path}")
print(f"Parent: {model_path.parent}")
print(f"Name: {model_path.name}")
print(f"Stem: {model_path.stem}")
print(f"Suffix: {model_path.suffix}")

# Check existence
print(f"\nExists: {model_path.exists()}")
print(f"Is file: {model_path.is_file()}")

# Find files
current = Path(".")
print(f"\nPython files in current dir: {list(current.glob('*.py'))[:3]}")
print(f"All .ipynb (recursive): {list(current.glob('**/*.ipynb'))[:3]}")
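
`Path` objects can also create directories and read or write small files directly. A short sketch, run inside a temporary directory so nothing is left behind; the `runs/exp1` layout and the file name are made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "runs" / "exp1"       # hypothetical layout
    run_dir.mkdir(parents=True, exist_ok=True)  # create missing parents, no error if present

    config_path = run_dir / "config.txt"
    config_path.write_text("lr=0.001\n")        # opens, writes, and closes in one call

    loaded = config_path.read_text()
    found = sorted(p.name for p in run_dir.iterdir())

print(loaded.strip())  # lr=0.001
print(found)           # ['config.txt']
```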

## 5.3 Debugging Strategies

In [None]:
# 1. Print debugging with f-strings
def debug_forward(x, W):
    print(f"DEBUG: x.shape={x.shape}, W.shape={W.shape}")
    result = x @ W
    print(f"DEBUG: result.shape={result.shape}")
    return result

# 2. Assertions - catch bugs early
def normalize(x: np.ndarray) -> np.ndarray:
    assert x.ndim == 2, f"Expected 2D array, got {x.ndim}D"
    assert x.shape[0] > 0, "Empty array"
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

# 3. Shape annotations in comments
def attention(Q, K, V):
    # Q: (batch, heads, seq_len, d_k)
    # K: (batch, heads, seq_len, d_k)
    # V: (batch, heads, seq_len, d_v)
    
    # swapaxes works for both NumPy arrays and torch tensors
    scores = Q @ K.swapaxes(-2, -1)  # (batch, heads, seq_len, seq_len)
    weights = scores  # Simplified - normally softmax
    output = weights @ V  # (batch, heads, seq_len, d_v)
    return output

# Test
x = np.random.randn(32, 784)
W = np.random.randn(784, 128)
y = debug_forward(x, W)
z = normalize(x)
print(f"\nNormalized shape: {z.shape}")
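
Assertions are removed when Python runs with the `-O` flag, so for checks that must always run it can help to raise an explicit exception. Below is a sketch of a small shape-checking helper; `check_shape` is a hypothetical function, and `None` in the expected shape is an assumed convention meaning "any size".

```python
import numpy as np


def check_shape(name: str, array: np.ndarray, expected: tuple) -> None:
    """Raise ValueError unless array.shape matches expected (None = any size)."""
    ok = array.ndim == len(expected) and all(
        want is None or want == got for want, got in zip(expected, array.shape)
    )
    if not ok:
        raise ValueError(f"{name}: expected shape {expected}, got {array.shape}")


x = np.zeros((32, 784))
W = np.zeros((784, 128))

check_shape("x", x, (None, 784))          # batch size may vary
check_shape("x @ W", x @ W, (None, 128))  # passes silently

message = ""
try:
    check_shape("W", W, (128, 784))       # deliberately wrong
except ValueError as err:
    message = str(err)
    print(message)
```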

---

# Part 6: Summary & Exercises

---

## Key Takeaways

### Python Foundations
- Use **type hints** (`def f(x: int) -> str`) for self-documenting code
- Use **comprehensions** for building collections, loops for complex logic
- Use **context managers** (`with`) for resource management
- **Catch specific exceptions**, let others propagate

### NumPy
- **Vectorize** operations - avoid Python loops on arrays
- **Broadcasting** aligns shapes from the right, expanding size-1 dimensions
- **axis=0** collapses rows, **axis=1** collapses columns

### OOP for DL
- **`_protected`** for internal methods, **`__private`** rarely
- **`__call__`** makes objects callable (used by `nn.Module`)
- **`@property`** for computed attributes, **`@classmethod`** for alternative constructors

### Practical Patterns
- **Generators** for memory-efficient iteration (DataLoaders!)
- **Pathlib** for cross-platform file paths
- **Shape comments** for debugging tensor operations

## Self-Assessment Checklist

Before proceeding to Lab 6, you should be able to:

- [ ] Write a function with type hints and a Google-style docstring
- [ ] Convert a loop to a list comprehension
- [ ] Explain what `axis=0` vs `axis=1` means in NumPy reductions
- [ ] Predict the output shape of broadcasting `(3,1) + (4,)`
- [ ] Explain why `__call__` is used in PyTorch modules
- [ ] Write a generator function with `yield`
- [ ] Use `pathlib.Path` to construct file paths

## Comprehensive Exercise: Mini Data Pipeline

Implement a simple data pipeline using the concepts from this lab:

In [None]:
# Exercise: Implement a data pipeline class

class DataPipeline:
    """
    A simple data pipeline for loading and batching data.
    
    TODO: Implement the following methods:
    1. __init__: Store data and batch_size
    2. __len__: Return number of batches
    3. __iter__: Yield batches (use a generator!)
    4. normalize: Normalize data using broadcasting
    """
    
    def __init__(self, data: np.ndarray, batch_size: int = 32):
        # TODO: Store data and batch_size
        pass
    
    def __len__(self) -> int:
        # TODO: Return number of batches (hint: use ceiling division)
        pass
    
    def __iter__(self):
        # TODO: Yield batches of data
        pass
    
    def normalize(self) -> np.ndarray:
        # TODO: Return normalized data (zero mean, unit std per feature)
        pass

# Test your implementation:
# data = np.random.randn(100, 784)
# pipeline = DataPipeline(data, batch_size=32)
# print(f"Number of batches: {len(pipeline)}")
# for i, batch in enumerate(pipeline):
#     print(f"Batch {i}: shape {batch.shape}")
# normalized = pipeline.normalize()
# print(f"Mean after normalization: {normalized.mean(axis=0)[:5]}")  # Should be ~0

<details>
<summary><b>Solution</b></summary>

```python
class DataPipeline:
    def __init__(self, data: np.ndarray, batch_size: int = 32):
        self.data = data
        self.batch_size = batch_size
    
    def __len__(self) -> int:
        return (len(self.data) + self.batch_size - 1) // self.batch_size
    
    def __iter__(self):
        for i in range(0, len(self.data), self.batch_size):
            yield self.data[i:i + self.batch_size]
    
    def normalize(self) -> np.ndarray:
        mean = self.data.mean(axis=0)  # Shape: (784,)
        std = self.data.std(axis=0)    # Shape: (784,)
        return (self.data - mean) / (std + 1e-8)  # Broadcasting!
```
</details>

## References

1. [Python Type Hints Cheat Sheet](https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html)
2. [NumPy Broadcasting Rules](https://numpy.org/doc/stable/user/basics.broadcasting.html)
3. [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html)
4. [Real Python - Generators](https://realpython.com/introduction-to-python-generators/)
5. [PyTorch nn.Module Source](https://github.com/pytorch/pytorch/blob/main/torch/nn/modules/module.py)

---

**Next:** Lab 6 - RNN Foundations