# Phase 6: Pydantic and Type Safety
## Making invalid state unrepresentable

Odibi's entire configuration system is built on **Pydantic**. 
When a YAML config is loaded, Pydantic validates every field automatically.
Wrong type that cannot be converted? Error. Missing required field? Error. Invalid enum value? Error.
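
A minimal sketch of those three failures happening in one call, using a small hypothetical `DemoNode` model and `Mode` enum (neither is part of Odibi; Pydantic v2 is assumed). Pydantic collects every problem into a single `ValidationError`, and each entry's `loc` names the field that failed.

```python
from enum import Enum

from pydantic import BaseModel, ValidationError


class Mode(str, Enum):  # hypothetical enum for illustration
    OVERWRITE = "overwrite"
    APPEND = "append"


class DemoNode(BaseModel):  # hypothetical model, not an Odibi class
    name: str
    rows: int
    write_mode: Mode


try:
    # name is missing, "many" is not an int, "merge" is not a Mode value
    DemoNode(rows="many", write_mode="merge")
except ValidationError as e:
    failed_fields = sorted(err["loc"][0] for err in e.errors())
    for err in e.errors():
        print(err["loc"], err["type"], err["msg"])

print(failed_fields)  # ['name', 'rows', 'write_mode']
```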

By the end of this notebook, you will understand `odibi/config.py` completely 
and be able to build your own config system for mini-odibi.

---
## Section 1: Type Hints

Type hints document what types a function expects, for other developers and for static checkers such as mypy.
Python itself does NOT enforce them at runtime -- but Pydantic reads them and validates data against them.
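
A quick sketch of what "not enforced" means in practice. The function `double_rows` is a hypothetical example: Python runs it with a string even though the hint says `int`, yet the hints are still stored on the function, which is exactly the metadata that tools like mypy and Pydantic read.

```python
from typing import get_type_hints


def double_rows(rows: int) -> int:
    """Intended for integers only, but nothing stops other types."""
    return rows * 2


# No error: str * 2 simply repeats the string
print(double_rows("10"))  # 1010

# The hints are recorded on the function object as plain metadata
print(get_type_hints(double_rows))
```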

```python
from typing import Dict, List, Optional

# Basic type hints
def process_node(name: str, rows: int, active: bool = True) -> str:
    """Process a node and return a status message."""
    return f"{name}: {rows} rows, active={active}"

print(process_node("customers", 1542))

# Complex type hints: Optional[str] means "a str or None"
def summarize(nodes: List[str], counts: Dict[str, int]) -> Optional[str]:
    """Summarize pipeline. Returns None if no nodes."""
    if not nodes:
        return None
    total = sum(counts.get(n, 0) for n in nodes)
    return f"{len(nodes)} nodes, {total} total rows"

result = summarize(["customers", "orders"], {"customers": 1542, "orders": 8930})
print(result)
```
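
To see Pydantic enforce hints on a plain function, here is a sketch using `pydantic.validate_call` (a Pydantic v2 decorator). `process_node_checked` is a hypothetical validated copy of `process_node` above, declared here so the cell runs on its own. Arguments are checked (and converted where possible) on every call, before the function body runs.

```python
from pydantic import ValidationError, validate_call


@validate_call
def process_node_checked(name: str, rows: int, active: bool = True) -> str:
    """Same signature as process_node, but arguments are validated per call."""
    return f"{name}: {rows} rows, active={active}"


# "1542" can be converted to an int, so it is accepted
print(process_node_checked("customers", "1542"))

# "lots" cannot be converted, so the call fails before the body runs
try:
    process_node_checked("customers", "lots")
except ValidationError as e:
    print(e.errors()[0]["type"])
```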

---
## Section 2: Pydantic BaseModel

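Pydantic models are declared like dataclasses (type-hinted class attributes with optional defaults), but every instance is validated, and converted where possible, when it is created.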
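
A side-by-side sketch of that difference, using two hypothetical classes with the same fields: the standard-library dataclass stores whatever it receives, while the Pydantic model converts `"1542"` to an int and rejects `"lots"`.

```python
from dataclasses import dataclass

from pydantic import BaseModel, ValidationError


@dataclass
class NodeDC:  # hypothetical dataclass version
    name: str
    rows: int


class NodeModel(BaseModel):  # hypothetical Pydantic version
    name: str
    rows: int


# The dataclass does not check the hint at all
dc = NodeDC(name="customers", rows="lots")
print(repr(dc.rows))  # 'lots'

# The model converts what it can and rejects what it cannot
print(repr(NodeModel(name="customers", rows="1542").rows))  # 1542
try:
    NodeModel(name="customers", rows="lots")
except ValidationError as e:
    print(e.errors()[0]["type"])
```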

```python
from typing import List

from pydantic import BaseModel, Field

class NodeConfig(BaseModel):
    """Configuration for a single pipeline node."""
    name: str                  # required (no default)
    source: str                # required (no default)
    format: str = "csv"
    write_mode: str = "overwrite"
    keys: List[str] = Field(default_factory=list)  # a fresh empty list per instance
    enabled: bool = True

# Create with valid data
node = NodeConfig(name="customers", source="raw_customers.csv", write_mode="upsert")
print(node)
print(node.name)
print(node.keys)  # [] (default)

# Convert to dict
print(node.model_dump())

# Pydantic auto-converts types when possible
node2 = NodeConfig(name="orders", source="orders.csv", enabled="true")  # String "true" -> bool True
print(node2.enabled)  # True
```
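
By default, a `BaseModel` silently ignores keys it does not know, so a YAML typo such as `write_mod:` just falls back to the default value. This sketch shows two Pydantic v2 `model_config` options that tighten things up: `extra="forbid"` rejects unknown keys, and `strict=True` turns off the `"true"` -> `True` conversion shown above. `LooseNode` and `StrictNode` are hypothetical; whether Odibi's own models use these options is not covered here.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class LooseNode(BaseModel):  # default behaviour: unknown keys are ignored
    name: str
    write_mode: str = "overwrite"


class StrictNode(BaseModel):
    # forbid: unknown keys are errors; strict: no type conversion
    model_config = ConfigDict(extra="forbid", strict=True)

    name: str
    write_mode: str = "overwrite"
    enabled: bool = True


# The typo "write_mod" is dropped, so the default is used
loose = LooseNode(name="orders", write_mod="append")
print(loose.write_mode)  # overwrite

try:
    StrictNode(name="orders", write_mod="append", enabled="true")
except ValidationError as e:
    strict_errors = e.errors()

for err in strict_errors:
    print(err["loc"], err["type"])
```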

### Exercise 2.1
Create a `ConnectionConfig` model with: name (str), type (str, default 'local'), 
base_path (str), options (Optional[Dict[str, str]], default None).

```python
# Exercise 2.1
# YOUR CODE HERE
from pydantic import BaseModel
from typing import Optional, Dict
```


---
## Section 3: Validators

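Type hints only check types. Validators let you add custom rules on top: `@field_validator` checks or cleans a single field, and `@model_validator` enforces rules that involve several fields at once.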
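
Validators can also run *before* Pydantic's own type check, with `mode="before"`. That is handy for config files, where the same setting may be written in more than one shape. The sketch below uses a hypothetical `KeysConfig` (not an Odibi class) that accepts `keys` either as a list or as a comma-separated string.

```python
from typing import Any, List

from pydantic import BaseModel, field_validator


class KeysConfig(BaseModel):  # hypothetical model for illustration
    keys: List[str] = []

    @field_validator("keys", mode="before")
    @classmethod
    def split_comma_string(cls, v: Any) -> Any:
        # Runs on the raw input, before it is checked against List[str]
        if isinstance(v, str):
            return [part.strip() for part in v.split(",") if part.strip()]
        return v  # lists (and anything else) continue to normal validation


print(KeysConfig(keys="customer_id, order_id").keys)      # ['customer_id', 'order_id']
print(KeysConfig(keys=["customer_id", "order_id"]).keys)  # ['customer_id', 'order_id']
```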

```python
from enum import Enum
from typing import List

from pydantic import BaseModel, ValidationError, field_validator, model_validator

class WriteMode(str, Enum):
    OVERWRITE = "overwrite"
    APPEND = "append"
    UPSERT = "upsert"

class ValidatedNodeConfig(BaseModel):
    name: str
    source: str
    write_mode: WriteMode = WriteMode.OVERWRITE
    keys: List[str] = []

    # Runs on one field, after Pydantic has confirmed it is a str
    @field_validator('name')
    @classmethod
    def name_must_not_be_empty(cls, v: str) -> str:
        if not v.strip():
            raise ValueError("name cannot be empty")
        return v.strip().lower()  # the returned value replaces the input

    # Runs on the whole model after every field is validated (cross-field rule)
    @model_validator(mode='after')
    def keys_required_for_upsert(self):
        if self.write_mode == WriteMode.UPSERT and not self.keys:
            raise ValueError("keys are required when write_mode is upsert")
        return self

# Valid
n1 = ValidatedNodeConfig(name="  Customers  ", source="data.csv")
print(n1.name)  # customers (cleaned by validator)

# Invalid: upsert without keys ("upsert" is converted to WriteMode.UPSERT first)
try:
    ValidatedNodeConfig(name="orders", source="o.csv", write_mode="upsert")
except ValidationError as e:
    print(f"Validation error: {e}")
```
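
When a `model_validator` rejects a config, the error belongs to the model as a whole rather than to one field. This sketch inspects the structured error; `UpsertNode` is a trimmed, hypothetical copy of `ValidatedNodeConfig` so the cell runs on its own.

```python
from enum import Enum
from typing import List

from pydantic import BaseModel, ValidationError, model_validator


class WriteMode(str, Enum):
    OVERWRITE = "overwrite"
    APPEND = "append"
    UPSERT = "upsert"


class UpsertNode(BaseModel):  # trimmed copy of ValidatedNodeConfig
    name: str
    write_mode: WriteMode = WriteMode.OVERWRITE
    keys: List[str] = []

    @model_validator(mode="after")
    def keys_required_for_upsert(self):
        if self.write_mode == WriteMode.UPSERT and not self.keys:
            raise ValueError("keys are required when write_mode is upsert")
        return self


# Supplying keys satisfies the cross-field rule
ok = UpsertNode(name="orders", write_mode="upsert", keys=["order_id"])
print(ok.write_mode is WriteMode.UPSERT)  # True

try:
    UpsertNode(name="orders", write_mode="upsert")
except ValidationError as e:
    err = e.errors()[0]

print(err["loc"])   # (): no single field is to blame
print(err["type"])  # value_error
print(err["msg"])
```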

### Exercise 3.1
Create a `TransformConfig` model with: type (str), params (dict). 
Add a validator that ensures type is not empty and params is not empty.

```python
# Exercise 3.1
# YOUR CODE HERE
```


---
## Section 4: Nested Models and YAML Loading

Models can contain other models. This is how Odibi builds pipeline configs.

```python
from typing import List

from pydantic import BaseModel

class TransformStep(BaseModel):
    type: str
    params: dict = {}

class NodeDef(BaseModel):
    name: str
    source: str
    transforms: List[TransformStep] = []  # each inner dict becomes a TransformStep
    write_mode: str = "overwrite"

class PipelineConfig(BaseModel):
    name: str
    engine: str = "pandas"
    nodes: List[NodeDef]  # required; each inner dict becomes a NodeDef

# Build from a dict (this is what happens when you load YAML)
config_dict = {
    "name": "sales_pipeline",
    "engine": "pandas",
    "nodes": [
        {
            "name": "customers",
            "source": "data/customers.csv",
            "transforms": [{"type": "rename_columns", "params": {"old": "id", "new": "customer_id"}}],
            "write_mode": "upsert",
        }
    ],
}

pipeline = PipelineConfig(**config_dict)
print(pipeline.name)
print(pipeline.nodes[0].name)
print(pipeline.nodes[0].transforms[0].type)
```
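
Nesting also pays off when something is wrong: each error's `loc` is a path into the nested structure (field names and list indexes), which points straight at the broken part of a config. The sketch re-declares the models above and uses `PipelineConfig.model_validate(...)`, the Pydantic v2 classmethod equivalent of `PipelineConfig(**config_dict)`; `bad_config` is made-up example data.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class TransformStep(BaseModel):
    type: str
    params: dict = {}


class NodeDef(BaseModel):
    name: str
    source: str
    transforms: List[TransformStep] = []
    write_mode: str = "overwrite"


class PipelineConfig(BaseModel):
    name: str
    engine: str = "pandas"
    nodes: List[NodeDef]


bad_config = {
    "name": "sales_pipeline",
    "nodes": [
        {"name": "customers", "source": "data/customers.csv"},
        # second node: no "source", and its transform has no "type"
        {"name": "orders", "transforms": [{"params": {}}]},
    ],
}

try:
    PipelineConfig.model_validate(bad_config)
except ValidationError as e:
    locations = [err["loc"] for err in e.errors()]

for loc in locations:
    print(loc)
```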

```python
# Loading from YAML (requires PyYAML: pip install pyyaml)
# Uses PipelineConfig from the previous cell
import yaml

# Create a sample YAML config
yaml_content = """
name: test_pipeline
engine: pandas
nodes:
  - name: customers
    source: data/customers.csv
    write_mode: upsert
    transforms:
      - type: rename_columns
        params:
          old: id
          new: customer_id
  - name: orders
    source: data/orders.csv
    write_mode: append
"""

# Parse YAML -> dict -> Pydantic model
raw = yaml.safe_load(yaml_content)
pipeline = PipelineConfig(**raw)

print(f"Pipeline: {pipeline.name}")
print(f"Engine: {pipeline.engine}")
for node in pipeline.nodes:
    print(f"  Node: {node.name} ({node.write_mode}), transforms: {len(node.transforms)}")
```
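
Real configs live in files rather than strings. A minimal sketch of that last step, assuming a hypothetical helper named `load_pipeline` (not an Odibi function) and a throwaway file in a temporary directory. It also guards against an empty file, for which `yaml.safe_load` returns `None` instead of a dict.

```python
import tempfile
from pathlib import Path
from typing import List

import yaml
from pydantic import BaseModel


class TransformStep(BaseModel):
    type: str
    params: dict = {}


class NodeDef(BaseModel):
    name: str
    source: str
    transforms: List[TransformStep] = []
    write_mode: str = "overwrite"


class PipelineConfig(BaseModel):
    name: str
    engine: str = "pandas"
    nodes: List[NodeDef]


def load_pipeline(path: Path) -> PipelineConfig:
    """Hypothetical helper: read a YAML file and validate it."""
    raw = yaml.safe_load(path.read_text(encoding="utf-8"))
    if raw is None:  # safe_load returns None for an empty file
        raise ValueError(f"{path} is empty")
    return PipelineConfig.model_validate(raw)


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "pipeline.yaml"
    config_path.write_text(
        "name: file_pipeline\n"
        "nodes:\n"
        "  - name: customers\n"
        "    source: data/customers.csv\n"
        "    transforms:\n"
        "      - type: rename_columns\n",
        encoding="utf-8",
    )
    loaded = load_pipeline(config_path)

print(loaded.name, loaded.engine, [n.name for n in loaded.nodes])
```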

### Exercise 4.1: Build a mini-odibi config
Create Pydantic models for: ConnectionConfig, NodeConfig, PipelineConfig. 
Load them from a YAML string. This is the config system you will use in Phase 8.

```python
# Exercise 4.1
# YOUR CODE HERE
```


---
## Checkpoint

You now understand Pydantic:
- Type hints and why they matter
- BaseModel for automatic validation
- Validators: `@field_validator` for single fields, `@model_validator` for cross-field rules
- Enum integration
- Nested models
- YAML -> dict -> Pydantic model pipeline

You can now read `odibi/config.py` (4000+ lines) and understand every pattern in it.

**Next:** Phase 7 -- Pandas Deep Dive.