# Pydantic Fundamentals - 5 Essential Questions

This notebook covers the essential foundations of Pydantic, from beginner to intermediate level. The examples use the Pydantic v2 API (`model_validate()`, `field_validator`, `model_dump()`).

**Topics Covered:**
1. Optional Fields & Defaults
2. Type Coercion
3. Validation Errors
4. Model Creation Methods
5. Nested Models & Lists

## Setup

First, let's import the required libraries.

In [None]:
from pydantic import BaseModel, ValidationError, Field, field_validator
from typing import Optional
from datetime import datetime
import time
import logging

print("Pydantic imported successfully!")

---

## Question 1: How do you define optional fields and default values in Pydantic models?

**Key Concepts:**
- Required fields (no default)
- Optional fields with defaults
- Nullable fields (can be None)

### Required Fields (No Default)

Fields without default values are required - they must be provided when creating an instance.

In [None]:
class User(BaseModel):
    id: int          # Required - must be provided
    username: str    # Required - must be provided

# This works
user = User(id=1, username="alice")
print(f"Created user: {user}")

# This fails - missing required fields
try:
    user = User(id=1)  # Missing 'username'
except ValidationError as e:
    print(f"\nValidation Error: {e}")

### Optional Fields with Default Values

Fields with default values are optional - if not provided, the default value is used.

In [None]:
class UserWithDefaults(BaseModel):
    id: int
    username: str
    role: str = "user"           # Optional with default
    is_active: bool = True       # Optional with default
    credits: int = 0             # Optional with default

# Only required fields provided - defaults are used
user = UserWithDefaults(id=1, username="alice")
print(f"Role: {user.role}")
print(f"Is Active: {user.is_active}")
print(f"Credits: {user.credits}")

print("\n--- Override defaults ---")

# Override defaults
user2 = UserWithDefaults(id=2, username="bob", role="admin", credits=100)
print(f"Role: {user2.role}")
print(f"Credits: {user2.credits}")

### Nullable Fields (Can Be None)

Use `Optional[Type]` or `Type | None` for fields that can accept None as a value.

In [None]:
class UserNullable(BaseModel):
    id: int
    username: str
    email: Optional[str] = None      # Can be None, defaults to None
    phone: str | None = None          # Python 3.10+ syntax
    bio: Optional[str] = "No bio"     # Can be None, but defaults to string

# All valid
user1 = UserNullable(id=1, username="alice")
print(f"User1 email: {user1.email}")

user2 = UserNullable(id=2, username="bob", email="bob@example.com")
print(f"User2 email: {user2.email}")

user3 = UserNullable(id=3, username="charlie", email=None)
print(f"User3 email: {user3.email} (explicitly set to None)")
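
One case the cell above does not cover: in Pydantic v2, a field annotated `Optional[str]` with no default is nullable but still required. The sketch below illustrates this with a hypothetical `Account` model (the model and field names are invented for illustration).

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Account(BaseModel):
    id: int
    nickname: Optional[str]        # Required, but None is an accepted value
    email: Optional[str] = None    # Optional, defaults to None


# Passing None explicitly satisfies the required-but-nullable field
ok = Account(id=1, nickname=None)
print(f"nickname: {ok.nickname}, email: {ok.email}")

# Omitting the field entirely is an error, even though it is nullable
try:
    Account(id=2)
except ValidationError as e:
    missing = [err["loc"] for err in e.errors() if err["type"] == "missing"]
    print(f"Missing fields: {missing}")
```

Nullability and having a default are independent: the annotation decides whether `None` is accepted, and the default decides whether the field may be omitted.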

### Important Distinction: Default Values vs Nullable Fields

In [None]:
class Product(BaseModel):
    name: str
    price: float
    
    # These are DIFFERENT:
    description: str = "No description"        # Always string, never None
    category: Optional[str] = None             # Can be None or string
    tags: list[str] = []                       # Always list, never None
    metadata: Optional[dict] = None            # Can be None or dict
    
product = Product(name="Coffee", price=4.99)
print(f"Description: {product.description} (type: {type(product.description).__name__})")
print(f"Category: {product.category} (type: {type(product.category).__name__})")
print(f"Tags: {product.tags} (type: {type(product.tags).__name__})")
print(f"Metadata: {product.metadata} (type: {type(product.metadata).__name__})")

### Using Field() for Complex Defaults

Use `Field()` for validation constraints and `default_factory` for mutable defaults.

In [None]:
class Config(BaseModel):
    app_name: str
    debug: bool = Field(default=False)
    max_connections: int = Field(default=100, ge=1, le=1000)
    allowed_hosts: list[str] = Field(default_factory=list)  # Mutable default

config = Config(app_name="MyApp")
print(f"Debug: {config.debug}")
print(f"Max Connections: {config.max_connections}")
print(f"Allowed Hosts: {config.allowed_hosts}")
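
As a small illustration of both halves of that sentence, the sketch below uses a hypothetical `PoolConfig` model to check that `default_factory` gives each instance its own list, and that the `ge`/`le` constraints reject out-of-range values.

```python
from pydantic import BaseModel, Field, ValidationError


class PoolConfig(BaseModel):
    max_connections: int = Field(default=100, ge=1, le=1000)
    allowed_hosts: list[str] = Field(default_factory=list)


a = PoolConfig()
b = PoolConfig()

# Each instance gets a fresh list from the factory, so mutating one
# does not leak into the other
a.allowed_hosts.append("localhost")
print(f"a: {a.allowed_hosts}, b: {b.allowed_hosts}")

# Constraints declared in Field() are enforced during validation
try:
    PoolConfig(max_connections=5000)
except ValidationError as e:
    error_types = [err["type"] for err in e.errors()]
    print(f"Constraint violations: {error_types}")
```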

---

## Question 2: How does type coercion work in Pydantic and when might it cause issues?

**Key Concepts:**
- Automatic type conversion
- Conversions that are accepted silently vs. rejected
- Strict mode

### Common Coercion Patterns

In [None]:
class CoercionModel(BaseModel):
    integer: int
    floating: float
    text: str
    flag: bool

# Type coercion in action
m = CoercionModel(
    integer="123",       # str → int
    floating="3.14",     # str → float
    text=b"456",         # bytes → str (Pydantic v2 does not coerce int → str)
    flag=1               # int → bool
)

print(f"integer: {m.integer} (type: {type(m.integer).__name__})")
print(f"floating: {m.floating} (type: {type(m.floating).__name__})")
print(f"text: {m.text} (type: {type(m.text).__name__})")
print(f"flag: {m.flag} (type: {type(m.flag).__name__})")

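### Float to Int - No Silent Truncation

In [None]:
class Score(BaseModel):
    points: int

# A float with no fractional part is coerced to int
score1 = Score(points=99.0)
print(f"Score with 99.0: {score1.points} (type: {type(score1.points).__name__})")

# Pydantic v2 rejects a fractional float instead of truncating or rounding it
try:
    Score(points=99.9)
except ValidationError as e:
    print(f"Score with 99.9: {e.errors()[0]['msg']}")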

### Boolean Coercion - Surprising Behavior

In [None]:
class Settings(BaseModel):
    enabled: bool

# These all become True
print("Values that become True:")
print(f"  1 → {Settings(enabled=1).enabled}")
print(f"  'yes' → {Settings(enabled='yes').enabled}")
print(f"  'true' → {Settings(enabled='true').enabled}")

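# These all become False
print("\nValues that become False:")
print(f"  0 → {Settings(enabled=0).enabled}")
print(f"  'no' → {Settings(enabled='no').enabled}")
print(f"  'false' → {Settings(enabled='false').enabled}")

# Unlike Python's bool(), an empty string is rejected rather than treated as False
try:
    Settings(enabled='')
except ValidationError as e:
    print(f"\n  '' → rejected: {e.errors()[0]['msg']}")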

### String Coercion - Stricter Than You Might Expect

In [None]:
class Log(BaseModel):
    message: str

# bytes are decoded to str in the default (lax) mode
print(f"b'hello' → '{Log(message=b'hello').message}'")

# Pydantic v2 does not coerce numbers, booleans, or lists to str
for value in (123, 3.14, True, [1, 2, 3]):
    try:
        Log(message=value)
    except ValidationError as e:
        print(f"{value!r} → rejected: {e.errors()[0]['msg']}")
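
In Pydantic v2, a `str` field does not accept numbers by default. If number-to-string conversion is really wanted, recent 2.x releases offer an opt-in `coerce_numbers_to_str` config setting. The sketch below assumes that setting is available in the installed version; the `DefaultLog` and `LenientLog` models are hypothetical.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class DefaultLog(BaseModel):
    message: str


class LenientLog(BaseModel):
    # Opt in to number → str coercion for the str fields on this model
    model_config = ConfigDict(coerce_numbers_to_str=True)
    message: str


# Default behavior: an int is not a valid str
try:
    DefaultLog(message=123)
    default_accepts_int = True
except ValidationError:
    default_accepts_int = False
print(f"Default model accepts int: {default_accepts_int}")

# With the opt-in setting, numbers are converted to their string form
lenient_int = LenientLog(message=123).message
lenient_float = LenientLog(message=3.14).message
print(f"Lenient int: {lenient_int!r}, lenient float: {lenient_float!r}")
```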

### When Coercion Causes Problems

In [None]:
# Financial calculation issue
class Transaction(BaseModel):
    amount: int  # Cents

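# User sends whole dollars as a float: accepted without complaint
transaction = Transaction(amount=19.0)
print(f"Amount: {transaction.amount} cents, not 1900 cents!")

# A fractional amount is rejected in Pydantic v2, which at least surfaces the unit mix-up
try:
    Transaction(amount=19.99)
except ValidationError as e:
    print(f"19.99 → rejected: {e.errors()[0]['msg']}")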

### Solution - Strict Mode

In [None]:
class StrictModel(BaseModel):
    user_id: int = Field(strict=True)
    price: float = Field(strict=True)

# Now coercion is disabled
try:
    model = StrictModel(user_id="123", price="9.99")
except ValidationError as e:
    print("Validation failed with strict mode:")
    for error in e.errors():
        print(f"  {error['loc'][0]}: {error['msg']}")
    
# Only exact types work
print("\nWith correct types:")
model = StrictModel(user_id=123, price=9.99)
print(f"  Created: {model}")
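
`Field(strict=True)` applies to one field at a time. As a sketch of two alternatives, strictness can also be set for a whole model through `ConfigDict(strict=True)`, or requested for a single call with `model_validate(..., strict=True)`. The `StrictOrder` and `LaxOrder` models and the `count_errors` helper are hypothetical names used only for this example.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class StrictOrder(BaseModel):
    model_config = ConfigDict(strict=True)  # Every field is strict
    order_id: int
    total: float


class LaxOrder(BaseModel):
    order_id: int
    total: float


def count_errors(validate) -> int:
    """Run a validation callable and return how many errors it raised."""
    try:
        validate()
    except ValidationError as e:
        return e.error_count()
    return 0


payload = {"order_id": "42", "total": "9.99"}

# Lax (default) mode coerces the strings
lax = LaxOrder.model_validate(payload)
print(f"Lax: {lax}")

# Model-wide strict mode rejects both fields
model_level = count_errors(lambda: StrictOrder.model_validate(payload))

# Strictness can also be requested per call on an otherwise lax model
call_level = count_errors(lambda: LaxOrder.model_validate(payload, strict=True))

print(f"Model-level strict errors: {model_level}")
print(f"Call-level strict errors: {call_level}")
```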

---

## Question 3: How do you handle validation errors in Pydantic and extract useful error information?

**Key Concepts:**
- Catching ValidationError
- Accessing error details
- Custom error messages

### Basic Error Handling

In [None]:
class UserValidation(BaseModel):
    id: int
    username: str
    age: int

# Catch validation errors
try:
    user = UserValidation(id="abc", username=123, age="invalid")
except ValidationError as e:
    print("Validation Error:")
    print(e)

### Accessing Error Details Programmatically

In [None]:
try:
    user = UserValidation(id="abc", username=123, age="invalid")
except ValidationError as e:
    # Get error count
    print(f"Found {e.error_count()} errors\n")
    
    # Get list of error dictionaries
    for error in e.errors():
        print(f"Field: {error['loc']}")
        print(f"Message: {error['msg']}")
        print(f"Type: {error['type']}")
        print(f"Input: {error['input']}")
        print("---")

### Getting JSON Error Response (Perfect for APIs)

In [None]:
try:
    user = UserValidation(id="abc", username=123, age="invalid")
except ValidationError as e:
    import json
    error_json = e.json()
    print("JSON Error Response:")
    print(json.dumps(json.loads(error_json), indent=2))
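
API clients often want errors grouped by field rather than as a flat list. The sketch below shows one possible way to reshape `e.errors()` into a field-to-messages mapping; `format_errors` and `SignupForm` are hypothetical names, not part of Pydantic.

```python
from pydantic import BaseModel, ValidationError


class SignupForm(BaseModel):
    username: str
    age: int


def format_errors(exc: ValidationError) -> dict[str, list[str]]:
    """Group error messages by dotted field path, e.g. 'address.zip_code'."""
    formatted: dict[str, list[str]] = {}
    for err in exc.errors():
        # 'loc' is a tuple of field names and list indexes
        path = ".".join(str(part) for part in err["loc"])
        formatted.setdefault(path, []).append(err["msg"])
    return formatted


try:
    SignupForm(username=123, age="invalid")
except ValidationError as e:
    response = format_errors(e)
    for field, messages in response.items():
        print(f"{field}: {messages}")
```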

### Custom Error Messages with Validators

In [None]:
class ProductValidation(BaseModel):
    name: str
    price: float
    
    @field_validator('price')
    @classmethod
    def price_must_be_positive(cls, v):
        if v <= 0:
            raise ValueError('Price must be greater than zero')
        return v

try:
    product = ProductValidation(name="Coffee", price=-5.99)
except ValidationError as e:
    for error in e.errors():
        print(f"Field: {error['loc'][0]}")
        print(f"Message: {error['msg']}")
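
A field validator can also transform the value it returns, not only reject it. The sketch below uses a hypothetical `Signup` model to normalize a username before checking it, and shows how the custom message appears in the error details.

```python
from pydantic import BaseModel, ValidationError, field_validator


class Signup(BaseModel):
    username: str

    @field_validator("username")
    @classmethod
    def normalize_username(cls, v: str) -> str:
        # Runs after Pydantic's own str validation, so v is already a str
        v = v.strip().lower()
        if len(v) < 3:
            raise ValueError("Username must be at least 3 characters")
        return v  # The returned value is what gets stored on the model


signup = Signup(username="  Alice ")
print(f"Normalized: {signup.username!r}")

try:
    Signup(username=" a ")
except ValidationError as e:
    first = e.errors()[0]
    print(f"Type: {first['type']}")
    print(f"Message: {first['msg']}")
```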

### Nested Model Errors

In [None]:
class AddressNested(BaseModel):
    street: str
    zip_code: str = Field(pattern=r'^\d{5}$')

class PersonNested(BaseModel):
    name: str
    address: AddressNested

try:
    person = PersonNested(
        name="Alice",
        address={"street": "Main St", "zip_code": "invalid"}
    )
except ValidationError as e:
    for error in e.errors():
        print(f"Location: {error['loc']}")
        print(f"Message: {error['msg']}")

---

## Question 4: What is the difference between `model_validate()` and creating an instance directly?

**Key Concepts:**
- Direct instantiation
- model_validate() for dictionaries
- model_validate_json() for JSON strings
- model_construct() for bypassing validation

### Direct Instantiation (Keyword Arguments)

In [None]:
class UserCreation(BaseModel):
    id: int
    username: str

# Direct instantiation with keyword arguments
user = UserCreation(id=1, username="alice")
print(f"Direct: {user}")

### model_validate() - Parse Dictionary

In [None]:
# When you have a dictionary
data = {"id": 2, "username": "bob"}
user = UserCreation.model_validate(data)
print(f"model_validate: {user}")

# For a plain dict with string keys like this one, unpacking gives the same result
user2 = UserCreation(**data)
print(f"Unpacking: {user2}")

### model_validate_json() - Parse JSON String

In [None]:
# When you have a JSON string
json_string = '{"id": 3, "username": "charlie"}'
user = UserCreation.model_validate_json(json_string)
print(f"model_validate_json: {user}")

### model_construct() - Skip Validation (Dangerous!)

⚠️ **Warning**: Only use when you're 100% sure data is valid!

In [None]:
# Create without validation - use with extreme caution
user = UserCreation.model_construct(id="not_an_int", username=12345)
print(f"model_construct: {user}")
print(f"id type: {type(user.id).__name__} - no validation happened!")
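
To make the trade-off concrete, the sketch below uses a hypothetical `Ticket` model. `model_construct()` still fills in declared defaults but stores the supplied values untouched; passing the result back through `model_validate()` is one way to check it after the fact.

```python
from pydantic import BaseModel


class Ticket(BaseModel):
    id: int
    status: str = "open"


# No validation or coercion: the str "7" is stored as-is,
# but the default for 'status' is still applied
raw = Ticket.model_construct(id="7")
print(f"raw.id: {raw.id!r}, raw.status: {raw.status!r}")

# Re-validating the same data runs the normal (lax) pipeline,
# which coerces the numeric string to an int
checked = Ticket.model_validate(dict(raw))
print(f"checked.id: {checked.id!r}")
```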

### When to Use Each Method

In [None]:
class Event(BaseModel):
    name: str
    timestamp: datetime

# 1. Direct instantiation - when you have individual values
event1 = Event(name="Meeting", timestamp=datetime.now())
print(f"1. Direct: {event1}")

# 2. model_validate() - when parsing dict from API/database
api_response = {"name": "Conference", "timestamp": "2024-06-15T10:00:00"}
event2 = Event.model_validate(api_response)
print(f"2. model_validate: {event2}")

# 3. model_validate_json() - when parsing JSON string
json_data = '{"name":"Workshop","timestamp":"2024-07-01T14:00:00"}'
event3 = Event.model_validate_json(json_data)
print(f"3. model_validate_json: {event3}")

# 4. model_construct() - when loading trusted data at scale
database_row = {"name": "Seminar", "timestamp": datetime(2024, 8, 1)}
event4 = Event.model_construct(**database_row)  # No validation
print(f"4. model_construct: {event4}")

### Performance Comparison

In [None]:
# Setup
data_dict = {"id": 1, "username": "test"}
iterations = 10000

# Method 1: Direct with unpacking
# perf_counter() is a monotonic, high-resolution clock intended for timing
start = time.perf_counter()
for _ in range(iterations):
    user = UserCreation(**data_dict)
time1 = time.perf_counter() - start

# Method 2: model_validate()
start = time.perf_counter()
for _ in range(iterations):
    user = UserCreation.model_validate(data_dict)
time2 = time.perf_counter() - start

# Method 3: model_construct() (no validation)
start = time.perf_counter()
for _ in range(iterations):
    user = UserCreation.model_construct(**data_dict)
time3 = time.perf_counter() - start

# Absolute numbers depend on the machine and the Pydantic version
print(f"Performance ({iterations} iterations):")
print(f"  Direct unpacking: {time1:.3f}s")
print(f"  model_validate(): {time2:.3f}s")
print(f"  model_construct(): {time3:.3f}s (skips validation entirely!)")

---

## Question 5: How do you work with nested models and lists in Pydantic?

**Key Concepts:**
- Nested model validation
- Lists of models
- Complex hierarchical structures
- Serialization

### Basic Nested Model

In [None]:
class Address(BaseModel):
    street: str
    city: str
    zip_code: str

class Person(BaseModel):
    name: str
    age: int
    address: Address  # Nested model

# Create with nested dictionary
person = Person(
    name="Alice",
    age=30,
    address={
        "street": "123 Main St",
        "city": "Boston",
        "zip_code": "02101"
    }
)

# Access nested attributes
print(f"Name: {person.name}")
print(f"City: {person.address.city}")
print(f"Zip Code: {person.address.zip_code}")

### Nested Validation Cascades

In [None]:
# Invalid nested data is caught
try:
    person = Person(
        name="Bob",
        age="invalid",  # Error at Person level
        address={
            "street": "456 Oak Ave",
            "city": 999,  # Error at Address level
            "zip_code": "10001"
        }
    )
except ValidationError as e:
    print("Validation errors at multiple levels:")
    for error in e.errors():
        print(f"  {error['loc']}: {error['msg']}")

### List of Primitives

In [None]:
class TodoList(BaseModel):
    title: str
    items: list[str]
    tags: list[str] = []

todo = TodoList(
    title="Shopping",
    items=["Milk", "Bread", "Eggs"],
    tags=["groceries", "urgent"]
)

print(f"Title: {todo.title}")
print(f"Items: {todo.items}")
print(f"Number of items: {len(todo.items)}")

### List of Nested Models

In [None]:
class Item(BaseModel):
    name: str
    price: float
    quantity: int

class Order(BaseModel):
    order_id: int
    items: list[Item]  # List of Item models

# Create with list of dictionaries
order = Order(
    order_id=1001,
    items=[
        {"name": "Coffee", "price": 4.99, "quantity": 2},
        {"name": "Muffin", "price": 3.50, "quantity": 1}
    ]
)

# Access items
print("Order items:")
for item in order.items:
    print(f"  {item.name}: ${item.price} x {item.quantity}")

# Calculate total
total = sum(item.price * item.quantity for item in order.items)
print(f"Total: ${total:.2f}")
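
When an element of a list fails validation, its index appears in the error's `loc` tuple, which pinpoints the offending entry. A minimal sketch with hypothetical `LineItem` and `Basket` models:

```python
from pydantic import BaseModel, ValidationError


class LineItem(BaseModel):
    name: str
    quantity: int


class Basket(BaseModel):
    items: list[LineItem]


try:
    Basket(items=[
        {"name": "Coffee", "quantity": 2},
        {"name": "Muffin", "quantity": "many"},  # Invalid: second item (index 1)
    ])
except ValidationError as e:
    # Each location is (field name, list index, nested field name)
    locations = [err["loc"] for err in e.errors()]
    print(f"Error locations: {locations}")
```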

### Complex Nested Structure

In [None]:
class Contact(BaseModel):
    email: str
    phone: str | None = None

class CompanyAddress(BaseModel):
    street: str
    city: str
    country: str = "USA"

class Company(BaseModel):
    name: str
    employees: list[str]
    address: CompanyAddress
    contact: Contact

class CompanyUser(BaseModel):
    username: str
    company: Company

# Deep nesting with validation at all levels
user = CompanyUser(
    username="alice",
    company={
        "name": "Tech Corp",
        "employees": ["Bob", "Charlie", "Diana"],
        "address": {
            "street": "100 Tech Blvd",
            "city": "San Francisco"
        },
        "contact": {
            "email": "info@techcorp.com",
            "phone": "+1-555-0100"
        }
    }
)

# Deep attribute access
print(f"Username: {user.username}")
print(f"Company: {user.company.name}")
print(f"City: {user.company.address.city}")
print(f"Email: {user.company.contact.email}")
print(f"Number of employees: {len(user.company.employees)}")

### Optional Nested Models

In [None]:
class Profile(BaseModel):
    bio: str
    website: str | None = None

class UserProfile(BaseModel):
    username: str
    profile: Optional[Profile] = None  # Nested model can be None

# User without profile
user1 = UserProfile(username="bob")
print(f"User1 profile: {user1.profile}")

# User with profile
user2 = UserProfile(
    username="alice",
    profile={"bio": "Software Engineer", "website": "alice.dev"}
)
print(f"User2 bio: {user2.profile.bio}")

### Serialization of Nested Models

In [None]:
class AddressSer(BaseModel):
    street: str
    city: str

class PersonSer(BaseModel):
    name: str
    address: AddressSer

person = PersonSer(
    name="Alice",
    address={"street": "123 Main St", "city": "NYC"}
)

# Convert to dict - maintains nested structure
data = person.model_dump()
print("model_dump():")
print(data)

# Convert to JSON
json_str = person.model_dump_json(indent=2)
print("\nmodel_dump_json():")
print(json_str)
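
Serialized output can be fed straight back into validation, and `model_dump()` accepts options that trim the result. The sketch below uses hypothetical `City` and `Traveler` models to show a JSON round trip and `exclude_defaults=True`.

```python
from pydantic import BaseModel


class City(BaseModel):
    name: str
    country: str = "USA"


class Traveler(BaseModel):
    name: str
    home: City


traveler = Traveler(name="Alice", home={"name": "Boston"})

# Nested models are serialized to nested dicts
as_dict = traveler.model_dump()
print(as_dict)

# Round trip: dump to JSON, then parse and validate it again
restored = Traveler.model_validate_json(traveler.model_dump_json())
print(restored)

# Drop fields that still hold their default value, at every nesting level
compact = traveler.model_dump(exclude_defaults=True)
print(compact)
```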

### Real-World Example - API Response

In [None]:
class Author(BaseModel):
    id: int
    name: str

class Comment(BaseModel):
    id: int
    text: str
    author: Author
    created_at: datetime

class Post(BaseModel):
    id: int
    title: str
    content: str
    author: Author
    comments: list[Comment]
    tags: list[str] = []

# Parse complex API response
api_response = {
    "id": 1,
    "title": "Learning Pydantic",
    "content": "Pydantic is awesome!",
    "author": {"id": 100, "name": "Alice"},
    "comments": [
        {
            "id": 1,
            "text": "Great post!",
            "author": {"id": 101, "name": "Bob"},
            "created_at": "2024-01-15T10:30:00"
        },
        {
            "id": 2,
            "text": "Very helpful!",
            "author": {"id": 102, "name": "Charlie"},
            "created_at": "2024-01-15T11:00:00"
        }
    ],
    "tags": ["python", "pydantic", "tutorial"]
}

post = Post.model_validate(api_response)

# Easy access to nested data
print(f"Post by {post.author.name}")
print(f"{len(post.comments)} comments:")
for comment in post.comments:
    print(f"  - {comment.author.name}: {comment.text}")

---

## Summary: Beginner to Intermediate Mastery

These five questions cover the essential foundations of Pydantic:

1. **Optional Fields & Defaults** - Understanding field requirements and nullability
2. **Type Coercion** - How automatic type conversion works and its pitfalls
3. **Validation Errors** - Catching, inspecting, and communicating errors effectively
4. **Model Creation Methods** - Choosing the right method for different scenarios
5. **Nested Models** - Building and validating complex hierarchical structures

Mastering these concepts will prepare you for most real-world Pydantic use cases, from simple API validation to complex data processing pipelines.