# Dataclasses in Python

## What You'll Learn
- The problem with traditional Python classes (boilerplate code)
- How the `@dataclass` decorator simplifies class creation
- Creating dataclasses with default values
- Dataclass and field options (`frozen`, `default_factory`, `init`, `repr`, `compare`)
- When to use dataclasses vs regular classes
- Automatic methods (`__init__`, `__repr__`, `__eq__`)

---

## The Problem: Repetitive Boilerplate Code

Traditional Python classes require writing a lot of repetitive code for simple data containers:

In [None]:
# ❌ Traditional class with lots of boilerplate
class Product:
    def __init__(self, name: str, price: float, quantity: int):
        self.name = name
        self.price = price
        self.quantity = quantity
    
    def __repr__(self):
        return f"Product(name={self.name!r}, price={self.price!r}, quantity={self.quantity!r})"
    
    def __eq__(self, other):
        if not isinstance(other, Product):
            return False
        return (self.name == other.name and 
                self.price == other.price and 
                self.quantity == other.quantity)

# Create product
laptop = Product("Laptop", 999.99, 5)
print(laptop)

**Problems:**
- Must write `__init__` and assign each parameter manually
- Must write `__repr__` for readable string representation
- Must write `__eq__` for equality comparison
- Easy to make mistakes (typos, or forgetting to update a method when a field is added)
- Lots of repetitive code for simple data storage
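
To see why these methods matter, the sketch below uses a hypothetical `BareProduct` class (not part of the example above) that defines only `__init__`. Without `__repr__` it prints as an opaque object reference, and without `__eq__` two instances holding identical data compare unequal, because Python falls back to identity comparison.

```python
class BareProduct:
    """Hypothetical class with only __init__ (illustration only)."""
    def __init__(self, name: str, price: float, quantity: int):
        self.name = name
        self.price = price
        self.quantity = quantity

a = BareProduct("Laptop", 999.99, 5)
b = BareProduct("Laptop", 999.99, 5)

print(a)        # Something like <__main__.BareProduct object at 0x...>
print(a == b)   # False: default equality compares identity, not field values
print(a == a)   # True: same object
```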

---

## The Solution: Dataclasses

The `@dataclass` decorator automatically generates common methods, reducing boilerplate code significantly:

In [None]:
from dataclasses import dataclass

# ✅ Dataclass - much simpler!
@dataclass
class Product:
    name: str
    price: float
    quantity: int

# Create product (same usage)
laptop = Product("Laptop", 999.99, 5)
print(laptop)

# Check equality
another_laptop = Product("Laptop", 999.99, 5)
print(f"Are they equal? {laptop == another_laptop}")

**What happens:**
1. `@dataclass` decorator automatically generates `__init__`, `__repr__`, and `__eq__`
2. Each field needs a type annotation; a class attribute without one is not treated as a field
3. Fields are declared in the class body as annotated names, and the annotations are not enforced at runtime
4. Instances behave like regular class instances
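
One way to confirm what the decorator generated is to inspect the class. The sketch below repeats the `Product` definition so it runs on its own; the `mouse` instance is only an illustration.

```python
import inspect
from dataclasses import dataclass, fields

@dataclass
class Product:
    name: str
    price: float
    quantity: int

# The generated __init__ takes the fields in declaration order
print(inspect.signature(Product))

# Field names as the decorator collected them
print([f.name for f in fields(Product)])

# Keyword arguments work as with any hand-written __init__
mouse = Product(name="Mouse", price=25.50, quantity=2)
print(mouse)  # Product(name='Mouse', price=25.5, quantity=2)
```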

---

## Creating Dataclasses

### Basic Dataclass

In [None]:
from dataclasses import dataclass

@dataclass
class User:
    """Represents a user in the system."""
    username: str
    email: str
    age: int

# Create users
user1 = User("alice", "alice@example.com", 28)
user2 = User("bob", "bob@example.com", 32)

print(user1)
print(user2)

# Access attributes
print(f"\n{user1.username}'s email: {user1.email}")

### Dataclass with Default Values

Fields can have default values. Fields with defaults must come after fields without defaults:

In [None]:
@dataclass
class User:
    """User with optional fields."""
    username: str
    email: str
    age: int = 18  # Default value
    is_active: bool = True  # Default value

# Create users with and without defaults
user1 = User("alice", "alice@example.com")  # Uses defaults
user2 = User("bob", "bob@example.com", 32, False)  # Override defaults

print(user1)
print(user2)
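
Defaults can also be overridden selectively with keyword arguments, which avoids having to pass every earlier default positionally. A minimal sketch, repeating the `User` class so it runs on its own (the user `carol` is a made-up example):

```python
from dataclasses import dataclass

@dataclass
class User:
    username: str
    email: str
    age: int = 18
    is_active: bool = True

# Override only is_active; age keeps its default of 18
carol = User("carol", "carol@example.com", is_active=False)
print(carol)  # User(username='carol', email='carol@example.com', age=18, is_active=False)
```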

### Using default_factory for Mutable Defaults

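**Never use mutable objects (lists, dicts, sets) as default values directly.** Dataclasses reject such a default with a `ValueError` when the class is defined, because a single default object would otherwise be shared by every instance. Use `field(default_factory=...)` instead: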

In [None]:
from dataclasses import dataclass, field

# ❌ WRONG: Don't do this!
# @dataclass
# class ShoppingCart:
#     items: list = []  # BAD: raises ValueError (mutable default not allowed)

# ✅ CORRECT: Use default_factory
@dataclass
class ShoppingCart:
    """Shopping cart for a user."""
    user: str
    items: list[str] = field(default_factory=list)
    total: float = 0.0

# Create carts
cart1 = ShoppingCart("alice")
cart2 = ShoppingCart("bob")

# Add items
cart1.items.append("Laptop")
cart2.items.append("Mouse")

print(f"Alice's cart: {cart1}")
print(f"Bob's cart: {cart2}")

**What happens:**
1. `field(default_factory=list)` creates a **new** empty list for each instance
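2. Without `default_factory`, a plain `[]` default is rejected with a `ValueError` at class definition; this guards against the shared-list bug that mutable defaults cause in ordinary classes and functions
3. Works for any zero-argument callable: `list`, `dict`, `set`, or custom functions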
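
The sketch below illustrates both points: the error raised for a bare list default, and a custom factory function. `BrokenCart`, `PromoCart`, and `starter_items` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, field

# A bare list default is rejected when the class is defined
try:
    @dataclass
    class BrokenCart:
        items: list = []
except ValueError as exc:
    error_message = str(exc)
    print(f"ValueError: {error_message}")

def starter_items() -> list[str]:
    """Hypothetical factory: every new cart starts with a free sample."""
    return ["Free sample"]

@dataclass
class PromoCart:
    user: str
    items: list[str] = field(default_factory=starter_items)

cart_a = PromoCart("alice")
cart_b = PromoCart("bob")
cart_a.items.append("Laptop")

print(cart_a.items)  # ['Free sample', 'Laptop']
print(cart_b.items)  # ['Free sample']
```

The factory is called once per instance, so each cart gets its own list.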

---

## Field Options

### Frozen Dataclasses (Immutable)

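Pass `frozen=True` to the `@dataclass` decorator to make instances immutable. This is a class-level option rather than a per-field one, and assigning to an attribute after creation raises `dataclasses.FrozenInstanceError`: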

In [None]:
@dataclass(frozen=True)
class Point:
    """Immutable 2D point."""
    x: float
    y: float

point = Point(3.0, 4.0)
print(point)

# Try to modify (raises dataclasses.FrozenInstanceError)
try:
    point.x = 5.0
except Exception as e:
    print(f"Error: {e}")
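
Two consequences of freezing are worth a short sketch. Since fields cannot be reassigned, `dataclasses.replace()` is the usual way to derive a modified copy. And with the default `eq=True`, a frozen dataclass also gets a generated `__hash__`, so instances can be stored in sets or used as dict keys. The variable names below are illustrative.

```python
from dataclasses import FrozenInstanceError, dataclass, replace

@dataclass(frozen=True)
class Point:
    x: float
    y: float

origin = Point(0.0, 0.0)

# replace() builds a new instance with selected fields changed
moved = replace(origin, x=5.0)
print(origin, moved)

# Equal points hash alike, so the duplicate Point(0.0, 0.0) collapses
visited = {origin, moved, Point(0.0, 0.0)}
print(len(visited))  # 2

try:
    origin.y = 1.0
except FrozenInstanceError as exc:
    print(f"FrozenInstanceError: {exc}")
```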

### Field Options: init, repr, compare

The `init`, `repr`, and `compare` arguments of `field()` control which generated methods include a given field:

In [None]:
from dataclasses import dataclass, field

@dataclass
class Product:
    """Product with internal tracking."""
    name: str
    price: float
    # Internal field: not in __init__, not in __repr__
    _internal_id: int = field(default=0, init=False, repr=False)
    # Exclude from comparison
    stock_count: int = field(default=0, compare=False)

# _internal_id is not an __init__ parameter, so the third argument is stock_count
product1 = Product("Laptop", 999.99, 10)
product2 = Product("Laptop", 999.99, 5)

print(product1)
print(f"Are they equal? {product1 == product2}")  # True (stock_count ignored)
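
To check how each field is configured, `dataclasses.fields()` exposes the flags. The sketch below repeats the class so it runs on its own, and also shows that a field declared with `init=False` cannot be passed to the constructor.

```python
from dataclasses import dataclass, field, fields

@dataclass
class Product:
    name: str
    price: float
    _internal_id: int = field(default=0, init=False, repr=False)
    stock_count: int = field(default=0, compare=False)

# Print the per-field flags the decorator recorded
for f in fields(Product):
    print(f"{f.name}: init={f.init}, repr={f.repr}, compare={f.compare}")

# init=False removes the parameter from __init__, so passing it fails
try:
    Product("Laptop", 999.99, _internal_id=7)
except TypeError as exc:
    print(f"TypeError: {exc}")
```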

---

## Comparison: Regular Class vs Dataclass

| Feature | Regular Class | Dataclass |
|---------|---------------|------------|
| `__init__` | Manual | ✅ Auto-generated |
| `__repr__` | Manual | ✅ Auto-generated |
| `__eq__` | Manual | ✅ Auto-generated |
| Type hints | Optional | Required |
| Boilerplate | High | Low |
| Best for | Complex logic | Data storage |
| Methods | Any custom methods | Any custom methods |

---

## Practical Example: Order Management System

In [None]:
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class OrderItem:
    """Single item in an order."""
    product_name: str
    quantity: int
    unit_price: float
    
    def total_price(self) -> float:
        """Calculate total price for this item."""
        return self.quantity * self.unit_price

@dataclass
class Order:
    """Customer order."""
    order_id: str
    customer: str
    items: list[OrderItem] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)
    
    def add_item(self, item: OrderItem) -> None:
        """Add item to order."""
        self.items.append(item)
    
    def total_amount(self) -> float:
        """Calculate total order amount."""
        return sum(item.total_price() for item in self.items)

# Create order
order = Order("ORD-001", "Alice")

# Add items
order.add_item(OrderItem("Laptop", 1, 999.99))
order.add_item(OrderItem("Mouse", 2, 25.50))

# Display order
print(order)
print(f"\nTotal: ${order.total_amount():.2f}")

**What happens:**
1. `OrderItem` stores product details and calculates item total
2. `Order` uses `default_factory` for mutable `items` list
3. `created_at` uses `datetime.now` as factory (timestamp per order)
4. Both classes can have custom methods alongside auto-generated ones
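
Nested dataclasses like these can be converted to plain dictionaries with `dataclasses.asdict()`, which is handy before serializing. The sketch below is a trimmed copy of the classes above: it leaves out `created_at` and the methods so the output is deterministic, which is a simplification made for this illustration.

```python
from dataclasses import asdict, dataclass, field

@dataclass
class OrderItem:
    product_name: str
    quantity: int
    unit_price: float

@dataclass
class Order:
    order_id: str
    customer: str
    items: list[OrderItem] = field(default_factory=list)

order = Order("ORD-001", "Alice", [OrderItem("Mouse", 2, 25.50)])

# asdict() recurses into nested dataclasses and into lists of them
order_dict = asdict(order)
print(order_dict)
```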

---

## When to Use Dataclasses

### ✅ Use Dataclasses When:
- Storing related data together (data containers)
- Need automatic `__init__`, `__repr__`, `__eq__`
- Working with typed data structures
- Building simple models or DTOs (Data Transfer Objects)
- Want less boilerplate code

### ❌ Don't Use Dataclasses When:
- Need complex initialization logic
- Want to control `__init__` behavior precisely
- Class has more methods than data attributes
- Need validation on every attribute assignment (dataclasses do not validate values; use Pydantic instead)
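
For lighter needs, the `__post_init__` hook offers a middle ground: the generated `__init__` calls it after assigning the fields, so it can hold simple checks. As the sketch shows, it runs only at construction and not on later assignment, which is why assignment-time validation calls for a different tool. The `Temperature` class is a hypothetical example.

```python
from dataclasses import dataclass

@dataclass
class Temperature:
    """Hypothetical reading that must not fall below absolute zero."""
    celsius: float

    def __post_init__(self) -> None:
        # Runs once, right after the generated __init__ has assigned the fields
        if self.celsius < -273.15:
            raise ValueError("temperature below absolute zero")

try:
    Temperature(-300.0)
except ValueError as exc:
    construction_error = str(exc)
    print(f"Rejected at construction: {construction_error}")

reading = Temperature(20.0)
reading.celsius = -300.0  # Not checked: __post_init__ does not run on assignment
print(reading)  # Temperature(celsius=-300.0)
```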

### Examples:

In [None]:
# ✅ Good use case: Data container
@dataclass
class Address:
    """Physical address."""
    street: str
    city: str
    state: str
    zip_code: str

# ✅ Good use case: Configuration
@dataclass
class DatabaseConfig:
    """Database connection configuration."""
    host: str = "localhost"
    port: int = 5432
    database: str = "mydb"
    username: str = "user"

config = DatabaseConfig(database="production")
print(config)

---

## Best Practices

### ✅ Do:
- Use dataclasses for simple data containers
- Add docstrings to dataclasses
- Use `frozen=True` for immutable data
- Use precise type hints (for example `list[str]` rather than a bare `list`)
- Add custom methods when needed

### ❌ Don't:
- Don't use dataclasses when you need complex validation
- Don't put fields with defaults before fields without defaults
- Don't use dataclasses for classes with lots of behavior

### Examples:

In [None]:
from dataclasses import dataclass, field

# ✅ Good: Proper use of defaults and default_factory
@dataclass
class UserProfile:
    """User profile with preferences."""
    username: str
    email: str
    age: int = 18
    tags: list[str] = field(default_factory=list)
    settings: dict[str, bool] = field(default_factory=dict)

# ❌ Bad: Field ordering
# @dataclass
# class BadExample:
#     name: str = "Unknown"  # Has default
#     age: int  # TypeError: non-default field after a field with a default

# ✅ Good: Correct field ordering
@dataclass
class GoodExample:
    """Correct field ordering."""
    age: int  # No default
    name: str = "Unknown"  # Has default

example = GoodExample(25)
print(example)
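
The ordering rule is enforced when the class is defined, not when an instance is created. A minimal sketch that triggers the error on purpose (the exact message wording may vary between Python versions):

```python
from dataclasses import dataclass

try:
    @dataclass
    class BadExample:
        name: str = "Unknown"  # Has default
        age: int               # No default after a field with a default
except TypeError as exc:
    ordering_error = str(exc)
    print(f"TypeError: {ordering_error}")
```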

---

## Summary

### Key Concepts:
- **Dataclasses** eliminate boilerplate for data-focused classes
- `@dataclass` decorator auto-generates `__init__`, `__repr__`, `__eq__`
- Type hints are required for all fields
- Use `field(default_factory=...)` for mutable default values
- `frozen=True` makes instances immutable
- Best for data containers, not classes with complex logic

### Syntax Reference:
```python
from dataclasses import dataclass, field

# Basic dataclass
@dataclass
class Person:
    name: str
    age: int
    active: bool = True

# With mutable defaults
@dataclass
class Team:
    name: str
    members: list[str] = field(default_factory=list)

# Frozen (immutable)
@dataclass(frozen=True)
class Point:
    x: float
    y: float
```

### Next Steps:
Next, learn about [Pydantic Basics](03-pydantic-basics.ipynb) for data models with built-in validation and more powerful features.