🔴 Advanced | ⏱ 60 min | 🔑 Concepts: BaseModel, Field, validators, serialization, Settings

# Pydantic: Data Validation in Python

## Objectives

By the end of this notebook, you will be able to:
- Create data models with Pydantic
- Use validation types and constraints
- Write custom validators
- Serialize and deserialize data
- Build nested models
- Manage configuration with Settings
- Integrate Pydantic into data pipelines

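## Prerequisites

- Python 3.9+ (the examples use built-in generics such as `tuple[...]` and `type[...]` in annotations)
- Pydantic v2 and `pydantic-settings` (the v1 API is different)
- Understanding of Python type hints
- Basics of object-oriented programming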

## 1. Why Validate Data?

**"Garbage in, garbage out"**: invalid data produces incorrect results.

**Problems without validation:**
- Silent errors (wrong types propagate unnoticed)
- Bugs that are hard to trace back to their source
- No documentation of the data schema
- Inconsistencies between systems

**What Pydantic provides:**
- ✅ Automatic type validation
- ✅ Smart type conversion (e.g. `"42"` → `42` for an `int` field)
- ✅ Clear, structured error messages
- ✅ Auto-generated documentation (JSON Schema via `model_json_schema()`)
- ✅ Fast validation (in v2, the core, `pydantic-core`, is written in Rust)

**Installation :**
```bash
pip install pydantic pydantic-settings pandas
```

In [None]:
from pydantic import BaseModel, Field, field_validator, model_validator, ValidationError
from pydantic_settings import BaseSettings  # separate package in v2: pip install pydantic-settings
from typing import Optional, List, Dict
from datetime import datetime, date
from decimal import Decimal
import pydantic

print(f"Pydantic version: {pydantic.__version__}")

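## 2. BaseModel: Creating a Model

`BaseModel` is the base class for all Pydantic models: fields are declared as annotated class attributes, and Pydantic validates them when an instance is created.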

In [None]:
# Simple model
class User(BaseModel):
    id: int
    name: str
    email: str
    age: int

# Create an instance
user = User(id=1, name="Alice", email="alice@example.com", age=30)
print("User created:")
print(user)
print(f"Type: {type(user)}")

# Attribute access
print(f"\nName: {user.name}")
print(f"Email: {user.email}")

# Automatic type conversion (lax mode): "2" -> 2, "25" -> 25
user2 = User(id="2", name="Bob", email="bob@example.com", age="25")
print("\nUser2 (automatic conversion):")
print(f"ID type: {type(user2.id)} = {user2.id}")
print(f"Age type: {type(user2.age)} = {user2.age}")

# Validation error: "abc" cannot be parsed as an int
try:
    user_invalid = User(id="abc", name="Charlie", email="charlie@example.com", age=30)
except ValidationError as e:
    print("\n❌ Validation error:")
    print(e)
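
The `ValidationError` above can also be inspected programmatically rather than just printed. The minimal sketch below redefines a small `User` model so it runs on its own and reads `e.errors()`, a list with one dict per failing field, with keys such as `type`, `loc`, `msg` and `input`. The exact message wording may vary between Pydantic v2 releases, so prefer `type` when writing code that reacts to errors.

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str

try:
    User(id="abc", name="Charlie")
except ValidationError as e:
    err = e.errors()[0]    # one dict per failing field
    print(err["type"])     # machine-readable error code
    print(err["loc"])      # path to the failing field, as a tuple
    print(err["msg"])      # human-readable message
    print(err["input"])    # the value that was rejected
    print(e.error_count())  # number of errors collected
```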

## 3. Supported Types

Pydantic supports a wide range of Python types, from scalars to dates, decimals and collections.

In [None]:
from typing import List, Dict, Set, Tuple

class TypesDemo(BaseModel):
    # Basic types
    text: str
    number: int
    price: float
    is_active: bool
    
    # Special types
    created_at: datetime
    birth_date: date
    precise_value: Decimal
    
    # Optional types
    optional_field: Optional[str] = None
    
    # Collections
    tags: List[str]
    metadata: Dict[str, str]
    unique_ids: Set[int]
    coordinates: Tuple[float, float]

# Example
demo = TypesDemo(
    text="Hello",
    number=42,
    price=19.99,
    is_active=True,
    created_at="2024-01-01T12:00:00",  # ISO string -> datetime
    birth_date="1990-05-15",  # ISO string -> date
    precise_value="123.45",  # string -> Decimal (no float rounding)
    tags=["python", "pydantic"],
    metadata={"source": "api", "version": "1.0"},
    unique_ids=[1, 2, 3, 3],  # list -> set: the conversion drops the duplicate
    coordinates=(48.8566, 2.3522)
)

print("TypesDemo:")
print(demo)
print(f"\nType of created_at: {type(demo.created_at)}")
print(f"Type of precise_value: {type(demo.precise_value)}")
print(f"unique_ids (duplicates removed): {demo.unique_ids}")
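
Lax conversion is not arbitrary coercion: Pydantic v2 only accepts conversions it considers lossless. The sketch below uses a hypothetical `Limits` model to show that an `int` field accepts `2.0` but rejects `2.5`, and that a `bool` field accepts strings such as `"yes"` but rejects `"maybe"`.

```python
from pydantic import BaseModel, ValidationError

class Limits(BaseModel):
    count: int
    enabled: bool

ok = Limits(count=2.0, enabled="yes")  # 2.0 has no fractional part, so it becomes 2
print(ok)

rejected = []
for bad in ({"count": 2.5, "enabled": True}, {"count": 1, "enabled": "maybe"}):
    try:
        Limits.model_validate(bad)
    except ValidationError as e:
        rejected.append(e.errors()[0]["type"])  # error code of the failure
print(rejected)
```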

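## 4. Field: Default Values and Constraints

`Field()` attaches metadata (description, alias) and constraints (`gt`, `ge`, `le`, `min_length`, `max_length`, `pattern`) to a field. Passing `...` as the first argument marks the field as required.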

In [None]:
from pydantic import Field

class Product(BaseModel):
    id: int = Field(..., description="Unique product identifier")
    name: str = Field(..., min_length=1, max_length=100, description="Product name")
    description: Optional[str] = Field(None, max_length=500)
    price: float = Field(..., gt=0, description="Price in euros (> 0)")
    quantity: int = Field(0, ge=0, description="Quantity in stock (>= 0)")
    discount: float = Field(0.0, ge=0, le=1, description="Discount rate (0-1)")
    tags: List[str] = Field(default_factory=list, description="Product tags")
    
    # Alias: the input key is "product_sku", the attribute is "sku"
    sku: str = Field(..., alias="product_sku", pattern=r"^[A-Z]{3}-[0-9]{4}$")

# Valid example
product = Product(
    id=1,
    name="Laptop",
    price=999.99,
    quantity=10,
    discount=0.1,
    product_sku="LAP-0001"  # passed via the alias
)
print("Valid product:")
print(product)

# Constraint tests
print("\n=== Validation tests ===")

# Negative price
try:
    Product(id=2, name="Phone", price=-10, product_sku="PHO-0002")
except ValidationError as e:
    print("\n❌ Negative price:")
    print(e.errors()[0]['msg'])

# Name too long
try:
    Product(id=3, name="A" * 101, price=100, product_sku="AAA-0003")
except ValidationError as e:
    print("\n❌ Name too long:")
    print(e.errors()[0]['msg'])

# Invalid SKU (pattern)
try:
    Product(id=4, name="Tablet", price=500, product_sku="invalid")
except ValidationError as e:
    print("\n❌ Invalid SKU:")
    print(e.errors()[0]['msg'])

# Discount > 1
try:
    Product(id=5, name="Monitor", price=300, discount=1.5, product_sku="MON-0005")
except ValidationError as e:
    print("\n❌ Discount > 1:")
    print(e.errors()[0]['msg'])
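
Field descriptions and constraints are not only enforced at runtime: Pydantic v2 also exports them in the JSON Schema returned by `model_json_schema()`, which is what "auto-generated documentation" means in practice. A minimal sketch with a trimmed-down copy of `Product`:

```python
import json
from pydantic import BaseModel, Field

class Product(BaseModel):
    name: str = Field(..., min_length=1, max_length=100, description="Product name")
    price: float = Field(..., gt=0, description="Price in euros (> 0)")
    sku: str = Field(..., alias="product_sku", pattern=r"^[A-Z]{3}-[0-9]{4}$")

# The schema uses aliases by default, so the SKU appears as "product_sku"
schema = Product.model_json_schema()
print(json.dumps(schema, indent=2))
```

Here `gt=0` becomes `exclusiveMinimum`, `max_length` becomes `maxLength`, and `description` is copied as-is.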

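## 5. Validators: Custom Validation

For rules that types and `Field` constraints cannot express, `@field_validator` checks a single field and `@model_validator` checks the model as a whole (for example, consistency between fields).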

In [None]:
from pydantic import field_validator, model_validator
import re

class Order(BaseModel):
    order_id: int
    customer_email: str
    product: str
    quantity: int
    unit_price: float
    total: Optional[float] = None
    
    # Field validator (default mode='after': runs after Pydantic's own type validation)
    @field_validator('customer_email')
    @classmethod
    def validate_email(cls, v: str) -> str:
        pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
        if not re.match(pattern, v):
            raise ValueError('Invalid email')
        return v.lower()  # Normalize to lowercase
    
    @field_validator('quantity')
    @classmethod
    def validate_quantity(cls, v: int) -> int:
        if v <= 0:
            raise ValueError('Quantity must be > 0')
        if v > 100:
            raise ValueError('Maximum quantity: 100')
        return v
    
    # Model validator: runs once all fields have been validated
    @model_validator(mode='after')
    def calculate_total(self):
        expected = self.quantity * self.unit_price
        # Compute the total if it was not provided
        if self.total is None:
            self.total = expected
        # Check consistency (0.01 tolerance for float rounding)
        if abs(self.total - expected) > 0.01:
            raise ValueError(f'Inconsistent total: {self.total} != {expected}')
        return self

# Tests
print("=== Validator tests ===")

# Valid
order1 = Order(
    order_id=1,
    customer_email="Alice@Example.COM",  # Will be normalized
    product="Laptop",
    quantity=2,
    unit_price=999.99
)
print("\n✅ Valid order:")
print(f"Normalized email: {order1.customer_email}")
print(f"Computed total: {order1.total}")

# Invalid email
try:
    Order(
        order_id=2,
        customer_email="invalid-email",
        product="Phone",
        quantity=1,
        unit_price=699.99
    )
except ValidationError as e:
    print("\n❌ Invalid email:")
    print(e.errors()[0]['msg'])

# Invalid quantity
try:
    Order(
        order_id=3,
        customer_email="bob@example.com",
        product="Tablet",
        quantity=150,
        unit_price=449.99
    )
except ValidationError as e:
    print("\n❌ Invalid quantity:")
    print(e.errors()[0]['msg'])

# Inconsistent total
try:
    Order(
        order_id=4,
        customer_email="charlie@example.com",
        product="Monitor",
        quantity=2,
        unit_price=349.99,
        total=500.00  # Wrong: should be 2 * 349.99
    )
except ValidationError as e:
    print("\n❌ Inconsistent total:")
    print(e.errors()[0]['msg'])
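
The validators above run in the default `mode='after'`, on values Pydantic has already converted. A `mode='before'` validator instead receives the raw input, which is handy for reshaping data before the type check. A sketch with a hypothetical `Article` model whose `tags` may arrive as a comma-separated string:

```python
from typing import Any, List
from pydantic import BaseModel, field_validator

class Article(BaseModel):
    title: str
    tags: List[str]

    @field_validator("tags", mode="before")
    @classmethod
    def split_comma_separated(cls, v: Any) -> Any:
        # Runs BEFORE the List[str] check: turn "a, b" into ["a", "b"]
        if isinstance(v, str):
            return [tag.strip() for tag in v.split(",") if tag.strip()]
        return v  # lists (or anything else) go on to normal validation

a1 = Article(title="Intro", tags="python, pydantic ,  ")
a2 = Article(title="Intro", tags=["python"])
print(a1.tags, a2.tags)
```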

## 6. Serialization: model_dump() and model_dump_json()

Convert models to dictionaries or to JSON strings.

In [None]:
order = Order(
    order_id=1,
    customer_email="alice@example.com",
    product="Laptop",
    quantity=2,
    unit_price=999.99
)

# model_dump(): to a dict
order_dict = order.model_dump()
print("model_dump():")
print(order_dict)
print(f"Type: {type(order_dict)}")

# model_dump_json(): to a JSON string
order_json = order.model_dump_json()
print("\nmodel_dump_json():")
print(order_json)
print(f"Type: {type(order_json)}")

# Serialization options
order_dict_partial = order.model_dump(include={'order_id', 'customer_email', 'total'})
print("\nWith include:")
print(order_dict_partial)

order_dict_exclude = order.model_dump(exclude={'customer_email'})
print("\nWith exclude:")
print(order_dict_exclude)

# Indentation for readable JSON
order_json_pretty = order.model_dump_json(indent=2)
print("\nIndented JSON:")
print(order_json_pretty)
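
Two further `model_dump()` options often matter when sending data to other systems: `mode="json"` returns only JSON-compatible Python types (a `datetime` becomes an ISO string), and `by_alias` / `exclude_none` control which keys appear. A sketch with a hypothetical `Event` model:

```python
from datetime import datetime
from typing import Optional
from pydantic import BaseModel, Field

class Event(BaseModel):
    event_id: int = Field(alias="eventId")
    created_at: datetime
    note: Optional[str] = None

event = Event(eventId=1, created_at=datetime(2024, 1, 1, 12, 0))

python_dump = event.model_dump()           # datetime stays a datetime object
json_dump = event.model_dump(mode="json")  # JSON-compatible types only
api_dump = event.model_dump(by_alias=True, exclude_none=True)  # alias keys, no None values

print(python_dump)
print(json_dump)
print(api_dump)
```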

## 7. Deserialization: model_validate()

Create instances from a dict (`model_validate()`) or from a JSON string (`model_validate_json()`).

In [None]:
import json

# From a dict
data_dict = {
    "order_id": 2,
    "customer_email": "bob@example.com",
    "product": "Phone",
    "quantity": 1,
    "unit_price": 699.99
}

order_from_dict = Order.model_validate(data_dict)
print("From dict:")
print(order_from_dict)

# From a JSON string (parsing and validation in a single step)
json_string = '{"order_id": 3, "customer_email": "charlie@example.com", "product": "Tablet", "quantity": 2, "unit_price": 449.99}'
order_from_json = Order.model_validate_json(json_string)
print("\nFrom JSON:")
print(order_from_json)

# Error handling: every failing field is reported, not only the first one
invalid_data = {
    "order_id": "invalid",
    "customer_email": "not-an-email",
    "product": "Monitor",
    "quantity": -5,
    "unit_price": 349.99
}

try:
    Order.model_validate(invalid_data)
except ValidationError as e:
    print("\n❌ Validation errors:")
    for error in e.errors():
        print(f"  - {error['loc'][0]}: {error['msg']}")
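
One deserialization behavior worth knowing: by default, keys that match no field are silently ignored, so a typo such as `quantty` goes unnoticed (the field falls back to its default, or fails as missing). The sketch below, with hypothetical `Loose` and `Closed` models, uses `ConfigDict(extra="forbid")` to reject unknown keys instead.

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class Loose(BaseModel):
    quantity: int = 1

class Closed(BaseModel):
    model_config = ConfigDict(extra="forbid")  # unknown keys become errors
    quantity: int = 1

payload = {"quantty": 5}  # typo in the key

loose = Loose.model_validate(payload)
print(loose)  # the typo is ignored and the default is used

try:
    Closed.model_validate(payload)
except ValidationError as e:
    closed_error = e.errors()[0]
    print(closed_error["type"], closed_error["loc"])
```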

## 8. Nested Models

Build complex data structures by using models as field types, including inside lists.

In [None]:
class Address(BaseModel):
    street: str
    city: str
    postal_code: str = Field(..., pattern=r'^\d{5}$')
    country: str = "France"

class Customer(BaseModel):
    customer_id: str
    name: str
    email: str
    address: Address  # Nested model
    signup_date: datetime

class OrderItem(BaseModel):
    product: str
    quantity: int = Field(..., gt=0)
    unit_price: float = Field(..., gt=0)
    
    @property
    def total(self) -> float:
        return self.quantity * self.unit_price

class CompleteOrder(BaseModel):
    order_id: int
    customer: Customer  # Nested model
    items: List[OrderItem]  # List of models
    order_date: datetime
    notes: Optional[str] = None
    
    @property
    def total(self) -> float:
        return sum(item.total for item in self.items)

# Example
order_data = {
    "order_id": 1001,
    "customer": {
        "customer_id": "C001",
        "name": "Alice Dupont",
        "email": "alice@example.com",
        "address": {
            "street": "123 Rue de la Paix",
            "city": "Paris",
            "postal_code": "75001"
        },
        "signup_date": "2023-01-15T10:30:00"
    },
    "items": [
        {"product": "Laptop", "quantity": 1, "unit_price": 999.99},
        {"product": "Mouse", "quantity": 2, "unit_price": 29.99}
    ],
    "order_date": "2024-01-20T14:00:00",
    "notes": "Express delivery"
}

complete_order = CompleteOrder.model_validate(order_data)
print("Complete order:")
print(complete_order)
print(f"\nTotal: {complete_order.total:.2f}€")
print(f"City: {complete_order.customer.address.city}")
print(f"Number of items: {len(complete_order.items)}")

# Serialization with nested models (plain @property values such as total are not included)
print("\nJSON:")
print(complete_order.model_dump_json(indent=2))
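
The `total` defined with a plain `@property` is available as an attribute but is absent from `model_dump()` and from the JSON above. In Pydantic v2, stacking `@computed_field` on top of the property includes it in serialization. A sketch on a standalone copy of `OrderItem`:

```python
from pydantic import BaseModel, Field, computed_field

class OrderItem(BaseModel):
    product: str
    quantity: int = Field(..., gt=0)
    unit_price: float = Field(..., gt=0)

    @computed_field  # included in model_dump() / model_dump_json()
    @property
    def total(self) -> float:
        return self.quantity * self.unit_price

item = OrderItem(product="Mouse", quantity=2, unit_price=29.99)
dumped = item.model_dump()
print(dumped)
print(item.model_dump_json())
```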

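## 9. Pydantic Settings: Configuration from .env

Manage application configuration: `BaseSettings` reads values from environment variables and `.env` files, then validates them like any other model.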

In [None]:
from pydantic_settings import BaseSettings, SettingsConfigDict
from pathlib import Path

# Create a test .env file
env_content = """
DATABASE_URL=postgresql://user:pass@localhost:5432/mydb
REDIS_HOST=localhost
REDIS_PORT=6379
API_KEY=super-secret-key-123
DEBUG=true
MAX_CONNECTIONS=100
"""

with open('/tmp/.env', 'w') as f:
    f.write(env_content)

# Configuration model
class Settings(BaseSettings):
    # Database
    database_url: str
    
    # Redis
    redis_host: str = "localhost"
    redis_port: int = 6379
    
    # API
    api_key: str
    
    # Application
    debug: bool = False
    max_connections: int = 50
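    app_name: str = "MyApp"  # Default value used when absent from .env
    
    model_config = SettingsConfigDict(
        env_file='/tmp/.env',
        env_file_encoding='utf-8',
        case_sensitive=False  # the default: DATABASE_URL maps to database_url
    )

# Load the configuration (real environment variables take precedence over the .env file)
settings = Settings()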

print("Loaded configuration:")
print(f"DATABASE_URL: {settings.database_url}")
print(f"REDIS_HOST: {settings.redis_host}")
print(f"REDIS_PORT: {settings.redis_port}")
print(f"API_KEY: {settings.api_key[:10]}...")
print(f"DEBUG: {settings.debug}")
print(f"MAX_CONNECTIONS: {settings.max_connections}")
print(f"APP_NAME: {settings.app_name}")

# Usage
print("\nUsage in code:")
if settings.debug:
    print("DEBUG mode enabled")

print(f"Connecting to Redis at {settings.redis_host}:{settings.redis_port}")
print(f"Max connections: {settings.max_connections}")
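
Secrets such as `api_key` are shown in full by `print(settings)` or in tracebacks. Declaring them as `SecretStr` masks the value in `repr`, `str` and JSON output, and requires an explicit `get_secret_value()` call to read it. The same annotation works on a `BaseSettings` field; the sketch below uses a plain `BaseModel` (hypothetical `ApiConfig`) so it runs without a `.env` file.

```python
from pydantic import BaseModel, SecretStr

class ApiConfig(BaseModel):
    api_url: str
    api_key: SecretStr  # masked when printed or serialized to JSON

cfg = ApiConfig(api_url="https://api.example.com", api_key="super-secret-key-123")
print(cfg)                                # api_key is shown masked
print(cfg.model_dump_json())              # masked in JSON output too
raw_key = cfg.api_key.get_secret_value()  # explicit access to the real value
```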

## 10. Usage in Data Pipelines

Validate a pipeline's input and output data, keeping valid rows and collecting errors instead of stopping at the first bad row.

In [None]:
import pandas as pd
from typing import List
from datetime import date

# Model for one row of the e-commerce dataset
class SalesRecord(BaseModel):
    order_id: int = Field(..., gt=0)
    date: date
    product: str = Field(..., min_length=1)
    category: str
    quantity: int = Field(..., gt=0, le=1000)
    unit_price: float = Field(..., gt=0)
    customer_id: str = Field(..., pattern=r'^C\d{3}$')
    city: str
    total: Optional[float] = None
    
    @model_validator(mode='after')
    def calculate_total(self):
        if self.total is None:
            self.total = self.quantity * self.unit_price
        return self
    
    @field_validator('city')
    @classmethod
    def validate_city(cls, v: str) -> str:
        allowed_cities = ['Paris', 'Lyon', 'Marseille', 'Toulouse', 'Bordeaux']
        if v not in allowed_cities:
            raise ValueError(f'City not allowed: {v}')
        return v

# Pipeline validation function
def validate_sales_data(df: pd.DataFrame) -> tuple[List[SalesRecord], List[dict]]:
    """
    Validate a sales DataFrame row by row.
    Returns: (valid records, errors)
    """
    valid_records = []
    errors = []
    
    for idx, row in df.iterrows():
        try:
            record = SalesRecord.model_validate(row.to_dict())
            valid_records.append(record)
        except ValidationError as e:
            errors.append({
                'row': idx,
                'data': row.to_dict(),
                'errors': e.errors()
            })
    
    return valid_records, errors

# Test data
data = {
    "order_id": [1, 2, -3, 4, 5],  # -3 is invalid (gt=0)
    "date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04", "invalid"],  # "invalid" is not a date
    "product": ["Laptop", "Phone", "Tablet", "", "Monitor"],  # "" is invalid (min_length=1)
    "category": ["Electronics"] * 5,
    "quantity": [1, 2, 1, 3, 1],
    "unit_price": [999.99, 699.99, 449.99, 79.99, 349.99],
    "customer_id": ["C001", "C002", "C003", "C004", "INVALID"],  # "INVALID" does not match ^C\d{3}$
    "city": ["Paris", "Lyon", "Marseille", "Toulouse", "Bordeaux"],
}

df = pd.DataFrame(data)

# Validation
valid_records, errors = validate_sales_data(df)

print(f"Valid records: {len(valid_records)}")
print(f"Errors: {len(errors)}")

print("\n=== Valid records ===")
for record in valid_records:
    print(f"Order {record.order_id}: {record.product} - {record.total:.2f}€")

print("\n=== Errors ===")
for error in errors:
    print(f"\nRow {error['row']}:")
    for err in error['errors']:
        print(f"  - {err['loc'][0]}: {err['msg']}")

## 11. Common Pitfalls

### Pitfall 1: Strict vs Lax Validation

In [None]:
from pydantic import ConfigDict

# Strict mode: no automatic conversion
class StrictModel(BaseModel):
    model_config = ConfigDict(strict=True)
    
    value: int

# Normal (lax) mode: automatic conversion
class LaxModel(BaseModel):
    value: int

# Lax: "10" is converted to 10
lax = LaxModel(value="10")
print(f"✅ Lax mode: {lax.value} (type: {type(lax.value)})")

# Strict: error, a str is not accepted for an int field
try:
    strict = StrictModel(value="10")
except ValidationError as e:
    print(f"\n❌ Strict mode: {e.errors()[0]['msg']}")

print("\n💡 Tip:")
print("- Lax mode (default): convenient, but can hide upstream data errors")
print("- Strict mode: safer for APIs and critical pipelines")
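
Strictness does not have to be all-or-nothing. Pydantic v2 also accepts `strict=True` on a single `Field`, or for one call with `model_validate(..., strict=True)`. A sketch with a hypothetical `Payment` model where only `amount` is strict:

```python
from pydantic import BaseModel, Field, ValidationError

class Payment(BaseModel):
    amount: int = Field(strict=True)  # must already be an int
    reference: int                    # lax: "42" is converted to 42

ok = Payment(amount=100, reference="42")
print(ok)

failures = []
try:
    Payment(amount="100", reference=42)  # the strict field receives a str
except ValidationError:
    failures.append("field-level strict")

try:
    # Strict for this call only, even though the model is lax by default
    Payment.model_validate({"amount": 100, "reference": "42"}, strict=True)
except ValidationError:
    failures.append("call-level strict")

print(failures)
```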

### Pitfall 2: Performance with Large Volumes

In [None]:
import time

class SimpleRecord(BaseModel):
    id: int
    name: str
    value: float

# Generate data
n = 10000
data = [
    {"id": i, "name": f"Item {i}", "value": i * 1.5}
    for i in range(n)
]

# Row-by-row Pydantic validation: full checks (types, constraints, validators),
# but one Python-level call per record
start = time.perf_counter()
records = [SimpleRecord.model_validate(d) for d in data]
time_individual = time.perf_counter() - start

# Column-level dtype check with pandas: much cheaper, but it does NOT check
# per-value constraints or validators, so it is not equivalent to the above
start = time.perf_counter()
df = pd.DataFrame(data)
assert pd.api.types.is_integer_dtype(df['id'])
assert pd.api.types.is_string_dtype(df['name'])
assert pd.api.types.is_float_dtype(df['value'])
time_batch = time.perf_counter() - start

print(f"Checking {n} records:")
print(f"  Row by row (Pydantic): {time_individual:.4f}s")
print(f"  dtype check (pandas): {time_batch:.4f}s")
print(f"  Ratio: {time_individual / max(time_batch, 1e-9):.1f}x")
print("\n💡 For large volumes, validate in batches or on a sample rather than row by row")
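
Another option for large volumes, still with full validation, is to validate the whole list in one call with `TypeAdapter(List[Model])`: the loop then runs inside pydantic-core instead of Python, which typically reduces per-record overhead (measure on your own data). Errors carry the row index at the start of `loc`. A sketch:

```python
from typing import List
from pydantic import BaseModel, TypeAdapter, ValidationError

class SimpleRecord(BaseModel):
    id: int
    name: str
    value: float

# Build the adapter once and reuse it: constructing it has a cost
records_adapter = TypeAdapter(List[SimpleRecord])

rows = [
    {"id": 1, "name": "Item 1", "value": 1.5},
    {"id": "oops", "name": "Item 2", "value": 3.0},  # invalid id
]

try:
    records = records_adapter.validate_python(rows)
except ValidationError as e:
    # loc starts with the list index, then the field name
    bad_locs = [err["loc"] for err in e.errors()]
    print(bad_locs)

valid = records_adapter.validate_python(rows[:1])
print(valid)
```

The trade-off: a single bad row makes the whole call raise, so to keep the valid rows you still need to filter using the reported indices or fall back to row-by-row validation for that batch.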

### Pitfall 3: Models Are Mutable by Default

In [None]:
# By default, Pydantic models are mutable
class MutableModel(BaseModel):
    value: int

m = MutableModel(value=10)
m.value = 20  # OK
print(f"Mutable: {m.value}")

# Immutability with frozen=True
class ImmutableModel(BaseModel):
    model_config = ConfigDict(frozen=True)
    
    value: int

im = ImmutableModel(value=10)
try:
    im.value = 20
except ValidationError as e:
    print(f"\n❌ Immutable: {e.errors()[0]['msg']}")

print("\n💡 Use frozen=True for immutable (and hashable) models; nested mutable values such as lists can still be changed in place")
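
A related pitfall: on a mutable model, assignments are not validated by default, so `m.value = "not an int"` is accepted silently. `ConfigDict(validate_assignment=True)` re-runs validation (and conversion) on every assignment. A sketch with hypothetical `Unchecked` and `Checked` models:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class Unchecked(BaseModel):
    value: int

class Checked(BaseModel):
    model_config = ConfigDict(validate_assignment=True)
    value: int

u = Unchecked(value=10)
u.value = "not an int"  # no error: the assignment bypasses validation
print(repr(u.value))

c = Checked(value=10)
c.value = "20"          # validated and converted to 20
try:
    c.value = "not an int"
except ValidationError as e:
    assignment_error = e.errors()[0]["type"]
    print(assignment_error)
print(c.value)          # the failed assignment left the value unchanged
```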

## 12. Mini-Exercises

### Exercise 1: Order Model for the Dataset

Create an `OrderRecord` model to validate the e-commerce dataset, with:
1. All the required fields (order_id, date, product, etc.)
2. Constraints: order_id > 0, quantity between 1 and 100, unit_price > 0
3. A validator that computes the total automatically
4. A validator that checks that the product belongs to an allowed list
5. Tests with valid and invalid data

In [None]:
# Your code here


### Exercise 2: Settings for a DB Configuration

Create a `DatabaseSettings` model with:
1. host, port, username, password, database
2. Sensible default values
3. A method that builds a connection URL
4. A .env file from which the configuration is loaded

In [None]:
# Your code here


### Exercise 3: Validation Pipeline

Write a `validate_pipeline()` function that:
1. Takes a DataFrame as input
2. Validates each row with a Pydantic model
3. Returns a report with: number of valid rows, number of invalid rows, error details
4. Saves the valid records to a CSV file
5. Saves the errors to a JSON file

In [None]:
# Your code here


---

## Exercise Solutions

### Solution to Exercise 1

In [None]:
ALLOWED_PRODUCTS = ['Laptop', 'Phone', 'Tablet', 'Headphones', 'Monitor']

class OrderRecord(BaseModel):
    order_id: int = Field(..., gt=0)
    date: date
    product: str
    category: str
    quantity: int = Field(..., ge=1, le=100)
    unit_price: float = Field(..., gt=0)
    customer_id: str
    city: str
    total: Optional[float] = None
    
    @field_validator('product')
    @classmethod
    def validate_product(cls, v: str) -> str:
        if v not in ALLOWED_PRODUCTS:
            raise ValueError(f'Product not allowed: {v}. Allowed: {ALLOWED_PRODUCTS}')
        return v
    
    @model_validator(mode='after')
    def calculate_total(self):
        if self.total is None:
            self.total = round(self.quantity * self.unit_price, 2)
        return self

# Tests
print("=== OrderRecord tests ===")

# Valid
order = OrderRecord(
    order_id=1,
    date="2024-01-01",
    product="Laptop",
    category="Electronics",
    quantity=2,
    unit_price=999.99,
    customer_id="C001",
    city="Paris"
)
print(f"\n✅ Valid: Order {order.order_id}, Total: {order.total}€")

# Invalid product
try:
    OrderRecord(
        order_id=2,
        date="2024-01-02",
        product="InvalidProduct",
        category="Electronics",
        quantity=1,
        unit_price=100,
        customer_id="C002",
        city="Lyon"
    )
except ValidationError as e:
    print(f"\n❌ Invalid product: {e.errors()[0]['msg']}")

# Invalid quantity
try:
    OrderRecord(
        order_id=3,
        date="2024-01-03",
        product="Phone",
        category="Electronics",
        quantity=150,
        unit_price=699.99,
        customer_id="C003",
        city="Marseille"
    )
except ValidationError as e:
    print(f"\n❌ Invalid quantity: {e.errors()[0]['msg']}")

### Solution to Exercise 2

In [None]:
class DatabaseSettings(BaseSettings):
    host: str = "localhost"
    port: int = 5432
    username: str
    password: str
    database: str
    
    model_config = SettingsConfigDict(
        env_file='/tmp/.env.db',
        env_prefix='DB_'
    )
    
    def get_connection_url(self) -> str:
        return f"postgresql://{self.username}:{self.password}@{self.host}:{self.port}/{self.database}"

# Create the .env file (it is read when DatabaseSettings() is instantiated, not when the class is defined)
env_db_content = """
DB_HOST=db.example.com
DB_PORT=5432
DB_USERNAME=myuser
DB_PASSWORD=mypassword
DB_DATABASE=mydatabase
"""

with open('/tmp/.env.db', 'w') as f:
    f.write(env_db_content)

# Load
db_settings = DatabaseSettings()

print("DB configuration:")
print(f"Host: {db_settings.host}")
print(f"Port: {db_settings.port}")
print(f"Database: {db_settings.database}")
print(f"\nConnection URL: {db_settings.get_connection_url()}")
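
`get_connection_url()` above interpolates the password as-is, so a password containing `@`, `:` or `/` would produce a broken URL. A common fix is to percent-encode the credentials with `urllib.parse.quote`. A standalone sketch (the function name `build_postgres_url` is hypothetical):

```python
from urllib.parse import quote

def build_postgres_url(username: str, password: str, host: str,
                       port: int, database: str) -> str:
    # safe="" also encodes "/", which quote() leaves alone by default
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"postgresql://{user}:{pwd}@{host}:{port}/{database}"

url = build_postgres_url("myuser", "p@ss:w/rd", "db.example.com", 5432, "mydatabase")
print(url)
```

Inside `DatabaseSettings`, `get_connection_url()` could delegate to such a helper with `self.username`, `self.password` and the other fields.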

### Solution to Exercise 3

In [None]:
import json
from pathlib import Path

def validate_pipeline(df: pd.DataFrame, model_class: type[BaseModel], 
                     output_dir: str = '/tmp') -> dict:
    """
    Validate a DataFrame with a Pydantic model.
    Save valid rows to CSV and errors to JSON.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    
    valid_records = []
    errors = []
    
    # Validation
    for idx, row in df.iterrows():
        try:
            record = model_class.model_validate(row.to_dict())
            valid_records.append(record.model_dump())
        except ValidationError as e:
            errors.append({
                'row_index': int(idx),
                'row_data': row.to_dict(),
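                'errors': [
                    # loc is a tuple: () for model-level errors, several parts for nested fields
                    {'field': '.'.join(str(p) for p in err['loc']) or '__model__',
                     'message': err['msg']}
                    for err in e.errors()
                ]
            })
    
    # Save the valid records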
    if valid_records:
        df_valid = pd.DataFrame(valid_records)
        valid_path = output_dir / 'valid_records.csv'
        df_valid.to_csv(valid_path, index=False)
    
    # Save the errors
    if errors:
        errors_path = output_dir / 'validation_errors.json'
        with open(errors_path, 'w', encoding='utf-8') as f:
            json.dump(errors, f, indent=2, default=str)
    
    # Report
    report = {
        'total_rows': len(df),
        'valid_rows': len(valid_records),
        'invalid_rows': len(errors),
        'success_rate': len(valid_records) / len(df) * 100 if len(df) > 0 else 0,
        'errors_summary': {}
    }
    
    # Error count per field
    for error in errors:
        for err in error['errors']:
            field = err['field']
            report['errors_summary'][field] = report['errors_summary'].get(field, 0) + 1
    
    return report

# Test
test_data = pd.DataFrame({
    "order_id": [1, 2, -3, 4],
    "date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"],
    "product": ["Laptop", "Phone", "InvalidProduct", "Tablet"],
    "category": ["Electronics"] * 4,
    "quantity": [1, 2, 1, 150],
    "unit_price": [999.99, 699.99, 449.99, 79.99],
    "customer_id": ["C001", "C002", "C003", "C004"],
    "city": ["Paris", "Lyon", "Marseille", "Toulouse"],
})

report = validate_pipeline(test_data, OrderRecord)

print("=== Validation Report ===")
print(f"Total: {report['total_rows']}")
print(f"Valid: {report['valid_rows']}")
print(f"Invalid: {report['invalid_rows']}")
print(f"Success rate: {report['success_rate']:.1f}%")
print("\nErrors per field:")
for field, count in report['errors_summary'].items():
    print(f"  {field}: {count}")

print("\nGenerated files:")
print("  - /tmp/valid_records.csv")
print("  - /tmp/validation_errors.json")