# 🚀 Data Services with FastAPI

Goal: build a data microservice with FastAPI using Pydantic models, secured endpoints, and basic tests, with optional caching and rate limiting.

- Duration: 90–120 min
- Difficulty: Intermediate
- Prerequisites: intermediate Python, basic HTTP, Pydantic

## 0. Dependencies

- Required to run: `fastapi`, `uvicorn`, `pydantic` (install them if not already available).
- Optional: `redis` for an external cache and `slowapi` for rate limiting.
- This notebook only shows the code; to run the server, start Uvicorn outside the notebook.

### 🚀 **FastAPI: A Modern Framework for Data APIs**

**Why FastAPI for Data Engineering?**

1. **Performance**:
   - Built on Starlette (async ASGI)
   - Commonly cited as roughly 3x faster than Flask (sync WSGI) in benchmarks; actual gains depend on the workload and deployment
   - Often placed in the same range as Node.js and Go frameworks for I/O-bound services

2. **Type Safety**:
   - Automatic validation with Pydantic
   - IDE autocomplete (IntelliSense)
   - Catches many type and validation bugs before they reach production

3. **Auto-Documentation**:
   - Automatic OpenAPI/Swagger UI (`/docs`)
   - Alternative ReDoc UI (`/redoc`)
   - Client SDK generation from the OpenAPI schema

4. **Async Support**:
   - Native async/await
   - Useful for I/O-bound tasks (queries, API calls)
**Use Cases in Data Engineering:**

```
┌─────────────────────────────────────┐
│  Data Product APIs                  │
├─────────────────────────────────────┤
│ • Feature Store (ML features)       │
│ • Metrics API (real-time KPIs)      │
│ • Data Catalog (metadata search)    │
│ • Query Gateway (SQL→API)           │
│ • Data Quality Dashboard            │
└─────────────────────────────────────┘
```

**Comparison with Flask:**

| Aspect | Flask | FastAPI |
|--------|-------|---------|
| **Execution model** | WSGI, sync-first | ASGI, async-first |
| **Validation** | Manual or via extensions | Automatic (Pydantic) |
| **Docs** | Via extensions (e.g. Flasgger) | Automatic (OpenAPI) |
| **Type Hints** | Optional | Core feature |
| **Async** | `async` views since Flask 2.0, still served over WSGI | Native |
| **Testing** | Built-in test client; unittest/pytest | `TestClient` (from Starlette); pytest |

**Recommended Stack:**
```
FastAPI (Web Framework)
  ↓
Uvicorn (ASGI Server) → Gunicorn (Process Manager)
  ↓
Nginx (Reverse Proxy, Load Balancer)
  ↓
Kubernetes/Docker
```

**Microservice Layout:**
```
app/
├── main.py              # FastAPI app instance
├── models/
│   ├── schemas.py       # Pydantic models (request/response)
│   └── database.py      # SQLAlchemy models
├── routers/
│   ├── ventas.py        # /ventas endpoints
│   └── clientes.py      # /clientes endpoints
├── services/
│   └── ventas_service.py  # Business logic
├── dependencies.py      # Dependency injection
└── config.py            # Settings (env vars)
```

---
**Author:** Luis J. Raigoso V. (LJRV)

## 1. Basic FastAPI App

### 🏗️ **FastAPI Fundamentals: Request/Response Lifecycle**

**Pydantic Models (Data Validation):**

```python
from datetime import datetime
from typing import Optional

# Pydantic v1-style API. In Pydantic v2, `validator` becomes `field_validator`,
# `schema_extra` becomes `json_schema_extra`, and `orm_mode` becomes `from_attributes`.
from pydantic import BaseModel, Field, validator

class VentaCreate(BaseModel):
    """Request model (input)"""
    cliente_id: int = Field(..., gt=0, description="Customer ID")
    producto_id: int = Field(..., gt=0)
    cantidad: int = Field(default=1, ge=1, le=1000)
    total: float = Field(..., ge=0)

    @validator('total')
    def validate_total(cls, v, values):
        # `values` holds the fields validated so far; `cantidad` is absent if it failed validation
        if 'cantidad' in values and v < values['cantidad'] * 0.01:
            raise ValueError('Total suspiciously low')
        return v

    class Config:
        schema_extra = {
            "example": {
                "cliente_id": 10,
                "producto_id": 101,
                "cantidad": 2,
                "total": 100.50
            }
        }

class VentaResponse(BaseModel):
    """Response model (output)"""
    venta_id: int
    cliente_id: int
    producto_id: int
    cantidad: int
    total: float
    created_at: datetime

    class Config:
        orm_mode = True  # Allows building the model from SQLAlchemy objects
```

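**HTTP Methods and Status Codes:**

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional

from fastapi import FastAPI, HTTPException

app = FastAPI()

# In-memory store standing in for a real database (illustrative only)
_DB: Dict[int, dict] = {}

def _get_or_404(venta_id: int) -> dict:
    if venta_id not in _DB:
        raise HTTPException(status_code=404, detail="Sale not found")
    return _DB[venta_id]

@app.get('/ventas', response_model=List[VentaResponse])
def listar_ventas(
    skip: int = 0,
    limit: int = 100,
    cliente_id: Optional[int] = None
):
    """GET: Retrieve resources"""
    # Query params: /ventas?skip=0&limit=10&cliente_id=5
    ventas = [v for v in _DB.values() if cliente_id is None or v["cliente_id"] == cliente_id]
    return ventas[skip:skip + limit]

@app.get('/ventas/{venta_id}', response_model=VentaResponse)
def obtener_venta(venta_id: int):
    """GET by ID: Retrieve single resource"""
    return _get_or_404(venta_id)

@app.post('/ventas', response_model=VentaResponse, status_code=201)
def crear_venta(venta: VentaCreate):
    """POST: Create new resource"""
    # Body: JSON is parsed and validated automatically
    venta_id = max(_DB, default=0) + 1
    _DB[venta_id] = {"venta_id": venta_id, "created_at": datetime.now(timezone.utc), **venta.dict()}
    return _DB[venta_id]

@app.put('/ventas/{venta_id}', response_model=VentaResponse)
def actualizar_venta(venta_id: int, venta: VentaCreate):
    """PUT: Full update (replace entire resource)"""
    actual = _get_or_404(venta_id)
    _DB[venta_id] = {"venta_id": venta_id, "created_at": actual["created_at"], **venta.dict()}
    return _DB[venta_id]

@app.patch('/ventas/{venta_id}', response_model=VentaResponse)
def actualizar_parcial(venta_id: int, updates: dict):
    """PATCH: Partial update (modify specific fields)"""
    actual = _get_or_404(venta_id)
    # Only accept fields defined on the input model; values are not re-validated in this sketch
    actual.update({k: v for k, v in updates.items() if k in VentaCreate.__fields__})
    return actual

@app.delete('/ventas/{venta_id}', status_code=204)
def eliminar_venta(venta_id: int):
    """DELETE: Remove resource"""
    _get_or_404(venta_id)
    del _DB[venta_id]
    # 204 No Content: success with no response body
```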
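
The `skip`, `limit`, and `cliente_id` query parameters shown above amount to a filter followed by a slice. A minimal sketch in plain Python, assuming the records are dictionaries held in a list; the `paginate` helper and the sample data are illustrative and not part of FastAPI:

```python
from typing import List, Optional

def paginate(rows: List[dict], skip: int = 0, limit: int = 100,
             cliente_id: Optional[int] = None) -> List[dict]:
    """Filter by cliente_id (when given), then return the requested page."""
    if cliente_id is not None:
        rows = [r for r in rows if r["cliente_id"] == cliente_id]
    return rows[skip:skip + limit]

# Hypothetical sample data: odd IDs belong to customer 5, even IDs to customer 7
ventas = [{"venta_id": i, "cliente_id": 5 if i % 2 else 7} for i in range(1, 11)]

page = paginate(ventas, skip=1, limit=2, cliente_id=5)
print([v["venta_id"] for v in page])  # [3, 5]
```

Filtering before slicing keeps page boundaries stable for a given filter; with a real database the same idea maps to `WHERE` plus `OFFSET`/`LIMIT`.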

**Common Status Codes:**
- `200 OK`: Success (GET, PUT, PATCH)
- `201 Created`: Resource created (POST)
- `204 No Content`: Success without body (DELETE)
- `400 Bad Request`: Malformed request or application-level validation error
- `401 Unauthorized`: Authentication required
- `403 Forbidden`: Authenticated but no permission
- `404 Not Found`: Resource doesn't exist
- `409 Conflict`: Duplicate resource
- `422 Unprocessable Entity`: Pydantic validation failed
- `500 Internal Server Error`: Unexpected error

**Error Handling:**
```python
from fastapi import Request
from fastapi.responses import JSONResponse

@app.exception_handler(ValueError)
async def value_error_handler(request: Request, exc: ValueError):
    return JSONResponse(
        status_code=400,
        content={"detail": str(exc)}
    )

# Custom exception
class VentaNotFoundError(Exception):
    pass

@app.exception_handler(VentaNotFoundError)
async def venta_not_found_handler(request: Request, exc: VentaNotFoundError):
    return JSONResponse(
        status_code=404,
        content={"detail": "Sale not found", "code": "VENTA_NOT_FOUND"}
    )
```


In [None]:
from typing import List, Optional
from pydantic import BaseModel, Field
from datetime import datetime

# Code for a module at src/app.py (shown here for illustration inside the notebook)
app_code = r'''
from fastapi import FastAPI, HTTPException, Depends
from pydantic import BaseModel, Field
from typing import List
from functools import lru_cache

app = FastAPI(title='Servicio de Datos Demo')

class Venta(BaseModel):
    venta_id: int = Field(..., gt=0)
    cliente_id: int
    total: float = Field(..., ge=0)

_DB = {1: Venta(venta_id=1, cliente_id=10, total=100.0)}

@lru_cache(maxsize=1024)
def get_precio_producto(producto_id: int) -> float:
    return 42.0

@app.get('/ventas', response_model=List[Venta])
def listar_ventas():
    return list(_DB.values())

@app.get('/ventas/{venta_id}', response_model=Venta)
def obtener_venta(venta_id: int):
    v = _DB.get(venta_id)
    if not v:
        raise HTTPException(404, 'No existe')
    return v

@app.post('/ventas', response_model=Venta, status_code=201)
def crear_venta(v: Venta):
    if v.venta_id in _DB:
        raise HTTPException(409, 'Duplicado')
    _DB[v.venta_id] = v
    return v

'''
# Show the first 30 lines as text rather than as a list repr
print('\n'.join(app_code.splitlines()[:30]))

### 🔧 **Dependency Injection: Clean Architecture**

**What is Dependency Injection?**

A pattern for passing dependencies (DB connections, config, services) into endpoints without tight coupling.

**Example: Database Session**

```python
from fastapi import Depends
from sqlalchemy.orm import Session
from database import SessionLocal

def get_db():
    """Dependency that provides a DB session"""
    db = SessionLocal()
    try:
        yield db  # The endpoint uses this session
    finally:
        db.close()  # Cleanup runs even if the endpoint raised

@app.get('/ventas')
def listar_ventas(db: Session = Depends(get_db)):
    """`db` is injected automatically"""
    return db.query(Venta).all()  # `Venta` here is the SQLAlchemy model
```

**Advantages:**
- ✅ Testing: dependencies are easy to mock or override
- ✅ Reusability: the same dependency serves multiple endpoints
- ✅ Cleanup: `finally` guarantees resources are closed
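
To make the override and cleanup behaviour concrete, here is a framework-free sketch of the same idea. Everything in it (`FakeSession`, `overrides`, `call_endpoint`) is hypothetical scaffolding that imitates what `Depends` and `app.dependency_overrides` do; it is not FastAPI's implementation:

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator

class FakeSession:
    """Stand-in for a DB session that records whether it was closed."""
    def __init__(self, name: str):
        self.name = name
        self.closed = False

    def close(self) -> None:
        self.closed = True

created = []  # every session ever handed out, so the example can inspect them

def _provide(name: str) -> Iterator[FakeSession]:
    db = FakeSession(name)
    created.append(db)
    try:
        yield db
    finally:
        db.close()  # runs on success and on failure

def get_db() -> Iterator[FakeSession]:
    yield from _provide("prod")

def get_test_db() -> Iterator[FakeSession]:
    yield from _provide("test")

# Hypothetical registry playing the role of app.dependency_overrides
overrides: Dict[Callable, Callable] = {}

def call_endpoint(endpoint: Callable, dependency: Callable):
    """Resolve the dependency (or its override), run the endpoint, then clean up."""
    provider = overrides.get(dependency, dependency)
    with contextmanager(provider)() as db:
        return endpoint(db)

def which_db(db: FakeSession) -> str:
    return db.name

def failing(db: FakeSession) -> str:
    raise RuntimeError("endpoint failed")

first = call_endpoint(which_db, get_db)    # real provider
overrides[get_db] = get_test_db
second = call_endpoint(which_db, get_db)   # override, endpoint code unchanged
try:
    call_endpoint(failing, get_db)
except RuntimeError:
    pass

print(first, second)                       # prod test
print(all(s.closed for s in created))      # True: closed even after the failure
```

The endpoint never names a concrete provider, which is what lets a test swap the database without touching endpoint code.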

**Authentication Dependency:**

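```python
import secrets

from fastapi import APIRouter, Depends, Header, HTTPException

# Illustrative only: load the expected token from configuration, never hard-code it
EXPECTED_TOKEN = "secret-token"

def verify_token(x_token: str = Header()):
    """Verify the token sent in the X-Token header"""
    # compare_digest avoids leaking information through comparison timing
    if not secrets.compare_digest(x_token.encode(), EXPECTED_TOKEN.encode()):
        raise HTTPException(401, "Invalid token")
    return x_token

@app.get('/admin/ventas')
def admin_ventas(token: str = Depends(verify_token)):
    """Protected endpoint"""
    return {"message": "Admin access"}

# Apply to multiple endpoints
admin_router = APIRouter(dependencies=[Depends(verify_token)])

@admin_router.get('/ventas')
def get_ventas():
    # The token has already been validated by the router-level dependency
    pass
```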

**Caching with lru_cache:**

```python
from functools import lru_cache

from fastapi import Depends
# Pydantic v1 import; with Pydantic v2 use `from pydantic_settings import BaseSettings`
from pydantic import BaseSettings

class Settings(BaseSettings):
    database_url: str
    api_key: str

    class Config:
        env_file = ".env"

@lru_cache()
def get_settings():
    """Singleton: Settings is built once and reused"""
    return Settings()

@app.get('/config')
def show_config(settings: Settings = Depends(get_settings)):
    # Illustrative: avoid exposing connection strings from a real endpoint
    return {"db": settings.database_url}
```
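
The singleton effect of `@lru_cache()` on a zero-argument function can be checked without Pydantic. The sketch below uses a plain dataclass as a stand-in for `Settings`; the environment variable names and default values are assumptions chosen for the example:

```python
import os
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class Settings:
    database_url: str
    api_key: str

@lru_cache()
def get_settings() -> Settings:
    # The environment is read once; later calls return the cached instance
    return Settings(
        database_url=os.environ.get("DATABASE_URL", "sqlite:///./dev.db"),
        api_key=os.environ.get("API_KEY", "dev-key"),
    )

first = get_settings()
os.environ["DATABASE_URL"] = "postgresql://changed"  # not picked up: already cached
second = get_settings()
print(first is second)  # True

get_settings.cache_clear()  # force a re-read, e.g. between tests
print(get_settings().database_url)  # postgresql://changed
```

The flip side is that configuration changes are invisible until the cache is cleared or the process restarts, which is usually the desired behaviour for settings.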

**Service Layer Pattern:**

```python
# services/ventas_service.py
from fastapi import Depends
from sqlalchemy.orm import Session

class VentasService:
    def __init__(self, db: Session):
        self.db = db

    def crear_venta(self, venta_data: VentaCreate) -> Venta:
        # Business logic goes here. `Venta` is the SQLAlchemy model;
        # `.dict()` is the Pydantic v1 name for `.model_dump()`
        venta = Venta(**venta_data.dict())
        self.db.add(venta)
        self.db.commit()
        self.db.refresh(venta)  # Reload DB-generated fields such as the primary key
        return venta

# Dependency
def get_ventas_service(db: Session = Depends(get_db)):
    return VentasService(db)

# Endpoint
@app.post('/ventas')
def crear_venta(
    venta: VentaCreate,
    service: VentasService = Depends(get_ventas_service)
):
    return service.crear_venta(venta)
```

**Sub-dependencies (Composition):**

```python
from fastapi import Depends, HTTPException

# `oauth2_scheme`, `decode_token`, and `User` are assumed to be defined elsewhere
def get_current_user(token: str = Depends(oauth2_scheme)):
    user = decode_token(token)
    if not user:
        raise HTTPException(401)
    return user

def get_current_active_user(
    current_user: User = Depends(get_current_user)
):
    if not current_user.is_active:
        raise HTTPException(403, "Inactive user")
    return current_user

@app.get('/me')
def read_me(user: User = Depends(get_current_active_user)):
    # Only active users reach this point
    return user
```


### 1.1 Run with Uvicorn (optional)

### 🧪 **Testing FastAPI: TestClient y Fixtures**

**TestClient (Synchronous Testing):**

```python
from fastapi.testclient import TestClient
from app.main import app
import pytest

client = TestClient(app)

def test_listar_ventas():
    response = client.get("/ventas")
    assert response.status_code == 200
    assert isinstance(response.json(), list)

def test_crear_venta():
    nueva_venta = {
        "cliente_id": 10,
        "producto_id": 101,
        "cantidad": 2,
        "total": 100.0
    }
    response = client.post("/ventas", json=nueva_venta)
    assert response.status_code == 201
    data = response.json()
    assert data["cliente_id"] == 10
    assert "venta_id" in data

def test_venta_not_found():
    response = client.get("/ventas/99999")
    assert response.status_code == 404
    assert "detail" in response.json()
```

**Fixtures para Testing:**

```python
import pytest
from fastapi.testclient import TestClient
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from database import Base
from main import app, get_db

# Test database
SQLALCHEMY_DATABASE_URL = "sqlite:///./test.db"
engine = create_engine(SQLALCHEMY_DATABASE_URL, connect_args={"check_same_thread": False})
TestingSessionLocal = sessionmaker(bind=engine)

@pytest.fixture
def test_db():
    """Create test DB before test, drop after"""
    Base.metadata.create_all(bind=engine)
    yield
    Base.metadata.drop_all(bind=engine)

@pytest.fixture
def client(test_db):
    """Override get_db dependency"""
    def override_get_db():
        db = TestingSessionLocal()
        try:
            yield db
        finally:
            db.close()
    
    app.dependency_overrides[get_db] = override_get_db
    with TestClient(app) as c:
        yield c
    app.dependency_overrides.clear()

def test_with_fixture(client):
    response = client.get("/ventas")
    assert response.status_code == 200
```

**Parametrized Tests:**

```python
@pytest.mark.parametrize("cliente_id,producto_id,cantidad,expected_status", [
    (10, 101, 1, 201),      # Valid
    (-1, 101, 1, 422),      # Invalid cliente_id
    (10, 101, 0, 422),      # Invalid cantidad
    (10, 101, 1001, 422),   # Cantidad > max
])
def test_crear_venta_validation(client, cliente_id, producto_id, cantidad, expected_status):
    response = client.post("/ventas", json={
        "cliente_id": cliente_id,
        "producto_id": producto_id,
        "cantidad": cantidad,
        "total": 100.0
    })
    assert response.status_code == expected_status
```

**Async Testing:**

```python
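import pytest
from httpx import ASGITransport, AsyncClient
from app.main import app

# Requires the pytest-asyncio plugin for the `asyncio` marker
@pytest.mark.asyncio
async def test_async_endpoint():
    # Recent httpx releases dropped the `app=` shortcut; pass an ASGITransport instead
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as ac:
        response = await ac.get("/ventas")
    assert response.status_code == 200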
```

**Mocking External APIs:**

```python
from unittest.mock import patch

@patch('app.services.external_api.get_precio')
def test_con_mock(mock_get_precio, client):
    mock_get_precio.return_value = 50.0
    
    response = client.post("/ventas", json={...})
    
    assert response.status_code == 201
    mock_get_precio.assert_called_once_with(producto_id=101)
```

**Coverage:**

```bash
pytest --cov=app --cov-report=html
# Genera htmlcov/index.html
```


In [None]:
print('To run from a terminal:')
print('uvicorn src.app:app --reload --port 8000')

## 2. Testing with requests (smoke test)

### ⚡ **Performance: Caching y Async I/O**

**In-Memory Caching (lru_cache):**

```python
from functools import lru_cache
import time

@lru_cache(maxsize=128)
def expensive_computation(param: str):
    """Caches results in memory (per process)"""
    time.sleep(2)  # Simulates an expensive operation
    return f"Result for {param}"

@app.get('/compute/{param}')
def compute(param: str):
    # First call for a given param: ~2 s
    # Later calls with the same param: served from the cache
    return {"result": expensive_computation(param)}

# Clear the cache manually
expensive_computation.cache_clear()
```
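
`lru_cache` also reports hits and misses through `cache_info()`, and evicts the least recently used entry once `maxsize` is reached. A small sketch with a deliberately tiny cache; the `square` function and `calls` counter are illustrative:

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=2)
def square(n: int) -> int:
    global calls
    calls += 1  # counts real executions, not cache hits
    return n * n

square(2)  # miss
square(2)  # hit
square(3)  # miss
info = square.cache_info()
print(info.hits, info.misses)  # 1 2

square(4)  # miss; the cache is full, so the least recently used key (2) is evicted
square(2)  # miss again: recomputed
print(calls)  # 4
```

This is why `maxsize` deserves some thought for endpoints with many distinct parameters: a cache that is too small keeps evicting entries before they are reused.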

**Redis Caching (Distributed):**

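```python
import json

from fastapi import Depends
from redis import Redis
from sqlalchemy.orm import Session

redis_client = Redis(host='localhost', port=6379, decode_responses=True)

def get_ventas_cached(cliente_id: int, db: Session) -> list:
    # Try the cache first
    cache_key = f"ventas:cliente:{cliente_id}"
    cached = redis_client.get(cache_key)

    if cached:
        return json.loads(cached)

    # Cache miss: query the DB
    rows = db.query(Venta).filter(Venta.cliente_id == cliente_id).all()

    # ORM objects are not JSON-serializable; convert through the response model
    # (Pydantic v1 API; `default=str` handles datetime fields)
    ventas = [VentaResponse.from_orm(row).dict() for row in rows]

    # Store in the cache (TTL: 5 min)
    redis_client.setex(cache_key, 300, json.dumps(ventas, default=str))

    return ventas

@app.get('/ventas/cliente/{cliente_id}')
def get_ventas(cliente_id: int, db: Session = Depends(get_db)):
    # Sync endpoint: FastAPI runs it in a thread pool, so the blocking
    # Redis client does not stall the event loop
    return get_ventas_cached(cliente_id, db)
```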
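
The pattern above is usually called cache-aside: read from the cache, fall back to the source on a miss, then store the result with a TTL. The sketch below reproduces it without a Redis server so the expiry logic can be followed step by step. `TTLCache` and `get_or_load` are hypothetical helpers that only mimic the `get`/`setex` calls used above, and the injected clock is an assumption that makes expiry visible without sleeping:

```python
import time
from typing import Any, Callable, Dict, Tuple

class TTLCache:
    """Tiny in-process stand-in for Redis GET/SETEX semantics."""
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def setex(self, key: str, ttl_seconds: float, value: Any) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)

def get_or_load(cache: TTLCache, key: str, ttl: float, loader: Callable[[], Any]):
    """Cache-aside: try the cache, fall back to the loader, then store the result."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = loader()
    cache.setex(key, ttl, value)
    return value

# Fake clock and fake "database" so the behaviour is deterministic
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
db_hits = []

def load_ventas():
    db_hits.append(1)
    return [{"venta_id": 1, "total": 100.0}]

get_or_load(cache, "ventas:cliente:10", 300, load_ventas)  # miss: loads
get_or_load(cache, "ventas:cliente:10", 300, load_ventas)  # hit
now[0] = 301.0
get_or_load(cache, "ventas:cliente:10", 300, load_ventas)  # expired: loads again
print(len(db_hits))  # 2
```

Unlike this in-process dictionary, Redis is shared by every worker and instance, which is the reason to prefer it once the service runs as more than one process.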

**Async Database Queries:**

```python
from fastapi import Depends
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine

# Async engine
engine = create_async_engine("postgresql+asyncpg://user:pass@localhost/db")

async def get_db_async():
    async with AsyncSession(engine) as session:
        yield session

@app.get('/ventas')
async def listar_ventas(db: AsyncSession = Depends(get_db_async)):
    result = await db.execute(select(Venta))
    return result.scalars().all()
```

**Async External API Calls:**

```python
import asyncio

import httpx

async def fetch_external_data(url: str):
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        response.raise_for_status()  # Surface HTTP errors instead of parsing an error body
        return response.json()

@app.get('/combined')
async def combined_data():
    # Both requests run concurrently; gather waits for both without blocking the event loop
    api1, api2 = await asyncio.gather(
        fetch_external_data("https://api1.com/data"),
        fetch_external_data("https://api2.com/data"),
    )

    return {"api1": api1, "api2": api2}
```
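
The interleaving that `asyncio.gather` produces can be observed without any network access. In this sketch `fake_fetch` is a hypothetical stand-in for the HTTP call, with `asyncio.sleep` playing the role of network latency:

```python
import asyncio

log = []

async def fake_fetch(name: str, delay: float) -> str:
    log.append(f"start {name}")
    await asyncio.sleep(delay)  # stands in for network I/O
    log.append(f"end {name}")
    return f"data from {name}"

async def main():
    # api1 is slower than api2 on purpose
    return await asyncio.gather(fake_fetch("api1", 0.02), fake_fetch("api2", 0.01))

results = asyncio.run(main())
print(log)      # both calls start before either one finishes
print(results)  # results keep the order of the arguments, not the completion order
```

Both coroutines start before either finishes, so the total wait is close to the slowest call rather than the sum of the two.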

**Background Tasks:**

```python
import time

from fastapi import BackgroundTasks

def send_email(email: str, message: str):
    """Task that runs in the background"""
    time.sleep(3)  # Simulates sending
    print(f"Email sent to {email}")

@app.post('/ventas')
def crear_venta(
    venta: VentaCreate,
    background_tasks: BackgroundTasks
):
    # Create the sale right away (`create_venta_in_db` is a placeholder)
    created = create_venta_in_db(venta)

    # Queue the email; it runs after the response has been sent
    background_tasks.add_task(send_email, "admin@example.com", f"New sale {created.id}")

    return created  # Immediate response
```

**Response Caching Headers:**

```python
from fastapi import Response

@app.get('/static-data')
def get_static_data(response: Response):
    # Browsers and CDNs may cache this for 1 hour
    response.headers["Cache-Control"] = "public, max-age=3600"
    return {"data": "rarely changes"}

@app.get('/dynamic-data')
def get_dynamic_data(response: Response):
    # Do not cache
    response.headers["Cache-Control"] = "no-cache, no-store, must-revalidate"
    return {"data": "changes frequently"}
```

**Compression (GZip):**

```python
from fastapi.middleware.gzip import GZipMiddleware

app.add_middleware(GZipMiddleware, minimum_size=1000)
# Compresses responses of at least 1000 bytes when the client sends Accept-Encoding: gzip
```


In [None]:
import textwrap
smoke_test = textwrap.dedent('''
# Run this while the server is up
import requests
base = 'http://localhost:8000'
print(requests.get(f'{base}/ventas').json())
''')
print(smoke_test)

## 3. Extensions: Redis cache and rate limiting [optional]

### 🛡️ **Security: Rate Limiting, CORS y Authentication**

**Rate Limiting (slowapi):**

```python
from fastapi import Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

# slowapi requires each limited endpoint to accept a `request: Request` argument
@app.get('/api/data')
@limiter.limit("5/minute")  # At most 5 requests per minute per client IP
async def limited_endpoint(request: Request):
    return {"message": "Rate limited"}

@app.get('/api/expensive')
@limiter.limit("10/hour")  # Stricter limit for expensive operations
async def expensive_operation(request: Request):
    return compute_heavy_task()  # Placeholder for the real work
```
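
To see what a limit such as `"5/minute"` means per client, the following sketch implements a simple sliding-window counter in plain Python. It is not how slowapi is implemented; `SlidingWindowLimiter` and its injected clock are hypothetical, used only to make the behaviour reproducible:

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each key (e.g. client IP)."""
    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have left the window
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # the caller would answer 429 Too Many Requests
        hits.append(now)
        return True

now = [0.0]
demo_limiter = SlidingWindowLimiter(max_requests=5, window_seconds=60, clock=lambda: now[0])

burst = [demo_limiter.allow("10.0.0.1") for _ in range(6)]
print(burst)                             # the sixth request is rejected
print(demo_limiter.allow("10.0.0.2"))    # other clients have their own budget
now[0] = 60.0
print(demo_limiter.allow("10.0.0.1"))    # allowed again once the window has passed
```

State kept in process memory like this is per worker; limits that must hold across several workers need shared storage, which is one reason to rely on a library instead of a hand-rolled counter.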

**CORS (Cross-Origin Resource Sharing):**

```python
from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://frontend.example.com"],  # Allowed origins
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "DELETE"],
    allow_headers=["*"],
)

# Development (permissive): an alternative to the block above, not an addition to it
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # ⚠️ Dev only!
    allow_methods=["*"],
    allow_headers=["*"],
)
```

**JWT Authentication:**

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

from fastapi import Depends, HTTPException
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from jose import JWTError, jwt  # provided by the python-jose package

SECRET_KEY = "your-secret-key"  # Illustrative: load from the environment in real deployments
ALGORITHM = "HS256"

security = HTTPBearer()

def create_access_token(data: dict, expires_delta: Optional[timedelta] = None):
    to_encode = data.copy()
    # Timezone-aware replacement for the deprecated datetime.utcnow()
    expire = datetime.now(timezone.utc) + (expires_delta or timedelta(minutes=15))
    to_encode.update({"exp": expire})
    return jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM)

def verify_token(credentials: HTTPAuthorizationCredentials = Depends(security)):
    token = credentials.credentials
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
    except JWTError:
        raise HTTPException(401, "Could not validate credentials")
    user_id: Optional[str] = payload.get("sub")
    if user_id is None:
        raise HTTPException(401, "Invalid token")
    return user_id

@app.post('/login')
def login(username: str, password: str):
    # Illustrative: plain parameters like these are read from the query string;
    # prefer a request body or form in practice. Credential validation is omitted.
    token = create_access_token({"sub": username})
    return {"access_token": token, "token_type": "bearer"}

@app.get('/protected')
def protected_route(user_id: str = Depends(verify_token)):
    return {"message": f"Hello {user_id}"}
```
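
For intuition about what an HS256 token contains, the sketch below builds and checks one with the standard library only: two base64url-encoded JSON segments plus an HMAC-SHA256 signature over them. It is a teaching aid under simplifying assumptions (only `exp` is checked, and the helper names are made up for this example); a maintained library such as the one used above remains the right choice for real services.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()

def encode_hs256(payload: dict, secret: str) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"

def decode_hs256(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Return the payload, or raise ValueError if the token is malformed, forged, or expired."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison of the signatures
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in payload and current >= payload["exp"]:
        raise ValueError("token expired")
    return payload

def is_valid(token: str, secret: str, now: Optional[float] = None) -> bool:
    try:
        decode_hs256(token, secret, now)
        return True
    except ValueError:
        return False

token = encode_hs256({"sub": "alice", "exp": 1_000}, "demo-secret")
print(decode_hs256(token, "demo-secret", now=999)["sub"])  # alice
print(is_valid(token, "other-secret", now=999))            # False: signature mismatch
print(is_valid(token, "demo-secret", now=1_000))           # False: expired
```

The payload is only encoded, not encrypted: anyone holding the token can read the claims, so the signature protects integrity, not confidentiality.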

**API Key Authentication:**

```python
import secrets

from fastapi import Depends, HTTPException, Security
from fastapi.security.api_key import APIKeyHeader

API_KEY = "secret-api-key"  # Illustrative: load from configuration in real deployments
api_key_header = APIKeyHeader(name="X-API-Key")

def verify_api_key(api_key: str = Security(api_key_header)):
    # Constant-time comparison avoids leaking key contents through timing
    if not secrets.compare_digest(api_key.encode(), API_KEY.encode()):
        raise HTTPException(403, "Invalid API Key")
    return api_key

@app.get('/api/data')
def get_data(api_key: str = Depends(verify_api_key)):
    return {"data": "secured"}
```

**OAuth2 (Password Flow):**

```python
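from fastapi import Depends, HTTPException
from fastapi.security import OAuth2PasswordBearer, OAuth2PasswordRequestForm
from jose import JWTError, jwt

# Reuses SECRET_KEY, ALGORITHM, and create_access_token from the JWT example
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")

@app.post('/token')
def login(form_data: OAuth2PasswordRequestForm = Depends()):
    # Validate username/password (hard-coded here for illustration only)
    if form_data.username != "admin" or form_data.password != "secret":
        raise HTTPException(401, "Incorrect credentials")

    token = create_access_token({"sub": form_data.username})
    return {"access_token": token, "token_type": "bearer"}

@app.get('/users/me')
def read_me(token: str = Depends(oauth2_scheme)):
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
    except JWTError:
        # Invalid or expired tokens become a 401 instead of an unhandled 500
        raise HTTPException(401, "Could not validate credentials")
    return {"username": payload["sub"]}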
```

**Security Headers:**

```python
from fastapi import Request
from fastapi.middleware.trustedhost import TrustedHostMiddleware
from starlette.middleware.sessions import SessionMiddleware

# Only accept requests whose Host header matches these hosts
app.add_middleware(TrustedHostMiddleware, allowed_hosts=["example.com", "*.example.com"])

# Session support (signed cookies); load the key from configuration in real deployments
app.add_middleware(SessionMiddleware, secret_key="secret-session-key")

# Custom security headers
@app.middleware("http")
async def add_security_headers(request: Request, call_next):
    response = await call_next(request)
    response.headers["X-Content-Type-Options"] = "nosniff"
    response.headers["X-Frame-Options"] = "DENY"
    # Legacy header: modern browsers have removed their XSS auditors and ignore it
    response.headers["X-XSS-Protection"] = "1; mode=block"
    return response
```

**Input Sanitization:**

```python
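# Pydantic v1-style API; in v2, `regex=` becomes `pattern=`.
# EmailStr requires the `email-validator` package.
from pydantic import BaseModel, EmailStr, constr, validator

class UserInput(BaseModel):
    username: constr(min_length=3, max_length=50, regex=r'^[a-zA-Z0-9_]+$')
    email: EmailStr

    @validator('username')
    def sanitize_username(cls, v):
        # The regex above already restricts the character set; this only normalizes the value.
        # It does not replace parameterized queries (SQL injection) or output escaping (XSS).
        return v.strip().lower()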
```
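
Validation narrows what reaches the application, but the standard defence against SQL injection is to bind values as parameters rather than build SQL strings. A minimal sketch with the standard library's `sqlite3`; the table, the `is_valid_username` helper, and the hostile input are made up for illustration:

```python
import re
import sqlite3

# Mirrors the character set and length limits of the `constr` above
USERNAME_RE = re.compile(r"[a-zA-Z0-9_]{3,50}")

def is_valid_username(value: str) -> bool:
    return USERNAME_RE.fullmatch(value) is not None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")

hostile = "x'); DROP TABLE users; --"
print(is_valid_username(hostile))  # False: rejected at the validation layer

# Even if such a value reached the database, a bound parameter is treated as data, not SQL
conn.execute("INSERT INTO users (username) VALUES (?)", (hostile,))
stored = conn.execute("SELECT username FROM users").fetchall()
print(stored == [(hostile,)])  # True: stored verbatim, table still intact
```

SQLAlchemy binds parameters the same way when queries are built through its expression API, so the risk mostly appears when raw SQL strings are concatenated by hand.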


- Replace `@lru_cache` with Redis when the cache must be shared across processes or instances.
- Use slowapi (a rate-limiting library for Starlette/FastAPI adapted from Flask-Limiter) for per-IP or per-endpoint limits.
- Add JWT authentication when you expose sensitive data.