# ♻️ DataOps and CI/CD for Data Pipelines

Goal: apply DataOps practices (data quality control, automated tests, Git hooks, and CI/CD pipelines with GitHub Actions) to data workflows.

- Duration: 90 min
- Difficulty: Medium
- Prerequisites: basic pytest, Git, and GitHub

### ♻️ **DataOps: DevOps for Data**

**Definition:**  
DataOps applies DevOps practices (CI/CD, IaC, monitoring) to data pipelines to increase the speed, quality, and reliability of analytics development.

**Core Principles:**

1. **Automation First:**
   - Automated data quality tests
   - Automated pipeline deployment
   - Automated alerts on anomalies

2. **Version Control Everything:**
   - Code (Python, SQL)
   - Configuration (YAML, JSON)
   - Schemas (Avro, Protobuf)
   - Infrastructure (Terraform, CloudFormation)

3. **Observability:**
   - Structured logging (JSON logs)
   - Metrics: latency, throughput, error rate
   - Lineage: source → destination traceability

4. **Testing Pyramid for Data:**

```
       ▲
      /E2E\           ← End-to-End (few, slow)
     /─────\
    /Integr.\        ← Integration tests (moderate number)
   /─────────\
  /Unit Tests \      ← Unit tests (many, fast)
 /─────────────\
  Data Quality       ← Schema validation, nulls, ranges
```

**Comparison with Software Engineering:**

| Software DevOps | DataOps |
|-----------------|---------|
| Unit tests | Schema validation |
| Integration tests | Pipeline tests |
| Code review | Data quality review |
| Blue/Green deploy | Dual-write patterns |
| Monitoring (APM) | Data observability |

**Tools:**

- **Testing**: Great Expectations, Pandera, dbt tests
- **CI/CD**: GitHub Actions, GitLab CI, Jenkins
- **Orchestration**: Airflow, Prefect, Dagster
- **Observability**: DataDog, Monte Carlo, Bigeye
- **Version Control**: Git + DVC (Data Version Control)

**Impact (commonly cited targets; actual results vary by team):**

- ⬇️ 80% fewer data incidents
- ⬆️ 10x faster deployments
- ⬆️ 95%+ confidence in the data used for decisions

---
**Author:** Luis J. Raigoso V. (LJRV)

## 1. Data tests with Great Expectations and Pandera

### 🛡️ **Data Quality Testing: Great Expectations vs Pandera**

**Great Expectations:**

```python
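# Declarative approach: one expectation per rule.
# These methods belong to the legacy 0.x pandas API, where
# df = great_expectations.from_pandas(pandas_df) wraps a DataFrame.
# GX 1.x models them as classes instead, e.g.
# gx.expectations.ExpectColumnValuesToNotBeNull(column='user_id').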
df.expect_column_values_to_not_be_null('user_id')
df.expect_column_values_to_be_between('age', 0, 120)
df.expect_column_values_to_be_in_set('status', ['active', 'inactive'])
df.expect_column_values_to_match_regex('email', r'^[\w\.-]+@[\w\.-]+\.\w+$')
```
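To make the declarative idea concrete without installing Great Expectations, here is a hypothetical plain-Python sketch: each expectation is data (a check name, a column, and keyword arguments), and a small runner evaluates it over a list of row dicts and collects the unexpected values. The check names and result dictionary only loosely mirror GX; they are not its API.

```python
import re
from typing import Any, Callable

Row = dict[str, Any]

# Hypothetical checks, one per expectation name; each returns True when a value passes
CHECKS: dict[str, Callable[..., bool]] = {
    "not_be_null": lambda v: v is not None,
    "be_between": lambda v, min_value, max_value: v is not None and min_value <= v <= max_value,
    "be_in_set": lambda v, value_set: v in value_set,
    "match_regex": lambda v, regex: isinstance(v, str) and re.match(regex, v) is not None,
}

def run_suite(rows: list[Row], suite: list[tuple[str, str, dict]]) -> list[dict]:
    """Evaluate each (check, column, kwargs) expectation and collect failing values."""
    results = []
    for check, column, kwargs in suite:
        unexpected = [r.get(column) for r in rows if not CHECKS[check](r.get(column), **kwargs)]
        results.append({"expectation": f"{column}:{check}", "success": not unexpected,
                        "unexpected": unexpected})
    return results

rows = [
    {"user_id": 1, "age": 34, "status": "active", "email": "ana@example.com"},
    {"user_id": None, "age": 150, "status": "banned", "email": "not-an-email"},
]
suite = [
    ("not_be_null", "user_id", {}),
    ("be_between", "age", {"min_value": 0, "max_value": 120}),
    ("be_in_set", "status", {"value_set": {"active", "inactive"}}),
    ("match_regex", "email", {"regex": r"^[\w\.-]+@[\w\.-]+\.\w+$"}),
]
for result in run_suite(rows, suite):
    print(result)
```

Keeping the suite as plain data is what makes it reviewable and versionable alongside the code, which is the point of the declarative style.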

**Features:**
- 300+ predefined expectations
- Data Docs: automatically generated HTML reports
- Checkpoints: validation at pipeline stages
- Profiling: auto-generates expectations from data samples
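Profiling works in the opposite direction: it inspects a trusted sample and proposes expectations for a human to review. A hypothetical sketch of that idea (GX's profilers are far more elaborate), assuming that fully populated columns get a not-null rule, numeric columns get their observed min/max as a range, and low-cardinality string columns get their observed value set:

```python
from typing import Any

def profile(rows: list[dict[str, Any]], max_set_size: int = 10) -> list[tuple[str, str, dict]]:
    """Suggest (check, column, kwargs) expectations from a sample of rows."""
    suggestions = []
    columns = sorted({col for row in rows for col in row})
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        if len(present) == len(values):
            suggestions.append(("not_be_null", col, {}))
        # bool is a subclass of int, so exclude it from numeric ranges
        if present and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
            suggestions.append(("be_between", col, {"min_value": min(present), "max_value": max(present)}))
        elif present and all(isinstance(v, str) for v in present):
            distinct = set(present)
            if len(distinct) <= max_set_size:
                suggestions.append(("be_in_set", col, {"value_set": distinct}))
    return suggestions

sample = [
    {"age": 25, "status": "active"},
    {"age": 41, "status": "inactive"},
    {"age": 33, "status": None},
]
for suggestion in profile(sample):
    print(suggestion)
```

Observed ranges are only as good as the sample: a range inferred from three rows is a starting point to widen by hand, not a final rule.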

**Pandera (Type Hints for DataFrames):**

```python
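import pandas as pd
import pandera as pa
from pandera import Check, Column, DataFrameSchema
from pandera.typing import DataFrame, Series

# Object-based API: build a schema and call schema.validate(df)
schema = DataFrameSchema({
    'user_id': Column(int, Check.gt(0), nullable=False),
    'age': Column(int, Check.in_range(0, 120)),
    'email': Column(str, Check.str_matches(r'^[\w\.-]+@')),
    'created_at': Column('datetime64[ns]')
})

# Class-based API: needed to use a schema as a type hint with @pa.check_types
class UserSchema(pa.DataFrameModel):
    user_id: Series[int] = pa.Field(gt=0, nullable=False)
    age: Series[int] = pa.Field(ge=0, le=120)
    email: Series[str] = pa.Field(str_matches=r'^[\w\.-]+@')
    created_at: Series[pd.Timestamp]

@pa.check_types
def process_users(df: DataFrame[UserSchema]) -> DataFrame[UserSchema]:
    # Input and output are validated automatically at runtime
    return df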
```

**Advantages:**
- Native integration with type hints (mypy compatible)
- Lightweight (no heavy dependencies)
- Built-in statistical hypothesis testing

**Comparison:**

| Aspect | Great Expectations | Pandera |
|--------|-------------------|---------|
| **Learning curve** | Medium-high | Low |
| **Reporting** | Excellent (Data Docs) | Basic |
| **Performance** | Slower (more overhead) | Faster |
| **Type safety** | No | Yes (mypy) |
| **Use case** | Enterprise, governance | Fast prototyping |

**Hybrid Strategy:**
- Pandera: local development + CI tests
- Great Expectations: production + auditing

**Validation Levels:**
1. **Schema**: columns, types, nullability
2. **Range**: min, max, percentiles
3. **Relationships**: foreign keys, duplicates
4. **Distribution**: mean, std, outliers
5. **Business**: custom rules (e.g., revenue >= costs)
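Levels 3 to 5 usually need custom code. A hypothetical stdlib sketch over lists of dicts, with one function per idea: duplicate keys and orphan foreign keys (relationships), z-score outliers (distribution), and the `revenue >= costs` rule (business). Column names, sample data, and the default threshold of 3 standard deviations are assumptions.

```python
from collections import Counter
from statistics import mean, pstdev

def duplicate_keys(rows, key):
    """Level 3: key values that appear more than once."""
    counts = Counter(r[key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)

def orphan_foreign_keys(rows, fk, parent_keys):
    """Level 3: foreign-key values with no matching parent row."""
    return sorted({r[fk] for r in rows} - set(parent_keys))

def zscore_outliers(values, threshold=3.0):
    """Level 4: values more than `threshold` population std devs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

def business_rule_violations(rows):
    """Level 5: order ids breaking the custom rule revenue >= costs."""
    return [r["order_id"] for r in rows if r["revenue"] < r["costs"]]

orders = [
    {"order_id": 1, "customer_id": 10, "revenue": 100.0, "costs": 60.0},
    {"order_id": 2, "customer_id": 11, "revenue": 80.0, "costs": 90.0},
    {"order_id": 2, "customer_id": 99, "revenue": 120.0, "costs": 70.0},
]
customers = [10, 11, 12]

print(duplicate_keys(orders, "order_id"))                     # [2]
print(orphan_foreign_keys(orders, "customer_id", customers))  # [99]
print(business_rule_violations(orders))                       # [2]
print(zscore_outliers([10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 95]))  # [95]
```

Z-scores are weak on small samples: with the population standard deviation, no value among n points can have |z| above sqrt(n - 1), so a threshold of 3 can never fire on 10 or fewer rows.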


```python
import pandas as pd

# Pandera validation (simple and modern)
try:
    from pandera import DataFrameSchema, Column, Check

    df = pd.DataFrame({
        'venta_id': [1, 2, 3],
        'total': [100.0, 50.0, 25.5],
        'metodo_pago': ['tarjeta', 'cash', 'tarjeta']
    })

    # Define the schema with Pandera
    schema = DataFrameSchema({
        'venta_id': Column(int, Check.gt(0), nullable=False),
        'total': Column(float, Check.in_range(0, 10000)),
        'metodo_pago': Column(str, Check.isin(['tarjeta', 'cash', 'transferencia']))
    })

    # Validate (raises pandera.errors.SchemaError on the first failing check)
    validated_df = schema.validate(df)
    print('✅ Pandera validation passed!')
    print(f'   - {len(validated_df)} records validated')
    print(f'   - Columns: {list(validated_df.columns)}')

except ImportError as e:
    print(f'❌ Import error: {e}')
    print('   Install with: pip install pandera')
except Exception as e:
    print(f'❌ Validation error: {e}')

# Great Expectations (modern API, optional)
print('\n📝 Note: Great Expectations now uses a more complex API.')
print('   For basic use, Pandera is sufficient and simpler.')
print('   See the documentation: https://docs.greatexpectations.io/')
```

```
✅ Pandera validation passed!
   - 3 records validated
   - Columns: ['venta_id', 'total', 'metodo_pago']

📝 Note: Great Expectations now uses a more complex API.
   For basic use, Pandera is sufficient and simpler.
   See the documentation: https://docs.greatexpectations.io/
```


top-level pandera module will be **removed in a future version of pandera**.
If you're using pandera to validate pandas objects, we highly recommend updating
your import:

```
# old import
import pandera as pa

# new import
import pandera.pandas as pa
```

If you're using pandera to validate objects from other compatible libraries
like pyspark or polars, see the supported libraries section of the documentation
for more information on how to import pandera:

https://pandera.readthedocs.io/en/stable/supported_libraries.html


```
```



## 2. Pytest: minimal structure

### 🧪 **Pytest: Testing Framework for Data Pipelines**

**Project Structure:**
```
proyecto/
├── src/
│   ├── extract.py
│   ├── transform.py
│   └── load.py
├── tests/
│   ├── conftest.py         ← Shared fixtures
│   ├── test_extract.py
│   ├── test_transform.py
│   └── test_load.py
├── pytest.ini              ← Configuration
└── requirements-dev.txt
```

**Fixtures (Setup/Teardown):**
```python
# conftest.py
import sqlite3

import pandas as pd
import pytest

@pytest.fixture
def sample_df():
    """Reusable fixture for tests"""
    return pd.DataFrame({
        'id': [1, 2, 3],
        'value': [10, 20, 30]
    })

@pytest.fixture
def db_connection():
    # In-memory SQLite as a stand-in for the project's real connection factory
    conn = sqlite3.connect(':memory:')
    yield conn  # The test runs here
    conn.close()  # Automatic cleanup
```

**Parametrization (Data-Driven Tests):**
```python
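import pytest

from src.transform import clean_total

# `raw` instead of `input` avoids shadowing the built-in
@pytest.mark.parametrize("raw,expected", [
    (10, 10.0),
    (-5, 0.0),
    ('invalid', 0.0),
    (None, 0.0)
])
def test_clean_total(raw, expected):
    assert clean_total(raw) == expected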
```
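Each tuple in `parametrize` becomes its own test case, so one failing input does not hide the others. A plain-Python approximation of that behavior, using a copy of this notebook's `clean_total` with the exception handling narrowed to `TypeError`/`ValueError` (an assumption about which errors should map to `0.0`):

```python
def clean_total(x):
    """Coerce x to a non-negative float; invalid inputs become 0.0."""
    try:
        return max(float(x), 0.0)
    except (TypeError, ValueError):
        return 0.0

CASES = [(10, 10.0), (-5, 0.0), ("invalid", 0.0), (None, 0.0)]

def run_cases(func, cases):
    """Run every case and collect failures instead of stopping at the first one."""
    failures = []
    for raw, expected in cases:
        actual = func(raw)
        if actual != expected:
            failures.append((raw, expected, actual))
    return failures

print(run_cases(clean_total, CASES))  # []
```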

**Mocking (Dependency Isolation):**
```python
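from unittest.mock import patch

# Assumed: src/extract.py calls requests.get(url).json()['data']
from src.extract import extract_from_api

# Patches the attribute on the requests module, so it only takes effect
# if the extractor calls requests.get(...) rather than a locally imported `get`
@patch('requests.get')
def test_api_extractor(mock_get):
    mock_get.return_value.json.return_value = {'data': [1, 2, 3]}
    result = extract_from_api('https://api.example.com')
    assert len(result) == 3
    mock_get.assert_called_once()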
```
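The same isolation technique works with the standard library alone. A self-contained sketch, assuming a hypothetical `extract_from_api` built on `urllib.request.urlopen` that returns the `data` list of a JSON response; `patch` swaps out `urlopen` only inside the `with` block, so no network request is made:

```python
import json
import urllib.request
from unittest.mock import MagicMock, patch

def extract_from_api(url: str) -> list:
    """Hypothetical extractor: GET a JSON document and return its 'data' list."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read())["data"]

def test_api_extractor():
    fake_response = MagicMock()
    fake_response.read.return_value = b'{"data": [1, 2, 3]}'
    # urlopen is used as a context manager, so configure what __enter__ returns
    fake_response.__enter__.return_value = fake_response
    with patch("urllib.request.urlopen", return_value=fake_response) as mock_urlopen:
        result = extract_from_api("https://api.example.com")
    assert result == [1, 2, 3]
    mock_urlopen.assert_called_once_with("https://api.example.com")

test_api_extractor()
print("mocked extractor test passed")
```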

**Coverage (Code Coverage):**
```bash
pytest --cov=src --cov-report=html  # requires the pytest-cov plugin
# Generates htmlcov/index.html showing covered and uncovered lines
```

**Markers (Test Categorization):**
```python
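import pytest

# Register custom markers under `markers =` in pytest.ini (one per line)
# to avoid PytestUnknownMarkWarning.

@pytest.mark.slow
def test_large_dataset():
    # Test that takes >10 s
    pass

@pytest.mark.integration
def test_db_connection():
    # Test that requires a real DB
    pass

# Run only the fast tests (from the shell):
# pytest -m "not slow"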
```

**Best Practices:**
- Independent tests (no reliance on execution order)
- Descriptive names: `test_transform_handles_null_values`
- Arrange-Act-Assert pattern
- One assert per test (or per closely related concept)
- Target runtimes: < 1 s per unit test, < 10 s per integration test
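A short illustration of Arrange-Act-Assert with a descriptive test name; `drop_null_ids` is a hypothetical transform that exists only for this example:

```python
def drop_null_ids(rows):
    """Hypothetical transform: keep only rows whose 'id' is not None."""
    return [row for row in rows if row.get("id") is not None]

def test_transform_handles_null_values():
    # Arrange: input with one null id
    rows = [{"id": 1, "value": 10}, {"id": None, "value": 20}, {"id": 3, "value": 30}]
    # Act: run the transform under test
    result = drop_null_ids(rows)
    # Assert: one concept checked (nulls removed, order preserved)
    assert [row["id"] for row in result] == [1, 3]

test_transform_handles_null_values()
```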


```python
# tests/test_transform.py
sample_code = r'''
# file: src/transform.py
def clean_total(x):
    try:
        v = float(x)
        return max(v, 0.0)
    except Exception:
        return 0.0

# file: tests/test_transform.py
from src.transform import clean_total
def test_clean_total():
    assert clean_total(10) == 10.0
    assert clean_total(-5) == 0.0
    assert clean_total('oops') == 0.0
'''
print(sample_code)
```





## 3. Pre-commit hooks (lint, format, fast tests)

### 🪝 **Pre-commit Hooks: Local Quality Gates**

**What are pre-commit hooks?**  
Scripts that run automatically **before each commit** to check code, formatting, security, and more. If a hook fails, the commit is blocked.

**Installation:**
```bash
pip install pre-commit
# Create .pre-commit-config.yaml
pre-commit install  # Installs the hook script in .git/hooks/
```

**Essential Hooks for Data Engineering:**

1. **Black (Code Formatter):**
   - Auto-formats Python code in a consistent, PEP 8-compliant style
   - "The uncompromising formatter"
   ```python
   # Before
   x=[1,2,3];y={'a':1,'b':2}
   
   # After
   x = [1, 2, 3]
   y = {"a": 1, "b": 2}
   ```

2. **isort (Import Organizer):**
   - Sorts imports alphabetically within sections (standard library, third-party, local), with a blank line between sections
   ```python
   # Before
   import sys
   from myproject import utils
   import os
   
   # After
   import os
   import sys
   
   from myproject import utils
   ```

3. **flake8 (Linter):**
   - Detects syntax errors, unused variables, and unused imports
   - Example codes: E501 (line longer than 79 characters), F841 (local variable assigned but never used)

4. **mypy (Type Checker):**
   - Checks code against its type hints
   ```python
   def add(x: int, y: int) -> int:
       return x + y
   
   add("1", "2")  # ❌ mypy reports an incompatible argument type
   ```

5. **detect-secrets:**
   - Scans for hardcoded API keys and passwords
   - Blocks commits containing lines such as `AWS_SECRET_ACCESS_KEY = "xxx"`
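To make the idea behind secret scanning concrete, here is a deliberately naive keyword-based scanner. It is not how detect-secrets works internally (it combines several detector plugins, including keyword and high-entropy string detectors, with a baseline file); the regex and keyword list below are assumptions for illustration:

```python
import re

# Assumed heuristic: a secret-looking name assigned a non-empty quoted literal
SECRET_ASSIGNMENT = re.compile(
    r"""(?i)\b\w*(secret|password|passwd|api_?key|token)\w*\s*[:=]\s*["'][^"']+["']"""
)

def find_secrets(lines):
    """Return 1-based line numbers that look like hardcoded secrets."""
    return [n for n, line in enumerate(lines, start=1) if SECRET_ASSIGNMENT.search(line)]

source = [
    'AWS_SECRET_ACCESS_KEY = "xxx"',
    'password = os.environ["DB_PASSWORD"]',  # read from the environment: not flagged
    "api_key: 'abc123'",
]
print(find_secrets(source))  # [1, 3]
```

A keyword regex like this misses secrets stored under innocuous names, which is why real scanners add entropy-based detection.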

**Custom Hooks for Data:**
```yaml
- repo: local
  hooks:
    - id: check-sql-syntax
      name: Validate SQL files
      entry: sqlfluff lint
      language: system
      files: \.sql$
    
    - id: validate-schemas
      name: Check Avro schemas
      entry: python scripts/validate_schemas.py
      language: python
      files: schemas/.*\.avsc$
```
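The `validate-schemas` hook calls `scripts/validate_schemas.py`, which is not shown here. A hypothetical minimal version: pre-commit passes the staged files matching `files:` as command-line arguments, and a non-zero exit code makes the hook fail. It only checks basic Avro record structure (valid JSON, `type` equal to `record`, a `name`, and fields with `name` and `type`); full validation would use an Avro library.

```python
import json
import sys
from pathlib import Path

def schema_errors(text: str) -> list[str]:
    """Return the structural problems found in one Avro record schema."""
    try:
        schema = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(schema, dict):
        return ["top level must be a JSON object"]
    errors = []
    if schema.get("type") != "record":
        errors.append("type must be 'record'")
    if not schema.get("name"):
        errors.append("missing 'name'")
    fields = schema.get("fields")
    if not isinstance(fields, list) or not fields:
        errors.append("'fields' must be a non-empty list")
    else:
        for i, field in enumerate(fields):
            if not isinstance(field, dict) or "name" not in field or "type" not in field:
                errors.append(f"field {i} needs 'name' and 'type'")
    return errors

def main(paths: list[str]) -> int:
    status = 0
    for path in paths:
        for error in schema_errors(Path(path).read_text(encoding="utf-8")):
            print(f"{path}: {error}")
            status = 1
    return status

if __name__ == "__main__":
    exit_code = main(sys.argv[1:])
    if exit_code:
        sys.exit(exit_code)  # non-zero exit makes pre-commit block the commit
```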

**Workflow:**
```bash
git add transform.py
git commit -m "Fix bug"
  ↓
[pre-commit] black........................Passed
[pre-commit] isort.......................Passed
[pre-commit] flake8......................Failed
  - src/transform.py:10:1: F401 'pandas' imported but unused
  ↓
[Commit blocked: fix the errors and try again]
```

**Skipping Hooks (Emergencies Only):**
```bash
git commit --no-verify -m "Critical hotfix"
```

**Benefits:**
- ⬆️ Consistent code quality across the team
- ⬇️ Fewer issues in code review
- ⚡ Instant feedback instead of waiting for CI


```python
pre_commit_cfg = r'''
repos:
  - repo: https://github.com/psf/black
    rev: 22.6.0
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/isort
    rev: 5.10.1
    hooks:
      - id: isort
  - repo: https://github.com/pycqa/flake8
    rev: 5.0.4
    hooks:
      - id: flake8
  - repo: local
    hooks:
      - id: pytest-quick
        name: pytest quick
        entry: pytest -q
        language: system
        types: [python]
        # Run the test suite once instead of passing the staged files to pytest
        pass_filenames: false
'''
print(pre_commit_cfg)
```





## 4. GitHub Actions: CI to validate the repository

### 🔄 **GitHub Actions: Automated CI/CD**

**Concept:**  
GitHub Actions runs workflows automatically in response to events (push, PR, schedule) on GitHub-hosted runners (Ubuntu/Windows/macOS).

**Anatomy of a Workflow:**

```yaml
name: ci                          # Workflow name
on: [push, pull_request]          # Triggers

jobs:
  test:                           # Job ID
    runs-on: ubuntu-latest        # Runner environment
    steps:
      - uses: actions/checkout@v3 # Action oficial (git clone)
      
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
          cache: 'pip'            # Dependency cache
      
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
      
      - name: Run tests
        run: pytest --cov=src --cov-report=xml
      
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage.xml
```

**Advanced Triggers:**

1. **Schedule (Cron):**
   ```yaml
   on:
     schedule:
       - cron: '0 2 * * *'  # Daily at 2 AM UTC
   ```

2. **Manual Dispatch:**
   ```yaml
   on:
     workflow_dispatch:
       inputs:
         environment:
           description: 'Target environment'
           required: true
           default: 'staging'
   ```

3. **Path Filters:**
   ```yaml
   on:
     push:
       paths:
         - 'src/**'
         - 'tests/**'
   ```

**Parallel Jobs (Matrix Strategy):**
```yaml
jobs:
  test:
    strategy:
      matrix:
        python-version: ['3.8', '3.9', '3.10', '3.11']
        os: [ubuntu-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
```
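A matrix expands to the Cartesian product of its axes, so the configuration above starts 4 × 2 = 8 jobs. The expansion can be reproduced with `itertools.product` (ignoring `include` and `exclude`, which this example does not use):

```python
from itertools import product

matrix = {
    "python-version": ["3.8", "3.9", "3.10", "3.11"],
    "os": ["ubuntu-latest", "windows-latest"],
}

# One job per combination of axis values
jobs = [dict(zip(matrix, combo)) for combo in product(*matrix.values())]

print(len(jobs))  # 8
print(jobs[0])    # {'python-version': '3.8', 'os': 'ubuntu-latest'}
```

Each added axis multiplies the job count, which matters for the minute quotas discussed under cost optimization.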

**Secrets Management:**
```yaml
- name: Deploy to AWS
  env:
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  run: aws s3 sync dist/ s3://my-bucket/
```

**Full Pipeline for Data:**
```yaml
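# Skeleton: every job also needs actions/checkout, actions/setup-python and a
# dependency install step before its `run` commands (omitted for brevity)
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - run: flake8 src/
  
  test:
    needs: lint
    runs-on: ubuntu-latest
    steps:
      - run: pytest -v
  
  data-quality:
    needs: test
    runs-on: ubuntu-latest
    steps:
      # Legacy Great Expectations 0.x CLI; GX 1.x runs checkpoints from Python
      - run: great_expectations checkpoint run validation_suite
  
  deploy:
    needs: [test, data-quality]
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - run: docker build -t pipeline:${{ github.sha }} .
      # Pushing requires a registry login step and a registry-qualified tag
      - run: docker push pipeline:${{ github.sha }}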
```
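`needs` turns the jobs into a dependency graph: by default a job starts only after every job it lists has succeeded, and jobs with no path between them can run in parallel. A sketch of that ordering with `graphlib.TopologicalSorter` (Python 3.9+), grouping the jobs above into waves that could run concurrently:

```python
from graphlib import TopologicalSorter

# Job -> jobs listed in its `needs` key (taken from the workflow above)
needs = {
    "lint": [],
    "test": ["lint"],
    "data-quality": ["test"],
    "deploy": ["test", "data-quality"],
}

def execution_waves(graph):
    """Group jobs into waves; jobs in the same wave do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

print(execution_waves(needs))  # [['lint'], ['test'], ['data-quality'], ['deploy']]
```

This pipeline is fully sequential; jobs that share a wave (for example, two independent lint jobs) would run side by side.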

**Artifacts (Sharing Files Between Jobs):**
```yaml
- name: Generate report
  run: python generate_report.py

- uses: actions/upload-artifact@v3
  with:
    name: data-report
    path: reports/*.html
```

**Cost Optimization:**
- Public repos: free, with no minute limit on standard GitHub-hosted runners
- Private repos: 2,000 min/month free on the Free plan (then $0.008/min for Linux runners at the time of writing)
- Self-hosted runners to reduce costs
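A back-of-the-envelope estimate using the figures above (2,000 included minutes and $0.008 per extra Linux minute; both should be checked against current GitHub pricing). The workload of 40 runs per day, 22 working days, and 4 minutes per run is hypothetical:

```python
def monthly_overage_usd(minutes_used: float, included: float = 2_000,
                        rate_per_min: float = 0.008) -> float:
    """Cost of the minutes beyond the included quota, rounded to cents."""
    billable = max(minutes_used - included, 0)
    return round(billable * rate_per_min, 2)

# Hypothetical workload: 40 CI runs per working day x 22 days x 4 minutes each
minutes = 40 * 22 * 4
print(minutes, monthly_overage_usd(minutes))  # 3520 12.16
```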


```python
gha_yaml = r'''
name: ci
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install deps
        run: |
          pip install -r curso_ingenieria_datos/requirements.txt
      - name: Lint & Test
        run: |
          pip install pytest flake8 black
          flake8 .
          pytest -q
'''
print(gha_yaml)
```





## 5. Observability: minimal logs and metrics

### 📊 **Observability: Logs, Metrics and Alerts**

**Structured Logging with Loguru:**

```python
from loguru import logger

# Configuration: rotating file sink; serialize=True writes one JSON object per line
logger.add(
    "logs/pipeline_{time}.log",
    rotation="500 MB",
    retention="10 days",
    level="INFO",
    format="{time:YYYY-MM-DD HH:mm:ss} | {level} | {message}",
    serialize=True,
)

# Structured context: fields passed to bind() are stored in record["extra"]
logger.bind(pipeline="etl", dataset="ventas").info(
    "Processed batch",
    records=1000,
    latency_ms=150.5,
    status="success"
)
```

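**Target JSON shape (flat fields that log tools can parse directly).** Note that Loguru's `serialize=True` option nests the data instead: each line is an object with `text` and `record` keys, and bound context lives in `record["extra"]`.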
```json
{
  "timestamp": "2025-10-30T14:23:45.123Z",
  "level": "INFO",
  "pipeline": "etl",
  "dataset": "ventas",
  "message": "Processed batch",
  "records": 1000,
  "latency_ms": 150.5,
  "status": "success"
}
```
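The flat JSON shape above can also be produced with the standard `logging` module. A minimal sketch, assuming context fields are passed with `extra=` and treating any attribute that a fresh `LogRecord` does not already have as context:

```python
import json
import logging
from datetime import datetime, timezone

# Attributes every LogRecord has; anything else was added through `extra=`
_STANDARD = set(vars(logging.LogRecord("", 0, "", 0, "", None, None))) | {"message", "asctime"}

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Merge custom context fields (pipeline, dataset, records, ...)
        payload.update({k: v for k, v in vars(record).items() if k not in _STANDARD})
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("etl")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("Processed batch", extra={"pipeline": "etl", "dataset": "ventas",
                                   "records": 1000, "latency_ms": 150.5, "status": "success"})
```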

**Essential Metrics (RED Pattern):**

1. **Rate (Throughput):**
   ```python
   records_processed = 10000
   duration_seconds = 60
   throughput = records_processed / duration_seconds
   logger.info(f"throughput={throughput:.2f} records/sec")
   ```

2. **Errors (Error Rate):**
   ```python
   class AlertException(Exception):
       """Hypothetical custom exception for threshold breaches."""

   total_records = 10000
   failed_records = 50
   error_rate = (failed_records / total_records) * 100
   logger.warning(f"error_rate={error_rate:.2f}%")
   
   if error_rate > 5.0:
       raise AlertException("High error rate detected")
   ```

3. **Duration (Latency):**
   ```python
   import time
   start = time.time()
   process_batch()
   latency = time.time() - start
   logger.info(f"latency={latency:.3f}s")
   ```
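The three RED measurements can be collected together around one batch. A hypothetical sketch (`run_batch` and the `process` callback are illustrative names) that uses `time.perf_counter`, a monotonic clock better suited to measuring durations than `time.time`:

```python
import time

def run_batch(records, process):
    """Apply `process` to each record and return RED metrics for the batch."""
    start = time.perf_counter()
    failed = 0
    for record in records:
        try:
            process(record)
        except Exception:
            failed += 1  # count the failure and keep going
    duration = time.perf_counter() - start
    total = len(records)
    return {
        "rate_per_s": total / duration if duration > 0 else float("inf"),
        "error_rate_pct": 100 * failed / total if total else 0.0,
        "duration_s": duration,
    }

def parse_amount(value):
    return float(value)  # raises ValueError for non-numeric input

metrics = run_batch(["10", "20", "oops", "40"], parse_amount)
print(f"error_rate={metrics['error_rate_pct']:.2f}%")  # error_rate=25.00%
```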

**Prometheus Metrics (Production):**
```python
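from prometheus_client import Counter, Gauge, Histogram

records_processed = Counter('records_processed_total', 'Total records')
pipeline_duration = Histogram('pipeline_duration_seconds', 'Duration')
active_pipelines = Gauge('active_pipelines', 'Currently running')

# Expose the metrics for scraping, e.g. prometheus_client.start_http_server(8000)

@pipeline_duration.time()              # Records each run's duration in the histogram
@active_pipelines.track_inprogress()   # +1 while running, -1 when finished
def run_pipeline():
    records_processed.inc(1000)
    # Pipeline logic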
```

**Data Observability Platforms:**

1. **Monte Carlo / Bigeye:**
   - Automatic anomaly detection (distribution, volume, freshness)
   - Visual lineage (upstream/downstream dependencies)
   - Alerts such as "Table X not updated in 4 hours"

2. **DataDog / New Relic:**
   - APM for pipelines
   - Customizable dashboards
   - Distributed tracing

3. **dbt Cloud:**
   - Historical test results
   - Model timing trends
   - Exposures: which dashboards depend on which models

**Alerting Strategy:**
```python
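def send_slack_alert(message: str) -> None:
    # Hypothetical helper: replace with a real Slack webhook call
    print(f"[slack] {message}")

def send_pagerduty(severity: str) -> None:
    # Hypothetical helper: replace with a real PagerDuty integration
    print(f"[pagerduty] severity={severity}")

class DataQualityAlert:
    def __init__(self):
        self.thresholds = {
            'null_rate': 0.05,      # Max 5% nulls
            'duplicate_rate': 0.01, # Max 1% duplicates
            'freshness_hours': 4    # Data < 4h old
        }

    def check_and_alert(self, metrics):
        # Every threshold is an upper bound: alert when the observed value exceeds it
        for name, limit in self.thresholds.items():
            if metrics[name] > limit:
                send_slack_alert(f"⚠️ High {name}: {metrics[name]} (limit {limit})")
                send_pagerduty(severity='high')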
```

**Golden Signals for Data:**
- **Freshness**: When did the latest data arrive?
- **Volume**: Is the record count within the expected range?
- **Schema**: Did columns or types change?
- **Distribution**: Are any values out of range?
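The four signals map to concrete checks on a table snapshot. A hypothetical sketch: the expected columns, the ±20% volume tolerance around a baseline count, the 4-hour freshness limit (matching the alerting thresholds above), and the `total` range of 0 to 10,000 (borrowed from the Pandera example) are all assumptions:

```python
from datetime import datetime, timedelta, timezone

def golden_signal_issues(rows, last_loaded_at, now, *, expected_columns, baseline_count,
                         max_age=timedelta(hours=4), tolerance=0.2,
                         value_range=("total", 0, 10_000)):
    """Return human-readable issues for freshness, volume, schema and distribution."""
    issues = []
    # Freshness: when did the latest data arrive?
    if now - last_loaded_at > max_age:
        issues.append(f"freshness: last load {now - last_loaded_at} ago")
    # Volume: is the record count within tolerance of the baseline?
    if abs(len(rows) - baseline_count) > tolerance * baseline_count:
        issues.append(f"volume: {len(rows)} rows vs baseline {baseline_count}")
    # Schema: did the set of columns change?
    seen = set().union(*(row.keys() for row in rows)) if rows else set()
    if seen != set(expected_columns):
        issues.append(f"schema: columns {sorted(seen)} != {sorted(expected_columns)}")
    # Distribution: any values out of range?
    column, low, high = value_range
    out = [row[column] for row in rows if column in row and not low <= row[column] <= high]
    if out:
        issues.append(f"distribution: {len(out)} value(s) of {column} out of range")
    return issues

now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [{"venta_id": 1, "total": 100.0}, {"venta_id": 2, "total": -5.0}]
for issue in golden_signal_issues(rows, now - timedelta(hours=6), now,
                                  expected_columns=["venta_id", "total"], baseline_count=10):
    print(issue)
```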


```python
from loguru import logger
import time, random

def process_batch(n=5):
    for i in range(n):
        start = time.time()
        try:
            if random.random() < 0.1:
                raise ValueError('Random failure')
            time.sleep(0.05)
            latency = time.time() - start
            logger.info(f'item={i} status=ok latency={latency:.3f}s')
        except Exception as e:
            logger.error(f'item={i} status=error err={e}')
process_batch(10)
```

```
2025-12-07 18:10:45.064 | INFO     | __main__:process_batch:12 - item=0 status=ok latency=0.051s
2025-12-07 18:10:45.115 | INFO     | __main__:process_batch:12 - item=1 status=ok latency=0.051s
2025-12-07 18:10:45.166 | INFO     | __main__:process_batch:12 - item=2 status=ok latency=0.051s
2025-12-07 18:10:45.220 | INFO     | __main__:process_batch:12 - item=3 status=ok latency=0.051s
```

## 6. Exercises

1. Add a pre-commit rule that blocks files larger than 2 MB.
2. Create an additional workflow that runs data validations with Great Expectations.
3. Add test coverage and publish a coverage badge in the README.

---

## 🧭 Navigation

**← Previous:** [🗄️ Relational and NoSQL Databases: PostgreSQL and MongoDB](04_bases_datos_postgresql_mongodb.ipynb)

**Next →:** [🌐 Advanced Connectors: REST, GraphQL and SFTP →](06_conectores_avanzados_rest_graphql_sftp.ipynb)

**📚 Mid Level Index:**
- [⚡ Mid - 01. Pipeline Orchestration with Apache Airflow](01_apache_airflow_fundamentos.ipynb)
- [Streaming with Apache Kafka: Fundamentals](02_streaming_kafka.ipynb)
- [☁️ AWS for Data Engineering: S3, Glue, Athena and Lambda](03_cloud_aws.ipynb)
- [☁️ GCP for Data Engineering: BigQuery, Cloud Storage, Dataflow and Composer](03b_cloud_gcp.ipynb)
- [☁️ Azure for Data Engineering: ADLS, Synapse, Data Factory and Databricks](03c_cloud_azure.ipynb)
- [🗄️ Relational and NoSQL Databases: PostgreSQL and MongoDB](04_bases_datos_postgresql_mongodb.ipynb)
- [♻️ DataOps and CI/CD for Data Pipelines](05_dataops_cicd.ipynb) ← 🔵 You are here
- [🌐 Advanced Connectors: REST, GraphQL and SFTP](06_conectores_avanzados_rest_graphql_sftp.ipynb)
- [🧩 SQL Optimization and Data Partitioning](07_optimizacion_sql_particionado.ipynb)
- [🚀 Data Services with FastAPI](08_fastapi_servicios_datos.ipynb)
- [🧪 Mid Capstone Project 1: API → DB → Parquet with Orchestration](09_proyecto_integrador_1.ipynb)
- [🔄 Mid Capstone Project 2: Kafka → Streaming → Data Lake and Monitoring](10_proyecto_integrador_2.ipynb)

**🎓 Other Levels:**
- [Junior Level](../nivel_junior/README.md)
- [Mid Level](../nivel_mid/README.md)
- [Senior Level](../nivel_senior/README.md)
- [GenAI Level](../nivel_genai/README.md)
- [LATAM Business](../negocios_latam/README.md)
