# 🌐 Senior Capstone Project 2: Multi-Domain Data Mesh with Feature Store

Objective: design a Data Mesh architecture with multiple autonomous domains, a centralized feature store for ML, and federated governance.

- Duration: 180+ min (multi-day project)
- Difficulty: Very High
- Prerequisites: Complete Senior track (01–08), organizational experience

### 🌐 **Data Mesh: Paradigm Shift from Centralized to Federated**

**1. The Problem with Centralized Data Platforms**

```
Traditional Monolithic Data Platform:
┌────────────────────────────────────────────┐
│     Central Data Platform Team             │
│  (Bottleneck: 5 engineers, 50 consumers)   │
├────────────────────────────────────────────┤
│ • All ETL pipelines                        │
│ • All data quality checks                  │
│ • All API development                      │
│ • All documentation                        │
│ • All incident response                    │
└────────────────────────────────────────────┘
           ▼  ▼  ▼  ▼  ▼
    Requests: 200+/month
    Lead Time: 6-12 weeks/feature
    Quality: One team can't know all domains
```

**Problems:**
- 🚫 **Bottleneck**: Central team overwhelmed
- 🚫 **Lack of Domain Expertise**: Platform team doesn't understand business nuances
- 🚫 **Slow Innovation**: 6-12 week wait for each new feature
- 🚫 **Poor Quality**: Generic validations miss domain-specific issues
- 🚫 **No Ownership**: "Not my problem" mentality

---

**2. Data Mesh: Four Principles (Zhamak Dehghani)**

```
┌─────────────────────────────────────────────────────────────────┐
│                    Data Mesh Principles                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. DOMAIN OWNERSHIP                                            │
│     "You build it, you own it"                                  │
│     - Each business domain owns its data products              │
│     - Domain teams are accountable for quality & SLOs          │
│                                                                  │
│  2. DATA AS A PRODUCT                                           │
│     "Treat data like software products"                         │
│     - Discoverable (catalog)                                    │
│     - Addressable (versioned APIs)                              │
│     - Trustworthy (quality SLOs)                                │
│     - Self-describing (documentation)                           │
│     - Secure (access controls)                                  │
│                                                                  │
│  3. SELF-SERVE DATA PLATFORM                                    │
│     "Democratize infrastructure, not data"                      │
│     - Platform provides: storage, compute, observability       │
│     - Domains consume: CI/CD, monitoring, governance tools     │
│                                                                  │
│  4. FEDERATED COMPUTATIONAL GOVERNANCE                          │
│     "Global policies, local execution"                          │
│     - Central: Security standards, PII policies, cost budgets  │
│     - Local: Implementation details, tooling choices           │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```

---

**3. Domain-Oriented Architecture**

```yaml
# domains.yml - Domain Registry
domains:
  
  ventas:
    owner: ventas-team@company.com
    mission: "Provide revenue and transaction data products"
    data_products:
      - daily_revenue_api
      - customer_transaction_history
      - product_performance
    slos:
      latency_p99: "15 minutes"
      availability: "99.9%"
      data_quality: "99.5% completeness"
    dependencies:
      - producto.catalog_api  # Cross-domain dependency
    budget_monthly: "$2,000"
  
  logistica:
    owner: fulfillment-team@company.com
    mission: "Enable logistics optimization and tracking"
    data_products:
      - shipment_tracking_api
      - warehouse_inventory
      - delivery_performance
    slos:
      latency_p99: "5 minutes"
      availability: "99.95%"
    dependencies:
      - ventas.daily_revenue_api
    budget_monthly: "$1,500"
  
  producto:
    owner: catalog-team@company.com
    mission: "Provide product catalog and pricing data"
    data_products:
      - catalog_api
      - pricing_history
      - category_taxonomy
    slos:
      latency_p99: "30 minutes"
      availability: "99.9%"
    dependencies: []
    budget_monthly: "$1,000"
  
  marketing:
    owner: campaigns-team@company.com
    mission: "Support campaign performance and customer segmentation"
    data_products:
      - campaign_metrics
      - customer_segments
      - attribution_model
    slos:
      latency_p99: "60 minutes"
      availability: "99.5%"
    dependencies:
      - ventas.customer_transaction_history
      - producto.catalog_api
    budget_monthly: "$2,500"
  
  finanzas:
    owner: payments-team@company.com
    mission: "Financial reporting and fraud detection"
    data_products:
      - payment_transactions
      - fraud_scores
      - accounting_reports
    slos:
      latency_p99: "10 minutes"
      availability: "99.99%"
    dependencies:
      - ventas.daily_revenue_api
    budget_monthly: "$3,000"
    compliance:
      - PCI-DSS
      - SOX
```
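
Cross-domain dependencies in the registry are only useful if they point at products that actually exist and do not form cycles. The following is a minimal sketch of such a check, assuming the registry has already been loaded into a plain dictionary. `REGISTRY`, `find_unknown_dependencies`, and `domain_build_order` are illustrative names, not part of any platform API.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical in-memory copy of domains.yml (only the fields needed here)
REGISTRY = {
    "ventas": {
        "data_products": ["daily_revenue_api", "customer_transaction_history", "product_performance"],
        "dependencies": ["producto.catalog_api"],
    },
    "logistica": {
        "data_products": ["shipment_tracking_api", "warehouse_inventory", "delivery_performance"],
        "dependencies": ["ventas.daily_revenue_api"],
    },
    "producto": {
        "data_products": ["catalog_api", "pricing_history", "category_taxonomy"],
        "dependencies": [],
    },
    "marketing": {
        "data_products": ["campaign_metrics", "customer_segments", "attribution_model"],
        "dependencies": ["ventas.customer_transaction_history", "producto.catalog_api"],
    },
    "finanzas": {
        "data_products": ["payment_transactions", "fraud_scores", "accounting_reports"],
        "dependencies": ["ventas.daily_revenue_api"],
    },
}


def find_unknown_dependencies(registry):
    """Return 'consumer -> dependency' strings that point at unpublished products."""
    published = {
        f"{domain}.{product}"
        for domain, spec in registry.items()
        for product in spec["data_products"]
    }
    return [
        f"{domain} -> {dep}"
        for domain, spec in registry.items()
        for dep in spec["dependencies"]
        if dep not in published
    ]


def domain_build_order(registry):
    """Order domains so upstream producers come first; raises CycleError on cycles."""
    graph = {
        domain: {dep.split(".", 1)[0] for dep in spec["dependencies"]}
        for domain, spec in registry.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(find_unknown_dependencies(REGISTRY))
print(domain_build_order(REGISTRY))
```

Running a check like this in CI is one way to catch a renamed or retired data product before a consumer domain breaks.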

**Domain Team Structure:**

```
Ventas Domain Team (6 people):
├── Data Product Owner (1)
│   - Define requirements
│   - Prioritize features
│   - Communicate with consumers
│
├── Data Engineers (3)
│   - Build pipelines
│   - Maintain data quality
│   - Optimize performance
│
├── Analytics Engineer (1)
│   - Create gold layer aggregations
│   - Build BI dashboards
│
└── SRE/DevOps (1)
    - CI/CD pipelines
    - Incident response
    - Cost optimization
```

---

**4. Data Product Canvas**

```markdown
# Data Product: Daily Revenue API

## Business Context
**Purpose**: Provide near-real-time revenue metrics (within the 15-minute latency SLO) for executive dashboards and financial reporting

**Consumers**: 
- Finance team (accounting reconciliation)
- Executive dashboards (Tableau)
- ML models (revenue forecasting)

## Technical Specifications

### Input Sources
- Kafka topic: `ecommerce.ventas.transactions`
- S3 batch: `s3://mesh/ventas/raw/refunds/`

### Output Schema (v2.1)
~~~json
{
  "date": "2024-01-15",
  "region": "LATAM",
  "revenue_gross": 125000.50,
  "revenue_net": 118000.30,
  "transactions_count": 1543,
  "refunds_count": 23,
  "currency": "USD",
  "calculated_at": "2024-01-15T10:30:00Z"
}
~~~

### SLOs
- **Latency**: p99 < 15 minutes (from transaction to API availability)
- **Availability**: 99.9% uptime (43 minutes downtime/month allowed)
- **Accuracy**: ±0.5% vs accounting system
- **Freshness**: Data no older than 20 minutes

### API Endpoints
- `GET /ventas/v2/daily-revenue?date={YYYY-MM-DD}&region={region}`
- `GET /ventas/v2/revenue-trend?start_date={}&end_date={}`

### Access Control
- Public: No (internal only)
- Authentication: API key (service accounts)
- Authorization: Read-only for finance, marketing, executive teams

### Dependencies
- **Upstream**: producto.catalog_api (for product prices)
- **Downstream**: ML feature store, executive dashboards

### Costs
- Storage: $50/month (S3 Standard)
- Compute: $200/month (EMR Serverless)
- API hosting: $100/month (ECS Fargate)
- **Total**: $350/month

### Metrics
- Request rate: 500 req/day
- p99 latency: 12 minutes (within SLO)
- Availability (30d): 99.92%
- Quality score: 99.7% completeness

### Changelog
- v2.1 (2024-01): Added `refunds_count` field
- v2.0 (2023-12): Breaking change - renamed `total` to `revenue_gross`
- v1.5 (2023-10): Added `currency` field
- v1.0 (2023-08): Initial release
```
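
The availability figures in the canvas can be turned into an error budget with simple arithmetic. The sketch below assumes a 30-day window; `allowed_downtime_minutes` and `error_budget_remaining` are hypothetical helper names used only for illustration.

```python
def allowed_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of downtime permitted by an availability target over the window."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def error_budget_remaining(target_pct: float, observed_pct: float, days: int = 30) -> float:
    """Budget minutes left after subtracting the downtime implied by observed availability."""
    budget = allowed_downtime_minutes(target_pct, days)
    used = allowed_downtime_minutes(observed_pct, days)
    return budget - used


# 99.9% over 30 days allows about 43.2 minutes of downtime
print(round(allowed_downtime_minutes(99.9), 2))

# With 99.92% observed availability, roughly 8.64 minutes of budget remain
print(round(error_budget_remaining(99.9, 99.92), 2))
```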

---

**5. Self-Service Platform Components**

```python
# platform/infrastructure.py
"""
Self-service platform capabilities provided to all domains
"""

class DataPlatform:
    """
    Shared infrastructure that domains consume
    """
    
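    def __init__(self):
        self.components = {
            # Storage
            "s3_buckets": self._provision_s3(),
            "glue_catalog": self._setup_glue_catalog(),
            
            # Compute
            "emr_serverless": self._placeholder("EMR Serverless applications"),
            "airflow": self._placeholder("Managed Airflow (MWAA) environment"),
            
            # Observability
            "datahub": self._placeholder("DataHub catalog"),
            "grafana": self._placeholder("Grafana dashboards"),
            "prometheus": self._placeholder("Prometheus metrics"),
            
            # Governance
            "iam_roles": self._placeholder("Per-domain IAM roles"),
            "kms_keys": self._placeholder("KMS encryption keys"),
            
            # CI/CD
            "github_actions": self._placeholder("GitHub Actions workflows"),
            "terraform_modules": self._placeholder("Published Terraform modules")
        }
    
    @staticmethod
    def _placeholder(description: str) -> dict:
        """
        Stand-in for provisioners not shown here, so the class can be instantiated
        """
        return {"status": "not_implemented", "description": description}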
    
    def provision_domain(self, domain_name: str):
        """
        Self-service: Domain team provisions own infrastructure
        """
        return {
            "s3_bucket": f"s3://mesh/{domain_name}/",
            "iam_role": f"arn:aws:iam::123:role/{domain_name}-pipeline",
            "airflow_connection": f"{domain_name}_aws",
            "datahub_domain": f"urn:li:domain:{domain_name}",
            "grafana_folder": f"Domains/{domain_name}",
            "cost_center_tag": domain_name
        }
    
    def _provision_s3(self):
        """
        Create S3 buckets with standard structure
        """
        bucket_policy = {
            "lifecycle_rules": [
                {"raw": "7 days → Glacier"},
                {"curated": "90 days → IA"},
                {"gold": "retain indefinitely"}
            ],
            "encryption": "AWS KMS",
            "versioning": True,
            "tags": {"ManagedBy": "platform-team"}
        }
        return bucket_policy
    
    def _setup_glue_catalog(self):
        """
        Central catalog with domain isolation
        """
        return {
            "database_naming": "{domain}_db",
            "table_naming": "{domain}_{entity}",
            "cross_domain_access": "IAM policies"
        }
```

**Platform Team Responsibilities:**

| Area | Platform Team | Domain Team |
|------|---------------|-------------|
| **Infrastructure** | Provision & maintain | Consume via self-service |
| **Standards** | Define (e.g., PII policy) | Implement |
| **Tooling** | Provide (Airflow, DataHub) | Use & extend |
| **Costs** | Set budgets | Optimize within budget |
| **Security** | Global policies (IAM) | Local access controls |
| **Monitoring** | Shared dashboards | Domain-specific alerts |
| **Incidents** | Platform outages | Data quality issues |

---

**6. Benefits vs Challenges**

**Benefits:**

| Benefit | Impact |
|---------|--------|
| **Faster Innovation** | 12 weeks → 2 weeks (feature delivery) |
| **Better Quality** | Domain experts validate data (80% → 95% accuracy) |
| **Scalability** | Add domains without bottleneck |
| **Accountability** | Clear ownership (no more "not my problem") |
| **Cost Transparency** | Per-domain budgets (showback/chargeback) |

**Challenges:**

| Challenge | Mitigation |
|-----------|-----------|
| **Duplication** | Shared platform components, reusable modules |
| **Coordination** | Data contracts, OpenAPI specs |
| **Discoverability** | DataHub catalog with rich metadata |
| **Governance** | Automated policy enforcement (OPA, Cedar) |
| **Skills Gap** | Training, guilds, internal wiki |
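
Automated policy enforcement can start small, before adopting a dedicated engine such as OPA or Cedar. The following sketch shows the idea in plain Python: a registry entry is checked against a few global rules in CI. The rules, `REQUIRED_FIELDS`, and `check_domain` are assumptions made for illustration, not policies stated elsewhere in this module.

```python
# Hypothetical global rules every domain entry must satisfy
REQUIRED_FIELDS = ("owner", "slos", "budget_monthly")


def check_domain(name: str, spec: dict) -> list:
    """Return human-readable policy violations for one domain registry entry."""
    violations = []

    # Rule 1: required fields must be present
    for field in REQUIRED_FIELDS:
        if field not in spec:
            violations.append(f"{name}: missing '{field}'")

    # Rule 2: the owner must be a team email, not a person or a free-text label
    if "owner" in spec and "@" not in spec["owner"]:
        violations.append(f"{name}: owner must be a team email")

    # Rule 3: every domain must declare an availability SLO
    if "slos" in spec and "availability" not in spec["slos"]:
        violations.append(f"{name}: slos must declare availability")

    return violations


compliant = {
    "owner": "ventas-team@company.com",
    "slos": {"latency_p99": "15 minutes", "availability": "99.9%"},
    "budget_monthly": "$2,000",
}
non_compliant = {"owner": "sales", "slos": {"latency_p99": "5 minutes"}}

print(check_domain("ventas", compliant))
print(check_domain("demo", non_compliant))
```

Failing the pipeline when the returned list is non-empty gives "global policies, local execution" a concrete enforcement point.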

---

**7. Migration Strategy: Monolith → Mesh**

```
Phase 1: Foundation (3 months)
├── Week 1-4: Define domains and owners
├── Week 5-8: Build self-service platform (Terraform modules)
├── Week 9-12: Train domain teams, pilot with 1 domain
└── Success Criteria: 1 domain fully migrated

Phase 2: Scale (6 months)
├── Month 4-5: Migrate 3 high-value domains
├── Month 6-8: Implement federated governance
├── Month 9: Sunset legacy central pipelines
└── Success Criteria: 5 domains producing data products

Phase 3: Maturity (ongoing)
├── Add new domains (self-service onboarding)
├── Advanced features (feature store, real-time)
├── Cross-domain analytics (federated queries)
└── Success Criteria: <1 week onboarding for new domains
```

**Pilot Domain Selection Criteria:**
- ✅ High business value
- ✅ Motivated team (early adopters)
- ✅ Clear boundaries (not too many dependencies)
- ✅ Medium complexity (not trivial, not impossible)

**Example Pilot: Ventas Domain**
- **Why**: Critical for business (revenue data)
- **Team**: Experienced, eager to improve
- **Scope**: Clear (transactions, revenue, customers)
- **Timeline**: 3 months → production data product

---

**8. Real-World Examples**

**Netflix:**
- 50+ data domains (Content, Playback, Recommendations, etc.)
- Each domain owns pipelines, quality, APIs
- Central platform: Metacat (catalog), Iceberg (storage), dbt (transformations)
- Result: 5,000+ datasets, self-serve access for 2,000+ engineers

**Uber:**
- Data domains by business unit (Rides, Eats, Freight)
- Feature Store (Michelangelo) with domain-owned features
- Central governance: Data Quality Portal, Lineage (DataBook)
- Result: 10+ PB data lake, <1 week for new data products

**Zalando:**
- 200+ data products across 40 domains
- Self-service: Nakadi (event streaming), Data Lake (S3), dbt (SQL)
- Governance: Data Mesh Portal (discovery), Compliance Scanner
- Result: 90% of data teams self-sufficient

---

**Author:** Luis J. Raigoso V. (LJRV)

### 🎯 **Feature Store: ML-Ready Data Infrastructure**

**1. The Feature Engineering Problem**

```
Traditional ML Pipeline (Problems):
┌────────────────────────────────────────────────┐
│  Data Scientist A                              │
│  ├── Extracts: customer_total_purchases        │
│  ├── Logic: SQL query (30 days rolling)       │
│  ├── Storage: Local CSV                        │
│  └── Model: Churn prediction (prod)            │
└────────────────────────────────────────────────┘
          ↓ (6 months later)
┌────────────────────────────────────────────────┐
│  Data Scientist B                              │
│  ├── Extracts: customer_total_purchases (again)│
│  ├── Logic: Slightly different SQL            │
│  ├── Storage: Different S3 path               │
│  └── Model: Upsell prediction                  │
└────────────────────────────────────────────────┘

Issues:
❌ Duplication: Same feature, different implementations
❌ Inconsistency: Different logic → different values
❌ Training/Serving Skew: Batch SQL vs real-time Python
❌ No Versioning: Can't reproduce model from 6 months ago
❌ No Discovery: Team B doesn't know Team A computed this
```
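
To make the inconsistency concrete, here is a small sketch of how two "slightly different" definitions of the same 30-day feature diverge. The purchase history and both function names are invented for illustration; the only difference between the two implementations is whether the first day of the window is included.

```python
from datetime import date, timedelta

# Hypothetical purchase history for one customer: (purchase_date, amount)
PURCHASES = [
    (date(2023, 12, 16), 50.0),
    (date(2024, 1, 1), 100.0),
    (date(2024, 1, 15), 25.0),
]


def total_purchases_30d_exclusive(purchases, as_of):
    """Scientist A: window is (as_of - 30 days, as_of], start day excluded."""
    start = as_of - timedelta(days=30)
    return sum(amount for day, amount in purchases if start < day <= as_of)


def total_purchases_30d_inclusive(purchases, as_of):
    """Scientist B: window is [as_of - 30 days, as_of], start day included."""
    start = as_of - timedelta(days=30)
    return sum(amount for day, amount in purchases if start <= day <= as_of)


as_of = date(2024, 1, 15)
value_a = total_purchases_30d_exclusive(PURCHASES, as_of)
value_b = total_purchases_30d_inclusive(PURCHASES, as_of)

# Same feature name, same customer, same date, different values
print(value_a, value_b)
```

A single registered definition removes this ambiguity, because every model reads the value computed by one implementation.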

---

**2. Feature Store Architecture**

```
┌─────────────────────────────────────────────────────────────────┐
│                       Feature Store                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────────┐         ┌──────────────────┐            │
│  │  Offline Store   │         │  Online Store    │            │
│  │  (S3/BigQuery)   │         │  (Redis/DynamoDB)│            │
│  ├──────────────────┤         ├──────────────────┤            │
│  │ • Training data  │         │ • Serving (ms)   │            │
│  │ • Batch (hours)  │         │ • Low latency    │            │
│  │ • Historical     │         │ • Recent data    │            │
│  │ • Petabytes      │         │ • Gigabytes      │            │
│  └──────────────────┘         └──────────────────┘            │
│           ▲                            ▲                        │
│           │                            │                        │
│  ┌────────┴────────────────────────────┴─────────┐            │
│  │         Feature Registry (Metadata)            │            │
│  │  - Feature definitions                         │            │
│  │  - Schema & types                              │            │
│  │  - Owners & documentation                      │            │
│  │  - Lineage & dependencies                      │            │
│  └────────────────────────────────────────────────┘            │
└─────────────────────────────────────────────────────────────────┘
         ▲                                     ▲
         │                                     │
    ┌────┴──────┐                      ┌──────┴─────┐
    │  Training │                      │  Serving   │
    │  (batch)  │                      │ (real-time)│
    └───────────┘                      └────────────┘
```

---

**3. Feast Implementation (Open Source)**

**Installation & Setup:**

```bash
# Install Feast with the AWS and Redis extras
pip install "feast[aws,redis]"

# Initialize repository
feast init feature_repo
cd feature_repo/
```

**Feature Definitions:**

```python
# feature_repo/features.py
from feast import Entity, FeatureView, Field, FileSource, ValueType
from feast.types import Float64, Int64, String
from datetime import timedelta

# ═══════════════════════════════════════════════════════
# ENTITIES (Join Keys)
# ═══════════════════════════════════════════════════════

# Entities take a ValueType enum (not a feast.types dtype) and explicit join keys
customer = Entity(
    name="customer_id",
    join_keys=["customer_id"],
    description="Unique customer identifier",
    value_type=ValueType.INT64
)

product = Entity(
    name="product_id",
    join_keys=["product_id"],
    description="Unique product identifier",
    value_type=ValueType.INT64
)

# ═══════════════════════════════════════════════════════
# VENTAS DOMAIN FEATURES
# ═══════════════════════════════════════════════════════

ventas_offline_source = FileSource(
    name="ventas_features_source",
    path="s3://mesh/ventas/curated/features/",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp"
)

# Note: Feast has no RedisSource. The online store (Redis) is configured once
# in feature_store.yaml; feature views only declare their offline sources and
# set online=True to be materialized.

ventas_customer_features = FeatureView(
    name="ventas_customer_features",
    description="Customer purchase behavior (Ventas domain)",
    entities=[customer],
    ttl=timedelta(days=90),  # Feature validity
    schema=[
        Field(
            name="total_purchases_7d",
            dtype=Float64,
            description="Total $ spent in last 7 days"
        ),
        Field(
            name="total_purchases_30d",
            dtype=Float64,
            description="Total $ spent in last 30 days"
        ),
        Field(
            name="transaction_count_7d",
            dtype=Int64,
            description="Number of transactions in last 7 days"
        ),
        Field(
            name="avg_basket_size_30d",
            dtype=Float64,
            description="Average basket size in last 30 days"
        ),
        Field(
            name="days_since_last_purchase",
            dtype=Int64,
            description="Days since most recent purchase"
        ),
        Field(
            name="preferred_payment_method",
            dtype=String,
            description="Most used payment method (last 90 days)"
        ),
        Field(
            name="is_premium_customer",
            dtype=Int64,
            description="1 if customer spent >$5000 in last year"
        )
    ],
    source=ventas_offline_source,
    online=True,  # Enable online serving
    owner="ventas-team@company.com",
    tags={"domain": "ventas", "pii": "no"}
)

# ═══════════════════════════════════════════════════════
# PRODUCTO DOMAIN FEATURES
# ═══════════════════════════════════════════════════════

producto_offline_source = FileSource(
    name="producto_features_source",
    path="s3://mesh/producto/curated/features/",
    timestamp_field="event_timestamp"
)

producto_features = FeatureView(
    name="producto_features",
    description="Product catalog and pricing (Producto domain)",
    entities=[product],
    ttl=timedelta(days=30),
    schema=[
        Field(name="current_price", dtype=Float64),
        Field(name="category", dtype=String),
        Field(name="stock_level", dtype=Int64),
        Field(name="days_since_launch", dtype=Int64),
        Field(name="avg_rating", dtype=Float64),
        Field(name="total_reviews", dtype=Int64)
    ],
    source=producto_offline_source,
    online=True,
    owner="catalog-team@company.com",
    tags={"domain": "producto"}
)

# ═══════════════════════════════════════════════════════
# LOGISTICA DOMAIN FEATURES
# ═══════════════════════════════════════════════════════

logistica_offline_source = FileSource(
    name="logistica_features_source",
    path="s3://mesh/logistica/curated/features/",
    timestamp_field="event_timestamp"
)

logistica_features = FeatureView(
    name="logistica_features",
    description="Delivery performance (Logistica domain)",
    entities=[customer],
    ttl=timedelta(days=60),
    schema=[
        Field(name="avg_delivery_time_days", dtype=Float64),
        Field(name="on_time_delivery_rate", dtype=Float64),
        Field(name="total_shipments_30d", dtype=Int64),
        Field(name="return_rate_30d", dtype=Float64)
    ],
    source=logistica_offline_source,
    online=True,
    owner="fulfillment-team@company.com",
    tags={"domain": "logistica"}
)
```

**Feature Registry Configuration:**

```yaml
# feature_store.yaml
project: ecommerce_mesh
registry: s3://mesh/feast/registry.db
provider: aws
online_store:
  type: redis
  connection_string: "redis.ecommerce.internal:6379"
  
offline_store:
  type: file  # or 'snowflake', 'bigquery', 'redshift'
  
entity_key_serialization_version: 2
```

**Apply Changes:**

```bash
# Deploy features to registry
feast apply

# Output:
# ✅ Created entity customer_id
# ✅ Created entity product_id
# ✅ Created feature view ventas_customer_features
# ✅ Created feature view producto_features
# ✅ Created feature view logistica_features
```

---

**4. Feature Materialization (Offline → Online)**

```python
# materialize_features.py
from feast import FeatureStore
from datetime import datetime, timedelta

store = FeatureStore(repo_path="feature_repo/")

# Materialize to the online store (Redis) everything since the last
# materialization run (on the first run, Feast looks back up to each view's ttl)
store.materialize_incremental(
    end_date=datetime.utcnow()
)

# Output:
# Materializing 3 feature views from 2024-01-08 to 2024-01-15
# ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:01:23
# ✅ ventas_customer_features: 1.2M rows materialized to Redis
# ✅ producto_features: 50K rows materialized
# ✅ logistica_features: 800K rows materialized
```

**Airflow DAG for Incremental Materialization:**

```python
# dags/feast_materialization.py
from airflow.decorators import dag, task
from datetime import datetime, timedelta

@dag(
    schedule="0 */4 * * *",  # Every 4 hours
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=["feast", "feature-store"]
)
def feast_materialization():
    
    @task
    def materialize_features():
        from feast import FeatureStore
        
        store = FeatureStore(repo_path="/opt/airflow/feature_repo/")
        
        # Materialize incremental
        store.materialize_incremental(end_date=datetime.utcnow())
        
        print("✅ Features materialized to online store")
    
    materialize_features()

dag = feast_materialization()
```
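
Conceptually, materialization copies the most recent feature row per entity from the offline history into a key-value lookup. The pandas sketch below illustrates that idea only. It is not Feast's implementation, and the window boundaries, the sample data, and `latest_per_entity` are assumptions.

```python
import pandas as pd

# Hypothetical offline history: several feature rows per customer over time
offline = pd.DataFrame({
    "customer_id": [1001, 1001, 1002, 1003],
    "event_timestamp": pd.to_datetime(
        ["2024-01-10", "2024-01-14", "2024-01-12", "2024-01-01"]
    ),
    "total_purchases_30d": [100.0, 150.0, 80.0, 10.0],
})


def latest_per_entity(offline_df, entity_col, start, end):
    """Keep the newest row per entity with start < event_timestamp <= end."""
    in_window = offline_df[
        (offline_df["event_timestamp"] > start)
        & (offline_df["event_timestamp"] <= end)
    ]
    latest = in_window.sort_values("event_timestamp").groupby(entity_col).tail(1)
    # A dict keyed by entity stands in for the online key-value store
    return (
        latest.set_index(entity_col)
        .drop(columns=["event_timestamp"])
        .to_dict(orient="index")
    )


online = latest_per_entity(
    offline,
    "customer_id",
    start=pd.Timestamp("2024-01-08"),
    end=pd.Timestamp("2024-01-15"),
)
print(online)
```

Customer 1003 has no row inside the window, so it is absent from the lookup. This is why serving code has to tolerate missing features.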

---

**5. Training: Historical Features (Offline Store)**

```python
# ml/train_churn_model.py
from feast import FeatureStore
import pandas as pd
from datetime import datetime, timedelta

store = FeatureStore(repo_path="feature_repo/")

# Training dataset: customers who churned in last 90 days
entity_df = pd.DataFrame({
    "customer_id": [1001, 1002, 1003, ...],
    "event_timestamp": [
        datetime(2024, 1, 1),
        datetime(2024, 1, 2),
        datetime(2024, 1, 3),
        ...
    ],
    "churned": [1, 0, 1, ...]  # Label
})

# Get historical features (point-in-time correct)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "ventas_customer_features:total_purchases_30d",
        "ventas_customer_features:transaction_count_7d",
        "ventas_customer_features:avg_basket_size_30d",
        "ventas_customer_features:days_since_last_purchase",
        "logistica_features:on_time_delivery_rate",
        "logistica_features:return_rate_30d"
    ]
).to_df()

print(training_df.head())
# customer_id  event_timestamp  total_purchases_30d  transaction_count_7d  ...  churned
# 1001         2024-01-01       542.30               3                     ...  1
# 1002         2024-01-02       1250.80              8                     ...  0

# Train model
from sklearn.ensemble import RandomForestClassifier

X = training_df.drop(columns=["customer_id", "event_timestamp", "churned"])
y = training_df["churned"]

model = RandomForestClassifier(n_estimators=100)
model.fit(X, y)

# Save model
import joblib
joblib.dump(model, "churn_model_v1.pkl")
```

**Point-in-Time Correctness:**

```python
# ⚠️ Problem without feature store: Data leakage
# If we compute features at training time using current data,
# we're using information from the future!

# Example: Customer churned on 2024-01-15
# Feature: total_purchases_30d on 2024-01-15
# ❌ Wrong: Using purchases from 2024-01-15 to 2024-02-14 (includes future)
# ✅ Correct: Using purchases from 2023-12-16 to 2024-01-15 (only past)

# Feast handles this automatically with event_timestamp
```
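
The point-in-time join can be illustrated without a feature store using `pandas.merge_asof`, which picks, for each label row, the latest feature row at or before the label's timestamp. This is a sketch of the concept with invented data, not a description of how Feast implements it. The `tolerance` argument plays a role similar to the feature view's `ttl`.

```python
import pandas as pd

# Hypothetical feature history for one customer
features = pd.DataFrame({
    "customer_id": [1001, 1001, 1001],
    "event_timestamp": pd.to_datetime(["2023-12-20", "2024-01-10", "2024-01-20"]),
    "total_purchases_30d": [300.0, 542.3, 0.0],
})

# Label rows: the timestamp is the moment the prediction would have been made
labels = pd.DataFrame({
    "customer_id": [1001, 1001],
    "event_timestamp": pd.to_datetime(["2023-12-25", "2024-01-15"]),
    "churned": [0, 1],
})

training = pd.merge_asof(
    labels.sort_values("event_timestamp"),
    features.sort_values("event_timestamp"),
    on="event_timestamp",
    by="customer_id",
    direction="backward",            # only look at feature rows from the past
    tolerance=pd.Timedelta(days=90)  # ignore feature rows older than 90 days
)

# The 2024-01-20 feature row is never used: it lies after both label timestamps
print(training)
```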

---

**6. Serving: Real-Time Features (Online Store)**

```python
# api/prediction_service.py
from fastapi import FastAPI
from feast import FeatureStore
from pydantic import BaseModel
import joblib

app = FastAPI(title="Churn Prediction API")

# Load model and feature store
model = joblib.load("churn_model_v1.pkl")
store = FeatureStore(repo_path="feature_repo/")

class PredictionRequest(BaseModel):
    customer_id: int

class PredictionResponse(BaseModel):
    customer_id: int
    churn_probability: float
    risk_level: str

@app.post("/predict", response_model=PredictionResponse)
def predict_churn(request: PredictionRequest):
    # Get features from online store (Redis) - low-millisecond latency
    features_dict = store.get_online_features(
        features=[
            "ventas_customer_features:total_purchases_30d",
            "ventas_customer_features:transaction_count_7d",
            "ventas_customer_features:avg_basket_size_30d",
            "ventas_customer_features:days_since_last_purchase",
            "logistica_features:on_time_delivery_rate",
            "logistica_features:return_rate_30d"
        ],
        entity_rows=[{"customer_id": request.customer_id}]
    ).to_dict()
    
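    # Convert to a DataFrame for the model. to_dict() also returns the entity
    # key (customer_id), so select only the columns the model was trained on,
    # in training order (feature_names_in_ is set when fitting on a DataFrame).
    import pandas as pd
    features_df = pd.DataFrame(features_dict)[list(model.feature_names_in_)]
    
    # Predict (cast the NumPy float to a plain float for the response model)
    churn_prob = float(model.predict_proba(features_df)[0][1])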
    
    # Classify risk
    if churn_prob > 0.7:
        risk_level = "HIGH"
    elif churn_prob > 0.4:
        risk_level = "MEDIUM"
    else:
        risk_level = "LOW"
    
    return PredictionResponse(
        customer_id=request.customer_id,
        churn_probability=churn_prob,
        risk_level=risk_level
    )

# Latency: ~10ms (Redis lookup + model inference)
```
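
Two pieces of the endpoint are easy to factor out and unit-test on their own: the risk thresholds and a guard for features that come back empty. Online lookups typically return `None` for an entity that was never materialized or whose values have expired. The sketch below uses hypothetical helper names (`classify_risk`, `missing_features`) and assumes the dictionary shape produced by `to_dict()`, where each key maps to a list with one value per entity row.

```python
def classify_risk(churn_prob: float) -> str:
    """Map a churn probability to the risk bands used by the endpoint."""
    if churn_prob > 0.7:
        return "HIGH"
    if churn_prob > 0.4:
        return "MEDIUM"
    return "LOW"


def missing_features(features_dict: dict, entity_key: str = "customer_id") -> list:
    """Names of features whose value for the first entity row is None."""
    return sorted(
        name
        for name, values in features_dict.items()
        if name != entity_key and values[0] is None
    )


# Hypothetical lookup result for a customer with no logistics history
lookup = {
    "customer_id": [1001],
    "total_purchases_30d": [542.3],
    "on_time_delivery_rate": [None],
}

print(missing_features(lookup))
print(classify_risk(0.82), classify_risk(0.55), classify_risk(0.10))
```

An endpoint could return an error or fall back to default values when `missing_features` is non-empty, instead of passing `None` values to the model.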

---

**7. Feature Engineering Pipeline**

```python
# pipelines/compute_ventas_features.py
"""
Airflow DAG: Compute Ventas domain features daily
"""
from airflow.decorators import dag, task
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from datetime import datetime, timedelta

@dag(schedule="@daily", start_date=datetime(2024, 1, 1))
def compute_ventas_features():
    
    @task
    def compute_features(ds):
        spark = SparkSession.builder.appName("VentasFeatures").getOrCreate()
        
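        # Read transactions from the 90 days up to the run date; the upper bound
        # keeps backfills from seeing transactions that happened after `ds`
        transactions = spark.read.format("delta").load("s3://mesh/ventas/curated/transactions/")
        transactions = transactions.filter(
            (col("transaction_date") >= date_sub(lit(ds), 90))
            & (col("transaction_date") <= lit(ds))
        )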
        
        # Compute features
        features = transactions.groupBy("customer_id").agg(
            # 7-day window
            sum(when(col("transaction_date") >= date_sub(lit(ds), 7), col("amount")).otherwise(0))
                .alias("total_purchases_7d"),
            count(when(col("transaction_date") >= date_sub(lit(ds), 7), 1))
                .alias("transaction_count_7d"),
            
            # 30-day window
            sum(when(col("transaction_date") >= date_sub(lit(ds), 30), col("amount")).otherwise(0))
                .alias("total_purchases_30d"),
            avg(when(col("transaction_date") >= date_sub(lit(ds), 30), col("basket_size")))
                .alias("avg_basket_size_30d"),
            
            # Recency
            datediff(lit(ds), max(col("transaction_date")))
                .alias("days_since_last_purchase"),
            
            # Most common payment method (last 90 days).
            # mode() requires Spark 3.4+; first() would return an arbitrary row's value
            mode(col("payment_method"))
                .alias("preferred_payment_method")
        )
        
        # Add metadata
        features = features \
            .withColumn("event_timestamp", lit(ds).cast("timestamp")) \
            .withColumn("created_timestamp", current_timestamp())
        
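        # Write to feature store offline path (Feast reads this).
        # Dynamic partition overwrite replaces only this run's partition; a plain
        # overwrite would delete the history needed for point-in-time joins.
        features.write \
            .format("parquet") \
            .mode("overwrite") \
            .option("partitionOverwriteMode", "dynamic") \
            .partitionBy("event_timestamp") \
            .save("s3://mesh/ventas/curated/features/")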
        
        print(f"✅ Computed features for {features.count()} customers")
    
    compute_features()

dag = compute_ventas_features()
```
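
A small pandas reference implementation is a convenient way to unit-test the window logic without a Spark cluster. The sketch below reproduces three of the aggregations on invented data. `reference_features` is a hypothetical helper, and capping transactions at the run date is an assumption about the intended semantics.

```python
import pandas as pd


def reference_features(transactions: pd.DataFrame, ds: str) -> pd.DataFrame:
    """Compute 7-day spend, 7-day count, and recency per customer as of `ds`."""
    as_of = pd.Timestamp(ds)

    # Same 90-day lookback as the pipeline, capped at the run date
    tx = transactions[
        (transactions["transaction_date"] <= as_of)
        & (transactions["transaction_date"] >= as_of - pd.Timedelta(days=90))
    ]
    last_7d = tx[tx["transaction_date"] >= as_of - pd.Timedelta(days=7)]
    amounts_7d = last_7d.groupby("customer_id")["amount"]
    last_purchase = tx.groupby("customer_id")["transaction_date"].max()

    out = pd.DataFrame({
        "total_purchases_7d": amounts_7d.sum(),
        "transaction_count_7d": amounts_7d.count(),
        "days_since_last_purchase": (as_of - last_purchase).dt.days,
    })

    # Customers with no purchases in the last 7 days get zeros, not NaN
    window_cols = ["total_purchases_7d", "transaction_count_7d"]
    out[window_cols] = out[window_cols].fillna(0)
    out["transaction_count_7d"] = out["transaction_count_7d"].astype(int)
    return out


# Hypothetical transactions, including one dated after the run date
transactions = pd.DataFrame({
    "customer_id": [1001, 1001, 1001, 1001, 1002],
    "transaction_date": pd.to_datetime(
        ["2024-01-14", "2024-01-10", "2023-12-01", "2024-01-20", "2023-12-20"]
    ),
    "amount": [100.0, 50.0, 200.0, 999.0, 30.0],
})

result = reference_features(transactions, "2024-01-15")
print(result)
```

Comparing this output with the Spark job's output on the same fixture is one way to detect training/serving skew early.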

---

**8. Feature Versioning and Monitoring**

```python
# monitoring/feature_quality.py
from feast import FeatureStore
import great_expectations as gx

store = FeatureStore(repo_path="feature_repo/")

# Get recent features
features_df = store.get_historical_features(
    entity_df=...,
    features=["ventas_customer_features:total_purchases_30d"]
).to_df()

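# Validate with Great Expectations (legacy 0.x API; GX 1.x renamed these calls)
from great_expectations.core import ExpectationConfiguration

context = gx.get_context()

suite = context.add_expectation_suite("ventas_features_quality")
suite.add_expectation(
    ExpectationConfiguration(
        expectation_type="expect_column_values_to_be_between",
        kwargs={"column": "total_purchases_30d", "min_value": 0, "max_value": 100000}
    )
)
suite.add_expectation(
    ExpectationConfiguration(
        expectation_type="expect_column_values_to_not_be_null",
        kwargs={"column": "total_purchases_30d"}
    )
)

# Bind the suite to the validator so its expectations are the ones evaluated
results = context.get_validator(
    batch_request=...,
    expectation_suite=suite
).validate()

if not results.success:
    # Alert: feature quality degraded
    # (send_slack_alert is a placeholder for your own alerting hook)
    send_slack_alert("Feature quality issue in ventas_customer_features")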
```

---

**9. Tecton vs Feast Comparison**

| Feature | Feast (Open Source) | Tecton (Commercial) |
|---------|---------------------|---------------------|
| **Cost** | Free | $$$$ (enterprise pricing) |
| **Deployment** | Self-managed | Fully managed SaaS |
| **Online Store** | Redis, DynamoDB | Managed Redis + optimizations |
| **Offline Store** | S3, BigQuery, Snowflake | Snowflake, Databricks |
| **Real-time** | Streaming via custom code | Native streaming pipelines |
| **Feature Engineering** | External (Spark, dbt) | Declarative transformations |
| **Monitoring** | Custom (GE, Prometheus) | Built-in (drift, quality) |
| **Lineage** | Manual (DataHub) | Automatic |
| **Support** | Community | Enterprise SLA |

**Decision Matrix:**
- **Feast**: Startups, cost-sensitive, engineering resources available
- **Tecton**: Enterprises, need SLA, limited ML engineering team

---

### 📜 **Data Products as APIs: Contracts, Versioning & SLOs**

**1. Data Contract: API-First Design**

```yaml
# contracts/ventas-daily-revenue-api-v2.yaml
openapi: 3.0.3
info:
  title: Daily Revenue API
  version: 2.1.0
  description: |
    Provides daily revenue metrics aggregated by region.
    
    **Owner**: ventas-team@company.com
    **SLO**: 99.9% availability, p99 latency <15 minutes
    **Changelog**: 
      - v2.1.0 (2024-01): Added refunds_count field
      - v2.0.0 (2023-12): Breaking - renamed 'total' to 'revenue_gross'
  
  contact:
    name: Ventas Data Team
    email: ventas-team@company.com
    url: https://wiki.company.com/data-products/ventas-revenue
  
  x-slo:
    availability: 99.9%
    latency_p99: 900  # seconds (15 min)
    freshness_max: 1200  # seconds (20 min)
  
  x-domain: ventas
  x-cost-center: sales
  x-tier: gold  # bronze/silver/gold

servers:
  - url: https://api.company.com/ventas/v2
    description: Production
  - url: https://api-staging.company.com/ventas/v2
    description: Staging

paths:
  /daily-revenue:
    get:
      summary: Get daily revenue by region
      operationId: getDailyRevenue
      tags:
        - Revenue
      
      parameters:
        - name: date
          in: query
          required: true
          schema:
            type: string
            format: date
            example: "2024-01-15"
          description: Revenue date (YYYY-MM-DD)
        
        - name: region
          in: query
          required: false
          schema:
            type: string
            enum: [LATAM, NA, EU, APAC, ALL]
            default: ALL
          description: Filter by region
      
      responses:
        '200':
          description: Successful response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/DailyRevenueResponse'
              examples:
                single_region:
                  summary: Single region
                  value:
                    date: "2024-01-15"
                    region: "LATAM"
                    revenue_gross: 125000.50
                    revenue_net: 118000.30
                    transactions_count: 1543
                    refunds_count: 23
                    currency: "USD"
                    calculated_at: "2024-01-15T10:30:00Z"
        
        '400':
          description: Invalid date format
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
        
        '404':
          description: No data for requested date
        
        '503':
          description: Service temporarily unavailable (SLO violation)
      
      security:
        - ApiKeyAuth: []
      
      x-rate-limit:
        requests_per_minute: 100
        requests_per_hour: 5000

components:
  schemas:
    DailyRevenueResponse:
      type: object
      required:
        - date
        - region
        - revenue_gross
        - revenue_net
        - transactions_count
        - currency
        - calculated_at
      properties:
        date:
          type: string
          format: date
          description: Revenue date
        region:
          type: string
          enum: [LATAM, NA, EU, APAC, ALL]
          description: Geographic region
        revenue_gross:
          type: number
          format: double
          minimum: 0
          description: Total revenue before refunds/discounts
          example: 125000.50
        revenue_net:
          type: number
          format: double
          minimum: 0
          description: Net revenue after refunds/discounts
          example: 118000.30
        transactions_count:
          type: integer
          minimum: 0
          description: Number of transactions
          example: 1543
        refunds_count:
          type: integer
          minimum: 0
          description: Number of refunded transactions (added v2.1)
          example: 23
        currency:
          type: string
          enum: [USD, EUR, BRL, MXN]
          description: Currency code
          example: "USD"
        calculated_at:
          type: string
          format: date-time
          description: Timestamp when metrics were calculated
          example: "2024-01-15T10:30:00Z"
        metadata:
          type: object
          properties:
            data_quality_score:
              type: number
              description: Quality score (0-1)
              example: 0.997
            source_systems:
              type: array
              items:
                type: string
              example: ["kafka_transactions", "sftp_refunds"]
    
    Error:
      type: object
      required:
        - error_code
        - message
      properties:
        error_code:
          type: string
          example: "INVALID_DATE_FORMAT"
        message:
          type: string
          example: "Date must be in YYYY-MM-DD format"
        details:
          type: object
  
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: X-API-Key
      description: Service account API key
```
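
The contract above can also be checked in code. The following is a minimal sketch, assuming a hand-copied fragment of the `DailyRevenueResponse` schema and a hypothetical helper named `contract_violations`. It covers only `required`, `type`, `enum`, and `minimum`, so it is not a substitute for a full JSON Schema validator.

```python
# Hand-copied subset of components.schemas.DailyRevenueResponse (assumption:
# kept in sync with the YAML contract manually or by a loader not shown here).
SCHEMA = {
    "required": ["date", "region", "revenue_gross", "revenue_net",
                 "transactions_count", "currency", "calculated_at"],
    "properties": {
        "region": {"type": "string", "enum": ["LATAM", "NA", "EU", "APAC", "ALL"]},
        "revenue_gross": {"type": "number", "minimum": 0},
        "revenue_net": {"type": "number", "minimum": 0},
        "transactions_count": {"type": "integer", "minimum": 0},
        "refunds_count": {"type": "integer", "minimum": 0},
        "currency": {"type": "string", "enum": ["USD", "EUR", "BRL", "MXN"]},
    },
}

TYPE_MAP = {"string": str, "integer": int, "number": (int, float)}


def contract_violations(payload: dict, schema: dict = SCHEMA) -> list:
    """Return human-readable violations; an empty list means the payload passes."""
    errors = []
    for name in schema["required"]:
        if name not in payload:
            errors.append(f"missing required field: {name}")
    for name, rules in schema["properties"].items():
        if name not in payload:
            continue  # optional field that was not sent
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, TYPE_MAP[rules["type"]]):
            errors.append(f"{name}: expected {rules['type']}")
            continue
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{name}: {value!r} not in enum")
        if "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{name}: {value} below minimum {rules['minimum']}")
    return errors


example = {
    "date": "2024-01-15", "region": "LATAM", "revenue_gross": 125000.50,
    "revenue_net": 118000.30, "transactions_count": 1543, "refunds_count": 23,
    "currency": "USD", "calculated_at": "2024-01-15T10:30:00Z",
}
print(contract_violations(example))
```

A consumer could run a check like this in its own CI to detect contract drift before it reaches production.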

---

**2. Schema Evolution: Backward Compatibility**

```python
# schema_evolution.py
"""
Rules for backward-compatible schema changes
"""

# ✅ SAFE (Backward Compatible):
# 1. Add optional field
{
    "date": "2024-01-15",
    "revenue_gross": 125000.50,
    "refunds_count": 23  # NEW OPTIONAL FIELD (v2.1)
}

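# 2. Add new enum value
#    Safe for request parameters. For response fields it is only safe when
#    consumers tolerate unknown values (strict enum validation will reject them).
region: ["LATAM", "NA", "EU", "APAC", "ALL", "AFRICA"]  # NEW VALUE

# 3. Widen validation (less restrictive)
# Old: revenue_gross: minimum 100
# New: revenue_gross: minimum 0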

# ❌ BREAKING (Not Backward Compatible):
# 1. Remove field
# Old: {"total": 125000.50}
# New: {}  # 'total' removed

# 2. Rename field
# Old: {"total": 125000.50}
# New: {"revenue_gross": 125000.50}  # 'total' renamed

# 3. Change type
# Old: {"transactions_count": 1543}  # integer
# New: {"transactions_count": "1543"}  # string

# 4. Make field required
# Old: {"date": "2024-01-15"}  # refunds_count optional
# New: {"date": "2024-01-15", "refunds_count": 23}  # required

# 5. Narrow validation (more restrictive)
# Old: revenue_gross: minimum 0
# New: revenue_gross: minimum 100
```
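
The rules above can be applied mechanically. This is a rough sketch, assuming schemas are plain dicts shaped like `{"required": [...], "properties": {name: {"type": ...}}}` and using a hypothetical function name `classify_changes`. It covers only the field-level rules (add, remove, type change, required); enum and range changes are left out.

```python
def classify_changes(old: dict, new: dict) -> dict:
    """Classify differences between two schema versions as breaking or safe."""
    breaking, safe = [], []
    old_props, new_props = old["properties"], new["properties"]
    old_req = set(old.get("required", []))
    new_req = set(new.get("required", []))

    for name in old_props:
        if name not in new_props:
            # A rename shows up as a removal plus an addition
            breaking.append(f"removed field: {name}")
        elif old_props[name].get("type") != new_props[name].get("type"):
            breaking.append(f"type changed: {name}")

    for name in sorted(new_props.keys() - old_props.keys()):
        if name in new_req:
            breaking.append(f"added required field: {name}")
        else:
            safe.append(f"added optional field: {name}")

    for name in sorted((new_req - old_req) & old_props.keys()):
        breaking.append(f"field became required: {name}")

    return {"breaking": breaking, "safe": safe}


v2_0 = {
    "required": ["date", "revenue_gross"],
    "properties": {"date": {"type": "string"}, "revenue_gross": {"type": "number"}},
}
v2_1 = {
    "required": ["date", "revenue_gross"],
    "properties": {
        "date": {"type": "string"},
        "revenue_gross": {"type": "number"},
        "refunds_count": {"type": "integer"},
    },
}
print(classify_changes(v2_0, v2_1))
```

A CI gate could fail the build whenever `breaking` is non-empty and the major version has not been bumped.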

**Breaking Change Protocol:**

````markdown
# Breaking Change Checklist

## Before Implementation
- [ ] Document breaking change in CHANGELOG
- [ ] Notify consumers 30 days in advance (email, Slack)
- [ ] Update API version (v2 → v3)
- [ ] Maintain old version for deprecation period (6 months)

## Implementation
- [ ] Deploy new version (v3) alongside old (v2)
- [ ] Monitor usage of old version
- [ ] Provide migration guide with examples

## Deprecation
- [ ] Mark old version as deprecated (HTTP header: `Deprecation: true`)
- [ ] Return warning in responses (Sunset header)
- [ ] Send reminder emails at 90, 60, 30, 7 days before sunset
- [ ] Sunset date: Remove old version after 6 months

## Example Response Headers:
```http
HTTP/1.1 200 OK
Deprecation: true
Sunset: Mon, 15 Jul 2024 00:00:00 GMT
Link: <https://api.company.com/ventas/v3/daily-revenue>; rel="successor-version"
Warning: 299 - "This API version will be retired on 2024-07-15. Migrate to v3"
```
````
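
The deprecation headers and reminder schedule from the checklist can be derived from a single sunset date. The sketch below uses only the standard library; the function names `deprecation_headers` and `reminder_dates` are illustrative assumptions, not part of any framework.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Reminder offsets from the checklist: 90, 60, 30 and 7 days before sunset
REMINDER_DAYS = (90, 60, 30, 7)


def deprecation_headers(sunset: datetime, successor_url: str) -> dict:
    """Build response headers for a deprecated API version.

    `sunset` must be timezone-aware UTC so it can be rendered as an HTTP date.
    """
    return {
        "Deprecation": "true",
        "Sunset": format_datetime(sunset, usegmt=True),
        "Link": f'<{successor_url}>; rel="successor-version"',
    }


def reminder_dates(sunset: datetime) -> list:
    """Dates on which reminder emails should go out."""
    return [(sunset - timedelta(days=days)).date() for days in REMINDER_DAYS]


sunset = datetime(2024, 7, 15, tzinfo=timezone.utc)
headers = deprecation_headers(
    sunset, "https://api.company.com/ventas/v3/daily-revenue"
)
print(headers["Sunset"])
print([d.isoformat() for d in reminder_dates(sunset)])
```

Generating the HTTP date with `format_datetime` avoids hand-typed weekday mistakes in the `Sunset` header.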

---

**3. API Implementation with FastAPI**

```python
# api/ventas_api.py
from fastapi import FastAPI, Query, HTTPException, Header, Depends
from pydantic import BaseModel, Field
from datetime import date, datetime
from typing import Optional, Literal
import boto3
from prometheus_client import Counter, Histogram

app = FastAPI(
    title="Daily Revenue API",
    version="2.1.0",
    description="Ventas domain data product",
    openapi_tags=[
        {"name": "Revenue", "description": "Revenue metrics operations"}
    ]
)

# Prometheus metrics
request_count = Counter('api_requests_total', 'Total requests', ['endpoint', 'status'])
request_duration = Histogram('api_request_duration_seconds', 'Request duration', ['endpoint'])

# ═══════════════════════════════════════════════════════
# MODELS
# ═══════════════════════════════════════════════════════

class DailyRevenueResponse(BaseModel):
    date: date
    region: Literal["LATAM", "NA", "EU", "APAC", "ALL"]
    revenue_gross: float = Field(..., ge=0, description="Revenue before refunds")
    revenue_net: float = Field(..., ge=0, description="Net revenue")
    transactions_count: int = Field(..., ge=0)
    refunds_count: int = Field(default=0, ge=0, description="New in v2.1")
    currency: Literal["USD", "EUR", "BRL", "MXN"]
    calculated_at: datetime
    
    # Pydantic v1 style; Pydantic v2 uses model_config = {"json_schema_extra": {...}}
    class Config:
        schema_extra = {
            "example": {
                "date": "2024-01-15",
                "region": "LATAM",
                "revenue_gross": 125000.50,
                "revenue_net": 118000.30,
                "transactions_count": 1543,
                "refunds_count": 23,
                "currency": "USD",
                "calculated_at": "2024-01-15T10:30:00Z"
            }
        }

# ═══════════════════════════════════════════════════════
# DEPENDENCIES
# ═══════════════════════════════════════════════════════

def verify_api_key(x_api_key: str = Header(...)):
    """Verify API key from header"""
    # In production: check against Secrets Manager
    valid_keys = ["dev-key-123", "prod-key-456"]
    if x_api_key not in valid_keys:
        raise HTTPException(status_code=401, detail="Invalid API key")
    return x_api_key

def get_s3_client():
    """Dependency: S3 client"""
    return boto3.client('s3', region_name='us-east-1')

# ═══════════════════════════════════════════════════════
# ENDPOINTS
# ═══════════════════════════════════════════════════════

@app.get(
    "/daily-revenue",
    response_model=DailyRevenueResponse,
    tags=["Revenue"],
    summary="Get daily revenue by region",
    responses={
        200: {"description": "Successful response"},
        400: {"description": "Invalid date format"},
        404: {"description": "No data for requested date"},
        503: {"description": "Service unavailable (SLO violation)"}
    }
)
async def get_daily_revenue(
    date: date = Query(..., description="Revenue date (YYYY-MM-DD)"),
    region: Literal["LATAM", "NA", "EU", "APAC", "ALL"] = Query("ALL"),
    api_key: str = Depends(verify_api_key),
    s3: boto3.client = Depends(get_s3_client)
):
    """
    Retrieve daily revenue metrics for a specific date and region.
    
    **Data Freshness**: Updated every 15 minutes
    **SLO**: p99 latency <15 minutes from transaction to API
    """
    with request_duration.labels(endpoint="/daily-revenue").time():
        try:
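            # Query data from S3 (or cache)
            s3_key = f"gold/revenue/daily/dt={date}/region={region}.parquet"
            
            response = s3.get_object(
                Bucket='mesh',
                Key=s3_key
            )
            
            # Parse Parquet (simplified). The S3 StreamingBody is not seekable,
            # so buffer it in memory before handing it to the Parquet reader.
            import io
            import pandas as pd
            df = pd.read_parquet(io.BytesIO(response['Body'].read()))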
            
            if df.empty:
                request_count.labels(endpoint="/daily-revenue", status=404).inc()
                raise HTTPException(status_code=404, detail="No data for requested date")
            
            # Convert to response model
            row = df.iloc[0]
            result = DailyRevenueResponse(
                date=date,
                region=region,
                revenue_gross=float(row['revenue_gross']),
                revenue_net=float(row['revenue_net']),
                transactions_count=int(row['transactions_count']),
                refunds_count=int(row.get('refunds_count', 0)),  # Default for old data
                currency=row['currency'],
                calculated_at=row['calculated_at']
            )
            
            request_count.labels(endpoint="/daily-revenue", status=200).inc()
            return result
            
        except s3.exceptions.NoSuchKey:
            request_count.labels(endpoint="/daily-revenue", status=404).inc()
            raise HTTPException(status_code=404, detail=f"No data for {date}")
        
        except HTTPException:
            # Re-raise the 404 raised above; otherwise the generic handler
            # below would turn it into a 500
            raise
        
        except Exception as e:
            request_count.labels(endpoint="/daily-revenue", status=500).inc()
            raise HTTPException(status_code=500, detail=str(e))

@app.get("/health")
async def health_check():
    """Health check endpoint for load balancer"""
    return {"status": "healthy", "version": "2.1.0"}

@app.get("/metrics")
async def metrics():
    """Prometheus metrics endpoint"""
    from prometheus_client import generate_latest, CONTENT_TYPE_LATEST
    from fastapi import Response
    
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)
```
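
The `verify_api_key` dependency above compares against a hard-coded list and defers the production lookup to a secrets manager. As one possible direction, the sketch below stores only SHA-256 digests of keys and compares them in constant time. The key store `KEY_DIGESTS`, the key value, and the function names are all hypothetical; loading digests from a secrets manager is left out.

```python
import hashlib
import hmac
from typing import Optional


def hash_key(raw_key: str) -> str:
    """Digest of an API key, so raw keys never need to be stored."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


# Hypothetical store: caller name -> key digest. In a real deployment these
# digests would be loaded from a secrets manager, not defined in code.
KEY_DIGESTS = {"monitoring": hash_key("example-monitoring-key")}


def identify_caller(presented_key: str, key_digests: dict = KEY_DIGESTS) -> Optional[str]:
    """Return the caller name for a valid key, or None if no key matches."""
    presented = hash_key(presented_key)
    caller = None
    for name, digest in key_digests.items():
        # compare_digest takes time independent of where the strings differ
        if hmac.compare_digest(presented, digest):
            caller = name
    return caller


print(identify_caller("example-monitoring-key"))
print(identify_caller("wrong-key"))
```

Returning the caller name rather than a boolean would also let the API label request metrics per consumer.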

---

**4. SLO Monitoring**

```python
# monitoring/slo_monitor.py
"""
Monitor SLOs for data products
"""
import logging
import time
from datetime import datetime, timezone

import requests
from prometheus_client import Gauge, start_http_server

logger = logging.getLogger(__name__)

# SLO metrics
slo_availability = Gauge('data_product_availability', 'Availability %', ['product', 'domain'])
slo_latency_p99 = Gauge('data_product_latency_p99_seconds', 'p99 latency', ['product', 'domain'])
slo_freshness = Gauge('data_product_freshness_seconds', 'Data age', ['product', 'domain'])

def send_alert(message: str) -> None:
    """Placeholder alert hook; replace with a Slack or PagerDuty integration"""
    logger.warning(message)

def check_availability(api_url: str) -> bool:
    """Check if API is responding (availability SLO)"""
    try:
        response = requests.get(f"{api_url}/health", timeout=5)
        return response.status_code == 200
    except requests.RequestException:
        return False

def check_freshness(api_url: str):
    """Return the age of the latest data in seconds, or None if unknown"""
    try:
        response = requests.get(
            f"{api_url}/daily-revenue",
            params={"date": datetime.now(timezone.utc).date().isoformat()},
            headers={"X-API-Key": "monitoring-key"},
            timeout=10
        )
        if response.status_code != 200:
            return None
        
        data = response.json()
        calculated_at = datetime.fromisoformat(data['calculated_at'].replace('Z', '+00:00'))
        if calculated_at.tzinfo is None:
            # Assume UTC when the API omits the offset
            calculated_at = calculated_at.replace(tzinfo=timezone.utc)
        # Both operands are timezone-aware; subtracting a naive utcnow()
        # from an aware timestamp would raise TypeError
        return (datetime.now(timezone.utc) - calculated_at).total_seconds()
    except (requests.RequestException, KeyError, ValueError):
        return None

def monitor_slos():
    """Continuous SLO monitoring"""
    data_products = [
        {
            "name": "daily_revenue_api",
            "domain": "ventas",
            "url": "https://api.company.com/ventas/v2",
            "slo_availability": 0.999,  # 99.9%
            "slo_latency_p99": 900,  # 15 min
            "slo_freshness": 1200  # 20 min
        },
        {
            "name": "shipment_tracking_api",
            "domain": "logistica",
            "url": "https://api.company.com/logistica/v1",
            "slo_availability": 0.9995,  # 99.95%
            "slo_latency_p99": 300,  # 5 min
            "slo_freshness": 600  # 10 min
        }
    ]
    
    while True:
        for product in data_products:
            # Availability
            is_available = check_availability(product['url'])
            availability_pct = 1.0 if is_available else 0.0
            slo_availability.labels(
                product=product['name'],
                domain=product['domain']
            ).set(availability_pct)
            
            # Freshness
            freshness = check_freshness(product['url'])
            if freshness is not None:
                slo_freshness.labels(
                    product=product['name'],
                    domain=product['domain']
                ).set(freshness)
                
                # Alert if SLO violated
                if freshness > product['slo_freshness']:
                    send_alert(
                        f"🚨 Freshness SLO violated for {product['name']}: "
                        f"{freshness}s > {product['slo_freshness']}s"
                    )
        
        time.sleep(60)  # Check every minute

if __name__ == '__main__':
    start_http_server(8001)
    monitor_slos()
```
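
The monitor declares the `slo_latency_p99` gauge but never sets it. One way to feed it is to keep a window of observed latencies and compute the percentile with the nearest-rank method. This is a minimal sketch; the function name `percentile` and the sample data are illustrative.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    # 1-based rank; multiply before dividing to limit rounding error
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Illustrative latencies in seconds (transaction time to API availability)
latencies = list(range(1, 101))
print(percentile(latencies, 99))
```

Inside `monitor_slos`, the result could be published with `slo_latency_p99.labels(product=..., domain=...).set(percentile(latencies, 99))` and compared against each product's `slo_latency_p99` target.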

**Prometheus Alert Rules:**

```yaml
# alerts/data-product-slos.yml
groups:
  - name: data_product_slos
    interval: 1m
    rules:
      
      - alert: DataProductAvailabilitySLOViolation
        expr: |
          (
            sum_over_time(data_product_availability[30d]) 
            / 
            count_over_time(data_product_availability[30d])
          ) < 0.999
        for: 5m
        labels:
          severity: critical
          team: "{{ $labels.domain }}"
        annotations:
          summary: "{{ $labels.product }} availability below SLO"
          description: "Availability: {{ $value | humanizePercentage }} (target: 99.9%)"
          dashboard: "https://grafana.company.com/d/slo?product={{ $labels.product }}"
      
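      - alert: DataProductFreshnessSLOViolation
        # 1200s is the ventas freshness SLO; products with a tighter target
        # (for example logistica at 600s) need their own rule
        expr: |
          data_product_freshness_seconds > 1200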
        for: 10m
        labels:
          severity: warning
          team: "{{ $labels.domain }}"
        annotations:
          summary: "{{ $labels.product }} data is stale"
          description: "Data age: {{ $value }}s (target: <1200s)"
      
      - alert: DataProductErrorBudgetExhausted
        expr: |
          (
            (1 - avg_over_time(data_product_availability[30d]))
            / 0.001  # 99.9% SLO = 0.1% error budget
          ) > 1
        for: 1h
        labels:
          severity: critical
          team: "{{ $labels.domain }}"
        annotations:
          summary: "{{ $labels.product }} error budget exhausted"
          description: |
            Error budget used: {{ $value | humanizePercentage }}
            Action: Freeze non-critical deployments until recovered
```
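
The error-budget arithmetic behind these rules is worth making explicit: a 99.9% SLO over 30 days leaves about 43 minutes of allowed downtime. The sketch below works that out and computes the fraction of budget consumed from probe counts; the function names are illustrative, and the sample counts assume one probe per minute as in the monitor loop.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window."""
    return (1 - slo) * window_days * 24 * 60


def budget_consumed(up_samples: int, total_samples: int, slo: float) -> float:
    """Fraction of the error budget used (1.0 means fully exhausted)."""
    unavailability = 1 - up_samples / total_samples
    return unavailability / (1 - slo)


# 30 days of one-minute probes = 43,200 samples
print(round(error_budget_minutes(0.999), 1))    # ventas: 99.9%
print(round(error_budget_minutes(0.9995), 1))   # logistica: 99.95%
print(round(budget_consumed(43_157, 43_200, 0.999), 3))
```

With 43 failed probes out of 43,200, the ventas product has used almost all of its monthly budget, which is the point where a deployment freeze would be considered.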

---

**5. Client SDK Generation**

```bash
# Generate Python client from OpenAPI spec
openapi-generator generate \
  -i contracts/ventas-daily-revenue-api-v2.yaml \
  -g python \
  -o clients/python/ventas-api-client \
  --package-name ventas_api_client

# Install the generated client
pip install ./clients/python/ventas-api-client
```

```python
# Client code
from ventas_api_client import ApiClient, Configuration, RevenueApi

config = Configuration()
config.host = "https://api.company.com/ventas/v2"
# Recent generator versions key api_key by the security scheme name from the
# spec (ApiKeyAuth); older versions used the header name (X-API-Key)
config.api_key['ApiKeyAuth'] = "your-api-key"

client = ApiClient(configuration=config)
api = RevenueApi(client)

# Call API
response = api.get_daily_revenue(date="2024-01-15", region="LATAM")
print(f"Revenue: ${response.revenue_net:,.2f}")
```

---

**6. Contract Testing**

```python
# tests/contract_tests.py
"""
Ensure API complies with OpenAPI contract
"""
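import yaml
from fastapi.testclient import TestClient
from api.ventas_api import app, verify_api_key

# "test-key" is not in the API's valid_keys, so bypass key validation in tests.
# The S3 dependency (get_s3_client) also needs a stub or fixture data for
# these tests to run without AWS access.
app.dependency_overrides[verify_api_key] = lambda: "test-key"

client = TestClient(app)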

def load_openapi_spec():
    with open("contracts/ventas-daily-revenue-api-v2.yaml") as f:
        return yaml.safe_load(f)

def test_response_matches_schema():
    """Verify response matches OpenAPI schema"""
    spec = load_openapi_spec()
    
    response = client.get(
        "/daily-revenue?date=2024-01-15&region=LATAM",
        headers={"X-API-Key": "test-key"}
    )
    
    assert response.status_code == 200
    data = response.json()
    
    # Verify required fields
    schema = spec['components']['schemas']['DailyRevenueResponse']
    required_fields = schema['required']
    
    for field in required_fields:
        assert field in data, f"Missing required field: {field}"
    
    # Verify types
    assert isinstance(data['revenue_gross'], (int, float))
    assert isinstance(data['transactions_count'], int)
    assert data['region'] in ["LATAM", "NA", "EU", "APAC", "ALL"]

def test_backward_compatibility():
    """Ensure v2.1 is backward compatible with v2.0"""
    response = client.get(
        "/daily-revenue?date=2024-01-15",
        headers={"X-API-Key": "test-key"}
    )
    
    assert response.status_code == 200
    data = response.json()
    
    # v2.0 clients expect these fields
    assert 'date' in data
    assert 'revenue_gross' in data  # Renamed from 'total' in v2.0
    assert 'transactions_count' in data
    
    # v2.1 added optional field (should have default)
    assert 'refunds_count' in data
```

---

**Author:** Luis J. Raigoso V. (LJRV)

### 🏛️ **Federated Governance: Policies, Observability & Cost at Scale**

**1. Computational Governance Model**

```
┌────────────────────────────────────────────────────────────────┐
│              Governance Federation Model                       │
├────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │  GLOBAL POLICIES (Platform Team)                        │  │
│  ├─────────────────────────────────────────────────────────┤  │
│  │ • Security: IAM, encryption, PII masking                │  │
│  │ • Compliance: GDPR, SOX, PCI-DSS                        │  │
│  │ • Quality: Minimum SLOs (99% availability)              │  │
│  │ • Observability: Mandatory lineage, metrics             │  │
│  │ • Cost: Budget limits per domain                        │  │
│  └─────────────────────────────────────────────────────────┘  │
│                           ▼                                     │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │  AUTOMATED ENFORCEMENT (Policy Engine)                  │  │
│  ├─────────────────────────────────────────────────────────┤  │
│  │ • OPA/Cedar: Runtime policy checks                      │  │
│  │ • pre-commit hooks: Prevent non-compliant code          │  │
│  │ • CI/CD gates: Block deployment if policies fail        │  │
│  └─────────────────────────────────────────────────────────┘  │
│                           ▼                                     │
│  ┌──────────────┬──────────────┬──────────────┬────────────┐  │
│  │ Ventas       │ Logistica    │ Producto     │ Marketing  │  │
│  │ (local impl) │ (local impl) │ (local impl) │ (local)    │  │
│  ├──────────────┼──────────────┼──────────────┼────────────┤  │
│  │ • Spark      │ • dbt        │ • Polars     │ • Airflow  │  │
│  │ • Daily      │ • Hourly     │ • Weekly     │ • Real-time│  │
│  │ • 50 GB/day  │ • 100 GB/day │ • 10 GB/day  │ • 200 GB/d │  │
│  └──────────────┴──────────────┴──────────────┴────────────┘  │
└────────────────────────────────────────────────────────────────┘
```

---

**2. Policy as Code: Open Policy Agent (OPA)**

```rego
# policies/data_mesh_policies.rego
package datamesh

import future.keywords.if
import future.keywords.in

# ═══════════════════════════════════════════════════════
# POLICY 1: PII Must Be Masked in Shared Datasets
# ═══════════════════════════════════════════════════════

deny[msg] if {
    input.dataset.sharing_level == "public"
    some column in input.dataset.columns
    column.contains_pii == true
    not column.is_masked
    
    msg := sprintf(
        "PII column '%s' in dataset '%s' must be masked for public sharing",
        [column.name, input.dataset.name]
    )
}

# ═══════════════════════════════════════════════════════
# POLICY 2: All Data Products Must Have Owner
# ═══════════════════════════════════════════════════════

deny[msg] if {
    not input.data_product.owner
    msg := sprintf(
        "Data product '%s' must have an owner defined",
        [input.data_product.name]
    )
}

deny[msg] if {
    input.data_product.owner
    not endswith(input.data_product.owner, "@company.com")
    msg := sprintf(
        "Data product owner '%s' must be a valid company email",
        [input.data_product.owner]
    )
}

# ═══════════════════════════════════════════════════════
# POLICY 3: Cost Budget Enforcement
# ═══════════════════════════════════════════════════════

deny[msg] if {
    input.domain.monthly_cost > input.domain.budget_limit
    overage := input.domain.monthly_cost - input.domain.budget_limit
    
    msg := sprintf(
        "Domain '%s' exceeded budget: $%.2f over limit of $%.2f",
        [input.domain.name, overage, input.domain.budget_limit]
    )
}

warn[msg] if {
    usage := input.domain.monthly_cost / input.domain.budget_limit
    usage > 0.8
    usage <= 1.0
    
    msg := sprintf(
        "Domain '%s' at %.0f%% of budget ($%.2f / $%.2f)",
        [input.domain.name, usage * 100, input.domain.monthly_cost, input.domain.budget_limit]
    )
}

# ═══════════════════════════════════════════════════════
# POLICY 4: Data Quality SLO Enforcement
# ═══════════════════════════════════════════════════════

deny[msg] if {
    input.data_product.quality_score < 0.95
    
    msg := sprintf(
        "Data product '%s' quality score %.2f%% below minimum 95%%",
        [input.data_product.name, input.data_product.quality_score * 100]
    )
}

# ═══════════════════════════════════════════════════════
# POLICY 5: Lineage Must Be Tracked
# ═══════════════════════════════════════════════════════

deny[msg] if {
    input.pipeline.outputs_to_shared_layer
    not input.pipeline.emits_lineage
    
    msg := sprintf(
        "Pipeline '%s' must emit lineage (OpenLineage) to DataHub",
        [input.pipeline.name]
    )
}

# ═══════════════════════════════════════════════════════
# POLICY 6: API Versioning Required
# ═══════════════════════════════════════════════════════

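deny[msg] if {
    input.api.is_public
    # Accept the version as any path segment, so domain-prefixed paths
    # such as /ventas/v2/daily-revenue also pass
    not regex.match(`(^|/)v[0-9]+(/|$)`, input.api.path)
    
    msg := sprintf(
        "Public API path '%s' must include a version segment (e.g., /v1/)",
        [input.api.path]
    )
}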

# ═══════════════════════════════════════════════════════
# POLICY 7: Cross-Domain Access Requires Approval
# ═══════════════════════════════════════════════════════

deny[msg] if {
    input.access_request.source_domain != input.dataset.owner_domain
    not input.access_request.approved_by_owner
    
    msg := sprintf(
        "Domain '%s' accessing '%s' dataset requires approval from '%s' team",
        [
            input.access_request.source_domain,
            input.dataset.name,
            input.dataset.owner_domain
        ]
    )
}
```
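
For quick table-driven tests of policy inputs, it can help to mirror a couple of rules in plain Python before involving OPA. The sketch below re-implements Policy 1 (PII masking) and Policy 3 (budget) only; it is an illustration for testing input documents, not a replacement for the Rego policies, and the function names are assumptions.

```python
def deny_messages(doc: dict) -> list:
    """Python mirror of Policy 1 and the deny rule of Policy 3."""
    msgs = []
    dataset = doc.get("dataset", {})
    if dataset.get("sharing_level") == "public":
        for col in dataset.get("columns", []):
            if col.get("contains_pii") is True and not col.get("is_masked", False):
                msgs.append(
                    f"PII column '{col['name']}' in dataset '{dataset['name']}' "
                    "must be masked for public sharing"
                )
    domain = doc.get("domain")
    if domain and domain["monthly_cost"] > domain["budget_limit"]:
        overage = domain["monthly_cost"] - domain["budget_limit"]
        msgs.append(
            f"Domain '{domain['name']}' exceeded budget: "
            f"${overage:.2f} over limit of ${domain['budget_limit']:.2f}"
        )
    return msgs


def warn_messages(doc: dict) -> list:
    """Python mirror of the warn rule of Policy 3 (80% to 100% of budget)."""
    domain = doc.get("domain")
    if not domain:
        return []
    usage = domain["monthly_cost"] / domain["budget_limit"]
    if 0.8 < usage <= 1.0:
        return [
            f"Domain '{domain['name']}' at {usage * 100:.0f}% of budget "
            f"(${domain['monthly_cost']:.2f} / ${domain['budget_limit']:.2f})"
        ]
    return []


doc = {
    "dataset": {
        "name": "ventas_curated",
        "sharing_level": "public",
        "columns": [
            {"name": "customer_id", "contains_pii": False},
            {"name": "email", "contains_pii": True, "is_masked": True},
        ],
    },
    "domain": {"name": "ventas", "monthly_cost": 1800, "budget_limit": 2000},
}
print(deny_messages(doc))
print(warn_messages(doc))
```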

**Policy Enforcement in CI/CD:**

```python
# scripts/check_policies.py
"""
Validate policies before deployment
"""
import subprocess
import json
import sys

def check_opa_policies(input_data: dict) -> bool:
    """
    Run OPA policy checks
    """
    # Convert input to JSON
    input_json = json.dumps(input_data)
    
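    # Run OPA evaluation (--stdin-input reads the input document from stdin)
    result = subprocess.run(
        ["opa", "eval", "--format", "json", "--data", "policies/",
         "--stdin-input", "data.datamesh.deny"],
        input=input_json.encode(),
        capture_output=True
    )
    
    if result.returncode != 0:
        print(f"❌ OPA evaluation failed: {result.stderr.decode()}")
        return False
    
    output = json.loads(result.stdout)
    # An undefined query produces no "result" key, which means no violations
    results = output.get('result', [])
    violations = results[0]['expressions'][0]['value'] if results else []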
    
    if violations:
        print("❌ Policy violations detected:")
        for violation in violations:
            print(f"  - {violation}")
        return False
    
    print("✅ All policies passed")
    return True

# Example usage in pre-commit hook
if __name__ == "__main__":
    input_data = {
        "data_product": {
            "name": "daily_revenue_api",
            "owner": "ventas-team@company.com",
            "quality_score": 0.997
        },
        "dataset": {
            "name": "ventas_curated",
            "sharing_level": "public",
            "columns": [
                {"name": "customer_id", "contains_pii": False},
                {"name": "email", "contains_pii": True, "is_masked": True}
            ]
        },
        "domain": {
            "name": "ventas",
            "monthly_cost": 1800,
            "budget_limit": 2000
        },
        "pipeline": {
            "name": "ventas_daily_batch",
            "outputs_to_shared_layer": True,
            "emits_lineage": True
        }
    }
    
    if not check_opa_policies(input_data):
        sys.exit(1)
```
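
Parsing the `opa eval` output is easier to test when it is separated from the subprocess call. The sketch below is a hypothetical helper, `extract_violations`, that works on the JSON text alone. It assumes the rule value may arrive either as a list (set-style rule) or as an object keyed by message, which can depend on how the rule heads are written and on the OPA version.

```python
import json


def extract_violations(opa_stdout: str) -> list:
    """Return sorted violation messages from `opa eval --format json` output."""
    output = json.loads(opa_stdout)
    results = output.get("result", [])
    if not results:
        return []  # query was undefined, so nothing was denied
    value = results[0]["expressions"][0]["value"]
    # sorted() yields the messages for both shapes: list items or dict keys
    return sorted(value)


as_set = '{"result": [{"expressions": [{"value": ["b violation", "a violation"]}]}]}'
as_object = '{"result": [{"expressions": [{"value": {"only violation": true}}]}]}'
print(extract_violations(as_set))
print(extract_violations(as_object))
print(extract_violations("{}"))
```

Keeping the parser pure means it can be unit tested with canned JSON, without the `opa` binary installed on the test runner.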

---

**3. Multi-Tenant Observability Dashboard**

```python
# monitoring/mesh_dashboard.py
"""
Unified observability across all domains
"""
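import os
from grafana_api.grafana_face import GrafanaFace

# Basic auth must be a (user, password) tuple; a plain string is treated as
# an API token. The environment variable names are illustrative.
grafana = GrafanaFace(
    auth=(os.environ["GRAFANA_USER"], os.environ["GRAFANA_PASSWORD"]),
    host="grafana.company.com"
)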

# Create unified dashboard
dashboard = {
    "dashboard": {
        "title": "Data Mesh - Multi-Domain Overview",
        "tags": ["data-mesh", "cross-domain"],
        "timezone": "utc",
        "panels": [
            # Panel 1: Domain Health Matrix
            {
                "id": 1,
                "title": "Domain Health Matrix",
                "type": "heatmap",
                "targets": [{
                    "expr": """
                        avg by (domain) (
                            avg_over_time(data_product_availability[5m])
                        )
                    """,
                    "legendFormat": "{{ domain }}"
                }],
                "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8}
            },
            
            # Panel 2: SLO Compliance by Domain
            {
                "id": 2,
                "title": "SLO Compliance by Domain",
                "type": "gauge",
                "targets": [{
                    "expr": """
                        (
                            sum_over_time(data_product_availability{domain=~"$domain"}[30d])
                            /
                            count_over_time(data_product_availability{domain=~"$domain"}[30d])
                        ) * 100
                    """,
                    "legendFormat": "Availability %"
                }],
                "fieldConfig": {
                    "defaults": {
                        "thresholds": {
                            "steps": [
                                {"color": "red", "value": 0},
                                {"color": "yellow", "value": 99},
                                {"color": "green", "value": 99.9}
                            ]
                        }
                    }
                },
                "gridPos": {"x": 12, "y": 0, "w": 6, "h": 8}
            },
            
            # Panel 3: Cost by Domain
            {
                "id": 3,
                "title": "Cost by Domain (Monthly)",
                "type": "piechart",
                "targets": [{
                    "expr": """
                        sum by (domain) (
                            increase(aws_billing_estimated_charges{domain!=""}[30d])
                        )
                    """
                }],
                "gridPos": {"x": 18, "y": 0, "w": 6, "h": 8}
            },
            
            # Panel 4: Data Freshness by Product
            {
                "id": 4,
                "title": "Data Freshness (Last Update)",
                "type": "table",
                "targets": [{
                    "expr": """
                        (time() - data_product_last_update_timestamp) / 60
                    """,
                    "format": "table",
                    "instant": True
                }],
                "transformations": [{
                    "id": "organize",
                    "options": {
                        "excludeByName": {},
                        "indexByName": {
                            "domain": 0,
                            "product": 1,
                            "Value": 2
                        },
                        "renameByName": {
                            "Value": "Minutes Since Update"
                        }
                    }
                }],
                "gridPos": {"x": 0, "y": 8, "w": 12, "h": 8}
            },
            
            # Panel 5: Pipeline Success Rate
            {
                "id": 5,
                "title": "Pipeline Success Rate (24h)",
                "type": "stat",
                "targets": [{
                    "expr": """
                        sum by (domain) (
                            rate(airflow_dag_run_success{domain!=""}[24h])
                        )
                        /
                        sum by (domain) (
                            rate(airflow_dag_run_total{domain!=""}[24h])
                        ) * 100
                    """,
                    "legendFormat": "{{ domain }}"
                }],
                "gridPos": {"x": 12, "y": 8, "w": 12, "h": 8}
            },
            
            # Panel 6: Feature Store Usage
            {
                "id": 6,
                "title": "Feature Store Requests (by domain)",
                "type": "timeseries",
                "targets": [{
                    "expr": """
                        rate(feast_feature_requests_total[5m])
                    """,
                    "legendFormat": "{{ domain }} - {{ feature_view }}"
                }],
                "gridPos": {"x": 0, "y": 16, "w": 24, "h": 8}
            },
            
            # Panel 7: Data Quality Score Trend
            {
                "id": 7,
                "title": "Data Quality Score Trend",
                "type": "timeseries",
                "targets": [{
                    "expr": """
                        avg by (domain) (
                            data_product_quality_score
                        )
                    """,
                    "legendFormat": "{{ domain }}"
                }],
                "fieldConfig": {
                    "defaults": {
                        "min": 0,
                        "max": 1,
                        "unit": "percentunit"
                    }
                },
                "gridPos": {"x": 0, "y": 24, "w": 12, "h": 8}
            },
            
            # Panel 8: Cross-Domain Dependencies
            {
                "id": 8,
                "title": "Cross-Domain API Calls",
                "type": "nodeGraph",
                "targets": [{
                    "expr": """
                        sum by (source_domain, target_domain) (
                            rate(api_requests_total{source_domain!="",target_domain!=""}[5m])
                        )
                    """
                }],
                "gridPos": {"x": 12, "y": 24, "w": 12, "h": 8}
            }
        ],
        "templating": {
            "list": [
                {
                    "name": "domain",
                    "type": "query",
                    "query": "label_values(data_product_availability, domain)",
                    "multi": False,
                    "includeAll": True
                }
            ]
        }
    },
    "folderId": 0,
    "overwrite": True
}

# Create dashboard
result = grafana.dashboard.update_dashboard(dashboard)
print(f"✅ Dashboard created: {result['url']}")
```
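
Grafana positions panels on a 24-column grid through `gridPos`, so a small pre-flight check can catch overlapping or out-of-bounds panels before the dashboard is uploaded. The sketch below is one possible way to do that; `find_layout_problems` is a hypothetical helper (not part of any Grafana client), and the sample positions are copied from panels 5 to 8 above.

```python
from itertools import combinations
from typing import Dict, List

GRID_COLUMNS = 24  # Grafana dashboards use a 24-column grid


def overlaps(a: Dict[str, int], b: Dict[str, int]) -> bool:
    """True when two gridPos rectangles share at least one grid cell."""
    return (
        a["x"] < b["x"] + b["w"] and b["x"] < a["x"] + a["w"]
        and a["y"] < b["y"] + b["h"] and b["y"] < a["y"] + a["h"]
    )


def find_layout_problems(panels: List[dict]) -> List[str]:
    """Report panels that leave the grid or overlap another panel."""
    problems = []
    for panel in panels:
        pos = panel["gridPos"]
        if pos["x"] < 0 or pos["x"] + pos["w"] > GRID_COLUMNS:
            problems.append(f"panel {panel['id']} exceeds the {GRID_COLUMNS}-column grid")
    for first, second in combinations(panels, 2):
        if overlaps(first["gridPos"], second["gridPos"]):
            problems.append(f"panels {first['id']} and {second['id']} overlap")
    return problems


# gridPos values copied from panels 5-8 of the dashboard definition
panels = [
    {"id": 5, "gridPos": {"x": 12, "y": 8, "w": 12, "h": 8}},
    {"id": 6, "gridPos": {"x": 0, "y": 16, "w": 24, "h": 8}},
    {"id": 7, "gridPos": {"x": 0, "y": 24, "w": 12, "h": 8}},
    {"id": 8, "gridPos": {"x": 12, "y": 24, "w": 12, "h": 8}},
]

print(find_layout_problems(panels) or "layout OK")
```

Running a check like this in CI keeps a shared multi-domain dashboard readable as domain teams add their own panels.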

---

**4. Cost Allocation and Chargeback**

```python
# finops/cost_allocation.py
"""
Allocate AWS costs to domains (chargeback model)
"""
import boto3
from datetime import datetime, timedelta
import pandas as pd

ce = boto3.client('ce', region_name='us-east-1')

def get_domain_costs(start_date: str, end_date: str) -> pd.DataFrame:
    """
    Query AWS Cost Explorer with domain tags
    """
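    # Cost Explorer treats 'End' as exclusive and paginates long result sets,
    # so follow NextPageToken until every page has been read
    request = {
        'TimePeriod': {'Start': start_date, 'End': end_date},
        'Granularity': 'DAILY',
        'Metrics': ['UnblendedCost', 'UsageQuantity'],
        'GroupBy': [
            {'Type': 'TAG', 'Key': 'domain'},
            {'Type': 'DIMENSION', 'Key': 'SERVICE'}
        ],
        'Filter': {
            'Tags': {
                'Key': 'Project',
                'Values': ['data-mesh']
            }
        }
    }

    rows = []
    while True:
        response = ce.get_cost_and_usage(**request)
        for result in response['ResultsByTime']:
            date = result['TimePeriod']['Start']
            for group in result['Groups']:
                # Tag keys come back as 'domain$<value>'; untagged resources as 'domain$'
                domain = group['Keys'][0].partition('$')[2] or 'untagged'
                rows.append({
                    'date': date,
                    'domain': domain,
                    'service': group['Keys'][1],
                    'cost': float(group['Metrics']['UnblendedCost']['Amount'])
                })

        next_token = response.get('NextPageToken')
        if not next_token:
            break
        request['NextPageToken'] = next_token

    # Explicit columns keep the later groupby working even when no rows come back
    return pd.DataFrame(rows, columns=['date', 'domain', 'service', 'cost'])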

# Generate monthly report
start = (datetime.utcnow() - timedelta(days=30)).strftime('%Y-%m-%d')
end = datetime.utcnow().strftime('%Y-%m-%d')

costs_df = get_domain_costs(start, end)

# Aggregate by domain
domain_summary = costs_df.groupby('domain').agg({
    'cost': 'sum'
}).reset_index()

domain_summary = domain_summary.sort_values('cost', ascending=False)

print("\n📊 Domain Cost Summary (Last 30 Days):")
print("=" * 50)
for _, row in domain_summary.iterrows():
    print(f"{row['domain']:15} ${row['cost']:>10,.2f}")

print("=" * 50)
print(f"{'TOTAL':15} ${domain_summary['cost'].sum():>10,.2f}")

# Check budget overages
budgets = {
    'ventas': 2000,
    'logistica': 1500,
    'producto': 1000,
    'marketing': 2500,
    'finanzas': 3000
}

print("\n⚠️ Budget Status:")
print("=" * 50)
for domain, budget in budgets.items():
    actual = domain_summary[domain_summary['domain'] == domain]['cost'].sum()
    utilization = (actual / budget) * 100
    status = "✅" if utilization <= 100 else "🚨"
    
    print(f"{status} {domain:15} ${actual:>8,.2f} / ${budget:>8,.2f} ({utilization:>5.1f}%)")

# Export to DataHub for visibility
from datahub.emitter.rest_emitter import DatahubRestEmitter

emitter = DatahubRestEmitter('http://datahub:8080')

for _, row in domain_summary.iterrows():
    # Emit cost metrics to DataHub
    domain_urn = f"urn:li:domain:{row['domain']}"
    # (Implementation omitted for brevity)
```
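
A chargeback report usually also has to decide what to do with spend that no domain owns directly, such as the `untagged` bucket or shared platform services. One common option, sketched below, is to spread that amount across domains in proportion to their direct spend. The function name `allocate_shared_costs` and all figures are illustrative assumptions, not real billing data.

```python
from typing import Dict


def allocate_shared_costs(direct: Dict[str, float], shared: float) -> Dict[str, float]:
    """Spread shared spend across domains in proportion to their direct spend."""
    total_direct = sum(direct.values())
    if total_direct == 0:
        # No direct usage to weight by: fall back to an even split
        even = shared / len(direct) if direct else 0.0
        return {domain: even for domain in direct}
    return {
        domain: cost + shared * (cost / total_direct)
        for domain, cost in direct.items()
    }


# Illustrative figures only
direct_costs = {"ventas": 1800.0, "logistica": 1200.0, "marketing": 3000.0}
shared_platform_cost = 600.0  # e.g. the 'untagged' total from the report

allocated = allocate_shared_costs(direct_costs, shared_platform_cost)
for domain, cost in allocated.items():
    print(f"{domain:15} ${cost:>10,.2f}")
```

Proportional allocation is simple to explain to domain owners, but it rewards under-tagging; a fixed platform fee per domain is a reasonable alternative when tagging coverage is low.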

---

**5. DataHub: Cross-Domain Lineage**

```python
# lineage/emit_cross_domain_lineage.py
"""
Emit lineage showing cross-domain dependencies
"""
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    DatasetLineageTypeClass,
    UpstreamClass,
    UpstreamLineageClass
)

emitter = DatahubRestEmitter('http://datahub:8080')

# Example: Marketing domain consumes Ventas + Producto data
lineage_map = {
    # Marketing domain datasets
    "urn:li:dataset:(urn:li:dataPlatform:s3,mesh.marketing.customer_segments,PROD)": {
        "upstreams": [
            # Consumes from Ventas
            "urn:li:dataset:(urn:li:dataPlatform:s3,mesh.ventas.customer_transactions,PROD)",
            # Consumes from Producto
            "urn:li:dataset:(urn:li:dataPlatform:s3,mesh.producto.catalog,PROD)"
        ],
        "type": DatasetLineageTypeClass.TRANSFORMED
    },
    
    # Finanzas consumes Ventas
    "urn:li:dataset:(urn:li:dataPlatform:s3,mesh.finanzas.accounting_reports,PROD)": {
        "upstreams": [
            "urn:li:dataset:(urn:li:dataPlatform:s3,mesh.ventas.daily_revenue,PROD)"
        ],
        "type": DatasetLineageTypeClass.COPY
    }
}

for downstream_urn, lineage_info in lineage_map.items():
    upstreams = [
        UpstreamClass(
            dataset=upstream_urn,
            type=lineage_info['type']
        )
        for upstream_urn in lineage_info['upstreams']
    ]
    
    lineage = UpstreamLineageClass(upstreams=upstreams)
    
    emitter.emit_mcp(
        MetadataChangeProposalWrapper(
            entityUrn=downstream_urn,
            aspect=lineage
        )
    )
    
    print(f"✅ Lineage emitted for {downstream_urn}")
```
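
Because the dataset names above follow a `mesh.<domain>.<dataset>` pattern, the cross-domain edges can also be derived locally from the URNs, for example to review dependencies in a pull request before anything is emitted. The following is a minimal sketch under that naming assumption; `domain_of` and `cross_domain_edges` are hypothetical helpers, not DataHub APIs.

```python
import re
from typing import Dict, List, Set, Tuple

# Captures the dataset name inside a DataHub dataset URN:
#   urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
URN_PATTERN = re.compile(r"^urn:li:dataset:\(urn:li:dataPlatform:[^,]+,([^,]+),[^,)]+\)$")


def domain_of(dataset_urn: str) -> str:
    """Extract <domain> from a dataset named mesh.<domain>.<dataset>."""
    match = URN_PATTERN.match(dataset_urn)
    if not match:
        raise ValueError(f"Not a dataset URN: {dataset_urn}")
    parts = match.group(1).split(".")
    if len(parts) < 3 or parts[0] != "mesh":
        raise ValueError(f"Name does not follow mesh.<domain>.<dataset>: {match.group(1)}")
    return parts[1]


def cross_domain_edges(lineage: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Return (producer_domain, consumer_domain) pairs that cross a domain boundary."""
    edges = set()
    for downstream, upstreams in lineage.items():
        for upstream in upstreams:
            producer, consumer = domain_of(upstream), domain_of(downstream)
            if producer != consumer:
                edges.add((producer, consumer))
    return edges


# Same datasets as the lineage map above, reduced to downstream -> upstreams
lineage = {
    "urn:li:dataset:(urn:li:dataPlatform:s3,mesh.marketing.customer_segments,PROD)": [
        "urn:li:dataset:(urn:li:dataPlatform:s3,mesh.ventas.customer_transactions,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:s3,mesh.producto.catalog,PROD)",
    ],
    "urn:li:dataset:(urn:li:dataPlatform:s3,mesh.finanzas.accounting_reports,PROD)": [
        "urn:li:dataset:(urn:li:dataPlatform:s3,mesh.ventas.daily_revenue,PROD)",
    ],
}

edges = cross_domain_edges(lineage)
print(sorted(edges))
```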

**DataHub Search: Find Cross-Domain Dependencies**

```graphql
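# GraphQL query to find all datasets consuming Ventas data
# (illustrative: confirm filter and field names against the GraphQL schema of your DataHub version)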
query {
  search(
    input: {
      type: DATASET
      query: "*"
      filters: [
        {
          field: "upstream.urn"
          values: ["urn:li:domain:ventas"]
        }
      ]
    }
  ) {
    searchResults {
      entity {
        ... on Dataset {
          urn
          name
          domain {
            name
          }
          upstream {
            dataset {
              urn
              name
            }
          }
        }
      }
    }
  }
}
```

---

**6. Domain Autonomy with Guardrails**

```python
# platform/domain_provisioning.py
"""
Self-service domain provisioning with governance guardrails
"""
import json
from typing import Dict
import boto3

class DomainProvisioner:
    """
    Automated domain provisioning with policy enforcement
    """
    
    def __init__(self):
        self.s3 = boto3.client('s3')
        self.iam = boto3.client('iam')
        self.glue = boto3.client('glue')
    
    def provision_new_domain(self, domain_name: str, config: Dict) -> Dict:
        """
        Provision infrastructure for new domain
        
        Enforces:
        - Naming conventions
        - Cost budgets
        - Security policies
        - Observability standards
        """
        
        # Validate config against policies
        if not self._validate_config(domain_name, config):
            raise ValueError("Configuration violates governance policies")
        
        resources = {}
        
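        # 1. S3 bucket with standard structure
        # (outside us-east-1, create_bucket also requires
        #  CreateBucketConfiguration={'LocationConstraint': <region>})
        bucket_name = f"mesh-{domain_name}"
        self.s3.create_bucket(Bucket=bucket_name)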
        
        # Apply lifecycle policies (governance requirement)
        self.s3.put_bucket_lifecycle_configuration(
            Bucket=bucket_name,
            LifecycleConfiguration={
                'Rules': [
                    {
                        'ID': 'raw-retention',
                        'Filter': {'Prefix': 'raw/'},  # top-level 'Prefix' is deprecated
                        'Status': 'Enabled',
                        'Transitions': [
                            {'Days': 7, 'StorageClass': 'GLACIER'}
                        ],
                        'Expiration': {'Days': 90}
                    }
                ]
            }
        )
        
        # Enable versioning (governance requirement)
        self.s3.put_bucket_versioning(
            Bucket=bucket_name,
            VersioningConfiguration={'Status': 'Enabled'}
        )
        
        resources['s3_bucket'] = bucket_name
        
        # 2. IAM role with least privilege
        role_name = f"{domain_name}-pipeline-role"
        assume_role_policy = {
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"Service": "emr-serverless.amazonaws.com"},
                "Action": "sts:AssumeRole"
            }]
        }
        
        self.iam.create_role(
            RoleName=role_name,
            AssumeRolePolicyDocument=json.dumps(assume_role_policy),
            Tags=[
                {'Key': 'domain', 'Value': domain_name},
                {'Key': 'managed-by', 'Value': 'platform-team'}
            ]
        )
        
        resources['iam_role'] = role_name
        
        # 3. Glue database
        self.glue.create_database(
            DatabaseInput={
                'Name': f"{domain_name}_db",
                'Description': f"Database for {domain_name} domain",
                'Parameters': {
                    'domain': domain_name,
                    'owner': config['owner']
                }
            }
        )
        
        resources['glue_database'] = f"{domain_name}_db"
        
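        # 4. Budget alert (governance requirement)
        # Resolve the account ID at runtime instead of hardcoding it
        account_id = boto3.client('sts').get_caller_identity()['Account']
        budgets = boto3.client('budgets')
        budgets.create_budget(
            AccountId=account_id,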
            Budget={
                'BudgetName': f"{domain_name}-monthly",
                'BudgetLimit': {
                    'Amount': str(config['budget_limit']),
                    'Unit': 'USD'
                },
                'TimeUnit': 'MONTHLY',
                'BudgetType': 'COST',
                'CostFilters': {
                    # User-defined tags are addressed as 'user:<key>$<value>'
                    'TagKeyValue': [f'user:domain${domain_name}']
                }
            },
            NotificationsWithSubscribers=[
                {
                    'Notification': {
                        'NotificationType': 'ACTUAL',
                        'ComparisonOperator': 'GREATER_THAN',
                        'Threshold': 80.0
                    },
                    'Subscribers': [{
                        'SubscriptionType': 'EMAIL',
                        'Address': config['owner']
                    }]
                }
            ]
        )
        
        return resources
    
    def _validate_config(self, domain_name: str, config: Dict) -> bool:
        """Validate against governance policies"""
        
        # Policy 1: Owner must be specified
        if 'owner' not in config or not config['owner'].endswith('@company.com'):
            print("❌ Owner email required and must be @company.com")
            return False
        
        # Policy 2: Budget limit required
        if 'budget_limit' not in config or config['budget_limit'] <= 0:
            print("❌ Budget limit must be positive")
            return False
        
        # Policy 3: Naming convention
        if not domain_name.islower() or len(domain_name) > 20:
            print("❌ Domain name must be lowercase and at most 20 chars")
            return False
        
        return True

# Usage
provisioner = DomainProvisioner()

new_domain_config = {
    'owner': 'analytics-team@company.com',
    'budget_limit': 1500,
    'slo_availability': 0.999
}

resources = provisioner.provision_new_domain('analytics', new_domain_config)
print(f"✅ Domain 'analytics' provisioned: {resources}")
```
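
One caveat about the naming policy: `str.islower()` only checks that the cased characters are lowercase, so a name such as `"my domain!"` passes even though it would produce an invalid bucket name. A stricter check could look like the sketch below. The exact pattern (a leading letter followed by lowercase letters or digits, 20 characters at most) is an assumed convention, and `is_valid_domain_name` is a hypothetical helper.

```python
import re

# Assumed convention: starts with a letter, then lowercase letters or digits, 20 chars max
DOMAIN_NAME_PATTERN = re.compile(r"[a-z][a-z0-9]{0,19}")


def is_valid_domain_name(name: str) -> bool:
    """True only when the whole name matches the assumed naming convention."""
    return DOMAIN_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["analytics", "my domain!", "Ventas", "a" * 21]:
    print(
        f"{candidate!r:28} "
        f"islower={candidate.islower()!s:5} "
        f"strict={is_valid_domain_name(candidate)}"
    )
```

Restricting names to letters and digits keeps the same identifier usable in the S3 bucket, the IAM role, and the Glue database without per-service escaping.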

---

**7. Incident Response: Cross-Domain Impact Analysis**

```python
# sre/incident_response.py
"""
Analyze blast radius of incidents across domains
"""
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig

graph = DataHubGraph(DatahubClientConfig(server="http://datahub:8080"))

def analyze_impact(failed_dataset_urn: str):
    """
    Find all downstream consumers of failed dataset
    """
    
    # Query DataHub for downstream lineage
    query = f"""
    {{
        dataset(urn: "{failed_dataset_urn}") {{
            urn
            name
            domain {{ name }}
            downstream(limit: 100) {{
                dataset {{
                    urn
                    name
                    domain {{ name }}
                }}
            }}
        }}
    }}
    """
    
    result = graph.execute_graphql(query)
    failed = result['dataset']
    
    print(f"\n🚨 INCIDENT: {failed['name']} in {failed['domain']['name']} domain")
    print("=" * 70)
    
    # 'downstream' may be missing or null when nothing consumes the dataset
    downstreams = failed.get('downstream') or []
    
    if not downstreams:
        print("✅ No downstream dependencies (isolated impact)")
        return
    
    # Group by domain; datasets without a domain fall under 'unassigned'
    impacted_domains = {}
    for downstream in downstreams:
        ds = downstream['dataset']
        domain = (ds.get('domain') or {}).get('name', 'unassigned')
        impacted_domains.setdefault(domain, []).append(ds['name'])
    
    print(f"⚠️ IMPACTED DOMAINS: {len(impacted_domains)}")
    for domain, datasets in impacted_domains.items():
        print(f"\n  {domain.upper()}:")
        for dataset in datasets:
            print(f"    - {dataset}")
    
    # Suggest actions
    print("\n📋 RECOMMENDED ACTIONS:")
    print("1. Notify impacted domain owners:")
    for domain in impacted_domains.keys():
        print(f"   - {domain}-team@company.com")
    print("2. Check if impacted datasets have fallback sources")
    print("3. Estimate ETA for fix and communicate")

# Example: Ventas daily revenue pipeline failed
analyze_impact("urn:li:dataset:(urn:li:dataPlatform:s3,mesh.ventas.daily_revenue,PROD)")

# Output:
# 🚨 INCIDENT: daily_revenue in ventas domain
# ======================================================================
# ⚠️ IMPACTED DOMAINS: 3
# 
#   MARKETING:
#     - customer_segments
#     - campaign_attribution
# 
#   FINANZAS:
#     - accounting_reports
#     - revenue_forecasts
# 
#   EXECUTIVE:
#     - executive_dashboard
# 
# 📋 RECOMMENDED ACTIONS:
# 1. Notify impacted domain owners:
#    - marketing-team@company.com
#    - finanzas-team@company.com
#    - executive-team@company.com
# 2. Check if impacted datasets have fallback sources
# 3. Estimate ETA for fix and communicate
```
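
If the lineage query only returns direct consumers, second- and third-order impact has to be walked explicitly. The sketch below shows the idea as a breadth-first search over an in-memory adjacency map. The edges are hypothetical (dataset names are borrowed from the sample output, but which dataset feeds which is assumed), and in practice the map would be filled from DataHub.

```python
from collections import deque
from typing import Dict, List


def blast_radius(graph: Dict[str, List[str]], failed: str) -> Dict[str, int]:
    """Breadth-first search returning every transitive consumer and its hop distance."""
    distance = {failed: 0}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for consumer in graph.get(current, []):
            if consumer not in distance:  # also guards against cycles
                distance[consumer] = distance[current] + 1
                queue.append(consumer)
    del distance[failed]  # the failed dataset is not its own consumer
    return distance


# Hypothetical edges: dataset -> direct consumers, named "<domain>.<dataset>"
downstream_edges = {
    "ventas.daily_revenue": [
        "finanzas.accounting_reports",
        "marketing.campaign_attribution",
    ],
    "finanzas.accounting_reports": ["finanzas.revenue_forecasts"],
    "finanzas.revenue_forecasts": ["executive.executive_dashboard"],
}

impact = blast_radius(downstream_edges, "ventas.daily_revenue")
for dataset, hops in sorted(impact.items(), key=lambda item: (item[1], item[0])):
    print(f"{hops} hop(s): {dataset}")
```

The hop distance is a useful proxy for notification priority: direct consumers break first, while datasets several hops away may still be serving cached or previous-day data.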

---

**Author:** Luis J. Raigoso V. (LJRV)

## 1. Context and requirements

**Company**: Multi-category marketplace with 5 business domains:
- Ventas (Sales)
- Logística (Fulfillment)
- Producto (Catalog)
- Marketing (Campaigns)
- Finanzas (Payments)

**Goal**: Each domain manages its own data as a product, with SLOs, versioning, and documentation. A central feature store consumes features from every domain for ML.

**Requirements**:
- Self-service platform: shared catalog, CI/CD, and observability.
- Federated governance: global security and quality policies, enforced locally.
- Feature store (Feast/Tecton) with features from each domain.
- Data product APIs with versioned contracts (OpenAPI).
- Cross-domain lineage visible in DataHub.

## 2. Proposed Data Mesh architecture

In [None]:
mesh_diagram = '''
┌────────────────────────────────────────────────────────────────┐
│                     Self-Service Platform                      │
│  - Shared Airflow          - DataHub (catalog + lineage)       │
│  - Grafana + Prometheus    - CI/CD (GitHub Actions)            │
│  - Feature Store (Feast)   - Central IAM policies              │
└────────────────────────────────────────────────────────────────┘
                              ▲
          ┌───────────────────┼───────────────────┐
          │                   │                   │
  ┌───────▼──────┐   ┌────────▼────────┐  ┌──────▼────────┐
  │ Domain       │   │  Domain         │  │  Domain       │
  │ Ventas       │   │  Logística      │  │  Producto     │
  │ (data prod)  │   │  (data prod)    │  │  (data prod)  │
  │ - raw/       │   │  - raw/         │  │  - raw/       │
  │ - curated/   │   │  - curated/     │  │  - curated/   │
  │ - API        │   │  - API          │  │  - API        │
  │ - Features   │   │  - Features     │  │  - Features   │
  └──────────────┘   └─────────────────┘  └───────────────┘
          │                   │                   │
          └───────────────────┼───────────────────┘
                              ▼
                   ┌──────────────────┐
                   │  Feature Store   │
                   │  (Feast/Tecton)  │
                   └─────────┬────────┘
                             │
                      ┌──────▼─────┐
                      │ ML Models  │
                      │ (training) │
                      └────────────┘
'''
print(mesh_diagram)

## 3. Components per domain (example: Ventas)

### 3.1 Ventas Data Product
- Owner: Ventas (Sales) team.
- Sources: Kafka (transactions), batch files (returns).
- Storage: S3 `s3://mesh/ventas/raw/`, `s3://mesh/ventas/curated/`.
- API: FastAPI endpoint `/ventas/v1/daily-revenue` with an OpenAPI contract.
- Features: `cliente_total_compras_30d`, `cliente_num_transacciones_7d`.
- SLO: p99 latency < 15 min, availability > 99.9%.
- Documentation: README, diagrams, changelog.
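
To make the availability target concrete, it helps to convert it into an error budget: the amount of downtime the data product can accumulate in a window before the SLO is breached. The sketch below does the arithmetic, assuming a 30-day window (the source does not state the window) and a hypothetical helper name.

```python
def allowed_downtime_minutes(availability_slo: float, window_days: int = 30) -> float:
    """Error budget in minutes for a given availability SLO over a rolling window."""
    if not 0.0 < availability_slo <= 1.0:
        raise ValueError("availability_slo must be in (0, 1]")
    return (1.0 - availability_slo) * window_days * 24 * 60


# 99.9% availability over an assumed 30-day window
budget = allowed_downtime_minutes(0.999)
print(f"Error budget: {budget:.1f} minutes per 30 days")
```

At 99.9% the budget is roughly 43 minutes per 30 days, so a single failed pipeline run that takes an hour to recover already exhausts it.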

### 3.2 Ventas Pipeline
- Airflow DAG owned by the team, with Great Expectations (GE) validations and alerts.
- Writes features to Feast (offline store: S3 Parquet, online store: Redis).
- Emits lineage to DataHub via OpenLineage.
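
As an illustration of what the pipeline's feature step might compute, the sketch below derives `cliente_total_compras_30d` and `cliente_num_transacciones_7d` from a list of transactions as of a given timestamp. It is plain Python for readability (a real pipeline would more likely use Spark, pandas, or dbt), and the transaction layout and function name are assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Transaction = Tuple[str, datetime, float]  # (cliente_id, timestamp, amount)


def compute_ventas_features(
    transactions: List[Transaction], as_of: datetime
) -> Dict[str, Dict[str, float]]:
    """Rolling-window features per customer, using only data known at `as_of`."""
    features: Dict[str, Dict[str, float]] = {}
    for cliente_id, ts, amount in transactions:
        if ts > as_of:
            continue  # never look into the future (avoids label leakage)
        row = features.setdefault(
            cliente_id,
            {"cliente_total_compras_30d": 0.0, "cliente_num_transacciones_7d": 0},
        )
        age = as_of - ts
        if age <= timedelta(days=30):
            row["cliente_total_compras_30d"] += amount
        if age <= timedelta(days=7):
            row["cliente_num_transacciones_7d"] += 1
    return features


# Illustrative transactions
transactions = [
    ("c1", datetime(2024, 3, 30), 100.0),
    ("c1", datetime(2024, 3, 10), 50.0),
    ("c1", datetime(2024, 2, 1), 999.0),   # older than 30 days, ignored
    ("c2", datetime(2024, 3, 28), 20.0),
    ("c2", datetime(2024, 4, 2), 75.0),    # after as_of, ignored
]

features = compute_ventas_features(transactions, as_of=datetime(2024, 3, 31))
for cliente_id, row in features.items():
    print(cliente_id, row)
```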

## 4. Centralized feature store

In [None]:
feast_config = r'''
# feature_repo/ventas_features.py
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64
from datetime import timedelta

cliente = Entity(name='cliente_id', join_keys=['cliente_id'])

ventas_source = FileSource(
    path='s3://mesh/ventas/curated/features.parquet',
    timestamp_field='event_timestamp'
)

ventas_fv = FeatureView(
    name='ventas_features',
    entities=[cliente],
    ttl=timedelta(days=30),
    schema=[
        Field(name='total_compras_30d', dtype=Float32),
        Field(name='num_transacciones_7d', dtype=Int64),
    ],
    source=ventas_source,
    owner='ventas-team@empresa.com'
)

# Similarly for logistica_features, producto_features, etc.
'''
print('\n'.join(feast_config.splitlines()[:25]))
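
The `ttl=timedelta(days=30)` setting in the feature view matters when building training sets: for each entity timestamp, only feature rows observed at or before that moment, and no older than the TTL, should be joined. The sketch below illustrates that point-in-time rule in plain Python. It is a simplified model of the behavior, not Feast's implementation, and `point_in_time_value` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

FeatureRow = Tuple[datetime, float]  # (event_timestamp, total_compras_30d)


def point_in_time_value(
    rows: List[FeatureRow], entity_ts: datetime, ttl: timedelta
) -> Optional[float]:
    """Latest value observed at or before entity_ts, ignoring rows older than ttl."""
    eligible = [(ts, value) for ts, value in rows if entity_ts - ttl <= ts <= entity_ts]
    if not eligible:
        return None  # no fresh-enough value: the feature is null for this row
    return max(eligible, key=lambda row: row[0])[1]


# Illustrative feature history for a single cliente_id
history = [
    (datetime(2024, 1, 1), 100.0),
    (datetime(2024, 2, 15), 250.0),
]
ttl = timedelta(days=30)

print(point_in_time_value(history, datetime(2024, 2, 20), ttl))  # latest row is within TTL
print(point_in_time_value(history, datetime(2024, 1, 20), ttl))  # later row is in the future
print(point_in_time_value(history, datetime(2024, 4, 1), ttl))   # every row is too old
```

Choosing the TTL per domain is a trade-off: a short TTL produces more nulls, while a long one risks serving stale values to models.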

## 5. Federated governance

- Global policies:
  - All PII masked in shared datasets.
  - Minimum quality validations (Great Expectations).
  - Mandatory lineage (OpenLineage).
  - Semantic versioning of APIs.
- Local autonomy:
  - Each domain chooses its own transformation stack (Spark/Pandas/dbt).
  - Update frequency set by the domain's own SLO.
  - Domain-owned schemas that evolve with compatibility guarantees (Avro/Protobuf).
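
The PII policy is a good example of a rule that is defined globally but enforced inside each domain's pipeline. One possible approach, sketched below, is to replace PII values with a keyed hash (HMAC-SHA256) so that shared datasets can still be joined on the masked value without exposing the original. The field names, the 16-character truncation, and the `mask_pii` helper are illustrative assumptions.

```python
import hashlib
import hmac
from typing import Dict, Iterable


def mask_pii(record: Dict[str, str], pii_fields: Iterable[str], secret: bytes) -> Dict[str, str]:
    """Return a copy of the record with PII values replaced by a keyed hash."""
    masked = dict(record)  # leave the caller's record untouched
    for field in pii_fields:
        if masked.get(field) is not None:
            digest = hmac.new(secret, masked[field].encode("utf-8"), hashlib.sha256)
            masked[field] = digest.hexdigest()[:16]
    return masked


# Hypothetical record; in practice the key would come from a secrets manager
secret = b"rotate-me-and-store-in-a-secrets-manager"
record = {"cliente_id": "C-1001", "email": "ana@example.com", "total_compras_30d": "150.0"}

shared = mask_pii(record, pii_fields=["email"], secret=secret)
print(shared)
```

Using a keyed hash rather than a plain hash means values cannot be reversed by hashing a dictionary of known emails, as long as the key stays private to the platform.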

## 6. Implementation checklist

In [None]:
checklist = '''
☐ 1. Define domains and owners (RACI)
☐ 2. Create S3 buckets per domain (ventas/, logistica/, producto/)
☐ 3. One Airflow DAG per domain with validations and lineage
☐ 4. Versioned FastAPI APIs (OpenAPI specs)
☐ 5. Feast feature definitions per domain
☐ 6. Run feast apply and materialize-incremental in CI/CD
☐ 7. Register data products with metadata in DataHub
☐ 8. Federated IAM policies (global admin + per-domain roles)
☐ 9. Multi-domain Grafana dashboard with SLOs
☐ 10. Versioned quality contracts (data contracts)
☐ 11. Onboarding docs for new domains
☐ 12. Cross-domain incident response playbook
☐ 13. Cost allocation tags per domain
☐ 14. Train a multi-domain model (consumes features from Feast)
☐ 15. Cross-domain integration tests
'''
print(checklist)

## 7. Deliverables

- Data Mesh design document covering principles and responsibilities.
- Multi-domain repository (monorepo or multi-repo).
- Working feature store with features from ≥3 domains.
- DataHub catalog with cross-domain lineage.
- Documented APIs (Swagger/OpenAPI) per domain.
- Unified dashboard with metrics from every domain.
- ML model trained on federated features.
- Executive presentation (slides) with results and lessons learned.

## 8. Evaluation

- Autonomy: does each domain operate independently?
- Governance: are policies applied consistently?
- Feature store: are features accessible and versioned?
- Observability: are lineage and metrics available across domains?
- Scalability: is it easy to add new domains?
- Costs: is cost optimization visible per domain?