## 📌 STORY 2.1: Simplified ML Pipeline Architecture

Define a minimal ML pipeline that can be integrated within 48h.
Previous learning: a complex ensemble is overkill; Logistic Regression with a recall focus works.

In [None]:
# -*- coding: utf-8 -*-
"""Story 2.1: Simplified ML Pipeline Architecture

Automatically generated by Colab.

# 🏗️ STORY 2.1: SIMPLIFIED ML PIPELINE ARCHITECTURE
## 📋 Overview
This Story defines the minimal ML pipeline that can be integrated within 48h,
focusing on simplicity, performance, and fast integration with the backend.
"""

# ============================================================================
# REQUIRED IMPORTS
# ============================================================================
import pandas as pd
import numpy as np
import json
import os
import yaml
from datetime import datetime
from typing import Dict, List, Any, Optional, Tuple
import warnings

warnings.filterwarnings('ignore')

print("="*80)
print("🏗️ STORY 2.1: SIMPLIFIED ML PIPELINE ARCHITECTURE")
print("="*80)

# ============================================================================
# CONFIGURATION
# ============================================================================
SEED = 42
np.random.seed(SEED)

# Create the folder structure
os.makedirs('datascience/2_solution/architecture', exist_ok=True)
os.makedirs('datascience/2_solution/contracts', exist_ok=True)
os.makedirs('datascience/2_solution/code', exist_ok=True)
os.makedirs('datascience/2_solution/docs', exist_ok=True)
os.makedirs('datascience/2_solution/tests', exist_ok=True)

print("📁 Folder structure created")

# ============================================================================
# TASK 2.1.1: 🏗️ END-TO-END PIPELINE DESIGN
# ============================================================================
print("\n" + "="*80)
print("🏗️ TASK 2.1.1: END-TO-END PIPELINE DESIGN")
print("="*80)

print("\n🔧 DEFINING THE ML PIPELINE ARCHITECTURE...")

# 1. Pipeline overview
pipeline_design = {
    "name": "FlightDelayPredictionPipeline",
    "version": "1.0.0",
    "description": "Minimal pipeline for flight delay prediction",
    "design_principles": [
        "Simplicity over complexity",
        "Real-time performance (< 2s)",
        "Easy integration with the Java backend",
        "Maximum maintainability",
        "Fallback to simple rules"
    ],
    "timeline": {
        "development": "24h",
        "integration": "24h", 
        "total": "48h"
    }
}

# 2. End-to-end pipeline steps
pipeline_steps = {
    "step_1_input": {
        "name": "Input Validation",
        "description": "Validation of the 5 user inputs",
        "input": "JSON with 5 fields",
        "output": "Validated Python dictionary",
        "timeout": "100ms",
        "error_handling": "Returns a 400 error with details"
    },
    "step_2_transform": {
        "name": "Feature Transformation",
        "description": "Transforms the inputs into 7 features using MVPTrafficFeatureTransformer",
        "input": "Validated Python dictionary",
        "output": "NumPy array with 7 features",
        "timeout": "50ms",
        "dependencies": ["transform_simple.py"]
    },
    "step_3_predict": {
        "name": "Model Prediction",
        "description": "Prediction using Logistic Regression",
        "input": "NumPy array with 7 features",
        "output": "Probability (0-1) and binary class",
        "timeout": "100ms",
        "algorithm": "LogisticRegression",
        "fallback": "Baseline rule (historical rate)"
    },
    "step_4_postprocess": {
        "name": "Post-processing",
        "description": "Computation of business metrics",
        "input": "Probability and class",
        "output": "JSON with prediction, probability, and avoided cost",
        "timeout": "50ms",
        "calculations": ["custo_evitado", "confiança"]
    },
    "step_5_output": {
        "name": "Output Formatting",
        "description": "Formatting of the final response",
        "input": "Processed data",
        "output": "Standardized JSON",
        "timeout": "50ms",
        "format": {
            "status": "success/error",
            "prediction": {"atraso": bool, "probabilidade": float},
            "business_metrics": {"custo_evitado": float, "confiança": str},
            "metadata": {"model_version": str, "processing_time_ms": float}
        }
    }
}

# 3. Input specification (5 JSON fields)
input_specification = {
    "required_fields": 5,
    "schema": {
        "companhia_aerea": {
            "type": "string",
            "pattern": "^[A-Z0-9]{2}$",
            "description": "IATA code of the airline",
            "example": "AA",
            "required": True
        },
        "aeroporto_origem": {
            "type": "string",
            "pattern": "^[A-Z]{3}$",
            "description": "IATA code of the origin airport",
            "example": "JFK",
            "required": True
        },
        "aeroporto_destino": {
            "type": "string",
            "pattern": "^[A-Z]{3}$",
            "description": "IATA code of the destination airport",
            "example": "LAX",
            "required": True
        },
        "data_hora_partida": {
            "type": "string",
            "format": "date-time",
            "description": "Departure date and time in ISO 8601 format",
            "example": "2024-01-15T14:30:00",
            "required": True
        },
        "distancia_km": {
            "type": "number",
            "minimum": 0,
            "maximum": 5000,
            "description": "Distance in kilometers between origin and destination",
            "example": 3980.0,
            "required": True
        }
    },
    "example_request": {
        "companhia_aerea": "AA",
        "aeroporto_origem": "JFK",
        "aeroporto_destino": "LAX",
        "data_hora_partida": "2024-01-15T14:30:00",
        "distancia_km": 3980.0
    }
}
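
# Illustrative sketch (not part of the original Story): a minimal,
# standard-library-only validator for the 5-field request specified above.
# The name `validate_request_sketch` and the (field, message) error format are
# assumptions; the real service may validate differently, for example through
# FastAPI request models.
import re
from datetime import datetime

def validate_request_sketch(payload):
    """Return a list of (field, message) problems; an empty list means valid."""
    errors = []
    patterns = {
        "companhia_aerea": r"^[A-Z0-9]{2}$",
        "aeroporto_origem": r"^[A-Z]{3}$",
        "aeroporto_destino": r"^[A-Z]{3}$",
    }
    for field, pattern in patterns.items():
        value = payload.get(field)
        # fullmatch rejects trailing newlines that re.match with `$` would accept
        if not isinstance(value, str) or not re.fullmatch(pattern, value):
            errors.append((field, f"must match {pattern}"))
    try:
        datetime.fromisoformat(payload.get("data_hora_partida"))
    except (TypeError, ValueError):
        errors.append(("data_hora_partida", "must be an ISO 8601 date-time"))
    distance = payload.get("distancia_km")
    is_number = isinstance(distance, (int, float)) and not isinstance(distance, bool)
    if not is_number or not 0 <= distance <= 5000:
        errors.append(("distancia_km", "must be a number between 0 and 5000"))
    return errors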

# 4. Output specification
output_specification = {
    "fields": {
        "previsao": {
            "type": "boolean",
            "description": "Delay prediction (True = delay ≥ 15 min)",
            "example": True
        },
        "probabilidade": {
            "type": "float",
            "description": "Probability of delay (0-1)",
            "example": 0.78
        },
        "custo_evitado": {
            "type": "float",
            "description": "Estimated cost avoided, in R$, if preventive action is taken",
            "example": 1250.50
        },
        "confiança": {
            "type": "string",
            "description": "Confidence level of the prediction (BAIXA/MEDIA/ALTA)",
            "example": "ALTA"
        }
    },
    "example_response_success": {
        "status": "success",
        "prediction": {
            "atraso": True,
            "probabilidade": 0.78
        },
        "business_metrics": {
            "custo_evitado": 1250.50,
            "confiança": "ALTA"
        },
        "metadata": {
            "model_version": "1.0.0",
            "processing_time_ms": 185.5,
            "timestamp": "2024-01-15T14:30:00.123Z"
        }
    },
    "example_response_error": {
        "status": "error",
        "error": {
            "code": "VALIDATION_ERROR",
            "message": "Field 'companhia_aerea' is invalid",
            "details": "Must be a 2-character string"
        },
        "metadata": {
            "timestamp": "2024-01-15T14:30:00.123Z"
        }
    }
}

# 5. Data flow between components
data_flow = {
    "sequence": [
        "API Gateway (Java) → FastAPI Endpoint",
        "FastAPI → Input Validator",
        "Input Validator → Feature Transformer",
        "Feature Transformer → ML Model",
        "ML Model → Business Logic",
        "Business Logic → Output Formatter",
        "Output Formatter → FastAPI Response",
        "FastAPI Response → API Gateway (Java)"
    ],
    "data_formats": {
        "step_1": "JSON (HTTP Request)",
        "step_2": "Python Dict (validated)",
        "step_3": "NumPy Array (7 features)",
        "step_4": "Python Tuple (prob, class)",
        "step_5": "Python Dict (enriched)",
        "step_6": "JSON (HTTP Response)"
    },
    "performance_targets": {
        "total_latency": "< 500ms (p95)",
        "throughput": "> 100 req/sec",
        "error_rate": "< 1%",
        "availability": "> 99.5%"
    }
}
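
# Illustrative sketch (not part of the original Story): a quick arithmetic
# check that the per-step timeouts declared in pipeline_steps fit inside the
# p95 latency target. The values are copied by hand from the dictionaries
# above and the variable names are hypothetical.
step_timeouts_ms_sketch = {
    "input_validation": 100,
    "feature_transformation": 50,
    "model_prediction": 100,
    "post_processing": 50,
    "output_formatting": 50,
}
latency_target_ms_sketch = 500
worst_case_ms_sketch = sum(step_timeouts_ms_sketch.values())
headroom_ms_sketch = latency_target_ms_sketch - worst_case_ms_sketch
# Even if every step uses its full timeout, the steps add up to 350 ms, which
# leaves 150 ms of the 500 ms budget for network and serialization overhead.
assert worst_case_ms_sketch < latency_target_ms_sketch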

print("\n✅ END-TO-END PIPELINE DEFINED:")
print("   • 5 main steps")
print("   • Input: 5 JSON fields")
print("   • Output: prediction + probability + custo_evitado")
print("   • Data flow documented")

# ============================================================================
# TASK 2.1.2: ⚙️ BASE ALGORITHM CHOICE
# ============================================================================
print("\n" + "="*80)
print("⚙️ TASK 2.1.2: BASE ALGORITHM CHOICE")
print("="*80)

print("\n🤖 ANALYZING ALGORITHM OPTIONS...")

# 1. Algorithm comparison
algorithm_comparison = {
    "LogisticRegression": {
        "pros": [
            "Interpretable simplicity",
            "Fast training and inference",
            "Calibrated probabilities",
            "Works well with linear features",
            "Easy debugging and monitoring",
            "Low computational complexity"
        ],
        "cons": [
            "Linear by nature",
            "Less flexible than ensembles",
            "Sensitive to outliers"
        ],
        "performance": {
            "training_time": "seconds",
            "inference_time": "milliseconds",
            "memory_footprint": "MBs"
        },
        "fit_48h_timeline": "EXCELLENT"
    },
    "RandomForest": {
        "pros": [
            "High accuracy",
            "Robust to outliers",
            "Native feature importance"
        ],
        "cons": [
            "Slower inference",
            "Higher complexity",
            "Overfitting risk",
            "Uncalibrated probabilities"
        ],
        "performance": {
            "training_time": "minutes",
            "inference_time": "tens of ms",
            "memory_footprint": "hundreds of MB"
        },
        "fit_48h_timeline": "MODERATE"
    },
    "XGBoost": {
        "pros": [
            "State-of-the-art performance",
            "Excellent with structured data",
            "Feature importance"
        ],
        "cons": [
            "High complexity",
            "Sensitive hyperparameters",
            "Slower inference",
            "Complex maintenance"
        ],
        "performance": {
            "training_time": "minutes to hours",
            "inference_time": "tens of ms",
            "memory_footprint": "hundreds of MB"
        },
        "fit_48h_timeline": "POOR"
    }
}

# 2. Justification for LogisticRegression
justification = {
    "decision": "LogisticRegression",
    "reasons": [
        "The 48h timeline demands simplicity",
        "An MVP does not need excessive complexity",
        "Recall > 0.75 is achievable with class_weight='balanced'",
        "Interpretability is crucial for the business",
        "Easy maintenance for a multidisciplinary team",
        "Fast integration with the Java backend"
    ],
    "learning_from_previous": [
        "A complex ensemble is overkill for an MVP",
        "Focusing on recall matters more than accuracy",
        "Simplicity speeds up time-to-market"
    ]
}

# 3. Initial model parameters
model_parameters = {
    "algorithm": "LogisticRegression",
    "hyperparameters": {
        "penalty": "l2",
        "C": 1.0,
        "class_weight": "balanced",
        "solver": "lbfgs",
        "max_iter": 1000,
        "random_state": SEED
    },
    "rationale": {
        "class_weight='balanced'": "Compensates for class imbalance in the dataset",
        "penalty='l2'": "Regularization to avoid overfitting",
        "solver='lbfgs'": "Efficient for medium-sized datasets",
        "max_iter=1000": "Gives the solver enough iterations to converge"
    }
}
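
# Illustrative sketch (not part of the original Story): what
# class_weight='balanced' does. scikit-learn weights each class by
# n_samples / (n_classes * class_count). The 700/300 class split below is an
# assumed example, not a measured property of the dataset, and the function
# name is hypothetical.
def balanced_class_weights_sketch(class_counts):
    """Reproduce the 'balanced' weighting heuristic with plain arithmetic."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: n_samples / (n_classes * count)
            for label, count in class_counts.items()}

example_weights_sketch = balanced_class_weights_sketch({0: 700, 1: 300})
# On-time flights get weight 1000 / (2 * 700) and delayed flights get
# 1000 / (2 * 300), so each delayed example counts roughly 2.3 times as much
# in the loss, which pushes the model toward higher recall.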

# 4. Primary metric: RECALL
metrics_specification = {
    "primary_metric": "RECALL (positive class)",
    "target": "> 0.75",
    "rationale": "For delays, a false positive is preferable to a false negative",
    "secondary_metrics": {
        "precision": "> 0.60",
        "f1_score": "> 0.65",
        "roc_auc": "> 0.70"
    },
    "business_impact": {
        "false_negative": "HIGH (customer misses flight)",
        "false_positive": "MEDIUM (unnecessary alert)",
        "true_positive": "HIGH (avoids problems)",
        "true_negative": "MEDIUM (confirms punctuality)"
    }
}

# 5. Fallback strategy
fallback_strategy = {
    "trigger_conditions": [
        "Model not loaded",
        "Response time > 2s",
        "Error during feature transformation",
        "Prediction confidence < 0.5"
    ],
    "fallback_method": "Baseline rule based on historical rate",
    "implementation": {
        "rule": "If hora_do_dia ∈ [16, 20] AND companhia in ['XY', 'YZ'] → predict delay",
        "default": "Predict no delay (majority class)",
        "performance": {
            "recall": "~0.30",
            "precision": "~0.25",
            "accuracy": "~0.70"
        }
    },
    "monitoring": {
        "fallback_rate": "Alert if > 5%",
        "performance_gap": "Alert if recall < 0.60"
    }
}
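
# Illustrative sketch (not part of the original Story): one way the rule-based
# fallback above could look. 'XY' and 'YZ' are the placeholder airline codes
# from fallback_strategy; the function name and the returned dictionary shape
# are assumptions.
from datetime import datetime

FALLBACK_AIRLINES_SKETCH = {"XY", "YZ"}

def fallback_predict_sketch(companhia_aerea, data_hora_partida):
    """Predict a delay only for flagged airlines departing between 16h and 20h."""
    hour = datetime.fromisoformat(data_hora_partida).hour
    delayed = 16 <= hour <= 20 and companhia_aerea in FALLBACK_AIRLINES_SKETCH
    # Every other case falls through to the default: no delay (majority class)
    return {"atraso": delayed, "source": "fallback_rule"}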

print("\n✅ BASE ALGORITHM DEFINED:")
print("   • Choice: LogisticRegression (simplicity > complexity)")
print("   • Primary metric: RECALL > 0.75")
print("   • Parameters: class_weight='balanced'")
print("   • Fallback: simple rule based on historical rate")

# ============================================================================
# TASK 2.1.3: 🔌 INTERFACE WITH THE JAVA BACKEND
# ============================================================================
print("\n" + "="*80)
print("🔌 TASK 2.1.3: INTERFACE WITH THE JAVA BACKEND")
print("="*80)

print("\n🌐 DEFINING THE API FOR JAVA INTEGRATION...")

# 1. API architecture
api_architecture = {
    "technology": "FastAPI",
    "justification": [
        "High performance (async/await)",
        "Automatic documentation (OpenAPI/Swagger)",
        "Easy integration with Java over HTTP",
        "Type hints for validation",
        "Active community"
    ],
    "deployment": {
        "container": "Docker (Python 3.9-slim)",
        "orchestration": "Kubernetes/Docker Compose",
        "scaling": "Horizontal (replicas)",
        "resources": {
            "cpu": "500m",
            "memory": "512Mi",
            "requests_per_pod": "50"
        }
    },
    "integration_with_java": {
        "protocol": "HTTP/REST",
        "authentication": "API Key header",
        "load_balancing": "Round-robin via Nginx/Ingress",
        "circuit_breaker": "Hystrix/Resilience4j on the Java side",
        "timeout_config": {
            "java_side": "2s total",
            "python_side": "1.5s processing"
        }
    }
}

# 2. Endpoint specification
api_endpoints = {
    "/predict": {
        "method": "POST",
        "description": "Main endpoint for delay prediction",
        "request_body": {
            "type": "application/json",
            "schema": "input_specification",
            "required": True
        },
        "responses": {
            "200": {
                "description": "Successful prediction",
                "schema": "output_specification"
            },
            "400": {
                "description": "Input validation error",
                "schema": {
                    "type": "object",
                    "properties": {
                        "status": {"type": "string", "example": "error"},
                        "error": {
                            "type": "object",
                            "properties": {
                                "code": {"type": "string", "example": "VALIDATION_ERROR"},
                                "message": {"type": "string", "example": "Invalid field"},
                                "details": {"type": "string", "example": "Expected a 2-character string"}
                            }
                        }
                    }
                }
            },
            "500": {
                "description": "Internal server error",
                "schema": {
                    "type": "object",
                    "properties": {
                        "status": {"type": "string", "example": "error"},
                        "error": {
                            "type": "object",
                            "properties": {
                                "code": {"type": "string", "example": "INTERNAL_ERROR"},
                                "message": {"type": "string", "example": "Processing error"}
                            }
                        }
                    }
                }
            },
            "503": {
                "description": "Service unavailable",
                "schema": {
                    "type": "object",
                    "properties": {
                        "status": {"type": "string", "example": "error"},
                        "error": {
                            "type": "object",
                            "properties": {
                                "code": {"type": "string", "example": "SERVICE_UNAVAILABLE"},
                                "message": {"type": "string", "example": "Service under maintenance"}
                            }
                        }
                    }
                }
            }
        },
        "timeout": "2s",
        "rate_limit": "100 requests/minute",
        "authentication": "API Key (X-API-Key header)"
    },
    "/health": {
        "method": "GET",
        "description": "Health check of the ML service",
        "responses": {
            "200": {
                "description": "Service healthy",
                "schema": {
                    "type": "object",
                    "properties": {
                        "status": {"type": "string", "example": "healthy"},
                        "timestamp": {"type": "string", "example": "2024-01-15T14:30:00Z"},
                        "components": {
                            "type": "object",
                            "properties": {
                                "model": {"type": "string", "example": "loaded"},
                                "database": {"type": "string", "example": "connected"},
                                "cache": {"type": "string", "example": "available"}
                            }
                        },
                        "metrics": {
                            "type": "object",
                            "properties": {
                                "uptime": {"type": "string", "example": "99.5%"},
                                "response_time_p95": {"type": "number", "example": 185.5},
                                "requests_last_hour": {"type": "integer", "example": 1500}
                            }
                        }
                    }
                }
            },
            "503": {
                "description": "Service unhealthy",
                "schema": {
                    "type": "object",
                    "properties": {
                        "status": {"type": "string", "example": "unhealthy"},
                        "timestamp": {"type": "string", "example": "2024-01-15T14:30:00Z"},
                        "failed_components": {"type": "array", "items": {"type": "string"}},
                        "error": {"type": "string"}
                    }
                }
            }
        },
        "timeout": "100ms",
        "frequency": "Every 30s, polled by the load balancer"
    },
    "/metrics": {
        "method": "GET",
        "description": "Model performance metrics",
        "responses": {
            "200": {
                "description": "Metrics available",
                "schema": {
                    "type": "object",
                    "properties": {
                        "model_metrics": {
                            "type": "object",
                            "properties": {
                                "recall": {"type": "number"},
                                "precision": {"type": "number"},
                                "f1_score": {"type": "number"},
                                "accuracy": {"type": "number"}
                            }
                        },
                        "performance_metrics": {
                            "type": "object",
                            "properties": {
                                "avg_response_time_ms": {"type": "number"},
                                "requests_per_second": {"type": "number"},
                                "error_rate": {"type": "number"}
                            }
                        }
                    }
                }
            }
        },
        "authentication": "API Key (X-API-Key header)"
    }
}

print("\n✅ API FOR THE JAVA BACKEND DEFINED:")
print("   • Endpoints: /predict (POST), /health (GET), /metrics (GET)")
print("   • Maximum timeout: 2s")
print("   • Technology: FastAPI + Docker")
print("   • Integration: HTTP/REST with circuit breaker")

# ============================================================================
# TASK 2.1.4: 🧪 FAST VALIDATION STRATEGY
# ============================================================================
print("\n" + "="*80)
print("🧪 TASK 2.1.4: FAST VALIDATION STRATEGY")
print("="*80)

print("\n📊 DEFINING MODEL VALIDATION...")

# 1. Train/test split
validation_strategy = {
    "split_method": "Stratified 70/30",
    "rationale": "Keep the class proportions the same in both sets",
    "implementation": {
        "train_size": 0.7,
        "test_size": 0.3,
        "random_state": SEED,
        "stratify": True
    },
    "timeline": {
        "data_preparation": "2h",
        "training": "1h",
        "validation": "2h",
        "total": "5h"
    }
}
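
# Illustrative sketch (not part of the original Story): a pure-Python
# stratified 70/30 split, to show what "stratify" means here. With
# scikit-learn the equivalent call is
# train_test_split(X, y, test_size=0.3, stratify=y, random_state=SEED), where
# `stratify` receives the label array rather than a boolean. The helper name
# and the synthetic labels are assumptions.
import random

def stratified_split_sketch(labels, test_size=0.3, seed=42):
    """Return (train_idx, test_idx), keeping each class's share in both sets."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

labels_sketch = [1] * 300 + [0] * 700  # assumed 30% delay rate
train_idx_sketch, test_idx_sketch = stratified_split_sketch(labels_sketch)
# Both subsets keep the 30% delay rate, so recall measured on the test set is
# not distorted by an unlucky draw of the minority class.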

# 2. Minimum acceptable metrics
acceptance_criteria = {
    "primary_metric": {
        "name": "Recall (positive class)",
        "minimum": 0.75,
        "target": 0.80,
        "stretch": 0.85
    },
    "secondary_metrics": [
        {
            "name": "Precision",
            "minimum": 0.60,
            "target": 0.65
        },
        {
            "name": "F1-Score",
            "minimum": 0.65,
            "target": 0.70
        },
        {
            "name": "ROC-AUC",
            "minimum": 0.70,
            "target": 0.75
        }
    ],
    "business_metrics": [
        {
            "name": "Estimated avoided cost",
            "calculation": "TP * R$500 + FP * R$100",
            "minimum": "R$50,000/month"
        },
        {
            "name": "Reduction in complaints",
            "target": "30% reduction"
        }
    ]
}
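
# Illustrative sketch (not part of the original Story): a consistency check on
# the thresholds above. F1 is the harmonic mean of precision and recall, so
# the minimum precision and recall together imply a minimum F1. The function
# and variable names are hypothetical.
def f1_from_sketch(precision, recall):
    """F1 = 2 * P * R / (P + R)."""
    return 2 * precision * recall / (precision + recall)

implied_min_f1_sketch = f1_from_sketch(precision=0.60, recall=0.75)
# 2 * 0.60 * 0.75 / 1.35 is about 0.667, so a model that just meets both
# minimums also clears the 0.65 F1 floor; the three thresholds are compatible.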

# 3. Baseline (dummy model)
baseline_models = {
    "dummy_stratified": {
        "strategy": "stratified",
        "description": "Predicts at random according to the class distribution",
        "expected_performance": {
            "accuracy": "~p² + (1-p)² for positive-class rate p (about 58% with 30% delays)",
            "recall": "~positive-class rate",
            "precision": "~positive-class rate"
        }
    },
    "dummy_most_frequent": {
        "strategy": "most_frequent",
        "description": "Always predicts the majority class (no delay)",
        "expected_performance": {
            "accuracy": "~70% (if 30% of flights are delayed)",
            "recall": "0% (never detects delays)",
            "precision": "0% (not applicable: no positive predictions)"
        }
    },
    "improvement_target": {
        "vs_dummy_stratified": "> 20% better recall",
        "minimum_improvement": "Absolute recall > 0.75"
    }
}
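
# Illustrative sketch (not part of the original Story): expected metrics of
# the two dummy baselines as a function of the positive (delay) rate p,
# derived analytically. The 30% rate is the example figure used above, not a
# measured value, and the function name is hypothetical.
def dummy_baseline_expectations_sketch(p):
    """Expected metrics for random-stratified and majority-class guessing."""
    return {
        "stratified": {
            # Guessing "delay" with probability p, independently of the truth
            "accuracy": p * p + (1 - p) * (1 - p),
            "recall": p,
            "precision": p,
        },
        "most_frequent": {
            # Always "no delay" (assumes p < 0.5): never flags a delayed flight
            "accuracy": 1 - p,
            "recall": 0.0,
        },
    }

baseline_expectations_sketch = dummy_baseline_expectations_sketch(0.30)
# At p = 0.30 the stratified dummy reaches a recall of only 0.30, far below
# the 0.75 target, which is the gap the LogisticRegression model has to close.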

# 4. Model card template (simplified)
model_card_template = {
    "model_details": {
        "name": "Flight Delay Predictor MVP",
        "version": "1.0.0",
        "date": datetime.now().strftime("%Y-%m-%d"),
        "owners": ["@ananda.matos"]
    },
    "intended_use": {
        "primary_use": "Prediction of commercial flight delays",
        "primary_users": "Airport operations agents"
    },
    "metrics": {
        "performance_measures": {
            "recall": "target > 0.75",
            "precision": "target > 0.60",
            "f1_score": "target > 0.65",
            "inference_time": "target < 200ms"
        }
    }
}

print("\n✅ VALIDATION STRATEGY DEFINED:")
print("   • Split: stratified 70/30")
print("   • Minimum metrics: RECALL > 0.75")
print("   • Baseline: stratified dummy model")
print("   • Model card: simplified template created")

# ============================================================================
# 📦 SAVE THE STORY 2.1 DELIVERABLES
# ============================================================================
print("\n" + "="*80)
print("📦 SAVING THE STORY 2.1 DELIVERABLES")
print("="*80)

# 1. Save the ML pipeline architecture
print("\n📄 1. SAVING THE PIPELINE DESIGN...")

# Build the pipeline document in parts to avoid f-string errors
current_date = datetime.now().strftime('%Y-%m-%d')

ml_pipeline_parts = [
    "# 🏗️ SIMPLIFIED ML PIPELINE ARCHITECTURE\n\n",
    "## 📋 OVERVIEW\n",
    "- **Story:** 2.1 - Simplified ML Pipeline Architecture\n",
    "- **Owner:** @ananda.matos\n",
    f"- **Date:** {current_date}\n",
    "- **Timeline:** 48h (24h development + 24h integration)\n\n",
    "## 🎯 DESIGN PRINCIPLES\n",
    "1. **Simplicity over complexity**\n",
    "2. **Real-time performance (< 2s)**\n",
    "3. **Easy integration with the Java backend**\n",
    "4. **Maximum maintainability**\n",
    "5. **Fallback to simple rules**\n\n",
    "## 🔄 END-TO-END PIPELINE\n\n",
    "### Step 1: Input Validation\n",
    "- **Input:** JSON with 5 user fields\n",
    "- **Process:** Format and range validation\n",
    "- **Output:** Validated Python dictionary\n",
    "- **Timeout:** 100ms\n",
    "- **Error handling:** Returns 400 with details\n\n",
    "### Step 2: Feature Transformation\n",
    "- **Input:** Validated Python dictionary\n",
    "- **Process:** Transformation using MVPTrafficFeatureTransformer\n",
    "- **Output:** NumPy array with 7 features\n",
    "- **Timeout:** 50ms\n",
    "- **Dependencies:** `transform_simple.py`\n\n",
    "### Step 3: Model Prediction\n",
    "- **Input:** NumPy array with 7 features\n",
    "- **Process:** Prediction with LogisticRegression\n",
    "- **Output:** Probability (0-1) and binary class\n",
    "- **Timeout:** 100ms\n",
    "- **Fallback:** Baseline rule based on historical rate\n\n",
    "### Step 4: Post-processing\n",
    "- **Input:** Probability and class\n",
    "- **Process:** Computation of business metrics\n",
    "- **Output:** Data enriched with avoided cost\n",
    "- **Timeout:** 50ms\n\n",
    "### Step 5: Output Formatting\n",
    "- **Input:** Processed data\n",
    "- **Process:** Standardized JSON formatting\n",
    "- **Output:** HTTP response\n",
    "- **Timeout:** 50ms\n\n",
    "## 📥 INPUT SPECIFICATION (5 FIELDS)\n\n",
    "### 1. companhia_aerea\n",
    "```json\n",
    "{\n",
    '  "type": "string",\n',
    '  "pattern": "^[A-Z0-9]{2}$",\n',
    '  "description": "IATA code of the airline",\n',
    '  "example": "AA",\n',
    '  "required": true\n',
    "}\n",
    "```\n\n",
    "### 2. aeroporto_origem\n",
    "```json\n",
    "{\n",
    '  "type": "string",\n',
    '  "pattern": "^[A-Z]{3}$",\n',
    '  "description": "IATA code of the origin airport",\n',
    '  "example": "JFK",\n',
    '  "required": true\n',
    "}\n",
    "```\n\n",
    "### 3. aeroporto_destino\n",
    "```json\n",
    "{\n",
    '  "type": "string",\n',
    '  "pattern": "^[A-Z]{3}$",\n',
    '  "description": "IATA code of the destination airport",\n',
    '  "example": "LAX",\n',
    '  "required": true\n',
    "}\n",
    "```\n\n",
    "### 4. data_hora_partida\n",
    "```json\n",
    "{\n",
    '  "type": "string",\n',
    '  "format": "date-time",\n',
    '  "description": "Departure date and time (ISO 8601)",\n',
    '  "example": "2024-01-15T14:30:00",\n',
    '  "required": true\n',
    "}\n",
    "```\n\n",
    "### 5. distancia_km\n",
    "```json\n",
    "{\n",
    '  "type": "number",\n',
    '  "minimum": 0,\n',
    '  "maximum": 5000,\n',
    '  "description": "Distance in kilometers",\n',
    '  "example": 3980.0,\n',
    '  "required": true\n',
    "}\n",
    "```\n\n",
    "## 📤 OUTPUT SPECIFICATION\n\n",
    "### Success Response (200)\n",
    "```json\n",
    "{\n",
    '  "status": "success",\n',
    '  "prediction": {\n',
    '    "atraso": true,\n',
    '    "probabilidade": 0.78\n',
    "  },\n",
    '  "business_metrics": {\n',
    '    "custo_evitado": 1250.50,\n',
    '    "confiança": "ALTA"\n',
    "  },\n",
    '  "metadata": {\n',
    '    "model_version": "1.0.0",\n',
    '    "processing_time_ms": 185.5,\n',
    '    "timestamp": "2024-01-15T14:30:00.123Z"\n',
    "  }\n",
    "}\n",
    "```\n\n",
    "### Error Response (400)\n",
    "```json\n",
    "{\n",
    '  "status": "error",\n',
    '  "error": {\n',
    '    "code": "VALIDATION_ERROR",\n',
    '    "message": "Field \'companhia_aerea\' is invalid",\n',
    '    "details": "Must be a 2-character string"\n',
    "  },\n",
    '  "metadata": {\n',
    '    "timestamp": "2024-01-15T14:30:00.123Z"\n',
    "  }\n",
    "}\n",
    "```\n\n",
    "## 🤖 ALGORITHM CHOICE\n\n",
    "### Decision: LogisticRegression\n",
    "**Justification:**\n",
    "- The 48h timeline demands simplicity\n",
    "- An MVP does not need excessive complexity\n",
    "- Recall > 0.75 is achievable with class_weight='balanced'\n",
    "- Interpretability is crucial for the business\n",
    "- Easy maintenance for a multidisciplinary team\n",
    "- Fast integration with the Java backend\n\n",
    "### Model Parameters\n",
    "```python\n",
    "LogisticRegression(\n",
    "    penalty='l2',\n",
    "    C=1.0,\n",
    "    class_weight='balanced',  # Compensates for class imbalance\n",
    "    solver='lbfgs',\n",
    "    max_iter=1000,\n",
    "    random_state=42\n",
    ")\n",
    "```\n\n",
    "### Primary Metric: RECALL\n",
    "- **Target:** > 0.75\n",
    "- **Justification:** A false positive is preferable to a false negative\n",
    "- **Business impact:**\n",
    "  - False Negative: HIGH (customer misses flight)\n",
    "  - False Positive: MEDIUM (unnecessary alert)\n\n",
    "## 🔌 INTERFACE WITH THE JAVA BACKEND\n\n",
    "### Technology: FastAPI\n",
    "**Advantages:**\n",
    "- High performance (async/await)\n",
    "- Automatic documentation (OpenAPI/Swagger)\n",
    "- Easy integration with Java over HTTP\n",
    "- Type hints for validation\n\n",
    "### Main Endpoints\n\n",
    "#### POST /predict\n",
    "- **Description:** Main prediction endpoint\n",
    "- **Timeout:** 2s total\n",
    "- **Rate Limit:** 100 requests/minute\n",
    "- **Authentication:** API Key header\n\n",
    "#### GET /health\n",
    "- **Description:** Service health check\n",
    "- **Timeout:** 100ms\n",
    "- **Frequency:** Every 30s, polled by the load balancer\n\n",
    "#### GET /metrics\n",
    "- **Description:** Performance metrics\n",
    "- **Authentication:** API Key header\n\n",
    "## 🧪 VALIDATION STRATEGY\n\n",
    "### Data Split\n",
    "- **Method:** Stratified 70/30\n",
    "- **Justification:** Keep the class proportions\n",
    "- **Random State:** 42\n\n",
    "### Minimum Acceptable Metrics\n",
    "| Metric | Minimum | Target | Stretch |\n",
    "|---------|--------|--------|---------|\n",
    "| **Recall** | 0.75 | 0.80 | 0.85 |\n",
    "| **Precision** | 0.60 | 0.65 | 0.70 |\n",
    "| **F1-Score** | 0.65 | 0.70 | 0.75 |\n",
    "| **ROC-AUC** | 0.70 | 0.75 | 0.80 |\n\n",
    "### Baseline (Dummy Model)\n",
    "- **Strategy:** stratified\n",
    "- **Expected recall:** ~positive-class rate\n",
    "- **Minimum improvement:** > 20% over baseline\n\n",
    "## 📊 DATA FLOW\n\n",
    "```\n",
    "API Gateway (Java) → FastAPI Endpoint\n",
    "FastAPI → Input Validator (100ms)\n",
    "Input Validator → Feature Transformer (50ms)\n",
    "Feature Transformer → ML Model (100ms)\n",
    "ML Model → Business Logic (50ms)\n",
    "Business Logic → Output Formatter (50ms)\n",
    "Output Formatter → FastAPI Response\n",
    "FastAPI Response → API Gateway (Java)\n",
    "```\n\n",
    "### Performance Targets\n",
    "- **Total latency (p95):** < 500ms\n",
    "- **Throughput:** > 100 req/sec\n",
    "- **Error rate:** < 1%\n",
    "- **Availability:** > 99.5%\n\n",
    "## 🚀 NEXT STEPS\n\n",
    "### Phase 1: Development (24h)\n",
    "1. Implement pipeline_template.py\n",
    "2. Develop the FastAPI endpoints\n",
    "3. Implement validation and transformation\n",
    "4. Integrate the LogisticRegression model\n\n",
    "### Phase 2: Integration (24h)\n",
    "1. Containerize with Docker\n",
    "2. Configure the CI/CD pipeline\n",
    "3. Integrate with the Java backend\n",
    "4. Run end-to-end tests\n\n",
    "### Phase 3: Validation (5h)\n",
    "1. Split the data 70/30\n",
    "2. Train and evaluate the model\n",
    "3. Compare against the baseline\n",
    "4. Generate the model card\n\n",
    f"---\n\n*Generated on: {datetime.now().strftime('%d/%m/%Y %H:%M:%S')}*\n"
]
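
# Sketch (not part of the original notebook): a quick sanity check that the
# per-stage latency budgets listed in the data-flow section fit inside the
# 500 ms p95 target. The stage names and the helper `latency_headroom_ms` are
# illustrative assumptions; only the millisecond figures come from the design.
# Summing budgets is a rough check, not a statistical statement about p95.
STAGE_BUDGETS_MS = {
    "input_validation": 100,
    "feature_transformation": 50,
    "model_prediction": 100,
    "business_logic": 50,
    "output_formatting": 50,
}


def latency_headroom_ms(stage_budgets: dict, total_target_ms: int = 500) -> int:
    """Return the slack left after summing every stage budget."""
    return total_target_ms - sum(stage_budgets.values())


# 100 + 50 + 100 + 50 + 50 = 350 ms, leaving 150 ms for network and queuing
pipeline_latency_headroom = latency_headroom_ms(STAGE_BUDGETS_MS)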

# Join all the parts
ml_pipeline_design = ''.join(ml_pipeline_parts)

ml_pipeline_path = 'datascience/2_solution/architecture/ml_pipeline_design.md'
with open(ml_pipeline_path, 'w', encoding='utf-8') as f:
    f.write(ml_pipeline_design)

print(f"   ✅ Pipeline design saved: {ml_pipeline_path}")
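
# Sketch (an illustration under stated assumptions, not the notebook's own
# code): the validation section asks for recall above 0.75 and at least a 20%
# gain over a stratified dummy baseline, whose expected recall is roughly the
# positive-class rate. The helper names below are hypothetical, and "20% over
# baseline" is read here as a relative improvement.
def recall_from_counts(true_positives: int, false_negatives: int) -> float:
    """Recall = TP / (TP + FN); defined as 0.0 when there are no positives."""
    positives = true_positives + false_negatives
    return true_positives / positives if positives else 0.0


def passes_recall_gate(model_recall: float, baseline_recall: float,
                       min_recall: float = 0.75,
                       min_relative_gain: float = 0.20) -> bool:
    """Check both the absolute recall floor and the relative gain over baseline."""
    beats_floor = model_recall > min_recall
    beats_baseline = model_recall > baseline_recall * (1.0 + min_relative_gain)
    return beats_floor and beats_baseline


# Example with made-up counts: 160 delays caught, 40 missed, 20% positive rate
example_recall = recall_from_counts(true_positives=160, false_negatives=40)
example_gate_ok = passes_recall_gate(example_recall, baseline_recall=0.20)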

# 2. Save the API specification as YAML
print("\n📄 2. SAVING API SPECIFICATION...")

api_spec_yaml = """openapi: 3.0.3
info:
  title: Flight Delay Prediction API
  description: Minimal API for flight delay prediction
  version: 1.0.0
  contact:
    name: Ananda Matos
    email: ananda.matos@company.com
  x-responsible: "@ananda.matos"

servers:
  - url: http://localhost:8000
    description: Local development
  - url: https://ml-service.company.com
    description: Production server

paths:
  /predict:
    post:
      summary: Predict flight delay
      description: |
        Takes 5 user inputs and returns the delay prediction,
        its probability, and business metrics.

        **Maximum timeout:** 2 seconds
        **Rate limit:** 100 requests/minute
      operationId: predictDelay
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/FlightRequest'
      responses:
        '200':
          description: Successful prediction
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/PredictionResponse'
        '400':
          description: Input validation error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '500':
          description: Internal server error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '503':
          description: Service unavailable
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
      security:
        - apiKey: []
      x-timeout: 2000
      x-rate-limit: 100

  /health:
    get:
      summary: Service health check
      description: |
        Checks the health of the ML service and its components.

        **Timeout:** 100ms
        **Frequency:** Every 30s, polled by the load balancer
      operationId: healthCheck
      responses:
        '200':
          description: Service healthy
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HealthResponse'
        '503':
          description: Service unhealthy
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HealthResponse'
      x-timeout: 100

  /metrics:
    get:
      summary: Performance metrics
      description: Returns model metrics and service performance figures
      operationId: getMetrics
      security:
        - apiKey: []
      responses:
        '200':
          description: Metrics available
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/MetricsResponse'

components:
  schemas:
    FlightRequest:
      type: object
      required:
        - companhia_aerea
        - aeroporto_origem
        - aeroporto_destino
        - data_hora_partida
        - distancia_km
      properties:
        companhia_aerea:
          type: string
          pattern: '^[A-Z0-9]{2}$'
          example: "AA"
        aeroporto_origem:
          type: string
          pattern: '^[A-Z]{3}$'
          example: "JFK"
        aeroporto_destino:
          type: string
          pattern: '^[A-Z]{3}$'
          example: "LAX"
        data_hora_partida:
          type: string
          format: date-time
          example: "2024-01-15T14:30:00"
        distancia_km:
          type: number
          minimum: 0
          maximum: 5000
          example: 3980.0

    PredictionResponse:
      type: object
      properties:
        status:
          type: string
          enum: [success, error]
        prediction:
          type: object
          properties:
            atraso:
              type: boolean
            probabilidade:
              type: number
              minimum: 0
              maximum: 1
        business_metrics:
          type: object
          properties:
            custo_evitado:
              type: number
            confiança:
              type: string
              enum: [BAIXA, MEDIA, ALTA]
        metadata:
          type: object
          properties:
            model_version:
              type: string
            processing_time_ms:
              type: number
            timestamp:
              type: string
              format: date-time

    ErrorResponse:
      type: object
      properties:
        status:
          type: string
          enum: [error]
        error:
          type: object
          properties:
            code:
              type: string
            message:
              type: string
            details:
              type: string
        metadata:
          type: object
          properties:
            timestamp:
              type: string
              format: date-time

    HealthResponse:
      type: object
      properties:
        status:
          type: string
          enum: [healthy, unhealthy]
        timestamp:
          type: string
          format: date-time
        components:
          type: object
          properties:
            model:
              type: string
            database:
              type: string
            cache:
              type: string

    MetricsResponse:
      type: object
      description: Model and service performance metrics (fields not yet specified)
      additionalProperties: true

  securitySchemes:
    apiKey:
      type: apiKey
      in: header
      name: X-API-Key
"""

api_spec_path = 'datascience/2_solution/contracts/api_spec.yaml'
with open(api_spec_path, 'w', encoding='utf-8') as f:
    f.write(api_spec_yaml)

print(f"   ✅ API specification saved: {api_spec_path}")
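
# Sketch (hypothetical helper, not from the original notebook): a text-level
# check that every local "$ref" in an OpenAPI document points at a schema that
# is actually declared under components/schemas. It assumes the 2-space
# indentation used by the spec above; a real YAML parser would be more robust.
import re


def find_dangling_schema_refs(spec_text: str) -> set:
    """Return schema names that are referenced but never declared."""
    referenced = set(re.findall(r"#/components/schemas/(\w+)", spec_text))

    declared = set()
    schemas_indent = None
    for line in spec_text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        if schemas_indent is None:
            if line.strip() == "schemas:":
                schemas_indent = indent
            continue
        if indent <= schemas_indent:
            schemas_indent = None  # left the schemas block
        elif indent == schemas_indent + 2:
            declared.add(line.strip().rstrip(":"))

    return referenced - declared


# Tiny made-up spec: MetricsResponse is referenced but only HealthResponse exists
demo_spec_text = """
paths:
  /metrics:
    get:
      responses:
        '200':
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/MetricsResponse'
components:
  schemas:
    HealthResponse:
      type: object
"""
demo_dangling_refs = find_dangling_schema_refs(demo_spec_text)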

# 3. Save the pipeline template as Python
print("\n💻 3. SAVING PIPELINE TEMPLATE...")

pipeline_template_code = '''"""
pipeline_template.py - Simplified ML pipeline template

End-to-end pipeline for flight delay prediction with:
1. Input validation
2. Feature transformation
3. Prediction with LogisticRegression
4. Post-processing and formatting
"""

import json
import time
import logging
from typing import Dict, Any, Optional, Tuple
from datetime import datetime

import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression

# Logging configuration
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class FlightDelayPipeline:
    """Simplified pipeline for flight delay prediction"""

    def __init__(self, model_params: Optional[Dict] = None):
        """
        Initialize the ML pipeline.

        Args:
            model_params: Parameters for LogisticRegression
        """
        self.model = None
        self.feature_transformer = None
        self.model_params = model_params or {
            'penalty': 'l2',
            'C': 1.0,
            'class_weight': 'balanced',
            'solver': 'lbfgs',
            'max_iter': 1000,
            'random_state': 42
        }
        
        logger.info("Pipeline initialized")
    
    def validate_input(self, input_data: Dict[str, Any]) -> Tuple[bool, Optional[str]]:
        """
        Validate the 5 user inputs.

        Args:
            input_data: Dictionary with the 5 input fields

        Returns:
            Tuple (is_valid, error_message)
        """
        required_fields = [
            'companhia_aerea',
            'aeroporto_origem',
            'aeroporto_destino',
            'data_hora_partida',
            'distancia_km'
        ]

        # Check required fields
        for field in required_fields:
            if field not in input_data:
                return False, f"Missing required field: {field}"

        # Validate the airline code
        airline = input_data['companhia_aerea']
        if not isinstance(airline, str) or len(airline) != 2:
            return False, "companhia_aerea must be a 2-character string"

        # Validate the airport codes
        for field in ['aeroporto_origem', 'aeroporto_destino']:
            airport = input_data[field]
            if not isinstance(airport, str) or len(airport) != 3:
                return False, f"{field} must be a 3-character string"

        # Validate the departure date/time; catch Exception rather than using a
        # bare except so KeyboardInterrupt and SystemExit still propagate
        try:
            pd.to_datetime(input_data['data_hora_partida'])
        except Exception:
            return False, "data_hora_partida must be a valid date/time"

        # Validate the distance (bool is rejected because it is a subclass of int)
        distance = input_data['distancia_km']
        if (isinstance(distance, bool) or not isinstance(distance, (int, float))
                or distance < 0 or distance > 5000):
            return False, "distancia_km must be a number between 0 and 5000"

        return True, None
    
    def transform_features(self, input_data: Dict[str, Any]) -> np.ndarray:
        """
        Transform the inputs into model features.

        Args:
            input_data: Validated user inputs

        Returns:
            NumPy array of shape (1, 6), one column per entry in features_order
        """
        try:
            from transform_simple import MVPTrafficFeatureTransformer
        except ImportError:
            logger.warning("Transformer not found, using fallback")
            return self._fallback_transform(input_data)
        
        if self.feature_transformer is None:
            self.feature_transformer = MVPTrafficFeatureTransformer()
        
        features_dict = self.feature_transformer.transform_single(input_data)
        
        features_order = [
            'encoded_simple_airline',
            'encoded_route_pair',
            'hour_of_day',
            'day_of_week',
            'distance_km',
            'is_weekend'
        ]
        
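        # Fail loudly on a missing feature: silently skipping it would shift
        # the remaining columns and feed the model misaligned inputs
        missing = [name for name in features_order if name not in features_dict]
        if missing:
            raise KeyError(f"Transformer output is missing features: {missing}")

        features_array = [features_dict[name] for name in features_order]

        return np.array(features_array).reshape(1, -1)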
    
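    def _fallback_transform(self, input_data: Dict[str, Any]) -> np.ndarray:
        """Simple fallback transformation"""
        departure_time = pd.to_datetime(input_data['data_hora_partida'])
        hour_of_day = departure_time.hour
        day_of_week = departure_time.weekday()

        # Columns follow the same order as features_order in transform_features
        features = np.array([
            0,  # encoded_simple_airline: no encoder available in fallback mode
            0,  # encoded_route_pair: no encoder available in fallback mode
            hour_of_day / 23.0,  # hour scaled to [0, 1]
            day_of_week / 6.0,  # weekday scaled to [0, 1] (Monday = 0)
            min(input_data['distancia_km'] / 5000.0, 1.0),  # distance scaled to [0, 1]
            1 if day_of_week >= 5 else 0  # is_weekend flag
        ]).reshape(1, -1)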
        
        return features
    
    def train(self, X_train: np.ndarray, y_train: np.ndarray):
        """
        Train the LogisticRegression model.

        Args:
            X_train: Training features
            y_train: Training labels
        """
        logger.info(f"Training model on {len(X_train)} samples")

        self.model = LogisticRegression(**self.model_params)
        self.model.fit(X_train, y_train)

        logger.info("Model trained")
    
    def predict(self, features: np.ndarray, threshold: float = 0.5) -> Tuple[bool, float]:
        """
        Predict with the trained model.

        Args:
            features: Array of shape (1, 6) produced by transform_features
            threshold: Classification threshold

        Returns:
            Tuple (prediction, probability)
        """
        start_time = time.time()

        if self.model is None:
            logger.warning("Model not trained, using fallback")
            return self._fallback_predict(features, threshold)

        try:
            probabilities = self.model.predict_proba(features)[0]
            # Cast to a built-in float so the declared return type holds
            positive_prob = float(probabilities[1])

            prediction = positive_prob >= threshold

            processing_time = (time.time() - start_time) * 1000
            logger.debug(f"Prediction took {processing_time:.1f}ms")

            return prediction, positive_prob

        except Exception as e:
            logger.error(f"Prediction error: {e}")
            return self._fallback_predict(features, threshold)
    
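    def _fallback_predict(self, features: np.ndarray, threshold: float) -> Tuple[bool, float]:
        """Fallback prediction based on simple rules"""
        # Assumes column 2 holds the hour scaled by 1/23, as in _fallback_transform;
        # round() guards against floating-point drift at the 16h and 20h edges
        hour_of_day = round(features[0, 2] * 23)
        is_evening_rush = 16 <= hour_of_day <= 20

        if is_evening_rush:
            return True, 0.65
        else:
            return False, 0.35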
    
    def calculate_business_metrics(self, prediction: bool, probability: float) -> Dict[str, Any]:
        """
        Compute business metrics from the prediction.

        Args:
            prediction: Model prediction
            probability: Probability of the prediction

        Returns:
            Dictionary of business metrics
        """
        if prediction:
            base_cost = 500 * 150
            cost_avoided = base_cost * probability * 0.5
        else:
            cost_avoided = 0.0
        
        if probability >= 0.7:
            confidence = "ALTA"
        elif probability >= 0.4:
            confidence = "MEDIA"
        else:
            confidence = "BAIXA"
        
        return {
            "custo_evitado": round(cost_avoided, 2),
            "confiança": confidence
        }
    
    def process_request(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
        """
        Process a complete request end to end.

        Args:
            input_data: User inputs as JSON/dict

        Returns:
            Response formatted for the API
        """
        start_time = time.time()
        
        is_valid, error_msg = self.validate_input(input_data)
        if not is_valid:
            return {
                "status": "error",
                "error": {
                    "code": "VALIDATION_ERROR",
                    "message": error_msg
                },
                "metadata": {
                    "timestamp": datetime.now().isoformat()
                }
            }
        
        try:
            features = self.transform_features(input_data)
            prediction, probability = self.predict(features)
            business_metrics = self.calculate_business_metrics(prediction, probability)
            
            processing_time = (time.time() - start_time) * 1000
            
            response = {
                "status": "success",
                "prediction": {
                    "atraso": bool(prediction),
                    "probabilidade": round(probability, 3)
                },
                "business_metrics": business_metrics,
                "metadata": {
                    "model_version": "1.0.0",
                    "processing_time_ms": round(processing_time, 1),
                    "timestamp": datetime.now().isoformat()
                }
            }
            
            logger.info(f"Request processed in {processing_time:.1f}ms")
            return response
            
        except Exception as e:
            logger.error(f"Processing error: {e}")
            return {
                "status": "error",
                "error": {
                    "code": "PROCESSING_ERROR",
                    "message": "Internal processing error"
                },
                "metadata": {
                    "timestamp": datetime.now().isoformat()
                }
            }


if __name__ == "__main__":
    print("🚀 SIMPLIFIED ML PIPELINE TEST")
    print("=" * 50)
    
    pipeline = FlightDelayPipeline()
    
    example_input = {
        "companhia_aerea": "AA",
        "aeroporto_origem": "JFK",
        "aeroporto_destino": "LAX",
        "data_hora_partida": "2024-01-15T14:30:00",
        "distancia_km": 3980.0
    }
    
    print("📥 User input:")
    print(json.dumps(example_input, indent=2))
    print()
    
    response = pipeline.process_request(example_input)
    
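    print("📤 Pipeline response:")
    print(json.dumps(response, indent=2))

    # A backslash-n here would be expanded by the enclosing template string and
    # split this literal across two lines in the generated file
    print()
    print("✅ Pipeline tested successfully!")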
'''

pipeline_template_path = 'datascience/2_solution/code/pipeline_template.py'
with open(pipeline_template_path, 'w', encoding='utf-8') as f:
    f.write(pipeline_template_code)

print(f"   ✅ Pipeline template saved: {pipeline_template_path}")
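
# Sketch (illustrative only): the OpenAPI contract constrains the codes with
# regular expressions ('^[A-Z0-9]{2}$' for the airline, '^[A-Z]{3}$' for the
# airports), while the template's validate_input only checks string length.
# The helper below, whose name is an assumption, shows one way to apply the
# same patterns on the Python side.
import re

AIRLINE_CODE_PATTERN = re.compile(r"[A-Z0-9]{2}")
AIRPORT_CODE_PATTERN = re.compile(r"[A-Z]{3}")


def matches_contract_codes(payload: dict) -> bool:
    """True when the airline and airport codes satisfy the contract patterns."""
    airline = payload.get("companhia_aerea")
    airports = [payload.get("aeroporto_origem"), payload.get("aeroporto_destino")]
    if not isinstance(airline, str) or not AIRLINE_CODE_PATTERN.fullmatch(airline):
        return False
    return all(
        isinstance(code, str) and AIRPORT_CODE_PATTERN.fullmatch(code) is not None
        for code in airports
    )


# Lowercase codes pass a length-only check but fail the contract patterns
lowercase_codes_accepted = matches_contract_codes(
    {"companhia_aerea": "aa", "aeroporto_origem": "jfk", "aeroporto_destino": "lax"}
)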

# 4. Create the configuration file
print("\n⚙️  4. CREATING CONFIGURATION FILE...")

config_content = f"""[PIPELINE]
name = "FlightDelayPredictionPipeline"
version = "1.0.0"
story_id = "2.1"
responsible = "@ananda.matos"
created_at = "{datetime.now().isoformat()}"
timeline_hours = 48

[MODEL]
algorithm = "LogisticRegression"
class_weight = "balanced"
target_metric = "recall"
target_value = 0.75
threshold = 0.5
fallback_enabled = true

[API]
framework = "FastAPI"
endpoints = "/predict, /health, /metrics"
timeout_ms = 2000
rate_limit = 100
auth_method = "api_key"

[VALIDATION]
split_method = "stratified_70_30"
random_state = 42
minimum_metrics = "recall>0.75, precision>0.60, f1>0.65"
baseline = "dummy_stratified"
improvement_target = "20%"

[PERFORMANCE]
total_latency_ms = 500
throughput_rps = 100
error_rate_percent = 1
availability_percent = 99.5

[INTEGRATION]
backend = "Java Spring Boot"
protocol = "HTTP/REST"
circuit_breaker = "enabled"
"""

config_path = 'datascience/2_solution/pipeline_config.ini'
with open(config_path, 'w', encoding='utf-8') as f:
    f.write(config_content)

print(f"   ✅ Configuration saved: {config_path}")
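
# Sketch (assumes the values stay quoted as written above): reading the
# generated file with the standard-library configparser. Interpolation is
# disabled because the literal "%" in improvement_target would otherwise raise
# an interpolation error, and the surrounding quotes are stripped by hand since
# configparser keeps them. `read_pipeline_setting` is a hypothetical helper.
import configparser


def read_pipeline_setting(config_text: str, section: str, key: str) -> str:
    """Return one setting as a string with surrounding double quotes removed."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return parser.get(section, key).strip('"')


# Small excerpt in the same layout as the generated pipeline_config.ini
sample_config_text = """[MODEL]
algorithm = "LogisticRegression"
target_value = 0.75

[VALIDATION]
improvement_target = "20%"
"""
sample_algorithm = read_pipeline_setting(sample_config_text, "MODEL", "algorithm")
sample_target_recall = float(
    read_pipeline_setting(sample_config_text, "MODEL", "target_value")
)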

# ============================================================================
# CONCLUSION
# ============================================================================
print("\n" + "="*80)
print("✅ STORY 2.1 COMPLETED SUCCESSFULLY!")
print("="*80)

print("""
📦 DELIVERABLES GENERATED:

🏗️  ARCHITECTURE:
   • ml_pipeline_design.md         - Complete end-to-end pipeline design

🔌 CONTRACTS:
   • api_spec.yaml                 - OpenAPI specification of the interface

💻 CODE:
   • pipeline_template.py          - Implemented pipeline template

📚 DOCUMENTATION:
   • pipeline_config.ini           - Pipeline configuration

🎯 ARCHITECTURE DEFINED:

1. 🏗️  END-TO-END PIPELINE (5 steps):
   • Input Validation (100ms)
   • Feature Transformation (50ms)
   • Model Prediction (100ms) - LogisticRegression
   • Post-processing (50ms)
   • Output Formatting (50ms)

2. ⚙️  BASE ALGORITHM:
   • Choice: LogisticRegression (simplicity > complexity)
   • Parameters: class_weight='balanced', C=1.0, penalty='l2'
   • Primary metric: RECALL > 0.75
   • Fallback: Simple rule based on the evening rush hour (16h-20h)

3. 🔌 INTERFACE WITH JAVA:
   • API: FastAPI with 3 endpoints (/predict, /health, /metrics)
   • Timeout: 2s maximum
   • Authentication: API Key header
   • Integration: HTTP/REST with circuit breaker

4. 🧪 QUICK VALIDATION:
   • Split: stratified 70/30
   • Minimum metrics: RECALL > 0.75, PRECISION > 0.60
   • Baseline: stratified dummy model

🚀 48H TIMELINE:
   • Development: 24h (pipeline + API)
   • Integration: 24h (Docker + Java backend)
   • Validation: 5h (split + training + tests)

⚡ PERFORMANCE TARGETS:
   • Total latency: < 500ms (p95)
   • Throughput: > 100 req/sec
   • Error rate: < 1%
   • Availability: > 99.5%

🔧 NEXT STEPS:
   1. Implement pipeline_template.py with real data
   2. Build the FastAPI API from the template
   3. Containerize with Docker
   4. Integrate with the Java backend
   5. Run the final validation

✅ STATUS: READY FOR IMPLEMENTATION IN 48H
""")