# 🚀 Proyecto Integrador 1: Chatbot de Consulta de Datos con RAG

**Objetivo**: construir un chatbot empresarial que permita consultar datos usando lenguaje natural, combinando RAG (documentación) y NL2SQL (queries).

## Alcance del Proyecto

- **Duración**: 4-6 horas
- **Dificultad**: Alta
- **Stack**: OpenAI, ChromaDB, LangChain, SQLite/PostgreSQL, Streamlit

## Funcionalidades

1. ✅ Indexar documentación de esquemas y métricas
2. ✅ Responder preguntas sobre estructura de datos (RAG)
3. ✅ Ejecutar queries en lenguaje natural (NL2SQL)
4. ✅ Visualizar resultados en tablas/gráficos
5. ✅ Historial de conversación
6. ✅ Validación de seguridad

### 🎯 **Arquitectura del Data Chatbot: RAG + NL2SQL Híbrido**

**Componentes del Sistema:**

```
┌─────────────────────────────────────────────────────────────────┐
│                  DATA CHATBOT ARCHITECTURE                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  USER INTERFACE (Streamlit)                                     │
│  ┌──────────────────────────────────────┐                      │
│  │ • Natural language input             │                      │
│  │ • Results display (tables, charts)   │                      │
│  │ • Conversation history               │                      │
│  │ • Export buttons (CSV, Excel)        │                      │
│  └──────────────────────────────────────┘                      │
│                    ↓                                             │
│  INTENT CLASSIFIER (GPT-3.5)                                    │
│  ┌──────────────────────────────────────┐                      │
│  │ Input: "What's our revenue by region?"                      │
│  │ Output: QUERY (needs SQL execution)  │                      │
│  │                                       │                      │
│  │ Input: "What columns does sales have?"                      │
│  │ Output: SCHEMA (use RAG)             │                      │
│  └──────────────────────────────────────┘                      │
│          ↓                      ↓                                │
│  ┌──────────────┐      ┌──────────────┐                        │
│  │  RAG PATH    │      │  SQL PATH    │                        │
│  └──────────────┘      └──────────────┘                        │
│          ↓                      ↓                                │
│  RETRIEVAL (ChromaDB)   NL2SQL (GPT-4)                         │
│  ┌───────────────────┐  ┌───────────────────┐                 │
│  │ • Vector search   │  │ • Schema context  │                 │
│  │ • Top-K docs      │  │ • SQL generation  │                 │
│  │ • Rerank (opt)    │  │ • Safety check    │                 │
│  └───────────────────┘  └───────────────────┘                 │
│          ↓                      ↓                                │
│  GENERATION (GPT-4)     EXECUTION (SQLite/Postgres)            │
│  ┌───────────────────┐  ┌───────────────────┐                 │
│  │ • Context + Q     │  │ • Execute query   │                 │
│  │ • Generate answer │  │ • Return DataFrame│                 │
│  └───────────────────┘  └───────────────────┘                 │
│          ↓                      ↓                                │
│  ┌──────────────────────────────────────────┐                 │
│  │       RESPONSE FORMATTING                 │                 │
│  │ • Text answer (RAG)                       │                 │
│  │ • Table + Chart (SQL)                     │                 │
│  │ • Error handling                          │                 │
│  └──────────────────────────────────────────┘                 │
│                    ↓                                             │
│  OBSERVABILITY & CACHING                                        │
│  ┌──────────────────────────────────────────┐                 │
│  │ • Query logs (audit trail)                │                 │
│  │ • Performance metrics                     │                 │
│  │ • Redis cache (frequent queries)          │                 │
│  │ • User feedback collection                │                 │
│  └──────────────────────────────────────────┘                 │
└─────────────────────────────────────────────────────────────────┘
```

**Design Decisions & Trade-offs:**

| Decision | Rationale | Trade-off |
|----------|-----------|-----------|
| **Intent Classification First** | Ruta óptima: RAG es 10x más rápido que SQL | Clasificación errónea → respuesta subóptima (mitigado con GPT-3.5 accuracy >95%) |
| **ChromaDB (embedded)** | Simple setup, no infra externa | Escala limitada (millones de docs → usar Pinecone/Weaviate) |
| **GPT-4 para NL2SQL** | Mejor accuracy en SQL complejo (JOINs, subqueries) | 10x más caro que GPT-3.5 (mitigado con caching) |
| **SQLite** | Zero-config, perfecto para demo | Producción → Postgres/MySQL con connection pooling |
| **Streamlit** | Prototipo rápido, sin frontend | Limitado concurrency (producción → FastAPI + React) |

**Intent Classification Strategy:**

```python
from typing import Literal
from pydantic import BaseModel

class Intent(BaseModel):
    """Clasificación de intención con confidence"""
    type: Literal['SCHEMA', 'QUERY', 'AGGREGATION', 'COMPARISON', 'TREND', 'AMBIGUOUS']
    confidence: float  # 0-1
    reasoning: str

def classify_intent_advanced(question: str, conversation_history: list = None) -> Intent:
    """
    Clasificador avanzado con contexto conversacional.
    
    Types:
    - SCHEMA: Preguntas sobre estructura ("¿Qué columnas...?", "¿Cómo se define...?")
    - QUERY: Consulta directa ("¿Cuánto...?", "¿Quién es...?")
    - AGGREGATION: Requiere GROUP BY ("Top 5", "Por categoría")
    - COMPARISON: Requiere múltiples queries ("Compara X vs Y")
    - TREND: Serie temporal ("Evolución", "Tendencia")
    - AMBIGUOUS: Necesita aclaración
    """
    
    # Contexto conversacional (últimas 3 interacciones)
    context = ""
    if conversation_history:
        recent = conversation_history[-3:]
        context = "\n".join([f"Q: {h['question']}\nA: {h['answer_type']}" for h in recent])
    
    prompt = f"""
    Clasifica esta pregunta considerando el historial de conversación.
    
    Historial reciente:
    {context or 'Primera pregunta'}
    
    Pregunta actual: "{question}"
    
    Clasifica en:
    - SCHEMA: Preguntas sobre definiciones, estructura de datos
    - QUERY: Consultas directas que requieren SQL
    - AGGREGATION: Análisis agregado (totales, promedios, top N)
    - COMPARISON: Comparaciones entre entidades
    - TREND: Análisis temporal (evolución, tendencias)
    - AMBIGUOUS: Pregunta poco clara, necesita aclaración
    
    Responde en JSON:
    {{
        "type": "...",
        "confidence": 0.95,
        "reasoning": "Breve explicación"
    }}
    """
    
    response = client.chat.completions.create(
        model='gpt-3.5-turbo',
        messages=[{'role': 'user', 'content': prompt}],
        temperature=0.1,
        response_format={"type": "json_object"}
    )
    
    return Intent(**json.loads(response.choices[0].message.content))

# Ejemplo
intent = classify_intent_advanced(
    "¿Cómo han evolucionado las ventas de Laptops en el último trimestre?",
    conversation_history=[
        {'question': '¿Qué productos vendemos?', 'answer_type': 'SCHEMA'}
    ]
)

print(f"Tipo: {intent.type}")
print(f"Confianza: {intent.confidence:.2%}")
print(f"Razonamiento: {intent.reasoning}")

# Output:
# Tipo: TREND
# Confianza: 92%
# Razonamiento: Pregunta sobre evolución temporal requiere serie temporal y filtro por producto
```

**Routing Logic (Production-Ready):**

```python
from dataclasses import dataclass
from typing import Optional, Dict, Any

@dataclass
class QueryPlan:
    """Plan de ejecución para la query"""
    route: Literal['RAG', 'SQL', 'HYBRID', 'CLARIFY']
    steps: list[str]
    estimated_cost_usd: float
    estimated_latency_sec: float

def create_query_plan(intent: Intent, question: str) -> QueryPlan:
    """
    Genera plan de ejecución óptimo basado en intención.
    
    Estrategias:
    - RAG: Rápido (0.5s), barato ($0.001), para preguntas conceptuales
    - SQL: Medio (2s), medio ($0.01), para queries directas
    - HYBRID: Lento (4s), caro ($0.02), para análisis complejos
    - CLARIFY: Pedir aclaración al usuario
    """
    
    if intent.type == 'SCHEMA':
        return QueryPlan(
            route='RAG',
            steps=['Retrieve docs from ChromaDB', 'Generate answer with GPT-4'],
            estimated_cost_usd=0.001,
            estimated_latency_sec=0.5
        )
    
    elif intent.type == 'QUERY' and intent.confidence > 0.85:
        return QueryPlan(
            route='SQL',
            steps=['Generate SQL', 'Validate safety', 'Execute', 'Format results'],
            estimated_cost_usd=0.01,
            estimated_latency_sec=2.0
        )
    
    elif intent.type in ['AGGREGATION', 'COMPARISON', 'TREND']:
        return QueryPlan(
            route='HYBRID',
            steps=[
                'Retrieve schema docs (RAG)',
                'Generate complex SQL with context',
                'Execute and analyze results',
                'Generate insights with LLM'
            ],
            estimated_cost_usd=0.02,
            estimated_latency_sec=4.0
        )
    
    else:  # AMBIGUOUS or low confidence
        return QueryPlan(
            route='CLARIFY',
            steps=['Ask user for clarification'],
            estimated_cost_usd=0.0,
            estimated_latency_sec=0.1
        )

def execute_plan(plan: QueryPlan, question: str, context: Dict[str, Any]) -> Dict:
    """Ejecuta el plan según la ruta"""
    
    if plan.route == 'RAG':
        return execute_rag(question)
    
    elif plan.route == 'SQL':
        return execute_sql(question, context)
    
    elif plan.route == 'HYBRID':
        # Paso 1: RAG para contexto
        schema_context = rag_search(question, top_k=3)
        
        # Paso 2: SQL mejorado con contexto
        sql_result = execute_sql_with_context(question, schema_context)
        
        # Paso 3: LLM analiza resultados
        insights = generate_insights(sql_result, question)
        
        return {
            'type': 'HYBRID',
            'data': sql_result['data'],
            'sql': sql_result['sql'],
            'insights': insights,
            'plan': plan
        }
    
    else:  # CLARIFY
        return {
            'type': 'CLARIFY',
            'message': generate_clarification_question(question, context)
        }

# Ejemplo de flujo completo
question = "Compare sales performance between regions"
intent = classify_intent_advanced(question)
plan = create_query_plan(intent, question)
result = execute_plan(plan, question, context={'user_id': 'analyst_001'})

print(f"Plan: {plan.route}")
print(f"Cost: ${plan.estimated_cost_usd:.4f}")
print(f"Latency: {plan.estimated_latency_sec:.1f}s")
```

**Error Recovery Patterns:**

```python
class ChatbotError(Exception):
    """Base exception para errores del chatbot"""
    pass

class SQLGenerationError(ChatbotError):
    """Error en generación de SQL"""
    pass

class ExecutionError(ChatbotError):
    """Error en ejecución de query"""
    pass

def chatbot_with_recovery(question: str, max_retries: int = 2) -> Dict:
    """
    Chatbot con recuperación automática de errores.
    
    Estrategias:
    1. SQL inválido → Regenerar con error message
    2. Timeout → Sugerir query más simple
    3. No results → Sugerir alternativas
    """
    
    for attempt in range(max_retries + 1):
        try:
            # Intent classification
            intent = classify_intent_advanced(question)
            
            if intent.confidence < 0.7:
                # Low confidence → Ask for clarification
                return {
                    'type': 'CLARIFY',
                    'message': f"No estoy seguro de entender. ¿Quieres decir...?",
                    'suggestions': generate_suggestions(question)
                }
            
            # Create and execute plan
            plan = create_query_plan(intent, question)
            result = execute_plan(plan, question, {})
            
            # Success
            return result
        
        except SQLGenerationError as e:
            if attempt < max_retries:
                # Regenerar con más contexto
                question_improved = f"{question}. Error anterior: {str(e)}. Por favor genera SQL válido."
                continue
            else:
                return {
                    'type': 'ERROR',
                    'message': 'No pude generar una consulta válida. ¿Puedes reformular la pregunta?'
                }
        
        except ExecutionError as e:
            if 'timeout' in str(e).lower():
                return {
                    'type': 'ERROR',
                    'message': 'La consulta tomó demasiado tiempo. Intenta con un rango de fechas más pequeño.',
                    'suggestion': 'Por ejemplo: "Ventas del último mes"'
                }
            elif 'no results' in str(e).lower():
                return {
                    'type': 'INFO',
                    'message': 'No encontré resultados para esta consulta.',
                    'suggestions': [
                        'Intenta expandir el rango de fechas',
                        'Verifica los filtros aplicados',
                        'Revisa la ortografía de nombres'
                    ]
                }
        
        except Exception as e:
            # Unexpected error → Log and fallback to safe response
            logger.error(f"Unexpected error: {str(e)}", exc_info=True)
            return {
                'type': 'ERROR',
                'message': 'Ocurrió un error inesperado. Por favor intenta nuevamente.',
                'error_id': str(uuid.uuid4())  # For support tracking
            }
    
    # Max retries exceeded
    return {
        'type': 'ERROR',
        'message': 'No pude procesar tu pregunta después de varios intentos.'
    }
```

**Conversational Context Management:**

```python
from collections import deque

class ConversationManager:
    """Maneja contexto conversacional para follow-up questions"""
    
    def __init__(self, max_history: int = 10):
        self.history = deque(maxlen=max_history)
        self.last_sql = None
        self.last_dataframe = None
    
    def add_interaction(self, question: str, response: Dict):
        """Añade interacción al historial"""
        self.history.append({
            'timestamp': datetime.now(),
            'question': question,
            'response_type': response['type'],
            'summary': self._summarize_response(response)
        })
        
        # Guardar último SQL/DataFrame para follow-ups
        if response['type'] == 'SQL':
            self.last_sql = response.get('sql')
            self.last_dataframe = response.get('data')
    
    def resolve_references(self, question: str) -> str:
        """
        Resuelve referencias pronominales (it, that, those, etc.)
        
        Ejemplos:
        Q1: "What are total sales by region?"
        Q2: "Show me the top 3" → "Show me the top 3 regions by total sales"
        """
        
        pronouns = ['it', 'that', 'those', 'them', 'this', 'these']
        
        if any(p in question.lower() for p in pronouns) and self.history:
            last_q = self.history[-1]['question']
            
            prompt = f"""
            Resuelve la referencia en esta pregunta usando el contexto.
            
            Pregunta anterior: "{last_q}"
            Pregunta actual: "{question}"
            
            Reescribe la pregunta actual sin pronombres, haciendo explícito a qué se refiere.
            
            Pregunta resuelta:
            """
            
            response = client.chat.completions.create(
                model='gpt-3.5-turbo',
                messages=[{'role': 'user', 'content': prompt}],
                temperature=0
            )
            
            return response.choices[0].message.content.strip()
        
        return question
    
    def suggest_follow_ups(self) -> list[str]:
        """Genera sugerencias de preguntas de seguimiento"""
        
        if not self.last_dataframe or len(self.last_dataframe) == 0:
            return []
        
        suggestions = []
        
        # Análisis del DataFrame
        df = self.last_dataframe
        numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
        
        if numeric_cols:
            suggestions.append(f"Muéstrame un gráfico de {numeric_cols[0]}")
            suggestions.append(f"¿Cuál es el promedio de {numeric_cols[0]}?")
        
        if 'fecha' in df.columns or 'date' in df.columns:
            suggestions.append("¿Cuál es la tendencia en el tiempo?")
        
        if len(df) > 5:
            suggestions.append("Muéstrame solo los top 5")
        
        return suggestions[:3]  # Max 3 sugerencias

# Uso
conv = ConversationManager()

q1 = "What are total sales by category?"
r1 = chatbot_with_recovery(q1)
conv.add_interaction(q1, r1)

q2 = "Show me the top 3"  # Referencia ambigua
q2_resolved = conv.resolve_references(q2)  # → "Show me the top 3 categories by total sales"
r2 = chatbot_with_recovery(q2_resolved)
conv.add_interaction(q2_resolved, r2)

# Sugerencias
suggestions = conv.suggest_follow_ups()
print("Preguntas sugeridas:")
for s in suggestions:
    print(f"  • {s}")
```

---
**Autor:** Luis J. Raigoso V. (LJRV)

## Parte 1: Setup del proyecto

In [None]:
# pip install openai chromadb langchain streamlit pandas plotly sqlalchemy
import os
import sqlite3
import pandas as pd
from openai import OpenAI
import chromadb
from chromadb.config import Settings

# Configuración
os.environ['OPENAI_API_KEY'] = 'tu-api-key'  # Reemplazar
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

print('✅ Setup completo')

## Parte 2: Base de datos de ejemplo

In [None]:
# Crear BD SQLite con datos de ejemplo
conn = sqlite3.connect('empresa.db')
cursor = conn.cursor()

# Tabla ventas
cursor.execute('''
CREATE TABLE IF NOT EXISTS ventas (
    venta_id INTEGER PRIMARY KEY,
    fecha DATE,
    producto TEXT,
    categoria TEXT,
    cantidad INTEGER,
    precio_unitario REAL,
    total REAL,
    region TEXT
)
''')

# Datos de ejemplo
ventas_data = [
    (1, '2024-01-15', 'Laptop Pro', 'Electrónica', 2, 1200, 2400, 'Norte'),
    (2, '2024-01-16', 'Mouse Wireless', 'Accesorios', 10, 25, 250, 'Sur'),
    (3, '2024-01-17', 'Teclado Mecánico', 'Accesorios', 5, 80, 400, 'Norte'),
    (4, '2024-01-18', 'Monitor 27"', 'Electrónica', 3, 350, 1050, 'Este'),
    (5, '2024-01-19', 'Laptop Pro', 'Electrónica', 1, 1200, 1200, 'Oeste'),
    (6, '2024-01-20', 'Webcam HD', 'Accesorios', 8, 60, 480, 'Norte')
]

cursor.executemany('INSERT OR REPLACE INTO ventas VALUES (?,?,?,?,?,?,?,?)', ventas_data)
conn.commit()

# Verificar
df_ventas = pd.read_sql_query('SELECT * FROM ventas LIMIT 3', conn)
print('✅ Base de datos creada\n')
print(df_ventas)

## Parte 3: Indexar documentación (RAG)

In [None]:
# ChromaDB para RAG
chroma_client = chromadb.PersistentClient(path='./chatbot_db')
collection = chroma_client.get_or_create_collection(name='data_docs')

def get_embedding(text: str):
    resp = client.embeddings.create(
        model='text-embedding-ada-002',
        input=text
    )
    return resp.data[0].embedding

# Documentación a indexar
docs = [
    {
        'id': 'tabla_ventas',
        'text': '''
Tabla: ventas
Descripción: Registro de todas las transacciones de venta de productos.
Columnas:
- venta_id: ID único de la venta
- fecha: Fecha de la transacción
- producto: Nombre del producto vendido
- categoria: Categoría (Electrónica, Accesorios)
- cantidad: Unidades vendidas
- precio_unitario: Precio por unidad en USD
- total: Monto total (cantidad * precio_unitario)
- region: Región geográfica (Norte, Sur, Este, Oeste)
        '''
    },
    {
        'id': 'metrica_revenue',
        'text': '''
Métrica: Revenue Total
Definición: Suma del campo 'total' de todas las ventas.
Cálculo: SELECT SUM(total) FROM ventas
Uso: Dashboard ejecutivo, reportes mensuales
        '''
    },
    {
        'id': 'producto_top',
        'text': '''
Análisis: Top Productos
Query: SELECT producto, SUM(cantidad) as unidades, SUM(total) as revenue 
       FROM ventas GROUP BY producto ORDER BY revenue DESC
Insight: Identifica productos más rentables
        '''
    }
]

# Indexar
for doc in docs:
    collection.add(
        ids=[doc['id']],
        documents=[doc['text']],
        embeddings=[get_embedding(doc['text'])]
    )

print(f'✅ {len(docs)} documentos indexados')

## Parte 4: Sistema RAG

### 🔐 **Security & Governance: Acceso Seguro a Datos**

**Security Layers:**

```python
┌──────────────────────────────────────────────────────────────┐
│                  SECURITY ARCHITECTURE                        │
├──────────────────────────────────────────────────────────────┤
│                                                               │
│  1️⃣ AUTHENTICATION (WHO are you?)                            │
│     • JWT tokens                                             │
│     • OAuth2 (SSO)                                           │
│     • API keys (machine-to-machine)                          │
│                                                               │
│  2️⃣ AUTHORIZATION (WHAT can you access?)                     │
│     • Role-Based Access Control (RBAC)                       │
│     • Row-Level Security (RLS)                               │
│     • Column masking (PII)                                   │
│                                                               │
│  3️⃣ SQL INJECTION PREVENTION                                 │
│     • Whitelist validation                                   │
│     • Parameterized queries                                  │
│     • Read-only user                                         │
│                                                               │
│  4️⃣ PROMPT INJECTION PREVENTION                              │
│     • System message isolation                               │
│     • Output validation                                      │
│     • Jailbreak detection                                    │
│                                                               │
│  5️⃣ AUDIT & MONITORING                                       │
│     • Query logging                                          │
│     • Sensitive data access tracking                         │
│     • Anomaly detection                                      │
└──────────────────────────────────────────────────────────────┘
```

**SQL Injection Prevention (Defense in Depth):**

```python
from typing import Set
import sqlparse
from sqlparse.sql import IdentifierList, Identifier
from sqlparse.tokens import Keyword, DML

class SQLValidator:
    """Valida seguridad de SQL generado por LLM"""
    
    # Whitelist de operaciones permitidas
    ALLOWED_KEYWORDS = {
        'SELECT', 'FROM', 'WHERE', 'GROUP BY', 'ORDER BY', 'LIMIT',
        'HAVING', 'AS', 'JOIN', 'INNER JOIN', 'LEFT JOIN', 'ON', 'AND', 'OR'
    }
    
    # Blacklist de operaciones peligrosas
    DANGEROUS_KEYWORDS = {
        'INSERT', 'UPDATE', 'DELETE', 'DROP', 'ALTER', 'CREATE', 'TRUNCATE',
        'EXEC', 'EXECUTE', 'GRANT', 'REVOKE', 'SCRIPT', 'xp_', 'sp_'
    }
    
    def __init__(self, allowed_tables: Set[str], allowed_columns: Dict[str, Set[str]]):
        self.allowed_tables = allowed_tables
        self.allowed_columns = allowed_columns  # {table: {col1, col2, ...}}
    
    def validate(self, sql: str) -> tuple[bool, Optional[str]]:
        """
        Valida SQL en múltiples niveles.
        
        Returns:
            (is_valid, error_message)
        """
        
        # Level 1: Blacklist check (fast)
        sql_upper = sql.upper()
        for keyword in self.DANGEROUS_KEYWORDS:
            if keyword in sql_upper:
                return False, f"Dangerous keyword detected: {keyword}"
        
        # Level 2: Whitelist check
        parsed = sqlparse.parse(sql)[0]
        for token in parsed.tokens:
            if token.ttype is Keyword or token.ttype is DML:
                keyword = token.value.upper()
                if keyword not in self.ALLOWED_KEYWORDS:
                    return False, f"Keyword not allowed: {keyword}"
        
        # Level 3: Table validation
        tables_used = self._extract_tables(parsed)
        for table in tables_used:
            if table not in self.allowed_tables:
                return False, f"Table not allowed: {table}"
        
        # Level 4: Column validation (optional, más estricto)
        # columns_used = self._extract_columns(parsed)
        # for table, columns in columns_used.items():
        #     for col in columns:
        #         if col not in self.allowed_columns.get(table, set()):
        #             return False, f"Column not allowed: {table}.{col}"
        
        # Level 5: Complexity check (prevent expensive queries)
        if sql_upper.count('JOIN') > 3:
            return False, "Too many JOINs (max 3)"
        
        if 'UNION' in sql_upper and sql_upper.count('UNION') > 2:
            return False, "Too many UNIONs (max 2)"
        
        # Level 6: Injection patterns
        injection_patterns = [
            r";\s*DROP",
            r"OR\s+1\s*=\s*1",
            r"--\s*$",
            r"\/\*.*\*\/",
            r"xp_cmdshell",
            r"WAITFOR\s+DELAY"
        ]
        
        import re
        for pattern in injection_patterns:
            if re.search(pattern, sql, re.IGNORECASE):
                return False, f"Injection pattern detected: {pattern}"
        
        return True, None
    
    def _extract_tables(self, parsed) -> Set[str]:
        """Extrae nombres de tablas del SQL"""
        tables = set()
        
        from_seen = False
        for token in parsed.tokens:
            if from_seen:
                if isinstance(token, IdentifierList):
                    for identifier in token.get_identifiers():
                        tables.add(identifier.get_real_name())
                elif isinstance(token, Identifier):
                    tables.add(token.get_real_name())
                from_seen = False
            
            if token.ttype is Keyword and token.value.upper() == 'FROM':
                from_seen = True
        
        return tables

# Uso
validator = SQLValidator(
    allowed_tables={'ventas', 'productos', 'clientes'},
    allowed_columns={
        'ventas': {'venta_id', 'fecha', 'producto', 'total', 'region'},
        'clientes': {'cliente_id', 'nombre', 'email'}  # Note: no SSN, credit_card, etc
    }
)

# Test cases
test_queries = [
    "SELECT * FROM ventas WHERE region='Norte'",  # ✅ Valid
    "SELECT * FROM ventas; DROP TABLE ventas;",   # ❌ SQL injection
    "SELECT * FROM usuarios WHERE 1=1 OR 1=1",     # ❌ Injection pattern
    "SELECT * FROM ventas JOIN productos ON ventas.producto=productos.producto_id",  # ✅ Valid JOIN
]

for sql in test_queries:
    is_valid, error = validator.validate(sql)
    print(f"{'✅' if is_valid else '❌'} {sql[:50]}...")
    if error:
        print(f"   Error: {error}")
```

**Prompt Injection Prevention:**

```python
class PromptInjectionDetector:
    """Detecta intentos de prompt injection"""
    
    # Patrones comunes de jailbreak
    JAILBREAK_PATTERNS = [
        r"ignore (previous|above) instructions",
        r"disregard (previous|above|all)",
        r"forget (what|everything) (you|i) (told|said)",
        r"you are now",
        r"pretend (you|to) (are|be)",
        r"roleplay",
        r"DAN mode",
        r"developer mode",
        r"system: ",
        r"<\|im_start\|>",
        r"PWNED",
    ]
    
    def detect(self, user_input: str) -> tuple[bool, Optional[str]]:
        """
        Detecta prompt injection.
        
        Returns:
            (is_safe, detected_pattern)
        """
        
        input_lower = user_input.lower()
        
        # Check jailbreak patterns
        for pattern in self.JAILBREAK_PATTERNS:
            if re.search(pattern, input_lower):
                return False, pattern
        
        # Check for system role impersonation
        if user_input.strip().startswith(('system:', 'assistant:', 'user:')):
            return False, "Role impersonation"
        
        # Check for excessive prompt tokens (> 1000 words might be malicious)
        if len(user_input.split()) > 1000:
            return False, "Excessive input length"
        
        return True, None
    
    def sanitize(self, user_input: str) -> str:
        """Sanitiza input del usuario"""
        
        # Remove potential role markers
        sanitized = re.sub(r'^(system|assistant|user):\s*', '', user_input, flags=re.IGNORECASE)
        
        # Escape special tokens (if using specific LLM)
        sanitized = sanitized.replace('<|im_start|>', '').replace('<|im_end|>', '')
        
        # Trim excessive whitespace
        sanitized = ' '.join(sanitized.split())
        
        return sanitized

def secure_chatbot(user_input: str, user_context: Dict) -> Dict:
    """Chatbot con validación de seguridad"""
    
    # 1. Prompt injection detection
    detector = PromptInjectionDetector()
    is_safe, pattern = detector.detect(user_input)
    
    if not is_safe:
        logger.warning(f"Prompt injection detected: {pattern} from user {user_context['user_id']}")
        return {
            'type': 'ERROR',
            'message': 'Input validation failed. Please rephrase your question.'
        }
    
    # 2. Sanitize input
    sanitized_input = detector.sanitize(user_input)
    
    # 3. Classify intent
    intent = classify_intent_advanced(sanitized_input)
    
    # 4. If SQL route, validate generated SQL
    if intent.type in ['QUERY', 'AGGREGATION']:
        sql = nl_to_sql(sanitized_input)
        
        validator = SQLValidator(
            allowed_tables=user_context.get('allowed_tables', {'ventas'}),
            allowed_columns=user_context.get('allowed_columns', {})
        )
        
        is_valid, error = validator.validate(sql)
        
        if not is_valid:
            logger.error(f"Invalid SQL generated: {error}")
            return {
                'type': 'ERROR',
                'message': 'Query validation failed. Please try a different question.'
            }
        
        # Execute with read-only user
        result = execute_sql_readonly(sql, user_context)
        
        return result
    
    else:
        # RAG path (also validate output)
        answer = rag_answer(sanitized_input)
        
        # Check if answer leaks sensitive info
        if contains_pii(answer):
            answer = mask_pii(answer)
        
        return {'type': 'RAG', 'answer': answer}
```

**Role-Based Access Control (RBAC):**

```python
from enum import Enum
from functools import wraps

class Role(Enum):
    VIEWER = 'viewer'       # Read-only, limited tables
    ANALYST = 'analyst'     # Read all tables, no PII
    ADMIN = 'admin'         # Full access
    DATA_SCIENTIST = 'data_scientist'  # Read all + export

class Permission(Enum):
    READ_SALES = 'read:sales'
    READ_CUSTOMERS = 'read:customers'
    READ_PII = 'read:pii'
    EXPORT_DATA = 'export:data'
    EXECUTE_JOINS = 'execute:joins'

# Role → Permissions mapping
ROLE_PERMISSIONS = {
    Role.VIEWER: {
        Permission.READ_SALES,
    },
    Role.ANALYST: {
        Permission.READ_SALES,
        Permission.READ_CUSTOMERS,
        Permission.EXECUTE_JOINS,
    },
    Role.DATA_SCIENTIST: {
        Permission.READ_SALES,
        Permission.READ_CUSTOMERS,
        Permission.EXECUTE_JOINS,
        Permission.EXPORT_DATA,
    },
    Role.ADMIN: {
        Permission.READ_SALES,
        Permission.READ_CUSTOMERS,
        Permission.READ_PII,
        Permission.EXECUTE_JOINS,
        Permission.EXPORT_DATA,
    }
}

class AccessControlManager:
    """Gestiona acceso basado en roles"""
    
    def __init__(self, user_id: str, role: Role):
        self.user_id = user_id
        self.role = role
        self.permissions = ROLE_PERMISSIONS[role]
    
    def can(self, permission: Permission) -> bool:
        """Verifica si el usuario tiene el permiso"""
        return permission in self.permissions
    
    def get_allowed_tables(self) -> Set[str]:
        """Retorna tablas permitidas según rol"""
        tables = set()
        
        if Permission.READ_SALES in self.permissions:
            tables.add('ventas')
            tables.add('productos')
        
        if Permission.READ_CUSTOMERS in self.permissions:
            tables.add('clientes')
        
        # PII tables solo para admin
        if Permission.READ_PII in self.permissions:
            tables.add('empleados_salarios')
            tables.add('clientes_detalles')
        
        return tables
    
    def apply_row_level_security(self, sql: str) -> str:
        """
        Aplica Row-Level Security (RLS) al SQL.
        
        Ejemplo: Si user_role=VIEWER y region='Norte',
                 solo puede ver ventas de región Norte.
        """
        
        # Viewer: solo su región
        if self.role == Role.VIEWER:
            user_region = self._get_user_region(self.user_id)
            if 'WHERE' in sql.upper():
                sql = sql.replace('WHERE', f"WHERE region='{user_region}' AND", 1)
            else:
                sql += f" WHERE region='{user_region}'"
        
        return sql
    
    def mask_columns(self, df: pd.DataFrame, table: str) -> pd.DataFrame:
        """Enmascara columnas sensibles según permisos"""
        
        # Si no tiene permiso READ_PII, enmascarar datos sensibles
        if Permission.READ_PII not in self.permissions:
            pii_columns = {
                'clientes': ['email', 'telefono', 'direccion'],
                'empleados': ['ssn', 'salario', 'cuenta_bancaria']
            }
            
            for col in pii_columns.get(table, []):
                if col in df.columns:
                    df[col] = df[col].apply(lambda x: '***MASKED***')
        
        return df
    
    def _get_user_region(self, user_id: str) -> str:
        """Obtiene región del usuario (desde DB o cache)"""
        # Placeholder - en producción consultar DB
        return 'Norte'

def require_permission(permission: Permission):
    """Decorator para proteger endpoints"""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Obtener ACL del contexto (inyectado por FastAPI Depends)
            acl = kwargs.get('acl')
            if not acl or not acl.can(permission):
                raise HTTPException(403, f"Permission denied: {permission.value}")
            return func(*args, **kwargs)
        return wrapper
    return decorator

# Uso en chatbot
def secure_query_execution(question: str, user_id: str, role: Role) -> Dict:
    """Ejecuta query con control de acceso"""
    
    # Setup ACL
    acl = AccessControlManager(user_id, role)
    
    # Generate SQL
    sql = nl_to_sql(question)
    
    # Validate tables
    validator = SQLValidator(
        allowed_tables=acl.get_allowed_tables(),
        allowed_columns={}  # Configure based on role
    )
    
    is_valid, error = validator.validate(sql)
    if not is_valid:
        return {'type': 'ERROR', 'message': f'Access denied: {error}'}
    
    # Apply RLS
    sql_with_rls = acl.apply_row_level_security(sql)
    
    # Execute
    df = pd.read_sql_query(sql_with_rls, conn)
    
    # Mask PII
    df_masked = acl.mask_columns(df, table='clientes')  # Detect table from SQL
    
    # Check export permission
    can_export = acl.can(Permission.EXPORT_DATA)
    
    return {
        'type': 'SQL',
        'sql': sql_with_rls,
        'data': df_masked,
        'can_export': can_export
    }

# Ejemplo
result = secure_query_execution(
    question="Show me all customers",
    user_id='analyst_001',
    role=Role.ANALYST
)

# ANALYST puede ver clientes, pero email/teléfono están enmascarados
print(result['data'])
#    cliente_id        nombre              email        telefono
# 0           1    Empresa ABC  ***MASKED***   ***MASKED***
```

**Audit Logging:**

```python
from datetime import datetime
import json

class AuditLogger:
    """Logs de auditoría para compliance"""
    
    def __init__(self, db_connection):
        self.db = db_connection
        self._create_audit_table()
    
    def _create_audit_table(self):
        """Crea tabla de auditoría"""
        self.db.execute('''
            CREATE TABLE IF NOT EXISTS audit_log (
                log_id INTEGER PRIMARY KEY AUTOINCREMENT,
                timestamp TEXT,
                user_id TEXT,
                user_role TEXT,
                action TEXT,
                question TEXT,
                sql_generated TEXT,
                tables_accessed TEXT,
                rows_returned INTEGER,
                execution_time_ms INTEGER,
                success BOOLEAN,
                error_message TEXT,
                ip_address TEXT,
                session_id TEXT
            )
        ''')
    
    def log_query(
        self,
        user_id: str,
        user_role: str,
        question: str,
        sql: Optional[str],
        tables: Set[str],
        rows_returned: int,
        execution_time_ms: int,
        success: bool,
        error: Optional[str] = None,
        ip_address: Optional[str] = None,
        session_id: Optional[str] = None
    ):
        """Registra ejecución de query"""
        
        self.db.execute('''
            INSERT INTO audit_log VALUES (
                NULL, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?
            )
        ''', (
            datetime.now().isoformat(),
            user_id,
            user_role,
            'QUERY',
            question,
            sql,
            json.dumps(list(tables)),
            rows_returned,
            execution_time_ms,
            success,
            error,
            ip_address,
            session_id
        ))
        
        self.db.commit()
    
    def get_user_activity(self, user_id: str, days: int = 7) -> pd.DataFrame:
        """Obtiene actividad de un usuario"""
        return pd.read_sql_query(f'''
            SELECT 
                timestamp,
                question,
                tables_accessed,
                rows_returned,
                success
            FROM audit_log
            WHERE user_id = ?
              AND timestamp >= datetime('now', '-{days} days')
            ORDER BY timestamp DESC
        ''', self.db, params=(user_id,))
    
    def detect_anomalies(self) -> pd.DataFrame:
        """Detecta actividad sospechosa"""
        return pd.read_sql_query('''
            SELECT 
                user_id,
                COUNT(*) as query_count,
                SUM(rows_returned) as total_rows,
                COUNT(DISTINCT tables_accessed) as unique_tables
            FROM audit_log
            WHERE timestamp >= datetime('now', '-1 hour')
            GROUP BY user_id
            HAVING query_count > 100  -- Anomaly threshold
               OR total_rows > 1000000  -- Data exfiltration?
        ''', self.db)

# Uso integrado
audit = AuditLogger(conn)

def chatbot_with_audit(question: str, user_context: Dict) -> Dict:
    """Chatbot con auditoría completa"""
    
    import time
    start = time.time()
    
    try:
        result = secure_query_execution(
            question=question,
            user_id=user_context['user_id'],
            role=user_context['role']
        )
        
        execution_time = int((time.time() - start) * 1000)
        
        # Log successful execution
        audit.log_query(
            user_id=user_context['user_id'],
            user_role=user_context['role'].value,
            question=question,
            sql=result.get('sql'),
            tables={'ventas'},  # Extract from SQL in production
            rows_returned=len(result.get('data', [])),
            execution_time_ms=execution_time,
            success=True,
            ip_address=user_context.get('ip'),
            session_id=user_context.get('session_id')
        )
        
        return result
    
    except Exception as e:
        execution_time = int((time.time() - start) * 1000)
        
        # Log error
        audit.log_query(
            user_id=user_context['user_id'],
            user_role=user_context['role'].value,
            question=question,
            sql=None,
            tables=set(),
            rows_returned=0,
            execution_time_ms=execution_time,
            success=False,
            error=str(e)
        )
        
        raise

# Monitoring dashboard
anomalies = audit.detect_anomalies()
if not anomalies.empty:
    print("⚠️  Suspicious activity detected:")
    print(anomalies)
```

---
**Autor:** Luis J. Raigoso V. (LJRV)

In [None]:
def rag_search(question: str, top_k: int = 2):
    """Busca contexto relevante."""
    query_emb = get_embedding(question)
    results = collection.query(
        query_embeddings=[query_emb],
        n_results=top_k
    )
    return '\n\n'.join(results['documents'][0])

def rag_answer(question: str):
    """Responde usando RAG."""
    context = rag_search(question)
    
    prompt = f'''
Eres un asistente de datos experto. Responde basándote SOLO en el contexto.

Contexto:
{context}

Pregunta: {question}

Respuesta:
'''
    
    resp = client.chat.completions.create(
        model='gpt-4',
        messages=[{'role': 'user', 'content': prompt}],
        temperature=0.1
    )
    
    return resp.choices[0].message.content.strip()

# Test
q1 = '¿Qué columnas tiene la tabla ventas?'
print(f'❓ {q1}')
print(f'✅ {rag_answer(q1)}\n')

q2 = '¿Cómo se calcula el revenue total?'
print(f'❓ {q2}')
print(f'✅ {rag_answer(q2)}')

## Parte 5: Sistema NL2SQL

In [None]:
def get_schema():
    """Obtiene esquema de la BD."""
    return '''
Tabla: ventas
Columnas:
- venta_id INTEGER
- fecha DATE
- producto TEXT
- categoria TEXT
- cantidad INTEGER
- precio_unitario REAL
- total REAL
- region TEXT
'''

def is_safe_query(query: str):
    """Valida seguridad."""
    dangerous = ['INSERT', 'UPDATE', 'DELETE', 'DROP', 'ALTER', 'CREATE']
    return not any(kw in query.upper() for kw in dangerous)

def nl_to_sql(question: str):
    """Convierte pregunta a SQL."""
    schema = get_schema()
    
    prompt = f'''
Esquema de base de datos:
{schema}

Convierte esta pregunta a SQL (SQLite):
{question}

Reglas:
- Solo SELECT
- Devuelve SOLO el SQL, sin explicaciones

SQL:
'''
    
    resp = client.chat.completions.create(
        model='gpt-4',
        messages=[{'role': 'user', 'content': prompt}],
        temperature=0
    )
    
    return resp.choices[0].message.content.strip().replace('```sql','').replace('```','').strip()

def execute_nl_query(question: str):
    """Ejecuta query desde lenguaje natural."""
    try:
        sql = nl_to_sql(question)
        
        if not is_safe_query(sql):
            return {'error': 'Query no segura detectada'}
        
        df = pd.read_sql_query(sql, conn)
        
        return {
            'sql': sql,
            'data': df,
            'rows': len(df)
        }
    except Exception as e:
        return {'error': str(e)}

# Test
q3 = '¿Cuántas ventas hubo en total?'
result = execute_nl_query(q3)
print(f'❓ {q3}')
print(f'SQL generado: {result["sql"]}')
print(f'Resultado:\n{result["data"]}')

## Parte 6: Chatbot unificado

In [None]:
def classify_intent(question: str):
    """Clasifica la intención de la pregunta."""
    prompt = f'''
Clasifica esta pregunta en UNA categoría:
- SCHEMA: pregunta sobre estructura de datos, definiciones
- QUERY: requiere ejecutar una consulta SQL

Pregunta: {question}

Responde solo con: SCHEMA o QUERY
'''
    
    resp = client.chat.completions.create(
        model='gpt-3.5-turbo',
        messages=[{'role': 'user', 'content': prompt}],
        temperature=0
    )
    
    return resp.choices[0].message.content.strip()

def chatbot(question: str):
    """Chatbot inteligente que elige RAG o NL2SQL."""
    intent = classify_intent(question)
    
    if intent == 'SCHEMA':
        return {
            'type': 'RAG',
            'answer': rag_answer(question)
        }
    else:  # QUERY
        result = execute_nl_query(question)
        if 'error' in result:
            return {'type': 'ERROR', 'answer': result['error']}
        return {
            'type': 'SQL',
            'sql': result['sql'],
            'data': result['data'],
            'rows': result['rows']
        }

# Tests
preguntas = [
    '¿Qué información tiene la tabla ventas?',
    '¿Cuál fue el total de ventas por región?',
    '¿Cómo se define el revenue?',
    'Top 3 productos por ingresos'
]

for q in preguntas:
    print(f'\n❓ {q}')
    response = chatbot(q)
    print(f'Tipo: {response["type"]}')
    if response['type'] == 'SQL':
        print(f'SQL: {response["sql"]}')
        print(response['data'])
    else:
        print(response['answer'])

### ⚡ **Performance Optimization: Caching & Query Optimization**

**Multi-Layer Caching Strategy:**

```python
┌────────────────────────────────────────────────────────────────┐
│                    CACHING ARCHITECTURE                         │
├────────────────────────────────────────────────────────────────┤
│                                                                 │
│  L1: EMBEDDINGS CACHE (Persistent, Disk)                       │
│      • ChromaDB stores embeddings                              │
│      • Avoids re-embedding same questions                      │
│      • ~$0.0001 saved per cached query                         │
│                                                                 │
│  L2: INTENT CLASSIFICATION CACHE (Redis, 1 hour TTL)           │
│      • Cache question → intent mapping                         │
│      • Key: hash(question) → Intent object                     │
│      • Hit rate: ~40% for common questions                     │
│      • ~0.5s saved per hit                                     │
│                                                                 │
│  L3: SQL GENERATION CACHE (Redis, 24 hour TTL)                 │
│      • Cache question → SQL mapping                            │
│      • Invalidate on schema changes                            │
│      • Hit rate: ~60% for repetitive queries                   │
│      • ~2s + $0.01 saved per hit                               │
│                                                                 │
│  L4: QUERY RESULTS CACHE (Redis, 5 min TTL)                    │
│      • Cache SQL → DataFrame                                   │
│      • Short TTL (data freshness)                              │
│      • Hit rate: ~30% for dashboards                           │
│      • ~5s saved per hit (DB query time)                       │
│                                                                 │
│  L5: RAG ANSWER CACHE (Redis, 1 week TTL)                      │
│      • Cache schema questions → answers                        │
│      • Long TTL (schema rarely changes)                        │
│      • Hit rate: ~70% for documentation queries                │
│      • ~1s + $0.005 saved per hit                              │
└────────────────────────────────────────────────────────────────┘
```

**Redis-Based Caching Implementation:**

```python
import redis
import hashlib
import pickle
from typing import Optional, Any
from functools import wraps

class CacheManager:
    """Multi-layer cache manager con Redis"""
    
    def __init__(self, redis_url: str = 'redis://localhost:6379/0'):
        self.redis = redis.from_url(redis_url, decode_responses=False)
        
        # TTL por tipo de cache (segundos)
        self.ttl_config = {
            'intent': 3600,        # 1 hour
            'sql': 86400,          # 24 hours
            'results': 300,        # 5 minutes
            'rag': 604800,         # 1 week
        }
    
    def _make_key(self, cache_type: str, query: str, context: Dict = None) -> str:
        """Genera cache key determinístico"""
        # Include context in key for user-specific caching
        key_parts = [cache_type, query]
        if context:
            key_parts.append(json.dumps(context, sort_keys=True))
        
        key_string = '|'.join(key_parts)
        key_hash = hashlib.sha256(key_string.encode()).hexdigest()[:16]
        
        return f"chatbot:{cache_type}:{key_hash}"
    
    def get(self, cache_type: str, query: str, context: Dict = None) -> Optional[Any]:
        """Obtiene valor del cache"""
        key = self._make_key(cache_type, query, context)
        
        cached = self.redis.get(key)
        if cached:
            # Deserialize
            return pickle.loads(cached)
        
        return None
    
    def set(self, cache_type: str, query: str, value: Any, context: Dict = None):
        """Guarda valor en cache con TTL"""
        key = self._make_key(cache_type, query, context)
        ttl = self.ttl_config.get(cache_type, 3600)
        
        # Serialize
        serialized = pickle.dumps(value)
        
        self.redis.setex(key, ttl, serialized)
    
    def invalidate(self, cache_type: str, pattern: str = '*'):
        """Invalida cache por tipo"""
        keys_pattern = f"chatbot:{cache_type}:{pattern}"
        
        for key in self.redis.scan_iter(match=keys_pattern):
            self.redis.delete(key)
    
    def get_stats(self) -> Dict:
        """Estadísticas de cache"""
        stats = {}
        
        for cache_type in self.ttl_config.keys():
            pattern = f"chatbot:{cache_type}:*"
            keys = list(self.redis.scan_iter(match=pattern))
            stats[cache_type] = {
                'keys': len(keys),
                'ttl': self.ttl_config[cache_type]
            }
        
        return stats

# Decorator para cachear funciones
def cached(cache_type: str):
    """Decorator para cachear automáticamente"""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Extract query from args
            query = args[0] if args else kwargs.get('question', kwargs.get('query'))
            
            # Check cache
            cache_mgr = kwargs.get('cache_manager') or CacheManager()
            cached_result = cache_mgr.get(cache_type, query)
            
            if cached_result is not None:
                print(f"✅ Cache HIT [{cache_type}]: {query[:50]}...")
                return cached_result
            
            print(f"❌ Cache MISS [{cache_type}]: {query[:50]}...")
            
            # Execute function
            result = func(*args, **kwargs)
            
            # Store in cache
            cache_mgr.set(cache_type, query, result)
            
            return result
        return wrapper
    return decorator

# Aplicar caching a funciones existentes
cache = CacheManager()

@cached('intent')
def classify_intent_cached(question: str, cache_manager=None):
    """Intent classification con cache"""
    return classify_intent_advanced(question)

@cached('sql')
def nl_to_sql_cached(question: str, cache_manager=None):
    """NL2SQL con cache"""
    return nl_to_sql(question)

@cached('rag')
def rag_answer_cached(question: str, cache_manager=None):
    """RAG con cache"""
    return rag_answer(question)

# Chatbot con caching completo
def chatbot_with_caching(question: str) -> Dict:
    """Chatbot optimizado con multi-layer caching"""
    
    # Layer 2: Intent cache
    intent = classify_intent_cached(question, cache_manager=cache)
    
    if intent.type == 'SCHEMA':
        # Layer 5: RAG answer cache
        answer = rag_answer_cached(question, cache_manager=cache)
        return {'type': 'RAG', 'answer': answer}
    
    else:
        # Layer 3: SQL generation cache
        sql = nl_to_sql_cached(question, cache_manager=cache)
        
        # Layer 4: Query results cache
        cache_key = f"results:{sql}"
        cached_df = cache.get('results', cache_key)
        
        if cached_df is not None:
            print(f"✅ Cache HIT [results]")
            return {
                'type': 'SQL',
                'sql': sql,
                'data': cached_df,
                'from_cache': True
            }
        
        # Execute query
        df = pd.read_sql_query(sql, conn)
        
        # Cache results
        cache.set('results', cache_key, df)
        
        return {
            'type': 'SQL',
            'sql': sql,
            'data': df,
            'from_cache': False
        }

# Performance comparison
import time

def benchmark_caching():
    """Compara performance con y sin cache"""
    
    questions = [
        "What are total sales by region?",
        "What columns does sales table have?",
        "Show me top 5 products by revenue"
    ] * 3  # Repetir para demostrar cache hits
    
    # Sin cache
    start = time.time()
    for q in questions:
        _ = chatbot(q)  # Función original
    time_no_cache = time.time() - start
    
    # Con cache
    start = time.time()
    for q in questions:
        _ = chatbot_with_caching(q)
    time_with_cache = time.time() - start
    
    print(f"\n📊 CACHING PERFORMANCE:")
    print(f"Without cache: {time_no_cache:.2f}s")
    print(f"With cache: {time_with_cache:.2f}s")
    print(f"Speedup: {time_no_cache/time_with_cache:.1f}x")
    
    # Cost savings
    cost_per_query = 0.01  # Average
    queries_cached = len(questions) * 0.6  # Assuming 60% hit rate
    cost_saved = queries_cached * cost_per_query
    
    print(f"Cost saved: ${cost_saved:.2f}")

# Cache stats dashboard
stats = cache.get_stats()
print("\n📈 CACHE STATISTICS:")
for cache_type, data in stats.items():
    print(f"{cache_type}: {data['keys']} keys, TTL={data['ttl']}s")
```

**Query Optimization Strategies:**

```python
class QueryOptimizer:
    """Optimiza SQL generado por LLM"""
    
    def __init__(self, db_connection):
        self.db = db_connection
    
    def optimize(self, sql: str) -> tuple[str, Dict]:
        """
        Optimiza SQL y retorna versión mejorada + explicación.
        
        Optimizations:
        1. Add LIMIT if missing (prevent full table scans)
        2. Suggest indexes for WHERE/JOIN columns
        3. Replace SELECT * with explicit columns
        4. Rewrite subqueries as JOINs when possible
        """
        
        optimizations = []
        optimized_sql = sql
        
        # 1. Add LIMIT if missing (safety)
        if 'LIMIT' not in sql.upper():
            optimized_sql += ' LIMIT 10000'
            optimizations.append({
                'type': 'LIMIT_ADDED',
                'reason': 'Prevent full table scan (max 10K rows)',
                'impact': 'High'
            })
        
        # 2. Check for SELECT *
        if 'SELECT *' in sql.upper():
            # Suggest explicit columns (would need schema context)
            optimizations.append({
                'type': 'SELECT_STAR',
                'reason': 'SELECT * retrieves unnecessary columns',
                'suggestion': 'Specify only needed columns',
                'impact': 'Medium'
            })
        
        # 3. Analyze EXPLAIN QUERY PLAN
        explain_result = self.db.execute(f'EXPLAIN QUERY PLAN {optimized_sql}').fetchall()
        
        for row in explain_result:
            plan_detail = row[3]  # SQLite EXPLAIN format
            
            # Detect full table scan
            if 'SCAN TABLE' in plan_detail and 'USING INDEX' not in plan_detail:
                table_name = plan_detail.split('SCAN TABLE')[1].split()[0]
                optimizations.append({
                    'type': 'FULL_TABLE_SCAN',
                    'reason': f'Full table scan on {table_name}',
                    'suggestion': f'Consider adding index on WHERE clause columns',
                    'impact': 'High'
                })
        
        # 4. Execution time estimate
        start = time.time()
        self.db.execute(optimized_sql)
        execution_time = time.time() - start
        
        if execution_time > 5.0:
            optimizations.append({
                'type': 'SLOW_QUERY',
                'reason': f'Query took {execution_time:.2f}s (threshold: 5s)',
                'suggestion': 'Consider adding indexes or simplifying query',
                'impact': 'Critical'
            })
        
        return optimized_sql, {
            'optimizations': optimizations,
            'execution_time_sec': execution_time,
            'explain_plan': explain_result
        }
    
    def suggest_indexes(self, sql: str) -> list[str]:
        """Sugiere índices basados en el SQL"""
        
        # Parse WHERE clause
        import sqlparse
        parsed = sqlparse.parse(sql)[0]
        
        where_columns = self._extract_where_columns(parsed)
        
        suggestions = []
        for table, columns in where_columns.items():
            for col in columns:
                suggestions.append(f"CREATE INDEX idx_{table}_{col} ON {table}({col});")
        
        return suggestions
    
    def _extract_where_columns(self, parsed) -> Dict[str, Set[str]]:
        """Extrae columnas usadas en WHERE clause"""
        # Simplified - production would use proper SQL parsing
        where_columns = {}
        
        # Placeholder logic
        # In production: parse AST to extract WHERE conditions
        
        return where_columns

# Uso en chatbot
def chatbot_with_optimization(question: str) -> Dict:
    """Chatbot con optimización de queries"""
    
    sql = nl_to_sql_cached(question, cache_manager=cache)
    
    # Optimize SQL
    optimizer = QueryOptimizer(conn)
    optimized_sql, opt_info = optimizer.optimize(sql)
    
    # Execute optimized SQL
    df = pd.read_sql_query(optimized_sql, conn)
    
    # Suggest indexes if slow
    suggestions = []
    if opt_info['execution_time_sec'] > 2.0:
        suggestions = optimizer.suggest_indexes(optimized_sql)
    
    return {
        'type': 'SQL',
        'sql': optimized_sql,
        'data': df,
        'optimization_info': opt_info,
        'index_suggestions': suggestions
    }

# Ejemplo
result = chatbot_with_optimization("Show me all sales from last year")

print(f"SQL: {result['sql']}")
print(f"Execution time: {result['optimization_info']['execution_time_sec']:.2f}s")
print(f"Optimizations applied: {len(result['optimization_info']['optimizations'])}")

if result['index_suggestions']:
    print("\n💡 Index suggestions:")
    for suggestion in result['index_suggestions']:
        print(f"   {suggestion}")
```

**Concurrent Request Handling:**

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import List

class AsyncChatbot:
    """Chatbot con procesamiento paralelo"""
    
    def __init__(self, max_workers: int = 10):
        self.executor = ThreadPoolExecutor(max_workers=max_workers)
        self.cache = CacheManager()
    
    async def process_question(self, question: str, user_context: Dict) -> Dict:
        """Procesa una pregunta de forma asíncrona"""
        
        loop = asyncio.get_event_loop()
        
        # Run blocking chatbot function in thread pool
        result = await loop.run_in_executor(
            self.executor,
            chatbot_with_caching,
            question
        )
        
        return result
    
    async def process_batch(self, questions: List[str], user_context: Dict) -> List[Dict]:
        """Procesa múltiples preguntas en paralelo"""
        
        tasks = [
            self.process_question(q, user_context)
            for q in questions
        ]
        
        results = await asyncio.gather(*tasks)
        
        return results

# Uso con FastAPI
from fastapi import FastAPI, BackgroundTasks
from pydantic import BaseModel

app = FastAPI()
async_chatbot = AsyncChatbot(max_workers=10)

class QuestionRequest(BaseModel):
    question: str
    user_id: str

class BatchRequest(BaseModel):
    questions: List[str]
    user_id: str

@app.post('/v1/chat')
async def chat_endpoint(request: QuestionRequest):
    """Endpoint asíncrono para preguntas individuales"""
    
    result = await async_chatbot.process_question(
        question=request.question,
        user_context={'user_id': request.user_id}
    )
    
    return result

@app.post('/v1/chat/batch')
async def batch_chat_endpoint(request: BatchRequest):
    """Endpoint para procesar múltiples preguntas en paralelo"""
    
    results = await async_chatbot.process_batch(
        questions=request.questions,
        user_context={'user_id': request.user_id}
    )
    
    return {
        'count': len(results),
        'results': results
    }

# Benchmark parallelism
async def benchmark_parallel():
    questions = [
        "What are total sales?",
        "Top 5 products by revenue?",
        "Sales by region?",
        "Average order value?",
        "Customer count by segment?"
    ]
    
    # Sequential
    start = time.time()
    for q in questions:
        _ = chatbot_with_caching(q)
    seq_time = time.time() - start
    
    # Parallel
    start = time.time()
    results = await async_chatbot.process_batch(questions, {})
    par_time = time.time() - start
    
    print(f"Sequential: {seq_time:.2f}s")
    print(f"Parallel: {par_time:.2f}s")
    print(f"Speedup: {seq_time/par_time:.1f}x")

# asyncio.run(benchmark_parallel())
```

**Performance Monitoring Dashboard:**

```python
from dataclasses import dataclass
from collections import defaultdict
import plotly.graph_objects as go

@dataclass
class PerformanceMetrics:
    """Métricas de performance del chatbot"""
    total_requests: int
    cache_hits: int
    cache_misses: int
    avg_latency_ms: float
    p95_latency_ms: float
    p99_latency_ms: float
    error_rate: float
    cost_total_usd: float
    cost_per_request_usd: float

class PerformanceMonitor:
    """Monitorea performance del chatbot"""
    
    def __init__(self):
        self.metrics = defaultdict(list)
    
    def record(self, request_type: str, latency_ms: float, from_cache: bool, cost_usd: float, success: bool):
        """Registra métrica de una request"""
        self.metrics['latency'].append(latency_ms)
        self.metrics['cache_hit'].append(1 if from_cache else 0)
        self.metrics['cost'].append(cost_usd)
        self.metrics['success'].append(1 if success else 0)
        self.metrics['type'].append(request_type)
    
    def get_metrics(self) -> PerformanceMetrics:
        """Calcula métricas agregadas"""
        
        total = len(self.metrics['latency'])
        cache_hits = sum(self.metrics['cache_hit'])
        
        latencies = sorted(self.metrics['latency'])
        p95_idx = int(len(latencies) * 0.95)
        p99_idx = int(len(latencies) * 0.99)
        
        return PerformanceMetrics(
            total_requests=total,
            cache_hits=cache_hits,
            cache_misses=total - cache_hits,
            avg_latency_ms=np.mean(self.metrics['latency']),
            p95_latency_ms=latencies[p95_idx] if latencies else 0,
            p99_latency_ms=latencies[p99_idx] if latencies else 0,
            error_rate=(total - sum(self.metrics['success'])) / total if total > 0 else 0,
            cost_total_usd=sum(self.metrics['cost']),
            cost_per_request_usd=np.mean(self.metrics['cost'])
        )
    
    def create_dashboard(self) -> go.Figure:
        """Genera dashboard de performance"""
        
        from plotly.subplots import make_subplots
        
        fig = make_subplots(
            rows=2, cols=2,
            subplot_titles=['Latency Distribution', 'Cache Hit Rate', 'Cost Over Time', 'Request Types']
        )
        
        # Latency histogram
        fig.add_trace(
            go.Histogram(x=self.metrics['latency'], name='Latency (ms)'),
            row=1, col=1
        )
        
        # Cache hit rate
        cache_hit_rate = sum(self.metrics['cache_hit']) / len(self.metrics['cache_hit']) * 100
        fig.add_trace(
            go.Indicator(
                mode='gauge+number',
                value=cache_hit_rate,
                title={'text': 'Cache Hit Rate (%)'},
                gauge={'axis': {'range': [0, 100]}}
            ),
            row=1, col=2
        )
        
        # Cost over time
        fig.add_trace(
            go.Scatter(y=np.cumsum(self.metrics['cost']), mode='lines', name='Cumulative Cost'),
            row=2, col=1
        )
        
        # Request types pie
        type_counts = pd.Series(self.metrics['type']).value_counts()
        fig.add_trace(
            go.Pie(labels=type_counts.index, values=type_counts.values),
            row=2, col=2
        )
        
        fig.update_layout(height=800, showlegend=False, title_text="Chatbot Performance Dashboard")
        
        return fig

# Integración con chatbot
monitor = PerformanceMonitor()

def chatbot_monitored(question: str) -> Dict:
    """Chatbot con monitoreo de performance"""
    
    start = time.time()
    
    try:
        result = chatbot_with_caching(question)
        
        latency_ms = (time.time() - start) * 1000
        from_cache = result.get('from_cache', False)
        cost = 0.0 if from_cache else 0.01  # Estimate
        
        monitor.record(
            request_type=result['type'],
            latency_ms=latency_ms,
            from_cache=from_cache,
            cost_usd=cost,
            success=True
        )
        
        return result
    
    except Exception as e:
        latency_ms = (time.time() - start) * 1000
        
        monitor.record(
            request_type='ERROR',
            latency_ms=latency_ms,
            from_cache=False,
            cost_usd=0.0,
            success=False
        )
        
        raise

# Generar reporte
metrics = monitor.get_metrics()
print(f"""
📊 PERFORMANCE REPORT
━━━━━━━━━━━━━━━━━━━━━
Total Requests:     {metrics.total_requests}
Cache Hit Rate:     {metrics.cache_hits / metrics.total_requests * 100:.1f}%
Avg Latency:        {metrics.avg_latency_ms:.0f}ms
P95 Latency:        {metrics.p95_latency_ms:.0f}ms
P99 Latency:        {metrics.p99_latency_ms:.0f}ms
Error Rate:         {metrics.error_rate * 100:.2f}%
Total Cost:         ${metrics.cost_total_usd:.2f}
Cost per Request:   ${metrics.cost_per_request_usd:.4f}
""")

# dashboard = monitor.create_dashboard()
# dashboard.show()
```

---
**Autor:** Luis J. Raigoso V. (LJRV)

## Parte 7: Interfaz Streamlit

In [None]:
# Guardar como app.py y ejecutar: streamlit run app.py

streamlit_code = '''
import streamlit as st
import pandas as pd
import plotly.express as px
from chatbot import chatbot  # Importar función

st.set_page_config(page_title='Data Chatbot', page_icon='🤖', layout='wide')

st.title('🤖 Chatbot de Consulta de Datos')
st.markdown('Pregunta sobre tus datos en lenguaje natural')

# Historial en session_state
if 'history' not in st.session_state:
    st.session_state.history = []

# Input
question = st.text_input('Tu pregunta:', placeholder='Ej: ¿Cuáles son las ventas totales por categoría?')

if st.button('Consultar') and question:
    with st.spinner('Procesando...'):
        response = chatbot(question)
        st.session_state.history.append({'q': question, 'r': response})
        
        if response['type'] == 'SQL':
            st.success(f"SQL ejecutado: `{response['sql']}`")
            st.dataframe(response['data'])
            
            # Visualización automática si es numérico
            if len(response['data'].columns) == 2:
                fig = px.bar(response['data'], x=response['data'].columns[0], y=response['data'].columns[1])
                st.plotly_chart(fig)
        else:
            st.info(response['answer'])

# Historial
if st.session_state.history:
    st.markdown('---')
    st.subheader('📜 Historial')
    for item in reversed(st.session_state.history[-5:]):
        st.markdown(f"**Q:** {item['q']}")
        if item['r']['type'] == 'SQL':
            st.code(item['r']['sql'])
        st.markdown('---')
'''

with open('app.py', 'w') as f:
    f.write(streamlit_code)

print('✅ App Streamlit guardada en app.py')
print('Ejecuta: streamlit run app.py')

## Parte 8: Mejoras y extensiones

### 🚀 **Production Deployment & Scaling Strategy**

**Deployment Architecture (Cloud-Native):**

```
┌──────────────────────────────────────────────────────────────────┐
│              PRODUCTION DEPLOYMENT ARCHITECTURE                   │
├──────────────────────────────────────────────────────────────────┤
│                                                                   │
│  EDGE LAYER (CDN + Load Balancer)                               │
│  ┌────────────────────────────────────┐                         │
│  │ CloudFlare / CloudFront            │                         │
│  │  • Static assets caching           │                         │
│  │  • DDoS protection                 │                         │
│  │  • SSL termination                 │                         │
│  └────────────────────────────────────┘                         │
│                    ↓                                              │
│  APPLICATION LAYER (Kubernetes)                                  │
│  ┌────────────────────────────────────┐                         │
│  │ FastAPI Pods (autoscale 3-20)      │                         │
│  │  • Uvicorn workers: 4 per pod      │                         │
│  │  • Resources: 1 CPU, 2GB RAM       │                         │
│  │  • Health checks: /health           │                         │
│  │  • Horizontal Pod Autoscaler        │                         │
│  │    (target: 70% CPU)                │                         │
│  └────────────────────────────────────┘                         │
│                    ↓                                              │
│  CACHING LAYER (Redis Cluster)                                   │
│  ┌────────────────────────────────────┐                         │
│  │ Redis (HA, 3 nodes)                │                         │
│  │  • Intent cache (1h TTL)            │                         │
│  │  • SQL cache (24h TTL)              │                         │
│  │  • Results cache (5min TTL)         │                         │
│  │  • Persistence: AOF + RDB           │                         │
│  └────────────────────────────────────┘                         │
│                    ↓                                              │
│  VECTOR STORE (Pinecone / Weaviate)                             │
│  ┌────────────────────────────────────┐                         │
│  │ Production vector DB                │                         │
│  │  • 10M+ vectors capacity            │                         │
│  │  • Automatic backups                │                         │
│  │  • Replication: 2x                  │                         │
│  └────────────────────────────────────┘                         │
│                    ↓                                              │
│  DATA LAYER (PostgreSQL RDS)                                     │
│  ┌────────────────────────────────────┐                         │
│  │ AWS RDS PostgreSQL (read replicas)  │                         │
│  │  • Primary: writes + reads          │                         │
│  │  • Replica 1: chatbot reads         │                         │
│  │  • Replica 2: analytics reads       │                         │
│  │  • Connection pooling (PgBouncer)   │                         │
│  └────────────────────────────────────┘                         │
│                                                                   │
│  OBSERVABILITY (Prometheus + Grafana)                            │
│  ┌────────────────────────────────────┐                         │
│  │  • Metrics: latency, errors, cost   │                         │
│  │  • Logs: ELK Stack (ES, Logstash)   │                         │
│  │  • Tracing: Jaeger (distributed)    │                         │
│  │  • Alerts: PagerDuty integration    │                         │
│  └────────────────────────────────────┘                         │
└──────────────────────────────────────────────────────────────────┘
```

**Kubernetes Deployment Manifest:**

```yaml
# chatbot-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatbot-api
  namespace: data-platform
spec:
  replicas: 3  # Minimum replicas
  selector:
    matchLabels:
      app: chatbot-api
  template:
    metadata:
      labels:
        app: chatbot-api
        version: v1.2.0
    spec:
      containers:
      - name: fastapi
        image: myregistry.io/chatbot-api:v1.2.0
        ports:
        - containerPort: 8000
          name: http
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: openai-secrets
              key: api-key
        - name: REDIS_URL
          value: "redis://redis-cluster:6379/0"
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-secrets
              key: connection-string
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
        
---
apiVersion: v1
kind: Service
metadata:
  name: chatbot-api
  namespace: data-platform
spec:
  selector:
    app: chatbot-api
  ports:
  - port: 80
    targetPort: 8000
  type: LoadBalancer

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: chatbot-hpa
  namespace: data-platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: chatbot-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
```

**Health Checks & Circuit Breaker:**

```python
from fastapi import FastAPI, Response, status
from datetime import datetime
import psutil

app = FastAPI()

class HealthChecker:
    """Verifica salud del servicio"""
    
    def __init__(self):
        self.start_time = datetime.now()
        self.request_count = 0
        self.error_count = 0
    
    def check_health(self) -> tuple[bool, Dict]:
        """Comprehensive health check"""
        
        checks = {}
        is_healthy = True
        
        # 1. Redis connectivity
        try:
            cache.redis.ping()
            checks['redis'] = {'status': 'UP', 'latency_ms': 1}
        except Exception as e:
            checks['redis'] = {'status': 'DOWN', 'error': str(e)}
            is_healthy = False
        
        # 2. Database connectivity
        try:
            conn.execute('SELECT 1')
            checks['database'] = {'status': 'UP'}
        except Exception as e:
            checks['database'] = {'status': 'DOWN', 'error': str(e)}
            is_healthy = False
        
        # 3. OpenAI API
        try:
            # Quick test with minimal cost
            client.models.list()
            checks['openai'] = {'status': 'UP'}
        except Exception as e:
            checks['openai'] = {'status': 'DOWN', 'error': str(e)}
            is_healthy = False
        
        # 4. System resources
        cpu_percent = psutil.cpu_percent()
        memory_percent = psutil.virtual_memory().percent
        
        checks['system'] = {
            'cpu_percent': cpu_percent,
            'memory_percent': memory_percent,
            'status': 'UP' if cpu_percent < 90 and memory_percent < 90 else 'DEGRADED'
        }
        
        # 5. Error rate
        error_rate = self.error_count / max(self.request_count, 1)
        checks['error_rate'] = {
            'value': error_rate,
            'threshold': 0.05,
            'status': 'UP' if error_rate < 0.05 else 'DEGRADED'
        }
        
        return is_healthy, checks

health_checker = HealthChecker()

@app.get('/health')
def health_check():
    """Health check endpoint (for liveness probe)"""
    is_healthy, checks = health_checker.check_health()
    
    status_code = 200 if is_healthy else 503
    
    return Response(
        content=json.dumps({
            'status': 'UP' if is_healthy else 'DOWN',
            'uptime_seconds': (datetime.now() - health_checker.start_time).total_seconds(),
            'checks': checks
        }),
        status_code=status_code,
        media_type='application/json'
    )

@app.get('/ready')
def readiness_check():
    """Readiness check (for readiness probe)"""
    # Simpler check - can service handle requests?
    try:
        cache.redis.ping()
        return {'status': 'READY'}
    except:
        return Response(status_code=503, content='{"status": "NOT_READY"}')

@app.get('/metrics')
def metrics_endpoint():
    """Prometheus metrics endpoint"""
    from prometheus_client import generate_latest, CONTENT_TYPE_LATEST
    
    return Response(
        content=generate_latest(),
        media_type=CONTENT_TYPE_LATEST
    )
```

**Circuit Breaker Pattern:**

```python
from datetime import datetime, timedelta
from enum import Enum

class CircuitState(Enum):
    CLOSED = 'closed'      # Normal operation
    OPEN = 'open'          # Failures exceeded threshold, rejecting requests
    HALF_OPEN = 'half_open'  # Testing if service recovered

class CircuitBreaker:
    """
    Circuit breaker para external services (OpenAI, Database).
    
    Prevents cascading failures cuando un servicio está caído.
    """
    
    def __init__(
        self,
        failure_threshold: int = 5,
        timeout_seconds: int = 60,
        half_open_max_calls: int = 3
    ):
        self.failure_threshold = failure_threshold
        self.timeout_seconds = timeout_seconds
        self.half_open_max_calls = half_open_max_calls
        
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.last_failure_time = None
        self.half_open_calls = 0
    
    def call(self, func, *args, **kwargs):
        """Execute function with circuit breaker protection"""
        
        if self.state == CircuitState.OPEN:
            # Check if timeout expired
            if datetime.now() - self.last_failure_time > timedelta(seconds=self.timeout_seconds):
                self.state = CircuitState.HALF_OPEN
                self.half_open_calls = 0
            else:
                raise CircuitBreakerOpenError("Circuit breaker is OPEN")
        
        if self.state == CircuitState.HALF_OPEN:
            if self.half_open_calls >= self.half_open_max_calls:
                raise CircuitBreakerOpenError("Half-open limit reached")
        
        try:
            result = func(*args, **kwargs)
            
            # Success - reset or close circuit
            if self.state == CircuitState.HALF_OPEN:
                self.state = CircuitState.CLOSED
                self.failure_count = 0
            
            return result
        
        except Exception as e:
            self.failure_count += 1
            self.last_failure_time = datetime.now()
            
            if self.state == CircuitState.HALF_OPEN:
                # Failed during half-open, go back to open
                self.state = CircuitState.OPEN
            elif self.failure_count >= self.failure_threshold:
                # Threshold exceeded, open circuit
                self.state = CircuitState.OPEN
            
            raise e

class CircuitBreakerOpenError(Exception):
    pass

# Uso
openai_circuit = CircuitBreaker(failure_threshold=5, timeout_seconds=60)
db_circuit = CircuitBreaker(failure_threshold=3, timeout_seconds=30)

def chatbot_with_circuit_breaker(question: str) -> Dict:
    """Chatbot con circuit breaker protection"""
    
    try:
        # Try OpenAI with circuit breaker
        intent = openai_circuit.call(classify_intent_advanced, question)
        
        if intent.type == 'QUERY':
            # Try database with circuit breaker
            sql = openai_circuit.call(nl_to_sql, question)
            df = db_circuit.call(pd.read_sql_query, sql, conn)
            
            return {'type': 'SQL', 'data': df}
        else:
            answer = openai_circuit.call(rag_answer, question)
            return {'type': 'RAG', 'answer': answer}
    
    except CircuitBreakerOpenError as e:
        # Service unavailable - return cached or degraded response
        return {
            'type': 'ERROR',
            'message': 'Service temporarily unavailable. Please try again later.',
            'retry_after_seconds': 60
        }
```

**Scaling Strategy & Cost Optimization:**

```python
class CostOptimizer:
    """Optimiza costos de operación"""
    
    # Pricing (as of 2024)
    PRICING = {
        'gpt-4': {'input': 0.03, 'output': 0.06},  # per 1K tokens
        'gpt-3.5-turbo': {'input': 0.0015, 'output': 0.002},
        'text-embedding-ada-002': 0.0001,  # per 1K tokens
        'redis': 0.05,  # per GB-hour
        'postgres': 0.10,  # per GB-hour
        'compute': 0.05,  # per vCPU-hour
    }
    
    def estimate_monthly_cost(
        self,
        daily_requests: int,
        cache_hit_rate: float = 0.6,
        avg_tokens_per_request: int = 500
    ) -> Dict:
        """Estima costo mensual"""
        
        monthly_requests = daily_requests * 30
        
        # LLM costs (considering cache)
        llm_requests = monthly_requests * (1 - cache_hit_rate)
        
        # Assume 30% use GPT-4, 70% use GPT-3.5
        gpt4_requests = llm_requests * 0.3
        gpt35_requests = llm_requests * 0.7
        
        gpt4_cost = (gpt4_requests * avg_tokens_per_request / 1000) * (
            self.PRICING['gpt-4']['input'] + self.PRICING['gpt-4']['output']
        )
        
        gpt35_cost = (gpt35_requests * avg_tokens_per_request / 1000) * (
            self.PRICING['gpt-3.5-turbo']['input'] + self.PRICING['gpt-3.5-turbo']['output']
        )
        
        llm_cost = gpt4_cost + gpt35_cost
        
        # Embeddings (only for new documents)
        embedding_cost = 100  # Assume 100K tokens/month for new docs
        
        # Infrastructure
        redis_cost = 4 * 720 * self.PRICING['redis']  # 4GB Redis
        postgres_cost = 20 * 720 * self.PRICING['postgres']  # 20GB Postgres
        compute_cost = 10 * 720 * self.PRICING['compute']  # 10 vCPUs average
        
        infra_cost = redis_cost + postgres_cost + compute_cost
        
        total = llm_cost + embedding_cost + infra_cost
        
        return {
            'llm_cost': llm_cost,
            'embedding_cost': embedding_cost,
            'infrastructure_cost': infra_cost,
            'total_monthly_usd': total,
            'cost_per_request_usd': total / monthly_requests,
            'breakdown': {
                'gpt-4': gpt4_cost,
                'gpt-3.5-turbo': gpt35_cost,
                'embeddings': embedding_cost,
                'redis': redis_cost,
                'postgres': postgres_cost,
                'compute': compute_cost
            }
        }
    
    def recommend_optimizations(self, current_config: Dict) -> List[str]:
        """Recomienda optimizaciones de costo"""
        
        recommendations = []
        
        if current_config['cache_hit_rate'] < 0.5:
            recommendations.append(
                "⬆️ Increase cache TTL → 50% hit rate → save ~$500/month"
            )
        
        if current_config['gpt4_usage_pct'] > 0.5:
            recommendations.append(
                "⬇️ Route more queries to GPT-3.5 → 20x cheaper → save ~$1000/month"
            )
        
        if current_config['avg_tokens_per_request'] > 1000:
            recommendations.append(
                "✂️ Optimize prompts to reduce tokens → save ~$300/month"
            )
        
        return recommendations

# Ejemplo
optimizer = CostOptimizer()

cost_estimate = optimizer.estimate_monthly_cost(
    daily_requests=10000,
    cache_hit_rate=0.6,
    avg_tokens_per_request=500
)

print("💰 MONTHLY COST ESTIMATE:")
print(f"Total: ${cost_estimate['total_monthly_usd']:,.2f}")
print(f"Cost per request: ${cost_estimate['cost_per_request_usd']:.4f}")
print(f"\nBreakdown:")
for component, cost in cost_estimate['breakdown'].items():
    print(f"  {component}: ${cost:,.2f}")

# Recommendations
recommendations = optimizer.recommend_optimizations({
    'cache_hit_rate': 0.45,
    'gpt4_usage_pct': 0.60,
    'avg_tokens_per_request': 1200
})

print("\n💡 OPTIMIZATION RECOMMENDATIONS:")
for rec in recommendations:
    print(f"  {rec}")
```

**Disaster Recovery & Backup Strategy:**

```python
from datetime import datetime
import boto3

class BackupManager:
    """Gestiona backups y disaster recovery"""
    
    def __init__(self, s3_bucket: str):
        self.s3 = boto3.client('s3')
        self.bucket = s3_bucket
    
    def backup_vector_store(self):
        """Backup de ChromaDB/Pinecone a S3"""
        
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        
        # Export ChromaDB
        collection = chroma_client.get_collection('data_docs')
        data = collection.get(include=['documents', 'embeddings', 'metadatas'])
        
        # Save to S3
        backup_key = f'backups/vector_store_{timestamp}.json'
        self.s3.put_object(
            Bucket=self.bucket,
            Key=backup_key,
            Body=json.dumps(data)
        )
        
        print(f"✅ Vector store backed up to s3://{self.bucket}/{backup_key}")
    
    def backup_database(self):
        """Backup de PostgreSQL a S3"""
        
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        
        # Use pg_dump
        import subprocess
        dump_file = f'/tmp/db_backup_{timestamp}.sql'
        
        subprocess.run([
            'pg_dump',
            '-h', os.getenv('DB_HOST'),
            '-U', os.getenv('DB_USER'),
            '-d', os.getenv('DB_NAME'),
            '-f', dump_file
        ], check=True)
        
        # Upload to S3
        with open(dump_file, 'rb') as f:
            self.s3.upload_fileobj(f, self.bucket, f'backups/database_{timestamp}.sql')
        
        print(f"✅ Database backed up to S3")
    
    def restore_from_backup(self, backup_date: str):
        """Restaura desde backup"""
        
        # Restore vector store
        vector_key = f'backups/vector_store_{backup_date}.json'
        obj = self.s3.get_object(Bucket=self.bucket, Key=vector_key)
        data = json.loads(obj['Body'].read())
        
        # Re-index
        collection = chroma_client.get_or_create_collection('data_docs')
        collection.add(
            ids=data['ids'],
            documents=data['documents'],
            embeddings=data['embeddings'],
            metadatas=data['metadatas']
        )
        
        print(f"✅ Vector store restored from {backup_date}")

# Scheduled backups (use Airflow or Kubernetes CronJob)
backup_mgr = BackupManager(s3_bucket='chatbot-backups')

# Daily backups
# Schedule: 0 2 * * * (2 AM daily)
backup_mgr.backup_vector_store()
backup_mgr.backup_database()
```

**Production Readiness Checklist:**

```markdown
## 🚀 PRODUCTION READINESS CHECKLIST

### 1. Performance ✅
- [ ] Latency p95 < 3s
- [ ] Cache hit rate > 50%
- [ ] Error rate < 1%
- [ ] Horizontal scaling tested (3-20 pods)
- [ ] Load testing completed (10K concurrent users)

### 2. Security ✅
- [ ] SQL injection prevention validated
- [ ] Prompt injection detection active
- [ ] RBAC implemented and tested
- [ ] Audit logging enabled
- [ ] PII masking working
- [ ] Rate limiting configured

### 3. Observability ✅
- [ ] Metrics exported to Prometheus
- [ ] Grafana dashboards created
- [ ] Distributed tracing (Jaeger)
- [ ] Centralized logging (ELK)
- [ ] Alerts configured (PagerDuty)

### 4. Reliability ✅
- [ ] Health checks (liveness + readiness)
- [ ] Circuit breakers for external services
- [ ] Graceful degradation on failures
- [ ] Retry logic with exponential backoff
- [ ] Chaos engineering tests passed

### 5. Data & Backups ✅
- [ ] Daily vector store backups
- [ ] Database backups (automated)
- [ ] Disaster recovery plan documented
- [ ] Restore tested (RTO < 1 hour)

### 6. Cost Optimization ✅
- [ ] Multi-layer caching active
- [ ] Model routing (GPT-3.5 vs GPT-4)
- [ ] Token optimization (prompts < 500 tokens)
- [ ] Resource limits set
- [ ] Cost monitoring with alerts

### 7. Documentation ✅
- [ ] API documentation (OpenAPI/Swagger)
- [ ] Runbooks for common issues
- [ ] Architecture diagrams
- [ ] Onboarding guide for new team members
```

---
**Autor:** Luis J. Raigoso V. (LJRV)

### Siguientes pasos

1. **Caché**: implementa Redis para queries frecuentes
2. **Feedback loop**: permite al usuario marcar respuestas correctas/incorrectas
3. **Multi-tabla**: soporta JOINs automáticos
4. **Exportación**: permite descargar resultados en CSV/Excel
5. **Auditoría**: loggea todas las queries ejecutadas
6. **Permisos**: implementa autenticación y control de acceso
7. **Visualizaciones**: genera gráficos automáticos según el tipo de datos
8. **Sugerencias**: recomienda preguntas frecuentes

## Evaluación

**Criterios**:

- ✅ Sistema RAG funcional (20%)
- ✅ NL2SQL con validación de seguridad (25%)
- ✅ Clasificación correcta de intención (15%)
- ✅ Interfaz funcional (20%)
- ✅ Manejo de errores (10%)
- ✅ Código limpio y documentado (10%)