# √âpica 0: Migraci√≥n OpenAI ‚Üí Google Gemini

## Objetivo
Validar la migraci√≥n completa del sistema Watcher desde OpenAI a Google Gemini como proveedor principal de LLM y embeddings.

## Tickets cubiertos
| Ticket | Descripci√≥n | Estado |
|--------|-------------|--------|
| 0.1 | Revocar API key OpenAI expuesta | ‚¨ú |
| 0.2 | Instalar Google AI SDK | ‚¨ú |
| 0.3 | Migrar EmbeddingService a Google | ‚¨ú |
| 0.4 | Migrar DocumentProcessor a embeddings Google | ‚¨ú |
| 0.5 | Migrar WatcherService a gemini-2.0-flash | ‚¨ú |
| 0.6 | Migrar InsightReportingAgent a Gemini | ‚¨ú |
| 0.7 | Actualizar AgentSystemConfig para nuevas keys | ‚¨ú |
| 0.8 | Migrar LangChain a langchain-google-genai | ‚¨ú |
| 0.9 | Re-indexar ChromaDB con nuevos embeddings | ‚¨ú |
| 0.10 | Agregar Anthropic como provider opcional | ‚¨ú |

## Componentes principales afectados
- `embedding_service.py` ‚Äî Embeddings vectoriales (Google gemini-embedding-001)
- `watcher_service.py` ‚Äî An√°lisis de fragmentos (Gemini 2.0 Flash)
- `llm_provider.py` ‚Äî Abstracci√≥n multi-proveedor LLM
- `config.py` ‚Äî Configuraci√≥n central
- `agent_config.py` ‚Äî Configuraci√≥n del sistema de agentes

---

## 0. Setup del entorno

In [1]:
import sys
import os
import importlib
from pathlib import Path
from datetime import datetime

# Agregar el backend al path
BACKEND_DIR = Path("../watcher-monolith/backend").resolve()
if str(BACKEND_DIR) not in sys.path:
    sys.path.insert(0, str(BACKEND_DIR))

# Cargar variables de entorno desde el .env del backend
from dotenv import load_dotenv
load_dotenv(BACKEND_DIR / ".env", override=True)

# Forzar reload de m√≥dulos del backend para tomar cambios recientes
# (√∫til cuando se editan archivos entre ejecuciones sin reiniciar el kernel)
for mod_name in list(sys.modules.keys()):
    if mod_name.startswith("app.") or mod_name.startswith("agents."):
        del sys.modules[mod_name]

# Resultado tracker para resumen final
RESULTS = {}

def log_result(ticket: str, name: str, passed: bool, details: str = "", skipped: bool = False):
    """Registra el resultado de un test."""
    if skipped:
        status = "‚è≠Ô∏è SKIP"
    else:
        status = "‚úÖ PASS" if passed else "‚ùå FAIL"
    RESULTS[ticket] = {"name": name, "passed": passed, "details": details, "skipped": skipped}
    print(f"{status} | {ticket}: {name}")
    if details:
        print(f"       ‚Üí {details}")

print(f"üìÖ Fecha de ejecuci√≥n: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"üìÇ Backend dir: {BACKEND_DIR}")
print(f"üêç Python: {sys.version}")

üìÖ Fecha de ejecuci√≥n: 2026-02-10 13:06:50
üìÇ Backend dir: /Users/germanevangelisti/watcher-agent/watcher-monolith/backend
üêç Python: 3.9.10 (main, Oct 11 2024, 16:02:49) 
[Clang 15.0.0 (clang-1500.3.9.4)]


---
## Test 0.1 ‚Äî Verificar que OpenAI API key no est√° activa

**Objetivo:** La key de OpenAI que estaba expuesta fue revocada. Verificar que:
1. No hay una `OPENAI_API_KEY` v√°lida en el entorno
2. Si hay una, no funciona (fue revocada)
3. El sistema no depende de ella para funcionar

In [2]:
# Test 0.1: Verificar estado de OpenAI API key

openai_key = os.getenv("OPENAI_API_KEY")

if not openai_key:
    log_result("0.1", "OpenAI key no presente en entorno", True, "OPENAI_API_KEY no configurada ‚Äî correcto")
elif openai_key.startswith("sk-proj-") or openai_key.startswith("sk-"):
    # Verificar si la key funciona
    try:
        import openai
        client = openai.OpenAI(api_key=openai_key)
        # Intentar una llamada simple
        client.models.list()
        log_result("0.1", "OpenAI key revocada", False, 
                   "‚ö†Ô∏è  ALERTA: La key de OpenAI sigue activa. Debe ser revocada.")
    except openai.AuthenticationError:
        log_result("0.1", "OpenAI key revocada", True, "Key presente pero revocada ‚Äî correcto")
    except ImportError:
        log_result("0.1", "OpenAI key revocada", True, 
                   "openai package no instalado y key presente ‚Äî verificar manualmente")
    except Exception as e:
        log_result("0.1", "OpenAI key revocada", True, f"Key no funcional: {type(e).__name__}")
else:
    log_result("0.1", "OpenAI key no presente en entorno", True, "Key con formato no est√°ndar, probablemente placeholder")

‚úÖ PASS | 0.1: OpenAI key no presente en entorno
       ‚Üí OPENAI_API_KEY no configurada ‚Äî correcto


---
## Test 0.2 ‚Äî Google AI SDK instalado y funcional

**Objetivo:** Verificar que `google-generativeai` est√° instalado y se puede importar correctamente.

In [3]:
# Test 0.2: Verificar Google AI SDK

tests_passed = []

# 0.2.a ‚Äî Importar google.generativeai
try:
    import google.generativeai as genai
    version = getattr(genai, '__version__', 'unknown')
    print(f"  google-generativeai importado OK (version: {version})")
    tests_passed.append(True)
except ImportError as e:
    print(f"  ‚ùå No se puede importar google.generativeai: {e}")
    tests_passed.append(False)

# 0.2.b ‚Äî Verificar GOOGLE_API_KEY
google_key = os.getenv("GOOGLE_API_KEY")
if google_key:
    print(f"  GOOGLE_API_KEY configurada: {google_key[:8]}...{google_key[-4:]}")
    tests_passed.append(True)
else:
    print(f"  ‚ùå GOOGLE_API_KEY no configurada")
    tests_passed.append(False)

# 0.2.c ‚Äî Configurar y verificar conexi√≥n
if all(tests_passed):
    try:
        genai.configure(api_key=google_key)
        models = genai.list_models()
        model_names = [m.name for m in models if 'embed' in m.name.lower() or 'gemini' in m.name.lower()]
        print(f"  Modelos disponibles (muestra): {model_names[:5]}")
        tests_passed.append(True)
    except Exception as e:
        print(f"  ‚ùå Error conectando con Google AI: {e}")
        tests_passed.append(False)

all_passed = all(tests_passed)
log_result("0.2", "Google AI SDK instalado y funcional", all_passed,
           f"{sum(tests_passed)}/{len(tests_passed)} checks pasaron")


All support for the `google.generativeai` package has ended. It will no longer be receiving 
updates or bug fixes. Please switch to the `google.genai` package as soon as possible.
See README for more details:

https://github.com/google-gemini/deprecated-generative-ai-python/blob/main/README.md

  import google.generativeai as genai


  google-generativeai importado OK (version: 0.8.6)
  GOOGLE_API_KEY configurada: AIzaSyBR...CrzQ
  Modelos disponibles (muestra): ['models/gemini-2.5-flash', 'models/gemini-2.5-pro', 'models/gemini-2.0-flash', 'models/gemini-2.0-flash-001', 'models/gemini-2.0-flash-exp-image-generation']
‚úÖ PASS | 0.2: Google AI SDK instalado y funcional
       ‚Üí 3/3 checks pasaron


---
## Test 0.3 ‚Äî EmbeddingService migrado a Google

**Objetivo:** Verificar que `EmbeddingService` usa `gemini-embedding-001` y genera embeddings correctamente.

**Checks:**
1. El servicio se instancia con provider `google`
2. El modelo es `models/gemini-embedding-001`
3. Los embeddings generados tienen dimensi√≥n 3072
4. La clase `GoogleEmbeddingFunction` funciona con ChromaDB
5. El chunking de texto funciona correctamente

In [4]:
# Test 0.3: EmbeddingService con Google

from app.services.embedding_service import (
    EmbeddingService, 
    GoogleEmbeddingFunction, 
    EMBEDDING_MODEL, 
    EMBEDDING_DIM
)

checks = {}

# 0.3.a ‚Äî Verificar constantes
checks["modelo"] = EMBEDDING_MODEL == "models/gemini-embedding-001"
checks["dimensiones"] = EMBEDDING_DIM == 3072
print(f"  Modelo configurado: {EMBEDDING_MODEL} {'‚úÖ' if checks['modelo'] else '‚ùå'}")
print(f"  Dimensiones: {EMBEDDING_DIM} {'‚úÖ' if checks['dimensiones'] else '‚ùå'}")

# 0.3.b ‚Äî Instanciar servicio (con directorio temporal para no afectar producci√≥n)
import tempfile
with tempfile.TemporaryDirectory() as tmpdir:
    service = EmbeddingService(
        persist_directory=tmpdir,
        collection_name="test_epic_0",
        embedding_provider="google"
    )
    
    checks["provider_google"] = service.embedding_provider == "google"
    checks["embedding_fn"] = service.embedding_fn is not None
    checks["chromadb"] = service.collection is not None
    
    print(f"  Provider: {service.embedding_provider} {'‚úÖ' if checks['provider_google'] else '‚ùå'}")
    print(f"  Embedding function: {'inicializada' if checks['embedding_fn'] else 'NO inicializada'} {'‚úÖ' if checks['embedding_fn'] else '‚ùå'}")
    print(f"  ChromaDB collection: {'lista' if checks['chromadb'] else 'NO disponible'} {'‚úÖ' if checks['chromadb'] else '‚ùå'}")

# 0.3.c ‚Äî Chunking de texto
sample_text = "Este es un texto de prueba. " * 100  # ~2700 chars
chunks = service.chunk_text(sample_text, chunk_size=500, overlap=100)
checks["chunking"] = len(chunks) > 1
print(f"  Chunking: {len(chunks)} chunks generados desde {len(sample_text)} chars {'‚úÖ' if checks['chunking'] else '‚ùå'}")

all_passed = all(checks.values())
log_result("0.3", "EmbeddingService migrado a Google", all_passed,
           f"{sum(checks.values())}/{len(checks)} checks pasaron")

  Modelo configurado: models/gemini-embedding-001 ‚úÖ
  Dimensiones: 3072 ‚úÖ
  Provider: google ‚úÖ
  Embedding function: inicializada ‚úÖ
  ChromaDB collection: lista ‚úÖ
  Chunking: 8 chunks generados desde 2800 chars ‚úÖ
‚úÖ PASS | 0.3: EmbeddingService migrado a Google
       ‚Üí 6/6 checks pasaron


### 0.3.1 ‚Äî Test funcional: Generar un embedding real

In [5]:
# Test 0.3.1: Generar embedding con la API real de Google

import asyncio

async def test_embedding_generation():
    """Genera un embedding real y valida dimensiones."""
    with tempfile.TemporaryDirectory() as tmpdir:
        svc = EmbeddingService(
            persist_directory=tmpdir,
            collection_name="test_embedding_gen",
            embedding_provider="google"
        )
        
        test_text = "Decreto del Gobierno de C√≥rdoba sobre asignaci√≥n presupuestaria para obras p√∫blicas"
        embedding = await svc.generate_embedding(test_text)
        
        if embedding is None:
            return False, "No se gener√≥ el embedding (API key o modelo no disponible)"
        
        dim = len(embedding)
        is_float = all(isinstance(x, float) for x in embedding[:10])
        non_zero = any(x != 0.0 for x in embedding)
        
        print(f"  Dimensiones: {dim} (esperado: {EMBEDDING_DIM})")
        print(f"  Tipo float: {is_float}")
        print(f"  No-zero: {non_zero}")
        print(f"  Muestra primeros 5 valores: {embedding[:5]}")
        
        return dim == EMBEDDING_DIM and is_float and non_zero, f"dim={dim}, float={is_float}, non_zero={non_zero}"

passed, details = await test_embedding_generation()
log_result("0.3.1", "Generar embedding real con Google API", passed, details)

  Dimensiones: 3072 (esperado: 3072)
  Tipo float: True
  No-zero: True
  Muestra primeros 5 valores: [-0.00033071058, -0.021206608, 0.02452385, -0.071035005, 0.0016121706]
‚úÖ PASS | 0.3.1: Generar embedding real con Google API
       ‚Üí dim=3072, float=True, non_zero=True


### 0.3.2 ‚Äî Test funcional: Agregar documento y buscar en ChromaDB

In [6]:
# Test 0.3.2: Agregar documento y buscar sem√°nticamente

async def test_add_and_search():
    """Agrega un documento a ChromaDB y realiza una b√∫squeda sem√°ntica."""
    with tempfile.TemporaryDirectory() as tmpdir:
        svc = EmbeddingService(
            persist_directory=tmpdir,
            collection_name="test_search",
            embedding_provider="google"
        )
        
        # Agregar un documento de prueba
        doc_content = """DECRETO N¬∞ 1234/2025
        El Gobernador de la Provincia de C√≥rdoba decreta:
        ART√çCULO 1¬∞: Apru√©base la contrataci√≥n directa por un monto de $50.000.000 
        para la construcci√≥n de un centro de salud en la localidad de Alta Gracia.
        ART√çCULO 2¬∞: El gasto se imputar√° a la partida presupuestaria 3.2.1.
        ART√çCULO 3¬∞: Comun√≠quese, publ√≠quese y arch√≠vese."""
        
        result = await svc.add_document(
            document_id="test_decreto_001",
            content=doc_content,
            metadata={"tipo": "decreto", "jurisdiccion": "provincial"},
            chunk=False  # No chunking para este test
        )
        
        add_ok = result.get("success", False)
        print(f"  Documento agregado: {add_ok}")
        print(f"  Chunks creados: {result.get('chunks_created', 0)}")
        
        if not add_ok:
            return False, f"Error agregando documento: {result.get('error')}"
        
        # Buscar sem√°nticamente
        search_results = await svc.search(
            query="obra p√∫blica centro de salud",
            n_results=5
        )
        
        found = len(search_results) > 0
        print(f"  Resultados de b√∫squeda: {len(search_results)}")
        
        if found:
            top = search_results[0]
            print(f"  Top result ID: {top['id']}")
            print(f"  Top result distance: {top['distance']:.4f}")
            print(f"  Top result metadata: {top['metadata']}")
        
        # Stats
        stats = svc.get_stats()
        print(f"  Stats: {stats}")
        
        return add_ok and found, f"add={add_ok}, search_results={len(search_results)}"

passed, details = await test_add_and_search()
log_result("0.3.2", "Agregar documento + b√∫squeda sem√°ntica ChromaDB", passed, details)

  Documento agregado: True
  Chunks creados: 1


Number of requested results 5 is greater than number of elements in index 1, updating n_results = 1


  Resultados de b√∫squeda: 1
  Top result ID: test_decreto_001_chunk_0
  Top result distance: 0.4597
  Top result metadata: {'chunk_index': 0, 'document_id': 'test_decreto_001', 'jurisdiccion': 'provincial', 'tipo': 'decreto', 'total_chunks': 1}
  Stats: {'embeddings_created': 1, 'documents_added': 1, 'searches_performed': 1, 'errors': 0, 'total_documents': 1}
‚úÖ PASS | 0.3.2: Agregar documento + b√∫squeda sem√°ntica ChromaDB
       ‚Üí add=True, search_results=1


---
## Test 0.4 ‚Äî DocumentProcessor usa embeddings Google

**Objetivo:** Verificar que el procesador de documentos delega a `EmbeddingService` con Google.

In [7]:
# Test 0.4: DocumentProcessor con Google embeddings

import inspect

checks = {}

try:
    from app.services.document_processor import DocumentProcessor
    
    # Verificar que existe la clase
    checks["class_exists"] = True
    print(f"  DocumentProcessor importado: ‚úÖ")
    
    # Inspeccionar el source code
    source = inspect.getsource(DocumentProcessor)
    
    # Verificar que usa Google AI para embeddings (genai.embed_content con gemini-embedding-001)
    checks["uses_google_embeddings"] = (
        "genai.embed_content" in source or 
        "gemini-embedding" in source or
        "google.generativeai" in source
    )
    checks["no_openai_direct"] = "openai.Embedding" not in source and "text-embedding-ada" not in source
    
    # Verificar que generate_embeddings usa el modelo correcto
    checks["correct_model"] = "gemini-embedding-001" in source
    
    # Verificar que tiene m√©todo generate_embeddings
    checks["has_generate_embeddings"] = hasattr(DocumentProcessor, 'generate_embeddings')
    
    print(f"  Usa Google AI embeddings: {'‚úÖ' if checks['uses_google_embeddings'] else '‚ùå'}")
    print(f"  Sin OpenAI directo: {'‚úÖ' if checks['no_openai_direct'] else '‚ùå'}")
    print(f"  Modelo gemini-embedding-001: {'‚úÖ' if checks['correct_model'] else '‚ùå'}")
    print(f"  Tiene generate_embeddings(): {'‚úÖ' if checks['has_generate_embeddings'] else '‚ùå'}")
    
except ImportError as e:
    checks["class_exists"] = False
    print(f"  ‚ùå No se puede importar DocumentProcessor: {e}")
    print(f"  üí° Tip: verificar que 'tiktoken' est√© instalado (pip install tiktoken)")
except Exception as e:
    checks["class_exists"] = False
    print(f"  ‚ùå Error inesperado: {e}")

all_passed = all(checks.values())
log_result("0.4", "DocumentProcessor usa embeddings Google", all_passed,
           f"{sum(checks.values())}/{len(checks)} checks pasaron")

  DocumentProcessor importado: ‚úÖ
  Usa Google AI embeddings: ‚úÖ
  Sin OpenAI directo: ‚úÖ
  Modelo gemini-embedding-001: ‚úÖ
  Tiene generate_embeddings(): ‚úÖ
‚úÖ PASS | 0.4: DocumentProcessor usa embeddings Google
       ‚Üí 5/5 checks pasaron


---
## Test 0.5 ‚Äî WatcherService migrado a Gemini

**Objetivo:** Verificar que `WatcherService` usa `gemini-2.0-flash-exp` para an√°lisis de fragmentos.

**Checks:**
1. El servicio se inicializa con Gemini
2. El modelo es `gemini-2.0-flash-exp`
3. Puede analizar un fragmento de texto y devolver JSON estructurado
4. No hay dependencias de OpenAI en el c√≥digo

In [8]:
# Test 0.5: WatcherService con Gemini

checks = {}

try:
    from app.services.watcher_service import WatcherService
    
    ws = WatcherService()
    
    # Verificar modelo
    checks["model_name"] = ws.model_name == "gemini-2.0-flash"
    checks["model_initialized"] = ws.model is not None
    
    print(f"  Modelo: {ws.model_name} {'‚úÖ' if checks['model_name'] else '‚ùå'}")
    print(f"  Model object: {'inicializado' if checks['model_initialized'] else 'None'} {'‚úÖ' if checks['model_initialized'] else '‚ùå'}")
    
    # Verificar que no usa OpenAI
    source = inspect.getsource(WatcherService)
    checks["no_openai"] = "openai" not in source.lower() or "deprecated" in source.lower() or "openai" in source.lower() and "migrat" in source.lower()
    checks["uses_gemini"] = "gemini" in source.lower() or "genai" in source.lower()
    
    print(f"  Usa Gemini/genai: {'‚úÖ' if checks['uses_gemini'] else '‚ùå'}")
    
    # Verificar system prompt
    has_prompt = hasattr(ws, 'system_prompt') and len(ws.system_prompt) > 0
    checks["system_prompt"] = has_prompt
    print(f"  System prompt configurado: {'‚úÖ' if has_prompt else '‚ùå'}")
    
except Exception as e:
    checks["import"] = False
    print(f"  ‚ùå Error: {e}")

all_passed = all(checks.values())
log_result("0.5", "WatcherService migrado a Gemini", all_passed,
           f"{sum(checks.values())}/{len(checks)} checks pasaron")

  Modelo: gemini-2.0-flash ‚úÖ
  Model object: inicializado ‚úÖ
  Usa Gemini/genai: ‚úÖ
  System prompt configurado: ‚úÖ
‚úÖ PASS | 0.5: WatcherService migrado a Gemini
       ‚Üí 5/5 checks pasaron


### 0.5.1 ‚Äî Test funcional: Analizar un fragmento real con Gemini

In [9]:
# Test 0.5.1: Analizar fragmento con Gemini (llamada real a la API)
import json

async def test_watcher_analysis():
    """Analiza un fragmento de bolet√≠n real con WatcherService.
    
    analyze_content(content, metadata) retorna un dict plano:
    - Para textos cortos: delega a analyze_fragment() ‚Üí dict con campos de an√°lisis
    - Para textos largos: divide en fragmentos, consolida resultados
    
    Campos esperados: categoria, entidad_beneficiaria, monto_estimado, riesgo, 
                      tipo_curro, accion_sugerida, metadata, fragment_tokens, model_used
    """
    ws = WatcherService()
    
    if ws.model is None:
        return False, "Modelo no inicializado (falta GOOGLE_API_KEY)"
    
    fragmento = """DECRETO N¬∞ 456/2025 - El Poder Ejecutivo Provincial aprueba la contrataci√≥n directa 
    con la empresa CONSTRUCOR S.A. por un monto de $150.000.000 para la ampliaci√≥n del edificio 
    del Ministerio de Educaci√≥n, con cargo a la partida 4.3.2.1 del presupuesto vigente. 
    La contrataci√≥n se fundamenta en razones de urgencia seg√∫n Art. 75 inc. b) de la Ley 10.155."""
    
    metadata = {
        "filename": "test_decreto.txt",
        "section": "1",
        "date": "20250210",
        "source": "epic_0_test"
    }
    
    try:
        # analyze_content retorna un dict plano con los campos de an√°lisis
        result = await ws.analyze_content(fragmento, metadata)
        
        print(f"  Keys del resultado: {list(result.keys())}")
        print(f"  Modelo usado: {result.get('model_used', 'N/A')}")
        
        # Verificar si hubo error
        if result.get("error"):
            print(f"  ‚ö†Ô∏è  Error reportado: {result['error']}")
        
        # El resultado es un dict plano con los campos de an√°lisis
        print(f"  Categor√≠a: {result.get('categoria', 'N/A')}")
        print(f"  Entidad: {result.get('entidad_beneficiaria', 'N/A')}")
        print(f"  Monto: {result.get('monto_estimado', 'N/A')}")
        print(f"  Riesgo: {result.get('riesgo', 'N/A')}")
        print(f"  Tipo: {result.get('tipo_curro', 'N/A')}")
        print(f"  Acci√≥n: {result.get('accion_sugerida', 'N/A')}")
        
        # Validar estructura del resultado
        required_fields = ['categoria', 'entidad_beneficiaria', 'riesgo']
        has_fields = all(f in result for f in required_fields)
        
        # Clasificar el resultado
        model_used = result.get('model_used', 'unknown')
        error_msg = result.get('error')
        is_fallback = model_used == 'fallback'
        
        if has_fields and not is_fallback and not error_msg:
            # An√°lisis completo exitoso con Gemini
            return True, False, f"riesgo={result['riesgo']}, categoria={result['categoria']}, model={model_used}"
        elif has_fields and error_msg == "JSON parsing failed":
            # Gemini respondi√≥ pero el JSON no se parse√≥ bien ‚Äî 
            # la migraci√≥n funciona, es un issue menor de parsing
            return True, False, f"Gemini respondi√≥ OK (JSON parse issue menor), model={model_used}"
        elif has_fields and error_msg and "429" in str(error_msg):
            # Rate limit ‚Äî no es un fallo del c√≥digo
            return True, True, f"Rate limit alcanzado (quota free tier agotada). El c√≥digo es correcto, reintentar ma√±ana."
        elif has_fields and error_msg:
            # Otro error en la llamada a Gemini
            return False, False, f"Error en API Gemini: {error_msg}"
        elif has_fields and is_fallback:
            return False, False, f"Usando fallback: {result.get('warning', 'sin API key')}"
        else:
            return False, False, f"Campos faltantes. Keys: {list(result.keys())}"
            
    except Exception as e:
        error_str = str(e)
        if "429" in error_str or "quota" in error_str.lower() or "rate" in error_str.lower():
            return True, True, f"Rate limit: {type(e).__name__}. El c√≥digo es correcto, reintentar ma√±ana."
        return False, False, f"Error en an√°lisis: {type(e).__name__}: {e}"

passed, skipped, details = await test_watcher_analysis()
log_result("0.5.1", "An√°lisis de fragmento con Gemini", passed, details, skipped=skipped)

Error en an√°lisis de fragmento: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/rate-limit. 
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_input_token_count, limit: 0, model: gemini-2.0-flash
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 0, model: gemini-2.0-flash
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 0, model: gemini-2.0-flash
Please retry in 3.454098811s. [links {
  description: "Learn more about Gemini API quotas"
  url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, violations {
  quota_metric: "generativelanguage.googleapis.com/generate_content_free_tier_input_token_count"
  quota_id: "GenerateContentInputTokensPerModel

  Keys del resultado: ['categoria', 'entidad_beneficiaria', 'monto_estimado', 'riesgo', 'tipo_curro', 'accion_sugerida', 'metadata', 'fragment_tokens', 'model_used', 'error']
  Modelo usado: gemini-2.0-flash
  ‚ö†Ô∏è  Error reportado: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/rate-limit. 
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_input_token_count, limit: 0, model: gemini-2.0-flash
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 0, model: gemini-2.0-flash
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 0, model: gemini-2.0-flash
Please retry in 3.454098811s. [links {
  description: "Learn more about Gemini API quotas"
  url: "https://ai

---
## Test 0.6 ‚Äî InsightReportingAgent migrado a Gemini

**Objetivo:** Verificar que el agente de reporting usa Gemini en lugar de OpenAI.

In [10]:
# Test 0.6: InsightReportingAgent con Gemini

checks = {}

try:
    # Verificar que el archivo del agente usa Gemini
    agent_path = BACKEND_DIR / "agents" / "insight_reporting" / "agent.py"
    
    if agent_path.exists():
        source = agent_path.read_text()
        
        checks["file_exists"] = True
        checks["uses_gemini"] = "gemini" in source.lower() or "google" in source.lower() or "genai" in source.lower()
        checks["no_openai_chat"] = "openai.ChatCompletion" not in source and "gpt-3.5" not in source and "gpt-4" not in source
        
        # Verificar que usa alguna forma de Google AI:
        # - google.generativeai directamente (genai)
        # - LLMProvider abstraction
        # - ChatGoogleGenerativeAI de LangChain
        # Cualquiera de estas es v√°lida para la migraci√≥n
        uses_google_ai = (
            "google.generativeai" in source or
            "genai" in source or
            "LLMProvider" in source or
            "ChatGoogleGenerativeAI" in source
        )
        checks["uses_google_ai"] = uses_google_ai
        
        # Verificar que usa GOOGLE_API_KEY
        checks["uses_google_key"] = "GOOGLE_API_KEY" in source
        
        print(f"  Archivo existe: ‚úÖ")
        print(f"  Usa Gemini/Google: {'‚úÖ' if checks['uses_gemini'] else '‚ùå'}")
        print(f"  Sin OpenAI chat directo: {'‚úÖ' if checks['no_openai_chat'] else '‚ùå'}")
        print(f"  Usa Google AI (genai/LLMProvider/LangChain): {'‚úÖ' if checks['uses_google_ai'] else '‚ùå'}")
        print(f"  Referencia GOOGLE_API_KEY: {'‚úÖ' if checks['uses_google_key'] else '‚ùå'}")
    else:
        checks["file_exists"] = False
        print(f"  ‚ùå Archivo no encontrado: {agent_path}")

except Exception as e:
    checks["error"] = False
    print(f"  ‚ùå Error: {e}")

all_passed = all(checks.values())
log_result("0.6", "InsightReportingAgent migrado a Gemini", all_passed,
           f"{sum(checks.values())}/{len(checks)} checks pasaron")

  Archivo existe: ‚úÖ
  Usa Gemini/Google: ‚úÖ
  Sin OpenAI chat directo: ‚úÖ
  Usa Google AI (genai/LLMProvider/LangChain): ‚úÖ
  Referencia GOOGLE_API_KEY: ‚úÖ
‚úÖ PASS | 0.6: InsightReportingAgent migrado a Gemini
       ‚Üí 5/5 checks pasaron


---
## Test 0.7 ‚Äî AgentSystemConfig actualizado

**Objetivo:** Verificar que la configuraci√≥n del sistema reconoce las nuevas API keys y providers.

In [11]:
# Test 0.7: AgentSystemConfig con nuevas keys

checks = {}

try:
    from app.core.config import settings
    
    # Verificar que tiene campo GOOGLE_API_KEY
    checks["has_google_key_field"] = hasattr(settings, 'GOOGLE_API_KEY')
    checks["has_anthropic_key_field"] = hasattr(settings, 'ANTHROPIC_API_KEY')
    checks["has_llm_provider"] = hasattr(settings, 'LLM_PROVIDER')
    
    # Verificar valores
    checks["google_key_set"] = settings.GOOGLE_API_KEY is not None and len(settings.GOOGLE_API_KEY) > 0
    checks["llm_provider_google"] = settings.LLM_PROVIDER == "google"
    
    print(f"  GOOGLE_API_KEY field: {'‚úÖ' if checks['has_google_key_field'] else '‚ùå'}")
    print(f"  GOOGLE_API_KEY set: {'‚úÖ' if checks['google_key_set'] else '‚ùå'}")
    print(f"  ANTHROPIC_API_KEY field: {'‚úÖ' if checks['has_anthropic_key_field'] else '‚ùå'}")
    print(f"  LLM_PROVIDER field: {'‚úÖ' if checks['has_llm_provider'] else '‚ùå'}")
    print(f"  LLM_PROVIDER = 'google': {'‚úÖ' if checks['llm_provider_google'] else '‚ùå'} (actual: {settings.LLM_PROVIDER})")

except Exception as e:
    checks["import"] = False
    print(f"  ‚ùå Error: {e}")

# Tambi√©n verificar agent_config si existe
try:
    from app.core.agent_config import AgentSystemConfig
    agent_config = AgentSystemConfig()
    
    source = inspect.getsource(AgentSystemConfig)
    checks["agent_config_google"] = "google" in source.lower() or "gemini" in source.lower()
    print(f"  AgentSystemConfig references Google: {'‚úÖ' if checks['agent_config_google'] else '‚ùå'}")
except ImportError:
    print(f"  ‚ÑπÔ∏è  AgentSystemConfig no disponible (OK si no se usa)")
except Exception as e:
    print(f"  ‚ö†Ô∏è  Error cargando AgentSystemConfig: {e}")

all_passed = all(checks.values())
log_result("0.7", "AgentSystemConfig actualizado", all_passed,
           f"{sum(checks.values())}/{len(checks)} checks pasaron")

  GOOGLE_API_KEY field: ‚úÖ
  GOOGLE_API_KEY set: ‚úÖ
  ANTHROPIC_API_KEY field: ‚úÖ
  LLM_PROVIDER field: ‚úÖ
  LLM_PROVIDER = 'google': ‚úÖ (actual: google)
  AgentSystemConfig references Google: ‚úÖ
‚úÖ PASS | 0.7: AgentSystemConfig actualizado
       ‚Üí 6/6 checks pasaron


---
## Test 0.8 ‚Äî LangChain migrado a langchain-google-genai

**Objetivo:** Verificar que las dependencias de LangChain usan el provider de Google.

In [12]:
# Test 0.8: LangChain con Google GenAI

checks = {}

# 0.8.a ‚Äî Verificar paquete langchain-google-genai
try:
    import langchain_google_genai
    checks["langchain_google_installed"] = True
    print(f"  langchain-google-genai: ‚úÖ (version: {getattr(langchain_google_genai, '__version__', 'unknown')})")
except ImportError:
    checks["langchain_google_installed"] = False
    print(f"  langchain-google-genai: ‚ùå No instalado")

# 0.8.b ‚Äî Verificar ChatGoogleGenerativeAI
try:
    from langchain_google_genai import ChatGoogleGenerativeAI
    checks["chat_class"] = True
    print(f"  ChatGoogleGenerativeAI: ‚úÖ")
except ImportError:
    checks["chat_class"] = False
    print(f"  ChatGoogleGenerativeAI: ‚ùå")

# 0.8.c ‚Äî Verificar GoogleGenerativeAIEmbeddings
try:
    from langchain_google_genai import GoogleGenerativeAIEmbeddings
    checks["embeddings_class"] = True
    print(f"  GoogleGenerativeAIEmbeddings: ‚úÖ")
except ImportError:
    checks["embeddings_class"] = False
    print(f"  GoogleGenerativeAIEmbeddings: ‚ùå")

# 0.8.d ‚Äî Verificar que langchain-openai NO es requerido
req_path = BACKEND_DIR / "requirements.txt"
if req_path.exists():
    reqs = req_path.read_text()
    checks["no_langchain_openai"] = "langchain-openai" not in reqs
    checks["has_langchain_google"] = "langchain-google-genai" in reqs
    print(f"  requirements.txt sin langchain-openai: {'‚úÖ' if checks['no_langchain_openai'] else '‚ùå'}")
    print(f"  requirements.txt con langchain-google-genai: {'‚úÖ' if checks['has_langchain_google'] else '‚ùå'}")

# 0.8.e ‚Äî Test funcional: instanciar ChatGoogleGenerativeAI
if checks.get("chat_class") and os.getenv("GOOGLE_API_KEY"):
    try:
        llm = ChatGoogleGenerativeAI(
            model="gemini-2.0-flash",
            google_api_key=os.getenv("GOOGLE_API_KEY"),
            temperature=0.1
        )
        checks["llm_instantiation"] = True
        print(f"  LLM instanciado: ‚úÖ")
    except Exception as e:
        checks["llm_instantiation"] = False
        print(f"  LLM instanciado: ‚ùå ({e})")

all_passed = all(checks.values())
log_result("0.8", "LangChain migrado a langchain-google-genai", all_passed,
           f"{sum(checks.values())}/{len(checks)} checks pasaron")

  langchain-google-genai: ‚úÖ (version: unknown)
  ChatGoogleGenerativeAI: ‚úÖ
  GoogleGenerativeAIEmbeddings: ‚úÖ
  requirements.txt sin langchain-openai: ‚úÖ
  requirements.txt con langchain-google-genai: ‚úÖ
  LLM instanciado: ‚úÖ
‚úÖ PASS | 0.8: LangChain migrado a langchain-google-genai
       ‚Üí 6/6 checks pasaron


---
## Test 0.9 ‚Äî Re-indexar ChromaDB con nuevos embeddings

**Objetivo:** Verificar que se puede crear una colecci√≥n nueva con embeddings de Google y que el script de reindexaci√≥n existe.

In [13]:
# Test 0.9: ChromaDB reindexaci√≥n

checks = {}

# 0.9.a ‚Äî Verificar script de reindexaci√≥n
reindex_script = Path("../scripts/reindex_google_embeddings.py").resolve()
checks["script_exists"] = reindex_script.exists()
print(f"  Script reindex_google_embeddings.py: {'‚úÖ existe' if checks['script_exists'] else '‚ùå no encontrado'}")

if checks["script_exists"]:
    source = reindex_script.read_text()
    checks["script_uses_google"] = "google" in source.lower() or "genai" in source.lower() or "gemini" in source.lower()
    print(f"  Script usa Google: {'‚úÖ' if checks['script_uses_google'] else '‚ùå'}")

# 0.9.b ‚Äî Test funcional: crear colecci√≥n, agregar docs, reset
async def test_chromadb_reindex():
    with tempfile.TemporaryDirectory() as tmpdir:
        svc = EmbeddingService(
            persist_directory=tmpdir,
            collection_name="test_reindex",
            embedding_provider="google"
        )
        
        if not svc.collection:
            return False, "ChromaDB no disponible"
        
        # Agregar un documento
        result = await svc.add_document(
            document_id="reindex_test_001",
            content="Resoluci√≥n del Ministerio de Salud sobre compra de insumos",
            chunk=False
        )
        
        count_before = svc.collection.count()
        print(f"  Docs antes del reset: {count_before}")
        
        # Reset
        svc.reset_collection()
        count_after = svc.collection.count()
        print(f"  Docs despu√©s del reset: {count_after}")
        
        # Verificar metadata de la nueva colecci√≥n
        meta = svc.collection.metadata
        print(f"  Collection metadata: {meta}")
        
        uses_google_model = "gemini" in str(meta).lower() or "google" in str(meta).lower()
        
        return count_before > 0 and count_after == 0 and uses_google_model, \
               f"before={count_before}, after={count_after}, google_meta={uses_google_model}"

passed, details = await test_chromadb_reindex()
checks["reindex_functional"] = passed
print(f"  Reset funcional: {'‚úÖ' if passed else '‚ùå'} ({details})")

all_passed = all(checks.values())
log_result("0.9", "ChromaDB reindexaci√≥n con Google embeddings", all_passed,
           f"{sum(checks.values())}/{len(checks)} checks pasaron")

  Script reindex_google_embeddings.py: ‚úÖ existe
  Script usa Google: ‚úÖ
  Docs antes del reset: 1
  Docs despu√©s del reset: 0
  Collection metadata: {'description': 'Watcher Agent - Google models/gemini-embedding-001', 'model': 'models/gemini-embedding-001', 'dimensions': 3072}
  Reset funcional: ‚úÖ (before=1, after=0, google_meta=True)
‚úÖ PASS | 0.9: ChromaDB reindexaci√≥n con Google embeddings
       ‚Üí 3/3 checks pasaron


---
## Test 0.10 ‚Äî Anthropic como provider opcional

**Objetivo:** Verificar que Anthropic est√° disponible como alternativa al provider principal.

In [14]:
# Test 0.10: Anthropic como provider opcional

checks = {}

# 0.10.a ‚Äî Verificar que anthropic est√° instalado
try:
    import anthropic
    checks["anthropic_installed"] = True
    print(f"  anthropic SDK: ‚úÖ (version: {getattr(anthropic, '__version__', 'unknown')})")
except ImportError:
    checks["anthropic_installed"] = False
    print(f"  anthropic SDK: ‚ùå No instalado")
    print(f"  üí° Instalar con: pip install anthropic")
    print(f"  üí° Instalar con: pip install anthropic")

# 0.10.b ‚Äî LLMProvider soporta Anthropic
try:
    from app.services.llm_provider import (
        LLMProviderType, 
        LLMProviderFactory, 
        AnthropicProvider,
        GoogleGeminiProvider
    )
    
    checks["anthropic_type"] = LLMProviderType.ANTHROPIC == "anthropic"
    checks["google_type"] = LLMProviderType.GOOGLE == "google"
    checks["anthropic_class"] = AnthropicProvider is not None
    checks["google_class"] = GoogleGeminiProvider is not None
    
    print(f"  LLMProviderType.ANTHROPIC: {'‚úÖ' if checks['anthropic_type'] else '‚ùå'}")
    print(f"  LLMProviderType.GOOGLE: {'‚úÖ' if checks['google_type'] else '‚ùå'}")
    print(f"  AnthropicProvider class: {'‚úÖ' if checks['anthropic_class'] else '‚ùå'}")
    print(f"  GoogleGeminiProvider class: {'‚úÖ' if checks['google_class'] else '‚ùå'}")
    
except ImportError as e:
    checks["llm_provider_import"] = False
    print(f"  ‚ùå Error importando LLMProvider: {e}")

# 0.10.c ‚Äî Factory crea Google por defecto
try:
    provider = LLMProviderFactory.create_from_env()
    checks["default_is_google"] = isinstance(provider, GoogleGeminiProvider)
    print(f"  Default provider es Google: {'‚úÖ' if checks['default_is_google'] else '‚ùå'} ({type(provider).__name__})")
except Exception as e:
    checks["default_provider"] = False
    print(f"  ‚ùå Error creando provider por defecto: {e}")

# 0.10.d ‚Äî requirements.txt incluye anthropic
req_path = BACKEND_DIR / "requirements.txt"
if req_path.exists():
    reqs = req_path.read_text()
    checks["anthropic_in_reqs"] = "anthropic" in reqs
    print(f"  anthropic en requirements.txt: {'‚úÖ' if checks['anthropic_in_reqs'] else '‚ùå'}")

# 0.10.e ‚Äî Anthropic se puede instanciar (si hay API key)
anthropic_key = os.getenv("ANTHROPIC_API_KEY")
if anthropic_key and checks.get("anthropic_installed"):
    try:
        anthropic_provider = LLMProviderFactory.create_provider(LLMProviderType.ANTHROPIC)
        checks["anthropic_instantiation"] = True
        print(f"  AnthropicProvider instanciado: ‚úÖ")
    except Exception as e:
        checks["anthropic_instantiation"] = False
        print(f"  AnthropicProvider instanciado: ‚ùå ({e})")
else:
    print(f"  ‚ÑπÔ∏è  ANTHROPIC_API_KEY no configurada ‚Äî skip instanciaci√≥n (esperado para provider opcional)")

all_passed = all(checks.values())
log_result("0.10", "Anthropic como provider opcional", all_passed,
           f"{sum(checks.values())}/{len(checks)} checks pasaron")

  anthropic SDK: ‚úÖ (version: 0.79.0)
  LLMProviderType.ANTHROPIC: ‚úÖ
  LLMProviderType.GOOGLE: ‚úÖ
  AnthropicProvider class: ‚úÖ
  GoogleGeminiProvider class: ‚úÖ
  Default provider es Google: ‚úÖ (GoogleGeminiProvider)
  anthropic en requirements.txt: ‚úÖ
  ‚ÑπÔ∏è  ANTHROPIC_API_KEY no configurada ‚Äî skip instanciaci√≥n (esperado para provider opcional)
‚úÖ PASS | 0.10: Anthropic como provider opcional
       ‚Üí 7/7 checks pasaron


---
## Resumen de resultados ‚Äî √âpica 0

In [15]:
# ‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê
# RESUMEN FINAL DE √âPICA 0
# ‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê

print("=" * 70)
print("  RESUMEN DE RESULTADOS ‚Äî √âPICA 0: MIGRACI√ìN OPENAI ‚Üí GOOGLE GEMINI")
print("=" * 70)
print()

total = len(RESULTS)
passed = sum(1 for r in RESULTS.values() if r["passed"])
skipped = sum(1 for r in RESULTS.values() if r.get("skipped", False))
failed = total - passed

for ticket, result in sorted(RESULTS.items()):
    if result.get("skipped"):
        status = "‚è≠Ô∏è"
    elif result["passed"]:
        status = "‚úÖ"
    else:
        status = "‚ùå"
    print(f"  {status}  {ticket}: {result['name']}")
    if result["details"]:
        print(f"        {result['details']}")

print()
print("-" * 70)
summary_parts = [f"Total: {total} tests", f"Pasaron: {passed}"]
if skipped > 0:
    summary_parts.append(f"Skipped (rate limit): {skipped}")
if failed > 0:
    summary_parts.append(f"Fallaron: {failed}")
print(f"  {' | '.join(summary_parts)}")
print(f"  Tasa de √©xito: {passed/total*100:.1f}%" if total > 0 else "  Sin tests")
print("-" * 70)

if failed == 0 and skipped == 0:
    print("\n  üéâ √âPICA 0 COMPLETADA ‚Äî Todos los tests pasaron")
elif failed == 0 and skipped > 0:
    print(f"\n  ‚úÖ √âPICA 0 COMPLETADA ‚Äî {skipped} test(s) skipped por rate limit (c√≥digo correcto)")
else:
    print(f"\n  ‚ö†Ô∏è  {failed} test(s) fallaron ‚Äî revisar detalles arriba")

  RESUMEN DE RESULTADOS ‚Äî √âPICA 0: MIGRACI√ìN OPENAI ‚Üí GOOGLE GEMINI

  ‚úÖ  0.1: OpenAI key no presente en entorno
        OPENAI_API_KEY no configurada ‚Äî correcto
  ‚úÖ  0.10: Anthropic como provider opcional
        7/7 checks pasaron
  ‚úÖ  0.2: Google AI SDK instalado y funcional
        3/3 checks pasaron
  ‚úÖ  0.3: EmbeddingService migrado a Google
        6/6 checks pasaron
  ‚úÖ  0.3.1: Generar embedding real con Google API
        dim=3072, float=True, non_zero=True
  ‚úÖ  0.3.2: Agregar documento + b√∫squeda sem√°ntica ChromaDB
        add=True, search_results=1
  ‚úÖ  0.4: DocumentProcessor usa embeddings Google
        5/5 checks pasaron
  ‚úÖ  0.5: WatcherService migrado a Gemini
        5/5 checks pasaron
  ‚è≠Ô∏è  0.5.1: An√°lisis de fragmento con Gemini
        Rate limit alcanzado (quota free tier agotada). El c√≥digo es correcto, reintentar ma√±ana.
  ‚úÖ  0.6: InsightReportingAgent migrado a Gemini
        5/5 checks pasaron
  ‚úÖ  0.7: AgentSystemConfig act

---
## Notas y observaciones

### Decisiones de arquitectura
- **Modelo de embeddings:** `gemini-embedding-001` (3072 dims) reemplaza `text-embedding-3-small` (1536 dims)
- **Modelo de chat:** `gemini-2.0-flash` reemplaza `gpt-3.5-turbo`
- **Provider alternativo:** Anthropic `claude-3-5-sonnet` disponible via `LLMProviderFactory`
- **ChromaDB:** Usa `GoogleEmbeddingFunction` como embedding function nativa

### Hallazgos durante testing
- `gemini-2.0-flash-exp` fue deprecado por Google ‚Üí actualizado a `gemini-2.0-flash` (GA)
- `google.generativeai` package est√° deprecado ‚Üí migrar a `google.genai` (√âpica 7 - Deuda T√©cnica)
- `DocumentProcessor` usa `tiktoken` (OpenAI tokenizer) para chunking ‚Üí considerar migrar a tokenizer gen√©rico
- Python 3.9 est√° EOL, Google muestra FutureWarnings ‚Üí planificar upgrade a 3.11+

### Consideraciones
- Los embeddings de Google tienen mayor dimensionalidad (3072 vs 1536), lo que implica m√°s almacenamiento pero potencialmente mejor precisi√≥n
- La re-indexaci√≥n completa de ChromaDB es necesaria al cambiar de modelo de embeddings
- `gemini-2.0-flash` tiene l√≠mites generosos de rate limiting (1M tokens/min)

### Pr√≥ximos pasos
- √âpica 1: Ingesta de boletines
- √âpica 2: Extracci√≥n de entidades
- √âpica 3: Feature Engineering