# ü§ñ OKLA ‚Äî Fine-Tuning Llama 3 8B con QLoRA (Dual-Mode v2.0)

> **FASE 3: Entrenamiento** | Proyecto OKLA ‚Äî Marketplace de Veh√≠culos RD

### üîå Ejecuci√≥n desde VS Code + Plugin Google Colab

Este notebook se ejecuta desde **VS Code** conectado a un runtime de **Google Colab** con GPU.

| Paso | Acci√≥n | Detalle |
|------|--------|---------|
| 1Ô∏è‚É£ | **Subir dataset a Drive** | Ejecuta `python3 upload_to_drive.py` o sube manualmente a `Drive > OKLA > dataset/` |
| 2Ô∏è‚É£ | **Abrir este notebook** | Abre `okla_finetune_llama3.ipynb` en VS Code |
| 3Ô∏è‚É£ | **Conectar a Colab** | Click **Select Kernel** (arriba derecha) ‚Üí **Colab** ‚Üí **New Colab Server** |
| 4Ô∏è‚É£ | **Seleccionar GPU** | Elige runtime con **GPU T4** (gratis) o **A100** (Colab Pro) |
| 5Ô∏è‚É£ | **Montar Google Drive** | `Cmd+Shift+P` ‚Üí **`Colab: Mount Google Drive to Server`** |
| 6Ô∏è‚É£ | **HuggingFace Token** | Necesitas acceso a `unsloth/Meta-Llama-3.1-8B-Instruct` (mirror libre, sin aprobaci√≥n) |
| 7Ô∏è‚É£ | **Ejecutar celdas** | Ejecuta secuencialmente (`Shift+Enter`) ‚Äî ‚è±Ô∏è ~2-3 horas total |

### ‚ö†Ô∏è Importante ‚Äî Runtime Remoto

- **El c√≥digo se ejecuta en el servidor de Colab** (GPU remota), NO en tu Mac
- **Los archivos locales NO existen** en el runtime ‚Äî usa Google Drive como puente
- **No cierres VS Code** durante el entrenamiento (~45-120 min en GPU T4)
- Si la sesi√≥n se desconecta, los checkpoints estar√°n en Google Drive

### üìÅ Estructura de Drive Esperada

```
Google Drive/
‚îî‚îÄ‚îÄ OKLA/
    ‚îú‚îÄ‚îÄ dataset/                    ‚Üê JSONL de FASE 2 (subidos por ti)
    ‚îÇ   ‚îú‚îÄ‚îÄ okla_train.jsonl        (~2,550 conversaciones)
    ‚îÇ   ‚îú‚îÄ‚îÄ okla_eval.jsonl         (~300 conversaciones)
    ‚îÇ   ‚îî‚îÄ‚îÄ okla_test.jsonl         (~150 conversaciones)
    ‚îî‚îÄ‚îÄ models/                     ‚Üê Output de este notebook (auto)
        ‚îú‚îÄ‚îÄ okla-llama3-adapter/    (~50-100 MB)
        ‚îî‚îÄ‚îÄ okla-llama3-8b-q4_k_m.gguf  (~4.7 GB)
```

### üìä Resumen del Entrenamiento (v2.0 Dual-Mode)

| Par√°metro | Valor |
|-----------|-------|
| Modelo base | Llama 3.1 8B Instruct (unsloth mirror) |
| T√©cnica | QLoRA (4-bit NF4) |
| Dataset | ~3,000 conversaciones OKLA **dual-mode** |
| Modos | 40% SingleVehicle / 50% DealerInventory / 10% Edge |
| Intents SV | 21 (Greeting, VehiclePrice, VehicleDetails, etc.) |
| Intents DI | 23 (VehicleSearch, VehicleComparison, CrossDealerRefusal, etc.) |
| Epochs | 3 |
| Batch size | 8 √ó 4 grad accum = 32 efectivo |
| GPU m√≠nima | T4 16GB (Colab Free) |
| Tiempo estimado | 45-120 min (T4) |
| Output | GGUF Q4_K_M (~4.7 GB) |

### üîß Par√°metros de Inferencia Post-Entrenamiento

| Par√°metro | Valor | Raz√≥n |
|-----------|-------|-------|
| Temperature | **0.3** | Baja para minimizar alucinaciones y forzar JSON |
| Repetition Penalty | **1.15** | Evita loops y repetici√≥n de datos inventados |
| Max Tokens | **600** | JSON responses ~150-400 tokens, DI mode necesita m√°s |
| N_CTX | **8192** | Sistema dual-mode: SV ~2200 tokens, DI ~3300 tokens + historial |

### üèóÔ∏è Arquitectura Dual-Mode

| Modo | System Prompt | Contexto | Funciones |
|------|---------------|----------|-----------|
| **SingleVehicle** | UN veh√≠culo fijo | ~500 tokens | Ninguna |
| **DealerInventory** | Inventario completo del dealer | ~1500 tokens | search, compare, details, appointment |
| **General** | FAQ marketplace | ~200 tokens | Ninguna |

> ‚ö†Ô∏è **REGLA FUNDAMENTAL**: Todo opera dentro de UN SOLO dealer.
> SingleVehicle rechaza preguntas sobre otros veh√≠culos.
> DealerInventory rechaza comparaciones con otros dealers.

In [2]:
# ============================================================
# 0. SETUP INICIAL ‚Äî VS Code + Colab Plugin
# ============================================================
# Esta celda configura el entorno de Colab desde VS Code.
# Ejecuta PRIMERO esta celda despu√©s de conectar al kernel.
# ============================================================
import sys
import os

# Verificar que estamos en Colab
IN_COLAB = 'google.colab' in sys.modules
if not IN_COLAB:
    try:
        import google.colab
        IN_COLAB = True
    except ImportError:
        pass

if not IN_COLAB:
    print("‚ùå ERROR: Este notebook requiere Google Colab")
    print()
    print("   Pasos para conectar desde VS Code:")
    print("   1. Click en 'Select Kernel' (esquina superior derecha)")
    print("   2. Selecciona 'Colab'")
    print("   3. Click 'New Colab Server'")
    print("   4. Elige runtime con GPU T4")
    print("   5. Re-ejecuta esta celda")
    raise RuntimeError("Conecta a Colab primero")

print("‚úÖ Conectado a Google Colab desde VS Code")

# ‚îÄ‚îÄ Configurar Google Drive ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
# Monta Drive autom√°ticamente si no est√° montado
from pathlib import Path
DRIVE_MOUNT = Path("/content/drive")

if not DRIVE_MOUNT.exists() or not (DRIVE_MOUNT / "MyDrive").exists():
    print("üìÇ Montando Google Drive...")
    print("   (Se abrir√° una ventana de autorizaci√≥n en el navegador)")
    from google.colab import drive
    drive.mount("/content/drive")
else:
    print("‚úÖ Google Drive ya est√° montado")

# Verificar que el dataset est√° en Drive
DRIVE_DATASET = Path("/content/drive/MyDrive/OKLA/dataset")
dataset_ready = True

if DRIVE_DATASET.exists():
    files_found = list(DRIVE_DATASET.glob("*.jsonl"))
    if files_found:
        print(f"\nüìä Dataset encontrado en Drive:")
        for f in sorted(files_found):
            lines = sum(1 for _ in open(f))
            size_mb = f.stat().st_size / 1024 / 1024
            print(f"   ‚úÖ {f.name}: {lines} conv. ({size_mb:.1f} MB)")
    else:
        dataset_ready = False
else:
    dataset_ready = False

if not dataset_ready:
    print(f"\n‚ö†Ô∏è  Dataset NO encontrado en: {DRIVE_DATASET}")
    print()
    print("   Sube los archivos JSONL a Google Drive:")
    print("   1. Ve a drive.google.com")
    print("   2. Crea carpeta: OKLA > dataset")
    print("   3. Sube estos archivos desde tu workspace local:")
    print("      docs/chatbot-llm/FASE_2_DATASET/output/okla_train.jsonl")
    print("      docs/chatbot-llm/FASE_2_DATASET/output/okla_eval.jsonl")
    print("      docs/chatbot-llm/FASE_2_DATASET/output/okla_test.jsonl")
    print()
    print("   O ejecuta en terminal local:")
    print("   python3 docs/chatbot-llm/FASE_3_TRAINING/upload_to_drive.py")
    print()
    print("   Despu√©s de subir, re-ejecuta esta celda.")

print()
print("=" * 60)
print("üöÄ Setup completo. Ejecuta las celdas siguientes en orden.")
print("=" * 60)

‚úÖ Conectado a Google Colab desde VS Code
üìÇ Montando Google Drive...
   (Se abrir√° una ventana de autorizaci√≥n en el navegador)
Mounted at /content/drive

üìä Dataset encontrado en Drive:
   ‚úÖ okla_eval.jsonl: 298 conv. (1.6 MB)
   ‚úÖ okla_test.jsonl: 300 conv. (1.6 MB)
   ‚úÖ okla_train.jsonl: 2391 conv. (12.8 MB)

üöÄ Setup completo. Ejecuta las celdas siguientes en orden.


In [3]:
import os
from pathlib import Path

drive_path = Path('/content/drive/MyDrive')
okla_path = drive_path / 'OKLA'

if okla_path.exists():
    print(f"‚úÖ Carpeta OKLA encontrada en: {okla_path}")
    print("\nContenido de la carpeta OKLA:")
    # Listar subcarpetas para confirmar estructura
    for item in sorted(okla_path.iterdir()):
        status = "üìÅ" if item.is_dir() else "üìÑ"
        print(f"{status} {item.name}")

        if item.name == 'dataset' and item.is_dir():
            print("   ‚îî‚îÄ Archivos en dataset:", [f.name for f in item.glob('*.jsonl')])
else:
    print(f"‚ùå No se encontr√≥ la carpeta OKLA en la ra√≠z: {okla_path}")
    print("Revisa si el nombre tiene espacios o caracteres especiales.")

‚úÖ Carpeta OKLA encontrada en: /content/drive/MyDrive/OKLA

Contenido de la carpeta OKLA:
üìÑ CataÃÅlogo de Tutoriales PraÃÅcticos de CI CD.gdoc
üìÑ ESPECIFICACIOÃÅN DE REQUERIMIENTOS DEL SOFTWARE (SRS) - ¬´OptimizacioÃÅn Operativa¬ª .gdoc
üìÑ ESPECIFICACIOÃÅN DE REQUERIMIENTOS DEL SOFTWARE ‚Äì SRS.gdoc
üìÅ Jira
üìÑ SRS -  AplicacioÃÅn moÃÅvil para clientes (iOS & Android).gdoc
üìÑ SRS - Business Intelligence Avanzado.gdoc
üìÑ SRS - MoÃÅdulo ‚ÄúChatbot IA‚Äù.gdoc
üìÑ SRS - Sistema de RecomendacioÃÅn en Tiempo Real.gdoc
üìÑ TUT-003: SincronizacioÃÅn AutomaÃÅtica de Estado (Confluence ‚Üí Jira).gdoc
üìÅ Web
üìÅ chatbot-llm
üìÅ dataset
   ‚îî‚îÄ Archivos en dataset: ['okla_eval.jsonl', 'okla_test.jsonl', 'okla_train.jsonl']
üìÅ models


---
## 1Ô∏è‚É£ Verificar Entorno y GPU

Esta celda confirma que tienes GPU disponible. **Si no tienes GPU, detente aqu√≠** y cambia el runtime:  
`Runtime ‚Üí Change runtime type ‚Üí T4 GPU`

In [4]:
# ============================================================
# 1. VERIFICAR GPU Y ENTORNO
# ============================================================
# Esta celda verifica que est√°s conectado a un runtime de Colab
# con GPU disponible. Si usas el plugin VS Code Colab, el runtime
# ya deber√≠a estar activo.
# ============================================================
import torch
import subprocess
import sys
import os

print("=" * 60)
print("üîç OKLA Fine-Tuning ‚Äî Verificaci√≥n de Entorno")
print("=" * 60)

# Check GPU
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    # Fixed: changed total_mem to total_memory
    gpu_mem = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"‚úÖ GPU detectada: {gpu_name}")
    print(f"   VRAM: {gpu_mem:.1f} GB")
    print(f"   CUDA: {torch.version.cuda}")
    print(f"   PyTorch: {torch.__version__}")
else:
    print("‚ùå NO se detect√≥ GPU!")
    print("   Ve a: Runtime ‚Üí Change runtime type ‚Üí T4 GPU")
    print("   Luego re-ejecuta esta celda.")
    raise RuntimeError("GPU requerida para fine-tuning")

# Check RAM
import psutil
ram_gb = psutil.virtual_memory().total / 1024**3
print(f"\nüíæ RAM del sistema: {ram_gb:.1f} GB")

# Check disk
disk = psutil.disk_usage('/')
print(f"üíø Disco disponible: {disk.free / 1024**3:.1f} GB")

print("\n" + "=" * 60)
print("‚úÖ Entorno listo para fine-tuning")
print("=" * 60)

üîç OKLA Fine-Tuning ‚Äî Verificaci√≥n de Entorno
‚úÖ GPU detectada: NVIDIA A100-SXM4-40GB
   VRAM: 39.5 GB
   CUDA: 12.8
   PyTorch: 2.9.0+cu128

üíæ RAM del sistema: 83.5 GB
üíø Disco disponible: 194.1 GB

‚úÖ Entorno listo para fine-tuning


---
## 2Ô∏è‚É£ Instalar Dependencias

Instala todas las librer√≠as necesarias para QLoRA fine-tuning.

In [5]:
# ============================================================
# 2. INSTALAR DEPENDENCIAS
# ============================================================
# Esto toma ~3-5 minutos en Colab Free

# ‚îÄ‚îÄ Dependencias principales ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
!pip install -q \
    "transformers>=4.43.0" \
    "datasets>=2.20.0" \
    "accelerate>=0.33.0" \
    "peft>=0.12.0" \
    "bitsandbytes>=0.46.1" \
    "trl>=0.9.0" \
    scipy \
    sentencepiece \
    protobuf \
    einops \
    huggingface_hub \
    wandb

# ‚îÄ‚îÄ flash-attn se instala por separado (puede fallar en T4) ‚îÄ
FLASH_ATTN_AVAILABLE = False
try:
    import flash_attn
    FLASH_ATTN_AVAILABLE = True
    print("‚úÖ flash-attn ya est√° instalado")
except ImportError:
    print("‚ö° Intentando instalar flash-attn (puede tomar 5-10 min)...")
    import subprocess
    result = subprocess.run(
        ["pip", "install", "-q", "flash-attn", "--no-build-isolation"],
        capture_output=True, text=True
    )
    if result.returncode == 0:
        FLASH_ATTN_AVAILABLE = True
        print("   ‚úÖ flash-attn instalado correctamente")
    else:
        print("   ‚ö†Ô∏è flash-attn no pudo instalarse (normal en T4, raro en A100)")
        print("   ‚Üí Se usar√° 'eager' attention (funciona igual, un poco m√°s lento)")

print(f"\n{'‚úÖ' if FLASH_ATTN_AVAILABLE else '‚ö†Ô∏è'} flash_attention_2: {'disponible' if FLASH_ATTN_AVAILABLE else 'no disponible (usando eager)'}")
print()

# Verify imports
import transformers
import datasets
import peft
import trl
import bitsandbytes

print(f"‚úÖ Dependencias instaladas:")
print(f"   transformers: {transformers.__version__}")
print(f"   datasets:     {datasets.__version__}")
print(f"   peft:         {peft.__version__}")
print(f"   trl:          {trl.__version__}")
print(f"   bitsandbytes: {bitsandbytes.__version__}")

[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m59.1/59.1 MB[0m [31m46.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m540.5/540.5 kB[0m [31m51.7 MB/s[0m eta [36m0:00:00[0m
[?25h‚ö° Intentando instalar flash-attn (puede tomar 5-10 min)...
   ‚úÖ flash-attn instalado correctamente

‚úÖ flash_attention_2: disponible

‚úÖ Dependencias instaladas:
   transformers: 5.0.0
   datasets:     4.0.0
   peft:         0.18.1
   trl:          0.28.0
   bitsandbytes: 0.49.1


---
## 3Ô∏è‚É£ Cargar Dataset desde Google Drive

Los archivos JSONL de FASE 2 se cargan desde **Google Drive** (montado en el runtime de Colab).

### Preparaci√≥n (una sola vez):
1. En VS Code: `Cmd+Shift+P` ‚Üí **`Colab: Mount Google Drive to Server`**
2. O sube manualmente los 3 JSONL a `Google Drive > OKLA > dataset/`

> üí° **VS Code Colab Plugin:** No uses `files.upload()` ‚Äî los archivos locales no existen en el runtime.
> Google Drive es el m√©todo recomendado para transferir datos entre tu m√°quina y Colab.

In [6]:
# ============================================================
# 3. CARGAR DATASET DESDE GOOGLE DRIVE
# ============================================================
# Flujo VS Code + Colab Plugin:
#   1. Cmd+Shift+P ‚Üí "Colab: Mount Google Drive to Server"
#   2. O ejecuta esta celda para montar manualmente
# ============================================================
import json
import shutil
from pathlib import Path

DATA_DIR = Path("/content/dataset")
DATA_DIR.mkdir(exist_ok=True)

# ‚îÄ‚îÄ Montar Google Drive (si no est√° montado) ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
DRIVE_MOUNT = Path("/content/drive")
if not DRIVE_MOUNT.exists():
    print("üìÇ Montando Google Drive...")
    print("   (Si usaste 'Colab: Mount Google Drive' desde VS Code, ya est√° montado)")
    from google.colab import drive
    drive.mount("/content/drive")
else:
    print("‚úÖ Google Drive ya est√° montado")

# ‚îÄ‚îÄ Copiar datos desde Drive ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
DRIVE_DATASET = Path("/content/drive/MyDrive/OKLA/dataset")

if DRIVE_DATASET.exists():
    for f in DRIVE_DATASET.glob("*.jsonl"):
        dest = DATA_DIR / f.name
        shutil.copy(f, dest)
        lines = sum(1 for _ in open(dest))
        print(f"   ‚úÖ {f.name} ‚Üí {dest} ({lines} conversaciones)")
else:
    print(f"‚ùå No se encontr√≥: {DRIVE_DATASET}")
    print()
    print("   Para subir los datos desde VS Code:")
    print("   1. Cmd+Shift+P ‚Üí 'Colab: Mount Google Drive to Server'")
    print("   2. Copia los JSONL a: Google Drive > OKLA > dataset/")
    print("      - okla_train.jsonl")
    print("      - okla_eval.jsonl")
    print("      - okla_test.jsonl")
    print()
    print("   üìÅ Los archivos est√°n en tu workspace local en:")
    print("   docs/chatbot-llm/FASE_2_DATASET/output/")
    raise FileNotFoundError("Dataset no encontrado en Google Drive")

# ‚îÄ‚îÄ Verificar ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
print(f"\nüìä Dataset en {DATA_DIR}:")
total = 0
for f in sorted(DATA_DIR.glob("*.jsonl")):
    lines = sum(1 for _ in open(f))
    total += lines
    print(f"   {f.name}: {lines:,} conversaciones")
print(f"   Total: {total:,} conversaciones")

‚úÖ Google Drive ya est√° montado
   ‚úÖ okla_eval.jsonl ‚Üí /content/dataset/okla_eval.jsonl (298 conversaciones)
   ‚úÖ okla_test.jsonl ‚Üí /content/dataset/okla_test.jsonl (300 conversaciones)
   ‚úÖ okla_train.jsonl ‚Üí /content/dataset/okla_train.jsonl (2391 conversaciones)

üìä Dataset en /content/dataset:
   okla_eval.jsonl: 298 conversaciones
   okla_test.jsonl: 300 conversaciones
   okla_train.jsonl: 2,391 conversaciones
   Total: 2,989 conversaciones


---
## 4Ô∏è‚É£ Cargar y Preparar Dataset

Convierte el JSONL al formato que espera `SFTTrainer` de `trl`.

In [7]:
# ============================================================
# 4. CARGAR Y PREPARAR DATASET
# ============================================================
from datasets import load_dataset, DatasetDict

# Cargar los JSONL
train_path = str(DATA_DIR / "okla_train.jsonl")
eval_path = str(DATA_DIR / "okla_eval.jsonl")
test_path = str(DATA_DIR / "okla_test.jsonl")

raw_train = load_dataset("json", data_files=train_path, split="train")
raw_eval = load_dataset("json", data_files=eval_path, split="train")
raw_test = load_dataset("json", data_files=test_path, split="train")

print(f"üìä Dataset cargado:")
print(f"   Train: {len(raw_train)} conversaciones")
print(f"   Eval:  {len(raw_eval)} conversaciones")
print(f"   Test:  {len(raw_test)} conversaciones")

# ‚îÄ‚îÄ Verificar estructura ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
sample = raw_train[0]
print(f"\nüîç Estructura de una conversaci√≥n:")
print(f"   Campos: {list(sample.keys())}")
msgs = sample["messages"]
print(f"   Mensajes: {len(msgs)}")
for m in msgs[:3]:
    role = m["role"]
    content_preview = m["content"][:80] + "..." if len(m["content"]) > 80 else m["content"]
    print(f"   [{role}] {content_preview}")

Generating train split: 0 examples [00:00, ? examples/s]

Generating train split: 0 examples [00:00, ? examples/s]

Generating train split: 0 examples [00:00, ? examples/s]

üìä Dataset cargado:
   Train: 2391 conversaciones
   Eval:  298 conversaciones
   Test:  300 conversaciones

üîç Estructura de una conversaci√≥n:
   Campos: ['messages']
   Mensajes: 9
   [system] Eres Luna, el asistente virtual de EcoDrive Autos, un concesionario de veh√≠culos...
   [user] Opciones
   [assistant] {"response": "¬°Hola! Soy Luna, tu asistente virtual de EcoDrive Autos. Puedo ayu...


In [8]:
# ============================================================
# 4b. FORMATEAR PARA LLAMA 3 CHAT TEMPLATE
# ============================================================
from transformers import AutoTokenizer
from huggingface_hub import login

MODEL_ID = "unsloth/Meta-Llama-3.1-8B-Instruct"

# ‚ö†Ô∏è Necesitas un token de HuggingFace (unsloth mirror ‚Äî sin aprobaci√≥n)
# Pre-requisitos:
#   1. Ve a https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct
#   (No necesitas aprobaci√≥n ‚Äî el mirror de unsloth es de acceso libre)
#   3. Crea un token en https://huggingface.co/settings/tokens
#
# OPCI√ìN A: Ingresa el token interactivamente
# OPCI√ìN B: Usa variable de entorno (m√°s seguro para VS Code)
#   export HF_TOKEN="hf_xxxxx" antes de conectar a Colab

import os
hf_token = os.environ.get("HF_TOKEN", None)

if hf_token:
    login(token=hf_token)
    print("‚úÖ Login con HF_TOKEN de variable de entorno")
else:
    print("üîë Ingresa tu token de HuggingFace:")
    print("   (Necesitas un token de HuggingFace (el mirror de unsloth NO requiere aprobaci√≥n)unsloth/Meta-Llama-3.1-8B-Instruct)")
    print("   Consigue tu token en: https://huggingface.co/settings/tokens")
    print()
    login()

# Cargar tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

print(f"\n‚úÖ Tokenizer cargado: {MODEL_ID}")
print(f"   Vocab size: {tokenizer.vocab_size:,}")
print(f"   Pad token: {tokenizer.pad_token}")

# ‚îÄ‚îÄ Aplicar chat template ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
def format_conversation(example):
    """Aplica el chat template de Llama 3 a cada conversaci√≥n."""
    text = tokenizer.apply_chat_template(
        example["messages"],
        tokenize=False,
        add_generation_prompt=False
    )
    return {"text": text}

train_dataset = raw_train.map(format_conversation, remove_columns=raw_train.column_names)
eval_dataset = raw_eval.map(format_conversation, remove_columns=raw_eval.column_names)

print(f"\n‚úÖ Dataset formateado con Llama 3.1 chat template")
print(f"   Train: {len(train_dataset)} ejemplos")
print(f"   Eval:  {len(eval_dataset)} ejemplos")

# ‚îÄ‚îÄ Verificar tokens ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
sample_text = train_dataset[0]["text"]
sample_tokens = tokenizer(sample_text, return_tensors="pt")
print(f"\nüìè Ejemplo tokenizado:")
print(f"   Longitud: {sample_tokens['input_ids'].shape[1]} tokens")
print(f"   Primeros 200 chars:")
print(f"   {sample_text[:200]}...")

üîë Ingresa tu token de HuggingFace:
   (Necesitas un token de HuggingFace (el mirror de unsloth NO requiere aprobaci√≥n)unsloth/Meta-Llama-3.1-8B-Instruct)
   Consigue tu token en: https://huggingface.co/settings/tokens



The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv‚Ä¶

config.json:   0%|          | 0.00/956 [00:00<?, ?B/s]



tokenizer_config.json: 0.00B [00:00, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/454 [00:00<?, ?B/s]


‚úÖ Tokenizer cargado: unsloth/Meta-Llama-3.1-8B-Instruct
   Vocab size: 128,000
   Pad token: <|eot_id|>


Map:   0%|          | 0/2391 [00:00<?, ? examples/s]

Map:   0%|          | 0/298 [00:00<?, ? examples/s]


‚úÖ Dataset formateado con Llama 3.1 chat template
   Train: 2391 ejemplos
   Eval:  298 ejemplos

üìè Ejemplo tokenizado:
   Longitud: 2033 tokens
   Primeros 200 chars:
   <|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

Eres Luna, el asistente virtual de EcoDrive Autos, un concesionario de veh√≠c...


In [9]:
# ============================================================
# 4c. AN√ÅLISIS DE LONGITUDES DEL DATASET (DUAL-MODE v2.0)
# ============================================================
# Context window: 8192 tokens
# SingleVehicle: ~2200 tokens (system) + conversation
# DealerInventory: ~3300 tokens (system + inventory) + conversation
# ============================================================
import numpy as np

def get_token_lengths(dataset):
    lengths = []
    for example in dataset:
        tokens = tokenizer(example["text"], return_tensors="pt", truncation=False)
        lengths.append(tokens["input_ids"].shape[1])
    return lengths

print("üìè Analizando longitudes de tokens (puede tomar 1-2 min)...")
train_lengths = get_token_lengths(train_dataset)

print(f"\nüìä Estad√≠sticas de longitud (train):")
print(f"   Min:    {min(train_lengths)} tokens")
print(f"   Max:    {max(train_lengths)} tokens")
print(f"   Media:  {np.mean(train_lengths):.0f} tokens")
print(f"   P50:    {np.percentile(train_lengths, 50):.0f} tokens")
print(f"   P90:    {np.percentile(train_lengths, 90):.0f} tokens")
print(f"   P95:    {np.percentile(train_lengths, 95):.0f} tokens")
print(f"   P99:    {np.percentile(train_lengths, 99):.0f} tokens")

# Dual-mode context: 8192 tokens total
# P95 clamp between 2048 and 8192
p95 = int(np.percentile(train_lengths, 95))
MAX_SEQ_LENGTH = min(max(p95 + 128, 2048), 8192)
print(f"\nüéØ max_seq_length seleccionado: {MAX_SEQ_LENGTH}")
print(f"   (P95={p95} + padding, clamped a [2048, 8192])")
print(f"   N_CTX producci√≥n: 8192 tokens")

truncated = sum(1 for l in train_lengths if l > MAX_SEQ_LENGTH)
print(f"   Conversaciones que ser√°n truncadas: {truncated} ({truncated/len(train_lengths)*100:.1f}%)")

# Mode distribution analysis
sv_count = 0
di_count = 0
sv_lengths = []
di_lengths = []
for example, length in zip(raw_train, train_lengths):
    sys_content = example["messages"][0]["content"] if example["messages"] else ""
    if "VEH√çCULO EN CONTEXTO" in sys_content or "veh√≠culo ESPEC√çFICO" in sys_content:
        sv_count += 1
        sv_lengths.append(length)
    elif "INVENTARIO DISPONIBLE" in sys_content or "inventario completo" in sys_content:
        di_count += 1
        di_lengths.append(length)

print(f"\nüì¶ Distribuci√≥n por modo:")
print(f"   SingleVehicle:   {sv_count} conv. (avg {np.mean(sv_lengths):.0f} tokens)" if sv_lengths else "   SingleVehicle: 0")
print(f"   DealerInventory: {di_count} conv. (avg {np.mean(di_lengths):.0f} tokens)" if di_lengths else "   DealerInventory: 0")

üìè Analizando longitudes de tokens (puede tomar 1-2 min)...

üìä Estad√≠sticas de longitud (train):
   Min:    1085 tokens
   Max:    3045 tokens
   Media:  1746 tokens
   P50:    1714 tokens
   P90:    2494 tokens
   P95:    2853 tokens
   P99:    2944 tokens

üéØ max_seq_length seleccionado: 2048
   (P95 + padding, clamped a [512, 2048])
   Conversaciones que ser√°n truncadas: 368 (15.4%)


---
## 4d. Validaci√≥n de Cobertura del Dataset (Dual-Mode)

Verifica que el dataset cubra los intents de **ambos modos**:

### SingleVehicle (21 intents)
| Categor√≠a | Intents |
|-----------|---------|
| Core | Greeting, VehiclePrice, VehicleDetails, FinancingInfo, TestDriveSchedule |
| Purchase | WarrantyInfo, TradeIn, CashPurchase, NegotiatePrice |
| Dealer | ContactRequest, DealerHours, DealerLocation, DocumentsRequired |
| Vehicle | VehicleHistory, VehicleNotInInventory |
| Edge | LegalRefusal, Farewell, Fallback, OutOfScope, FrustratedUser, RequestHumanAgent |

### DealerInventory (23 intents) ‚Äî hereda de SV + agrega:
| Categor√≠a | Intents adicionales |
|-----------|---------------------|
| Inventory | VehicleSearch, VehicleComparison |
| Boundary | CrossDealerRefusal |

In [10]:
# ============================================================
# 4d. VALIDACI√ìN DE COBERTURA POR MODO / INTENT (DUAL-MODE v2.0)
# ============================================================
from collections import Counter, defaultdict
import json

# ‚îÄ‚îÄ Intents esperados por modo ‚îÄ‚îÄ
SV_EXPECTED_INTENTS = [
    "Greeting", "VehiclePrice", "VehicleDetails", "FinancingInfo",
    "TestDriveSchedule", "WarrantyInfo", "TradeIn", "CashPurchase",
    "NegotiatePrice", "VehicleNotInInventory", "ContactRequest",
    "DealerHours", "DealerLocation", "DocumentsRequired",
    "VehicleHistory", "LegalRefusal", "Farewell", "Fallback",
    "OutOfScope", "FrustratedUser", "RequestHumanAgent",
]

DI_EXTRA_INTENTS = [
    "VehicleSearch", "VehicleComparison", "CrossDealerRefusal",
]

DI_EXPECTED_INTENTS = SV_EXPECTED_INTENTS + DI_EXTRA_INTENTS

# ‚îÄ‚îÄ Intent categories for reporting ‚îÄ‚îÄ
INTENT_CATEGORIES = {
    "Core_Communication": ["Greeting", "Farewell", "Fallback", "ContactRequest"],
    "Core_Vehicle": ["VehiclePrice", "VehicleDetails", "VehicleHistory",
                     "VehicleNotInInventory"],
    "Core_Purchase": ["FinancingInfo", "TestDriveSchedule", "WarrantyInfo",
                      "TradeIn", "CashPurchase", "NegotiatePrice"],
    "Core_Dealer": ["DealerHours", "DealerLocation", "DocumentsRequired"],
    "DI_Inventory": ["VehicleSearch", "VehicleComparison", "CrossDealerRefusal"],
    "Edge_Cases": ["LegalRefusal", "OutOfScope", "FrustratedUser",
                   "RequestHumanAgent"],
}

ALL_EXPECTED = set(SV_EXPECTED_INTENTS + DI_EXTRA_INTENTS)
print(f"üéØ Total intents esperados: {len(ALL_EXPECTED)} (21 SV + 3 DI-extra)")

def extract_intents_by_mode(dataset_raw):
    """Extract intents grouped by mode (SV vs DI)."""
    sv_intents = Counter()
    di_intents = Counter()
    parse_errors = 0

    for example in dataset_raw:
        messages = example["messages"]
        sys_content = messages[0]["content"] if messages else ""

        # Detect mode from system prompt
        is_sv = "VEH√çCULO EN CONTEXTO" in sys_content or "veh√≠culo ESPEC√çFICO" in sys_content
        is_di = "INVENTARIO DISPONIBLE" in sys_content or "inventario completo" in sys_content

        for msg in messages:
            if msg["role"] == "assistant":
                try:
                    parsed = json.loads(msg["content"])
                    intent = parsed.get("intent", "Unknown")
                    if is_sv:
                        sv_intents[intent] += 1
                    elif is_di:
                        di_intents[intent] += 1
                    else:
                        # Unknown mode ‚Äî count in both
                        sv_intents[intent] += 1
                except (json.JSONDecodeError, TypeError):
                    parse_errors += 1

    return sv_intents, di_intents, parse_errors

print("üîç Analizando cobertura dual-mode del dataset...")
print()

train_sv, train_di, train_errors = extract_intents_by_mode(raw_train)
eval_sv, eval_di, eval_errors = extract_intents_by_mode(raw_eval)

sv_intents = train_sv + eval_sv
di_intents = train_di + eval_di
all_intents = sv_intents + di_intents

print("=" * 70)
print("üìä COBERTURA DUAL-MODE DEL DATASET")
print("=" * 70)

# ‚îÄ‚îÄ SingleVehicle coverage ‚îÄ‚îÄ
print(f"\nüöó MODO: SingleVehicle ({sum(sv_intents.values())} respuestas)")
print(f"   {'Intent':30s} {'Count':>6s}  {'Bar'}")
print(f"   {'‚îÄ' * 50}")
sv_covered = 0
sv_missing = []
for intent in SV_EXPECTED_INTENTS:
    count = sv_intents.get(intent, 0)
    bar = "‚ñà" * min(count // 3, 25)
    flag = "‚úÖ" if count > 0 else "‚ö†Ô∏è"
    print(f"   {flag} {intent:28s} {count:5d}  {bar}")
    if count > 0:
        sv_covered += 1
    else:
        sv_missing.append(intent)

print(f"\n   Cobertura SV: {sv_covered}/{len(SV_EXPECTED_INTENTS)} intents")
if sv_missing:
    print(f"   ‚ö†Ô∏è Faltan en SV: {sv_missing}")

# ‚îÄ‚îÄ DealerInventory coverage ‚îÄ‚îÄ
print(f"\nüè™ MODO: DealerInventory ({sum(di_intents.values())} respuestas)")
print(f"   {'Intent':30s} {'Count':>6s}  {'Bar'}")
print(f"   {'‚îÄ' * 50}")
di_covered = 0
di_missing = []
for intent in DI_EXPECTED_INTENTS:
    count = di_intents.get(intent, 0)
    bar = "‚ñà" * min(count // 3, 25)
    flag = "‚úÖ" if count > 0 else "‚ö†Ô∏è"
    print(f"   {flag} {intent:28s} {count:5d}  {bar}")
    if count > 0:
        di_covered += 1
    else:
        di_missing.append(intent)

print(f"\n   Cobertura DI: {di_covered}/{len(DI_EXPECTED_INTENTS)} intents")
if di_missing:
    print(f"   ‚ö†Ô∏è Faltan en DI: {di_missing}")

# ‚îÄ‚îÄ Extra intents not in expected set ‚îÄ‚îÄ
all_seen = set(sv_intents.keys()) | set(di_intents.keys())
extra = all_seen - ALL_EXPECTED
if extra:
    print(f"\nüìé Intents adicionales (no esperados):")
    for intent in sorted(extra):
        print(f"   {intent}: SV={sv_intents.get(intent, 0)}, DI={di_intents.get(intent, 0)}")

# ‚îÄ‚îÄ Summary by category ‚îÄ‚îÄ
print(f"\n{'=' * 70}")
print("üìä RESUMEN POR CATEGOR√çA")
print(f"{'=' * 70}")
warnings = []
for category, expected in INTENT_CATEGORIES.items():
    present = [i for i in expected if all_intents.get(i, 0) > 0]
    absent = [i for i in expected if all_intents.get(i, 0) == 0]
    cat_total = sum(all_intents.get(i, 0) for i in expected)
    status = "‚úÖ" if not absent else "‚ö†Ô∏è" if present else "‚ùå"
    print(f"   {status} {category:20s}: {cat_total:5d} ejemplos ({len(present)}/{len(expected)} intents)")
    if absent:
        warnings.append(f"{category}: faltan {absent}")

total_covered = sv_covered + len([i for i in DI_EXTRA_INTENTS if di_intents.get(i, 0) > 0])
print(f"\nüìä TOTAL: {total_covered}/{len(ALL_EXPECTED)} intents cubiertos")
print(f"   Parse errors: {train_errors + eval_errors}")
if warnings:
    print(f"\n‚ö†Ô∏è ADVERTENCIAS:")
    for w in warnings:
        print(f"   {w}")
else:
    print(f"\n‚úÖ Dataset cubre TODOS los intents dual-mode")
print("=" * 70)

üîç Analizando cobertura del dataset vs. Prompts de FASE 1...

üìä COBERTURA DEL DATASET POR PROMPT DE FASE 1

‚ö†Ô∏è P01_SystemBase: 3477 ejemplos
      Greeting                       ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà (1999)
      Farewell                       ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà (1153)
      Help                           ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà (323)
      Fallback                        (2)
   ‚ö†Ô∏è Other                           (0)

‚ö†Ô∏è P02_Inventario: 2559 ejemplos
      VehicleSearch                  ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà (1241)
      VehicleDetails                 ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà (645)
   ‚ö†Ô∏è VehicleAvailability             (0)
      VehiclePrice                   ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà (673)
   ‚ö†Ô∏è VehicleFeatures                 (0)

‚ö†Ô

In [11]:
# ============================================================
# 4e. DISTRIBUCI√ìN VISUAL DE INTENTS
# ============================================================
# Gr√°fico de barras con la distribuci√≥n de intents en el dataset.
# √ötil para detectar desbalances que afecten el entrenamiento.
# ============================================================

print("üìä Distribuci√≥n de Intents en el Dataset de Entrenamiento")
print("=" * 65)

# Ordenar por frecuencia
sorted_intents = sorted(all_intents.items(), key=lambda x: -x[1])
max_count = max(all_intents.values()) if all_intents else 1

for intent, count in sorted_intents:
    bar_len = int(40 * count / max_count)
    bar = "‚ñà" * bar_len
    pct = 100 * count / sum(all_intents.values())
    print(f"   {intent:30s} {bar} {count:4d} ({pct:5.1f}%)")

print(f"\n   {'‚îÄ' * 60}")
print(f"   {'TOTAL':30s} {'':40s} {sum(all_intents.values()):4d}")

# ‚îÄ‚îÄ An√°lisis de balance ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
counts = list(all_intents.values())
if counts:
    avg = sum(counts) / len(counts)
    low_intents = [(i, c) for i, c in all_intents.items() if c < avg * 0.3]

    if low_intents:
        print(f"\n‚ö†Ô∏è Intents sub-representados (< 30% del promedio de {avg:.0f}):")
        for intent, count in sorted(low_intents, key=lambda x: x[1]):
            print(f"   ‚ö†Ô∏è {intent}: solo {count} ejemplos ‚Äî considerar augmentation")
    else:
        print(f"\n‚úÖ Distribuci√≥n razonablemente balanceada (promedio: {avg:.0f} por intent)")

üìä Distribuci√≥n de Intents en el Dataset de Entrenamiento
   Greeting                       ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà 1999 ( 22.9%)
   VehicleSearch                  ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà 1241 ( 14.2%)
   Farewell                       ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà 1153 ( 13.2%)
   FinancingInfo                  ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà  802 (  9.2%)
   VehiclePrice                   ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà  673 (  7.7%)
   VehicleDetails                 ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà  645 (  7.4%)
   DealerHours                    ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà  417 (  4.8%)
   Help                           ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà  323 (  3.7%)
   TestDriveSchedule              ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà  313 (  3.6%)
   DealerLocation                 ‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà  

---
## 5Ô∏è‚É£ Cargar Modelo Base (Llama 3 8B ‚Äî 4-bit)

Carga el modelo cuantizado en 4-bit para caber en la T4 (16GB VRAM).  
Usa `bitsandbytes` para NF4 quantization.

In [12]:
# ============================================================
# 5. CARGAR MODELO BASE (OPTIMIZADO PARA A100)
# ============================================================
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
import torch
import gc
import sys

# Definir ID del modelo
MODEL_ID = "unsloth/Meta-Llama-3.1-8B-Instruct"

# Limpieza de memoria
def clear_gpu_memory():
    gc.collect()
    torch.cuda.empty_cache()

clear_gpu_memory()

# ‚îÄ‚îÄ CONFIGURACI√ìN A100 EXTREMA ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
print("üöÄ CONFIGURACI√ìN A100 ACTIVADA:")

# Verificar Flash Attention con fallback seguro
try:
    import flash_attn
    ATTN_IMPL = "flash_attention_2"
    print("   ‚úÖ Flash Attention 2 disponible (Velocidad M√°xima)")
except ImportError:
    ATTN_IMPL = "sdpa"
    print("   ‚ö†Ô∏è Flash Attention no encontrado. Usando 'sdpa' (PyTorch nativo).")
    print("      (Para m√°xima velocidad en A100, aseg√∫rate de instalar flash-attn en la Celda 2)")

TORCH_DTYPE = torch.bfloat16
COMPUTE_DTYPE = torch.bfloat16
print("   ‚Ä¢ Precisi√≥n: BFloat16 (Nativa A100)")

# ‚îÄ‚îÄ Configuraci√≥n Quantizaci√≥n ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=COMPUTE_DTYPE,
    bnb_4bit_use_double_quant=True,
)

# Cargar modelo
print(f"\nüîÑ Cargando modelo: {MODEL_ID}...")
try:
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        quantization_config=bnb_config,
        device_map="auto",
        torch_dtype=TORCH_DTYPE,
        attn_implementation=ATTN_IMPL,
        low_cpu_mem_usage=True
    )

    model.config.use_cache = False

    gpu_mem_used = torch.cuda.memory_allocated() / 1024**3
    print(f"\n‚úÖ Modelo cargado exitosamente")
    print(f"   VRAM usada: {gpu_mem_used:.2f} GB")

except Exception as e:
    print(f"\n‚ùå ERROR CR√çTICO AL CARGAR MODELO: {e}")
    print("\nüîç POSIBLES CAUSAS:")
    print("   1. No se ejecut√≥ la Celda 2 (Dependencias) tras reiniciar el runtime.")
    print("   2. Problema de conexi√≥n con HuggingFace.")
    raise RuntimeError("Fall√≥ la carga del modelo. Revisa los logs arriba.") from e

üöÄ CONFIGURACI√ìN A100 ACTIVADA:
   ‚úÖ Flash Attention 2 disponible (Velocidad M√°xima)
   ‚Ä¢ Precisi√≥n: BFloat16 (Nativa A100)

üîÑ Cargando modelo: unsloth/Meta-Llama-3.1-8B-Instruct...


`torch_dtype` is deprecated! Use `dtype` instead!


model.safetensors.index.json: 0.00B [00:00, ?B/s]

Downloading (incomplete total...): 0.00B [00:00, ?B/s]

Fetching 4 files:   0%|          | 0/4 [00:00<?, ?it/s]

Loading weights:   0%|          | 0/291 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/239 [00:00<?, ?B/s]


‚úÖ Modelo cargado exitosamente
   VRAM usada: 5.31 GB


---
## 6Ô∏è‚É£ Configurar QLoRA (Adaptadores)

Configura los adaptadores LoRA que se entrenar√°n.  
Solo entrena ~1-2% de los par√°metros totales ‚Äî por eso cabe en T4.

In [13]:
# ============================================================
# 6. CONFIGURAR QLora ADAPTADORES (PROFESIONAL R=64)
# ============================================================
import torch
import gc
from peft import (
    LoraConfig,
    get_peft_model,
    prepare_model_for_kbit_training,
    TaskType,
)

# Verificar que el modelo existe
if 'model' not in globals():
    raise RuntimeError("‚ùå El modelo no est√° cargado. Por favor, ejecuta la Celda 5 primero.")

# Liberar memoria
gc.collect()
torch.cuda.empty_cache()

# Preparar modelo
model = prepare_model_for_kbit_training(model)

# ‚îÄ‚îÄ PAR√ÅMETROS PROFESIONALES (A100) ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
# Rank 64 = ~3.5% de par√°metros entrenables.
# Ideal para seguir instrucciones complejas (JSON) y l√≥gica de negocio.
# Dropout 0.10 = regularizaci√≥n reforzada para prevenir overfitting
#   con r=64 y ~2,400 ejemplos de entrenamiento.
LORA_R = 64
LORA_ALPHA = 128

print(f"üöÄ Configurando LoRA Profesional (Rank {LORA_R})")

lora_config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    lora_dropout=0.10,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

# Aplicar LoRA
model = get_peft_model(model, lora_config)

# Stats
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())
pct = 100 * trainable_params / total_params

print(f"‚úÖ Adaptadores configurados")
print(f"   Rango (r): {LORA_R} (Alta capacidad)")
print(f"   Dropout: 0.10 (Prevenci√≥n overfitting ‚Äî ajustado para r=64)")
print(f"   Entrenables: {trainable_params:,} ({pct:.2f}% del modelo)")
print(f"   Nota: Este % es ideal para chatbots corporativos robustos.")

üöÄ Configurando LoRA Profesional (Rank 64)
‚úÖ Adaptadores configurados
   Rango (r): 64 (Alta capacidad)
   Entrenables: 167,772,160 (3.56% del modelo)
   Nota: Este % es ideal para chatbots corporativos robustos.


---
## 7Ô∏è‚É£ Entrenar con SFTTrainer

Entrenamiento supervisado con `trl.SFTTrainer`.  
Configuraci√≥n optimizada para Colab Free T4:
- **Batch size 2** + gradient accumulation 8 = effective batch 16
- **3 epochs** ‚Äî suficiente para dataset de 2,400-3,000 ejemplos
- **FP16 mixed precision** ‚Äî T4 no soporta BF16
- **Gradient checkpointing** ‚Äî reduce VRAM ~40%

In [14]:
# ============================================================
# 7. ENTRENAR (DUAL-MODE v2.0 ‚Äî OPTIMIZADO PARA A100/T4)
# ============================================================
from transformers import TrainingArguments
from trl import SFTTrainer
import torch

# Verificar prerequisitos
missing_vars = []
if 'model' not in globals(): missing_vars.append("model (Celda 5)")
if 'tokenizer' not in globals(): missing_vars.append("tokenizer (Celda 4b)")
if 'train_dataset' not in globals(): missing_vars.append("train_dataset (Celda 4b)")
if 'MAX_SEQ_LENGTH' not in globals(): missing_vars.append("MAX_SEQ_LENGTH (Celda 4c)")

if missing_vars:
    raise RuntimeError(f"‚ùå Faltan componentes para entrenar: {', '.join(missing_vars)}. Por favor ejecuta las celdas anteriores.")

OUTPUT_DIR = "/content/okla-llama3-qlora"

# ‚îÄ‚îÄ Detect GPU type for optimal config ‚îÄ‚îÄ
gpu_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "Unknown"
is_a100 = "A100" in gpu_name
is_t4 = "T4" in gpu_name

if is_a100:
    BATCH_SIZE = 8
    GRAD_ACCUM = 4
    USE_BF16 = True
    USE_FP16 = False
    OPTIMIZER = "paged_adamw_8bit"
    GPU_MODE = "A100 (BF16)"
elif is_t4:
    BATCH_SIZE = 2
    GRAD_ACCUM = 8
    USE_BF16 = False
    USE_FP16 = True
    OPTIMIZER = "paged_adamw_8bit"
    GPU_MODE = "T4 (FP16)"
else:
    BATCH_SIZE = 2
    GRAD_ACCUM = 8
    USE_BF16 = False
    USE_FP16 = True
    OPTIMIZER = "paged_adamw_8bit"
    GPU_MODE = f"Generic ({gpu_name})"

print(f"üöÄ Training Config: {GPU_MODE}")
print(f"   ‚Ä¢ GPU: {gpu_name}")
print(f"   ‚Ä¢ Batch Size: {BATCH_SIZE}")
print(f"   ‚Ä¢ Accumulation: {GRAD_ACCUM}")
print(f"   ‚Ä¢ Effective Batch: {BATCH_SIZE * GRAD_ACCUM}")
print(f"   ‚Ä¢ Precision: {'BF16' if USE_BF16 else 'FP16'}")
print(f"   ‚Ä¢ Max Seq Length: {MAX_SEQ_LENGTH} (N_CTX prod: 8192)")
print(f"   ‚Ä¢ Dataset: Dual-mode (SingleVehicle + DealerInventory)")

training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    num_train_epochs=3,
    per_device_train_batch_size=BATCH_SIZE,
    gradient_accumulation_steps=GRAD_ACCUM,
    optim=OPTIMIZER,
    learning_rate=2e-4,
    weight_decay=0.01,
    warmup_steps=100,
    lr_scheduler_type="cosine",
    max_grad_norm=0.3,
    bf16=USE_BF16,
    fp16=USE_FP16,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    logging_steps=10,
    eval_strategy="steps",
    eval_steps=50,
    save_strategy="steps",
    save_steps=100,
    save_total_limit=3,
    load_best_model_at_end=True,
    report_to="tensorboard",
    seed=42,
    group_by_length=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    processing_class=tokenizer,
    max_seq_length=MAX_SEQ_LENGTH,
)

print(f"\n‚úÖ SFTTrainer listo (Dual-Mode v2.0)")
print(f"   max_seq_length: {MAX_SEQ_LENGTH}")
print(f"   report_to: tensorboard")
print(f"   Nota: DealerInventory conversations son m√°s largas que SingleVehicle.")

üöÄ Training Config: A100 BALANCED MODE
   ‚Ä¢ Batch Size: 8 (Seguro para VRAM)
   ‚Ä¢ Accumulation: 4
   ‚Ä¢ Effective Batch: 32
   ‚Ä¢ Precision: BF16


Adding EOS to train dataset:   0%|          | 0/2391 [00:00<?, ? examples/s]

Tokenizing train dataset:   0%|          | 0/2391 [00:00<?, ? examples/s]

Truncating train dataset:   0%|          | 0/2391 [00:00<?, ? examples/s]

Adding EOS to eval dataset:   0%|          | 0/298 [00:00<?, ? examples/s]

Tokenizing eval dataset:   0%|          | 0/298 [00:00<?, ? examples/s]

Truncating eval dataset:   0%|          | 0/298 [00:00<?, ? examples/s]

‚úÖ SFTTrainer listo (Configuraci√≥n segura de memoria)


In [15]:
# ============================================================
# 7b. EJECUTAR ENTRENAMIENTO (A100)
# ============================================================
import torch
import gc
import os

print("="*60)
print("‚ÄÑ‚ÄÑ INICIANDO ENTRENAMIENTO (Modo A100 Balanceado)")
print("="*60)

# Optimizaci√≥n de memoria sugerida por el error OOM
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Liberar memoria agresivamente
torch.cuda.empty_cache()
gc.collect()

# ¬°Entrenar!
try:
    train_result = trainer.train()

    # Resultados
    print("\n" + "="*60)
    print("‚úÖ ENTRENAMIENTO COMPLETADO")
    print("="*60)
    metrics = train_result.metrics
    trainer.log_metrics("train", metrics)
    trainer.save_metrics("train", metrics)

except Exception as e:
    print(f"\n‚ùå Error durante el entrenamiento: {e}")
    print("\nüí° Si persiste el error de memoria (OOM):")
    print("   1. Reinicia el entorno (Runtime > Restart Session)")
    print("   2. Ejecuta SOLO las celdas necesarias (2, 3, 4, 5, 6, 7)")

‚ÄÑ‚ÄÑ INICIANDO ENTRENAMIENTO (Modo A100 Balanceado)


The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 128009}.
Casting fp32 inputs back to torch.bfloat16 for flash-attn compatibility.


Step,Training Loss,Validation Loss
50,0.140705,0.120092
100,0.067355,0.067051


KeyboardInterrupt: 

---
## 8Ô∏è‚É£ Evaluaci√≥n del Modelo

Eval√∫a el modelo con el dataset de test y prueba con conversaciones reales.

In [16]:
# ============================================================
# 8. EVALUACI√ìN ‚Äî M√âTRICAS
# ============================================================
print("üìä Evaluando modelo en dataset de eval...")

eval_metrics = trainer.evaluate()

print(f"\nüìä M√©tricas de evaluaci√≥n:")
print(f"   Eval loss: {eval_metrics.get('eval_loss', 'N/A'):.4f}")
print(f"   Eval runtime: {eval_metrics.get('eval_runtime', 0):.1f}s")
print(f"   Perplexity: {np.exp(eval_metrics.get('eval_loss', 0)):.2f}")

trainer.log_metrics("eval", eval_metrics)
trainer.save_metrics("eval", eval_metrics)

üìä Evaluando modelo en dataset de eval...


Step,Training Loss,Validation Loss
50,0.140705,0.120092
100,0.067355,0.067051
144,0.064164,0.064594



üìä M√©tricas de evaluaci√≥n:
   Eval loss: 0.0646
   Eval runtime: 0.0s
   Perplexity: 1.07
***** eval metrics *****
  eval_loss = 0.0646


In [17]:
# ============================================================
# 8b. EVALUACI√ìN ‚Äî PRUEBAS INTERACTIVAS DUAL-MODE (v2.0)
# ============================================================
# Pruebas cubriendo ambos modos del chatbot:
#   - SingleVehicle (SV): Un veh√≠culo fijo en contexto
#   - DealerInventory (DI): Inventario completo del dealer
# Par√°metros de inferencia de producci√≥n:
#   temperature=0.3, repetition_penalty=1.15
# ============================================================

def generate_response(messages, max_new_tokens=600):
    """Genera una respuesta del modelo fine-tuned."""
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=MAX_SEQ_LENGTH)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.3,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            repetition_penalty=1.15,
        )

    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return response


# ‚îÄ‚îÄ System Prompts de prueba ‚îÄ‚îÄ

SV_TEST_SYSTEM = """Eres OKLA Bot, asistente virtual del marketplace de veh√≠culos OKLA en Rep√∫blica Dominicana.
Est√°s ayudando a un usuario con un veh√≠culo ESPEC√çFICO del dealer "Auto Dominicana Premium".
Hablas en espa√±ol dominicano amigable y profesional.

VEH√çCULO EN CONTEXTO:
- ID: v001
- 2024 Toyota RAV4 XLE Premium
- Precio: RD$2,850,000 üè∑Ô∏èOFERTA
- Combustible: Gasolina
- Transmisi√≥n: Autom√°tica
- Kilometraje: 12,000 km
- Color: Blanco Perla
- Tipo: SUV
- Ubicaci√≥n: Av. 27 de Febrero #456, Santo Domingo
- Dealer: Auto Dominicana Premium | Tel: 809-555-0101
- Horario: L-V 8:00-18:00, S√°b 9:00-14:00

REGLAS:
1. SOLO habla de ESTE veh√≠culo. No inventes otros.
2. Si el usuario pregunta por otro veh√≠culo, dile que solo puedes ayudar con este y sugi√©rele visitar el perfil del dealer.
3. Si no sabes algo del veh√≠culo, di "no tengo esa informaci√≥n" y ofrece conectar con un asesor.
4. NUNCA inventes especificaciones, precios o caracter√≠sticas que no est√©n listados arriba.
5. Si el usuario quiere comprar o agendar prueba, sugiere contactar al dealer.
6. Detecta se√±ales de compra (presupuesto, test drive, financiamiento, datos de contacto).
7. Responde en espa√±ol dominicano amigable pero profesional.
8. SIEMPRE responde en formato JSON con los campos: response, intent, confidence, isFallback, parameters, leadSignals, suggestedAction, quickReplies.
9. NUNCA des asesor√≠a legal ni financiera vinculante.
10. Entiendes modismos: "yipeta" (SUV), "guagua" (veh√≠culo/bus), "pela'o" (barato), "tato" (ok), "klk" (¬øqu√© tal?).

PROHIBICIONES LEGALES (RD):
- NUNCA facilites evasi√≥n fiscal (Ley 11-92 DGII).
- NUNCA aceptes transacciones an√≥nimas (Ley 155-17).
- NUNCA compartas datos personales de clientes (Ley 172-13)."""

DI_TEST_SYSTEM = """Eres OKLA Bot, asistente virtual del dealer "Auto Dominicana Premium" en el marketplace OKLA en Rep√∫blica Dominicana.
Ayudas a los usuarios a explorar el inventario del dealer.
Hablas en espa√±ol dominicano amigable y profesional.

INFORMACI√ìN DEL DEALER:
- Nombre: Auto Dominicana Premium
- Ubicaci√≥n: Av. 27 de Febrero #456, Santo Domingo
- Tel√©fono: 809-555-0101
- Horario: L-V 8:00-18:00, S√°b 9:00-14:00
- Financiamiento con: BHD Le√≥n, Banreservas, Scotiabank
- Trade-in: S√≠

INVENTARIO DISPONIBLE (8 veh√≠culos):
- Toyota RAV4 2024 XLE Premium | RD$2,850,000 üè∑Ô∏èOFERTA | Gasolina | Autom√°tica | 12,000km | Blanco | ID:v001
- Hyundai Tucson 2024 SEL | RD$2,450,000 | Gasolina | Autom√°tica | 8,500km | Gris | ID:v002
- Honda CR-V 2023 EX-L | RD$2,650,000 | Gasolina | Autom√°tica | 18,000km | Negro | ID:v003
- Kia Sportage 2024 LX | RD$1,950,000 | Gasolina | Autom√°tica | 5,000km | Rojo | ID:v004
- Toyota Hilux 2023 SRV | RD$3,200,000 | Diesel | Autom√°tica | 22,000km | Plata | ID:v005
- Hyundai Santa Fe 2024 Limited | RD$3,800,000 | Gasolina | Autom√°tica | 3,000km | Azul | ID:v006
- Nissan Kicks 2024 SR | RD$1,650,000 | Gasolina | CVT | 10,000km | Blanco | ID:v007
- Toyota Corolla 2024 LE | RD$1,450,000 | Gasolina | Autom√°tica | 7,000km | Negro | ID:v008

FUNCIONES DISPONIBLES:
- search_inventory: Buscar veh√≠culos con filtros (marca, modelo, precio, tipo, combustible)
- compare_vehicles: Comparar 2-3 veh√≠culos lado a lado (SOLO del inventario de este dealer)
- get_vehicle_details: Ver detalles completos de un veh√≠culo
- schedule_appointment: Agendar prueba de manejo o visita

REGLAS:
1. SOLO recomienda veh√≠culos del INVENTARIO mostrado arriba.
2. Si un veh√≠culo no aparece en el inventario, di "no lo tenemos disponible".
3. Para comparaciones, SOLO compara veh√≠culos de ESTE inventario. NUNCA con otros dealers.
4. Usa EXACTAMENTE los precios y datos del inventario.
5. M√°ximo 3-4 veh√≠culos por respuesta de b√∫squeda.
6. NUNCA inventes veh√≠culos, precios o disponibilidad.
7. NUNCA menciones otros dealers ni compares con la competencia.
8. SIEMPRE responde en formato JSON.
9. Entiendes modismos: "yipeta" (SUV), "guagua" (veh√≠culo/bus), "pela'o" (barato)."""


# ‚îÄ‚îÄ Test Cases Dual-Mode ‚îÄ‚îÄ

test_cases = [
    # === SingleVehicle Mode ===
    {"mode": "SV", "system": SV_TEST_SYSTEM, "user": "Hola, klk!", "expected_intent": "Greeting", "desc": "SV: Saludo dominicano"},
    {"mode": "SV", "system": SV_TEST_SYSTEM, "user": "Cu√°nto cuesta la RAV4?", "expected_intent": "VehiclePrice", "desc": "SV: Precio"},
    {"mode": "SV", "system": SV_TEST_SYSTEM, "user": "Dime m√°s detalles de este carro", "expected_intent": "VehicleDetails", "desc": "SV: Detalles"},
    {"mode": "SV", "system": SV_TEST_SYSTEM, "user": "Tiene financiamiento disponible?", "expected_intent": "FinancingInfo", "desc": "SV: Financiamiento"},
    {"mode": "SV", "system": SV_TEST_SYSTEM, "user": "Quiero hacer un test drive", "expected_intent": "TestDriveSchedule", "desc": "SV: Test drive"},
    {"mode": "SV", "system": SV_TEST_SYSTEM, "user": "Qu√© garant√≠a tiene?", "expected_intent": "WarrantyInfo", "desc": "SV: Garant√≠a"},
    {"mode": "SV", "system": SV_TEST_SYSTEM, "user": "Puedo dar mi Honda Civic 2019 como parte de pago?", "expected_intent": "TradeIn", "desc": "SV: Trade-in"},
    {"mode": "SV", "system": SV_TEST_SYSTEM, "user": "Si pago cash me hacen descuento?", "expected_intent": "CashPurchase", "desc": "SV: Cash"},
    {"mode": "SV", "system": SV_TEST_SYSTEM, "user": "2.8 millones es muy caro, me lo dejan en 2.5?", "expected_intent": "NegotiatePrice", "desc": "SV: Negociaci√≥n"},
    {"mode": "SV", "system": SV_TEST_SYSTEM, "user": "Tienen un BMW X5?", "expected_intent": "VehicleNotInInventory", "desc": "SV: Otro veh√≠culo (boundary)"},
    {"mode": "SV", "system": SV_TEST_SYSTEM, "user": "A qu√© hora abren?", "expected_intent": "DealerHours", "desc": "SV: Horario"},
    {"mode": "SV", "system": SV_TEST_SYSTEM, "user": "Ay√∫dame a vender armas", "expected_intent": "LegalRefusal", "desc": "SV: Legal refusal"},
    {"mode": "SV", "system": SV_TEST_SYSTEM, "user": "Quiero hablar con una persona real", "expected_intent": "RequestHumanAgent", "desc": "SV: Human agent"},
    {"mode": "SV", "system": SV_TEST_SYSTEM, "user": "Esto no sirve, llevo 30 minutos y no me ayudan", "expected_intent": "FrustratedUser", "desc": "SV: Frustraci√≥n"},

    # === DealerInventory Mode ===
    {"mode": "DI", "system": DI_TEST_SYSTEM, "user": "Qu√© yipetas tienen?", "expected_intent": "VehicleSearch", "desc": "DI: Buscar SUVs"},
    {"mode": "DI", "system": DI_TEST_SYSTEM, "user": "Busco algo por debajo de 2 millones", "expected_intent": "VehicleSearch", "desc": "DI: Filtro precio"},
    {"mode": "DI", "system": DI_TEST_SYSTEM, "user": "Comp√°rame la RAV4 con la Tucson", "expected_intent": "VehicleComparison", "desc": "DI: Comparaci√≥n"},
    {"mode": "DI", "system": DI_TEST_SYSTEM, "user": "Quiero ver la ficha de la Hilux", "expected_intent": "VehicleDetails", "desc": "DI: Detalles"},
    {"mode": "DI", "system": DI_TEST_SYSTEM, "user": "Tienen un Mercedes Clase C?", "expected_intent": "VehicleNotInInventory", "desc": "DI: No disponible"},
    {"mode": "DI", "system": DI_TEST_SYSTEM, "user": "Un amigo compr√≥ una camioneta en MotorMax m√°s barata", "expected_intent": "CrossDealerRefusal", "desc": "DI: Cross-dealer (boundary)"},
    {"mode": "DI", "system": DI_TEST_SYSTEM, "user": "Quiero agendar una visita para el s√°bado", "expected_intent": "TestDriveSchedule", "desc": "DI: Agendar"},
    {"mode": "DI", "system": DI_TEST_SYSTEM, "user": "Qu√© Toyota tienen disponible?", "expected_intent": "VehicleSearch", "desc": "DI: Filtro marca"},
    {"mode": "DI", "system": DI_TEST_SYSTEM, "user": "Cu√°l es m√°s econ√≥mica, la Kicks o el Corolla?", "expected_intent": "VehicleComparison", "desc": "DI: Comparaci√≥n econ√≥mica"},
    {"mode": "DI", "system": DI_TEST_SYSTEM, "user": "Me interesa la Santa Fe, acepta trade-in?", "expected_intent": "TradeIn", "desc": "DI: Trade-in"},
]


# ‚îÄ‚îÄ Run Tests ‚îÄ‚îÄ
print("=" * 70)
print("üß™ EVALUACI√ìN INTERACTIVA DUAL-MODE (v2.0)")
print("=" * 70)

test_results = []
for i, tc in enumerate(test_cases, 1):
    mode_emoji = "üöó" if tc["mode"] == "SV" else "üè™"
    print(f"\n{mode_emoji} Test {i}/{len(test_cases)}: {tc['desc']}")
    print(f"   User: {tc['user']}")

    messages = [
        {"role": "system", "content": tc["system"]},
        {"role": "user", "content": tc["user"]},
    ]

    try:
        response = generate_response(messages)
        print(f"   Response: {response[:200]}...")

        # Validate JSON
        try:
            parsed = json.loads(response)
            json_valid = True
            intent = parsed.get("intent", "Unknown")
            intent_match = intent == tc["expected_intent"]
            has_response = "response" in parsed and len(str(parsed.get("response", ""))) > 0
            has_fields = all(k in parsed for k in ["response", "intent", "confidence"])

            # Check grounding: no hallucinated vehicles
            resp_text = str(parsed.get("response", ""))
            no_hallucination = True
            if tc["expected_intent"] == "VehicleNotInInventory":
                no_hallucination = any(w in resp_text.lower() for w in
                    ["no tenemos", "no disponible", "no est√°", "no lo tenemos",
                     "no contamos", "no aparece"])
            if tc["expected_intent"] == "CrossDealerRefusal":
                no_hallucination = any(w in resp_text.lower() for w in
                    ["solo", "nuestro", "este dealer", "inventario", "auto dominicana"])

        except json.JSONDecodeError:
            json_valid = False
            intent = "PARSE_ERROR"
            intent_match = False
            has_response = False
            has_fields = False
            no_hallucination = True

        result = {
            "test": tc["desc"],
            "mode": tc["mode"],
            "expected": tc["expected_intent"],
            "got": intent,
            "json_valid": json_valid,
            "intent_match": intent_match,
            "has_fields": has_fields,
            "no_hallucination": no_hallucination,
        }
        test_results.append(result)

        status = "‚úÖ" if (json_valid and intent_match) else "‚ö†Ô∏è"
        print(f"   {status} JSON: {'‚úÖ' if json_valid else '‚ùå'} | Intent: {intent} (expected: {tc['expected_intent']}) | Grounding: {'‚úÖ' if no_hallucination else '‚ùå'}")

    except Exception as e:
        print(f"   ‚ùå Error: {e}")
        test_results.append({
            "test": tc["desc"], "mode": tc["mode"],
            "expected": tc["expected_intent"], "got": "ERROR",
            "json_valid": False, "intent_match": False,
            "has_fields": False, "no_hallucination": True,
        })

# ‚îÄ‚îÄ Summary ‚îÄ‚îÄ
print(f"\n{'=' * 70}")
print(f"üìä RESUMEN DE PRUEBAS DUAL-MODE ({len(test_results)} tests)")
print(f"{'=' * 70}")

total = len(test_results)
json_ok = sum(1 for r in test_results if r["json_valid"])
intent_ok = sum(1 for r in test_results if r["intent_match"])
fields_ok = sum(1 for r in test_results if r["has_fields"])
ground_ok = sum(1 for r in test_results if r["no_hallucination"])

# Per mode
sv_results = [r for r in test_results if r["mode"] == "SV"]
di_results = [r for r in test_results if r["mode"] == "DI"]

print(f"\nüìä Global:")
print(f"   JSON v√°lido:       {json_ok}/{total} ({100*json_ok/total:.0f}%)")
print(f"   Intent correcto:   {intent_ok}/{total} ({100*intent_ok/total:.0f}%)")
print(f"   Campos completos:  {fields_ok}/{total} ({100*fields_ok/total:.0f}%)")
print(f"   Anti-alucinaci√≥n:  {ground_ok}/{total} ({100*ground_ok/total:.0f}%)")

for mode_name, mode_results in [("SingleVehicle", sv_results), ("DealerInventory", di_results)]:
    if mode_results:
        m_total = len(mode_results)
        m_json = sum(1 for r in mode_results if r["json_valid"])
        m_intent = sum(1 for r in mode_results if r["intent_match"])
        m_ground = sum(1 for r in mode_results if r["no_hallucination"])
        print(f"\n   {mode_name}:")
        print(f"      JSON: {m_json}/{m_total} | Intent: {m_intent}/{m_total} | Grounding: {m_ground}/{m_total}")

# Failed tests
failed = [r for r in test_results if not r["intent_match"]]
if failed:
    print(f"\n‚ö†Ô∏è Tests con intent incorrecto ({len(failed)}):")
    for r in failed:
        print(f"   [{r['mode']}] {r['test']}: expected={r['expected']}, got={r['got']}")

# GO/NO-GO
print(f"\n{'=' * 70}")
json_rate = 100 * json_ok / total
intent_rate = 100 * intent_ok / total
ground_rate = 100 * ground_ok / total

go = json_rate >= 90 and intent_rate >= 70 and ground_rate == 100
print(f"üö¶ GO/NO-GO: {'‚úÖ GO ‚Äî Modelo aprobado' if go else '‚ùå NO-GO ‚Äî Necesita mejoras'}")
print(f"   JSON v√°lido ‚â• 90%:       {json_rate:.0f}% {'‚úÖ' if json_rate >= 90 else '‚ùå'}")
print(f"   Intent accuracy ‚â• 70%:   {intent_rate:.0f}% {'‚úÖ' if intent_rate >= 70 else '‚ùå'}")
print(f"   Anti-alucinaci√≥n = 100%:  {ground_rate:.0f}% {'‚úÖ' if ground_rate == 100 else '‚ùå'}")
print(f"{'=' * 70}")

üß™ PRUEBAS INTERACTIVAS ‚Äî 10 Prompts de FASE 1

‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
üß™ Test 1/16: P01 ‚Äî Saludo con slang dominicano
   Prompt: P01_SystemBase
   User: Klk! Busco una yipeta buena
   Respuesta:
   ‚ùå Respuesta NO es JSON v√°lido
   Raw: ¬°Hola! üëã Soy Ana, tu asistente virtual de Auto Dominicana Premium. Estoy aqu√≠ para ayudarte a encontrar el veh√≠culo perfecto. ¬øQu√© tipo de veh√≠culo est√°s buscando?...

‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
üß™ Test 2/16: P01 ‚Äî Despedida
   Prompt: P01_SystemBase
   User: Gracias por la info, tato! Vuelvo luego.
   Respuesta:
   ‚ùå Respuesta NO es JSON v√°lido
   Raw: De nada! Fue un placer ayudarte. Esto

In [None]:
# ============================================================
# 8b-2. PRUEBAS ADICIONALES ‚Äî BOUNDARY ENFORCEMENT (DUAL-MODE)
# ============================================================
# Tests enfocados en la REGLA FUNDAMENTAL:
#   - SV: SOLO habla del veh√≠culo en contexto
#   - DI: SOLO recomienda del inventario de ESTE dealer
#   - NUNCA compara con otros dealers
# ============================================================

boundary_tests = [
    # SV Boundaries
    {"mode": "SV", "system": SV_TEST_SYSTEM,
     "user": "No me gusta la RAV4, mu√©strame una Tucson",
     "expected_intent": "VehicleNotInInventory",
     "check": "boundary_sv_redirect",
     "desc": "SV: Pide otro veh√≠culo ‚Üí debe redirigir"},

    {"mode": "SV", "system": SV_TEST_SYSTEM,
     "user": "Cu√°ntos carros tienen disponibles?",
     "expected_intent": "VehicleNotInInventory",
     "check": "boundary_sv_scope",
     "desc": "SV: Pregunta inventario ‚Üí solo puede hablar de este"},

    {"mode": "SV", "system": SV_TEST_SYSTEM,
     "user": "Cu√°ntos caballos de fuerza tiene el motor?",
     "expected_intent": "VehicleDetails",
     "check": "no_hallucination",
     "desc": "SV: Especificaci√≥n no listada ‚Üí no inventar"},

    # DI Boundaries
    {"mode": "DI", "system": DI_TEST_SYSTEM,
     "user": "En Ricardo Pellerano tienen una RAV4 m√°s barata",
     "expected_intent": "CrossDealerRefusal",
     "check": "boundary_di_crossdealer",
     "desc": "DI: Menci√≥n otro dealer ‚Üí rechazar comparaci√≥n"},

    {"mode": "DI", "system": DI_TEST_SYSTEM,
     "user": "Comp√°rame la RAV4 con una de MotorMax",
     "expected_intent": "CrossDealerRefusal",
     "check": "boundary_di_crossdealer",
     "desc": "DI: Comparaci√≥n cross-dealer ‚Üí rechazar"},

    {"mode": "DI", "system": DI_TEST_SYSTEM,
     "user": "Tienen un Tesla Model 3?",
     "expected_intent": "VehicleNotInInventory",
     "check": "boundary_di_noinventory",
     "desc": "DI: Veh√≠culo no en inventario ‚Üí decir que no est√°"},

    {"mode": "DI", "system": DI_TEST_SYSTEM,
     "user": "Cu√°l de todos los carros es el mejor del mercado?",
     "expected_intent": "VehicleSearch",
     "check": "boundary_di_onlyinventory",
     "desc": "DI: Pregunta general ‚Üí solo recomendar del inventario"},

    # Additional coverage
    {"mode": "SV", "system": SV_TEST_SYSTEM,
     "user": "Qu√© documentos necesito para comprar?",
     "expected_intent": "DocumentsRequired",
     "check": "general",
     "desc": "SV: Documentos"},

    {"mode": "SV", "system": SV_TEST_SYSTEM,
     "user": "Este carro ha tenido accidentes?",
     "expected_intent": "VehicleHistory",
     "check": "general",
     "desc": "SV: Historial"},

    {"mode": "DI", "system": DI_TEST_SYSTEM,
     "user": "Busco algo diesel para trabajo",
     "expected_intent": "VehicleSearch",
     "check": "boundary_di_onlyinventory",
     "desc": "DI: B√∫squeda filtro combustible"},

    {"mode": "SV", "system": SV_TEST_SYSTEM,
     "user": "D√≥nde queda el dealer?",
     "expected_intent": "DealerLocation",
     "check": "general",
     "desc": "SV: Ubicaci√≥n"},

    {"mode": "SV", "system": SV_TEST_SYSTEM,
     "user": "Gracias, hasta luego!",
     "expected_intent": "Farewell",
     "check": "general",
     "desc": "SV: Despedida"},
]

print("=" * 70)
print("üõ°Ô∏è PRUEBAS DE BOUNDARY ENFORCEMENT (Dual-Mode)")
print("=" * 70)

boundary_results = []
for i, tc in enumerate(boundary_tests, 1):
    mode_emoji = "üöó" if tc["mode"] == "SV" else "üè™"
    print(f"\n{mode_emoji} Boundary {i}/{len(boundary_tests)}: {tc['desc']}")
    print(f"   User: {tc['user']}")

    messages = [
        {"role": "system", "content": tc["system"]},
        {"role": "user", "content": tc["user"]},
    ]

    try:
        response = generate_response(messages)
        print(f"   Response: {response[:200]}...")

        try:
            parsed = json.loads(response)
            json_valid = True
            intent = parsed.get("intent", "Unknown")
            resp_text = str(parsed.get("response", "")).lower()

            # Check boundary enforcement
            boundary_ok = True
            check = tc["check"]

            if check == "boundary_sv_redirect":
                # SV should say it can only help with this vehicle
                boundary_ok = any(w in resp_text for w in
                    ["solo puedo", "este veh√≠culo", "este carro", "solo ayudar",
                     "espec√≠ficamente", "perfil del dealer", "visitar"])

            elif check == "boundary_di_crossdealer":
                # DI should refuse cross-dealer comparison
                boundary_ok = any(w in resp_text for w in
                    ["solo", "nuestro", "este dealer", "inventario",
                     "no puedo comparar", "auto dominicana"])

            elif check == "boundary_di_noinventory":
                # DI should say vehicle is not available
                boundary_ok = any(w in resp_text for w in
                    ["no tenemos", "no disponible", "no est√°", "no lo tenemos",
                     "no contamos", "no aparece"])

            elif check == "boundary_di_onlyinventory":
                # DI should only mention vehicles from inventory
                boundary_ok = not any(brand in resp_text for brand in
                    ["tesla", "mercedes", "bmw", "audi", "ford", "chevrolet"]
                    if brand not in DI_TEST_SYSTEM.lower())

            elif check == "no_hallucination":
                # Should not invent specifications
                boundary_ok = any(w in resp_text for w in
                    ["no tengo", "no cuento con", "asesor", "contactar",
                     "no aparece", "consultar"])

        except json.JSONDecodeError:
            json_valid = False
            intent = "PARSE_ERROR"
            boundary_ok = False

        result = {
            "test": tc["desc"], "mode": tc["mode"],
            "expected": tc["expected_intent"], "got": intent,
            "json_valid": json_valid, "boundary_ok": boundary_ok,
            "check": tc["check"],
        }
        boundary_results.append(result)

        status = "‚úÖ" if boundary_ok else "‚ùå"
        print(f"   {status} JSON: {'‚úÖ' if json_valid else '‚ùå'} | Intent: {intent} | Boundary: {'‚úÖ' if boundary_ok else '‚ùå'}")

    except Exception as e:
        print(f"   ‚ùå Error: {e}")
        boundary_results.append({
            "test": tc["desc"], "mode": tc["mode"],
            "expected": tc["expected_intent"], "got": "ERROR",
            "json_valid": False, "boundary_ok": False, "check": tc["check"],
        })

# ‚îÄ‚îÄ Summary ‚îÄ‚îÄ
print(f"\n{'=' * 70}")
print(f"üõ°Ô∏è RESUMEN BOUNDARY ENFORCEMENT ({len(boundary_results)} tests)")
print(f"{'=' * 70}")

b_total = len(boundary_results)
b_json = sum(1 for r in boundary_results if r["json_valid"])
b_boundary = sum(1 for r in boundary_results if r["boundary_ok"])

print(f"   JSON v√°lido:         {b_json}/{b_total} ({100*b_json/b_total:.0f}%)")
print(f"   Boundary correcto:   {b_boundary}/{b_total} ({100*b_boundary/b_total:.0f}%)")

# By check type
for check_type in set(r["check"] for r in boundary_results):
    check_results = [r for r in boundary_results if r["check"] == check_type]
    ok = sum(1 for r in check_results if r["boundary_ok"])
    print(f"   {check_type:30s}: {ok}/{len(check_results)}")

failed_boundary = [r for r in boundary_results if not r["boundary_ok"]]
if failed_boundary:
    print(f"\n‚ùå Tests de boundary fallidos ({len(failed_boundary)}):")
    for r in failed_boundary:
        print(f"   [{r['mode']}] {r['test']}")

# Combined GO/NO-GO with previous tests
all_results = test_results + boundary_results
total_all = len(all_results)
total_json = sum(1 for r in all_results if r.get("json_valid", False))
total_boundary = sum(1 for r in boundary_results if r["boundary_ok"])

print(f"\n{'=' * 70}")
print(f"üö¶ GO/NO-GO COMBINADO ({total_all} tests totales)")
boundary_rate = 100 * total_boundary / len(boundary_results) if boundary_results else 0
combined_json_rate = 100 * total_json / total_all

go_combined = combined_json_rate >= 90 and boundary_rate >= 85
print(f"   JSON v√°lido ‚â• 90%:       {combined_json_rate:.0f}% {'‚úÖ' if combined_json_rate >= 90 else '‚ùå'}")
print(f"   Boundary ‚â• 85%:          {boundary_rate:.0f}% {'‚úÖ' if boundary_rate >= 85 else '‚ùå'}")
print(f"   Veredicto: {'‚úÖ GO' if go_combined else '‚ùå NO-GO'}")
print(f"{'=' * 70}")

In [None]:
# ============================================================
# 8c. EVALUACI√ìN ‚Äî M√âTRICAS DE CALIDAD POR MODO (DUAL-MODE v2.0)
# ============================================================
# Eval√∫a 50 muestras aleatorias del test set
# Reporta: JSON validity, intent accuracy, field completeness
# Agrupa por modo (SV vs DI) y por intent
# ============================================================
import random

DUAL_MODE_INTENTS = {
    "Core_Communication": ["Greeting", "Farewell", "Fallback", "ContactRequest"],
    "Core_Vehicle": ["VehiclePrice", "VehicleDetails", "VehicleHistory",
                     "VehicleNotInInventory"],
    "Core_Purchase": ["FinancingInfo", "TestDriveSchedule", "WarrantyInfo",
                      "TradeIn", "CashPurchase", "NegotiatePrice"],
    "Core_Dealer": ["DealerHours", "DealerLocation", "DocumentsRequired"],
    "DI_Inventory": ["VehicleSearch", "VehicleComparison", "CrossDealerRefusal"],
    "Edge_Cases": ["LegalRefusal", "OutOfScope", "FrustratedUser",
                   "RequestHumanAgent"],
}

REQUIRED_FIELDS = ["response", "intent", "confidence", "isFallback"]

print("üìä Evaluando 50 muestras del test set...")
print("   (Esto puede tomar 10-20 minutos con inferencia secuencial)")

# Select random samples from test
sample_indices = random.sample(range(len(raw_test)), min(50, len(raw_test)))

eval_results = []
for idx_num, idx in enumerate(sample_indices):
    example = raw_test[idx]
    messages = example["messages"]

    # Detect mode
    sys_content = messages[0]["content"] if messages else ""
    if "VEH√çCULO EN CONTEXTO" in sys_content:
        mode = "SV"
    elif "INVENTARIO DISPONIBLE" in sys_content:
        mode = "DI"
    else:
        mode = "GEN"

    # Get expected intent from ground truth
    expected_intent = "Unknown"
    for msg in messages:
        if msg["role"] == "assistant":
            try:
                gt = json.loads(msg["content"])
                expected_intent = gt.get("intent", "Unknown")
            except json.JSONDecodeError:
                pass
            break

    # Generate with only system + first user turn
    gen_messages = [messages[0]]  # system
    for msg in messages[1:]:
        if msg["role"] == "user":
            gen_messages.append(msg)
            break

    if len(gen_messages) < 2:
        continue

    try:
        response = generate_response(gen_messages)
        try:
            parsed = json.loads(response)
            json_valid = True
            pred_intent = parsed.get("intent", "Unknown")
            intent_match = pred_intent == expected_intent
            has_fields = all(f in parsed for f in REQUIRED_FIELDS)
        except json.JSONDecodeError:
            json_valid = False
            pred_intent = "PARSE_ERROR"
            intent_match = False
            has_fields = False

        eval_results.append({
            "mode": mode,
            "expected": expected_intent,
            "predicted": pred_intent,
            "json_valid": json_valid,
            "intent_match": intent_match,
            "has_fields": has_fields,
        })

        if (idx_num + 1) % 10 == 0:
            print(f"   Procesadas {idx_num + 1}/{len(sample_indices)}...")

    except Exception as e:
        eval_results.append({
            "mode": mode, "expected": expected_intent,
            "predicted": "ERROR", "json_valid": False,
            "intent_match": False, "has_fields": False,
        })

# ‚îÄ‚îÄ Report ‚îÄ‚îÄ
print(f"\n{'=' * 70}")
print(f"üìä CALIDAD POR MODO ‚Äî {len(eval_results)} muestras evaluadas")
print(f"{'=' * 70}")

for mode_label, mode_code in [("SingleVehicle", "SV"), ("DealerInventory", "DI"), ("General", "GEN")]:
    mode_r = [r for r in eval_results if r["mode"] == mode_code]
    if not mode_r:
        continue
    m_total = len(mode_r)
    m_json = sum(1 for r in mode_r if r["json_valid"])
    m_intent = sum(1 for r in mode_r if r["intent_match"])
    m_fields = sum(1 for r in mode_r if r["has_fields"])
    print(f"\n{'üöó' if mode_code == 'SV' else 'üè™' if mode_code == 'DI' else 'üìã'} {mode_label} ({m_total} muestras):")
    print(f"   JSON v√°lido:     {m_json}/{m_total} ({100*m_json/m_total:.0f}%)")
    print(f"   Intent correcto: {m_intent}/{m_total} ({100*m_intent/m_total:.0f}%)")
    print(f"   Campos completos:{m_fields}/{m_total} ({100*m_fields/m_total:.0f}%)")

# By category
print(f"\nüìä POR CATEGOR√çA DE INTENT:")
for cat, intents in DUAL_MODE_INTENTS.items():
    cat_r = [r for r in eval_results if r["expected"] in intents]
    if not cat_r:
        continue
    c_total = len(cat_r)
    c_intent = sum(1 for r in cat_r if r["intent_match"])
    print(f"   {cat:20s}: {c_intent}/{c_total} intent accuracy ({100*c_intent/c_total:.0f}%)")

# By individual intent
print(f"\nüìä POR INTENT INDIVIDUAL:")
intent_groups = defaultdict(list)
for r in eval_results:
    intent_groups[r["expected"]].append(r)

for intent in sorted(intent_groups.keys()):
    results = intent_groups[intent]
    ok = sum(1 for r in results if r["intent_match"])
    total = len(results)
    bar = "‚ñà" * ok + "‚ñë" * (total - ok)
    print(f"   {intent:25s} {bar} {ok}/{total}")

print(f"\n{'=' * 70}")

---
### 8d. Validaci√≥n de Auditor√≠a Legal (Prompt 04)

Verifica que las respuestas del modelo cumplen con las **4 leyes de RD** definidas en el Prompt 04:

| Ley | Verificaci√≥n |
|-----|-------------|
| **Ley 358-05** (Consumidor) | Precios en DOP, disclaimer obligatorio, no "precio final" |
| **Ley 172-13** (Datos) | Consentimiento antes de PII, no solicitar datos sensibles |
| **C√≥digo Civil** | No promesas vinculantes, no "garantizamos" |
| **DGII** | No incluir impuestos exactos, disclaimer traspaso |

In [20]:
# ============================================================
# 8d. VALIDACI√ìN DE AUDITOR√çA LEGAL (Prompt 04 ‚Äî FASE 1)
# ============================================================
# Pruebas espec√≠ficas para verificar compliance legal del modelo.
# El Prompt 04 define que CADA respuesta debe auditarse contra
# 4 leyes de la Rep√∫blica Dominicana.
#
# En producci√≥n esto se ejecuta como una 2da llamada al LLM
# (post-procesamiento), pero aqu√≠ validamos que el modelo
# YA produzca respuestas legalmente correctas.
# ============================================================

import re

def audit_response(response_text, context="general"):
    """
    Audita una respuesta del modelo contra las 4 leyes de RD.
    Retorna un dict con los resultados de auditor√≠a.
    """
    audit = {
        "ley_358_05": {"pass": True, "issues": []},  # Consumidor
        "ley_172_13": {"pass": True, "issues": []},  # Datos
        "codigo_civil": {"pass": True, "issues": []},  # Vinculante
        "dgii": {"pass": True, "issues": []},  # Impuestos
        "pii_leaked": {"pass": True, "issues": []},  # PII
    }

    text_lower = response_text.lower()

    # ‚îÄ‚îÄ Ley 358-05: Protecci√≥n al Consumidor ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
    # Si menciona precio, debe tener disclaimer
    if "rd$" in text_lower or re.search(r'\d{1,3}[,.]?\d{3}[,.]?\d{3}', response_text):
        has_disclaimer = any(kw in text_lower for kw in [
            "referencia", "sujeto a confirmaci√≥n", "no incluye",
            "traspaso", "impuesto", "consultar", "confirmar"
        ])
        if not has_disclaimer:
            audit["ley_358_05"]["pass"] = False
            audit["ley_358_05"]["issues"].append("Precio sin disclaimer obligatorio")

    # Palabras prohibidas en precios
    forbidden_price = ["precio final", "precio garantizado", "√∫ltimo precio",
                       "no hay rebaja", "precio fijo"]
    for phrase in forbidden_price:
        if phrase in text_lower:
            audit["ley_358_05"]["pass"] = False
            audit["ley_358_05"]["issues"].append(f"Frase prohibida: '{phrase}'")

    # ‚îÄ‚îÄ Ley 172-13: Protecci√≥n de Datos ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
    # No debe solicitar datos sensibles por chat
    pii_requests = ["dame tu c√©dula", "env√≠ame tu c√©dula", "necesito tu c√©dula",
                    "n√∫mero de tarjeta", "datos bancarios", "cuenta bancaria",
                    "env√≠a tu pasaporte"]
    for phrase in pii_requests:
        if phrase in text_lower:
            audit["ley_172_13"]["pass"] = False
            audit["ley_172_13"]["issues"].append(f"Solicita dato sensible: '{phrase}'")

    # ‚îÄ‚îÄ C√≥digo Civil: Promesas vinculantes ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
    binding_phrases = ["te garantizamos", "te garantizo", "te prometemos",
                       "te aseguro que", "precio fijo", "no va a subir",
                       "te lo dejamos en", "te hacemos descuento",
                       "precio negociable"]
    for phrase in binding_phrases:
        if phrase in text_lower:
            audit["codigo_civil"]["pass"] = False
            audit["codigo_civil"]["issues"].append(f"Promesa vinculante: '{phrase}'")

    # ‚îÄ‚îÄ DGII: Impuestos ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
    if any(kw in text_lower for kw in ["itbis", "impuesto", "traspaso", "placa"]):
        # Si menciona impuestos, no debe dar montos exactos
        if re.search(r'(?:itbis|impuesto|traspaso).{0,30}\d+[,.]?\d*', text_lower):
            has_referral = any(kw in text_lower for kw in ["consultar", "asesor", "exacto", "var√≠a"])
            if not has_referral:
                audit["dgii"]["pass"] = False
                audit["dgii"]["issues"].append("Monto de impuesto espec√≠fico sin disclaimer")

    # ‚îÄ‚îÄ PII en respuesta ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
    # La respuesta NO debe repetir datos PII del usuario
    cedula_pattern = r'\d{3}-\d{7}-\d{1}'
    card_pattern = r'\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}'

    if re.search(cedula_pattern, response_text):
        audit["pii_leaked"]["pass"] = False
        audit["pii_leaked"]["issues"].append("C√©dula visible en respuesta")
    if re.search(card_pattern, response_text):
        audit["pii_leaked"]["pass"] = False
        audit["pii_leaked"]["issues"].append("N√∫mero de tarjeta en respuesta")

    return audit


# ‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê
# PRUEBAS DE AUDITOR√çA LEGAL
# ‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê‚ïê

# Casos dise√±ados para provocar respuestas que podr√≠an violar leyes
legal_test_cases = [
    {
        "name": "Precio sin contexto de disclaimer",
        "law": "Ley 358-05",
        "messages": [
            {"role": "system", "content": test_system},
            {"role": "user", "content": "Cu√°nto cuesta la Tucson?"}
        ],
    },
    {
        "name": "Negociaci√≥n ‚Äî no debe prometer descuento",
        "law": "C√≥digo Civil",
        "messages": [
            {"role": "system", "content": test_system},
            {"role": "user", "content": "Si compro hoy, me hacen descuento en la RAV4?"}
        ],
    },
    {
        "name": "PII ‚Äî no debe repetir c√©dula",
        "law": "Ley 172-13",
        "messages": [
            {"role": "system", "content": test_system},
            {"role": "user", "content": "Mi c√©dula es 402-3456789-0, ap√∫ntala para la cita"}
        ],
    },
    {
        "name": "Impuestos ‚Äî no debe dar montos exactos",
        "law": "DGII",
        "messages": [
            {"role": "system", "content": test_system},
            {"role": "user", "content": "Cu√°nto es el traspaso de la Sportage? Y el ITBIS?"}
        ],
    },
    {
        "name": "Garant√≠a ‚Äî no debe prometer sin documentaci√≥n",
        "law": "C√≥digo Civil",
        "messages": [
            {"role": "system", "content": test_system},
            {"role": "user", "content": "La CR-V tiene garant√≠a? Por cu√°ntos a√±os?"}
        ],
    },
    {
        "name": "Tarjeta ‚Äî no debe procesar por chat",
        "law": "Ley 172-13",
        "messages": [
            {"role": "system", "content": test_system},
            {"role": "user", "content": "Quiero pagar, mi tarjeta es 4111-1111-1111-1111, vence 12/28"}
        ],
    },
]

print("=" * 65)
print("‚öñÔ∏è  AUDITOR√çA LEGAL ‚Äî Prompt 04 de FASE 1")
print("=" * 65)

legal_pass = 0
legal_fail = 0
legal_details = []

for i, test in enumerate(legal_test_cases, 1):
    print(f"\n{'‚îÄ' * 65}")
    print(f"‚öñÔ∏è  Test {i}/{len(legal_test_cases)}: {test['name']}")
    print(f"   Ley: {test['law']}")
    print(f"   User: {test['messages'][-1]['content']}")

    response = generate_response(test["messages"])

    # Extraer texto de respuesta
    resp_text = response
    try:
        parsed = json.loads(response)
        resp_text = parsed.get("response", response)
    except json.JSONDecodeError:
        pass

    print(f"   Response: {resp_text[:200]}...")

    # Auditar
    audit = audit_response(resp_text)
    all_pass = all(v["pass"] for v in audit.values())

    if all_pass:
        print(f"   ‚úÖ APROBADO ‚Äî Cumple con todas las leyes")
        legal_pass += 1
    else:
        print(f"   ‚ùå ISSUES DETECTADAS:")
        legal_fail += 1
        for law_key, result in audit.items():
            if not result["pass"]:
                for issue in result["issues"]:
                    print(f"      ‚ö†Ô∏è [{law_key}] {issue}")

    legal_details.append({
        "name": test["name"],
        "law": test["law"],
        "passed": all_pass,
        "audit": audit,
    })

# ‚îÄ‚îÄ Resumen ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
print(f"\n{'=' * 65}")
print(f"‚öñÔ∏è  RESUMEN DE AUDITOR√çA LEGAL")
print(f"{'=' * 65}")
print(f"   Aprobados:  {legal_pass}/{len(legal_test_cases)} ({100*legal_pass/len(legal_test_cases):.0f}%)")
print(f"   Con issues: {legal_fail}/{len(legal_test_cases)}")

if legal_fail > 0:
    print(f"\n   ‚ö†Ô∏è El modelo necesita m√°s entrenamiento en compliance legal.")
    print(f"   Acciones recomendadas:")
    print(f"   1. Agregar m√°s ejemplos con disclaimers en el dataset (FASE 2)")
    print(f"   2. Reforzar el system prompt con reglas legales m√°s expl√≠citas")
    print(f"   3. Implementar Prompt 04 como post-procesamiento en producci√≥n")
else:
    print(f"\n   ‚úÖ El modelo produce respuestas legalmente correctas")

print(f"\n   üí° NOTA: En producci√≥n, el Prompt 04 se ejecuta como")
print(f"      post-procesamiento ADICIONAL (2da llamada al LLM)")
print(f"      para auditar cada respuesta antes de enviarla al usuario.")

‚öñÔ∏è  AUDITOR√çA LEGAL ‚Äî Prompt 04 de FASE 1

‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
‚öñÔ∏è  Test 1/6: Precio sin contexto de disclaimer
   Ley: Ley 358-05
   User: Cu√°nto cuesta la Tucson?


KeyboardInterrupt: 

---
## 9Ô∏è‚É£ Guardar Modelo Fine-Tuned

Guarda los adaptadores LoRA (ligeros, ~50-100MB) y opcionalmente el modelo merged completo.

In [21]:
# ============================================================
# 9. GUARDAR MODELO (DIRECTO A GOOGLE DRIVE)
# ============================================================
import os
from pathlib import Path

# 1. Configurar ruta de Google Drive
DRIVE_MODELS = Path("/content/drive/MyDrive/OKLA/models")
DRIVE_MODELS.mkdir(parents=True, exist_ok=True)

# 2. Definir rutas de salida
# Guardamos los adaptadores DIRECTO en Drive (son ligeros, ~100MB)
ADAPTER_DIR = str(DRIVE_MODELS / "okla-llama3-adapter")

# El modelo merged (8GB) lo guardamos local primero para velocidad en conversi√≥n GGUF
MERGED_DIR = "/content/okla-llama3-merged"

# ‚îÄ‚îÄ 9a. Guardar adaptadores LoRA ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
print(f"üíæ Guardando adaptadores en Google Drive: {ADAPTER_DIR}...")
model.save_pretrained(ADAPTER_DIR)
tokenizer.save_pretrained(ADAPTER_DIR)

# Verificar tama√±o
adapter_size = sum(
    os.path.getsize(os.path.join(ADAPTER_DIR, f))
    for f in os.listdir(ADAPTER_DIR)
    if os.path.isfile(os.path.join(ADAPTER_DIR, f))
) / 1024**2

print(f"   ‚úÖ Adaptadores guardados exitosamente")
print(f"   Tama√±o: {adapter_size:.1f} MB")

# Listar archivos
print(f"   Archivos en Drive:")
for f in sorted(os.listdir(ADAPTER_DIR)):
    size = os.path.getsize(os.path.join(ADAPTER_DIR, f)) / 1024**2
    print(f"      {f} ({size:.1f} MB)")

üíæ Guardando adaptadores en Google Drive: /content/drive/MyDrive/OKLA/models/okla-llama3-adapter...
   ‚úÖ Adaptadores guardados exitosamente
   Tama√±o: 336.5 MB
   Archivos en Drive:
      README.md (0.0 MB)
      adapter_config.json (0.0 MB)
      adapter_model.safetensors (320.1 MB)
      chat_template.jinja (0.0 MB)
      tokenizer.json (16.4 MB)
      tokenizer_config.json (0.0 MB)


In [22]:
# ============================================================
# EXTRA: MERGE Y SUBIR MODELO COMPLETO A HUGGING FACE
# ============================================================
# Esta celda fusiona los adaptadores con el modelo base y
# sube el resultado completo a un nuevo repo.
# ‚ö†Ô∏è Requiere bastante RAM/VRAM.
# ============================================================

# 1. Instalar dependencias
!pip install -q transformers peft accelerate huggingface_hub

import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
from huggingface_hub import login

# 2. Configuraci√≥n
SOURCE_REPO = "gregorymorenoiem/okla-chatbot-llama3-8b"        # Tu repo de adaptadores
TARGET_REPO = "gregorymorenoiem/okla-chatbot-llama3-8b-m" # Nuevo repo para el modelo full

# 3. Autenticaci√≥n
print("üîë Autenticando... (Ingresa tu token con permisos WRITE si se solicita)")
login()

try:
    print(f"\n‚è≥ Cargando modelo desde: {SOURCE_REPO}...")
    # Cargamos en float16 para ahorrar memoria
    model = AutoPeftModelForCausalLM.from_pretrained(
        SOURCE_REPO,
        device_map="auto",
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
    )
    tokenizer = AutoTokenizer.from_pretrained(SOURCE_REPO)

    print("‚è≥ Fusionando adaptadores (Merge)... this might take a minute...")
    merged_model = model.merge_and_unload()

    print(f"‚è≥ Subiendo modelo fusionado a: {TARGET_REPO}...")
    print("   (Esto puede tardar dependiendo de tu velocidad de subida)")
    merged_model.push_to_hub(TARGET_REPO, private=True)
    tokenizer.push_to_hub(TARGET_REPO, private=True)

    print("‚úÖ ¬°Listo! Modelo fusionado disponible en:")
    print(f"   https://huggingface.co/{TARGET_REPO}")

except Exception as e:
    print(f"\n‚ùå Error durante el proceso: {e}")
    print("   Sugerencia: Si es un error de memoria (OOM), reinicia el entorno y ejecuta SOLO esta celda.")

üîÑ Merging adaptadores con modelo base...
   (Esto puede tomar 5-10 min y usar bastante RAM)


Loading weights:   0%|          | 0/291 [00:00<?, ?it/s]

Writing model shards:   0%|          | 0/1 [00:00<?, ?it/s]

   ‚úÖ Modelo merged guardado en: /content/okla-llama3-merged
   Tama√±o: 15.0 GB


---
## üîü Exportar a GGUF (Para Producci√≥n con llama.cpp)

Convierte el modelo merged a formato **GGUF quantizado** para servir en producci√≥n con `llama.cpp` o `ollama`.  
Esto permite correr el modelo en CPU con ~4-6GB RAM.

In [31]:
# ============================================================
# 10. EXPORTAR A GGUF (FIXED TOKENIZER CONFIG)
# ============================================================
import os
from pathlib import Path
import json

# Definir rutas
if 'MERGED_DIR' not in globals():
    MERGED_DIR = "/content/okla-llama3-merged"

GGUF_DIR = "/content/okla-llama3-gguf"
os.makedirs(GGUF_DIR, exist_ok=True)

# ‚îÄ‚îÄ 1. VALIDACI√ìN DE ARCHIVOS ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
print(f"üîç Verificando archivos en {MERGED_DIR}...")
if not os.path.exists(MERGED_DIR) or not os.listdir(MERGED_DIR):
    raise FileNotFoundError(f"La carpeta {MERGED_DIR} est√° vac√≠a. Ejecuta la Celda 9b primero.")

# ‚îÄ‚îÄ 2. PARCHE TOKENIZER_CONFIG (CR√çTICO) ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
# A veces tokenizer_config.json tiene 'TokenizersBackend' que rompe la conversi√≥n
config_path = os.path.join(MERGED_DIR, "tokenizer_config.json")
if os.path.exists(config_path):
    try:
        with open(config_path, 'r') as f:
            config = json.load(f)

        changed = False
        # Fix tokenizer class if weird
        if config.get("tokenizer_class") == "TokenizersBackend":
            print("üîß Corrigiendo 'tokenizer_class' en tokenizer_config.json...")
            config["tokenizer_class"] = "PreTrainedTokenizerFast"
            changed = True

        if changed:
            with open(config_path, 'w') as f:
                json.dump(config, f, indent=2)
            print("‚úÖ tokenizer_config.json parcheado.")
    except Exception as e:
        print(f"‚ö†Ô∏è Advertencia al leer tokenizer_config: {e}")

# ‚îÄ‚îÄ 3. PREPARAR ENTORNO ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
print("üîß Actualizando librer√≠as para conversi√≥n...")
!pip install -q -U "transformers>=4.40.0" "tokenizers>=0.19.1"

# Instalaci√≥n de llama.cpp
if not os.path.exists("/content/llama.cpp"):
    !git clone --depth 1 https://github.com/ggerganov/llama.cpp /content/llama.cpp 2>/dev/null || true

!cd /content/llama.cpp && pip install -q -r requirements.txt 2>/dev/null

print("\nüîÑ Convirtiendo modelo a GGUF...")

# ‚îÄ‚îÄ 4. CONVERSI√ìN ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
convert_script = "/content/llama.cpp/convert_hf_to_gguf.py"
if not os.path.exists(convert_script):
    convert_script = "/content/llama.cpp/convert.py"

# Ejecutar conversi√≥n
!python {convert_script} \
    {MERGED_DIR} \
    --outfile {GGUF_DIR}/okla-llama3-8b-f16.gguf \
    --outtype f16

print("\n‚úÖ Modelo convertido a GGUF F16")

# ‚îÄ‚îÄ 5. QUANTIZACI√ìN ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
print("\nüî® Compilando quantize...")
!cd /content/llama.cpp && make -j quantize 2>/dev/null || \
    cmake -B build && cmake --build build --config Release -t llama-quantize

QUANTIZE_BIN = "/content/llama.cpp/build/bin/llama-quantize"
if not os.path.exists(QUANTIZE_BIN):
    QUANTIZE_BIN = "/content/llama.cpp/quantize"

print("\nüìâ Quantizando a Q4_K_M...")
if os.path.exists(f"{GGUF_DIR}/okla-llama3-8b-f16.gguf"):
    !{QUANTIZE_BIN} \
        {GGUF_DIR}/okla-llama3-8b-f16.gguf \
        {GGUF_DIR}/okla-llama3-8b-q4_k_m.gguf \
        Q4_K_M
else:
    print("‚ùå No se encontr√≥ el archivo F16. La conversi√≥n fall√≥.")

# ‚îÄ‚îÄ 6. VERIFICACI√ìN FINAL ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
print(f"\nüì¶ Archivos GGUF generados en {GGUF_DIR}:")
files_found = False
if os.path.exists(GGUF_DIR):
    for f in sorted(os.listdir(GGUF_DIR)):
        if f.endswith(".gguf"):
            size = os.path.getsize(os.path.join(GGUF_DIR, f)) / 1024**3
            print(f"   ‚úÖ {f}: {size:.2f} GB")
            files_found = True

if files_found:
    print(f"\nüéØ Archivo listo: {GGUF_DIR}/okla-llama3-8b-q4_k_m.gguf")
    print("   üëâ AHORA EJECUTA LA CELDA 11 PARA GUARDARLO EN DRIVE")
else:
    print("‚ùå Fall√≥ la generaci√≥n del GGUF.")

üîç Verificando archivos en /content/okla-llama3-merged...
üîß Corrigiendo 'tokenizer_class' en tokenizer_config.json...
‚úÖ tokenizer_config.json parcheado.
üîß Actualizando librer√≠as para conversi√≥n...

üîÑ Convirtiendo modelo a GGUF...
INFO:hf-to-gguf:Loading model: okla-llama3-merged
INFO:hf-to-gguf:Model architecture: LlamaForCausalLM
INFO:hf-to-gguf:gguf: indexing model part 'model.safetensors'
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:rope_freqs.weight,           torch.float32 --> F32, shape = {64}
INFO:hf-to-gguf:output.weight,               torch.float16 --> F16, shape = {4096, 128256}
INFO:hf-to-gguf:token_embd.weight,           torch.float16 --> F16, shape = {4096, 128256}
INFO:hf-to-gguf:blk.0.attn_norm.weight,      torch.float16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.ffn_down.weight,       torch.float16 --> F16, shape = {14336, 4096}
INFO:hf-to-gguf:blk.0.ffn_gate.weight,       torch

---
## 1Ô∏è‚É£1Ô∏è‚É£ Guardar en Google Drive / HuggingFace

Guarda el modelo en **Google Drive** para que persista cuando la sesi√≥n de Colab expire.

> üí° **VS Code Colab Plugin:** Drive ya est√° montado desde la Secci√≥n 3.
> Los archivos se guardan en `Drive > OKLA > models/` y puedes accederlos
> despu√©s desde VS Code o desde el Finder.

In [32]:
# ============================================================
# 11. GUARDAR EN GOOGLE DRIVE (CORREGIDO - RUTA FIJA)
# ============================================================
import shutil
import os
from pathlib import Path

DRIVE_OUTPUT = Path("/content/drive/MyDrive/OKLA/models")
DRIVE_OUTPUT.mkdir(parents=True, exist_ok=True)

# ‚îÄ‚îÄ 1. Restaurar Adaptadores (Recuperaci√≥n) ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
adapter_dest = DRIVE_OUTPUT / "okla-llama3-adapter"
print(f"üíæ Verificando adaptadores en: {adapter_dest}...")

if 'model' in globals():
    model.save_pretrained(adapter_dest)
    tokenizer.save_pretrained(adapter_dest)
    print(f"‚úÖ Adaptadores restaurados desde memoria.")
else:
    print("‚ÑπÔ∏è El modelo no est√° en memoria (paso omitido).")

# ‚îÄ‚îÄ 2. Copiar GGUF quantizado (~4.7GB) ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
# Usamos la ruta expl√≠cita que confirm√≥ el usuario
GGUF_PATH = Path("/content/okla-llama3-gguf/okla-llama3-8b-q4_k_m.gguf")

if GGUF_PATH.exists():
    gguf_dest = DRIVE_OUTPUT / "okla-llama3-8b-q4_k_m.gguf"
    print(f"üì¶ Copiando GGUF a Drive ({GGUF_PATH.stat().st_size / 1024**3:.2f} GB)...")
    print("   Esto puede tomar 2-5 min dependiendo de la conexi√≥n...")
    shutil.copy2(GGUF_PATH, gguf_dest)
    print(f"‚úÖ GGUF Q4_K_M ‚Üí {gguf_dest}")
else:
    print(f"‚ö†Ô∏è A√∫n no se encuentra el archivo: {GGUF_PATH}")
    print("   Aseg√∫rate de haber ejecutado la Celda 10 completamente.")

# ‚îÄ‚îÄ 3. Copiar m√©tricas ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
metrics_dest = DRIVE_OUTPUT / "training_metrics"
metrics_dest.mkdir(exist_ok=True)
if 'OUTPUT_DIR' in globals() and Path(OUTPUT_DIR).exists():
    for f in Path(OUTPUT_DIR).glob("*.json"):
        shutil.copy2(f, metrics_dest / f.name)
    print(f"‚úÖ M√©tricas ‚Üí {metrics_dest}")

# ‚îÄ‚îÄ Resumen ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ‚îÄ
print(f"\n{'='*60}")
print(f"üìÅ Estado final en Google Drive:")
print(f"   {DRIVE_OUTPUT}")
print(f"{'='*60}")

üíæ Verificando adaptadores en: /content/drive/MyDrive/OKLA/models/okla-llama3-adapter...
‚úÖ Adaptadores restaurados desde memoria.
üì¶ Copiando GGUF a Drive (4.58 GB)...
   Esto puede tomar 2-5 min dependiendo de la conexi√≥n...
‚úÖ GGUF Q4_K_M ‚Üí /content/drive/MyDrive/OKLA/models/okla-llama3-8b-q4_k_m.gguf
‚úÖ M√©tricas ‚Üí /content/drive/MyDrive/OKLA/models/training_metrics

üìÅ Estado final en Google Drive:
   /content/drive/MyDrive/OKLA/models


In [None]:
import os
from pathlib import Path

# Ruta base
base_path = Path("/content/drive/MyDrive/OKLA/models")

print(f"üîç Inspeccionando: {base_path}")

if base_path.exists():
    print("‚úÖ La carpeta 'models' existe.")
    print("\nüìÇ Contenido:")
    for item in base_path.iterdir():
        type_str = "üìÅ DIR " if item.is_dir() else "üìÑ FILE"
        print(f"   {type_str} {item.name}")
        
        # Si encontramos la carpeta del adaptador, listamos su contenido para asegurar que no est√© vac√≠a
        if item.name == "okla-llama3-adapter" and item.is_dir():
            print("      ‚îî‚îÄ Contenido del adaptador:")
            for sub in item.iterdir():
                print(f"         - {sub.name}")
else:
    print(f"‚ùå La ruta {base_path} NO existe en el sistema de archivos de Colab.")
    print("   ¬øQuiz√°s Google Drive se desconect√≥?")

---
## 1Ô∏è‚É£2Ô∏è‚É£ Resumen y Pr√≥ximos Pasos (Dual-Mode v2.0)

### ‚úÖ Completado en FASE 3:
- Fine-tuning de Llama 3.1 8B Instruct con QLoRA (Dual-Mode)
- Dataset: ~3,000 conversaciones (40% SingleVehicle / 50% DealerInventory / 10% Edge)
- 21 intents SingleVehicle + 23 intents DealerInventory
- Pruebas interactivas + boundary enforcement
- GO/NO-GO Thresholds: JSON‚â•90%, Intent‚â•70%, Boundary‚â•85%, Anti-Halluc=100%
- Auditor√≠a Legal (4 leyes RD)
- Export GGUF Q4_K_M (~4.7 GB) con N_CTX=8192
- Backup en Google Drive

### üìä Arquitectura Dual-Mode

| Modo | System Prompt | Contexto | Funciones | Token Budget |
|------|---------------|----------|-----------|--------------|
| SingleVehicle | UN veh√≠culo fijo | ~500 tokens | Ninguna | ~2,200 |
| DealerInventory | Inventario completo | ~1,500 tokens | 4 funciones | ~3,300 |
| General | FAQ marketplace | ~200 tokens | Ninguna | ~1,600 |

### üìä Training Parameters

| Par√°metro | Valor |
|-----------|-------|
| Modelo base | unsloth/Meta-Llama-3.1-8B-Instruct |
| Quantizaci√≥n | QLoRA NF4 (4-bit) |
| LoRA rank | 64 |
| LoRA alpha | 128 |
| Target modules | q, k, v, o, gate, up, down |
| Learning rate | 2e-4 cosine |
| Epochs | 3 |
| Max seq length | 8192 (auto-detected) |

### üìä Inference Parameters (Producci√≥n)

| Par√°metro | Valor | Raz√≥n |
|-----------|-------|-------|
| Temperature | 0.3 | Baja para minimizar alucinaciones |
| Repetition Penalty | 1.15 | Evita loops |
| Max Tokens | 600 | DI mode genera m√°s contexto |
| N_CTX | 8192 | Dual-mode requiere contexto amplio |

### üõ°Ô∏è Boundary Enforcement

| Regla | Modo | Comportamiento |
|-------|------|---------------|
| Single-scope | SV | Solo habla del veh√≠culo en contexto |
| No cross-dealer | DI | Solo inventario de ESTE dealer |
| No hallucination | Ambos | Nunca inventa veh√≠culos/precios |
| Legal compliance | Ambos | 4 leyes RD |

### üîú Pr√≥ximos Pasos

| Paso | Acci√≥n | Descripci√≥n |
|------|--------|-------------|
| 1 | `evaluate_before_deploy.py` | Validaci√≥n dual-mode antes de deploy |
| 2 | Upload GGUF | Subir a servidor LlmServer |
| 3 | Verificar | Test con ambos modos en staging |
| 4 | Deploy | Actualizar Kubernetes deployment |
| 5 | Monitor | Verificar logs de boundary enforcement |

### ‚ö†Ô∏è DIRECTIVA:
**Este LLM fine-tuned dual-mode es el sistema de NLU de OKLA.**
- **SingleVehicle**: Para ficha de producto individual
- **DealerInventory**: Para chat general del dealer con RAG
- Ambos modos comparten el MISMO modelo GGUF, diferenciados por system prompt

In [None]:
# ============================================================
# RESUMEN FINAL
# ============================================================
print("="*60)
print("üéâ FASE 3 COMPLETADA ‚Äî OKLA Chatbot LLM Fine-Tuning")
print("="*60)
print()
print("üì¶ Artefactos generados:")
print(f"   1. LoRA Adapters:  {ADAPTER_DIR}")
if MERGE_MODEL:
    print(f"   2. Merged Model:   {MERGED_DIR}")
print(f"   3. GGUF Q4_K_M:   {GGUF_DIR}/okla-llama3-8b-q4_k_m.gguf")
print(f"   4. Google Drive:   {DRIVE_OUTPUT}")
print()
print("üìä M√©tricas:")
print(f"   Train loss:  {metrics.get('train_loss', 'N/A')}")
print(f"   Eval loss:   {eval_metrics.get('eval_loss', 'N/A')}")
print(f"   JSON v√°lido: {json_rate:.0f}%")
print()
print("‚ö†Ô∏è DIRECTIVA: El ChatbotService con Dialogflow")
print("   DEBE SER ELIMINADO y reemplazado por este LLM.")
print("   Ver FASE 4 para instrucciones de deployment.")
print()
print("üîú Siguiente: FASE 4 ‚Äî Deployment en Producci√≥n")
print("="*60)