# Fourth notebook
This notebook covers: loading the central configuration, loading the best trained checkpoint, evaluating on validation with practical metrics (Precision/Recall/F1 and approximate AP @ IoU=0.5) on a CPU-safe subset, logging the metrics to MLflow, and registering the model in the Model Registry (backed by SQLite).
## Objectives
1. Load `project_config.json` and `labelmap.json`.
2. Locate the best saved checkpoint (`best_*.pt`) in `models/local_checkpoints/`.
3. Load the model and run inference on a CPU-safe validation subset.
4. Compute metrics:
   - Precision / Recall / F1 (per class and global) at IoU >= 0.5
   - Approximate AP@0.5 per class (by score ranking)
5. Log the results to MLflow (SQLite):
   - evaluation parameters
   - global and per-class metrics
   - artifacts (evaluated checkpoint + report)
6. Register the model in the MLflow Model Registry under a stable name, creating a new version.

In [1]:
"""
- Imports the libraries needed for evaluation, model loading, and the MLflow registry.
"""

import os
import json
from pathlib import Path
from datetime import datetime
from typing import Dict, List, Tuple


from PIL import Image
from tqdm import tqdm

import mlflow
from mlflow.tracking import MlflowClient
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.transforms import functional as F


OSError: [WinError 1114] Error en una rutina de inicialización de biblioteca de vínculos dinámicos (DLL). Error loading "c:\Users\Johnny\AppData\Local\Python\pythoncore-3.12-64\Lib\site-packages\torch\lib\c10.dll" or one of its dependencies.

1. Loading the .json files and validating the project structure

In [None]:
"""
This section loads project_config.json and labelmap.json, both created in Notebook 01.
It also defines absolute paths so that nothing depends on the notebook's current working directory.
"""

def find_project_root(start: Path, max_up: int = 8) -> Path:
    cur = start.resolve()
    for _ in range(max_up):
        if (cur / "data" / "processed" / "project_config.json").exists():
            return cur
        cur = cur.parent
    raise FileNotFoundError("data/processed/project_config.json was not found. Run Notebook 01 first.")

PROJECT_ROOT = find_project_root(Path.cwd())
PROCESSED_DIR = (PROJECT_ROOT / "data" / "processed").resolve()

PROJECT_CONFIG_PATH = (PROCESSED_DIR / "project_config.json").resolve()
LABELMAP_PATH = (PROCESSED_DIR / "labelmap.json").resolve()

with open(PROJECT_CONFIG_PATH, "r", encoding="utf-8") as f:
    project_config = json.load(f)

with open(LABELMAP_PATH, "r", encoding="utf-8") as f:
    labelmap = json.load(f)

TRAIN_IMG_DIR = Path(project_config["train_dir"])
VAL_IMG_DIR = Path(project_config["val_dir"])

TARGET_CLASSES = project_config["target_classes"]
target_cat_ids = [int(x) for x in labelmap["target_category_ids"]]

VAL_JSON = (PROCESSED_DIR / "coco_person_car_airplane_val.json").resolve()

MODELS_DIR = (PROJECT_ROOT / "models" / "local_checkpoints").resolve()
MODELS_DIR.mkdir(parents=True, exist_ok=True)

print("PROJECT_ROOT:", PROJECT_ROOT)
print("VAL_JSON:", VAL_JSON)
print("TARGET_CLASSES:", TARGET_CLASSES)
print("target_cat_ids:", target_cat_ids)
print("MODELS_DIR:", MODELS_DIR)


PROJECT_ROOT: C:\Users\Johnny\Desktop\IA-final
VAL_JSON: C:\Users\Johnny\Desktop\IA-final\data\processed\coco_person_car_airplane_val.json
TARGET_CLASSES: ['person', 'car', 'airplane']
target_cat_ids: [1, 3, 5]
MODELS_DIR: C:\Users\Johnny\Desktop\IA-final\models\local_checkpoints
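Every later cell indexes straight into `project_config` and `labelmap`, so a missing key would only surface as a `KeyError` further down. The following is a minimal sketch of an early check, assuming the keys read in the cell above are the only ones this notebook needs. `missing_keys` is a hypothetical helper that is not part of the original notebook, and the sample dictionaries are placeholders for illustration.

```python
from typing import Any, Dict, Iterable, List


def missing_keys(cfg: Dict[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent from cfg, in the order given."""
    return [key for key in required if key not in cfg]


# Illustrative stand-ins for the loaded JSON files (placeholder values).
sample_config = {"train_dir": "data/train", "val_dir": "data/val"}
sample_labelmap = {"target_category_ids": [1, 3, 5]}

config_missing = missing_keys(sample_config, ["train_dir", "val_dir", "target_classes"])
labelmap_missing = missing_keys(sample_labelmap, ["target_category_ids"])

print("config missing:", config_missing)
print("labelmap missing:", labelmap_missing)
```

Raising a single error that lists everything missing tends to be easier to act on than failing on the first absent key.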


In [None]:
"""
This cell:
- Finds the most recently modified best_*.pt file in MODELS_DIR.
- Fails with a clear error if none exists.
"""

def find_latest_best_checkpoint(models_dir: Path) -> Path:
    cands = sorted(models_dir.glob("best_*.pt"), key=lambda p: p.stat().st_mtime, reverse=True)
    if not cands:
        raise FileNotFoundError(
            "No best_*.pt checkpoint was found in models/local_checkpoints. "
            "Make sure you train and save checkpoints in Notebook 03."
        )
    return cands[0]

BEST_CKPT_PATH = find_latest_best_checkpoint(MODELS_DIR)
print("BEST_CKPT_PATH:", BEST_CKPT_PATH)


BEST_CKPT_PATH: C:\Users\Johnny\Desktop\IA-final\models\local_checkpoints\best_frcnn_cpu_base_train_20260201_083448.pt
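The selection above orders candidates by file modification time, not by the timestamp embedded in the file name, so copying or touching an older checkpoint can change which one is picked. The sketch below reproduces that behavior in a temporary directory. `newest_match` and the file names are hypothetical, and the modification times are set explicitly to keep the result deterministic.

```python
import os
import tempfile
from pathlib import Path


def newest_match(directory: Path, pattern: str) -> Path:
    """Return the most recently modified file matching pattern, or raise if none exists."""
    candidates = list(directory.glob(pattern))
    if not candidates:
        raise FileNotFoundError(f"No file matching {pattern!r} in {directory}")
    return max(candidates, key=lambda p: p.stat().st_mtime)


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    # last_c.pt is the newest file overall, but it does not match the best_*.pt pattern.
    for name, mtime in [("best_a.pt", 1_000), ("best_b.pt", 2_000), ("last_c.pt", 3_000)]:
        path = tmp_dir / name
        path.write_bytes(b"")
        os.utime(path, (mtime, mtime))
    selected_name = newest_match(tmp_dir, "best_*.pt").name

print(selected_name)
```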


In [None]:
"""
This cell:
- Loads the checkpoint.
- Rebuilds the Faster R-CNN model with the correct number of classes.
- Loads the state_dict and puts the model in eval() mode for inference.
"""

DEVICE = torch.device("cpu")

def build_model(num_classes: int):
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None)
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

ckpt = torch.load(BEST_CKPT_PATH, map_location="cpu")

ckpt_target_classes = ckpt["target_classes"]
NUM_CLASSES = len(ckpt_target_classes) + 1

# Keys may arrive as strings if the mappings went through JSON; int() leaves existing ints unchanged.
coco_to_internal = {int(k): int(v) for k, v in ckpt["coco_to_internal"].items()}
internal_to_name = {int(k): v for k, v in ckpt["internal_to_name"].items()}

model = build_model(NUM_CLASSES)
model.load_state_dict(ckpt["model_state_dict"])
model.to(DEVICE)
model.eval()

print("Checkpoint classes:", ckpt_target_classes)
print("NUM_CLASSES:", NUM_CLASSES)
print("coco_to_internal:", coco_to_internal)


Checkpoint classes: ['person', 'car', 'airplane']
NUM_CLASSES: 4
coco_to_internal: {1: 1, 3: 2, 5: 3}
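The cell above converts the mapping keys to `int` before using them. One likely reason, stated here as an assumption about how the checkpoint was written, is that a dictionary that passes through JSON comes back with string keys, and a lookup such as `coco_to_internal[3]` would then fail. A small illustration using the mapping printed above:

```python
import json

original = {1: 1, 3: 2, 5: 3}  # COCO category id -> internal label

# JSON object keys are always strings, so the integer keys do not survive a round trip.
round_tripped = json.loads(json.dumps(original))

# Converting both keys and values restores the original mapping.
restored = {int(k): int(v) for k, v in round_tripped.items()}

print(round_tripped)
print(restored)
```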


In [None]:
"""
This cell:
- Loads the reduced validation JSON (from Notebook 02).
- Builds the indices: image_id -> file_name and image_id -> GT boxes per internal class.
- Converts COCO bboxes (x,y,w,h) to (x1,y1,x2,y2).
- Remaps the COCO category_id to the internal label 1..K.
"""

if not VAL_JSON.exists():
    raise FileNotFoundError(f"VAL_JSON does not exist: {VAL_JSON}. Run Notebook 02 first.")

with open(VAL_JSON, "r", encoding="utf-8") as f:
    val_coco = json.load(f)

val_images = val_coco["images"]
val_anns = val_coco["annotations"]

img_id_to_file = {img["id"]: img["file_name"] for img in val_images}

gt_by_image: Dict[int, Dict[int, List[List[float]]]] = {}  # image_id -> internal_label -> [xyxy]
for ann in val_anns:
    img_id = ann["image_id"]
    cid = int(ann["category_id"])
    if cid not in coco_to_internal:
        continue
    internal_label = int(coco_to_internal[cid])
    x, y, w, h = ann["bbox"]
    box = [x, y, x + w, y + h]
    gt_by_image.setdefault(img_id, {}).setdefault(internal_label, []).append(box)

print("Val images:", len(val_images))
print("Val annotations:", len(val_anns))
print("Example image_id:", val_images[0]["id"], "file:", val_images[0]["file_name"])


Val images: 500
Val annotations: 2218
Example image_id: 397133 file: 000000397133.jpg
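The indexing step can be followed on a toy input. This sketch assumes the same `coco_to_internal` mapping shown earlier; the two annotations are made up, and `coco_xywh_to_xyxy` is a hypothetical helper name for the inline conversion used in the cell above.

```python
from typing import Dict, List


def coco_xywh_to_xyxy(bbox: List[float]) -> List[float]:
    """Convert a COCO [x, y, width, height] box to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


coco_to_internal = {1: 1, 3: 2, 5: 3}

# Made-up annotations: category 3 (car) is a target class, category 18 is not.
annotations = [
    {"image_id": 7, "category_id": 3, "bbox": [10.0, 20.0, 30.0, 40.0]},
    {"image_id": 7, "category_id": 18, "bbox": [0.0, 0.0, 5.0, 5.0]},
]

gt_by_image: Dict[int, Dict[int, List[List[float]]]] = {}
for ann in annotations:
    label = coco_to_internal.get(ann["category_id"])
    if label is None:
        continue  # annotation belongs to a class the model was not trained on
    boxes = gt_by_image.setdefault(ann["image_id"], {}).setdefault(label, [])
    boxes.append(coco_xywh_to_xyxy(ann["bbox"]))

print(gt_by_image)
```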


In [None]:
"""
This cell:
- Implements IoU for xyxy boxes.
- Implements greedy per-class matching:
  - predictions are visited in descending score order, and each GT box can be matched to at most one prediction
  - best IoU >= threshold => TP, otherwise => FP
  - unmatched GT => FN
"""

def iou_xyxy(a: List[float], b: List[float]) -> float:
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b

    inter_x1 = max(ax1, bx1)
    inter_y1 = max(ay1, by1)
    inter_x2 = min(ax2, bx2)
    inter_y2 = min(ay2, by2)

    inter_w = max(0.0, inter_x2 - inter_x1)
    inter_h = max(0.0, inter_y2 - inter_y1)
    inter_area = inter_w * inter_h

    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)

    union = area_a + area_b - inter_area
    return 0.0 if union <= 0 else inter_area / union

def match_predictions_to_gt(
    preds: List[Tuple[List[float], float]],  # (box, score) for a class
    gts: List[List[float]],
    iou_thr: float
) -> Tuple[int, int, int, List[Tuple[float, int]]]:
    """
    Returns:
    - TP, FP, FN
    - ranked list of (score, is_tp) used for the approximate AP
    """
    preds_sorted = sorted(preds, key=lambda x: x[1], reverse=True)
    gt_used = [False] * len(gts)

    tp = 0
    fp = 0
    ranked = []

    for p_box, p_score in preds_sorted:
        best_iou = 0.0
        best_j = -1
        for j, gt_box in enumerate(gts):
            if gt_used[j]:
                continue
            cur_iou = iou_xyxy(p_box, gt_box)
            if cur_iou > best_iou:
                best_iou = cur_iou
                best_j = j

        if best_iou >= iou_thr and best_j >= 0:
            gt_used[best_j] = True
            tp += 1
            ranked.append((p_score, 1))
        else:
            fp += 1
            ranked.append((p_score, 0))

    fn = sum(1 for u in gt_used if not u)
    return tp, fp, fn, ranked
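A small worked case helps to check the matching rule by hand. The sketch below is a compact re-implementation for illustration only, assuming well-formed boxes (`x2 >= x1`, `y2 >= y1`); `iou` and `greedy_match` are hypothetical names, not the notebook's functions. Two predictions cover the same ground-truth box: the higher-scoring one claims it, so the second becomes a false positive even though it overlaps the box well.

```python
from typing import List, Tuple

Box = List[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return 0.0 if union <= 0 else inter / union


def greedy_match(preds: List[Tuple[Box, float]], gts: List[Box], thr: float) -> Tuple[int, int, int]:
    """Return (TP, FP, FN); each GT box is matched to at most one prediction."""
    used = [False] * len(gts)
    tp = fp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if used[j]:
                continue
            cur = iou(box, gt)
            if cur > best_iou:
                best_iou, best_j = cur, j
        if best_j >= 0 and best_iou >= thr:
            used[best_j] = True
            tp += 1
        else:
            fp += 1
    return tp, fp, used.count(False)


# The second box covers the lower half of the first: intersection 50, union 100.
half_overlap = iou([0, 0, 10, 10], [0, 0, 10, 5])

counts = greedy_match(
    preds=[([0, 0, 10, 10], 0.9), ([0, 0, 10, 9], 0.8)],
    gts=[[0, 0, 10, 10]],
    thr=0.5,
)
print(half_overlap, counts)
```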


In [None]:
"""
This cell:
- Runs inference on a subset of the validation images to keep CPU runtime manageable.
- Filters detections by score_threshold.
- Computes TP/FP/FN per class at IoU >= 0.5.
- Computes Precision/Recall/F1 per class and globally.
- Collects the ranked (score, is_tp) lists used for the approximate per-class AP.
"""

EVAL_MAX_IMAGES = 300   
SCORE_THRESHOLD = 0.5
IOU_THRESHOLD = 0.5

val_image_ids = [img["id"] for img in val_images]
eval_ids = val_image_ids[:min(EVAL_MAX_IMAGES, len(val_image_ids))]

# per-class stats for internal labels 1..K
K = len(TARGET_CLASSES)
tp_c = {c: 0 for c in range(1, K + 1)}
fp_c = {c: 0 for c in range(1, K + 1)}
fn_c = {c: 0 for c in range(1, K + 1)}

# for the approximate AP: list of (score, is_tp) per class
ranked_c: Dict[int, List[Tuple[float, int]]] = {c: [] for c in range(1, K + 1)}
gt_count_c = {c: 0 for c in range(1, K + 1)}  # total GT boxes per class in the subset

@torch.no_grad()
def predict_image(model, img_path: Path):
    img = Image.open(img_path).convert("RGB")
    img_t = F.to_tensor(img).to(DEVICE)
    out = model([img_t])[0]
    return out

for img_id in tqdm(eval_ids, desc="eval_infer"):
    file_name = img_id_to_file[img_id]
    img_path = VAL_IMG_DIR / file_name

    out = predict_image(model, img_path)

    boxes = out["boxes"].cpu().tolist()
    scores = out["scores"].cpu().tolist()
    labels = out["labels"].cpu().tolist()  # internal labels 1..K (as defined by the checkpoint)

    # group predictions by class
    preds_by_class: Dict[int, List[Tuple[List[float], float]]] = {c: [] for c in range(1, K + 1)}
    for b, s, l in zip(boxes, scores, labels):
        l = int(l)
        if l < 1 or l > K:
            continue
        if s >= SCORE_THRESHOLD:
            preds_by_class[l].append((b, float(s)))

    # ground truth per class
    gt_for_img = gt_by_image.get(img_id, {})
    for c in range(1, K + 1):
        gts = gt_for_img.get(c, [])
        gt_count_c[c] += len(gts)
        preds = preds_by_class.get(c, [])

        tpi, fpi, fni, ranked = match_predictions_to_gt(preds, gts, IOU_THRESHOLD)
        tp_c[c] += tpi
        fp_c[c] += fpi
        fn_c[c] += fni
        ranked_c[c].extend(ranked)

def safe_div(a: float, b: float) -> float:
    return 0.0 if b == 0 else a / b

metrics_by_class = {}
for c in range(1, K + 1):
    tp = tp_c[c]
    fp = fp_c[c]
    fn = fn_c[c]
    prec = safe_div(tp, tp + fp)
    rec = safe_div(tp, tp + fn)
    f1 = safe_div(2 * prec * rec, prec + rec) if (prec + rec) > 0 else 0.0

    metrics_by_class[c] = {
        "name": internal_to_name.get(c, f"class_{c}"),
        "tp": tp, "fp": fp, "fn": fn,
        "precision": prec,
        "recall": rec,
        "f1": f1,
        "gt_count": gt_count_c[c],
        "pred_count": tp + fp,
    }

# global micro
TP = sum(tp_c.values())
FP = sum(fp_c.values())
FN = sum(fn_c.values())
P_micro = safe_div(TP, TP + FP)
R_micro = safe_div(TP, TP + FN)
F1_micro = safe_div(2 * P_micro * R_micro, P_micro + R_micro) if (P_micro + R_micro) > 0 else 0.0

print("Global micro metrics:")
print("TP:", TP, "FP:", FP, "FN:", FN)
print("Precision:", P_micro)
print("Recall   :", R_micro)
print("F1       :", F1_micro)

print("\nPer class:")
for c in range(1, K + 1):
    m = metrics_by_class[c]
    print(m["name"], "| P:", round(m["precision"], 4), "R:", round(m["recall"], 4), "F1:", round(m["f1"], 4), "GT:", m["gt_count"])


eval_infer: 100%|██████████| 300/300 [7:13:53<00:00, 86.78s/it]     

Global micro metrics:
TP: 900 FP: 877 FN: 346
Precision: 0.5064715813168261
Recall   : 0.7223113964686998
F1       : 0.5954349983460139

Per class:
person | P: 0.5 R: 0.7399 F1: 0.5967 GT: 1111
car | P: 0.5726 R: 0.5826 F1: 0.5776 GT: 115
airplane | P: 0.6875 R: 0.55 F1: 0.6111 GT: 20
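The global micro figures can be re-derived by hand from the TP/FP/FN counts printed above, which is a quick sanity check on the aggregation. The sketch also computes F1 directly from the counts, using the identity F1 = 2·TP / (2·TP + FP + FN).

```python
# Counts taken from the "Global micro metrics" output above.
tp, fp, fn = 900, 877, 346

precision = tp / (tp + fp)   # 900 / 1777
recall = tp / (tp + fn)      # 900 / 1246
f1 = 2 * precision * recall / (precision + recall)

# Equivalent closed form that avoids the intermediate precision and recall.
f1_from_counts = 2 * tp / (2 * tp + fp + fn)   # 1800 / 3023

print(round(precision, 4), round(recall, 4), round(f1, 4))
```

Micro averaging pools the counts across classes, so the global values are dominated by `person`, which accounts for 1111 of the 1246 ground-truth boxes in this subset.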





In [None]:
"""
This cell:
- Computes an approximate AP per class from the ranked (score, is_tp) list.
- The AP here is approximate (not COCO mAP), but it is useful for comparing versions and for logging to MLflow.
"""

def average_precision_from_ranked(ranked: List[Tuple[float, int]], total_gt: int) -> float:
    if total_gt == 0:
        return 0.0
    if not ranked:
        return 0.0

    ranked_sorted = sorted(ranked, key=lambda x: x[0], reverse=True)
    tp_running = 0
    fp_running = 0

    precisions = []
    recalls = []

    for _, is_tp in ranked_sorted:
        if is_tp == 1:
            tp_running += 1
        else:
            fp_running += 1
        prec = safe_div(tp_running, tp_running + fp_running)
        rec = safe_div(tp_running, total_gt)
        precisions.append(prec)
        recalls.append(rec)

    # Step-wise AP integration: add precision * (recall increase) at every point where recall rises
    ap = 0.0
    prev_rec = 0.0
    for p, r in zip(precisions, recalls):
        if r > prev_rec:
            ap += p * (r - prev_rec)
            prev_rec = r
    return ap

ap_by_class = {}
for c in range(1, K + 1):
    ap = average_precision_from_ranked(ranked_c[c], gt_count_c[c])
    ap_by_class[c] = ap

mAP = sum(ap_by_class.values()) / max(1, K)

print("Approximate AP@0.5 per class:")
for c in range(1, K + 1):
    print(internal_to_name.get(c, f"class_{c}"), "AP:", round(ap_by_class[c], 4))

print("\nApproximate mAP@0.5:", round(mAP, 4))


Approximate AP@0.5 per class:
person AP: 0.6368
car AP: 0.4877
airplane AP: 0.55

Approximate mAP@0.5: 0.5581
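The step-wise integration is easy to verify on a three-detection toy ranking. The sketch below is a condensed equivalent written for illustration (`step_ap` is a hypothetical name). With two ground-truth boxes and detections ranked TP, FP, TP, recall rises at ranks 1 and 3, where precision is 1/1 and 2/3, so AP = 1.0 × 0.5 + (2/3) × 0.5 = 5/6.

```python
from typing import List, Tuple


def step_ap(ranked: List[Tuple[float, int]], total_gt: int) -> float:
    """Sum precision * (recall increase) over the score-ranked detections."""
    if total_gt == 0 or not ranked:
        return 0.0
    tp = 0
    ap = 0.0
    prev_recall = 0.0
    ordered = sorted(ranked, key=lambda item: item[0], reverse=True)
    for seen, (_score, is_tp) in enumerate(ordered, start=1):
        tp += is_tp
        recall = tp / total_gt
        if recall > prev_recall:
            ap += (tp / seen) * (recall - prev_recall)
            prev_recall = recall
    return ap


toy_ap = step_ap([(0.9, 1), (0.8, 0), (0.7, 1)], total_gt=2)
print(round(toy_ap, 4))
```

Two properties of this approximation follow from the notebook's own code. Only detections with a score at or above `SCORE_THRESHOLD` enter the ranking, so the precision-recall curve is cut off at the recall reached at that threshold, and AP can never exceed that recall. Precision is also not interpolated, so the values are not directly comparable with COCO or Pascal VOC AP.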


In [None]:
"""
This cell:
- Configures MLflow on SQLite.
- Creates an evaluation run.
- Logs parameters plus global and per-class metrics.
- Logs artifacts: the checkpoint and the JSON report.
- Registers the model in the Model Registry:
  - stable name: frcnn_coco_cpu_person_car_airplane
  - creates a new version pointing at the run's artifact.
"""

MLFLOW_DB = (PROJECT_ROOT / "mlflow.db").resolve()
mlflow.set_tracking_uri(f"sqlite:///{MLFLOW_DB.as_posix()}")

EXPERIMENT_NAME = "object_detection_coco_cpu"
mlflow.set_experiment(EXPERIMENT_NAME)

client = MlflowClient()

registered_model_name = "frcnn_coco_cpu_person_car_airplane"

# create the registered model if it does not exist yet
existing = [m.name for m in client.search_registered_models()]
if registered_model_name not in existing:
    client.create_registered_model(registered_model_name)

eval_run_name = f"eval_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

report = {
    "checkpoint_path": str(BEST_CKPT_PATH),
    "eval_max_images": EVAL_MAX_IMAGES,
    "score_threshold": SCORE_THRESHOLD,
    "iou_threshold": IOU_THRESHOLD,
    "global": {
        "tp": TP, "fp": FP, "fn": FN,
        "precision_micro": P_micro,
        "recall_micro": R_micro,
        "f1_micro": F1_micro,
        "mAP50_approx": mAP,
    },
    "by_class": metrics_by_class,
    "ap50_approx_by_class": {internal_to_name.get(c, str(c)): ap_by_class[c] for c in ap_by_class},
}

report_path = (MODELS_DIR / f"eval_report_{eval_run_name}.json").resolve()
with open(report_path, "w", encoding="utf-8") as f:
    json.dump(report, f, indent=2, ensure_ascii=False)

with mlflow.start_run(run_name=eval_run_name) as run:
    run_id = run.info.run_id

    # params
    mlflow.log_param("eval_max_images", EVAL_MAX_IMAGES)
    mlflow.log_param("score_threshold", SCORE_THRESHOLD)
    mlflow.log_param("iou_threshold", IOU_THRESHOLD)
    mlflow.set_tag("stage", "eval")
    mlflow.set_tag("checkpoint_used", BEST_CKPT_PATH.name)
    mlflow.set_tag("classes", ",".join(TARGET_CLASSES))
    mlflow.set_tag("registered_model_name", registered_model_name)

    # global metrics
    mlflow.log_metric("precision_micro", P_micro)
    mlflow.log_metric("recall_micro", R_micro)
    mlflow.log_metric("f1_micro", F1_micro)
    mlflow.log_metric("mAP50_approx", mAP)

    # per class metrics
    for c in range(1, K + 1):
        cname = internal_to_name.get(c, f"class_{c}")
        m = metrics_by_class[c]
        mlflow.log_metric(f"{cname}_precision", m["precision"])
        mlflow.log_metric(f"{cname}_recall", m["recall"])
        mlflow.log_metric(f"{cname}_f1", m["f1"])
        mlflow.log_metric(f"{cname}_ap50_approx", ap_by_class[c])

    # artifacts
    mlflow.log_artifact(str(report_path), artifact_path="reports")
    mlflow.log_artifact(str(BEST_CKPT_PATH), artifact_path="model_ckpt")
    mlflow.log_artifact(str(PROJECT_CONFIG_PATH), artifact_path="artifacts")
    mlflow.log_artifact(str(LABELMAP_PATH), artifact_path="artifacts")

    # register the model from the artifact logged in this run
    model_uri = f"runs:/{run_id}/model_ckpt/{BEST_CKPT_PATH.name}"
    mv = client.create_model_version(
        name=registered_model_name,
        source=model_uri,
        run_id=run_id
    )

    # tags/version notes
    client.set_model_version_tag(registered_model_name, mv.version, "eval_max_images", str(EVAL_MAX_IMAGES))
    client.set_model_version_tag(registered_model_name, mv.version, "score_threshold", str(SCORE_THRESHOLD))
    client.set_model_version_tag(registered_model_name, mv.version, "iou_threshold", str(IOU_THRESHOLD))
    client.set_model_version_tag(registered_model_name, mv.version, "mAP50_approx", str(mAP))

print("Evaluation logged to MLflow.")
print("Registered model:", registered_model_name)
print("Registered checkpoint:", BEST_CKPT_PATH.name)


2026/02/02 08:53:49 INFO alembic.runtime.plugins: setup plugin alembic.autogenerate.schemas
2026/02/02 08:53:49 INFO alembic.runtime.plugins: setup plugin alembic.autogenerate.tables
2026/02/02 08:53:49 INFO alembic.runtime.plugins: setup plugin alembic.autogenerate.types
2026/02/02 08:53:49 INFO alembic.runtime.plugins: setup plugin alembic.autogenerate.constraints
2026/02/02 08:53:49 INFO alembic.runtime.plugins: setup plugin alembic.autogenerate.defaults
2026/02/02 08:53:49 INFO alembic.runtime.plugins: setup plugin alembic.autogenerate.comments
2026/02/02 08:53:49 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2026/02/02 08:53:49 INFO mlflow.store.db.utils: Updating database tables
2026/02/02 08:53:49 INFO alembic.runtime.migration: Context impl SQLiteImpl.
2026/02/02 08:53:49 INFO alembic.runtime.migration: Will assume non-transactional DDL.
2026/02/02 08:53:50 INFO alembic.runtime.migration: Context impl SQLiteImpl.
2026/02/02 08:53:50 INFO alembic.runtime

Evaluation logged to MLflow.
Registered model: frcnn_coco_cpu_person_car_airplane
Registered checkpoint: best_frcnn_cpu_base_train_20260201_083448.pt
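The two URI strings built in the cell above follow simple patterns that can be inspected without MLflow installed. This sketch only reproduces the string formatting; the helper names, the example path, the run id, and the file name are all hypothetical. `PureWindowsPath` is used so the example behaves the same on any operating system.

```python
from pathlib import PurePath, PureWindowsPath


def sqlite_tracking_uri(db_path: PurePath) -> str:
    """Build a SQLite tracking URI; as_posix() turns backslashes into forward slashes."""
    return f"sqlite:///{db_path.as_posix()}"


def run_artifact_uri(run_id: str, artifact_path: str, file_name: str) -> str:
    """Build a runs:/ URI pointing at a file logged under artifact_path in a run."""
    return f"runs:/{run_id}/{artifact_path}/{file_name}"


# Placeholder values for illustration only.
tracking_uri = sqlite_tracking_uri(PureWindowsPath(r"C:\Users\example\project\mlflow.db"))
model_uri = run_artifact_uri("abc123", "model_ckpt", "best_example.pt")

print(tracking_uri)
print(model_uri)
```

The `artifact_path` segment has to match the value passed to `mlflow.log_artifact` for the checkpoint (`model_ckpt` in the cell above), otherwise the registered version would point at a location that holds no file.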
