# üßë‚Äçüíª Introducci√≥n a MLFLow (Parte I): Entrenamiento y Registro de Modelos de Regresi√≥n Log√≠stica.
Integrantes: Tob√≠as Romero **(2021214011)** y Jenifer Roa **(2022214006)**
---

## 1. Importing libraries.

In [11]:
import mlflow
from mlflow.tracking import MlflowClient
import mlflow.sklearn
from mlflow.models import infer_signature

import warnings
warnings.filterwarnings('ignore')

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
    classification_report,
    confusion_matrix,
    roc_curve
)

## 2. Initial MLflow setup.

In [3]:
mlflow.set_tracking_uri("file:./mlruns")

experiment_name = "Logistic_Regression_Classification"
mlflow.set_experiment(experiment_name)

print(f"✓ Tracking URI: {mlflow.get_tracking_uri()}")
print(f"Artifact location: {mlflow.get_experiment_by_name(experiment_name).artifact_location}")

2025/10/30 18:36:21 INFO mlflow.tracking.fluent: Experiment with name 'Logistic_Regression_Classification' does not exist. Creating a new experiment.

✓ Tracking URI: file:./mlruns
Artifact location: file:C:/Users/Usuario/PycharmProjects/MLFlowLaboratory/mlruns/995893522778280439
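The tracking URI above is relative (`file:./mlruns`), while the artifact location is printed as an absolute path. The sketch below shows, with the standard library only, how a relative directory resolves against the current working directory. It is an illustration of the idea and not MLflow's internal resolution code; the helper name `resolve_local_store` is hypothetical.

```python
from pathlib import Path


def resolve_local_store(relative_dir: str, base_dir: Path) -> str:
    """Return the absolute file URI of a store directory relative to base_dir.

    Hypothetical helper for illustration; MLflow performs its own resolution.
    """
    return (base_dir / relative_dir).resolve().as_uri()


# Resolve "mlruns" against the directory the notebook is running in
store_uri = resolve_local_store("mlruns", Path.cwd())
print(store_uri)
```

Because the result depends on the working directory, starting the notebook from a different folder would point MLflow at a different `mlruns` store.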


In [12]:
descripcion = """
Baseline experiment for classification with LogisticRegression.
Includes a baseline, metrics (accuracy, F1), and a comparison with normalization.
"""
tags_exp = {
    "owner": "Tobías Romero",
    "dataset": "Breast Cancer Wisconsin",
    "curso": "Laboratorio MLflow",
    "model_family": "LogisticRegression",
}

client = MlflowClient()
exp = mlflow.get_experiment_by_name(experiment_name)

# Restore the experiment if it was previously soft-deleted
if exp and getattr(exp, "lifecycle_stage", None) == "deleted":
    client.restore_experiment(exp.experiment_id)
    exp = mlflow.get_experiment_by_name(experiment_name)

# Create the experiment with its tags, or update the tags if it already exists
if exp is None:
    exp_id = client.create_experiment(experiment_name, tags=tags_exp)
else:
    exp_id = exp.experiment_id
    for k, v in tags_exp.items():
        client.set_experiment_tag(exp_id, k, v)

# The "mlflow.note.content" tag holds the experiment description
client.set_experiment_tag(exp_id, "mlflow.note.content", descripcion)

exp_actualizado = mlflow.get_experiment(exp_id)
print("✓ Experiment:", exp_actualizado.name, "| ID:", exp_actualizado.experiment_id)
print("✓ Experiment tags:", exp_actualizado.tags)
print("✓ Description:", exp_actualizado.tags.get("mlflow.note.content", "(no description)"))

✓ Experiment: Logistic_Regression_Classification | ID: 995893522778280439
✓ Experiment tags: {'curso': 'Laboratorio MLflow', 'dataset': 'Breast Cancer Wisconsin', 'mlflow.experimentKind': 'custom_model_development', 'mlflow.note.content': '\nBaseline experiment for classification with LogisticRegression.\nIncludes a baseline, metrics (accuracy, F1), and a comparison with normalization.\n', 'model_family': 'LogisticRegression', 'owner': 'Tobías Romero'}
✓ Description: 
Baseline experiment for classification with LogisticRegression.
Includes a baseline, metrics (accuracy, F1), and a comparison with normalization.



## 3. Loading and initial exploration of the data.

In [4]:
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name='target')

In [5]:
print("Dataset: Breast Cancer Wisconsin")
print(f"Number of samples: {X.shape[0]}")
print(f"Number of features: {X.shape[1]}")
print(f"Classes: {data.target_names}")
print(f"Class distribution:\n{y.value_counts()}")
print()

Dataset: Breast Cancer Wisconsin
Number of samples: 569
Number of features: 30
Classes: ['malignant' 'benign']
Class distribution:
target
1    357
0    212
Name: count, dtype: int64
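The classes are moderately imbalanced, so it helps to know how well a trivial classifier would score before reading any accuracy figure. The following is a minimal sketch that uses only the counts printed above; the label mapping (0 = malignant, 1 = benign) follows the order of `data.target_names`.

```python
# Class counts taken from the value_counts() output above
class_counts = {1: 357, 0: 212}  # 1 = benign, 0 = malignant

total = sum(class_counts.values())
proportions = {label: count / total for label, count in class_counts.items()}

# Accuracy of a trivial classifier that always predicts the most frequent class
majority_label = max(class_counts, key=class_counts.get)
majority_baseline = class_counts[majority_label] / total

print(f"Total samples: {total}")
for label, share in proportions.items():
    print(f"Class {label}: {share:.4f}")
print(f"Majority-class baseline accuracy: {majority_baseline:.4f}")
```

Any model worth registering should clearly beat this baseline, which is also why the later cells report precision, recall, F1, and ROC AUC alongside accuracy.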



In [6]:
print("First rows of the dataset:")
print(X.head())
print()
print("Descriptive statistics:")
print(X.describe())
print()

First rows of the dataset:
   mean radius  mean texture  mean perimeter  mean area  mean smoothness  \
0        17.99         10.38          122.80     1001.0          0.11840   
1        20.57         17.77          132.90     1326.0          0.08474   
2        19.69         21.25          130.00     1203.0          0.10960   
3        11.42         20.38           77.58      386.1          0.14250   
4        20.29         14.34          135.10     1297.0          0.10030   

   mean compactness  mean concavity  mean concave points  mean symmetry  \
0           0.27760          0.3001              0.14710         0.2419   
1           0.07864          0.0869              0.07017         0.1812   
2           0.15990          0.1974              0.12790         0.2069   
3           0.28390          0.2414              0.10520         0.2597   
4           0.13280          0.1980              0.10430         0.1809   

   mean fractal dimension  ...  worst radius  worst texture  wor

## 3.1 Splitting the dataset.

In [7]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,
    stratify=y
)

print(f"Training set: {X_train.shape[0]} samples")
print(f"Test set: {X_test.shape[0]} samples")
print()

Training set: 455 samples
Test set: 114 samples
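The split sizes above can be checked by hand. The sketch below reproduces the arithmetic with the standard library: the test size is rounded up, and the per-class allocation uses a largest-remainder rounding. That rounding rule is an assumption made for illustration; scikit-learn's internal procedure may break ties differently.

```python
import math

n_samples = 569
test_size = 0.2
class_counts = {0: 212, 1: 357}  # from the class distribution printed earlier

# A float test_size is rounded up to a whole number of test samples
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# Ideal (fractional) per-class share of the test set
ideal = {c: n_test * k / n_samples for c, k in class_counts.items()}

# Largest-remainder rounding: floor every share, then hand the leftover
# samples to the classes with the largest fractional parts
test_per_class = {c: math.floor(v) for c, v in ideal.items()}
leftover = n_test - sum(test_per_class.values())
by_remainder = sorted(ideal, key=lambda c: ideal[c] - math.floor(ideal[c]), reverse=True)
for c in by_remainder[:leftover]:
    test_per_class[c] += 1

print(n_train, n_test, test_per_class)
```

With `stratify=y`, the test set keeps roughly the same class ratio as the full dataset, which is what makes the per-class support in the later classification reports comparable across runs.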



## 4. Helper function for evaluation and visualization.

In [9]:
def evaluate_and_visualize_model(model, X_test, y_test, run_name):
    """
    Evaluate the model on the test set and build the diagnostic plots.

    Returns a dict of metrics and the matplotlib figure.
    """
    # Predictions: hard labels and probability of the positive class (label 1)
    y_pred = model.predict(X_test)
    y_pred_proba = model.predict_proba(X_test)[:, 1]

    # Compute metrics
    metrics = {
        'accuracy': accuracy_score(y_test, y_pred),
        'precision': precision_score(y_test, y_pred),
        'recall': recall_score(y_test, y_pred),
        'f1_score': f1_score(y_test, y_pred),
        'roc_auc': roc_auc_score(y_test, y_pred_proba)
    }

    print(f"\nMetrics for model ({run_name}):")
    print("-" * 50)
    for metric_name, value in metrics.items():
        print(f"{metric_name}: {value:.4f}")

    # Classification report
    print("\nClassification report:")
    print(classification_report(y_test, y_pred, target_names=['Malignant', 'Benign']))

    # Figure with two panels: confusion matrix and ROC curve
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))

    # Confusion matrix
    cm = confusion_matrix(y_test, y_pred)
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=axes[0],
                xticklabels=['Malignant', 'Benign'],
                yticklabels=['Malignant', 'Benign'])
    axes[0].set_title(f'Confusion Matrix - {run_name}')
    axes[0].set_ylabel('Actual')
    axes[0].set_xlabel('Predicted')

    # ROC curve
    fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
    axes[1].plot(fpr, tpr, color='darkorange', lw=2,
                 label=f'ROC curve (AUC = {metrics["roc_auc"]:.2f})')
    axes[1].plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', label='Random')
    axes[1].set_xlim([0.0, 1.0])
    axes[1].set_ylim([0.0, 1.05])
    axes[1].set_xlabel('False Positive Rate')
    axes[1].set_ylabel('True Positive Rate')
    axes[1].set_title(f'ROC Curve - {run_name}')
    axes[1].legend(loc="lower right")
    axes[1].grid(alpha=0.3)

    plt.tight_layout()

    return metrics, fig
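To make the metrics computed by the helper concrete, here is a pure-Python sketch of what each number means. It is a re-derivation for illustration on made-up toy data, not scikit-learn's implementation; the function names `binary_metrics` and `roc_auc` are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration, not taken from the notebook)
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1, 0.55]
y_pred = [int(s >= 0.5) for s in y_score]  # threshold at 0.5

toy_metrics = binary_metrics(y_true, y_pred)
toy_auc = roc_auc(y_true, y_score)
print(toy_metrics, toy_auc)
```

Note that accuracy, precision, recall, and F1 depend on the 0.5 threshold, while ROC AUC depends only on how the scores rank positives against negatives.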

## 5. First experiment.

In [13]:
with mlflow.start_run(run_name="logistic_regression_default") as run:
    try:
        # Attach the target column. X_train and y_train share the same index,
        # so y must not be reset on its own (that would misalign the rows).
        train_df = X_train.copy()
        train_df["target"] = y_train

        test_df = X_test.copy()
        test_df["target"] = y_test

        # Build MLflow Dataset objects from pandas
        ds_train = mlflow.data.from_pandas(
            train_df, source="sklearn.breast_cancer", name="breast_cancer_train_v1"
        )
        ds_test = mlflow.data.from_pandas(
            test_df, source="sklearn.breast_cancer", name="breast_cancer_test_v1"
        )

        # Log the inputs only if the datasets were created successfully
        mlflow.log_input(ds_train, context="training")
        mlflow.log_input(ds_test, context="test")

    except Exception as e:
        # Fallback: record the dataset name as a plain tag
        mlflow.set_tag("dataset", "Breast Cancer Wisconsin")
        print("Warning: could not use mlflow.data; set the 'dataset' tag instead. Error:", e)

    print(f"\nRun ID: {run.info.run_id}")
    print("Run Name: logistic_regression_default")

    # Build the pipeline
    print("\nCreating preprocessing and model pipeline...")

    # Define hyperparameters
    hyperparameters = {
        'C': 1.0,
        'penalty': 'l2',
        'solver': 'lbfgs',
        'max_iter': 1000,
        'random_state': 42
    }

    # Create the pipeline: scaling followed by the classifier
    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('classifier', LogisticRegression(**hyperparameters))
    ])

    # Train the model
    print("Training the model...")
    pipeline.fit(X_train, y_train)
    print("Model trained successfully")

    # Evaluate the model
    metrics, fig = evaluate_and_visualize_model(
        pipeline, X_test, y_test,
        "Default Model"
    )

    # Log to MLflow

    # Log hyperparameters
    print("\nLogging hyperparameters to MLflow...")
    mlflow.log_params(hyperparameters)

    # Log metrics
    print("Logging metrics to MLflow...")
    mlflow.log_metrics(metrics)

    # Log additional information
    mlflow.log_param("test_size", 0.2)
    mlflow.log_param("dataset", "breast_cancer")
    mlflow.log_param("n_features", X_train.shape[1])
    mlflow.log_param("n_samples_train", X_train.shape[0])
    mlflow.log_param("n_samples_test", X_test.shape[0])

    # Save the visualization as an artifact (use the returned figure explicitly)
    print("Saving visualizations...")
    fig.savefig("confusion_matrix_roc_default.png", dpi=100, bbox_inches='tight')
    mlflow.log_artifact("confusion_matrix_roc_default.png")
    plt.close(fig)

    # Save the model with its signature
    print("Saving model to MLflow...")
    signature = infer_signature(X_train, pipeline.predict(X_train))

    mlflow.sklearn.log_model(
        sk_model=pipeline,
        artifact_path="model",
        signature=signature,
        input_example=X_train.iloc[:5],
        registered_model_name="breast_cancer_classifier_v1"
    )

    # Add tags
    print("Adding tags and metadata...")
    mlflow.set_tags({
        "model_type": "Logistic Regression",
        "framework": "scikit-learn",
        "dataset": "Breast Cancer Wisconsin",
        "preprocessing": "StandardScaler",
        "author": "Tobías Romero",
        "version": "1.0",
        "purpose": "baseline_model"
    })

    # Add the run description
    mlflow.set_tag("mlflow.note.content",
                   "Baseline logistic regression model with default parameters. "
                   "Uses L2 regularization with C=1.0 and the lbfgs solver. "
                   "This model serves as the reference point for future comparisons.")

    print("\nExperiment 1 completed and logged to MLflow")
    print(f"Run ID: {run.info.run_id}")

print()


Run ID: d4334c10258c462b895f1787e2146b1e
Run Name: logistic_regression_default

Creating preprocessing and model pipeline...
Training the model...
Model trained successfully

Metrics for model (Default Model):
--------------------------------------------------
accuracy: 0.9825
precision: 0.9861
recall: 0.9861
f1_score: 0.9861
roc_auc: 0.9954

Classification report:
              precision    recall  f1-score   support

   Malignant       0.98      0.98      0.98        42
      Benign       0.99      0.99      0.99        72

    accuracy                           0.98       114
   macro avg       0.98      0.98      0.98       114
weighted avg       0.98      0.98      0.98       114


Logging hyperparameters to MLflow...
Logging metrics to MLflow...
Saving visualizations...

Saving model to MLflow...
Adding tags and metadata...

Experiment 1 completed and logged to MLflow
Run ID: d4334c10258c462b895f1787e2146b1e



Registered model 'breast_cancer_classifier_v1' already exists. Creating a new version of this model...
Created version '2' of model 'breast_cancer_classifier_v1'.


## 5.1 Second experiment.
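
This run lowers `C` from 1.0 to 0.1. In scikit-learn, `C` is the inverse of the regularization strength. For the L2 penalty the objective is commonly written as follows, with labels recoded to $y_i \in \{-1, +1\}$, weights $w$, and intercept $c$. This is a sketch of the standard formulation, not a statement about solver internals:

$$
\min_{w,\,c}\; \frac{1}{2}\, w^\top w \;+\; C \sum_{i=1}^{n} \log\!\left(1 + \exp\!\left(-y_i \left(x_i^\top w + c\right)\right)\right)
$$

Reducing `C` by a factor of ten shrinks the weight of the data-fit term relative to the penalty, which pulls the coefficients toward zero.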

In [15]:
with mlflow.start_run(run_name="logistic_regression_optimized") as run:
    try:
        # Attach the target column. X_train and y_train share the same index,
        # so y must not be reset on its own (that would misalign the rows).
        train_df = X_train.copy()
        train_df["target"] = y_train

        test_df = X_test.copy()
        test_df["target"] = y_test

        # Build MLflow Dataset objects from pandas
        ds_train = mlflow.data.from_pandas(
            train_df, source="sklearn.breast_cancer", name="breast_cancer_train_v1"
        )
        ds_test = mlflow.data.from_pandas(
            test_df, source="sklearn.breast_cancer", name="breast_cancer_test_v1"
        )

        # Log the inputs only if the datasets were created successfully
        mlflow.log_input(ds_train, context="training")
        mlflow.log_input(ds_test, context="test")
    except Exception as e:
        # Fallback: record the dataset name as a plain tag
        mlflow.set_tag("dataset", "Breast Cancer Wisconsin")
        print("Warning: could not use mlflow.data; set the 'dataset' tag instead. Error:", e)

    print(f"\nRun ID: {run.info.run_id}")
    print("Run Name: logistic_regression_optimized")

    # Stronger regularization (smaller C) and the SAGA solver
    hyperparameters_v2 = {
        'C': 0.1,
        'penalty': 'l2',
        'solver': 'saga',
        'max_iter': 2000,
        'random_state': 42
    }

    print("\nCreating pipeline...")
    pipeline_v2 = Pipeline([
        ('scaler', StandardScaler()),
        ('classifier', LogisticRegression(**hyperparameters_v2))
    ])

    # Train the model
    print("Training the optimized model...")
    pipeline_v2.fit(X_train, y_train)
    print("Model trained successfully")

    # Evaluate the model
    metrics_v2, fig_v2 = evaluate_and_visualize_model(
        pipeline_v2, X_test, y_test,
        "Optimized Model"
    )

    # Log to MLflow
    print("\nLogging hyperparameters to MLflow...")
    mlflow.log_params(hyperparameters_v2)

    print("Logging metrics to MLflow...")
    mlflow.log_metrics(metrics_v2)

    mlflow.log_param("test_size", 0.2)
    mlflow.log_param("dataset", "breast_cancer")
    mlflow.log_param("n_features", X_train.shape[1])
    mlflow.log_param("n_samples_train", X_train.shape[0])
    mlflow.log_param("n_samples_test", X_test.shape[0])

    # Save the visualization as an artifact (use the returned figure explicitly)
    print("Saving visualizations...")
    fig_v2.savefig("confusion_matrix_roc_optimized.png", dpi=100, bbox_inches='tight')
    mlflow.log_artifact("confusion_matrix_roc_optimized.png")
    plt.close(fig_v2)

    # Save the model with its signature
    print("Saving model to MLflow...")
    signature = infer_signature(X_train, pipeline_v2.predict(X_train))

    mlflow.sklearn.log_model(
        sk_model=pipeline_v2,
        artifact_path="model",
        signature=signature,
        input_example=X_train.iloc[:5],
        registered_model_name="breast_cancer_classifier_v2"
    )

    print("Adding tags and metadata...")
    mlflow.set_tags({
        "model_type": "Logistic Regression",
        "framework": "scikit-learn",
        "dataset": "Breast Cancer Wisconsin",
        "preprocessing": "StandardScaler",
        "author": "Tobías Romero",
        "version": "2.0",
        "purpose": "optimized_model",
        "optimization": "increased_regularization"
    })

    # Add the run description
    mlflow.set_tag("mlflow.note.content",
                   "Optimized model with stronger regularization (C=0.1) and the SAGA solver. "
                   "The goal is to reduce overfitting and improve the model's generalization. "
                   "The results are compared against the baseline model.")

    print("\nExperiment 2 completed and logged to MLflow")
    print(f"Run ID: {run.info.run_id}")

print()


Run ID: 87867caebb90452dbc9356fa857eac55
Run Name: logistic_regression_optimized

Creating pipeline...
Training the optimized model...
Model trained successfully

Metrics for model (Optimized Model):
--------------------------------------------------
accuracy: 0.9737
precision: 0.9726
recall: 0.9861
f1_score: 0.9793
roc_auc: 0.9957

Classification report:
              precision    recall  f1-score   support

   Malignant       0.98      0.95      0.96        42
      Benign       0.97      0.99      0.98        72

    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114


Logging hyperparameters to MLflow...
Logging metrics to MLflow...
Saving visualizations...

Saving model to MLflow...
Adding tags and metadata...

Experiment 2 completed and logged to MLflow
Run ID: 87867caebb90452dbc9356fa857eac55



Successfully registered model 'breast_cancer_classifier_v2'.
Created version '1' of model 'breast_cancer_classifier_v2'.
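Since the second run is meant to be compared against the baseline, the sketch below puts the two sets of test metrics side by side. The numbers are copied from the run outputs above; the comparison itself is plain Python and does not query MLflow.

```python
# Test-set metrics copied from the two run outputs above
baseline = {"accuracy": 0.9825, "precision": 0.9861, "recall": 0.9861,
            "f1_score": 0.9861, "roc_auc": 0.9954}
optimized = {"accuracy": 0.9737, "precision": 0.9726, "recall": 0.9861,
             "f1_score": 0.9793, "roc_auc": 0.9957}

# Per-metric difference (optimized minus baseline)
deltas = {name: round(optimized[name] - baseline[name], 4) for name in baseline}

for name, delta in deltas.items():
    print(f"{name:<10} baseline={baseline[name]:.4f} "
          f"optimized={optimized[name]:.4f} delta={delta:+.4f}")

# With 114 test samples, one prediction is worth 1/114 of accuracy
one_sample = 1 / 114
accuracy_gap_in_samples = round((baseline["accuracy"] - optimized["accuracy"]) / one_sample)
print(f"Accuracy gap in test samples: {accuracy_gap_in_samples}")
```

On this split the run labeled "optimized" scores slightly lower on accuracy, precision, and F1, ties on recall, and is marginally higher on ROC AUC. The accuracy gap corresponds to a single test sample, so the comparison is best read as inconclusive rather than as evidence that either setting generalizes better.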
