# üìì MLflow - Ejemplos b√°sicos
# 
En esta notebook veremos ejemplos sencillos de:
- MLflow Tracking
- MLflow Projects
- MLflow Models
- MLflow Model Registry

## Instalaci√≥n y configuraci√≥n inicial
Instalaci√≥n de MLflow (descomentar si no lo tienes instalado)
!pip install mlflow


In [18]:
# ========================
# MLflow - Ejemplos b√°sicos
# ========================

# Instalaci√≥n (si es necesario)
# !pip install mlflow

import mlflow
import mlflow.sklearn
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
import os
import yaml
import pandas as pd
# Crear carpeta para artefactos si no existe
os.makedirs("outputs", exist_ok=True)


## üß© 1. MLflow Tracking
 
### üìÑ Descripci√≥n
Con **MLflow Tracking** registramos:
 - Par√°metros (por ejemplo, hiperpar√°metros de un modelo).
 - M√©tricas (precisi√≥n, error, R2, etc.).
 - Artefactos (modelos entrenados, im√°genes, datasets).
 
Todo queda guardado en un historial organizado para an√°lisis y comparaci√≥n.


## Nombre del proyecto y url de MLFlow local

## Datos de entrenamiento

In [19]:
# Datos ficticios
X, y = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 2. Guardar los datasets como CSV
os.makedirs("data", exist_ok=True)
pd.DataFrame(X_train, columns=["x"]).to_csv("data/X_train.csv", index=False)
pd.DataFrame(X_test, columns=["x"]).to_csv("data/X_test.csv", index=False)
pd.DataFrame(y_train, columns=["y"]).to_csv("data/y_train.csv", index=False)
pd.DataFrame(y_test, columns=["y"]).to_csv("data/y_test.csv", index=False)


## Experimento 1

In [None]:
mlflow.set_tracking_uri("http://localhost:5050")  # o la IP donde est√© el Docker
mlflow.set_experiment("Experimento 1")

<Experiment: artifact_location='/mlflow/mlruns/10', creation_time=1747185790994, experiment_id='10', last_update_time=1747185790994, lifecycle_stage='active', name='Experimento 1', tags={}>

In [5]:
# Ejemplo sencillo de Tracking

#mlflow.end_run()
# Empezar una corrida (run)
with mlflow.start_run(run_name="linear_regression_example 2"):
    # Modelo
    model = LinearRegression()
    model.fit(X_train, y_train)
    
    # Par√°metro (en este caso no hay hiperpar√°metros, as√≠ que lo simulamos)
    mlflow.log_param("model_type", "LinearRegression")
    
    # M√©trica
    score = model.score(X_test, y_test)
    mlflow.log_metric("r2_score", score)
    
    # Guardar el modelo
    mlflow.sklearn.log_model(model, artifact_path="model")
    
    print(f"Modelo guardado con R2: {score:.2f}")




Modelo guardado con R2: 1.00
üèÉ View run linear_regression_example 2 at: http://localhost:5000/#/experiments/10/runs/7cfb317442bc4315a9651f90d8ca81a6
üß™ View experiment at: http://localhost:5000/#/experiments/10


In [21]:
from sklearn.linear_model import Ridge

with mlflow.start_run(run_name="ridge_regression_example 3"):
    alpha = 0.7
    model = Ridge(alpha=alpha)
    model.fit(X_train, y_train)

    mlflow.log_param("model_type", "Ridge")
    mlflow.log_param("alpha", alpha)

    score = model.score(X_test, y_test)
    mlflow.log_metric("r2_score", score)

    mlflow.sklearn.log_model(model, artifact_path="model")
    print(f"Modelo Ridge guardado con R2: {score:.2f}")
    X_train_path = "data/X_train.csv"
    X_test_path = "data/X_test.csv"
    y_train_path = "data/y_train.csv"
    y_test_path = "data/y_test.csv"
    mlflow.log_artifact(X_train_path, artifact_path="datasets")
    mlflow.log_artifact(X_test_path, artifact_path="datasets")
    mlflow.log_artifact(y_train_path, artifact_path="datasets")
    mlflow.log_artifact(y_test_path, artifact_path="datasets")

mlflow.sklearn.log_model(model, "model", input_example=X_test[:1])
    



Modelo Ridge guardado con R2: 1.00
üèÉ View run ridge_regression_example 3 at: http://localhost:5000/#/experiments/10/runs/52756365581e406f90847cfd2e6c03d8
üß™ View experiment at: http://localhost:5000/#/experiments/10


Downloading artifacts: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 7/7 [00:00<00:00, 1750.86it/s]


<mlflow.models.model.ModelInfo at 0x28def876e20>

## Experimento 2

In [22]:
mlflow.set_experiment("Experimento 2")

<Experiment: artifact_location='/mlflow/mlruns/11', creation_time=1747185998181, experiment_id='11', last_update_time=1747185998181, lifecycle_stage='active', name='Experimento 2', tags={}>

In [23]:
import mlflow
import numpy as np

mlflow.end_run()  
with mlflow.start_run(run_name="training_with_curve"):
    for epoch in range(10):
        # Simulaci√≥n de loss que baja
        loss = np.exp(-epoch / 5)

        # Logue√°s la m√©trica con `step` para que MLflow genere la curva
        mlflow.log_metric("loss", loss, step=epoch)


üèÉ View run lyrical-steed-120 at: http://localhost:5000/#/experiments/10/runs/27e00108940b411daf1e34619414d9a3
üß™ View experiment at: http://localhost:5000/#/experiments/10
üèÉ View run training_with_curve at: http://localhost:5000/#/experiments/11/runs/e3be1cd6b71547799d64f117ef7aa324
üß™ View experiment at: http://localhost:5000/#/experiments/11


## Experimento 3

In [24]:
mlflow.set_experiment("Experimento 3")

mlflow.set_tag("model_name", "RandomForest")
mlflow.set_tag("experiment", "baseline")

models = {"LinearRegression": LinearRegression(), "Ridge": Ridge(alpha=0.5)}



In [25]:
mlflow.end_run()
for name, model in models.items():
    with mlflow.start_run(run_name=name):
        mlflow.set_tag("model_name", name)

        for epoch in range(10):
            loss = np.exp(-epoch / 5)  # Ejemplo de p√©rdida
            mlflow.log_metric("loss", loss, step=epoch)

        mlflow.sklearn.log_model(model, artifact_path="model")



üèÉ View run melodic-koi-59 at: http://localhost:5000/#/experiments/12/runs/2723526c437d4545a3523ca99314fa90
üß™ View experiment at: http://localhost:5000/#/experiments/12




üèÉ View run LinearRegression at: http://localhost:5000/#/experiments/12/runs/7e7e7eda5ed64ff587742b041ab1ccf0
üß™ View experiment at: http://localhost:5000/#/experiments/12




üèÉ View run Ridge at: http://localhost:5000/#/experiments/12/runs/338b62080dcf4d6492ef3ea3ae6e3b2e
üß™ View experiment at: http://localhost:5000/#/experiments/12


## Seleccionar el mejor modelo

In [26]:
mlflow.set_experiment("Seleccionar mejor modelo 2")

import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Datos ficticios
X, y = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Modelos a evaluar
models = {"LinearRegression": LinearRegression(), "Ridge": Ridge(alpha=0.5)}

best_model = None
best_score = np.inf  # Mejor score encontrado (por ejemplo, R2 o MSE)

# Comenzamos los experimentos para cada modelo
for name, model in models.items():
    with mlflow.start_run(run_name=name):
        mlflow.set_tag("model_name", name)

        # Entrenamiento
        model.fit(X_train, y_train)

        # Predicciones y m√©tricas
        y_pred = model.predict(X_test)
        score = mean_squared_error(y_test, y_pred)  # Usamos MSE como ejemplo
        
        # Registro de par√°metros y m√©tricas
        mlflow.log_param("model_type", name)
        mlflow.log_metric("mse", score)
        
        # Guardamos el modelo
        mlflow.sklearn.log_model(model, artifact_path="model")
        
        # Evaluamos cu√°l es el mejor modelo
        if score < best_score:  # MSE m√°s bajo es mejor
            best_score = score
            best_model = model
            mlflow.log_param("best_model", True)  # Marcamos el modelo ganador

# Ahora puedes registrar el mejor modelo global
if best_model:
    with mlflow.start_run(run_name="Best_Model"):
        mlflow.log_param("best_model_name", best_model.__class__.__name__)
        mlflow.sklearn.log_model(best_model, artifact_path="best_model")
        print( best_model.__class__.__name__)
best_model_to_save=best_model



üèÉ View run LinearRegression at: http://localhost:5000/#/experiments/15/runs/6ef2323d7e4b4bb0baa811bd9b70ca5f
üß™ View experiment at: http://localhost:5000/#/experiments/15




üèÉ View run Ridge at: http://localhost:5000/#/experiments/15/runs/618c7ee328af4d0eb47103804258207d
üß™ View experiment at: http://localhost:5000/#/experiments/15




LinearRegression
üèÉ View run Best_Model at: http://localhost:5000/#/experiments/15/runs/f247da6cb4764815ad27f2afffaa63fc
üß™ View experiment at: http://localhost:5000/#/experiments/15


## HYPERPARAMETER TUNING

In [27]:
import mlflow
import mlflow.sklearn
from mlflow import log_param, log_metric, log_artifact

from sklearn.datasets import load_diabetes
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

import json
import joblib

# Cargar los datos
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Definir el modelo y el espacio de b√∫squeda
model = Ridge()
param_grid = {
    'alpha': [0.1, 1.0, 10.0],
    'solver': ['auto', 'svd']
}

# Configurar GridSearchCV
grid_search = GridSearchCV(model, param_grid, cv=3, scoring="neg_mean_squared_error", return_train_score=True)

# Iniciar experimento en MLflow
mlflow.set_experiment("Ridge Regression GridSearch")
with mlflow.start_run(run_name="GridSearchCV") as run:
    # Guardar el grid como par√°metro
    mlflow.log_param("param_grid", param_grid)
    
    # Guardar el grid como archivo JSON
    with open("param_grid.json", "w") as f:
        json.dump(param_grid, f)
    mlflow.log_artifact("param_grid.json")
    
    # Ejecutar GridSearch
    grid_search.fit(X_train, y_train)
        
    # Loguear cada intento como sub-run
    for i, params in enumerate(grid_search.cv_results_["params"]):
        with mlflow.start_run(run_name=f"trial_{i}", nested=True):
            mlflow.log_params(params)
            mlflow.log_metric("mean_test_score", grid_search.cv_results_["mean_test_score"][i])
            mlflow.log_metric("mean_train_score", grid_search.cv_results_["mean_train_score"][i])
    
    # Evaluar el mejor modelo
    best_model = grid_search.best_estimator_
    y_pred = best_model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mlflow.log_metric("test_mse", mse)
    
    # Guardar el modelo y objeto grid_search completo
    mlflow.sklearn.log_model(best_model, "best_model")
    joblib.dump(grid_search, "grid_search.pkl")
    mlflow.log_artifact("grid_search.pkl")


üèÉ View run trial_0 at: http://localhost:5000/#/experiments/14/runs/c1250daca7714a0695e29ae989ceb82b
üß™ View experiment at: http://localhost:5000/#/experiments/14
üèÉ View run trial_1 at: http://localhost:5000/#/experiments/14/runs/85a41343c09b4f569d1b0da0f35bf3be
üß™ View experiment at: http://localhost:5000/#/experiments/14
üèÉ View run trial_2 at: http://localhost:5000/#/experiments/14/runs/ec4c45717c0b422bac64ebf5f565a642
üß™ View experiment at: http://localhost:5000/#/experiments/14
üèÉ View run trial_3 at: http://localhost:5000/#/experiments/14/runs/facd69bbb7554f0390b84c9de0b92634
üß™ View experiment at: http://localhost:5000/#/experiments/14
üèÉ View run trial_4 at: http://localhost:5000/#/experiments/14/runs/7146d18a507245b3b7e39123ed8fc074
üß™ View experiment at: http://localhost:5000/#/experiments/14
üèÉ View run trial_5 at: http://localhost:5000/#/experiments/14/runs/f62b2e3420c1415ca1d44a32bfd40033
üß™ View experiment at: http://localhost:5000/#/experiments/1



üèÉ View run GridSearchCV at: http://localhost:5000/#/experiments/14/runs/29a9327e163e4421a10bcefc24f7b677
üß™ View experiment at: http://localhost:5000/#/experiments/14


## üß© 2. MLflow Projects
 
### üìÑ Descripci√≥n
**MLflow Projects** define un est√°ndar para empaquetar proyectos de ML, haci√©ndolos:
 - Reproducibles (cualquiera puede correrlo igual).
 - Ejecutables localmente o en la nube.
 - Versionables junto al c√≥digo.
 
<!-- Utiliza un archivo `MLproject` (YAML) para describir dependencias y comandos de entrada. -->


In [23]:
# Creamos un archivo MLproject para definir el proyecto

project_yaml = """
name: simple_linear_regression
conda_env: conda.yaml

entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
    command: "python train.py --alpha {alpha}"
"""

with open("MLproject", "w") as f:
    f.write(project_yaml)

# Creamos un entorno conda de ejemplo
conda_yaml = """
name: simple-mlflow-env
dependencies:
  - python=3.8
  - scikit-learn
  - pip
  - pip:
      - mlflow
"""

with open("conda.yaml", "w") as f:
    f.write(conda_yaml)

print("Archivo MLproject y conda.yaml creados üéØ")


Archivo MLproject y conda.yaml creados üéØ


## üß© 3. MLflow Models
 
### üìÑ Descripci√≥n
**MLflow Models** permite:
 - Guardar modelos entrenados en formatos est√°ndar.
 - Cargarlos f√°cilmente para predicci√≥n o despliegue.
 - Exportarlos a m√∫ltiples plataformas (Docker, REST API, mobile).

## üß© 4. MLflow Model Registry
 
### üìÑ Descripci√≥n
**MLflow Model Registry** gestiona:
 - Versiones de modelos.
 - Etapas del ciclo de vida (Staging, Production, Archived).
 - Aprobaciones y revisiones de modelos.
 
 **Importante:** Para usarlo realmente se necesita un Tracking Server conectado a una base de datos.
 Aqu√≠ simulamos un ejemplo sencillo en local.

In [35]:
import mlflow
from mlflow.tracking import MlflowClient

# Paso 1: Especificar el run_id y la ruta del modelo dentro del run
run_id = "b420b6df5a92430f801f5b9804d804dd"
model_path = "best_model"  # o como lo hayas llamado: "model", "sk_model", etc.

# Paso 2: Registrar el modelo
model_uri = f"runs:/{run_id}/{model_path}"
model_name = "ridge_model_v1"

mlflow.register_model(model_uri=model_uri, name=model_name)
model_name

Registered model 'ridge_model_v1' already exists. Creating a new version of this model...
2025/05/14 21:50:34 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: ridge_model_v1, version 8
Created version '8' of model 'ridge_model_v1'.


'ridge_model_v1'

In [None]:
# Buscar si el experimento ya existe
experiment_name = "Experimento 1"

experiment = mlflow.get_experiment_by_name(experiment_name)

if experiment is None:
    experiment_id = mlflow.create_experiment(experiment_name)
    print(f"Experimento '{experiment_name}' creado üéØ")
else:
    experiment_id = experiment.experiment_id
    print(f"Experimento '{experiment_name}' ya exist√≠a, usando id {experiment_id} ‚úîÔ∏è")

# Seteamos el experimento como activo
mlflow.set_experiment(experiment_name)

# Registramos el modelo
with mlflow.start_run(run_name="registry_test_run"):
    mlflow.sklearn.log_model(model, "model", registered_model_name="LinearRegressionModel")



Traceback (most recent call last):
  File "c:\Users\guill\OneDrive\Documentos\simplegit\ITBA\ITBA\lib\site-packages\mlflow\store\tracking\file_store.py", line 329, in search_experiments
    exp = self._get_experiment(exp_id, view_type)
  File "c:\Users\guill\OneDrive\Documentos\simplegit\ITBA\ITBA\lib\site-packages\mlflow\store\tracking\file_store.py", line 427, in _get_experiment
    meta = FileStore._read_yaml(experiment_dir, FileStore.META_DATA_FILE_NAME)
  File "c:\Users\guill\OneDrive\Documentos\simplegit\ITBA\ITBA\lib\site-packages\mlflow\store\tracking\file_store.py", line 1373, in _read_yaml
    return _read_helper(root, file_name, attempts_remaining=retries)
  File "c:\Users\guill\OneDrive\Documentos\simplegit\ITBA\ITBA\lib\site-packages\mlflow\store\tracking\file_store.py", line 1366, in _read_helper
    result = read_yaml(root, file_name)
  File "c:\Users\guill\OneDrive\Documentos\simplegit\ITBA\ITBA\lib\site-packages\mlflow\utils\file_utils.py", line 310, in read_yaml
    r

Experimento 'Experimento 1' ya exist√≠a, usando id 921763340117685197 ‚úîÔ∏è


Registered model 'LinearRegressionModel' already exists. Creating a new version of this model...
Created version '3' of model 'LinearRegressionModel'.


## Buscar modelos

In [30]:
import mlflow
# 
# Cliente de MLflow
client = mlflow.tracking.MlflowClient()
# B√∫squeda de modelos con filtros
models = client.search_registered_models(filter_string="name LIKE '%'")
for model in models:
    print(model.name)

LinearRegressionModel
ModeloGanador
linear_regression_with_hyperparameter
ridge_model_v1



## Usar el modelo

In [None]:
# Datos ficticios
X, y = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
 
import mlflow

# Cargar desde una etapa espec√≠fica (si usas ciclo de vida de MLflow)
loaded_model = mlflow.pyfunc.load_model("models:/ridge_model_v1/7")  # Otras opciones: staging, archived

# Usar el modelo para predicciones
data= X_train[[0]]
predictions = loaded_model.predict(data)
predictions

array([3.82860968])

In [37]:
%%writefile app.py

import uvicorn
import mlflow
import numpy as np
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List

# Crear la aplicaci√≥n FastAPI
app = FastAPI(title="Modelo ML API", description="API para servir predicciones del modelo de MLflow")

# Ac√° va la configuraci√≥n de CORS
from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # O limit√° a ["http://localhost:8000"] si quer√©s
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Cargar el modelo al iniciar la aplicaci√≥n
# Reemplaza "models:/nombre_modelo/version" con tu ruta real
model = mlflow.pyfunc.load_model("models:/ridge_model_v1/8")

# Definir el esquema de la solicitud
class PredictionRequest(BaseModel):
    features: List[List[float]]

# Definir el esquema de la respuesta
class PredictionResponse(BaseModel):
    predictions: List

@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    try:
        # Convertir las caracter√≠sticas a un array numpy
        features = np.array(request.features)
        # Realizar la predicci√≥n
        predictions = model.predict(features).tolist()
        # Devolver las predicciones
        return PredictionResponse(predictions=predictions)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/health")
def health():
    return {"status": "ok"}



Overwriting app.py


In [None]:
import uvicorn
import mlflow
import numpy as np
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List

uvicorn.run("app:app", host="localhost", port=7000, reload=True)
#curl -X POST "http://localhost:7000/predict" -H "Content-Type: application/json" -d "{\"features\": [[1.2], [3.5]]}"


INFO:     Will watch for changes in these directories: ['c:\\Users\\guill\\OneDrive\\Documentos\\simplegit\\ITBA']
INFO:     Uvicorn running on http://localhost:7000 (Press CTRL+C to quit)
INFO:     Started reloader process [2084] using WatchFiles
