# Introduction to MLFlow 

MLFlow is an open-source platform that addresses the machine-learning-specific steps the traditional software development lifecycle does not cover.

MLFlow covers: 
- Experiment tracking
- Reproducibility
- Deployment
- Model Registry

It has four major components:
- **MLFlow Tracking**: record metrics and parameters from training runs, query data from experiments, and store models, artifacts and code.
- **MLFlow Models**: standardize models for deployment and build customized models.
- **MLFlow Model Registry**: store and version ML models.
- **MLFlow Projects**: package ML code for reproducibility and repeatability.

In MLFlow, an **experiment** groups the records of related model training rounds (runs). Experiments can be created, deleted and tagged.



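```python
import mlflow

mlflow.create_experiment('My first experiment')
```

Here the experiment already existed (e.g. from an earlier execution of this cell), so MLFlow rejects the duplicate name:

```
MlflowException: Experiment 'My first experiment' already exists.
```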

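```python
# set_experiment_tag() tags the *active* experiment, so activate it first;
# called before set_experiment(), the tag lands on the previously active experiment.
experiment = mlflow.set_experiment('My first experiment')
mlflow.set_experiment_tag('scikit-learn', 'lr')
experiment  # snapshot taken before tagging, so its `tags` field is still empty
```

```
<Experiment: artifact_location='file:///Users/el_fer/Repos/my_ds_notebooks/060%20MLOps/mlruns/234909114283278316', creation_time=1728050461180, experiment_id='234909114283278316', last_update_time=1728050461180, lifecycle_stage='active', name='My first experiment', tags={}>
```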

# MLFlow tracking

MLFlow Tracking records:

- Model metrics
- Parameters
- Code
- Other artifacts

Tracking is organized around training **runs**: each new model training is a new run, and every run is placed within an experiment.


```python
# Open a run in the active experiment; it stays open until mlflow.end_run()
run = mlflow.start_run()

run.info
```

```
<RunInfo: artifact_uri='file:///Users/el_fer/Repos/my_ds_notebooks/060%20MLOps/mlruns/234909114283278316/e1b435f69a034b2493f3a8bf946991b2/artifacts', end_time=None, experiment_id='234909114283278316', lifecycle_stage='active', run_id='e1b435f69a034b2493f3a8bf946991b2', run_name='adorable-crow-180', run_uuid='e1b435f69a034b2493f3a8bf946991b2', start_time=1728055219734, status='RUNNING', user_id='el_fer'>
```

## Logging to MLFlow Tracking

### Metrics 

```python
mlflow.log_metric('accuracy', 0.9)                   # a single metric
mlflow.log_metrics({'accuracy': 0.9, 'loss': 0.5})   # several metrics at once
```

### Parameters 

```python
mlflow.log_param('n_jobs', 1)                              # a single parameter
mlflow.log_params({'n_jobs': 1, 'fit_intercept': False})   # several parameters at once
```

### Artifacts

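```python
# Placeholder paths: uncomment once these files exist
# mlflow.log_artifact('file.py')        # a single file
# mlflow.log_artifacts('./directory/')  # every file in a directory
```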

```python
# Close the run opened with mlflow.start_run()
mlflow.end_run()
```

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import mlflow
import mlflow.sklearn

# Create a small dataset
data = {
    'experience': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'salary': [45000, 50000, 60000, 65000, 70000, 80000, 85000, 90000, 100000, 105000]
}
df = pd.DataFrame(data)

# Define the features (X) and the target (y)
X = df[['experience']]
y = df['salary']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Set the MLflow experiment (created if it does not exist yet)
mlflow.set_experiment("Salary Prediction Linear Regression")

with mlflow.start_run():
    # Create and train the model
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Make predictions
    y_pred = model.predict(X_test)

    # Compute performance metrics
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)

    # Log the parameters, metrics and model to MLflow
    mlflow.log_param("test_size", 0.2)
    mlflow.log_param("random_state", 42)

    mlflow.log_metric("mse", mse)
    mlflow.log_metric("r2_score", r2)

    mlflow.sklearn.log_model(model, "linear_regression_model")

    print(f"Mean Squared Error: {mse}")
    print(f"R2 Score: {r2}")
```

```
Mean Squared Error: 4280618.311533848
R2 Score: 0.9931510107015459
```


Launch the tracking UI. In a notebook, the `!` prefix runs it as a shell command, which blocks the cell until it is interrupted (the `^C` in the log):

```
!mlflow ui
```

```
[2024-10-04 17:41:01 +0200] [14407] [INFO] Starting gunicorn 22.0.0
[2024-10-04 17:41:01 +0200] [14407] [INFO] Listening at: http://127.0.0.1:5000 (14407)
[2024-10-04 17:41:01 +0200] [14407] [INFO] Using worker: sync
[2024-10-04 17:41:01 +0200] [14408] [INFO] Booting worker with pid: 14408
[2024-10-04 17:41:01 +0200] [14409] [INFO] Booting worker with pid: 14409
[2024-10-04 17:41:01 +0200] [14410] [INFO] Booting worker with pid: 14410
[2024-10-04 17:41:01 +0200] [14411] [INFO] Booting worker with pid: 14411
^C
[2024-10-04 17:56:40 +0200] [14407] [INFO] Handling signal: int
[2024-10-04 17:56:40 +0200] [14409] [INFO] Worker exiting (pid: 14409)
[2024-10-04 17:56:40 +0200] [14411] [INFO] Worker exiting (pid: 14411)
[2024-10-04 17:56:40 +0200] [14408] [INFO] Worker exiting (pid: 14408)
[2024-10-04 17:56:40 +0200] [14410] [INFO] Worker exiting (pid: 14410)
```


## Querying runs

```python
runs = mlflow.search_runs()
runs
```

| | run_id | experiment_id | status | artifact_uri | start_time | end_time | metrics.r2_score | metrics.mse | params.test_size | params.random_state | tags.mlflow.source.type | tags.mlflow.source.name | tags.mlflow.user | tags.mlflow.log-model.history | tags.mlflow.runName |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | e34fb6f162854422b658cf484142a391 | 428983509010758787 | FINISHED | file:///Users/el_fer/Repos/my_ds_notebooks/060... | 2024-10-04 15:40:35.779000+00:00 | 2024-10-04 15:40:38.407000+00:00 | 0.993151 | 4280618.0 | 0.2 | 42 | LOCAL | /opt/anaconda3/envs/MLFlow/lib/python3.11/site... | el_fer | [{"run_id": "e34fb6f162854422b658cf484142a391"... | gifted-slug-720 |


```python
r_squared_filter = "metrics.r2_score > .70"

# Search runs across experiments, keep those with R² above 0.70, best first
mlflow.search_runs(experiment_names=["Salary Prediction Linear Regression", "Unicorn Other Experiments"],
                   filter_string=r_squared_filter,
                   order_by=["metrics.r2_score DESC"])
```

| | run_id | experiment_id | status | artifact_uri | start_time | end_time | metrics.r2_score | metrics.mse | params.test_size | params.random_state | tags.mlflow.source.type | tags.mlflow.source.name | tags.mlflow.user | tags.mlflow.log-model.history | tags.mlflow.runName |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | e34fb6f162854422b658cf484142a391 | 428983509010758787 | FINISHED | file:///Users/el_fer/Repos/my_ds_notebooks/060... | 2024-10-04 15:40:35.779000+00:00 | 2024-10-04 15:40:38.407000+00:00 | 0.993151 | 4280618.0 | 0.2 | 42 | LOCAL | /opt/anaconda3/envs/MLFlow/lib/python3.11/site... | el_fer | [{"run_id": "e34fb6f162854422b658cf484142a391"... | gifted-slug-720 |


# MLFlow Models

MLFlow Models uses **flavors** to provide a uniform way of handling models from Keras, TensorFlow, PyTorch, scikit-learn, XGBoost, Spark and other libraries.

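Some flavors, like the scikit-learn one, support **autologging**: once `mlflow.sklearn.autolog()` is called, parameters, training metrics and the model are logged automatically whenever a model is fitted.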

```python
mlflow.sklearn.autolog()

# Create a small dataset
data = {
    'experience': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'salary': [45000, 50000, 60000, 65000, 70000, 80000, 85000, 90000, 100000, 105000]
}
df = pd.DataFrame(data)

# Define the features (X) and the target (y)
X = df[['experience']]
y = df['salary']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Start tracking an MLflow run
with mlflow.start_run():
    # Create and train the model
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Make predictions
    y_pred = model.predict(X_test)

    # Compute performance metrics
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)

    # Parameters, metrics and the model are logged automatically by autolog
    print(f"Mean Squared Error: {mse}")
    print(f"R2 Score: {r2}")
```

```
Mean Squared Error: 4280618.311533848
R2 Score: 0.9931510107015459
```


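```python
mlflow.search_runs()
```

| | run_id | experiment_id | status | artifact_uri | start_time | end_time | metrics.training_mean_absolute_error | metrics.training_score | metrics.training_r2_score | metrics.training_root_mean_squared_error | ... | params.fit_intercept | params.test_size | params.random_state | tags.mlflow.source.type | tags.estimator_name | tags.mlflow.source.name | tags.mlflow.user | tags.mlflow.log-model.history | tags.mlflow.runName | tags.estimator_class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 26d2147d74ed4018b902d56c51fec053 | 428983509010758787 | FINISHED | file:///Users/el_fer/Repos/my_ds_notebooks/060... | 2024-10-04 16:38:11.790000+00:00 | 2024-10-04 16:38:13.729000+00:00 | 948.275862 | 0.995862 | 0.995862 | 1137.147065 | ... | True | | | LOCAL | LinearRegression | /opt/anaconda3/envs/MLFlow/lib/python3.11/site... | el_fer | [{"run_id": "26d2147d74ed4018b902d56c51fec053"... | nervous-calf-942 | sklearn.linear_model._base.LinearRegression |
| 1 | e34fb6f162854422b658cf484142a391 | 428983509010758787 | FINISHED | file:///Users/el_fer/Repos/my_ds_notebooks/060... | 2024-10-04 15:40:35.779000+00:00 | 2024-10-04 15:40:38.407000+00:00 | | | | | ... | | 0.2 | 42.0 | LOCAL | | /opt/anaconda3/envs/MLFlow/lib/python3.11/site... | el_fer | [{"run_id": "e34fb6f162854422b658cf484142a391"... | gifted-slug-720 | |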
1,e34fb6f162854422b658cf484142a391,428983509010758787,FINISHED,file:///Users/el_fer/Repos/my_ds_notebooks/060...,2024-10-04 15:40:35.779000+00:00,2024-10-04 15:40:38.407000+00:00,,,,,...,,0.2,42.0,LOCAL,,/opt/anaconda3/envs/MLFlow/lib/python3.11/site...,el_fer,"[{""run_id"": ""e34fb6f162854422b658cf484142a391""...",gifted-slug-720,
