# Introduction to MLflow

MLflow is an open-source platform that addresses the machine-learning-specific steps that the traditional software development lifecycle does not cover.

MLflow covers:
- Experiment tracking
- Reproducibility
- Deployment
- Model Registry

It has four major components:
- **MLflow Tracking**: record metrics and parameters from training runs, query data from experiments, and store models, artifacts, and code.
- **MLflow Models**: standardize models for deployment and build custom models.
- **Model Registry**: store and version ML models.
- **MLflow Projects**: package ML code for reproducibility and repeatability.

In MLflow, an **experiment** groups related model training runs. Experiments can be created, deleted, and tagged.



In [None]:
import mlflow 

mlflow.create_experiment('My first experiment')

In [None]:
mlflow.get_tracking_uri()

In [None]:
# Select the experiment first so that the tag is applied to it
mlflow.set_experiment('My first experiment')
mlflow.set_experiment_tag('scikit-learn', 'lr')

# MLflow Tracking

- Model Metrics
- Parameters
- Code
- Other artifacts

MLflow Tracking is organized around training runs.

Each new run corresponds to a new model training.

A run is placed within an experiment.


In [None]:
run = mlflow.start_run()

run.info

## Logging to MLflow Tracking

### Metrics 

In [None]:
mlflow.log_metric('accuracy', 0.9)
mlflow.log_metrics({'accuracy': 0.9, 'loss': 0.5})

### Parameters 

In [None]:
mlflow.log_param('n_jobs', 1)
mlflow.log_params({'n_jobs': 1, 'fit_intercept': False})

### Artifacts

In [None]:
# mlflow.log_artifact('file.py')
# mlflow.log_artifacts('./directory/')

In [None]:
mlflow.end_run()

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import mlflow
import mlflow.sklearn

# Create a small dataset
data = {
    'experience': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'salary': [45000, 50000, 60000, 65000, 70000, 80000, 85000, 90000, 100000, 105000]
}
df = pd.DataFrame(data)

# Define the features (X) and the target (y)
X = df[['experience']]
y = df['salary']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Set up the MLflow experiment
mlflow.set_experiment("Salary Prediction Linear Regression")

with mlflow.start_run():
    # Create and train the model
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Make predictions
    y_pred = model.predict(X_test)

    # Compute performance metrics
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)

    # Log the parameters, metrics, and model to MLflow
    mlflow.log_param("test_size", 0.2)
    mlflow.log_param("random_state", 42)
    
    mlflow.log_metric("mse", mse)
    mlflow.log_metric("r2_score", r2)

    mlflow.sklearn.log_model(model, "linear_regression_model")

    print(f"Mean Squared Error: {mse}")
    print(f"R2 Score: {r2}")

In [None]:
!mlflow ui

## Querying runs

In [None]:
runs = mlflow.search_runs()
runs

In [None]:
r_squared_filter = "metrics.r2_score > .70"

# Search runs
mlflow.search_runs(experiment_names=["Salary Prediction Linear Regression", "Unicorn Other Experiments"], 
                   filter_string=r_squared_filter, 
                   order_by=["metrics.r2_score DESC"])

# MLflow Models

Flavors provide a uniform way to work with models from Keras, TensorFlow, PyTorch, scikit-learn, XGBoost, Spark, and other libraries.

Some flavors, such as the scikit-learn one, support **autolog**.

In [None]:
mlflow.sklearn.autolog()

# Create a small dataset
data = {
    'experience': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'salary': [45000, 50000, 60000, 65000, 70000, 80000, 85000, 90000, 100000, 105000]
}
df = pd.DataFrame(data)

# Define the features (X) and the target (y)
X = df[['experience']]
y = df['salary']

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Start an MLflow run to track the experiment
with mlflow.start_run():
    # Create and train the model
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Make predictions
    y_pred = model.predict(X_test)

    # Compute performance metrics
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)

    # With autolog, the parameters, metrics, and model are logged automatically
    print(f"Mean Squared Error: {mse}")
    print(f"R2 Score: {r2}")

In [None]:
mlflow.search_runs()

# MLflow REST API

MLflow exposes REST APIs for interacting with its different components.

## Model API

The model API lets you save, load, and log models through their flavor.


In [None]:
mlflow.sklearn.save_model(model, "my first model")

In [None]:
# Log model to MLflow Tracking
mlflow.sklearn.log_model(model, "lr_tracking")

# Get the last run
run = mlflow.last_active_run()

# Get the run_id of the above run
run_id = run.info.run_id

# Load model from MLflow Tracking
model = mlflow.sklearn.load_model(f"runs:/{run_id}/lr_tracking")

## Custom models

MLflow supports custom models that are not covered by any of the built-in flavors.

# Serving 

MLflow can serve models through REST endpoints.
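
A model can be served locally with the `mlflow models serve` command, which exposes an `/invocations` endpoint. The sketch below shows how a client could build and send a request using only the standard library. It assumes MLflow 2.x or later (which accepts the `dataframe_split` JSON format) and a server listening on `http://127.0.0.1:5000`; the helper names `build_payload` and `score` are hypothetical, and the request itself is left commented out because it needs a running server.

```python
import json
import urllib.request


def build_payload(columns, rows):
    """Build a request body in the 'dataframe_split' format."""
    return {"dataframe_split": {"columns": list(columns), "data": [list(r) for r in rows]}}


def score(payload, url="http://127.0.0.1:5000/invocations"):
    """POST the payload to a running model server (the URL is an assumption)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Two rows for a model trained on a single 'experience' feature.
payload = build_payload(["experience"], [[3], [7]])

# Uncomment once a model server is running:
# predictions = score(payload)
```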

# MLflow Model Registry

- Model: a model logged to MLflow Tracking
- Registered model: a model added to the registry; it has versions and can be assigned a stage
- Model version: increments each time a model is registered under the same name
- Model stage: one of None, Staging, Production, or Archived

You can work with the Model Registry through the client or through the UI.



In [None]:
from mlflow import MlflowClient

client = MlflowClient()

In [None]:
client

In [None]:
client.create_registered_model(name='Unicorn')

In [None]:
client.search_registered_models(filter_string='name like "Unicorn"')

# Registering Models 

Registering a model adds versioning, tracks changes, and improves collaboration, among other benefits.

In [None]:
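# Register the model logged above (run_id comes from the Model API cell) as "Unicorn"
mlflow.register_model(f"runs:/{run_id}/lr_tracking", "Unicorn")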

# MLflow Projects

MLflow Projects is the component of MLflow that packages code into reusable, reproducible units, which makes it easier to collaborate and to run the same code across environments, including local machines and the cloud.

At its core, a project is a directory containing ML code, described by a YAML file called `MLproject` that defines the project's name, entry points, and environment. A typical project sets up a Python environment with its dependencies, uses autologging to track metrics, and trains a model such as a linear regression.
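
As a rough sketch of what such a file can look like, the cell below writes a minimal `MLproject` file from a Python string and parses it back to inspect its structure. The project name, the `train.py` script, the `python_env.yaml` file, and the `test_size` parameter are all hypothetical. The commented call at the end shows how the project could be executed once those files exist.

```python
import tempfile
from pathlib import Path

import yaml

# Hypothetical MLproject contents: a name, an environment file, and one entry point.
MLPROJECT = """\
name: salary_prediction

python_env: python_env.yaml

entry_points:
  main:
    parameters:
      test_size: {type: float, default: 0.2}
    command: "python train.py --test-size {test_size}"
"""

# Assumption: the project lives in a temporary directory for this example.
project_dir = Path(tempfile.mkdtemp())
(project_dir / "MLproject").write_text(MLPROJECT)

# Parse the file to check which entry points and parameters it declares.
spec = yaml.safe_load((project_dir / "MLproject").read_text())
entry_points = list(spec["entry_points"])
default_test_size = spec["entry_points"]["main"]["parameters"]["test_size"]["default"]

# To execute the project (requires train.py and python_env.yaml in project_dir):
# import mlflow
# mlflow.projects.run(uri=str(project_dir), entry_point="main",
#                     parameters={"test_size": 0.3})
```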