# MLflow

<img src='./images/xd-logo.png' width='300px' align='right' style="padding: 15px">

By the end of this notebook, you will be able to:

- Recognize the components of MLflow
- Use MLflow to track experiments
- Use MLflow to log metrics, parameters, and artifacts
- Use MLflow to log models
- Use the MLflow UI to visualize experiments
- Use the MLflow API to query experiments

MLflow is an open source platform for managing the end-to-end machine learning lifecycle. It consists of the following components.

<img src='./images/learn-core-components.png' width='400px' align='center' style="padding: 15px">

Source: [MLflow](https://mlflow.org/docs/latest/index.html)

## Tracking Server

The core component of MLflow is the tracking server which logs and stores the parameters, metrics, and artifacts of machine learning experiments. In our demo, we will run a local tracking server. In a production environment, you would access a managed tracking server.

<img src='./images/tracking-setup-local-server.png' width='400px' align='center' style="padding: 15px">

Source: [MLflow](https://mlflow.org/docs/latest/index.html)

To start the tracking server, run the following command in your terminal.

```bash
mlflow server --host 0.0.0.0 --port 8080
```

Congratulations! You have started the MLflow tracking server. Now, let's start tracking some experiments. 📈

⚠️ Remember to keep the tracking server running in the background while you run the code in this notebook.


We will use the MLflow API to log metrics, parameters, and artifacts. First, let's train a logistic regression model on the breast cancer dataset.

In [None]:
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


# Load the breast cancer dataset (binary classification: 0 = malignant, 1 = benign)
X, y = datasets.load_breast_cancer(return_X_y=True)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


Next, we log our experiment with the MLflow API. First, we point MLflow at the local tracking server by setting the tracking URI.

In [None]:
import mlflow

# Point MLflow at the local tracking server started above
# (0.0.0.0 is the address the server listens on; clients connect via 127.0.0.1)
mlflow.set_tracking_uri(uri="http://127.0.0.1:8080")

MLflow uses the notion of experiments and runs to organize the machine learning lifecycle. An experiment is a collection of runs. A run is a single execution of a machine learning workflow. We will create an experiment called "MLflow Tracking Demo" and log the parameters, metrics, and artifacts of the run.

<img src='./images/tracking-basics.png' width='600px' align='center' style="padding: 15px">

Source: [MLflow](https://mlflow.org/docs/latest/index.html)

In [None]:
import mlflow
import mlflow.sklearn  # explicit import so mlflow.sklearn.log_model is available
from mlflow.models import infer_signature


# Create a new MLflow Experiment
mlflow.set_experiment("MLflow Tracking Demo")

# Start an MLflow run
with mlflow.start_run() as run:

    # Define the model hyperparameters
    params = {
        "solver": "lbfgs",
        "max_iter": 10000,
        "random_state": 42,
    }
    # Log the model hyperparameters
    mlflow.log_params(params)

    # Train the model
    lr = LogisticRegression(**params)
    lr.fit(X_train, y_train)

    # Infer the model signature
    signature = infer_signature(X_train, lr.predict(X_train))

    # Obtain predictions
    y_pred = lr.predict(X_test)

    # Calculate metrics
    accuracy = accuracy_score(y_test, y_pred)

    # Log the accuracy metric
    mlflow.log_metric("accuracy", accuracy)

    # Set a tag that we can use to remind ourselves what this run was for
    mlflow.set_tag("Training Info", "Basic LR model for breast cancer data")

    # Log the model
    model_info = mlflow.sklearn.log_model(
        sk_model=lr,
        artifact_path="breast_cancer_model",
        signature=signature,
        input_example=X_train[:5],  # a few rows are enough as an input example
        registered_model_name="baseline_clf",
    )

🧑‍🏫 Exercise

1. Modify the code above to track the following metrics:
    - Accuracy
    - Precision
    - Recall
    - F1 Score
  
:bulb: Look up the [API documentation](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.log_metrics) to log multiple metrics in a single call.

## Logging Artifacts

In addition to logging metrics and parameters, we can also log artifacts - any files, such as images, plots, or models, that are relevant to the experiment. 

:bulb: You can also log artifacts to a run after it has finished, using the MLflow client API (`MlflowClient`) together with the run's ID.

In [None]:
from mlflow.client import MlflowClient

# Create an instance of the MLflow client
client = MlflowClient()

Let's now log a confusion matrix as an artifact.

In [None]:
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Computing the confusion matrix
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])

# Creating a figure object and axes for the confusion matrix
fig, ax = plt.subplots(figsize=(8, 6))

# Plotting the confusion matrix using the created axes
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=[0, 1])
disp.plot(cmap=plt.cm.Blues, ax=ax)

# Setting the title of the plot
ax.set_title('Confusion Matrix')

# Log the figure to the (already finished) run through the client, using its run ID
client.log_figure(run.info.run_id, figure=fig, artifact_file="confusion_matrix.png")

# Showing the plot here for demonstration
plt.show()

## 🧑‍🏫 Review your experiment in the UI

To review the model and its details, follow these step-by-step instructions:

+ **Step 1: Go to the Tracking Server UI:**

  - Open the MLflow tab, or browse to the address the tracking server listens on (http://127.0.0.1:8080 if you started it with the command above).

+ **Step 2: Locate Your Experiment:**

  - Find the experiment name you specified in your MLflow run (here, "MLflow Tracking Demo").

+ **Step 3: Review Run Details:**

  - Click on the experiment name to view the runs within that experiment.
  - Locate the specific run you want to review.

+ **Step 4: Reviewing Artifacts and Metrics:**

  - Click on the run to see detailed information.
  - Navigate to the "Artifacts" tab to view logged artifacts.
  - View the logged metrics on the run page; depending on your MLflow version, they appear on the overview page, in a metrics tab, or both.

+ **Step 5: Viewing Confusion Matrix Image:**

  - If you logged the confusion matrix as an artifact, you can find it in the "Artifacts" tab.
  - You may find a file named "confusion_matrix.png" (or the specified artifact file name).
  - Download or view the confusion matrix image.

+ **Step 6: View models in the UI:**
  - You can find details about the logged model under the **Models** tab.
  - Look for the model name you specified in your MLflow run (e.g., "baseline_clf").

+ **Explore Additional Options:**

  - You can explore other tabs and options in the MLflow UI to gather more insights, such as "Parameters," "Tags," and "Source."

These instructions will guide you through reviewing and exploring the tracked models using the MLflow UI, providing valuable insights into the experiment results and registered models.

## Auto Logging

Auto logging lets you log metrics, parameters, and models without explicit log statements. A single `mlflow.autolog()` call at the top of your ML code enables it for every supported library that is installed. Library-specific variants such as `mlflow.sklearn.autolog()`, used below, enable it for one library.

This feature is available for popular ML libraries like scikit-learn, TensorFlow, and PyTorch (via PyTorch Lightning). See the [documentation](https://mlflow.org/docs/latest/tracking/autolog.html#supported-libraries) for the list of supported libraries.

In [None]:
# Using the sklearn auto-logging feature

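# Enable auto-logging for scikit-learn; since no run is active here,
# MLflow starts a new run automatically when fit() is called
mlflow.sklearn.autolog()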

# Train the model
lr = LogisticRegression(**params)
lr.fit(X_train, y_train)

# Evaluate the model; with autologging on, this score is also logged as a post-training metric
lr.score(X_test, y_test)

## Conclusion

In this notebook, you learned how to use MLflow to track experiments, log metrics, parameters, and artifacts, and log models. 
You also learned how to use the MLflow UI to visualize experiments and query experiments using the MLflow API. 

You are now prepared to build more informed and reproducible machine learning workflows with MLflow. 🚀