# MLflow logging API example (Python) (MLflow 3.0)

This notebook shows how to use the MLflow logging API to start an MLflow run and log a model, its parameters, evaluation metrics, and other artifacts to the logged model and the run. The easiest way to get started with MLflow tracking in Python is the MLflow [`autolog()` API](https://www.mlflow.org/docs/latest/tracking.html#automatic-logging). If you need more control over what is logged for each training run, or want to log additional artifacts such as tables or plots, use the `mlflow.log_params()`, `mlflow.log_metric()`, and `mlflow.log_artifact()` APIs demonstrated in this notebook.

This tutorial uses features from MLflow 3.0. For more details, see "Get started with MLflow 3.0" ([AWS](https://docs.databricks.com/aws/en/mlflow/mlflow-3-install)|[Azure](https://learn.microsoft.com/en-us/azure/databricks/mlflow/mlflow-3-install)|[GCP](https://docs.databricks.com/gcp/en/mlflow/mlflow-3-install)).

This notebook creates a Random Forest model on a simple dataset and uses the MLflow Tracking API to log the model and selected model parameters and metrics.

In [0]:
# Upgrade to the latest MLflow version to use MLflow 3.0 features
%pip install --upgrade "mlflow>=3.0"
dbutils.library.restartPython()

Import the required libraries.

In [0]:
import mlflow
import mlflow.sklearn
import pandas as pd
import matplotlib.pyplot as plt

from numpy import savetxt

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

Import the dataset from scikit-learn and create the training and test datasets. 

In [0]:
db = load_diabetes()
X = db.data
y = db.target
X_train, X_test, y_train, y_test = train_test_split(X, y)
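
The split above is random, so each execution of the cell produces different training and test sets. The following sketch shows one way to make the split reproducible and to check its size. The seed value `42` and the `_demo` variable names are illustrative assumptions, not part of the original notebook.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X_demo, y_demo = load_diabetes(return_X_y=True)

# random_state=42 is an arbitrary seed chosen for illustration.
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, random_state=42
)

# With no test_size given, scikit-learn holds out 25% of the rows for testing.
print(X_train_demo.shape, X_test_demo.shape)

# Repeating the call with the same seed reproduces the same split.
_, X_test_again, _, _ = train_test_split(X_demo, y_demo, random_state=42)
same_split = bool((X_test_demo == X_test_again).all())
print("Reproducible split:", same_split)
```

Fixing the seed makes metrics comparable between runs, because every run is then evaluated on the same test rows.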

The diabetes dataset from scikit-learn consists of 442 samples with 10 features, used to predict disease progression. The features are:

- **age**: Age of the patient (standardized)
- **sex**: Sex of the patient (standardized)
- **bmi**: Body mass index (standardized)
- **bp**: Average blood pressure (standardized)
- **s1**: Serum 1 (TC, total cholesterol, standardized)
- **s2**: Serum 2 (LDL, low-density lipoproteins, standardized)
- **s3**: Serum 3 (HDL, high-density lipoproteins, standardized)
- **s4**: Serum 4 (TCH, total cholesterol/HDL ratio, standardized)
- **s5**: Serum 5 (LTG, possibly log of serum triglycerides, standardized)
- **s6**: Serum 6 (GLU, blood sugar level, standardized)
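
As a quick check of what "standardized" means here: according to the scikit-learn dataset description, each feature column is mean-centered and scaled so that its sum of squares is 1. The values are therefore not in their original units. A minimal sketch that verifies this (the variable names are illustrative):

```python
import numpy as np
from sklearn.datasets import load_diabetes

data = load_diabetes()
features = data.data

# Per-column mean and sum of squares across all 442 samples.
col_means = features.mean(axis=0)
col_sum_sq = (features ** 2).sum(axis=0)

for name, mean, sum_sq in zip(data.feature_names, col_means, col_sum_sq):
    print(f"{name}: mean={mean:+.4f}, sum of squares={sum_sq:.4f}")
```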

The target variable is a quantitative measure of disease progression one year after baseline.

- **Low progression**: Indicates less severe worsening of diabetes symptoms or slower disease advancement.
- **High progression**: Indicates more severe worsening of diabetes symptoms or faster disease advancement.

This notebook demonstrates a **regression** task, where the goal is to predict a continuous value (disease progression) from input features. Regression learns the relationship between features and the target from training data and can make predictions for new, unseen data. This is different from **interpolation**, which estimates unknown values strictly within the range of known data points without learning a model from data.

A fitted regression model can also be asked to predict for feature values that fall outside the range of the training data, which is known as **extrapolation**. How well this works depends on the model. Tree-based models such as `RandomForestRegressor` predict by averaging target values seen during training, so their predictions never go beyond the range of the training targets. Accuracy on such inputs is therefore often lower than on inputs that resemble the training data. In contrast, interpolation only estimates values within the range of the observed data and does not generalize beyond it.
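
To make the behavior of a random forest on out-of-range inputs concrete, here is a small sketch on synthetic data. The data (`y = 2x` on the interval 0 to 10), the seed, and all variable names are assumptions made for illustration and are unrelated to the diabetes dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (illustrative only): y = 2x for x between 0 and 10.
x_synth = np.linspace(0, 10, 200).reshape(-1, 1)
y_synth = 2 * x_synth.ravel()

forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(x_synth, y_synth)

# x = 5 lies inside the training range; x = 100 lies far outside it.
inside, outside = forest.predict(np.array([[5.0], [100.0]]))
print(f"x=5   -> {inside:.2f} (true value 10)")
print(f"x=100 -> {outside:.2f} (true value 200)")
print(f"largest training target: {y_synth.max():.2f}")
```

The prediction at `x=100` cannot exceed the largest training target, because each tree returns an average of training targets stored in one of its leaves.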

In [0]:
pd.DataFrame(data=db.data, columns=db.feature_names).assign(target=db.target)

Create a random forest model and log the model, model parameters, evaluation metrics, and other artifacts using `mlflow.log_params()`, `mlflow.log_metric()`, `mlflow.sklearn.log_model()`, and `mlflow.log_artifact()`. These functions let you control exactly which parameters and metrics are logged, and also let you log other artifacts of the run such as tables and plots.

In [0]:
with mlflow.start_run():
  
  # Set the model parameters. 
  n_estimators = 100
  max_depth = 6
  max_features = 3
  params = {
    "n_estimators": n_estimators,
    "max_depth": max_depth,
    "max_features": max_features
  }

  # Log the model parameters used for this run.
  mlflow.log_params(params)
  
  # Create and train model.
  rf = RandomForestRegressor(n_estimators = n_estimators, max_depth = max_depth, max_features = max_features)
  rf.fit(X_train, y_train)

  # Log the model created by this run, creating a Logged Model which inherits the parameters.
  # A few rows are enough for the input example; it is stored alongside the model.
  logged_model = mlflow.sklearn.log_model(rf, name="random-forest-model", input_example=X_train[:5])
  
  # Use the model to make predictions on the test dataset.
  predictions = rf.predict(X_test)
  
  # Define a metric to use to evaluate the model.
  mse = mean_squared_error(y_test, predictions)
    
  # Log the value of the metric from this run, linking to the logged model
  mlflow.log_metric("mse", mse)
  
  # Save the table of predicted values
  savetxt('predictions.csv', predictions, delimiter=',')
  
  # Log the saved table as an artifact
  mlflow.log_artifact("predictions.csv")
  
  # Convert the residuals to a pandas dataframe to take advantage of graphics capabilities
  df = pd.DataFrame(data = predictions - y_test)
  
  # Create a plot of residuals
  plt.plot(df)
  plt.xlabel("Observation")
  plt.ylabel("Residual")
  plt.title("Residuals")

  # Save the plot and log it as an artifact
  plt.savefig("residuals_plot.png")
  mlflow.log_artifact("residuals_plot.png") 
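
The `mse` metric logged above is the mean of the squared residuals. The sketch below checks this by hand on a handful of made-up values. The numbers are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical observed and predicted values, for illustration only.
observed = np.array([150.0, 200.0, 90.0, 310.0])
predicted = np.array([140.0, 215.0, 100.0, 290.0])

# Same sign convention as the cell above: prediction minus observation.
residuals = predicted - observed

# Mean squared error computed by hand and with scikit-learn.
mse_by_hand = np.mean(residuals ** 2)
mse_sklearn = mean_squared_error(observed, predicted)

# The root of the MSE is in the same units as the target.
rmse = np.sqrt(mse_by_hand)
print(mse_by_hand, mse_sklearn, rmse)
```

Because the residuals are squared, the sign convention does not affect the metric, and large errors weigh more heavily than small ones.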

To view the results, click the **Experiments** icon <img src="https://docs.databricks.com/_static/images/icons/experiment.png"/> in the right sidebar. This sidebar displays the parameters and metrics for each run of this notebook.

Click the name of the run to open the Runs page in a new tab. This page shows all of the information that was logged from the run. Select the **Artifacts** tab to find the logged model and plot.

From the experiments page, switch to the **Models** tab to view the logged model that was created, along with all relevant metadata such as parameters and metrics.

For more information, see "MLflow experiments" ([AWS](https://docs.databricks.com/applications/mlflow/experiments.html)|[Azure](https://docs.microsoft.com/azure/databricks/applications/mlflow/experiments)|[GCP](https://docs.gcp.databricks.com/applications/mlflow/experiments.html)).