# MLflow Tracking

[MLflow Tracking](https://mlflow.org/docs/latest/tracking/)

In [0]:
%pip install mlflow-skinny

In [0]:
%restart_python


## Train Model

This section trains a small model so that we have something to track while exploring MLflow Tracking ❤️

This notebook uses [scikit-learn](https://scikit-learn.org/), an open source machine learning library that supports supervised and unsupervised learning.

⚠️ Please note that scikit-learn _also provides various tools for model fitting, data preprocessing, model selection, model evaluation, and many other utilities_ that are also provided in Databricks (and MLflow).


I asked an LLM (Copilot):

> Give me step-by-step instructions for a python code for email classification (spam vs non-spam) using scikit-learn.
>
> Please generate a CSV file with 1000 labeled emails with 70% spam and the others non-spam


Borrowed from [Automatic Logging with MLflow Tracking](https://mlflow.org/docs/latest/tracking/autolog#step-2---insert-mlflowautolog-in-your-code) with ❤️

In [0]:
import mlflow

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

mlflow.autolog()

db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

rf = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
# MLflow triggers logging automatically upon model fitting
rf.fit(X_train, y_train)

## Custom Logging

Support for custom logging includes:

* Model Hyperparameters
* Model (Performance) Metrics
* Model Artifacts

### Model Hyperparameters

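* `mlflow.log_params` logs a dictionary of parameters in one call
* Model parameters, e.g. `n_estimators` and `max_depth` of the random forest above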

### Model (Performance) Metrics

* `mlflow.log_metric`
* Custom model metrics that Data Scientists care about (that you, a dear Data Engineer, might never have heard of 🤷‍♂️)
* Not only built-in metrics, but any metric you want to track

### Model Artifacts

* `mlflow.sklearn.log_model` for trained models
* `mlflow.log_artifact` for additional model artifacts

## Auto(matic) Logging

[Automatic Logging with MLflow Tracking](https://mlflow.org/docs/latest/tracking/autolog)

* Allows you to log parameters, metrics, models, environment, and data lineage without explicit log statements.
* Call [mlflow.autolog](https://mlflow.org/docs/latest/api_reference/python_api/mlflow.html#mlflow.autolog) before your training code.