# Autologging
### Key Features of MLflow Autologging:
- Automatic Logging: When enabled, MLflow automatically logs details of each training run, such as:
  - Hyperparameters (parameters used during model training).
  - Metrics (accuracy, loss, etc.) during training and validation.
  - Model artifacts (the trained model itself).
- Support for Multiple Libraries: Autologging is available for several popular machine learning libraries, including:
  1. Scikit-learn
  2. TensorFlow
  3. Keras
  4. PyTorch (through PyTorch Lightning)
  5. XGBoost
  6. LightGBM

  and more.
- Minimal Code Changes: Enabling autologging typically takes a single line of code, which greatly reduces the overhead of manual logging.

# Example 1: scikit-learn

In [1]:
import mlflow
import mlflow.sklearn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris


mlflow.set_experiment("Autolog Iris_Classification_Experiment")

# Enable autologging
mlflow.sklearn.autolog()

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a model
with mlflow.start_run():
    model = LogisticRegression(max_iter=200)
    model.fit(X_train, y_train)
    
    # No need for manual logging; it's handled by autologging



In [2]:
import mlflow
logged_model = 'runs:/48c5723dc52c4158a3ab0793e7c7193d/model'

# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)

# Predict on the held-out test split from Example 1 (a NumPy array).
loaded_model.predict(X_test)

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0])

# Example 2: Keras

In [3]:
import mlflow
import mlflow.keras
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist


mlflow.set_experiment("Autolog_Mnist_Experiment")

# Enable autologging for Keras
mlflow.keras.autolog()

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # Normalize pixel values

# Define a simple neural network model
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model within an MLflow run
with mlflow.start_run():
    model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

# After training, you can access the logged data in the MLflow UI


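Running this cell in the notebook's environment raised the following error instead of producing training output:

TypeError: Descriptors cannot be created directly.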
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
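
The two workarounds listed in the error message can be checked or applied from Python. The sketch below is a minimal illustration, not a guaranteed fix: it sets the environment variable from workaround 2 and reports the installed protobuf version so you can judge whether workaround 1 applies. It assumes it runs in a fresh kernel, before `tensorflow` or `mlflow` has been imported.

```python
import os
from importlib.metadata import PackageNotFoundError, version

# Workaround 2 from the message: force the pure-Python protobuf parser.
# This must be set before tensorflow (or anything else that loads protobuf)
# is imported, and it makes parsing slower.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

# Report the installed protobuf release to decide whether workaround 1
# (downgrading to 3.20.x or lower) is relevant.
try:
    protobuf_version = version("protobuf")
except PackageNotFoundError:
    protobuf_version = None  # protobuf is not installed in this environment

print("protobuf version:", protobuf_version)
```

For workaround 1, a command such as `pip install "protobuf<3.21"` pins protobuf to 3.20.x or lower; whether that version is compatible with the rest of the environment is something to verify separately.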

# Example 3: TensorFlow

In [4]:
import mlflow
import mlflow.tensorflow
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import fashion_mnist

# Enable autologging for TensorFlow
mlflow.tensorflow.autolog()

# Load the Fashion MNIST dataset
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # Normalize pixel values

# Define a simple neural network model
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model within an MLflow run
with mlflow.start_run():
    model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

# After training, you can access the logged data in the MLflow UI


Running this cell raised the same TypeError as in Example 2 (Descriptors cannot be created directly), so no training output was produced.