
## MLflow Integration for Model Training and Tracking

In this notebook, we integrate MLflow into a machine learning workflow to track and manage experiments. The task is text classification with the DistilBERT model, and the focus is on experiment tracking, model management, and operational efficiency, the core themes of our course.


### Objective:

- Dynamically set up and log parameters to MLflow
- Understand the purpose and application of each step in the context of MLflow and MLOps principles


### Environment Setup

Install pinned versions of the libraries used in this workflow.


In [None]:
!pip install mlflow==2.7.1
!pip install datasets==2.14.6
# !pip install transformers[torch]==4.41.0
!pip install transformers==4.43.3
!pip install scikit-learn==1.3.0
!pip install tqdm==4.65.0

Collecting mlflow==2.7.1
  Downloading mlflow-2.7.1-py3-none-any.whl.metadata (12 kB)
Collecting cloudpickle<3 (from mlflow==2.7.1)
  Downloading cloudpickle-2.2.1-py3-none-any.whl.metadata (6.9 kB)
Collecting databricks-cli<1,>=0.8.7 (from mlflow==2.7.1)
  Downloading databricks_cli-0.18.0-py2.py3-none-any.whl.metadata (4.0 kB)
Collecting protobuf<5,>=3.12.0 (from mlflow==2.7.1)
  Downloading protobuf-4.25.8-cp37-abi3-manylinux2014_x86_64.whl.metadata (541 bytes)
Collecting pytz<2024 (from mlflow==2.7.1)
  Downloading pytz-2023.4-py2.py3-none-any.whl.metadata (22 kB)
Collecting packaging<24 (from mlflow==2.7.1)
  Downloading packaging-23.2-py3-none-any.whl.metadata (3.2 kB)
Collecting importlib-metadata!=4.7.0,<7,>=3.7.0 (from mlflow==2.7.1)
  Downloading importlib_metadata-6.11.0-py3-none-any.whl.metadata (4.9 kB)
Collecting alembic!=1.10.0,<2 (from mlflow==2.7.1)
  Downloading alembic-1.16.1-py3-none-any.whl.metadata (7.3 kB)
Collecting docker<7,>=4.0.0 (from mlflow==2.7.1)
  Downlo

Collecting datasets===2.14.6
  Downloading datasets-2.14.6-py3-none-any.whl.metadata (19 kB)
Collecting fsspec<=2023.10.0,>=2023.1.0 (from fsspec[http]<=2023.10.0,>=2023.1.0->datasets===2.14.6)
  Downloading fsspec-2023.10.0-py3-none-any.whl.metadata (6.8 kB)
Downloading datasets-2.14.6-py3-none-any.whl (493 kB)
Downloading fsspec-2023.10.0-py3-none-any.whl (166 kB)
Installing collected packages: fsspec, datasets
  Attempting uninstall: fsspec
    Found existing installation: fsspec 2025.3.2
    Uninstalling fsspec-2025.3.2:
      Successfully uninstalled fsspec-2025.3.2
  Attempting uninstall: datasets
    Found existing installation: datasets 2.14.4
    Uninstalling datasets-2.14.4:
      Successfully uninstalled datasets-2.14.4

In [None]:
!pip install boto3 awscli



In [None]:
!aws configure


### Imports

Import the necessary libraries, focusing on MLflow for tracking, PyTorch for model training, and Transformers for our NLP model.

In [None]:
!pip install transformers==4.43.3

Collecting transformers==4.43.3
  Using cached transformers-4.43.3-py3-none-any.whl.metadata (43 kB)
Collecting tokenizers<0.20,>=0.19 (from transformers==4.43.3)
  Using cached tokenizers-0.19.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Using cached transformers-4.43.3-py3-none-any.whl (9.4 MB)
Using cached tokenizers-0.19.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)
Installing collected packages: tokenizers, transformers
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.21.1
    Uninstalling tokenizers-0.21.1:
      Successfully uninstalled tokenizers-0.21.1
  Attempting uninstall: transformers
    Found existing installation: transformers 4.52.2
    Uninstalling transformers-4.52.2:
      Successfully uninstalled transformers-4.52.2
Successfully installed tokenizers-0.19.1 transformers-4.43.3


In [None]:
!pip show transformers

Name: transformers
Version: 4.43.3
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /usr/local/lib/python3.11/dist-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: peft, sentence-transformers


In [None]:
import os
import mlflow
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
import torch
from tqdm import tqdm
from torch.utils.data import DataLoader
from datasets import load_dataset
from torch.optim import AdamW  # transformers.AdamW is deprecated in favor of the PyTorch implementation
from transformers import DistilBertForSequenceClassification, DistilBertTokenizer


* 'schema_extra' has been renamed to 'json_schema_extra'


### Configuration Parameters as an Object

Defining the parameters in a single dictionary lets us log them to MLflow in one call and keeps every setting for a run in one place, which makes the code easier to maintain and extend.


In [None]:
params = {
    'model_name': 'distilbert-base-cased',
    'learning_rate': 5e-5,
    'batch_size': 16,
    'num_epochs': 1,
    'dataset_name': 'ag_news',
    'task_name': 'sequence_classification',
    'log_steps': 100,
    'max_seq_length': 128,
    'output_dir': 'models/distilbert-base-cased-ag_news',
}
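
MLflow stores parameter values as strings, so a nested value in the dictionary (a label mapping, for example) is logged as one stringified dictionary. The sketch below shows one way to flatten nested entries into dotted keys before logging. `flatten_params` is a hypothetical helper, not part of MLflow, and the sample values are illustrative only.

```python
def flatten_params(params, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in params.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse so that {"a": {"b": 1}} becomes {"a.b": 1}
            flat.update(flatten_params(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Illustrative values, not the full notebook configuration
example = {
    "learning_rate": 5e-5,
    "batch_size": 16,
    "id2label": {0: "World", 1: "Sports"},
}
flat_example = flatten_params(example)
print(flat_example)
# The flat dictionary could then be passed to mlflow.log_params(flat_example).
```

Flattening is optional: logging the nested dictionary directly also works, but each flattened entry appears as its own column when comparing runs in the MLflow UI.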


### MLflow Setup

We point MLflow at a tracking server and select an experiment. Every run logged below is recorded under that experiment, so parameters and results from different runs can be compared side by side, in line with the MLOps goal of systematic and efficient model management.

In [None]:
mlflow.set_tracking_uri("http://ec2-174-129-55-125.compute-1.amazonaws.com:5000/")
mlflow.set_experiment(params['task_name'])

2025/05/29 10:39:27 INFO mlflow.tracking.fluent: Experiment with name 'sequence_classification' does not exist. Creating a new experiment.


<Experiment: artifact_location='s3://mlflow-bucket-25-v2/115485546785635486', creation_time=1748515167563, experiment_id='115485546785635486', last_update_time=1748515167563, lifecycle_stage='active', name='sequence_classification', tags={}>
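
The tracking URI above is hardcoded to one server. MLflow also reads the `MLFLOW_TRACKING_URI` environment variable, so a notebook can be moved between servers without editing code. The sketch below resolves the URI from the environment with a fallback. `resolve_tracking_uri` is a hypothetical helper, and the `http://localhost:5000` default is an assumption rather than a value from this notebook.

```python
import os


def resolve_tracking_uri(default="http://localhost:5000"):
    """Return the tracking URI from the environment, or a fallback (assumed default)."""
    uri = os.environ.get("MLFLOW_TRACKING_URI", "").strip()
    return uri or default


tracking_uri = resolve_tracking_uri()
print(f"Tracking URI: {tracking_uri}")
# In the notebook, this value would be passed to mlflow.set_tracking_uri(tracking_uri).
```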

### Load and Preprocess Dataset

We use AG News, a well-known news-topic classification dataset, so results are comparable with other work. Preprocessing tokenizes the raw text and pads or truncates each example to `max_seq_length` tokens, which gives the model the fixed-size input it expects.

In [None]:
# Load and preprocess dataset
dataset = load_dataset(params['dataset_name'])
tokenizer = DistilBertTokenizer.from_pretrained(params['model_name'])

def tokenize(batch):
    return tokenizer(batch['text'], padding='max_length', truncation=True, max_length=params['max_seq_length'])


# Fixed seed (an arbitrary choice) so the same subsets are selected on every run
train_dataset = dataset["train"].shuffle(seed=42).select(range(20_000)).map(tokenize, batched=True)
test_dataset = dataset["test"].shuffle(seed=42).select(range(2_000)).map(tokenize, batched=True)

# Set format for PyTorch and create data loaders
train_dataset.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])
test_dataset.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])

train_loader = DataLoader(train_dataset, batch_size=params['batch_size'], shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=params['batch_size'], shuffle=False)

# get the labels
labels = dataset["train"].features['label'].names

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Downloading readme:   0%|          | 0.00/8.07k [00:00<?, ?B/s]

Downloading data files:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/18.6M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/1.23M [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/2 [00:00<?, ?it/s]

Generating train split:   0%|          | 0/120000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/7600 [00:00<?, ? examples/s]

tokenizer_config.json:   0%|          | 0.00/49.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/213k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/436k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/465 [00:00<?, ?B/s]

Map:   0%|          | 0/20000 [00:00<?, ? examples/s]

Map:   0%|          | 0/2000 [00:00<?, ? examples/s]

In [None]:
labels

['World', 'Sports', 'Business', 'Sci/Tech']
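
The tokenizer call uses `padding='max_length'` and `truncation=True`, so every example comes out exactly `max_seq_length` tokens long with a matching attention mask. The toy sketch below shows that idea on plain integer lists. `pad_or_truncate` is a hypothetical helper and the token ids are made up. A real tokenizer also handles special tokens when truncating, which this sketch ignores.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), each exactly max_length long."""
    kept = token_ids[:max_length]          # truncation: drop tokens past the limit
    n_pad = max_length - len(kept)         # padding: fill the remaining positions
    input_ids = kept + [pad_id] * n_pad
    attention_mask = [1] * len(kept) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask


# Made-up ids: one sequence shorter than the limit, one longer
short_ids, short_mask = pad_or_truncate([101, 7592, 102], max_length=5)
long_ids, long_mask = pad_or_truncate([101, 1, 2, 3, 4, 5, 102], max_length=5)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

The attention mask is what lets the model ignore the padded positions, which is why the data loaders keep both `input_ids` and `attention_mask`.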


### Model Initialization

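We load the pre-trained DistilBERT checkpoint and attach a new classification head with one output per label. The head's weights are newly initialized, which is why the library warns below that the model should be trained on a downstream task before it is used for predictions. We also store the label names in the model config so that predictions can be mapped back to class names.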

In [None]:
model = DistilBertForSequenceClassification.from_pretrained(params['model_name'], num_labels=len(labels))
model.config.id2label = {i: label for i, label in enumerate(labels)}
params['id2label'] = model.config.id2label

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


model.safetensors:   0%|          | 0.00/263M [00:00<?, ?B/s]

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(28996, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

### Optimizer Setup

We use AdamW with the learning rate from `params`. The learning rate is one of the hyperparameters most worth tuning, and because it is logged with every run, its effect can be compared across experiments.

In [None]:
optimizer = AdamW(model.parameters(), lr=params['learning_rate'])



### Evaluation Function

Evaluating the model on a separate test set shows how well it performs on unseen data, that is, how well it generalizes, which is crucial for real-world applications. The function returns accuracy along with macro-averaged precision, recall, and F1.

In [None]:
def evaluate_model(model, dataloader, device):
    model.eval()  # Set model to evaluation mode
    predictions, true_labels = [], []

    with torch.no_grad():
        for batch in dataloader:
            inputs, masks, labels = batch['input_ids'].to(device), batch['attention_mask'].to(device), batch['label'].to(device)

            # Forward pass, calculate logit predictions
            outputs = model(inputs, attention_mask=masks)
            logits = outputs.logits
            _, predicted_labels = torch.max(logits, dim=1)

            predictions.extend(predicted_labels.cpu().numpy())
            true_labels.extend(labels.cpu().numpy())

    # Calculate Evaluation Metrics
    accuracy = accuracy_score(true_labels, predictions)
    precision, recall, f1, _ = precision_recall_fscore_support(true_labels, predictions, average='macro')

    return accuracy, precision, recall, f1
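
With `average='macro'`, each metric is computed per class and then averaged with equal weight, so a rarely predicted class counts as much as a common one. The pure-Python sketch below works through that calculation on a toy example. `macro_scores` is a hypothetical helper written for illustration (it is not the scikit-learn implementation), and the labels are made up.

```python
def macro_scores(y_true, y_pred):
    """Return (precision, recall, f1), each macro-averaged over the classes."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    # Unweighted mean: every class contributes equally
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels for three classes
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
precision, recall, f1 = macro_scores(y_true, y_pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```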


### Training Loop

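The training loop runs inside an MLflow run. Parameters are logged once at the start, the average training loss is logged every `log_steps` batches, and evaluation metrics are logged at the end of each epoch. After training, the model is logged as an artifact and registered in the MLflow Model Registry. Tracking these values makes it possible to follow the model's progress and compare runs - core aspects of the MLOps lifecycle.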

In [None]:
# Start MLflow Run
with mlflow.start_run(run_name=f"{params['model_name']}-{params['dataset_name']}") as run:

    # Log all parameters at once
    mlflow.log_params(params)

    with tqdm(total=params['num_epochs'] * len(train_loader), desc=f"Epoch [1/{params['num_epochs']}] - (Loss: N/A) - Steps") as pbar:
        for epoch in range(params['num_epochs']):
            model.train()  # from_pretrained and evaluate_model both leave the model in eval mode
            running_loss = 0.0
            for i, batch in enumerate(train_loader, 0):
                # Named `targets` so the global `labels` list of class names is not overwritten
                inputs, masks, targets = batch['input_ids'].to(device), batch['attention_mask'].to(device), batch['label'].to(device)

                optimizer.zero_grad()
                outputs = model(inputs, attention_mask=masks, labels=targets)
                loss = outputs.loss
                loss.backward()
                optimizer.step()

                running_loss += loss.item()
                if (i + 1) % params['log_steps'] == 0:
                    # Average over exactly the last `log_steps` batches
                    avg_loss = running_loss / params['log_steps']

                    pbar.set_description(f"Epoch [{epoch + 1}/{params['num_epochs']}] - (Loss: {avg_loss:.3f}) - Steps")
                    mlflow.log_metric("loss", avg_loss, step=epoch * len(train_loader) + i + 1)

                    running_loss = 0.0
                pbar.update(1)

            # Evaluate Model
            accuracy, precision, recall, f1 = evaluate_model(model, test_loader, device)
            print(f"Epoch {epoch + 1} Metrics: Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, Recall: {recall:.4f}, F1: {f1:.4f}")

            # Log metrics to MLflow
            mlflow.log_metrics({'accuracy': accuracy, 'precision': precision, 'recall': recall, 'f1': f1}, step=epoch)


    # Log the model to MLflow using the built-in PyTorch flavor
    mlflow.pytorch.log_model(model, "model")

    # Also save the Hugging Face model and tokenizer files and upload them to the same artifact path
    os.makedirs(params['output_dir'], exist_ok=True)
    model.save_pretrained(params['output_dir'])
    tokenizer.save_pretrained(params['output_dir'])

    mlflow.log_artifacts(params['output_dir'], artifact_path="model")

    model_uri = f"runs:/{run.info.run_id}/model"
    mlflow.register_model(model_uri, "agnews-transformer")

print('Finished Training')

Epoch [1/1] - (Loss: 0.260) - Steps: 100%|██████████| 1250/1250 [03:40<00:00,  5.68it/s]

Epoch 1 Metrics: Accuracy: 0.9000, Precision: 0.8998, Recall: 0.8986, F1: 0.8987



Successfully registered model 'agnews-transformer'.
2025/05/29 10:50:46 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: agnews-transformer, version 1


Finished Training


Created version '1' of model 'agnews-transformer'.
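
The loss metric logged during training is a windowed average: per-batch losses are accumulated, averaged once per window, logged against a global step, and then reset. The sketch below isolates that bookkeeping without any model or MLflow server, assuming each window holds exactly `log_steps` batches. `windowed_loss_log` is a hypothetical helper and the loss values are made up.

```python
def windowed_loss_log(losses, log_steps):
    """Return (global_step, mean_loss) pairs, one per full window of log_steps batches."""
    records = []
    running = 0.0
    for i, loss in enumerate(losses):
        running += loss
        if (i + 1) % log_steps == 0:
            # (i + 1) batches have been seen so far; use that as the global step
            records.append((i + 1, running / log_steps))
            running = 0.0  # start a fresh window
    return records


# Made-up per-batch losses
fake_losses = [1.0, 0.8, 0.6, 0.4, 0.2, 0.1]
records = windowed_loss_log(fake_losses, log_steps=2)
for step, mean_loss in records:
    print(f"step={step} loss={mean_loss:.3f}")
    # In a real run, this is where mlflow.log_metric("loss", mean_loss, step=step) would go.
```

Averaging over a window smooths out batch-to-batch noise, and a monotonically increasing global step keeps the curve continuous across epochs in the MLflow UI.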
