# Training
This notebook trains the sentiment classifiers and tracks each run with MLflow.
To understand the dataset, see notebook 1.
To understand model setup, see notebook 2 and the `models` module.

## Code version

In [1]:
!echo "==latest commit==" && git log -1 && echo "" && echo "==modified files==" && git status --short -uno

==latest commit==
commit ac1c1a57f609e9c652b2b8c2bc7686d35843ed76 (HEAD -> main)
Author: Edvard Lindelöf <edvardlindelof@gmail.com>
Date:   Sun Dec 28 11:49:33 2025 +0100

    add swedish_sentiment_{train,val,test}.jsonl

==modified files==


## Import dependencies and data

In [2]:
from os import environ
import json
import mlflow
import logging

from models import swedish_classifier

# suppress warning about using the cpu version
logging.getLogger("mlflow.utils.requirements_utils").setLevel(logging.ERROR)

MLFLOW_TRACKING_URL = environ["MLFLOW_TRACKING_URI"]

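def _load_docs(split):
    """Load one split of the dataset from its JSON Lines file."""
    # explicit encoding so Swedish characters decode the same on every platform
    with open(f"../data/swedish_sentiment_{split}.jsonl", encoding="utf-8") as f:
        # one JSON document per line; skip blank lines
        return [json.loads(line) for line in f if line.strip()]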

train_docs, val_docs, test_docs = (_load_docs(s) for s in ["train", "val", "test"])
len(train_docs), len(val_docs), len(test_docs)

(50, 50, 50)
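
With only 50 documents per split, a skewed label distribution would make accuracy hard to interpret, so a quick balance check can be worthwhile. The sketch below is an illustration only: the field name `label` and the sample records are hypothetical, since the actual schema is described in notebook 1 and not shown here.

```python
from collections import Counter


def label_counts(docs, key="label"):
    """Count how often each value of `key` occurs across the documents.

    `key="label"` is a hypothetical field name; adjust it to the real schema.
    """
    return Counter(doc[key] for doc in docs)


# Made-up records standing in for train_docs
example_docs = [
    {"text": "bra", "label": 1},
    {"text": "dålig", "label": 0},
    {"text": "toppen", "label": 1},
]
counts = label_counts(example_docs)
print(dict(counts))
```

In the notebook itself, the same helper could be applied to `train_docs`, `val_docs`, and `test_docs` once the correct field name is filled in.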

## Train BERT

In [3]:
tokenizer = swedish_classifier.tokenizer()
model = swedish_classifier.model()

mlflow.set_tracking_uri(MLFLOW_TRACKING_URL)
mlflow.set_experiment("swedish-sentiment-classification")
with mlflow.start_run():
    trainer, train_output, test_metrics = swedish_classifier.train(
        tokenizer,
        model,
        train_docs,
        val_docs,
        test_docs,
        learn_rate=1e-5,
        report_to=["mlflow"],
    )
    mlflow.log_metric("n_training_samples", len(train_docs))
    mlflow.log_metrics(test_metrics)
    model_info = mlflow.transformers.log_model(
        transformers_model={"model": model, "tokenizer": tokenizer},
        name="bert-classifier",
        task="text-classification",
    )

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at KB/bert-base-swedish-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2025/12/28 12:06:00 INFO mlflow.tracking.fluent: Experiment with name 'swedish-sentiment-classification' does not exist. Creating a new experiment.


| Epoch | Training Loss | Validation Loss | Validation Accuracy | Validation Roc Auc |
|---|---|---|---|---|
| 1 | No log | 0.655533 | 0.68 | 0.678571 |
| 2 | No log | 0.597131 | 0.76 | 0.86039 |
| 3 | No log | 0.532033 | 0.82 | 0.913961 |
| 4 | No log | 0.460159 | 0.82 | 0.926948 |
| 5 | No log | 0.380936 | 0.86 | 0.949675 |
| 6 | No log | 0.309104 | 0.86 | 0.959416 |
| 7 | No log | 0.279529 | 0.88 | 0.964286 |
| 8 | No log | 0.29443 | 0.9 | 0.964286 |
| 9 | No log | 0.331307 | 0.88 | 0.964286 |


{'test_accuracy': 0.88, 'test_roc_auc': 0.9452495974235104}


Device set to use cpu


🏃 View run aged-eel-672 at: http://localhost:5000/#/experiments/1/runs/1c82a8d6b50640dda624dcac9501053c
🧪 View experiment at: http://localhost:5000/#/experiments/1
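
The validation loss in the BERT run bottoms out at epoch 7 and rises for the next two epochs, after which training stops. That pattern is consistent with early stopping on validation loss with a patience of 2, although the actual stopping rule lives in `swedish_classifier.train` and is not shown here. A minimal sketch of such a rule, replayed on the logged losses (the function name and the patience value are assumptions):

```python
def early_stopping_replay(val_losses, patience=2):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once the loss has failed to improve on its best value
    for `patience` consecutive epochs; if that never happens, stop_epoch
    is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)


# Validation losses from the BERT run, epochs 1 through 9
bert_val_losses = [
    0.655533, 0.597131, 0.532033, 0.460159, 0.380936,
    0.309104, 0.279529, 0.29443, 0.331307,
]
best_epoch, stop_epoch = early_stopping_replay(bert_val_losses)
print(best_epoch, stop_epoch)
```

If the trainer reloads the best checkpoint when it finishes, the reported test metrics would correspond to epoch 7 rather than epoch 9; whether it does so is not visible from this notebook.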


### Register model
Let's register this first model in the MLflow model registry directly from the notebook.
The same can be done manually in the MLflow UI [here](http://localhost:5000/#/experiments/1/models) (or on the corresponding page for a later model), by navigating to "bert-classifier" and then clicking "Register model".

In [4]:
mlflow.register_model(model_info.model_uri, "served-model")

Successfully registered model 'served-model'.
2025/12/28 12:07:41 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: served-model, version 1
Created version '1' of model 'served-model'.


<ModelVersion: aliases=[], creation_timestamp=1766920061584, current_stage='None', deployment_job_state=<ModelVersionDeploymentJobState: current_task_name='', job_id='', job_state='DEPLOYMENT_JOB_CONNECTION_STATE_UNSPECIFIED', run_id='', run_state='DEPLOYMENT_JOB_RUN_STATE_UNSPECIFIED'>, description='', last_updated_timestamp=1766920061584, metrics=None, model_id=None, name='served-model', params=None, run_id='1c82a8d6b50640dda624dcac9501053c', run_link='', source='models:/m-676e1df4ae944bfa9323b4a46d432caf', status='READY', status_message=None, tags={}, user_id='', version='1'>
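
Once registered, the model version can be addressed through a `models:/<name>/<version>` URI instead of a run ID. Below is a sketch of how a consumer might load it, assuming the tracking URI is already configured and the MLflow server used in this notebook is reachable. The helper names are hypothetical, and the loading call is defined but not executed here.

```python
def registered_model_uri(name, version):
    """Build an MLflow model registry URI of the form models:/<name>/<version>."""
    return f"models:/{name}/{version}"


def load_served_model(version=1):
    """Load a version of 'served-model' as a generic pyfunc model.

    Requires mlflow and a reachable tracking server, so the import is
    deferred until the function is actually called.
    """
    import mlflow.pyfunc

    return mlflow.pyfunc.load_model(registered_model_uri("served-model", version))


uri = registered_model_uri("served-model", 1)
print(uri)
```

Calling `load_served_model().predict([...])` with a list of Swedish sentences should then run the text-classification pipeline; the exact output format depends on the MLflow version.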

## Train LoRA-BERT

In [5]:
tokenizer = swedish_classifier.tokenizer()
model = swedish_classifier.lora_model(lora_r=4)

mlflow.set_tracking_uri(MLFLOW_TRACKING_URL)
mlflow.set_experiment("swedish-sentiment-classification")
with mlflow.start_run():
    trainer, train_output, test_metrics = swedish_classifier.train(
        tokenizer,
        model,
        train_docs,
        val_docs,
        test_docs,
        learn_rate=1e-4,
        report_to=["mlflow"],
    )
    mlflow.log_param("lora_r", model.active_peft_config.r)
    mlflow.log_metric("n_training_samples", len(train_docs))
    mlflow.log_metrics(test_metrics)
    # in theory only the adapter weights need saving, but we merge them into
    # the base model and log the whole model for compatibility with MLflow
    model = trainer.model.merge_and_unload()
    model_info = mlflow.transformers.log_model(
        transformers_model={"model": model, "tokenizer": tokenizer},
        name="bert-classifier",
        task="text-classification",
    )

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at KB/bert-base-swedish-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss | Validation Accuracy | Validation Roc Auc |
|---|---|---|---|---|
| 1 | No log | 0.711266 | 0.44 | 0.702922 |
| 2 | No log | 0.681306 | 0.58 | 0.722403 |
| 3 | No log | 0.669275 | 0.58 | 0.725649 |
| 4 | No log | 0.659742 | 0.64 | 0.73539 |
| 5 | No log | 0.653762 | 0.66 | 0.74513 |
| 6 | No log | 0.647609 | 0.64 | 0.756494 |
| 7 | No log | 0.640742 | 0.68 | 0.769481 |
| 8 | No log | 0.633715 | 0.68 | 0.780844 |
| 9 | No log | 0.629443 | 0.68 | 0.788961 |
| 10 | No log | 0.626926 | 0.68 | 0.810065 |


{'test_accuracy': 0.88, 'test_roc_auc': 0.9742351046698874}


Device set to use cpu


🏃 View run fun-goose-479 at: http://localhost:5000/#/experiments/1/runs/fd58edeb6df84e419c827afc24e6a260
🧪 View experiment at: http://localhost:5000/#/experiments/1
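
The `merge_and_unload()` call folds the low-rank update into the base weights so that the logged model is a plain BERT classifier. Conceptually, for a weight matrix W with LoRA factors B and A of rank r, the merged weight is W + (alpha / r) · B A. The toy example below illustrates this with plain Python lists. The matrices and `alpha` are made up, and which layers `swedish_classifier.lora_model` adapts is not shown in this notebook.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, b, a, alpha, r):
    """Return W + (alpha / r) * B @ A, the merged weight matrix."""
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_count(d_out, d_in, r):
    """Trainable parameters LoRA adds to one d_out x d_in matrix."""
    return r * (d_out + d_in)


# Toy 2x2 weight with a rank-1 adapter (values are illustrative only)
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]  # shape d_out x r
a = [[0.5, 0.5]]    # shape r x d_in
merged = merge_lora(w, b, a, alpha=2, r=1)
print(merged)

# One 768 x 768 matrix (BERT-base hidden size) adapted with r=4
print(lora_param_count(768, 768, 4), "adapter parameters vs", 768 * 768, "in the full matrix")
```

After merging, the adapter no longer exists as a separate object, which is why the whole model is logged; saving only B and A would be far smaller but would need the base model and PEFT at load time.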
