# Training
This notebook contains the same model training routines as notebook 3, but was run with more training data available.

## Code version

In [1]:
!echo "==latest commit==" && git log -1 && echo "" && echo "==modified files==" && git status --short -uno

==latest commit==
commit 40e8d89e10917cf83733f75b32e2694d12f604b5 (HEAD -> main, origin/main)
Author: Edvard Lindelöf <edvardlindelof@gmail.com>
Date:   Sun Dec 28 12:16:19 2025 +0100

    add more training data

==modified files==


## Import dependencies and data

In [2]:
from os import environ
import json
import mlflow
import logging

from models import swedish_classifier

# suppress warning about using the cpu version
logging.getLogger("mlflow.utils.requirements_utils").setLevel(logging.ERROR)

MLFLOW_TRACKING_URL = environ["MLFLOW_TRACKING_URI"]

def _load_docs(split):
    with open(f"../data/swedish_sentiment_{split}.jsonl") as f:
        docs = [json.loads(line) for line in f.read().split("\n") if line]
    return docs

train_docs, val_docs, test_docs = (_load_docs(s) for s in ["train", "val", "test"])
len(train_docs), len(val_docs), len(test_docs)

(100, 50, 50)

## Train BERT

In [3]:
tokenizer = swedish_classifier.tokenizer()
model = swedish_classifier.model()

mlflow.set_tracking_uri(MLFLOW_TRACKING_URL)
mlflow.set_experiment("swedish-sentiment-classification")
with mlflow.start_run():
    trainer, train_output, test_metrics = swedish_classifier.train(
        tokenizer,
        model,
        train_docs,
        val_docs,
        test_docs,
        learn_rate=1e-5,
        report_to=["mlflow"],
    )
    mlflow.log_metric("n_training_samples", len(train_docs))
    mlflow.log_metrics(test_metrics)
    model_info = mlflow.transformers.log_model(
        transformers_model={"model": model, "tokenizer": tokenizer},
        name="bert-classifier",
        task="text-classification",
    )

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at KB/bert-base-swedish-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss | Validation Accuracy | Validation ROC AUC |
|---|---|---|---|---|
| 1 | No log | 0.611155 | 0.72 | 0.831169 |
| 2 | No log | 0.498594 | 0.82 | 0.86526 |
| 3 | No log | 0.386073 | 0.86 | 0.910714 |
| 4 | No log | 0.27854 | 0.88 | 0.946429 |
| 5 | No log | 0.282648 | 0.9 | 0.957792 |
| 6 | No log | 0.346059 | 0.9 | 0.959416 |


{'test_accuracy': 0.92, 'test_roc_auc': 0.9484702093397746}


Device set to use cpu


🏃 View run invincible-hawk-489 at: http://localhost:5000/#/experiments/1/runs/779fa50602f3459e9d990bf80a886674
🧪 View experiment at: http://localhost:5000/#/experiments/1
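The cell above reports `test_accuracy` and `test_roc_auc`. How `swedish_classifier.train` computes them is not shown in this notebook (it may well delegate to scikit-learn), so the following is only a minimal, dependency-free sketch of the two metrics for a binary classifier. It assumes `labels` are 0/1 and `scores` are the model's positive-class scores; the function names and the 0.5 threshold are hypothetical.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the 0/1 label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Every (positive, negative) pair is compared; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(accuracy(labels, scores))  # 0.75
print(roc_auc(labels, scores))   # 0.75
```

Because ROC AUC depends only on how positives rank against negatives, any strictly increasing transform of the scores (for example, turning logit differences into probabilities) leaves it unchanged, whereas accuracy depends on the chosen threshold.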


## Train LoRA-BERT
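The cell below builds the model with `swedish_classifier.lora_model(lora_r=4)` and later calls PEFT's `merge_and_unload()`. As background, LoRA keeps each adapted weight matrix `W` frozen and trains a rank-`r` update, so an adapted layer computes `h = W x + (alpha / r) * B (A x)` with `A` of shape `r × d_in` and `B` of shape `d_out × r`. Merging folds that update into `W`, which is why the merged model can be logged as an ordinary Transformers model. The framework-free sketch below illustrates this under stated assumptions: the helper names are hypothetical, LoRA dropout is ignored, and `alpha` and the target modules used by `lora_model` are not shown in this notebook.

```python
# Minimal sketch of a LoRA-adapted linear layer using plain lists (no PEFT, no numpy).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x); W is frozen, only A and B would be trained."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def merge(W, A, B, alpha, r):
    """Fold the low-rank update into the base weights: W' = W + (alpha / r) * B A."""
    BA = matmul(B, A)
    scale = alpha / r
    return [[w + scale * u for w, u in zip(w_row, u_row)] for w_row, u_row in zip(W, BA)]


def lora_trainable_params(d_out, d_in, r, n_matrices):
    """Each adapted d_out x d_in matrix gets A (r x d_in) and B (d_out x r)."""
    return n_matrices * r * (d_in + d_out)


# Toy check: the merged layer gives the same output as the adapter layer.
W = [[1, 0, 2], [0, 1, -1]]
A = [[1, 1, 0]]       # rank r = 1
B = [[0.5], [-1]]
x = [1, 2, 3]
print(lora_forward(W, A, B, x, alpha=2, r=1))        # [10.0, -7.0]
print(matvec(merge(W, A, B, alpha=2, r=1), x))       # [10.0, -7.0]

# Hypothetical: query and value projections in each of 12 layers of a 768-dim encoder.
print(lora_trainable_params(768, 768, r=4, n_matrices=24))  # 147456
```

The last line assumes the adapter targets the query and value projections of all 12 layers of a 768-dimensional BERT-base encoder (PEFT's default targets for BERT models, not confirmed for `lora_model`). Under that assumption, `r = 4` adds 147,456 adapter parameters, fewer than the 589,824 weights in a single full 768 × 768 matrix.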

In [4]:
tokenizer = swedish_classifier.tokenizer()
model = swedish_classifier.lora_model(lora_r=4)

mlflow.set_tracking_uri(MLFLOW_TRACKING_URL)
mlflow.set_experiment("swedish-sentiment-classification")
with mlflow.start_run():
    trainer, train_output, test_metrics = swedish_classifier.train(
        tokenizer,
        model,
        train_docs,
        val_docs,
        test_docs,
        learn_rate=1e-4,
        report_to=["mlflow"],
    )
    mlflow.log_param("lora_r", model.active_peft_config.r)
    mlflow.log_metric("n_training_samples", len(train_docs))
    mlflow.log_metrics(test_metrics)
    # In theory only the adapter weights need to be saved, but we merge them into
    # the base model and save the whole model for compatibility with MLflow
    model = trainer.model.merge_and_unload()
    model_info = mlflow.transformers.log_model(
        transformers_model={"model": model, "tokenizer": tokenizer},
        name="bert-classifier",
        task="text-classification",
    )

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at KB/bert-base-swedish-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss | Validation Accuracy | Validation ROC AUC |
|---|---|---|---|---|
| 1 | No log | 0.701263 | 0.54 | 0.451299 |
| 2 | No log | 0.692866 | 0.52 | 0.519481 |
| 3 | No log | 0.678852 | 0.6 | 0.597403 |
| 4 | No log | 0.664992 | 0.64 | 0.652597 |
| 5 | No log | 0.651544 | 0.68 | 0.689935 |
| 6 | No log | 0.631086 | 0.74 | 0.743506 |
| 7 | No log | 0.615174 | 0.72 | 0.782468 |
| 8 | No log | 0.582516 | 0.76 | 0.855519 |
| 9 | No log | 0.549139 | 0.8 | 0.887987 |
| 10 | No log | 0.51565 | 0.84 | 0.902597 |


{'test_accuracy': 0.92, 'test_roc_auc': 0.9710144927536232}


Device set to use cpu


🏃 View run aged-skunk-804 at: http://localhost:5000/#/experiments/1/runs/d00559859c81490d938d6cc2ff9d1458
🧪 View experiment at: http://localhost:5000/#/experiments/1
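The BERT run logged 6 epochs while the LoRA run logged 10. `swedish_classifier.train` is not shown in this notebook, so the reason is not visible here. One hypothesis consistent with the logged numbers is patience-based early stopping on validation loss: BERT's best loss is at epoch 4, followed by two epochs without improvement, while LoRA's loss decreases every epoch and was still falling at epoch 10. The sketch below checks that hypothesis against the two validation-loss columns; the function name and `patience=2` are assumptions, not the notebook's actual settings. (Transformers ships a comparable rule as `EarlyStoppingCallback(early_stopping_patience=...)`.)

```python
def patience_stop_epoch(val_losses, patience):
    """Return (best_epoch, stop_epoch) with 1-based epochs.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation loss; stop_epoch is None if that never happens.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, None


# Validation losses copied from the two tables above.
bert_val_loss = [0.611155, 0.498594, 0.386073, 0.27854, 0.282648, 0.346059]
lora_val_loss = [0.701263, 0.692866, 0.678852, 0.664992, 0.651544,
                 0.631086, 0.615174, 0.582516, 0.549139, 0.51565]

print(patience_stop_epoch(bert_val_loss, patience=2))  # (4, 6)
print(patience_stop_epoch(lora_val_loss, patience=2))  # (10, None)
```

If this is what happened, the LoRA run ended on an epoch cap rather than on a plateau, so more epochs might still improve it.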
