# Model C – Lightweight Transformer Cross-Encoder

This notebook implements the Model C architecture described in the project brief: a lightweight transformer trained from scratch to act as a cross-encoder for the science NLI task. The solution adheres to the specification constraints (no pretrained checkpoints, transformer depth within compute limits) and is designed for transparency and reproducibility.

## Environment Setup
Run the cell below once in a fresh runtime to install the Python packages required by this notebook.

In [None]:
%pip install --quiet numpy torch --index-url https://download.pytorch.org/whl/cpu
# tqdm is not a torch dependency; it is optional here (the import cell falls back to a no-op wrapper) but gives progress bars.
%pip install --quiet tqdm

## Workflow Overview
- Load the provided NLI JSON splits and normalise them into record-based structures.
- Build a vocabulary from the training split only and construct learned token, segment, and position embeddings with explicit `<cls>`/`<sep>` markers.
- Instantiate a shallow multi-head self-attention encoder that processes the concatenated premise–hypothesis sequence.
- Optimise with cross-entropy loss on entailment vs neutral, monitor validation metrics, and log per-class accuracies.
- Provide utilities for test-set inference and exporting the training history for inclusion in the project submission log.
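
The input layout described in the list above can be sketched on a toy pair. This is a minimal, standalone illustration rather than the notebook's dataset class: the helper name `build_pair` and the tiny `max_length` are hypothetical, and the segment ids (0 for markers and padding, 1 for the premise, 2 for the hypothesis) follow the convention used later in `NLIDataset._build_input`.

```python
from typing import List, Tuple


def build_pair(
    premise: List[str], hypothesis: List[str], max_length: int
) -> Tuple[List[str], List[int], List[bool]]:
    """Concatenate <cls> premise <sep> hypothesis, then truncate or pad to max_length."""
    tokens = ["<cls>"] + premise + ["<sep>"] + hypothesis
    segments = [0] + [1] * len(premise) + [0] + [2] * len(hypothesis)
    mask = [True] * len(tokens)
    # Truncate from the right, then pad whatever room is left.
    tokens, segments, mask = tokens[:max_length], segments[:max_length], mask[:max_length]
    pad = max_length - len(tokens)
    tokens += ["<pad>"] * pad
    segments += [0] * pad
    mask += [False] * pad
    return tokens, segments, mask


tokens, segments, mask = build_pair(
    ["plants", "absorb", "light"], ["plants", "need", "light"], max_length=10
)
print(tokens)
print(segments)
print(mask)
```

Because truncation is applied from the right, a premise longer than `max_length` would push the hypothesis out of the window entirely, which is worth keeping in mind for unusually long training examples.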

In [1]:
from __future__ import annotations

import copy
import json
import random
import re
from collections import Counter
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple

import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset

try:
    from tqdm.auto import tqdm
except ImportError:  # pragma: no cover
    tqdm = lambda x, **kwargs: x



In [2]:
def set_seed(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


@dataclass
class ModelCConfig:
    data_dir: Path = Path("..")
    train_file: str = "train.json"
    val_file: str = "validation.json"
    test_file: str = "test.json"
    max_length: int = 256
    min_freq: int = 2
    embedding_dim: int = 256
    num_heads: int = 4
    num_layers: int = 3
    feedforward_dim: int = 512
    dropout: float = 0.1
    batch_size: int = 32
    num_epochs: int = 8
    lr: float = 2e-4
    weight_decay: float = 1e-4
    max_grad_norm: float = 1.0
    warmup_ratio: float = 0.1
    seed: int = 2025

    @property
    def device(self) -> str:
        return "cuda" if torch.cuda.is_available() else "cpu"


config = ModelCConfig()
set_seed(config.seed)
print(f"Using device: {config.device}")
print(f"Configuration: {asdict(config)}")

Using device: cpu
Configuration: {'data_dir': WindowsPath('..'), 'train_file': 'train.json', 'val_file': 'validation.json', 'test_file': 'test.json', 'max_length': 256, 'min_freq': 2, 'embedding_dim': 256, 'num_heads': 4, 'num_layers': 3, 'feedforward_dim': 512, 'dropout': 0.1, 'batch_size': 32, 'num_epochs': 8, 'lr': 0.0002, 'weight_decay': 0.0001, 'max_grad_norm': 1.0, 'warmup_ratio': 0.1, 'seed': 2025}


In [3]:
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)?|[^\w\s]")


def basic_tokenize(text: str) -> List[str]:
    text = text.strip().lower()
    return TOKEN_PATTERN.findall(text)


def load_split(file_path: Path) -> List[Dict[str, Optional[str]]]:
    with file_path.open(encoding="utf-8") as stream:
        payload = json.load(stream)
    keys = sorted(payload["premise"].keys(), key=int)
    label_data = payload.get("label")
    records: List[Dict[str, Optional[str]]] = []
    for key in keys:
        label: Optional[str] = None
        if label_data is not None:
            if isinstance(label_data, dict):
                label = label_data.get(key)
            else:
                label = label_data[int(key)]
        records.append(
            {
                "id": key,
                "premise": payload["premise"][key],
                "hypothesis": payload["hypothesis"][key],
                "label": label,
            }
        )
    return records


def preprocess_records(records: List[Dict[str, Optional[str]]]) -> None:
    for item in records:
        item["premise_tokens"] = basic_tokenize(item["premise"])
        item["hypothesis_tokens"] = basic_tokenize(item["hypothesis"])
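
A quick check of the tokeniser on made-up strings may help clarify what the regular expression keeps and what it drops. The sample sentences below are illustrative and do not come from the dataset; the pattern and function are repeated so the snippet runs on its own.

```python
import re
from typing import List

TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)?|[^\w\s]")


def basic_tokenize(text: str) -> List[str]:
    return TOKEN_PATTERN.findall(text.strip().lower())


# Contractions stay whole; hyphens and full stops become separate tokens.
simple = basic_tokenize("Plants don't absorb CO2-rich air.")
print(simple)

# Underscores and non-ASCII letters match neither branch of the pattern,
# so they are silently dropped rather than mapped to a token.
edge = basic_tokenize("grain_size of 5 \u03bcm")
print(edge)
```

The second case suggests that symbols such as Greek letters in scientific text never reach the vocabulary, not even as `<unk>`.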

In [4]:
data_root = config.data_dir
train_records = load_split(data_root / config.train_file)
val_records = load_split(data_root / config.val_file)
test_records = load_split(data_root / config.test_file)

for split_name, split_records in [("train", train_records), ("validation", val_records), ("test", test_records)]:
    preprocess_records(split_records)
    print(f"{split_name.title()} split: {len(split_records)} samples")

label_counts = Counter(rec["label"] for rec in train_records if rec["label"] is not None)
print("Label distribution (train):", label_counts)


def describe_lengths(records: List[Dict[str, Optional[str]]], split_name: str) -> None:
    lengths = [len(rec["premise_tokens"]) + len(rec["hypothesis_tokens"]) for rec in records]
    avg_len = sum(lengths) / len(lengths)
    print(
        f"{split_name.title()} token count (premise+hypothesis) – avg: {avg_len:.1f}, "
        f"max: {max(lengths)}, 95th percentile: {np.percentile(lengths, 95):.0f}"
    )


describe_lengths(train_records, "train")
describe_lengths(val_records, "validation")
describe_lengths(test_records, "test")

Train split: 23088 samples
Validation split: 1304 samples
Test split: 2126 samples
Label distribution (train): Counter({'neutral': 14618, 'entails': 8470})
Train token count (premise+hypothesis) – avg: 34.3, max: 14657, 95th percentile: 54
Validation token count (premise+hypothesis) – avg: 33.7, max: 81, 95th percentile: 56
Test token count (premise+hypothesis) – avg: 33.4, max: 74, 95th percentile: 54


In [5]:
class Vocabulary:
    def __init__(self, min_freq: int = 1):
        self.min_freq = min_freq
        self.pad_token = "<pad>"
        self.unk_token = "<unk>"
        self.cls_token = "<cls>"
        self.sep_token = "<sep>"
        self.special_tokens = [self.pad_token, self.unk_token, self.cls_token, self.sep_token]
        self.token_to_idx: Dict[str, int] = {}
        self.idx_to_token: List[str] = []

    def build(self, records: Iterable[Dict[str, Optional[str]]]) -> None:
        counter = Counter()
        for record in records:
            counter.update(record["premise_tokens"])
            counter.update(record["hypothesis_tokens"])
        tokens = [token for token, freq in counter.items() if freq >= self.min_freq]
        ordered_tokens = sorted(tokens)
        self.idx_to_token = list(self.special_tokens) + ordered_tokens
        self.token_to_idx = {token: idx for idx, token in enumerate(self.idx_to_token)}

    def __len__(self) -> int:
        return len(self.idx_to_token)

    @property
    def pad_id(self) -> int:
        return self.token_to_idx[self.pad_token]

    @property
    def cls_id(self) -> int:
        return self.token_to_idx[self.cls_token]

    @property
    def sep_id(self) -> int:
        return self.token_to_idx[self.sep_token]

    @property
    def unk_id(self) -> int:
        return self.token_to_idx[self.unk_token]

    def encode(self, tokens: List[str]) -> List[int]:
        return [self.token_to_idx.get(token, self.unk_id) for token in tokens]


vocab = Vocabulary(min_freq=config.min_freq)
vocab.build(train_records)
print(f"Vocabulary size: {len(vocab)}")

Vocabulary size: 11449
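
The effect of `min_freq` can be seen on a toy corpus. This sketch mirrors the frequency threshold and `<unk>` fallback of the `Vocabulary` class above but is a standalone simplification; the corpus and the helper name `build_vocab` are made up for illustration.

```python
from collections import Counter
from typing import Dict, List

SPECIALS = ["<pad>", "<unk>", "<cls>", "<sep>"]


def build_vocab(corpus: List[List[str]], min_freq: int) -> Dict[str, int]:
    """Keep tokens seen at least min_freq times, sorted, after the special tokens."""
    counts = Counter(token for sentence in corpus for token in sentence)
    kept = sorted(token for token, freq in counts.items() if freq >= min_freq)
    return {token: idx for idx, token in enumerate(SPECIALS + kept)}


corpus = [["water", "boils"], ["water", "freezes"], ["ice", "melts"], ["ice", "floats"]]
vocab_map = build_vocab(corpus, min_freq=2)
unk_id = vocab_map["<unk>"]

# "boils" occurs once, so it falls below the threshold and maps to <unk>.
encoded = [vocab_map.get(token, unk_id) for token in ["ice", "boils", "water"]]
print(vocab_map)
print(encoded)
```

With `min_freq=2`, every token that appears only once in the training split is treated the same as a token never seen at all.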


In [6]:
LABEL_TO_ID = {"entails": 0, "neutral": 1}
ID_TO_LABEL = {idx: label for label, idx in LABEL_TO_ID.items()}


class NLIDataset(Dataset):
    def __init__(self, records: List[Dict[str, Optional[str]]], vocab: Vocabulary, config: ModelCConfig):
        self.records = records
        self.vocab = vocab
        self.config = config
        self.has_labels = all(record["label"] is not None for record in records)

    def __len__(self) -> int:
        return len(self.records)

    def _build_input(self, record: Dict[str, Optional[str]]) -> Tuple[List[int], List[int], List[bool]]:
        tokens = [self.vocab.cls_token]
        segments = [0]
        tokens.extend(record["premise_tokens"])
        segments.extend([1] * len(record["premise_tokens"]))
        tokens.append(self.vocab.sep_token)
        segments.append(0)
        tokens.extend(record["hypothesis_tokens"])
        segments.extend([2] * len(record["hypothesis_tokens"]))

        input_ids = self.vocab.encode(tokens)
        attention_mask = [True] * len(input_ids)
        max_len = self.config.max_length
        if len(input_ids) > max_len:
            input_ids = input_ids[:max_len]
            segments = segments[:max_len]
            attention_mask = attention_mask[:max_len]
        else:
            pad_needed = max_len - len(input_ids)
            if pad_needed > 0:
                input_ids += [self.vocab.pad_id] * pad_needed
                segments += [0] * pad_needed
                attention_mask += [False] * pad_needed
        return input_ids, segments, attention_mask

    def __getitem__(self, idx: int) -> Dict[str, torch.Tensor]:
        record = self.records[idx]
        input_ids, segment_ids, attention_mask = self._build_input(record)
        label_idx = LABEL_TO_ID[record["label"]] if self.has_labels else -1
        return {
            "input_ids": torch.tensor(input_ids, dtype=torch.long),
            "segment_ids": torch.tensor(segment_ids, dtype=torch.long),
            "attention_mask": torch.tensor(attention_mask, dtype=torch.bool),
            "labels": torch.tensor(label_idx, dtype=torch.long),
            "sample_id": record["id"],
        }


train_dataset = NLIDataset(train_records, vocab, config)
val_dataset = NLIDataset(val_records, vocab, config)
test_dataset = NLIDataset(test_records, vocab, config)

train_loader = DataLoader(train_dataset, batch_size=config.batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=config.batch_size, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=config.batch_size, shuffle=False)

print(
    f"Train batches: {len(train_loader)} | Validation batches: {len(val_loader)} | Test batches: {len(test_loader)}"
)

Train batches: 722 | Validation batches: 41 | Test batches: 67


### Transformer Cross-Encoder

In [7]:
class TransformerCrossEncoder(nn.Module):
    def __init__(self, vocab_size: int, config: ModelCConfig, pad_id: int):
        super().__init__()
        self.config = config
        self.token_embeddings = nn.Embedding(vocab_size, config.embedding_dim, padding_idx=pad_id)
        self.segment_embeddings = nn.Embedding(3, config.embedding_dim)
        self.position_embeddings = nn.Embedding(config.max_length, config.embedding_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=config.embedding_dim,
            nhead=config.num_heads,
            dim_feedforward=config.feedforward_dim,
            dropout=config.dropout,
            activation="gelu",
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=config.num_layers)
        self.layer_norm = nn.LayerNorm(config.embedding_dim)
        self.dropout = nn.Dropout(config.dropout)
        self.classifier = nn.Sequential(
            nn.Linear(config.embedding_dim, config.embedding_dim),
            nn.GELU(),
            nn.Dropout(config.dropout),
            nn.Linear(config.embedding_dim, len(LABEL_TO_ID)),
        )
        self._init_parameters()

    def _init_parameters(self) -> None:
        nn.init.normal_(self.token_embeddings.weight, mean=0.0, std=0.02)
        nn.init.normal_(self.segment_embeddings.weight, mean=0.0, std=0.02)
        nn.init.normal_(self.position_embeddings.weight, mean=0.0, std=0.02)
        for module in self.classifier:
            if isinstance(module, nn.Linear):
                nn.init.xavier_uniform_(module.weight)
                if module.bias is not None:
                    nn.init.zeros_(module.bias)

    def forward(
        self, input_ids: torch.Tensor, segment_ids: torch.Tensor, attention_mask: torch.Tensor
    ) -> torch.Tensor:
        batch_size, seq_len = input_ids.size()
        position_ids = torch.arange(seq_len, device=input_ids.device).unsqueeze(0).expand(batch_size, seq_len)

        embeddings = (
            self.token_embeddings(input_ids)
            + self.segment_embeddings(segment_ids)
            + self.position_embeddings(position_ids)
        )
        embeddings = self.layer_norm(embeddings)
        key_padding_mask = ~attention_mask.to(dtype=torch.bool)
        encoded = self.encoder(embeddings, src_key_padding_mask=key_padding_mask)
        cls_representation = encoded[:, 0, :]
        logits = self.classifier(self.dropout(cls_representation))
        return logits
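
As a rough sanity check on model size, the parameter count can be worked out by hand from the configuration and the reported vocabulary size of 11449. The arithmetic below assumes the standard parameter layout of PyTorch's `nn.TransformerEncoderLayer` (a fused query/key/value projection, an output projection, two feed-forward linear layers, and two layer norms, all with biases). Treat it as an estimate to confirm with `sum(p.numel() for p in model.parameters())`, not as an authoritative figure.

```python
vocab_size, d, ff, layers = 11449, 256, 512, 3
max_len, n_segments, n_classes = 256, 3, 2

# Token, segment, and position embedding tables.
embeddings = vocab_size * d + n_segments * d + max_len * d
embedding_norm = 2 * d  # LayerNorm weight + bias applied to the summed embeddings

attention = (3 * d * d + 3 * d) + (d * d + d)  # fused QKV projection + output projection
feedforward = (d * ff + ff) + (ff * d + d)     # two linear layers with biases
layer_norms = 2 * (2 * d)                      # two LayerNorms per encoder layer
per_layer = attention + feedforward + layer_norms

# Classification head: Linear(d, d) followed by Linear(d, n_classes).
classifier = (d * d + d) + (d * n_classes + n_classes)

total = embeddings + embedding_norm + layers * per_layer + classifier
print(f"embeddings: {embeddings:,}")
print(f"per encoder layer: {per_layer:,}")
print(f"total: {total:,}")
```

Under these assumptions the token embedding table accounts for well over half of all parameters, so vocabulary size (and therefore `min_freq`) has more influence on model size than encoder depth does.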

### Optimisation and Evaluation

In [8]:
class WarmupLinearScheduler:
    def __init__(self, optimizer: torch.optim.Optimizer, warmup_steps: int, total_steps: int):
        self.optimizer = optimizer
        self.warmup_steps = max(1, warmup_steps)
        self.total_steps = max(1, total_steps)
        self.num_steps = 0
        self.base_lrs = [group["lr"] for group in optimizer.param_groups]

    def step(self) -> None:
        self.num_steps += 1
        for param_group, base_lr in zip(self.optimizer.param_groups, self.base_lrs):
            if self.num_steps <= self.warmup_steps:
                lr = base_lr * self.num_steps / self.warmup_steps
            else:
                decay_steps = self.total_steps - self.warmup_steps
                lr = base_lr * max(0.0, (self.total_steps - self.num_steps) / max(1, decay_steps))
            param_group["lr"] = lr


def forward_batch(model: nn.Module, batch: Dict[str, torch.Tensor], device: str) -> Tuple[torch.Tensor, torch.Tensor]:
    input_ids = batch["input_ids"].to(device)
    segment_ids = batch["segment_ids"].to(device)
    attention_mask = batch["attention_mask"].to(device)
    labels = batch["labels"].to(device)
    logits = model(input_ids=input_ids, segment_ids=segment_ids, attention_mask=attention_mask)
    return logits, labels


def train_one_epoch(
    model: nn.Module,
    data_loader: DataLoader,
    optimizer: torch.optim.Optimizer,
    scheduler: Optional[WarmupLinearScheduler],
    criterion: nn.Module,
    device: str,
    max_grad_norm: float,
) -> Tuple[float, float]:
    model.train()
    total_loss = 0.0
    total_correct = 0
    total_examples = 0
    progress = tqdm(data_loader, desc="Train", leave=False)
    for batch in progress:
        optimizer.zero_grad()
        logits, labels = forward_batch(model, batch, device)
        if (labels < 0).all():
            continue
        valid_mask = labels >= 0
        logits = logits[valid_mask]
        labels = labels[valid_mask]
        loss = criterion(logits, labels)
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()
        if scheduler is not None:
            scheduler.step()
        batch_size = labels.size(0)
        total_loss += loss.item() * batch_size
        preds = logits.argmax(dim=-1)
        total_correct += (preds == labels).sum().item()
        total_examples += batch_size
        if hasattr(progress, "set_postfix"):
            progress.set_postfix({"loss": total_loss / total_examples, "acc": total_correct / total_examples})
    return total_loss / total_examples, total_correct / total_examples


def evaluate_model(
    model: nn.Module, data_loader: DataLoader, criterion: nn.Module, device: str
) -> Dict[str, object]:
    model.eval()
    total_loss = 0.0
    total_correct = 0
    total_examples = 0
    label_totals: Counter = Counter()
    label_correct: Counter = Counter()
    all_preds: List[int] = []
    all_labels: List[int] = []
    with torch.no_grad():
        for batch in data_loader:
            logits, labels = forward_batch(model, batch, device)
            valid_mask = labels >= 0
            logits = logits[valid_mask]
            labels = labels[valid_mask]
            if labels.numel() == 0:
                continue
            batch_size = labels.size(0)
            loss = criterion(logits, labels)
            total_loss += loss.item() * batch_size
            preds = logits.argmax(dim=-1)
            total_correct += (preds == labels).sum().item()
            total_examples += batch_size
            label_list = labels.cpu().tolist()
            pred_list = preds.cpu().tolist()
            label_totals.update(label_list)
            for label, pred in zip(label_list, pred_list):
                if label == pred:
                    label_correct.update([label])
            all_labels.extend(label_list)
            all_preds.extend(pred_list)
    avg_loss = total_loss / total_examples
    accuracy = total_correct / total_examples
    per_label_accuracy = {
        ID_TO_LABEL[label_id]: label_correct[label_id] / label_totals[label_id]
        for label_id in label_totals
    }
    return {
        "loss": avg_loss,
        "accuracy": accuracy,
        "per_label_accuracy": per_label_accuracy,
        "predictions": all_preds,
        "targets": all_labels,
    }


def train_model(
    model: nn.Module, train_loader: DataLoader, val_loader: DataLoader, config: ModelCConfig
) -> Tuple[List[Dict[str, object]], float]:
    device = config.device
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=config.lr, weight_decay=config.weight_decay)
    total_steps = len(train_loader) * config.num_epochs
    warmup_steps = int(total_steps * config.warmup_ratio)
    scheduler = WarmupLinearScheduler(optimizer, warmup_steps, total_steps) if total_steps > 0 else None

    history: List[Dict[str, object]] = []
    best_state = copy.deepcopy(model.state_dict())
    best_val_acc = 0.0

    for epoch in range(1, config.num_epochs + 1):
        train_loss, train_acc = train_one_epoch(
            model, train_loader, optimizer, scheduler, criterion, device, config.max_grad_norm
        )
        val_metrics = evaluate_model(model, val_loader, criterion, device)
        history.append(
            {
                "epoch": epoch,
                "train_loss": train_loss,
                "train_acc": train_acc,
                "val_loss": val_metrics["loss"],
                "val_acc": val_metrics["accuracy"],
                "val_per_label_acc": val_metrics["per_label_accuracy"],
            }
        )
        if val_metrics["accuracy"] > best_val_acc:
            best_val_acc = val_metrics["accuracy"]
            best_state = copy.deepcopy(model.state_dict())
        print(
            f"Epoch {epoch:02d} | train_loss={train_loss:.4f} acc={train_acc:.4f} "
            f"| val_loss={val_metrics['loss']:.4f} acc={val_metrics['accuracy']:.4f}"
        )
        print(f"  Validation per-label accuracy: {val_metrics['per_label_accuracy']}")

    model.load_state_dict(best_state)
    return history, best_val_acc


def predict_labels(model: nn.Module, data_loader: DataLoader, device: str) -> Tuple[List[str], List[int]]:
    model.eval()
    sample_ids: List[str] = []
    predictions: List[int] = []
    with torch.no_grad():
        for batch in data_loader:
            input_ids = batch["input_ids"].to(device)
            segment_ids = batch["segment_ids"].to(device)
            attention_mask = batch["attention_mask"].to(device)
            logits = model(input_ids=input_ids, segment_ids=segment_ids, attention_mask=attention_mask)
            preds = logits.argmax(dim=-1).cpu().tolist()
            predictions.extend(preds)
            sample_ids.extend(batch["sample_id"])
    return sample_ids, predictions
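
The warmup-then-linear-decay rule implemented by `WarmupLinearScheduler` can be checked in isolation. The function below restates its update as a pure function of the step index (the name `lr_at` is hypothetical) and plugs in the values from this notebook's configuration together with the 722 training batches reported earlier.

```python
def lr_at(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate set by the scheduler after `step` calls to step() (counting from 1)."""
    if step <= warmup_steps:
        return base_lr * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / decay_steps)


total_steps = 722 * 8                  # batches per epoch x num_epochs
warmup_steps = int(total_steps * 0.1)  # warmup_ratio = 0.1

for step in (1, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at(step, 2e-4, warmup_steps, total_steps))
```

One subtlety visible in the training loop: `scheduler.step()` is called after `optimizer.step()`, and the scheduler does not touch the learning rate when it is constructed, so the very first optimizer update appears to run at the unscaled base rate before warmup takes effect.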

### Training Run

In [9]:
model = TransformerCrossEncoder(vocab_size=len(vocab), config=config, pad_id=vocab.pad_id)
training_history, best_val_acc = train_model(model, train_loader, val_loader, config)
print(f"Best validation accuracy: {best_val_acc:.4f}")

Epoch 01 | train_loss=0.5601 acc=0.7096 | val_loss=0.5715 acc=0.6948
  Validation per-label accuracy: {'neutral': 0.7527047913446677, 'entails': 0.6377473363774734}
Epoch 02 | train_loss=0.3593 acc=0.8488 | val_loss=0.7814 acc=0.6748
  Validation per-label accuracy: {'neutral': 0.8918083462132921, 'entails': 0.4611872146118721}
Epoch 03 | train_loss=0.2238 acc=0.9144 | val_loss=0.8391 acc=0.6917
  Validation per-label accuracy: {'neutral': 0.8562596599690881, 'entails': 0.5296803652968036}
Epoch 04 | train_loss=0.1265 acc=0.9560 | val_loss=1.3095 acc=0.6802
  Validation per-label accuracy: {'neutral': 0.8361669242658424, 'entails': 0.5266362252663622}
Epoch 05 | train_loss=0.0649 acc=0.9808 | val_loss=1.7408 acc=0.6687
  Validation per-label accuracy: {'neutral': 0.8670788253477589, 'entails': 0.4733637747336377}
Epoch 06 | train_loss=0.0309 acc=0.9913 | val_loss=1.8952 acc=0.7025
  Validation per-label accuracy: {'neutral': 0.7619783616692427, 'entails': 0.6438356164383562}
Epoch 07 | train_loss=0.0119 acc=0.9971 | val_loss=2.5904 acc=0.6910
  Validation per-label accuracy: {'neutral': 0.7758887171561051, 'entails': 0.6073059360730594}
Epoch 08 | train_loss=0.0051 acc=0.9987 | val_loss=2.9635 acc=0.6887
  Validation per-label accuracy: {'neutral': 0.7959814528593508, 'entails': 0.5829528158295282}
Best validation accuracy: 0.7025
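
The log above illustrates why the checkpoint-selection criterion matters: `train_model` keeps the weights from the epoch with the highest validation accuracy. Replaying the logged validation numbers shows which epoch each criterion would pick. The values are copied from the output above, rounded as printed; selecting on validation loss is shown only for comparison and is not what the notebook does.

```python
# (epoch, val_loss, val_acc) as printed in the training log.
history = [
    (1, 0.5715, 0.6948), (2, 0.7814, 0.6748), (3, 0.8391, 0.6917), (4, 1.3095, 0.6802),
    (5, 1.7408, 0.6687), (6, 1.8952, 0.7025), (7, 2.5904, 0.6910), (8, 2.9635, 0.6887),
]

best_by_acc = max(history, key=lambda row: row[2])   # criterion used by train_model
best_by_loss = min(history, key=lambda row: row[1])  # alternative, for comparison only

print("highest val accuracy:", best_by_acc)
print("lowest val loss:", best_by_loss)
```

The two criteria pick epochs five apart while differing by less than one accuracy point. Together with the steadily rising validation loss and near-perfect training accuracy, this suggests the model begins to overfit after the first epoch.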


In [10]:
criterion = nn.CrossEntropyLoss()
val_metrics = evaluate_model(model, val_loader, criterion, config.device)
print("Validation metrics:")
print(val_metrics)

if test_dataset.has_labels:
    test_metrics = evaluate_model(model, test_loader, criterion, config.device)
    print("Test metrics:")
    print(test_metrics)
else:
    print("Test split has no ground-truth labels; generating predictions only.")

Validation metrics:
{'loss': 1.8951918625392796, 'accuracy': 0.7024539877300614, 'per_label_accuracy': {'neutral': 0.7619783616692427, 'entails': 0.6438356164383562}, 'predictions': [1, 1, 0, 1, 1, 0, 0, 0, 1, 1, ... (output truncated)

In [11]:
history_path = Path("modelC_history.json")
history_path.write_text(json.dumps(training_history, indent=2))
print(f"Training history saved to {history_path.resolve()}")

Training history saved to G:\UWA Master Of Information Technology\Semester 3 2025\CITS4012 Natural Language Processing\Project\CITS4012-Group-Project\modelC_history.json


In [12]:
sample_ids, prediction_ids = predict_labels(model, test_loader, config.device)
prediction_labels = [ID_TO_LABEL[idx] for idx in prediction_ids]
output_path = Path("modelC_test_predictions.json")
output_payload = [{"id": sid, "prediction": label} for sid, label in zip(sample_ids, prediction_labels)]
output_path.write_text(json.dumps(output_payload, indent=2))
print(f"Test predictions saved to {output_path.resolve()}")

Test predictions saved to G:\UWA Master Of Information Technology\Semester 3 2025\CITS4012 Natural Language Processing\Project\CITS4012-Group-Project\modelC_test_predictions.json
