# Last Phase: Testing our trained models

Now that we have trained quite a few models, it's useful to see how they compare to each other. We used the validation dataset not to train the models, but to validate the choices made during development. For that reason it's not a good idea to compare the models against each other on that same validation dataset: although it was never used directly for training, it has influenced the models through those choices, so in a way they have already "seen" the data.

By using a separate test set that was not used at all during training, we get a better estimate of how the different models will behave in the real world, on data they have never seen before.
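The idea of keeping three disjoint subsets can be sketched as follows. This is only a minimal illustration: `split_indices`, its fractions, and the seed are hypothetical, and the way `SN6Dataset` actually builds its `"test"` split is not shown in this notebook.

```python
import random


def split_indices(n_samples, val_fraction=0.1, test_fraction=0.1, seed=42):
    """Shuffle the sample indices once and cut them into three disjoint subsets.

    Hypothetical helper for illustration only; the fractions and the seed
    are assumptions, not the values used by this project.
    """
    indices = list(range(n_samples))
    # A fixed seed makes the split reproducible across runs
    random.Random(seed).shuffle(indices)

    n_test = int(n_samples * test_fraction)
    n_val = int(n_samples * val_fraction)

    test_idx = indices[:n_test]                # only touched for the final comparison
    val_idx = indices[n_test:n_test + n_val]   # used to guide development
    train_idx = indices[n_test + n_val:]       # used to fit the weights
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because every index lands in exactly one subset, no test sample can leak into training or into the decisions made with the validation set.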

In [1]:
import sys
sys.path.insert(0, './src')

import torch
from torch.utils.data import DataLoader
import numpy as np
import os
import albumentations as A

from utils import get_evals, set_cuda_and_seed, load_checkpoint
from model import UNET
from dataset import SN6Dataset

import warnings
from rasterio.errors import NotGeoreferencedWarning
warnings.filterwarnings("ignore", category=NotGeoreferencedWarning) # Masks are not georeferenced, so we can ignore this warning
warnings.filterwarnings("ignore", category=UserWarning) # PyTorch emits a warning about cuDNN here, which is expected (https://github.com/pytorch/pytorch/pull/125790)


MEAN = [0, 0, 0]
STD = [1.0, 1.0, 1.0]
NUM_WORKERS = 4
VAL_BATCH_SIZE = 2

DATASET_PATH = "data/train/AOI_11_Rotterdam/"
OUTPUT_PATH = "output/"
DATA_PATH = OUTPUT_PATH + "data/"
CHECKPOINT_PATH = OUTPUT_PATH + "checkpoints/"
GRAPH_PATH = OUTPUT_PATH + "graphs/"
TEST_PATH = OUTPUT_PATH + "test/"


INFO:albumentations.check_version:A new version of Albumentations is available: 1.4.8 (you have 1.4.7). Upgrade using: pip install --upgrade albumentations


In [2]:
device = set_cuda_and_seed()

test_transforms = A.Compose([
    A.Normalize(mean=MEAN, std=STD, max_pixel_value=1.0),
    A.Resize(320, 320)
])

test_dataset = SN6Dataset(DATASET_PATH, "test", transform=test_transforms)



Using PyTorch version: 2.3.0+cu121  Device: cuda


We'll iterate over the models saved in the checkpoints folder. For each "best" checkpoint, we load it into the model and then compute the evaluation metrics on the test set.
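The implementation of `get_evals` lives in `src/utils` and is not shown here. As a rough sketch of how precision, recall, F1 and accuracy could be computed for binary segmentation masks, the following NumPy example may help. The function name `binary_metrics`, the 0.5 threshold and the `eps` term are assumptions made for illustration, not the project's actual code.

```python
import numpy as np


def binary_metrics(logits, target, threshold=0.5, eps=1e-8):
    """Pixel-wise metrics for a binary mask (hypothetical helper).

    `logits` are raw model outputs, as expected by BCEWithLogitsLoss;
    `target` is the ground-truth mask with values 0 or 1.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid turns logits into probabilities
    pred = probs > threshold               # assumed decision threshold
    target = target.astype(bool)

    tp = np.sum(pred & target)    # building pixels correctly predicted
    fp = np.sum(pred & ~target)   # background predicted as building
    fn = np.sum(~pred & target)   # building pixels that were missed
    tn = np.sum(~pred & ~target)  # background correctly predicted

    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy


logits = np.array([[2.0, -1.0], [0.5, -3.0]])
target = np.array([[1, 0], [0, 0]])
precision, recall, f1, accuracy = binary_metrics(logits, target)
print(f"Precision: {precision:.3f}, Recall: {recall:.3f}, F1: {f1:.3f}, Accuracy: {accuracy:.3f}")
```

In segmentation tasks where background pixels dominate, accuracy alone can look high even for a poor model, which is one reason to report precision, recall and F1 alongside it.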

In [3]:
model = UNET(in_channels=3, out_channels=1).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.BCEWithLogitsLoss()

# The test loader is identical for every checkpoint, so we build it once
test_loader = DataLoader(test_dataset, batch_size=VAL_BATCH_SIZE, pin_memory=True, shuffle=False, num_workers=NUM_WORKERS)

print(os.listdir(CHECKPOINT_PATH))
for checkpoint in os.listdir(CHECKPOINT_PATH):
    # Only the "best_*.pth" files are evaluated; the other checkpoints are skipped
    if checkpoint.startswith("best") and checkpoint.endswith(".pth"):
        print("found best checkpoint", checkpoint)
        load_checkpoint(CHECKPOINT_PATH + checkpoint, model, optimizer, criterion)
        # Each checkpoint gets its own output folder, named after the file without its extension
        checkpoint_name = checkpoint.split(".")[0]
        folder_path = os.path.join(TEST_PATH, checkpoint_name)
        os.makedirs(folder_path, exist_ok=True)
        # Evaluate on the test set; predictions are saved under folder_path
        test_loss, precision, recall, f1, accuracy = get_evals(test_loader, model, criterion, device, True, folder_path)
        print(f"Test loss: {test_loss}, Precision: {precision}, Recall: {recall}, F1: {f1}, Accuracy: {accuracy}")

['checkpoint.pth', 'checkpoint_2.pth', 'best_6 - LR1e-4 TrainBatch8 Epoch80 Resize - NoAugmentation.pth', 'best_5 - LR1e-4 TrainBatch8 Epoch100 Resize.pth', 'checkpoint_5 - LR1e-4 TrainBatch8 Epoch100 Resize.pth', 'checkpoint_5 - LR1e-4 TrainBatch8 Epoch24 Resize.pth', 'best_69 - LR1e-3 TrainBatch8 Epoch24 noScheduler.pth', 'checkpoint_2 - LR1e-3 TrainBatch8 Epoch24 noScheduler.pth', 'best_4 - LR1e-4 TrainBatch8 Epoch24.pth', 'best.pth', 'best_2 - LR1e-3 TrainBatch8 Epoch24 noScheduler.pth', 'best_5 - LR1e-4 TrainBatch8 Epoch24 Resize.pth', 'best_3 - LR1e-3 TrainBatch8 Epoch24.pth', 'checkpoint_4 - LR1e-4 TrainBatch8 Epoch24.pth', 'checkpoint_6 - LR1e-4 TrainBatch8 Epoch80 Resize - NoAugmentation.pth', 'best_2.pth', 'checkpoint_3 - LR1e-3 TrainBatch8 Epoch24.pth', 'checkpoint_69 - LR1e-3 TrainBatch8 Epoch24 noScheduler.pth']
found best checkpoint best_6 - LR1e-4 TrainBatch8 Epoch80 Resize - NoAugmentation.pth
loading checkpoint
Saving predictions to: output/test/best_6 - LR1e-4 TrainBa