# Analyze logs

This is a straightforward methodology. Based on the training logs, we select the model that is both the best performing and stable: it should not overfit the training set, and its validation metrics should not deteriorate noticeably. To open **TensorBoard**, run the following.

In [1]:
from footvid.utils.env import check_repository_path


REPOSITORY_PATH = check_repository_path().resolve()
MODELS_PATH = REPOSITORY_PATH.joinpath("models")
LOGS_PATH = REPOSITORY_PATH.joinpath("logs")

In [None]:
!tensorboard --logdir {str(LOGS_PATH)} --host localhost
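
The selection criterion described above (best validation performance without a large gap to the training performance) can also be sketched in code. This is a minimal illustration, not the procedure used in this notebook: the function name `select_best_epoch`, the `max_gap` threshold, and the loss values are all hypothetical, and the real curves should be read from TensorBoard.

```python
from typing import Optional, Sequence


def select_best_epoch(
    train_loss: Sequence[float],
    valid_loss: Sequence[float],
    max_gap: float,
) -> Optional[int]:
    """Return the 1-based epoch with the lowest validation loss among the
    epochs whose validation loss exceeds the training loss by at most max_gap.

    Returns None when no epoch satisfies the gap constraint.
    """
    if len(train_loss) != len(valid_loss):
        raise ValueError("train_loss and valid_loss must have the same length")

    best_epoch = None
    best_valid = float("inf")
    for epoch, (train, valid) in enumerate(zip(train_loss, valid_loss), start=1):
        # A large validation/training gap is treated as a sign of overfitting.
        if valid - train > max_gap:
            continue
        if valid < best_valid:
            best_epoch, best_valid = epoch, valid
    return best_epoch


# Made-up loss curves for illustration only (not taken from the real logs).
example_train = [0.70, 0.52, 0.40, 0.31, 0.20, 0.10]
example_valid = [0.72, 0.55, 0.45, 0.41, 0.44, 0.52]
best = select_best_epoch(example_train, example_valid, max_gap=0.15)
```

With these made-up curves, the later epochs are rejected because the validation loss drifts away from the training loss, which mirrors the visual check done in TensorBoard.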

# Model selection

From the TensorBoard logs we can see that the best performing model is **cnn-top2-layers-fine-tuning** at the 11th epoch. This model will be used for the final predictions. The models directory should be structured as follows:

In [2]:
!tree -d ../models/

../models/
├── cnn-top2-layers-fine-tuning
├── fcl-resnet-fine-tuning
└── full-fine-tuning

3 directories


In [3]:
import torch
from footvid.models import ResNet


DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")


checkpoint = torch.load(
    MODELS_PATH.joinpath(
        "cnn-top2-layers-fine-tuning", "checkpoint.23-09-2020.13_08_02.pth"
    ),
    map_location=DEVICE,
)

model = ResNet(output_size=1)
model.to(DEVICE)
model.load_state_dict(checkpoint["model"])
model.eval()

ResNet(
  (resnet50_conv): Sequential(
    (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): Sequential(
      (0): Bottleneck(
        (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (downsample): Sequential(
 

The processed data directory should have the structure shown in the tree below:

In [4]:
!tree -d ../data/processed/

../data/processed/
├── test
│   └── no-label
├── train
│   ├── neg
│   └── pos
└── valid
    ├── neg
    └── pos

8 directories
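
Since `datasets.ImageFolder` expects one sub-directory per class under its root, it can help to verify the layout before building the datasets. The sketch below is an assumption-laden helper, not part of `footvid`: the name `find_missing_directories` and the `EXPECTED_LAYOUT` mapping are hypothetical and simply mirror the tree above. The `no-label` folder under `test` presumably exists only so that `ImageFolder` finds a class directory.

```python
from pathlib import Path
from typing import Dict, List

# Layout copied from the tree above: split name -> expected class folders.
EXPECTED_LAYOUT: Dict[str, List[str]] = {
    "train": ["neg", "pos"],
    "valid": ["neg", "pos"],
    "test": ["no-label"],
}


def find_missing_directories(processed_root: Path) -> List[Path]:
    """Return the expected sub-directories missing under processed_root."""
    missing = []
    for split, class_names in EXPECTED_LAYOUT.items():
        for class_name in class_names:
            candidate = processed_root / split / class_name
            if not candidate.is_dir():
                missing.append(candidate)
    return missing
```

In this notebook it could be called as `find_missing_directories(REPOSITORY_PATH.joinpath("data", "processed"))`; an empty list means the layout matches the tree.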


In [5]:
from torch.utils.data import DataLoader
from torchvision import datasets

from footvid.preprocessing import TEST_TRANSFORMS


test_images = datasets.ImageFolder(
    root=REPOSITORY_PATH.joinpath("data", "processed", "test"),
    transform=TEST_TRANSFORMS,
)

test_dataloader = DataLoader(
    dataset=test_images, batch_size=64, shuffle=False, num_workers=2
)

In [None]:
import numpy as np
from tqdm import tqdm


outputs = []
with torch.no_grad():  # inference only, no gradients are needed
    for input_batch, _ in tqdm(test_dataloader):
        input_batch = input_batch.to(DEVICE)
        output_batch = model(input_batch)
        outputs.append(output_batch.cpu().numpy())
# Stack the per-batch logits and flatten (N, 1) into (N,).
outputs = np.concatenate(outputs, axis=0).flatten()

In [None]:
from pathlib import Path


image_names = [Path(tup[0]).name for tup in test_images.imgs]

The model outputs raw logits, so the sigmoid function must be applied to convert them into scores in the [0, 1] range before the final predictions are saved.

In [None]:
import pandas as pd


test_predictions_df = pd.DataFrame(
    {"filename": image_names, "score": 1 / (1 + np.exp(-outputs))}
)
test_predictions_df.to_csv(MODELS_PATH.joinpath("test-predictions.csv"), index=False)
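
The logit-to-score step above can be illustrated in isolation. The sketch below is a standard-library version of the sigmoid that avoids overflow for large negative logits. The helper names `stable_sigmoid` and `logits_to_scores` are hypothetical and are not part of this notebook or of `footvid`. For arrays or tensors, `scipy.special.expit` or `torch.sigmoid` serve the same purpose.

```python
import math
from typing import Iterable, List


def stable_sigmoid(logit: float) -> float:
    """Map a logit to a score in [0, 1] without overflowing math.exp."""
    if logit >= 0:
        # exp(-logit) <= 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-logit))
    # For negative logits exp(logit) < 1, so this form cannot overflow either.
    exp_logit = math.exp(logit)
    return exp_logit / (1.0 + exp_logit)


def logits_to_scores(logits: Iterable[float]) -> List[float]:
    """Apply the sigmoid element-wise to a sequence of logits."""
    return [stable_sigmoid(value) for value in logits]
```

A logit of 0 maps to a score of 0.5, positive logits map above it and negative logits below it, so a score can be read as the model's confidence in the positive class.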