In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import os
import json
from collections import Counter
from PIL import Image

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, TensorDataset, random_split

import torchvision
import torchvision.transforms as transforms
import torchsummary

from tqdm.notebook import tqdm

from torchviz import make_dot

from torchmetrics import Recall, Precision, F1Score
from torchmetrics.classification import BinaryRecall, BinaryPrecision, BinaryF1Score

In [2]:
base_path = "C:\\Users\\johnn\\Desktop\\Repos\\Capstonfire\\"

# Experimenting with our first model

In the previous notebook (".\dummy_cnn.ipynb"), we created and trained our first model using only images from the FIRE dataset. This model is our baseline, against which we will compare the performance of future models.

For our first experiment, let's try the Adam optimizer and see whether the model improves.

To streamline new experiments, we created a Python module containing the FireDataset class, the SimpleCNN class with our neural network architecture, the training loop, and the metrics calculations. This module is in the ".\capstonfire_utils.py" file. We also moved the function that splits the dataset into dataloaders and the accuracy and loss plotting functions there.
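
The module itself is not reproduced in this notebook. As a rough illustration only, the sketch below shows how the three fractions passed to split_dataset_into_dataloaders (0.7, 0.2, 0.1) could be turned into subset sizes. split_lengths is a hypothetical helper, not the function from capstonfire_utils.py, and giving the rounding remainder to the last split is an assumption.

```python
def split_lengths(n_samples, fractions):
    """Turn split fractions into integer subset sizes that sum to n_samples.

    Hypothetical helper for illustration; the real logic lives in
    capstonfire_utils.py and may differ.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    # Round every split except the last one down...
    lengths = [int(n_samples * f) for f in fractions[:-1]]
    # ...and give whatever is left to the last split (assumption).
    lengths.append(n_samples - sum(lengths))
    return lengths


print(split_lengths(1000, (0.7, 0.2, 0.1)))
```

Sizes like these are what torch.utils.data.random_split (imported above) accepts as its lengths argument.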

In [3]:
from capstonfire_utils import FireDataset, SimpleCNN, split_dataset_into_dataloaders, train_model, calculate_metrics, plot_accuracy, plot_loss

Let's check if they work.

In [4]:
model_name = "dummy_cnn"

In [5]:
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])

In [7]:
dataset = FireDataset(os.path.join(base_path, "FIRE Dataset"), transforms=transform)

train_loader, val_loader, test_loader = split_dataset_into_dataloaders(dataset, 50, 0.7, 0.2, 0.1)

In [8]:
model = SimpleCNN()

In [9]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.0003, momentum=0.9)

In [10]:
history = train_model(model, criterion, optimizer, base_path, model_name, 20, train_loader, val_loader)

CUDA is available. Using GPU.


In [12]:
test_path = os.path.join(base_path, "test dataset")

custom_test_loader = DataLoader(FireDataset(test_path, transforms=transform), batch_size=50, shuffle=False)

In [13]:
calculate_metrics(model, custom_test_loader)

CUDA is available. Using GPU.


  0%|          | 0/60 [00:00<?, ?it/s]

Recall on the test set: 0.68
Precision on the test set: 0.68
F1 Score on the test set: 0.68
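
The three numbers are identical, which is what micro-averaged metrics produce for single-label classification: every wrong prediction counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so precision, recall and F1 all collapse to accuracy. Whether calculate_metrics uses micro averaging is an assumption, since the module is not shown here. The sketch below uses made-up labels.

```python
def micro_scores(y_true, y_pred, classes):
    """Micro-averaged precision, recall and F1 for single-label predictions."""
    tp = fp = fn = 0
    for c in classes:
        for t, p in zip(y_true, y_pred):
            tp += (p == c and t == c)
            fp += (p == c and t != c)
            fn += (p != c and t == c)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up labels for illustration only
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 1, 0, 0, 1, 1]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(micro_scores(y_true, y_pred, classes=[0, 1]), accuracy)
```

If the behaviour on one class matters most (for example recall on the fire class alone), a per-class or macro average would be more informative.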


Done. Now let's test the training with a different optimizer, Adam. Since Adam usually converges in fewer epochs than plain SGD, we will train for only 5 epochs to avoid overfitting.
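
Instead of fixing the number of epochs in advance, one common way to limit overfitting is early stopping on the validation loss. The sketch below is an illustration only: early_stop_epoch and its patience parameter are hypothetical and not part of train_model, and the losses are made-up numbers.

```python
def early_stop_epoch(val_losses, patience=2):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training would stop once the validation loss has failed to improve
    for `patience` consecutive epochs; stop_epoch is None if that never happens.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, None


# Made-up validation losses: improvement stalls after epoch 3
losses = [0.60, 0.45, 0.40, 0.42, 0.41, 0.43]
print(early_stop_epoch(losses, patience=2))  # (3, 5)
```

With this rule the weights saved at best_epoch would be the ones to evaluate on the test set.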

In [14]:
adam_model_name = "adam_dummy_cnn"

In [15]:
adam_model = SimpleCNN()
adam_optimizer = optim.Adam(adam_model.parameters(), lr=0.0003)
criterion = nn.CrossEntropyLoss()

In [16]:
adam_history = train_model(adam_model, criterion, adam_optimizer, base_path, adam_model_name, 5, train_loader, val_loader)

CUDA is available. Using GPU.


  0%|          | 0/5 [00:00<?, ?it/s]

Epoch [1/5], Train Loss: 0.5423, Train Accuracy: 0.7329, Validation Loss: 0.4512, Validation Accuracy: 0.7250
Epoch [2/5], Train Loss: 0.3503, Train Accuracy: 0.7957, Validation Loss: 0.3399, Validation Accuracy: 0.9000
Epoch [3/5], Train Loss: 0.2662, Train Accuracy: 0.8943, Validation Loss: 0.2848, Validation Accuracy: 0.9200
Epoch [4/5], Train Loss: 0.2441, Train Accuracy: 0.9257, Validation Loss: 0.3219, Validation Accuracy: 0.8900
Epoch [5/5], Train Loss: 0.2193, Train Accuracy: 0.9457, Validation Loss: 0.2676, Validation Accuracy: 0.9350
Finished Training


In [17]:
calculate_metrics(adam_model, custom_test_loader)

CUDA is available. Using GPU.


  0%|          | 0/60 [00:00<?, ?it/s]

Recall on the test set: 0.63
Precision on the test set: 0.63
F1 Score on the test set: 0.63


Looks like it got worse (0.63 against 0.68 for the baseline). The reduced number of epochs may have had a large effect. Let's raise it to 10.

In [18]:
adam_model_name = "adam_dummy_cnn_10"
adam_model = SimpleCNN()
adam_optimizer = optim.Adam(adam_model.parameters(), lr=0.0003)
criterion = nn.CrossEntropyLoss()
adam_history = train_model(adam_model, criterion, adam_optimizer, base_path, adam_model_name, 10, train_loader, val_loader)

CUDA is available. Using GPU.


  0%|          | 0/10 [00:00<?, ?it/s]

Epoch [1/10], Train Loss: 0.3517, Train Accuracy: 0.8071, Validation Loss: 0.2805, Validation Accuracy: 0.9150
Epoch [2/10], Train Loss: 0.1786, Train Accuracy: 0.9286, Validation Loss: 0.2986, Validation Accuracy: 0.8500
Epoch [3/10], Train Loss: 0.1550, Train Accuracy: 0.9329, Validation Loss: 0.3035, Validation Accuracy: 0.8450
Epoch [4/10], Train Loss: 0.1333, Train Accuracy: 0.9371, Validation Loss: 0.1731, Validation Accuracy: 0.9350
Epoch [5/10], Train Loss: 0.0938, Train Accuracy: 0.9543, Validation Loss: 0.1612, Validation Accuracy: 0.9450
Epoch [6/10], Train Loss: 0.0664, Train Accuracy: 0.9771, Validation Loss: 0.1594, Validation Accuracy: 0.9400
Epoch [7/10], Train Loss: 0.0607, Train Accuracy: 0.9771, Validation Loss: 0.1538, Validation Accuracy: 0.9550
Epoch [8/10], Train Loss: 0.0546, Train Accuracy: 0.9800, Validation Loss: 0.1511, Validation Accuracy: 0.9450
Epoch [9/10], Train Loss: 0.0530, Train Accuracy: 0.9771, Validation Loss: 0.1468, Validation Accuracy: 0.9550
E

In [19]:
calculate_metrics(adam_model, custom_test_loader)

CUDA is available. Using GPU.


  0%|          | 0/60 [00:00<?, ?it/s]

Recall on the test set: 0.68
Precision on the test set: 0.68
F1 Score on the test set: 0.68


Alright, with 10 epochs the Adam model matches the baseline (0.68), so the problem seems to be too few epochs. Let's try 20 next.

In [20]:
adam_model_name = "adam_dummy_cnn_20"
adam_model = SimpleCNN()
adam_optimizer = optim.Adam(adam_model.parameters(), lr=0.0003)
criterion = nn.CrossEntropyLoss()
adam_history = train_model(adam_model, criterion, adam_optimizer, base_path, adam_model_name, 20, train_loader, val_loader)

CUDA is available. Using GPU.


  0%|          | 0/20 [00:00<?, ?it/s]

Epoch [1/20], Train Loss: 0.4167, Train Accuracy: 0.7929, Validation Loss: 0.3122, Validation Accuracy: 0.9100
Epoch [2/20], Train Loss: 0.2228, Train Accuracy: 0.9100, Validation Loss: 0.2749, Validation Accuracy: 0.9100
Epoch [3/20], Train Loss: 0.1522, Train Accuracy: 0.9400, Validation Loss: 0.2552, Validation Accuracy: 0.9150
Epoch [4/20], Train Loss: 0.1319, Train Accuracy: 0.9471, Validation Loss: 0.2128, Validation Accuracy: 0.9300
Epoch [5/20], Train Loss: 0.1288, Train Accuracy: 0.9443, Validation Loss: 0.1478, Validation Accuracy: 0.9350
Epoch [6/20], Train Loss: 0.0889, Train Accuracy: 0.9686, Validation Loss: 0.1481, Validation Accuracy: 0.9500
Epoch [7/20], Train Loss: 0.0840, Train Accuracy: 0.9671, Validation Loss: 0.1470, Validation Accuracy: 0.9500
Epoch [8/20], Train Loss: 0.0730, Train Accuracy: 0.9757, Validation Loss: 0.2117, Validation Accuracy: 0.9200
Epoch [9/20], Train Loss: 0.0757, Train Accuracy: 0.9714, Validation Loss: 0.1537, Validation Accuracy: 0.9450
E

In [21]:
calculate_metrics(adam_model, custom_test_loader)
calculate_metrics(model, custom_test_loader)

CUDA is available. Using GPU.


  0%|          | 0/60 [00:00<?, ?it/s]

Recall on the test set: 0.75
Precision on the test set: 0.75
F1 Score on the test set: 0.75
CUDA is available. Using GPU.


  0%|          | 0/60 [00:00<?, ?it/s]

Recall on the test set: 0.68
Precision on the test set: 0.68
F1 Score on the test set: 0.68


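Note the order of the two calls: in the output above the Adam model scores 0.75 on the test set and the SGD baseline 0.68, so in this run Adam with 20 epochs predicts better, not worse. Its test score is still far below its validation accuracy (around 0.95 in the epochs shown), so it does look overfitted to the FIRE dataset. Perhaps the loss landscape is simple enough that an adaptive method is unnecessary and Stochastic Gradient Descent with momentum is good enough.

We will keep SGD as the optimizer for the remaining experiments.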

Let's now train on the pre-resized images (the "fire_dataset" folder) and check whether anything changes.

In [23]:
resized_fire_dataset = FireDataset(os.path.join(base_path, "fire_dataset"), transforms=transform)

In [25]:
resized_train_loader, resized_val_loader, resized_test_loader = split_dataset_into_dataloaders(resized_fire_dataset, 50, 0.7, 0.2, 0.1)

In [26]:
resized_model_name = "resized_dummy_cnn"
resized_model = SimpleCNN()
optimizer = optim.SGD(resized_model.parameters(), lr=0.0003, momentum=0.9)
criterion = nn.CrossEntropyLoss()

In [27]:
resize_history = train_model(resized_model, criterion, optimizer, base_path, resized_model_name, 20, resized_train_loader, resized_val_loader)

CUDA is available. Using GPU.


  0%|          | 0/20 [00:00<?, ?it/s]

Epoch [1/20], Train Loss: 0.6936, Train Accuracy: 0.4929, Validation Loss: 0.5983, Validation Accuracy: 0.7500
Epoch [2/20], Train Loss: 0.5903, Train Accuracy: 0.7457, Validation Loss: 0.5779, Validation Accuracy: 0.7500
Epoch [3/20], Train Loss: 0.5635, Train Accuracy: 0.7457, Validation Loss: 0.5553, Validation Accuracy: 0.7500
Epoch [4/20], Train Loss: 0.5462, Train Accuracy: 0.7457, Validation Loss: 0.5386, Validation Accuracy: 0.7500
Epoch [5/20], Train Loss: 0.5286, Train Accuracy: 0.7457, Validation Loss: 0.5206, Validation Accuracy: 0.7500
Epoch [6/20], Train Loss: 0.5094, Train Accuracy: 0.7457, Validation Loss: 0.5004, Validation Accuracy: 0.7500
Epoch [7/20], Train Loss: 0.4862, Train Accuracy: 0.7457, Validation Loss: 0.4776, Validation Accuracy: 0.7500
Epoch [8/20], Train Loss: 0.4630, Train Accuracy: 0.7457, Validation Loss: 0.4505, Validation Accuracy: 0.7500
Epoch [9/20], Train Loss: 0.4331, Train Accuracy: 0.7457, Validation Loss: 0.4253, Validation Accuracy: 0.7550
E

In [28]:
calculate_metrics(resized_model, custom_test_loader)
calculate_metrics(model, custom_test_loader)

CUDA is available. Using GPU.


  0%|          | 0/60 [00:00<?, ?it/s]

Recall on the test set: 0.63
Precision on the test set: 0.63
F1 Score on the test set: 0.63
CUDA is available. Using GPU.


  0%|          | 0/60 [00:00<?, ?it/s]

Recall on the test set: 0.68
Precision on the test set: 0.68
F1 Score on the test set: 0.68


Looks like the model predicts better when the images keep their original quality (0.68 against 0.63), which makes sense: every image is already resized on its way into the model, which oversimplifies some fire patterns. Resizing twice only makes it worse.

It makes sense to check whether dynamic resizing improves the model. Dynamic resizing, for this use case, means resizing to an aspect ratio of 3:2 for landscape images (width > height) and 2:3 for portrait images (width < height).
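
The rule above can be sketched as a small function. One detail to watch: PIL reports an image size as (width, height), while torchvision's transforms.Resize takes a size tuple as (height, width). dynamic_resize_dims is a hypothetical name, and treating square images as portrait is an assumption.

```python
def dynamic_resize_dims(width, height, short_side=256, long_side=384):
    """Return the (height, width) target for a 3:2 or 2:3 resize.

    The returned order is (height, width), which is the order
    torchvision.transforms.Resize expects for a size tuple.
    """
    if width > height:
        return (short_side, long_side)  # landscape: 256 high, 384 wide (3:2)
    return (long_side, short_side)      # portrait or square: 384 high, 256 wide (2:3)


print(dynamic_resize_dims(1920, 1080))  # (256, 384)
print(dynamic_resize_dims(1080, 1920))  # (384, 256)
```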

Let's create a new Dataset class and train the regular model like before. This new Dataset class will not take a transform argument, since the transforms.Resize() transformation needs a different target size depending on the image dimensions.

In [29]:
class ResizeFireDataset(Dataset):
    def __init__(self, root_dir):
        self.root_dir = root_dir
        self.transform_resize : transforms.Resize= ...
        self.transform_totensor = transforms.ToTensor()
        self.classes = sorted(os.listdir(root_dir))
        # Store the folder names in a dictionary as the class names alongside the class numeric label
        self.class_to_idx = {} 
        for i, cls in enumerate(self.classes):
            self.class_to_idx[cls] = i

        self.data = self._load_data()

    def _load_data(self):
        data = []
        for class_name in self.classes: # Fetch the folders through the class name
            class_path = os.path.join(self.root_dir, class_name)
            for filename in os.listdir(class_path): # Fetch the images inside each folders
                img_path = os.path.join(class_path, filename) # Obtain the name of the current image
                data.append((img_path, self.class_to_idx[class_name])) # Add the image to a list paired with its class' numeric label
        return data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx): # data[idx]
        img_path, label = self.data[idx]
        img = Image.open(img_path).convert("RGB")
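        img_size = img.size # PIL returns (width, height)
        # transforms.Resize takes (height, width), so landscape images get (256, 384)
        resize_dimensions = (256, 384) if img_size[0] > img_size[1] else (384, 256)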

        self.transform_resize = transforms.Resize(resize_dimensions)
        img = self.transform_resize(img)
        img = self.transform_totensor(img)

        return img, label

In [30]:
model_name = "dynamic_resize_dummy_cnn"

In [32]:
dynamic_resize_dataset = ResizeFireDataset(os.path.join(base_path, "FIRE Dataset"))

dynamic_train_loader, dynamic_val_loader, dynamic_test_loader = split_dataset_into_dataloaders(dynamic_resize_dataset, 50, 0.7, 0.2, 0.1)

In [33]:
dynamic_resize_model = SimpleCNN()
criterion = nn.CrossEntropyLoss()
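# Optimize the parameters of the model being trained here, not the earlier baseline `model`
optimizer = optim.SGD(dynamic_resize_model.parameters(), lr=0.0003, momentum=0.9)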

In [34]:
history = train_model(dynamic_resize_model, criterion, optimizer, base_path, model_name, 20, dynamic_train_loader, dynamic_val_loader)

CUDA is available. Using GPU.


  0%|          | 0/20 [00:00<?, ?it/s]

RuntimeError: stack expects each tensor to be equal size, but got [3, 384, 256] at entry 0 and [3, 256, 384] at entry 8

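Oh... "RuntimeError: stack expects each tensor to be equal size, but got [3, 384, 256] at entry 0 and [3, 256, 384] at entry 8". The DataLoader stacks each batch into a single tensor, so every image in a batch must have the same shape.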

Guess dynamic resizing is out of the question. We could turn portrait images into landscape by rotating them (for example with torchvision.transforms.functional.rotate), but fire images rotated 90° are not exactly real-life examples of fire and smoke.
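
One alternative that is not explored in this notebook is to keep both orientations and zero-pad every image to a common square canvas inside a custom collate function. The sketch below is a minimal illustration under that assumption; pad_to_square_collate is a hypothetical name, and the dummy tensors only reproduce the two shapes from the error message.

```python
import torch
import torch.nn.functional as F


def pad_to_square_collate(batch, side=384):
    """Zero-pad every [C, H, W] image to [C, side, side], then stack the batch."""
    images, labels = [], []
    for img, label in batch:
        _, h, w = img.shape
        # F.pad pads the last dimension first: (left, right, top, bottom)
        images.append(F.pad(img, (0, side - w, 0, side - h)))
        labels.append(label)
    return torch.stack(images), torch.tensor(labels)


# Two dummy images with the two shapes from the error message
batch = [(torch.rand(3, 384, 256), 0), (torch.rand(3, 256, 384), 1)]
images, labels = pad_to_square_collate(batch)
print(images.shape, labels)
```

Such a function would be passed to the DataLoader through its collate_fn argument. The network would also need adjusting, since a fully connected layer sized for 256x256 inputs would not accept 384x384 images.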

We will throw this idea out and move to creating a binary classification algorithm.

In [35]:
binary_model_name = "binary_dummy_cnn"

In [36]:
binary_transforms = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor()
])

binary_dataset = FireDataset(os.path.join(base_path, "FIRE Dataset"), binary_transforms)

binary_train_loader, binary_val_loader, binary_test_loader = split_dataset_into_dataloaders(binary_dataset, 50, 0.7, 0.2, 0.1)

In [37]:
# Change the output to one class
class SimpleBinaryCNN(nn.Module):
    def __init__(self):
        super(SimpleBinaryCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 10, kernel_size=5) # to capture basic patterns from the image
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)# to capture basic patterns from the previous patterns (results in capturing more complex patterns from the original image)
        self.fc1 = nn.Linear(74420, 50) # 74420 = 20 * 61 * 61 -> conv2 channels * height * width after two conv + pool stages on a 256x256 input
        self.fc2 = nn.Linear(50, 1) # a single output (logit) for the binary decision

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = x.view(-1, 74420)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        #return F.log_softmax(x, dim=1)
        return x
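
The input size of fc1 (74420) follows from the layer shapes above for a 256x256 input. As a quick check, the sketch below walks the spatial size through both convolution and pooling stages; conv_pool_out is a helper name made up for this illustration.

```python
def conv_pool_out(size, kernel_size, pool=2):
    """Spatial size after an unpadded stride-1 convolution followed by max pooling."""
    after_conv = size - kernel_size + 1
    return after_conv // pool


side = 256
side = conv_pool_out(side, kernel_size=5)  # conv1 + pool: 256 -> 252 -> 126
side = conv_pool_out(side, kernel_size=5)  # conv2 + pool: 126 -> 122 -> 61
flattened = 20 * side * side               # 20 output channels from conv2
print(side, flattened)  # 61 74420
```

Any change to the Resize dimensions in the transform would change this number, and fc1 would have to be resized to match.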

In [38]:
binary_model = SimpleBinaryCNN()

In [39]:
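# The model returns raw logits (no sigmoid in forward), so use the loss that applies the sigmoid internally
binary_criterion = nn.BCEWithLogitsLoss()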
optimizer = optim.SGD(binary_model.parameters(), lr=0.0003, momentum=0.9)

Since this is binary classification with a single output, the prediction step of the training and testing loops needs to change.
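
Because SimpleBinaryCNN returns a raw score (a logit) rather than a probability, the loss either has to apply the sigmoid itself or receive the sigmoid's output. The sketch below, in plain Python for a single sample, computes binary cross-entropy directly from the logit in a numerically stable form and compares it with the textbook formula applied to sigmoid(logit). The helper names and the example values are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_from_probability(p, target):
    """Textbook binary cross-entropy; p must already be a probability in (0, 1)."""
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


def bce_from_logit(z, target):
    """Same loss computed from the raw logit in a numerically stable form."""
    return max(z, 0.0) - z * target + math.log1p(math.exp(-abs(z)))


logit, target = 2.3, 1.0  # a raw model output and its label
print(bce_from_logit(logit, target))
print(bce_from_probability(sigmoid(logit), target))
```

In PyTorch terms, nn.BCEWithLogitsLoss expects logits like bce_from_logit does, while nn.BCELoss expects probabilities like bce_from_probability does. The decision rule sigmoid(z) > 0.5 is equivalent to z > 0.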

In [40]:
# Within the training loop
"""
    # Ensure labels have the same size as the output
    labels_val = labels.view(-1, 1).float()
    loss_val = criterion(outputs_val, labels_val.float())

    # Change here
    predicted_val = (torch.sigmoid(outputs_val) > 0.5).float()
    
"""

Hmm, hold on. As we understood it at this point, binary classification expects all the data entering the training loop to be of the positive class. However, some images that are not fire look like fire, such as clouds. It makes more sense to keep training a multi-class classifier with two classes than to give only fire images to the model, so we have more control over cases that look like fire but are not.

That's about it for the tests we wanted to run. Let's now try to improve the model with more data (open the "improved_dummy_cnn.ipynb" file).

EDIT: BCELoss also accepts images of the negative class, so the concern above does not actually apply. However, future improvements of this model may add further labels, such as smoke and controlled fires. Keeping the CrossEntropyLoss criterion makes it easier to scale to more classes.
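
To illustrate why keeping two output classes costs nothing: with two logits, the softmax probability of the second class equals a sigmoid applied to the difference of the logits, so the two-class setup expresses the same decision as a single-output binary model. The sketch below checks this in plain Python with made-up logits, and shows that a third class only adds one more logit.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


two_class_logits = [0.4, 1.9]  # made-up outputs for one image
p_class1_softmax = softmax(two_class_logits)[1]
p_class1_sigmoid = sigmoid(two_class_logits[1] - two_class_logits[0])
print(p_class1_softmax, p_class1_sigmoid)

# Adding a third class (for example smoke) only means one more logit
print(softmax([0.4, 1.9, -0.3]))
```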