Natalie Cardoso
### Homework 3 Part 3
For Part 3, please implement and train a network to predict what species are shown in an image.
In particular, **the trained network should decide if an image shows zebras and decide if an image
shows giraffes**. These decisions should be made independently, allowing the possibility of an image
having both zebras and giraffes. One way to do this might be to have a separate network for each
species, but this does not allow learning of common information. Instead you can form a **combined
loss function, using BCEWithLogitsLoss**:

https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html

This is just a combination of the cross-entropy loss values for zebras and giraffes separately.
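
To make "a combination of the cross-entropy loss values" concrete, here is a minimal pure-Python sketch of the computation for one image, assuming the default `reduction='mean'` of `BCEWithLogitsLoss`. The helper names (`bce_with_logits`, `combined_loss`), the `[giraffe, zebra]` ordering, and the example numbers are illustrative assumptions, not part of the assignment.

```python
import math

def bce_with_logits(logit, target):
    """Binary cross-entropy for one raw score (logit) and one 0/1 target."""
    p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid turns the logit into a probability
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def combined_loss(logits, targets):
    """Average the per-species losses into a single value."""
    losses = [bce_with_logits(z, t) for z, t in zip(logits, targets)]
    return sum(losses) / len(losses)

# Hypothetical image that contains a giraffe but no zebra, ordered [giraffe, zebra].
logits = [2.0, -1.0]
targets = [1.0, 0.0]
giraffe_loss = bce_with_logits(logits[0], targets[0])
zebra_loss = bce_with_logits(logits[1], targets[1])
loss = combined_loss(logits, targets)
```

Each species contributes its own term, so a confident mistake on one species raises the loss even when the other species is predicted well.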

Rather than building a network from PyTorch primitives, please **use a pre-trained network
(architecture and weights)** as a backbone and **add a fully-connected network on top. Train only
this fully-connected network. Use the Dataset class you created for Part 2.**

Submit a Jupyter notebook that **shows your neural network model (class), the training and
testing functionality, and your final results**. Please also show **test images** (at least 5 and maybe up
to 10) that were classified correctly and images that were classified incorrectly, both for zebras and
for giraffes. Doing this is important to help understand when and why the network succeeds and
fails, and it should be a regular part of your work.

In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms, utils, models
from torchsummary import summary
import torch.optim as optim
import pandas as pd
import numpy as np
import os
import sys
import matplotlib.pyplot as plt
from PIL import Image

In [2]:
from google.colab import drive, files
drive.mount('/content/drive') # force_remount=True
%cd /content/drive/MyDrive/AI_Cons/

Mounted at /content/drive
/content/drive/MyDrive/AI_Cons


In [3]:
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

Using cpu device


I chose ResNet because it is an image classification network pre-trained on ImageNet, which includes a zebra class. It is also well known and trusted,
with extensive documentation online.

#### Loading in my pretrained network - ResNet

In [4]:
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).to(device)  # replaces the deprecated pretrained=True

# Freezing the base model layers to prevent retraining
for param in model.parameters():
    param.requires_grad = False

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 135MB/s]


Notes on the ResNet Model:
- Input images are typically resized to (256,256) before cropping
- My additional fully connected layer needs dimensions (512,2) for ResNet-18 (2048 inputs would apply to ResNet-50) - 2 for the 2 classes (zebra, giraffe)
- Image preprocessing requires:
  1. (224,224) center crop
  2. image is normalized with mean = 255*[0.485, 0.456, 0.406] and
  std = 255*[0.229, 0.224, 0.225] for raw 0-255 pixels (drop the factor of 255 once ToTensor has scaled pixels to [0,1])
  3. transpose it from HWC to CHW layout
- Post-processing involves calculating a sigmoid probability score for each class independently, since an image can contain both species
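
Since the assignment asks for independent zebra and giraffe decisions, one way to post-process the two output logits is to apply a sigmoid to each and threshold at 0.5, rather than a softmax that forces the two scores to compete. A minimal sketch, assuming the `[giraffe, zebra]` output order used for the labels in the Dataset class; `decode` and the 0.5 threshold are illustrative choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decode(logits, threshold=0.5):
    """Map two raw scores to independent yes/no decisions."""
    names = ["giraffe", "zebra"]
    return [name for name, z in zip(names, logits) if sigmoid(z) >= threshold]

both = decode([1.3, 0.4])         # both scores are positive
only_zebra = decode([-2.0, 3.1])  # only the zebra score is positive
neither = decode([-0.7, -1.5])    # both scores are negative
```

With a softmax the two probabilities would sum to 1, so "both" and "neither" could not be expressed.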

#### Creating the New Fully Connected Layer


In [5]:
classes = 2
model.fc = nn.Linear(512, classes).to(device)
print(model)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

#### Defining the Loss Function

In [6]:
loss_fn = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9) # may want to adjust and add weight_decay for better accuracy
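
Because every backbone parameter has `requires_grad = False`, only the new `fc` layer should receive updates. A quick sanity check is to list the trainable parameters and hand just those to the optimizer. The sketch below uses a tiny stand-in model (`toy`) instead of ResNet so it runs without downloading weights; the layer sizes and variable names are arbitrary assumptions.

```python
import torch
from torch import nn
import torch.optim as optim

# Stand-in for a pre-trained backbone plus a new head (hypothetical sizes).
toy = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))

# Freeze everything, then re-enable gradients for the last layer only.
for param in toy.parameters():
    param.requires_grad = False
for param in toy[2].parameters():
    param.requires_grad = True

trainable = [name for name, p in toy.named_parameters() if p.requires_grad]

# Pass only the trainable parameters to the optimizer.
head_optimizer = optim.SGD(
    [p for p in toy.parameters() if p.requires_grad], lr=0.001, momentum=0.9
)

# One training step on random data; the frozen layer should not change.
frozen_before = toy[0].weight.clone()
loss = nn.BCEWithLogitsLoss()(toy(torch.randn(5, 8)), torch.ones(5, 2))
head_optimizer.zero_grad()
loss.backward()
head_optimizer.step()
```

Printing `trainable` for the real model should show only `fc.weight` and `fc.bias` if the freezing worked as intended.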

#### Copy of the Homework 2 Custom Dataset

In [7]:
class MyDataset(Dataset):
    def __init__(self,
                 csv_file,      # path to the metadata CSV listing image names and labels
                 root_dir,     # directory that contains the image files
                 transform = transforms.ToTensor()):  # transformation to apply to each image
      """
      Read the metadata table of image names and their labels.  Important note: images are NOT
      read and stored in this initializer.  They are read in __getitem__ as needed.
      """
      self.csv_file = csv_file
      self.root_dir = root_dir
      %cd $self.root_dir
      self.images = pd.read_csv(self.csv_file)
      # Record the transform that may need to be applied.
      self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        '''
        Return a tuple with the data, ground truth label, and any other data
        associated with a single image.
        '''
        img_name = self.images.iloc[idx, 0]
        img_path = os.path.join(self.root_dir, img_name)
        im = Image.open(img_path).convert('RGB')  # open via the full path; force 3 channels for ResNet

        if self.transform is not None:
            im = self.transform(im)

        isGiraffe = self.images.iloc[idx, 1]
        isZebra = self.images.iloc[idx, 2]
        # Multi-hot label in [giraffe, zebra] order:
        # [1, 1] both, [1, 0] giraffe, [0, 1] zebra, [0, 0] neither
        label = [float(bool(isGiraffe)), float(bool(isZebra))]

        label = torch.tensor(label, dtype=torch.float32)

        return im, label

Initialize the Datasets & Perform any Necessary Preprocessing for ResNet

In [8]:
image_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((224, 224)),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

dataset = MyDataset(csv_file='/content/drive/MyDrive/AI_Cons/hw3_data/metadata.csv',
                    root_dir='/content/drive/MyDrive/AI_Cons/hw3_data/images/',
                    transform=image_transforms)

train_dataset, val_dataset, test_dataset = torch.utils.data.random_split(dataset, [0.7, 0.15, 0.15], generator=torch.Generator())

/content/drive/MyDrive/AI_Cons/hw3_data/images
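
The `Normalize` constants used here are the ImageNet channel means and standard deviations expressed for pixel values in [0, 1], which is what `ToTensor` produces from an 8-bit image. Scaling both constants by 255 and applying them to raw 0-255 pixels gives the same result, as this small arithmetic check illustrates. The helper names and the pixel value are arbitrary examples.

```python
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

def normalize_unit(pixel):
    """Normalize an RGB pixel already scaled to [0, 1] (as after ToTensor)."""
    return [(c - m) / s for c, m, s in zip(pixel, mean, std)]

def normalize_raw(pixel):
    """Normalize a raw 0-255 RGB pixel with the constants scaled by 255."""
    return [(c - 255 * m) / (255 * s) for c, m, s in zip(pixel, mean, std)]

raw_pixel = [200, 120, 30]                 # arbitrary example pixel
unit_pixel = [c / 255 for c in raw_pixel]  # the scaling ToTensor applies
a = normalize_unit(unit_pixel)
b = normalize_raw(raw_pixel)
```

Both routes agree up to floating-point rounding, so the factor of 255 only matters when the pixels have not been rescaled first.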


Create DataLoaders for our custom Dataset class

In [9]:
batch_size = 32

train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)  # reshuffle the training data every epoch
test_dataloader = DataLoader(test_dataset, batch_size=batch_size)

Train the Model on my Dataset (my layer only!) & Testing

In [10]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)  # use the loader that was passed in, not a global
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 10 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

In [11]:
def test(dataloader, model, loss_fn, incorrect_examples, correct_examples):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():  # evaluation needs no gradients; disabling them saves time and memory
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()

            # Decide each species independently: a logit above 0 means sigmoid(logit) > 0.5.
            # This allows [1, 1] (both) and [0, 0] (neither), which argmax cannot produce.
            fix = (pred > 0).type(torch.float)
            correct += ((fix == y).type(torch.float).sum().item()) / 2        # each image has 2 labels, so halve the count of matching labels

            # Save up to 6 images with both labels right and up to 6 with at least one wrong.
            # Each image keeps a leading batch dimension so examples[i][0] is the image itself.
            matches = (fix == y).all(dim=1)
            for img, match in zip(X.cpu(), matches.cpu()):
                if match and len(correct_examples) < 6:
                    correct_examples.append(img.unsqueeze(0))
                elif not match and len(incorrect_examples) < 6:
                    incorrect_examples.append(img.unsqueeze(0))

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
    return correct_examples, incorrect_examples
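
A single accuracy number hides which species the network struggles with. One option is to count matches per label column and also report how often both labels are right. A minimal pure-Python sketch with made-up predictions, assuming the `[giraffe, zebra]` label order from the Dataset class; `per_label_accuracy` is an illustrative helper, not part of the notebook.

```python
def per_label_accuracy(preds, labels):
    """Return (giraffe accuracy, zebra accuracy, exact-match accuracy)."""
    n = len(labels)
    giraffe = sum(p[0] == y[0] for p, y in zip(preds, labels)) / n
    zebra = sum(p[1] == y[1] for p, y in zip(preds, labels)) / n
    exact = sum(p == y for p, y in zip(preds, labels)) / n
    return giraffe, zebra, exact

# Made-up thresholded predictions and ground truth for four images.
preds = [[1, 0], [0, 1], [1, 1], [0, 1]]
labels = [[1, 0], [0, 1], [1, 0], [1, 1]]
giraffe_acc, zebra_acc, exact_acc = per_label_accuracy(preds, labels)
```

Exact-match accuracy is the stricter measure, since an image counts only when both decisions are correct.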

Note: This needs to be rerun. The GPU disconnected 15 minutes before the deadline, and training on the CPU was too slow to complete a reasonable number of epochs in time.

In [12]:
epochs = 50

for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    incorrect_examples = []
    correct_examples = []
    correct_examples, incorrect_examples = test(test_dataloader, model, loss_fn, incorrect_examples, correct_examples)
print("Done!\n")

Epoch 1
-------------------------------




loss: 0.726073  [    0/ 3464]
loss: 0.150003  [  320/ 3464]


KeyboardInterrupt: ignored

#### Attempt at printing out the Images
Wish I could troubleshoot this further, but I'm at the deadline and this was the best I could produce.

In [None]:

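# Undo the ImageNet normalization so the images display with natural colors.
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# Show the saved correct examples first, then the incorrect ones.
for title, examples in [("Correct:", correct_examples), ("Incorrect:", incorrect_examples)]:
    for example in examples:
        T = (example[0] * std + mean).clamp(0, 1)
        print(title)
        implot = plt.imshow(T.permute(1, 2, 0))  # CHW -> HWC for matplotlib
        plt.show()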
