# Home assignment 3 - Transfer Learning - Cats vs Dogs
## Piotr del Fidali | Kamil Zaremba

We use transfer learning to classify images of cats and dogs. The dataset comes from [Kaggle](https://www.kaggle.com/datasets/chetankv/dogs-cats-images); it contains 10,000 images of cats and dogs, split into 80% training and 20% test data (8,000 and 2,000 images respectively). We run the notebook on Google Colab with a GPU to speed up training.

In [None]:
import tensorflow as tf

device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))


Found GPU at: /device:GPU:0


In [None]:
from google.colab import files
files.upload()

Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"piotrdelfidali","key":"<redacted>"}'}

In [None]:
# Put the API token where the Kaggle CLI expects it and restrict its permissions
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

In [None]:
!kaggle datasets download -d chetankv/dogs-cats-images
!unzip dogs-cats-images.zip
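
After unpacking, the split sizes and the label assignment can be checked before any training. The sketch below is a minimal example that assumes each split directory holds one subfolder per class (the names `cats` and `dogs` are an assumption, as are the helper names). Both Keras' `flow_from_directory` and torchvision's `ImageFolder` assign label indices in sorted order of the subfolder names, which is what `class_indices` mimics.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(split_dir):
    """Return {class_name: image_count} for one split directory."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.rglob("*")
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def class_indices(split_dir):
    """Map class names to label indices in sorted order of the subfolder names."""
    names = sorted(p.name for p in Path(split_dir).iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(names)}
```

For example, the values of `count_images_per_class('/content/dataset/training_set')` should add up to the 8000 training images reported by the loader.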

# Keras
Keras provides the `ImageDataGenerator` class, which we use to load the data. It rescales the pixel values, and its `flow_from_directory` method resizes the images and labels them according to the subdirectory they are stored in. We create one generator for the training data and one for the test data. We use two base models:
- VGG16, a deep convolutional neural network architecture known for its simplicity, featuring 16 layers with convolutional layers stacked on top of each other in increasing depth.
- InceptionV3, a more complex architecture built from inception modules, which combine filters of different sizes in the same network layer to capture information at various scales.

On top of each base model we add four layers. The first is a Flatten layer, which converts the multidimensional output of the base model into a one-dimensional array. The next three are Dense layers; the last one has a single output because we are doing binary classification.
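
The size of this head can be worked out by hand, which makes it easier to read the model summary printed further down. The sketch below is plain arithmetic rather than Keras code; it assumes that VGG16 without its top maps a 224x224x3 image to a 7x7x512 feature map, and `dense_params` is an illustrative helper.

```python
def dense_params(inputs, units):
    """Parameters of a Dense layer: one weight per input and unit, plus one bias per unit."""
    return inputs * units + units


# Assumption: the VGG16 base outputs a 7x7x512 feature map for a 224x224 input.
flattened = 7 * 7 * 512

head = [
    ("dense", dense_params(flattened, 128)),
    ("dense_1", dense_params(128, 64)),
    ("dense_2", dense_params(64, 1)),
]
trainable = sum(count for _, count in head)
frozen_base = 14_714_688  # VGG16 base, as reported by model.summary()

for name, count in head:
    print(f"{name}: {count:,}")
print(f"trainable head: {trainable:,}")
print(f"total: {trainable + frozen_base:,}")
```

Almost all trainable parameters sit in the first Dense layer, because Flatten hands it every value of the feature map.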

In [None]:
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1. / 255)
validation_datagen = ImageDataGenerator(rescale=1. / 255)

In [None]:
# Labels are inferred from the subdirectory names; images are resized to 224x224
train_generator = train_datagen.flow_from_directory(
    '/content/dataset/training_set',
    batch_size=32,
    class_mode='binary',
    target_size=(224, 224))

validation_generator = validation_datagen.flow_from_directory(
    '/content/dataset/test_set',
    batch_size=32,
    class_mode='binary',
    target_size=(224, 224))

Found 8000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.


In [None]:
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.applications.vgg16 import VGG16

# Pretrained convolutional base without the original ImageNet classifier
keras_vgg16 = Sequential()
keras_vgg16.add(VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3)))
# Classification head: flatten the feature map, then three Dense layers
keras_vgg16.add(Flatten())
keras_vgg16.add(Dense(128, activation="relu"))
keras_vgg16.add(Dense(64, activation="relu"))
keras_vgg16.add(Dense(1, activation="sigmoid"))

# Freeze the base model so that only the new head is trained
keras_vgg16.layers[0].trainable = False

keras_vgg16.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 vgg16 (Functional)          (None, 7, 7, 512)         14714688  
                                                                 
 flatten (Flatten)           (None, 25088)             0         
                                                                 
 dense (Dense)               (None, 128)               3211392   
                                                                 
 dense_1 (Dense)             (None, 64)                8256      
                                                                 
 dense_2 (Dense)             (None, 1)                 65        
                                                                 
Total params: 17934401 (68.41 MB)
Trainable param

In [None]:
keras_vgg16.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
r = keras_vgg16.fit(
        train_generator,
        epochs=50,
        validation_data=validation_generator)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


In [None]:
keras_vgg16.evaluate(validation_generator)



[0.8609157800674438, 0.9380000233650208]

For InceptionV3 we build the same model as for VGG16, replacing only the base model.

In [None]:
from keras.applications import InceptionV3

keras_inception = Sequential()
keras_inception.add(InceptionV3(weights="imagenet", include_top=False, input_shape=(224, 224, 3)))
keras_inception.add(Flatten())
keras_inception.add(Dense(128, activation="relu"))
keras_inception.add(Dense(64, activation="relu"))
keras_inception.add(Dense(1, activation="sigmoid"))

keras_inception.layers[0].trainable = False

keras_inception.summary()

In [None]:
keras_inception.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
r = keras_inception.fit(
        train_generator,
        epochs=50,
        validation_data=validation_generator)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


In [None]:
keras_inception.evaluate(validation_generator)



[0.2124653160572052, 0.9890000224113464]

Both models give very good results. After 50 epochs, the model based on VGG16 reaches an accuracy of 0.938 on the test set. The model based on InceptionV3 does even better, reaching 0.989 after the same number of epochs. The second model is not only more accurate, it also trained a lot faster.
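
`evaluate` returns the pair `[loss, accuracy]`, and the two numbers do not have to move together: the VGG16 model has a much higher loss (0.86 against 0.21) even though both accuracies are high. One possible reading is that it makes a few very confident mistakes. The pure-Python sketch below illustrates the effect with made-up probabilities; the numbers are not taken from the notebook and the helper names are illustrative.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy of predicted probabilities against 0/1 labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


labels = [1, 0, 1, 0]
cautious = [0.9, 0.1, 0.8, 0.6]            # one mild mistake on the last sample
overconfident = [0.99, 0.01, 0.99, 0.999]  # one very confident mistake

for name, probs in (("cautious", cautious), ("overconfident", overconfident)):
    print(name, accuracy(labels, probs), round(binary_cross_entropy(labels, probs), 3))
```

Both sets of predictions have the same accuracy, but the confident mistake is penalized far more heavily by the loss.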

# PyTorch
For PyTorch we use only Inception v3 (`Inception3` in torchvision) as the base model, with the same classification head as in the Keras scenario. The implementation was a bit harder because we had to write more code, but this also leaves room for fine-tuning and custom logic for a specific task. The PyTorch model is fed 299x299 images, the input size torchvision's Inception v3 expects, rather than the 224x224 images used with Keras.
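
The input size determines the size of the final feature map. The sketch below traces only the size-reducing layers, assuming the standard Inception v3 layout (unpadded 3x3 convolutions and stride-2 reductions); it is arithmetic, not library code, and the function names are illustrative.

```python
def valid_out(size, kernel, stride=1):
    """Output length of a convolution or pooling layer without padding."""
    return (size - kernel) // stride + 1


def inception_v3_feature_map(size):
    """Spatial size of the last feature map for a square input of the given size."""
    size = valid_out(size, 3, 2)  # stem convolution, stride 2
    size = valid_out(size, 3)     # stem convolution, no padding
    size = valid_out(size, 3, 2)  # max pooling
    size = valid_out(size, 3)     # stem convolution, no padding
    size = valid_out(size, 3, 2)  # max pooling
    size = valid_out(size, 3, 2)  # first grid reduction
    size = valid_out(size, 3, 2)  # second grid reduction
    return size


def flattened_features(size, channels=2048):
    """Number of values a Flatten layer would pass on."""
    side = inception_v3_feature_map(size)
    return side * side * channels


for input_size in (224, 299):
    side = inception_v3_feature_map(input_size)
    print(f"{input_size}x{input_size} input: {side}x{side}x2048 feature map")
```

One consequence is that the two heads are not strictly identical: the Keras head flattens the whole feature map, while the torchvision model average-pools it to 2048 values before the replaced `fc` layer, which is why the first `nn.Linear` takes 2048 inputs.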

In [None]:
import torch
import torch.nn as nn
from torchvision import models, transforms, datasets
from torch.utils.data import DataLoader

train_transform = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

valid_transform = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

train_dataset = datasets.ImageFolder('../data/dataset/training_set', transform=train_transform)
valid_dataset = datasets.ImageFolder('../data/dataset/test_set', transform=valid_transform)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
valid_loader = DataLoader(valid_dataset, batch_size=32, shuffle=False)
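
The `Normalize` step above uses the ImageNet channel means and standard deviations, whereas the Keras generators earlier only divide by 255. A minimal pure-Python sketch of both mappings for a single RGB pixel; the helper names are illustrative and not part of torchvision or Keras.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Mimic ToTensor (scale to [0, 1]) followed by Normalize for one RGB pixel."""
    return tuple(
        (channel / 255.0 - mean) / std
        for channel, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


def keras_rescale_pixel(rgb):
    """Mimic ImageDataGenerator(rescale=1/255) for one RGB pixel."""
    return tuple(channel / 255.0 for channel in rgb)


for pixel in ((0, 0, 0), (124, 116, 104), (255, 255, 255)):
    normalized = tuple(round(v, 3) for v in normalize_pixel(pixel))
    rescaled = tuple(round(v, 3) for v in keras_rescale_pixel(pixel))
    print(pixel, normalized, rescaled)
```

After normalization the values are roughly centered around zero, while plain rescaling keeps them in the range 0 to 1.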

In [None]:
class InceptionV3Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.base_model = models.inception_v3(pretrained=True)
        # Freeze the pretrained weights
        for param in self.base_model.parameters():
            param.requires_grad = False
        # Replace the ImageNet classifier with a new, trainable head
        self.base_model.fc = nn.Sequential(
            nn.Linear(2048, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        outputs = self.base_model(x)
        if self.training:
            # In training mode the base model also returns auxiliary logits;
            # only the main output is used
            outputs, _aux = outputs
        return outputs

pytorch_inceptionv3 = InceptionV3Model()

Downloading: "https://download.pytorch.org/models/inception_v3_google-0cc3c7bd.pth" to C:\Users\piotr/.cache\torch\hub\checkpoints\inception_v3_google-0cc3c7bd.pth
100.0%


In [None]:

# Loss and optimizer
criterion = nn.BCELoss()
optimizer = torch.optim.RMSprop(pytorch_inceptionv3.parameters(), lr=0.001)

# Training loop
for epoch in range(10):
    pytorch_inceptionv3.train()
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = pytorch_inceptionv3(inputs)
        # BCELoss expects float targets with the same shape as the outputs: (batch, 1)
        loss = criterion(outputs, labels.float().unsqueeze(1))
        loss.backward()
        optimizer.step()

    # Validation pass: gradients are not needed here
    pytorch_inceptionv3.eval()
    valid_loss = 0.0
    with torch.no_grad():
        for inputs, labels in valid_loader:
            outputs = pytorch_inceptionv3(inputs)
            loss = criterion(outputs, labels.float().unsqueeze(1))
            valid_loss += loss.item()

    print(f"Epoch {epoch+1}, Validation Loss: {valid_loss / len(valid_loader)}")


Epoch 1, Validation Loss: 0.1084659102387608
Epoch 2, Validation Loss: 0.0755010639568643
Epoch 3, Validation Loss: 0.08579183453398329
Epoch 4, Validation Loss: 0.10519748406543855
Epoch 5, Validation Loss: 0.05948747944323316
Epoch 6, Validation Loss: 0.056588030926557994
Epoch 7, Validation Loss: 0.05583596926566864
Epoch 8, Validation Loss: 0.07319517400381821
Epoch 9, Validation Loss: 0.06331603878015091
Epoch 10, Validation Loss: 0.05585289882895138
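
The validation loss printed above fluctuates rather than decreasing steadily, so the last epoch is not necessarily the best one. The sketch below picks the epoch with the lowest loss from the logged values; using it to choose which checkpoint to keep is an assumption, since the notebook does not save checkpoints, and `best_epoch` is an illustrative helper.

```python
# Validation losses copied from the training log above
validation_losses = [
    0.1084659102387608, 0.0755010639568643, 0.08579183453398329,
    0.10519748406543855, 0.05948747944323316, 0.056588030926557994,
    0.05583596926566864, 0.07319517400381821, 0.06331603878015091,
    0.05585289882895138,
]


def best_epoch(losses):
    """Return (1-based epoch, loss) for the lowest validation loss."""
    index = min(range(len(losses)), key=losses.__getitem__)
    return index + 1, losses[index]


epoch, loss = best_epoch(validation_losses)
print(f"Lowest validation loss {loss:.4f} at epoch {epoch}")
```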


In [None]:
pytorch_inceptionv3.eval()
correct = 0
total = 0

with torch.no_grad():
    for inputs, labels in valid_loader:
        outputs = pytorch_inceptionv3(inputs)
        # Round the sigmoid output to get the predicted class (0 or 1)
        predicted = outputs.round()
        total += labels.size(0)
        correct += (predicted == labels.unsqueeze(1)).sum().item()

print(f"Accuracy: {correct / total}")

Accuracy: 0.98


As with the Keras implementation, the model based on Inception3 reaches an accuracy of 98%. The training process took much more time.
To summarize, transfer learning makes it possible to build models much faster than with the traditional approach. Keras and PyTorch give almost the same accuracy. Keras is faster to develop with and, in our experiments, faster to train, while PyTorch offers more options for customizing the network logic.
For this specific task, the InceptionV3 model did better than the VGG16 model in both execution time and accuracy.