# Comparing vision models

![Banner](https://miro.medium.com/max/1920/1*oB3S5yHHhvougJkPXuc8og.gif)

In this notebook, we are going to compare a few models on the dogs vs. cats classification task. We will *not* be aiming for state-of-the-art results or implementing any cutting-edge technique; we simply compare a few models. The models are compared on a fixed set of hyper-parameters, which could be tuned and tweaked as required.

We are going to compare the following models:
1. [Custom plain convolution model](#Custom-Model)
2. [VGG-16](#VGG-16)
3. [VGG-19](#VGG-19)
4. [ResNet50](#ResNet50)
5. [ResNet101](#ResNet101)

## Unpacking the datasets

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

In [None]:
np.random.seed(2020)

In [None]:
!mkdir train test
!unzip -q /kaggle/input/dogs-vs-cats-redux-kernels-edition/train.zip -d train
!unzip -q /kaggle/input/dogs-vs-cats-redux-kernels-edition/test.zip -d test

## Setting up the data

First of all, we are going to set up the data paths and files in the required variables.

In [None]:
import random
import json
import csv

from matplotlib import pyplot as plt
%matplotlib inline

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D
from keras.preprocessing import image
from keras import optimizers

In [None]:
TRAIN_DIR = './train/train/'
TEST_DIR = './test/test/'  # trailing slash needed: file names are appended directly to this path

ROWS = 150
COLS = 150
CHANNELS = 3

BATCH_SIZE=64

# HyperParams
EPOCHS=5
# steps must be whole numbers, so use integer division
train_steps = len(os.listdir(TRAIN_DIR)) // BATCH_SIZE
validation_steps = len(os.listdir(TEST_DIR)) // BATCH_SIZE
lr=1e-4

In [None]:
original_train_images = [TRAIN_DIR+i for i in os.listdir(TRAIN_DIR)] # use this for full dataset
train_dogs =   [TRAIN_DIR+i for i in os.listdir(TRAIN_DIR) if 'dog' in i]
train_cats =   [TRAIN_DIR+i for i in os.listdir(TRAIN_DIR) if 'cat' in i]

test_images =  [TEST_DIR+i for i in os.listdir(TEST_DIR)]

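# slice datasets for memory efficiency on Kaggle Kernels, delete if using full dataset
original_train_images = train_dogs[:12000] + train_cats[:12000]
# test_images =  test_images[:100]

# shuffle before splitting; otherwise the validation split contains only cats
random.seed(2020)
random.shuffle(original_train_images)

# section = int(len(original_train_images) * 0.8)
train_images = original_train_images[:18000]
validation_images = original_train_images[18000:]

# count the steps over the images actually used in each split
train_steps = len(train_images) // BATCH_SIZE
validation_steps = len(validation_images) // BATCH_SIZE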

In [None]:
len(train_images)

In [None]:
def plot_arr(arr):
    plt.figure()
    plt.imshow(image.array_to_img(arr))
    plt.show()

def plot(img):
    plt.figure()
    plt.imshow(img)
    plt.show()
    
def prep_data(images):
    count = len(images)
    X = np.ndarray((count, ROWS, COLS, CHANNELS), dtype=np.float32)
    y = np.zeros((count,), dtype=np.float32)
    
    for i, image_file in enumerate(images):
        img = image.load_img(image_file, target_size=(ROWS, COLS))
        X[i] = image.img_to_array(img)
        if 'dog' in image_file:
            y[i] = 1.
        if i%1000 == 0: print('Processed {} of {}'.format(i, count))
    
    return X, y

In [None]:
X_train, y_train = prep_data(train_images)

In [None]:
X_validation, y_validation = prep_data(validation_images)

In [None]:
train_datagen = image.ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

validation_datagen = image.ImageDataGenerator(rescale=1./255)

In [None]:
train_generator = train_datagen.flow(
    X_train,
    y_train,
    batch_size=BATCH_SIZE)

validation_generator = validation_datagen.flow(
    X_validation,
    y_validation,
    batch_size=BATCH_SIZE)

## Custom Model

In [None]:
def create_custom_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(ROWS, COLS, CHANNELS)))
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2,2)))

    model.add(Flatten())
    model.add(Dropout(0.5))
    model.add(Dense(512, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    
    return model

In [None]:
model = create_custom_model()
model.summary()

In [None]:
model.compile(loss='binary_crossentropy',
             optimizer=optimizers.Adam(lr=lr),
             metrics=['accuracy'])

In [None]:
history = model.fit_generator(
    train_generator,
    steps_per_epoch=train_steps,
    epochs=EPOCHS,
    validation_data=validation_generator,
    validation_steps=validation_steps,
    verbose=1)

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'b', label='Training accuracy')
plt.plot(epochs, val_acc, 'r', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'b', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()

## VGG-16

VGG16 is a convolutional neural network proposed by K. Simonyan and A. Zisserman of the University of Oxford in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”. The model achieves 92.7% top-5 test accuracy on ImageNet, a dataset of over 14 million images; the ILSVRC classification task it was evaluated on uses 1000 classes. It was one of the best-known models submitted to ILSVRC-2014. It improves on AlexNet by replacing large kernels (11×11 and 5×5 in the first and second convolutional layers, respectively) with several 3×3 kernels stacked one after another.
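The benefit of stacking small kernels can be checked with a little arithmetic. The sketch below is a simplification: it assumes stride-1 layers with the same number of input and output channels (64 is an arbitrary illustrative value), ignores biases, and uses two helper functions introduced here only for illustration.

```python
def conv_weights(kernel_size, channels):
    """Weights of one conv layer with `channels` inputs and outputs (biases ignored)."""
    return kernel_size * kernel_size * channels * channels


def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


channels = 64  # illustrative channel width, an assumption

# Two stacked 3x3 layers cover the same 5x5 window as a single 5x5 layer...
rf_stacked = stacked_receptive_field(3, 2)
rf_single = stacked_receptive_field(5, 1)

# ...but need fewer weights, and they add an extra non-linearity in between.
stacked_3x3 = 2 * conv_weights(3, channels)
single_5x5 = conv_weights(5, channels)

print(rf_stacked, rf_single)    # 5 5
print(stacked_3x3, single_5x5)  # 73728 102400
```

Under these assumptions the stacked version covers the same window with roughly 28% fewer weights.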

![VGG-16](https://xgkfq28377.i.lithium.com/t5/image/serverpage/image-id/8241i196E2A78143567C5/image-size/medium?v=1.0&px=400)

The input to the conv1 layer is a fixed-size 224 × 224 RGB image. The image is passed through a stack of convolutional (conv.) layers whose filters have a very small receptive field: 3×3 (the smallest size that captures the notion of left/right, up/down, center). In one of the configurations, the network also uses 1×1 convolution filters, which can be seen as a linear transformation of the input channels (followed by a non-linearity). The convolution stride is fixed to 1 pixel; the spatial padding of the conv. layer input is such that the spatial resolution is preserved after convolution, i.e. the padding is 1 pixel for 3×3 conv. layers. Spatial pooling is carried out by five max-pooling layers, which follow some of the conv. layers (not all conv. layers are followed by max-pooling). Max-pooling is performed over a 2×2 pixel window, with stride 2.
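To see what these settings do to the spatial resolution, here is a small sketch that traces the feature-map side length through the five pooling stages. The two helper functions are hypothetical names introduced for this illustration; they apply the usual output-size formula for convolution and pooling.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Side length after a convolution (defaults: 3x3, stride 1, padding 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, window=2, stride=2):
    """Side length after max-pooling (defaults: 2x2 window, stride 2)."""
    return (size - window) // stride + 1


size = 224
sizes = [size]
for _ in range(5):                  # five max-pooling stages
    size = conv_output_size(size)   # padded 3x3 conv keeps the resolution
    size = pool_output_size(size)   # 2x2 pooling with stride 2 halves it
    sizes.append(size)

print(sizes)  # [224, 112, 56, 28, 14, 7]
```

The convolutions leave the resolution untouched, so only the pooling layers shrink the image, from 224 down to 7.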

Three Fully-Connected (FC) layers follow a stack of convolutional layers (which has a different depth in different architectures): the first two have 4096 channels each, the third performs 1000-way ILSVRC classification and thus contains 1000 channels (one for each class). The final layer is the soft-max layer. The configuration of the fully connected layers is the same in all networks.
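To get a feel for how much of the network sits in these three layers, the sketch below counts their parameters and compares them with the single sigmoid unit this notebook attaches when the network is loaded with `include_top=False` and `pooling='max'`. The 7×7×512 size of the last feature map is an assumption based on the standard 224×224 configuration, and `dense_params` is a helper introduced here for illustration.

```python
def dense_params(inputs, outputs):
    """Weights plus biases of a fully connected layer."""
    return inputs * outputs + outputs


flattened = 7 * 7 * 512  # assumed size of the last pooled feature map

# The three original FC layers: 4096, 4096 and 1000 units.
original_top = (dense_params(flattened, 4096)
                + dense_params(4096, 4096)
                + dense_params(4096, 1000))

# Global max pooling leaves one value per channel (512), feeding one sigmoid unit.
binary_head = dense_params(512, 1)

print(original_top)  # 123642856
print(binary_head)   # 513
```

Dropping the original top therefore removes most of the parameters, which is part of why the headless network is convenient for a two-class problem.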

The network architecture is as follows:
![architecture](https://neurohive.io/wp-content/uploads/2018/11/Capture-564x570.jpg)

Here is a link to the sample VGG network for Keras: 
[Keras code of VGG-16](https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3)

In [None]:
model = keras.models.Sequential()
model.add(keras.applications.VGG16(include_top=False, pooling='max', weights='imagenet'))
model.add(Dense(1, activation='sigmoid'))
# The VGG-16 base is pretrained on ImageNet; trainable = True fine-tunes it (set it to False to freeze the base)
model.layers[0].trainable = True

model.compile(loss='binary_crossentropy',
             optimizer=optimizers.Adam(lr=lr),
             metrics=['accuracy'])

In [None]:
model.summary()

In [None]:
history = model.fit_generator(
    train_generator,
    steps_per_epoch=train_steps,
    epochs=EPOCHS,
    validation_data=validation_generator,
    validation_steps=validation_steps,
    verbose=1)

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'b', label='Training accuracy')
plt.plot(epochs, val_acc, 'r', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'b', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()

## VGG-19

In [None]:
model = keras.models.Sequential()
model.add(keras.applications.VGG19(include_top=False, pooling='max', weights='imagenet'))
model.add(Dense(1, activation='sigmoid'))
# The VGG-19 base is pretrained on ImageNet; trainable = True fine-tunes it (set it to False to freeze the base)
model.layers[0].trainable = True

model.compile(loss='binary_crossentropy',
             optimizer=optimizers.Adam(lr=lr),
             metrics=['accuracy'])

In [None]:
model.summary()

In [None]:
history = model.fit_generator(
    train_generator,
    steps_per_epoch=train_steps,
    epochs=EPOCHS,
    validation_data=validation_generator,
    validation_steps=validation_steps,
    verbose=1)

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'b', label='Training accuracy')
plt.plot(epochs, val_acc, 'r', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'b', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()

The main differences between VGG-16 and VGG-19 are as follows:

1. VGG-19 has 19 weight layers (16 convolutional and 3 fully connected), whereas VGG-16 has 16 (13 convolutional and 3 fully connected).
2. Both networks were designed for ImageNet classification; the deeper one is not tied to a different dataset such as CIFAR-10, although either can be fine-tuned on other datasets, as done here.
3. The deeper network is larger on disk: the sizes quoted for the saved weights are about 533 MB for VGG-16 and 574 MB for VGG-19.
4. When model size matters, more compact networks such as SqueezeNet or GoogLeNet are often preferred over either VGG variant; the larger VGG-19 trades extra parameters and compute for additional depth.

These are only the most basic differences between the two networks: how deep they are, how large their weight files are, and where each tends to be used. There are many more, but they are beyond what this comparison needs.
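The layer counts in the list above can be reproduced from the block layout of the two networks. The sketch below is an illustration, not the Keras implementation: the per-block convolution counts and channel widths are the standard VGG configurations as I understand them, the 224×224 input (7×7 final feature map) is assumed, and `count_layers_and_params` is a hypothetical helper.

```python
# Number of 3x3 conv layers in each of the five blocks (assumed standard layouts).
CONFIGS = {"VGG-16": [2, 2, 3, 3, 3], "VGG-19": [2, 2, 4, 4, 4]}
BLOCK_CHANNELS = [64, 128, 256, 512, 512]
FC_SIZES = [4096, 4096, 1000]


def count_layers_and_params(convs_per_block):
    """Return (weight layers, parameters) for a VGG-style network on 224x224 RGB input."""
    layers, params, in_channels = 0, 0, 3
    for num_convs, out_channels in zip(convs_per_block, BLOCK_CHANNELS):
        for _ in range(num_convs):
            params += 3 * 3 * in_channels * out_channels + out_channels
            in_channels = out_channels
            layers += 1
    in_features = 7 * 7 * in_channels  # feature map after five 2x2 poolings
    for out_features in FC_SIZES:
        params += in_features * out_features + out_features
        in_features = out_features
        layers += 1
    return layers, params


results = {name: count_layers_and_params(cfg) for name, cfg in CONFIGS.items()}
for name, (layers, params) in results.items():
    print(name, layers, params)
```

The three extra convolutions in VGG-19 add only a few million parameters, because most of the weights in both networks sit in the fully connected layers.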

## ResNet50

> This was one of the bottlenecks of VGG. They couldn’t go as deep as wanted, because they started to lose generalization capability.

To solve this problem, ResNets were introduced.
One of the problems ResNets address is the well-known vanishing gradient problem. When a network is too deep, the gradients computed from the loss function easily shrink toward zero after repeated applications of the chain rule. As a result, the weights of the early layers barely update, and no learning takes place.

With ResNets, **the gradients can flow directly through the skip connections backwards from later layers to initial filters.**
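A toy calculation makes the effect concrete. This is a scalar caricature under loud assumptions: every layer is treated as a one-dimensional function whose transformation has the same small derivative (0.01, chosen arbitrarily), which is not how a real network behaves but shows the trend.

```python
depth = 50
branch_grad = 0.01  # assumed derivative of each layer's transformation f

# Plain stack, y = f(x) per layer: the chain rule multiplies the derivatives.
plain_grad = branch_grad ** depth

# Residual stack, y = x + f(x) per layer: each derivative becomes 1 + f'(x),
# so the identity path keeps the product away from zero.
residual_grad = (1.0 + branch_grad) ** depth

print(plain_grad)     # vanishingly small
print(residual_grad)  # stays of order 1
```

In the plain stack the gradient reaching the first layer is effectively zero, while the skip connections keep it at a usable magnitude.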

![Resnet](https://miro.medium.com/max/1524/1*6hF97Upuqg_LdsqWY6n_wg.png)

Since ResNets come in different sizes, depending on how big each layer is and how many layers there are, we will follow the one described by the authors in the paper [1], ResNet-34, to explain the structure of these networks.

Here we can see that the ResNet (the one on the right) consists of one convolution and pooling step (in orange) followed by 4 layers of similar behavior.

Each layer follows the same pattern: it performs 3×3 convolutions with a fixed feature map dimension (F) of 64, 128, 256, and 512 respectively, bypassing the input every 2 convolutions. Furthermore, the width (W) and height (H) dimensions remain constant throughout the layer.
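The "bypass every 2 convolutions" pattern can be sketched in plain NumPy. This is a deliberately reduced illustration: it works on a single-channel feature map, leaves out batch normalization, and uses function names (`conv3x3_same`, `residual_block`) that are made up for this example rather than taken from Keras.

```python
import numpy as np


def conv3x3_same(x, kernel):
    """3x3 convolution, stride 1, zero padding 1, on a single-channel 2D map."""
    padded = np.pad(x, 1)
    out = np.zeros_like(x)
    height, width = x.shape
    for i in range(height):
        for j in range(width):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, kernel1, kernel2):
    """Two 3x3 convolutions with the input added back before the final ReLU."""
    out = relu(conv3x3_same(x, kernel1))
    out = conv3x3_same(out, kernel2)
    return relu(out + x)  # the skip connection


rng = np.random.default_rng(0)
x = rng.random((8, 8))  # non-negative toy feature map
y = residual_block(x, rng.normal(size=(3, 3)), rng.normal(size=(3, 3)))
print(y.shape)  # (8, 8): width and height are preserved

# With all-zero kernels the block simply passes its input through.
zero = np.zeros((3, 3))
identity_out = residual_block(x, zero, zero)
```

The last two lines show why such blocks are easy to optimize: a block that has learned nothing still behaves like the identity instead of destroying its input.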

The dotted line is there precisely because the dimensions of the input volume change (a reduction, caused by the strided convolution). Note that this reduction between layers is achieved by increasing the stride from 1 to 2 at the first convolution of each layer, instead of by a pooling operation, which is what we usually see used for downsampling.
In the table, there is a summary of the output size at every layer and the dimension of the convolutional kernels at every point in the structure.
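The output sizes can be traced with the standard convolution output-size formula. The sketch below assumes a 224×224 input, a 7×7 stride-2 first convolution with padding 3, and a 3×3 stride-2 max pool with padding 1; these kernel and padding values are assumptions based on common ResNet implementations, and `conv_output_size` is a helper defined only for this example.

```python
def conv_output_size(size, kernel, stride, padding):
    """Side length after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
size = conv_output_size(size, kernel=7, stride=2, padding=3)  # first convolution
size = conv_output_size(size, kernel=3, stride=2, padding=1)  # max pooling

stage_shapes = []
for index, channels in enumerate([64, 128, 256, 512]):
    # Only the first convolution of layers 2 to 4 uses stride 2.
    stride = 1 if index == 0 else 2
    size = conv_output_size(size, kernel=3, stride=stride, padding=1)
    stage_shapes.append((size, size, channels))

print(stage_shapes)
# [(56, 56, 64), (28, 28, 128), (14, 14, 256), (7, 7, 512)]
```

Each stride-2 convolution halves the width and height while the number of feature maps doubles.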

![Resnet Table](https://miro.medium.com/max/936/1*I2557MCaFdNUm4q9TfvOpw.png)

Now in order to calculate the shape of the upcoming block, here is what should be done:

![block](https://miro.medium.com/max/680/1*CJn_fMeW4m2OSt71jzO4WA.png)

![block2](https://miro.medium.com/max/904/1*_kbJ_fvRhVPQ1fRRssEwhA.png)



To understand the function of every block in a residual network, refer to this excellent blog post: [Understanding and visualizing ResNets](https://towardsdatascience.com/understanding-and-visualizing-resnets-442284831be8)

### Implementing ResNet in Keras

In [None]:
model = keras.models.Sequential()
model.add(keras.applications.ResNet50(include_top=False, pooling='max', weights='imagenet'))
model.add(Dense(1, activation='sigmoid'))
# The ResNet-50 base is pretrained on ImageNet; trainable = True fine-tunes it (set it to False to freeze the base)
model.layers[0].trainable = True

model.compile(loss='binary_crossentropy',
             optimizer=optimizers.Adam(lr=lr),
             metrics=['accuracy'])

In [None]:
model.summary()

In [None]:
history = model.fit_generator(
    train_generator,
    steps_per_epoch=train_steps,
    epochs=EPOCHS,
    validation_data=validation_generator,
    validation_steps=validation_steps,
    verbose=1)

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'b', label='Training acc')
plt.plot(epochs, val_acc, 'r', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'b', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()

## ResNet101

In [None]:
model = keras.models.Sequential()
model.add(keras.applications.ResNet101(include_top=False, pooling='max', weights='imagenet'))
model.add(Dense(1, activation='sigmoid'))
# The ResNet-101 base is pretrained on ImageNet; trainable = True fine-tunes it (set it to False to freeze the base)
model.layers[0].trainable = True

model.compile(loss='binary_crossentropy',
             optimizer=optimizers.Adam(lr=lr),
             metrics=['accuracy'])

In [None]:
model.summary()

In [None]:
history = model.fit_generator(
    train_generator,
    steps_per_epoch=train_steps,
    epochs=EPOCHS,
    validation_data=validation_generator,
    validation_steps=validation_steps,
    verbose=1)

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'b', label='Training accuracy')
plt.plot(epochs, val_acc, 'r', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'b', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()

In [None]:
!rm -rf train
!rm -rf test