### Step 0: Download the dog breed dataset

In [None]:
!wget https://s3-us-west-1.amazonaws.com/udacity-aind/dog-project/dogImages.zip

In [None]:
!unzip dogImages.zip

In [1]:
!ls dogImages/*

dogImages/test:
001.Affenpinscher		    068.Flat-coated_retriever
002.Afghan_hound		    069.French_bulldog
003.Airedale_terrier		    070.German_pinscher
004.Akita			    071.German_shepherd_dog
005.Alaskan_malamute		    072.German_shorthaired_pointer
006.American_eskimo_dog		    073.German_wirehaired_pointer
007.American_foxhound		    074.Giant_schnauzer
008.American_staffordshire_terrier  075.Glen_of_imaal_terrier
009.American_water_spaniel	    076.Golden_retriever
010.Anatolian_shepherd_dog	    077.Gordon_setter
011.Australian_cattle_dog	    078.Great_dane
012.Australian_shepherd		    079.Great_pyrenees
013.Australian_terrier		    080.Greater_swiss_mountain_dog
014.Basenji			    081.Greyhound
015.Basset_hound		    082.Havanese
016.Beagle			    083.Ibizan_hound
017.Bearded_collie		    084.Icelandic_sheepdog
018.Beauceron			    085.Irish_red_and_white_setter
019.Bedlington_terrier		    086.Irish_setter
020.Belgian_malinois		    087.Irish_terrier
021.Belgian_sheepdog	

 Clearly from the folders names, we can see that there are 133 dog breed classes in our dataset. Thus our task it to train a 133 class Deep Learning CNN model which can classify the dogs on the basis of their breed.

### Step 1: Import packages, and dog breed dataset

In [2]:
# numrical comp
import numpy as np
# reading dir
from glob import glob
# cv
import cv2     
import os
import PIL.Image
import matplotlib.pyplot as plt                        
%matplotlib inline 
# printing loop time
from tqdm import tqdm
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from torch.utils.data import Dataset, DataLoader
from torchvision.models import vgg16, resnet101
from torchvision import datasets, transforms

from torchsummary import summary

from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

In [3]:
dog_files = np.array(glob("dogImages/*/*/*"))

print('There are %d total dog images.' % len(dog_files))

There are 8351 total dog images.


### Step 2: Data Exploration and Data Loading

#### Write transforms and data loaders for training, testing and validation

The 224 X 224 random crop from every image for training. For testing and validation, the images are first resizrd to 256 X 256 center crop will be taken.  As a part of data-augmentation to the training set, we will use, random rotation, and horizontal flipping. Also, before passing the image as an input to the CNN model, we will normalize the images by subtracting mean, and dividing it by standard deviation. This is done for the three RGB channels separately. Channel-wise mean and standard deviation values used for normalisation are from large scale Imagenet data.

In [4]:
data_dir = 'dogImages/'

data_transforms_train =  transforms.Compose([
        transforms.RandomRotation(30),
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ])

data_transforms_valid = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ])

data_transforms_test = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ])

# create a dictionary of three Imagefolder objects
train_data =  datasets.ImageFolder(os.path.join(data_dir, 'train'), data_transforms_train)
test_data =  datasets.ImageFolder(os.path.join(data_dir, 'test'), data_transforms_test)
val_data =  datasets.ImageFolder(os.path.join(data_dir, 'valid'), data_transforms_valid)

# create dataloaders, there is no point in shuffling the testing and validation data
loaders = {}
loaders['train'] = DataLoader(train_data, batch_size=256, shuffle=True, drop_last=True)
loaders['test'] = DataLoader(test_data, batch_size=128, shuffle=False, drop_last=True)
loaders['val'] = DataLoader(val_data, batch_size=128, shuffle=False, drop_last=True)


print("Initializing Datasets and Dataloaders...")

Initializing Datasets and Dataloaders...


#### Check dog breed classes

In [5]:
class_names = train_data.classes
n_classes = len(class_names)
print(f"There are {n_classes} classes in the dataset\n")

for name in class_names:
    print(name)

There are 133 classes in the dataset

001.Affenpinscher
002.Afghan_hound
003.Airedale_terrier
004.Akita
005.Alaskan_malamute
006.American_eskimo_dog
007.American_foxhound
008.American_staffordshire_terrier
009.American_water_spaniel
010.Anatolian_shepherd_dog
011.Australian_cattle_dog
012.Australian_shepherd
013.Australian_terrier
014.Basenji
015.Basset_hound
016.Beagle
017.Bearded_collie
018.Beauceron
019.Bedlington_terrier
020.Belgian_malinois
021.Belgian_sheepdog
022.Belgian_tervuren
023.Bernese_mountain_dog
024.Bichon_frise
025.Black_and_tan_coonhound
026.Black_russian_terrier
027.Bloodhound
028.Bluetick_coonhound
029.Border_collie
030.Border_terrier
031.Borzoi
032.Boston_terrier
033.Bouvier_des_flandres
034.Boxer
035.Boykin_spaniel
036.Briard
037.Brittany
038.Brussels_griffon
039.Bull_terrier
040.Bulldog
041.Bullmastiff
042.Cairn_terrier
043.Canaan_dog
044.Cane_corso
045.Cardigan_welsh_corgi
046.Cavalier_king_charles_spaniel
047.Chesapeake_bay_retriever
048.Chihuahua
049.Chinese_c

### Step 3: Custom CNN model architecture

We shall try writing our own custom CNN model instead of directly trying some popular models like VGG, ResNet or DenseNet. While designing, we shall keep in mind, the basic principles which are common in all the CNN models for classification. These are as follows:

1. Gradually decrease the size of the activations maps/ outputs.
2. Gradually increase the number of filters for Convolutional layer as we go deeper.
3. Use ReLU non-linear activation (except in the last classfication layer). 
4. Also use droput after the FC layers for regularization.
5. Use Max-pooling for decreasing the size of the activation maps.
6. Use stride=1 as in the case with "most" of the classification models.

In [6]:
# define the CNN architecture
class CustomNet(nn.Module):
    
    def __init__(self):
        super(CustomNet, self).__init__()
        
        self.relu = nn.ReLU(inplace=True)
        self.pool = nn.MaxPool2d(2, 2)
        self.dropout = nn.Dropout(0.25)
        
        # use padding such that conv output has size as previous layer, [(Input−Kernel+2*padding)/Stride]+1
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(16)
        
        # (16, 224, 224) --> (16, 112, 112) (halved by max-pool)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(32)
        
        # (32, 112, 112) -->  (16, 56, 56) (halved by max-pool) 
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        
        # (64, 56, 56) -- > (64, 28, 28)
        self.fc1 = nn.Linear(64*28*28, 512)
        self.fc2 = nn.Linear(512, 256)
        # no of classes `n_classes`: 133
        self.fc3 = nn.Linear(256, n_classes)
    
    # without batch normalisation
    def forward(self, x):
        # input image: (3, 224, 224)
        x = self.pool(self.relu(self.conv1(x)))
        # (16, 112, 112)
        x = self.pool(self.relu(self.conv2(x)))
        # (32, 56, 56)
        x = self.pool(self.relu(self.conv3(x)))
        # (64, 28, 28)
        # flatten image input
        x = x.view(-1, 64 * 28 * 28)
        x = self.dropout(x) # use droput for regularization
        x = self.relu(self.fc1(x))
        x = self.dropout(x) # use droput for regularization
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# create the model object
model_scratch = CustomNet()

if torch.cuda.is_available:
    model_scratch.cuda()

In [7]:
# print the model summary
summary(model_scratch, input_size=(3, 224, 224))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 16, 224, 224]             448
              ReLU-2         [-1, 16, 224, 224]               0
         MaxPool2d-3         [-1, 16, 112, 112]               0
            Conv2d-4         [-1, 32, 112, 112]           4,640
              ReLU-5         [-1, 32, 112, 112]               0
         MaxPool2d-6           [-1, 32, 56, 56]               0
            Conv2d-7           [-1, 64, 56, 56]          18,496
              ReLU-8           [-1, 64, 56, 56]               0
         MaxPool2d-9           [-1, 64, 28, 28]               0
          Dropout-10                [-1, 50176]               0
           Linear-11                  [-1, 512]      25,690,624
             ReLU-12                  [-1, 512]               0
          Dropout-13                  [-1, 512]               0
           Linear-14                  [

We start with a model is much lesser deeper than VGG-16 model as the number of layers having learnable parameters are 6, as compared to the VGG-16 model with 16 layers. A good pratice is to start with a smaller model as 
deeper models might be tough to train. We will gradually try to increase the layers in the model to see if there is anby performance imporovement. 

#### Defining loss function and optimiser

In [8]:
# because it is a multi-class classfication task, we use multi-class cross-entropy class
criterion_scratch = nn.CrossEntropyLoss()
# Adam optimiser generally leads to quicker training, and requires lesser tuning as compared to SGD
optimizer_scratch = optim.Adam(model_scratch.parameters(), lr=0.001)

#### Define the training loop

In [9]:
from tqdm import tqdm
def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    """returns trained model"""
    valid_loss_min = np.Inf 
    
    for epoch in range(1, n_epochs+1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        
        model.train()
        # run the model in training mode
        for batch_idx, (data, target) in tqdm(enumerate(loaders['train'])):
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            #print(batch_idx)
            optimizer.zero_grad()
            outputs = model(data)
            _, preds = torch.max(outputs, 1)
            # find the loss and update the model parameters accordingly
            loss = criterion(outputs, target)
            loss.backward()
            optimizer.step()
            
            # record the average training loss
            train_loss += ((1 / (batch_idx + 1)) * (loss.data - train_loss))
        
        # run the model in evaluation mode
        model.eval()
        for batch_idx, (data, target) in enumerate(loaders['val']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            # update the average validation loss
            with torch.no_grad():
                outputs = model(data)
                loss = criterion(outputs, target)
                valid_loss += ((1 / (batch_idx + 1)) * (loss.data - valid_loss))
                
        # print training/validation statistics 
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(epoch, train_loss,valid_loss))
        
        # save the model only if validation loss has decreased (save the best model)
        if valid_loss <= valid_loss_min:
            print(f'Validation loss decreased ({valid_loss_min:.3f} ---> {valid_loss:.3f}).  Saving model ...')
            torch.save(model.state_dict(), save_path)
            valid_loss_min = valid_loss

    # return trained model
    return model

#### Train the custom/scratch model

In [10]:
# train the model for 15 epochs
model_scratch = train(15, loaders, model_scratch, optimizer_scratch, criterion_scratch, torch.cuda.is_available(), 'custom_model.pt')

#### Test the custom/scratch model

In [10]:
# load the model that got the best validation accuracy
model_scratch.load_state_dict(torch.load('custom_model.pt'))

<All keys matched successfully>

In [10]:
def test(loaders, model, criterion, use_cuda):
    test_loss = 0.0
    correct = 0.0
    total = 0.0

    model.eval()
    for batch_idx, (data, target) in enumerate(loaders['test']):
        if use_cuda:
            data, target = data.cuda(), target.cuda()
        
        output = model(data)
        loss = criterion(output, target)
        test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.data - test_loss))
        pred = output.data.max(1, keepdim=True)[1]
        correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
        total += data.size(0)
            
    print('Test Loss: {:.6f}\n'.format(test_loss))

    print('\nTest Accuracy: %f%% (%2d/%2d)' % (
        100. * correct / total, correct, total))

In [11]:
test(loaders, model_scratch, criterion_scratch, torch.cuda.is_available())

Test Loss: 3.837285


Test Accuracy: 10.807292% (83/768)


**Conclusion: Our custom model with even very less layers, and hardly trained for few epochs due to computational constraints could achieve near VGG-16 results as mentioned in the reference paper. Although, 10% doesn't seem to be too good of an accuracy, but considering 133 class classificatio  problem, a random classifier would have lesser than 1% accuracy, and 10% now seems to be a good number.**

### Step 4: VGG-16 model with transfer learning

We will import pre-built standard CNN classfication models, and will try to compare the accuracy. If time and computation resources allow, we will try training these standard models without pre-training.

In [10]:
data_dir = 'dogImages/'

data_transforms_train =  transforms.Compose([
        transforms.RandomRotation(30),
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ])

data_transforms_valid = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ])

data_transforms_test = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ])

# create a dictionary of three Imagefolder objects
train_data =  datasets.ImageFolder(os.path.join(data_dir, 'train'), data_transforms_train)
test_data =  datasets.ImageFolder(os.path.join(data_dir, 'test'), data_transforms_test)
val_data =  datasets.ImageFolder(os.path.join(data_dir, 'valid'), data_transforms_valid)

# create dataloaders, there is no point in shuffling the testing and validation data
loaders = {}
loaders['train'] = DataLoader(train_data, batch_size=64, shuffle=True, drop_last=True)
loaders['test'] = DataLoader(test_data, batch_size=64, shuffle=False, drop_last=True)
loaders['val'] = DataLoader(val_data, batch_size=64, shuffle=False, drop_last=True)


print("Initializing Datasets and Dataloaders...")

Initializing Datasets and Dataloaders...


In [11]:
class Net(nn.Module):
    def __init__(self, original_model):
        super(Net, self).__init__()
        self.pretrained = nn.Sequential(*list(original_model.children())[:-1])

        self.finetuned = nn.Sequential(nn.Linear(512*7*7, 512),
                           nn.ReLU(),
                           nn.Dropout(0.2),
                           nn.Linear(512, n_classes))
#         nn.Sequential(
#             nn.Linear(512 * 7 * 7, 4096),
#             nn.ReLU(inplace=True),
#             nn.Dropout(p=0.5),
#             nn.Linear(4096, 4096),
#             nn.ReLU(inplace=True),
#             nn.Dropout(p=0.5),
#             nn.Linear(4096, 120),
#             nn.Softmax())

    def forward(self, x):
        x = self.pretrained(x)
        x = x.view(x.size(0), -1)
        x = self.finetuned(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

pretrained = vgg16(pretrained=True)
for param in pretrained.parameters():
    param.requires_grad = False

vgg16_pretrained = Net(pretrained)

if torch.cuda.is_available:
    vgg16_pretrained.cuda()

In [12]:
# print the model summary
from torchsummary import summary
summary(vgg16_pretrained, input_size=(3, 224, 224))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 224, 224]           1,792
              ReLU-2         [-1, 64, 224, 224]               0
            Conv2d-3         [-1, 64, 224, 224]          36,928
              ReLU-4         [-1, 64, 224, 224]               0
         MaxPool2d-5         [-1, 64, 112, 112]               0
            Conv2d-6        [-1, 128, 112, 112]          73,856
              ReLU-7        [-1, 128, 112, 112]               0
            Conv2d-8        [-1, 128, 112, 112]         147,584
              ReLU-9        [-1, 128, 112, 112]               0
        MaxPool2d-10          [-1, 128, 56, 56]               0
           Conv2d-11          [-1, 256, 56, 56]         295,168
             ReLU-12          [-1, 256, 56, 56]               0
           Conv2d-13          [-1, 256, 56, 56]         590,080
             ReLU-14          [-1, 256,

We noticed that this is much deeper model than our custom model, and it is pretrained on the large ImageNet dataset. We expect that this model already understand the general features of an image. And now we will try to transfer the general knowledge for our dog breed classification task. 

In [13]:
criterion_transfer = nn.CrossEntropyLoss()
optimizer_transfer = optim.Adam(vgg16_pretrained.parameters(), lr=0.001)

### Fine-tune the pre-trained VGG16 model

In [14]:
from tqdm import tqdm
def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    valid_loss_min = np.Inf 
    
    for epoch in range(1, n_epochs+1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        
        model.train()
        # run the model in training mode
        for batch_idx, (data, target) in tqdm(enumerate(loaders['train'])):
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            #print(batch_idx)
            optimizer.zero_grad()
            outputs = model(data)
            _, preds = torch.max(outputs, 1)
            # find the loss and update the model parameters accordingly
            loss = criterion(outputs, target)
            loss.backward()
            optimizer.step()
            
            # record the average training loss
            train_loss += ((1 / (batch_idx + 1)) * (loss.data - train_loss))
        
        # run the model in evaluation mode
        model.eval()
        for batch_idx, (data, target) in enumerate(loaders['val']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            # update the average validation loss
            with torch.no_grad():
                outputs = model(data)
                loss = criterion(outputs, target)
                valid_loss += ((1 / (batch_idx + 1)) * (loss.data - valid_loss))
                
        # print training/validation statistics 
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(epoch, train_loss,valid_loss))
        
        # save the model only if validation loss has decreased (save the best model)
        if valid_loss <= valid_loss_min:
            print(f'Validation loss decreased ({valid_loss_min:.3f} ---> {valid_loss:.3f}).  Saving model ...')
            torch.save(model.state_dict(), save_path)
            valid_loss_min = valid_loss

    # return trained model
    return model

vgg16_pretrained = train(15, loaders, vgg16_pretrained, optimizer_transfer, criterion_transfer, torch.cuda.is_available, 'vgg16_pretrained.pt')

104it [01:39,  1.03it/s]


Epoch: 1 	Training Loss: 3.199297 	Validation Loss: 1.372656
Validation loss decreased (inf ---> 1.373).  Saving model ...


104it [01:28,  1.00s/it]


Epoch: 2 	Training Loss: 1.895132 	Validation Loss: 1.004533
Validation loss decreased (1.373 ---> 1.005).  Saving model ...


104it [01:30,  1.17s/it]


Epoch: 3 	Training Loss: 1.634808 	Validation Loss: 0.813026
Validation loss decreased (1.005 ---> 0.813).  Saving model ...


104it [01:26,  1.19it/s]


Epoch: 4 	Training Loss: 1.471400 	Validation Loss: 0.785570
Validation loss decreased (0.813 ---> 0.786).  Saving model ...


104it [01:26,  1.14it/s]


Epoch: 5 	Training Loss: 1.436238 	Validation Loss: 0.778134
Validation loss decreased (0.786 ---> 0.778).  Saving model ...


104it [01:28,  1.14it/s]


Epoch: 6 	Training Loss: 1.405385 	Validation Loss: 0.776769
Validation loss decreased (0.778 ---> 0.777).  Saving model ...


104it [01:30,  1.60s/it]
0it [00:00, ?it/s]

Epoch: 7 	Training Loss: 1.365881 	Validation Loss: 0.843558


104it [01:29,  1.21it/s]
0it [00:00, ?it/s]

Epoch: 8 	Training Loss: 1.390453 	Validation Loss: 0.800375


104it [01:31,  1.32it/s]


Epoch: 9 	Training Loss: 1.376861 	Validation Loss: 0.767955
Validation loss decreased (0.777 ---> 0.768).  Saving model ...


104it [01:32,  1.17it/s]


Epoch: 10 	Training Loss: 1.273174 	Validation Loss: 0.746902
Validation loss decreased (0.768 ---> 0.747).  Saving model ...


104it [01:29,  1.12it/s]
0it [00:00, ?it/s]

Epoch: 11 	Training Loss: 1.262901 	Validation Loss: 0.798560


104it [01:28,  1.04it/s]
0it [00:00, ?it/s]

Epoch: 12 	Training Loss: 1.312789 	Validation Loss: 0.783381


104it [01:26,  1.14it/s]
0it [00:00, ?it/s]

Epoch: 13 	Training Loss: 1.272275 	Validation Loss: 0.803909


104it [01:26,  1.22it/s]
0it [00:00, ?it/s]

Epoch: 14 	Training Loss: 1.317230 	Validation Loss: 0.808449


104it [01:32,  1.07it/s]


Epoch: 15 	Training Loss: 1.293354 	Validation Loss: 0.832512


In [15]:
# load the model that got the best validation accuracy (uncomment the line below)
vgg16_pretrained.load_state_dict(torch.load('vgg16_pretrained.pt'))

def test(loaders, model, criterion, use_cuda):
    test_loss = 0.0
    correct = 0.0
    total = 0.0

    model.eval()
    for batch_idx, (data, target) in enumerate(loaders['test']):
        if use_cuda:
            data, target = data.cuda(), target.cuda()
        
        output = model(data)
        loss = criterion(output, target)
        test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.data - test_loss))
        pred = output.data.max(1, keepdim=True)[1]
        correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
        total += data.size(0)
            
    print('Test Loss: {:.6f}\n'.format(test_loss))

    print('\nTest Accuracy: %f%% (%2d/%2d)' % (
        100. * correct / total, correct, total))
    
test(loaders, vgg16_pretrained, criterion_transfer, torch.cuda.is_available())

Test Loss: 0.832465


Test Accuracy: 78.365385% (652/832)


### Fine-tune Resnet-101 to compare with pre-trained VGG-16's performance

In [18]:
resnet101_pretrained = resnet101(pretrained=True)

for param in resnet101_pretrained.parameters():
    param.requires_grad = False
    
num_ftrs = resnet101_pretrained.fc.in_features

classifier = nn.Sequential(nn.Linear(num_ftrs, 512),
                           nn.ReLU(),
                           nn.Dropout(0.2),
                           nn.Linear(512, n_classes))
resnet101_pretrained.fc = classifier

if torch.cuda.is_available():
    resnet101_pretrained = resnet101_pretrained.cuda()
    
print(summary(resnet101_pretrained, input_size=(3, 224, 224)))

criterion_transfer = nn.CrossEntropyLoss()
optimizer_transfer = optim.Adam(resnet101_pretrained.fc.parameters(), lr=0.001)

resnet101_pretrained = train(15, loaders, resnet101_pretrained, optimizer_transfer, criterion_transfer, torch.cuda.is_available, 'resenet101_pretrained.pt')

0it [00:00, ?it/s]

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]           4,096
       BatchNorm2d-6           [-1, 64, 56, 56]             128
              ReLU-7           [-1, 64, 56, 56]               0
            Conv2d-8           [-1, 64, 56, 56]          36,864
       BatchNorm2d-9           [-1, 64, 56, 56]             128
             ReLU-10           [-1, 64, 56, 56]               0
           Conv2d-11          [-1, 256, 56, 56]          16,384
      BatchNorm2d-12          [-1, 256, 56, 56]             512
           Conv2d-13          [-1, 256, 56, 56]          16,384
      BatchNorm2d-14          [-1, 256,

104it [01:47,  1.05it/s]


Epoch: 1 	Training Loss: 3.274376 	Validation Loss: 1.223914
Validation loss decreased (inf ---> 1.224).  Saving model ...


104it [01:32,  1.21it/s]


Epoch: 2 	Training Loss: 1.466539 	Validation Loss: 0.703403
Validation loss decreased (1.224 ---> 0.703).  Saving model ...


104it [01:28,  1.19it/s]


Epoch: 3 	Training Loss: 1.192024 	Validation Loss: 0.547178
Validation loss decreased (0.703 ---> 0.547).  Saving model ...


104it [01:33,  1.07s/it]


Epoch: 4 	Training Loss: 1.080988 	Validation Loss: 0.478114
Validation loss decreased (0.547 ---> 0.478).  Saving model ...


104it [01:31,  1.00it/s]


Epoch: 5 	Training Loss: 1.020919 	Validation Loss: 0.477737
Validation loss decreased (0.478 ---> 0.478).  Saving model ...


104it [01:30,  1.07it/s]


Epoch: 6 	Training Loss: 1.004512 	Validation Loss: 0.437551
Validation loss decreased (0.478 ---> 0.438).  Saving model ...


104it [01:32,  1.15it/s]


Epoch: 7 	Training Loss: 0.958949 	Validation Loss: 0.413958
Validation loss decreased (0.438 ---> 0.414).  Saving model ...


104it [01:28,  1.07it/s]


Epoch: 8 	Training Loss: 0.942190 	Validation Loss: 0.361269
Validation loss decreased (0.414 ---> 0.361).  Saving model ...


104it [01:27,  1.15it/s]
0it [00:00, ?it/s]

Epoch: 9 	Training Loss: 0.888914 	Validation Loss: 0.380626


104it [01:42,  1.07s/it]
0it [00:00, ?it/s]

Epoch: 10 	Training Loss: 0.902732 	Validation Loss: 0.394522


104it [01:30,  1.04it/s]
0it [00:00, ?it/s]

Epoch: 11 	Training Loss: 0.856674 	Validation Loss: 0.393285


104it [01:27,  1.02it/s]
0it [00:00, ?it/s]

Epoch: 12 	Training Loss: 0.870481 	Validation Loss: 0.392891


104it [01:30,  1.00it/s]
0it [00:00, ?it/s]

Epoch: 13 	Training Loss: 0.849959 	Validation Loss: 0.370500


104it [01:29,  1.24it/s]


Epoch: 14 	Training Loss: 0.845716 	Validation Loss: 0.353925
Validation loss decreased (0.361 ---> 0.354).  Saving model ...


104it [01:36,  1.14it/s]


Epoch: 15 	Training Loss: 0.827539 	Validation Loss: 0.353638
Validation loss decreased (0.354 ---> 0.354).  Saving model ...


In [20]:
resnet101_pretrained.load_state_dict(torch.load('resenet101_pretrained.pt'))
test(loaders, resnet101_pretrained, criterion_transfer, torch.cuda.is_available)

Test Loss: 0.435837


Test Accuracy: 86.538462% (720/832)


### Train VGG-16 from scratch

In [21]:
no_pretrained = vgg16(pretrained=False)
for param in no_pretrained.parameters():
    param.requires_grad = False

vgg16_scratch = Net(no_pretrained)

if torch.cuda.is_available:
    vgg16_scratch.cuda()
    
criterion_transfer = nn.CrossEntropyLoss()
optimizer_transfer = optim.Adam(vgg16_scratch.parameters(), lr=0.001)

from tqdm import tqdm
def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    valid_loss_min = np.Inf 
    
    for epoch in range(1, n_epochs+1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        
        model.train()
        # run the model in training mode
        for batch_idx, (data, target) in tqdm(enumerate(loaders['train'])):
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            #print(batch_idx)
            optimizer.zero_grad()
            outputs = model(data)
            _, preds = torch.max(outputs, 1)
            # find the loss and update the model parameters accordingly
            loss = criterion(outputs, target)
            loss.backward()
            optimizer.step()
            
            # record the average training loss
            train_loss += ((1 / (batch_idx + 1)) * (loss.data - train_loss))
        
        # run the model in evaluation mode
        model.eval()
        for batch_idx, (data, target) in enumerate(loaders['val']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            # update the average validation loss
            with torch.no_grad():
                outputs = model(data)
                loss = criterion(outputs, target)
                valid_loss += ((1 / (batch_idx + 1)) * (loss.data - valid_loss))
                
        # print training/validation statistics 
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(epoch, train_loss,valid_loss))
        
        # save the model only if validation loss has decreased (save the best model)
        if valid_loss <= valid_loss_min:
            print(f'Validation loss decreased ({valid_loss_min:.3f} ---> {valid_loss:.3f}).  Saving model ...')
            torch.save(model.state_dict(), save_path)
            valid_loss_min = valid_loss

    # return trained model
    return model

vgg16_scratch = train(15, loaders, vgg16_scratch, optimizer_transfer, criterion_transfer, torch.cuda.is_available, 'vgg16_scratch.pt')


0it [00:00, ?it/s][A
1it [00:01,  1.42s/it][A
2it [00:02,  1.42s/it][A
3it [00:03,  1.34s/it][A
4it [00:04,  1.17s/it][A
5it [00:05,  1.18s/it][A
6it [00:06,  1.07s/it][A
7it [00:07,  1.03s/it][A
8it [00:08,  1.08it/s][A
9it [00:09,  1.09it/s][A
10it [00:10,  1.07it/s][A
11it [00:11,  1.11it/s][A
12it [00:11,  1.18it/s][A
13it [00:12,  1.16it/s][A
14it [00:13,  1.21it/s][A
15it [00:14,  1.22it/s][A
16it [00:15,  1.20it/s][A
17it [00:15,  1.21it/s][A
18it [00:16,  1.28it/s][A
19it [00:17,  1.31it/s][A
20it [00:18,  1.14it/s][A
21it [00:19,  1.13it/s][A
22it [00:20,  1.21it/s][A
23it [00:20,  1.28it/s][A
24it [00:21,  1.23it/s][A
25it [00:22,  1.31it/s][A
26it [00:23,  1.08it/s][A
27it [00:25,  1.09s/it][A
28it [00:26,  1.06s/it][A
29it [00:26,  1.03it/s][A
30it [00:27,  1.03it/s][A
31it [00:29,  1.06s/it][A
32it [00:29,  1.01it/s][A
33it [00:31,  1.06s/it][A
34it [00:31,  1.02it/s][A
35it [00:33,  1.07s/it][A
36it [00:33,  1.04it/s][A
37it [00:34,  

Epoch: 1 	Training Loss: 4.912992 	Validation Loss: 4.871336
Validation loss decreased (inf ---> 4.871).  Saving model ...



0it [00:00, ?it/s][A
1it [00:00,  1.19it/s][A
2it [00:01,  1.25it/s][A
3it [00:02,  1.30it/s][A
4it [00:03,  1.27it/s][A
5it [00:03,  1.31it/s][A
6it [00:04,  1.31it/s][A
7it [00:05,  1.30it/s][A
8it [00:06,  1.25it/s][A
9it [00:07,  1.20it/s][A
10it [00:07,  1.29it/s][A
11it [00:08,  1.30it/s][A
12it [00:09,  1.14it/s][A
13it [00:10,  1.16it/s][A
14it [00:11,  1.13it/s][A
15it [00:12,  1.08it/s][A
16it [00:13,  1.13it/s][A
17it [00:13,  1.17it/s][A
18it [00:14,  1.21it/s][A
19it [00:16,  1.07s/it][A
20it [00:17,  1.03it/s][A
21it [00:17,  1.08it/s][A
22it [00:19,  1.04it/s][A
23it [00:19,  1.08it/s][A
24it [00:20,  1.21it/s][A
25it [00:21,  1.24it/s][A
26it [00:22,  1.20it/s][A
27it [00:22,  1.26it/s][A
28it [00:23,  1.32it/s][A
29it [00:24,  1.11it/s][A
30it [00:25,  1.13it/s][A
31it [00:26,  1.18it/s][A
32it [00:27,  1.16it/s][A
33it [00:27,  1.27it/s][A
34it [00:28,  1.23it/s][A
35it [00:29,  1.22it/s][A
36it [00:30,  1.11it/s][A
37it [00:31,  

Epoch: 2 	Training Loss: 4.858111 	Validation Loss: 4.824403
Validation loss decreased (4.871 ---> 4.824).  Saving model ...



0it [00:00, ?it/s][A
1it [00:00,  1.11it/s][A
2it [00:01,  1.24it/s][A
3it [00:02,  1.26it/s][A
4it [00:03,  1.22it/s][A
5it [00:04,  1.08it/s][A
6it [00:04,  1.20it/s][A
7it [00:05,  1.23it/s][A
8it [00:06,  1.28it/s][A
9it [00:07,  1.28it/s][A
10it [00:08,  1.19it/s][A
11it [00:09,  1.16it/s][A
12it [00:09,  1.15it/s][A
13it [00:10,  1.29it/s][A
14it [00:11,  1.36it/s][A
15it [00:11,  1.41it/s][A
16it [00:12,  1.44it/s][A
17it [00:13,  1.41it/s][A
18it [00:13,  1.37it/s][A
19it [00:14,  1.35it/s][A
20it [00:15,  1.34it/s][A
21it [00:16,  1.36it/s][A
22it [00:16,  1.50it/s][A
23it [00:17,  1.53it/s][A
24it [00:18,  1.48it/s][A
25it [00:18,  1.48it/s][A
26it [00:19,  1.35it/s][A
27it [00:20,  1.43it/s][A
28it [00:21,  1.30it/s][A
29it [00:22,  1.06s/it][A
30it [00:24,  1.15s/it][A
31it [00:25,  1.25s/it][A
32it [00:26,  1.17s/it][A
33it [00:27,  1.06s/it][A
34it [00:28,  1.03it/s][A
35it [00:29,  1.02it/s][A
36it [00:29,  1.12it/s][A
37it [00:30,  

Epoch: 3 	Training Loss: 4.809144 	Validation Loss: 4.750602
Validation loss decreased (4.824 ---> 4.751).  Saving model ...



0it [00:00, ?it/s][A
1it [00:00,  1.20it/s][A
2it [00:01,  1.10it/s][A
3it [00:03,  1.03s/it][A
4it [00:04,  1.03it/s][A
5it [00:04,  1.07it/s][A
6it [00:05,  1.09it/s][A
7it [00:07,  1.03s/it][A
8it [00:07,  1.10it/s][A
9it [00:08,  1.18it/s][A
10it [00:09,  1.17it/s][A
11it [00:10,  1.14it/s][A
12it [00:10,  1.19it/s][A
13it [00:11,  1.23it/s][A
14it [00:12,  1.19it/s][A
15it [00:13,  1.19it/s][A
16it [00:14,  1.23it/s][A
17it [00:14,  1.32it/s][A
18it [00:16,  1.06it/s][A
19it [00:17,  1.02it/s][A
20it [00:17,  1.15it/s][A
21it [00:18,  1.27it/s][A
22it [00:19,  1.21it/s][A
23it [00:20,  1.24it/s][A
24it [00:20,  1.30it/s][A
25it [00:21,  1.33it/s][A
26it [00:22,  1.44it/s][A
27it [00:22,  1.49it/s][A
28it [00:23,  1.36it/s][A
29it [00:24,  1.32it/s][A
30it [00:25,  1.37it/s][A
31it [00:25,  1.32it/s][A
32it [00:26,  1.33it/s][A
33it [00:27,  1.35it/s][A
34it [00:28,  1.36it/s][A
35it [00:29,  1.24it/s][A
36it [00:29,  1.40it/s][A
37it [00:30,  

Epoch: 4 	Training Loss: 4.776853 	Validation Loss: 4.708124
Validation loss decreased (4.751 ---> 4.708).  Saving model ...



0it [00:00, ?it/s][A
1it [00:00,  1.53it/s][A
2it [00:01,  1.41it/s][A
3it [00:02,  1.44it/s][A
4it [00:03,  1.27it/s][A
5it [00:03,  1.31it/s][A
6it [00:04,  1.39it/s][A
7it [00:05,  1.36it/s][A
8it [00:05,  1.36it/s][A
9it [00:06,  1.40it/s][A
10it [00:07,  1.41it/s][A
11it [00:08,  1.31it/s][A
12it [00:09,  1.22it/s][A
13it [00:09,  1.33it/s][A
14it [00:10,  1.36it/s][A
15it [00:11,  1.22it/s][A
16it [00:12,  1.10it/s][A
17it [00:13,  1.16it/s][A
18it [00:14,  1.22it/s][A
19it [00:14,  1.30it/s][A
20it [00:15,  1.36it/s][A
21it [00:16,  1.25it/s][A
22it [00:16,  1.37it/s][A
23it [00:17,  1.34it/s][A
24it [00:18,  1.22it/s][A
25it [00:19,  1.34it/s][A
26it [00:20,  1.34it/s][A
27it [00:20,  1.41it/s][A
28it [00:21,  1.37it/s][A
29it [00:22,  1.23it/s][A
30it [00:23,  1.19it/s][A
31it [00:24,  1.13it/s][A
32it [00:25,  1.11it/s][A
33it [00:26,  1.07it/s][A
34it [00:27,  1.02it/s][A
35it [00:34,  2.76s/it][A
36it [00:35,  2.22s/it][A
37it [00:36,  

Epoch: 5 	Training Loss: 4.758449 	Validation Loss: 4.684705
Validation loss decreased (4.708 ---> 4.685).  Saving model ...



0it [00:00, ?it/s][A
1it [00:00,  1.65it/s][A
2it [00:01,  1.67it/s][A
3it [00:01,  1.78it/s][A
4it [00:02,  1.56it/s][A
5it [00:03,  1.53it/s][A
6it [00:03,  1.46it/s][A
7it [00:04,  1.43it/s][A
8it [00:05,  1.42it/s][A
9it [00:06,  1.37it/s][A
10it [00:06,  1.41it/s][A
11it [00:07,  1.42it/s][A
12it [00:08,  1.37it/s][A
13it [00:09,  1.39it/s][A
14it [00:09,  1.32it/s][A
15it [00:10,  1.30it/s][A
16it [00:11,  1.38it/s][A
17it [00:12,  1.22it/s][A
18it [00:13,  1.20it/s][A
19it [00:14,  1.10it/s][A
20it [00:14,  1.19it/s][A
21it [00:16,  1.08it/s][A
22it [00:17,  1.07it/s][A
23it [00:18,  1.05s/it][A
24it [00:19,  1.02s/it][A
25it [00:20,  1.06it/s][A
26it [00:21,  1.00it/s][A
27it [00:22,  1.02it/s][A
28it [00:22,  1.12it/s][A
29it [00:23,  1.18it/s][A
30it [00:24,  1.17it/s][A
31it [00:25,  1.06it/s][A
32it [00:26,  1.04it/s][A
33it [00:27,  1.12it/s][A
34it [00:28,  1.03it/s][A
35it [00:29,  1.01it/s][A
36it [00:30,  1.08it/s][A
37it [00:31,  

Epoch: 6 	Training Loss: 4.737562 	Validation Loss: 4.639932
Validation loss decreased (4.685 ---> 4.640).  Saving model ...



0it [00:00, ?it/s][A
1it [00:00,  1.14it/s][A
2it [00:01,  1.11it/s][A
3it [00:02,  1.13it/s][A
4it [00:03,  1.14it/s][A
5it [00:04,  1.22it/s][A
6it [00:04,  1.28it/s][A
7it [00:06,  1.09it/s][A
8it [00:06,  1.17it/s][A
9it [00:07,  1.12it/s][A
10it [00:08,  1.15it/s][A
11it [00:09,  1.15it/s][A
12it [00:10,  1.10it/s][A
13it [00:11,  1.12it/s][A
14it [00:12,  1.16it/s][A
15it [00:13,  1.13it/s][A
16it [00:14,  1.06it/s][A
17it [00:15,  1.06s/it][A
18it [00:16,  1.00it/s][A
19it [00:17,  1.02s/it][A
20it [00:18,  1.10it/s][A
21it [00:18,  1.10it/s][A
22it [00:19,  1.16it/s][A
23it [00:20,  1.08it/s][A
24it [00:22,  1.01s/it][A
25it [00:22,  1.05it/s][A
26it [00:23,  1.18it/s][A
27it [00:24,  1.25it/s][A
28it [00:25,  1.21it/s][A
29it [00:25,  1.21it/s][A
30it [00:26,  1.14it/s][A
31it [00:27,  1.26it/s][A
32it [00:28,  1.20it/s][A
33it [00:29,  1.12it/s][A
34it [00:30,  1.11it/s][A
35it [00:31,  1.16it/s][A
36it [00:32,  1.14s/it][A
37it [00:33,  

Epoch: 7 	Training Loss: 4.723183 	Validation Loss: 4.635653
Validation loss decreased (4.640 ---> 4.636).  Saving model ...



0it [00:00, ?it/s][A
1it [00:01,  1.11s/it][A
2it [00:01,  1.03s/it][A
3it [00:02,  1.02s/it][A
4it [00:03,  1.09it/s][A
5it [00:04,  1.03it/s][A
6it [00:05,  1.13it/s][A
7it [00:06,  1.06it/s][A
8it [00:07,  1.14it/s][A
9it [00:07,  1.20it/s][A
10it [00:08,  1.18it/s][A
11it [00:09,  1.27it/s][A
12it [00:10,  1.25it/s][A
13it [00:11,  1.21it/s][A
14it [00:11,  1.34it/s][A
15it [00:12,  1.29it/s][A
16it [00:13,  1.30it/s][A
17it [00:14,  1.26it/s][A
18it [00:14,  1.31it/s][A
19it [00:15,  1.36it/s][A
20it [00:16,  1.22it/s][A
21it [00:17,  1.30it/s][A
22it [00:17,  1.31it/s][A
23it [00:18,  1.26it/s][A
24it [00:19,  1.19it/s][A
25it [00:20,  1.19it/s][A
26it [00:21,  1.26it/s][A
27it [00:21,  1.34it/s][A
28it [00:22,  1.36it/s][A
29it [00:23,  1.36it/s][A
30it [00:24,  1.37it/s][A
31it [00:24,  1.51it/s][A
32it [00:25,  1.40it/s][A
33it [00:26,  1.15it/s][A
34it [00:27,  1.31it/s][A
35it [00:28,  1.02it/s][A
36it [00:29,  1.15it/s][A
37it [00:29,  

Epoch: 8 	Training Loss: 4.709727 	Validation Loss: 4.635141
Validation loss decreased (4.636 ---> 4.635).  Saving model ...



0it [00:00, ?it/s][A
1it [00:00,  1.16it/s][A
2it [00:01,  1.24it/s][A
3it [00:02,  1.28it/s][A
4it [00:03,  1.22it/s][A
5it [00:03,  1.28it/s][A
6it [00:04,  1.35it/s][A
7it [00:05,  1.39it/s][A
8it [00:06,  1.24it/s][A
9it [00:06,  1.28it/s][A
10it [00:08,  1.11it/s][A
11it [00:09,  1.05s/it][A
12it [00:10,  1.02it/s][A
13it [00:11,  1.06it/s][A
14it [00:11,  1.11it/s][A
15it [00:12,  1.09it/s][A
16it [00:13,  1.07it/s][A
17it [00:15,  1.01it/s][A
18it [00:15,  1.10it/s][A
19it [00:16,  1.08it/s][A
20it [00:17,  1.10it/s][A
21it [00:18,  1.13it/s][A
22it [00:19,  1.14it/s][A
23it [00:20,  1.12it/s][A
24it [00:21,  1.03s/it][A
25it [00:22,  1.07it/s][A
26it [00:23,  1.01it/s][A
27it [00:24,  1.01s/it][A
28it [00:25,  1.04it/s][A
29it [00:26,  1.04it/s][A
30it [00:27,  1.02s/it][A
31it [00:28,  1.07it/s][A
32it [00:28,  1.14it/s][A
33it [00:29,  1.09it/s][A
34it [00:31,  1.04s/it][A
35it [00:32,  1.03it/s][A
36it [00:32,  1.07it/s][A
37it [00:33,  

Epoch: 9 	Training Loss: 4.706005 	Validation Loss: 4.582660
Validation loss decreased (4.635 ---> 4.583).  Saving model ...



0it [00:00, ?it/s][A
1it [00:00,  1.28it/s][A
2it [00:01,  1.39it/s][A
3it [00:01,  1.45it/s][A
4it [00:02,  1.56it/s][A
5it [00:03,  1.58it/s][A
6it [00:04,  1.39it/s][A
7it [00:04,  1.49it/s][A
8it [00:05,  1.40it/s][A
9it [00:06,  1.41it/s][A
10it [00:06,  1.46it/s][A
11it [00:07,  1.46it/s][A
12it [00:08,  1.43it/s][A
13it [00:08,  1.41it/s][A
14it [00:10,  1.19it/s][A
15it [00:10,  1.32it/s][A
16it [00:11,  1.36it/s][A
17it [00:12,  1.37it/s][A
18it [00:12,  1.38it/s][A
19it [00:13,  1.33it/s][A
20it [00:14,  1.35it/s][A
21it [00:14,  1.34it/s][A
22it [00:15,  1.39it/s][A
23it [00:16,  1.42it/s][A
24it [00:17,  1.33it/s][A
25it [00:18,  1.27it/s][A
26it [00:18,  1.29it/s][A
27it [00:19,  1.33it/s][A
28it [00:20,  1.38it/s][A
29it [00:20,  1.41it/s][A
30it [00:21,  1.44it/s][A
31it [00:22,  1.39it/s][A
32it [00:23,  1.37it/s][A
33it [00:23,  1.25it/s][A
34it [00:24,  1.27it/s][A
35it [00:25,  1.23it/s][A
36it [00:26,  1.27it/s][A
37it [00:27,  

Epoch: 10 	Training Loss: 4.692046 	Validation Loss: 4.575372
Validation loss decreased (4.583 ---> 4.575).  Saving model ...



0it [00:00, ?it/s][A
1it [00:00,  1.11it/s][A
2it [00:02,  1.02s/it][A
3it [00:02,  1.08it/s][A
4it [00:03,  1.10it/s][A
5it [00:04,  1.11it/s][A
6it [00:05,  1.20it/s][A
7it [00:06,  1.15it/s][A
8it [00:07,  1.11it/s][A
9it [00:08,  1.11it/s][A
10it [00:09,  1.06it/s][A
11it [00:10,  1.09it/s][A
12it [00:10,  1.19it/s][A
13it [00:11,  1.22it/s][A
14it [00:12,  1.23it/s][A
15it [00:13,  1.26it/s][A
16it [00:13,  1.21it/s][A
17it [00:14,  1.24it/s][A
18it [00:15,  1.31it/s][A
19it [00:16,  1.33it/s][A
20it [00:16,  1.30it/s][A
21it [00:17,  1.27it/s][A
22it [00:18,  1.19it/s][A
23it [00:19,  1.19it/s][A
24it [00:20,  1.11it/s][A
25it [00:21,  1.12it/s][A
26it [00:22,  1.02it/s][A
27it [00:23,  1.13it/s][A
28it [00:24,  1.14it/s][A
29it [00:24,  1.18it/s][A
30it [00:25,  1.17it/s][A
31it [00:26,  1.17it/s][A
32it [00:27,  1.27it/s][A
33it [00:28,  1.22it/s][A
34it [00:28,  1.23it/s][A
35it [00:29,  1.30it/s][A
36it [00:30,  1.36it/s][A
37it [00:30,  

Epoch: 11 	Training Loss: 4.672331 	Validation Loss: 4.553545
Validation loss decreased (4.575 ---> 4.554).  Saving model ...



0it [00:00, ?it/s][A
1it [00:01,  1.11s/it][A
2it [00:01,  1.00s/it][A
3it [00:02,  1.03s/it][A
4it [00:03,  1.04it/s][A
5it [00:04,  1.06it/s][A
6it [00:05,  1.20it/s][A
7it [00:05,  1.31it/s][A
8it [00:06,  1.29it/s][A
9it [00:07,  1.37it/s][A
10it [00:07,  1.42it/s][A
11it [00:09,  1.10it/s][A
12it [00:10,  1.10it/s][A
13it [00:11,  1.12it/s][A
14it [00:12,  1.10it/s][A
15it [00:13,  1.02it/s][A
16it [00:14,  1.05it/s][A
17it [00:14,  1.18it/s][A
18it [00:15,  1.20it/s][A
19it [00:16,  1.19it/s][A
20it [00:17,  1.20it/s][A
21it [00:17,  1.23it/s][A
22it [00:19,  1.07it/s][A
23it [00:20,  1.06it/s][A
24it [00:21,  1.02it/s][A
25it [00:22,  1.07s/it][A
26it [00:23,  1.06s/it][A
27it [00:24,  1.02s/it][A
28it [00:25,  1.07it/s][A
29it [00:26,  1.08it/s][A
30it [00:26,  1.09it/s][A
31it [00:27,  1.04it/s][A
32it [00:29,  1.04s/it][A
33it [00:30,  1.10s/it][A
34it [00:31,  1.17s/it][A
35it [00:32,  1.06s/it][A
36it [00:33,  1.03s/it][A
37it [00:34,  

Epoch: 12 	Training Loss: 4.662022 	Validation Loss: 4.561559



1it [00:00,  1.03it/s][A
2it [00:01,  1.18it/s][A
3it [00:02,  1.20it/s][A
4it [00:02,  1.32it/s][A
5it [00:03,  1.35it/s][A
6it [00:04,  1.19it/s][A
7it [00:05,  1.27it/s][A
8it [00:06,  1.22it/s][A
9it [00:06,  1.28it/s][A
10it [00:07,  1.49it/s][A
11it [00:08,  1.30it/s][A
12it [00:09,  1.16it/s][A
13it [00:10,  1.26it/s][A
14it [00:10,  1.28it/s][A
15it [00:11,  1.36it/s][A
16it [00:12,  1.41it/s][A
17it [00:13,  1.24it/s][A
18it [00:13,  1.34it/s][A
19it [00:14,  1.44it/s][A
20it [00:15,  1.27it/s][A
21it [00:16,  1.27it/s][A
22it [00:16,  1.33it/s][A
23it [00:17,  1.26it/s][A
24it [00:18,  1.37it/s][A
25it [00:18,  1.40it/s][A
26it [00:19,  1.36it/s][A
27it [00:20,  1.43it/s][A
28it [00:20,  1.47it/s][A
29it [00:21,  1.33it/s][A
30it [00:22,  1.32it/s][A
31it [00:23,  1.26it/s][A
32it [00:24,  1.28it/s][A
33it [00:24,  1.35it/s][A
34it [00:25,  1.39it/s][A
35it [00:26,  1.38it/s][A
36it [00:27,  1.33it/s][A
37it [00:27,  1.29it/s][A
38it [00:

Epoch: 13 	Training Loss: 4.650563 	Validation Loss: 4.566178



1it [00:01,  1.20s/it][A
2it [00:02,  1.15s/it][A
3it [00:03,  1.07s/it][A
4it [00:03,  1.03it/s][A
5it [00:04,  1.09it/s][A
6it [00:05,  1.11it/s][A
7it [00:06,  1.08it/s][A
8it [00:07,  1.02it/s][A
9it [00:08,  1.04it/s][A
10it [00:09,  1.11it/s][A
11it [00:09,  1.20it/s][A
12it [00:10,  1.24it/s][A
13it [00:11,  1.21it/s][A
14it [00:12,  1.25it/s][A
15it [00:12,  1.32it/s][A
16it [00:13,  1.27it/s][A
17it [00:14,  1.18it/s][A
18it [00:15,  1.24it/s][A
19it [00:16,  1.23it/s][A
20it [00:16,  1.33it/s][A
21it [00:18,  1.09it/s][A
22it [00:19,  1.09it/s][A
23it [00:19,  1.18it/s][A
24it [00:20,  1.16it/s][A
25it [00:21,  1.10it/s][A
26it [00:22,  1.18it/s][A
27it [00:23,  1.13it/s][A
28it [00:24,  1.18it/s][A
29it [00:24,  1.26it/s][A
30it [00:25,  1.15it/s][A
31it [00:26,  1.11it/s][A
32it [00:27,  1.12it/s][A
33it [00:28,  1.20it/s][A
34it [00:29,  1.03s/it][A
35it [00:30,  1.03it/s][A
36it [00:31,  1.11it/s][A
37it [00:32,  1.24it/s][A
38it [00:

Epoch: 14 	Training Loss: 4.648955 	Validation Loss: 4.503046
Validation loss decreased (4.554 ---> 4.503).  Saving model ...



0it [00:00, ?it/s][A
1it [00:00,  1.54it/s][A
2it [00:01,  1.56it/s][A
3it [00:02,  1.45it/s][A
4it [00:03,  1.18it/s][A
5it [00:04,  1.10it/s][A
6it [00:05,  1.01it/s][A
7it [00:07,  1.18s/it][A
8it [00:08,  1.17s/it][A
9it [00:08,  1.02s/it][A
10it [00:09,  1.02it/s][A
11it [00:10,  1.11it/s][A
12it [00:11,  1.11it/s][A
13it [00:12,  1.02it/s][A
14it [00:13,  1.06it/s][A
15it [00:14,  1.09it/s][A
16it [00:15,  1.16it/s][A
17it [00:15,  1.16it/s][A
18it [00:16,  1.09it/s][A
19it [00:17,  1.19it/s][A
20it [00:18,  1.14it/s][A
21it [00:19,  1.16it/s][A
22it [00:20,  1.17it/s][A
23it [00:20,  1.22it/s][A
24it [00:21,  1.28it/s][A
25it [00:22,  1.24it/s][A
26it [00:23,  1.09it/s][A
27it [00:25,  1.04s/it][A
28it [00:25,  1.03it/s][A
29it [00:26,  1.11it/s][A
30it [00:27,  1.04it/s][A
31it [00:28,  1.05s/it][A
32it [00:29,  1.03it/s][A
33it [00:30,  1.06it/s][A
34it [00:31,  1.03it/s][A
35it [00:32,  1.08it/s][A
36it [00:33,  1.15it/s][A
37it [00:34,  

Epoch: 15 	Training Loss: 4.640911 	Validation Loss: 4.535429


In [23]:
vgg16_scratch.load_state_dict(torch.load('vgg16_scratch.pt'))
test(loaders, vgg16_scratch, criterion_transfer, torch.cuda.is_available)

Test Loss: 4.488107


Test Accuracy: 3.846154% (32/832)


As mentioned eaelier, when there is not too much of data, it can be tough to train a deep model from scratch. Even our custom model which was much smaller than  the VGG-16 model. Whereas, when we took the pre-trained models, even 101 deep layer model could easily be trained. 