## Cats and Dogs Classification

This notebook walks you through training a simple neural network that classifies an image as either a dog or a cat. We will use PyTorch as the framework to build and train the network. The TACC server provides two NVIDIA Tesla V100 GPUs that can be used to accelerate training.

### Library Install
This section installs all of the libraries needed for this assignment. Please make sure that every library installs correctly before continuing.

In [None]:
!conda install -y pytorch torchvision cuda100 -c pytorch
!pip install googledrivedownloader requests
!pip install split-folders tqdm
!pip install matplotlib
!pip install tensorboardX tensorboard
!pip install cxxfilt

### Library Import
Load all of the libraries needed for this assignment.

In [1]:
import os
import re
import shutil
import split_folders
import tqdm
from google_drive_downloader import GoogleDriveDownloader as gdd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import datetime
import copy
import math
from PIL import Image
from tensorboardX import SummaryWriter

### Dataset Download
We will use a dataset from Kaggle. It contains 12,500 images of cats and 12,500 images of dogs.

In [None]:
# Download and unzip the training dataset
gdd.download_file_from_google_drive(file_id='1TgS3BLPIoc3FHUBrvp6rXaz6g1UJz_2E',
                                    dest_path='./raw.zip',
                                    showsize=False,
                                    overwrite=True,
                                    unzip=True)

In [None]:
# Download and unzip the testing dataset
gdd.download_file_from_google_drive(file_id='1JRMQY-gXp43ag65nP7HMNFEKhkJTxykw',
                                    dest_path='./test.zip',
                                    showsize=False,
                                    overwrite=True,
                                    unzip=True)

### Dataset Preparation
Here we sort the raw images into per-class folders and split them into training and validation sets.

In [None]:
# Create folder
os.makedirs("./raw/cats"   ,exist_ok=True)
os.makedirs("./raw/dogs"   ,exist_ok=True)
os.makedirs("./train"      ,exist_ok=True)
os.makedirs("./train/cats" ,exist_ok=True)
os.makedirs("./train/dogs" ,exist_ok=True)
os.makedirs("./val"        ,exist_ok=True)
os.makedirs("./val/cats"   ,exist_ok=True)
os.makedirs("./val/dogs"   ,exist_ok=True)
os.makedirs("./log"        ,exist_ok=True)
os.makedirs("./checkpoint" ,exist_ok=True)

In [2]:
# Store the folder paths in variables
data_dir = '.'
raw_dir   = f'{data_dir}/raw'
raw_dogs_dir = f'{raw_dir}/dogs'
raw_cats_dir = f'{raw_dir}/cats'
train_dir = f'{data_dir}/train'
train_dogs_dir = f'{train_dir}/dogs'
train_cats_dir = f'{train_dir}/cats'
val_dir = f'{data_dir}/val'
val_dogs_dir = f'{val_dir}/dogs'
val_cats_dir = f'{val_dir}/cats'
log_dir = f'{data_dir}/log'
chk_dir = f'{data_dir}/checkpoint'
test_dir = f'{data_dir}/test'

In [None]:
# Move the cat images into the cats folder and the dog images into the dogs folder
for f in os.listdir(raw_dir):
    src = f'{raw_dir}/{f}'
    # Skip the cats/ and dogs/ folders themselves
    if not os.path.isfile(src):
        continue
    if re.search("cat", f):
        shutil.move(src, raw_cats_dir)
    elif re.search("dog", f):
        shutil.move(src, raw_dogs_dir)
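
After moving the files, it can be worth checking that each class folder holds the expected number of images (12,500 per class for this dataset). The snippet below is a minimal sketch of such a check; `count_images` is a hypothetical helper that is not part of the notebook, and it is demonstrated on a throwaway directory with empty placeholder files rather than on the real data.

```python
import os
import tempfile

def count_images(class_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Count files in class_dir whose extension looks like an image."""
    return sum(
        1
        for name in os.listdir(class_dir)
        if name.lower().endswith(extensions)
    )

# Demonstration on a temporary folder (placeholder files, not real images)
with tempfile.TemporaryDirectory() as tmp:
    for name in ["cat.0.jpg", "cat.1.jpg", "README.txt"]:
        open(os.path.join(tmp, name), "w").close()
    n_images = count_images(tmp)

print(n_images)  # 2
```

On the real data, the same helper could be called as `count_images(raw_cats_dir)` and `count_images(raw_dogs_dir)`.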

In [None]:
# Splitting dataset for training and validation
percentage_for_training = 0.8
percentage_for_validation = 0.2
random_seed = 12345
split_folders.ratio(f'{raw_dir}', output="./", seed=random_seed, ratio=(percentage_for_training, percentage_for_validation))
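
To make the 80/20 split concrete, here is a small sketch of a seeded split in plain Python. It is not the implementation used by `split_folders`; the function `split_names` and the file names are assumptions for illustration only. The point is that a fixed seed makes the split reproducible and that 12,500 images per class yield 10,000 training and 2,500 validation images per class.

```python
import random

def split_names(names, train_ratio=0.8, seed=12345):
    """Shuffle names with a fixed seed and cut them into train/val lists."""
    names = sorted(names)          # fixed starting order
    rng = random.Random(seed)      # local generator, does not touch global state
    rng.shuffle(names)
    cut = round(len(names) * train_ratio)
    return names[:cut], names[cut:]

# Hypothetical file names for one class
cats = [f"cat.{i}.jpg" for i in range(12500)]
train, val = split_names(cats)
train_again, _ = split_names(cats)

print(len(train), len(val))    # 10000 2500
print(train == train_again)    # True: same seed, same split
```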

### Dataset Augmentation and Normalization

In [3]:
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomRotation(5),
        transforms.RandomHorizontalFlip(),
        transforms.RandomResizedCrop(224, scale=(0.96, 1.0), ratio=(0.95, 1.05)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize([224,224]),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
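
The `Normalize` step subtracts a per-channel mean and divides by a per-channel standard deviation after `ToTensor` has scaled the 8-bit pixel values to the range [0, 1]. The sketch below reproduces that arithmetic for a single pixel in plain Python; `normalize_pixel` is a hypothetical helper, not a torchvision function.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Apply the ToTensor scaling and Normalize arithmetic to one RGB pixel."""
    scaled = [v / 255.0 for v in rgb]  # ToTensor: 0..255 -> 0..1
    return [(s - m) / d for s, m, d in zip(scaled, mean, std)]

white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))

print([round(v, 3) for v in white])
print([round(v, 3) for v in black])
```

After normalization the values are no longer confined to [0, 1]; for the red channel they range from roughly -2.12 (black) to 2.25 (white).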

In [4]:
# Configure the batch size and workers
batch_size = 1024
num_workers= 96

# Path of the checkpoint file. It is written after training; the notebook does not currently load it.
check_point = f'{chk_dir}/checkpoint.tar'

image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size,
                                              shuffle=True, num_workers=num_workers)
              for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes
print(class_names) # => ['cats', 'dogs']
print(f'Train image size: {dataset_sizes["train"]}')
print(f'Validation image size: {dataset_sizes["val"]}')

['cats', 'dogs']
Train image size: 20000
Validation image size: 5000
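
With these dataset sizes and `batch_size = 1024`, the number of batches per epoch follows from a ceiling division, because `DataLoader` keeps the smaller final batch by default. The sketch below works through the arithmetic; `batches_per_epoch` is a hypothetical helper.

```python
import math

def batches_per_epoch(num_images, batch_size):
    """Number of batches when the last, smaller batch is kept."""
    return math.ceil(num_images / batch_size)

batch_size = 1024
train_batches = batches_per_epoch(20000, batch_size)
val_batches = batches_per_epoch(5000, batch_size)

# Size of the final (partial) batch in each phase
last_train_batch = 20000 - (train_batches - 1) * batch_size
last_val_batch = 5000 - (val_batches - 1) * batch_size

print(train_batches, last_train_batch)  # 20 544
print(val_batches, last_val_batch)      # 5 904
```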


In [5]:
# Select the device. In this assignment we use only a single GPU (cuda:0).
# If no GPU is available, we fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

### Training (Single Precision)

In [None]:
def train_model(model, criterion, optimizer, scheduler, num_epochs, timestamp):
    since = time.time()
    writer = SummaryWriter('log/'+timestamp+'-single')
    
    # Initialization
    best_model_wts = copy.deepcopy(model.state_dict())
    best_loss = math.inf
    best_acc = 0.
    
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for i, (inputs, labels) in enumerate(dataloaders[phase]):
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history only in the training phase
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                batch_corrects = torch.sum(preds == labels.data)
                running_loss += loss.item() * inputs.size(0)
                running_corrects += batch_corrects

                # Global batch index used as the TensorBoard step
                step = epoch*len(dataloaders[phase])+i
                if phase == 'train' :
                    writer.add_scalar('Train/Current_Running_Loss', loss.item(), step)
                    writer.add_scalar('Train/Current_Running_Corrects', batch_corrects, step)
                    writer.add_scalar('Train/Accum_Running_Loss', running_loss, step)
                    writer.add_scalar('Train/Accum_Running_Corrects', running_corrects, step)
                else :
                    writer.add_scalar('Validation/Current_Running_Loss', loss.item(), step)
                    writer.add_scalar('Validation/Current_Running_Corrects', batch_corrects, step)
                    # Accumulated validation statistics so far in this epoch
                    writer.add_scalar('Validation/Running_Loss', running_loss, step)
                    writer.add_scalar('Validation/Running_Corrects', running_corrects, step)

            # Step the LR scheduler once per epoch, after the optimizer updates
            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))
            
            if phase == 'train' :
                writer.add_scalar('Train/Loss', epoch_loss, epoch)
                writer.add_scalar('Train/Accuracy', epoch_acc, epoch)
            else :
                writer.add_scalar('Validation/Loss', epoch_loss, epoch)
                writer.add_scalar('Validation/Accuracy', epoch_acc, epoch)
            
            # deep copy the model
            if phase == 'val' and epoch_loss < best_loss:
                print(f'New best model found!')
                print(f'New record loss: {epoch_loss}, previous record loss: {best_loss}')
                best_loss = epoch_loss
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()
        

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f} Best val loss: {:.4f}'.format(best_acc, best_loss))

    # load best model weights
    model.load_state_dict(best_model_wts)
    writer.close()
    return model, best_loss, best_acc
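
The training loop accumulates `loss.item() * inputs.size(0)` and divides by the dataset size at the end of the epoch. Because `nn.CrossEntropyLoss` returns the mean loss of a batch by default, this weighting gives the exact per-image mean even when the last batch is smaller. The sketch below illustrates the difference with made-up batch losses; `epoch_mean` and the numbers are hypothetical.

```python
def epoch_mean(batch_means, batch_sizes):
    """Per-image mean loss from per-batch mean losses and batch sizes."""
    total = sum(m * n for m, n in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)

# Made-up example: two full batches and one half-sized final batch
batch_means = [0.5, 0.5, 1.0]
batch_sizes = [1024, 1024, 512]

weighted = epoch_mean(batch_means, batch_sizes)
naive = sum(batch_means) / len(batch_means)

print(weighted)  # 0.6
print(naive)     # larger, because the small last batch is over-weighted
```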

In [None]:
# Download the Pretrained Resnet50 Model
model_conv = torchvision.models.resnet50(pretrained=True)

In [None]:
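# Freeze all pretrained parameters so that they are not updated during training
for param in model_conv.parameters():
    param.requires_grad = False

# Replace the final layer with a 2-class classifier.
# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)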

# Copy the pretrained model to GPU (if available) for further training
model_conv = model_conv.to(device)

# Choose the Criterion
criterion = nn.CrossEntropyLoss()

# Observe that only parameters of final layer are being optimized
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
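
For reference, the learning rate produced by a step schedule with `step_size=7` and `gamma=0.1` can be written in closed form as `base_lr * gamma ** (epoch // step_size)`, where `epoch` counts how many times the scheduler has been stepped. The sketch below evaluates that formula in plain Python; `step_lr` is a hypothetical helper that mirrors the formula, not the scheduler class itself.

```python
def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    """Closed-form learning rate of a step decay schedule."""
    return base_lr * gamma ** (epoch // step_size)

# Learning rate at the start of epochs 0, 6, 7, 14
for epoch in (0, 6, 7, 14):
    print(epoch, step_lr(0.001, epoch))

schedule = [step_lr(0.001, e) for e in (0, 6, 7, 14)]
```

With `num_epochs = 1`, as configured in this notebook, the learning rate never leaves its initial value of 0.001.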

In [None]:
num_epochs = 1
today = datetime.datetime.today() 
timestamp = today.strftime('%Y%m%d-%H%M%S')

# Start the training
model_conv, best_val_loss, best_val_acc = train_model(model_conv,
                                                      criterion,
                                                      optimizer_conv,
                                                      exp_lr_scheduler,
                                                      num_epochs,
                                                      timestamp)

# Save the trained model for future use.
torch.save({'model_state_dict': model_conv.state_dict(),
            'optimizer_state_dict': optimizer_conv.state_dict(),
            'best_val_loss': best_val_loss,
            'best_val_accuracy': best_val_acc,
            'scheduler_state_dict' : exp_lr_scheduler.state_dict(),
            }, check_point)

### Inference (Single Precision)

In [None]:
# List all files in the test data directory
test_data_files = os.listdir(test_dir)

In [None]:
def apply_test_transforms(inp):
    out = transforms.functional.resize(inp, [224,224])
    out = transforms.functional.to_tensor(out)
    out = transforms.functional.normalize(out, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    return out

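def predict_dog_prob_of_single_instance(model, tensor):
    # Add a batch dimension: (C, H, W) -> (1, C, H, W)
    batch = torch.stack([tensor])
    # Send the input to GPU (if any)
    batch = batch.to(device)
    softMax = nn.Softmax(dim = 1)
    # Gradients are not needed for inference
    with torch.no_grad():
        preds = softMax(model(batch))
    # Column 1 is the 'dogs' class (see class_names)
    return preds[0,1].item()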

def test_data_from_fname(fname):
    im = Image.open(f'{test_dir}/{fname}')
    return apply_test_transforms(im)
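
The prediction helper turns the two output scores of the network into probabilities with a softmax and returns the entry for class index 1 (`dogs`). The sketch below shows the same computation in plain Python on made-up scores; the `softmax` function and the logit values are assumptions for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical network outputs in the order [cats, dogs]
probs = softmax([0.3, 2.1])
dog_prob = probs[1]
label = "dog" if dog_prob >= 0.5 else "cat"

print(label, round(dog_prob, 4))
```

With two classes, the dog probability depends only on the difference between the two scores, and the cat probability is `1 - dog_prob`, which is what the plotting cell displays.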

In [None]:
model_conv.eval()

In [None]:
num_of_test_images = 4
if(num_of_test_images<2) :
    num_of_test_images = 2
image_inferenced   = 0
fig, ax = plt.subplots(num_of_test_images, figsize=(num_of_test_images*5, num_of_test_images*5))
fig.tight_layout(pad=5)

for fname in test_data_files :    
    im         = Image.open(f'{test_dir}/{fname}')
    imstar     = apply_test_transforms(im)    
    outputs = predict_dog_prob_of_single_instance(model_conv, imstar)
    ax[image_inferenced].imshow(im)
    ax[image_inferenced].axis('on')
    if(outputs<0.5) :
        ax[image_inferenced].set_title('predicted: cat \n probability: ' + str(1-outputs))
    else :
        ax[image_inferenced].set_title('predicted: dog \n probability: ' + str(outputs))
    image_inferenced += 1
    if(image_inferenced>=num_of_test_images) :
        break

### Training (Mixed Precision)

In [7]:
def train_model(model, criterion, optimizer, scheduler, num_epochs, timestamp):
    since = time.time()
    writer = SummaryWriter('log/'+timestamp+'-mixed')
    
    # Initialization
    best_model_wts = copy.deepcopy(model.state_dict())
    best_loss = math.inf
    best_acc = 0.
    
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for i, (inputs, labels) in enumerate(dataloaders[phase]):
                # Move the batch to the device and cast it to half precision
                inputs = inputs.to(device, dtype=torch.float16)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history only in the training phase
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                batch_corrects = torch.sum(preds == labels.data)
                running_loss += loss.item() * inputs.size(0)
                running_corrects += batch_corrects

                # Global batch index used as the TensorBoard step
                step = epoch*len(dataloaders[phase])+i
                if phase == 'train' :
                    writer.add_scalar('Train/Current_Running_Loss', loss.item(), step)
                    writer.add_scalar('Train/Current_Running_Corrects', batch_corrects, step)
                    writer.add_scalar('Train/Accum_Running_Loss', running_loss, step)
                    writer.add_scalar('Train/Accum_Running_Corrects', running_corrects, step)
                else :
                    writer.add_scalar('Validation/Current_Running_Loss', loss.item(), step)
                    writer.add_scalar('Validation/Current_Running_Corrects', batch_corrects, step)
                    # Accumulated validation statistics so far in this epoch
                    writer.add_scalar('Validation/Running_Loss', running_loss, step)
                    writer.add_scalar('Validation/Running_Corrects', running_corrects, step)

            # Step the LR scheduler once per epoch, after the optimizer updates
            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))
            
            if phase == 'train' :
                writer.add_scalar('Train/Loss', epoch_loss, epoch)
                writer.add_scalar('Train/Accuracy', epoch_acc, epoch)
            else :
                writer.add_scalar('Validation/Loss', epoch_loss, epoch)
                writer.add_scalar('Validation/Accuracy', epoch_acc, epoch)
            
            # deep copy the model
            if phase == 'val' and epoch_loss < best_loss:
                print(f'New best model found!')
                print(f'New record loss: {epoch_loss}, previous record loss: {best_loss}')
                best_loss = epoch_loss
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()
        

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f} Best val loss: {:.4f}'.format(best_acc, best_loss))

    # load best model weights
    model.load_state_dict(best_model_wts)
    writer.close()
    return model, best_loss, best_acc

In [8]:
# Download the Pretrained Resnet50 Model
model_conv = torchvision.models.resnet50(pretrained=True)

In [9]:
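# Freeze all pretrained parameters so that they are not updated during training
for param in model_conv.parameters():
    param.requires_grad = False

# Replace the final layer with a 2-class classifier.
# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)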

# Cast the model to half precision (float16) and copy it to the GPU (if available)
model_conv = model_conv.to(torch.float16).to(device)

# Choose the Criterion
criterion = nn.CrossEntropyLoss()

# Observe that only parameters of final layer are being optimized
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
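
The cell above casts the whole model to `float16`. Half precision has a much narrower range and coarser resolution than `float32`, which is why mixed-precision recipes commonly scale the loss before the backward pass. The sketch below only illustrates the number format using Python's `struct` module (format code `'e'` is IEEE 754 half precision); it is not what the notebook does, and `to_fp16`, the gradient value, and the scale factor are assumptions chosen for illustration.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

# A very small gradient underflows to zero in half precision
tiny_grad = 1e-8
flushed = to_fp16(tiny_grad)
print(flushed)  # 0.0

# Scaling before the cast keeps the value representable;
# dividing afterwards (in higher precision) recovers it approximately
loss_scale = 1024.0
scaled = to_fp16(tiny_grad * loss_scale)
recovered = scaled / loss_scale
print(recovered)

# Resolution is coarse for large values: adding 1 to 4096 is lost
stalled = to_fp16(4096.0 + 1.0)
print(stalled)  # 4096.0
```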

In [10]:
# Start the training
num_epochs = 1
today = datetime.datetime.today() 
timestamp = today.strftime('%Y%m%d-%H%M%S')
model_conv, best_val_loss, best_val_acc = train_model(model_conv,
                                                      criterion,
                                                      optimizer_conv,
                                                      exp_lr_scheduler,
                                                      num_epochs,
                                                      timestamp)

# Save the trained model for future use.
torch.save({'model_state_dict': model_conv.state_dict(),
            'optimizer_state_dict': optimizer_conv.state_dict(),
            'best_val_loss': best_val_loss,
            'best_val_accuracy': best_val_acc,
            'scheduler_state_dict' : exp_lr_scheduler.state_dict(),
            }, check_point)

Epoch 0/0
----------
train Loss: 0.4895 Acc: 0.7947

### Inference (Mixed Precision)