# Introduction
Image classification is the process of taking an input (like a picture) and outputting a class (like “cat”) or a probability that the input is a particular class (“there’s a 90% probability that this input is a cat”). You can look at a picture and know that you’re looking at a terrible shot of your own face, but how can a computer learn to do that? With a convolutional neural network!

-----
# Goals
We would like you to build a neural network from advanced DNN modules (e.g. convolution layers, ReLU, pooling and fully connected layers) that predicts the category of an input image.

-------------
## Packages
Let's first import the necessary packages:

In [1]:
import os

import torch
import torch.nn as nn

-----
## GPU Device Configuration
Then, we set up and configure our computational device:
whether we run the calculations on a GPU or on the CPU.
We use the torch.device() and torch.cuda.is_available() functions to configure it.

In [None]:
# Use the GPU if PyTorch can see one, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

-----
## Configuration
### Hyperparameters
We then set up the hyperparameters that our model needs:
1. learning rate
2. batch size for training
3. batch size for testing
4. number of epochs
5. output directory

In [None]:
learning_rate = 0.01
batch_train = 100
batch_test = 100
epochs = 100
directory = "./output"

Create the output directory if it does not already exist, using os.path.exists() to check whether it exists and os.makedirs() to create it.

In [None]:
if not os.path.exists(directory):
    os.makedirs(directory)
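Since Python 3.2, os.makedirs() also accepts exist_ok=True, which folds the existence check into the same call and avoids a race if the directory appears between the check and the creation. A minimal sketch; the helper name ensure_dir is hypothetical and not part of the original notebook:

```python
import os


def ensure_dir(path):
    """Hypothetical helper: create `path` and any missing parents; do nothing if it already exists."""
    # exist_ok=True suppresses the FileExistsError raised when the directory is already there
    os.makedirs(path, exist_ok=True)
    return path
```

In the cell above this would be a single call, ensure_dir(directory).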

-----
##  Data Loading
Next, we are going to load our data. 
### We need to prepare our data:

### We first import the necessary libraries for data loading

In [None]:
import torchvision
import torchvision.transforms as transforms

-----
###  Image processing
Then, we define an image preprocessing object that our data loader can use directly to preprocess our data.
We use the PyTorch (torchvision.transforms) API to perform the data processing.
1. Use transforms.Compose()
2. Use transforms.RandomHorizontalFlip()
3. Add any extra transforms you like.
4. Create this transform for both the training set and the testing set. Note that the testing split does not need any data augmentation; it only needs to be converted to a tensor with transforms.ToTensor().

In [None]:
train_transform = transforms.Compose([transforms.RandomHorizontalFlip(), transforms.ToTensor()])
test_transform = transforms.Compose([transforms.ToTensor()])
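For intuition, here is a rough NumPy imitation of what the two transforms do to a single image. It is an illustrative sketch, not torchvision's implementation, and the helper names to_tensor_like and random_hflip_like are hypothetical. ToTensor turns an H×W×C image with values in [0, 255] into a C×H×W float tensor in [0.0, 1.0], and RandomHorizontalFlip mirrors the image left to right with probability p (0.5 by default).

```python
import numpy as np


def to_tensor_like(img_hwc):
    """Hypothetical helper mimicking transforms.ToTensor(): HWC uint8 [0, 255] -> CHW float32 [0, 1]."""
    return img_hwc.astype(np.float32).transpose(2, 0, 1) / 255.0


def random_hflip_like(img_hwc, p=0.5, rng=None):
    """Hypothetical helper mimicking transforms.RandomHorizontalFlip(p): mirror the width axis with probability p."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p:
        return img_hwc[:, ::-1, :]  # reverse the column (width) order
    return img_hwc


# A fake 32x32 RGB image, the same size as a CIFAR-10 sample
image = np.random.default_rng(0).integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
train_sample = to_tensor_like(random_hflip_like(image))  # training: augmentation + conversion
test_sample = to_tensor_like(image)                       # testing: conversion only
print(train_sample.shape, test_sample.dtype)  # (3, 32, 32) float32
```

Flipping is a safe augmentation for CIFAR-10 because a mirrored cat or truck is still the same class, which is why it is applied only to the training split.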

-----
### We then download and prepare the data with the transforms defined above:
1. Use torchvision.datasets.CIFAR10() with the root, train, download and transform keyword arguments.
2. Use the same command to create both the training split and the test split.
3. Use torch.utils.data.DataLoader() to create a data loader for both the training split and the test split.

In [None]:
train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=train_transform)
train_loader = torch.utils.data.DataLoader(dataset=train_set, batch_size=batch_train, shuffle=True)
test_set = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=test_transform)
test_loader = torch.utils.data.DataLoader(dataset=test_set, batch_size=batch_test, shuffle=False)
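With shuffle=True, the training loader reshuffles the training images every epoch, while the test loader serves the test images in a fixed order. As a quick sanity check of how many iterations each loop runs, the sketch below assumes CIFAR-10's standard 50,000 training / 10,000 test split and DataLoader's default drop_last=False (the last, possibly smaller, batch is kept). batches_per_epoch is a hypothetical helper; its result should match len(train_loader) and len(test_loader).

```python
import math

CIFAR10_TRAIN_SIZE = 50_000  # images in the CIFAR-10 training split
CIFAR10_TEST_SIZE = 10_000   # images in the CIFAR-10 test split


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Hypothetical helper: number of batches a DataLoader yields per pass over the dataset."""
    if drop_last:
        return num_samples // batch_size  # the incomplete final batch is discarded
    return math.ceil(num_samples / batch_size)


print(batches_per_epoch(CIFAR10_TRAIN_SIZE, 100))  # 500
print(batches_per_epoch(CIFAR10_TEST_SIZE, 100))   # 100
```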

-----
##  Network
Next, we are going to design our GoogLeNet.
### First, we define the building blocks of our GoogLeNet class
### Refer to the paper below ("Going Deeper with Convolutions", Szegedy et al.) to understand the structure:
### https://arxiv.org/abs/1409.4842



------
### Inception Module with dimension reductions (there are many ways to implement it)
1. Create a Python class called Inception that inherits from nn.Module.

2. Create an init function to initialize the class.
    1. It takes 7 arguments: in_planes, kernel_1_x, kernel_3_in, kernel_3_x, kernel_5_in, kernel_5_x and pool_planes.
    
    2. It consists of 4 branches: b1, b2, b3 and b4.
    
    3. b1 is a block consisting of a 1x1 2D convolution, a 2D batch normalization layer and a ReLU activation function.
    
    4. b2 is a block consisting of two 2D convolutions (1x1, then 3x3), two 2D batch normalization layers and two ReLU activation functions.
    
    5. b3 is a block consisting of two 2D convolutions (1x1, then 5x5), two 2D batch normalization layers and two ReLU activation functions.
    
    6. b4 is a block consisting of a 3x3 max-pooling layer, a 1x1 2D convolution, a 2D batch normalization layer and a ReLU activation function.
    
3. Create the forward function

    1. This forward function passes the input through every branch and returns the concatenation of all the outputs along the channel dimension.

In [None]:
class Inception(nn.Module):
    def __init__(self, in_planes, kernel_1_x, kernel_3_in, kernel_3_x, kernel_5_in, kernel_5_x, pool_planes):
        super(Inception, self).__init__()
        
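        # e.g. the first inception block is Inception(192, 64, 96, 128, 16, 32, 32)
        
        # The sizes below follow the paper's 28*28*192 input; the padding keeps any spatial size unchanged (32*32 on CIFAR-10)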
        
        # 1x1 conv branch (28*28*192 -> 28*28*64)
        self.b1 = nn.Sequential(
            nn.Conv2d(in_planes, kernel_1_x, kernel_size=1),
            nn.BatchNorm2d(kernel_1_x),
            nn.ReLU(True),
        )

        # 1x1 conv -> 3x3 conv branch (28*28*192 -> 28*28*96 -> 28*28*128)
        self.b2 = nn.Sequential(
            nn.Conv2d(in_planes, kernel_3_in, kernel_size=1),
            nn.BatchNorm2d(kernel_3_in),
            nn.ReLU(True),
            
            nn.Conv2d(kernel_3_in, kernel_3_x, kernel_size=3, padding = 1), # add padding=1 to maintain the size of 28.
            nn.BatchNorm2d(kernel_3_x),
            nn.ReLU(True)
        )
        
        # 1x1 conv -> 5x5 conv branch (28*28*192 -> 28*28*16 -> 28*28*32)
        self.b3 = nn.Sequential(
            nn.Conv2d(in_planes, kernel_5_in, kernel_size=1),
            nn.BatchNorm2d(kernel_5_in),
            nn.ReLU(True),
            
            nn.Conv2d(kernel_5_in, kernel_5_x, kernel_size=5, padding = 2), # add padding=2 to maintain the size of 28.
            nn.BatchNorm2d(kernel_5_x),
            nn.ReLU(True)
        )
         

        # 3x3 pool -> 1x1 conv branch (28*28*192 -> 28*28*192 -> 28*28*32)
        self.b4 = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1), # again, add padding=1 to maintain the size of 28. 
            nn.Conv2d(in_planes, pool_planes, kernel_size=1),
            nn.BatchNorm2d(pool_planes),
            nn.ReLU(True)
        )
       

    def forward(self, x):
        b1_result = self.b1(x) #28*28*64
        b2_result = self.b2(x) #28*28*128
        b3_result = self.b3(x) #28*28*32
        b4_result = self.b4(x) #28*28*32
        channel_concat = torch.cat([b1_result, b2_result, b3_result, b4_result], 1) #28*28*256
        return channel_concat
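Two pieces of arithmetic make the class above work. First, a convolution or pooling layer with kernel k, stride s and padding p maps a spatial size n to floor((n + 2p − k) / s) + 1, so with stride 1 the choice p = (k − 1) / 2 (padding=1 for 3x3, padding=2 for 5x5) keeps the size unchanged, which is what allows the four branch outputs to be concatenated. Second, because torch.cat(..., 1) joins along the channel dimension, the block's output depth is kernel_1_x + kernel_3_x + kernel_5_x + pool_planes. A plain-Python check of both, with hypothetical helper names:

```python
def conv_out_size(n, kernel_size, stride=1, padding=0):
    """Hypothetical helper: spatial output size of Conv2d / MaxPool2d (dilation 1, floor mode)."""
    return (n + 2 * padding - kernel_size) // stride + 1


def inception_out_channels(in_planes, kernel_1_x, kernel_3_in, kernel_3_x,
                           kernel_5_in, kernel_5_x, pool_planes):
    """Hypothetical helper: depth of the concatenated output, i.e. the sum of each branch's last layer."""
    return kernel_1_x + kernel_3_x + kernel_5_x + pool_planes


# 1x1 conv, 3x3 conv / 3x3 max pool (padding=1), 5x5 conv (padding=2): all keep a 28x28 input at 28x28
print([conv_out_size(28, k, padding=p) for k, p in [(1, 0), (3, 1), (5, 2)]])  # [28, 28, 28]

print(inception_out_channels(192, 64, 96, 128, 16, 32, 32))  # 256
```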


-----
### GoogLeNet Module (there are many ways to implement it)


1. Create a Python class called GoogLeNet that inherits from nn.Module.

2. Create an init function to initialize the class. It consists of:

    1. A variable that holds all the layers before the first Inception block: a 2D convolution with kernel_size=3, padding=1 and 192 output channels, a 2D batch normalization layer and a ReLU activation function.
    
    2. Two Inception blocks
    
    3. Maxpooling layer
    
    4. Five Inception blocks
    
    5. Maxpooling layer
    
    6. Two Inception blocks  
    
    7. Average Pooling layer
    
    8. A fully connected layer.
    
3. Create the forward function

    1. This forward function passes the input through every block in order and returns the output (one score per class).

In [None]:
class GoogLeNet(nn.Module):

    def __init__(self):
        
        super(GoogLeNet, self).__init__()
        
        # A. Before inception
        self.initial = nn.Sequential(
            nn.Conv2d(3, 192, padding = 1, kernel_size = 3),
            nn.BatchNorm2d(192),
            nn.ReLU(True)
        )
        
        # B. 2 inception blocks
        self.inception1 = Inception(192,  64,  96, 128, 16, 32, 32)
        self.inception2 = Inception(256, 128, 128, 192, 32, 96, 64)

        # C. Maxpooling layer 1
        self.max_pooling1 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # D. 5 inception blocks (3-7)
        self.inception3 = Inception(480, 192,  96, 208, 16,  48,  64)
        self.inception4 = Inception(512, 160, 112, 224, 24,  64,  64)
        self.inception5 = Inception(512, 128, 128, 256, 24,  64,  64)
        self.inception6 = Inception(512, 112, 144, 288, 32,  64,  64)
        self.inception7 = Inception(528, 256, 160, 320, 32, 128, 128)
        
        # E. Maxpooling layer 2
        self.max_pooling2 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        
        # F. 2 inception blocks (8-9)
        self.inception8 = Inception(832, 256, 160, 320, 32, 128, 128)
        self.inception9 = Inception(832, 384, 192, 384, 48, 128, 128)

        # G. Average pooling layer
        self.average_pooling = nn.AvgPool2d(kernel_size=8, stride=1)

        # H. A fully connected layer 384+384+128+128=1024 & 10 different classes
        self.fully_connected = nn.Linear(1024,10)

    def forward(self,x):
        x = self.initial(x)
        x = self.inception1(x)
        x = self.inception2(x)
        x = self.max_pooling1(x)
        x = self.inception3(x)
        x = self.inception4(x)
        x = self.inception5(x)
        x = self.inception6(x)
        x = self.inception7(x)
        x = self.max_pooling2(x)
        x = self.inception8(x)
        x = self.inception9(x)
        x = self.average_pooling(x)
        #flatten it
        x = x.reshape(x.size(0), -1)
        x = self.fully_connected(x)

        return x
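Because self.fully_connected expects exactly 1024 inputs, it helps to trace the tensor shape through forward for a 3x32x32 CIFAR-10 image. The sketch below does this bookkeeping in plain Python, reusing the size formula floor((n + 2p − k) / s) + 1 and the nine Inception configurations above; it is a hand check, not part of the model, and trace_googlenet is a hypothetical name. The two stride-2 max-pooling layers take 32 → 16 → 8, which is why AvgPool2d(kernel_size=8) reduces each of the 1024 channels to a single value.

```python
def out_size(n, k, s=1, p=0):
    """floor((n + 2p - k) / s) + 1 for Conv2d / MaxPool2d / AvgPool2d."""
    return (n + 2 * p - k) // s + 1


# (in_planes, kernel_1_x, kernel_3_in, kernel_3_x, kernel_5_in, kernel_5_x, pool_planes), copied from GoogLeNet
INCEPTION_CFGS = [
    (192, 64, 96, 128, 16, 32, 32), (256, 128, 128, 192, 32, 96, 64),
    (480, 192, 96, 208, 16, 48, 64), (512, 160, 112, 224, 24, 64, 64),
    (512, 128, 128, 256, 24, 64, 64), (512, 112, 144, 288, 32, 64, 64),
    (528, 256, 160, 320, 32, 128, 128), (832, 256, 160, 320, 32, 128, 128),
    (832, 384, 192, 384, 48, 128, 128),
]


def trace_googlenet(size=32):
    """Hypothetical helper: list of (stage name, (channels, height, width)) following GoogLeNet.forward."""
    c, n = 192, out_size(size, 3, p=1)          # initial 3x3 conv with padding=1
    shapes = [("initial", (c, n, n))]
    for idx, cfg in enumerate(INCEPTION_CFGS, start=1):
        assert cfg[0] == c, f"inception{idx} expects {cfg[0]} channels, got {c}"
        c = cfg[1] + cfg[3] + cfg[5] + cfg[6]  # concatenated branch outputs
        shapes.append((f"inception{idx}", (c, n, n)))
        if idx in (2, 7):                      # max_pooling1 follows inception2, max_pooling2 follows inception7
            n = out_size(n, 3, s=2, p=1)
            shapes.append((f"max_pooling{1 if idx == 2 else 2}", (c, n, n)))
    n = out_size(n, 8)                         # average_pooling: kernel 8, stride 1
    shapes.append(("average_pooling", (c, n, n)))
    return shapes


for name, shape in trace_googlenet():
    print(f"{name:16s} {shape}")
```

The assert inside the loop also confirms that each block's in_planes equals the previous block's output depth, which is an easy place to introduce a typo when copying the configurations.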

### Next, we create the network and send it to the target device

In [None]:
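# Note: this rebinds the name GoogLeNet from the class to the model instance; the training code below uses this name
GoogLeNet = GoogLeNet()
GoogLeNet.to(device)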

### Finally, we create:
 1. An optimizer (we use the Adam optimizer here)
 2. A criterion (cross-entropy loss) function
 3. A scheduler that decays the learning rate of each parameter group by gamma once the number of epochs reaches one of the milestones.

In [None]:
# 1. optimizer
optimizer = torch.optim.Adam(GoogLeNet.parameters(), lr = learning_rate)

# 2. criterion
criterion = torch.nn.CrossEntropyLoss()

# 3. scheduler 
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30,80], gamma=0.1)
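MultiStepLR multiplies the learning rate by gamma each time the epoch counter passes a milestone, so the rate in effect during epoch e is learning_rate × gamma^(number of milestones ≤ e). The plain-Python sketch below reproduces that closed form (it is not the scheduler's internal code) and assumes scheduler.step() is called once per epoch, after that epoch's optimizer updates; multistep_lr is a hypothetical name.

```python
def multistep_lr(epoch, base_lr=0.01, milestones=(30, 80), gamma=0.1):
    """Hypothetical helper: learning rate in effect during `epoch` (0-based) under MultiStepLR."""
    passed = sum(1 for m in milestones if m <= epoch)  # milestones already reached
    return base_lr * gamma ** passed


for epoch in (0, 29, 30, 79, 80, 99):
    print(epoch, multistep_lr(epoch))
```

With the values above, epochs 0 to 29 train at 0.01, epochs 30 to 79 at 0.001 and epochs 80 onward at 0.0001.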

-----
##  Training
Then, we are going to train our network:

1. Set the network to training mode.
2. Initialize the training loss, the number of samples seen and the number of correct predictions.
3. For each batch in the training split:
    1. Move the data to the target device using .to()
    2. Reset the gradients of the optimizer.
    3. Feed the data forward through GoogLeNet.
    4. Use the criterion function to compute the loss.
    5. Backpropagate the loss.
    6. Update the network parameters using the optimizer.
    7. Accumulate the training loss.
    8. Find the prediction. Hint: use torch.max().
    9. Increment the number of samples seen.
    10. Increment the number of correct predictions.
    11. Print a log.
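Step 4 uses nn.CrossEntropyLoss, which applies a log-softmax to the raw class scores (logits) and then averages the negative log-probability of the correct class over the batch. A NumPy sketch of that computation with the default settings (mean reduction, no class weights), for intuition only; cross_entropy is a hypothetical name:

```python
import numpy as np


def cross_entropy(logits, labels):
    """Hypothetical helper: mean cross-entropy over a batch, like nn.CrossEntropyLoss() with defaults.

    logits: (batch, num_classes) raw scores; labels: (batch,) integer class indices.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))  # log-softmax
    return -log_probs[np.arange(len(labels)), labels].mean()  # pick each row's true-class log-probability


# Uniform scores over 10 classes give a loss of ln(10), roughly 2.3026
print(cross_entropy(np.zeros((4, 10)), np.array([0, 3, 5, 9])))
```

This is also a useful reference point when reading the logs: an untrained 10-class network should start with a loss near 2.3.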
    
-----
##  Testing
Then, we are going to test our model:

1. Set the network to evaluation mode.
2. Initialize the test loss, the number of samples seen and the number of correct predictions.
3. For each batch in the test split, wrapped in torch.no_grad():
    1. Move the data to the target device using .to()
    2. Feed the data forward through GoogLeNet.
    3. Use the criterion function to compute the loss.
    4. Accumulate the test loss.
    5. Find the prediction. Hint: use torch.max().
    6. Increment the number of samples seen.
    7. Increment the number of correct predictions.
    8. Print a log.
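Steps 5 to 7 reduce each row of scores to a predicted class and count the matches. torch.max(outputs, dim=1) returns both the maximum values and their indices, and only the indices (the predicted classes) are needed. The NumPy sketch below does the same bookkeeping with np.argmax standing in for those indices; comparing whole arrays at once replaces a per-element Python loop. count_correct is a hypothetical name.

```python
import numpy as np


def count_correct(logits, labels):
    """Hypothetical helper: number of rows whose highest-scoring class equals the label."""
    predictions = np.argmax(logits, axis=1)  # index of the largest score in each row
    return int((predictions == labels).sum())


logits = np.array([[0.1, 2.0, 0.3],   # predicts class 1
                   [1.5, 0.2, 0.1],   # predicts class 0
                   [0.0, 0.4, 0.9]])  # predicts class 2
labels = np.array([1, 0, 1])
correct, total = count_correct(logits, labels), len(labels)
print(f"{correct}/{total} correct")  # 2/3 correct
```

Accumulating correct and total over all batches and dividing at the end gives the split's accuracy.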

-----
##  Epochs:
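For each epoch:
1. we train our model
2. we test our model
3. we step our scheduler (after the optimizer updates, as PyTorch 1.1 and later expect)
4. we record the losses and keep track of the best testing accuracy
5. we print the accuracy and the loss

After the last epoch, we save the model.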

In [None]:
def batch_train_and_test(train_loader, test_loader, device, optimizer):
    
    # Train-1. Set the network to training mode (batch norm uses and updates batch statistics)
    GoogLeNet.train()
    
    # Train-2. Initialize the train loss, total data and number of correct predictions.
    train_loss = 0
    train_total_data = 0
    train_correct_predictions = 0
    
    # Train-3. For each batch in the training split
    for (data, value) in train_loader:  # data: images, value: ground-truth labels
        
        # 3-1. Put the data to the correct devices using .to()
        data = data.to(device)
        value = value.to(device)
        
        # 3-2. Reset the gradient of the optimizer
        optimizer.zero_grad()
        
        # 3-3. Feed the data forward to the GoogLeNet
        forward_result = GoogLeNet(data)
        
        # 3-4. Use the criterion function to compute the loss term
        loss = criterion(forward_result, value)
    
        # 3-5. Backprop the loss
        loss.backward()
        
        # 3-6. Update the network parameters using the optimizer
        optimizer.step()
        
        # 3-7. Accumulate the training loss (a Python float, so no graph is kept alive)
        train_loss += loss.item()
        
        # 3-8. Find the prediction: index of the largest score (the max values are unused)
        _, prediction = torch.max(forward_result, dim=1)
        
        # 3-9. Increment the data size
        train_total_data += value.size(0)
        
        # 3-10. Increment the correct predictions (vectorized instead of a per-element loop)
        train_correct_predictions += (prediction == value).sum().item()
    
    # Train-4. Compute the train accuracy and the mean loss per batch
    train_accuracy = train_correct_predictions / train_total_data
    train_loss = train_loss / len(train_loader)
        
    #########################################################################################################
    
    # Test-1: Set the network to evaluation mode (batch norm uses its running statistics)
    GoogLeNet.eval()
    
    # Test-2: Initialize the test loss, total data and number of correct predictions.
    test_loss = 0
    test_total_data = 0
    test_correct_predictions = 0
    
    # Test-3. For each batch in the test split, wrapped in torch.no_grad() to skip gradient tracking
    with torch.no_grad():
        for (data, value) in test_loader:
          
            # 3-1. Put the data to the correct devices using .to()
            data = data.to(device)
            value = value.to(device)
            
            # 3-2. Feed the data forward to the GoogLeNet
            forward_result = GoogLeNet(data)
            
            # 3-3. Use the criterion function to compute the loss term
            loss = criterion(forward_result, value)
            
            # 3-4. Accumulate the test loss
            test_loss += loss.item()
            
            # 3-5. Find the prediction
            _, prediction = torch.max(forward_result, dim=1)
            
            # 3-6. Increment the data size
            test_total_data += value.size(0)
            
            # 3-7. Increment the correct predictions
            test_correct_predictions += (prediction == value).sum().item()

    # Test-4. Compute the test accuracy and the mean loss per batch
    # (averaging per batch keeps train and test losses on the same scale for the plot below)
    test_accuracy = test_correct_predictions / test_total_data
    test_loss = test_loss / len(test_loader)
    
    return train_accuracy, train_loss, test_accuracy, test_loss

In [None]:
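epochs_list = []
train_loss_list = []
test_loss_list = []
best_test_accuracy = 0.0
best_epoch = -1

for i in range(1, epochs + 1):
    
    # 1. Run the batch_train_and_test function
    train_accuracy, train_loss, test_accuracy, test_loss = batch_train_and_test(train_loader, test_loader, device, optimizer)
    
    # 2. Step the scheduler once per epoch, after the optimizer updates
    #    (since PyTorch 1.1, stepping it before optimizer.step() skips the first value of the schedule)
    scheduler.step()
    
    # 3. Append the values to the corresponding lists
    train_loss_list.append(train_loss)
    test_loss_list.append(test_loss)
    epochs_list.append(i)
    
    # 4. Keep track of the best test accuracy
    if test_accuracy > best_test_accuracy:
        best_test_accuracy = test_accuracy
        best_epoch = i
    
    # 5. Print the train loss, test loss, train accuracy, and test accuracy
    print("epoch", i, ":")
    print("train accuracy: {:.5f} | test accuracy: {:.5f}".format(train_accuracy, test_accuracy))
    print("train loss: {:.5f} | test loss: {:.5f}".format(train_loss, test_loss))

print("Best epoch was", best_epoch, "with test accuracy {:.5f}".format(best_test_accuracy))

# 6. Save the model weights (state_dict) in the output directory created earlier
torch.save(GoogLeNet.state_dict(), os.path.join(directory, "GoogLeNet.pth"))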


In [None]:
# Plot the train loss and test loss over epoch
import matplotlib.pyplot as plt

plt.title('Train Loss vs Test Loss')
plt.plot(epochs_list, train_loss_list, label='Train')
plt.plot(epochs_list, test_loss_list, label='Test')
plt.legend()
plt.show()