# Introduction

In this project, you will build a neural network of your own design to classify images from the CIFAR-10 dataset.

To meet the requirements for this project, you will need to achieve an accuracy greater than 45%. 
If you want to beat Detectocorp's algorithm, you'll need to achieve an accuracy greater than 70%. 
(Beating Detectocorp's algorithm is not a requirement for passing this project, but you're encouraged to try!)

Some of the benchmark results on CIFAR-10 include:

78.9% Accuracy | [Deep Belief Networks; Krizhevsky, 2010](https://www.cs.toronto.edu/~kriz/conv-cifar10-aug2010.pdf)

90.6% Accuracy | [Maxout Networks; Goodfellow et al., 2013](https://arxiv.org/pdf/1302.4389.pdf)

96.0% Accuracy | [Wide Residual Networks; Zagoruyko et al., 2016](https://arxiv.org/pdf/1605.07146.pdf)

99.0% Accuracy | [GPipe; Huang et al., 2018](https://arxiv.org/pdf/1811.06965.pdf)

98.5% Accuracy | [Rethinking Recurrent Neural Networks and Other Improvements for Image Classification; Nguyen et al., 2020](https://arxiv.org/pdf/2007.15161.pdf)

Research with this dataset is ongoing. Notably, many of these networks are quite large and quite expensive to train. 

## Imports

In [8]:
## This cell contains the essential imports you will need – DO NOT CHANGE THE CONTENTS! ##
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

## Load the Dataset

Specify your transforms as a list first.
The transforms module is already loaded as `transforms`.

CIFAR-10 is fortunately included in the torchvision module.
Then, you can create your dataset using the `CIFAR10` object from `torchvision.datasets` ([the documentation is available here](https://pytorch.org/docs/stable/torchvision/datasets.html#cifar)).
Make sure to specify `download=True`! 

Once your dataset is created, you'll also need to define a `DataLoader` from the `torch.utils.data` module for both the train and the test set.

In [9]:
# Define transforms

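# Normalize takes one mean and one standard deviation per channel (red, green, blue).
# The first tuple holds the three means and the second tuple holds the three standard deviations.
# With 0.5 for both, pixel values in [0, 1] are mapped to [-1, 1].
# Note: the same transform, including the random vertical flip, is applied to both the training and the test set.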

transform = transforms.Compose(
    [transforms.RandomVerticalFlip(0.5), transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        transform=transform, download=True)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=5,
                                         num_workers=3, shuffle=True)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       transform=transform, download=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=5,
                                         num_workers=3, shuffle=False)


# The 10 classes in the dataset
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Files already downloaded and verified
Files already downloaded and verified


## Explore the Dataset
Using matplotlib, numpy, and torch, explore the dimensions of your data.

You can view images using the `show5` function defined below – it takes a data loader as an argument.
Remember that normalized images will look really weird to you! You may want to try changing your transforms to view images.
Typically using no transforms other than `ToTensor()` works well for viewing – but not as well for training your network.
If `show5` doesn't work, go back and check your code for creating your data loaders and your training/test sets.
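One way to make normalized images viewable is to undo the normalization before plotting. The sketch below assumes the `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` transform from the previous section; `unnormalize` is a hypothetical helper written for illustration, not part of torchvision, and the input is a synthetic array rather than a real CIFAR-10 image.

```python
import numpy as np

def unnormalize(image, mean=0.5, std=0.5):
    """Undo Normalize(mean, std) on a (channels, height, width) array and
    return a (height, width, channels) array clipped to [0, 1]."""
    image = np.asarray(image, dtype=np.float32)
    restored = image * std + mean                  # inverse of (x - mean) / std
    restored = np.transpose(restored, (1, 2, 0))   # CHW -> HWC, the layout plt.imshow expects
    return np.clip(restored, 0.0, 1.0)

# Synthetic stand-in for one normalized image (values in [-1, 1]).
fake_image = np.random.uniform(-1.0, 1.0, size=(3, 32, 32))
viewable = unnormalize(fake_image)
print(viewable.shape)
```

With a batch from a data loader, something like `plt.imshow(unnormalize(images[i].numpy()))` should then show the image with its original colors.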

In [10]:
def show5(img_loader):
    dataiter = iter(img_loader)
    
    batch = next(dataiter)
    labels = batch[1][0:5]
    images = batch[0][0:5]
    for i in range(5):
        print(classes[labels[i]])
    
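        image = images[i].numpy()
        # Reorder (channels, height, width) to (height, width, channels) for imshow;
        # rotating the transposed array instead would mirror the image horizontally
        plt.imshow(np.transpose(image, (1, 2, 0)))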
        plt.show()

## Build your Neural Network
Using the layers in `torch.nn` (which has been imported as `nn`) and the `torch.nn.functional` module (imported as `F`), construct a neural network based on the parameters of the dataset. 
Feel free to construct a model of any architecture – feedforward, convolutional, or even something more advanced!

In [11]:
## YOUR CODE HERE ##

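class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 24, 3)       # 3 input channels (RGB) -> 24 feature maps, 3x3 kernel
        self.pool = nn.MaxPool2d(2, 2)         # 2x2 max pooling with stride 2
        self.conv2 = nn.Conv2d(24, 34, 3)      # 24 -> 34 feature maps, 3x3 kernel
        self.fc1 = nn.Linear(34 * 6 * 6, 100)  # flattened 34 x 6 x 6 feature maps -> 100
        self.fc2 = nn.Linear(100, 80)
        self.fc3 = nn.Linear(80, 10)           # 10 output classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # 3 x 32 x 32 -> 24 x 30 x 30 -> 24 x 15 x 15
        x = self.pool(F.relu(self.conv2(x)))   # 24 x 15 x 15 -> 34 x 13 x 13 -> 34 x 6 x 6
        x = x.view(-1, 34 * 6 * 6)             # flatten each sample to a vector of length 1224
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.dropout(x, training=self.training)  # dropout (p=0.5 by default), active only in training mode
        x = self.fc3(x)                        # raw class scores; CrossEntropyLoss applies log-softmax
        return x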


# Instantiate the network
net = Net()
net.cuda()   # Enable GPU operations

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

Specify a loss function and an optimizer, and instantiate the model.

If you use a less common loss function, please note why you chose that loss function in a comment.

In [12]:
## YOUR CODE HERE ##

## Running your Neural Network
Use whatever method you like to train your neural network, and ensure you record the average loss at each epoch. 
Don't forget to use `torch.device()` and the `.to()` method for both your model and your data if you are using GPU!

If you want to print your loss during each epoch, you can use the `enumerate` function and print the loss after a set number of batches. 250 batches works well for most people!
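A minimal sketch of this kind of loss bookkeeping is shown below. `RunningLoss` is a hypothetical helper, not a PyTorch class, and the losses here are made-up numbers; in a real training loop each value would come from `loss.item()`.

```python
class RunningLoss:
    """Accumulates per-batch losses and reports averages."""

    def __init__(self, report_every=250):
        self.report_every = report_every
        self.window_total = 0.0   # sum of losses since the last report
        self.epoch_total = 0.0    # sum of losses since the epoch started
        self.batches = 0

    def update(self, loss_value):
        """Add one batch loss; return the window average every
        `report_every` batches, otherwise None."""
        self.window_total += loss_value
        self.epoch_total += loss_value
        self.batches += 1
        if self.batches % self.report_every == 0:
            average = self.window_total / self.report_every
            self.window_total = 0.0
            return average
        return None

    def epoch_average(self):
        """Average loss over all batches seen so far in this epoch."""
        return self.epoch_total / max(self.batches, 1)


# Stand-in losses for one epoch of 500 batches.
tracker = RunningLoss(report_every=250)
fake_losses = [2.0] * 250 + [1.0] * 250
for batch_index, loss_value in enumerate(fake_losses):
    window_average = tracker.update(loss_value)
    if window_average is not None:
        print(f"batch {batch_index + 1}: average loss {window_average:.3f}")
print(f"epoch average loss: {tracker.epoch_average():.3f}")
```

Creating a fresh tracker at the start of each epoch and appending `epoch_average()` to a list would give the per-epoch losses needed for a plot.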

In [14]:
## YOUR CODE HERE ##

## Reference: the PyTorch CIFAR-10 tutorial, which was very useful for completing this assignment
## https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
## Source on GitHub: https://github.com/pytorch/tutorials/blob/master/beginner_source/blitz/cifar10_tutorial.py

device = next(net.parameters()).device  # the device the model was moved to above

for epoch in range(5):  # loop over the dataset multiple times

    # Train for one epoch
    for inputs, labels in trainloader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

    # Measure accuracy on the test set after each epoch.
    # Note: net.eval() is not called, so dropout stays active during this evaluation.
    correctly_classified = 0
    total_count = 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for images, labels in testloader:
            outputs = net(images.to(device)).cpu()
            _, predicted = torch.max(outputs, 1)
            total_count += labels.size(0)
            correctly_classified += (predicted == labels).sum().item()

    value_accuracy = 100 * correctly_classified / total_count

    print('accuracy: %d %%' % value_accuracy)

print('Done')

accuracy: 48 %
accuracy: 51 %
accuracy: 55 %
accuracy: 55 %
accuracy: 57 %
Done


Plot the training loss (and validation loss/accuracy, if recorded).

## Testing your model
Using the previously created `DataLoader` for the test set, compute the percentage of correct predictions using the highest probability prediction. 

If your accuracy is over 70%, great work! 
Exceeding 70% on this task is hard.

If your accuracy is under 45%, you'll need to make improvements.
Go back and check your model architecture, loss function, and optimizer to make sure they're appropriate for an image classification task.
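As a rough sketch of such an accuracy check, the function below counts how often the highest-scoring class matches the label. `evaluate_accuracy`, `toy_model`, and `toy_loader` are hypothetical names introduced for illustration, and the data is random, so the printed number says nothing about CIFAR-10. The function switches the model to evaluation mode so that dropout is disabled during testing.

```python
import torch
import torch.nn as nn

def evaluate_accuracy(model, loader, device):
    """Return the percentage of samples whose highest-scoring class
    matches the label."""
    model.eval()                      # disable dropout and similar layers
    correct = 0
    total = 0
    with torch.no_grad():             # no gradients needed for evaluation
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            predicted = outputs.argmax(dim=1)   # index of the largest score per sample
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    return 100.0 * correct / total


# Tiny synthetic stand-in for a test loader: 2 batches of 4 random "images".
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to(device)
toy_loader = [(torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,)))
              for _ in range(2)]
print(f"accuracy: {evaluate_accuracy(toy_model, toy_loader, device):.1f} %")
```

In the notebook, the same function could be called with the trained network and `testloader` in place of the toy objects.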

## Saving your model
Using `torch.save`, save your model for future loading.

In [15]:
## YOUR CODE HERE ##

PATH = './final_cifar.pth'
torch.save(net.state_dict(), PATH)

In [16]:
net = Net()
net.load_state_dict(torch.load(PATH))

## Make a Recommendation

Based on your evaluation, what is your recommendation on whether to build or buy? Explain your reasoning below.

Some things to consider as you formulate your recommendation:
* How does your model compare to Detectocorp's model?
* How does it compare to the far more advanced solutions in the literature? 
* What did you do to get the accuracy you achieved? 
* Is it necessary to improve this accuracy? If so, what sort of work would be involved in improving it?

Reference - https://cs231n.github.io/convolutional-networks/


- My model does not reach 70% accuracy, so it does not beat Detectocorp's algorithm. 
- My model reaches only 57% accuracy after 5 epochs of training, which is well below the models from the literature listed above. Still, I think it does its job, since it exceeds the 45% accuracy required for this project. With more experience and computational resources, my model would probably do much better. 
- My network has two convolutional layers, each followed by a pooling layer, and three fully connected layers. The first convolutional layer has 3 input channels (one per RGB channel), 24 output channels, and a filter size of 3. The pooling layer uses a 2 by 2 filter. The second convolutional layer has 24 input channels, 34 output channels, and a filter size of 3. The first fully connected layer takes 34 x 6 x 6 inputs. Since I am working with 32 by 32 images, these values are derived as follows. 

The output size of a convolutional layer is ( ( height - filter size + 2 ( padding ) ) / stride ) + 1, with the division rounded down.


The first convolutional layer has filter size 3, stride 1, and padding 0. 
Its output is ( 32 - 3 ) / 1 + 1 = 30, so we have 30 x 30 x 24.

Pooling output: 30 / 2 = 15, so we have 15 x 15 x 24.

Second convolutional layer output: ( 15 - 3 ) / 1 + 1 = 13, so we have 13 x 13 x 34.

Pooling output: 13 / 2 = 6 (rounded down), so we have 6 x 6 x 34.

The first fully connected layer is therefore linear from 34 x 6 x 6 = 1224 to 100. 

The second fully connected layer is linear from 100 to 80. 

The last layer is also linear, from 80 to 10, since we have 10 classes. 

In the forward pass, I apply the ReLU function to the output of the first convolutional layer and then pass the result through the pooling layer. 

I do the same for the second convolutional layer. 

Then I flatten the 34 x 6 x 6 feature maps into a single vector with the view function. 

I apply the ReLU function to the output of the first fully connected layer, and again to the output of the second fully connected layer. 

Finally, I apply dropout before the third fully connected layer. 


- Although my model passes the minimum accuracy required for this project, I recommend improving the accuracy by training for more epochs. Another option is to add an additional dropout layer to make the model more robust and less prone to overfitting. 
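The layer-size arithmetic in the answer above can be checked with a few lines of Python. This is only a sketch: `conv_output_size` is a hypothetical helper that applies the output-size formula with the division rounded down, and pooling is treated as a filter of size 2 with stride 2.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (division rounded down)."""
    return (size - kernel + 2 * padding) // stride + 1

size = 32                                            # CIFAR-10 images are 32 x 32
size = conv_output_size(size, kernel=3)              # conv1: 32 -> 30
size = conv_output_size(size, kernel=2, stride=2)    # pool:  30 -> 15
size = conv_output_size(size, kernel=3)              # conv2: 15 -> 13
size = conv_output_size(size, kernel=2, stride=2)    # pool:  13 -> 6
flattened = 34 * size * size                         # 34 channels from conv2
print(size, flattened)  # 6 1224
```

The final value, 1224, matches the `34 * 6 * 6` input size of the first fully connected layer.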






## Submit Your Project

When you are finished editing the notebook and are ready to turn it in, simply click the **SUBMIT PROJECT** button in the lower right.

Once you submit your project, we'll review your work and give you feedback if there's anything that you need to work on. If you'd like to see the exact points that your reviewer will check for when looking at your work, you can have a look over the project [rubric](https://review.udacity.com/#!/rubrics/3077/view).