# Convolutional Neural Network (CNN)
Here we implement our first convolutional neural network (CNN), which performs image classification on the well-known CIFAR-10 dataset.

We will learn:
- Dataset: the CIFAR-10 dataset, available in PyTorch through torchvision. https://www.cs.toronto.edu/~kriz/cifar.html
    - A dataset with 10 different classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
- Architecture of CNNs
- Convolutional Filter
- Max Pooling
- Determine the correct layer size
- Implement the CNN architecture in PyTorch

- Like other neural nets, CNNs are made of neurons that have learnable weights and biases. The difference is that ConvNets are designed mainly for image data and apply convolutional filters. The pipeline is: image --> conv layers --> activation function --> pooling --> fully connected layers.
- After applying a convolution, the resulting image may be smaller because the filter does not fit at the corners and borders. To keep the size we can use a technique called padding. Getting the sizes right is important, because each layer's input size must match the previous layer's output size.
- Pooling reduces the size of the image, which reduces the cost of computation. It also reduces the number of parameters the later layers need, and it helps avoid overfitting by providing more abstract features.
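
The size bookkeeping described above follows the standard output-size formula for convolution and pooling layers, `(W - F + 2P) / S + 1`, where `W` is the input width, `F` the filter size, `P` the padding and `S` the stride. The sketch below is a minimal illustration; `conv_output_size` is a hypothetical helper name, not a PyTorch function.

```python
def conv_output_size(width, kernel, padding=0, stride=1):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (width + 2 * padding - kernel) // stride + 1

# A 32x32 CIFAR-10 image passed through two (5x5 conv, 2x2 max pool) stages
after_conv1 = conv_output_size(32, kernel=5)                     # no padding
after_pool1 = conv_output_size(after_conv1, kernel=2, stride=2)  # 2x2 pool, stride 2
after_conv2 = conv_output_size(after_pool1, kernel=5)
after_pool2 = conv_output_size(after_conv2, kernel=2, stride=2)
print(after_conv1, after_pool1, after_conv2, after_pool2)  # 28 14 10 5

# With padding=2, a 5x5 filter preserves the 32x32 size
print(conv_output_size(32, kernel=5, padding=2))  # 32
```

The final value, 5, is why the first fully connected layer in this notebook takes `16 * 5 * 5` input features.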

<center><img src='./images/cnn_Cifar.PNG' width=550px></center> 

# Convolutional Neural Network Intro 
<center><img src='./images/cnn_example.PNG' width=800px></center> 

### Image Filter / Image Kernel 
- First, we convolve the image with a filter to extract features. A filter/kernel is basically a small matrix that is slid across the image.
- The kernel values are not designed by hand. The network learns them during training and convolves them with the image.
- The first layers of a CNN often learn to detect edges (like Sobel filters for right, left, top and bottom edges).
- Colour images have 3 colour channels (RGB) in addition to the height and width of the image.
- After the convolution and pooling layers we have fully connected layers.

<center><img src='./images/cnn_kernel.PNG' width=600px></center> 

Ref: Visualize the convolution: https://setosa.io/ev/image-kernels/
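
To make the kernel idea concrete, the sketch below applies a hand-written Sobel kernel to a toy image with `torch.nn.functional.conv2d`. The 6x6 image and the fixed kernel are assumptions made for illustration; in a CNN the kernel values would be learned instead. Note that `conv2d` computes a cross-correlation, meaning the kernel is not flipped.

```python
import torch
import torch.nn.functional as F

# Sobel kernel that responds to left-to-right intensity changes (vertical edges)
sobel_x = torch.tensor([[-1.0, 0.0, 1.0],
                        [-2.0, 0.0, 2.0],
                        [-1.0, 0.0, 1.0]])

# Toy 6x6 grayscale image: dark left half (0), bright right half (1)
image = torch.zeros(6, 6)
image[:, 3:] = 1.0

# conv2d expects input (batch, channels, H, W) and weight (out_ch, in_ch, kH, kW)
edges = F.conv2d(image.view(1, 1, 6, 6), sobel_x.view(1, 1, 3, 3))
print(edges.shape)    # torch.Size([1, 1, 4, 4]): 6 - 3 + 1 = 4
print(edges[0, 0, 0]) # tensor([0., 4., 4., 0.]): strong response only around the edge
```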

### Why to use CNN instead of ANN or FCNN?
- In an ANN all neurons are fully connected, so when the input becomes very large (as with images) the number of weights grows quickly and the data is difficult to process. A CNN is not fully connected; it is locally connected.
- Once we have extracted features, we apply pooling to reduce them.
- After that, we flatten the result and feed it to fully connected layers.
- A CNN crunches the number of parameters down through convolution with filters, and reduces the data further through pooling.
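
A rough way to see the parameter savings is to count the learnable parameters of a fully connected layer applied to a flattened 32x32 RGB image and of a convolutional layer applied to the same image. The layer sizes below are chosen for illustration, and `count_params` is a hypothetical helper. The two layers produce outputs of different shapes, so this is an order-of-magnitude comparison only.

```python
import torch.nn as nn

def count_params(layer):
    """Total number of learnable values (weights and biases) in a layer."""
    return sum(p.numel() for p in layer.parameters())

fc = nn.Linear(3 * 32 * 32, 120)       # every input pixel connects to every output neuron
conv = nn.Conv2d(3, 6, kernel_size=5)  # six 3x5x5 filters shared across all image positions

print(count_params(fc))    # 368760 = 3072 * 120 weights + 120 biases
print(count_params(conv))  # 456 = 6 * 3 * 5 * 5 weights + 6 biases
```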

<center><img src='./images/cnn_local_connect.PNG' width=600px></center>  

### Pooling Layer
- Pooling reduces the features further. It reduces the amount of data in an image by combining neighbouring values into fewer values. Max pooling, for example, keeps only the largest value in each window.
<center><img src='./images/pool_concept.jpg' width=600px></center>  

<center><img src='./images/pooling.PNG' width=600px></center>  
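
As a small worked example of max pooling, the sketch below runs `nn.MaxPool2d` with a 2x2 window and stride 2 over a hand-made 4x4 input. The input values are arbitrary and chosen only for illustration.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)

# One image, one channel, 4x4 pixels -> shape (1, 1, 4, 4)
x = torch.tensor([[1.0, 3.0, 2.0, 1.0],
                  [4.0, 6.0, 5.0, 0.0],
                  [7.0, 2.0, 9.0, 8.0],
                  [1.0, 0.0, 3.0, 4.0]]).view(1, 1, 4, 4)

pooled = pool(x)  # each non-overlapping 2x2 window is replaced by its maximum
print(pooled[0, 0])
# tensor([[6., 5.],
#         [7., 9.]])
```

Height and width are both halved, so the layers that follow see a quarter of the values.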

# CNN in PyTorch

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hyper-parameters 
num_epochs = 5
batch_size = 4
learning_rate = 0.001

# dataset has PILImage images of range [0, 1]. 
# We transform them to Tensors of normalized range [-1, 1]
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# CIFAR10: 60000 32x32 color images in 10 classes, with 6000 images per class
# PyTorch Dataset and DataLoader: these help with batch optimization and batch training
train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)

test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size,
                                          shuffle=True)

test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size,
                                         shuffle=False)

# hard coded the classes
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get some random training images
dataiter = iter(train_loader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
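
`Normalize` with mean 0.5 and standard deviation 0.5 computes `(x - 0.5) / 0.5` per channel, which maps [0, 1] to [-1, 1]. `imshow` undoes this with `img / 2 + 0.5`. The sketch below checks the round trip on a few sample values, using plain tensor arithmetic as a stand-in for the transform itself.

```python
import torch

x = torch.tensor([0.0, 0.25, 0.5, 1.0])  # sample pixel values in [0, 1]

normalized = (x - 0.5) / 0.5             # what Normalize(0.5, 0.5) computes
restored = normalized / 2 + 0.5          # the "unnormalize" step in imshow

print(normalized)  # tensor([-1.0000, -0.5000,  0.0000,  1.0000])
print(torch.allclose(restored, x))  # True
```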

<center><img src='./images/cnn_Cifar_.PNG' width=600px></center> 
<center><img src='./images/convFilter.PNG' width=550px></center> 

In [None]:
# Show images
imshow(torchvision.utils.make_grid(images))

# First conv layer
conv1 = nn.Conv2d(3, 6, 5)
pool = nn.MaxPool2d(2,2)
# second conv layer
conv2 = nn.Conv2d(6, 16, 5)
print(images.shape)
# >> torch.Size([4, 3, 32, 32]) because the batch size is 4, with 3 color channels and an image size of 32x32

# Apply first convolutional layer
x = conv1(images)
# print(x.shape)
# >> torch.Size([4, 6, 28, 28]) # 6 output channels; 28x28 because a 5x5 filter without padding gives 32 - 5 + 1 = 28
x = pool(x)
# print(x.shape)
# >> torch.Size([4, 6, 14, 14]) # pooling with a 2x2 kernel and stride 2 halves the height and width

x = conv2(x)
# print(x.shape)
# >> torch.Size([4, 16, 10, 10])  # 16 because we specified 16 output channels

x = pool(x)
print(x.shape)
# >> torch.Size([4, 16, 5, 5]) 

# Now flatten each sample's 3D tensor (16, 5, 5) into a 1D tensor, so the first fully connected layer has 16 * 5 * 5 input features
# self.fc1 = nn.Linear(16 * 5 * 5, 120) 

In [None]:
# implement ConvNet
"""
class ConvNet(nn.Module):
    def __init__(self):
        pass

    def forward(self, x):
        pass
# Refer to the architecture figure.
(1) First we have a convolutional layer followed by a ReLU activation function, then pooling.
(2) Then we have a second convolutional layer + activation + pooling.
(3) Then we have three different fully connected layers.
(4) At the end we have softmax and cross-entropy. In PyTorch, softmax is already included in nn.CrossEntropyLoss.
"""

class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        # First conv layer + pooling (the ReLU activation is applied in forward)
        self.conv1 = nn.Conv2d(3, 6, 5) # input channels, output channels, kernel size
        self.pool = nn.MaxPool2d(2, 2) # pooling kernel size and stride
        # Second conv layer
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Fully connected layers. The input size 16 * 5 * 5 and the final output size 10 are fixed
        self.fc1 = nn.Linear(16 * 5 * 5, 120) # input size is what we obtain after the convolutions (see the cell above); the output size 120 is a free choice
        self.fc2 = nn.Linear(120, 84) # 120 input features and 84 output features
        self.fc3 = nn.Linear(84, 10) # output must be 10 for the 10 different classes

    # All layers are defined above; the forward pass applies them in order.
    def forward(self, x):
        # -> n, 3, 32, 32
        x = self.pool(F.relu(self.conv1(x)))  # -> n, 6, 14, 14 # First convolution and pooling layer
        x = self.pool(F.relu(self.conv2(x)))  # -> n, 16, 5, 5  # Second convolution and pooling layer
        x = x.view(-1, 16 * 5 * 5)            # -> n, 400       # Flatten the output of convolution
        x = F.relu(self.fc1(x))               # -> n, 120       # First fully connected layer with activation
        x = F.relu(self.fc2(x))               # -> n, 84        # Second fully connected layer with activation
        x = self.fc3(x)                       # -> n, 10        # Third fully connected layer, no activation
        return x

In [None]:
# Create the model and define loss and optimizer
model = ConvNet().to(device)

criterion = nn.CrossEntropyLoss() # multiclass classification problem
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
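
The model returns raw scores (logits) with no softmax because `nn.CrossEntropyLoss` applies log-softmax internally before the negative log-likelihood. The sketch below checks this equivalence on made-up logits for a 3-class problem; the values are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])  # raw scores for one sample, 3 classes
target = torch.tensor([0])                 # index of the true class

loss_ce = nn.CrossEntropyLoss()(logits, target)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(torch.allclose(loss_ce, loss_manual))  # True
```

Adding an explicit softmax layer before this loss would therefore apply softmax twice.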

In [None]:
# Training loop with mini-batch optimization: loop over the epochs, then over the train loader
n_total_steps = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # original shape: [4, 3, 32, 32], i.e. 4 images x 3 channels x 1024 pixels
        # first conv layer: 3 input channels, 6 output channels, kernel size 5
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad() # empty the gradients
        loss.backward()
        optimizer.step()

        if (i+1) % 2000 == 0:
            print (f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{n_total_steps}], Loss: {loss.item():.4f}')

print('Finished Training')
PATH = './cnn.pth'
torch.save(model.state_dict(), PATH)
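
A saved `state_dict` can later be loaded into a freshly constructed model of the same architecture. The sketch below shows the round trip. To stay self-contained it uses a small `nn.Linear` as a stand-in for `ConvNet` and a temporary file path, both of which are assumptions for illustration.

```python
import os
import tempfile
import torch
import torch.nn as nn

small_model = nn.Linear(4, 2)  # stand-in for ConvNet
path = os.path.join(tempfile.mkdtemp(), 'demo.pth')
torch.save(small_model.state_dict(), path)

# Build a new model with the same architecture, then load the saved weights
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path))
restored.eval()  # switch to evaluation mode before inference

x = torch.randn(3, 4)
print(torch.allclose(small_model(x), restored(x)))  # True
```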

In [None]:
# wrap the evaluation in `torch.no_grad()` because we don't need gradients (no backward propagation)
with torch.no_grad():
    n_correct = 0
    n_samples = 0
    n_class_correct = [0 for i in range(10)]
    n_class_samples = [0 for i in range(10)]
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        # torch.max returns (values, indices) along dimension 1
        _, predicted = torch.max(outputs, 1)
        n_samples += labels.size(0)
        n_correct += (predicted == labels).sum().item()
        
        for i in range(labels.size(0)):  # actual batch size; the last batch may be smaller
            label = labels[i].item()
            pred = predicted[i].item()
            if (label == pred):
                n_class_correct[label] += 1
            n_class_samples[label] += 1

    acc = 100.0 * n_correct / n_samples
    print(f'Accuracy of the network: {acc} %')

    for i in range(10):
        acc = 100.0 * n_class_correct[i] / n_class_samples[i]
        print(f'Accuracy of {classes[i]}: {acc} %')