# **CI Course - EX10**

--------


## Theory Overview ##

**What is a CNN (Convolutional Neural Network)?**

Convolutional Neural Networks (CNNs) are a type of deep learning model designed mainly for analyzing visual data, such as images or videos. CNNs are composed of multiple layers, with the convolutional layers being the core components. **Convolutional layers apply filters (feature detectors)** to input images, **capturing important patterns and features** in a hierarchical manner.

**Unlike fully connected** layers, which **connect every neuron to every neuron** in the previous layer, **convolutional layers have sparse connectivity**. This means that each neuron in a convolutional layer only receives input from a small region of the previous layer, and the same filter weights are reused at every position. Together, these properties **reduce the number of parameters and enable the network to scale efficiently** to large inputs. This is a **significant difference** between convolutional and fully connected layers.
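To make the parameter saving concrete, the sketch below compares a convolutional layer with a fully connected layer that would map the same 32x32 RGB input to an output of the same size. The layer sizes (16 filters of size 5x5) are illustrative assumptions; the fully connected count is computed by formula rather than by allocating the layer.

```python
import torch.nn as nn

def count_params(layer: nn.Module) -> int:
    """Total number of trainable parameters in a layer."""
    return sum(p.numel() for p in layer.parameters() if p.requires_grad)

# Convolutional layer: 16 filters of size 5x5 over 3 input channels.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5)
conv_params = count_params(conv)  # 16 * (3 * 5 * 5) weights + 16 biases = 1216

# A fully connected layer producing an output of the same size (16 x 28 x 28)
# from the flattened 3 x 32 x 32 input would need one weight per input/output pair.
in_features = 3 * 32 * 32
out_features = 16 * 28 * 28
fc_params = in_features * out_features + out_features  # weights + biases

print(f'conv parameters:            {conv_params:,}')
print(f'fully connected parameters: {fc_params:,}')
```

The convolutional layer needs about a thousand parameters, while the dense equivalent needs tens of millions.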

Two main advantages of CNNs are:

1. Local feature learning: Convolutional layers capture local patterns and features in images, allowing the network to learn hierarchical representations. This enables CNNs to automatically extract relevant features from raw data, making them effective for tasks like image classification and object detection.

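2. Translation invariance: CNNs are able to recognize patterns regardless of their position in an image. Strictly speaking, a convolution is translation *equivariant* (shifting the input shifts the resulting feature map by the same amount), and pooling layers then make the response largely insensitive to the exact position. This allows CNNs to robustly identify objects even if they appear in different locations within an image, which makes them well-suited for tasks involving object recognition and localization.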
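The effect of shifting an image on a convolutional feature map can be checked numerically. The sketch below is a minimal illustration under assumptions that are not part of the exercise: a single random 3x3 filter, circular padding so that the shift property holds exactly at the image borders, and global max pooling as the pooling step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One 3x3 filter; circular padding makes the shift property exact at the borders.
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, padding_mode='circular', bias=False)

image = torch.rand(1, 1, 8, 8)                            # (batch, channels, H, W)
shifted = torch.roll(image, shifts=(2, 3), dims=(2, 3))   # move content 2 px down, 3 px right

with torch.no_grad():
    out = conv(image)
    out_shifted = conv(shifted)

# Shifting the input shifts the feature map by the same amount.
same_after_shift = torch.allclose(
    torch.roll(out, shifts=(2, 3), dims=(2, 3)), out_shifted, atol=1e-6)

# After global max pooling, the strongest response is the same in both cases.
same_after_pooling = torch.allclose(
    out.amax(dim=(2, 3)), out_shifted.amax(dim=(2, 3)), atol=1e-6)

print(same_after_shift, same_after_pooling)
```

With ordinary zero padding the same property holds away from the borders, which is why it is usually described as approximate in practice.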

Two widely known applications of CNNs include image classification and object detection, although there are numerous other applications where CNNs have proven to be effective.

**The convolution operation**

We said that convolutional layers apply filters. What are they?

The filters in convolutional layers are small matrices (grids) of numbers. These filters slide over the image, pixel by pixel, performing a mathematical operation called convolution. At each position, the convolution multiplies the filter values element-wise with the corresponding pixels in the image and sums the products, producing a single new value.

![CNN1.png](attachment:9a933cce-218e-4523-a6b9-db7a01f8bbe2.png)

To illustrate, if we have a 3x3 filter and a 4x4 input, the filter fits in 2x2 = 4 positions, so sliding it over the input produces a 2x2 output, as follows:

![CNN2.png](attachment:3f6e3f7a-6aa9-4bdb-9e0f-63a4bee8fb9a.png)
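The sliding-window computation for a 3x3 filter on a 4x4 input can also be written out by hand. The sketch below is a plain NumPy illustration with made-up input values and an all-ones filter; the helper name `conv2d_valid` is hypothetical. As in deep learning libraries, the filter is applied without flipping (technically a cross-correlation).

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` (stride 1, no padding) and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]   # region currently under the filter
            out[i, j] = np.sum(window * kernel)
    return out

image = np.arange(16).reshape(4, 4)   # 4x4 input with values 0..15
kernel = np.ones((3, 3))              # 3x3 filter that sums its window
result = conv2d_valid(image, kernel)

print(result.shape)   # (2, 2)
print(result)         # values: [[45, 54], [81, 90]]
```

Each output value corresponds to one position of the filter; for example, the top-left value 45 is the sum of the top-left 3x3 block of the input.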


**Alright, what is the next step? How should we proceed from here?**

After the convolutional layers, the next steps of a neural network typically involve additional layers that further process the extracted features. These are often fully connected layers, which enable the network to learn complex relationships and make predictions based on the extracted features.

---

### Exercise - Image Classification - CIFAR Dataset ###

What is the CIFAR dataset? 

The CIFAR dataset is a popular benchmark dataset commonly used for training and evaluating machine learning models, particularly for image classification tasks. CIFAR stands for the Canadian Institute for Advanced Research, after which the dataset is named.

There are two main versions of the CIFAR dataset: CIFAR-10 and CIFAR-100. This exercise uses CIFAR-10.

**Input:**

For CIFAR-10, the inputs are **RGB (color) images with a fixed size of 32x32** pixels. Each image is represented as a **3-dimensional array with dimensions (32, 32, 3), where 3 corresponds to the three color channels (red, green, and blue)**.

**Output:**

The CIFAR-10 dataset consists of **10 different classes**, including common objects such as airplanes, cars, cats, dogs, and more. Each image in the dataset is assigned a corresponding label indicating its class.




---

## Solution ##

**Solution flow:**

0. Imports
1. Set hyperparameters
2. Load and pre-process
3. Build the model
4. Set optimizer and loss function
5. Training loop and learning curves
6. Evaluate

----

### Imports ###

In this exercise, we will use torchvision, a companion library to PyTorch.

torchvision contains computer vision utilities. It provides access to popular datasets, model architectures, and commonly used image transformations, which makes it easy to load and preprocess data for training and evaluating deep learning models on computer vision tasks.


In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# welcome torchvision
import torchvision
import torchvision.transforms as transforms

from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns  # not necessary for training, only for visualizing the confusion matrix
%matplotlib qt 


---

### Set hyperparameters ###

In [None]:
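epochs = 5        # number of passes over the training set
batch_size = 12   # images per mini-batch
lr = 0.005        # learning rate for SGD
momentum = 0.9    # momentum coefficient for SGD

torch.random.manual_seed(42)  # fix the random seed for reproducibility

PATH = './cifar_net.pth'  # for saving the model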

---

### Loading and Pre-process ###

**Pre-process**

In this exercise, the pre-processing steps are implemented with the torchvision transforms module.

**transforms** is a module within the torchvision package that provides common transformations. These transformations can be used to preprocess data before feeding it into a neural network.

```transforms.ToTensor()``` - Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].

```transforms.Normalize()``` - Normalizes the input with a given mean and standard deviation (std) by applying the following to each channel of the input:

$$ x_{normalized} = {{(x - \mu)} \over {\sigma}} $$
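The two transforms can be reproduced by hand to see what happens to the values and the shape. The sketch below uses plain NumPy on a random stand-in image (an assumption for illustration, not a CIFAR image) with a mean and std of 0.5 per channel; it mimics the arithmetic of the transforms rather than calling torchvision.

```python
import numpy as np

rng = np.random.default_rng(0)
image_hwc = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # fake RGB image, H x W x C

# ToTensor step: scale to [0, 1] and move the channel axis first (H x W x C -> C x H x W).
tensor_chw = image_hwc.astype(np.float32).transpose(2, 0, 1) / 255.0

# Normalize step: (x - mean) / std, applied per channel.
mean = np.array([0.5, 0.5, 0.5], dtype=np.float32).reshape(3, 1, 1)
std = np.array([0.5, 0.5, 0.5], dtype=np.float32).reshape(3, 1, 1)
normalized = (tensor_chw - mean) / std

# Undoing the normalization recovers the [0, 1] image (needed before plotting).
restored = normalized * std + mean

print(normalized.shape)                    # (3, 32, 32)
print(normalized.min(), normalized.max())  # values lie within [-1, 1]
```

With mean 0.5 and std 0.5, the value 0 maps to -1 and the value 1 maps to 1, so the inputs end up centered around zero.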



In [49]:
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Pipeline: image (H x W x C, values in [0, 255])
#   -> ToTensor  -> tensor (C x H x W, values in [0, 1])
#   -> Normalize -> tensor (C x H x W, values in [-1, 1])

**Loading the data (using the PyTorch DataLoader)**

In PyTorch, loading a dataset involves **creating an instance of a dataset object** and passing it to a DataLoader. The dataset object can be a custom dataset that you define, or one of the many built-in datasets provided by the torchvision, torchaudio, and torchtext packages. **These built-in datasets can download the data automatically and apply the given transform to each sample**, making it easy to get started with training machine learning models.

Once you have created an instance of the dataset, **you can pass it to a DataLoader along with some additional parameters such as the batch size and whether the data should be shuffled. The DataLoader is an iterable that yields batches of data**, so it can be looped over directly in a training loop to feed batches into the model.

```torchvision.datasets``` - provides the built-in dataset classes used to create a dataset instance


```torch.utils.data.DataLoader``` - creates a DataLoader object
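The dataset-plus-DataLoader pattern can be tried without downloading anything. The sketch below is a minimal illustration that substitutes random tensors for CIFAR-10 (the sample count of 100 and the use of `TensorDataset` are assumptions for the example) and shows how the batch size determines what each iteration yields.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in for CIFAR-10: 100 random "images" (3 x 32 x 32) with labels in 0..9.
images = torch.randn(100, 3, 32, 32)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# Shuffle the samples and serve them in mini-batches of 12.
loader = DataLoader(dataset, batch_size=12, shuffle=True)

batch_shapes = [tuple(batch_images.shape) for batch_images, _ in loader]
print(len(loader))        # number of batches per epoch
print(batch_shapes[0])    # a full batch
print(batch_shapes[-1])   # the last batch holds the remaining samples
```

Since 100 is not a multiple of 12, the last batch is smaller than the others; passing `drop_last=True` to the DataLoader would discard it instead.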


In [50]:
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)


trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Files already downloaded and verified
Files already downloaded and verified


In [51]:
def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# print labels
print(' '.join(f'{classes[labels[j]]:5s}' for j in range(batch_size)))
# show images
imshow(torchvision.utils.make_grid(images))

ship  frog  truck frog  deer  frog  horse plane frog  bird  truck frog 


### Build the model ###

Today our classifier model consists of **2 convolutional layers**, each activated by **ReLU** and then downsampled by a **pooling layer**.

**What is a pooling layer?**

Pooling layers in neural networks are used to downsample the input data by aggregating neighboring values, reducing the spatial dimensions while preserving important features.

To illustrate, consider the max pooling layer we use today, which keeps only the largest value in each window:

![pool.png](attachment:f33608a2-cff9-4faf-acb4-db74c4d80185.png)
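The same operation can be reproduced on a small tensor. The sketch below applies a 2x2 max pooling layer with stride 2 to a 4x4 feature map with made-up values 0..15, chosen only so that the result is easy to check by eye.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)

# One 4x4 feature map with values 0..15, shaped (batch, channels, H, W).
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)
pooled = pool(x)

print(x[0, 0])
print(pooled[0, 0])   # each output value is the maximum of one 2x2 block
print(pooled.shape)   # the spatial size is halved: 4x4 -> 2x2
```

The four 2x2 blocks of the input reduce to their maxima 5, 7, 13 and 15.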

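Following the convolutional stage, we proceed with **3 consecutive fully connected layers**. The output layer produces one raw score (logit) per class, and the **softmax activation function** (more information - see EX9) turns these scores into probabilities. During training the softmax is not applied inside the model, because ```nn.CrossEntropyLoss``` already applies a log-softmax to the raw scores.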
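The relationship between softmax and the cross-entropy loss can be verified on a toy example. The sketch below uses made-up scores for 2 samples and 3 classes and compares ```nn.CrossEntropyLoss``` on raw scores with the negative log-likelihood of an explicit log-softmax.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])   # raw scores for 2 samples, 3 classes
targets = torch.tensor([0, 2])             # true class index per sample

probs = F.softmax(logits, dim=1)           # each row becomes a probability distribution

# CrossEntropyLoss takes raw scores and applies log-softmax internally.
loss_from_logits = nn.CrossEntropyLoss()(logits, targets)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(probs.sum(dim=1))                            # rows sum to 1
print(loss_from_logits.item(), loss_manual.item()) # the two losses agree
```

Softmax does not change which class has the highest score, so predictions can be read from the raw scores with `argmax`.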

In [None]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 5)   # (in channels, out channels, kernel size)
        self.pool = nn.MaxPool2d(2, 2)     # (kernel size, stride)
        self.conv2 = nn.Conv2d(16, 24, 5)
        self.fc1 = nn.Linear(24 * 5 * 5, 120)  # 24 feature maps of size 5x5 after conv2 + pool
        self.fc2 = nn.Linear(120, 64)
        self.fc3 = nn.Linear(64, len(classes))
        self.softmax = nn.Softmax(dim=1)   # not used in forward(), see the note below

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # (N, 3, 32, 32)  -> (N, 16, 14, 14)
        x = self.pool(F.relu(self.conv2(x)))   # (N, 16, 14, 14) -> (N, 24, 5, 5)
        x = torch.flatten(x, 1)                # flatten all dimensions except batch -> (N, 600)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        # No softmax here: nn.CrossEntropyLoss expects raw scores (logits)
        return x

net = Net()
net

To summarize, our network architecture can be visualized as follows:

![net.png](attachment:6a5f0805-91cf-424b-8cc7-cbe7a64fd37c.png)
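The input size of the first fully connected layer (24 * 5 * 5) follows from how each layer shrinks the image. The sketch below traces the spatial size through the network using the standard output-size formula for convolution and pooling windows; the helper name `conv_out` is hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size of a convolution or pooling window along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                  # CIFAR-10 images are 32x32
size = conv_out(size, kernel=5)            # conv1 (5x5)        -> 28
size = conv_out(size, kernel=2, stride=2)  # max pool (2x2, /2) -> 14
size = conv_out(size, kernel=5)            # conv2 (5x5)        -> 10
size = conv_out(size, kernel=2, stride=2)  # max pool (2x2, /2) -> 5

flat_features = 24 * size * size           # 24 output channels from conv2
print(size, flat_features)                 # 5 600
```

If the kernel sizes or the number of channels change, the first fully connected layer has to be resized accordingly.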

### Set optimizer and loss function ###

In [53]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=momentum)
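To see what the momentum term does, the sketch below lets `torch.optim.SGD` minimize the toy function f(w) = w^2 for three steps and repeats the same update by hand. The toy function and the variable names are assumptions for illustration; the hand-written rule corresponds to the optimizer's default settings (no dampening, no Nesterov momentum, no weight decay).

```python
import torch

lr, momentum = 0.005, 0.9

# Reference: let torch.optim.SGD minimize f(w) = w^2 for a few steps.
w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=lr, momentum=momentum)
for _ in range(3):
    optimizer.zero_grad()
    loss = (w ** 2).sum()
    loss.backward()
    optimizer.step()

# The same update written by hand:
#   velocity <- momentum * velocity + grad
#   w        <- w - lr * velocity
w_manual, velocity = 1.0, 0.0
for _ in range(3):
    grad = 2.0 * w_manual                  # derivative of w^2
    velocity = momentum * velocity + grad
    w_manual = w_manual - lr * velocity

print(w.item(), w_manual)                  # both follow the same trajectory
```

The velocity accumulates past gradients, so consecutive steps in the same direction grow larger than plain gradient descent steps would.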

### Training loop ###

In [61]:
net.train()  # put the model in training mode
for epoch in range(epochs):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward pass
        outputs = net(inputs)
        loss = criterion(outputs, labels)

        # backward pass (backpropagation)
        loss.backward()
        # update weights
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')

# save the trained weights
torch.save(net.state_dict(), PATH)
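The solution flow lists an evaluation step. One possible way to carry it out is sketched below, computing the accuracy and a confusion matrix over a DataLoader. The helper name `evaluate` is hypothetical, and to keep the example self-contained it runs on a tiny stand-in model and random data rather than on the trained network.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model: nn.Module, loader: DataLoader, num_classes: int):
    """Return (accuracy, confusion matrix) with rows = true class, columns = predicted class."""
    model.eval()                                   # switch off training-only behavior
    confusion = torch.zeros(num_classes, num_classes, dtype=torch.long)
    with torch.no_grad():                          # no gradients needed for evaluation
        for inputs, labels in loader:
            predictions = model(inputs).argmax(dim=1)
            for true_label, predicted in zip(labels.tolist(), predictions.tolist()):
                confusion[true_label, predicted] += 1
    accuracy = confusion.diag().sum().item() / confusion.sum().item()
    return accuracy, confusion

# Stand-in model and data (assumptions for this sketch only).
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
toy_data = TensorDataset(torch.randn(48, 3, 32, 32), torch.randint(0, 10, (48,)))
toy_loader = DataLoader(toy_data, batch_size=12, shuffle=False)

accuracy, confusion = evaluate(toy_model, toy_loader, num_classes=10)
print(f'accuracy: {accuracy:.3f}')
print(confusion)
```

With the objects defined in this notebook, the call would be `evaluate(net, testloader, len(classes))`, and the resulting matrix could be plotted with `sns.heatmap(confusion.numpy(), annot=True)`.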

## Summary ##

bla.. bla..


---
