**What is a convolutional neural network**

Convolutional neural networks, also known as ConvNets or CNNs, are a
special kind of neural network for processing data that has a known
grid-like topology, such as time series data (1D) or images (2D).

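**CNN vs ANN**

Using a plain fully connected network (ANN) on images has two main problems:

1. Overfitting: an ANN has far more learnable parameters than a CNN for the same input, which makes it a more complex model.

2. Loss of important information, such as the spatial arrangement of pixels, because the image is flattened into a vector.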
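
To make the parameter gap concrete, here is a minimal sketch that counts the learnable parameters of one fully connected layer and one convolution layer on a 28×28 grayscale image. The layer sizes (128 neurons, 32 filters of size 3×3) and the helper names are illustrative assumptions, not values prescribed by these notes.

```python
def dense_params(n_inputs, n_neurons):
    # one weight per (input, neuron) pair plus one bias per neuron
    return n_inputs * n_neurons + n_neurons


def conv_params(in_channels, n_filters, f):
    # each filter has f*f weights per input channel plus one bias
    return f * f * in_channels * n_filters + n_filters


# Assumed example: a 28x28 single-channel image
dense = dense_params(28 * 28 * 1, 128)   # image flattened to 784 inputs
conv = conv_params(1, 32, 3)             # 32 filters of size 3x3

print(dense)  # 100480
print(conv)   # 320
```

The convolution layer reuses the same 3×3 weights at every image position, which is why its parameter count does not grow with the image size.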

**Convolution layer and Filters**

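1. Edge detection (convolution operation): convolution layers perform feature extraction. Early layers extract edges, later layers combine them into shapes, and so on.

2. Each filter is a small matrix of weights whose size is the kernel size (e.g. 3 => (3, 3)). It is slid (convolved) over the image; at each position the filter values and the image values under it (the receptive field) are multiplied element-wise and then summed into a single output value.

3. For a single-channel image: (n, n) conv (f, f) => (n - f + 1, n - f + 1)

4. For a multi-channel image: (n, n, c) conv (f, f, c) => (n - f + 1, n - f + 1). Each filter spans all c channels and produces one 2D feature map; with k filters the output is (n - f + 1, n - f + 1, k).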
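
The multiply-then-sum operation can be sketched in plain Python as follows. This is a minimal illustration for a single-channel image with no padding and stride 1; the function name `conv2d`, the toy image and the hand-picked vertical edge filter are assumptions made for the example. As in deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
def conv2d(image, kernel):
    """Valid convolution (no padding, stride 1) of a 2D list with a square 2D kernel."""
    n_rows, n_cols = len(image), len(image[0])
    f = len(kernel)
    out = []
    for i in range(n_rows - f + 1):
        row = []
        for j in range(n_cols - f + 1):
            # multiply the kernel with the patch under it, element by element, then sum
            total = 0
            for a in range(f):
                for b in range(f):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        out.append(row)
    return out


# 6x6 image: bright left half (10), dark right half (0)
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

# hand-picked vertical edge filter
vertical_edge = [[1, 0, -1],
                 [1, 0, -1],
                 [1, 0, -1]]

result = conv2d(image, vertical_edge)
print(len(result), len(result[0]))  # 4 4  -> (6 - 3 + 1, 6 - 3 + 1)
print(result[0])                    # [0, 30, 30, 0]
```

The non-zero values sit exactly where the bright and dark halves meet, which is the vertical edge.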


**Padding**

1. Each convolution produces a smaller output. After many layers the image shrinks too much and information gets lost.

2. Without padding, pixels near the border are covered by fewer filter positions, so edge features contribute less to the output.

3. Padding adds a border (usually of zeros) around the image to resolve both issues.

4. There are two types: valid (no padding, the image shrinks) and same (padding applied so that the output size equals the input size; for stride 1 and odd f this means p = (f - 1) / 2).

5. Output image size with padding p: (n + 2p - f + 1, n + 2p - f + 1)
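
A minimal sketch of zero padding and its effect on the output size, assuming a square image, a square filter and stride 1. The helper names `zero_pad` and `padded_output_size` are hypothetical and only used for illustration.

```python
def zero_pad(image, p):
    """Return a copy of a 2D list with p rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)
    padded += [[0] * width for _ in range(p)]
    return padded


def padded_output_size(n, f, p):
    # stride-1 output size: n + 2p - f + 1
    return n + 2 * p - f + 1


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

padded = zero_pad(image, 1)
print(len(padded), len(padded[0]))        # 5 5
print(padded[1])                          # [0, 1, 2, 3, 0]

f = 3
same_p = (f - 1) // 2                     # "same" padding for odd f and stride 1
print(padded_output_size(28, f, 0))       # 26 (valid)
print(padded_output_size(28, f, same_p))  # 28 (same)
```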

**Strides**

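1. The stride s is how many pixels the filter moves at each step. A stride larger than 1 is used when only high-level features are needed, since it skips fine detail.

2. It also reduces computation, because the output is smaller.

3. Output image size: (floor((n + 2p - f) / s) + 1, floor((n + 2p - f) / s) + 1), with p = 0 when no padding is applied. The division is rounded down (floor).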
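
The output-size formula with stride can be checked with a few lines of Python. This is a small sketch; the function name `conv_output_size` and the example sizes are assumptions chosen for illustration.

```python
def conv_output_size(n, f, p=0, s=1):
    """floor((n + 2p - f) / s) + 1 for a square input and a square filter."""
    return (n + 2 * p - f) // s + 1


print(conv_output_size(28, 3, p=1, s=1))  # 28
print(conv_output_size(28, 3, p=0, s=2))  # 13
print(conv_output_size(7, 3, p=0, s=2))   # 3
print(conv_output_size(8, 3, p=0, s=2))   # 3  (5 / 2 = 2.5 is floored to 2)
```

The last two calls show the effect of the floor: when the filter does not fit a final full step, the leftover rows and columns are simply not visited.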

**Problem with Convolution**

32 images of shape (228, 228, 3) conv (3, 3, 3) with 100 filters => 32 outputs of shape (226, 226, 100), which is around 623 MB if each value takes 4 bytes (total values * 4 / 1024^2).

1. **Memory issue**: The example above shows how much memory the output of a single convolution layer can take.

2. **Translation variance**: Features become location dependent; the same pattern at a different position in the image activates a different position in the feature map.
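
The memory estimate can be reproduced with a short calculation. This sketch assumes no padding and stride 1, so each 228×228 input yields a 226×226 output per filter, and 4 bytes per value (float32). The function name is hypothetical.

```python
def activation_memory_mib(batch, n, f, n_filters, bytes_per_value=4):
    """Memory of the output of one valid, stride-1 convolution layer, in MiB."""
    out = n - f + 1                       # output height and width
    values = batch * out * out * n_filters
    return values * bytes_per_value / 1024 ** 2


mib = activation_memory_mib(batch=32, n=228, f=3, n_filters=100)
print(round(mib, 1))  # 623.5
```

This only counts the output of one layer for one batch; during training the activations of every layer are kept for backpropagation, so the total is larger.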


**Pooling** 

1. Pooling downsamples the feature map (using max pooling, average pooling, min pooling, etc.).

2. It reduces the size and gives some translation invariance (check slides).

3. Max pooling also enhances features, since only the strongest activation in each window is kept.

4. It needs no training, as it has no trainable parameters.

5. In tasks where the exact object location matters, avoid pooling.
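
Max pooling can be sketched in plain Python as below. The function name `max_pool2d`, the toy feature map and the 2×2 window with stride 2 are assumptions for illustration. The second call hints at where the (local) translation invariance comes from: moving a strong activation within the same window does not change the pooled output.

```python
def max_pool2d(image, size=2, stride=2):
    """Max pooling over a 2D list; there are no weights to learn."""
    n_rows, n_cols = len(image), len(image[0])
    out = []
    for i in range(0, n_rows - size + 1, stride):
        row = []
        for j in range(0, n_cols - size + 1, stride):
            window = [image[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(max(window))
        out.append(row)
    return out


feature_map = [[1, 3, 2, 1],
               [4, 6, 5, 0],
               [7, 2, 9, 8],
               [1, 0, 3, 4]]

pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 5], [7, 9]]

# Move the strong activation (the 6) to another spot inside the same 2x2 window
shifted = [row[:] for row in feature_map]
shifted[1][1], shifted[0][0] = 1, 6
pooled_shifted = max_pool2d(shifted)
print(pooled_shifted)  # [[6, 5], [7, 9]]
```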

**CNN Architecture**

| Layer               | Parameters                 | Input Shape  | Output Shape | Notes                                 |
| ------------------- | -------------------------- | ------------ | ------------ | ------------------------------------- |
| **Input**           | —                          | (3, 28, 28)  | (3, 28, 28)  | Image with 3 RGB channels                        |
| **Conv Layer 1**    | 32 filters, 3×3, padding=1 | (3, 28, 28)  | (32, 28, 28) | Filters automatically span 3 channels |
| **MaxPool 1**       | 2×2, stride=2              | (32, 28, 28) | (32, 14, 14) | Reduces spatial size                  |
| **Conv Layer 2**    | 64 filters, 3×3, padding=1 | (32, 14, 14) | (64, 14, 14) | Filters span 32 channels              |
| **MaxPool 2**       | 2×2, stride=2              | (64, 14, 14) | (64, 7, 7)   | Still 3D tensor                       |
| **Flatten**         | —                          | (64, 7, 7)   | (3136,)      | 64×7×7 = 3136                         |
| **Hidden FC Layer** | 64 neurons                 | (3136,)      | (64,)        | Fully connected                       |
| **Output Layer**    | 10 neurons                 | (64,)        | (10,)        | Class scores                          |
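
The shapes in the table can be traced with a few lines of arithmetic. The sketch below walks a (3, 28, 28) input through the layers listed above and also counts the learnable parameters, assuming every convolution and fully connected layer has a bias term. The helper name `conv_out` is hypothetical.

```python
def conv_out(n, f, p=0, s=1):
    # floor((n + 2p - f) / s) + 1
    return (n + 2 * p - f) // s + 1


shape = (3, 28, 28)            # (channels, height, width)
params = 0

# (filters, kernel size, padding) for the two conv layers, each followed by 2x2 max pooling
for n_filters, f, p in [(32, 3, 1), (64, 3, 1)]:
    c, h, w = shape
    params += f * f * c * n_filters + n_filters      # weights + biases
    shape = (n_filters, conv_out(h, f, p), conv_out(w, f, p))
    print("conv   ", shape)
    c, h, w = shape
    shape = (c, conv_out(h, 2, 0, 2), conv_out(w, 2, 0, 2))  # pooling has no parameters
    print("maxpool", shape)

flat = shape[0] * shape[1] * shape[2]
print("flatten", flat)         # flatten 3136

# fully connected layers: hidden (64 neurons) and output (10 neurons)
for n_in, n_out in [(flat, 64), (64, 10)]:
    params += n_in * n_out + n_out

print("total parameters", params)  # total parameters 220810
```

Most of the parameters (200,768 of them) sit in the first fully connected layer, not in the convolution layers.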


**Backpropagation in CNN**

1. The architecture can be seen as two parts.

2. The first part includes operations like convolution, pooling and flattening.

3. The second part consists of operations like the matrix multiplications in an ANN.

4. Check the slides to see how the gradients are calculated.
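
As a rough illustration of the convolution part, the gradient of the loss with respect to a filter is itself a convolution of the layer input with the upstream gradient. The sketch below is not taken from the slides; it assumes a single-channel input, no padding, stride 1 and a toy loss equal to the sum of all outputs, and it checks one entry against a finite-difference estimate. All names are hypothetical.

```python
def conv2d(image, kernel):
    """Valid, stride-1 convolution of a square 2D list with a square kernel."""
    f = len(kernel)
    size = len(image) - f + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(f) for b in range(f))
             for j in range(size)]
            for i in range(size)]


def kernel_gradient(image, upstream, f):
    """dL/dK[a][b] = sum over (i, j) of upstream[i][j] * image[i + a][j + b]."""
    size = len(upstream)
    return [[sum(upstream[i][j] * image[i + a][j + b]
                 for i in range(size) for j in range(size))
             for b in range(f)]
            for a in range(f)]


image = [[1.0, 2.0, 0.0],
         [0.0, 1.0, 3.0],
         [2.0, 1.0, 1.0]]
kernel = [[0.5, -1.0],
          [1.5, 2.0]]

# Toy loss (an assumption for illustration): L = sum of all output values,
# so the upstream gradient dL/dout is 1 everywhere.
upstream = [[1.0, 1.0], [1.0, 1.0]]
grad = kernel_gradient(image, upstream, 2)
print(grad)  # [[4.0, 6.0], [4.0, 6.0]]


def loss(k):
    return sum(sum(row) for row in conv2d(image, k))


# Finite-difference check of one entry: nudge kernel[0][1] and watch the loss
eps = 1e-6
bumped = [row[:] for row in kernel]
bumped[0][1] += eps
numeric = (loss(bumped) - loss(kernel)) / eps
print(abs(numeric - grad[0][1]) < 1e-4)  # True
```

In a real network the upstream gradient comes from the layers after the convolution, and frameworks such as PyTorch compute all of this automatically in `loss.backward()`.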

**CODE**

In [31]:
import pandas as pd
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
from torchinfo import summary

In [32]:
torch.manual_seed(42)

<torch._C.Generator at 0x12bbed5d0>

In [33]:
df = pd.read_csv('../2. Dataset/fmnist_small.csv')


In [34]:
x = df.iloc[:, 1:].values/255.0
y = df.iloc[:,0].values

In [35]:
xtrain , xtest , ytrain , ytest = train_test_split( x , y , test_size=0.2 , random_state=20)

In [36]:
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):

  def __init__(self, features, labels):

    self.features = torch.tensor(features, dtype=torch.float32).reshape(-1,1,28,28)  # -1 lets PyTorch infer the first (batch) dimension from the data
    self.labels = torch.tensor(labels, dtype=torch.long)

  def __len__(self):

    return len(self.features)

  def __getitem__(self, idx):

    return self.features[idx], self.labels[idx]


In [37]:
train_dataset = CustomDataset(xtrain,ytrain)
test_dataset = CustomDataset(xtest,ytest)

# pin_memory=True uses page-locked host memory, which speeds up host-to-GPU copies on CUDA
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True , pin_memory=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False , pin_memory=True)  # no need to shuffle for evaluation

In [38]:
class MyNN(nn.Module):

  def __init__(self, num_channels):

    super().__init__()

    self.feature_extraction = nn.Sequential(

      nn.Conv2d(num_channels , 32, kernel_size=3, padding='same'),  # input : (1,28,28) , output : (32,28,28)
      nn.BatchNorm2d(32), # input : (32,28,28) , output : (32,28,28)
      nn.ReLU(), # input : (32,28,28) , output : (32,28,28)
      nn.MaxPool2d(kernel_size=2, stride=2), # input : (32,28,28) , output : (32,14,14)

      nn.Conv2d(32, 64, kernel_size=3, padding='same'), # input : (32,14,14) , output : (64,14,14)  [64 filters, each of shape (32 channels, 3 height, 3 width)]
      nn.BatchNorm2d(64), # input : (64,14,14) , output : (64,14,14)    [normalization applied per channel]
      nn.ReLU(), # input : (64,14,14) , output : (64,14,14)
      nn.MaxPool2d(kernel_size=2, stride=2) # input : (64,14,14) , output : (64,7,7)

    )

    self.classifier = nn.Sequential(

       nn.Flatten(),

       nn.Linear(64*7*7, 128),
       nn.ReLU(),
       nn.Dropout(p=0.4),

       nn.Linear(128, 64),
       nn.ReLU(),
       nn.Dropout(p=0.4),
       
       nn.Linear(64, 10)

    )

  def forward(self, features):

    out = self.feature_extraction(features)
    out = self.classifier(out)

    return out
  

# In PyTorch the layout is (batch_size, num_channels, height, width).
# nn.Conv2d only needs the number of input channels.

# In Keras the default layout is (batch_size, height, width, num_channels).
# There the first layer expects the complete image shape, excluding the batch size.

In [39]:
device = 'cpu'
if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    device = 'mps'
    print("MPS is available")

MPS is available


In [40]:
num_channels = train_dataset[0][0].shape[0]
model = MyNN(num_channels)

model = model.to(device) # so that the weights are also moved to the device

num_channels, height, width = train_dataset[0][0].shape
# summary expects the batch dimension in input_size; passing a tuple is recommended
summary(model , input_size = (1,num_channels, height, width) , device=device)   # pass device explicitly; otherwise it may fall back to the CPU and raise a runtime error

Layer (type:depth-idx)                   Output Shape              Param #
MyNN                                     [1, 10]                   --
├─Sequential: 1-1                        [1, 64, 7, 7]             --
│    └─Conv2d: 2-1                       [1, 32, 28, 28]           320
│    └─BatchNorm2d: 2-2                  [1, 32, 28, 28]           64
│    └─ReLU: 2-3                         [1, 32, 28, 28]           --
│    └─MaxPool2d: 2-4                    [1, 32, 14, 14]           --
│    └─Conv2d: 2-5                       [1, 64, 14, 14]           18,496
│    └─BatchNorm2d: 2-6                  [1, 64, 14, 14]           128
│    └─ReLU: 2-7                         [1, 64, 14, 14]           --
│    └─MaxPool2d: 2-8                    [1, 64, 7, 7]             --
├─Sequential: 1-2                        [1, 10]                   --
│    └─Flatten: 2-9                      [1, 3136]                 --
│    └─Linear: 2-10                      [1, 128]                  401,536
│   

In [41]:
epochs = 10
learning_rate = 0.1

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr= learning_rate)

In [42]:
for epoch in range(epochs):

  total_epoch_loss = 0

  for batch_features, batch_labels in train_loader:

    # move the batch to the GPU
    # alternative: move the full tensors to the device once in the dataset, so every batch is already on the GPU
    # .to(device) creates a copy on the GPU

    batch_features, batch_labels = batch_features.to(device), batch_labels.to(device)
 
    outputs = model(batch_features)

    loss = criterion(outputs, batch_labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    total_epoch_loss = total_epoch_loss + loss.item()

  avg_loss = total_epoch_loss/len(train_loader)
  print(f'Epoch: {epoch + 1} , Loss: {avg_loss}')




Epoch: 1 , Loss: 1.1967440501848856
Epoch: 2 , Loss: 0.7672253000736237
Epoch: 3 , Loss: 0.6758850697676341
Epoch: 4 , Loss: 0.6118074768781662
Epoch: 5 , Loss: 0.5795599378148715
Epoch: 6 , Loss: 0.5486063539981842
Epoch: 7 , Loss: 0.5056073557337125
Epoch: 8 , Loss: 0.4862733202179273
Epoch: 9 , Loss: 0.43397352352738383
Epoch: 10 , Loss: 0.409811270882686


In [43]:
model.eval()

MyNN(
  (feature_extraction): Sequential(
    (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=same)
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (4): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
    (5): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (6): ReLU()
    (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=3136, out_features=128, bias=True)
    (2): ReLU()
    (3): Dropout(p=0.4, inplace=False)
    (4): Linear(in_features=128, out_features=64, bias=True)
    (5): ReLU()
    (6): Dropout(p=0.4, inplace=False)
    (7): Linear(in_features=64, out_features=10, bias=True)
  )
)

In [44]:
total = 0
correct = 0

with torch.no_grad():

  for batch_features, batch_labels in test_loader:

    batch_features, batch_labels = batch_features.to(device), batch_labels.to(device)

    outputs = model(batch_features)

    _, predicted = torch.max(outputs, 1)
    # torch.max(input, dim) with dim=1 takes the maximum over the class scores of each row (sample)
    # it returns (max_values, max_indices); the indices are the predicted classes

    total = total + batch_labels.shape[0]

    correct = correct + (predicted == batch_labels).sum().item()

print(correct/total)


0.8483333333333334
