**What is a convolutional neural network?**

Convolutional neural networks (CNNs), also known as convnets, are a
special kind of neural network for processing data that has a known
grid-like topology, such as time series data (1D) or images (2D).

**CNN vs ANN** 

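Using a plain ANN (fully connected network) on images has two main problems:

1. Overfitting: an ANN has far more learnable parameters than a CNN for the same image input, which makes it a more complex model that overfits more easily.

2. Loss of important information: flattening the image discards the spatial arrangement of the pixels.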
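
The gap in parameter counts can be checked by hand. The sketch below assumes a 28×28 single-channel image and layer sizes picked only for illustration (128 dense units, 32 filters of size 3×3); the helper names `dense_params` and `conv_params` are hypothetical.

```python
def dense_params(in_features: int, out_features: int) -> int:
    """Weights plus one bias per output unit of a fully connected layer."""
    return in_features * out_features + out_features


def conv_params(in_channels: int, num_filters: int, f: int) -> int:
    """Each filter has f * f * in_channels weights plus one bias."""
    return num_filters * (f * f * in_channels + 1)


dense = dense_params(28 * 28, 128)  # flattened image -> 128 units
conv = conv_params(1, 32, 3)        # 32 filters of size 3x3 on 1 channel

print(dense)  # 100480
print(conv)   # 320
```

The convolution layer reuses the same small filter at every position, so its parameter count does not grow with the image size.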

**Convolution layer and Filters**

1. Edge detection (convolution operation): convolution layers perform feature extraction. Early layers extract edges, later layers combine them into shapes, and so on.

2. Each filter is a matrix with a kernel size (e.g. 3 => (3, 3)) that is convolved over an area of the image (the receptive field): values at the same position are multiplied element-wise, and the products are summed into a single output value.

3. For a single-channel image: (n, n) conv (f, f) => (n - f + 1, n - f + 1)

4. For a multi-channel image: (n, n, c) conv (f, f, c) => (n - f + 1, n - f + 1) per filter; k filters give (n - f + 1, n - f + 1, k)
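
The multiply-then-add step and the (n - f + 1) output size can be seen in a small pure-Python sketch. The function name `conv2d_valid`, the 6×6 toy image, and the vertical-edge filter are illustrative assumptions; like deep learning libraries, it slides the filter without flipping it.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    n_rows, n_cols = len(image), len(image[0])
    f = len(kernel)
    out = []
    for i in range(n_rows - f + 1):
        row = []
        for j in range(n_cols - f + 1):
            total = 0
            # multiply values at matching positions, then sum them
            for a in range(f):
                for b in range(f):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        out.append(row)
    return out


# 6x6 image: bright left half (10), dark right half (0)
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

# filter that responds to vertical edges
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

result = conv2d_valid(image, kernel)
print(len(result), len(result[0]))  # 4 4  (6 - 3 + 1 = 4)
print(result[0])                    # [0, 30, 30, 0]
```

The output is non-zero only where the window straddles the bright-to-dark boundary, which is how the filter detects the edge.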


**Padding**

1. Each convolution produces a smaller output. After many layers the image shrinks too much and information is lost.

2. Without padding, pixels near the border take part in fewer convolution windows, so edge features are underused.

3. Padding adds a border (usually zeros) around the image to resolve both issues.

4. There are two types: valid (no padding, the image shrinks) and same (padding applied so that the output size equals the input size).

5. Output image size with padding p: (n + 2p - f + 1, n + 2p - f + 1)
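
A minimal sketch of zero padding and its effect on the output size follows. The helper names `zero_pad`, `output_size`, and `same_padding` are hypothetical, and `same_padding` assumes an odd filter size with stride 1.

```python
def zero_pad(image, p):
    """Add a border of p zeros on every side of a 2D list."""
    width = len(image[0]) + 2 * p
    top = [[0] * width for _ in range(p)]
    middle = [[0] * p + list(row) + [0] * p for row in image]
    bottom = [[0] * width for _ in range(p)]
    return top + middle + bottom


def output_size(n, f, p=0):
    """Side length after a stride-1 convolution: n + 2p - f + 1."""
    return n + 2 * p - f + 1


def same_padding(f):
    """Padding that keeps output size equal to input size (odd f, stride 1)."""
    return (f - 1) // 2


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
padded = zero_pad(image, 1)
print(padded[1])                                 # [0, 1, 2, 3, 0]

print(output_size(28, 3))                        # 26  (valid)
print(output_size(28, 3, p=same_padding(3)))     # 28  (same)
```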

**Strides**

1. A stride greater than 1 moves the filter several pixels at a time. It is used when only high-level (coarse) features are needed.

2. It also reduces computation, because the output is smaller.

3. Output image size: [ floor((n + 2p - f)/s) + 1, floor((n + 2p - f)/s) + 1 ] with padding p and stride s; the division is rounded down.
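
The stride formula can be written as a one-line helper. The function name `conv_output_size` is hypothetical, and the input sizes below are arbitrary examples.

```python
def conv_output_size(n, f, p=0, s=1):
    """Side length of the output: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


print(conv_output_size(28, 3, p=1, s=1))  # 28: 'same' padding, stride 1
print(conv_output_size(28, 3, p=0, s=2))  # 13: floor(25 / 2) + 1
print(conv_output_size(7, 3, p=0, s=2))   # 3
print(conv_output_size(28, 2, p=0, s=2))  # 14: a 2x2 window with stride 2 halves the size
```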

**Problem with Convolution**

Example: a batch of 32 images of shape (228, 228, 3), convolved with 100 filters of shape (3, 3, 3), produces an output of shape (226, 226, 100) per image. At 4 bytes per value, the whole batch takes about 623 MB (total values * 4 / 1024^2).

1. **Memory issue**: the example above shows how much memory the output of a single convolution layer can take.

2. **Translation variance**: features become location dependent.
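
The arithmetic behind the memory estimate can be reproduced directly. The variable names are illustrative, and 4 bytes per value assumes 32-bit floats.

```python
batch, n, f, filters, bytes_per_value = 32, 228, 3, 100, 4

out_side = n - f + 1                           # 226 (no padding, stride 1)
values = batch * out_side * out_side * filters
size_mb = values * bytes_per_value / 1024 ** 2

print(out_side)           # 226
print(values)             # 163443200
print(round(size_mb, 1))  # 623.5
```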


**Pooling** 

1. Pooling downsamples the image (using max pooling, min pooling, etc.).

2. It reduces the size and adds translation invariance (check slides).

3. Max pooling also enhances features, since only the strongest activation in each window is kept.

4. It needs no training, as it has no trainable parameters.

5. Avoid pooling in tasks where the exact object location matters.
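
A small pure-Python sketch of 2×2 max pooling with stride 2 shows both the downsampling and the tolerance to small shifts. The function name `max_pool` and the toy feature maps are illustrative assumptions.

```python
def max_pool(image, size=2, stride=2):
    """Keep the maximum of each size x size window."""
    out = []
    for i in range(0, len(image) - size + 1, stride):
        row = []
        for j in range(0, len(image[0]) - size + 1, stride):
            window = [image[i + a][j + b]
                      for a in range(size) for b in range(size)]
            row.append(max(window))
        out.append(row)
    return out


feature_map = [
    [1, 3, 2, 0],
    [5, 6, 1, 2],
    [0, 2, 9, 4],
    [1, 1, 3, 7],
]
pooled = max_pool(feature_map)
print(pooled)  # [[6, 2], [2, 9]]

# The same activation at (0, 0) or shifted to (1, 1) pools to the same output
a = [[9, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
b = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(max_pool(a) == max_pool(b))  # True
```

The invariance only holds for shifts that stay inside one pooling window, which is also why the exact location is lost.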

**CNN Architecture**

| Layer               | Parameters                 | Input Shape  | Output Shape | Notes                                 |
| ------------------- | -------------------------- | ------------ | ------------ | ------------------------------------- |
| **Input**           | —                          | (3, 28, 28)  | (3, 28, 28)  | Image with 3 RGB channels                        |
| **Conv Layer 1**    | 32 filters, 3×3, padding=1 | (3, 28, 28)  | (32, 28, 28) | Filters automatically span 3 channels |
| **MaxPool 1**       | 2×2, stride=2              | (32, 28, 28) | (32, 14, 14) | Reduces spatial size                  |
| **Conv Layer 2**    | 64 filters, 3×3, padding=1 | (32, 14, 14) | (64, 14, 14) | Filters span 32 channels              |
| **MaxPool 2**       | 2×2, stride=2              | (64, 14, 14) | (64, 7, 7)   | Still 3D tensor                       |
| **Flatten**         | —                          | (64, 7, 7)   | (3136,)      | 64×7×7 = 3136                         |
| **Hidden FC Layer** | 64 neurons                 | (3136,)      | (64,)        | Fully connected                       |
| **Output Layer**    | 10 neurons                 | (64,)        | (10,)        | Class scores                          |
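
The table can be sketched in PyTorch roughly as follows. The ReLU activations are an assumption (the table does not list activations), and the random batch `x` is a stand-in for real images.

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),   # (3, 28, 28)  -> (32, 28, 28)
    nn.ReLU(),                                    # assumed activation
    nn.MaxPool2d(kernel_size=2, stride=2),        # -> (32, 14, 14)
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # -> (64, 14, 14)
    nn.ReLU(),                                    # assumed activation
    nn.MaxPool2d(kernel_size=2, stride=2),        # -> (64, 7, 7)
    nn.Flatten(),                                 # -> (3136,)
    nn.Linear(64 * 7 * 7, 64),                    # hidden FC layer
    nn.ReLU(),                                    # assumed activation
    nn.Linear(64, 10),                            # class scores
)

x = torch.randn(8, 3, 28, 28)  # dummy batch of 8 RGB images
out = cnn(x)
print(out.shape)  # torch.Size([8, 10])
```

Most of the parameters sit in the first fully connected layer (3136 × 64 weights), not in the convolution layers.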


**BackPropagation in CNN**

1. The architecture can be seen as two parts.

2. The first part includes operations such as convolution, pooling, and flattening.

3. The second part consists of operations such as matrix multiplication, as in an ANN.

4. Check the slides to see how the gradients are calculated.
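
In practice PyTorch's autograd computes these gradients through both parts. The sketch below only illustrates that; it does not derive the gradients from the slides. The layer sizes, random input, and targets are arbitrary assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

conv = nn.Conv2d(1, 4, kernel_size=3, padding=1)
pool = nn.MaxPool2d(2)
fc = nn.Linear(4 * 4 * 4, 3)

x = torch.randn(2, 1, 8, 8)    # dummy batch of 2 single-channel images
target = torch.tensor([0, 2])  # dummy class labels

# First part: convolution, pooling, flattening
features = torch.flatten(pool(torch.relu(conv(x))), start_dim=1)
# Second part: matrix multiplication, as in an ANN
logits = fc(features)

loss = nn.functional.cross_entropy(logits, target)
loss.backward()

print(conv.weight.grad.shape)        # torch.Size([4, 1, 3, 3])
print(fc.weight.grad.shape)          # torch.Size([3, 64])
print(len(list(pool.parameters())))  # 0, pooling has nothing to train
```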

**CODE**

```python
import pandas as pd
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
from torchinfo import summary
```

```python
torch.manual_seed(42)  # fix the random seed for reproducibility
```

```python
df = pd.read_csv('../2. Dataset/fmnist_small.csv')
```


```python
x = df.iloc[:, 1:] / 255.0  # pixel columns, scaled to [0, 1]
y = df.iloc[:, 0]           # first column holds the class label
```

```python
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=20)
```

```python
xtrain_tensor = torch.from_numpy(xtrain.values).float()
xtest_tensor = torch.from_numpy(xtest.values).float()
ytrain_tensor = torch.from_numpy(ytrain.values)
ytest_tensor = torch.from_numpy(ytest.values)
```
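
These tensors keep each image as a flat row of pixels, which suits a fully connected network. A convolution layer expects a (batch, channels, height, width) layout instead. A minimal sketch of that reshape, assuming each row holds 28 × 28 = 784 grayscale pixels and using a random tensor named `flat` as a stand-in for the real data:

```python
import torch

flat = torch.rand(5, 784)          # stand-in for 5 flattened images
images = flat.view(-1, 1, 28, 28)  # (batch, channels, height, width)

print(images.shape)  # torch.Size([5, 1, 28, 28])
```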

```python
from torch.utils.data import Dataset, DataLoader


class CustomDataset(Dataset):
    """Wraps feature and label tensors so a DataLoader can batch them."""

    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]
```


```python
train_dataset = CustomDataset(xtrain_tensor, ytrain_tensor)
test_dataset = CustomDataset(xtest_tensor, ytest_tensor)

# pin_memory=True helps with faster copies to the GPU
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, pin_memory=True)
# No need to shuffle the test set: evaluation does not depend on the order
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False, pin_memory=True)
```

```python
class MyNN(nn.Module):
    """Fully connected network: num_features -> 128 -> 64 -> 10 class scores."""

    def __init__(self, num_features):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(num_features, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )

    def forward(self, features):
        out = self.network(features)
        return out
```

```python
device = 'cpu'
# Use the Apple MPS backend when this PyTorch build supports it
if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    device = 'mps'
    print("MPS is available")
```

MPS is available


```python
model = MyNN(xtrain_tensor.shape[1])
model = model.to(device)  # so that the weights also move to the device

# Pass device explicitly; otherwise torchinfo may pick a different device,
# which can lead to a runtime error.
summary(model, input_size=xtrain_tensor.shape, device=device)
```

```
Layer (type:depth-idx)                   Output Shape              Param #
MyNN                                     [4800, 10]                --
├─Sequential: 1-1                        [4800, 10]                --
│    └─Linear: 2-1                       [4800, 128]               100,480
│    └─ReLU: 2-2                         [4800, 128]               --
│    └─Linear: 2-3                       [4800, 64]                8,256
│    └─ReLU: 2-4                         [4800, 64]                --
│    └─Linear: 2-5                       [4800, 10]                650
Total params: 109,386
Trainable params: 109,386
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 525.05
Input size (MB): 15.05
Forward/backward pass size (MB): 7.76
Params size (MB): 0.44
Estimated Total Size (MB): 23.25
```

```python
epochs = 10
learning_rate = 0.1

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
```

```python
for epoch in range(epochs):
    total_epoch_loss = 0

    for batch_features, batch_labels in train_loader:
        # Move the batch to the GPU; .to(device) creates a copy on the device.
        # Alternative: move the full tensors to the device once, before building
        # the dataset, so that every batch is already on the GPU.
        batch_features, batch_labels = batch_features.to(device), batch_labels.to(device)

        outputs = model(batch_features)
        loss = criterion(outputs, batch_labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_epoch_loss = total_epoch_loss + loss.item()

    avg_loss = total_epoch_loss / len(train_loader)
    print(f'Epoch: {epoch + 1} , Loss: {avg_loss}')
```


```python
model.eval()  # switch the model to evaluation mode
```

```python
total = 0
correct = 0

with torch.no_grad():
    for batch_features, batch_labels in test_loader:
        batch_features, batch_labels = batch_features.to(device), batch_labels.to(device)

        outputs = model(batch_features)

        # torch.max(input, dim) returns (max_values, max_indices);
        # dim=1 takes the maximum over the class scores of each row
        _, predicted = torch.max(outputs, 1)

        total = total + batch_labels.shape[0]
        correct = correct + (predicted == batch_labels).sum().item()

print(correct / total)
```
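
The prediction step can be checked on a tiny hand-made example. The scores and labels below are made up for illustration.

```python
import torch

# 3 samples, 3 class scores each (made-up values)
outputs = torch.tensor([[0.1, 2.0, -1.0],
                        [1.5, 0.2, 0.3],
                        [0.0, 0.1, 0.9]])
labels = torch.tensor([1, 0, 1])

max_values, predicted = torch.max(outputs, dim=1)
correct = (predicted == labels).sum().item()
accuracy = correct / labels.shape[0]

print(predicted)  # tensor([1, 0, 2])
print(correct)    # 2
```

Here the third sample is misclassified, so the accuracy is 2/3.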
