# Mastering PyTorch (Ashish Ranjan Jha)

## Deep CNN Architectures

### Why are CNNs so powerful?
CNNs are among the most powerful machine learning models for solving challenging problems such as image classification, object detection, object segmentation, video processing, natural language processing, and speech recognition.

A few reasons for this:
- Weight sharing: The same set of weights (a kernel) is applied at every position of the input, so a feature can be detected anywhere without learning separate parameters for each location. This makes CNNs parameter-efficient.
- Automatic feature extraction: Multiple feature extraction stages help a CNN automatically learn feature representations from a dataset.
- Hierarchical learning: The multi-layered CNN structure helps CNNs learn low-, mid-, and high-level features.
- Spatial and temporal correlations: CNNs can exploit both spatial and temporal correlations in the data, such as in video-processing tasks.
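To make the weight-sharing point concrete, the following sketch compares the number of parameters in a convolutional layer with that of a fully connected layer mapping the same input to an output of the same size. The 3x16x16 input and the 8 output feature maps are arbitrary sizes chosen for illustration.

```python
import torch.nn as nn

def count_params(module: nn.Module) -> int:
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())

# Illustrative sizes (assumptions): a 3x16x16 RGB input mapped to 8 feature maps of 16x16.
in_ch, out_ch, h, w = 3, 8, 16, 16

# A 3x3 convolution reuses one small kernel per (input, output) channel pair at every position.
conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

# A fully connected layer with an output of the same size needs one weight per input-output pair.
fc = nn.Linear(in_ch * h * w, out_ch * h * w)

print(count_params(conv))  # 8*3*3*3 weights + 8 biases = 224
print(count_params(fc))    # 768*2048 weights + 2048 biases = 1574912
```

The convolution's parameter count does not depend on the image size at all, whereas the fully connected layer grows with the product of input and output sizes.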

Besides these fundamental characteristics, CNNs have advanced over the years thanks to improvements in the following areas:
- Activation and loss functions: for example, using ReLU to mitigate the vanishing gradient problem.
- Parameter optimization: for example, using an optimizer based on Adaptive Moment Estimation (Adam) instead of plain stochastic gradient descent.
- Regularization: applying dropout and batch normalization in addition to L2 regularization.
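A minimal sketch of how these improvements fit together in PyTorch: ReLU activations, batch normalization and dropout inside the model, and the Adam optimizer, whose `weight_decay` argument supplies the L2 penalty. The layer sizes, learning rate, decay value, and random batch are illustrative assumptions, not values from the book.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small CNN for 3x32x32 inputs and 10 classes (sizes chosen only for illustration).
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),          # normalizes each channel using batch statistics
    nn.ReLU(),                   # gradient is 1 for positive inputs, easing vanishing gradients
    nn.MaxPool2d(2),             # 32x32 -> 16x16
    nn.Flatten(),
    nn.Dropout(p=0.5),           # randomly zeroes activations, only in training mode
    nn.Linear(16 * 16 * 16, 10),
)

# Adam's weight_decay adds weight_decay * param to each gradient, i.e. an L2 penalty.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()

# One training step on a random batch (a placeholder for real data).
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))

model.train()
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
print(f"training loss: {loss.item():.4f}")
```

PyTorch also provides `torch.optim.AdamW`, which applies weight decay directly to the parameters (decoupled) instead of adding it to the gradient.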

**NOTE:** Features are the representations of input data that the model generates with its parameters; in a CNN, these are the feature maps output by its layers.
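One way to inspect these features in PyTorch is to capture a layer's output with a forward hook. The sketch below uses a hypothetical two-stage network with arbitrary sizes and records the feature maps produced by each convolutional layer; the second stage yields more channels on a smaller spatial grid.

```python
import torch
import torch.nn as nn

# Hypothetical two-stage CNN; sizes are illustrative only.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # stage 1
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # stage 2
)

features = {}

def save_output(name):
    # A forward hook is called as hook(module, inputs, output); we keep the output tensor.
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

# Register a hook on each convolutional layer (indices 0 and 3 in the Sequential).
handles = [
    net[0].register_forward_hook(save_output("conv1")),
    net[3].register_forward_hook(save_output("conv2")),
]

with torch.no_grad():
    net(torch.randn(1, 3, 32, 32))

for h in handles:
    h.remove()  # remove the hooks once the features have been captured

for name, fmap in features.items():
    print(name, tuple(fmap.shape))
# conv1 (1, 8, 32, 32)
# conv2 (1, 16, 16, 16)
```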

The architectural innovations in CNNs can be grouped into the following categories:
- Spatial exploration-based CNNs: The idea behind spatial exploration is using different kernel sizes in order to explore different levels of visual features in input data.
- Depth-based CNNs: The depth here refers to the depth of the neural network, that is, the number of layers. So, the idea here is to create a CNN model with multiple convolutional layers in order to extract highly complex visual features.
- Width-based CNNs: Width refers to the number of channels or feature maps in the data or features extracted from the data. So, width-based CNNs are all about increasing the number of feature maps as we go from the input to the output layers.
- Multi-path-based CNNs: The preceding three types of architectures are monotonic in their connections between layers; that is, direct connections exist only between consecutive layers. Multi-path CNNs introduced shortcut (skip) connections between non-consecutive layers.
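Two of these ideas can be sketched as small PyTorch modules: a block that applies several kernel sizes in parallel to the same input (spatial exploration, in the spirit of Inception-style modules) and a block whose input is added back to its output through a skip connection (multi-path, in the spirit of ResNet). Both are simplified illustrations with arbitrary channel counts, not the exact blocks from those architectures.

```python
import torch
import torch.nn as nn

class MultiKernelBlock(nn.Module):
    """Spatial exploration: parallel 1x1, 3x3 and 5x5 convolutions, concatenated by channel."""
    def __init__(self, in_ch, branch_ch):
        super().__init__()
        # Padding k // 2 keeps the spatial size unchanged for odd kernel sizes k.
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, branch_ch, kernel_size=k, padding=k // 2) for k in (1, 3, 5)]
        )

    def forward(self, x):
        return torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)


class ResidualBlock(nn.Module):
    """Multi-path: two 3x3 convolutions plus an identity shortcut around them."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)

    def forward(self, x):
        out = torch.relu(self.conv1(x))
        out = self.conv2(out)
        return torch.relu(out + x)  # skip connection: add the block's input to its output


x = torch.randn(2, 16, 28, 28)
y = MultiKernelBlock(16, 8)(x)   # 3 branches x 8 channels -> 24 channels
z = ResidualBlock(24)(y)         # shape is preserved so the input can be added back
print(y.shape, z.shape)
```

The residual block keeps channels and spatial size unchanged so that the element-wise addition is valid; real architectures use a projection (for example, a 1x1 convolution) on the shortcut when the shapes differ.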

### Fine-tuning the AlexNet model

![AlexNet](https://neurohive.io/wp-content/uploads/2018/10/AlexNet-1.png)

```python
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        # Convolutional feature extractor with the channel widths of the original
        # AlexNet paper (96, 256, 384, 384, 256). It expects 227x227 RGB inputs;
        # comments show the resulting spatial size after each size-changing layer.
        self.feats = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4, padding=0),    # 227 -> 55
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                                             # 55 -> 27 (overlapping pooling)
            nn.Conv2d(in_channels=96, out_channels=256, kernel_size=5, stride=1, padding=2),   # 27 -> 27
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                                             # 27 -> 13
            nn.Conv2d(in_channels=256, out_channels=384, kernel_size=3, stride=1, padding=1),  # 13 -> 13
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=384, out_channels=384, kernel_size=3, stride=1, padding=1),  # 13 -> 13
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=384, out_channels=256, kernel_size=3, stride=1, padding=1),  # 13 -> 13
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                                             # 13 -> 6
        )
        # Fully connected classifier on the flattened 256 x 6 x 6 feature maps.
        self.clf = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, inp):
        op = self.feats(inp)        # (N, 256, 6, 6) for 227x227 inputs
        op = torch.flatten(op, 1)   # (N, 9216): flatten everything except the batch dimension
        op = self.clf(op)           # (N, num_classes) raw class scores (logits)
        return op
```
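The `256 * 6 * 6` input size of the first linear layer follows from the output-size formula for convolution and (default, floor-mode) max-pooling layers, `floor((n + 2p - k) / s) + 1` for input size `n`, kernel size `k`, stride `s`, and padding `p`. The plain-Python sketch below (the helper names are hypothetical) traces the spatial size through `self.feats`: a 227x227 input ends at 6x6, while the commonly used 224x224 input ends at 5x5, which would not match the classifier.

```python
def out_size(n, k, s=1, p=0):
    """Spatial output size of a Conv2d/MaxPool2d layer (floor division, no dilation)."""
    return (n + 2 * p - k) // s + 1

# (kernel_size, stride, padding) for each size-changing layer in self.feats, in order.
FEATS_LAYERS = [
    (11, 4, 0),  # conv1
    (3, 2, 0),   # max-pool
    (5, 1, 2),   # conv2
    (3, 2, 0),   # max-pool
    (3, 1, 1),   # conv3
    (3, 1, 1),   # conv4
    (3, 1, 1),   # conv5
    (3, 2, 0),   # max-pool
]

def trace(n):
    """Spatial size of the input and after every layer in FEATS_LAYERS."""
    sizes = [n]
    for k, s, p in FEATS_LAYERS:
        sizes.append(out_size(sizes[-1], k, s, p))
    return sizes

print(trace(227))  # [227, 55, 27, 27, 13, 13, 13, 13, 6]
print(trace(224))  # [224, 54, 26, 26, 12, 12, 12, 12, 5]
```

torchvision's own AlexNet avoids this input-size dependence by placing an `nn.AdaptiveAvgPool2d((6, 6))` layer before its classifier.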


Besides defining the model architecture and training it ourselves, we can use the `models` sub-package of PyTorch's torchvision package, which contains definitions of CNN models for different tasks, such as image classification, semantic segmentation, and object detection, many of them with optional pretrained weights. The image classification models it provides include:

- AlexNet
- VGG
- ResNet
- SqueezeNet
- DenseNet
- Inception v3
- GoogLeNet
- ShuffleNet v2
- MobileNet v2
- ResNeXt
- Wide ResNet
- MnasNet
- EfficientNet