# Introduction to CNNs
Convolutional Neural Networks (CNNs) are a type of deep neural network specifically designed for image processing tasks like classification, detection, and segmentation. CNNs operate on grid-structured data, such as the pixels of an image, to capture spatial patterns.

# Introduction to Convolutional Layers

A Convolutional Layer is the fundamental building block of a Convolutional Neural Network (CNN). It is designed to automatically and adaptively learn spatial relationships of features from input images. This is achieved through the process of convolution, where a small filter or kernel is applied to an input image to produce a feature map (or a convolutional output).

### Some typical layers found in a CNN:
1. Convolutional Layer: Captures spatial features.
2. ReLU Activation: Introduces non-linearity.
3. Pooling Layer: Reduces dimensionality of convolution outputs, retaining important information.
4. Fully Connected Layer: Makes final predictions. Usually found after all convolution layers.
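
As a small illustration of the pooling layer listed above, the sketch below applies 2x2 max pooling with stride 2 to a hand-written 4x4 input. The input values are made up for illustration; each output value is the maximum of one non-overlapping 2x2 block, so the height and width are halved.

```python
import torch
import torch.nn as nn

# 2x2 max pooling with stride 2: keeps the largest value in each 2x2 block
pool = nn.MaxPool2d(kernel_size=2, stride=2)

# Illustrative 4x4 input with batch and channel dimensions added (1x1x4x4)
x = torch.tensor([
    [1.0, 2.0, 5.0, 6.0],
    [3.0, 4.0, 7.0, 8.0],
    [9.0, 10.0, 13.0, 14.0],
    [11.0, 12.0, 15.0, 16.0]
]).reshape(1, 1, 4, 4)

pooled = pool(x)
print("Input shape:", tuple(x.shape))        # (1, 1, 4, 4)
print("Pooled shape:", tuple(pooled.shape))  # (1, 1, 2, 2)
print(pooled.squeeze())                      # [[4., 8.], [12., 16.]]
```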

# Convolution Layer Parameters (Hyperparameters):

1. Input Image: The input is a multi-dimensional array (usually a 2D image). For grayscale images, the input is a 2D matrix, but for color images, it's 3D (height, width, and color channels like RGB).
2. Filter (Kernel): A small matrix that slides (convolves) over the input image. This filter detects specific features like edges, textures, and patterns in the image.
3. Stride: The step size by which the filter moves across the input image. Larger strides result in smaller feature maps, reducing dimensionality.
4. Padding: Extra pixels added around the border of the input to control the size of the output feature map. Zero-padding adds pixels with value zero; with the right amount (for example, a padding of 1 for a 3x3 filter with stride 1), the output keeps the same height and width as the input.
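
The effect of filter size, stride, and padding on the output size can be written as `out = floor((n + 2p - k) / s) + 1`, where `n` is the input size, `k` the filter size, `s` the stride, and `p` the padding along one dimension. The helper below is a minimal sketch of that formula. The function name `conv_output_size` is made up for illustration, and it assumes square filters with no dilation.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension of a convolution (no dilation)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# A few settings applied to a 28-pixel-wide input
for kernel_size, stride, padding in [(3, 1, 0), (3, 1, 1), (3, 2, 1), (5, 2, 0)]:
    size = conv_output_size(28, kernel_size, stride, padding)
    print(f"kernel={kernel_size}, stride={stride}, padding={padding} -> output size {size}")
```

With a 3x3 filter, stride 1, and padding 1 the output stays at 28, while a stride of 2 roughly halves it.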

# How Does Convolution Work?
Imagine an input image as a 5x5 matrix and a 3x3 filter (kernel). The convolution operation multiplies the filter's values with the corresponding image region and sums them up to produce a single value in the feature map. The filter "slides" over the image to produce a new output image (feature map).
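
The sliding-window computation described above can be sketched with two explicit loops. This is an illustrative version assuming stride 1 and no padding; the helper name `conv2d_manual` and the input values are made up. Like `nn.Conv2d`, it does not flip the kernel, so it is strictly a cross-correlation.

```python
import torch
import torch.nn.functional as F

def conv2d_manual(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = torch.zeros(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            region = image[i:i + kh, j:j + kw]      # image patch under the filter
            output[i, j] = (region * kernel).sum()  # multiply element-wise, then sum
    return output

image = torch.arange(25, dtype=torch.float32).reshape(5, 5)  # values 0..24
kernel = torch.ones(3, 3)                                     # sums each 3x3 patch

feature_map = conv2d_manual(image, kernel)
print(feature_map)

# Cross-check against PyTorch's built-in convolution
reference = F.conv2d(image.reshape(1, 1, 5, 5), kernel.reshape(1, 1, 3, 3)).squeeze()
print("Matches F.conv2d:", torch.allclose(feature_map, reference))
```

A 5x5 input and a 3x3 filter give a 3x3 feature map, because the filter fits in three positions along each dimension.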

### ReLU Activation:
After the convolution operation, the output usually goes through a ReLU (Rectified Linear Unit) activation function. This function applies the rule ```ReLU(x)=max(0,x)```, making all negative values zero. This introduces non-linearity, which helps in learning complex patterns.
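
A quick sketch of the rule on a small tensor (the values are made up for illustration):

```python
import torch
import torch.nn as nn

relu = nn.ReLU()

# Illustrative feature map containing both negative and positive values
feature_map = torch.tensor([[-2.0, 0.5],
                            [3.0, -0.1]])

activated = relu(feature_map)
print(activated)  # negative entries become 0, positive entries are unchanged
```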

# Example in PyTorch:
The code below applies a single convolutional layer, with a hand-set 3x3 kernel, to a 5x5 input:

In [None]:
# Example of a Convolution Operation

import torch
import torch.nn as nn

# Define a 5x5 input (1 channel image, or grayscale)
input_image = torch.tensor([
    [1.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 1.0]
]).unsqueeze(0).unsqueeze(0)  # Add batch and channel dimensions (1x1x5x5)

# Define a 3x3 kernel
kernel = torch.tensor([
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 1.0, 0.0]
]).unsqueeze(0).unsqueeze(0)  # Add dimensions to match the input (1x1x3x3)

# Create a 2D convolution layer with the kernel (also called filter) as weights
conv_layer = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=0, bias=False)
# Here we explicitly set the filter weights so the result is easy to check by hand. This line is not needed in a real CNN, where the weights are learned during training
conv_layer.weight = nn.Parameter(kernel)  # Manually set the kernel weights

# Apply the convolution to the input image
output_image = conv_layer(input_image)

print("Input Image:\n", input_image.squeeze().numpy())
print("Kernel (Filter):\n", kernel.squeeze().numpy())
print("Output Feature Map after Convolution:\n", output_image.squeeze().detach().numpy())


Input Image:
 [[1. 0. 1. 0. 1.]
 [0. 1. 0. 1. 0.]
 [1. 0. 1. 0. 1.]
 [0. 1. 0. 1. 0.]
 [1. 0. 1. 0. 1.]]
Kernel (Filter):
 [[0. 1. 0.]
 [1. 1. 1.]
 [0. 1. 0.]]
Output Feature Map after Convolution:
 [[1. 4. 1.]
 [4. 1. 4.]
 [1. 4. 1.]]


# Benefits of Convolution:
1. Parameter Sharing: Unlike fully connected layers where each neuron has its own weights, convolutional layers use the same filter across the entire image, significantly reducing the number of parameters.
2. Local Connectivity: Each filter is applied to a small local region of the input, making it efficient in capturing spatial characteristics, such as edges, textures, and patterns.
3. Hierarchical Features: Multiple convolutional layers are stacked in a CNN to capture different levels of information, from low-level features like edges in the first layers to high-level representations like full objects in the deeper layers.
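
To make the parameter-sharing point concrete, the sketch below counts the learnable parameters of a 3x3 convolution and of a fully connected layer that both map a 28x28 single-channel image to a 28x28 output. The layer sizes and the helper name `count_parameters` are chosen here for illustration.

```python
import torch.nn as nn

def count_parameters(layer):
    """Total number of learnable values (weights and biases) in a layer."""
    return sum(p.numel() for p in layer.parameters())

# One 3x3 filter shared across every position of the image
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)

# A fully connected layer mapping 784 input pixels to 784 outputs
fc = nn.Linear(28 * 28, 28 * 28)

print("Conv2d parameters:", count_parameters(conv))  # 3*3 weights + 1 bias = 10
print("Linear parameters:", count_parameters(fc))    # 784*784 weights + 784 biases = 615440
```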

An important property of a CNN is the "receptive field": the portion of the input image that a unit in a given layer can see. Units in the first layer see only a small, local patch of the image, equal to the filter size. Deeper layers see progressively larger regions, because each successive convolution combines outputs that have already seen part of the input. Try to visualize this yourself. Hint: each output value of the first layer depends on a patch of the input the size of the first filter. The second layer applies its filter to a neighborhood of those outputs, so each of its output values depends on a larger patch of the input, determined by the filter sizes (and strides) of both layers.
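
The growth of the receptive field can also be worked out numerically. The sketch below uses the standard recurrence `rf_out = rf_in + (k - 1) * jump` and `jump_out = jump_in * stride`, where `jump` is the distance in input pixels between neighboring units of a layer. The helper name `receptive_field` and the layer lists are illustrative assumptions, and dilation is not handled.

```python
def receptive_field(layers):
    """Receptive field size after each layer.

    `layers` is a list of (kernel_size, stride) tuples in input-to-output order.
    """
    rf, jump = 1, 1  # a single input pixel sees itself; neighbors are 1 pixel apart
    sizes = []
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump  # each extra filter tap reaches `jump` pixels further
        jump *= stride                  # striding spreads neighboring units apart
        sizes.append(rf)
    return sizes

# Two stacked 3x3 convolutions with stride 1
print(receptive_field([(3, 1), (3, 1)]))  # [3, 5]

# 3x3 conv -> 2x2 pool (stride 2) -> 3x3 conv -> 2x2 pool (stride 2)
print(receptive_field([(3, 1), (2, 2), (3, 1), (2, 2)]))  # [3, 4, 8, 10]
```

Two stacked 3x3 filters already cover a 5x5 patch of the input, and pooling with stride 2 makes the region grow faster in the layers that follow.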

# Implementation in PyTorch

In [None]:
# Import required libraries
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# 1. Load and preprocess data (an example dataset called MNIST)
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])

trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)

# 2. Define the CNN model
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        # Convolutional layer 1
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1) # Single input channel and 32 output channels
        # Pooling layer
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # Convolutional layer 2
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1) # 32 input channels and 64 output channels
        # Fully connected layers (or dense layers)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)
        # Activation function
        self.relu = nn.ReLU()

    def forward(self, x):
        # Convolution -> ReLU -> Pooling
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        # Flatten the tensor before the fully connected layer
        x = x.view(-1, 64 * 7 * 7)
        # Fully connected layer -> ReLU -> Fully connected layer
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# 3. Instantiate the model, loss function, and optimizer
model = SimpleCNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# 4. Train the model
def train_model():
    for epoch in range(5):  # Loop over the dataset 5 times
        running_loss = 0.0
        for inputs, labels in trainloader:
            optimizer.zero_grad()  # Zero the parameter gradients
            outputs = model(inputs)  # Forward pass
            loss = criterion(outputs, labels)  # Compute the loss
            loss.backward()  # Backward pass
            optimizer.step()  # Optimize

            running_loss += loss.item()
        print(f"Epoch {epoch + 1}, Loss: {running_loss / len(trainloader)}")
    print("Training complete.")

train_model()

# 5. Test the model
def test_model():
    correct = 0
    total = 0
    with torch.no_grad():  # No need to track gradients during testing
        for inputs, labels in testloader:
            outputs = model(inputs)  # Forward pass
            _, predicted = torch.max(outputs, 1)  # Get predicted label
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print(f"Accuracy on test set: {100 * correct / total}%")

test_model()
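
To see why the first fully connected layer above expects `64 * 7 * 7` inputs, it can help to trace the tensor shapes through the convolution and pooling stages. The sketch below rebuilds the same layers outside the class and pushes a random batch through them. The batch size of 4 and the random input are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Same layer settings as SimpleCNN, defined standalone for shape tracing
conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1)
conv2 = nn.Conv2d(32, 64, 3, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2)
relu = nn.ReLU()

x = torch.randn(4, 1, 28, 28)  # random stand-in for a batch of 4 MNIST images
print("Input:        ", tuple(x.shape))

stage1 = pool(relu(conv1(x)))        # padding=1 keeps 28x28, pooling halves it to 14x14
print("After stage 1:", tuple(stage1.shape))

stage2 = pool(relu(conv2(stage1)))   # padding=1 keeps 14x14, pooling halves it to 7x7
print("After stage 2:", tuple(stage2.shape))

flat = stage2.view(-1, 64 * 7 * 7)   # 64 channels * 7 * 7 = 3136 features per image
print("Flattened:    ", tuple(flat.shape))
```

If the input size, padding, or number of pooling layers changes, the `64 * 7 * 7` value in the model has to change with it.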


In the above example, we used a pre-existing dataset that is offered by PyTorch. However, we can also use our own data by wrapping it in a custom dataset class, as discussed in the previous tutorial notebook, and passing that dataset to a ```DataLoader```. The dataset class used previously is shown again below. Feel free to modify it based on your requirements.

In [None]:
# An example of a custom image dataset (wrap it in a DataLoader to get batches)
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

# This class inherits from PyTorch's Dataset base class
class CustomDataset(Dataset):
    def __init__(self, img_list, labels, transform=transforms.ToTensor()):
        self.img_list = img_list # Input image paths
        self.labels = labels # Corresponding ground truth labels
        self.transform = transform # Preprocessing transform

    def __len__(self):
        return len(self.img_list) # Returns the total number of training samples

    def __getitem__(self, idx):
        x = Image.open(self.img_list[idx]) # Opening the image from its path
        if x.mode != 'RGB':
            x = x.convert('RGB') # Convert the image to RGB if it's not
        x = self.transform(x) # Apply the transform
        label = self.labels[idx] # Get the label for the image
        return x, label

# Some Useful Tips for the Assignments

1. Since Assignment 1 involves training a DCGAN, you will need to use a network that contains only convolutional layers for the generator. For this, all layers should be of type ```nn.Conv2d()```. Feel free to experiment with different activation functions like  ```nn.ReLU()``` or ```nn.Tanh()```.
2. After defining a class for the generator, you will have to define another class for the discriminator, which takes an image (real or generated) as input and outputs a prediction of whether that input is generated or comes from your dataset. Since the discriminator predicts a class, the final layers must be of type ```nn.Linear()```. Before adding the linear or fully connected layers, you may want to pool the convolutional outputs using ```nn.AvgPool2d()``` to bring them down to a dimension that is manageable as input to a linear layer.
3. The final activation function for the discriminator should be a ```sigmoid``` function so that the output is restricted to 0 to 1. The loss function for training the discriminator could be the binary cross entropy (BCE) loss which can be implemented using ```nn.BCELoss()```.
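4. Write a training loop that trains both the generator and the discriminator. Which network is updated in an iteration is determined by which optimizer's ```step()``` you call, so give each network its own optimizer. Note that ```model.train()``` and ```model.eval()``` do not select the parameters to update; they switch layers such as dropout and batch normalization between training and evaluation behavior. Call ```model.train()``` before training and ```model.eval()``` before generating samples for evaluation.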
5. **Important:** It is quite likely that the GAN you train will not converge properly at first, so please be patient and train the models multiple times with different learning rates. Once you find a model that is trained relatively well, you can save it by calling the following function in the training loop after some fixed number of epochs.

In [None]:
import os

import torch  # Needed for torch.save and torch.load below

# Function to save both generator and discriminator models
def save_models(generator, discriminator, epoch, folder='saved_models'):
    os.makedirs(folder, exist_ok=True)  # Create the folder if it does not exist yet

    torch.save(generator.state_dict(), f"{folder}/generator_epoch_{epoch}.pth")
    torch.save(discriminator.state_dict(), f"{folder}/discriminator_epoch_{epoch}.pth")
    print(f"Models saved at epoch {epoch}.")

# Example of how to save the models at a particular epoch of training
save_models(generator, discriminator, epoch=10) # This line should be used in the training loop

# Function to load both generator and discriminator models
def load_models(generator, discriminator, epoch, folder='saved_models'):
    generator.load_state_dict(torch.load(f"{folder}/generator_epoch_{epoch}.pth"))
    discriminator.load_state_dict(torch.load(f"{folder}/discriminator_epoch_{epoch}.pth"))
    print(f"Models loaded from epoch {epoch}.")

# Example of loading the model in case you want to test its generation ability
load_models(generator, discriminator, epoch=10)
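
Tips 2 to 4 above can be sketched roughly as follows. This is only an illustrative outline, not the required assignment solution: the class name `TinyDiscriminator`, the layer sizes, the 28x28 single-channel input, and the random tensors standing in for real and generated images are all assumptions.

```python
import torch
import torch.nn as nn

class TinyDiscriminator(nn.Module):  # hypothetical name and layer sizes
    def __init__(self, in_channels=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=2, padding=1),  # 28x28 -> 14x14
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),           # 14x14 -> 7x7
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=7),                                     # 7x7 -> 1x1
        )
        # Linear layer followed by a sigmoid so the output lies between 0 and 1
        self.classifier = nn.Sequential(nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

discriminator = TinyDiscriminator()
criterion = nn.BCELoss()
optimizer_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

real_images = torch.randn(8, 1, 28, 28)  # stand-in for a batch from your dataset
fake_images = torch.randn(8, 1, 28, 28)  # stand-in for generator output

# One discriminator update: real images are labeled 1, generated images 0
discriminator.train()
optimizer_d.zero_grad()
loss_real = criterion(discriminator(real_images), torch.ones(8, 1))
# detach() keeps this step from backpropagating into the generator
loss_fake = criterion(discriminator(fake_images.detach()), torch.zeros(8, 1))
loss_d = loss_real + loss_fake
loss_d.backward()
optimizer_d.step()  # only the discriminator's parameters are updated here
print(f"Discriminator loss: {loss_d.item():.4f}")
```

The generator update would mirror this step with its own optimizer, using the discriminator's prediction on the generated images and a target of 1.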


Happy training!