# Model Training in PyTorch

In this lab, you will explore the core concepts and steps involved in ***building a convolutional neural network (CNN) using PyTorch***, a leading deep learning framework. CNNs are particularly well-suited for image classification tasks due to their ability to automatically learn spatial hierarchies in images.

**Throughout the lab, you will:**

* Define a simple CNN architecture using `torch.nn.Module`.
* Work with image data by loading and transforming it for training.
* Implement the forward pass with convolutional, pooling, and fully connected layers.
* Optimize the model using an optimizer like SGD or Adam.
* Evaluate the model's performance with metrics such as loss and accuracy.

By the end of the lab, you will have a foundational understanding of CNNs in PyTorch and how to train them for image classification tasks. This hands-on experience will serve as a stepping stone for building more advanced, custom CNN architectures.

As in the introductory course, `XXXX` marks a place where you have to fill in the correct code. If you are following along and not in our course at the University of Rhode Island, you can find the answers in the `02-model-training-with-pytorch-ANSWERKEY.ipynb` file in the repository.

Let’s begin by importing the necessary libraries for your first CNN.

## 1. Set up

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

## 2. Parameters for Convolutional Layers

There are a few parameters that we have to specify for each convolutional layer that we add. Below is a description of what they are and how to select appropriate values.

#### **`in_channels`: The Number of Input Feature Maps**

* `in_channels` refers to the number of input channels (feature maps) being fed into a convolutional layer.

* It must match the number of output channels from the previous layer (except for the first layer, which depends on the input data).

  **First Convolutional Layer**:

  * If the input is an RGB image (CIFAR-10, ImageNet, etc.), it has 3 channels (R, G, B), so `in_channels`=3.

  * If the input is a grayscale image (MNIST, medical imaging, etc.), it has 1 channel, so `in_channels`=1.

  **Subsequent Layers**:

  * The `in_channels` for each layer is equal to the `out_channels` of the previous convolutional layer.

#### **`out_channels`: The Number of Output Feature Maps**

* `out_channels` determines how many feature maps (filters) the convolutional layer will output.

* Each filter in a CNN learns to detect different features (edges, textures, shapes, etc.), so increasing `out_channels` allows the model to learn more complex patterns.

  **Typical Design Choices:**

  * Start with a small number of filters (e.g., `out_channels`=16 or `out_channels`=32) to extract low-level patterns.

  * Gradually increase `out_channels` (e.g., 32 → 64 → 128 → 256) as the network goes deeper, capturing more abstract features.
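
The channel rules above can be illustrated with a minimal sketch. The channel counts, image size, and the names `conv_a` and `conv_b` are assumptions chosen for illustration; they are not the values the lab expects you to use.

```python
import torch
import torch.nn as nn

# Hypothetical layers: channel counts are illustrative only.
conv_a = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)   # RGB input has 3 channels
conv_b = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)  # must match conv_a's out_channels

x = torch.randn(8, 3, 32, 32)  # batch of 8 random RGB "images", 32x32 pixels
h = conv_a(x)                  # 16 feature maps per image
y = conv_b(h)                  # 32 feature maps per image

print(h.shape)  # torch.Size([8, 16, 30, 30])
print(y.shape)  # torch.Size([8, 32, 28, 28])
```

If `conv_b` were created with an `in_channels` value other than 16, the call `conv_b(h)` would raise a shape error, which is a quick way to spot a mismatch between layers.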

#### **`kernel_size`: The x,y dimensions of the filters**
  
  * Start with a smaller filter (e.g., `kernel_size`=3) and work your way to larger filters if needed.

#### **`padding`: The additional pixels added around your image**
  
  * Typically set to `0` (no padding), which is the default.
  * Add padding if the information at the edges of your images is highly important to your task.
  
#### **`stride`: The number of pixels we shift our kernel**

* `stride` refers to the size of the shift of the kernel across the image at each step.
* Practitioners often opt for a `stride` of `1` to capture the largest amount of detail. This is the default value, so we can leave it out for now.
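
To see how `kernel_size`, `padding`, and `stride` interact, the sketch below compares the standard output-size formula, `floor((n + 2 * padding - kernel_size) / stride) + 1`, against what `nn.Conv2d` actually produces. The helper `conv_output_size` is a hypothetical name introduced here, and the formula assumes the default dilation of 1.

```python
import torch
import torch.nn as nn

def conv_output_size(n, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension (dilation of 1 assumed)."""
    return (n + 2 * padding - kernel_size) // stride + 1

x = torch.randn(1, 1, 28, 28)  # one random grayscale "image", 28x28 pixels

# Illustrative (kernel_size, padding, stride) combinations.
for k, p, s in [(3, 0, 1), (3, 1, 1), (5, 0, 1), (3, 0, 2)]:
    conv = nn.Conv2d(1, 8, kernel_size=k, padding=p, stride=s)
    out = conv(x)
    expected = conv_output_size(28, k, p, s)
    assert out.shape[-1] == expected and out.shape[-2] == expected
    print(f"kernel_size={k}, padding={p}, stride={s} -> {out.shape[-2]}x{out.shape[-1]}")
```

Note that with `kernel_size=3`, a `padding` of `1` keeps the output the same size as the input, while a `stride` of `2` roughly halves it.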

## 3. Parameters for Pooling Layers

There are a few parameters that we have to specify for each ***pooling*** layer that we add. Below is a description of what they are and how to select appropriate values.

#### **`kernel_size`: The size of the pooling kernel**

* `kernel_size` refers to the size of the pooling kernel that you are applying. A common value to select in practice is `2`. With a matching `stride` of `2`, this reduces the height and width of the feature map by a factor of 2.

#### **`stride`: The number of pixels we shift our kernel**

* `stride` refers to the size of the shift of the pooling kernel across the image at each step.
* Practitioners often opt for no overlap with their pooling kernel. Hence, if you select `2` for your pooling `kernel_size`, then you should select `2` for the `stride`.
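
As a small illustration of non-overlapping pooling, the sketch below applies `nn.MaxPool2d` with `kernel_size=2` and `stride=2` to a hand-written 4x4 feature map. The input values are made up so that the result is easy to check by eye.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)  # non-overlapping 2x2 windows

# A single 4x4 feature map, reshaped to (batch, channels, height, width).
x = torch.tensor([[ 1.,  2.,  5.,  6.],
                  [ 3.,  4.,  7.,  8.],
                  [ 9., 10., 13., 14.],
                  [11., 12., 15., 16.]]).reshape(1, 1, 4, 4)

pooled = pool(x)
print(pooled.shape)      # torch.Size([1, 1, 2, 2])
print(pooled.squeeze())  # each entry is the maximum of one 2x2 window: 4, 8, 12, 16
```

Height and width are both halved, while the batch and channel dimensions are unchanged.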


## 4. Parameters for Fully Connected Layers

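There are a few parameters that we have to specify for our ***fully connected*** layer. `nn.Linear` takes two arguments, `nn.Linear(in_features, out_features)`, and for a classifier we build them from 4 numbers:

* The first number is the input depth (i.e., the `out_channels` of the last convolutional layer).
* The second and third numbers are the height and width of the feature map that reaches the fully connected layer.
* `in_features` is the product of these three numbers, because the feature map is flattened before it is passed to the fully connected layer.
* The final number, `out_features`, is the number of classes in your dataset.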
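
The sketch below shows one way to derive the input size of a fully connected layer from the shape of the feature map that feeds it. The feature map shape (depth 32, 12x12 pixels) is a hypothetical example, not the shape your own network will necessarily produce.

```python
import torch
import torch.nn as nn

# Hypothetical feature map: batch of 4, depth 32, 12x12 pixels.
features = torch.randn(4, 32, 12, 12)
num_classes = 10

# depth * height * width = 32 * 12 * 12 = 4608 values per image
in_features = features.shape[1] * features.shape[2] * features.shape[3]
fc = nn.Linear(in_features, num_classes)

flat = torch.flatten(features, start_dim=1)  # keep the batch dimension, flatten the rest
logits = fc(flat)

print(flat.shape)    # torch.Size([4, 4608])
print(logits.shape)  # torch.Size([4, 10])
```

A practical tip: printing `x.shape` just before the flatten step in your own `forward` method tells you which depth, height, and width to multiply.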

## 5. Let's build your first custom CNN!

In [None]:
# Define the CNN model
class CustomCNN(nn.Module):
    def __init__(self):
        super(CustomCNN, self).__init__()
        self.conv1 = nn.Conv2d(XXXX)
        self.conv2 = nn.Conv2d(XXXX)
        self.pool = nn.MaxPool2d(XXXX)
        self.fc1 = nn.Linear(XXXX)

    def forward(self, x):
        x = XXXX
        x = XXXX
        x = XXXX  # Hint: Flatten before passing to fully connected layer
        x = XXXX
        return x

## 6. Prepare the Data

For this lab, we will be using the MNIST dataset. This is a popular dataset containing 28×28 grayscale images of handwritten digits (0 through 9).

In [None]:
# Load MNIST dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = DataLoader(XXXX)

testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
testloader = DataLoader(XXXX)
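
If you have not used `DataLoader` before, the sketch below shows how it splits a dataset into batches. It uses a synthetic `TensorDataset` as a stand-in for MNIST so that it runs without a download; the `batch_size` and `shuffle` values are illustrative assumptions, not the lab's required answer.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for an image dataset: 100 random 28x28 grayscale "images" with labels 0-9.
images = torch.randn(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# Illustrative settings: 32 samples per batch, reshuffled every epoch.
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for batch_images, batch_labels in loader:
    print(batch_images.shape, batch_labels.shape)

print(len(loader))  # number of batches: 4 (three of 32 samples, one of 4)
```

Shuffling is commonly enabled for training data and disabled for test data, since the order of test samples does not affect evaluation.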

## 7. Training the CNN

First, we must specify a few things about our training process:
* which model we will use,
* which loss function we want, and
* which optimizer we want (where we also specify our learning rate).

In [None]:
# Initialize model, loss function, and optimizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = XXXX
criterion = XXXX
optimizer = XXXX
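
To illustrate how a loss function and an optimizer work together, the sketch below performs a single weight update on a tiny stand-in model. The names `toy_model`, `toy_criterion`, and `toy_optimizer`, the random data, and the learning rate are all assumptions for illustration; your own choices for the CNN may differ.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # make the random data reproducible

toy_model = nn.Linear(20, 3)                 # hypothetical stand-in for a real network
toy_criterion = nn.CrossEntropyLoss()        # a common loss for multi-class classification
toy_optimizer = optim.SGD(toy_model.parameters(), lr=0.01)

inputs = torch.randn(16, 20)                 # 16 random samples with 20 features each
targets = torch.randint(0, 3, (16,))         # random class labels in {0, 1, 2}

loss_before = toy_criterion(toy_model(inputs), targets)

toy_optimizer.zero_grad()   # clear gradients left over from any previous step
loss_before.backward()      # compute gradients of the loss w.r.t. the weights
toy_optimizer.step()        # update the weights using those gradients

loss_after = toy_criterion(toy_model(inputs), targets)
print(f"Loss before: {loss_before.item():.4f}, after one step: {loss_after.item():.4f}")
```

With a small learning rate, the loss on the same batch should be slightly lower after the update. Swapping `optim.SGD` for `optim.Adam` only changes the line that creates the optimizer.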

Now, we can create a loop to run through the batches of data and update the weights for each batch.

We will also accumulate the training loss and report its average once after each epoch.

In [None]:
# Training loop
epochs = 5
for epoch in range(epochs):
    running_loss = 0.0
    for XXXX, XXXX in XXXX:
        XXXX, XXXX = XXXX, XXXX

        XXXX
        XXXX
        loss = XXXX
        XXXX
        XXXX

        running_loss += loss.item()

    print(f"Epoch {epoch+1}, Loss: {running_loss/len(trainloader):.4f}")

print("Training complete!")

**Congratulations**! You've successfully trained your first custom CNN!
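
The training loop reports only the loss. If you also want to measure accuracy, a minimal sketch of an evaluation helper is shown below. The function name `accuracy`, the stand-in model, and the random data are assumptions for illustration; with your trained network you would pass `model`, `testloader`, and `device` instead.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def accuracy(model, loader, device):
    """Fraction of samples in `loader` that `model` classifies correctly."""
    model.eval()                      # switch layers such as dropout to evaluation mode
    correct, total = 0, 0
    with torch.no_grad():             # gradients are not needed for evaluation
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            predictions = model(images).argmax(dim=1)  # class with the highest score
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    return correct / total

# Usage with an untrained stand-in model and random data (hypothetical names).
eval_device = torch.device("cpu")
toy_classifier = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(eval_device)
toy_loader = DataLoader(
    TensorDataset(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,))),
    batch_size=16,
)
toy_accuracy = accuracy(toy_classifier, toy_loader, eval_device)
print(f"Accuracy: {toy_accuracy:.2%}")
```

An untrained model on random labels should score near chance level (about 10% for 10 classes), which makes a useful baseline when you compare against your trained CNN.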

## 8. Exploration Quest

Try changing some of the model parameters, such as the number of filters, stride, and padding, to see how they impact your training.

Then, try changing some of the training parameters, such as the batch size, loss function, optimizer, and learning rate, and see how they impact the model training.

Can you find an "optimal" set of parameters for this task?