# Baseline Model for Computer Vision with PyTorch

Welcome back to our series on building a computer vision model using PyTorch. In the previous parts, we loaded our dataset into data loaders, turning the 60,000 training images into 1,875 batches of 32 and the 10,000 test images into 313 batches (312 full batches of 32 plus a final batch of 16). Batching lets us feed data into a neural network in manageable chunks. We also visualized images from these batches and noted that each image keeps its original size of 28x28 pixels with a single color channel.
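The batch counts follow from a ceiling division of the dataset size by the batch size. Here is a quick check, assuming the standard FashionMNIST split (60,000 training and 10,000 test images), a batch size of 32, and a loader that keeps the final partial batch; the variable names are illustrative only:

```python
import math

BATCH_SIZE = 32
train_images = 60_000  # FashionMNIST training split
test_images = 10_000   # FashionMNIST test split

# Number of batches when the last partial batch is kept
train_batches = math.ceil(train_images / BATCH_SIZE)
test_batches = math.ceil(test_images / BATCH_SIZE)

# Size of the final (partial) test batch
last_test_batch = test_images - (test_batches - 1) * BATCH_SIZE

print(train_batches)    # 1875
print(test_batches)     # 313
print(last_test_batch)  # 16
```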

## Workflow Overview

At this stage, our data is ready for modeling. We've turned our images into tensors using a combination of torchvision transforms and `torch.utils.data.Dataset`. The torchvision FashionMNIST dataset handled most of this for us, and we then wrapped the datasets in `torch.utils.data.DataLoader` to get batched loaders suitable for model ingestion.
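As a reminder of what a loader yields, here is a minimal sketch that uses a small synthetic dataset as a stand-in for FashionMNIST, so it runs without a download. The 100 random images and the `TensorDataset` wrapper are assumptions made for illustration, not the setup used earlier in the series:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for FashionMNIST: 100 grayscale 28x28 images, 10 classes
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# Same batch size as in the series; shuffling is typical for training data
loader = DataLoader(dataset, batch_size=32, shuffle=True)

batch_images, batch_labels = next(iter(loader))
print(len(loader))         # 4 batches: 32 + 32 + 32 + 4
print(batch_images.shape)  # torch.Size([32, 1, 28, 28])
print(batch_labels.shape)  # torch.Size([32])
```

Each batch keeps the `[batch, color_channels, height, width]` layout, which is the shape the model will receive.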

The next step in the workflow is to pick a pre-trained model or build our own for the problem at hand. We'll start with something simple: a baseline model. This is an important habit in machine learning experiments. A simple model gives us a performance benchmark, which we then try to beat with more complex models or refinements.

### Why Start with a Baseline Model?

A baseline model serves as a simple point of reference for future models. It is the most straightforward solution, and every later experiment tries to improve on it. Starting simple and adding complexity only when needed is good practice in neural network projects: a small model is less prone to overfitting, and it tells us whether added complexity actually pays off.

## Introducing the Flatten Layer

Before diving into model construction, let's discuss a new component: the flatten layer. This layer collapses a multi-dimensional tensor into a flat vector per sample, making it compatible with layers that expect their input as a flat feature vector.

### Implementing a Flatten Layer with PyTorch

PyTorch offers the `nn.Flatten` module, which by default keeps the first dimension and flattens everything after it. A tensor of shape [1, 28, 28] (1 color channel, 28x28 pixels) becomes [1, 784], and a batch of shape [32, 1, 28, 28] becomes [32, 784]. This step prepares our image data for linear layers, which expect each sample as a flat vector.
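A short sketch of this behavior, using random tensors in place of real images (the variable names are illustrative):

```python
import torch
from torch import nn

flatten = nn.Flatten()  # defaults: start_dim=1, end_dim=-1

single_image = torch.rand(1, 28, 28)     # [color_channels, height, width]
image_batch = torch.rand(32, 1, 28, 28)  # [batch, color_channels, height, width]

flat_single = flatten(single_image)
flat_batch = flatten(image_batch)

print(flat_single.shape)  # torch.Size([1, 784])
print(flat_batch.shape)   # torch.Size([32, 784])
```

Note that for the single image, the color-channel dimension plays the role of the first (kept) dimension, while for a batch the batch dimension is kept and each image becomes one 784-element row.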

## Building Our Baseline Model

With our data ready and a basic understanding of the flatten layer, we can construct our baseline model. This model will include:

1. A flatten layer to prepare our image data.
2. Linear layers to perform the actual learning based on the flattened input.

Here's a simplified overview of the model construction process:

1. **Initialize the Model**: Define the structure of our neural network, including the flatten layer and linear layers.
2. **Forward Pass Definition**: Implement the logic for passing input data through the model layers and returning the output.
3. **Instantiate the Model**: Create an instance of our model, specifying input dimensions (784 for our flattened 28x28 images), the number of units in the hidden layer, and the output shape corresponding to the number of classes (10 for FashionMNIST).

### A Note on Model Complexity

Our initial model is intentionally simple, consisting of just a couple of linear layers following the input flattening. This simplicity is by design, to establish our baseline. As we progress, we'll explore adding complexity and non-linearities to enhance model performance.
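One reason non-linearities matter: two linear layers stacked directly on top of each other can be folded into a single linear layer, since W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2). The sketch below checks this numerically with the layer sizes used in this article; the variable names are illustrative and the tolerance is an assumption chosen to absorb floating-point rounding:

```python
import torch
from torch import nn

torch.manual_seed(0)

layer_1 = nn.Linear(784, 128)
layer_2 = nn.Linear(128, 10)

with torch.no_grad():
    # Fold both layers into one weight matrix and one bias vector
    combined_weight = layer_2.weight @ layer_1.weight             # [10, 784]
    combined_bias = layer_2.weight @ layer_1.bias + layer_2.bias  # [10]

    x = torch.rand(32, 784)  # a batch of already-flattened inputs
    stacked_out = layer_2(layer_1(x))
    folded_out = x @ combined_weight.T + combined_bias

# The two computations agree up to floating-point rounding
same = torch.allclose(stacked_out, folded_out, atol=1e-4)
print(same)
```

So the hidden layer in the baseline adds parameters but no extra expressive power until an activation function is placed between the two linear layers.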

## Next Steps

Having set up our baseline model, our next action is to train it on our prepared data, evaluate its performance, and iteratively refine it. This cycle of building, evaluating, and improving models is at the heart of machine learning experimentation.
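To give a feel for what that cycle looks like, here is a rough sketch of a single training step followed by a quick evaluation on one random batch. The loss function (`nn.CrossEntropyLoss`), the optimizer (`torch.optim.SGD` with `lr=0.1`), and the random data are assumptions for illustration, not necessarily the choices made in the next part of the series:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Same architecture as the baseline, written inline to keep the sketch self-contained
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.Linear(128, 10))
loss_fn = nn.CrossEntropyLoss()                          # assumed loss for multi-class labels
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # assumed optimizer and learning rate

# Random stand-in for one batch from the training loader
X = torch.rand(32, 1, 28, 28)
y = torch.randint(0, 10, (32,))

# One training step: forward pass, loss, backward pass, parameter update
model.train()
logits = model(X)
loss = loss_fn(logits, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Quick evaluation on the same batch (real evaluation would use the test loader)
model.eval()
with torch.inference_mode():
    preds = model(X).argmax(dim=1)
    accuracy = (preds == y).float().mean().item()

print(f"loss: {loss.item():.4f} | accuracy on the same batch: {accuracy:.2%}")
```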

Stay tuned for our next video, where we'll delve into training our model and exploring the nuances of model evaluation and refinement.

## Python Code Demonstration

Let's code out the baseline model we discussed:




```python
import torch
from torch import nn

# Set the seed for reproducibility
torch.manual_seed(0)

# Define a simple baseline model
class FashionMNISTModel(nn.Module):
    def __init__(self, input_shape, hidden_units, output_shape):
        super().__init__()
        self.layer_stack = nn.Sequential(
            nn.Flatten(),  # Flatten [batch, 1, 28, 28] into [batch, 784]
            nn.Linear(input_shape, hidden_units),  # First linear layer
            nn.Linear(hidden_units, output_shape),  # Second linear layer
        )

    def forward(self, x):
        return self.layer_stack(x)

# Model parameters
input_shape = 784   # 28x28 images flattened
hidden_units = 128  # Number of units in the hidden layer
output_shape = 10   # Number of classes in FashionMNIST

# Instantiate the model
model = FashionMNISTModel(input_shape, hidden_units, output_shape)
print(model)

# Dummy input to demonstrate forward pass
dummy_x = torch.rand((1, 1, 28, 28))  # A dummy batch of size 1
output = model(dummy_x)
print(f"Output shape: {output.shape}")
```

Output:

```text
FashionMNISTModel(
  (layer_stack): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=784, out_features=128, bias=True)
    (2): Linear(in_features=128, out_features=10, bias=True)
  )
)
Output shape: torch.Size([1, 10])
```
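
To get a sense of the size of this baseline, we can count its learnable parameters. The sketch below rebuilds the same layer stack with `nn.Sequential` so that it runs on its own; the variable names are illustrative:

```python
from torch import nn

# Same layer stack as the baseline model above
layers = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.Linear(128, 10))

# List each parameter tensor with its shape and element count
for name, param in layers.named_parameters():
    print(name, tuple(param.shape), param.numel())

# 784*128 + 128 (first layer) + 128*10 + 10 (second layer)
total_params = sum(p.numel() for p in layers.parameters())
print(total_params)  # 101770
```

The flatten layer contributes no parameters; nearly all of them sit in the first linear layer's weight matrix.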
