# Transfer Learning using VGG16 in PyTorch
In this project, I will apply **transfer learning** on the Fashion MNIST dataset using the **pre-trained VGG16 model**. Since VGG16 is trained on the ImageNet dataset (RGB, 1.4M images, 224x224), several preprocessing steps are required to make our dataset compatible.

#### **Author:** Feroz Khan
---

## Project Workflow Overview

1. **Understand Transfer Learning**
   - Use a model pre-trained on a large dataset (ImageNet).
   - Fine-tune it on a smaller, related dataset (Fashion MNIST).
   - Freeze early layers (which learn edges and shapes).
   - Retrain final layers (which learn task-specific features).

2. **Import Pre-trained VGG16**
   - Load `vgg16(pretrained=True)` from `torchvision.models`.
   - Remove original classifier layers meant for 1000 ImageNet classes.
   - Add a custom classifier for 10 Fashion MNIST classes.

---

## Data Preprocessing Steps

Fashion MNIST images are grayscale (1x28x28), but VGG16 expects RGB images (3x224x224) with specific normalization. So we apply the following:

### Step 1: Reshape
- Reshape each flattened image (784,) to (28, 28)

### Step 2: Convert Data Type
- Convert data type to `np.uint8` for PIL compatibility

### Step 3: Convert to RGB
- Stack the grayscale image into 3 identical channels along the last axis: (28, 28) → (28, 28, 3)
- This already gives the (H, W, C) layout required by `PIL.Image`, so no permute is needed

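### Step 4: Convert to PIL Image
- Use `Image.fromarray()` from PIL

### Step 5: Apply Transforms
- Resize, center-crop to 224x224, convert to a tensor, and normalize with the ImageNet mean and std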
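
The reshape, cast, stack, and PIL-conversion steps above can be sketched on a single synthetic image. This is a minimal sketch, assuming a random 784-value row stands in for one row of the CSV; `flat_row` and `to_pil_rgb` are illustrative names, not part of the notebook.

```python
import numpy as np
from PIL import Image

def to_pil_rgb(flat_row):
    """Turn one flattened 784-pixel grayscale row into a 28x28 RGB PIL image."""
    image = np.asarray(flat_row).reshape(28, 28)   # (784,) -> (28, 28)
    image = image.astype(np.uint8)                 # PIL expects 8-bit pixels
    image = np.stack([image] * 3, axis=-1)         # (28, 28) -> (28, 28, 3)
    return Image.fromarray(image)                  # (H, W, C) uint8 array -> RGB image

# Hypothetical stand-in for one row of pixel columns (values 0..255)
rng = np.random.default_rng(0)
flat_row = rng.integers(0, 256, size=784)

pil_image = to_pil_rgb(flat_row)
print(pil_image.mode, pil_image.size)
```

Stacking along the last axis produces the channel-last layout directly, which is why no separate permute step is needed before calling `Image.fromarray()`.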

---




In [1]:
# Importing Essential Libraries
import pandas as pd
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

In [2]:
# Set manual seed for reproducibility
torch.manual_seed(42)

<torch._C.Generator at 0x7f41f430f2b0>

In [3]:
# Check GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using Device: {device}")

Using Device: cuda


In [4]:
df = pd.read_csv('fashion-mnist_train.csv')
df.head()

Unnamed: 0,label,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
0,2,0,0,0,0,0,0,0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,9,0,0,0,0,0,0,0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,6,0,0,0,0,0,0,0,5,0,...,0.0,0.0,0.0,30.0,43.0,0.0,0.0,0.0,0.0,0.0
3,0,0,0,0,1,2,0,0,0,0,...,3.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
4,3,0,0,0,0,0,0,0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23663 entries, 0 to 23662
Columns: 785 entries, label to pixel784
dtypes: float64(346), int64(439)
memory usage: 141.7 MB


In [6]:
# Separate features (pixel columns) from labels (first column)
X = df.iloc[:,1:]
y = df.iloc[:,0]

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [9]:
# We will need to apply some transformations
from torchvision import transforms

custom_transform = transforms.Compose([
    transforms.Resize(256),                   # Resize the shorter side of the image to 256 pixels
    transforms.CenterCrop(224),               # Crop the center 224x224 region (input size for VGG16)
    transforms.ToTensor(),                    # Convert PIL image to PyTorch tensor and scale pixel values to [0, 1]
    transforms.Normalize(                     # Normalize the image using ImageNet's mean and std values
        mean = [0.485, 0.456, 0.406],         # These correspond to RGB channel means
        std = [0.229, 0.224, 0.225]           # These correspond to RGB channel standard deviations
    )
])
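
To make the `Normalize` step concrete: after `ToTensor()` scales pixels to [0, 1], each channel is shifted and scaled as `(x - mean) / std`. The sketch below reproduces that arithmetic with NumPy for a pure-black and a pure-white pixel; `normalize_pixel` is an illustrative helper, not a torchvision function.

```python
import numpy as np

# ImageNet channel statistics used in the transform above (R, G, B)
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

def normalize_pixel(rgb):
    """Apply per-channel (x - mean) / std to one RGB pixel already scaled to [0, 1]."""
    return (np.asarray(rgb, dtype=float) - mean) / std

black = normalize_pixel([0.0, 0.0, 0.0])   # lowest value each channel can take
white = normalize_pixel([1.0, 1.0, 1.0])   # highest value each channel can take
print("black:", black.round(3))
print("white:", white.round(3))
```

Because the grayscale image is copied into all three channels, the normalized channels differ slightly from one another even though the raw pixel values are identical.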

In [39]:
# Importing essential libraries
from PIL import Image
import numpy as np

In [28]:
# Create CustomDataset Class
class CustomDataset(Dataset):
    def __init__(self, features, labels, transform):
        self.features = features.values            # Flattened image data (e.g., 784 pixels per image)
        self.labels = labels.values                # Corresponding labels
        self.transform = transform          # Transformations to apply to each image
        
    def __len__(self):
        return len(self.features)           # Return total number of samples
        
    def __getitem__(self, index):
        # === Preprocessing and Transformation ===

        # Step 1: Reshape flattened image (784,) to (28, 28)
        image = self.features[index].reshape(28, 28)
        
        # Step 2: Convert datatype to uint8 (required by PIL)
        image = image.astype(np.uint8)
        
        # Step 3: Convert grayscale to RGB by stacking 3 channels → shape becomes (28, 28, 3)
        image = np.stack([image] * 3, axis = -1)
        
        # Step 4: Convert numpy array to PIL Image (required by torchvision transforms)
        image = Image.fromarray(image)
        
        # Step 5: Apply composed transformations (resize, crop, normalize)
        image = self.transform(image)

        # Step 6: Return the transformed image and its corresponding label as torch tensor
        return image, torch.tensor(self.labels[index], dtype = torch.long)

In [29]:
# Create train_dataset and test_dataset objects
train_dataset = CustomDataset(X_train, y_train, transform = custom_transform)
test_dataset = CustomDataset(X_test, y_test, transform = custom_transform)

In [30]:
# Create train and test loader

train_loader = DataLoader(
    train_dataset,        # CustomDataset for training data
    batch_size = 32,      # Number of samples per batch
    shuffle = True,       # Shuffle training data for better generalization
    pin_memory = True     # Page-locked host memory speeds up CPU-to-GPU copies
)

test_loader = DataLoader(
    test_dataset,         # CustomDataset for test data
    batch_size = 32,      # Same batch size as training
    shuffle = False,      # Do not shuffle test data to maintain order during evaluation
    pin_memory = True     # Page-locked host memory speeds up CPU-to-GPU copies
)
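
As a rough check on loader sizes: `df.info()` above reports 23,663 rows, so a 20% split leaves about 18,930 training and 4,733 test samples. The number of batches is the sample count divided by the batch size, rounded up, since the default `drop_last=False` keeps the final partial batch. The sketch assumes `train_test_split` rounds the test size up, which is my understanding of its behavior.

```python
import math

total_rows = 23663          # from df.info() above
test_fraction = 0.2
batch_size = 32

n_test = math.ceil(total_rows * test_fraction)   # assumed rounding: test size rounded up
n_train = total_rows - n_test

train_batches = math.ceil(n_train / batch_size)  # last batch may be smaller than 32
test_batches = math.ceil(n_test / batch_size)

print(n_train, n_test, train_batches, test_batches)
```

On the real objects, `len(train_loader)` and `len(test_loader)` give the same batch counts directly.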

## Note: With this we have completed data preprocessing and transformation

#### We will fetch the pretrained model

In [16]:
import torchvision.models as models

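# Load the pre-trained VGG16 model (trained on ImageNet)
# Note: torchvision >= 0.13 deprecates `pretrained=True` in favor of
# `weights=models.VGG16_Weights.IMAGENET1K_V1`
vgg16 = models.vgg16(pretrained = True)  # Downloads the ImageNet weights on first use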

Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /home/jovyan/.cache/torch/hub/checkpoints/vgg16-397923af.pth
100%|██████████| 528M/528M [00:01<00:00, 286MB/s] 


In [17]:
print(vgg16)

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1

In [31]:
vgg16.classifier

Sequential(
  (0): Linear(in_features=25088, out_features=1024, bias=True)
  (1): ReLU()
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=1024, out_features=512, bias=True)
  (4): ReLU()
  (5): Dropout(p=0.5, inplace=False)
  (6): Linear(in_features=512, out_features=10, bias=True)
)

In [32]:
# Let's freeze the weights of the feature extractor part of the model

for param in vgg16.features.parameters():
    param.requires_grad = False   # Disable gradient updates for convolutional layers (pre-trained on ImageNet)
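
One way to confirm that freezing worked is to count trainable versus frozen parameters. The sketch below does this on a tiny stand-in network rather than VGG16 so that it runs quickly; `TinyNet` and `count_parameters` are hypothetical names used only for illustration.

```python
import torch.nn as nn

class TinyNet(nn.Module):
    """Small stand-in with the same features/classifier split as VGG16."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU())
        self.classifier = nn.Linear(4 * 8 * 8, 10)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

def count_parameters(model):
    """Return (trainable, frozen) parameter counts."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    return trainable, frozen

model = TinyNet()
for param in model.features.parameters():
    param.requires_grad = False          # same freezing pattern as above

trainable, frozen = count_parameters(model)
print(f"trainable: {trainable}, frozen: {frozen}")
```

Calling `count_parameters(vgg16)` after the freezing cell should report every convolutional weight and bias as frozen.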

In [33]:
# Now, we will replace the classifier with our own version
# because the original classifier is meant for 1000 ImageNet classes, not our 10 Fashion MNIST classes

vgg16.classifier = nn.Sequential(
    nn.Linear(25088, 1024),   # First fully connected layer (input size matches VGG16's feature output)
    nn.ReLU(),                # Activation function
    nn.Dropout(0.5),          # Dropout for regularization

    nn.Linear(1024, 512),     # Second fully connected layer
    nn.ReLU(),                # Activation function
    nn.Dropout(0.5),          # Another dropout layer

    nn.Linear(512, 10)        # Final output layer (10 classes for Fashion MNIST)
)
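
The `25088` input size follows from the shape of VGG16's convolutional output. As a worked check, assuming the 224x224 input produced by the transform: the five 2x2 max-pool layers each halve the spatial size, and the last convolutional block has 512 channels. The variable names below are illustrative.

```python
input_size = 224        # height/width after CenterCrop(224)
num_max_pools = 5       # VGG16 has five 2x2, stride-2 max-pool layers
final_channels = 512    # channels produced by the last convolutional block

spatial = input_size // (2 ** num_max_pools)       # 224 / 32 = 7
flattened = final_channels * spatial * spatial     # 512 * 7 * 7

print(spatial, flattened)
```

The flattened size is what the first `nn.Linear` layer must accept, so changing the input resolution or the backbone would change this number.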

In [34]:
vgg16

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1

In [35]:
# Move the model to the selected device (GPU if available)
vgg16 = vgg16.to(device)

In [36]:
# Set learning rate and epochs
lr = 0.0001
epochs = 10

In [37]:
# Loss Function
criteria = nn.CrossEntropyLoss()
# Optimizer
optimizer = optim.Adam(vgg16.classifier.parameters(), lr = lr)

In [41]:
# Training Loop

for epoch in range(epochs):

  total_epoch_loss = 0

  for batch_features, batch_labels in train_loader:
    # Move data to GPU
    batch_features = batch_features.to(device)
    batch_labels = batch_labels.to(device)

    # Forward Pass
    y_pred = vgg16(batch_features)

    # Calculate Loss
    loss = criteria(y_pred, batch_labels)

    # Clear gradients
    optimizer.zero_grad()

    # Backpropagation
    loss.backward()

    # Update params
    optimizer.step()

    # Accumulate the batch loss
    total_epoch_loss += loss.item()

  # Report the mean loss per batch for this epoch
  print(f"Epoch: {epoch+1} --> Batch Loss: {total_epoch_loss/len(train_loader)}")




Epoch: 1 --> Batch Loss: 2.299061297162159
Epoch: 2 --> Batch Loss: 2.248134951333742
Epoch: 3 --> Batch Loss: 2.1858989002334104
Epoch: 4 --> Batch Loss: 2.1351031489871644
Epoch: 5 --> Batch Loss: 2.0831798149927243
Epoch: 6 --> Batch Loss: 2.0421674908012957
Epoch: 7 --> Batch Loss: 2.01005243228094
Epoch: 8 --> Batch Loss: 1.986537451276908
Epoch: 9 --> Batch Loss: 1.9650589461262162
Epoch: 10 --> Batch Loss: 1.9485262213526546
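
A useful reference point for reading these numbers: a classifier that assigns equal probability to all 10 classes has a cross-entropy loss of ln(10). The short calculation below computes that baseline; the variable names are illustrative.

```python
import math

num_classes = 10
# Cross-entropy of a uniform prediction: -log(1 / num_classes) = log(num_classes)
uniform_loss = -math.log(1.0 / num_classes)
print(f"{uniform_loss:.4f}")
```

The epoch 1 loss of about 2.299 sits almost exactly at this baseline, and the epoch 10 loss of about 1.949 is only modestly below it.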


In [48]:
# Model Evaluation
vgg16.eval()    # Evaluation mode: dropout is disabled (batch norm, if present, would use running statistics); gradients are turned off separately by torch.no_grad()

total = 0
correct = 0

with torch.no_grad():

  for batch_features, batch_labels in test_loader:
    # Move data to GPU
    batch_features = batch_features.to(device)
    batch_labels = batch_labels.to(device)
      
    y_pred = vgg16(batch_features)  # 32 x 10
    y_pred = torch.argmax(y_pred, dim = 1)

    total = total + batch_labels.shape[0]

    correct = correct + (y_pred == batch_labels).sum().item()

print(correct/total)


0.29431650116205366
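
A per-class breakdown needs predictions for the whole test set, not just the last batch. The sketch below shows one way to accumulate them, using a small random model and synthetic data as stand-ins for `vgg16` and `test_loader`; `collect_predictions` is a hypothetical helper, not part of the notebook.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def collect_predictions(model, loader, device):
    """Run the model over a loader and return (all_labels, all_preds) as CPU tensors."""
    model.eval()
    all_labels, all_preds = [], []
    with torch.no_grad():
        for batch_features, batch_labels in loader:
            logits = model(batch_features.to(device))
            all_preds.append(torch.argmax(logits, dim=1).cpu())
            all_labels.append(batch_labels)
    return torch.cat(all_labels), torch.cat(all_preds)

torch.manual_seed(42)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Synthetic stand-ins: 100 samples with 20 features each, 10 classes
features = torch.randn(100, 20)
labels = torch.randint(0, 10, (100,))
loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=False)
model = nn.Linear(20, 10).to(device)

y_true, y_pred_all = collect_predictions(model, loader, device)
accuracy = (y_true == y_pred_all).float().mean().item()
print(f"samples: {len(y_true)}, accuracy: {accuracy:.3f}")
```

The two tensors can then be converted with `.numpy()` and passed to `sklearn.metrics.classification_report`, which expects the true labels first and the predictions second.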


In [None]:
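from sklearn.metrics import classification_report
# Note: y_pred and batch_labels hold only the last test batch here; scikit-learn expects (y_true, y_pred)
print(classification_report(batch_labels.cpu().numpy(), y_pred.cpu().numpy()))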

### Next Steps
- Train on the full dataset (70,000 images) using GPU
- Experiment with other optimizers such as RMSprop (Adam is already used above)
- Add further regularization such as Batch Normalization (Dropout is already used in the classifier)
- Tune model architecture and hyperparameters to push accuracy from the current ~29% to beyond 90%

> This project was created by **Feroz Khan** to apply foundational concepts of PyTorch in a real-world classification task.



In [None]:
## Let's use GPUs to load more data and train efficiently

In [None]:
from google.colab import files
files.upload()

Saving fashion-mnist_train.csv to fashion-mnist_train.csv
Buffered data was truncated after reaching the output size limit.

In [None]:
# Check GPU
import torch

# 'gpu' is not a valid device string; valid choices here are 'cuda' or 'cpu'.
# The original cell passed 'gpu' in both branches and failed with:
#   RuntimeError: Expected one of cpu, cuda, ipu, xpu, mkldnn, opengl, opencl, ideep, hip, ve, fpga, maia, xla, lazy, vulkan, mps, meta, hpu, mtia, privateuseone device type at start of device string: gpu
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)