# **Image Embeddings With Tabular Classification Model**

In this tutorial, we will walk through the process of building an image classifier using embeddings from a pre-trained ResNet model combined with a custom Multi-Layer Perceptron (MLP). We'll train the MLP on embeddings extracted from ResNet, which will handle feature extraction from the CIFAR-10 dataset.

## Tips
For best performance, ensure that the runtime is set to use a GPU (`Runtime > Change runtime type > T4 GPU`).

## Help & Questions

If you have any questions, please reach out on our [Discord](https://discord.gg/dncQwFdN9m).

You can also use our [documentation](https://docs.modlee.ai/README.html) as a reference for using our package.


## Step 1: Environment Setup

First, we need to make sure that we have the necessary packages installed.

In [None]:
!pip3 install modlee torch torchvision pytorch-lightning torchtext==0.18.0

## Step 2: Importing Libraries

In this section, we import the necessary libraries from `PyTorch` and `Torchvision`, along with `modlee`.

In [14]:
import os

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Subset, TensorDataset, random_split
from torchvision import datasets, models, transforms

import modlee

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


Now we will set our Modlee API key and initialize the Modlee package.
Make sure that you have a Modlee account and an API key [from the dashboard](https://www.dashboard.modlee.ai/).
Replace `replace-with-your-api-key` with your API key.

In [15]:
# Set the API key to an environment variable,
# to simulate setting this in your shell profile
os.environ['MODLEE_API_KEY'] = "replace-with-your-api-key"
modlee.init(api_key=os.environ['MODLEE_API_KEY'])
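If you would rather not hard-code the key in the notebook, one option is to read it from the environment and fail early when it is missing. The following is a minimal sketch using only the standard library; the helper name `read_api_key` is hypothetical and not part of the Modlee package.

```python
import os


def read_api_key(var_name="MODLEE_API_KEY"):
    """Return the API key stored in the given environment variable.

    Hypothetical helper for illustration; raises if the variable is unset or empty.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before calling modlee.init")
    return key
```

The returned value could then be passed along as `modlee.init(api_key=read_api_key())`.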


## Step 3: Data Preprocessing and Augmentation
Next, we define a sequence of transformations to preprocess the images. Random horizontal flips and random crops augment the data by introducing variation, which makes the model more robust to changes in the input. The images are then resized to 224x224, the input size the pre-trained `ResNet-50` expects, converted to tensors, and normalized per channel. Note that the normalization here uses a mean and standard deviation of 0.5 for every channel rather than the ImageNet statistics the pre-trained weights were trained with.


In [16]:
transform = transforms.Compose([
    # Randomly flip the image horizontally with a probability of 0.5
    transforms.RandomHorizontalFlip(),

    # Randomly crop the image to 32x32 pixels with a padding of 4 pixels
    transforms.RandomCrop(32, padding=4),

    # Resize the image to 224x224 pixels to match the input size expected by ResNet-50
    transforms.Resize((224, 224)),

    # Convert the image to a PyTorch tensor
    transforms.ToTensor(),

    # Normalize the image with mean and standard deviation for each channel
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
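To see what the `Normalize` step does numerically: with a mean and standard deviation of 0.5, each channel value `x` in [0, 1] becomes `(x - 0.5) / 0.5`, so the output lies in [-1, 1]. A small sketch on hand-picked values rather than real images:

```python
import torch

# Pixel values as produced by ToTensor(), which scales images to [0, 1]
pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])

# Same arithmetic that Normalize applies per channel with mean=0.5, std=0.5
mean, std = 0.5, 0.5
normalized = (pixels - mean) / std

print(normalized.tolist())  # [-1.0, -0.5, 0.0, 1.0]
```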


## Step 4: Loading and Splitting the Dataset
We load the training split of the `CIFAR-10` dataset. The full dataset consists of 60,000 32x32 color images belonging to 10 different classes, of which 50,000 are in the training split. We then take the first 1,000 training images as a subset for faster experimentation and split it into training (80%) and validation (20%) datasets using `random_split`. This ensures that part of the data is held out for evaluation.

In [17]:
# Load the CIFAR-10 dataset with the specified transformations
train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)

# Create a subset of the dataset for quicker experimentation
subset_size = 1000
indices = list(range(subset_size))
subset_dataset = Subset(train_dataset, indices)

# Split the subset into training and validation sets
train_size = int(0.8 * len(subset_dataset))
val_size = len(subset_dataset) - train_size
train_dataset, val_dataset = random_split(subset_dataset, [train_size, val_size])

Files already downloaded and verified
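By default `random_split` draws a different partition on every run. If a repeatable split is wanted, a seeded `torch.Generator` can be passed in. The sketch below uses a dummy `TensorDataset` in place of the CIFAR-10 subset, and the seed value 42 is an arbitrary choice:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in for the 1,000-image subset: 1,000 dummy samples with integer labels
dummy = TensorDataset(torch.arange(1000), torch.zeros(1000, dtype=torch.long))

train_size = int(0.8 * len(dummy))   # 800
val_size = len(dummy) - train_size   # 200

# Passing a seeded generator makes the split repeatable across runs
split_a = random_split(dummy, [train_size, val_size], generator=torch.Generator().manual_seed(42))
split_b = random_split(dummy, [train_size, val_size], generator=torch.Generator().manual_seed(42))

print(len(split_a[0]), len(split_a[1]))                      # 800 200
print(list(split_a[0].indices) == list(split_b[0].indices))  # True
```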


## Step 5: Creating DataLoaders for Batch Processing

We define `DataLoaders` for both the training and validation datasets, setting the batch size to 64. `DataLoaders` are responsible for loading the data in batches, which is crucial for efficient training of deep learning models.

In [18]:
# Create a DataLoader for the training dataset with shuffling enabled
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

# Create a DataLoader for the validation dataset without shuffling
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)
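With 800 training samples and a batch size of 64, the loader yields 13 batches, and the last one is smaller than the rest. A quick check on dummy tensors (the feature width of 3 is arbitrary and only keeps the example small):

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# 800 dummy samples stand in for the training split
dataset = TensorDataset(torch.zeros(800, 3), torch.zeros(800, dtype=torch.long))
loader = DataLoader(dataset, batch_size=64, shuffle=False)

batch_sizes = [features.shape[0] for features, _ in loader]

print(len(loader))      # 13, i.e. math.ceil(800 / 64)
print(batch_sizes[-1])  # 32, the leftover samples in the final batch
```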


## Step 6: Loading a Pre-Trained ResNet-50 Model

We load a pre-trained `ResNet-50` model from `torchvision.models` and modify it to output image embeddings instead of predictions by removing its fully connected (classification) layer. This allows us to use `ResNet` as a feature extractor, providing input features for our custom classifier.

In [19]:
# Load a pre-trained ResNet-50 model (the `weights` argument replaces the deprecated `pretrained=True`)
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Remove the final fully connected layer to get feature embeddings
resnet = nn.Sequential(*list(resnet.children())[:-1]).to(device)

## Step 7: Defining a Custom Multi-Layer Perceptron (MLP) Classifier

We define a custom Multi-Layer Perceptron (MLP) classifier using fully connected layers, batch normalization, and dropout for regularization. The MLP receives the embeddings from `ResNet` as input and outputs one logit per class for the 10 `CIFAR-10` classes. `nn.CrossEntropyLoss` applies the softmax internally, so the model itself does not need a softmax layer.

In [20]:

class MLP(modlee.model.TabularClassificationModleeModel):
    def __init__(self, input_size, num_classes):
        super().__init__()
        # Define the layers of the MLP model
        self.model = nn.Sequential(
            nn.Linear(input_size, 256),
            nn.BatchNorm1d(256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, 128),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.Linear(128, num_classes)
        )
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, x):
        return self.model(x)  # Forward pass through the MLP

    def training_step(self, batch):
        embeddings, labels = batch
        logits = self.forward(embeddings)  # Forward pass
        loss = self.loss_fn(logits, labels)  # Compute loss
        return loss

    def validation_step(self, batch):
        embeddings, labels = batch
        logits = self.forward(embeddings)  # Forward pass
        loss = self.loss_fn(logits, labels)  # Compute validation loss
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-4)
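The final `nn.Linear` layer produces unnormalized scores (logits), which `nn.CrossEntropyLoss` consumes directly. If probabilities are needed, a softmax can be applied afterwards. The sketch below rebuilds the same layer stack as a plain `nn.Sequential`, without the Modlee base class, purely to inspect shapes on random inputs:

```python
import torch
import torch.nn as nn

# Same layer stack as the MLP above, built as a plain nn.Sequential
# (no Modlee base class) purely to inspect tensor shapes
layers = nn.Sequential(
    nn.Linear(2048, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(256, 128), nn.BatchNorm1d(128), nn.ReLU(),
    nn.Linear(128, 10),
)
layers.eval()  # use running statistics in batch norm and disable dropout

with torch.no_grad():
    logits = layers(torch.randn(4, 2048))  # 4 fake embeddings
    probs = torch.softmax(logits, dim=1)   # convert logits to probabilities

print(logits.shape)      # torch.Size([4, 10])
print(probs.sum(dim=1))  # each row sums to 1, up to floating-point error
```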

## Step 8: Defining the Number of Classes and Initializing the MLP Model

After defining the structure of the MLP model, the next step is to specify the number of output classes, which corresponds to the number of unique labels in our dataset. The `CIFAR-10` dataset has 10 classes, each representing a distinct object category.

We then initialize our MLP model by passing the `input_size` of the embeddings produced by `ResNet-50` and the `num_classes` for classification. This model will map the 2048-dimensional embeddings to the 10 class labels.

In [21]:
# Define the number of output classes for the classification task
num_classes = 10

# Initialize the MLP model with the specified input size and number of classes
mlp_image = MLP(input_size=2048, num_classes=num_classes).to(device)

## Step 9: Precomputing Image Embeddings Using ResNet-50

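`ResNet-50` transforms images into numerical embeddings that can be fed into our model. We pass the raw images through the pre-trained `ResNet-50` once, with gradients disabled, and store the resulting 2048-dimensional feature vectors together with their labels. Because the embeddings are computed only once, the random augmentations from Step 3 are also applied only once per image.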

In [22]:
# Precompute embeddings using ResNet-50
def precompute_embeddings(dataloader, model, device):
    model.eval()
    embeddings_list = []
    labels_list = []

    with torch.no_grad():
        for images, labels in dataloader:
            images = images.to(device)
            labels = labels.to(device)
            embeddings = model(images).flatten(1)  # Extract features and flatten (N, 2048, 1, 1) to (N, 2048)
            embeddings_list.append(embeddings)
            labels_list.append(labels)

    return torch.cat(embeddings_list), torch.cat(labels_list)
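When turning the `(N, 2048, 1, 1)` output into `(N, 2048)` vectors, it matters how the trailing dimensions are removed. `squeeze()` without arguments drops every dimension of size 1, which includes the batch dimension when a batch holds a single image, whereas flattening from dimension 1 always keeps it. A quick shape check on dummy tensors:

```python
import torch

# Shape produced by the truncated ResNet-50 for a batch holding a single image
single = torch.zeros(1, 2048, 1, 1)

print(single.squeeze().shape)          # torch.Size([2048])     batch dimension lost
print(torch.flatten(single, 1).shape)  # torch.Size([1, 2048])  batch dimension kept

# For batches with more than one image, both give the same result
batch = torch.zeros(8, 2048, 1, 1)
print(batch.squeeze().shape)           # torch.Size([8, 2048])
print(torch.flatten(batch, 1).shape)   # torch.Size([8, 2048])
```

With 800 and 200 samples and a batch size of 64, the final batches in this tutorial hold 32 and 8 images, so the single-image case does not occur here. It would with other dataset sizes, for example 65 samples.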

In [23]:
# Precompute embeddings for training and validation datasets
print("Precomputing embeddings for training and validation data")
train_embeddings, train_labels = precompute_embeddings(train_loader, resnet, device)
val_embeddings, val_labels = precompute_embeddings(val_loader, resnet, device)

# Create TensorDataset for precomputed embeddings and labels
train_embedding_dataset = TensorDataset(train_embeddings, train_labels)
val_embedding_dataset = TensorDataset(val_embeddings, val_labels)

# Create DataLoaders for the precomputed embeddings
train_embedding_loader = DataLoader(train_embedding_dataset, batch_size=64, shuffle=True)
val_embedding_loader = DataLoader(val_embedding_dataset, batch_size=64, shuffle=False)

Precomputing embeddings for training and validation data


## Step 10: Training the Model
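We define the `train_model` function, which handles the training loop. It uses cross-entropy loss (`nn.CrossEntropyLoss`) as the loss function and the Adam optimizer (`optim.Adam`) with a learning rate of 1e-4 for weight updates, and it reports the average loss and training accuracy after every epoch.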

In [24]:
def train_model(model, dataloader, num_epochs=1):
    # Define the loss function and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.0001)

    for epoch in range(num_epochs):
        model.train()
        total_loss = 0
        correct = 0
        total = 0

        for embeddings, labels in dataloader:
            embeddings = embeddings.to(device)
            labels = labels.to(device)

            # Forward pass through the MLP model
            outputs = model(embeddings)
            loss = criterion(outputs, labels)

            # Perform backward pass and optimization
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            total_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        # Print average loss and accuracy for the epoch
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {total_loss/len(dataloader):.4f}, Accuracy: {100 * correct / total:.2f}%')
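The accuracy bookkeeping in the loop relies on `torch.max` along the class dimension, which returns both the largest logit and its index; the index is the predicted class. A small worked example with made-up logits for three samples:

```python
import torch

# Three samples, three classes: each row holds the logits for one sample
outputs = torch.tensor([[2.0, 0.5, 0.1],
                        [0.2, 0.3, 3.0],
                        [1.5, 1.0, 0.2]])
labels = torch.tensor([0, 2, 1])

# torch.max along dim=1 returns (max values, their indices); the indices are the predictions
_, predicted = torch.max(outputs, 1)

correct = (predicted == labels).sum().item()
total = labels.size(0)

print(predicted.tolist())               # [0, 2, 0]
print(f"{100 * correct / total:.2f}%")  # 66.67%
```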

## Step 11: Evaluating the Model
Next, we define an `evaluate_model` function that reports the model's accuracy on the validation set.

In [25]:
def evaluate_model(model, dataloader):
    # Set the model to evaluation mode
    model.eval()
    with torch.no_grad():
        correct = 0
        total = 0

        for embeddings, labels in dataloader:
            embeddings = embeddings.to(device)
            labels = labels.to(device)

            # Forward pass through the MLP model
            outputs = model(embeddings)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        # Print the accuracy of the model on the dataset
        print(f'Accuracy: {100 * correct / total:.2f}%')
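Calling `model.eval()` matters for this MLP because it contains `nn.Dropout` and `nn.BatchNorm1d`, both of which behave differently during training and evaluation. The sketch below illustrates the dropout half of that on a vector of ones:

```python
import torch
import torch.nn as nn

dropout = nn.Dropout(0.5)
x = torch.ones(1000)

dropout.train()
train_out = dropout(x)  # roughly half the entries are zeroed, the rest scaled by 1 / (1 - 0.5) = 2

dropout.eval()
eval_out = dropout(x)   # dropout is a no-op in evaluation mode

print(set(train_out.tolist()) <= {0.0, 2.0})  # True
print(torch.equal(eval_out, x))               # True
```

Batch normalization changes in a similar way: in evaluation mode it uses the running statistics collected during training instead of the statistics of the current batch.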

## Step 12: Training and Evaluating the Model

After defining the model architecture and setting up the data loaders, the final step involves training the model on the training dataset and evaluating its performance on the validation set. This is done by calling two main functions: `train_model()` and `evaluate_model()`.

In [26]:
# Train and evaluate the model
train_model(mlp_image, train_embedding_loader, num_epochs=5)
evaluate_model(mlp_image, val_embedding_loader)

Epoch [1/5], Loss: 2.2100, Accuracy: 18.88%
Epoch [2/5], Loss: 1.8648, Accuracy: 45.00%
Epoch [3/5], Loss: 1.6762, Accuracy: 58.62%
Epoch [4/5], Loss: 1.5282, Accuracy: 66.38%
Epoch [5/5], Loss: 1.4218, Accuracy: 70.88%
Accuracy: 66.50%


# **Great Job!**

We've successfully completed a machine learning project focused on image classification using a combination of ResNet-50 and a custom MLP. Here’s a quick recap of what we accomplished:

- Loaded and prepared the CIFAR-10 dataset.
- Extracted features with ResNet-50.
- Built and trained a custom MLP.
- Evaluated the model.

This project has given you a solid understanding of combining pre-trained models with custom architectures for classification tasks. With this knowledge, you're well-equipped to experiment with other datasets, adjust model architectures, and further improve your machine learning skills. Keep exploring and building on this foundation!