# Transfer Learning with Embeddings


`#pytorch` `#xgboost` `#embeddings` `#transfer-learning` `#resnet` `#convolutions` `#vectorization` `#resnet-18` `#resnet-50`

> Objectives:
>
> - Compare the workflows and resulting performance of a classification model trained on image data versus a model trained on embeddings.


## Exercise Goal


In this exercise, we will train an XGBoost Classifier model to analyze provided images and categorize them as images of "coffee" or "toast".

We will train in two ways:

1. [Phase 1] Directly on image data (`model_direct`) using XGBClassifier
1. [Phase 2] On the outputs of the ResNet18 model (`model_embeddings`), also using XGBClassifier

We then compare the accuracy of the two models.


## Imports


In [1]:
import os
import numpy as np
import torch
from PIL import Image
from sklearn.model_selection import train_test_split
from torchvision import transforms
from torchvision.models import resnet18, ResNet18_Weights
from torchinfo import summary
from xgboost import XGBClassifier



## Initialize training data


In [2]:
image_paths, labels = [], []

folders = ["toast", "coffee"]

# For each folder, store the image paths and labels
for label, folder in enumerate(folders):
    folder = f"./downloads/Dataset_Example/{folder}"
    for filename in os.listdir(folder):
        if filename.endswith((".jpg", ".png")):
            image_paths.append(os.path.join(folder, filename))
            # 0 or 1, for "toast" or "coffee" respectively
            labels.append(label)

print(f"Found {len(image_paths)} images")
print(f"Labels: {set(labels)}")

Found 10342 images
Labels: {0, 1}


## Train test split


In [3]:
# First split into train/test (80% / 20%)
train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, labels, test_size=0.2, random_state=42
)
# Then split train into train/val (75% / 25% of the remainder,
# which gives roughly 60% / 20% / 20% overall)
train_paths, val_paths, train_labels, val_labels = train_test_split(
    train_paths, train_labels, test_size=0.25, random_state=42
)
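
The two calls above yield roughly a 60/20/20 train/validation/test split. The arithmetic can be sketched as follows, assuming the test share is rounded up and the training set receives the remainder; `split_sizes` is a hypothetical helper written for illustration, not part of scikit-learn.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test), assuming the test share is rounded up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_total = 10342  # image count reported by the loading step
n_train_val, n_test = split_sizes(n_total, 0.2)
n_train, n_val = split_sizes(n_train_val, 0.25)
print(n_train, n_val, n_test)
```

Under that assumption, the validation and test sets end up the same size, which makes their accuracies directly comparable.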

## Preprocess the images before training with XGBClassifier

In [4]:
transform_image = transforms.Compose(
    [
        transforms.Resize((128, 128)),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225],  # ImageNet channel statistics, as used by ResNet
        ),
    ]
)


def preprocess_images(image_paths):
    images = []
    for img_path in image_paths:
        # Open the image and read it as Red-Green-Blue values per pixel
        image = Image.open(img_path).convert("RGB")
        # Apply the transform to the image
        image = transform_image(image)
        # Flatten the image into a 1D array of features for XGBoost
        images.append(image.numpy().flatten())
    return np.array(images)


# Convert paths to image data using the preprocess_images function
train_images = preprocess_images(train_paths)
val_images = preprocess_images(val_paths)
test_images = preprocess_images(test_paths)
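
To make the size of the resulting feature vectors concrete, here is a minimal NumPy sketch of what the transform and flatten steps do to a single image. It is an approximation for illustration only: the array `pixels` is a placeholder for a resized image, and the NumPy operations stand in for `ToTensor` and `Normalize`.

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

# Placeholder for one resized RGB image: height x width x channels, values 0-255
pixels = np.full((128, 128, 3), 255, dtype=np.uint8)

# Like ToTensor: scale to [0, 1] and move channels first -> (3, 128, 128)
chw = (pixels.astype(np.float32) / 255.0).transpose(2, 0, 1)

# Like Normalize: subtract the per-channel mean, divide by the per-channel std
normalized = (chw - mean[:, None, None]) / std[:, None, None]

# Flatten into a single row of features for XGBoost
flat = normalized.flatten()
print(flat.shape)
```

Each image becomes one row of 3 × 128 × 128 = 49,152 features, so the Phase 1 model sees one feature per pixel per color channel.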



## Phase 1: Train with XGBClassifier

In [5]:
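# early_stopping_rounds and eval_metric are set on the estimator:
# recent XGBoost releases expect them here rather than in fit()
model_direct = XGBClassifier(early_stopping_rounds=10, eval_metric="logloss")
model_direct.fit(
    train_images,
    train_labels,
    eval_set=[(val_images, val_labels)],
)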



[0]	validation_0-logloss:0.62941
[1]	validation_0-logloss:0.60727
[2]	validation_0-logloss:0.59269
[3]	validation_0-logloss:0.57728
[4]	validation_0-logloss:0.56560
[5]	validation_0-logloss:0.55550
[6]	validation_0-logloss:0.54480
[7]	validation_0-logloss:0.54114
[8]	validation_0-logloss:0.53546
[9]	validation_0-logloss:0.53256
[10]	validation_0-logloss:0.52941
[11]	validation_0-logloss:0.52510
[12]	validation_0-logloss:0.52385
[13]	validation_0-logloss:0.52308
[14]	validation_0-logloss:0.52208
[15]	validation_0-logloss:0.51964
[16]	validation_0-logloss:0.51987
[17]	validation_0-logloss:0.52001
[18]	validation_0-logloss:0.52003
[19]	validation_0-logloss:0.51534
[20]	validation_0-logloss:0.51458
[21]	validation_0-logloss:0.51414
[22]	validation_0-logloss:0.51758
[23]	validation_0-logloss:0.51682
[24]	validation_0-logloss:0.51574
[25]	validation_0-logloss:0.51578
[26]	validation_0-logloss:0.51732
[27]	validation_0-logloss:0.51482
[28]	validation_0-logloss:0.51538
... (remaining output truncated)



## Accuracy with XGB


In [6]:
print("Train accuracy:", model_direct.score(train_images, train_labels))
print("Validation accuracy:", model_direct.score(val_images, val_labels))
print("Test accuracy:", model_direct.score(test_images, test_labels))

Train accuracy: 0.9919406834300452
Validation accuracy: 0.7539874335427743
Test accuracy: 0.7515708071532141


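These results are our baseline: a model trained directly on pixel data. It fits the training set almost perfectly (about 99% accuracy) but reaches only about 75% on the validation and test sets. That gap points to overfitting and leaves clear room for improvement.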


## Phase 2: Train on ResNet18 embeddings (transfer learning)


In [7]:
model = resnet18(weights=ResNet18_Weights.DEFAULT, progress=False)

# Switch the model out of its default training mode into evaluation mode.
# We only use it to convert images into embeddings, not to train it, so
# layers such as batch normalization should use their inference behavior.
# (Gradient tracking is disabled separately with torch.no_grad() below.)
model.eval()

# Print the model summary.
# Note that it outputs a vector of 1000 values: one score per ImageNet class.
print(summary(model, input_size=(1, 3, 224, 224), depth=0))

Layer (type:depth-idx)                   Output Shape              Param #
ResNet                                   [1, 1000]                 11,689,512
Total params: 11,689,512
Trainable params: 11,689,512
Non-trainable params: 0
Total mult-adds (Units.GIGABYTES): 1.81
Input size (MB): 0.60
Forward/backward pass size (MB): 39.75
Params size (MB): 46.76
Estimated Total Size (MB): 87.11


In [8]:
# Define the image transformations specific to ResNet
transform_resnet = transforms.Compose(
    [
        transforms.Resize((224, 224)),  # Input size used by the ResNet summary above
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225],  # ImageNet channel statistics
        ),
    ]
)


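def get_embeddings(image_paths):
    features = []
    # no_grad() skips gradient tracking, which inference does not need
    with torch.no_grad():
        for img_path in image_paths:
            image = Image.open(img_path).convert("RGB")
            # unsqueeze(0) adds the batch dimension: (3, 224, 224) -> (1, 3, 224, 224)
            image = transform_resnet(image).unsqueeze(0)
            output = model(image)
            features.append(output.numpy())
    # Stack to (N, 1, 1000), then collapse to one row per image: (N, 1000)
    features = np.array(features)
    features = features.reshape((features.shape[0], -1))
    return features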


# Convert paths to embedding vectors using the get_embeddings function
train_embeddings = get_embeddings(train_paths)
val_embeddings = get_embeddings(val_paths)
test_embeddings = get_embeddings(test_paths)
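
The last two lines of `get_embeddings` turn a list of per-image outputs into a 2D feature matrix. A small sketch with placeholder arrays shows the shapes involved and an equivalent one-step alternative, `np.concatenate`; the zero-filled `outputs` list is illustrative data, not real model output.

```python
import numpy as np

# Four placeholder model outputs, each shaped (1, 1000) like output.numpy()
outputs = [np.zeros((1, 1000), dtype=np.float32) for _ in range(4)]

stacked = np.array(outputs)  # shape (4, 1, 1000)
features = stacked.reshape((stacked.shape[0], -1))  # shape (4, 1000)

# Joining along the existing batch axis gives the same matrix in one step
concatenated = np.concatenate(outputs, axis=0)

print(stacked.shape, features.shape, concatenated.shape)
```

Either form produces one row per image, which is the layout `XGBClassifier.fit` expects.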



## Train our new model on the outputs of ResNet18

The only difference is the training data. Because each image is now a compact 1000-feature vector rather than 49,152 raw pixel values (3 × 128 × 128), training this model takes much less time.


In [9]:
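# early_stopping_rounds and eval_metric are set on the estimator:
# recent XGBoost releases expect them here rather than in fit()
model_embeddings = XGBClassifier(early_stopping_rounds=10, eval_metric="logloss")
model_embeddings.fit(
    train_embeddings,
    train_labels,
    eval_set=[(val_embeddings, val_labels)],
)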



[0]	validation_0-logloss:0.49455
[1]	validation_0-logloss:0.39804
[2]	validation_0-logloss:0.34054
[3]	validation_0-logloss:0.30334
[4]	validation_0-logloss:0.27440
[5]	validation_0-logloss:0.25732
[6]	validation_0-logloss:0.24675
[7]	validation_0-logloss:0.23890
[8]	validation_0-logloss:0.23396
[9]	validation_0-logloss:0.22853
[10]	validation_0-logloss:0.22592
[11]	validation_0-logloss:0.22540
[12]	validation_0-logloss:0.22574
[13]	validation_0-logloss:0.22534
[14]	validation_0-logloss:0.22745
[15]	validation_0-logloss:0.22911
[16]	validation_0-logloss:0.23182
[17]	validation_0-logloss:0.23328
[18]	validation_0-logloss:0.23351
[19]	validation_0-logloss:0.23295
[20]	validation_0-logloss:0.23313
[21]	validation_0-logloss:0.23338
[22]	validation_0-logloss:0.23542
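
The log above shows the pattern early stopping looks for: validation log loss improves until round 13 and then drifts upward. The sketch below applies a simplified patience rule to the logged values. It is an illustration of the idea, not XGBoost's internal implementation, and `best_round` is a hypothetical helper.

```python
def best_round(losses, patience):
    """Return (best_index, stop_index); stop_index is None if patience never runs out."""
    best = 0
    for i, loss in enumerate(losses):
        if loss < losses[best]:
            best = i
        elif i - best >= patience:
            return best, i
    return best, None


# Validation log loss values copied from the log above (rounds 0-22)
val_logloss = [
    0.49455, 0.39804, 0.34054, 0.30334, 0.27440, 0.25732, 0.24675, 0.23890,
    0.23396, 0.22853, 0.22592, 0.22540, 0.22574, 0.22534, 0.22745, 0.22911,
    0.23182, 0.23328, 0.23351, 0.23295, 0.23313, 0.23338, 0.23542,
]

best, stopped_at = best_round(val_logloss, patience=10)
print(best, val_logloss[best], stopped_at)
```

The captured log ends at round 22, nine rounds after the best round, so with a patience of 10 the helper reports the best round but no stopping point within the logged values.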




## Compare accuracy


In [10]:
print(
    "Train accuracy:",
    model_embeddings.score(train_embeddings, train_labels),
)
print(
    "Validation accuracy:",
    model_embeddings.score(val_embeddings, val_labels),
)
print(
    "Test accuracy:", model_embeddings.score(test_embeddings, test_labels)
)

Train accuracy: 0.9866215344938749
Validation accuracy: 0.91686805219913
Test accuracy: 0.9279845335911068
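
To put the two phases side by side, the short sketch below converts the reported test accuracies into error rates and computes the relative reduction in test error. The numbers are copied from the outputs above; the variable names are chosen for illustration.

```python
# Test accuracies reported by the two accuracy cells
test_accuracy = {
    "model_direct": 0.7515708071532141,
    "model_embeddings": 0.9279845335911068,
}

# Error rate is the share of test images classified incorrectly
error = {name: 1.0 - acc for name, acc in test_accuracy.items()}
relative_reduction = 1.0 - error["model_embeddings"] / error["model_direct"]

for name, err in error.items():
    print(f"{name}: test error {err:.1%}")
print(f"Relative error reduction: {relative_reduction:.1%}")
```

Training on embeddings lowers the test error from roughly 25% to roughly 7%, while the gap between train and test accuracy also narrows compared with the Phase 1 baseline.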
