# Using a Pre-trained Convnet

This notebook provides lecture notes on leveraging pre-trained convolutional neural networks (convnets) for image classification tasks. Using pre-trained models is a powerful technique that can significantly reduce training time and improve performance, especially when working with limited datasets.

We will cover two main approaches:

1.  **Feature Extraction:** Using the pre-trained network as a fixed feature extractor.
2.  **Fine-tuning:** Unfreezing some of the layers of the pre-trained network and jointly training them with a new classifier.

## Feature Extraction

Feature extraction involves using the convolutional base of a pre-trained network to extract meaningful features from new images. These features are then fed into a new, smaller classifier (typically a few dense layers) that is trained on the new dataset.

The idea is that the pre-trained convnet has already learned a hierarchy of features from a large dataset (like ImageNet), and these features are general enough to be useful for various image recognition tasks.

Mathematically, if we have an input image $x$ and a pre-trained convnet $f$, feature extraction can be seen as applying the convolutional base $f_{conv}$ to $x$ to get a feature map $F(x) = f_{conv}(x)$. This feature map is then flattened and passed through a new classifier $g$: $\hat{y} = g(\text{flatten}(F(x)))$.
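
To make the shapes concrete, the following sketch traces the spatial size of the feature map $F(x)$ through VGG16's five 2x2 max-pooling stages. It assumes the standard VGG16 layout (same-padded 3x3 convolutions, stride-2 pooling, 512 channels in the last block). The helper names `vgg16_feature_shape` and `flattened_length` are made up for illustration and are not part of Keras.

```python
def vgg16_feature_shape(height, width, num_pools=5, out_channels=512):
    """Return the (height, width, channels) of the VGG16 conv-base output.

    Assumes 'same'-padded convolutions (spatial size unchanged) and
    2x2 max pooling with stride 2 (spatial size floor-halved).
    """
    for _ in range(num_pools):
        height //= 2
        width //= 2
    return height, width, out_channels


def flattened_length(shape):
    """Number of values passed to the classifier g after flattening."""
    h, w, c = shape
    return h * w * c


small = vgg16_feature_shape(48, 48)      # the 48x48 inputs used in this notebook
large = vgg16_feature_shape(224, 224)    # the usual ImageNet input size

print(small, flattened_length(small))    # (1, 1, 512) 512
print(large, flattened_length(large))    # (7, 7, 512) 25088
```

Under these assumptions, a 48x48 input is reduced to a single 512-dimensional vector, so the classifier $g$ sees 512 values per image.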

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical

# Load a sample dataset (e.g., MNIST from scikit-learn)
# MNIST is not ideal for showcasing ConvNets on images, but it's readily available
# For better results, you would typically use a dataset like CIFAR-10 or your own image data
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)

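# Preprocess the data: reshape to (N, 28, 28, 1) and scale pixels to [0, 1]
# Note: VGG16's ImageNet weights were trained on inputs prepared with
# tensorflow.keras.applications.vgg16.preprocess_input; plain [0, 1] scaling is kept here for simplicity
X = X.values.reshape(-1, 28, 28, 1).astype('float32') / 255.0
y = to_categorical(y)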

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Load the VGG16 model pre-trained on ImageNet
# We don't include the top classification layer
conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(48, 48, 3)) # Keras VGG16 needs 3 channels and at least 32x32 pixels; 48x48 is used here

# MNIST is 28x28 and grayscale, so zero-pad each image to 48x48 (10 pixels per side)
# and replicate the single channel three times.
# This is a workaround for demonstration purposes. For real tasks, use appropriate datasets.
X_train_resized = np.repeat(np.pad(X_train, ((0, 0), (10, 10), (10, 10), (0, 0)), 'constant'), 3, axis=-1)
X_test_resized = np.repeat(np.pad(X_test, ((0, 0), (10, 10), (10, 10), (0, 0)), 'constant'), 3, axis=-1)


# Extract features
# This can be computationally expensive for large datasets
print("Extracting features...")
train_features = conv_base.predict(X_train_resized)
test_features = conv_base.predict(X_test_resized)
print("Features extracted.")

# Flatten the extracted features
train_features = train_features.reshape(train_features.shape[0], -1)
test_features = test_features.reshape(test_features.shape[0], -1)

# Build a new classifier model
# The features are already flat, so this Flatten layer only declares the input shape
model = Sequential()
model.add(Flatten(input_shape=train_features.shape[1:]))
model.add(Dense(256, activation='relu'))
model.add(Dense(y_train.shape[1], activation='softmax'))

# Compile and train the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

print("Training classifier...")
history = model.fit(train_features, y_train,
                    epochs=10,
                    batch_size=32,
                    validation_data=(test_features, y_test))
print("Classifier trained.")

# Evaluate the model
loss, accuracy = model.evaluate(test_features, y_test)
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58889256/58889256 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Extracting features...
1750/1750 ━━━━━━━━━━━━━━━━━━━━ 971s 555ms/step
438/438 ━━━━━━━━━━━━━━━━━━━━ 240s 548ms/step
Features extracted.
Training classifier...
Epoch 1/10
1750/1750 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.7927 - loss: 0.7121 - val_accuracy: 0.9306 - val_loss: 0.2185
Epoch 2/10
1750/1750 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.9414 - loss: 0.1840 - val_accuracy: 0.9488 - val_loss: 0.1601
Epoch 3/10
1750/1750 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.9542 - loss: 0.1459 - val_accuracy: 0.9596 - val_loss: 0.1294
Epoch 4/10
1750/1750 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.9584 - loss: 0.1301 - val_accuracy: 0.9631 - val_loss: 0.1196
Epoch 5/10
1750/1750 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.9638 - loss: 0.1135 - val_accuracy: 0.9566 - val_loss: 0.1301
Epoch 6/10
1750/1750 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.9643 - loss: 0.1101 - val_accuracy: 0.9616 - val_loss: 0.1202
Epoch 7/10
1750/1750

## Fine-tuning

Fine-tuning is a more advanced technique where we not only add a new classifier on top of the pre-trained convolutional base but also unfreeze some of the layers in the convolutional base and train the entire model end-to-end on the new data.

This allows the pre-trained model to adapt its learned features to the specific characteristics of the new dataset. It is usually best to unfreeze only the top layers of the convolutional base: the lower layers encode generic features such as edges and textures that transfer well across tasks, while the higher layers encode more specialized features that benefit most from adaptation.
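
As a rough illustration of what "the top layers" means for VGG16, the sketch below lists the layer names of the convolutional base in order and shows which ones a slice from index 15 onward selects. The names are written out by hand on the assumption that they match Keras's VGG16 naming, with an input layer at index 0 whose exact name varies between versions. In practice you would read them from `conv_base.layers` or `conv_base.summary()`. The helper `split_frozen_unfrozen` is hypothetical.

```python
# Layer order of the VGG16 convolutional base (include_top=False).
# Assumption: hand-written to mirror Keras's naming; "input" stands in for
# the input layer, whose actual name depends on the Keras version.
VGG16_BASE_LAYERS = [
    "input",
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]


def split_frozen_unfrozen(layer_names, first_unfrozen):
    """Split layer names into (frozen, unfrozen) at the given index."""
    return layer_names[:first_unfrozen], layer_names[first_unfrozen:]


frozen, unfrozen = split_frozen_unfrozen(VGG16_BASE_LAYERS, 15)
print("last frozen layer:", frozen[-1])
print("unfrozen layers:  ", unfrozen)
```

With this layout, everything up to `block4_pool` stays frozen and only the fifth block is adapted.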

The process typically involves:

1.  Adding a custom classifier on top of the pre-trained convolutional base.
2.  Freezing the convolutional base.
3.  Training the custom classifier.
4.  Unfreezing some layers of the convolutional base.
5.  Jointly training the unfrozen layers and the custom classifier with a very low learning rate, so that the updates do not destroy the pre-trained features.
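
To see what "a very low learning rate" means in terms of weight changes, the sketch below implements a single textbook Adam update for one scalar parameter, starting from zero moment estimates. It is an illustration, not Keras's exact implementation; `adam_first_step` is a made-up helper, and the defaults (`beta1=0.9`, `beta2=0.999`, `eps=1e-7`) are assumed to match common Adam settings.

```python
import math


def adam_first_step(grad, learning_rate, beta1=0.9, beta2=0.999, eps=1e-7):
    """Size of the first Adam update for a single scalar parameter.

    Starts from zero moment estimates, as a freshly created optimizer does.
    """
    m = (1 - beta1) * grad            # first-moment estimate
    v = (1 - beta2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - beta1)           # bias correction at step t = 1
    v_hat = v / (1 - beta2)
    return learning_rate * m_hat / (math.sqrt(v_hat) + eps)


for grad in (0.01, 1.0, 100.0):
    default_step = adam_first_step(grad, learning_rate=1e-3)
    fine_tune_step = adam_first_step(grad, learning_rate=1e-5)
    print(f"grad={grad:>6}: step at 1e-3 = {default_step:.2e}, "
          f"step at 1e-5 = {fine_tune_step:.2e}")
```

Because Adam normalizes by the gradient magnitude, the first step is close to the learning rate itself regardless of the gradient's scale. Moving from 1e-3 to 1e-5 therefore makes each update to the pre-trained weights about 100 times smaller.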

Mathematically, split the convolutional base into its frozen lower layers $f^{0}_{conv}$ and its unfrozen top layers $f'_{conv}$, so that $f_{conv}(x) = f'_{conv}(f^{0}_{conv}(x))$. The combined model is $h(x) = g(\text{flatten}(f'_{conv}(f^{0}_{conv}(x))))$, and we optimize only the parameters of $f'_{conv}$ and $g$ while those of $f^{0}_{conv}$ stay fixed.
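
To get a feel for how many parameters each training stage optimizes, the following back-of-the-envelope sketch counts weights and biases for VGG16's 3x3 convolutions and for a small dense classifier. It assumes the standard VGG16 channel widths, a 48x48 input (so the flattened feature vector has 512 entries), a 256-unit hidden layer, and 10 classes. `conv_params` and `dense_params` are illustrative helpers, not library functions.

```python
def conv_params(in_ch, out_ch, kernel=3):
    """Weights plus biases of one kernel x kernel convolution layer."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    """Weights plus biases of one fully connected layer."""
    return in_units * out_units + out_units


# (input channels, output channels) for each conv layer, grouped by block.
VGG16_BLOCKS = {
    "block1": [(3, 64), (64, 64)],
    "block2": [(64, 128), (128, 128)],
    "block3": [(128, 256), (256, 256), (256, 256)],
    "block4": [(256, 512), (512, 512), (512, 512)],
    "block5": [(512, 512), (512, 512), (512, 512)],
}

block_params = {
    name: sum(conv_params(i, o) for i, o in layers)
    for name, layers in VGG16_BLOCKS.items()
}
base_total = sum(block_params.values())

# Classifier g: 512 flattened features -> 256 hidden units -> 10 classes.
classifier = dense_params(512, 256) + dense_params(256, 10)

stage1_trainable = classifier                           # base fully frozen
stage2_trainable = classifier + block_params["block5"]  # block 5 unfrozen

print(f"conv base total:   {base_total:,}")
print(f"stage 1 trainable: {stage1_trainable:,}")
print(f"stage 2 trainable: {stage2_trainable:,}")
```

Under these assumptions the convolutional base holds 14,714,688 parameters, of which 7,079,424 sit in block 5, so the fine-tuning stage trains roughly 54 times as many parameters as the classifier-only stage.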

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

# Load and preprocess data (same as feature extraction example)
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)
X = X.values.reshape(-1, 28, 28, 1).astype('float32') / 255.0
y = to_categorical(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Zero-pad to 48x48 and replicate the channel three times (same as feature extraction example)
X_train_resized = np.repeat(np.pad(X_train, ((0, 0), (10, 10), (10, 10), (0, 0)), 'constant'), 3, axis=-1)
X_test_resized = np.repeat(np.pad(X_test, ((0, 0), (10, 10), (10, 10), (0, 0)), 'constant'), 3, axis=-1)


# Load the VGG16 model pre-trained on ImageNet
conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(48, 48, 3))

# Add a custom classifier on top
x = conv_base.output
x = Flatten()(x)
x = Dense(256, activation='relu')(x)
predictions = Dense(y_train.shape[1], activation='softmax')(x)

model = Model(inputs=conv_base.input, outputs=predictions)

# Freeze the convolutional base
for layer in conv_base.layers:
    layer.trainable = False

# Compile and train the classifier (Stage 1: Train only the top layers)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

print("Training top layers...")
model.fit(X_train_resized, y_train,
          epochs=5, # Train for fewer epochs in this stage
          batch_size=32,
          validation_data=(X_test_resized, y_test))
print("Top layers trained.")

# Unfreeze some layers of the convolutional base (Stage 2: Fine-tuning)
# Decide how many layers to unfreeze. Unfreezing the last few blocks is common.
# Inspect conv_base.summary() to identify layer names/indices.
# For VGG16, index 15 is block5_conv1, so this slice unfreezes block5_conv1,
# block5_conv2, block5_conv3 (and block5_pool, which has no weights).
for layer in conv_base.layers[15:]:
    layer.trainable = True

# Recompile the model with a lower learning rate for fine-tuning
model.compile(optimizer=Adam(learning_rate=1e-5), # Use a very low learning rate
              loss='categorical_crossentropy',
              metrics=['accuracy'])

print("Fine-tuning...")
history_fine_tune = model.fit(X_train_resized, y_train,
                              epochs=10, # Train for more epochs in this stage
                              batch_size=32,
                              validation_data=(X_test_resized, y_test))
print("Fine-tuning complete.")

# Evaluate the model
loss, accuracy = model.evaluate(X_test_resized, y_test)
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")

Training top layers...
Epoch 1/5
1750/1750 ━━━━━━━━━━━━━━━━━━━━ 1307s 747ms/step - accuracy: 0.7868 - loss: 0.7057 - val_accuracy: 0.9409 - val_loss: 0.1943
Epoch 2/5
1750/1750 ━━━━━━━━━━━━━━━━━━━━ 1418s 811ms/step - accuracy: 0.9414 - loss: 0.1862 - val_accuracy: 0.9540 - val_loss: 0.1511
Epoch 3/5
1750/1750 ━━━━━━━━━━━━━━━━━━━━ 1272s 727ms/step - accuracy: 0.9539 - loss: 0.1481 - val_accuracy: 0.9592 - val_loss: 0.1328
Epoch 4/5
1750/1750 ━━━━━━━━━━━━━━━━━━━━ 1246s 712ms/step - accuracy: 0.9576 - loss: 0.1299 - val_accuracy: 0.9586 - val_loss: 0.1303
Epoch 5/5
1750/1750 ━━━━━━━━━━━━━━━━━━━━ 1334s 742ms/step - accuracy: 0.9631 - loss: 0.1202 - val_accuracy: 0.9630 - val_loss: 0.1175
Top layers trained.
Fine-tuning...
Epoch 1/10
1750/1750 ━━━━━━━━━━━━━━━━━━━━ 2017s 1s/step - accuracy: 

## Conclusion

Feature extraction and fine-tuning are both effective ways to use a pre-trained convnet for image classification, especially with limited data. Feature extraction is simpler and faster, while fine-tuning can potentially yield better performance by adapting the pre-trained features to the new dataset. The choice between the two depends on the size of your dataset and on how similar your new task is to the task the model was originally trained on.

Remember to experiment with different pre-trained models, the number of layers to unfreeze during fine-tuning, and hyperparameters to achieve the best results for your specific problem.