# Transfer Learning

## Introduction
Transfer learning is a machine learning technique that leverages knowledge gained from one task to improve the performance of a related but different task. In traditional machine learning, models are trained for a specific task from scratch using a labeled dataset. Transfer learning, on the other hand, allows a pre-trained model, often developed on a large dataset for a particular task, to be adapted for a new, related task with a smaller dataset.

The idea behind transfer learning is rooted in the notion that the knowledge acquired by a model in solving one problem can be valuable for solving a different, yet related, problem. This approach is particularly beneficial in situations where labeled data for the target task is limited or scarce.


This notebook guides you through adapting a pre-trained model to a new task: predicting whether an image is rural or urban, using [this dataset](https://www.kaggle.com/datasets/dansbecker/urban-and-rural-photos).

The starting point is a model pre-trained on [ImageNet](https://en.wikipedia.org/wiki/ImageNet), an extensive dataset containing over 14 million images across thousands of categories. Keras provides access to various models (see [here](https://keras.io/api/applications/)) pre-trained on ImageNet, including ResNet, Xception, and InceptionV3.

The initial layers of a deep learning model are adept at identifying simple shapes, while the subsequent layers excel at recognizing intricate visual patterns like roads, buildings, windows, and open fields. For our new application, these later layers prove valuable.

The very last layer of the pre-trained model is responsible for making predictions. To tailor it to our needs, we'll replace this final layer with a dense layer of two nodes. One node captures how urban the photo is, while the other captures how rural it is. In principle, any node in the last layer before the prediction layer may carry information about how urban an image is, so the urban measure can depend on every node in that layer. Likewise, every node may influence our measure of how rural the photo appears.

Given the multitude of connections, we will utilize training data to discern which nodes indicate an image is urban, which suggest it is rural, and which nodes are inconsequential. Essentially, we are training the last layer of the model using labeled photos categorized as either rural or urban.

Note: While classifying into only two categories would require only one node in the output, we have retained two separate nodes. This approach, with a distinct node for each potential category in the output layer, facilitates a seamless transition when predicting with more than two categories.
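To make the note above concrete, here is a minimal sketch (plain Python, not the Keras implementation) showing that a two-node softmax output carries the same information as a single sigmoid node applied to the difference of the two scores. The scores and the class order (rural first, urban second) are assumptions chosen for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Squash a single raw score into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical raw scores from the two output nodes (assumed order: rural, urban)
rural_score, urban_score = 0.3, 1.7

two_node_probs = softmax([rural_score, urban_score])
one_node_prob = sigmoid(urban_score - rural_score)

print(two_node_probs)  # two probabilities that sum to 1
print(one_node_prob)   # equals the urban probability from the softmax
```

With more than two classes the single-node shortcut no longer applies, which is why keeping one node per class generalizes more easily.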

### Specify the Model

In this application, we aim to classify photos into two categories or classes: urban and rural. We store this number as `num_classes`.

To construct the model, we start with a sequential model, which lets us add layers one after another. First, we add a pre-trained ResNet model. The parameter `include_top=False` excludes the last layer of the ResNet model, the one responsible for predictions. Accordingly, we load a weights file that does not contain weights for that layer (hence `notop` in the file name).

The argument `pooling='avg'` tells the model to average each channel of its final feature map over the spatial dimensions, collapsing the output into a 1D vector per image (2048 values for ResNet50). At this point, the pre-trained model produces the layer of features that feeds our prediction layer. We then add a `Dense` layer for making predictions, with the number of nodes equal to the number of classes. The softmax function is applied to turn its outputs into probabilities.
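As a rough illustration of what `pooling='avg'` does, the sketch below averages a toy feature map over its height and width, leaving one number per channel. It uses nested Python lists and a hypothetical helper name, `global_average_pool`; it is not how Keras implements the operation.

```python
def global_average_pool(feature_map):
    """Average a (height, width, channels) nested list over height and width.

    Returns a flat list with one value per channel.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    pooled = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                pooled[c] += value
    return [total / (height * width) for total in pooled]

# Toy 2x2 feature map with 3 channels (the ResNet50 base produces 2048 channels)
toy_map = [
    [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],
    [[5.0, 4.0, 2.0], [7.0, 0.0, 2.0]],
]
pooled = global_average_pool(toy_map)
print(pooled)  # [4.0, 1.0, 2.0]
```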

Lastly, we instruct TensorFlow not to train the initial layer of the sequential model, which consists of the ResNet50 layers. This is because the model has already undergone pre-training with the ImageNet data.


In [1]:
# set random seed / make reproducible
import random
import numpy as np
import tensorflow as tf
seed = 123
random.seed(seed)
np.random.seed(seed)
tf.random.set_seed(seed)

In [2]:
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

def create_model_from_resnet():
    num_classes = 2
    resnet_weights_path = './input/resnet50/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5'

    my_new_model = Sequential()
    my_new_model.add(ResNet50(include_top=False, pooling='avg', weights=resnet_weights_path))
    my_new_model.add(Dense(num_classes, activation='softmax'))

    # Freeze the first layer (the ResNet50 base); it is already trained
    my_new_model.layers[0].trainable = False
    return my_new_model

In [3]:
my_new_model = create_model_from_resnet()
my_new_model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 resnet50 (Functional)       (None, 2048)              23587712  
                                                                 
 dense (Dense)               (None, 2)                 4098      
                                                                 
Total params: 23591810 (90.00 MB)
Trainable params: 4098 (16.01 KB)
Non-trainable params: 23587712 (89.98 MB)
_________________________________________________________________
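The trainable parameter count in the summary can be checked by hand: a dense layer has one weight per input-output pair plus one bias per output node. The sketch below uses a hypothetical helper, `dense_param_count`, and copies the other numbers from the summary.

```python
def dense_param_count(num_inputs, num_outputs):
    """One weight per input-output pair, plus one bias per output node."""
    return num_inputs * num_outputs + num_outputs

features_from_resnet = 2048  # output width of the pooled ResNet50 base, per the summary
num_classes = 2

trainable = dense_param_count(features_from_resnet, num_classes)
frozen = 23587712            # ResNet50 base parameters, copied from the summary

print(trainable)             # 4098
print(trainable + frozen)    # 23591810
```

Only the 4098 dense-layer parameters are updated during training; the frozen base accounts for everything else.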


### Compile the Model

The `compile` command tells TensorFlow how to adjust the connections in the final layer of the network during training.

For the measure of loss or inaccuracy that we aim to minimize, we specify `categorical_crossentropy`. If you are familiar with log loss, this is the same thing.

To minimize the categorical cross-entropy loss, we employ an algorithm called stochastic gradient descent (SGD).
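To give a feel for what a gradient descent update does to the final layer, here is a small sketch in plain Python. It performs a single update on one made-up example with three input features instead of 2048; real SGD in Keras works on mini-batches and is implemented quite differently. All names and numbers are hypothetical.

```python
import math

def softmax(logits):
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(weights, biases, features):
    """Dense layer followed by softmax; weights[k] holds the weights of output node k."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

def sgd_step(weights, biases, features, one_hot_label, learning_rate):
    """One gradient descent update on a single example.

    For softmax with cross-entropy, the gradient with respect to each
    logit is (predicted probability - target).
    """
    probs = forward(weights, biases, features)
    errors = [p - y for p, y in zip(probs, one_hot_label)]
    new_weights = [[w - learning_rate * err * x for w, x in zip(row, features)]
                   for row, err in zip(weights, errors)]
    new_biases = [b - learning_rate * err for b, err in zip(biases, errors)]
    return new_weights, new_biases

# Toy setup: every value below is made up for illustration
features = [0.5, -1.0, 2.0]
label = [0.0, 1.0]  # the true class is the second one
weights = [[0.1, 0.2, -0.1], [0.0, -0.3, 0.2]]
biases = [0.0, 0.0]

loss_before = -math.log(forward(weights, biases, features)[1])
weights, biases = sgd_step(weights, biases, features, label, learning_rate=0.1)
loss_after = -math.log(forward(weights, biases, features)[1])

print(loss_before, loss_after)  # the loss on this example goes down after the step
```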

Additionally, we request the code to report the accuracy metric, which represents the fraction of correct predictions. This metric is more intuitive than categorical cross-entropy scores, making it beneficial to display and assess the model's performance.
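The two quantities named above can be sketched in a few lines of plain Python. This is an illustration with made-up predictions for four photos, not the Keras implementation.

```python
import math

def categorical_crossentropy(one_hot_labels, predicted_probs):
    """Mean of -log(probability assigned to the true class) over all examples."""
    losses = []
    for label, probs in zip(one_hot_labels, predicted_probs):
        losses.append(-sum(y * math.log(p) for y, p in zip(label, probs)))
    return sum(losses) / len(losses)

def accuracy(one_hot_labels, predicted_probs):
    """Fraction of examples whose highest-probability class is the true class."""
    correct = 0
    for label, probs in zip(one_hot_labels, predicted_probs):
        if label.index(max(label)) == probs.index(max(probs)):
            correct += 1
    return correct / len(one_hot_labels)

# Made-up one-hot labels and predicted probabilities for four photos
labels = [[1, 0], [0, 1], [0, 1], [1, 0]]
preds = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]]

print(categorical_crossentropy(labels, preds))  # lower is better
print(accuracy(labels, preds))                  # 0.75
```

Note how the third photo hurts both numbers, but the loss also rewards the confident correct prediction on the first photo more than the hesitant one on the fourth, which accuracy cannot express.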

In [4]:
my_new_model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

### Load the Image Data

Our raw data is divided into a training data directory and a validation data directory. Each of these directories contains subdirectories for urban and rural pictures. TensorFlow offers a powerful tool for handling images organized into directories based on their labels, namely the `ImageDataGenerator`.

The process involves two steps with `ImageDataGenerator`. First, we create the generator object in the abstract. We specify applying the ResNet preprocessing function each time it reads an image, and it can also generate additional images through data augmentation.

Next, we use the `flow_from_directory` command. We inform it of the data directory, the desired image size, the batch size (number of images to read at a time), and specify that we are classifying data into different categories. A similar approach is employed to set up data reading for the validation set.

`ImageDataGenerator` is particularly valuable when dealing with large datasets, as it eliminates the need to store the entire dataset in memory. However, it's also beneficial in scenarios with smaller datasets. Note that these are generators, requiring iteration to extract data.
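The generator idea can be illustrated without Keras. The sketch below, using a hypothetical `batch_generator` function and made-up file names, yields one batch at a time and produces nothing until it is iterated. Unlike this toy version, which stops after one pass, the Keras generators keep cycling through the data.

```python
def batch_generator(file_names, batch_size):
    """Yield one batch of file names at a time instead of building every batch up front."""
    for start in range(0, len(file_names), batch_size):
        yield file_names[start:start + batch_size]

# Hypothetical file names standing in for 72 training photos
file_names = [f"photo_{i}.jpg" for i in range(72)]

batches = batch_generator(file_names, batch_size=12)
first_batch = next(batches)          # nothing is produced until we ask for it
remaining = sum(1 for _ in batches)  # consume the rest of the generator

print(len(first_batch), remaining)   # 12 5
```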

In [5]:
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

image_size = 224

# without data augmentation
# data_generator = ImageDataGenerator(preprocessing_function=preprocess_input)
# with data augmentation
data_generator = ImageDataGenerator(preprocessing_function=preprocess_input, horizontal_flip=True)


train_generator = data_generator.flow_from_directory(
        './input/urban-and-rural-photos/train',
        target_size=(image_size, image_size),
        batch_size=12,
        class_mode='categorical')

validation_generator = data_generator.flow_from_directory(
        './input/urban-and-rural-photos/val',
        target_size=(image_size, image_size),
        batch_size=20,
        class_mode='categorical')

Found 72 images belonging to 2 classes.
Found 20 images belonging to 2 classes.
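The `horizontal_flip=True` option used above augments the data by mirroring images left to right; as we understand it, Keras applies the flip at random as images are read. The mirroring itself can be sketched on a tiny made-up "image" of single-number pixels:

```python
def horizontal_flip(image):
    """Mirror an image left to right; image is a list of rows, each a list of pixels."""
    return [list(reversed(row)) for row in image]

# Tiny 2x3 "image" with single-number pixels for readability
image = [
    [1, 2, 3],
    [4, 5, 6],
]
flipped = horizontal_flip(image)
print(flipped)  # [[3, 2, 1], [6, 5, 4]]
```

A mirrored street or field is still a street or field, so the flipped copy keeps its label while giving the model a slightly different input.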


### Fit the Model
Let's proceed with fitting the model. The training data is sourced from `train_generator`, while the validation data comes from `validation_generator`. Given that we have 72 training images and load 12 images per batch, we set the number of steps for a single epoch to 6 (`steps_per_epoch=6`). Similarly, there are 20 validation images, and as we load all 20 images in a single step, we use one validation step (`validation_steps=1`).
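The step counts follow directly from the dataset and batch sizes. A small sketch, using a hypothetical helper name `steps_for`, rounds up so that a final partial batch would still be counted:

```python
import math

def steps_for(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)

train_steps = steps_for(72, 12)
val_steps = steps_for(20, 20)

print(train_steps)  # 6
print(val_steps)    # 1
```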

During the model training, progress updates will be displayed, showcasing the evolution of our loss function and accuracy. The dense layer connections are adjusted as the model refines its understanding of distinguishing between urban and rural photos. Upon completion, the model achieves 76% accuracy on the training data. Subsequently, it evaluates the validation data and attains an accuracy of 90%.

It's important to note that this dataset is relatively small, and caution should be exercised when interpreting validation scores derived from such limited data. The intention is to initiate the learning process with small datasets, allowing for quick model training to build foundational experience.

In [6]:
my_new_model.fit(
        train_generator,
        steps_per_epoch=6,
        validation_data=validation_generator,
        validation_steps=1)



<keras.src.callbacks.History at 0x1bb235a5ac0>

The achieved accuracy is remarkably high, considering we trained on only 72 photos. One could easily collect a comparable number of photos with a smartphone, upload them to a platform like [Kaggle Datasets](https://www.kaggle.com/datasets), and build an accurate model for distinguishing other subjects of interest.

The training process is also remarkably fast. Ordinarily, training a neural network is time-consuming, especially on extensive datasets such as [ImageNet](https://en.wikipedia.org/wiki/ImageNet). Because we train only the final layer here, transfer learning avoids most of that cost.

### Note on Results
The displayed validation accuracy may seem notably superior to the training accuracy at this stage, which might initially be perplexing.

This difference arises because the training accuracy is calculated at several points while the network is still being refined (in our case, while the weights of the final dense layer are being updated; the pre-trained layers are frozen). When the model sees its first training images, the weights have barely been trained, which lowers the early accuracy measurements. These early results are then averaged into the overall measure.

In contrast, validation loss and accuracy are computed ***after*** the model has gone through all of the training data in the epoch. By then, the network has received the full epoch of training, and the scores reflect that. Although this disparity may be puzzling at first, it is not a cause for concern in practice.
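The averaging effect described above can be seen with a few made-up numbers. The per-batch accuracies below are hypothetical and chosen only to illustrate the mechanism; they are not taken from the actual training run.

```python
# Hypothetical per-batch training accuracies over one epoch of 6 steps,
# rising as the dense layer's weights improve
batch_accuracies = [0.50, 0.58, 0.75, 0.83, 0.92, 0.92]

# The reported training accuracy averages over every batch in the epoch
reported_training_accuracy = sum(batch_accuracies) / len(batch_accuracies)

# Validation is measured with the weights as they stand at the end of the epoch,
# so the last batch is a closer guide to what validation sees
end_of_epoch_accuracy = batch_accuracies[-1]

print(round(reported_training_accuracy, 2))  # 0.75
print(end_of_epoch_accuracy)                 # 0.92
```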

### Try Another Pre-trained Model

Let's explore another pre-trained model to assess the effectiveness of transfer learning. Xception, highlighted in the previous slide as the most accurate model, has a top-1 accuracy of 79% and a top-5 accuracy of 94.5% on ImageNet.
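For readers unfamiliar with the terms, top-1 accuracy counts a prediction as correct only when the true class gets the highest score, while top-5 accuracy counts it as correct when the true class is among the five highest scores. The sketch below uses a hypothetical `top_k_accuracy` function with made-up scores over four classes, so it compares top-1 with top-2 rather than top-5.

```python
def top_k_accuracy(true_classes, score_rows, k):
    """Fraction of examples whose true class is among the k highest-scoring classes."""
    hits = 0
    for true_class, scores in zip(true_classes, score_rows):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if true_class in ranked[:k]:
            hits += 1
    return hits / len(true_classes)

# Made-up scores over 4 classes for 3 images
true_classes = [0, 2, 3]
score_rows = [
    [0.6, 0.2, 0.1, 0.1],  # true class ranked first
    [0.5, 0.1, 0.3, 0.1],  # true class ranked second
    [0.4, 0.3, 0.2, 0.1],  # true class ranked last
]

top1 = top_k_accuracy(true_classes, score_rows, k=1)
top2 = top_k_accuracy(true_classes, score_rows, k=2)
print(top1, top2)  # one of three correct at k=1, two of three at k=2
```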

In [7]:
from tensorflow.keras.applications import Xception
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

def create_model_from_xception():
    num_classes = 2

    my_new_model = Sequential()
    my_new_model.add(Xception(include_top=False, pooling='avg', weights="imagenet"))
    my_new_model.add(Dense(num_classes, activation='softmax'))

    # Freeze the first layer (the Xception base); it is already trained
    my_new_model.layers[0].trainable = False
    return my_new_model

In [8]:
from tensorflow.keras.applications.xception import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

image_size = 224

# without data augmentation
# data_generator = ImageDataGenerator(preprocessing_function=preprocess_input)
# with data augmentation
data_generator = ImageDataGenerator(preprocessing_function=preprocess_input, horizontal_flip=True)


train_generator = data_generator.flow_from_directory(
        './input/urban-and-rural-photos/train',
        target_size=(image_size, image_size),
        batch_size=12,
        class_mode='categorical')

validation_generator = data_generator.flow_from_directory(
        './input/urban-and-rural-photos/val',
        target_size=(image_size, image_size),
        batch_size=20,
        class_mode='categorical')

Found 72 images belonging to 2 classes.
Found 20 images belonging to 2 classes.
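Each Keras application ships its own `preprocess_input`, which is why the cell above imports it from `xception` rather than `resnet50`. Assuming the Xception variant scales pixel values from [0, 255] to [-1, 1], as the Keras documentation describes, the core of the transformation can be sketched as follows. The helper name `scale_to_unit_range` is hypothetical.

```python
def scale_to_unit_range(pixel_values):
    """Map pixel intensities from [0, 255] to [-1, 1]."""
    return [value / 127.5 - 1.0 for value in pixel_values]

scaled = scale_to_unit_range([0, 127.5, 255])
print(scaled)  # [-1.0, 0.0, 1.0]
```

Feeding a model inputs prepared with another model's preprocessing would give it values on a scale it never saw during pre-training, so the import should always match the base model.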


In [9]:
my_new_model = create_model_from_xception()
my_new_model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 xception (Functional)       (None, 2048)              20861480  
                                                                 
 dense_1 (Dense)             (None, 2)                 4098      
                                                                 
Total params: 20865578 (79.60 MB)
Trainable params: 4098 (16.01 KB)
Non-trainable params: 20861480 (79.58 MB)
_________________________________________________________________


In [10]:
my_new_model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

In [11]:
my_new_model.fit(
        train_generator,
        steps_per_epoch=6,
        validation_data=validation_generator,
        validation_steps=1)



<keras.src.callbacks.History at 0x1bb29aad430>