In [None]:
# Concepts behind using pre-trained models for feature extraction

In [None]:
# The key concepts and steps involved in using a pre-trained MobileNet V2 model
# for feature extraction in deep learning

"""
MobileNet V2:
 MobileNetV2 is a convolutional neural network architecture designed for efficient
 and lightweight deep learning tasks. It's particularly useful for mobile and
 embedded applications due to its compact design while maintaining respectable accuracy.
 The architecture contains multiple layers, including depthwise separable convolutions,
 that allow it to achieve a good trade-off between model size and performance.


Feature Extraction:
 Feature extraction involves using a pre-trained deep learning model to capture
 meaningful features from input data. Instead of training the entire model from scratch,
 you leverage the knowledge already acquired by the model during its original
 training on a large dataset.


Bottleneck Layer:
 The "bottleneck layer" is the last layer of the MobileNet V2 convolutional base,
 the one right before the pooling/flatten operation and the final classification
 layer. It is called a "bottleneck" because it holds a condensed feature map that
 summarizes the image just before it is collapsed into a vector and classified.
 The features in this layer are more general than the output of the final
 classification layer, which is tied to the 1000 ImageNet classes, making them
 useful for tasks beyond those specific classes.


Include Top Argument:
 When you instantiate a MobileNet V2 model using a deep learning framework like Keras,
 you have the option to specify the include_top argument. By setting include_top=False,
 you're indicating that you want to exclude the final classification layers from the model.
 This is commonly done when using the model for feature extraction, as you're interested
 in the intermediate features rather than the final classification.


Instantiating Pre-trained Model:
 The next step involves creating an instance of the MobileNet V2 model pre-loaded
 with weights that were trained on the ImageNet dataset. These pre-trained weights
 contain knowledge about various features present in different images.
 By leveraging these weights, you save time and resources that would have been
 required to train the model from scratch.

"""


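In [None]:
# A minimal sketch (not part of the original notebook) of why depthwise separable
# convolutions keep a network small: it only counts weights, ignoring biases.
# The layer sizes (3x3 kernel, 64 -> 128 channels) are hypothetical, chosen for illustration.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard convolution (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv (biases ignored)."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels                # 1x1 conv that mixes the channels
    return depthwise + pointwise


# hypothetical layer sizes, chosen only for illustration
standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
print(standard, separable, round(standard / separable, 1))   # 73728 8768 8.4
```
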
In [None]:
# Some important concepts related to BatchNormalization layers in the context
# of fine-tuning a deep learning model

"""
BatchNormalization Layer:
 BatchNormalization is a technique used in deep neural networks to stabilize and
 accelerate training. During training it normalizes the activations of a layer with
 the mean and variance of the current mini-batch, which was originally motivated as
 a way to reduce internal covariate shift. This often leads to faster convergence
 and better generalization.


Fine-Tuning:
 Fine-tuning involves taking a pre-trained neural network (often on a large dataset)
 and adapting it to a new task or dataset. Instead of training the entire network
 from scratch, you adjust the existing weights to make them suitable for the new problem.


layer.trainable = False:
 In the context of fine-tuning, you might want to freeze certain layers (make them
 non-trainable) in the pre-trained model to avoid overfitting or losing the learned
 features. By setting layer.trainable = False for specific layers, you prevent
 them from being updated during training.


Inference Mode for BatchNormalization:
 When you set a BatchNormalization layer's trainable attribute to False, it runs in
 inference mode even while the rest of the model is training. In this mode, the layer
 normalizes activations with the moving mean and variance it accumulated during its
 original training, and it does not update them. This is important because updating
 the mean and variance statistics during fine-tuning could lead to instability.


Unfreezing and Inference Mode:
 When you decide to unfreeze (make trainable) certain layers in a model for fine-tuning,
 including BatchNormalization layers, it's crucial to ensure that the BatchNormalization
 layers stay in inference mode. This is done by explicitly passing training=False when
 calling the base model. If you fail to do this and let BatchNormalization layers update
 their statistics during fine-tuning, it could disrupt the knowledge learned by the model so far.

 """

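In [None]:
# A simplified NumPy sketch (not the Keras implementation) of the two modes of a
# BatchNormalization layer described above. TinyBatchNorm is a hypothetical helper;
# it omits the learned scale/offset and only mirrors the Keras default momentum (0.99)
# and epsilon (1e-3).

```python
import numpy as np


class TinyBatchNorm:
    """Simplified batch normalization over the batch axis (no learned scale/offset)."""

    def __init__(self, num_features, momentum=0.99, epsilon=1e-3):
        self.moving_mean = np.zeros(num_features)
        self.moving_var = np.ones(num_features)
        self.momentum = momentum
        self.epsilon = epsilon

    def __call__(self, x, training):
        if training:
            # training mode: normalize with the statistics of the current batch...
            mean, var = x.mean(axis=0), x.var(axis=0)
            # ...and update the moving statistics
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            # inference mode: use the stored statistics and leave them untouched
            mean, var = self.moving_mean, self.moving_var
        return (x - mean) / np.sqrt(var + self.epsilon)


bn = TinyBatchNorm(num_features=2)
batch = np.array([[1.0, 10.0], [3.0, 30.0]])   # made-up activations

stats_before = bn.moving_mean.copy()
bn(batch, training=False)
unchanged = np.array_equal(stats_before, bn.moving_mean)    # inference leaves the statistics alone

bn(batch, training=True)
changed = not np.array_equal(stats_before, bn.moving_mean)  # training updates them
print(unchanged, changed)   # True True
```
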
In [None]:
# fetching the pre-trained model without its top (classification) layers
import tensorflow as tf

base_model = tf.keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                               include_top=False, # <---
                                               weights='imagenet')

# freezing the pre-trained model so its weights are not updated during training
base_model.trainable = False

# symbolic input for a batch of 160x160 RGB images
inputs = tf.keras.Input(shape=(160, 160, 3))

# using the preprocessing function of the pre-trained model
# (rescales pixel values from [0, 255] to [-1, 1])
preprocess_layer = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)

# training=False runs the base model in inference mode, so its BatchNormalization
# layers keep using their stored statistics
model_layer = base_model(preprocess_layer, training=False)

# model_layer holds the features extracted from the input images by the last
# layer of the pre-trained model (everything except the top)
# e.g. shape: [batch_size, 5, 5, 1280]
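
In [None]:
# A small sketch of where the [batch_size, 5, 5, 1280] shape comes from. It assumes
# the input height and width are multiples of 32 and the default width multiplier
# (alpha=1.0). mobilenet_v2_feature_shape is a hypothetical helper, not a Keras function.

```python
def mobilenet_v2_feature_shape(height, width):
    """Expected (rows, cols, channels) of the MobileNetV2 base output with include_top=False.

    Assumes height and width are multiples of 32.
    """
    total_stride = 32          # five stride-2 stages: 2 ** 5
    last_conv_filters = 1280   # width of the final convolution at alpha=1.0
    return height // total_stride, width // total_stride, last_conv_filters


print(mobilenet_v2_feature_shape(160, 160))   # (5, 5, 1280)
print(mobilenet_v2_feature_shape(224, 224))   # (7, 7, 1280)
```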

In [None]:
# How to add a classification head to a feature block generated by a neural network.
# The head should be designed for the problem at hand (image data and number of classes)

"""
Classification Head:
 A classification head is the final part of a neural network used to generate
 predictions for a specific task, such as image classification. It takes the
 extracted features from the previous layers and maps them to the target classes.


Global Average Pooling:
 Before passing the extracted features to the classification head, it's common to
 apply a global average pooling operation. This operation reduces the spatial
 dimensions of the feature maps while retaining the channel information. It involves
 taking the average of all the values in each channel, resulting in a single value
 for each channel. This reduces the overall number of parameters and provides a
 more compact representation.


tf.keras.layers.GlobalAveragePooling2D:
 tf.keras.layers.GlobalAveragePooling2D is a layer in TensorFlow that performs
 global average pooling on the feature maps. It converts the 2D spatial grid of
 values into a single value per channel. This is particularly useful when you have
 variable input sizes since it produces a fixed-size output regardless of the input size.


Dense Layer for Classification:
 After applying global average pooling, you use a tf.keras.layers.Dense layer to
 transform the pooled features into class predictions. This layer connects every
 element of the input vector to every output unit (class). It learns the weights
 that best map the features to class predictions.


Logit:
 The output of the tf.keras.layers.Dense layer is often referred to as logits.
 Logits are unnormalized prediction values. With a single output unit (binary
 classification), a positive logit predicts class 1 and a negative logit predicts
 class 0, and a sigmoid function turns the logit into a probability. With one
 output unit per class (multi-class classification), a softmax function turns the
 logits into class probabilities.


Activation Function for Logits:
 You don't need an activation function on the tf.keras.layers.Dense layer, because
 the model treats its predictions as raw logits. The sigmoid or softmax is instead
 applied inside the loss calculation, provided the loss is created with
 from_logits=True.

"""
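
In [None]:
# A minimal NumPy sketch of what the classification head computes, using random
# stand-in features instead of real MobileNet V2 outputs. The batch size (4), the
# single output unit and the weight values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-in for the base-model features: [batch_size, rows, cols, channels]
features = rng.normal(size=(4, 5, 5, 1280))

# global average pooling: average over the two spatial axes, one value per channel
pooled = features.mean(axis=(1, 2))          # shape (4, 1280)

# dense layer with a single output unit and no activation: logits = pooled @ W + b
weights = rng.normal(scale=0.01, size=(1280, 1))
bias = np.zeros(1)
logits = pooled @ weights + bias             # shape (4, 1)

print(pooled.shape, logits.shape)            # (4, 1280) (4, 1)
```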

In [None]:
# Creating Classification Head layers based on problem statement

# number of output units; assumed to be 1 here (a single logit for 2 classes)
output_channels = 1

# performs global average pooling on the feature maps. It converts the 2D spatial
# grid of values into a single value per channel.
avg_pool_layer = tf.keras.layers.GlobalAveragePooling2D()(model_layer)

# Dropout layer to reduce overfitting
dropout_layer = tf.keras.layers.Dropout(0.2)(avg_pool_layer)

# This layer connects every element of the input vector to every output unit (class).
# No activation is applied, so its outputs are logits (unnormalized prediction values).
output = tf.keras.layers.Dense(output_channels)(dropout_layer)

# assembling the full model, from the input tensor to the logits
model = tf.keras.Model(inputs, output)

# the output values are logits, which still need a sigmoid (2 classes, one logit)
# or a softmax (one logit per class) to become class probabilities
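
In [None]:
# A NumPy sketch of what a Dropout layer with rate 0.2 does in each mode. The
# dropout function below is a hypothetical stand-in for tf.keras.layers.Dropout,
# written only to illustrate the idea (inverted dropout with rescaling).

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of the inputs and rescale the rest."""
    if not training:
        return x                          # inference: the layer is an identity
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)   # rescale so the expected value is unchanged


rng = np.random.default_rng(0)
x = np.ones(10_000)

train_out = dropout(x, rate=0.2, training=True, rng=rng)
eval_out = dropout(x, rate=0.2, training=False, rng=rng)

print(np.mean(train_out == 0))   # fraction dropped, close to 0.2
print(train_out.mean())          # close to 1.0 thanks to the rescaling
```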

In [None]:
# Evaluating the model (frozen base model + untrained head) on the validation dataset.
# The model must be compiled with a loss and metrics before calling evaluate.

loss_0, accuracy_0 = model.evaluate(validation_dataset)

# this gives the baseline loss and accuracy before training the model
# on our custom data
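
In [None]:
# A sketch of the loss value that evaluate would report per example, ASSUMING the model
# was compiled with a binary cross-entropy loss on logits (from_logits=True). The
# notebook does not show the compile step, so this is an assumption, and
# binary_crossentropy_from_logit is a hypothetical helper written for illustration.

```python
import math


def binary_crossentropy_from_logit(label, logit):
    """Numerically stable sigmoid cross-entropy for a single example."""
    # equivalent to -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(logit)
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def naive_binary_crossentropy(label, logit):
    """Same quantity computed directly from the sigmoid probability."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


# a logit of 0 means "no preference": the loss is ln(2), whatever the label
print(binary_crossentropy_from_logit(1, 0.0))
print(binary_crossentropy_from_logit(0, 0.0))
```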

In [None]:
# After training the model, we can predict on test data

# Retrieve a batch of images from the test set
for img, label in test_dataset.take(1):
  image_batch, label_batch = img, label

# run the predictions on the batch
predictions = model.predict_on_batch(image_batch).flatten()

# Apply a sigmoid since our model returns logits
predictions = tf.nn.sigmoid(predictions) # as it has only 2 classes
# threshold at 0.5: probabilities below 0.5 become class 0, the rest class 1
predictions = tf.where(predictions < 0.5, 0, 1)

print('Predictions:\n', predictions.numpy())
print('Labels:\n', label_batch.numpy())
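
In [None]:
# A plain-Python sketch of the sigmoid + threshold step above, on made-up logits.
# It also shows that thresholding the sigmoid at 0.5 is the same as checking the
# sign of the logit, since sigmoid(0) = 0.5.

```python
import math


def sigmoid(logit):
    return 1.0 / (1.0 + math.exp(-logit))


logits = [-2.0, -0.1, 0.3, 4.0]   # made-up logits for illustration

via_sigmoid = [0 if sigmoid(z) < 0.5 else 1 for z in logits]
via_sign = [0 if z < 0 else 1 for z in logits]

print(via_sigmoid)   # [0, 0, 1, 1]
print(via_sign)      # [0, 0, 1, 1]
```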

In [None]:
# Why validation metrics might appear better than training metrics in some cases
# during the training of a deep learning model

"""
BatchNormalization and Dropout:

  -> tf.keras.layers.BatchNormalization and tf.keras.layers.Dropout are layers that
     behave differently during training and during inference.
  -> BatchNormalization normalizes the activations of a layer, which helps stabilize
     training (originally motivated as reducing internal covariate shift) and also has
     a mild regularizing effect.
  -> Dropout is a regularization technique: it randomly drops a fraction of the neurons
     during training, preventing the network from relying too much on any single neuron.


Validation Metrics vs Training Metrics:

  -> During training, Dropout is active and BatchNormalization (when it runs in training
     mode) normalizes with noisy per-batch statistics. The training metrics (such as
     accuracy) are computed on this deliberately handicapped network.
  -> When evaluating the model on the validation set, Dropout is turned off and
     BatchNormalization uses its stored moving statistics, so the full network is used
     for every prediction. This gives a more representative estimate of the model's
     generalization performance.
  -> As a result, the validation metrics might appear better than the training metrics,
     because the validation set is evaluated without the noise that these layers
     inject during training.

Epoch-Level Averaging:

  -> Training metrics are reported as averages over the course of an epoch, so they
     include the early batches, when the model was less trained. An epoch is completed
     when all training samples have been used once for training.
  -> Validation metrics, on the other hand, are evaluated at the end of each epoch,
     after the model has seen the entire training dataset.
  -> Due to this timing difference, the validation metrics are measured on a slightly
     more trained model than the one behind the averaged training metrics, which can
     make them appear slightly better.
"""

# In short: the gap between training and validation metrics comes from layers such as
# BatchNormalization and Dropout behaving differently in the two phases, and from
# training metrics being averaged over the whole epoch.
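
In [None]:
# A tiny sketch of the epoch-level averaging effect, using made-up per-batch
# accuracies (not measured results): the reported training accuracy averages over
# every batch, while validation runs on the model reached after the last batch.

```python
# made-up per-batch training accuracies within one epoch: the model improves as it trains
batch_accuracies = [0.50, 0.60, 0.70, 0.80, 0.90]

# what gets reported as the training accuracy of the epoch: the average over all batches
reported_training_accuracy = sum(batch_accuracies) / len(batch_accuracies)

# the model that gets validated is the one reached after the last batch
end_of_epoch_accuracy = batch_accuracies[-1]

print(round(reported_training_accuracy, 2), end_of_epoch_accuracy)   # 0.7 0.9
```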

In [None]:
# Benefits of Using a pre-trained model for feature extraction:

# 1. It leverages the knowledge and features learned from a larger dataset, which
#    can improve performance on the small dataset.
# 2. It reduces the risk of overfitting by using features that are more generic
#    and transferable.
# 3. It trains faster since the convolutional base is frozen and only the small
#    classification head is updated, saving both time and computational resources.
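
In [None]:
# A rough sketch of how little is left to train once the base is frozen: only the
# Dense head has trainable weights (pooling and dropout have none). dense_params is
# a hypothetical helper, and the 10-class head is a made-up example.

```python
def dense_params(in_features, units):
    """Trainable parameters of a dense layer: one weight per input/unit pair plus biases."""
    return in_features * units + units


# 1280 pooled features feeding a single-logit head (binary problem)
print(dense_params(1280, 1))    # 1281
# the same head for a hypothetical 10-class problem
print(dense_params(1280, 10))   # 12810
```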