# MobileNet Architecture and Key Concepts

## **What is MobileNet?**
MobileNet is a family of lightweight deep learning models designed for mobile and embedded devices. It achieves efficiency and low computational cost through techniques like **depthwise separable convolutions**. These models are ideal for tasks requiring real-time predictions in resource-constrained environments.

### **MobileNet Versions**
1. **MobileNetV1**:
   - Introduced depthwise separable convolutions to reduce computations.
   - Lightweight but less accurate compared to newer versions.

2. **MobileNetV2**:
   - Added **inverted residual blocks** and **linear bottlenecks** for better feature extraction and efficiency.

3. **MobileNetV3**:
   - Combines ideas from **MobileNetV2** and **Neural Architecture Search (NAS)**.
   - Includes:
     - **Squeeze-and-Excitation (SE) Blocks** for channel-wise attention.
     - **Hard-Swish Activation**, a cheaper piecewise-linear approximation of the Swish activation.
     - **Small** and **Large** variants tailored for different resource requirements.

---

## **Key Architectural Features**

### **Depthwise Separable Convolutions**
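- **Depthwise Convolution**: Applies a single spatial filter (e.g., 3x3) to each input channel separately, so channels are not mixed. This filtering step is very cheap.
- **Pointwise Convolution**: Uses 1x1 convolutions to mix the depthwise outputs across channels and set the number of output channels. Together, the two steps need `k*k*C_in + C_in*C_out` weights instead of the `k*k*C_in*C_out` of a standard k x k convolution.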
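To make the savings concrete, here is a minimal NumPy sketch of a stride-1, unpadded depthwise separable convolution and the matching parameter counts (biases ignored). NumPy stands in for a deep learning framework, and the helper names are hypothetical. For a 3x3 layer with 16 input and 16 output channels, the separable version needs 144 + 256 = 400 weights instead of 2,304; the 144 and 256 match the depthwise and pointwise rows (`Conv2d-4`, `Conv2d-7`) of the torchsummary table later in this notebook.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def depthwise_conv(x, dw_kernel):
    """Stride-1, unpadded depthwise convolution (cross-correlation, as in DL frameworks).

    x: (H, W, C) input; dw_kernel: (k, k, C), one k x k filter per channel.
    Returns (H-k+1, W-k+1, C); channels are never mixed.
    """
    k = dw_kernel.shape[0]
    windows = sliding_window_view(x, (k, k), axis=(0, 1))  # (H-k+1, W-k+1, C, k, k)
    return np.einsum("hwcij,ijc->hwc", windows, dw_kernel)


def pointwise_conv(x, pw_kernel):
    """1x1 convolution: a (C_in, C_out) matrix applied at every spatial position."""
    return x @ pw_kernel


def separable_conv(x, dw_kernel, pw_kernel):
    return pointwise_conv(depthwise_conv(x, dw_kernel), pw_kernel)


def standard_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out  # biases ignored


def separable_conv_params(k, c_in, c_out):
    return k * k * c_in + c_in * c_out  # depthwise + pointwise, biases ignored


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
y = separable_conv(x, rng.standard_normal((3, 3, 16)), rng.standard_normal((16, 16)))
print(y.shape)                           # (6, 6, 16)
print(standard_conv_params(3, 16, 16))   # 2304
print(separable_conv_params(3, 16, 16))  # 400
```

Because both steps are linear, the separable layer equals a standard convolution whose kernel is the product `dw[i, j, c] * pw[c, o]`; restricting the network to such factorized kernels is where the savings come from.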

### **Inverted Residuals and Linear Bottlenecks** (MobileNetV2 and V3)
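- **Inverted Residuals**: Each block expands a thin input with a 1x1 convolution, filters it with a depthwise convolution, and projects it back to a thin output with another 1x1 convolution. The shortcut connects the thin bottleneck ends (the inverse of a classic ResNet bottleneck block, whose shortcut joins the wide ends) and is used only when the stride is 1 and the input and output channel counts match.
- **Linear Bottleneck**: The final 1x1 projection uses no activation (linear) instead of ReLU, because a ReLU applied in the low-dimensional bottleneck discards information.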
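A minimal NumPy sketch of a stride-1, MobileNetV2-style inverted residual block may help. Assumptions: BatchNorm is omitted, ReLU6 is the nonlinearity, and the function names and weight shapes are hypothetical (MobileNetV3 swaps in hard-swish in some blocks and adds SE). The key details are that the projection output is returned without an activation, and the input is added back only when the shapes match.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def relu6(x):
    return np.clip(x, 0.0, 6.0)


def depthwise_conv_same(x, dw_kernel):
    """Stride-1 depthwise conv with zero 'same' padding; x is (H, W, C)."""
    k = dw_kernel.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    windows = sliding_window_view(xp, (k, k), axis=(0, 1))  # (H, W, C, k, k)
    return np.einsum("hwcij,ijc->hwc", windows, dw_kernel)


def inverted_residual(x, w_expand, w_dw, w_project):
    """Hypothetical stride-1 inverted residual block (BatchNorm omitted).

    x:         (H, W, C) thin input
    w_expand:  (C, t*C) 1x1 expansion to a wide representation
    w_dw:      (3, 3, t*C) depthwise filter applied in the wide space
    w_project: (t*C, C_out) 1x1 projection back to a thin bottleneck
    """
    h = relu6(x @ w_expand)                  # expand + nonlinearity
    h = relu6(depthwise_conv_same(h, w_dw))  # depthwise + nonlinearity
    out = h @ w_project                      # linear bottleneck: no activation here
    if out.shape == x.shape:                 # shortcut only when shapes match (stride is 1 here)
        out = out + x
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
t = 4  # expansion factor (assumed for this example)
y = inverted_residual(
    x,
    rng.standard_normal((16, 16 * t)) * 0.1,
    rng.standard_normal((3, 3, 16 * t)) * 0.1,
    rng.standard_normal((16 * t, 16)) * 0.1,
)
print(y.shape)  # (8, 8, 16)
```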

### **Squeeze-and-Excitation (SE) Blocks**
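- Channel-wise attention mechanism: each channel is average-pooled to a single value, two small fully connected layers turn these values into per-channel weights between 0 and 1, and the channels are rescaled by those weights. This lets the network emphasize informative channels and suppress less useful ones.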
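The SE computation can be sketched in a few lines of NumPy. This is a hypothetical, framework-free illustration on a single (H, W, C) feature map with weights and biases passed explicitly. The gate uses the hard-sigmoid `ReLU6(x + 3) / 6` that MobileNetV3 adopts, whereas the original SE design uses a standard sigmoid; the reduced width of a quarter of the channels mirrors MobileNetV3's choice.

```python
import numpy as np


def hard_sigmoid(x):
    # piecewise-linear sigmoid approximation: ReLU6(x + 3) / 6
    return np.clip(x + 3.0, 0.0, 6.0) / 6.0


def squeeze_excite(x, w1, b1, w2, b2):
    """Squeeze-and-Excitation on an (H, W, C) feature map.

    w1: (C, C_r), w2: (C_r, C), where C_r < C is the reduced width.
    """
    s = x.mean(axis=(0, 1))            # squeeze: global average pool, shape (C,)
    z = np.maximum(s @ w1 + b1, 0.0)   # reduce + ReLU
    gate = hard_sigmoid(z @ w2 + b2)   # excitation: per-channel weight in [0, 1]
    return x * gate                    # broadcast over H and W to rescale each channel


rng = np.random.default_rng(0)
x = rng.standard_normal((7, 7, 16))
c_r = 16 // 4  # reduced width, mirroring MobileNetV3's 1/4 squeeze
y = squeeze_excite(
    x,
    rng.standard_normal((16, c_r)), np.zeros(c_r),
    rng.standard_normal((c_r, 16)), np.zeros(16),
)
print(y.shape)  # (7, 7, 16)
```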

### **Hard-Swish Activation**
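- Defined as `h_swish(x) = x * ReLU6(x + 3) / 6`, it replaces the sigmoid in Swish (`x * sigmoid(x)`) with a piecewise-linear approximation. This is cheaper to compute on mobile hardware and friendlier to quantization, while keeping accuracy close to Swish.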
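A plain-Python comparison of Swish (`x * sigmoid(x)`) and hard-swish (`x * ReLU6(x + 3) / 6`); the function names are illustrative. Over the whole real line the two differ by at most about 0.14, with the largest gap at x = -3 and x = 3, where the hard-sigmoid reaches its end points.

```python
import math


def relu6(x: float) -> float:
    return min(max(x, 0.0), 6.0)


def swish(x: float) -> float:
    # Swish / SiLU: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))


def hard_swish(x: float) -> float:
    # sigmoid replaced by the piecewise-linear ReLU6(x + 3) / 6
    return x * relu6(x + 3.0) / 6.0


for v in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"x={v:+.1f}  swish={swish(v):+.4f}  hard_swish={hard_swish(v):+.4f}")
```

Hard-swish is exactly 0 for x <= -3 and exactly x for x >= 3, so only the interval in between needs a multiply-add and a clamp.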

---

## **Weights and Layers in MobileNet**

### **How Weights Work in MobileNet**
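- When loaded with `weights='imagenet'` (Keras), MobileNet starts from weights pre-trained on ImageNet. These learned filters respond to features such as edges, textures, and shapes.
- The pre-trained values live in layers that have parameters: convolution kernels, the classifier head (when `include_top=True`), and BatchNorm scales and offsets. BatchNorm layers also store moving means and variances, which are pre-trained but not trainable.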

### **Accessing Weights in Layers**
1. **Inspecting Layers**:
   - Use a loop to inspect all layers in the model and identify which ones contain weights:
     ```python
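     import tensorflow as tf
     base_model = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False, input_shape=(224, 224, 3))  # assumed model; any Keras MobileNet works
     for i, layer in enumerate(base_model.layers):
         print(i, layer.name, type(layer).__name__, len(layer.weights))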
     ```

2. **Extracting Weights**:
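   - Locate a specific layer and retrieve its kernel (and its bias, if it has one):
     ```python
     conv_layer = base_model.layers[1]  # Replace with the index of a Conv2D layer
     params = conv_layer.get_weights()  # [kernel] or [kernel, bias]
     weights = params[0]  # Keras layout: (kernel_h, kernel_w, in_channels, out_channels)
     print("Weights shape:", weights.shape)
     if len(params) > 1:  # MobileNet convs feed into BatchNorm and usually have use_bias=False
         print("Biases shape:", params[1].shape)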
     ```

3. **Handling Empty Weights**:
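   - Layers without parameters (input, padding, activation, pooling, or preprocessing layers) return an empty list from `get_weights()`.
   - A model whose layers are not built yet (e.g., a subclassed model created without an input shape) also returns empty lists. Keras application models such as MobileNet are built when constructed, but other models may need one forward pass on a dummy input first:
     ```python
     import tensorflow as tf
     dummy_input = tf.random.normal([1, 224, 224, 3])  # a batch of one 224x224 RGB image
     base_model(dummy_input)  # forward pass builds any layers that are not built yet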
     ```
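The snippets in this section use Keras, while the notebook code later uses PyTorch, and the two frameworks store convolution kernels in different axis orders. That matters when reading shapes from `get_weights()` or choosing the axis on which to append new input channels. A small NumPy sketch of the conversion (the helper names are hypothetical; it covers regular convolutions, not depthwise ones):

```python
import numpy as np

# Keras / TensorFlow Conv2D kernel: (kernel_h, kernel_w, in_channels, out_channels)
# PyTorch nn.Conv2d weight:         (out_channels, in_channels, kernel_h, kernel_w)


def keras_to_torch_kernel(kernel):
    return np.transpose(kernel, (3, 2, 0, 1))


def torch_to_keras_kernel(weight):
    return np.transpose(weight, (2, 3, 1, 0))


def count_params(*arrays):
    """Total number of scalars across a layer's weight arrays (e.g. a get_weights() list)."""
    return sum(a.size for a in arrays)


keras_kernel = np.zeros((3, 3, 5, 16))  # e.g. a 5-channel first-layer kernel, no bias
print(keras_to_torch_kernel(keras_kernel).shape)  # (16, 5, 3, 3)
print(count_params(keras_kernel))                 # 720
```

The 720 matches the `Conv2d-1` row of the torchsummary output in the notebook (16 x 5 x 3 x 3, no bias).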

### **Impact of Replacing Non-Trainable Layers**
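- Layers without learnable parameters, such as input, padding, or activation layers, hold no pre-trained weights, so replacing them discards nothing that was learned.
- They still shape what the pre-trained layers receive: changing an activation or the input shape alters the inputs of downstream layers, so check that those layers remain compatible and expect to fine-tune.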

### **When to Modify Weights**
- Modify trainable layers when adapting the model to a new task or input format (e.g., 5-channel input: RGB + mask + texture).
- Initialize the weights for the additional channels deliberately (e.g., small random values, or the mean of the pre-trained RGB filters) and fine-tune them during training.

---

## **Adapting MobileNet for 5-Channel Input**

### **Problem**
Standard MobileNet expects 3-channel input (RGB). To handle 5-channel input (e.g., RGB + mask + texture):
- Modify the first convolutional layer to accept 5 channels.

### **Steps to Modify**
1. **Inspect the Model**:
   - Identify the first convolutional layer with weights (in torchvision's MobileNetV3-Large, this is `features[0][0]`).

2. **Extend Weights**:
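   - Add small random weights for the two additional input channels (Keras kernels are laid out as `(kh, kw, in_channels, out_channels)`, so input channels are axis 2):
     ```python
     extra = tf.random.normal([*weights.shape[:2], 2, weights.shape[-1]], stddev=0.01)  # assumed init scale
     new_weights = tf.concat([weights, extra], axis=2)  # shape (kh, kw, 5, out_channels)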
     ```

3. **Replace the Layer**:
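   - A layer's input channel count is fixed once it is built, so the extended kernel needs a new first convolution that accepts 5 channels. In Keras, one approach is to build the same architecture with `weights=None` and `input_shape=(224, 224, 5)`, assign `new_weights` to its first convolution, and copy the pre-trained weights into every other layer. In PyTorch, the first `Conv2d` can be replaced in place, as in the code below.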

4. **Fine-Tune the Model**:
   - Train the model on your dataset to adapt it to the new input format.
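The weight-extension step can be checked without a framework. The NumPy sketch below (hypothetical helper names; random values stand in for pre-trained weights; Keras kernel layout) implements both initialization choices mentioned in this section: small random values, as in the TensorFlow snippet, and the mean of the RGB filters, as in the PyTorch code. It also confirms that the RGB filters are left intact: if the two extra channels are all zeros, the 5-channel layer reproduces the original 3-channel output.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def extend_input_channels(kernel, n_extra, mode="mean", std=0.01, seed=0):
    """Append n_extra input channels to a Keras-layout kernel (kh, kw, in, out).

    mode="mean":   new channels copy the mean of the existing input-channel filters
    mode="random": new channels get small Gaussian values (std is an assumed default)
    """
    kh, kw, _, c_out = kernel.shape
    if mode == "mean":
        extra = np.repeat(kernel.mean(axis=2, keepdims=True), n_extra, axis=2)
    elif mode == "random":
        rng = np.random.default_rng(seed)
        extra = rng.normal(0.0, std, size=(kh, kw, n_extra, c_out))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return np.concatenate([kernel, extra.astype(kernel.dtype)], axis=2)


def conv2d_valid(x, kernel):
    """Plain stride-1 'valid' convolution; x: (H, W, C_in), kernel: (kh, kw, C_in, C_out)."""
    kh, kw = kernel.shape[:2]
    windows = sliding_window_view(x, (kh, kw), axis=(0, 1))  # (H', W', C_in, kh, kw)
    return np.einsum("hwcij,ijco->hwo", windows, kernel)


rng = np.random.default_rng(1)
rgb_kernel = rng.standard_normal((3, 3, 3, 16))  # stands in for a pre-trained first-layer kernel
kernel5 = extend_input_channels(rgb_kernel, n_extra=2, mode="mean")
print(kernel5.shape)  # (3, 3, 5, 16)

rgb = rng.standard_normal((8, 8, 3))
x5 = np.concatenate([rgb, np.zeros((8, 8, 2))], axis=2)  # extra channels set to zero
print(np.allclose(conv2d_valid(x5, kernel5), conv2d_valid(rgb, rgb_kernel)))  # True
```

For a PyTorch weight of shape `(out, in, kh, kw)`, the same idea applies along axis 1 instead of axis 2.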

---

## **Conclusion**
- MobileNet is highly flexible and efficient, making it ideal for mobile and embedded applications.
- Pre-trained weights can be adapted to new tasks by modifying the architecture.
- Replacing layers without learnable parameters does not discard pre-trained weights, but it can change what downstream layers receive, so check compatibility.
- When adding new input channels, extend the first layer's weights and fine-tune the model for optimal performance.



```python
import torch
from torch.nn import Conv2d
from torchvision import models
from torchvision.models import MobileNet_V3_Large_Weights

# Load the pre-trained MobileNetV3-Large model
mobilenet = models.mobilenet_v3_large(weights=MobileNet_V3_Large_Weights.IMAGENET1K_V1)

# The stem is features[0] (Conv2d -> BatchNorm2d -> Hardswish); [0] selects its Conv2d
first_conv_layer = mobilenet.features[0][0]

# Create a new Conv2d layer with 5 input channels instead of 3, copying every other setting
new_first_conv_layer = Conv2d(
    in_channels=5,
    out_channels=first_conv_layer.out_channels,
    kernel_size=first_conv_layer.kernel_size,
    stride=first_conv_layer.stride,
    padding=first_conv_layer.padding,
    dilation=first_conv_layer.dilation,
    groups=first_conv_layer.groups,
    bias=(first_conv_layer.bias is not None),  # preserve the use of bias if it was used
)

# PyTorch conv weights are laid out as (out_channels, in_channels, kH, kW),
# so input channels live on dim 1
with torch.no_grad():
    # Copy the pre-trained RGB filters into the first three input channels
    new_first_conv_layer.weight[:, :3, :, :] = first_conv_layer.weight

    # Initialize the two new channels with the mean of the pre-trained RGB filters
    new_channel_weights = first_conv_layer.weight.mean(dim=1, keepdim=True).expand(-1, 2, -1, -1)
    new_first_conv_layer.weight[:, 3:5, :, :] = new_channel_weights

    # Copy the bias too, if the original layer had one (this stem does not)
    if first_conv_layer.bias is not None:
        new_first_conv_layer.bias.copy_(first_conv_layer.bias)

# Replace the first conv layer in the model with the new layer
mobilenet.features[0][0] = new_first_conv_layer

# Verify the change by printing the modified stem
print(mobilenet.features[0])
```





```
Conv2dNormActivation(
  (0): Conv2d(5, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
  (1): BatchNorm2d(16, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
  (2): Hardswish()
)
```
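One side effect of the averaging initialization in the cell above is worth checking: when the two extra channels carry values on a similar scale to RGB, the first layer's response grows. This NumPy sketch (PyTorch weight layout; random weights stand in for the pre-trained ones) shows that, away from the padded border and for an input with the same value in every channel, each filter's response becomes 5/3 of its RGB-only response. Multiplying all five channels' weights by 3/5 restores it; that rescaling is an optional adjustment assumed here, not something the cell above does, and fine-tuning may absorb the change anyway.

```python
import numpy as np

# PyTorch layout: (out_channels, in_channels, kh, kw)
rng = np.random.default_rng(0)
w_rgb = rng.standard_normal((16, 3, 3, 3))  # stands in for the pre-trained first-layer weights

# Same initialization as the notebook cell: copy RGB, new channels = mean over RGB
w5 = np.concatenate([w_rgb, np.repeat(w_rgb.mean(axis=1, keepdims=True), 2, axis=1)], axis=1)

# For an input equal to a constant c in every channel and position, each output channel
# responds with c * (sum of its weights over input channels and kernel positions)
resp_rgb = w_rgb.sum(axis=(1, 2, 3))
resp_5 = w5.sum(axis=(1, 2, 3))
print(np.allclose(resp_5, resp_rgb * 5 / 3))  # True

# Optional rescaling (assumption): scale all five channels by 3/5 to match the RGB response
w5_scaled = w5 * 3 / 5
print(np.allclose(w5_scaled.sum(axis=(1, 2, 3)), resp_rgb))  # True
```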


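```python
from torchsummary import summary
# summary() defaults to device="cuda"; pass the device the model actually lives on
device = "cuda" if next(mobilenet.parameters()).is_cuda else "cpu"
summary(mobilenet, (5, 224, 224), device=device)
```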


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 16, 112, 112]             720
       BatchNorm2d-2         [-1, 16, 112, 112]              32
         Hardswish-3         [-1, 16, 112, 112]               0
            Conv2d-4         [-1, 16, 112, 112]             144
       BatchNorm2d-5         [-1, 16, 112, 112]              32
              ReLU-6         [-1, 16, 112, 112]               0
            Conv2d-7         [-1, 16, 112, 112]             256
       BatchNorm2d-8         [-1, 16, 112, 112]              32
  InvertedResidual-9         [-1, 16, 112, 112]               0
           Conv2d-10         [-1, 64, 112, 112]           1,024
      BatchNorm2d-11         [-1, 64, 112, 112]             128
             ReLU-12         [-1, 64, 112, 112]               0
           Conv2d-13           [-1, 64, 56, 56]             576
      BatchNorm2d-14           [-1, 64,