<a href="https://colab.research.google.com/github/gnoejh/ict1022/blob/main/Architectures/mobilenet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MobileNet Architecture

## Overview

MobileNet is a family of lightweight convolutional neural networks designed for mobile and embedded vision applications. The models trade a small amount of accuracy for large reductions in model size and computation, which makes them well suited to deployment on resource-constrained devices.

## Architecture

The key innovation in MobileNet is the use of **depthwise separable convolutions** instead of standard convolutions. Each standard convolution is factored into:

1. A **depthwise convolution** - applies a single spatial filter to each input channel, without mixing channels
2. A **pointwise convolution** (1×1 convolution) - linearly combines the depthwise outputs across channels to produce the output channels
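The two steps can be sketched in plain Python on nested lists. This is only a minimal illustration, assuming stride 1, no padding, and no batch normalization or activation; the function names and the toy input are hypothetical, and a real model would use a framework layer instead.

```python
def depthwise_conv(x, dw_kernels):
    """Apply one K x K filter per input channel (stride 1, no padding).

    x:          nested lists shaped [M][H][W]
    dw_kernels: nested lists shaped [M][K][K]
    returns:    nested lists shaped [M][H-K+1][W-K+1]
    """
    k = len(dw_kernels[0])
    out_h = len(x[0]) - k + 1
    out_w = len(x[0][0]) - k + 1
    return [
        [
            [
                sum(
                    channel[i + di][j + dj] * kernel[di][dj]
                    for di in range(k)
                    for dj in range(k)
                )
                for j in range(out_w)
            ]
            for i in range(out_h)
        ]
        for channel, kernel in zip(x, dw_kernels)
    ]


def pointwise_conv(x, pw_weights):
    """Mix channels with a 1 x 1 convolution.

    x:          nested lists shaped [M][H][W]
    pw_weights: nested lists shaped [N][M]
    returns:    nested lists shaped [N][H][W]
    """
    h, w = len(x[0]), len(x[0][0])
    return [
        [
            [
                sum(weight * x[c][i][j] for c, weight in enumerate(row))
                for j in range(w)
            ]
            for i in range(h)
        ]
        for row in pw_weights
    ]


def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depthwise convolution followed by pointwise convolution."""
    return pointwise_conv(depthwise_conv(x, dw_kernels), pw_weights)


# Toy input: 2 channels of 3 x 3, 2 x 2 kernels, 3 output channels.
x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
]
dw_kernels = [
    [[1, 0], [0, 1]],  # channel 0: main-diagonal sum of each window
    [[1, 1], [1, 1]],  # channel 1: full sum of each window
]
pw_weights = [[1, 0], [0, 1], [1, 1]]  # copy ch. 0, copy ch. 1, add both

y = depthwise_separable_conv(x, dw_kernels, pw_weights)
print(y)
```

Note that the depthwise step never mixes channels; all cross-channel interaction happens in the 1×1 step.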

This factorization cuts the computational cost to a fraction 1/N + 1/D_K² of a standard convolution (roughly 8 to 9 times fewer multiply-adds for 3×3 kernels):
- Standard convolution: O(D_K · D_K · M · N · D_F · D_F)
- Depthwise separable: O(D_K · D_K · M · D_F · D_F + M · N · D_F · D_F)

Where:
- D_K: Spatial width and height of the (square) kernel
- M: Number of input channels
- N: Number of output channels
- D_F: Spatial width and height of the (square) output feature map
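To make the two cost expressions concrete, the following sketch evaluates them for a single layer. The layer dimensions (3×3 kernel, 512 input and output channels, 14×14 feature map) are hypothetical values chosen for illustration, and the function names are not part of any library.

```python
def standard_conv_cost(d_k, m, n, d_f):
    """Multiply-adds of a standard convolution: D_K^2 * M * N * D_F^2."""
    return d_k * d_k * m * n * d_f * d_f


def separable_conv_cost(d_k, m, n, d_f):
    """Multiply-adds of the depthwise step plus the pointwise step."""
    depthwise = d_k * d_k * m * d_f * d_f  # one D_K x D_K filter per channel
    pointwise = m * n * d_f * d_f          # 1x1 convolution across channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
std = standard_conv_cost(3, 512, 512, 14)
sep = separable_conv_cost(3, 512, 512, 14)
ratio = sep / std  # algebraically equal to 1/N + 1/D_K^2

print(f"standard:  {std:,} multiply-adds")
print(f"separable: {sep:,} multiply-adds")
print(f"ratio:     {ratio:.4f}")
```

For this layer the separable form needs about 11% of the multiply-adds of the standard convolution, and almost all of the remaining cost sits in the pointwise step.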

## MobileNet Versions

### MobileNet V1
- Built the network almost entirely from depthwise separable convolutions
- Used ReLU6 activation in the reference TensorFlow implementation (the paper itself describes plain ReLU)
- Added a width multiplier (thins the channels of every layer) and a resolution multiplier (shrinks the input and all feature maps) to trade accuracy for lower computation
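The effect of the two multipliers on one depthwise separable layer can be sketched as follows. The symbols `alpha` (width multiplier) and `rho` (resolution multiplier) follow the notation of Howard et al. (2017); the function name and the layer dimensions are hypothetical. In practice `rho` is usually set implicitly by choosing a smaller input resolution.

```python
def scaled_separable_cost(d_k, m, n, d_f, alpha=1.0, rho=1.0):
    """Multiply-adds of a depthwise separable layer after applying the
    width multiplier alpha (scales channel counts) and the resolution
    multiplier rho (scales the feature map side length)."""
    m_scaled = alpha * m
    n_scaled = alpha * n
    d_f_scaled = rho * d_f
    depthwise = d_k * d_k * m_scaled * d_f_scaled * d_f_scaled
    pointwise = m_scaled * n_scaled * d_f_scaled * d_f_scaled
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
base = scaled_separable_cost(3, 512, 512, 14)
half_width = scaled_separable_cost(3, 512, 512, 14, alpha=0.5)
half_res = scaled_separable_cost(3, 512, 512, 14, rho=0.5)

print(f"alpha=0.5 keeps {half_width / base:.3f} of the cost")
print(f"rho=0.5   keeps {half_res / base:.3f} of the cost")
```

Halving the resolution scales the cost by exactly rho² = 0.25, while halving the width scales it by roughly alpha², because the dominant pointwise term depends on both the input and output channel counts.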

### MobileNet V2
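- Added inverted residual blocks: a 1×1 expansion, a depthwise convolution, then a 1×1 projection back to a narrow bottleneck
- Linear bottlenecks: no nonlinearity after the projection layer, to avoid destroying information in the low-dimensional space
- Skip connections between bottlenecks when the input and output shapes match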
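The structure of one inverted residual block can be summarized with a small helper that lists its three layers, counts their convolution weights, and reports whether the skip connection applies. This is a descriptive sketch, not a trainable layer: the function name is hypothetical, the default expansion factor of 6 is the value commonly used in MobileNetV2, and biases and batch-norm parameters are left out of the counts.

```python
def inverted_residual_plan(in_channels, out_channels, stride=1,
                           expansion=6, kernel=3):
    """Describe one inverted residual block.

    Returns (layers, use_skip), where layers is a list of
    (description, weight_count) tuples. Weight counts exclude biases
    and batch-norm parameters.
    """
    hidden = in_channels * expansion
    layers = [
        # 1x1 expansion to a wider representation, followed by ReLU6
        ("expand 1x1 + ReLU6", in_channels * hidden),
        # depthwise filtering in the expanded space, followed by ReLU6
        (f"depthwise {kernel}x{kernel} + ReLU6", kernel * kernel * hidden),
        # 1x1 projection back to a narrow bottleneck, with no activation
        ("project 1x1 (linear)", hidden * out_channels),
    ]
    # The skip connection adds the block input to its output, which only
    # works when spatial size and channel count are unchanged.
    use_skip = stride == 1 and in_channels == out_channels
    return layers, use_skip


layers, use_skip = inverted_residual_plan(24, 24)
total_weights = sum(count for _, count in layers)
for name, count in layers:
    print(f"{name:<26} {count:>6} weights")
print(f"total: {total_weights}, skip connection: {use_skip}")
```

The block is "inverted" because the wide representation lives inside the block and the narrow one at its ends, the opposite of a classic residual bottleneck.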

### MobileNet V3
- Combined automated architecture search (platform-aware NAS and NetAdapt) with manual design refinements
- Squeeze-and-Excitation blocks
- Redesigned the expensive layers at the start and end of the network
- Hard-Swish activation
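For reference, ReLU6 and Hard-Swish are both simple piecewise functions. The scalar versions below are a minimal sketch for illustration; frameworks provide vectorized equivalents.

```python
def relu6(x):
    """ReLU capped at 6: min(max(x, 0), 6)."""
    return min(max(x, 0.0), 6.0)


def hard_swish(x):
    """Hard-Swish: x * ReLU6(x + 3) / 6.

    A piecewise approximation of swish (x * sigmoid(x)) that avoids
    computing an exponential.
    """
    return x * relu6(x + 3.0) / 6.0


for value in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"x={value:+.1f}  relu6={relu6(value):.3f}  "
          f"hard_swish={hard_swish(value):+.3f}")
```

Hard-Swish is zero for inputs at or below -3, equals the identity for inputs at or above 3, and interpolates smoothly in between.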

## Applications

- Mobile and embedded computer vision
- Real-time object detection
- Facial recognition on edge devices
- Augmented reality
- Low-power IoT vision systems

## TensorFlow Implementation (MobileNet V2)

```python
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2

# Load pre-trained MobileNetV2
model = MobileNetV2(
    input_shape=(224, 224, 3),
    alpha=1.0,  # Width multiplier
    include_top=True,
    weights='imagenet',
    classes=1000
)

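# For transfer learning: load the convolutional base without the ImageNet classifier
base_model = MobileNetV2(input_shape=(224, 224, 3), include_top=False, weights='imagenet')
base_model.trainable = False  # Freeze the pre-trained weights

# Add custom classification head (10 output classes, as an example)
# Note: input images should first be scaled to [-1, 1] with
# tf.keras.applications.mobilenet_v2.preprocess_input
x = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
x = tf.keras.layers.Dense(1024, activation='relu')(x)
predictions = tf.keras.layers.Dense(10, activation='softmax')(x)
custom_model = tf.keras.Model(inputs=base_model.input, outputs=predictions)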
```

## Performance Metrics

| Model | Parameters | MAdds | ImageNet Top-1 Accuracy | ImageNet Top-5 Accuracy |
|---|---|---|---|---|
| MobileNet V1 (1.0) | 4.2M | 569M | 70.6% | 89.5% |
| MobileNet V2 (1.0) | 3.4M | 300M | 72.0% | 91.0% |
| MobileNet V3-Large | 5.4M | 219M | 75.2% | 92.2% |
| MobileNet V3-Small | 2.5M | 66M | 67.4% | 86.4% |

## References

- Howard, A. G., et al. (2017). [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861). arXiv.
- Sandler, M., et al. (2018). [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381). CVPR.
- Howard, A., et al. (2019). [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244). ICCV.