<a href="https://colab.research.google.com/github/gnoejh/ict1022/blob/main/Architectures/efficient.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# EfficientNet

## Introduction

EfficientNet is a family of convolutional neural networks introduced by Tan and Le from Google Research in 2019. At publication, the largest variant, EfficientNet-B7, reached state-of-the-art top-1 accuracy on ImageNet (84.3%) while being 8.4× smaller and 6.1× faster at inference than the best previous ConvNet.

## Key Innovation: Compound Scaling

EfficientNet's primary innovation is **compound scaling**, a principled method for scaling neural networks across three dimensions:

1. **Depth** - number of layers
2. **Width** - number of channels in each layer
3. **Resolution** - input image size

Previous approaches typically scaled only one dimension (e.g., making networks deeper), which leads to diminishing returns. EfficientNet scales all three dimensions simultaneously using a compound coefficient:

* Depth: $d = \alpha^\phi$
* Width: $w = \beta^\phi$
* Resolution: $r = \gamma^\phi$

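Here $\alpha$, $\beta$, and $\gamma$ are constants found by a small grid search on the baseline network, subject to $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2$ with $\alpha \geq 1$, $\beta \geq 1$, $\gamma \geq 1$. The compound coefficient $\phi$ is chosen by the user and controls how much additional compute is available. Because FLOPs grow roughly linearly with depth and quadratically with width and resolution, this constraint means total FLOPs increase by about $2^\phi$. For the B0 baseline the paper reports $\alpha = 1.2$, $\beta = 1.1$, and $\gamma = 1.15$.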

![EfficientNet Compound Scaling](https://miro.medium.com/max/1400/1*CnNorCR4Zdq7pVchdsRGxA.png)
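
The scaling rule can be checked with a few lines of arithmetic. The sketch below uses the coefficients reported in the paper ($\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$); the function name `compound_scale` and the rounding of the image size are illustrative assumptions, not part of any library.

```python
# Coefficients reported in the EfficientNet paper for the B0 baseline.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi, base_resolution=224):
    """Return depth/width/resolution multipliers for compound coefficient phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # FLOPs grow ~linearly with depth and ~quadratically with width and resolution.
    flops_factor = depth * width ** 2 * resolution ** 2
    return {
        "depth": depth,
        "width": width,
        "resolution": resolution,
        # Rounding to the nearest pixel is an assumption made for this sketch.
        "image_size": round(base_resolution * resolution),
        "flops_factor": flops_factor,
    }


# alpha * beta^2 * gamma^2 is close to 2, so FLOPs roughly double per unit of phi.
flops_per_step = ALPHA * BETA ** 2 * GAMMA ** 2

for phi in range(4):
    s = compound_scale(phi)
    print(
        f"phi={phi}: depth x{s['depth']:.2f}, width x{s['width']:.2f}, "
        f"image {s['image_size']}px, FLOPs x{s['flops_factor']:.2f}"
    )
```

The published B1 to B7 models use their own per-model settings, so the image sizes printed here differ from the official ones (for example, B1 uses 240×240). The sketch only illustrates how the three multipliers grow together.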

## EfficientNet Baseline Architecture

EfficientNet-B0 serves as the baseline architecture. It was found through neural architecture search (NAS) with an objective that balances accuracy against FLOPs, and it consists primarily of **mobile inverted bottleneck convolution (MBConv)** blocks, first introduced in MobileNetV2, with squeeze-and-excitation added.

### Architecture of EfficientNet-B0

| Stage | Operator | Input Resolution | Output Channels | Layers |
|-------|----------|------------|----------|--------|
| 1 | Conv3x3 | 224×224 | 32 | 1 |
| 2 | MBConv1, k3x3 | 112×112 | 16 | 1 |
| 3 | MBConv6, k3x3 | 112×112 | 24 | 2 |
| 4 | MBConv6, k5x5 | 56×56 | 40 | 2 |
| 5 | MBConv6, k3x3 | 28×28 | 80 | 3 |
| 6 | MBConv6, k5x5 | 14×14 | 112 | 3 |
| 7 | MBConv6, k5x5 | 14×14 | 192 | 4 |
| 8 | MBConv6, k3x3 | 7×7 | 320 | 1 |
| 9 | Conv1x1 & Pooling & FC | 7×7 | 1280 | 1 |
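
To make the table concrete, the sketch below stores the MBConv stages (2 to 8) as data and applies width and depth multipliers to them. The rounding helpers follow a rule commonly seen in reference implementations (channels snapped to a multiple of 8, layer counts rounded up); treat them, the `Stage` class, and the example multipliers 1.1 and 1.2 as assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    operator: str
    kernel: int
    channels: int
    layers: int


# MBConv stages 2-8 of EfficientNet-B0, copied from the table above.
B0_STAGES = [
    Stage("MBConv1", 3, 16, 1),
    Stage("MBConv6", 3, 24, 2),
    Stage("MBConv6", 5, 40, 2),
    Stage("MBConv6", 3, 80, 3),
    Stage("MBConv6", 5, 112, 3),
    Stage("MBConv6", 5, 192, 4),
    Stage("MBConv6", 3, 320, 1),
]


def round_filters(channels, width_mult, divisor=8):
    """Scale a channel count and snap it to a multiple of `divisor` (assumed rule)."""
    scaled = channels * width_mult
    rounded = max(divisor, int(scaled + divisor / 2) // divisor * divisor)
    if rounded < 0.9 * scaled:  # do not shrink by more than 10%
        rounded += divisor
    return rounded


def round_repeats(layers, depth_mult):
    """Scale a layer count, rounding up so no stage loses a layer (assumed rule)."""
    return math.ceil(layers * depth_mult)


def scale_stages(stages, width_mult, depth_mult):
    """Apply width and depth multipliers to every stage."""
    return [
        Stage(s.operator, s.kernel,
              round_filters(s.channels, width_mult),
              round_repeats(s.layers, depth_mult))
        for s in stages
    ]


print("B0 MBConv layers:", sum(s.layers for s in B0_STAGES))
for stage in scale_stages(B0_STAGES, width_mult=1.1, depth_mult=1.2):
    print(stage)
```

Rounding channels to a multiple of 8 keeps tensor shapes hardware-friendly, which is why the scaled channel counts do not equal the exact product.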

## EfficientNet Variants

EfficientNet has multiple variants, scaled from the baseline EfficientNet-B0 using the compound scaling approach:

| Model | Input Resolution | Parameters | FLOPs | Top-1 Accuracy (%) |
|-------|------------|------------|-------|--------------------|
| EfficientNet-B0 | 224×224 | 5.3M | 0.39B | 77.1 |
| EfficientNet-B1 | 240×240 | 7.8M | 0.70B | 79.1 |
| EfficientNet-B2 | 260×260 | 9.2M | 1.0B | 80.1 |
| EfficientNet-B3 | 300×300 | 12M | 1.8B | 81.6 |
| EfficientNet-B4 | 380×380 | 19M | 4.2B | 82.9 |
| EfficientNet-B5 | 456×456 | 30M | 9.9B | 83.6 |
| EfficientNet-B6 | 528×528 | 43M | 19B | 84.0 |
| EfficientNet-B7 | 600×600 | 66M | 37B | 84.3 |
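
One practical use of this table is picking the most accurate variant that fits a budget. The following sketch encodes the rows above and selects a variant under optional parameter and FLOPs limits; `pick_variant` and the example budgets are hypothetical helpers for illustration.

```python
# (name, image size, params in millions, FLOPs in billions, top-1 %) from the table above.
VARIANTS = [
    ("EfficientNet-B0", 224, 5.3, 0.39, 77.1),
    ("EfficientNet-B1", 240, 7.8, 0.70, 79.1),
    ("EfficientNet-B2", 260, 9.2, 1.0, 80.1),
    ("EfficientNet-B3", 300, 12.0, 1.8, 81.6),
    ("EfficientNet-B4", 380, 19.0, 4.2, 82.9),
    ("EfficientNet-B5", 456, 30.0, 9.9, 83.6),
    ("EfficientNet-B6", 528, 43.0, 19.0, 84.0),
    ("EfficientNet-B7", 600, 66.0, 37.0, 84.3),
]


def pick_variant(max_params_m=None, max_flops_b=None):
    """Return the most accurate variant within the budgets, or None if none fits."""
    candidates = [
        v for v in VARIANTS
        if (max_params_m is None or v[2] <= max_params_m)
        and (max_flops_b is None or v[3] <= max_flops_b)
    ]
    return max(candidates, key=lambda v: v[4], default=None)


# Example budget: at most 20M parameters and 2B FLOPs per image.
print(pick_variant(max_params_m=20, max_flops_b=2.0))

# Accuracy gained per step compared with the growth in FLOPs.
for prev, cur in zip(VARIANTS, VARIANTS[1:]):
    print(f"{prev[0][-2:]} -> {cur[0][-2:]}: "
          f"FLOPs x{cur[3] / prev[3]:.1f}, +{cur[4] - prev[4]:.1f} points")
```

The second loop makes the diminishing returns visible: each step multiplies FLOPs by roughly 1.4× to 2.4×, while the accuracy gain shrinks from 2.0 points (B0 to B1) to 0.3 points (B6 to B7).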

## MBConv Block

The core building block of EfficientNet is the **Mobile Inverted Bottleneck Convolution (MBConv)** with squeeze-and-excitation optimization:

1. **Expansion layer**: 1x1 convolution that multiplies the number of channels by the expansion ratio (the 6 in MBConv6; MBConv1 skips this step)
2. **Depthwise convolution**: 3x3 or 5x5 spatial convolution applied to each channel separately
3. **Squeeze-and-excitation block**: Pools each channel to a single value and uses it to reweight the channels
4. **Projection layer**: 1x1 convolution that reduces the channels to the block's output size, with no activation afterwards

MBConv also adds a **residual connection**, as in ResNet, whenever the block has stride 1 and its input and output channel counts match.

![MBConv Block](https://miro.medium.com/max/1400/1*ExS7GpW-C5zBJENw9-mZrQ.png)
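
The efficiency of this design shows up when counting weights. The sketch below tallies the convolution weights in one MBConv block and compares them with a dense convolution at the expanded width. It is a rough estimate under stated assumptions: BatchNorm parameters are ignored, the squeeze-and-excitation bottleneck is taken as 25% of the block's input channels, and `mbconv_params` is a hypothetical helper.

```python
def mbconv_params(c_in, c_out, kernel, expand_ratio, se_ratio=0.25):
    """Count convolution weights in one MBConv block (BatchNorm ignored)."""
    c_mid = c_in * expand_ratio
    counts = {}
    # 1x1 expansion conv; skipped when expand_ratio == 1 (as in MBConv1).
    counts["expand"] = c_in * c_mid if expand_ratio != 1 else 0
    # Depthwise conv: one k x k filter per expanded channel.
    counts["depthwise"] = kernel * kernel * c_mid
    # Squeeze-and-excitation: two 1x1 convs with biases (bottleneck width assumed).
    c_se = max(1, int(c_in * se_ratio))
    counts["se"] = (c_mid * c_se + c_se) + (c_se * c_mid + c_mid)
    # 1x1 projection conv back down to the output width.
    counts["project"] = c_mid * c_out
    counts["total"] = sum(counts.values())
    return counts


# First block of stage 3 in EfficientNet-B0: 16 -> 24 channels, 3x3 kernel, MBConv6.
block = mbconv_params(c_in=16, c_out=24, kernel=3, expand_ratio=6)
print(block)

# A dense 3x3 convolution over the same 96 expanded channels, for comparison.
dense = 3 * 3 * 96 * 96
print(f"dense 3x3 conv at width 96: {dense} weights "
      f"({dense / block['total']:.1f}x the whole MBConv block)")
```

The depthwise step costs only `k * k * c_mid` weights, so most of the block's parameters sit in the two 1x1 convolutions.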

## Implementation Example

### Using TensorFlow/Keras

In [None]:
import tensorflow as tf
from tensorflow.keras.applications import EfficientNetB0

# Load pre-trained EfficientNet model
model = EfficientNetB0(weights='imagenet', include_top=True, input_shape=(224, 224, 3))

# Summary of the model architecture
model.summary()

In [None]:
# Inference example
import numpy as np
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.efficientnet import preprocess_input, decode_predictions

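# Load the image at the 224x224 resolution EfficientNet-B0 expects
img_path = 'path_to_image.jpg'  # placeholder: replace with a real image path
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)      # array of shape (224, 224, 3), values in [0, 255]
x = np.expand_dims(x, axis=0)    # add a batch dimension -> (1, 224, 224, 3)
# For Keras EfficientNet, preprocess_input is a pass-through: rescaling happens inside the model
x = preprocess_input(x)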

# Make prediction
preds = model.predict(x)
print('Predicted:', decode_predictions(preds, top=5)[0])

### Using PyTorch

In [None]:
import torch
import torchvision.models as models

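# Load pre-trained EfficientNet-B0
# (torchvision >= 0.13 uses `weights=`; `pretrained=True` is deprecated)
weights = models.EfficientNet_B0_Weights.IMAGENET1K_V1
model = models.efficientnet_b0(weights=weights)
model.eval()  # inference mode: disables dropout and uses BatchNorm running statistics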

# Print model architecture
print(model)

## EfficientNetV2 and Beyond

In 2021, the same authors released **EfficientNetV2**, which targets faster training and better parameter efficiency. Key changes include:

1. **Fused-MBConv blocks**, which replace the 1x1 expansion and depthwise convolution with a single regular 3x3 convolution in the early stages
2. **Progressive learning**, which gradually increases the image size during training and strengthens regularization along with it
3. **Training-aware neural architecture search**, which jointly optimizes accuracy, parameter count, and training speed
4. **Architecture refinements** such as smaller expansion ratios and 3x3 kernels, which reduce memory access overhead

According to the paper, EfficientNetV2 models train much faster than previous state-of-the-art models while being up to 6.8× smaller in parameter count.
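
Progressive learning can be sketched as a schedule that linearly interpolates both image size and regularization strength across training stages. The function name `progressive_schedule` and the example values (four stages, 128 to 300 pixels, dropout 0.1 to 0.3) are illustrative assumptions, not settings prescribed by this document.

```python
def progressive_schedule(num_stages, size_start, size_end, reg_start, reg_end):
    """Linearly interpolate image size and regularization over training stages."""
    if num_stages < 2:
        raise ValueError("need at least two stages to interpolate")
    schedule = []
    for i in range(num_stages):
        t = i / (num_stages - 1)  # 0.0 at the first stage, 1.0 at the last
        size = round(size_start + (size_end - size_start) * t)
        reg = reg_start + (reg_end - reg_start) * t
        schedule.append((size, reg))
    return schedule


# Illustrative values: small images with weak dropout early, large images with strong dropout late.
schedule = progressive_schedule(num_stages=4, size_start=128, size_end=300,
                                reg_start=0.1, reg_end=0.3)
for stage, (size, dropout) in enumerate(schedule):
    print(f"stage {stage}: image size {size}px, dropout {dropout:.2f}")
```

Pairing small images with weak regularization and large images with strong regularization is the key idea: early stages train quickly, and later stages get the extra regularization that larger inputs tend to need.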

## Applications

EfficientNet has found widespread use across numerous applications:

1. **Mobile vision tasks** - Due to its efficiency on resource-constrained devices
2. **Medical image analysis** - Adapting to various medical imaging modalities
3. **Transfer learning tasks** - As a feature extractor for downstream tasks
4. **Object detection backbones** - In frameworks like EfficientDet
5. **Edge computing** - For deploying computer vision on IoT devices

The efficiency-accuracy trade-off makes EfficientNet particularly valuable for real-world deployment scenarios where computational resources are limited.

## Comparison with Other Architectures

![Comparison Chart](https://miro.medium.com/max/1400/1*0J4QEJtXO_-lG-HQRUBQHw.png)

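* **vs. ResNet**: EfficientNet-B0 reaches 77.1% top-1 accuracy with 5.3M parameters, compared with about 76% for ResNet-50 at roughly 26M parameters
* **vs. MobileNet**: Builds on the same MBConv block as MobileNetV2 and reaches higher ImageNet accuracy, at a somewhat larger parameter and FLOPs budget
* **vs. SENet**: EfficientNet-B4 (82.9%, 19M parameters) matches SENet's accuracy (about 82.7%) with several times fewer parameters, using squeeze-and-excitation inside each MBConv block
* **vs. NASNet**: Comparable accuracy to NASNet-A (about 82.7%) at EfficientNet-B4 scale, with far fewer parameters and a more principled scaling approach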

## Key Takeaways

1. Compound scaling gives a principled way to grow a network as the compute budget increases
2. Depth, width, and resolution should be scaled together rather than one at a time
3. MBConv blocks with squeeze-and-excitation keep parameter and FLOPs counts low
4. The variant (B0 to B7) should be chosen to match the available compute and memory
5. The scaling coefficients were tuned on ImageNet, so a very different problem domain may call for re-tuning them rather than reusing them as-is

## References

1. Tan, M., & Le, Q. (2019). [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946). ICML.
2. Tan, M., & Le, Q. (2021). [EfficientNetV2: Smaller Models and Faster Training](https://arxiv.org/abs/2104.00298). ICML.
3. Sandler, M., et al. (2018). [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381). CVPR.
4. Hu, J., et al. (2018). [Squeeze-and-Excitation Networks](https://arxiv.org/abs/1709.01507). CVPR.
5. Tan, M., et al. (2020). [EfficientDet: Scalable and Efficient Object Detection](https://arxiv.org/abs/1911.09070). CVPR.