# EfficientNet Explained:
    
The main intuition behind EfficientNet is to scale a network along three dimensions:

+ **Resolution:**  
    A higher-resolution image contains more information.
    + With higher-resolution input images, ConvNets can potentially capture more fine-grained patterns. However, the accuracy gain diminishes at very high resolutions.
    + How much resolution is needed for our task?
+ **Depth:**  
    The depth of a network (its number of layers) strongly influences the model's accuracy.
    + Deeper ConvNets can capture more complex features and tend to generalize well. However, they are more difficult to train because of the vanishing gradient problem. Although techniques such as “skip connections” and “batch normalization” alleviate the training problem, the accuracy gain diminishes for very deep networks.
    + How much depth scaling is required for a particular increase in resolution?
+ **Width:**  
    Width scaling refers to making the network wider (more channels per layer) so that it captures more features.
    + For high-resolution images, we can use more channels to capture fine-grained patterns.
    + Wider networks tend to capture more fine-grained features and are easier to train. However, the accuracy of such networks tends to saturate quickly.
    + How much width scaling is required to increase performance?
   
![](./fig/Efficient.png)   
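
To see why these three dimensions are not equally expensive, here is a minimal sketch that counts multiply-accumulate operations for a toy network of identical 3x3 convolutions. The function names and the toy sizes (10 layers, 32 channels, 64x64 input) are assumptions made for illustration and are not taken from the paper.

```python
def conv_flops(height, width, in_channels, out_channels, kernel=3):
    """Multiply-accumulate count of one stride-1, same-padded convolution."""
    return height * width * in_channels * out_channels * kernel * kernel


def network_flops(num_layers, channels, resolution):
    """Cost of a toy network: identical conv layers on a square input."""
    return num_layers * conv_flops(resolution, resolution, channels, channels)


base = network_flops(num_layers=10, channels=32, resolution=64)
deeper = network_flops(num_layers=20, channels=32, resolution=64)    # 2x depth
wider = network_flops(num_layers=10, channels=64, resolution=64)     # 2x width
sharper = network_flops(num_layers=10, channels=32, resolution=128)  # 2x resolution

print(deeper / base)   # 2.0
print(wider / base)    # 4.0
print(sharper / base)  # 4.0
```

In this toy model, doubling the depth doubles the cost, while doubling the width or the resolution quadruples it. This is one reason the three dimensions have to be balanced rather than scaled independently.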

Some observations made by the authors of the paper "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks":

+ Scaling up any dimension of the network (width, depth, or resolution) improves accuracy, but the accuracy gain diminishes for bigger models.

+ To pursue better accuracy and efficiency, it is critical to balance all dimensions of the network (width, depth, and resolution) during scaling.

We can then think about scaling multiple dimensions at the same time. It is possible to scale two or three dimensions arbitrarily, but this requires manual tuning and often yields sub-optimal accuracy and efficiency.

The paper tries to answer the following question:
+ “Is there a principled method to scale up ConvNets that can achieve better accuracy and efficiency?”

Their empirical study shows that it is critical to balance all dimensions of the network (width, depth, and resolution) at the same time.
Such balance can be achieved by scaling each of them with a constant ratio. This is called the “compound scaling method”: it uniformly scales the network width, depth, and resolution with a set of fixed scaling coefficients.

The intuition comes from the following fact:

+ If the input image is bigger (higher resolution), it contains more complex features and more fine-grained patterns. To capture the more complex features, the network needs a larger receptive field, which is achieved by adding more layers (depth). To capture the more fine-grained patterns, the network needs more channels (width).
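
The link between depth and receptive field can be checked with a small sketch. It assumes a plain stack of identical convolutions (no dilation, no pooling), and `receptive_field` is a hypothetical helper written for this example.

```python
def receptive_field(num_layers, kernel=3, stride=1):
    """Receptive field (in input pixels) of a stack of identical conv layers."""
    rf = 1    # a single input pixel sees only itself
    jump = 1  # distance, in input pixels, between neighbouring outputs
    for _ in range(num_layers):
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


for depth in (2, 4, 8):
    print(depth, receptive_field(depth))
# 2 5
# 4 9
# 8 17
```

With stride-1 3x3 convolutions the receptive field grows by two pixels per layer, so covering a larger input with the same relative context takes more layers.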

## **Baseline Model: EfficientNet B0**

Before understanding and performing compound scaling, we need a baseline model to work with.
Here the baseline model is EfficientNet-B0.

![](./fig/image2.png)

The baseline network is developed by performing a neural architecture search using the AutoML MNAS framework, which optimizes both accuracy and efficiency (FLOPS). The resulting architecture uses mobile inverted bottleneck convolution (MBConv), similar to MobileNetV2 and MnasNet, but is slightly larger due to an increased FLOPS budget. This baseline is then scaled up to obtain a family of models called EfficientNets.

### **EfficientNet-B0 Architecture**
![](./fig/baseline.png)

The main building block of this network is MBConv, to which squeeze-and-excitation optimization is added. MBConv is similar to the inverted residual block used in MobileNetV2: a shortcut connection links the beginning and the end of a convolutional block. The input activation maps are first expanded using 1x1 convolutions to increase the number of channels (the depth of the feature maps). This is followed by 3x3 depthwise convolutions and then pointwise (1x1) convolutions that reduce the number of channels in the output feature map. The shortcut connections therefore connect the narrow layers, while the wider layers sit between the skip connections. This structure helps decrease both the overall number of operations and the model size.
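
As a rough illustration of the savings, the sketch below counts convolution weights for an MBConv-style block (expand, depthwise, project) and compares them with a regular 3x3 convolution applied directly at the expanded width. The channel count of 40, the expansion ratio of 6, and the choice of comparison baseline are assumptions for this example; biases, batch normalization, and the squeeze-and-excitation weights are ignored.

```python
def standard_conv_params(in_channels, out_channels, kernel=3):
    """Weights of a regular k x k convolution."""
    return kernel * kernel * in_channels * out_channels


def mbconv_params(in_channels, out_channels, expand_ratio=6, kernel=3):
    """Weights of an expand -> depthwise -> project block."""
    hidden = in_channels * expand_ratio
    expand = in_channels * hidden         # 1x1 pointwise expansion
    depthwise = kernel * kernel * hidden  # one k x k filter per channel
    project = hidden * out_channels       # 1x1 pointwise projection
    return expand + depthwise + project


channels = 40
hidden = channels * 6
block = mbconv_params(channels, channels)
dense = standard_conv_params(hidden, hidden)
print(block, dense, round(dense / block, 1))
```

The wide representation is only ever processed by the cheap depthwise convolution, which is where most of the reduction comes from in this comparison.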

![](./fig/squeeze.png)

### **EfficientNet Performance**

In general, the EfficientNet models achieve both higher accuracy and better efficiency than existing CNNs, reducing parameter count and FLOPS by up to an order of magnitude. The comparison is shown below.

![](./fig/performance.png)

### **Compound Model Scaling: A Better Way to Scale Up CNNs**

While scaling individual dimensions improves model performance, the authors observed that balancing all dimensions of the network (width, depth, and image resolution) against the available resources best improves overall performance. The first step in the compound scaling method is to perform a grid search to find the relationship between the different scaling dimensions of the baseline network under a fixed resource constraint (e.g., 2x more FLOPS). This determines the appropriate scaling coefficient for each of the dimensions mentioned above. Those coefficients are then applied to scale up the baseline network to the desired target model size or computational budget.

Finding a good set of coefficients to scale these dimensions separately for each layer is impractical, since the search space is huge. To restrict the search space, the authors lay down two ground rules:
+ All layers/stages in the scaled models use the same convolution operations as the baseline network.
+ All layers are scaled uniformly with a constant ratio.

![](./fig/dwr.png)
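
In text form, the compound scaling rule from the paper can be written as follows, where $d$, $w$ and $r$ are the multipliers applied to the baseline depth, width and resolution:

$$
\begin{aligned}
\text{depth: } & d = \alpha^{\phi} \\
\text{width: } & w = \beta^{\phi} \\
\text{resolution: } & r = \gamma^{\phi} \\
\text{s.t. } & \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \quad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1
\end{aligned}
$$

Because the FLOPS of a regular convolution grow roughly in proportion to $d \cdot w^{2} \cdot r^{2}$, this constraint means the total FLOPS increase by approximately $2^{\phi}$.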

Intuitively, $\phi$ is a user-defined coefficient that determines how many extra resources are available. The constants $\alpha, \beta, \gamma$ determine how to distribute these extra resources across the network's depth ($d$), width ($w$), and input resolution ($r$).

Given some extra resources, $\alpha, \beta, \gamma$ can be determined using a small grid search, and the network's depth, width, and input resolution can then be scaled to obtain a bigger network.
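
A minimal sketch of how the candidate set for such a grid search might be enumerated is shown below. It assumes the FLOPS constraint $\alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2$ used in the paper; the grid step, the upper bound, the tolerance, and the helper name `candidate_coefficients` are all assumptions for illustration. In the real search each candidate would also have to be trained and evaluated, which is not shown here.

```python
import itertools


def candidate_coefficients(target=2.0, tolerance=0.1, step=0.05, upper=1.5):
    """Yield (alpha, beta, gamma) with alpha * beta**2 * gamma**2 close to target."""
    count = int(round((upper - 1.0) / step)) + 1
    grid = [round(1.0 + i * step, 2) for i in range(count)]  # 1.0, 1.05, ..., upper
    for alpha, beta, gamma in itertools.product(grid, repeat=3):
        cost = alpha * beta ** 2 * gamma ** 2  # approximate FLOPS multiplier
        if abs(cost - target) <= tolerance:
            yield alpha, beta, gamma


candidates = list(candidate_coefficients())
print(len(candidates), "candidates satisfy the constraint")
print((1.2, 1.1, 1.15) in candidates)  # True
```

The constraint alone leaves many admissible triples, which is why the search still needs an accuracy measurement to pick one of them.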

Starting from the baseline EfficientNet-B0, the authors apply the compound scaling method in two steps:

+ STEP 1: First fix $\phi = 1$, assuming twice as many resources are available, and do a small grid search over $\alpha, \beta, \gamma$. The best values found for EfficientNet-B0 are $\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$, under the constraint $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2$.
+ STEP 2: Then fix $\alpha, \beta, \gamma$ as constants and scale up the baseline network with different values of $\phi$ to obtain EfficientNet-B1 to B7.
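
The second step can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the base resolution of 224 is the EfficientNet-B0 input size, while the helper names, the rounding of the resolution, and rounding layer counts up are choices made for this example. The released B1 to B7 models use their own rounded coefficients and resolutions, so the printed values will not match them exactly.

```python
import math

# Depth, width and resolution bases reported for EfficientNet-B0.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi, base_resolution=224):
    """Return depth multiplier, width multiplier, resolution and FLOPS multiplier."""
    depth_mult = ALPHA ** phi
    width_mult = BETA ** phi
    resolution = int(round(base_resolution * GAMMA ** phi))
    flops_mult = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return depth_mult, width_mult, resolution, flops_mult


def scale_layers(num_layers, depth_mult):
    """Round a stage's layer count up (an assumption of this sketch)."""
    return int(math.ceil(num_layers * depth_mult))


for phi in range(0, 4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution {r}, FLOPS x{f:.2f}")
```

With these constants the FLOPS multiplier for $\phi = 1$ comes out at about 1.92, which is the sense in which the constraint holds only approximately.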