# MobileNet

```python
import torch
import torch.nn as nn
import torchinfo
```

## Separable Convolution

<img src="https://i.ibb.co/4scJtB9/image.png" alt="image" border="0">

For a standard convolution, let the input feature maps $X$ be $D_{in} \times D_{in} \times C_{in}$, each convolution kernel be $D_K \times D_K \times C_{in}$, and the output feature maps $G$ be $D_{out} \times D_{out} \times C_{out}$. The cost in multiplications is then:
- One kernel at one output position: $D_K^2 \times C_{in}$
- One kernel over the whole output map: $D_K^2 \times C_{in} \times D_{out}^2$
- All $C_{out}$ kernels: $D_K^2 \times C_{in} \times D_{out}^2 \times C_{out} \quad (2)$
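
Cost (2) can be checked numerically. The sketch below is a plain-Python helper (the name `standard_conv_mults` is illustrative, not from any library) that counts multiplications only, ignoring bias, for the shapes used in the notebook experiment: $C_{in}=512$, $C_{out}=1024$, $D_K=3$, $D_{out}=64$.

```python
def standard_conv_mults(d_out: int, d_k: int, c_in: int, c_out: int) -> int:
    """Multiplications of a standard convolution, cost (2); bias is ignored."""
    per_position = d_k * d_k * c_in            # one kernel at one output pixel
    per_kernel = per_position * d_out * d_out  # one kernel over the whole output map
    return per_kernel * c_out                  # all C_out kernels


# Shapes from the notebook experiment: 512 -> 1024 channels, 3x3 kernel, 64x64 output
print(standard_conv_mults(d_out=64, d_k=3, c_in=512, c_out=1024))  # 19327352832
```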

A depthwise separable convolution splits this into a depthwise convolution (filtering stage) followed by a pointwise convolution (combining stage):
- Depthwise: instead of $C_{out}$ kernels of size $D_K \times D_K \times C_{in}$, we use $C_{in}$ kernels of size $D_K \times D_K \times 1$, one per input channel. Each kernel produces a $D_{out} \times D_{out} \times 1$ map, and stacking the $C_{in}$ maps gives $D_{out} \times D_{out} \times C_{in}$. Cost: $D_K^2 \times D_{out}^2 \times C_{in}$
- Pointwise: we use $C_{out}$ kernels of size $1 \times 1 \times C_{in}$, which changes the number of channels from $C_{in}$ to $C_{out}$. Cost: $C_{in} \times D_{out}^2 \times C_{out}$
- Total cost: $D_K^2 \times D_{out}^2 \times C_{in} + C_{in} \times D_{out}^2 \times C_{out} = C_{in} \times D_{out}^2 \times (D_K^2 + C_{out}) \quad (1)$
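
The same counting for cost (1), again as an illustrative plain-Python helper (hypothetical name `separable_conv_mults`, bias ignored), with a check that the two-stage sum equals the factored form:

```python
def separable_conv_mults(d_out: int, d_k: int, c_in: int, c_out: int) -> int:
    """Multiplications of a depthwise separable convolution, cost (1); bias is ignored."""
    depthwise = d_k * d_k * d_out * d_out * c_in  # C_in filters of size D_K x D_K x 1
    pointwise = c_in * d_out * d_out * c_out      # C_out filters of size 1 x 1 x C_in
    return depthwise + pointwise


total = separable_conv_mults(d_out=64, d_k=3, c_in=512, c_out=1024)
# The factored form C_in * D_out^2 * (D_K^2 + C_out) gives the same number
assert total == 512 * 64**2 * (3**2 + 1024)
print(total)  # 2166358016
```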

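Dividing (1) by (2), the common factors $C_{in}$ and $D_{out}^2$ cancel:
$$\frac{(1)}{(2)} = \frac{C_{in} \times D_{out}^2 \times (D_K^2 + C_{out})}{D_K^2 \times C_{in} \times D_{out}^2 \times C_{out}} = \frac{D_K^2 + C_{out}}{D_K^2 \times C_{out}} = \frac{1}{C_{out}} + \frac{1}{D_K^2}$$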

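With $C_{out} = 1024$ and $D_K = 3$ (a common kernel size), this gives
$$\frac{1}{1024} + \frac{1}{9} \approx 0.112$$
so the separable version needs roughly 9× fewer multiplications.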
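
The cancellation can be confirmed with exact rational arithmetic. This sketch (hypothetical helper `cost_ratio`, using Python's standard `fractions` module) evaluates (1) / (2) for a few arbitrary shapes and checks that it always equals $\frac{1}{C_{out}} + \frac{1}{D_K^2}$, whatever $C_{in}$ and $D_{out}$ are:

```python
from fractions import Fraction


def cost_ratio(d_k: int, c_in: int, c_out: int, d_out: int) -> Fraction:
    """Exact ratio (1) / (2) of separable to standard multiplications."""
    separable = c_in * d_out**2 * (d_k**2 + c_out)  # cost (1)
    standard = d_k**2 * c_in * d_out**2 * c_out     # cost (2)
    return Fraction(separable, standard)


# C_in and D_out cancel: the ratio depends only on D_K and C_out
for d_k, c_in, c_out, d_out in [(3, 512, 1024, 64), (3, 32, 64, 112), (5, 128, 256, 14)]:
    assert cost_ratio(d_k, c_in, c_out, d_out) == Fraction(1, c_out) + Fraction(1, d_k**2)

print(float(cost_ratio(3, 512, 1024, 64)))  # about 0.1121
```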


```python
class StandardConv(nn.Module):
  def __init__(self, C_in, C_out, K):
    super().__init__()
    # Dense K x K conv over all input channels; stride=2 downsamples the feature map
    self.conv = nn.Conv2d(in_channels=C_in, out_channels=C_out, kernel_size=K, stride=2, padding=1)

  def forward(self, x):
    return self.conv(x)

class SeperableConv(nn.Module):
  def __init__(self, C_in, C_out, K):
    super().__init__()
    # Depthwise stage: groups=C_in gives one K x K filter per input channel
    self.depthwise = nn.Conv2d(in_channels=C_in, out_channels=C_in, kernel_size=K, groups=C_in, stride=2, padding=1)
    # Pointwise stage: 1x1 conv mixes channels C_in -> C_out
    self.pointwise = nn.Conv2d(in_channels=C_in, out_channels=C_out, kernel_size=1)

  def forward(self, x):
    x = self.depthwise(x)
    x = self.pointwise(x)
    return x
```
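
Both `StandardConv` and the depthwise stage of `SeperableConv` use `stride=2, padding=1`, so a $128 \times 128$ input gives $D_{out} = 64$, while the $1 \times 1$ pointwise stage keeps that size. A small helper (hypothetical name `conv2d_out_size`) applying the per-dimension output-size formula documented for `nn.Conv2d` makes this explicit:

```python
def conv2d_out_size(d_in: int, kernel_size: int, stride: int = 1,
                    padding: int = 0, dilation: int = 1) -> int:
    """Spatial output size of nn.Conv2d along one dimension (formula from the PyTorch docs)."""
    return (d_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Standard conv and depthwise conv: 3x3, stride 2, padding 1 on a 128x128 input
print(conv2d_out_size(128, kernel_size=3, stride=2, padding=1))  # 64
# Pointwise conv: 1x1 with the defaults (stride 1, padding 0) keeps 64x64
print(conv2d_out_size(64, kernel_size=1))  # 64
```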

```python
x = torch.randn(1, 512, 128, 128)

standard_conv = StandardConv(C_in=512, C_out=1024, K=3)
standard_x = standard_conv(x)

seperable_conv = SeperableConv(C_in=512, C_out=1024, K=3)
sep_x = seperable_conv(x)

print(standard_x.shape, sep_x.shape)

output = torchinfo.summary(standard_conv, x.shape)
print(output.total_mult_adds, output.total_params)

output_sep = torchinfo.summary(seperable_conv, x.shape)
print(output_sep.total_mult_adds, output_sep.total_params)

print(f"Params: {output_sep.total_params / output.total_params}")
print(f"FLOP: {output_sep.total_mult_adds / output.total_mult_adds}")
```

```
torch.Size([1, 1024, 64, 64]) torch.Size([1, 1024, 64, 64])
19331547136 4719616
2172649472 530432
Params: 0.11238880451290953
FLOP: 0.11238880451290953
```
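
The measured ratio (0.11239) is slightly above the theoretical $1/1024 + 1/9 \approx 0.11209$ because both models keep the default `bias=True` of `nn.Conv2d`. The sketch below (illustrative helper names, not a library API) recomputes the parameter counts by hand. Since every layer here produces a $64 \times 64$ map, the printed mult-add totals are exactly the parameter counts times $64 \times 64 = 4096$, which is why the two printed ratios coincide.

```python
def standard_conv_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    return k * k * c_in * c_out + (c_out if bias else 0)


def separable_conv_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    depthwise = k * k * c_in + (c_in if bias else 0)   # groups=C_in: one k x k filter per channel
    pointwise = c_in * c_out + (c_out if bias else 0)  # 1x1 conv mixing C_in -> C_out
    return depthwise + pointwise


std = standard_conv_params(512, 1024, 3)
sep = separable_conv_params(512, 1024, 3)
print(std, sep)                      # 4719616 530432 (same as torchinfo)
print(std * 64 * 64, sep * 64 * 64)  # 19331547136 2172649472 (the printed mult-adds)

# Without bias the ratio is exactly 1/C_out + 1/D_K^2
no_bias = separable_conv_params(512, 1024, 3, bias=False) / standard_conv_params(512, 1024, 3, bias=False)
print(no_bias)  # about 0.11209
```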


## Memory Access Cost (MAC)
- Memory access cost refers to the time and energy required to read from or write to memory in a computing system.
- This cost varies with several factors: the type of memory (e.g., cache, DRAM, or SSD), the location of the data (e.g., on-chip or off-chip), and the access pattern (e.g., sequential or random access).
- Branching increases intermediate data: each branch typically produces intermediate data that must be stored and then read back. This increases both the amount of memory used and the number of memory accesses.
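
To see why fewer multiplications do not automatically mean fewer memory accesses, here is a rough count of the elements read and written by the two layers from the notebook experiment. The model is a deliberate simplification (an assumption, not a measurement): each layer reads its input and weights once and writes its output once, with no cache reuse or layer fusion, and bias is ignored. The helper name `conv_traffic` is hypothetical.

```python
def conv_traffic(c_in: int, c_out: int, d_in: int, d_out: int, k: int, groups: int = 1) -> int:
    """Elements moved by one conv layer under a naive model (assumption):
    read the input once, read the weights once, write the output once; no bias."""
    reads_input = c_in * d_in * d_in
    reads_weights = k * k * (c_in // groups) * c_out
    writes_output = c_out * d_out * d_out
    return reads_input + reads_weights + writes_output


standard = conv_traffic(512, 1024, d_in=128, d_out=64, k=3)
separable = (
    conv_traffic(512, 512, d_in=128, d_out=64, k=3, groups=512)  # depthwise: writes a 512x64x64 intermediate
    + conv_traffic(512, 1024, d_in=64, d_out=64, k=1)            # pointwise: reads that intermediate back
)
print(standard, separable)  # 17301504 17306112
```

Under this model the separable layer moves about as many elements as the standard one, even though it needs roughly 9× fewer multiplications: the intermediate $512 \times 64 \times 64$ depthwise output is written once and read back once, the same kind of extra traffic that intermediate data from branching causes.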