# Calculating the score for the MicroNet Challenge

Remark: We don't have to account for Batch Norm parameters and FLOPs, since Batch Norm layers can be fused with the preceding convolutional layer (see https://tehnokv.com/posts/fusing-batchnorm-and-conv/). Dropout is set to 0%, so we don't need to count parameters or FLOPs for it either.
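To illustrate why fusing removes the Batch Norm cost, here is a minimal sketch for a single channel, where the convolution is reduced to a scalar multiply-add. The helper names `fuse_bn` and `conv_then_bn` and all numeric values are made up for illustration; this is not code from the submission.

```python
import math

def fuse_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BatchNorm statistics into the preceding layer's weight and bias."""
    scale = gamma / math.sqrt(var + eps)
    return weight * scale, (bias - mean) * scale + beta

def conv_then_bn(x, weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Reference path: apply the (scalar) convolution, then BatchNorm in inference mode."""
    y = weight * x + bias
    return gamma * (y - mean) / math.sqrt(var + eps) + beta

# Hypothetical values for one channel.
w, b = 0.5, 0.1
gamma, beta, mean, var = 1.5, -0.2, 0.3, 0.8

fused_w, fused_b = fuse_bn(w, b, gamma, beta, mean, var)
x = 2.0
unfused = conv_then_bn(x, w, b, gamma, beta, mean, var)
fused = fused_w * x + fused_b  # a single multiply-add, no separate BatchNorm step
print(math.isclose(fused, unfused))
```

After fusing, the layer has the same number of weights and biases as the original convolution, so Batch Norm adds nothing to the counts.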

The `counting.py` script is taken from https://github.com/google-research/google-research/blob/master/micronet_challenge/counting.py

### Imports

In [1]:
import torch
import torch.nn as nn
import numpy as np
import models
from counting import Conv2D, FullyConnected, GlobalAvg, Add, count_ops

# CIFAR100 submission

Let's look at the WideResNet architecture, which is the baseline architecture for calculating the score. It is also the architecture that we prune using our proposed method, so the baseline model and our pruned model share the same architecture; the only difference between them is the sparsity level of each layer.

In [2]:
num_classes = 100
model_cfg = getattr(models, 'WideResNet28x10')
model = model_cfg.base(*model_cfg.args, num_classes=num_classes, **model_cfg.kwargs)
print(model)

WideResNet(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (layer1): Sequential(
    (0): WideBasic(
      (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(16, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (dropout): Dropout(p=0.0)
      (bn2): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (shortcut): Sequential(
        (0): Conv2d(16, 160, kernel_size=(1, 1), stride=(1, 1))
      )
    )
    (1): WideBasic(
      (bn1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (dropout): Dropout(p=0.0)
      (bn2): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(160, 160, kernel_size=(3, 3), s

### Helper function

In [3]:
ADD_BIT_BASE = 32
MUL_BIT_BASE = 32
BASELINE_PARAMETER_BITS = 32

In [4]:
def process_counts(total_params, total_mults, total_adds, mul_bits, add_bits):
    # Convert parameter storage from bits to Mbytes; scale op counts by bit width and express them in millions.
    total_params = int(total_params) / 8. / 1e6
    total_mults = total_mults * mul_bits / MUL_BIT_BASE / 1e6
    total_adds = total_adds * add_bits / ADD_BIT_BASE  / 1e6
    return total_params, total_mults, total_adds
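As a quick sanity check of the scaling, here is a small worked example with made-up counts. It repeats the helper so that it runs on its own, and it assumes `total_params` is given in bits, as the division by 8 suggests.

```python
ADD_BIT_BASE = 32
MUL_BIT_BASE = 32

def process_counts(total_params, total_mults, total_adds, mul_bits, add_bits):
    # Bits -> Mbytes for parameters; ops are scaled by bit width and expressed in millions.
    total_params = int(total_params) / 8. / 1e6
    total_mults = total_mults * mul_bits / MUL_BIT_BASE / 1e6
    total_adds = total_adds * add_bits / ADD_BIT_BASE / 1e6
    return total_params, total_mults, total_adds

# Hypothetical inputs: 8e6 parameter bits, 2e6 multiplications at 16 bits, 3e6 additions at 32 bits.
params_mb, mults_m, adds_m = process_counts(8_000_000, 2_000_000, 3_000_000,
                                            mul_bits=16, add_bits=32)
print(params_mb, mults_m, adds_m)  # 1.0 1.0 3.0
```

A 16-bit multiplication therefore counts as half of a 32-bit one, while 32-bit additions are counted in full.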

Let's collect all the operations in the WideResNet28x10 architecture. As stated in the remark above, we can ignore the BatchNorm and Dropout layers.

In [5]:
start_image_size = 32
# kernel_shape = (k_size, k_size, c_in, c_out)

In [6]:
model_ops = []
input_size = start_image_size
index = 0
input_sizes = [32]*13 + [16]*9 + [8]*7
residuals = [3, 5, 7, 9, 12, 14, 16, 18, 21, 23, 25, 27]
mask_for_sparsities = []
for child in list(model.modules()):
    if isinstance(child, nn.Conv2d):
        input_size = input_sizes[index]
        op = Conv2D(input_size=input_size, 
                    kernel_shape=child.kernel_size + (child.in_channels, child.out_channels), 
                    strides=child.stride,
                    padding='same' if child.padding==(1,1) else 'valid',
                    use_bias=True,
                    activation=None if (index-3)%9==0 else 'relu')
        model_ops.append(op)
        mask_for_sparsities.append(1)
        if index in residuals:
            op = Add(input_size=input_sizes[index+1],
                     n_channels=child.out_channels)
            model_ops.append(op)
            mask_for_sparsities.append(0)
        index += 1
    if isinstance(child, nn.Linear):
        op = GlobalAvg(input_size=input_sizes[-1],
                      n_channels=child.in_features)
        model_ops.append(op)
        mask_for_sparsities.append(0)
        op = FullyConnected(kernel_shape=(child.in_features, child.out_features),
                           use_bias=True,
                           activation=None)
        model_ops.append(op)
        mask_for_sparsities.append(1)
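To give a sense of scale for a single entry in `model_ops`, the sketch below counts weights, biases and multiplications for the first convolution (3x3, 3 to 16 channels, 32x32 output) using the standard dense-convolution formulas. The helper `dense_conv_counts` is hypothetical, and `counting.py` may differ in details such as how additions and sparsity are handled.

```python
def dense_conv_counts(k_size, c_in, c_out, output_size):
    """Weights, biases and multiplications of a dense k x k convolution with a square output."""
    n_weights = k_size * k_size * c_in * c_out
    n_biases = c_out
    # One multiplication per weight for each output position.
    n_mults = n_weights * output_size * output_size
    return n_weights, n_biases, n_mults

# First layer: Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) on a 32x32 input.
n_weights, n_biases, n_mults = dense_conv_counts(k_size=3, c_in=3, c_out=16, output_size=32)
print(n_weights, n_biases, n_mults)  # 432 16 442368
```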

Now let's calculate the sparsity of each layer. Our method prunes the architecture on a per-layer basis, hence the sparsity levels differ across layers. We can calculate the sparsity of each layer using `weight_mask` (a bit mask for the model weights): the fraction of 0s in a layer's mask is the sparsity level of that layer. (Note: the order of layers in `weight_mask` is the same as the order of layers in the `model_ops` list.)

In [7]:
path = "weightsmasks_wideresnet.bin"
weight_mask = torch.load(path, map_location='cpu')
total_weights_per_layer = [l.numel() for l in weight_mask]
zeros_per_layer = [torch.sum(l==0).item() for l in weight_mask]
sparsity_per_layer = [zeros/total for (zeros, total) in zip(zeros_per_layer, total_weights_per_layer)]

# Pad with zeros for Add and Pool layers
sparsity_per_layer_temp = sparsity_per_layer
sparsity_per_layer = []
index = 0
for i in mask_for_sparsities:
    if i == 0:
        sparsity_per_layer.append(0)
    else:
        sparsity_per_layer.append(sparsity_per_layer_temp[index])
        index += 1
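The same two steps can be tried on toy data without PyTorch. In this sketch the masks are plain 0/1 lists and the helper names `layer_sparsity` and `align_with_ops` are made up for illustration.

```python
def layer_sparsity(mask):
    """Fraction of zero entries in a flat 0/1 mask."""
    return sum(1 for m in mask if m == 0) / len(mask)

def align_with_ops(sparsities, mask_for_sparsities):
    """Insert 0 for parameter-free ops so the list lines up with the op list."""
    remaining = iter(sparsities)
    return [next(remaining) if has_weights else 0 for has_weights in mask_for_sparsities]

# Two toy weighted layers, each followed by a parameter-free op (e.g. Add).
toy_masks = [[1, 0, 0, 0], [1, 1, 0, 0]]
toy_flags = [1, 0, 1, 0]

toy_sparsities = [layer_sparsity(m) for m in toy_masks]
aligned = align_with_ops(toy_sparsities, toy_flags)
print(aligned)  # [0.75, 0, 0.5, 0]
```

The aligned list has one entry per op, which is what allows it to be zipped with `model_ops` later on.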

### Counts for the baseline WideResNet28x10

In [8]:
total_params_baseline, total_mults_baseline, total_adds_baseline = [0] * 3
for op in model_ops:
    param_count, flop_mults, flop_adds = count_ops(op=op, sparsity=0, param_bits=BASELINE_PARAMETER_BITS)
    total_params_baseline += param_count
    total_mults_baseline += flop_mults
    total_adds_baseline += flop_adds

total_params_baseline, total_mults_baseline, total_adds_baseline = process_counts(
                                             total_params = total_params_baseline,
                                             total_mults = total_mults_baseline,
                                             total_adds = total_adds_baseline, 
                                             mul_bits = MUL_BIT_BASE,
                                             add_bits = ADD_BIT_BASE)

### Counts for our pruned model of WideResNet28x10

For our pruned model we use the 'freebie' 16-bit quantization: parameters and multiplications are counted at 16 bits, while additions are still counted at 32 bits.

In [9]:
PARAMETER_BITS = 16
ADD_BITS = 32
MULT_BITS = 16

In [10]:
total_params, total_mults, total_adds = [0] * 3
for op, sparsity in zip(model_ops, sparsity_per_layer):
    param_count, flop_mults, flop_adds = count_ops(op, sparsity, PARAMETER_BITS)
    total_params += param_count
    total_mults += flop_mults
    total_adds += flop_adds
    
total_params, total_mults, total_adds = process_counts(total_params = total_params,
                                                           total_mults = total_mults,
                                                           total_adds = total_adds, 
                                                           mul_bits = MULT_BITS,
                                                           add_bits = ADD_BITS)

### Parameter storage score

In [11]:
param_score = total_params/total_params_baseline
print('Our parameter score is: {}.'.format(param_score))

Our parameter score is: 0.06181214293465461.


### Math operations score

In [12]:
total_flops_baseline = total_adds_baseline + total_mults_baseline
total_flops = total_adds + total_mults
flops_score = total_flops/total_flops_baseline
print('Our math operations score is: {}.'.format(flops_score))

Our math operations score is: 0.04401098197357224.


### Total score

In [13]:
total_score = param_score + flops_score
print('Our total score is: {}'.format(total_score))

Our total score is: 0.10582312490822685
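The computation above can be summarized in one formula. The symbols are introduced here for illustration only: $P$ is parameter storage in Mbytes, $M$ and $A$ are the numbers of multiplications and additions, $b_{\text{mul}}$ and $b_{\text{add}}$ are their bit widths, and the subscript "base" marks the dense 32-bit WideResNet28x10.

$$
\text{score} = \frac{P}{P_{\text{base}}} + \frac{O}{O_{\text{base}}},
\qquad
O = M \cdot \frac{b_{\text{mul}}}{32} + A \cdot \frac{b_{\text{add}}}{32}
$$

With the values printed above, this gives roughly $0.0618 + 0.0440 \approx 0.1058$.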
