<a href="https://colab.research.google.com/github/pvtien96/CORING/blob/main/notebooks/Random_vs_Norm_based_vs_Similarity_based_Pruning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Section 1: Introduction**

This notebook is the second episode in the serie of the project filter pruning.

## Purpose:
Accelerate the project.

## Key Components:
1. **Load baseline model**: Load a pretrained model instead of training the baseline from scratch, to save resource. *You don't need to worry about these details initially*.

2. **Evaluate a model**: Evaluate a model through 3 criteria: accuracy, number of parameter and MACs.

2. **Add functions**: Add 3 functions for random, norm-based and distance-based pruning.

3. **Fine-tuning**: Fine-tunes the pruned model to assess its performance in terms of accuracy and efficiency compared to the original model.

4. **Analysis & Conclusion**: Analyze the results, highlights insights gained from the experiment, and provides suggestions for further exploration or improvement.

## Prerequisite:
- [First notebook](https://github.com/pvtien96/CORING/blob/main/notebooks/Similarity_based_Filter_Pruning.ipynb)

Let's dive in and explore the exciting world of filter pruning and deep learning efficiency optimization!


# **Section 2: Setup**

Environment

In [113]:
!pip3 install torch torchvision torchaudio



Define VGG-16-BN model

In [114]:
import time
import torch
import torch.nn as nn
from collections import OrderedDict

defaultcfg = [
    64,
    64,
    "M",
    128,
    128,
    "M",
    256,
    256,
    256,
    "M",
    512,
    512,
    512,
    "M",
    512,
    512,
    512,
]


class VGG(nn.Module):
    def __init__(self, compress_rate=[0.0] * 13, cfg=None, num_classes=10):
        super(VGG, self).__init__()

        if cfg is None:
            cfg = defaultcfg

        self.compress_rate = compress_rate[:]

        self.features = self._make_layers(cfg)
        last_conv_out_channels = self.features[-3].out_channels
        self.classifier = nn.Sequential(
            OrderedDict(
                [
                    ("linear1", nn.Linear(last_conv_out_channels, cfg[-1])),
                    ("norm1", nn.BatchNorm1d(cfg[-1])),
                    ("relu1", nn.ReLU(inplace=True)),
                    ("linear2", nn.Linear(cfg[-1], num_classes)),
                ]
            )
        )

    def _make_layers(self, cfg):
        layers = nn.Sequential()
        in_channels = 3
        cnt = 0

        for i, x in enumerate(cfg):
            if x == "M":
                layers.add_module("pool%d" % i, nn.MaxPool2d(kernel_size=2, stride=2))
            else:
                x = int(x * (1 - self.compress_rate[cnt]))
                cnt += 1
                conv2d = nn.Conv2d(in_channels, x, kernel_size=3, padding=1)
                layers.add_module("conv%d" % i, conv2d)
                layers.add_module("norm%d" % i, nn.BatchNorm2d(x))
                layers.add_module("relu%d" % i, nn.ReLU(inplace=True))
                in_channels = x

        return layers

    def forward(self, x):
        x = self.features(x)
        x = nn.AvgPool2d(2)(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x


def vgg_16_bn(compress_rate=[0.0] * 13):
    return VGG(compress_rate=compress_rate)

Helper functions

In [115]:
import re

def get_cpr(compress_rate):
    cprate_str = compress_rate
    cprate_str_list = cprate_str.split("+")
    pat_cprate = re.compile(r"\d+\.\d*")
    pat_num = re.compile(r"\*\d+")
    cprate = []
    for x in cprate_str_list:
        num = 1
        find_num = re.findall(pat_num, x)
        if find_num:
            assert len(find_num) == 1
            num = int(find_num[0].replace("*", ""))
        find_cprate = re.findall(pat_cprate, x)
        assert len(find_cprate) == 1
        cprate += [float(find_cprate[0])] * num

    return cprate

In [116]:

import os
import sys
import shutil
import time, datetime
import logging
import numpy as np
from PIL import Image
from pathlib import Path

import torch
import torch.nn as nn
import torch.utils


'''record configurations'''
class record_config():
    def __init__(self, args):
        now = datetime.datetime.now().strftime('%Y-%m-%d-%H:%M:%S')
        today = datetime.date.today()

        self.args = args
        self.job_dir = Path(args.job_dir)

        def _make_dir(path):
            if not os.path.exists(path):
                os.makedirs(path)

        _make_dir(self.job_dir)

        config_dir = self.job_dir / 'config.txt'
        #if not os.path.exists(config_dir):
        if args.resume:
            with open(config_dir, 'a') as f:
                f.write(now + '\n\n')
                for arg in vars(args):
                    f.write('{}: {}\n'.format(arg, getattr(args, arg)))
                f.write('\n')
        else:
            with open(config_dir, 'w') as f:
                f.write(now + '\n\n')
                for arg in vars(args):
                    f.write('{}: {}\n'.format(arg, getattr(args, arg)))
                f.write('\n')


def get_logger(file_path):

    logger = logging.getLogger('gal')
    log_format = '%(asctime)s | %(message)s'
    formatter = logging.Formatter(log_format, datefmt='%m/%d %I:%M:%S %p')
    file_handler = logging.FileHandler(file_path)
    file_handler.setFormatter(formatter)
    stream_handler = logging.StreamHandler()
    stream_handler.setFormatter(formatter)

    logger.addHandler(file_handler)
    logger.addHandler(stream_handler)
    logger.setLevel(logging.INFO)

    return logger

#label smooth
class CrossEntropyLabelSmooth(nn.Module):

  def __init__(self, num_classes, epsilon):
    super(CrossEntropyLabelSmooth, self).__init__()
    self.num_classes = num_classes
    self.epsilon = epsilon
    self.logsoftmax = nn.LogSoftmax(dim=1)

  def forward(self, inputs, targets):
    log_probs = self.logsoftmax(inputs)
    targets = torch.zeros_like(log_probs).scatter_(1, targets.unsqueeze(1), 1)
    targets = (1 - self.epsilon) * targets + self.epsilon / self.num_classes
    loss = (-targets * log_probs).mean(0).sum()
    return loss


class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self, name, fmt=':f'):
        self.name = name
        self.fmt = fmt
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

    def __str__(self):
        fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
        return fmtstr.format(**self.__dict__)


class ProgressMeter(object):
    def __init__(self, num_batches, meters, prefix=""):
        self.batch_fmtstr = self._get_batch_fmtstr(num_batches)
        self.meters = meters
        self.prefix = prefix

    def display(self, batch):
        entries = [self.prefix + self.batch_fmtstr.format(batch)]
        entries += [str(meter) for meter in self.meters]
        print(' '.join(entries))

    def _get_batch_fmtstr(self, num_batches):
        num_digits = len(str(num_batches // 1))
        fmt = '{:' + str(num_digits) + 'd}'
        return '[' + fmt + '/' + fmt.format(num_batches) + ']'


def save_checkpoint(state, is_best, save):
    if not os.path.exists(save):
        os.makedirs(save)
    filename = os.path.join(save, 'checkpoint.pth.tar')
    torch.save(state, filename)
    if is_best:
        best_filename = os.path.join(save, 'model_best.pth.tar')
        shutil.copyfile(filename, best_filename)


def adjust_learning_rate(optimizer, epoch, args):
    """Sets the learning rate to the initial LR decayed by 10 every 30 epochs"""
    lr = args.lr * (0.1 ** (epoch // 30))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr


def accuracy(output, target, topk=(1,)):
    """Computes the accuracy over the k top predictions for the specified values of k"""
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)

        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))

        res = []
        for k in topk:
            correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
            res.append(correct_k.mul_(100.0 / batch_size))
        return res



def progress_bar(current, total, msg=None):
    _, term_width = os.popen('stty size', 'r').read().split()
    term_width = int(term_width)

    TOTAL_BAR_LENGTH = 65.
    last_time = time.time()
    begin_time = last_time

    if current == 0:
        begin_time = time.time()  # Reset for new bar.

    cur_len = int(TOTAL_BAR_LENGTH*current/total)
    rest_len = int(TOTAL_BAR_LENGTH - cur_len) - 1

    sys.stdout.write(' [')
    for i in range(cur_len):
        sys.stdout.write('=')
    sys.stdout.write('>')
    for i in range(rest_len):
        sys.stdout.write('.')
    sys.stdout.write(']')

    cur_time = time.time()
    step_time = cur_time - last_time
    last_time = cur_time
    tot_time = cur_time - begin_time

    L = []
    L.append('  Step: %s' % format_time(step_time))
    L.append(' | Tot: %s' % format_time(tot_time))
    if msg:
        L.append(' | ' + msg)

    msg = ''.join(L)
    sys.stdout.write(msg)
    for i in range(term_width-int(TOTAL_BAR_LENGTH)-len(msg)-3):
        sys.stdout.write(' ')

    # Go back to the center of the bar.
    for i in range(term_width-int(TOTAL_BAR_LENGTH/2)+2):
        sys.stdout.write('\b')
    sys.stdout.write(' %d/%d ' % (current+1, total))

    if current < total-1:
        sys.stdout.write('\r')
    else:
        sys.stdout.write('\n')
    sys.stdout.flush()


def format_time(seconds):
    days = int(seconds / 3600/24)
    seconds = seconds - days*3600*24
    hours = int(seconds / 3600)
    seconds = seconds - hours*3600
    minutes = int(seconds / 60)
    seconds = seconds - minutes*60
    secondsf = int(seconds)
    seconds = seconds - secondsf
    millis = int(seconds*1000)

    f = ''
    i = 1
    if days > 0:
        f += str(days) + 'D'
        i += 1
    if hours > 0 and i <= 2:
        f += str(hours) + 'h'
        i += 1
    if minutes > 0 and i <= 2:
        f += str(minutes) + 'm'
        i += 1
    if secondsf > 0 and i <= 2:
        f += str(secondsf) + 's'
        i += 1
    if millis > 0 and i <= 2:
        f += str(millis) + 'ms'
        i += 1
    if f == '':
        f = '0ms'
    return f

In [117]:
def train(epoch, train_loader, model, criterion, optimizer, scheduler):
    losses = AverageMeter('Loss', ':.4e')
    top1 = AverageMeter('Acc@1', ':6.2f')
    top5 = AverageMeter('Acc@5', ':6.2f')

    model.train()

    cur_lr = optimizer.param_groups[0]['lr']
    print('learning_rate: ' + str(cur_lr))

    num_iter = len(train_loader)
    print_freq = num_iter // 10
    for i, (images, target) in enumerate(train_loader):
        images = images.cuda()
        target = target.cuda()

        # compute output
        logits = model(images)
        loss = criterion(logits, target)

        # measure accuracy and record loss
        prec1, prec5 = accuracy(logits, target, topk=(1, 5))
        n = images.size(0)
        losses.update(loss.item(), n)  # accumulated loss
        top1.update(prec1.item(), n)
        top5.update(prec5.item(), n)

        # compute gradient and do SGD step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if i % print_freq == 0:
            print(
                'Epoch[{0}]({1}/{2}): '
                'Loss {loss.avg:.4f} '
                'Prec@1(1,5) {top1.avg:.2f}, {top5.avg:.2f} '
                'Lr {cur_lr:.4f}'.format(
                    epoch, i, num_iter, loss=losses,
                    top1=top1, top5=top5, cur_lr=cur_lr))
    scheduler.step()

    return losses.avg, top1.avg, top5.avg


def validate(val_loader, model, criterion):
    losses = AverageMeter('Loss', ':.4e')
    top1 = AverageMeter('Acc@1', ':6.2f')
    top5 = AverageMeter('Acc@5', ':6.2f')

    # switch to evaluation mode
    model.eval()
    with torch.no_grad():
        for i, (images, target) in enumerate(val_loader):
            images = images.cuda()
            target = target.cuda()

            # compute output
            logits = model(images)
            loss = criterion(logits, target)

            # measure accuracy and record loss
            pred1, pred5 = accuracy(logits, target, topk=(1, 5))
            n = images.size(0)
            losses.update(loss.item(), n)
            top1.update(pred1[0], n)
            top5.update(pred5[0], n)

        print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'
                    .format(top1=top1, top5=top5))

    return losses.avg, top1.avg, top5.avg

In [118]:
import torchvision
from torchvision import datasets, transforms

def load_data(batch_size=128):

    # load training data
    transform_train = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    transform_test = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    trainset = torchvision.datasets.CIFAR10(root="./", train=True, download=True,
                                            transform=transform_train)
    train_loader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)
    testset = torchvision.datasets.CIFAR10(root="./", train=False, download=True, transform=transform_test)
    val_loader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)

    return train_loader, val_loader

In [119]:
# parameters
epochs = 100
lr_warmup_epochs=5
lr=0.01
momentum=0.9
weight_decay=5e-4
lr_warmup_decay=0.01

In [120]:
def finetune(model, train_loader, val_loader, epochs, criterion):
    optimizer = torch.optim.SGD(model.parameters(
    ), lr=lr, momentum=momentum, weight_decay=weight_decay)
    main_lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs-lr_warmup_epochs)
    warmup_lr_scheduler = torch.optim.lr_scheduler.LinearLR(
        optimizer, start_factor=lr_warmup_decay, total_iters=lr_warmup_epochs)
    scheduler = torch.optim.lr_scheduler.SequentialLR(
        optimizer, schedulers=[warmup_lr_scheduler, main_lr_scheduler], milestones=[lr_warmup_epochs])

    _, best_top1_acc, _ = validate(val_loader, model, criterion)
    best_model_state = copy.deepcopy(model.state_dict())
    epoch = 0
    while epoch < epochs:
        train(epoch, train_loader, model, criterion,
              optimizer, scheduler)
        _, valid_top1_acc, _ = validate(val_loader, model, criterion)

        if valid_top1_acc > best_top1_acc:
            best_top1_acc = valid_top1_acc
            best_model_state = copy.deepcopy(model.state_dict())


        epoch += 1
        print('=>Best accuracy {:.3f}'.format(best_top1_acc))

    model.load_state_dict(best_model_state)

    return model

# **Section 3: Load the pretrained baseline model**

In [121]:
!wget https://github.com/pvtien96/CORING/releases/download/v0.1.0/vgg_16_bn.pt

--2024-04-04 15:33:18--  https://github.com/pvtien96/CORING/releases/download/v0.1.0/vgg_16_bn.pt
Resolving github.com (github.com)... 140.82.113.4
Connecting to github.com (github.com)|140.82.113.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/572465934/6bb9aca3-1335-40ce-8a25-df08be78e4eb?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAVCODYLSA53PQK4ZA%2F20240404%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240404T153318Z&X-Amz-Expires=300&X-Amz-Signature=f6bea0f40e9058ac9104ac0a5b8a407d24ba226d1991c385e7b456bed9d44270&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=572465934&response-content-disposition=attachment%3B%20filename%3Dvgg_16_bn.pt&response-content-type=application%2Foctet-stream [following]
--2024-04-04 15:33:18--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/572465934/6bb9aca3-1335-40ce-8a25-df08be78e4eb?X-Amz-Algorithm

In [122]:
 import copy


 # initialize model
model_ori = vgg_16_bn(compress_rate=[0.0]*13).cuda()
print(model_ori)

# load training data
train_loader, val_loader = load_data()
criterion = nn.CrossEntropyLoss().cuda()

# load the baseline model
checkpoint = torch.load("./vgg_16_bn.pt", map_location=torch.device('cuda:0'))
model_ori.load_state_dict(checkpoint['state_dict'])



VGG(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu1): ReLU(inplace=True)
    (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (conv3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu3): ReLU(inplace=True)
    (conv4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu4): ReLU(inplace=True)
    (pool5): MaxPool2d(kernel_size=2, stride=2, padding=0, di

<All keys matched successfully>

# **Section 3: Evaluate a model**

The model is assessed via three
dimensions: accuracy, required Multiply Accumulate Opera-
tions (MACs), and the number of parameters (Params). The
compression ratio (CR) is quantified as the percentage reduc-
tion in MACs/Params when compared to the original model.

There're many tools to asses the model. In this serie, we use [ptflop](https://github.com/sovrasov/flops-counter.pytorch) to assess the model. Please read [this](https://stackoverflow.com/questions/58498651/what-is-flops-in-field-of-deep-learning) and [this](https://towardsdatascience.com/understanding-and-calculating-the-number-of-parameters-in-convolution-neural-networks-cnns-fc88790d530d) to understand more.

In [123]:
print("Evaluating the baseline model:")
_, accuracy_model_ori, _ = validate(val_loader, model_ori, criterion)
print(f"This model's accuracy is {accuracy_model_ori}")

Evaluating the baseline model:
 * Acc@1 93.960 Acc@5 99.730
This model's accuracy is 93.95999908447266


In [124]:
! pip install ptflops



In [125]:
from ptflops import get_model_complexity_info
with torch.cuda.device(0):
  macs, params = get_model_complexity_info(model_ori, (3, 32, 32), as_strings=False, print_per_layer_stat=True, verbose=False)

VGG(
  14.99 M, 100.000% Params, 314.69 MMac, 99.872% MACs, 
  (features): Sequential(
    14.72 M, 98.207% Params, 314.43 MMac, 99.787% MACs, 
    (conv0): Conv2d(1.79 k, 0.012% Params, 1.84 MMac, 0.582% MACs, 3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm0): BatchNorm2d(128, 0.001% Params, 131.07 KMac, 0.042% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(0, 0.000% Params, 65.54 KMac, 0.021% MACs, inplace=True)
    (conv1): Conv2d(36.93 k, 0.246% Params, 37.81 MMac, 12.001% MACs, 64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm1): BatchNorm2d(128, 0.001% Params, 131.07 KMac, 0.042% MACs, 64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu1): ReLU(0, 0.000% Params, 65.54 KMac, 0.021% MACs, inplace=True)
    (pool2): MaxPool2d(0, 0.000% Params, 65.54 KMac, 0.021% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (conv3): Conv2d(73.86 k, 0.493% Params,

In [126]:
print(f"The number of parameter and MACs of this model are {params} and {macs}, respectively.")

The number of parameter and MACs of this model are 14991946 and 315096586, respectively.


# **Section 4: 3 methods to prune the model**
Install function to evaluate the importance of filters


*   Random
*   Norm-based
*   Distance-based



In [127]:
compress_rate = [0.25]*13 # prune 25% of all layers
model_prune = vgg_16_bn(compress_rate=compress_rate).cuda()
print(model_prune)


VGG(
  (features): Sequential(
    (conv0): Conv2d(3, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm0): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (conv1): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu1): ReLU(inplace=True)
    (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (conv3): Conv2d(48, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu3): ReLU(inplace=True)
    (conv4): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu4): ReLU(inplace=True)
    (pool5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilatio

## **Section 4.1: Random**


In [128]:
def prune_random(model, model_ori):
    oristate_dict = model_ori.state_dict()
    state_dict = model.state_dict()
    last_select_index = None  # Conv index selected in the previous layer

    cnt = 0
    for name, module in model.named_modules():
        name = name.replace('module.', '')

        if isinstance(module, nn.Conv2d):
            cnt += 1
            oriweight = oristate_dict[name + '.weight']
            curweight = state_dict[name + '.weight']
            orifilter_num = oriweight.size(0)
            currentfilter_num = curweight.size(0)
            print(f"Processing layer {cnt}, original layer has {orifilter_num} filters, pruning model has {currentfilter_num} filters")


            if orifilter_num != currentfilter_num:
                cov_id = cnt
                #************ rank the filter's importance here
                rank = np.arange(1, orifilter_num + 1)
                np.random.shuffle(rank)
                #********************
                print(f"rank {rank}")
                select_index = np.argsort(
                    rank)[orifilter_num-currentfilter_num:]  # preserved filter id
                select_index.sort()

                if last_select_index is not None:
                    for index_i, i in enumerate(select_index):
                        for index_j, j in enumerate(last_select_index):
                            state_dict[name + '.weight'][index_i][index_j] = \
                                oristate_dict[name + '.weight'][i][j]
                else:
                    for index_i, i in enumerate(select_index):
                        state_dict[name + '.weight'][index_i] = \
                            oristate_dict[name + '.weight'][i]

                last_select_index = select_index

            elif last_select_index is not None:
                for i in range(orifilter_num):
                    for index_j, j in enumerate(last_select_index):
                        state_dict[name + '.weight'][i][index_j] = \
                            oristate_dict[name + '.weight'][i][j]
            else:
                state_dict[name + '.weight'] = oriweight
                last_select_index = None

    model.load_state_dict(state_dict)

In [129]:
prune_random(model_prune, model_ori)

Processing layer 1, original layer has 64 filters, pruning model has 48 filters
rank [13 55 38 40 21  1 33 43 51 58 28  9  2 53 12 20  3  5 52 48  4 54 26 35
 34 41 25 50  7 27 17 64 56 22 15 45 59 19 32 36 63  8 39 62 30  6 49 10
 44 37 46 31 61 42 24 57 47 18 14 29 60 16 23 11]
Processing layer 2, original layer has 64 filters, pruning model has 48 filters
rank [57  7 34 27 46 33 29 13 35 49 40 55 45  8 32 56 25 10 47 42 48 50 41 38
 60 11 18 31 14 22 62 16 51  3 28 19 52  9 63 23 44 17 64 59 21 15 37 54
 20 36 61  6 53  4 24 58 30  1 12 39 43  5  2 26]
Processing layer 3, original layer has 128 filters, pruning model has 96 filters
rank [ 21  70  89 107  46  57  87 111   6  58  67   5  80  83  47 115  43 108
  16  93  85 109  56   2 125 120  79 113  82  48 110 128  53   8 123  18
   4  51  33  17  38  29   1  36  25  52 104  22  11  66 101  34  95  98
  81  23  59 122  94  30  60 112 121  63  76 126 119  69  27  78  32 105
  74  90  64  41  97  14  54 114  65  55 103  24   7  61 100

In [130]:
finetune(model_prune, train_loader, val_loader, epochs=1, criterion=criterion)

 * Acc@1 10.000 Acc@5 50.000
learning_rate: 0.0001
Epoch[0](0/391): Loss 2.3605 Prec@1(1,5) 3.91, 49.22 Lr 0.0001
Epoch[0](39/391): Loss 2.0935 Prec@1(1,5) 28.22, 73.89 Lr 0.0001
Epoch[0](78/391): Loss 1.8369 Prec@1(1,5) 45.62, 82.97 Lr 0.0001
Epoch[0](117/391): Loss 1.6610 Prec@1(1,5) 52.67, 86.84 Lr 0.0001
Epoch[0](156/391): Loss 1.5287 Prec@1(1,5) 57.10, 89.17 Lr 0.0001
Epoch[0](195/391): Loss 1.4258 Prec@1(1,5) 60.11, 90.74 Lr 0.0001
Epoch[0](234/391): Loss 1.3396 Prec@1(1,5) 62.58, 91.85 Lr 0.0001
Epoch[0](273/391): Loss 1.2704 Prec@1(1,5) 64.42, 92.69 Lr 0.0001
Epoch[0](312/391): Loss 1.2099 Prec@1(1,5) 66.10, 93.39 Lr 0.0001
Epoch[0](351/391): Loss 1.1588 Prec@1(1,5) 67.49, 93.93 Lr 0.0001
Epoch[0](390/391): Loss 1.1152 Prec@1(1,5) 68.62, 94.37 Lr 0.0001
 * Acc@1 78.740 Acc@5 98.300
=>Best accuracy 78.740


VGG(
  (features): Sequential(
    (conv0): Conv2d(3, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm0): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (conv1): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu1): ReLU(inplace=True)
    (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (conv3): Conv2d(48, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu3): ReLU(inplace=True)
    (conv4): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu4): ReLU(inplace=True)
    (pool5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilatio

In [131]:
with torch.cuda.device(0):
  macs_prune, params_prune = get_model_complexity_info(model_prune, (3, 32, 32), as_strings=False, print_per_layer_stat=True, verbose=False)

VGG(
  8.49 M, 100.000% Params, 177.63 MMac, 99.831% MACs, 
  (features): Sequential(
    8.28 M, 97.605% Params, 177.43 MMac, 99.716% MACs, 
    (conv0): Conv2d(1.34 k, 0.016% Params, 1.38 MMac, 0.773% MACs, 3, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm0): BatchNorm2d(96, 0.001% Params, 98.3 KMac, 0.055% MACs, 48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(0, 0.000% Params, 49.15 KMac, 0.028% MACs, inplace=True)
    (conv1): Conv2d(20.78 k, 0.245% Params, 21.28 MMac, 11.961% MACs, 48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm1): BatchNorm2d(96, 0.001% Params, 98.3 KMac, 0.055% MACs, 48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu1): ReLU(0, 0.000% Params, 49.15 KMac, 0.028% MACs, inplace=True)
    (pool2): MaxPool2d(0, 0.000% Params, 49.15 KMac, 0.028% MACs, kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (conv3): Conv2d(41.57 k, 0.490% Params, 10.64 M

## **Section 4.2: Norm**

In [132]:
def prune_norm(model, model_ori):
    oristate_dict = model_ori.state_dict()
    state_dict = model.state_dict()
    last_select_index = None  # Conv index selected in the previous layer

    cnt = 0
    for name, module in model.named_modules():
        name = name.replace('module.', '')

        if isinstance(module, nn.Conv2d):
            cnt += 1
            oriweight = oristate_dict[name + '.weight']
            curweight = state_dict[name + '.weight']
            orifilter_num = oriweight.size(0)
            currentfilter_num = curweight.size(0)
            print(f"Processing layer {cnt}, original layer has {orifilter_num} filters, pruning model has {currentfilter_num} filters")


            if orifilter_num != currentfilter_num:
                cov_id = cnt
                #************ rank the filter's importance here
                print(oristate_dict[name + '.weight'].shape)
                weight = oristate_dict[name + '.weight'].data
                weight = weight.reshape(weight.size(0), weight.size(1)*weight.size(2)*weight.size(3))
                norms = torch.norm(weight, dim=1)  # Compute norm along dimensions 1, 2, and 3
                print(norms)

                # Now, let's rank them based on the norms.
                # We'll get the indices that would sort the norms in descending order.
                sorted_indices = torch.argsort(norms, descending=True)

                # Print the ranks and corresponding norms
                for rank, index in enumerate(sorted_indices):
                    norm_value = norms[index]
                    # print(f"Rank {rank + 1}: Norm = {norm_value.item()}")

                # If you also want the indices of filters in descending order of their norms
                # print("Indices of filters in descending order of their norms:")
                # print(sorted_indices)
                rank = sorted_indices.cpu().numpy()
                #********************
                print(f"rank {rank}")
                select_index = np.argsort(
                    rank)[orifilter_num-currentfilter_num:]  # preserved filter id
                select_index.sort()

                if last_select_index is not None:
                    for index_i, i in enumerate(select_index):
                        for index_j, j in enumerate(last_select_index):
                            state_dict[name + '.weight'][index_i][index_j] = \
                                oristate_dict[name + '.weight'][i][j]
                else:
                    for index_i, i in enumerate(select_index):
                        state_dict[name + '.weight'][index_i] = \
                            oristate_dict[name + '.weight'][i]

                last_select_index = select_index

            elif last_select_index is not None:
                for i in range(orifilter_num):
                    for index_j, j in enumerate(last_select_index):
                        state_dict[name + '.weight'][i][index_j] = \
                            oristate_dict[name + '.weight'][i][j]
            else:
                state_dict[name + '.weight'] = oriweight
                last_select_index = None

    model.load_state_dict(state_dict)

In [133]:
prune_norm(model_prune, model_ori)


Processing layer 1, original layer has 64 filters, pruning model has 48 filters
torch.Size([64, 3, 3, 3])
tensor([1.5891e+00, 8.8759e-01, 5.5363e-01, 3.3376e-01, 2.2699e-01, 7.8806e-01,
        3.8824e-01, 3.9319e-01, 3.0613e-02, 6.7065e-01, 3.2999e-01, 1.5340e+00,
        4.5140e-02, 5.1688e-04, 1.4907e-01, 2.3611e-01, 1.9442e-01, 1.1847e+00,
        2.6520e-01, 1.0161e-01, 2.3032e-01, 6.9368e-01, 1.2299e-01, 1.0715e+00,
        5.8923e-01, 5.8216e-01, 2.8645e-01, 3.3000e-01, 1.6501e+00, 1.1890e+00,
        9.7305e-01, 5.8581e-01, 1.9635e+00, 7.2750e-04, 1.9415e-01, 3.9707e-04,
        6.7574e-01, 1.8930e-03, 7.1398e-01, 1.0104e+00, 1.0242e+00, 6.8070e-01,
        9.9616e-01, 5.0565e-01, 4.1435e-01, 3.0730e-01, 3.4557e-01, 1.3315e+00,
        8.0923e-04, 1.0445e+00, 6.0952e-04, 9.9565e-02, 7.0488e-01, 7.0627e-04,
        1.7248e+00, 3.9991e-01, 2.0378e-01, 1.7900e+00, 7.4258e-01, 5.9122e-01,
        1.4910e-01, 5.9228e-01, 1.1001e-01, 4.1941e-01], device='cuda:0')
rank [32 57 54 28  0

In [134]:
print("Evaluating the model after pruning, without finetuning:")
_, accuracy_model_prune, _ = validate(val_loader, model_prune, criterion)
print(f"This model's accuracy is {accuracy_model_prune}")

Evaluating the model after pruning, without finetuning:
 * Acc@1 10.000 Acc@5 50.000
This model's accuracy is 10.0


In [135]:
finetune(model_prune, train_loader, val_loader, epochs=1, criterion=criterion)

 * Acc@1 10.000 Acc@5 50.000
learning_rate: 0.0001
Epoch[0](0/391): Loss 2.2157 Prec@1(1,5) 21.09, 54.69 Lr 0.0001
Epoch[0](39/391): Loss 1.9014 Prec@1(1,5) 42.48, 81.00 Lr 0.0001
Epoch[0](78/391): Loss 1.6551 Prec@1(1,5) 53.70, 87.38 Lr 0.0001
Epoch[0](117/391): Loss 1.4829 Prec@1(1,5) 59.31, 90.37 Lr 0.0001
Epoch[0](156/391): Loss 1.3652 Prec@1(1,5) 62.75, 92.00 Lr 0.0001
Epoch[0](195/391): Loss 1.2689 Prec@1(1,5) 65.39, 93.12 Lr 0.0001
Epoch[0](234/391): Loss 1.1918 Prec@1(1,5) 67.44, 93.88 Lr 0.0001
Epoch[0](273/391): Loss 1.1291 Prec@1(1,5) 69.06, 94.47 Lr 0.0001
Epoch[0](312/391): Loss 1.0758 Prec@1(1,5) 70.39, 94.92 Lr 0.0001
Epoch[0](351/391): Loss 1.0293 Prec@1(1,5) 71.57, 95.33 Lr 0.0001
Epoch[0](390/391): Loss 0.9900 Prec@1(1,5) 72.56, 95.64 Lr 0.0001
 * Acc@1 79.470 Acc@5 98.350
=>Best accuracy 79.470


VGG(
  (features): Sequential(
    (conv0): Conv2d(3, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm0): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (conv1): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu1): ReLU(inplace=True)
    (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (conv3): Conv2d(48, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu3): ReLU(inplace=True)
    (conv4): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu4): ReLU(inplace=True)
    (pool5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilatio

## **Section 4.3: Similarity**


In [136]:
def prune_similarity(model, model_ori):
    oristate_dict = model_ori.state_dict()
    state_dict = model.state_dict()
    last_select_index = None  # Conv index selected in the previous layer

    cnt = 0
    for name, module in model.named_modules():
        name = name.replace('module.', '')

        if isinstance(module, nn.Conv2d):
            cnt += 1
            oriweight = oristate_dict[name + '.weight']
            curweight = state_dict[name + '.weight']
            orifilter_num = oriweight.size(0)
            currentfilter_num = curweight.size(0)
            print(f"Processing layer {cnt}, original layer has {orifilter_num} filters, pruning model has {currentfilter_num} filters")

            if orifilter_num != currentfilter_num:
                cov_id = cnt
                #************ rank the filter's importance here
                # print(oristate_dict[name + '.weight'].shape)
                weight = oristate_dict[name + '.weight'].data
                similarity_matrix = np.zeros((orifilter_num, orifilter_num))
                for i in range(orifilter_num):
                  for j in range(orifilter_num):
                    # print(f'Computing the distance between filter {i} and filter {j}:')
                    dist = torch.dist(weight[i], weight[j])
                    similarity_matrix[i, j] = dist
                    # print(dist)

                print(similarity_matrix)
                row_sums = np.sum(similarity_matrix, axis=1) # compute the sum of the distance of 1 filter to all other filters
                rank = row_sums
                #********************
                # print(f"rank {rank}")
                select_index = np.argsort(
                    rank)[orifilter_num-currentfilter_num:]  # preserved filter id
                select_index.sort()

                if last_select_index is not None:
                    for index_i, i in enumerate(select_index):
                        for index_j, j in enumerate(last_select_index):
                            state_dict[name + '.weight'][index_i][index_j] = \
                                oristate_dict[name + '.weight'][i][j]
                else:
                    for index_i, i in enumerate(select_index):
                        state_dict[name + '.weight'][index_i] = \
                            oristate_dict[name + '.weight'][i]

                last_select_index = select_index

            elif last_select_index is not None:
                for i in range(orifilter_num):
                    for index_j, j in enumerate(last_select_index):
                        state_dict[name + '.weight'][i][index_j] = \
                            oristate_dict[name + '.weight'][i][j]
            else:
                state_dict[name + '.weight'] = oriweight
                last_select_index = None

    model.load_state_dict(state_dict)

In [137]:
prune_similarity(model_prune, model_ori)

Processing layer 1, original layer has 64 filters, pruning model has 48 filters
[[0.         1.74139035 1.59087288 ... 1.72806299 1.57414818 1.61237991]
 [1.74139035 0.         1.0449996  ... 1.0066036  0.88404632 0.85878301]
 [1.59087288 1.0449996  0.         ... 0.66931009 0.51807326 0.4106819 ]
 ...
 [1.72806299 1.0066036  0.66931009 ... 0.         0.57510459 0.59303981]
 [1.57414818 0.88404632 0.51807326 ... 0.57510459 0.         0.43026134]
 [1.61237991 0.85878301 0.4106819  ... 0.59303981 0.43026134 0.        ]]
Processing layer 2, original layer has 64 filters, pruning model has 48 filters
[[0.         1.18950224 1.54022479 ... 1.15816128 1.24496245 1.15282464]
 [1.18950224 0.         1.11939645 ... 1.11556244 0.97027659 1.26536036]
 [1.54022479 1.11939645 0.         ... 1.34018302 1.27003932 1.45868313]
 ...
 [1.15816128 1.11556244 1.34018302 ... 0.         1.18896043 1.27326846]
 [1.24496245 0.97027659 1.27003932 ... 1.18896043 0.         1.27268112]
 [1.15282464 1.26536036 1.

In [138]:
print("Evaluating the model after pruning, without finetuning:")
_, accuracy_model_prune, _ = validate(val_loader, model_prune, criterion)
print(f"This model's accuracy is {accuracy_model_prune}")

Evaluating the model after pruning, without finetuning:
 * Acc@1 10.000 Acc@5 50.020
This model's accuracy is 10.0


In [139]:
finetune(model_prune, train_loader, val_loader, epochs=1, criterion=criterion)

 * Acc@1 10.000 Acc@5 50.020
learning_rate: 0.0001
Epoch[0](0/391): Loss 2.1867 Prec@1(1,5) 20.31, 67.19 Lr 0.0001
Epoch[0](39/391): Loss 1.9700 Prec@1(1,5) 33.50, 78.67 Lr 0.0001
Epoch[0](78/391): Loss 1.7097 Prec@1(1,5) 49.00, 85.36 Lr 0.0001
Epoch[0](117/391): Loss 1.5341 Prec@1(1,5) 55.89, 88.79 Lr 0.0001
Epoch[0](156/391): Loss 1.4014 Prec@1(1,5) 60.56, 90.80 Lr 0.0001
Epoch[0](195/391): Loss 1.2996 Prec@1(1,5) 63.79, 92.12 Lr 0.0001
Epoch[0](234/391): Loss 1.2187 Prec@1(1,5) 66.18, 93.06 Lr 0.0001
Epoch[0](273/391): Loss 1.1466 Prec@1(1,5) 68.31, 93.80 Lr 0.0001
Epoch[0](312/391): Loss 1.0864 Prec@1(1,5) 69.98, 94.41 Lr 0.0001
Epoch[0](351/391): Loss 1.0340 Prec@1(1,5) 71.46, 94.85 Lr 0.0001
Epoch[0](390/391): Loss 0.9885 Prec@1(1,5) 72.69, 95.26 Lr 0.0001
 * Acc@1 80.640 Acc@5 98.350
=>Best accuracy 80.640


VGG(
  (features): Sequential(
    (conv0): Conv2d(3, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm0): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (conv1): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu1): ReLU(inplace=True)
    (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (conv3): Conv2d(48, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu3): ReLU(inplace=True)
    (conv4): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (norm4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu4): ReLU(inplace=True)
    (pool5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilatio

# **Section 4: Analysis & Conclusion**


Here are some questions for you to consider while working on this project:
1. **What is the relation between compression rate, reduced complexity/parameters, and accuracy?**
   - How does increasing the compression rate affect the model's accuracy?
   - Can you explain the trade-off between model complexity (number of parameters) and accuracy?

2. **How do you decide the importance of filters?**
   - What criteria can be used to determine the importance of filters in a convolutional neural network?
   - How do similarity-based filter pruning techniques identify redundant or less important filters?
   - Can you explain the concept of filter importance in the context of model efficiency and effectiveness?

3. **What are the implications of pruning on model performance and inference speed?**
   - How does pruning affect the inference speed of a model?
   - Can you discuss the impact of pruning on model accuracy during inference?
   - What strategies can be employed to mitigate any potential loss of accuracy after pruning?

4. **How does fine-tuning help improve the performance of pruned models?**
   - What is the purpose of fine-tuning a pruned model?
   - How does fine-tuning help the model adapt to the changes introduced by pruning?
   - Can you explain any challenges or considerations when fine-tuning pruned models?

5. **What are some alternative pruning techniques, and how do they compare to similarity-based pruning?**
   - Can you describe magnitude-based pruning and its advantages/disadvantages compared to similarity-based pruning?
   - What is sensitivity-based pruning, and how does it differ from similarity-based pruning?
   - Are there any hybrid approaches that combine multiple pruning techniques for better results?