#Intro

This notebook is based on 'ImageNet training in PyTorch' [example](https://github.com/pytorch/examples/blob/master/imagenet/main.py).

The goal of this notebook is not to reach the best possible baselines for the implemented compression algorithms, but to demonstrate simple use cases of [NNCF](https://github.com/openvinotoolkit/nncf) with PyTorch. For more advanced usage, refer to these [examples](https://github.com/openvinotoolkit/nncf/tree/develop/examples).

To keep the demonstration simple and fast, we use the ResNet-18 model with the CIFAR-10 dataset. Both the model and the dataset can be changed.

Demonstrated algorithms:

- [Quantization](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md)

TODO:
- [Filter pruning](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Pruning.md)

- [Sparsity](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Sparsity.md)

#Setup

In [None]:
!pip install nncf==1.6.0

In [None]:
import os
import random
import shutil
import time
import warnings
import json

import torch
import nncf  # Important - should be imported directly after torch
from nncf import create_compressed_model, NNCFConfig, register_default_init_args

import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.distributed as dist
import torch.optim
import torch.multiprocessing as mp
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
import torchvision.datasets

Connect to Google Drive to save the trained model and to access the pretrained checkpoint (optional)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# directory for saved model checkpoints; any writable path works,
# but it must end with '/' because file names are appended to it directly
# PATH = ''
PATH = '/content/drive/MyDrive/Colab Notebooks/nncf/'

#Main function

In [None]:
def main(params):

    arch = params['arch']
    num_classes = params['num_classes']
    input_size = params['input_size']

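    # NOTE: this chained assignment forces 'SGD' and overwrites params['optimizer_type'],
    # so an 'Adam' setting passed in by the caller is ignored.
    optimizer_type = params['optimizer_type'] = 'SGD'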
    init_lr = params['init_lr']
    adjustable_lr = params['adjustable_lr']
    momentum = params['momentum']
    weight_decay = params['weight_decay']
    batch_size = params['batch_size']

    workers = params['workers']
    pretrained = params['pretrained']
    resume = params['resume']
    checkpoint_compressed = params['checkpoint_compressed']
    evaluate = params['evaluate']
    start_epoch = params['start_epoch']
    epochs = params['epochs']

    use_nncf = params['use_nncf']
    if use_nncf:
        nncf_config_file = params['nncf_config_file']
        epochs_tune = params['epochs_tune']

    best_acc1 = 0

    # create model
    if pretrained:
        print("=> using pre-trained model '{}'".format(arch))
        model = models.__dict__[arch](pretrained=True)
    else:
        print("=> creating model '{}'".format(arch))
        model = models.__dict__[arch]()
    # replace the last FC layer so that its output size matches the number of classes in the dataset
    model.fc = nn.Linear(in_features=model.fc.in_features, out_features=num_classes, bias=True)

    # define loss function (criterion)
    criterion = nn.CrossEntropyLoss()

    if not torch.cuda.is_available():
        print('using CPU, this will be slow')
    else:
        print('using GPU')
        model.cuda()
        model.fc = model.fc.cuda()
        criterion.cuda()

    # define optimizer
    if optimizer_type == 'SGD':
        optimizer = torch.optim.SGD(model.parameters(), lr=init_lr,
                              momentum=momentum, weight_decay=weight_decay)
    elif optimizer_type == 'Adam':
        optimizer = torch.optim.Adam(model.parameters(), lr=init_lr)
    else:
        print('Support only SGD and Adam optimizers')
        return
    if adjustable_lr:
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

    # optionally: resume from a checkpoint
    if resume:
        if os.path.isfile(resume):
            print("=> loading checkpoint '{}'".format(resume))
            #
            # ** WARNING: torch.load functionality uses Python's pickling facilities that
            # may be used to perform arbitrary code execution during unpickling. Only load the data you
            # trust.
            #
            checkpoint = torch.load(resume)

            start_epoch = checkpoint['epoch']
            print('resumed start_epoch', start_epoch)
            best_acc1 = checkpoint['best_acc1']

            model.load_state_dict(checkpoint['state_dict'])
            optimizer.load_state_dict(checkpoint['optimizer'])
            print("=> loaded checkpoint '{}' (epoch {}, best_acc1 {:6.2f})"
                  .format(resume, checkpoint['epoch'], checkpoint['best_acc1']))
        else:
            print("=> no checkpoint found at '{}'".format(resume))

    # Data
    print('==> Preparing data..')
    transform_train = transforms.Compose([
        transforms.Resize(256),
        transforms.RandomCrop(input_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])

    transform_test = transforms.Compose([
        transforms.Resize(input_size),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])

    trainset = torchvision.datasets.CIFAR10(
        root='./data', train=True, download=True, transform=transform_train)
    train_loader = torch.utils.data.DataLoader(
        trainset, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)

    testset = torchvision.datasets.CIFAR10(
        root='./data', train=False, download=True, transform=transform_test)
    val_loader = torch.utils.data.DataLoader(
        testset, batch_size=100, shuffle=False, num_workers=2, pin_memory=True)

    # classes = ('plane', 'car', 'bird', 'cat', 'deer',
    #           'dog', 'frog', 'horse', 'ship', 'truck')

    if use_nncf:
        # Load a configuration file to specify compression
        nncf_config = NNCFConfig.from_json(nncf_config_file)
        # Provide data loaders for compression algorithm initialization, if necessary
        nncf_config = register_default_init_args(nncf_config, train_loader, criterion)
        # Apply the specified compression algorithms to the model
        print('=> compressing the model with {}'.format(nncf_config_file))
        compression_ctrl, model = create_compressed_model(model, nncf_config)

        epochs = start_epoch + epochs_tune

    if evaluate:
        validate(val_loader, model, criterion)
        return

    for epoch in range(start_epoch, epochs):
        adjust_learning_rate(optimizer, epoch, init_lr)

        if use_nncf:
            # update compression scheduler state at the beginning of the epoch
            compression_ctrl.scheduler.epoch_step()
            # tune for one epoch with nncf
            train(train_loader, model, criterion, optimizer, epoch, use_nncf, compression_ctrl=compression_ctrl)
        else:
            # train for one epoch without nncf
            train(train_loader, model, criterion, optimizer, epoch, use_nncf, compression_ctrl=None)

        # evaluate on validation set
        acc1 = validate(val_loader, model, criterion)

        if adjustable_lr:
            scheduler.step()

        # remember best acc@1 and save checkpoint
        is_best = acc1 > best_acc1
        best_acc1 = max(acc1, best_acc1)

        if not use_nncf:
            save_checkpoint({
                'epoch': epoch + 1,
                'arch': arch,
                'state_dict': model.state_dict(),
                'best_acc1': best_acc1,
                'optimizer' : optimizer.state_dict(),
            }, is_best, batch_size, input_size)
    
    # Export the compressed model to ONNX format that is supported by the OpenVINO™ toolkit
    if use_nncf:
        compression_ctrl.export_model(PATH + "best_model_compressed.onnx")
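
To see how many epochs the loop in `main` runs when tuning with NNCF, the small sketch below reproduces only its epoch arithmetic (`epochs = start_epoch + epochs_tune`). The helper name `epochs_to_run` is hypothetical and exists only for illustration. The values 49 and 1 are taken from the resumed checkpoint in the run log and from the `"epochs": 1` setting in the quantization config.

```python
def epochs_to_run(start_epoch, epochs, use_nncf, epochs_tune=0):
    """Return the epoch indices the training loop would iterate over.

    Mirrors the arithmetic in main(): with NNCF enabled, the end epoch is
    start_epoch + epochs_tune; otherwise it is the configured `epochs`.
    """
    if use_nncf:
        epochs = start_epoch + epochs_tune
    return list(range(start_epoch, epochs))


# Resumed from a checkpoint saved at epoch 49, tuning for 1 epoch with NNCF.
tuning = epochs_to_run(start_epoch=49, epochs=100, use_nncf=True, epochs_tune=1)
print(tuning)

# Full-precision training from scratch for 100 epochs.
training = epochs_to_run(start_epoch=0, epochs=100, use_nncf=False)
print(len(training))
```

In the tuning case the configured `epochs` value is not used at all, which is why the number of tuning epochs is controlled from the NNCF config file.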

#Train function

In [None]:
def train(train_loader, model, criterion, optimizer, epoch, use_nncf, compression_ctrl):
    batch_time = AverageMeter('Time', ':6.3f')
    data_time = AverageMeter('Data', ':6.3f')
    losses = AverageMeter('Loss', ':.4e')
    top1 = AverageMeter('Acc@1', ':6.2f')
    top5 = AverageMeter('Acc@5', ':6.2f')
    progress = ProgressMeter(
        len(train_loader),
        [batch_time, data_time, losses, top1, top5],
        prefix="Epoch: [{}]".format(epoch))

    # switch to train mode
    model.train()

    end = time.time()
    for i, (images, target) in enumerate(train_loader):
        # measure data loading time
        data_time.update(time.time() - end)

        if use_nncf:
            compression_ctrl.scheduler.step()

        if torch.cuda.is_available():
            images = images.cuda()
            target = target.cuda()

        # compute output
        output = model(images)
        loss = criterion(output, target)

        if use_nncf:
            compression_loss = compression_ctrl.loss()
            loss += compression_loss

        # measure accuracy and record loss
        acc1, acc5 = accuracy(output, target, topk=(1, 5))
        losses.update(loss.item(), images.size(0))
        top1.update(acc1[0], images.size(0))
        top5.update(acc5[0], images.size(0))

        # compute gradient and do opt step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # measure elapsed time
        batch_time.update(time.time() - end)
        end = time.time()

        print_frequency = 10
        if i % print_frequency == 0:
            progress.display(i)

#Validate function

In [None]:
def validate(val_loader, model, criterion):
    batch_time = AverageMeter('Time', ':6.3f')
    losses = AverageMeter('Loss', ':.4e')
    top1 = AverageMeter('Acc@1', ':6.2f')
    top5 = AverageMeter('Acc@5', ':6.2f')
    progress = ProgressMeter(
        len(val_loader),
        [batch_time, losses, top1, top5],
        prefix='Test: ')

    # switch to evaluate mode
    model.eval()

    with torch.no_grad():
        end = time.time()
        for i, (images, target) in enumerate(val_loader):
            if torch.cuda.is_available():
                images = images.cuda()
                target = target.cuda()

            # compute output
            output = model(images)
            loss = criterion(output, target)

            # measure accuracy and record loss
            acc1, acc5 = accuracy(output, target, topk=(1, 5))
            losses.update(loss.item(), images.size(0))
            top1.update(acc1[0], images.size(0))
            top5.update(acc5[0], images.size(0))

            # measure elapsed time
            batch_time.update(time.time() - end)
            end = time.time()

            print_frequency = 10
            if i % print_frequency == 0:
                progress.display(i)

        # TODO: this should also be done with the ProgressMeter
        print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'
              .format(top1=top1, top5=top5))

    return top1.avg

#Helpers

In [None]:
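def save_checkpoint(state, is_best, batch_size, input_size, filename='checkpoint.pth.tar'):
    torch.save(state, os.path.join(PATH, filename))
    if is_best:
        shutil.copyfile(os.path.join(PATH, filename), os.path.join(PATH, 'model_best.pth.tar'))
        # Save in ONNX format. The live model is not visible in this scope,
        # so rebuild it on the CPU from the checkpoint contents.
        fc_weight = state['state_dict']['fc.weight']
        model = models.__dict__[state['arch']]()
        model.fc = nn.Linear(fc_weight.shape[1], fc_weight.shape[0])
        model.load_state_dict(state['state_dict'])
        model.eval()
        x = torch.randn(batch_size, 3, input_size, input_size)
        torch.onnx.export(model, x, os.path.join(PATH, "model_best.onnx"))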

class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self, name, fmt=':f'):
        self.name = name
        self.fmt = fmt
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

    def __str__(self):
        fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
        return fmtstr.format(**self.__dict__)


class ProgressMeter(object):
    def __init__(self, num_batches, meters, prefix=""):
        self.batch_fmtstr = self._get_batch_fmtstr(num_batches)
        self.meters = meters
        self.prefix = prefix

    def display(self, batch):
        entries = [self.prefix + self.batch_fmtstr.format(batch)]
        entries += [str(meter) for meter in self.meters]
        print('\t'.join(entries))

    def _get_batch_fmtstr(self, num_batches):
        num_digits = len(str(num_batches // 1))
        fmt = '{:' + str(num_digits) + 'd}'
        return '[' + fmt + '/' + fmt.format(num_batches) + ']'
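
As a quick illustration of how `AverageMeter` weights each update by the batch size, here is a minimal stand-alone sketch. `TinyMeter` is a hypothetical, stripped-down copy of the class (string formatting omitted), and the batch values are made up for the example.

```python
class TinyMeter:
    """Hypothetical stripped-down AverageMeter: tracks a weighted running average."""

    def __init__(self):
        self.val = 0.0
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val, n=1):
        # `val` is the per-batch mean, `n` is the number of samples in the batch.
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


meter = TinyMeter()
meter.update(90.0, n=128)  # a full batch with 90% accuracy
meter.update(50.0, n=16)   # a smaller final batch with 50% accuracy
print(meter.val, meter.count, round(meter.avg, 2))
```

Because each value is multiplied by `n`, the small final batch pulls the average down far less than a plain mean of the two batch values would.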

In [None]:
def adjust_learning_rate(optimizer, epoch, init_lr):
    """Sets the learning rate to the initial LR decayed by 10 every 30 epochs"""
    lr = init_lr * (0.1 ** (epoch // 30))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr


def accuracy(output, target, topk=(1,)):
    """Computes the accuracy over the k top predictions for the specified values of k"""
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)

        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))

        res = []
        for k in topk:
            correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
            res.append(correct_k.mul_(100.0 / batch_size))
        return res
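
The step decay used by `adjust_learning_rate` can be checked by hand. The sketch below evaluates the same formula, `init_lr * (0.1 ** (epoch // 30))`, for a few epochs without touching an optimizer. The function name `step_decay_lr` and its `drop` and `every` parameters are assumptions introduced for this illustration only.

```python
def step_decay_lr(init_lr, epoch, drop=0.1, every=30):
    """Learning rate after step decay: init_lr * drop ** (epoch // every)."""
    return init_lr * (drop ** (epoch // every))


# The rate stays constant within each 30-epoch window and drops 10x at its end.
for epoch in (0, 29, 30, 59, 60, 90):
    print(epoch, step_decay_lr(0.1, epoch))
```

Note that `main` calls `adjust_learning_rate` at the start of every epoch, so the value computed this way is what the optimizer uses during that epoch's training pass.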

#Main

Pipeline:

- Train without NNCF, save best checkpoint in float precision

- Load the best float checkpoint and compress it with the selected algorithm (enabling NNCF converts the model to the NNCFNetwork format). TODO: support loading the compressed model for further tuning

- Tune the compressed model (train with NNCF enabled; the tuning parameters are defined in the corresponding configuration files)

- Save compressed tuned model (or convert to ONNX)
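
The pipeline steps can be expressed as two parameter sets that differ only in `use_nncf` and `resume`. The sketch below is one possible way to organize this; `stage_params` and the stage names `'train'` and `'tune'` are hypothetical and not part of the notebook, and only a few of the keys expected by `main` are shown.

```python
def stage_params(base, stage, checkpoint_path):
    """Return a copy of `base` adjusted for one stage of the pipeline.

    stage == 'train': full-precision training, NNCF disabled, nothing to resume.
    stage == 'tune':  resume the best float checkpoint and tune with NNCF.
    """
    params = dict(base)  # shallow copy so the shared settings stay untouched
    if stage == 'train':
        params.update(use_nncf=False, resume=None)
    elif stage == 'tune':
        params.update(use_nncf=True, resume=checkpoint_path)
    else:
        raise ValueError("unknown stage: {}".format(stage))
    return params


base = {'arch': 'resnet18', 'num_classes': 10, 'input_size': 224}
train_params = stage_params(base, 'train', 'model_best.pth.tar')
tune_params = stage_params(base, 'tune', 'model_best.pth.tar')
print(train_params['use_nncf'], train_params['resume'])
print(tune_params['use_nncf'], tune_params['resume'])
```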

## Configuration files for compression algorithms

In [None]:
def create_json_files(batch_size, input_size):
    """
    Define configurations for the compression algorithms.
    Create the JSON files.
    Return the configurations as dictionary objects.
    """

    config_dir = 'config_files'
    if not os.path.exists(config_dir):
        os.makedirs(config_dir)

    def write_json(json_obj, json_name):
        with open(os.path.join(config_dir, json_name), 'w') as jsonFile:
            json.dump(json_obj, jsonFile)

    # Define config objects below
    configs = {}

    # Quantization int8
    # https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md
    configs['quantization.json'] = {

            "input_info": {
              "sample_size": [batch_size, 3, input_size, input_size]
            },

            "epochs": 1, # number of epochs to tune

            "optimizer": { # optimizer used during tuning
                "type": "Adam",
                "base_lr": 1e-5
            },

            "compression": {
                    "algorithm": "quantization", # specify the algorithm here
            }
    }

    # create json files, that will be used by nncf later
    for config_key, config_val in configs.items():
        write_json(config_val, config_key)
    
    return configs
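
The files written by `create_json_files` are plain JSON, so they can be inspected or checked with the standard library before being handed to NNCF. The sketch below writes a dictionary with the same keys as the quantization config to a temporary directory and reads it back. The temporary directory and the `indent` argument are choices made for this illustration only.

```python
import json
import os
import tempfile

# Same structure as configs['quantization.json'], with batch_size=128 and input_size=224.
config = {
    "input_info": {"sample_size": [128, 3, 224, 224]},
    "epochs": 1,
    "optimizer": {"type": "Adam", "base_lr": 1e-5},
    "compression": {"algorithm": "quantization"},
}

with tempfile.TemporaryDirectory() as config_dir:
    config_path = os.path.join(config_dir, "quantization.json")
    with open(config_path, "w") as json_file:
        json.dump(config, json_file, indent=2)  # indent only for readability
    with open(config_path) as json_file:
        loaded = json.load(json_file)

print(loaded["compression"]["algorithm"], loaded["optimizer"]["base_lr"])
```

The comments placed inside the dictionary literal in `create_json_files` are Python comments, so they never reach the JSON file.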

## Train/tune

In [None]:
params = {}

params['arch'] = 'resnet18'
params['num_classes'] = 10 # for CIFAR-10
params['input_size'] = 224

params['optimizer_type'] = 'SGD' # "Adam"
params['init_lr'] = 0.1
params['momentum'] = 0.9
params['weight_decay'] = 5e-4
params['adjustable_lr'] = True
params['batch_size'] = 128
params['workers'] = 4
params['start_epoch'] = 0 # updated automatically if training is resumed
params['epochs'] = 100 # for full-precision training; replaced by start_epoch + epochs_tune when tuning with NNCF
params['pretrained'] = True # start from ImageNet-pretrained weights

params['resume'] = PATH + 'model_best.pth.tar'  # path to latest checkpoint (or None)
params['checkpoint_compressed'] = False

params['evaluate'] = False # test on the validation set and exit

params['use_nncf'] = True # enable model compression and tuning

params['nncf_config_file'] = None
if params['use_nncf']:
    # create all config files
    configs = create_json_files(params['batch_size'], params['input_size'])

    # choose config file
    algorithm_config = 'quantization.json'
    params['nncf_config_file'] = 'config_files/' + algorithm_config

    # update tune params to fit certain compression algorithm
    params['optimizer_type'] = configs[algorithm_config]['optimizer']['type']
    params['init_lr'] = configs[algorithm_config]['optimizer']['base_lr']
    params['adjustable_lr'] = False
    # params['epochs'] = params['epochs'] + configs[algorithm_config]['epochs']
    params['epochs_tune'] = configs[algorithm_config]['epochs']


# Run the selected algorithm once
main(params)

# Iterate over algorithms
# TODO: add more algorithms
algorithm_configs = ['quantization.json']
for algorithm_config in algorithm_configs:
    # Run the tuning procedure:
    # print(algorithm_config)
    # params['nncf_config_file'] = 'config_files/' + algorithm_config
    # update tune params
    # main(params)
    pass

=> using pre-trained model 'resnet18'
using GPU
=> loading checkpoint '/content/drive/MyDrive/Colab Notebooks/nncf/model_best.pth.tar'
resumed start_epoch 49
=> loaded checkpoint '/content/drive/MyDrive/Colab Notebooks/nncf/model_best.pth.tar' (epoch 49, best_acc1  91.33)
==> Preparing data..
Files already downloaded and verified
Files already downloaded and verified
=> compressing the model with config_files/quantization.json
INFO:nncf:Wrapping module ResNet/Conv2d[conv1] by ResNet/NNCFConv2d[conv1]
INFO:nncf:Wrapping module ResNet/Sequential[layer1]/BasicBlock[0]/Conv2d[conv1] by ResNet/Sequential[layer1]/BasicBlock[0]/NNCFConv2d[conv1]
INFO:nncf:Wrapping module ResNet/Sequential[layer1]/BasicBlock[0]/Conv2d[conv2] by ResNet/Sequential[layer1]/BasicBlock[0]/NNCFConv2d[conv2]
INFO:nncf:Wrapping module ResNet/Sequential[layer1]/BasicBlock[1]/Conv2d[conv1] by ResNet/Sequential[layer1]/BasicBlock[1]/NNCFConv2d[conv1]
INFO:nncf:Wrapping module ResNet/Sequential[layer1]/BasicBlock[1]/Conv2

  "metatype - will be ignored".format(hw_config_op_name))
  "metatype - will be ignored".format(hw_config_op_name))


INFO:nncf:Algorithm initialization ████████          | 1 / 2
INFO:nncf:Algorithm initialization ████████████████  | 2 / 2
INFO:nncf:Set sign: False and scale: [8.2491, ] for ResNet/Sequential[layer4]/BasicBlock[1]/ReLU[relu]/RELU_1
INFO:nncf:Adding unsigned Activation Quantize in scope: ResNet/Sequential[layer4]/BasicBlock[1]/ReLU[relu]/RELU_1
INFO:nncf:Set sign: False and scale: [3.4222, ] for ResNet/AdaptiveAvgPool2d[avgpool]/adaptive_avg_pool2d_0
INFO:nncf:Adding unsigned Activation Quantize in scope: ResNet/AdaptiveAvgPool2d[avgpool]/adaptive_avg_pool2d_0
INFO:nncf:Set sign: True and scale: [2.7537, ] for /nncf_model_input_0
INFO:nncf:Adding signed Activation Quantize in scope: /nncf_model_input_0
INFO:nncf:Set sign: False and scale: [0.9314, ] for ResNet/Sequential[layer1]/BasicBlock[0]/ReLU[relu]/RELU_0
INFO:nncf:Adding unsigned Activation Quantize in scope: ResNet/Sequential[layer1]/BasicBlock[0]/ReLU[relu]/RELU_0
INFO:nncf:Set sign: True and scale: [0.8328, ] for ResNet/Sequent

  self.shape = tuple(int(dim) for dim in shape)  # Handle cases when shape is a tuple of Tensors
  if not self.is_enabled_quantization():
  return self._num_bits.item()
  return self.signed_tensor.item() == 1


#Accuracy in PyTorch

### Full-precision model:

Acc@1 91.330

### Compressed models:

- int8 quantization: Acc@1 91.480


#Export to OpenVINO™ Intermediate Representation (IR)
To export an ONNX model representation to the OpenVINO IR representation and run it using the Intel® Deep Learning Deployment Toolkit, refer to this [tutorial](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html).

#Accuracy check using OpenVINO™ Deep Learning accuracy validation framework (Accuracy Checker)

The model was converted to Intermediate Representation (IR) before validation. For more details, refer to this [tutorial](https://github.com/openvinotoolkit/open_model_zoo/tree/master/tools/accuracy_checker).

### Full-precision IR model:

Acc@1 91.32

### Compressed IR models:

- int8 quantization: Acc@1 91.49
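
The reported accuracies can be compared directly. The sketch below only does arithmetic on the four Acc@1 values listed in this notebook; the dictionary layout and the key names `float` and `int8` are chosen for this illustration.

```python
# Top-1 accuracies (in %) reported in this notebook.
pytorch_acc = {"float": 91.33, "int8": 91.48}
openvino_ir_acc = {"float": 91.32, "int8": 91.49}

# Effect of int8 quantization within each framework.
for name, acc in (("PyTorch", pytorch_acc), ("OpenVINO IR", openvino_ir_acc)):
    delta = acc["int8"] - acc["float"]
    print("{}: int8 - float = {:+.2f} percentage points".format(name, delta))

# Largest change introduced by converting the same model to IR.
conversion_gap = max(abs(pytorch_acc[k] - openvino_ir_acc[k]) for k in pytorch_acc)
print("max PyTorch vs IR gap: {:.2f} percentage points".format(conversion_gap))
```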



#Measurement of the performance with OpenVINO™ Benchmark Python* Tool

To compare the performance of the full-precision and compressed models, refer to this [tutorial](https://github.com/openvinotoolkit/openvino/tree/master/inference-engine/tools/benchmark_tool).

### Set up:

- Hardware: Intel(R) Core(TM) i9-10920X CPU @ 3.50GHz

- Model: ResNet-18

- Input shape: [1, 3, 224, 224]

- Batch size: 1

- Mode: asynchronous

### Results

Compressing the model yields a more than 3x gain in performance:

#### Full-precision ONNX model:
- Count:      28902 iterations
- Duration:   60015.29 ms
- Latency:    12.43 ms
- Throughput: 481.58 FPS

#### Compressed ONNX model:

##### int8 quantization:

- Count:      94014 iterations
- Duration:   60005.72 ms
- Latency:    3.81 ms
- Throughput: 1566.75 FPS
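
The speedup can be recomputed from the reported numbers. The sketch below assumes that each iteration processes one image (batch size 1, as listed in the setup), so throughput in FPS is the iteration count divided by the duration in seconds. The function name `throughput_fps` is hypothetical.

```python
# Benchmark numbers reported above (count in iterations, times in ms).
full_precision = {"count": 28902, "duration_ms": 60015.29, "latency_ms": 12.43}
int8 = {"count": 94014, "duration_ms": 60005.72, "latency_ms": 3.81}


def throughput_fps(result):
    """Iterations per second: count divided by the duration in seconds."""
    return result["count"] / (result["duration_ms"] / 1000.0)


fp_fps = throughput_fps(full_precision)
int8_fps = throughput_fps(int8)
print("throughput: {:.2f} -> {:.2f} FPS".format(fp_fps, int8_fps))
print("throughput speedup: {:.2f}x".format(int8_fps / fp_fps))
print("latency ratio: {:.2f}x".format(full_precision["latency_ms"] / int8["latency_ms"]))
```

Both ratios come out slightly above 3.2, which matches the "more than 3x" summary.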
