<a href="https://colab.research.google.com/github/lrakotoarivony/Micronet_Challenge/blob/main/Project_Model_Cifar10.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Notebook réalisé par Lucas Rakotoarivony & Jérémie Sicard

Ce Notebook présente les différents résultats et travaux que nous avons effectués dans le cadre du Micronet Challenge.  
Nous avons choisi de travailler avec l'architecture Densenet.  
Il est important de préciser que l'objectif de ce projet n'est pas d'obtenir l'accuracy la plus importante mais le score Micronet le plus faible tout en ayant un modèle avec une accuracy supérieure à 90%. Pour rappel le score de Micronet se base sur deux facteurs, le nombre de paramètres et le nombre de flops (floating points operations).

# Data & Imports

In [7]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data.dataloader import DataLoader
from torch.optim.lr_scheduler import MultiStepLR
import torch.nn.utils.prune as prune

import matplotlib.pyplot as plt
import numpy as np

from util import *
from Densenet import *
from IPython.display import clear_output

In [2]:
n_classes_cifar10 = 10
train_size = 0.8
R = 5


# Download the entire CIFAR10 dataset

from torchvision.datasets import CIFAR10
import numpy as np 
from torch.utils.data import Subset
from torch.utils.data.sampler import SubsetRandomSampler


import torchvision.transforms as transforms

## Normalization is different when training from scratch and when training using an imagenet pretrained backbone

normalize_scratch = transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))


# Data augmentation is needed in order to train from scratch
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize_scratch,
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    normalize_scratch,
])



### The data from CIFAR10 will be downloaded in the following dataset
rootdir = './data/cifar10'

c10train = CIFAR10(rootdir,train=True,download=True,transform=transform_train)
c10test = CIFAR10(rootdir,train=False,download=True,transform=transform_test)



# CIFAR10 is sufficiently large so that training a model up to the state of the art performance will take approximately 3 hours on the 1060 GPU available on your machine. 


def train_validation_split(train_size, num_train_examples):
    # obtain training indices that will be used for validation
    indices = list(range(num_train_examples))
    np.random.shuffle(indices)
    idx_split = int(np.floor(train_size * num_train_examples))
    train_index, valid_index = indices[:idx_split], indices[idx_split:]

    # define samplers for obtaining training and validation batches
    train_sampler = SubsetRandomSampler(train_index)
    valid_sampler = SubsetRandomSampler(valid_index)

    return train_sampler,valid_sampler

def generate_subset(dataset,n_classes,reducefactor,n_ex_class_init):

    nb_examples_per_class = int(np.floor(n_ex_class_init / reducefactor))
    # Generate the indices. They are the same for each class, could easily be modified to have different ones. But be careful to keep the random seed! 

    indices_split = np.random.RandomState(seed=42).choice(n_ex_class_init,nb_examples_per_class,replace=False)

    all_indices = []
    for curclas in range(n_classes):
        curtargets = np.where(np.array(dataset.targets) == curclas)
        indices_curclas = curtargets[0]
        indices_subset = indices_curclas[indices_split]
        #print(len(indices_subset))
        all_indices.append(indices_subset)
    all_indices = np.hstack(all_indices)
    
    return Subset(dataset,indices=all_indices)
    


### These dataloader are ready to be used to train for scratch 
cifar10_train= generate_subset(dataset=c10train,n_classes=n_classes_cifar10,reducefactor=R,n_ex_class_init=5000)
num_train_examples=len(cifar10_train)
train_sampler,valid_sampler=train_validation_split(train_size, num_train_examples)

cifar10_test = generate_subset(dataset=c10test,n_classes=n_classes_cifar10,reducefactor=1,n_ex_class_init=1000) 



Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar10/cifar-10-python.tar.gz


0it [00:00, ?it/s]

Extracting ./data/cifar10/cifar-10-python.tar.gz to ./data/cifar10
Files already downloaded and verified


In [3]:
trainloader = DataLoader(c10train,batch_size=64,sampler=train_sampler)
validloader = DataLoader(c10train,batch_size=64,sampler=valid_sampler)
testloader = DataLoader(c10test,batch_size=64) 

# Device & Criterion

In [4]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device '+str(device))
criterion = nn.CrossEntropyLoss()

Using device cuda


# Model

In [8]:
model = densenet_cifar()
model.to(device=device)
clear_output(wait=True)

In [9]:
pytorch_total_params = sum(p.numel() for p in model.parameters())
print(f'Nombre de paramètre de ce modèle : {pytorch_total_params}')

Nombre de paramètre de ce modèle : 331226


Voici les paramètres que nous avons utilisé pour entrainer ce modèle from scratch. (très long à exécuter sans GPU)

In [7]:
optimizer = torch.optim.SGD(model.parameters(),lr=0.1, momentum=0.9,weight_decay=1e-4) 
scheduler = MultiStepLR(optimizer, milestones=[90, 110], gamma=0.1)

In [None]:
train_losses, valid_losses, train_acc, valid_acc = training(trainloader, validloader, model, criterion, optimizer,120,scheduler,'models\\Densenet_from_scratch')

In [None]:
plt.figure(figsize=(10,10))

plt.subplot(3,1,1)
plt.plot(range(n_epochs), train_losses)
plt.plot(range(n_epochs), valid_losses)

plt.legend(['train', 'validation'], prop={'size': 10})
plt.title('loss function', size=10)
plt.xlabel('epoch', size=10)
plt.ylabel('loss value', size=10)

plt.subplot(3,1,3)
plt.plot(range(n_epochs), train_acc)
plt.plot(range(n_epochs), valid_acc)

plt.legend(['train', 'validation'], prop={'size': 10})
plt.title('accuracy', size=10)
plt.xlabel('epoch', size=10)
plt.ylabel('acc value', size=10)
plt.savefig("Densenet_training_scratch.png")

Si vous désirez utiliser un modèle déjà entrainé.

In [15]:
if torch.cuda.is_available():
    loaded_cpt=torch.load('models\\Densenet_cifar_trained.pt')
else:
    loaded_cpt=torch.load('models\\Densenet_cifar_trained.pt', map_location=torch.device('cpu'))
model.load_state_dict(loaded_cpt)

<All keys matched successfully>

In [16]:
evaluation(model, testloader, criterion)

test Loss: 0.337965

test accuracy of plane: 92% (921/1000)
test accuracy of car: 96% (965/1000)
test accuracy of bird: 90% (901/1000)
test accuracy of cat: 83% (833/1000)
test accuracy of deer: 94% (945/1000)
test accuracy of dog: 86% (869/1000)
test accuracy of frog: 92% (926/1000)
test accuracy of horse: 94% (941/1000)
test accuracy of ship: 95% (956/1000)
test accuracy of truck: 94% (948/1000)

test accuracy (overall): 92.05% (9205/10000)


In [17]:
flops , params = score(model)
print("Score flops: {} Score Params: {}".format(flops,params))
print("Final score: {}".format(flops + params))

Score flops: 0.06795413525587332 Score Params: 0.02964266390023521
Final score: 0.09759679915610853


# Pruning

Avant de réaliser notre pruning, il est nécessaire d'avoir un modèle entrainé (nous pouvons alors d'utiliser le notre)

In [18]:
model_pruned = densenet_cifar()
model_pruned.to(device=device)
clear_output(wait=True)

In [31]:
if torch.cuda.is_available():
    loaded_cpt=torch.load('models\\Densenet_cifar_trained.pt')
else:
    loaded_cpt=torch.load('models\\Densenet_cifar_trained.pt', map_location=torch.device('cpu'))
model_pruned.load_state_dict(loaded_cpt)

<All keys matched successfully>

In [32]:
evaluation(model_pruned, testloader, criterion)

test Loss: 0.337965

test accuracy of plane: 92% (921/1000)
test accuracy of car: 96% (965/1000)
test accuracy of bird: 90% (901/1000)
test accuracy of cat: 83% (833/1000)
test accuracy of deer: 94% (945/1000)
test accuracy of dog: 86% (869/1000)
test accuracy of frog: 92% (926/1000)
test accuracy of horse: 94% (941/1000)
test accuracy of ship: 95% (956/1000)
test accuracy of truck: 94% (948/1000)

test accuracy (overall): 92.05% (9205/10000)


Nous allons réaliser de l'unstructured pruning de façon itérative.

A chaque itération, nous allons pruner de façon globale 20 % des poids du modèle qui ont la plus faible norme L1.
Nous allons également réentrainer en utilisant la technique du learning weight rewinding qui diffère nottament d'une technique plus conventionnelle (celle du fine tuning)

Vous trouverez davantage d'informations sur le lien suivant :
https://iclr.cc/virtual_2020/poster_S1gSj0NKvB.html

Nous allons également réentrainer entre chaque itération de pruning en utilisant la technique du learning rate rewinding. 
En considérant le modèle actuel à un instant T, le réentrainer en utilisant la technique du learning rate rewinding consiste à utiliser le modèle pruné actuel (instant T), les poids actuels associés (instant T) mais en utilisant le learning rate des X epochs précédentes (instant T-X)

Dans notre cas d'étude, nous avons choisi de réentrainer le modèle après chaque phase de pruning de 60 epochs.
Ainsi nous utilisons :

Lr = 0.1 pour les epochs de [1 à 29] ;
Lr = 0.01 pour les epochs de [30 à 49] ;
Lr = 0.001 pour les epochs de [50 à 60]

Enfin, nous avons également choisi de faire 7 étapes de pruning afin d'obtenir un ratio de compression égal environ à 4.76.

In [14]:
optimizer = torch.optim.SGD(model.parameters(),lr=0.1, momentum=0.9,weight_decay=1e-4) 
from torch.optim.lr_scheduler import MultiStepLR
scheduler = MultiStepLR(optimizer, milestones=[30, 50], gamma=0.1)

In [15]:
parameters_to_prune=[]
for name, module in model_pruned.named_modules():
    if isinstance(module, torch.nn.Conv2d) or isinstance(module, torch.nn.Linear) :
        parameters_to_prune.append((module,'weight'))

In [16]:
steps_pruning = 7
for steps in range (steps_pruning):
    prune.global_unstructured(parameters_to_prune, pruning_method=prune.L1Unstructured, amount=0.2)
    evaluation(model_pruned, testloader, criterion)
    training_pruning(trainloader,model_pruned, criterion, optimizer,60,scheduler)

test Loss: 0.336704

test accuracy of plane: 92% (924/1000)
test accuracy of car: 96% (960/1000)
test accuracy of bird: 89% (897/1000)
test accuracy of cat: 81% (818/1000)
test accuracy of deer: 94% (943/1000)
test accuracy of dog: 87% (872/1000)
test accuracy of frog: 93% (930/1000)
test accuracy of horse: 94% (949/1000)
test accuracy of ship: 95% (955/1000)
test accuracy of truck: 94% (949/1000)

test accuracy (overall): 91.97% (9197/10000)
test Loss: 0.373786

test accuracy of plane: 91% (911/1000)
test accuracy of car: 96% (964/1000)
test accuracy of bird: 90% (903/1000)
test accuracy of cat: 82% (828/1000)
test accuracy of deer: 94% (941/1000)
test accuracy of dog: 88% (881/1000)
test accuracy of frog: 92% (920/1000)
test accuracy of horse: 89% (893/1000)
test accuracy of ship: 95% (955/1000)
test accuracy of truck: 93% (933/1000)

test accuracy (overall): 91.29% (9129/10000)
test Loss: 0.388742

test accuracy of plane: 93% (931/1000)
test accuracy of car: 96% (960/1000)
test accu

L'accuracy au bout de la 7ème étape de pruning est égale à :

In [19]:
evaluation(model_pruned, testloader, criterion)

test Loss: 0.656300

test accuracy of plane: 74% (743/1000)
test accuracy of car: 78% (786/1000)
test accuracy of bird: 79% (795/1000)
test accuracy of cat: 76% (761/1000)
test accuracy of deer: 88% (880/1000)
test accuracy of dog: 73% (731/1000)
test accuracy of frog: 87% (874/1000)
test accuracy of horse: 80% (804/1000)
test accuracy of ship: 90% (903/1000)
test accuracy of truck: 88% (886/1000)

test accuracy (overall): 81.63% (8163/10000)


Nous observons que le pruning est bien réalisé : 

In [20]:
get_sparsity(model_pruned)

Sparsity in conv1t: 25.93%
Sparsity in dense1.0.conv1t: 64.84%
Sparsity in dense1.0.conv2t: 79.17%
Sparsity in dense1.1.conv1t: 69.40%
Sparsity in dense1.1.conv2t: 71.74%
Sparsity in dense1.2.conv1t: 70.02%
Sparsity in dense1.2.conv2t: 74.00%
Sparsity in dense1.3.conv1t: 80.78%
Sparsity in dense1.3.conv2t: 77.13%
Sparsity in dense1.4.conv1t: 79.49%
Sparsity in dense1.4.conv2t: 76.09%
Sparsity in dense1.5.conv1t: 73.72%
Sparsity in dense1.5.conv2t: 73.39%
Sparsity in trans1.convt: 48.34%
Sparsity in dense2.0.conv1t: 69.82%
Sparsity in dense2.0.conv2t: 72.70%
Sparsity in dense2.1.conv1t: 76.41%
Sparsity in dense2.1.conv2t: 77.65%
Sparsity in dense2.2.conv1t: 75.13%
Sparsity in dense2.2.conv2t: 70.18%
Sparsity in dense2.3.conv1t: 73.66%
Sparsity in dense2.3.conv2t: 73.09%
Sparsity in dense2.4.conv1t: 72.71%
Sparsity in dense2.4.conv2t: 70.31%
Sparsity in dense2.5.conv1t: 73.74%
Sparsity in dense2.5.conv2t: 69.14%
Sparsity in dense2.6.conv1t: 70.12%
Sparsity in dense2.6.conv2t: 69.70%
Spar

'79.0286865234375'

Si vous voulez travailler avec notre modèle déjà pruné.

In [33]:
parameters_to_prune=[]
for name, module in model_pruned.named_modules():
    if isinstance(module, torch.nn.Conv2d) or isinstance(module, torch.nn.Linear) :
        parameters_to_prune.append((module,'weight'))

In [34]:
prune.global_unstructured(parameters_to_prune, pruning_method=prune.L1Unstructured, amount=0.2)

In [35]:
if torch.cuda.is_available():
    loaded_cpt=torch.load('models\\Densenet_cifar_pruned.pt')
else:
    loaded_cpt=torch.load('models\\Densenet_cifar_pruned.pt', map_location=torch.device('cpu'))
model_pruned.load_state_dict(loaded_cpt)

<All keys matched successfully>

Il est nécessaire d'exécuter un forward afin que le modèle soit pruné de manière correcte.

In [36]:
evaluation(model_pruned, testloader, criterion)

test Loss: 0.304136

test accuracy of plane: 93% (939/1000)
test accuracy of car: 97% (974/1000)
test accuracy of bird: 90% (903/1000)
test accuracy of cat: 84% (843/1000)
test accuracy of deer: 95% (950/1000)
test accuracy of dog: 88% (881/1000)
test accuracy of frog: 94% (944/1000)
test accuracy of horse: 93% (935/1000)
test accuracy of ship: 94% (949/1000)
test accuracy of truck: 94% (947/1000)

test accuracy (overall): 92.65% (9265/10000)


In [37]:
get_sparsity(model_pruned)

Sparsity in conv1t: 26.85%
Sparsity in dense1.0.conv1t: 74.80%
Sparsity in dense1.0.conv2t: 76.65%
Sparsity in dense1.1.conv1t: 71.61%
Sparsity in dense1.1.conv2t: 69.97%
Sparsity in dense1.2.conv1t: 76.37%
Sparsity in dense1.2.conv2t: 77.17%
Sparsity in dense1.3.conv1t: 83.52%
Sparsity in dense1.3.conv2t: 77.86%
Sparsity in dense1.4.conv1t: 78.78%
Sparsity in dense1.4.conv2t: 73.52%
Sparsity in dense1.5.conv1t: 77.12%
Sparsity in dense1.5.conv2t: 74.78%
Sparsity in trans1.convt: 56.93%
Sparsity in dense2.0.conv1t: 71.29%
Sparsity in dense2.0.conv2t: 72.83%
Sparsity in dense2.1.conv1t: 78.52%
Sparsity in dense2.1.conv2t: 74.78%
Sparsity in dense2.2.conv1t: 77.28%
Sparsity in dense2.2.conv2t: 70.01%
Sparsity in dense2.3.conv1t: 79.63%
Sparsity in dense2.3.conv2t: 71.66%
Sparsity in dense2.4.conv1t: 76.32%
Sparsity in dense2.4.conv2t: 71.74%
Sparsity in dense2.5.conv1t: 76.56%
Sparsity in dense2.5.conv2t: 70.36%
Sparsity in dense2.6.conv1t: 71.68%
Sparsity in dense2.6.conv2t: 70.27%
Spar

'79.0286865234375'

In [38]:
print(model_pruned.conv1.weight)

tensor([[[[-9.8413e-01, -1.4156e+00, -6.8998e-01],
          [ 2.9501e-01,  2.5201e-01, -2.1341e-01],
          [ 1.1402e+00,  1.6780e+00,  2.3051e-01]],

         [[ 2.8061e-01,  1.4054e-01,  2.8852e-01],
          [-1.0003e-01, -1.5904e-01,  0.0000e+00],
          [-3.6462e-01, -1.8441e-01, -1.7617e-01]],

         [[ 7.9544e-01,  8.7149e-01,  5.3565e-01],
          [-4.3006e-02, -2.3282e-01,  2.8181e-01],
          [-9.3424e-01, -1.1104e+00, -2.4479e-01]]],


        [[[ 3.4305e-01,  1.0020e+00,  5.2035e-01],
          [ 6.1827e-01,  1.2533e+00,  6.1121e-01],
          [ 4.5072e-01,  9.9235e-01, -0.0000e+00]],

         [[-3.3692e-01, -5.0674e-01, -2.5158e-01],
          [-4.8995e-01, -6.7966e-01, -4.1674e-01],
          [-2.7760e-01, -3.8856e-01, -2.1192e-01]],

         [[ 0.0000e+00, -4.3839e-01, -7.1969e-02],
          [-2.5402e-01, -6.4782e-01, -1.4823e-01],
          [-2.6778e-01, -5.2079e-01,  0.0000e+00]]],


        [[[ 0.0000e+00, -4.7254e-01, -0.0000e+00],
          [-3.8

On voit que notre accuracy a augmenté (environ 0.6%), de plus le nombre de paramètres réduit à zéro est assez important.  
De ce fait notre score Micronet associé aux nombres de paramètres a diminué de manière importante .

In [39]:
# En exécutant cette cellule, nous perdons le mask associé au pruning, mais cela nous permet de pouvoir calculer le score micronet
for name, module in model_pruned.named_modules():
# prune X % of connections in all 2D-conv layers
    if isinstance(module, torch.nn.Conv2d):
        prune.remove(module, 'weight')

    elif isinstance(module, torch.nn.Linear):
        prune.remove(module, 'weight')

In [40]:
flops , params = score(model_pruned)
print("Score flops: {} Score Params: {}".format(flops,params))
print("Final score: {}".format(flops + params))

Score flops: 0.06795413525587332 Score Params: 0.007276917533816564
Final score: 0.07523105278968989


# Quantization

Nous allons tout d'abord présenter deux méthodes de Quantization que nous avons utilisé (Binary Connect et BWN). Vous trouverez plus d'informations sur les liens suivants.  
Binary Connect : https://proceedings.neurips.cc/paper/2015/hash/3e15cc11f979ed25912dff5b0669f2cd-Abstract.html$  
BWN : https://link.springer.com/chapter/10.1007/978-3-319-46493-0_32  
Ces méthodes sont intéressantes cependant leur utilisation ne nous permettaient pas d'obtenir une accuracy supérieure à 90% donc nous nous sommes tournés vers une autre méthode.
Nous allons quand même montrer un exemple d'utilisation.

In [None]:
modelbc = BC(model)
modelbc.model = modelbc.model.to(device)

optimizer_bc = torch.optim.SGD(modelbc.model.parameters(),lr = 0.00001)

In [None]:
train_losses, valid_losses, train_acc, valid_acc = training_binary(100, trainloader, validloader, modelbc, criterion, optimizer_bc)

In [None]:
evaluation_binary(modelbc,testloader,criterion)

Nous allons maintenant nous intéresser à une autre méthode de Quantization, l'ApOT Quantization : https://iclr.cc/virtual_2020/poster_BkgXT24tDS.html  
Pour résumer, cette méthode nous permet de quantizer les valeurs de paramères sur un n bits (dans notre cas 4 bits).

In [21]:
model_quant = densenet_cifar_quant()
model_quant.to(device=device)
clear_output(wait=True)

On peut bien entendu l'entrainer from scratch

In [22]:
optimizer = torch.optim.SGD(model.parameters(),lr=0.1, momentum=0.9,weight_decay=1e-4) 
scheduler = MultiStepLR(optimizer, milestones=[90, 110], gamma=0.1)

In [None]:
train_losses, valid_losses, train_acc, valid_acc = training(trainloader, validloader, model_quant, criterion, optimizer,120,scheduler,'models\\densenet_quantized.pt')

Ou loader un modèle déjà existant

In [41]:
if torch.cuda.is_available():
    loaded_cpt=torch.load('models\\Densenet_cifar_quantized.pt')
else:
    loaded_cpt=torch.load('models\\Densenet_cifar_quantized.pt', map_location=torch.device('cpu'))
model_quant.load_state_dict(loaded_cpt)

<All keys matched successfully>

Une autre méthode consiste à quantizer un modèle déjà performant post-training. Pour cela on modifie le state_dict du modèle pour rajouter les paramètres dont on a besoin.

In [33]:
if torch.cuda.is_available():
    loaded_cpt=torch.load('models\\Densenet_cifar_trained.pt')
else:
    loaded_cpt=torch.load('models\\Densenet_cifar_trained.pt', map_location=torch.device('cpu'))
loaded_cpt_clone = loaded_cpt.copy()
for key in loaded_cpt.keys():
    if "conv" in key:
        if key.startswith("conv") == False:
            loaded_cpt_clone[key.replace("weight","act_alpha")] = torch.nn.Parameter(torch.tensor(8.0))
            loaded_cpt_clone[key.replace("weight","weight_quant.wgt_alpha")] = Parameter(torch.tensor(3.0))
model_quant.load_state_dict(loaded_cpt_clone)

<All keys matched successfully>

Un fine tuning est nécessaire mais cela est moins long.

In [34]:
optimizer = torch.optim.SGD(model.parameters(),lr=0.1, momentum=0.9,weight_decay=1e-4) 
scheduler = MultiStepLR(optimizer, milestones=[90, 110], gamma=0.1)

In [None]:
train_losses, valid_losses, train_acc, valid_acc = training(trainloader, validloader, model_quant, criterion, optimizer,120,scheduler,'densenet_quantized.pt')

In [38]:
bit = 4
for m in model_quant.modules():
    if isinstance(m, QuantConv2d):
        m.weight_quant = weight_quantize_fn(w_bit=bit)
        m.act_grid = build_power_value(bit)
        m.act_alq = act_quantization(bit, m.act_grid)

In [31]:
m = model_quant.dense1[0].conv1
print(m.weight_quant(m.weight))

tensor([[[[-0.3000]],

         [[-0.0000]],

         [[-0.9000]],

         [[ 0.3000]],

         [[-0.6000]],

         [[ 0.3000]],

         [[ 0.3000]],

         [[-0.3000]],

         [[-0.6000]],

         [[-1.2000]],

         [[-0.3000]],

         [[ 2.4000]],

         [[ 0.9000]],

         [[-0.3000]],

         [[-0.6000]],

         [[-0.3000]]],


        [[[-0.3000]],

         [[ 0.3000]],

         [[-0.6000]],

         [[-0.3000]],

         [[ 0.0000]],

         [[-2.4000]],

         [[ 3.0000]],

         [[ 0.9000]],

         [[-0.9000]],

         [[-0.9000]],

         [[ 0.9000]],

         [[ 0.9000]],

         [[-1.8000]],

         [[ 0.9000]],

         [[ 0.3000]],

         [[-1.2000]]],


        [[[ 0.6000]],

         [[-0.3000]],

         [[-0.3000]],

         [[-0.9000]],

         [[-0.3000]],

         [[ 2.4000]],

         [[-0.6000]],

         [[-0.3000]],

         [[ 0.6000]],

         [[ 0.3000]],

         [[-0.3000]],

       

On peut voir que les poids sont bien quantizés sur 4 bits.

In [None]:
evaluation(model_quant, testloader, criterion)

On voit que notre méthode nous donne de bons résultats (nous n'avons pas cherché à obtenir un modèle quantizé avec une précision parfaite car ce modèle n'est pas nécessaire pour le développement de notre modèle final).

Il est difficile d'implémenter le calcul du score pour notre modèle quantizé donc nous utilisons un modèle non quantizé mais avec la variable Quantization True pour prendre en compte la quantization.

In [29]:
model = densenet_cifar()
model.to(device=device)
flops , params = score(model,quantization = True)
print("Score flops: {} Score Params: {}".format(flops,params))
print("Final score: {}".format(flops + params))

Score flops: 0.06795413525587332 Score Params: 0.007912502297752578
Final score: 0.0758666375536259


On remarque que le score Micronet est plus faible que le score original donc il peut être intéressant de combiner le pruning et la quantization.

# Combinaison Apot Quantization et Pruning
Le principal challenge de notre projet était de combiner Quantization et Pruning. En modifiant la classe QuantConv2D, nous avons réussi et cela nous a permis d'obtenir un score au Micronet Challenge de 0.071

Les étapes qui vont suivre vont nous permettre de combiner la quantization et le pruning.  
Tout d'abord on définit un modèle quanitzé et on lui applique le prunage (on utilise le modèle pruné et non quantizé auquels on rajoute des paramètres utiles).

In [66]:
model_quant_pruned = densenet_cifar_quant()
model_quant_pruned.to(device=device)

DenseNet_Quant(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (dense1): Sequential(
    (0): Bottleneck_Quant(
      (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): QuantConv2d(
        16, 32, kernel_size=(1, 1), stride=(1, 1), bias=False
        (weight_quant): weight_quantize_fn()
      )
      (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): QuantConv2d(
        32, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (weight_quant): weight_quantize_fn()
      )
    )
    (1): Bottleneck_Quant(
      (bn1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv1): QuantConv2d(
        24, 32, kernel_size=(1, 1), stride=(1, 1), bias=False
        (weight_quant): weight_quantize_fn()
      )
      (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=Tr

In [67]:
parameters_to_prune=[]
for name, module in model_quant_pruned.named_modules():
    if isinstance(module, torch.nn.Conv2d) or isinstance(module, torch.nn.Linear) or isinstance(module, QuantConv2d):
        parameters_to_prune.append((module,'weight'))
prune.global_unstructured(parameters_to_prune, pruning_method=prune.L1Unstructured, amount=0.2)

In [68]:
if torch.cuda.is_available():
    loaded_cpt=torch.load('models\\Densenet_cifar_pruned.pt')
else:
    loaded_cpt=torch.load('models\\Densenet_cifar_pruned.pt', map_location=torch.device('cpu'))

loaded_cpt_clone = loaded_cpt.copy()
for key in loaded_cpt.keys():
  if key.startswith("conv1") == False:
      if "conv" in key and "orig" in key:
        loaded_cpt_clone[key.replace("weight_orig","act_alpha")] = torch.nn.Parameter(torch.tensor(8.0))
        loaded_cpt_clone[key.replace("weight_orig","weight_quant.wgt_alpha")] = Parameter(torch.tensor(3.0))

model_quant_pruned.load_state_dict(loaded_cpt_clone)

<All keys matched successfully>

On doit ensuite définir certains paramètres de la quantization et effectuer un forward

In [69]:
optimizer = torch.optim.SGD(model_quant_pruned.parameters(),lr=0.1, momentum=0.9,weight_decay=1e-4) 
scheduler = MultiStepLR(optimizer, milestones=[80, 110,130], gamma=0.1)

evaluation(model_quant_pruned, testloader, criterion)

bit = 4
for m in model_quant_pruned.modules():
  if isinstance(m, QuantConv2d):
    m.weight_quant = weight_quantize_fn(w_bit=bit)
    m.act_grid = build_power_value(bit)
    m.act_alq = act_quantization(bit, m.act_grid)


test Loss: 569.035003

test accuracy of plane: 72% (726/1000)
test accuracy of car: 20% (201/1000)
test accuracy of bird:  0% ( 2/1000)
test accuracy of cat:  1% (10/1000)
test accuracy of deer:  0% ( 0/1000)
test accuracy of dog:  0% ( 0/1000)
test accuracy of frog:  0% ( 0/1000)
test accuracy of horse:  0% ( 0/1000)
test accuracy of ship:  0% ( 0/1000)
test accuracy of truck:  0% ( 0/1000)

test accuracy (overall): 9.39% (939/10000)


In [None]:
train_losses, valid_losses, train_acc, valid_acc = training(trainloader, validloader, model_quant_pruned, criterion, optimizer,150,scheduler,"Densenet_pruned_quantized")


Il est également possible de loader notre modèle final.

In [70]:
if torch.cuda.is_available():
    loaded_cpt=torch.load('models\\Densenet_cifar_pruned_quantized.pt')
else:
    loaded_cpt=torch.load('models\\Densenet_cifar_pruned_quantized.pt', map_location=torch.device('cpu'))

loaded_cpt_clone = loaded_cpt.copy()
for key in loaded_cpt.keys():
  if key.startswith("conv1") == False:
      if "conv" in key and "orig" in key:
        loaded_cpt_clone[key.replace("weight_orig","act_alpha")] = torch.nn.Parameter(torch.tensor(8.0))
        loaded_cpt_clone[key.replace("weight_orig","weight_quant.wgt_alpha")] = Parameter(torch.tensor(3.0))

model_quant_pruned.load_state_dict(loaded_cpt_clone)

<All keys matched successfully>

In [72]:
m = model_quant_pruned.dense1[0].conv1
print(m.weight_quant(m.weight,m.weight_mask))

tensor([[[[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.3000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.9000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]]],


        [[[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[-1.8000]],

         [[ 2.4000]],

         [[ 0.6000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[-0.9000]]],


        [[[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[-0.3000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

         [[ 0.0000]],

       

On voit que les poids sont bien prunés et quantizés

In [73]:
evaluation(model_quant_pruned, testloader, criterion)

test Loss: 0.399867

test accuracy of plane: 90% (908/1000)
test accuracy of car: 96% (960/1000)
test accuracy of bird: 85% (855/1000)
test accuracy of cat: 76% (767/1000)
test accuracy of deer: 92% (922/1000)
test accuracy of dog: 87% (877/1000)
test accuracy of frog: 92% (925/1000)
test accuracy of horse: 91% (914/1000)
test accuracy of ship: 94% (944/1000)
test accuracy of truck: 93% (937/1000)

test accuracy (overall): 90.09% (9009/10000)


On peut noter que notre modèle atteint un score supérieur au 90% requis. Cependant cela a demandé quelques essais donc nous sommes vraiment à la limite.

In [41]:
model_pruned = densenet_cifar()
model_pruned.to(device=device)

parameters_to_prune=[]
for name, module in model_pruned.named_modules():
    if isinstance(module, torch.nn.Conv2d) or isinstance(module, torch.nn.Linear) or isinstance(module, QuantConv2d):
        parameters_to_prune.append((module,'weight'))
prune.global_unstructured(parameters_to_prune, pruning_method=prune.L1Unstructured, amount=0.2)
        
if torch.cuda.is_available():
    loaded_cpt=torch.load('models\\Densenet_cifar_pruned.pt')
else:
    loaded_cpt=torch.load('models\\Densenet_cifar_pruned.pt', map_location=torch.device('cpu'))
model_pruned.load_state_dict(loaded_cpt)
evaluation(model_pruned, testloader, criterion)


# En exécutant cette cellule, nous perdons le mask associé au pruning, mais cela nous permet de pouvoir calculer le score micronet
for name, module in model_pruned.named_modules():
# prune X % of connections in all 2D-conv layers
    if isinstance(module, torch.nn.Conv2d):
        prune.remove(module, 'weight')
    elif isinstance(module, QuantConv2d):
        prune.remove(module, 'weight')
    elif isinstance(module, torch.nn.Linear):
        prune.remove(module, 'weight')

flops , params = score(model_pruned,quantization = True)
print("Score flops: {} Score Params: {}".format(flops,params))
print("Final score: {}".format(flops + params))

test Loss: 0.304136

test accuracy of plane: 93% (939/1000)
test accuracy of car: 97% (974/1000)
test accuracy of bird: 90% (903/1000)
test accuracy of cat: 84% (843/1000)
test accuracy of deer: 95% (950/1000)
test accuracy of dog: 88% (881/1000)
test accuracy of frog: 94% (944/1000)
test accuracy of horse: 93% (935/1000)
test accuracy of ship: 94% (949/1000)
test accuracy of truck: 94% (947/1000)

test accuracy (overall): 92.65% (9265/10000)
Score flops: 0.06795413525587332 Score Params: 0.0028913423904609664
Final score: 0.07084547764633428


On obtient finalement en combinant ces deux techniques un score Micronet de 0.071   
On aurait pu améliorer notre score en quantizant sur 2 bits ou en effectuant du Structured Pruning.

# Conclusion
Ce projet nous as permis d'assimiler de nombreuses notions importantes en Deep Learning tel que la a quantization, le pruning, la factorization ou la distillation.  
Notre modèle peut évidemment être encore amélioré en utilisant des techniques comme la distillation ou la factorization.  
Si vous voulez plus de détails sur notre projet en général nous vous invitons à regarder le fichier Micronet_Challenge.pdf qui résume nos expériences.