# Network Pruning

When you sparsify a network, the parameters are not actually removed: they are only set to zero, so on its own sparsification gives you no gain in speed or memory. A lot of research is being done to accelerate computation on sparse matrices, and we expect to have it implemented in PyTorch soon: [see `torch.sparse`](https://pytorch.org/docs/stable/sparse.html)

What I mean here by pruning is the process of completely removing the sparsified parameters. This can only be done when the granularity is at the level of complete filters: removing a whole filter leaves a smaller but still dense weight tensor, whereas removing a single parameter does not.
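
To make the distinction concrete, here is a minimal PyTorch sketch of what physically removing zeroed filters involves for two consecutive convolutions. It is not fasterai's `Pruner` implementation: the helper name `remove_zero_filters` is hypothetical, and the sketch assumes that the bias of each zeroed filter is also zero and that only a ReLU sits between the two layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def remove_zero_filters(conv, next_conv):
    """Hypothetical helper: drop all-zero filters of `conv` and the matching
    input channels of `next_conv`, returning two new, smaller layers."""
    # A filter is kept if at least one of its weights is non-zero.
    keep = conv.weight.detach().abs().sum(dim=(1, 2, 3)) != 0
    idx = keep.nonzero(as_tuple=True)[0]
    n_keep = idx.numel()

    new_conv = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                         conv.stride, conv.padding, bias=conv.bias is not None)
    new_next = nn.Conv2d(n_keep, next_conv.out_channels, next_conv.kernel_size,
                         next_conv.stride, next_conv.padding,
                         bias=next_conv.bias is not None)

    with torch.no_grad():
        # Keep only the surviving output filters of the first layer...
        new_conv.weight.copy_(conv.weight[idx])
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias[idx])
        # ...and the corresponding input channels of the next layer.
        new_next.weight.copy_(next_conv.weight[:, idx])
        if next_conv.bias is not None:
            new_next.bias.copy_(next_conv.bias)
    return new_conv, new_next


# Toy example: zero out 2 of 8 filters (weights and biases), then prune them.
torch.manual_seed(0)
conv1, conv2 = nn.Conv2d(3, 8, 3), nn.Conv2d(8, 4, 3)
with torch.no_grad():
    conv1.weight[[1, 5]] = 0
    conv1.bias[[1, 5]] = 0

small1, small2 = remove_zero_filters(conv1, conv2)
x = torch.randn(2, 3, 16, 16)
dense_out = conv2(F.relu(conv1(x)))
pruned_out = small2(F.relu(small1(x)))
print(small1.weight.shape, small2.weight.shape)
print(torch.allclose(dense_out, pruned_out, atol=1e-5))
```

Removing a filter shrinks two tensors at once: the output dimension of the layer it belongs to and the input dimension of the layer that follows. This is why a single weight cannot be removed in the same way.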

 > Note: only Sequential feed-forward networks are supported for now.

In [1]:
from fastai.vision import *
from fastai.callbacks import *

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [3]:
path = untar_data(URLs.IMAGENETTE_160)

In [4]:
data = (ImageList.from_folder(path)
                .split_by_folder(train='train', valid='val')
                .label_from_folder()
                .transform(get_transforms(), size=128)
                .databunch(bs=64)
                .normalize(imagenet_stats))

In [5]:
import sys
sys.path.append('../')

from fasterai.sparsifier import *

## Network

In [49]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Three 5x5 convolutions with an increasing number of filters
        self.conv1 = nn.Conv2d(3, 32, 5, 1)
        self.conv2 = nn.Conv2d(32, 64, 5, 1)
        self.conv3 = nn.Conv2d(64, 128, 5, 1)
        # Global average pooling followed by a two-layer classifier head
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Linear(128, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = self.pool(x)
        x = x.view(x.shape[0], -1)  # flatten to (batch, 128)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

In [38]:
learn = Learner(data, Net().cuda(), metrics=[accuracy])

In [40]:
learn.fit_one_cycle(3, 1e-3, callbacks=[SparsifyCallback(learn, sparsity=30, granularity='filter', method='local', criteria='l1', sched_func=annealing_cos)])

Pruning of filter until a sparsity of 30%


epoch,train_loss,valid_loss,accuracy,time
0,2.001169,1.92613,0.330701,00:10
1,1.769364,1.678919,0.421146,00:11
2,1.669904,1.637165,0.430828,00:10


Sparsity at epoch 0: 7.59%
Sparsity at epoch 1: 22.59%
Sparsity at epoch 2: 30.00%
Final Sparsity: 30.00
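
The intermediate sparsity values follow the cosine schedule passed as `sched_func`. As a rough illustration, the sketch below re-implements the cosine annealing formula in plain Python and evaluates it at the end of each of the 3 epochs. The function name `cosine_sparsity` is hypothetical, and the exact points at which `SparsifyCallback` samples the schedule are an assumption, so expect values close to, but not identical to, the logged ones.

```python
import math


def cosine_sparsity(start, end, pct):
    """Cosine annealing from `start` to `end` as `pct` goes from 0 to 1."""
    cos_out = math.cos(math.pi * pct) + 1
    return end + (start - end) / 2 * cos_out


n_epochs, target = 3, 30
# Assumption: the schedule is read at the end of each epoch.
schedule = [cosine_sparsity(0, target, (epoch + 1) / n_epochs)
            for epoch in range(n_epochs)]
for epoch, sparsity in enumerate(schedule):
    print(f"Sparsity at epoch {epoch}: {sparsity:.2f}%")
```

Under that assumption the schedule gives roughly 7.5%, 22.5% and 30%: sparsity grows slowly at first, fastest in the middle of training, and levels off as it reaches the target.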


Let's double-check that the filters were correctly zeroed out:

In [41]:
for k,m in enumerate(learn.model.modules()):
    if isinstance(m, nn.Conv2d):
        print(f"Sparsity in {m.__class__.__name__} {k}: {100. * float(torch.sum(m.weight == 0))/ float(m.weight.nelement()):.2f}%")

Sparsity in Conv2d 1: 28.12%
Sparsity in Conv2d 2: 29.69%
Sparsity in Conv2d 3: 29.69%
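
These values sit slightly below the 30% target because whole filters are zeroed. The numbers are consistent with rounding the number of pruned filters down to an integer, which is an assumption about the implementation rather than something checked against fasterai's code. A quick verification in plain Python (the helper `filter_sparsity` is hypothetical):

```python
def filter_sparsity(n_filters, target_pct=30):
    """Percentage of zeroed weights when floor(target * n_filters) whole
    filters are zeroed (assumed rounding rule)."""
    n_pruned = (target_pct * n_filters) // 100  # integer floor
    return 100 * n_pruned / n_filters


for n_filters in (32, 64, 128):  # conv1, conv2, conv3 of `Net`
    print(f"{n_filters} filters: {filter_sparsity(n_filters):.2f}%")
```

For `conv1`, 30% of 32 filters is 9.6, so 9 filters are zeroed, which is 28.12% of the layer.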


In [42]:
learn.model.conv1.bias

Parameter containing:
tensor([-0.0767,  0.0534,  0.0922, -0.0256,  0.0816,  0.0000, -0.1097,  0.0327,
        -0.0207,  0.0377, -0.0981, -0.0050, -0.0000, -0.0649,  0.0000, -0.0225,
        -0.0531, -0.0358,  0.0000,  0.0666,  0.0279,  0.0000, -0.0000, -0.0608,
        -0.0000, -0.0635,  0.0000,  0.0631, -0.0203,  0.0000, -0.1003,  0.0700],
       device='cuda:0', requires_grad=True)

In [43]:
from fasterai.pruner import *

In [44]:
pruner = Pruner()

In [45]:
pruned_model = pruner.prune_model(learn.model)

In [46]:
pruned_learn = Learner(data, pruned_model, metrics=[accuracy])

In [47]:
print(f'The original model had {100*learn.validate()[1]:.2f} % accuracy')

The original model had 43.08 % accuracy


In [48]:
print(f'The pruned model has {100*pruned_learn.validate()[1]:.2f} % accuracy')

The pruned model has 43.08 % accuracy


In [None]:
learn.summary()

In [None]:
pruned_learn.summary()

We can now see that the pruned network has far fewer parameters, because we removed the filters that were no longer useful (all of their weights were 0).
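
If you only want the totals rather than the full `summary()` tables, a small helper can count the parameters directly. This is a generic PyTorch sketch (the name `count_parameters` is hypothetical, not part of fastai or fasterai); with the objects above you would call it on `learn.model` and on `pruned_model`.

```python
import torch.nn as nn


def count_parameters(model):
    """Total number of elements across all parameters of `model`."""
    return sum(p.numel() for p in model.parameters())


# Stand-alone check on a layer shaped like `conv1` of `Net`:
# 3 * 32 * 5 * 5 weights + 32 biases.
n_params = count_parameters(nn.Conv2d(3, 32, 5, 1))
print(n_params)
```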

## VGG16

In [50]:
learn = Learner(data, models.vgg16_bn(num_classes=10).cuda(), metrics=[accuracy])

In [8]:
learn.fit_one_cycle(3, 1e-3, callbacks=[SparsifyCallback(learn, sparsity=30, granularity='filter', method='local', criteria='l1', sched_func=annealing_cos)])

Pruning of filter until a sparsity of 30%


epoch,train_loss,valid_loss,accuracy,time
0,2.070591,2.644525,0.238981,00:50
1,1.737292,1.674513,0.440764,00:51
2,1.416635,1.234864,0.594395,00:50


Sparsity at epoch 0: 7.59%
Sparsity at epoch 1: 22.59%
Sparsity at epoch 2: 30.00%
Final Sparsity: 30.00


In [9]:
for k,m in enumerate(learn.model.modules()):
    if isinstance(m, nn.Conv2d):
        print(f"Sparsity in {m.__class__.__name__} {k}: {100. * float(torch.sum(m.weight == 0))/ float(m.weight.nelement()):.2f}%")

Sparsity in Conv2d 2: 29.69%
Sparsity in Conv2d 5: 29.69%
Sparsity in Conv2d 9: 29.69%
Sparsity in Conv2d 12: 29.69%
Sparsity in Conv2d 16: 29.69%
Sparsity in Conv2d 19: 29.69%
Sparsity in Conv2d 22: 29.69%
Sparsity in Conv2d 26: 29.88%
Sparsity in Conv2d 29: 29.88%
Sparsity in Conv2d 32: 29.88%
Sparsity in Conv2d 36: 29.88%
Sparsity in Conv2d 39: 29.88%
Sparsity in Conv2d 42: 29.88%


In [10]:
pruner = Pruner()

In [11]:
pruned_model = pruner.prune_model(learn.model)


In [12]:
pruned_learn = Learner(data, pruned_model, metrics=[accuracy])

In [30]:
print(f'The original model had {100*learn.validate()[1]:.2f} % accuracy')

The original model had 63.24 % accuracy


In [31]:
print(f'The pruned model has {100*pruned_learn.validate()[1]:.2f} % accuracy')

The pruned model has 56.48 % accuracy


In [15]:
learn.summary()

VGG
Layer (type)         Output Shape         Param #    Trainable 
Conv2d               [64, 128, 128]       1,792      True      
______________________________________________________________________
BatchNorm2d          [64, 128, 128]       128        True      
______________________________________________________________________
ReLU                 [64, 128, 128]       0          False     
______________________________________________________________________
Conv2d               [64, 128, 128]       36,928     True      
______________________________________________________________________
BatchNorm2d          [64, 128, 128]       128        True      
______________________________________________________________________
ReLU                 [64, 128, 128]       0          False     
______________________________________________________________________
MaxPool2d            [64, 64, 64]         0          False     
__________________________________________________________

In [16]:
pruned_learn.summary()

VGG
Layer (type)         Output Shape         Param #    Trainable 
Conv2d               [45, 128, 128]       1,260      True      
______________________________________________________________________
BatchNorm2d          [45, 128, 128]       90         True      
______________________________________________________________________
ReLU                 [45, 128, 128]       0          False     
______________________________________________________________________
Conv2d               [45, 128, 128]       18,270     True      
______________________________________________________________________
BatchNorm2d          [45, 128, 128]       90         True      
______________________________________________________________________
ReLU                 [45, 128, 128]       0          False     
______________________________________________________________________
MaxPool2d            [45, 64, 64]         0          False     
__________________________________________________________
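
The parameter counts in the two summaries can be reproduced by hand. A `Conv2d` layer with a bias has `in_channels * out_channels * k * k + out_channels` parameters, and a `BatchNorm2d` layer has two per channel. The sketch below (helper names are hypothetical) checks the first VGG16 block, which uses 3x3 kernels, before and after its 64 filters are pruned down to 45:

```python
def conv2d_params(in_ch, out_ch, k):
    """Weights plus biases of a Conv2d layer with a k x k kernel."""
    return in_ch * out_ch * k * k + out_ch


def batchnorm2d_params(ch):
    """Learnable scale and shift of a BatchNorm2d layer."""
    return 2 * ch


for width in (64, 45):  # filters per layer before and after pruning
    print(width,
          conv2d_params(3, width, 3),      # first conv, RGB input
          batchnorm2d_params(width),
          conv2d_params(width, width, 3))  # second conv
```

The second convolution shrinks by about half rather than by 30%, because both its input channels and its output filters are reduced.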