# Network Pruning

When you sparsify your network, the sparsified parameters are only set to zero, not removed: they are still stored and still take part in every multiplication, so you get no speed or memory benefit from the sparsity. A lot of research is being done to accelerate computation on sparse matrices, and we expect to see it implemented in PyTorch soon: [see `torch.sparse`](https://pytorch.org/docs/stable/sparse.html)
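A quick way to see that zeroing weights saves nothing by itself is to compare storage sizes. This minimal sketch uses placeholder weights (all ones) with the shape of a 32 → 64 channel 5x5 convolution, zeroes 19 of its 64 filters (about 30%), and converts the result to PyTorch's COO sparse format with `Tensor.to_sparse()`. At this sparsity level the sparse version is actually larger, because each nonzero value also needs one int64 index per dimension.

```python
import torch

# Placeholder weights shaped like a 32 -> 64 channel 5x5 convolution;
# only which entries are zero matters here, so every weight starts at 1.
weight = torch.ones(64, 32, 5, 5)
weight[:19] = 0.0                      # "sparsify" 19 of the 64 filters (about 30%)

# Dense storage: every entry is kept, zeros included
dense_bytes = weight.nelement() * weight.element_size()

# COO storage: the nonzero values plus a 4-D int64 index for each of them
sparse = weight.to_sparse().coalesce()
values, indices = sparse.values(), sparse.indices()
sparse_bytes = (values.nelement() * values.element_size()
                + indices.nelement() * indices.element_size())

print(weight.nelement(), dense_bytes)   # 51200 204800
print(values.nelement(), sparse_bytes)  # 36000 1296000
```

The zeros remain part of the dense tensor, which is why removing whole filters, as done below, is what actually shrinks the model.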

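What I mean here by pruning is the process of completely removing the sparsified parameters. This can only be done when the granularity is at the level of complete filters: a single zero weight cannot be removed from a dense weight tensor, but a filter whose weights are all zero can be dropped entirely, which removes one output channel from its layer and one input channel from the next layer.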
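To make this concrete, here is a minimal sketch in plain PyTorch (not fasterai's `Pruner`, whose internals are not shown here) of removing all-zero filters from a convolution. The next convolution must lose the matching input channels. A zeroed filter still outputs its bias, so after the ReLU it leaves a constant map `relu(bias)`; assuming the next convolution has no padding (as in the `Net` below), that constant's contribution is the same at every output pixel and can be folded exactly into the next layer's bias. The helper `remove_zero_filters` is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def remove_zero_filters(conv, next_conv):
    """Drop the all-zero filters of `conv` and the matching input channels of `next_conv`.

    Assumes conv -> ReLU -> next_conv, with padding=0 in next_conv.
    """
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))   # L1 norm of each filter
    keep = torch.nonzero(norms != 0).flatten()
    drop = torch.nonzero(norms == 0).flatten()

    new_conv = nn.Conv2d(conv.in_channels, len(keep), conv.kernel_size,
                         conv.stride, conv.padding)
    new_next = nn.Conv2d(len(keep), next_conv.out_channels, next_conv.kernel_size,
                         next_conv.stride, next_conv.padding)
    with torch.no_grad():
        new_conv.weight.copy_(conv.weight[keep])
        new_conv.bias.copy_(conv.bias[keep])
        new_next.weight.copy_(next_conv.weight[:, keep])
        # Each dropped channel is the constant relu(bias); every output pixel of
        # next_conv receives constant * (sum of its kernel over that channel).
        const = F.relu(conv.bias[drop])
        folded = (next_conv.weight[:, drop].sum(dim=(2, 3)) * const).sum(dim=1)
        new_next.bias.copy_(next_conv.bias + folded)
    return new_conv, new_next

torch.manual_seed(0)
conv1, conv2 = nn.Conv2d(3, 32, 5), nn.Conv2d(32, 64, 5)
with torch.no_grad():
    conv1.weight[:10] = 0.0                  # pretend 10 filters were sparsified

x = torch.randn(2, 3, 64, 64)
before = conv2(F.relu(conv1(x)))
small1, small2 = remove_zero_filters(conv1, conv2)
after = small2(F.relu(small1(x)))
print(small1.weight.shape, small2.weight.shape)
print(torch.allclose(before, after, atol=1e-4))
```

If the next layer uses padding (as VGG does), border pixels see fewer copies of the constant, so this folding is only approximate at the borders.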

> Note: only sequential feed-forward networks (where each layer feeds directly into the next) are supported for now.
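The note gives no reason for this restriction; one likely reason is that, in a chain, removing filters from a layer only requires shrinking the input of the next layer, whereas a skip connection makes other tensors depend on the same channels. This small sketch with arbitrary layer sizes shows a residual block in which `conv2`, pruned on its own, can no longer be added back to its input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 16, 8, 8)
conv1 = nn.Conv2d(16, 16, 3, padding=1)
conv2 = nn.Conv2d(16, 11, 3, padding=1)   # conv2 after removing 5 of its 16 filters

failed = False
try:
    # Residual connection: the block's output is added back to its input,
    # so conv2 must keep producing exactly as many channels as x has.
    y = conv2(torch.relu(conv1(x))) + x
except RuntimeError as err:
    failed = True
    print("Cannot prune conv2 on its own:", err)
```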

In [1]:
from fastai.vision import *
from fastai.callbacks import *

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [3]:
path = untar_data(URLs.IMAGENETTE_160)

In [4]:
import warnings
warnings.filterwarnings("ignore")

In [5]:
data = (ImageList.from_folder(path)
                .split_by_folder(train='train', valid='val')
                .label_from_folder()
                .transform(get_transforms(), size=64)
                .databunch(bs=64)
                .normalize(imagenet_stats))

In [6]:
import sys
sys.path.append('../')

from fasterai.sparsifier_test import *

## Network

In [13]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Three 5x5 convolutions without padding, global average pooling, two linear layers
        self.conv1 = nn.Conv2d(3, 32, 5, 1)
        self.conv2 = nn.Conv2d(32, 64, 5, 1)
        self.conv3 = nn.Conv2d(64, 128, 5, 1)
        self.pool = nn.AdaptiveAvgPool2d((1,1))
        self.fc1 = nn.Linear(128, 64)
        self.fc2 = nn.Linear(64, 10)         # 10 Imagenette classes

    def forward(self, x):                    # x: [bs, 3, 64, 64]
        x = F.relu(self.conv1(x))            # [bs, 32, 60, 60]
        x = F.max_pool2d(x, 2, 2)            # [bs, 32, 30, 30]
        x = F.relu(self.conv2(x))            # [bs, 64, 26, 26]
        x = F.relu(self.conv3(x))            # [bs, 128, 22, 22]
        x = self.pool(x)                     # [bs, 128, 1, 1]
        x = x.view(x.shape[0], -1)           # [bs, 128]
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

In [14]:
learn = Learner(data, Net().cuda(), metrics=[accuracy])

In [15]:
learn.fit_one_cycle(3, 1e-3, callbacks=[SparsifyCallback(learn, sparsity=30, granularity='filter', method='local', criteria='l1', sched_func=annealing_cos)])

Pruning of filter until a sparsity of 30%


| epoch | train_loss | valid_loss | accuracy | time |
|---|---|---|---|---|
| 0 | 2.046987 | 1.906558 | 0.329172 | 00:07 |
| 1 | 1.775226 | 1.666416 | 0.426242 | 00:07 |
| 2 | 1.640379 | 1.592098 | 0.471338 | 00:07 |


Saving Weights at epoch 0
Sparsity at the end of epoch 0: 7.50%
Sparsity at the end of epoch 1: 22.50%
Sparsity at the end of epoch 2: 30.00%
Final Sparsity: 30.00
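The logged sparsity values (7.5%, 22.5%, 30%) follow a cosine ramp from 0 to the target. Assuming the callback evaluates `sched_func` as `annealing_cos(start, end, pct)` with `pct` the fraction of training completed at the end of each epoch (an assumption; fasterai's code is not shown here), the numbers can be reproduced. The helper below re-implements fastai v1's `annealing_cos` formula so that the sketch runs without fastai.

```python
import math

def annealing_cos(start, end, pct):
    "Cosine anneal from `start` to `end` as `pct` goes from 0.0 to 1.0 (fastai v1 formula)."
    cos_out = math.cos(math.pi * pct) + 1
    return end + (start - end) / 2 * cos_out

n_epochs, target = 3, 30
# Assumed: the schedule is evaluated at the end of each epoch
schedule = [annealing_cos(0, target, (epoch + 1) / n_epochs) for epoch in range(n_epochs)]
print([round(s, 2) for s in schedule])   # [7.5, 22.5, 30.0]
```

The cosine shape prunes slowly at the start and end of training and fastest in the middle, leaving the network time to adapt before the final sparsity is reached.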


Let's double-check that the parameters were actually zeroed, by measuring the fraction of zero weights in each convolutional layer:

In [16]:
# Percentage of weights that are exactly zero in each convolutional layer
for k,m in enumerate(learn.model.modules()):
    if isinstance(m, nn.Conv2d):
        print(f"Sparsity in {m.__class__.__name__} {k}: {100. * float(torch.sum(m.weight == 0))/ float(m.weight.nelement()):.2f}%")

Sparsity in Conv2d 1: 31.25%
Sparsity in Conv2d 2: 29.69%
Sparsity in Conv2d 3: 30.47%
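The per-layer values are not exactly 30% because, with `granularity='filter'`, only whole filters can be zeroed. One rule that is consistent with every percentage printed in this notebook (an assumption about fasterai's internals, not its actual code) is: compute the L1 norm of each filter (`criteria='l1'`), take the 30th percentile of those norms within each layer (`method='local'`) with linear interpolation, and zero every filter whose norm is not above it. With distinct norms this zeroes `floor(0.3 * (n - 1)) + 1` of the `n` filters: 10/32, 19/64, 39/128, and, for the larger VGG16 layers, 77/256 and 154/512, which match the printed percentages up to rounding. The helper name `zero_low_l1_filters` is hypothetical.

```python
import torch
import torch.nn as nn

def zero_low_l1_filters(conv, sparsity):
    "Zero every filter whose L1 norm is not above the layer's `sparsity`-th percentile; return how many."
    with torch.no_grad():
        norms = conv.weight.abs().sum(dim=(1, 2, 3))       # one L1 norm per output filter
        threshold = torch.quantile(norms, sparsity / 100)   # linear interpolation (the default)
        keep = norms > threshold
        conv.weight[~keep] = 0.0                            # zero the whole filter
    return int((~keep).sum())

torch.manual_seed(0)
results = {n: zero_low_l1_filters(nn.Conv2d(3, n, 3), 30) for n in (32, 64, 128, 256, 512)}
for n, k in results.items():
    print(f"{n} filters -> {k} zeroed")
```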


In [17]:
from fasterai.pruner import *

In [18]:
pruner = Pruner()
pruned_model = pruner.prune_model(learn.model)

In [19]:
pruned_learn = Learner(data, pruned_model.cuda(), metrics=[accuracy])

In [20]:
pruned_learn.validate()

[1.5920982, tensor(0.4713)]

In [21]:
print(f'The original model had {100*learn.validate()[1]:.2f} % accuracy')

The original model had 47.13 % accuracy


In [22]:
learn.summary()

Net
Layer (type)         Output Shape         Param #    Trainable 
Conv2d               [32, 60, 60]         2,432      True      
______________________________________________________________________
Conv2d               [64, 26, 26]         51,264     True      
______________________________________________________________________
Conv2d               [128, 22, 22]        204,928    True      
______________________________________________________________________
AdaptiveAvgPool2d    [128, 1, 1]          0          False     
______________________________________________________________________
Linear               [64]                 8,256      True      
______________________________________________________________________
Linear               [10]                 650        True      
______________________________________________________________________

Total params: 267,530
Total trainable params: 267,530
Total non-trainable params: 0
Optimized with 'torch.optim.adam.Adam

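Note that the summary above is for `learn`, the original model, so it still lists all 267,530 parameters. Calling `pruned_learn.summary()` shows the pruned network instead: the filters that were no longer useful (all of their weights were 0) have been removed, so it has far fewer parameters, while, as the two validation results above show, its accuracy is unchanged (47.13%).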
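The size of the pruned network can be estimated by hand. Assuming the pruner removes exactly the zeroed filters (10, 19 and 39 of them, read off the sparsity printout) and shrinks the inputs of the following layer accordingly, including `fc1` after the global pooling, this sketch rebuilds the parameterized layers of `Net` with the reduced widths and counts parameters. `make_net` is a hypothetical helper, and the pruned count is an estimate under these assumptions, not a measured result.

```python
import torch.nn as nn

def make_net(c1, c2, c3, n_classes=10):
    "The parameterized layers of `Net`, with configurable numbers of conv filters."
    return nn.ModuleList([
        nn.Conv2d(3, c1, 5), nn.Conv2d(c1, c2, 5), nn.Conv2d(c2, c3, 5),
        nn.Linear(c3, 64), nn.Linear(64, n_classes),
    ])

def n_params(model):
    return sum(p.numel() for p in model.parameters())

original = n_params(make_net(32, 64, 128))                 # matches the summary's total
pruned = n_params(make_net(32 - 10, 64 - 19, 128 - 39))    # assumed pruned widths
print(original, pruned, f"{100 * pruned / original:.1f}%")
```

Removing about 30% of the filters roughly halves the parameter count, because a convolution's weights scale with both its input and output channel counts (0.7 × 0.7 ≈ 0.49 for the middle layers).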

## VGG16

In [23]:
learn = Learner(data, models.vgg16_bn(num_classes=10).cuda(), metrics=[accuracy])

In [24]:
learn.fit_one_cycle(3, 1e-3, callbacks=[SparsifyCallback(learn, sparsity=30, granularity='filter', method='local', criteria='l1', sched_func=annealing_cos)])

Pruning of filter until a sparsity of 30%


| epoch | train_loss | valid_loss | accuracy | time |
|---|---|---|---|---|
| 0 | 2.199146 | 2.65628 | 0.216051 | 00:16 |
| 1 | 1.975814 | 1.741973 | 0.384713 | 00:16 |
| 2 | 1.634306 | 1.458824 | 0.477962 | 00:17 |


Saving Weights at epoch 0
Sparsity at the end of epoch 0: 7.50%
Sparsity at the end of epoch 1: 22.50%
Sparsity at the end of epoch 2: 30.00%
Final Sparsity: 30.00


In [25]:
# Percentage of weights that are exactly zero in each convolutional layer
for k,m in enumerate(learn.model.modules()):
    if isinstance(m, nn.Conv2d):
        print(f"Sparsity in {m.__class__.__name__} {k}: {100. * float(torch.sum(m.weight == 0))/ float(m.weight.nelement()):.2f}%")

Sparsity in Conv2d 2: 29.69%
Sparsity in Conv2d 5: 29.69%
Sparsity in Conv2d 9: 30.47%
Sparsity in Conv2d 12: 30.47%
Sparsity in Conv2d 16: 30.08%
Sparsity in Conv2d 19: 30.08%
Sparsity in Conv2d 22: 30.08%
Sparsity in Conv2d 26: 30.08%
Sparsity in Conv2d 29: 30.08%
Sparsity in Conv2d 32: 30.08%
Sparsity in Conv2d 36: 30.08%
Sparsity in Conv2d 39: 30.08%
Sparsity in Conv2d 42: 30.08%
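`vgg16_bn` follows each `Conv2d` with a `BatchNorm2d`, so removing filters from a convolution also means removing the matching channels of its batch-norm layer: the affine `weight` and `bias` and the `running_mean` and `running_var` statistics. This minimal sketch shows that per-layer slicing with a hypothetical `slice_conv_bn` helper; it is not fasterai's implementation, and it leaves out the adjustment of the next layer's input channels. In eval mode batch norm is applied per channel, so the kept channels produce exactly the same outputs as before.

```python
import torch
import torch.nn as nn

def slice_conv_bn(conv, bn, keep):
    "Keep only the output channels in `keep` of a Conv2d and of the BatchNorm2d that follows it."
    new_conv = nn.Conv2d(conv.in_channels, len(keep), conv.kernel_size, conv.stride,
                         conv.padding, bias=conv.bias is not None)
    new_bn = nn.BatchNorm2d(len(keep), eps=bn.eps, momentum=bn.momentum)
    with torch.no_grad():
        new_conv.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias[keep])
        # BatchNorm2d stores one affine pair and one set of running statistics per channel
        new_bn.weight.copy_(bn.weight[keep])
        new_bn.bias.copy_(bn.bias[keep])
        new_bn.running_mean.copy_(bn.running_mean[keep])
        new_bn.running_var.copy_(bn.running_var[keep])
    return new_conv, new_bn

torch.manual_seed(0)
conv, bn = nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16)
with torch.no_grad():                      # give the batch-norm layer non-trivial statistics
    bn.running_mean.uniform_(-1, 1)
    bn.running_var.uniform_(0.5, 2.0)
keep = torch.tensor([0, 2, 3, 7, 8, 15])
new_conv, new_bn = slice_conv_bn(conv, bn, keep)

bn.eval(); new_bn.eval()                   # use running statistics, as at inference time
x = torch.randn(2, 3, 8, 8)
full = bn(conv(x))[:, keep]
small = new_bn(new_conv(x))
print(small.shape, torch.allclose(full, small, atol=1e-5))
```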


In [26]:
pruner = Pruner()

In [27]:
pruned_model = pruner.prune_model(learn.model)

In [28]:
pruned_learn = Learner(data, pruned_model.cuda(), metrics=[accuracy])

In [29]:
pruned_learn.validate()

[1.4587042, tensor(0.4769)]