<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

In [None]:
#| include: false
from fastai.vision.all import *
from fastai.callback.all import *

from fasterai.sparse.all import *

import torch
import torch.nn as nn
import torch.nn.functional as F

:::{.callout-important}

The `Pruner` currently works on fully feedforward ConvNets such as VGG16. Support for architectures with residual connections, such as ResNets, is under development.

:::

When our network contains filters whose weights are all zero, there is an additional step we can take: those zero filters can be **physically** removed from the network, giving us a new, dense architecture.

This is done by re-expressing each layer with fewer filters, so that it holds only the non-zero ones. However, removing a filter from a layer also removes the activation map it produced, and that map is an input to every filter of the next layer. So we must not only remove the filter itself, but also the corresponding kernel in each filter of the next layer (see the figure below).

![](imgs/pruning_filters.png "Pruning Filters")
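The idea can be sketched in plain PyTorch for two consecutive convolutions. This is only an illustration, not fasterai's implementation: `remove_zero_filters` is a hypothetical helper, and it assumes that the bias of each zeroed filter is also zero and that no batch-norm layer sits between the two convolutions.

```python
import torch
import torch.nn as nn

def remove_zero_filters(conv1, conv2):
    """Return smaller copies of two consecutive convs without conv1's all-zero filters."""
    # A filter is kept if at least one of its weights is non-zero.
    keep = conv1.weight.detach().abs().sum(dim=(1, 2, 3)) > 0
    n_keep = int(keep.sum())

    new1 = nn.Conv2d(conv1.in_channels, n_keep, conv1.kernel_size,
                     stride=conv1.stride, padding=conv1.padding,
                     bias=conv1.bias is not None)
    new2 = nn.Conv2d(n_keep, conv2.out_channels, conv2.kernel_size,
                     stride=conv2.stride, padding=conv2.padding,
                     bias=conv2.bias is not None)
    with torch.no_grad():
        new1.weight.copy_(conv1.weight[keep])      # drop filters (output channels)
        if conv1.bias is not None:
            new1.bias.copy_(conv1.bias[keep])
        new2.weight.copy_(conv2.weight[:, keep])   # drop matching kernels (input channels)
        if conv2.bias is not None:
            new2.bias.copy_(conv2.bias)
    return new1, new2

# Toy example: zero out two of the eight filters of the first layer.
torch.manual_seed(0)
conv1 = nn.Conv2d(3, 8, 3, padding=1)
conv2 = nn.Conv2d(8, 4, 3, padding=1)
with torch.no_grad():
    conv1.weight[[1, 5]] = 0
    conv1.bias[[1, 5]] = 0

small1, small2 = remove_zero_filters(conv1, conv2)
x = torch.randn(2, 3, 16, 16)
same = torch.allclose(conv2(torch.relu(conv1(x))),
                      small2(torch.relu(small1(x))), atol=1e-5)
print(small1.weight.shape, small2.weight.shape, same)
```

The smaller pair should produce the same output as the original one, because a zero filter with zero bias yields an all-zero activation map that contributes nothing to the next layer.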

Let's illustrate this with an example:

In [None]:
path = untar_data(URLs.PETS)

files = get_image_files(path/"images")

def label_func(f): return f[0].isupper()

In [None]:
dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(64))

In [ ]:
#|output: asis
#| echo: false
show_doc(Pruner)

In [None]:
learn = Learner(dls, vgg16_bn(num_classes=2), metrics=accuracy)

In [None]:
#| include: false
def count_parameters(model):
    return sum(p.numel() for p in model.parameters())

In [None]:
count_parameters(learn.model)

134277186

Our initial model, a VGG16 with batch normalization, has more than 134 million parameters. Let's see what happens when we make it sparse at the filter level.

In [None]:
sp_cb = SparsifyCallback(end_sparsity=50, granularity='filter', method='local', criteria=large_final, sched_func=sched_onecycle)

In [None]:
learn.fit_one_cycle(3, 3e-4, cbs=sp_cb)

Pruning of filter until a sparsity of 50%
Saving Weights at epoch 0


epoch,train_loss,valid_loss,accuracy,time
0,0.897482,0.611214,0.698241,00:14
1,0.658607,0.561114,0.70636,00:13
2,0.555238,0.527486,0.718539,00:13


Sparsity at the end of epoch 0: 10.43%
Sparsity at the end of epoch 1: 48.29%
Sparsity at the end of epoch 2: 50.00%
Final Sparsity: 50.00


In [None]:
count_parameters(learn.model)

134277186

The total number of parameters hasn't changed! This is because we only replaced the values with zeros: the model is now sparse, but the zeroed weights are still stored.
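One way to confirm that the zeros are really there is to count the convolutional filters whose weights are all zero. The helper below is a hypothetical sketch, not part of fasterai, and is shown on a toy model; in this tutorial it could presumably be called as `zero_filter_report(learn.model)`.

```python
import torch
import torch.nn as nn

def zero_filter_report(model):
    """Return (zeroed, total) filter counts over all nn.Conv2d layers of `model`."""
    zeroed = total = 0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            # Sum of absolute weights per filter; zero means the whole filter is zero.
            norms = m.weight.detach().abs().sum(dim=(1, 2, 3))
            zeroed += int((norms == 0).sum())
            total += m.out_channels
    return zeroed, total

# Toy model with three filters zeroed by hand.
toy = nn.Sequential(nn.Conv2d(3, 4, 3), nn.ReLU(), nn.Conv2d(4, 6, 3))
with torch.no_grad():
    toy[0].weight[[0, 2]] = 0
    toy[2].weight[1] = 0

zeroed, total = zero_filter_report(toy)
print(f"{zeroed} of {total} filters are zero")
```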

The `Pruner` will take care of removing those useless filters.

In [None]:
pruner = Pruner()
pruned_model = pruner.prune_model(learn.model)

Done! Let's check that the performance is still the same.

In [None]:
pruned_learn = Learner(dls, pruned_model.cuda(), metrics=accuracy)

In [None]:
pruned_learn.validate()

(#2) [0.5265399813652039,0.7212449312210083]

In [None]:
count_parameters(pruned_learn.model)

71858210

Now we have about 71.9 million parameters, roughly 54% of the initial count, which is close to the 50% reduction we asked for!
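Both parameter counts can be reproduced by hand, which also suggests why the ratio is not exactly 50%. The sketch below assumes the torchvision-style VGG16-BN layout (3x3 convolutions, 7x7 pooled features, two hidden layers of 4096 units) and assumes that pruning halves the filters of every convolution while leaving the fully connected layers untouched, apart from the inputs of the first one. `vgg16_bn_params` is a hypothetical helper written for this check.

```python
# Output channels of each conv layer; "M" marks a max-pooling layer.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]

def vgg16_bn_params(cfg, num_classes=2, keep=1.0):
    """Count trainable parameters when each conv keeps a fraction `keep` of its filters."""
    total, c_in = 0, 3
    for v in cfg:
        if v == "M":
            continue
        c_out = int(v * keep)
        total += c_in * c_out * 3 * 3 + c_out   # conv weights + biases
        total += 2 * c_out                      # batch-norm scale and shift
        c_in = c_out
    total += (c_in * 7 * 7) * 4096 + 4096       # first fully connected layer
    total += 4096 * 4096 + 4096                 # second fully connected layer
    total += 4096 * num_classes + num_classes   # output layer
    return total

dense = vgg16_bn_params(VGG16_CFG)
pruned = vgg16_bn_params(VGG16_CFG, keep=0.5)
print(dense, pruned, round(pruned / dense, 3))  # 134277186 71858210 0.535
```

Under these assumptions the arithmetic matches both counts reported above. Most of the parameters sit in the fully connected layers, and only the first of them shrinks (its input drops from 25088 to 12544 features), so the overall reduction lands slightly short of 50%.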