### Pruning Untrained Networks
This notebook shows that removing units with negative attributions can raise the accuracy of an untrained network on simple datasets well above chance level, without any training.

In [1]:
%load_ext autoreload
%autoreload 2
import sys
sys.path.append("./..")

In [2]:
import torch
import numpy as np
from torchsummary import summary
from torchpruner import (Pruner, ShapleyAttributionMetric)
import experiments.models.mnist as mnist
import experiments.models.cifar10 as cifar10
from experiments.utils import train, test

# Fix the seeds for reproducibility.
# No training is performed, so the accuracy after pruning depends strongly on
# the network initialization (i.e. whether the randomly initialized network
# happens to contain good sub-graphs for the task).
# Different seeds can give quite different results, but an increase in
# accuracy should be observed in all cases.
torch.manual_seed(1)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
np.random.seed(1)

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
print(f"Using device: {device}")

Using device: cuda


### MNIST
**Load a simple dense network with two hidden layers, about 5.7M parameters, and near-chance initial test accuracy (~10% expected; 7.16% in this run).**
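
The 5.7M figure can be checked by hand. The sketch below is a small helper written for this note (it is not part of `torchsummary` or `torchpruner`) that counts weights and biases of a fully connected network. The layer widths are taken from the summary printed by the next cell.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first.
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + bias vector
    return total


# 28x28 MNIST input, two hidden layers of 2024 units, 10 classes
mnist_sizes = [28 * 28, 2024, 2024, 10]
print(f"{dense_param_count(mnist_sizes):,}")  # 5,707,690
```

The same helper applies to any stack of `Linear` layers with biases; only the list of widths changes.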

In [3]:
# Load dataset
train_loader, val_loader, test_loader = mnist.get_dataset_and_loaders(val_split=1000, val_batch_size=1000)
loss = mnist.loss
input_size = (1, 28, 28)

# Print layer architecture and test performance
model, name = mnist.get_fc_model_with_name()
model.to(device)
summary(model, input_size=input_size, device=device.type)
test(model, device, loss, test_loader);

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
           Flatten-1                  [-1, 784]               0
            Linear-2                 [-1, 2024]       1,588,840
         LeakyReLU-3                 [-1, 2024]               0
            Linear-4                 [-1, 2024]       4,098,600
         LeakyReLU-5                 [-1, 2024]               0
            Linear-6                   [-1, 10]          20,250
Total params: 5,707,690
Trainable params: 5,707,690
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.07
Params size (MB): 21.77
Estimated Total Size (MB): 21.84
----------------------------------------------------------------
Test set: Average loss: 2.3018, Accuracy: 716/10000 (7.160%)



### Prune hidden layers using Shapley value attributions

**The resulting network has ~42% of the original parameters (2,421,737 of 5,707,690, following the unit counts in the pruning log below) and ~50% accuracy.**
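
To make the idea concrete, here is a minimal sketch of a sampling-based Shapley estimate followed by the "negative attribution" selection. It is not `torchpruner`'s implementation: the helper `shapley_estimates`, the toy score `toy_value`, and the reading of `sv_samples` as the number of sampled unit orderings are all assumptions made for illustration. The sign convention assumed here is that a negative attribution means the unit lowers the score on average.

```python
import random


def shapley_estimates(n_units, value_fn, n_samples, seed=0):
    """Estimate each unit's Shapley value by sampling random orderings.

    value_fn(active) returns a score (higher is better) for the network
    restricted to the units in the set `active`.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_units
    for _ in range(n_samples):
        order = list(range(n_units))
        rng.shuffle(order)
        active = set()
        previous = value_fn(active)
        for unit in order:
            active.add(unit)
            current = value_fn(active)
            totals[unit] += current - previous  # marginal contribution
            previous = current
    return [t / n_samples for t in totals]


# Hypothetical additive toy score: units 0 and 2 help, units 1 and 3 hurt.
effects = [0.30, -0.20, 0.10, -0.05]


def toy_value(active):
    return sum(effects[u] for u in active)


attr = shapley_estimates(len(effects), toy_value, n_samples=5)
pruning_indices = [i for i, a in enumerate(attr) if a < 0]
print(pruning_indices)  # [1, 3]
```

In a real network the score is not additive, so the estimate depends on which orderings are sampled; that is one reason results vary with the seed.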

In [4]:
%%time
# Define prunable layers
layers = list(model.fc.children())
prunable_layers = [
    # (module_to_prune -> [modules_for_cascading_pruning])
    (layers[1], [layers[3]]),
    (layers[3], [layers[5]]),
]

pruner = Pruner(model, input_size, device)
attribution = ShapleyAttributionMetric(model, val_loader, loss, device, sv_samples=5)

# Prune layers starting from the one closest to the output
for module, cascading_modules in prunable_layers[::-1]:
    # Compute Shapley Value attributions
    attr, _ = attribution.run([module])[0]
    # Select indices corresponding to negative attributions
    pruning_indices = np.argwhere(attr < 0).flatten()
    # Perform pruning
    pruner.prune_model(module, pruning_indices, cascading_modules=cascading_modules)

# Test final model
summary(model, input_size=input_size, device=device.type)
test(model, device, loss, test_loader);

Computing Shapley values on Linear(in_features=2024, out_features=2024, bias=True)...
--> can run with partials
Considering cascading modules [Linear(in_features=2024, out_features=10, bias=True)]
Pruning 998 units from Linear(in_features=2024, out_features=10, bias=True) (in)
Pruning 998 units from Linear(in_features=2024, out_features=2024, bias=True) (out)
Computing Shapley values on Linear(in_features=784, out_features=2024, bias=True)...
--> can run with partials
Considering cascading modules [Linear(in_features=2024, out_features=2024, bias=True)]
Pruning 693 units from Linear(in_features=2024, out_features=2024, bias=True) (in)
Pruning 693 units from Linear(in_features=784, out_features=2024, bias=True) (out)
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
           Flatten-1                  [-1, 784]               0
            Linear-2                 [-1, 1331]       1,044,835
         LeakyReL

### CIFAR-10
**The same experiment on CIFAR-10, with the same pruning procedure and network architecture, except that the first layer now takes 32x32x3 = 3072 inputs.**
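
The pruning procedure reused here removes output units from one layer and the matching inputs from the next, which is what the `(out)` and `(in)` lines in the pruning logs refer to. The sketch below illustrates this cascading step on plain Python lists. All names (`forward`, `prune_units`, the toy weights) are hypothetical and this is not how `Pruner.prune_model` is implemented. It checks that physically removing a hidden unit gives the same output as zeroing that unit's activation.

```python
def leaky_relu(x, slope=0.01):
    return x if x > 0 else slope * x


def forward(x, w1, b1, w2, b2, mask=None):
    """Two-layer dense net on plain lists; mask zeroes hidden activations."""
    hidden = [leaky_relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    if mask is not None:
        hidden = [h * m for h, m in zip(hidden, mask)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]


def prune_units(w1, b1, w2, indices):
    """Drop hidden units: rows of w1/b1 (out) and columns of w2 (in)."""
    drop = set(indices)
    keep = [i for i in range(len(w1)) if i not in drop]
    new_w1 = [w1[i] for i in keep]
    new_b1 = [b1[i] for i in keep]
    new_w2 = [[row[i] for i in keep] for row in w2]
    return new_w1, new_b1, new_w2


# Toy weights: 2 inputs, 3 hidden units, 2 outputs
w1 = [[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0]]
b1 = [0.1, -0.2, 0.3]
w2 = [[1.0, -2.0, 0.5], [0.25, 0.75, -1.5]]
b2 = [0.0, 0.5]
x = [1.0, 2.0]

p_w1, p_b1, p_w2 = prune_units(w1, b1, w2, indices=[1])
pruned_out = forward(x, p_w1, p_b1, p_w2, b2)
masked_out = forward(x, w1, b1, w2, b2, mask=[1, 0, 1])
print(all(abs(a - b) < 1e-12 for a, b in zip(pruned_out, masked_out)))  # True
```

Because the pruned unit's bias is removed together with its weights, pruning is equivalent to zeroing the activation, not to zeroing the unit's inputs.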

In [5]:
%%time
# Load dataset
train_loader, val_loader, test_loader = cifar10.get_dataset_and_loaders(val_split=1000, val_batch_size=1000)
loss = cifar10.loss
input_size = (3, 32, 32)

# Print layer architecture and test performance
model, name = cifar10.get_fc_model_with_name()
model.to(device)
summary(model, input_size=input_size, device=device.type)
test(model, device, loss, test_loader);

# Define prunable layers
layers = list(model.fc.children())
prunable_layers = [
    # (module_to_prune -> [modules_for_cascading_pruning])
    (layers[1], [layers[3]]),
    (layers[3], [layers[5]]),
]

pruner = Pruner(model, input_size, device)
attribution = ShapleyAttributionMetric(model, val_loader, loss, device, sv_samples=5)

# Prune layers starting from the one closest to the output
for module, cascading_modules in prunable_layers[::-1]:
    # Compute Shapley Value attributions
    attr, _ = attribution.run([module])[0]
    # Select indices corresponding to negative attributions
    pruning_indices = np.argwhere(attr < 0).flatten()
    # Perform pruning
    pruner.prune_model(module, pruning_indices, cascading_modules=cascading_modules)
    # Report test accuracy after each pruning step
    test(model, device, loss, test_loader);

# Test final model
summary(model, input_size=input_size, device=device.type)
test(model, device, loss, test_loader);

Files already downloaded and verified
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
           Flatten-1                 [-1, 3072]               0
            Linear-2                 [-1, 2024]       6,219,752
         LeakyReLU-3                 [-1, 2024]               0
            Linear-4                 [-1, 2024]       4,098,600
         LeakyReLU-5                 [-1, 2024]               0
            Linear-6                   [-1, 10]          20,250
Total params: 10,338,602
Trainable params: 10,338,602
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.01
Forward/backward pass size (MB): 0.09
Params size (MB): 39.44
Estimated Total Size (MB): 39.54
----------------------------------------------------------------
Test set: Average loss: 2.3085, Accuracy: 1099/10000 (10.990%)

Computing Shapley values on Linear(in_features=2024, out_fea