### IMPORT
*   `import torch` imports the core PyTorch library.
*   `from torch import nn` imports the `nn` module, which contains the building blocks used to define neural networks.
*   `import torch.nn.utils.prune as prune` imports the `prune` module, which provides the pruning techniques used below.
*   `import torch.nn.functional as F` imports the functional interface, which offers a more functional approach to building neural networks: it contains functions for activations, pooling operations and loss calculations.


In [None]:
import torch
from torch import nn
import torch.nn.utils.prune as prune
import torch.nn.functional as F

### CREATE A MODEL
*   `device = torch.device("cuda" if torch.cuda.is_available() else "cpu")` sets the device to cuda if a GPU is available and to cpu otherwise.
*   `class LeNet(nn.Module):` creates the class LeNet for the LeNet architecture. It inherits from `nn.Module`, the base class provided by the `nn` module imported above.
*   `super(LeNet, self).__init__()` calls the constructor of the parent class `nn.Module` to initialize the base module.
*   `self.conv1 = nn.Conv2d(1, 6, 5)` creates a 2D convolutional layer. 1 is the no. of input channels (grayscale images). 6 is the no. of output channels, i.e. feature maps. 5 is the size of the convolutional kernel, i.e. a 5x5 square.
*   `self.conv2 = nn.Conv2d(6, 16, 5)` has 6 input channels because it takes the output of conv1 as input, so its no. of input channels must match conv1's no. of output channels. The no. of output channels is increased to 16, which allows the network to learn more complex and diverse features. The kernel size is 5, the same as conv1, i.e. 5x5.
*   `self.fc1 = nn.Linear(16*5*5,120)` is the first fully connected layer. Its input size is 400, i.e. 16*5*5: the flattened output of the previous convolutional stage (16 feature maps of size 5x5). The output size of 120 is the no. of neurons in this fully connected layer.
*   `self.fc2 = nn.Linear(120, 84)` has an input size of 120 because the output of fc1 is 120. Its output size is 84.
*   `self.fc3 = nn.Linear(84, 10)` has an input size of 84 because the output of fc2 is 84. Its output size is 10, the number of classes being used.
*   `def forward(self, x)` defines the forward pass of the LeNet model. It has to be a method of the class, at the same indentation level as `__init__`.
*   `x = F.max_pool2d(F.relu(self.conv1(x)),(2,2))` applies the first convolutional layer to extract features from the input images *(self.conv1)*, then a ReLU activation to introduce non-linearity *(F.relu)*, then max-pooling *(F.max_pool2d)* with a 2x2 window. The stride defaults to the window size, so it is 2.
*   `x = F.max_pool2d(F.relu(self.conv2(x)), 2)` mirrors the first stage using conv2. The only difference in notation is that the pooling window is given as the single number 2, which means the same as (2, 2).
*   `x = x.view(-1, int(x.nelement()/x.shape[0]))` flattens the multi-dimensional output of the convolutional layers into one 1-dimensional vector per sample. *x.view(...)* reshapes tensor x. *-1* is a placeholder that lets PyTorch infer the size of that dimension from the other dimension and the total no. of elements. *int(x.nelement()/x.shape[0])* calculates the size of the flattened dimension: *x.nelement()* gives the total no. of elements in the tensor and *x.shape[0]* gives the batch size.
*   `x = F.relu(self.fc1(x))` and `x = F.relu(self.fc2(x))` apply the 1st and 2nd fully connected layers, each followed by a ReLU activation. The fully connected layers process the flattened features to learn higher-level representations.
*   `x = self.fc3(x)` applies the last fully connected layer without an activation, producing one score per class.
*   `return x` returns the final output of the network.
*   `model = LeNet().to(device=device)` creates an instance of the class and then moves the model's parameters and buffers to the specified device.


In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class LeNet(nn.Module):
  def __init__(self): #constructor
    super(LeNet, self).__init__()
    # 1 input image channel, 6 output channels, 5x5 square conv kernel
    self.conv1 = nn.Conv2d(1, 6, 5)
    self.conv2 = nn.Conv2d(6, 16, 5)
    self.fc1 = nn.Linear(16 * 5 * 5, 120) #5x5 image dimension
    self.fc2 = nn.Linear(120, 84)
    self.fc3 = nn.Linear(84, 10)

  def forward(self, x): #method of the class, not nested inside __init__
    x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
    x = F.max_pool2d(F.relu(self.conv2(x)), 2)
    x = x.view(-1, int(x.nelement() / x.shape[0])) #flatten to (batch, features)
    x = F.relu(self.fc1(x))
    x = F.relu(self.fc2(x))
    x = self.fc3(x) #output layer, one score per class
    return x

model = LeNet().to(device = device)
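
The `16 * 5 * 5` input size of `fc1` can be checked with a little shape arithmetic. The sketch below assumes a 32x32 single-channel input (the input resolution is not stated in this notebook, so that is an assumption), plus the `nn.Conv2d` defaults of stride 1 and no padding. `conv_out` and `pool_out` are hypothetical helper names introduced only for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution (nn.Conv2d defaults: stride 1, no padding)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride=None):
    """Spatial size after max-pooling; the stride defaults to the window size."""
    stride = kernel if stride is None else stride
    return (size - kernel) // stride + 1


size = 32                    # ASSUMPTION: 32x32 input images
size = conv_out(size, 5)     # conv1, 5x5 kernel -> 28
size = pool_out(size, 2)     # 2x2 max-pool      -> 14
size = conv_out(size, 5)     # conv2, 5x5 kernel -> 10
size = pool_out(size, 2)     # 2x2 max-pool      -> 5

flat_features = 16 * size * size  # 16 feature maps of size 5x5
print(size, flat_features)        # 5 400
```

By the same arithmetic a 28x28 input would end at 4x4, so `fc1` would need `16 * 4 * 4` inputs instead.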

### INSPECTING MODEL
*   `module = model.conv1` accesses the *conv1* layer, which is the layer we are going to prune.
*   `print(list(module.named_parameters()))` prints the trainable parameters of the *conv1* layer. It iterates through the module and returns each parameter's name along with its tensor.
*   `print(list(module.named_buffers()))` prints the buffers, i.e. tensors that are part of the module's state but are not learnable parameters. An unpruned *conv1* has no buffers, so this list is empty.


In [None]:
module = model.conv1
print(list(module.named_parameters()))
print(list(module.named_buffers()))
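
Printing the full tensors can be noisy. As a minimal sketch, the same inspection can report only names and shapes. It uses a fresh stand-in layer (`conv`) with the same constructor arguments as *conv1* rather than the notebook's model, so it runs on its own.

```python
import torch
from torch import nn

# Stand-in for model.conv1 (same constructor arguments), so the sketch is self-contained.
conv = nn.Conv2d(1, 6, 5)

# Names and shapes of the trainable parameters.
param_shapes = {name: tuple(p.shape) for name, p in conv.named_parameters()}
# Names of the buffers; an unpruned convolution has none.
buffer_names = [name for name, _ in conv.named_buffers()]

print(param_shapes)  # {'weight': (6, 1, 5, 5), 'bias': (6,)}
print(buffer_names)  # []
```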

### PRUNING MODULE
*   `prune.random_unstructured` is a pruning method that removes randomly chosen entries of the specified tensor, without following any specific pattern.
*   `module` is the conv1 layer that we selected before.
*   `name = "weight"` is the parameter within that module that is going to be pruned.
*   `amount = 0.3` sets how much to prune. A float between 0 and 1 is the fraction of connections to prune (here 30%). A non-negative integer is the absolute number of connections to prune.
*   `print(list(module.named_parameters()))` pruning removes *weight* from the parameters and replaces it with a new parameter called *weight_orig* that stores the unpruned version of the tensor. *bias* has not been pruned yet, so it is unchanged.
*   `print(list(module.named_buffers()))` the pruning mask generated by the pruning technique selected above is saved as a module buffer named *weight_mask*.
*   `print(module.weight)` for the forward pass to work without modification, the *weight* attribute has to exist. The pruned version of the weight (*weight_orig* multiplied by *weight_mask*) is stored in the attribute *weight*, which is now a plain attribute rather than a parameter.
*   `print(module._forward_pre_hooks)` pruning is applied before each forward pass using PyTorch's *forward_pre_hooks*. When a module is pruned, it acquires one *forward_pre_hook* for each of its parameters that gets pruned.
*   `prune.l1_unstructured(module, name = "bias", amount = 3)` *l1_unstructured* also prunes individual entries of the specified tensor without any specific pattern, but it selects them by L1 norm: entries with the smallest absolute value are assumed to be the least important and can be pruned without significantly impacting the model's performance. Because *amount* is the integer 3 here, exactly the 3 smallest bias entries are pruned. We prune *bias* to see how the parameters, buffers, hooks and attributes of the module change.
*   `print(list(module.named_parameters()))` and `print(list(module.named_buffers()))` the parameters now include both *weight_orig* and *bias_orig*, and the buffers include *weight_mask* and *bias_mask*. `print(module.bias)` shows the pruned bias, and the module now has 2 *forward_pre_hooks*.



In [None]:
prune.random_unstructured(module, name = "weight", amount = 0.3)
print(list(module.named_parameters()))

In [None]:
print(list(module.named_buffers()))

In [None]:
print(module.weight)

In [None]:
print(module._forward_pre_hooks)

In [None]:
prune.l1_unstructured(module, name = "bias", amount = 3)
print(list(module.named_parameters()))

In [None]:
print(list(module.named_buffers()))

In [None]:
print(module.bias)

In [None]:
print(module._forward_pre_hooks)
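
The re-parametrization described above can be verified directly. The following is a minimal sketch on a fresh stand-in layer (`conv`, not the notebook's `module`), checking that *weight* equals *weight_orig* times *weight_mask* and that a float `amount` is a fraction while an integer `amount` is an absolute count.

```python
import torch
from torch import nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
conv = nn.Conv2d(1, 6, 5)  # stand-in layer: 6*1*5*5 = 150 weights, 6 biases

# Float amount: prune 30% of the 150 weights at random.
prune.random_unstructured(conv, name="weight", amount=0.3)
weight_matches = torch.equal(conv.weight, conv.weight_orig * conv.weight_mask)
zeros_in_weight_mask = int((conv.weight_mask == 0).sum())

# Integer amount: prune exactly the 3 bias entries with the smallest absolute value.
prune.l1_unstructured(conv, name="bias", amount=3)
zeros_in_bias_mask = int((conv.bias_mask == 0).sum())

param_names = sorted(name for name, _ in conv.named_parameters())
buffer_names = sorted(name for name, _ in conv.named_buffers())
num_hooks = len(conv._forward_pre_hooks)

print(weight_matches)        # True
print(zeros_in_weight_mask)  # 45
print(zeros_in_bias_mask)    # 3
print(param_names)           # ['bias_orig', 'weight_orig']
print(buffer_names)          # ['bias_mask', 'weight_mask']
print(num_hooks)             # 2
```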

### ITERATIVE PRUNING
*   `prune.ln_structured` removes entire channels of the weight tensor (whole slices along one dimension) instead of individual entries.
*   `name = "weight"` specifies the parameter, i.e. the weight tensor, that has to be pruned.
*   `amount = 0.5` prunes 50% of the channels along the chosen dimension.
*   `n = 2` is the order of the norm used to rank the channels: with n = 2 the channels with the smallest L2 norm are pruned.
*   `dim = 0` means that pruning is applied along the 1st dimension of the weight tensor, which for a convolutional layer corresponds to the output channels.
*   `print(module.weight)` prints the pruned version of the weight stored in this attribute. Since this weight was already pruned above, the new mask is combined with the previous one.
*   `for hook in module._forward_pre_hooks.values():` iterates over the values of the *_forward_pre_hooks* dictionary, which contains the hooks registered on the module object.
*   `if hook._tensor_name == "weight":` checks whether the *_tensor_name* attribute of the current hook is equal to weight, i.e. it selects the hook associated with the weight tensor.
*   `print(list(hook))` prints the pruning history stored in the hook object. After a parameter has been pruned more than once, its hook is a container that lists the pruning methods applied to it in order.





In [None]:
prune.ln_structured(module, name = "weight", amount = 0.5, n = 2, dim = 0)
print(module.weight)

In [None]:
for hook in module._forward_pre_hooks.values():
  if hook._tensor_name == "weight": #select out correct hook
    break

print(list(hook)) #pruning history in container
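
To see what "structured" means in practice, here is a minimal sketch on a fresh, previously unpruned stand-in layer (`conv`), so that only the structured mask is in play. It checks that whole output channels are zeroed and that the pruned channels are the ones with the smallest L2 norm.

```python
import torch
from torch import nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
conv = nn.Conv2d(1, 6, 5)  # stand-in layer with 6 output channels

# Prune 50% of the output channels (dim=0), ranked by their L2 norm (n=2).
prune.ln_structured(conv, name="weight", amount=0.5, n=2, dim=0)

# One row per output channel: count how many mask entries survive in each.
kept_per_channel = conv.weight_mask.flatten(1).sum(dim=1)
pruned = kept_per_channel == 0
num_pruned_channels = int(pruned.sum())

# Every channel is either fully kept (25 entries) or fully removed (0 entries).
all_or_nothing = bool(((kept_per_channel == 0) | (kept_per_channel == 25)).all())

# The pruned channels should be those with the smallest L2 norm.
norms = conv.weight_orig.detach().flatten(1).norm(p=2, dim=1)
smallest_norms_pruned = bool(norms[pruned].max() <= norms[~pruned].min())

print(num_pruned_channels)    # 3
print(all_or_nothing)         # True
print(smallest_norms_pruned)  # True
```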


### SERIALIZING PRUNED MODEL
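*   `print(model.state_dict().keys())` all relevant tensors, including the mask buffers and the original parameters used to compute the pruned tensors, are stored in the model's *state_dict* and can therefore be serialized easily. For pruned parameters the keys are *weight_orig* and *weight_mask* (or *bias_orig* and *bias_mask*) instead of *weight* (or *bias*).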



In [None]:
print(model.state_dict().keys())
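
As a hedged sketch of what serializing and reloading could look like, the example below saves the *state_dict* of a hypothetical one-layer model (`tiny`, not the notebook's LeNet) to an in-memory buffer. Because the saved keys are *weight_orig* and *weight_mask*, the receiving model needs the same re-parametrization before loading. Here that is set up with `prune.identity`, which is one possible way to do it and an assumption of this sketch rather than something the notebook prescribes.

```python
import io

import torch
from torch import nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
tiny = nn.Sequential(nn.Linear(4, 3))  # hypothetical stand-in model
prune.l1_unstructured(tiny[0], name="weight", amount=0.5)

keys = sorted(tiny.state_dict().keys())
print(keys)  # ['0.bias', '0.weight_mask', '0.weight_orig']

# Serialize to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
torch.save(tiny.state_dict(), buffer)
buffer.seek(0)
restored_state = torch.load(buffer)

# The target model must expose weight_orig / weight_mask before loading.
fresh = nn.Sequential(nn.Linear(4, 3))
prune.identity(fresh[0], name="weight")  # adds an all-ones mask
fresh.load_state_dict(restored_state)

# The forward pre-hook recomputes weight from the loaded tensors on each call.
x = torch.randn(2, 4)
same_output = torch.allclose(tiny(x), fresh(x))
print(same_output)  # True
```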

### REMOVING PRUNING RE-PARAMETRIZATION

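> To make the pruning permanent, remove the re-parametrization in terms of *weight_orig* and *weight_mask*, and remove the *forward_pre_hook*.

*   `print(list(module.named_parameters()))` and `print(list(module.named_buffers()))` print the parameters and buffers before removal, while the re-parametrization is still in place.
*   `print(module.weight)` prints the pruned weight tensor.
*   `prune.remove(module, 'weight')` removes the re-parametrization of the parameter *weight* from the given module. It does not undo the pruning: the pruned tensor becomes the ordinary parameter *weight* again, and *weight_orig* is deleted. `print(list(module.named_parameters()))` then shows *weight* back among the parameters.
*   `print(list(module.named_buffers()))` prints the buffers to check that *weight_mask* has been removed. *bias_mask* is still listed because only the re-parametrization of *weight* was removed.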


In [None]:
print(list(module.named_parameters()))

In [None]:
print(list(module.named_buffers()))

In [None]:
print(module.weight)

In [None]:
prune.remove(module, 'weight')
print(list(module.named_parameters()))

In [None]:
print(list(module.named_buffers()))
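
The effect of `prune.remove` can be checked with a short sketch on a fresh stand-in layer (`conv`) where only the weight is pruned: the zeros stay in place, while the extra parameter, buffer and hook disappear.

```python
import torch
from torch import nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
conv = nn.Conv2d(1, 6, 5)  # stand-in layer
prune.l1_unstructured(conv, name="weight", amount=0.3)

# Snapshot of the pruned weight before the re-parametrization is removed.
pruned_weight = conv.weight.detach().clone()

prune.remove(conv, "weight")

param_names = sorted(name for name, _ in conv.named_parameters())
buffer_names = [name for name, _ in conv.named_buffers()]
num_hooks = len(conv._forward_pre_hooks)
is_parameter = isinstance(conv.weight, nn.Parameter)
zeros_preserved = torch.equal(conv.weight.detach(), pruned_weight)

print(param_names)      # ['bias', 'weight']
print(buffer_names)     # []
print(num_hooks)        # 0
print(is_parameter)     # True
print(zeros_preserved)  # True
```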

### PRUNING MULTIPLE PARAMETERS IN A MODEL
*   `new_model = LeNet()` creates a new instance of the LeNet model.
*   `for name, module in new_model.named_modules():` iterates through all modules (layers) within the model. *name* holds the name of the current module. *module* holds the actual module object.
*   `if isinstance(module, torch.nn.Conv2d):` checks whether the current module is a 2D convolutional layer.
*   `prune.l1_unstructured(module, name='weight', amount=0.2)` removes the smallest 20% of connections, i.e. weights, in the convolutional layer based on their absolute magnitude, so the connections assumed to have the least impact are removed.
*   `elif isinstance(module, torch.nn.Linear):` checks whether the current module is a fully connected linear layer.
*   `prune.l1_unstructured(module, name='weight', amount=0.4)` applies L1 unstructured pruning to the linear layer's weights, removing the smallest 40% of connections.
*   `print(dict(new_model.named_buffers()).keys())` helps verify that the pruning process has created a pruning mask for each layer. Pruning masks are binary tensors (0s and 1s) that indicate which connections are kept (1) and which are pruned (0).




In [None]:
new_model = LeNet()
for name, module in new_model.named_modules():
    # prune 20% of connections in all 2D-conv layers
    if isinstance(module, torch.nn.Conv2d):
        prune.l1_unstructured(module, name='weight', amount=0.2)
    # prune 40% of connections in all linear layers
    elif isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name='weight', amount=0.4)

print(dict(new_model.named_buffers()).keys())  # to verify that all masks exist

### GLOBAL PRUNING
*   `model = LeNet()` creates an instance of the LeNet model.
*   `parameters_to_prune=(..)` defines a tuple of (module, parameter name) pairs, one for each parameter that has to be pruned.
*   `prune.global_unstructured(..)` applies pruning globally across all the specified parameters, treating them as one pool instead of pruning each layer separately.
*   `pruning_method = prune.L1Unstructured` selects the L1 unstructured pruning method, which removes individual weights based on their absolute magnitude.
*   `amount=0.2` means 20% of the weights are pruned globally across all specified parameters.


In [None]:
model = LeNet()

parameters_to_prune = (
    (model.conv1, 'weight'),
    (model.conv2, 'weight'),
    (model.fc1, 'weight'),
    (model.fc2, 'weight'),
    (model.fc3, 'weight'),
)

prune.global_unstructured(
    parameters_to_prune,
    pruning_method = prune.L1Unstructured,
    amount = 0.2,
)

### CHECKING SPARSITY AT EACH LAYER AND GLOBALLY

> Here sparsity refers to the percentage of weights that have been set to 0. Higher sparsity means more weights have been pruned.

*   the sparsity of a layer is the no. of zero weights in its weight tensor divided by the total no. of elements in that tensor, multiplied by 100 to get a percentage.
*   the global sparsity is the sum of the no. of zero weights across all specified layers divided by the total no. of weights in these layers. With global pruning only this overall figure matches the requested 20%; the sparsity of individual layers can differ from it.





In [None]:
print(
    "Sparsity in conv1.weight: {:.2f}%".format(
        100. * float(torch.sum(model.conv1.weight == 0))
        / float(model.conv1.weight.nelement())
    )
)
print(
    "Sparsity in conv2.weight: {:.2f}%".format(
        100. * float(torch.sum(model.conv2.weight == 0))
        / float(model.conv2.weight.nelement())
    )
)
print(
    "Sparsity in fc1.weight: {:.2f}%".format(
        100. * float(torch.sum(model.fc1.weight == 0))
        / float(model.fc1.weight.nelement())
    )
)
print(
    "Sparsity in fc2.weight: {:.2f}%".format(
        100. * float(torch.sum(model.fc2.weight == 0))
        / float(model.fc2.weight.nelement())
    )
)
print(
    "Sparsity in fc3.weight: {:.2f}%".format(
        100. * float(torch.sum(model.fc3.weight == 0))
        / float(model.fc3.weight.nelement())
    )
)
print(
    "Global sparsity: {:.2f}%".format(
        100. * float(
            torch.sum(model.conv1.weight == 0)
            + torch.sum(model.conv2.weight == 0)
            + torch.sum(model.fc1.weight == 0)
            + torch.sum(model.fc2.weight == 0)
            + torch.sum(model.fc3.weight == 0)
        )
        / float(
            model.conv1.weight.nelement()
            + model.conv2.weight.nelement()
            + model.fc1.weight.nelement()
            + model.fc2.weight.nelement()
            + model.fc3.weight.nelement()
        )
    )
)
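
The repeated calculation above can be folded into a small helper. This is a sketch only: `sparsity` is a hypothetical function name, and the example uses a small stand-in model (`small`) with two linear layers instead of the notebook's LeNet so that it runs on its own.

```python
import torch
from torch import nn
import torch.nn.utils.prune as prune


def sparsity(tensors):
    """Percentage of entries equal to zero across one or more tensors."""
    zeros = sum(int((t == 0).sum()) for t in tensors)
    total = sum(t.nelement() for t in tensors)
    return 100.0 * zeros / total


torch.manual_seed(0)
# Hypothetical stand-in model: 20*10 + 10*5 = 250 weights in total.
small = nn.Sequential(nn.Linear(20, 10), nn.ReLU(), nn.Linear(10, 5))
targets = [(m, "weight") for m in small if isinstance(m, nn.Linear)]

prune.global_unstructured(
    targets,
    pruning_method=prune.L1Unstructured,
    amount=0.2,
)

# Per-layer sparsity, keyed by the layer's position in the targets list.
per_layer = {i: sparsity([m.weight]) for i, (m, _) in enumerate(targets)}
# Global sparsity over all pruned weight tensors.
overall = sparsity([m.weight for m, _ in targets])

for i, value in per_layer.items():
    print("Sparsity in layer {}: {:.2f}%".format(i, value))
print("Global sparsity: {:.2f}%".format(overall))  # Global sparsity: 20.00%
```

The per-layer values depend on the random initialization, which is why only the global figure is shown as an expected output.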