# Model Pruning
* Model pruning is a model compression technique that removes "unnecessary" weights or biases from a neural network.

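## Unstructured Pruning
* Unstructured pruning removes individual elements of a parameter tensor, such as a single weight in a fully connected layer or a single entry of a convolution kernel. The pruned elements can sit anywhere in the tensor and follow no particular structure, hence the name.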

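## Structured Pruning
* In contrast to unstructured pruning, structured pruning removes entire parameter structures. For example, it drops whole rows or columns of a weight matrix, or whole filters in a convolutional layer.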
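
The difference between the two approaches is easiest to see in the masks they produce. The following is a minimal sketch in plain PyTorch, not the `torch.nn.utils.prune` implementation. The weight matrix `w`, the 50% ratio, and the choice of magnitude and L2 norm as importance criteria are assumptions made for illustration.

```python
import torch

torch.manual_seed(0)
w = torch.randn(4, 6)  # hypothetical weight matrix: 4 rows (outputs), 6 columns (inputs)

# Unstructured: zero the 50% of individual entries with the smallest magnitude.
k = w.numel() // 2
threshold = w.abs().flatten().kthvalue(k).values
unstructured_mask = (w.abs() > threshold).float()

# Structured: zero the 2 whole rows with the smallest L2 norm.
row_norms = torch.linalg.vector_norm(w, ord=2, dim=1)
drop_rows = row_norms.argsort()[:2]
structured_mask = torch.ones_like(w)
structured_mask[drop_rows] = 0.0

print(unstructured_mask)  # zeros scattered across the matrix
print(structured_mask)    # two rows that are entirely zero
```

Both masks remove the same number of weights, but only the structured mask leaves a smaller dense matrix once the zeroed rows are dropped.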

In [33]:
# Pruning in PyTorch: local / global
# def random_unstructured(module, name, amount)
import torch  
import torch.nn as nn
import torch.nn.utils.prune as prune  

In [34]:
conv = torch.nn.Conv2d(1, 1, 4)  
prune.random_unstructured(conv, name="weight", amount=0.5)  # amount: a float in [0, 1] is the fraction to prune; an int is an absolute count
conv.weight  

tensor([[[[-0.2027,  0.1981, -0.1830, -0.0000],
          [-0.1800,  0.0947, -0.0000,  0.0000],
          [ 0.0759,  0.1382, -0.0000,  0.0000],
          [ 0.0000, -0.0000, -0.0000,  0.0378]]]], grad_fn=<MulBackward0>)
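
The `grad_fn=<MulBackward0>` in the output above hints at how the pruning is applied: the module keeps the original values and multiplies them by a binary mask. The sketch below inspects that reparametrization. The variable names `param_names`, `buffer_names`, and `recomputed` are introduced here for illustration only.

```python
import torch
import torch.nn.utils.prune as prune

conv = torch.nn.Conv2d(1, 1, 4)
prune.random_unstructured(conv, name="weight", amount=0.5)

# The original values move to the parameter `weight_orig`,
# and the binary mask is stored as the buffer `weight_mask`.
param_names = [n for n, _ in conv.named_parameters()]
buffer_names = [n for n, _ in conv.named_buffers()]
print(param_names)
print(buffer_names)

# `weight` is now a plain attribute equal to weight_orig * weight_mask.
recomputed = conv.weight_orig * conv.weight_mask
print(torch.equal(conv.weight, recomputed))
print(int(conv.weight_mask.sum()))  # number of surviving entries
```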

In [None]:
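# def random_structured(module, name, amount, dim)
# def ln_structured(module, name, amount, n, dim, importance_scores=None)
# prune.ln_structured prunes the tensor behind the parameter called `name` by removing
# the given amount of (currently unpruned) channels along `dim` that have the lowest Ln norm.
# n is the order of the norm; dim is the dimension along which channels are pruned.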
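
As a minimal sketch of the first signature, `random_structured` can be applied to a small convolution. The layer sizes and the name `filter_is_zero` are assumptions chosen for illustration; for `Conv2d`, `dim=0` indexes the output filters.

```python
import torch
import torch.nn.utils.prune as prune

conv = torch.nn.Conv2d(2, 4, 3)  # weight shape: (4, 2, 3, 3)
# amount=0.5 with dim=0: half of the 4 output filters, chosen at random.
prune.random_structured(conv, name="weight", amount=0.5, dim=0)

# A filter counts as pruned when its whole slice of the mask is zero.
filter_is_zero = conv.weight_mask.flatten(1).sum(dim=1) == 0
print(filter_is_zero)
print(int(filter_is_zero.sum()))  # number of pruned filters
```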

In [None]:
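# For torch.nn.Linear (weight shape: out_features x in_features):
# dim = 0: removes one neuron (one output row).
# dim = 1: removes all connections from one input (one column).

# For torch.nn.Conv2d (weight shape: out_channels x in_channels x kH x kW):
# dim = 0 (channels): prunes output channels, i.e. whole filters.
# dim = 1 (neurons): prunes the 2-D kernels attached to one input channel.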
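
The `dim` semantics for `torch.nn.Linear` can be checked directly. This is a small sketch under assumed layer sizes (4 inputs, 3 outputs). The names `fc_rows`, `fc_cols`, `zero_rows`, and `zero_cols` are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# nn.Linear stores its weight with shape (out_features, in_features).
fc_rows = nn.Linear(4, 3)
prune.ln_structured(fc_rows, name="weight", amount=1, n=2, dim=0)
# dim=0: one full row is zeroed, i.e. one output neuron is removed.
zero_rows = (fc_rows.weight.abs().sum(dim=1) == 0).sum().item()

fc_cols = nn.Linear(4, 3)
prune.ln_structured(fc_cols, name="weight", amount=1, n=2, dim=1)
# dim=1: one full column is zeroed, i.e. one input loses all its connections.
zero_cols = (fc_cols.weight.abs().sum(dim=0) == 0).sum().item()

print(zero_rows, zero_cols)
```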

In [None]:
# class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)  

In [29]:
conv = torch.nn.Conv2d(2, 3, 3)  
# L2 norm of each input-channel slice: reduce over every dimension except dim=1
norm1 = torch.norm(conv.weight, p=2, dim=[0,2,3])  
# print(conv.weight.shape) 
print(norm1)

tensor([0.7975, 0.7007], grad_fn=<NormBackward1>)


In [30]:
# amount=1 is an int, so exactly one channel along dim=1 is pruned (a float 1.0 would mean 100%)
prune.ln_structured(conv, name="weight", amount=1, n=2, dim=1)  
print(conv.weight)  

tensor([[[[-0.1528, -0.0951, -0.2319],
          [-0.2225, -0.1376,  0.1903],
          [ 0.1235,  0.1266, -0.0169]],

         [[ 0.0000,  0.0000, -0.0000],
          [-0.0000, -0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000]]],


        [[[ 0.0142,  0.0704, -0.1438],
          [ 0.1394,  0.1435,  0.2115],
          [ 0.2095, -0.1222, -0.1499]],

         [[-0.0000, -0.0000, -0.0000],
          [ 0.0000, -0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000]]],


        [[[ 0.1998, -0.2259, -0.1784],
          [-0.0284,  0.0420,  0.2357],
          [ 0.1729, -0.1041,  0.0025]],

         [[ 0.0000, -0.0000, -0.0000],
          [ 0.0000,  0.0000, -0.0000],
          [ 0.0000,  0.0000,  0.0000]]]], grad_fn=<MulBackward0>)
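
In the run above, the zeroed slice is the second input channel, which is the one with the smaller norm (0.7007). The following sketch repeats that check programmatically on a freshly initialized layer. The names `norms`, `expected`, and `pruned` are introduced here for illustration.

```python
import torch
import torch.nn.utils.prune as prune

conv = torch.nn.Conv2d(2, 3, 3)
# L2 norm of each input-channel slice, computed before pruning.
norms = torch.linalg.vector_norm(conv.weight.detach(), ord=2, dim=(0, 2, 3))
expected = int(norms.argmin())

prune.ln_structured(conv, name="weight", amount=1, n=2, dim=1)

# weight_mask is all zeros on the pruned slice and all ones elsewhere.
pruned = [c for c in range(conv.weight_mask.shape[1])
          if conv.weight_mask[:, c].sum() == 0]
print(expected, pruned)
```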


In [42]:
# Global unstructured pruning
# def global_unstructured(parameters, pruning_method, **kwargs)  
import torch.nn.functional as F  # used by LeNet.forward below

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  
  
class LeNet(nn.Module):  
    def __init__(self):  
        super(LeNet, self).__init__()  
        # 1 input image channel, 6 output channels, 3x3 square conv kernel  
        self.conv1 = nn.Conv2d(1, 6, 3)  
        self.conv2 = nn.Conv2d(6, 16, 3)  
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5x5 image dimension  
        self.fc2 = nn.Linear(120, 84)  
        self.fc3 = nn.Linear(84, 10)  
  
    def forward(self, x):  
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))  
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  
        x = x.view(-1, int(x.nelement() / x.shape[0]))  
        x = F.relu(self.fc1(x))  
        x = F.relu(self.fc2(x))  
        x = self.fc3(x)  
        return x  
    
model = LeNet().to(device=device)  
print(model)

LeNet(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)


In [43]:
parameters_to_prune = (  
    (model.conv1, 'weight'),  
    (model.conv2, 'weight'),  
    (model.fc1, 'weight'),  
    (model.fc2, 'weight'),  
    (model.fc3, 'weight'),  
)  
  
prune.global_unstructured(  
    parameters_to_prune,  
    pruning_method=prune.L1Unstructured,  
    amount=0.2,  
)  
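# Compute the sparsity of conv1 and of the whole model.  
# nelement() is an alias of Tensor.numel(): it returns the total number of elements in the tensor.  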
print(  
    "Sparsity in conv1.weight: {:.2f}%".format(  
        100. * float(torch.sum(model.conv1.weight == 0))  
        / float(model.conv1.weight.nelement())  
    )  
)  
print(  
    "Global sparsity: {:.2f}%".format(  
        100. * float(  
            torch.sum(model.conv1.weight == 0)  
            + torch.sum(model.conv2.weight == 0)  
            + torch.sum(model.fc1.weight == 0)  
            + torch.sum(model.fc2.weight == 0)  
            + torch.sum(model.fc3.weight == 0)  
        )  
        / float(  
            model.conv1.weight.nelement()  
            + model.conv2.weight.nelement()  
            + model.fc1.weight.nelement()  
            + model.fc2.weight.nelement()  
            + model.fc3.weight.nelement()  
        )  
    )  
)  

Sparsity in conv1.weight: 7.41%
Global sparsity: 20.00%


In [None]:
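# The local pruning shown earlier targets one specific layer, whereas global pruning treats the model
# as a whole and removes a given fraction (or number) of parameters across all listed layers.
# As a consequence, global pruning generally leaves each layer with a different sparsity.
# The output confirms this: the model-wide (global) sparsity is 20%, but conv1 ends up at only 7.41%.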
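
A short loop makes the per-layer differences visible without spelling out every layer by hand. The sketch below uses a small hypothetical `nn.Sequential` network as a stand-in for the LeNet above. The names `net`, `targets`, and `sparsity` are assumptions introduced for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
# Hypothetical stand-in for the LeNet above, kept small for brevity.
net = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 5))
targets = [(m, "weight") for m in net if isinstance(m, nn.Linear)]

prune.global_unstructured(targets, pruning_method=prune.L1Unstructured, amount=0.2)

def sparsity(t):
    """Fraction of entries in t that are exactly zero."""
    return float((t == 0).sum()) / t.numel()

zeros = total = 0
for module, name in targets:
    w = getattr(module, name)
    zeros += int((w == 0).sum())
    total += w.numel()
    print(f"{module}: {100 * sparsity(w):.2f}%")
print(f"global: {100 * zeros / total:.2f}%")
```

The same loop works on the LeNet model if `targets` is replaced with the `parameters_to_prune` tuple.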

In [None]:
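# torch.nn.utils.prune.is_pruned(module): checks whether the module (or any submodule) has a pruning hook attached
# torch.nn.utils.prune.remove(module, name): makes the pruning permanent and removes the name+'_orig' parameter and name+'_mask' buffer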
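
A minimal sketch of how the two helpers fit together, assuming the same small `Conv2d` layer used earlier. The name `names` is introduced for illustration.

```python
import torch
import torch.nn.utils.prune as prune

conv = torch.nn.Conv2d(1, 1, 4)
print(prune.is_pruned(conv))  # False before any pruning

prune.random_unstructured(conv, name="weight", amount=0.5)
print(prune.is_pruned(conv))  # True: a pruning forward pre-hook is attached

# remove() bakes the mask into `weight` and drops the reparametrization.
prune.remove(conv, "weight")
names = [n for n, _ in conv.named_parameters()]
print(names)                          # `weight` is an ordinary parameter again
print(prune.is_pruned(conv))          # the hook is gone
print(int((conv.weight == 0).sum()))  # the zeros stay in place
```

After `remove`, the zeros are ordinary parameter values with no mask attached, so further training can move them away from zero unless they are masked again.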