# 4-3 nn.functional and nn.Module

## 1. nn.functional and nn.Module

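The previous sections introduced PyTorch's low-level API, which can be used to build the components of a neural network.

Most of PyTorch's neural-network components are packaged under the torch.nn module.

The vast majority of these components are available both as functions and as classes.

nn.functional (usually imported as F) provides the function implementations.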

Activation functions:
* F.relu
* F.sigmoid
* F.tanh
* F.softmax

Model layers:
* F.linear
* F.conv2d
* F.max_pool2d
* F.dropout2d
* F.embedding

Loss functions:
* F.binary_cross_entropy
* F.mse_loss
* F.cross_entropy
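
As a minimal sketch of how these functional forms are called, the snippet below chains F.linear, F.relu and F.mse_loss by hand. The tensor shapes and the names `x`, `weight`, `bias` and `target` are made up for illustration; note that the caller has to create and pass the weights explicitly.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

x = torch.randn(4, 3)        # batch of 4 samples with 3 features each (illustrative)
weight = torch.randn(2, 3)   # F.linear expects shape (out_features, in_features)
bias = torch.zeros(2)
target = torch.zeros(4, 2)

# F.linear computes x @ weight.T + bias; F.relu clamps negatives to zero
hidden = F.relu(F.linear(x, weight, bias))

# F.mse_loss averages the squared error over all elements and returns a scalar
loss = F.mse_loss(hidden, target)

print(hidden.shape)  # torch.Size([4, 2])
print(loss.dim())    # 0
```

The functions hold no state of their own, which is why every weight appears as an argument.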

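For easier management, these functions are usually wrapped as classes that inherit from nn.Module and are exposed directly under the nn module.

Activation functions: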
* nn.ReLU
* nn.Sigmoid
* nn.Tanh
* nn.Softmax

Model layers:
* nn.Linear
* nn.Conv2d
* nn.MaxPool2d
* nn.Dropout2d
* nn.Embedding

Loss functions:
* nn.BCELoss
* nn.MSELoss
* nn.CrossEntropyLoss
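
To make the relationship between the two forms concrete, here is a small sketch comparing nn.Linear and nn.ReLU with F.linear and F.relu on the same input. The shapes are arbitrary and chosen only for illustration.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 3)

linear = nn.Linear(3, 2)  # the class form creates and stores its own weight and bias
relu = nn.ReLU()          # holds no parameters, but can be used as a layer object

out_class = relu(linear(x))

# The functional form needs the same parameters passed in explicitly
out_func = F.relu(F.linear(x, linear.weight, linear.bias))

print(torch.allclose(out_class, out_func))  # True
```

The class form is convenient when a layer owns parameters; the functional form is handy for stateless operations inside a forward method.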

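## 2. Managing parameters with nn.Module
In PyTorch, model parameters are trained by an optimizer, so they are usually tensors created with requires_grad=True.

A model typically has many parameters, and managing them all by hand is not easy.

PyTorch usually represents parameters with nn.Parameter.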

In [1]:
import torch
from torch import nn
import torch.nn.functional as F
from matplotlib import pyplot as plt

In [2]:
w = nn.Parameter(torch.randn(2, 2))
print(w)
print(w.requires_grad)

Parameter containing:
tensor([[ 0.3199,  0.3305],
        [-0.5942,  0.0669]], requires_grad=True)
True
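
One practical difference between nn.Parameter and an ordinary tensor shows up when both are attached to a module. The sketch below, with the hypothetical attribute names `plain` and `w`, suggests that only the nn.Parameter is registered and returned by named_parameters().

```python
import torch
from torch import nn

module = nn.Module()

# A plain tensor attribute, even with requires_grad=True, is not registered
module.plain = torch.randn(2, 2, requires_grad=True)

# An nn.Parameter attribute is registered automatically on assignment
module.w = nn.Parameter(torch.randn(2, 2))

names = [name for name, _ in module.named_parameters()]
print(names)                                # ['w']
print(isinstance(module.w, torch.Tensor))   # True, Parameter is a Tensor subclass
```

An optimizer built from module.parameters() would therefore never update `plain`.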


In [3]:
# nn.ParameterList groups several nn.Parameter objects into a list
params_list = nn.ParameterList([nn.Parameter(torch.rand(8, i)) for i in range(1, 3)])
print(params_list)
print(params_list[0].requires_grad)

ParameterList(
    (0): Parameter containing: [torch.FloatTensor of size 8x1]
    (1): Parameter containing: [torch.FloatTensor of size 8x2]
)
True


In [4]:
# nn.ParameterDict groups several nn.Parameter objects into a dictionary
params_dict = nn.ParameterDict({"a": nn.Parameter(torch.rand(2, 2)), "b": nn.Parameter(torch.zeros(2))})
print(params_dict)
print(params_dict["a"].requires_grad)

ParameterDict(
    (a): Parameter containing: [torch.FloatTensor of size 2x2]
    (b): Parameter containing: [torch.FloatTensor of size 2]
)
True


In [5]:
# A Module can be used to manage all of them together
module = nn.Module()
module.w = w
module.params_list = params_list
module.params_dict = params_dict

num_param = 0
for param in module.parameters():
    print(param, "\n")
    num_param += 1
print("number of Parameters = ", num_param)

Parameter containing:
tensor([[ 0.3199,  0.3305],
        [-0.5942,  0.0669]], requires_grad=True) 

Parameter containing:
tensor([[0.8650],
        [0.6233],
        [0.4305],
        [0.7132],
        [0.7377],
        [0.5824],
        [0.9581],
        [0.9803]], requires_grad=True) 

Parameter containing:
tensor([[0.9732, 0.6801],
        [0.7689, 0.0200],
        [0.5271, 0.5367],
        [0.0412, 0.6533],
        [0.6310, 0.8141],
        [0.1876, 0.5978],
        [0.3673, 0.5978],
        [0.7242, 0.0620]], requires_grad=True) 

Parameter containing:
tensor([[0.2918, 0.9623],
        [0.5477, 0.3819]], requires_grad=True) 

Parameter containing:
tensor([0., 0.], requires_grad=True) 

number of Parameters =  5
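
When debugging, the registered names are often more useful than the raw values. As a sketch under the same setup as the cell above (the module is rebuilt here so the snippet runs on its own), named_parameters() yields each parameter with a dotted name derived from the attribute it was assigned to.

```python
import torch
from torch import nn

module = nn.Module()
module.w = nn.Parameter(torch.randn(2, 2))
module.params_list = nn.ParameterList(
    [nn.Parameter(torch.rand(8, i)) for i in range(1, 3)]
)
module.params_dict = nn.ParameterDict(
    {"a": nn.Parameter(torch.rand(2, 2)), "b": nn.Parameter(torch.zeros(2))}
)

# Names are prefixed with the attribute that holds the container
names = [name for name, _ in module.named_parameters()]
print(names)
# ['w', 'params_list.0', 'params_list.1', 'params_dict.a', 'params_dict.b']

# Total number of scalar values: 4 + 8 + 16 + 4 + 2
total = sum(p.numel() for p in module.parameters())
print(total)  # 34
```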


In [6]:
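# In practice, a module class is usually built by subclassing nn.Module,
# with all learnable parameters defined in the constructor.

# Below is a simplified version of the nn.Linear source code in PyTorch.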

class Linear(nn.Module):
    __constants__ = ["in_features", "out_features"]

    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
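        # torch.empty allocates uninitialized memory; the real nn.Linear then calls
        # reset_parameters() to initialize weight and bias (omitted in this simplified version).
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        if bias:
            self.bias = nn.Parameter(torch.empty(out_features))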
        else:
            self.register_parameter("bias", None)

    def forward(self, x):
        return F.linear(x, self.weight, self.bias)
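
The following sketch shows one way such a module might be used end to end. `TinyLinear` is a hypothetical name that mirrors the simplified class above, and the Xavier initialization, the toy data and the learning rate are assumptions made for illustration, not what nn.Linear itself does.

```python
import torch
from torch import nn
import torch.nn.functional as F


class TinyLinear(nn.Module):  # hypothetical class mirroring the simplified Linear
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)  # assumed init scheme, for illustration
        if bias:
            self.bias = nn.Parameter(torch.zeros(out_features))
        else:
            # Registers the name "bias" with no tensor behind it
            self.register_parameter("bias", None)

    def forward(self, x):
        return F.linear(x, self.weight, self.bias)


torch.manual_seed(0)
layer = TinyLinear(3, 1)

# parameters() hands every registered parameter to the optimizer
optimizer = torch.optim.SGD(layer.parameters(), lr=0.05)

x = torch.randn(16, 3)
y = x.sum(dim=1, keepdim=True)  # toy regression target

loss_before = F.mse_loss(layer(x), y).item()
for _ in range(50):
    optimizer.zero_grad()
    loss = F.mse_loss(layer(x), y)
    loss.backward()
    optimizer.step()
loss_after = F.mse_loss(layer(x), y).item()
print(loss_before, loss_after)

# A parameter registered as None is skipped when iterating
no_bias_names = [n for n, _ in TinyLinear(3, 1, bias=False).named_parameters()]
print(no_bias_names)  # ['weight']
```

Because the parameters are registered on the module, the training loop never has to mention `weight` or `bias` by name.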

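## 3. Managing submodules with nn.Module
In practice, we rarely build a model by defining parameters directly with nn.Parameter; instead, we assemble it from commonly used layers.

These layers are themselves objects that inherit from nn.Module and hold their own parameters, so they become submodules of the module we define.

nn.Module provides the following methods for managing these submodules:
* children() returns a generator over the direct child modules of the module
* named_children() returns a generator over the direct child modules together with their names
* modules() returns a generator over the modules at every level, including the module itself
* named_modules() returns a generator over the modules at every level together with their names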
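
The difference between the direct-children and all-levels methods is easiest to see on a small nested model. The two-level nn.Sequential below is a made-up example, not the network used in the rest of this section.

```python
from torch import nn

model = nn.Sequential(
    nn.Linear(4, 8),
    nn.Sequential(nn.ReLU(), nn.Linear(8, 1)),  # nested container
)

# children() only walks one level down
n_children = len(list(model.children()))
print(n_children)  # 2

# modules() walks every level and starts with the model itself
n_modules = len(list(model.modules()))
print(n_modules)  # 5

# named_modules() uses dotted paths; the root module has the empty name
module_names = [name for name, _ in model.named_modules()]
print(module_names)  # ['', '0', '1', '1.0', '1.1']
```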

In [7]:
class Net(nn.Module):
    
    def __init__(self):
        super(Net, self).__init__()
        
        self.embedding = nn.Embedding(num_embeddings = 10000,embedding_dim = 3,padding_idx = 1)
        self.conv = nn.Sequential()
        self.conv.add_module("conv_1",nn.Conv1d(in_channels = 3,out_channels = 16,kernel_size = 5))
        self.conv.add_module("pool_1",nn.MaxPool1d(kernel_size = 2))
        self.conv.add_module("relu_1",nn.ReLU())
        self.conv.add_module("conv_2",nn.Conv1d(in_channels = 16,out_channels = 128,kernel_size = 2))
        self.conv.add_module("pool_2",nn.MaxPool1d(kernel_size = 2))
        self.conv.add_module("relu_2",nn.ReLU())
        
        self.dense = nn.Sequential()
        self.dense.add_module("flatten",nn.Flatten())
        self.dense.add_module("linear",nn.Linear(6144,1))
        self.dense.add_module("sigmoid",nn.Sigmoid())
        
    def forward(self,x):
        x = self.embedding(x).transpose(1,2)
        x = self.conv(x)
        y = self.dense(x)
        return y
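
The in_features value 6144 of the final linear layer follows from the layer sizes. As a worked check, assuming an input sequence length of 200 (the value passed to summary later in this section), the hypothetical helper `out_len` below applies the standard output-length formula shared by Conv1d and MaxPool1d (with ceil_mode=False).

```python
def out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution or max-pooling layer (floor mode)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


seq_len = 200                               # assumed input length
n = out_len(seq_len, kernel_size=5)         # conv_1: 200 -> 196
n = out_len(n, kernel_size=2, stride=2)     # pool_1: 196 -> 98
n = out_len(n, kernel_size=2)               # conv_2: 98 -> 97
n = out_len(n, kernel_size=2, stride=2)     # pool_2: 97 -> 48

flat_features = 128 * n                     # 128 output channels after conv_2
print(n, flat_features)  # 48 6144
```

With a different sequence length, the first argument of nn.Linear would have to change accordingly.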

In [9]:
net = Net()
for child in net.children():
    print(child, "\n")

Embedding(10000, 3, padding_idx=1) 

Sequential(
  (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_1): ReLU()
  (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_2): ReLU()
) 

Sequential(
  (flatten): Flatten()
  (linear): Linear(in_features=6144, out_features=1, bias=True)
  (sigmoid): Sigmoid()
) 



In [10]:
for name, child in net.named_children():
    print(name, ":", child, "\n")

embedding : Embedding(10000, 3, padding_idx=1) 

conv : Sequential(
  (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_1): ReLU()
  (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_2): ReLU()
) 

dense : Sequential(
  (flatten): Flatten()
  (linear): Linear(in_features=6144, out_features=1, bias=True)
  (sigmoid): Sigmoid()
) 



In [11]:
for module in net.modules():
    print(module, "\n")

Net(
  (embedding): Embedding(10000, 3, padding_idx=1)
  (conv): Sequential(
    (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
    (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (relu_1): ReLU()
    (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
    (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (relu_2): ReLU()
  )
  (dense): Sequential(
    (flatten): Flatten()
    (linear): Linear(in_features=6144, out_features=1, bias=True)
    (sigmoid): Sigmoid()
  )
) 

Embedding(10000, 3, padding_idx=1) 

Sequential(
  (conv_1): Conv1d(3, 16, kernel_size=(5,), stride=(1,))
  (pool_1): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_1): ReLU()
  (conv_2): Conv1d(16, 128, kernel_size=(2,), stride=(1,))
  (pool_2): MaxPool1d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (relu_2): ReLU()
) 

Conv1d(3, 16, kernel_size=(5,), stride=(1,)) 

...

In [12]:
children_dict = {name: module for name, module in net.named_children()}

embedding = children_dict["embedding"]
embedding.requires_grad_(False)  # freeze the parameters

Embedding(10000, 3, padding_idx=1)

In [13]:
# The parameters of the first layer (the embedding) can no longer be trained.
for param in embedding.parameters():
    print(param.requires_grad)
    print(param.numel())

False
30000
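
A common follow-up to freezing a layer is to hand only the remaining trainable parameters to the optimizer. The sketch below uses a small stand-in model (its sizes and the choice of Adam are assumptions for illustration) and also checks that the frozen weights receive no gradient.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Small stand-in model: 5 tokens per sample, 3-dimensional embeddings
model = nn.Sequential(
    nn.Embedding(num_embeddings=100, embedding_dim=3),
    nn.Flatten(),
    nn.Linear(3 * 5, 1),
)

model[0].requires_grad_(False)  # freeze the embedding layer in place

trainable = [p for p in model.parameters() if p.requires_grad]
frozen = [p for p in model.parameters() if not p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

n_trainable = sum(p.numel() for p in trainable)
n_frozen = sum(p.numel() for p in frozen)
print(n_trainable, n_frozen)  # 16 300

# After a backward pass, frozen parameters still have no gradient
tokens = torch.randint(0, 100, (2, 5))
model(tokens).sum().backward()
print(model[0].weight.grad is None)  # True
```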


In [15]:
from torchkeras import summary
summary(net, input_shape=(200,), input_dtype=torch.LongTensor)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
         Embedding-1               [-1, 200, 3]          30,000
            Conv1d-2              [-1, 16, 196]             256
         MaxPool1d-3               [-1, 16, 98]               0
              ReLU-4               [-1, 16, 98]               0
            Conv1d-5              [-1, 128, 97]           4,224
         MaxPool1d-6              [-1, 128, 48]               0
              ReLU-7              [-1, 128, 48]               0
           Flatten-8                 [-1, 6144]               0
            Linear-9                    [-1, 1]           6,145
          Sigmoid-10                    [-1, 1]               0
Total params: 40,625
Trainable params: 10,625
Non-trainable params: 30,000
----------------------------------------------------------------
Input size (MB): 0.000763
Forward/backward pass size (MB): 0.287796
Params size (MB): 0.154
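
The parameter counts in the summary can be reproduced without torchkeras. The sketch below rebuilds the same layers in an nn.ModuleDict (a compact stand-in for the Net class above, without its forward method), freezes the embedding, and counts elements with numel().

```python
from torch import nn

layers = nn.ModuleDict({
    "embedding": nn.Embedding(num_embeddings=10000, embedding_dim=3, padding_idx=1),
    "conv": nn.Sequential(
        nn.Conv1d(in_channels=3, out_channels=16, kernel_size=5),
        nn.MaxPool1d(kernel_size=2),
        nn.ReLU(),
        nn.Conv1d(in_channels=16, out_channels=128, kernel_size=2),
        nn.MaxPool1d(kernel_size=2),
        nn.ReLU(),
    ),
    "dense": nn.Sequential(nn.Flatten(), nn.Linear(6144, 1), nn.Sigmoid()),
})

layers["embedding"].requires_grad_(False)  # same freezing step as above

total = sum(p.numel() for p in layers.parameters())
trainable = sum(p.numel() for p in layers.parameters() if p.requires_grad)
print(total, trainable, total - trainable)  # 40625 10625 30000

# Per-submodule breakdown, using named_children()
per_child = {
    name: sum(p.numel() for p in child.parameters())
    for name, child in layers.named_children()
}
print(per_child)  # {'embedding': 30000, 'conv': 4480, 'dense': 6145}
```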