# Build model
## What is a module in PyTorch, and which module types does PyTorch provide?
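· A module is the basic building block of a neural network. PyTorch provides a library of modules and also supports custom modules, which makes it easy to build multi-layer neural networks. Concretely, the <font color=green>**namespace**</font> **torch.nn** provides three main kinds of modules (layers, containers, and utilities), plus nn.Parameter, a Tensor subclass used for module parameters.
1. <font color=lightblue>**Layers:**</font> a NN operates on data through its layers. PyTorch represents these layers as modules, e.g. convolution, affine (linear), pooling, normalization, transformer, and loss functions.
2. <font color=lightblue>**Containers:**</font> there are three kinds of containers: nn.Module, nn.Sequential, and holders of submodules.
(1) **torch.nn.Module**: the base class for all NN modules; every module in PyTorch is a subclass of **nn.Module**.\
(2) **torch.nn.Sequential**: chains one or more modules in order. Since a Sequential is itself a module, it shows that modules are nestable.\
(3) Holders of submodules: **nn.ModuleList** and **nn.ModuleDict** store modules in list and dictionary form, respectively; **nn.ParameterList** and **nn.ParameterDict** store parameters in list and dictionary form, respectively.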
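3. <font color=lightblue>**Utilities:**</font> express some data-processing operations as modules (e.g. nn.Flatten, used below); related helper functions, such as gradient clipping, live in torch.nn.utils.<font color=red>【details to be added after using them???】</font>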
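A minimal sketch of why the submodule holders matter, using only standard torch.nn classes: modules stored in nn.ModuleList / nn.ModuleDict, and tensors stored in nn.ParameterList, are registered with the parent module, whereas modules kept in a plain Python list are invisible to parameters() (and therefore to the optimizer). The class names `WithModuleList` and `WithPlainList` are hypothetical, chosen for illustration.

```python
import torch
import torch.nn as nn

class WithModuleList(nn.Module):  # hypothetical example class
    def __init__(self):
        super().__init__()
        # Submodules in nn.ModuleList / nn.ModuleDict are registered with the parent
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])
        self.heads = nn.ModuleDict({"cls": nn.Linear(4, 2), "reg": nn.Linear(4, 1)})
        # Raw parameters in nn.ParameterList are registered as well
        self.scales = nn.ParameterList([nn.Parameter(torch.ones(4)) for _ in range(2)])

    def forward(self, x):
        for layer in self.layers:          # ModuleList is iterable like a list
            x = torch.relu(layer(x))
        x = x * self.scales[0]             # ParameterList supports indexing
        return self.heads["cls"](x), self.heads["reg"](x)  # ModuleDict is indexed by key

class WithPlainList(nn.Module):  # hypothetical counter-example
    def __init__(self):
        super().__init__()
        # A plain Python list is NOT registered: these layers are hidden from parameters()
        self.layers = [nn.Linear(4, 4) for _ in range(3)]

good = WithModuleList()
bad = WithPlainList()
n_good = sum(p.numel() for p in good.parameters())
n_bad = sum(p.numel() for p in bad.parameters())
print(n_good, n_bad)  # 83 0
```

Here 83 = 3·(16+4) for the ModuleList, plus (8+2) + (4+1) for the two heads, plus 2·4 for the ParameterList.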

## Characteristics of modules
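1. Modules work together with the autograd system: parameters registered on a module are tracked by autograd, so their gradients are computed automatically and an optimizer can update them directly via model.parameters()<font color=red>【understand???】</font>
2. Modules in PyTorch can be nested: every neural network model is itself a module, composed of other modules (layers). This nested structure makes it easy to build complex network architectures.<font color=red>【understand???】</font>
3. Subclasses of **nn.Module** automatically track their parameters, which can be inspected with two methods: parameters() and named_parameters()
4. Modules are easy to work with and transform: saving and restoring them is straightforward, and moving them between CPU and GPU, pruning, quantizing, and many other operations are convenient.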
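The points above can be sketched in one small run, assuming a toy regression setup (random inputs/targets, MSE loss, plain SGD; none of this is from the original notes): a nested nn.Sequential tracks the parameters of its submodules, those parameters go straight into the optimizer, autograd fills their gradients, and the whole model round-trips through `state_dict()`. An in-memory `io.BytesIO` buffer stands in for a file path.

```python
import io
import torch
import torch.nn as nn

torch.manual_seed(0)

# Nesting: a Sequential (itself a module) built from other modules
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Parameters of every submodule are tracked automatically
for name, p in model.named_parameters():
    print(name, tuple(p.shape))

# The tracked parameters are handed straight to the optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(16, 4), torch.randn(16, 1)   # toy data (assumption)
before = model[0].weight.detach().clone()

loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()   # autograd fills p.grad for every tracked parameter
optimizer.step()  # plain SGD: p <- p - lr * p.grad
changed = not torch.equal(before, model[0].weight)

# Save and restore through state_dict (an in-memory buffer stands in for a file)
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)
restored = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
restored.load_state_dict(torch.load(buffer))
```

Saving the `state_dict()` rather than the whole model object is the common choice: it stores only tensors keyed by parameter name, and the architecture is rebuilt in code before loading.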

```python
import os
import torch
import torch.nn as nn           # for torch.nn.Module
import torch.nn.functional as F # for the activation functions
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = ("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using {device} device")
```

```
Using cuda device
```


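## Defining a NN
1. A custom model must also be defined as a subclass of **nn.Module**.
2. Each subclass defines `__init__` (where layers and parameters are created, after calling `super().__init__()`) and `forward`. All operations the model applies to the input data go in `forward`; calling `model(x)` runs it.

```python
# Define a simple custom Module
class MyLinear(nn.Module):  # must be a subclass of nn.Module
    def __init__(self, in_features, out_features):
        super().__init__()

        # Define parameters as nn.Parameter instances; autograd then tracks them automatically,
        # and the optimizer updates them on each iteration
        self.weight = nn.Parameter(torch.randn(in_features, out_features))
        self.bias = nn.Parameter(torch.randn(out_features))

    # forward specifies the computation to run. It uses operations autograd knows how to differentiate:
    # built-in PyTorch ops, or custom torch.autograd.Function subclasses that define forward() and backward().
    # autograd handles the backward pass automatically, so there is no need to define backward here.
    def forward(self, input):
        return input @ self.weight + self.bias

m = MyLinear(4, 3)  # an instance with 4 input features and 3 output features
```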
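A quick sanity check of MyLinear (redefined here so the snippet runs on its own): the output has shape (batch, out_features), both nn.Parameter attributes appear in named_parameters() in the order they were assigned, and autograd fills their `.grad` without any hand-written backward(). The batch of 2 random samples is an arbitrary choice.

```python
import torch
import torch.nn as nn

class MyLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_features, out_features))
        self.bias = nn.Parameter(torch.randn(out_features))

    def forward(self, input):
        return input @ self.weight + self.bias

m = MyLinear(4, 3)
out = m(torch.randn(2, 4))               # batch of 2 samples, 4 features each
print(out.shape)                         # torch.Size([2, 3])

# Both nn.Parameter attributes are registered automatically
param_names = [name for name, _ in m.named_parameters()]
print(param_names)                       # ['weight', 'bias']

# autograd computes gradients without a hand-written backward()
out.sum().backward()
print(m.weight.grad.shape, m.bias.grad)  # torch.Size([4, 3]) tensor([2., 2., 2.])
```

The bias gradient is 2 per output because each bias entry is added once to each of the 2 rows being summed. Note that this MyLinear stores `weight` as (in_features, out_features) and computes `input @ weight`, whereas the built-in nn.Linear stores its weight as (out_features, in_features) and computes `input @ weight.T + bias`.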

```python
# Custom NN
class RKNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()  # (N, 28, 28) -> (N, 784)
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),      # 10 output logits, one per class
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
```

```python
# Create an instance of the custom NN
model = RKNet().to(device)  # move the model's parameters to `device` (the GPU when available)
print(model)                # print the model structure
```

```
RKNet(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)
```


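```python
X = torch.rand(1, 28, 28, device=device)  # one random 28x28 "image"
scores = model(X)                          # raw logits, shape (1, 10)

prob = nn.Softmax(dim=1)(scores) # dim: the dimension along which softmax is computed (here, across the 10 classes)
y_pred = prob.argmax(1)
print(f'predict class:{y_pred}')
```

```
predict class:tensor([3], device='cuda:0')
```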
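A small sketch of how `dim` changes what softmax normalizes, using made-up logits for a batch of 2 samples and 3 classes: with `dim=1` each row (one sample's class scores) sums to 1 and `argmax(1)` picks one class per sample, which is the setup used above; `dim=0` would instead normalize each class across the batch.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[1.0, 2.0, 3.0],
                       [3.0, 1.0, 0.0]])  # shape (batch=2, classes=3), made-up values

p_rows = nn.Softmax(dim=1)(logits)  # normalize across classes, per sample
p_cols = nn.Softmax(dim=0)(logits)  # normalize across the batch, per class (usually not wanted)

print(p_rows.sum(dim=1))  # each sample's probabilities sum to 1
print(p_cols.sum(dim=0))  # here each class column sums to 1 instead
print(p_rows.argmax(1))   # tensor([2, 0])
```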


## Typical layers

### nn.Flatten
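1. Parameters: torch.nn.Flatten(start_dim=1, end_dim=-1)
2. Flattens the dims in the range [start_dim, end_dim] (both ends inclusive) into a single dim.
3. By default it flattens the input into 2-D data: the first dim is kept and all remaining dims are merged, e.g. the output has shape (N, D).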

```python
input_image = torch.rand(3,28,28)
print(input_image.size())

flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

flatten2 = nn.Flatten(0, 1)  # flatten dims 0 through 1 (inclusive)
flat_image2 = flatten2(input_image)
print(flat_image2.size())
```

```
torch.Size([3, 28, 28])
torch.Size([3, 784])
torch.Size([84, 28])
```
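To make the [start_dim, end_dim] rule concrete, here is a sketch comparing nn.Flatten with `torch.flatten` and `reshape`, plus nn.Unflatten, which splits one dim back into several. The batch of 3 random 28×28 images mirrors the cell above.

```python
import torch
import torch.nn as nn

x = torch.rand(3, 28, 28)

# Default nn.Flatten(start_dim=1, end_dim=-1): keep dim 0, merge the rest
a = nn.Flatten()(x)
b = torch.flatten(x, start_dim=1)
c = x.reshape(x.size(0), -1)
print(a.shape, torch.equal(a, b), torch.equal(a, c))  # torch.Size([3, 784]) True True

# nn.Flatten(0, 1): merge dims 0 and 1 (3 * 28 = 84), keep the last dim
d = nn.Flatten(0, 1)(x)
print(d.shape)  # torch.Size([84, 28])

# nn.Unflatten(dim, unflattened_size) splits one dim back into several
restored = nn.Unflatten(1, (28, 28))(a)
print(torch.equal(restored, x))  # True
```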


### nn.Linear
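1. Affine (fully connected) layer: applies `y = xA^T + b` to the input, using its stored weight A and bias b
2. Parameters: torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)
   · in_features (int) – size of each input sample
   · out_features (int) – size of each output sample
   · bias (bool) – if False, the layer will not learn an additive bias. Default: True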

```python
layer1 = nn.Linear(in_features=28*28, out_features=6)
hidden1 = layer1(flat_image)
print(hidden1.size())
```

```
torch.Size([3, 6])
```
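A sketch checking the `y = xA^T + b` formula against a manual computation: the weight A is stored with shape (out_features, in_features), so the manual version multiplies by its transpose. It also shows that `bias=False` leaves the layer with no bias parameter at all. The input is random data shaped like `flat_image` above.

```python
import torch
import torch.nn as nn

layer = nn.Linear(in_features=784, out_features=6)
x = torch.rand(3, 784)

print(layer.weight.shape, layer.bias.shape)  # torch.Size([6, 784]) torch.Size([6])

manual = x @ layer.weight.T + layer.bias     # y = x A^T + b
print(torch.allclose(layer(x), manual, atol=1e-5))

no_bias = nn.Linear(784, 6, bias=False)
print(no_bias.bias)  # None
```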


### nn.ReLU

```python
print(f"Before ReLU:\n {hidden1}\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU:\n {hidden1}")
```

```
Before ReLU:
 tensor([[ 0.1400,  0.3509,  0.1541, -0.2322, -0.1627,  0.6068],
        [-0.1102,  0.2422,  0.0933, -0.0575,  0.2145,  0.6366],
        [ 0.1104,  0.0492, -0.2537,  0.0920, -0.2633,  0.5453]],
       grad_fn=<AddmmBackward0>)

After ReLU:
 tensor([[0.1400, 0.3509, 0.1541, 0.0000, 0.0000, 0.6068],
        [0.0000, 0.2422, 0.0933, 0.0000, 0.2145, 0.6366],
        [0.1104, 0.0492, 0.0000, 0.0920, 0.0000, 0.5453]],
       grad_fn=<ReluBackward0>)
```
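ReLU applies max(0, x) element-wise, which is why the negative entries above become 0 while positive entries pass through unchanged. A sketch on a small hand-written tensor showing that the module nn.ReLU, the functional `F.relu`, and `clamp(min=0)` agree:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

h = torch.tensor([[0.14, -0.23, 0.61],
                  [-0.11, 0.24, -0.06]])  # hand-written values

a = nn.ReLU()(h)        # module form, usable inside nn.Sequential
b = F.relu(h)           # functional form, handy inside forward()
c = h.clamp(min=0)      # max(0, x) written directly
print(a)
print(torch.equal(a, b), torch.equal(a, c))  # True True
```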


### nn.Sequential
1. An ordered container of modules.
2. Data is passed through the modules in the same order in which they are defined in the Sequential.

```python
seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(6, 10)
)
input_image = torch.rand(3,28,28)
scores = seq_modules(input_image)

softmax = nn.Softmax(dim=1)
pred_probab = softmax(scores)
```
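A sketch checking that nn.Sequential simply applies its modules in order: calling the container gives exactly the same result as chaining the layers by hand. It also shows the `OrderedDict` form, which names each layer so it can be reached by attribute instead of index; the names `flatten`, `fc1`, `act`, `fc2` are arbitrary choices.

```python
from collections import OrderedDict
import torch
import torch.nn as nn

flatten = nn.Flatten()
fc1 = nn.Linear(28 * 28, 6)
act = nn.ReLU()
fc2 = nn.Linear(6, 10)

seq = nn.Sequential(flatten, fc1, act, fc2)
x = torch.rand(3, 28, 28)

by_hand = fc2(act(fc1(flatten(x))))
print(torch.equal(seq(x), by_hand))  # True

# Same layers, but addressable by name instead of index
named = nn.Sequential(OrderedDict([
    ("flatten", flatten), ("fc1", fc1), ("act", act), ("fc2", fc2),
]))
print(named.fc1 is seq[1])  # True: the same module objects are shared
```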

## Model parameters
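1. Many layers in a NN are parameterized, i.e. they have associated weights and biases that are optimized during training.
2. Once the model is defined as a subclass of nn.Module, nn.Module automatically tracks every submodule and nn.Parameter assigned as an attribute of the model object, and all parameters can be retrieved with the model's parameters() and named_parameters() methods.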
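A quick way to summarize what parameters() tracks is to count the values it yields. This sketch rebuilds the same layer sizes as RKNet above (as an nn.Sequential, so it runs on its own) and counts them; for the three Linear layers that is 784·512+512 + 512·512+512 + 512·10+10 = 669,706.

```python
import torch.nn as nn

# Same layer sizes as RKNet above, rebuilt here so the snippet is self-contained
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

total = sum(p.numel() for p in net.parameters())
trainable = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(total, trainable)  # 669706 669706

# named_parameters() yields (name, tensor) pairs; names follow the module nesting
for name, p in net.named_parameters():
    print(name, tuple(p.shape))
```

Because nn.Flatten sits at index 0 and has no parameters, the first name printed is `1.weight`; inside RKNet the same tensor is named `linear_relu_stack.0.weight`, as the output below shows.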

```python
print(f"Model structure: {model}\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")
```

Model structure: RKNet(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[ 0.0321, -0.0283,  0.0348,  ...,  0.0050,  0.0042,  0.0318],
        [-0.0351, -0.0190, -0.0015,  ..., -0.0355, -0.0275,  0.0003]],
       device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([ 0.0079, -0.0152], device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[ 0.0342, -0.0194,  0.0037,  ..., -0.0280,  0.0307,  0.0070],
        [-0.0222,  0.0056, -0.0250,  ..., -0.0107,  0.0178, -0.0075]],
       device='cuda:0', grad_fn=<SliceBackw