# PyTorch入门

## 检查torch

In [100]:
pip show torch

Name: torch
Version: 2.3.1+cu118
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: d:\anaconda3\envs\yolov8\lib\site-packages
Requires: filelock, fsspec, jinja2, mkl, networkx, sympy, typing-extensions
Required-by: thop, torchaudio, torchvision, ultralytics, ultralytics-thop
Note: you may need to restart the kernel to use updated packages.


In [103]:
#拓展会用到的, 可以暂时不管
!pip show einops tqdm

Name: einops
Version: 0.8.1
Summary: A new flavour of deep learning operations
Home-page: 
Author: Alex Rogozhnikov
Author-email: 
License: MIT
Location: d:\anaconda3\envs\yolov8\lib\site-packages
Requires: 
Required-by: 
---
Name: tqdm
Version: 4.66.4
Summary: Fast, Extensible Progress Meter
Home-page: 
Author: 
Author-email: 
License: MPL-2.0 AND MIT
Location: d:\anaconda3\envs\yolov8\lib\site-packages
Requires: colorama
Required-by: ultralytics


按M切换markdown，按Y切换Code    
按D+D删除一个块     
按tab补全    
按Shift+tab显示函数用法
按Shift+Enter运行当前块

In [2]:
# 导入torch包
import torch

## 矩阵/张量

### 构造矩阵/张量
构造一个5x3矩阵，不初始化 **torch.empty(5, 3)**

In [21]:
x = torch.empty(5, 3)
print(x)

tensor([[5.8062e+25, 1.5120e-42, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00]])


构造一个随机初始化的5x3矩阵 **torch.rand(5, 3)**

In [22]:
x = torch.rand(5, 3)
print(x)

tensor([[0.4846, 0.9098, 0.4612],
        [0.9075, 0.4201, 0.8777],
        [0.8954, 0.2958, 0.1461],
        [0.4384, 0.0722, 0.1780],
        [0.4325, 0.7753, 0.2879]])


构造一个全0的5x3矩阵，指定数据类型是long **torch.zeros(5, 3, dtype=torch.long)**
默认为float32

In [23]:
x = torch.zeros(5, 3, dtype=torch.long)
x, x.dtype

(tensor([[0, 0, 0],
         [0, 0, 0],
         [0, 0, 0],
         [0, 0, 0],
         [0, 0, 0]]),
 torch.int64)

构造一个张量 **torch.tensor()** 

In [28]:
x = torch.tensor([5.5, 3])
y = torch.tensor(5.55)
z = torch.tensor([[1,3],[5,7]])
x, y, z

(tensor([5.5000, 3.0000]),
 tensor(5.5500),
 tensor([[1, 3],
         [5, 7]]))

In [29]:
x.size(), y.size(), z.size()

(torch.Size([2]), torch.Size([]), torch.Size([2, 2]))

### 矩阵/张量 加减乘除
add substract multiply divide remainder取余

In [7]:
x = torch.rand(5, 3)
y = torch.rand(5, 3)
torch.add(x,y), x+y, y.add(x) #三种等价

(tensor([[0.5771, 1.0919, 1.6160],
         [0.6953, 0.6703, 0.4160],
         [1.2183, 1.5590, 1.3567],
         [0.7859, 0.8927, 0.9204],
         [1.8256, 1.6945, 0.5293]]),
 tensor([[0.5771, 1.0919, 1.6160],
         [0.6953, 0.6703, 0.4160],
         [1.2183, 1.5590, 1.3567],
         [0.7859, 0.8927, 0.9204],
         [1.8256, 1.6945, 0.5293]]),
 tensor([[0.5771, 1.0919, 1.6160],
         [0.6953, 0.6703, 0.4160],
         [1.2183, 1.5590, 1.3567],
         [0.7859, 0.8927, 0.9204],
         [1.8256, 1.6945, 0.5293]]))

索引操作

In [8]:
x[:, 1]

tensor([0.4528, 0.6357, 0.7651, 0.3312, 0.9078])

改变大小：如果你想改变一个 tensor 的大小或者形状，你可以使用 **torch.view**:

In [10]:
x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
x.size(), y.size(), z.size()

(torch.Size([4, 4]), torch.Size([16]), torch.Size([2, 8]))

使用 .item() 来获得tensor的value 

In [17]:
x = torch.randn(1)
x, x.dtype, x.item(), type(x.item())

(tensor([1.8441]), torch.float32, 1.8440992832183838, float)

### 拓展： Einops
**rearrange**：张量维度重排 / 拆分 / 合并

In [26]:
# pip install einops

In [27]:
import einops

In [37]:
x = torch.randint(1, 10, (2, 4, 3)) 
# 示例1：维度转置（深度学习常用）
x1 = einops.rearrange(x, 'b h c -> b c h')
# 示例2：拆分维度（将height拆分为2段）
x2 = einops.rearrange(x, 'b (h1 h2) c -> b h1 h2 c', h1=2)
# 示例3：合并维度（合并batch和height）
x3 = einops.rearrange(x, 'b h c -> (b h) c')
x, x1, x2, x3, x.size(), x1.size(), x2.size(), x3.size()

(tensor([[[9, 9, 3],
          [2, 2, 7],
          [2, 6, 5],
          [3, 2, 8]],
 
         [[9, 3, 6],
          [4, 6, 2],
          [9, 9, 9],
          [9, 9, 3]]]),
 tensor([[[9, 2, 2, 3],
          [9, 2, 6, 2],
          [3, 7, 5, 8]],
 
         [[9, 4, 9, 9],
          [3, 6, 9, 9],
          [6, 2, 9, 3]]]),
 tensor([[[[9, 9, 3],
           [2, 2, 7]],
 
          [[2, 6, 5],
           [3, 2, 8]]],
 
 
         [[[9, 3, 6],
           [4, 6, 2]],
 
          [[9, 9, 9],
           [9, 9, 3]]]]),
 tensor([[9, 9, 3],
         [2, 2, 7],
         [2, 6, 5],
         [3, 2, 8],
         [9, 3, 6],
         [4, 6, 2],
         [9, 9, 9],
         [9, 9, 3]]),
 torch.Size([2, 4, 3]),
 torch.Size([2, 3, 4]),
 torch.Size([2, 2, 2, 3]),
 torch.Size([8, 3]))

**reduce**：降维运算（求和 / 均值 / 最值等）

In [31]:
x = torch.randint(1, 10, (2, 3, 4))#1-10区间
# 示例1：全局平均池化（NHWC → NC）
x1 = einops.reduce(x, 'b h c -> b c', reduction='max')
# 示例2：求和运算（可省略reduction，默认'sum'）
x2 = einops.reduce(x, 'b h c -> h c', reduction='sum')
x, x1, x2, x.size(), x1.size(), x2.size()

(tensor([[[4, 1, 2, 7],
          [6, 7, 7, 7],
          [1, 4, 8, 2]],
 
         [[3, 4, 5, 4],
          [8, 1, 8, 5],
          [1, 8, 2, 8]]]),
 tensor([[6, 7, 8, 7],
         [8, 8, 8, 8]]),
 tensor([[ 7,  5,  7, 11],
         [14,  8, 15, 12],
         [ 2, 12, 10, 10]]),
 torch.Size([2, 3, 4]),
 torch.Size([2, 4]),
 torch.Size([3, 4]))

In [32]:
x = torch.rand((2, 3, 4))
# 示例3：按维度求最大值（保留height维度）
x3 = einops.reduce(x, 'b h c -> h c', reduction='mean')
x, x3, x.size(), x3.size()

(tensor([[[0.9435, 0.9011, 0.6886, 0.5216],
          [0.4661, 0.2249, 0.7914, 0.5291],
          [0.4664, 0.9044, 0.0911, 0.9349]],
 
         [[0.0037, 0.8471, 0.5063, 0.7578],
          [0.9693, 0.8180, 0.1576, 0.9940],
          [0.0889, 0.1091, 0.1799, 0.2930]]]),
 tensor([[0.4736, 0.8741, 0.5975, 0.6397],
         [0.7177, 0.5214, 0.4745, 0.7615],
         [0.2776, 0.5068, 0.1355, 0.6140]]),
 torch.Size([2, 3, 4]),
 torch.Size([3, 4]))

**repeat**：维度扩展（复制张量填充新维度）

In [41]:
x = torch.randint(1, 10, (2, 3, 2))  # [h,w,c]
# 示例1：单张图片扩展为batch维度
x1 = einops.repeat(x, 'h w c -> b h w c', b=1) 
# 示例2：在height维度复制2次
x2 = einops.repeat(x, 'h w c -> h (repeat w) c', repeat=2)
x, x1, x2, x.size(), x1.size(), x2.size()

(tensor([[[8, 4],
          [2, 3],
          [6, 4]],
 
         [[7, 4],
          [2, 6],
          [3, 9]]]),
 tensor([[[[8, 4],
           [2, 3],
           [6, 4]],
 
          [[7, 4],
           [2, 6],
           [3, 9]]]]),
 tensor([[[8, 4],
          [2, 3],
          [6, 4],
          [8, 4],
          [2, 3],
          [6, 4]],
 
         [[7, 4],
          [2, 6],
          [3, 9],
          [7, 4],
          [2, 6],
          [3, 9]]]),
 torch.Size([2, 3, 2]),
 torch.Size([1, 2, 3, 2]),
 torch.Size([2, 6, 2]))

## 自动微分

**autograd** 包是 PyTorch 中所有神经网络的核心。

### 创建张量表达式

In [59]:
import torch

创建一个张量，设置 requires_grad=True 来跟踪与它相关的计算

In [60]:
x = torch.ones(2, 2, requires_grad=True)
y = torch.ones(2, 2)
y.requires_grad_(True)
x, y

(tensor([[1., 1.],
         [1., 1.]], requires_grad=True),
 tensor([[1., 1.],
         [1., 1.]], requires_grad=True))

针对张量做一个操作

In [61]:
y = x + 1
w = x*x
y, w

(tensor([[2., 2.],
         [2., 2.]], grad_fn=<AddBackward0>),
 tensor([[1., 1.],
         [1., 1.]], grad_fn=<MulBackward0>))

In [62]:
z = 2*y*y
out = z.mean()
z, out

(tensor([[8., 8.],
         [8., 8.]], grad_fn=<MulBackward0>),
 tensor(8., grad_fn=<MeanBackward0>))

### 梯度，反向传播

In [66]:
x = torch.ones(2, 2, requires_grad=True)
y = x+1
z = y*y*2
out = z.sum()
out.backward()
out, z.grad, y.grad, x.grad
#PyTorch 默认只会保留「叶子张量（leaf Tensor）」的梯度,所以第二个第三个输出为None

  out, z.grad, y.grad, x.grad


(tensor(32., grad_fn=<SumBackward0>),
 None,
 None,
 tensor([[8., 8.],
         [8., 8.]]))

In [68]:
x = torch.ones(2, 2, requires_grad=True)#1 x 4y x 1（y=x+1）
y = x+1
y.retain_grad()# 强制保留y的梯度:1 x 4y
z = y*y*2
z.retain_grad()# 强制保留z的梯度：1
out = z.sum()
out.backward()
out, z.grad, y.grad, x.grad

(tensor(32., grad_fn=<SumBackward0>),
 tensor([[1., 1.],
         [1., 1.]]),
 tensor([[8., 8.],
         [8., 8.]]),
 tensor([[8., 8.],
         [8., 8.]]))

非标量求梯度方法

In [76]:
x = torch.ones(3, requires_grad=True)
y = x * 2
z = y * 3
v = torch.tensor([0.1, 1, 0.001], dtype=torch.float)
out = z.backward(v)
x.grad

tensor([0.6000, 6.0000, 0.0060])

通过将代码包裹在 with torch.no_grad()，来停止对从跟踪历史中 的 .requires_grad=True 的张量自动求导。

In [77]:
with torch.no_grad():
    print((x**2).requires_grad)

False


## 神经网络

### 神经网络的经典结构层

In [20]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

**卷积层**： 一维卷积、二维卷积、三维卷积

**必须参数**：输入通道数、输出通道数、卷积核大小

In [91]:
in_channels = 1
out_channels = 5
kernel_size = 3  #等效 (3, 3) 或者(3, 3, 3)
a = nn.Conv1d(in_channels, out_channels, kernel_size)
b = nn.Conv2d(in_channels, out_channels, kernel_size)
c = nn.Conv3d(in_channels, out_channels, kernel_size)
d = nn.ConvTranspose2d(in_channels, out_channels, kernel_size)  #转置卷积
a, b, c, d

(Conv1d(1, 5, kernel_size=(3,), stride=(1,)),
 Conv2d(1, 5, kernel_size=(3, 3), stride=(1, 1)),
 Conv3d(1, 5, kernel_size=(3, 3, 3), stride=(1, 1, 1)),
 ConvTranspose2d(1, 5, kernel_size=(3, 3), stride=(1, 1)))

**全连接层**

**必须参数**: 输入特征数、输出特征数

In [87]:
in_features = 100
out_features = 10
a = nn.Linear(in_features, out_features)
a

Linear(in_features=100, out_features=10, bias=True)

**激活层**

**必须参数**: 基本无，也可以传入一个网络层如conv2d

In [132]:
nn.ReLU(),    #最常用的 ReLU 激活，修正线性单元
nn.Softmax(), #多分类任务输出层，将输出转为概率分布
nn.Sigmoid(), #输出 0~1，多用于二分类输出层
nn.GELU(),    #BERT 等 Transformer 模型的核心激活函数
nn.SiLU()     #结合了ReLU和Sigmoid

x = torch.randn(2, 3)
a = F.relu(x)
b = F.softmax(x, dim=1)
c = F.sigmoid(x)
d = F.silu(x)
x, a, b, c, d

(tensor([[-1.0358, -3.0907,  0.7388],
         [-0.4331, -0.2649,  0.0335]]),
 tensor([[0.0000, 0.0000, 0.7388],
         [0.0000, 0.0000, 0.0335]]),
 tensor([[0.1423, 0.0182, 0.8394],
         [0.2647, 0.3132, 0.4221]]),
 tensor([[0.2620, 0.0435, 0.6767],
         [0.3934, 0.4342, 0.5084]]),
 tensor([[-0.2713, -0.1344,  0.4999],
         [-0.1704, -0.1150,  0.0170]]))

**池化层**: 最大池化（下采样）、平均池化、自适应平均/最大池化、逆最大池化（上采样）

**必须参数**: 池化核尺寸 或者 目标输出尺寸（自适应的用）

In [94]:
kernel_size = 3
a = nn.MaxPool2d(kernel_size)
b = nn.AvgPool2d(kernel_size)
c = nn.AdaptiveAvgPool2d(kernel_size)
d = nn.MaxUnpool2d(kernel_size)
a, b, c, d

(MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False),
 AvgPool2d(kernel_size=3, stride=3, padding=0),
 AdaptiveAvgPool2d(output_size=3),
 MaxUnpool2d(kernel_size=(3, 3), stride=(3, 3), padding=(0, 0)))

In [144]:
x = torch.randint(1, 10, (5,5)).float()
kernel_size = 3
a, indices = F.max_pool1d(x, kernel_size, stride=1, return_indices=True)
b = F.max_unpool1d(a, indices, kernel_size, stride=1)
x, a, b

(tensor([[7., 9., 7., 8., 4.],
         [4., 9., 8., 3., 3.],
         [7., 6., 4., 5., 9.],
         [1., 9., 7., 6., 5.],
         [3., 8., 4., 1., 2.]]),
 tensor([[9., 9., 8.],
         [9., 9., 8.],
         [7., 6., 9.],
         [9., 9., 7.],
         [8., 8., 4.]]),
 tensor([[0., 9., 0., 8., 0.],
         [0., 9., 8., 0., 0.],
         [7., 6., 0., 0., 9.],
         [0., 9., 7., 0., 0.],
         [0., 8., 4., 0., 0.]]))

**归一化层**：加速模型训练、缓解梯度消失，提升稳定性：批量归一化、层归一化、分组归一化、

**必须参数**：通道数或特征数

In [96]:
a = nn.BatchNorm2d(num_features=10)
b = nn.LayerNorm(normalized_shape=[10, 10]) #归一化的维度形状
c = nn.GroupNorm(num_groups=2, num_channels=10)#分组数 和 输入通道数
a, b, c

(BatchNorm2d(10, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
 LayerNorm((10, 10), eps=1e-05, elementwise_affine=True),
 GroupNorm(2, 10, eps=1e-05, affine=True))

**Dropout 层（正则化）**:

**必须参数**：神经元丢弃概率

In [97]:
nn.Dropout(p = 0.5)#默认为0.5

Dropout(p=0.5, inplace=False)

**展平层**: 展平特征成一维

**必须参数**：无

In [99]:
nn.Flatten()

Flatten(start_dim=1, end_dim=-1)

**注意力机制层**：Transformer相关

完整Transformer     d_model 模型维度默认512， nhead注意力头数默认8

In [106]:
nn.Transformer() 

Transformer(
  (encoder): TransformerEncoder(
    (layers): ModuleList(
      (0-5): 6 x TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
        )
        (linear1): Linear(in_features=512, out_features=2048, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
        (linear2): Linear(in_features=2048, out_features=512, bias=True)
        (norm1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (norm2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
        (dropout1): Dropout(p=0.1, inplace=False)
        (dropout2): Dropout(p=0.1, inplace=False)
      )
    )
    (norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
  )
  (decoder): TransformerDecoder(
    (layers): ModuleList(
      (0-5): 6 x TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=512, o

多头自注意力层

In [104]:
nn.MultiheadAttention(embed_dim=512, num_heads=8)#输入嵌入维度 和 注意力头数（需能被 embed_dim 整除）

MultiheadAttention(
  (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
)

编码器层 和 编码器模块

In [112]:
a = nn.TransformerEncoderLayer(d_model=512, nhead=8)
b = nn.TransformerEncoder(num_layers=10, encoder_layer=a) # 10层
a, b

(TransformerEncoderLayer(
   (self_attn): MultiheadAttention(
     (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
   )
   (linear1): Linear(in_features=512, out_features=2048, bias=True)
   (dropout): Dropout(p=0.1, inplace=False)
   (linear2): Linear(in_features=2048, out_features=512, bias=True)
   (norm1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
   (norm2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
   (dropout1): Dropout(p=0.1, inplace=False)
   (dropout2): Dropout(p=0.1, inplace=False)
 ),
 TransformerEncoder(
   (layers): ModuleList(
     (0-9): 10 x TransformerEncoderLayer(
       (self_attn): MultiheadAttention(
         (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
       )
       (linear1): Linear(in_features=512, out_features=2048, bias=True)
       (dropout): Dropout(p=0.1, inplace=False)
       (linear2): Linear(in_features=2048, out_features=512, bias=True)
    

解码器层 和 解码器模块

In [114]:
a = nn.TransformerDecoderLayer(d_model=512, nhead=8)
b = nn.TransformerDecoder(num_layers=5, decoder_layer=a) #5层
a, b

(TransformerDecoderLayer(
   (self_attn): MultiheadAttention(
     (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
   )
   (multihead_attn): MultiheadAttention(
     (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True)
   )
   (linear1): Linear(in_features=512, out_features=2048, bias=True)
   (dropout): Dropout(p=0.1, inplace=False)
   (linear2): Linear(in_features=2048, out_features=512, bias=True)
   (norm1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
   (norm2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
   (norm3): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
   (dropout1): Dropout(p=0.1, inplace=False)
   (dropout2): Dropout(p=0.1, inplace=False)
   (dropout3): Dropout(p=0.1, inplace=False)
 ),
 TransformerDecoder(
   (layers): ModuleList(
     (0-4): 5 x TransformerDecoderLayer(
       (self_attn): MultiheadAttention(
         (out_proj): NonDynamicallyQuantizableLinear(

**损失函数**

In [115]:
nn.MSELoss()          #均方误差损失（回归任务）
nn.BCELoss()          #二分类交叉熵损失（二分类任务）
nn.NLLLoss()          #负对数似然损失
nn.CrossEntropyLoss() #交叉熵损失（多分类任务）

CrossEntropyLoss()

**优化器**：根据反向传播得到的 “损失函数对参数的梯度”，按照特定策略自动更新模型的可学习参数

**必须参数**：net.parameters(), 学习率lr 默认0.001

In [24]:
net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), 
    nn.Sigmoid()
)

optim.SGD(net.parameters(), lr=0.001)                 #随机梯度下降
optim.SGD(net.parameters(), lr=0.001, momentum=0.9)   #带动量的随机梯度下降（惯性）
optim.Adam(net.parameters(), lr=0.001)                #深度学习主流，权重衰减默认0
optim.AdamW(net.parameters(), lr=0.001)               #Adam改进版，权重衰减默认0.01（正则化）
optim.RMSprop(net.parameters(), lr=0.01)              #均方根传播，用于时序任务, lr默认0.01

RMSprop (
Parameter Group 0
    alpha: 0.99
    centered: False
    differentiable: False
    eps: 1e-08
    foreach: None
    lr: 0.01
    maximize: False
    momentum: 0
    weight_decay: 0
)

**其他不常用或过时网络层**

nn.RNN  循环神经网络

nn.LSTM 长短期记忆网络
 
nn.GRU 门控循环单元

**神经网络顺序堆叠器**

In [121]:
net = nn.Sequential(      #顺序堆叠网络的结构
    nn.Conv2d(1, 6, kernel_size=5, padding=2), 
    nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), 
    nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), 
    nn.Sigmoid(),
    nn.Linear(120, 84), 
    nn.Sigmoid(),
    nn.Linear(84, 10)
)
net

Sequential(
  (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (1): Sigmoid()
  (2): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (3): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (4): Sigmoid()
  (5): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (6): Flatten(start_dim=1, end_dim=-1)
  (7): Linear(in_features=400, out_features=120, bias=True)
  (8): Sigmoid()
  (9): Linear(in_features=120, out_features=84, bias=True)
  (10): Sigmoid()
  (11): Linear(in_features=84, out_features=10, bias=True)
)

### 简单神经网络测试
**一个典型的神经网络训练过程包括以下几点：**

**1.定义一个包含可训练参数的神经网络**

**2.迭代整个输入**

**3.通过神经网络处理输入**

**4.计算损失(loss)**

**5.反向传播梯度到神经网络的参数**

**6.更新网络的参数，典型的用一个简单的更新方法：weight = weight - learning_rate \*gradient**

#### **1、定义一个神经网络**

In [12]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d( F.relu( self.conv1(x) ), (2, 2) )
        x = F.max_pool2d( F.relu( self.conv2(x) ), 2 )
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu( self.fc1(x) )
        x = F.relu( self.fc2(x) )
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
net

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

获取参数量

In [13]:
params = list(net.parameters())
len(params), params[0].size() # conv1's .weight

(10, torch.Size([6, 1, 5, 5]))

#### **2、创建一个输入 3、用神经网络处理输入**

In [14]:
input = torch.randn(1, 1, 32, 32)
out = net(input)
out, out.size()

(tensor([[-0.0276,  0.0138, -0.0138,  0.0147,  0.0652,  0.1065,  0.0360, -0.0031,
           0.0333,  0.0173]], grad_fn=<AddmmBackward0>),
 torch.Size([1, 10]))

#### **4.计算损失(loss)**

In [15]:
#创建一个目标值用于计算损失
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
#选择损失函数
criterion = nn.MSELoss()
#计算损失
loss = criterion(out, target)
loss, loss.size()#损失值是个标量

(tensor(0.4764, grad_fn=<MseLossBackward0>), torch.Size([]))

获取损失的反向传播路径

In [16]:
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU

<MseLossBackward0 object at 0x00000212113633A0>
<AddmmBackward0 object at 0x0000021211363BE0>
<AccumulateGrad object at 0x00000212113633A0>


#### **5.反向传播梯度到神经网络的参数**

In [17]:
# 梯度归零
net.zero_grad()
# 对损失值反向传播
loss.backward()
#查看第一层的梯度（默认只保存第一层的）
print(net.conv1.bias.grad)

tensor([-0.0015, -0.0029,  0.0003, -0.0066,  0.0057,  0.0079])


#### **6.更新网络的参数**

最简单的更新规则就是随机梯度下降。

weight = weight - learning_rate * gradient

In [18]:
# 学习率，小幅度更新梯度
learning_rate = 0.1
#梯度：让损失变大的方向，因此向反方向更新梯度
#每一层的参数更新 = 每一层的参数 - 学习率*梯度
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)

In [19]:
#观察参数更新后的损失变化
out = net(input)
loss2 = criterion(out, target)
loss, loss2

(tensor(0.4764, grad_fn=<MseLossBackward0>),
 tensor(0.4345, grad_fn=<MseLossBackward0>))

可以观察到损失比更新前下降了

### **总结训练步骤**

In [74]:
#自动选择GPU/CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"use device: {device}")

#初始化核心组件:神经网络、损失函数、优化器
net = Net().to(device)
criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)

#训练轮次、输入（train-dataset的x）、目标（train-dataset的y）
epochs = 10
inputs = torch.randn(1, 1, 32, 32).to(device)
targets = torch.randn(10).view(1, 10).to(device)

#切换训练模式
net.train()

use device: cuda


Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

In [75]:
for epoch in range(epochs):
    #训练循环内
    optimizer.zero_grad()            #优化器梯度归零
    output = net(inputs)              #前向传播
    loss = criterion(output, targets) #计算损失
    loss.backward()                  #反向传播
    optimizer.step()                 #优化器更新参数
    if (epoch + 1) % 5 == 0:  # 每5轮打印一次，避免日志刷屏
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")

#训练完成：可选保存模型
# torch.save(net.state_dict(), "trained_model.pth")
# print("训练完成，模型已保存为 trained_model.pth")

Epoch [5/10], Loss: 0.7424
Epoch [10/10], Loss: 0.6998


### 拓展：tqdm（训练时显示进度条）

In [76]:
#pip install tqdm

In [77]:
#可接着直接运行
from tqdm import tqdm
import time

In [78]:
def train_epochs_tqdm(epochs, net, optimizer, criterion, inputs, targets,wait_time=1):
    pbar = tqdm(range(epochs), desc="训练进度")
    for epoch in pbar:
        #训练循环内
        optimizer.zero_grad()            #优化器梯度归零
        output = net(inputs)              #前向传播
        loss = criterion(output, targets) #计算损失
        loss.backward()                  #反向传播
        optimizer.step()                 #优化器更新参数
        pbar.set_postfix({"Loss": f"{loss.item():.4f}"})
        time.sleep(wait_time) #等 1 秒

train_epochs_tqdm(epochs, net, optimizer, criterion, inputs, targets)

训练进度: 100%|██████████| 10/10 [00:10<00:00,  1.01s/it, Loss=0.5799]


### 拓展：Dataset、DataLoader

In [84]:
from torch.utils.data import Dataset, DataLoader

In [93]:
class MyDataset(Dataset):
    def __init__(self, inputs, targets):
        self.inputs = inputs
        self.targets = targets

    def __getitem__(self, index):
        input = self.inputs[index]
        target = self.targets[index]
        return input, target

    def __len__(self):
        return len(self.inputs)

#==========================其他和前面一样==========================
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"use device: {device}")
net = Net().to(device)
criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)
epochs = 100
#=================================================================

#划分数据集
num_samples = 1000
inputs = torch.randn(num_samples, 1, 32, 32)
targets = torch.randn(num_samples, 10)
train_size = int(0.8*num_samples)
val_size = int(0.2*num_samples)

#train and validation
train_inputs = inputs[:train_size]
train_targets = targets[:train_size]
val_inputs = inputs[val_size:]
val_targets = targets[val_size:]

train_dataset = MyDataset(inputs=train_inputs, targets=train_targets)
val_dataset = MyDataset(inputs=val_inputs, targets=val_targets)

use device: cuda


In [94]:
import random
# 固定随机种子（保证实验可复现）
def set_seed(seed=3407):            #3407 is all you need!
    random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)

In [95]:
def different_batch_zize_train(batch_size):
    #创建dataloader
    train_dataloader = DataLoader(
        dataset=train_dataset,
        batch_size=batch_size,
        shuffle=True,  # 打乱数据（重要，避免过拟合）
        drop_last=True  # 丢弃最后不足一个batch的样本（可选）
    )
    val_dataloader = DataLoader(
        dataset=val_dataset,
        batch_size=batch_size,
        shuffle=False,  # 验证集无需打乱
        drop_last=False
    )

    #重新加载神经网络、损失函数、优化器
    net = Net().to(device)
    criterion = nn.MSELoss()
    optimizer = optim.SGD(net.parameters(), lr=0.01)
    
    pbar = tqdm(range(epochs), desc="训练+验证进度")
    for epoch in pbar:
        # ---------------------- 训练阶段 ----------------------
        net.train()  # 切换训练模式（开启Dropout/BatchNorm训练）
        train_loss = 0.0
        for batch_input, batch_target in train_dataloader:
            batch_input, batch_target = batch_input.to(device), batch_target.to(device)
            
            optimizer.zero_grad()
            output = net(batch_input)
            loss = criterion(output, batch_target)
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item() * batch_size
    
        # 计算本轮训练平均损失
        avg_train_loss = train_loss / len(train_dataset)
    
        # ---------------------- 验证阶段 ----------------------
        net.eval()  # 切换验证模式（关闭Dropout/BatchNorm训练）
        val_loss = 0.0
        with torch.no_grad():  # 关闭梯度计算，节省显存+加速
            for batch_input, batch_target in val_dataloader:
                batch_input, batch_target = batch_input.to(device), batch_target.to(device)
                
                #验证阶段只需要 前向传播 和 计算损失
                output = net(batch_input)
                loss = criterion(output, batch_target)
                val_loss += loss.item() * batch_size
    
        # 计算本轮验证平均损失
        avg_val_loss = val_loss / len(val_dataset)
    
        # ----------------------- 进度条 -----------------------
        pbar.set_postfix({
            "Train Loss": f"平均训练损失{avg_train_loss:.4f}",
            "Val Loss": f"平均验证损失{avg_val_loss:.4f}"
        })
        
print("batchsize=200:")
different_batch_zize_train(200)
print("batchsize=100:")
different_batch_zize_train(100)
print("batchsize=50:")
different_batch_zize_train(50)
print("batchsize=10:")
different_batch_zize_train(10)

batchsize=200:


训练+验证进度: 100%|██████████| 100/100 [00:03<00:00, 31.35it/s, Train Loss=平均训练损失1.0269, Val Loss=平均验证损失1.0100]


batchsize=100:


训练+验证进度: 100%|██████████| 100/100 [00:04<00:00, 21.66it/s, Train Loss=平均训练损失1.0252, Val Loss=平均验证损失1.0092]


batchsize=50:


训练+验证进度: 100%|██████████| 100/100 [00:07<00:00, 13.33it/s, Train Loss=平均训练损失1.0219, Val Loss=平均验证损失1.0066]


batchsize=10:


训练+验证进度: 100%|██████████| 100/100 [00:33<00:00,  3.00it/s, Train Loss=平均训练损失0.6577, Val Loss=平均验证损失0.7691]


**可以看到不同batchsize的对结果的影响，有时间可以仿照写出不同lr对结果的影响**

## 占位符（未来再写）