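Course outline
1. CNN on MNIST-style handwritten digits
2. Visualizing the training process
3. Saving and visualizing the model
4. RNN code (in-class assignment)
5. Attention implementation (optional)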

In [None]:
!pip install tensorboard 

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
import numpy as np
import time

# Select the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cpu


Load the data

In [2]:
# Load the handwritten digits dataset that ships with sklearn
digits = load_digits()
X = digits.data  # features, shape (1797, 64)
y = digits.target  # labels, shape (1797,)

# Preprocessing: reshape into image format and normalize
X = X.reshape(-1, 1, 8, 8)  # (samples, channels, height, width)
X = X / 16.0  # pixel values are 0..16, so this scales them to [0, 1]

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert to PyTorch tensors
X_train = torch.FloatTensor(X_train)
y_train = torch.LongTensor(y_train)
X_test = torch.FloatTensor(X_test)
y_test = torch.LongTensor(y_test)

# Create the data loaders
train_dataset = TensorDataset(X_train, y_train)
test_dataset = TensorDataset(X_test, y_test)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000)

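The core idea of an RNN is recurrence: it iterates over the input sequence along the time dimension, carrying a hidden state forward so that it can capture long-range dependencies in the sequence.
For image data, what should the RNN iterate over?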
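One common answer, shown here only as a sketch, is to treat each image row as one time step, so an 8x8 image becomes a sequence of 8 steps with 8 features each. The random batch and `hidden_size=16` below are assumptions chosen for illustration; they are not part of the course code.

```python
import torch
import torch.nn as nn

# A fake batch shaped like the notebook's images: (batch, channels, height, width)
images = torch.rand(4, 1, 8, 8)

# Drop the channel axis so each image becomes a sequence of 8 rows,
# each row an 8-dimensional feature vector: (batch, seq_len, input_size)
sequences = images.squeeze(1)

# hidden_size=16 is an arbitrary choice for this illustration
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
outputs, h_n = rnn(sequences)

print(sequences.shape)  # torch.Size([4, 8, 8])
print(outputs.shape)    # torch.Size([4, 8, 16]): one hidden state per row
print(h_n.shape)        # torch.Size([1, 4, 16]): final hidden state per image
```

Iterating over columns, or over flattened pixels one at a time, would also be valid choices; rows simply keep the sequence short.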

In [3]:
# Define the CNN model (adapted to 8x8 images)
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=3)  # 1 input channel, 10 output channels
        self.conv2 = nn.Conv2d(10, 20, kernel_size=3)  # 10 input channels, 20 output channels
        self.conv2_drop = nn.Dropout2d()  # dropout on the conv feature maps
        self.fc1 = nn.Linear(20 * 1 * 1, 50)  # fully connected layer
        self.fc2 = nn.Linear(50, 10)  # output layer

    def forward(self, x):
        x = self.conv1(x)  # convolution (8-3+1=6) -> 6x6
        x = nn.functional.max_pool2d(x, 2)  # max pooling -> 3x3
        x = nn.functional.relu(x)  # ReLU activation
        
        x = self.conv2(x)  # second convolution (3-3+1=1) -> 1x1
        x = self.conv2_drop(x)  # dropout to reduce overfitting
        x = nn.functional.max_pool2d(x, 1)  # pooling with kernel 1 (a no-op, stays 1x1)
        x = nn.functional.relu(x)  # ReLU activation
        
        x = x.view(-1, 20 * 1 * 1)  # flatten to a vector per sample
        x = self.fc1(x)  # fully connected layer
        x = nn.functional.relu(x)  # ReLU activation
        x = nn.functional.dropout(x, training=self.training)  # dropout (active only in training mode)
        x = self.fc2(x)  # output layer
        return nn.functional.log_softmax(x, dim=1)  # log-softmax, to pair with NLLLoss
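The size arithmetic in the comments above can be checked by pushing a dummy batch through the same layer configuration and recording the shapes. This is a minimal sketch; the standalone layers and the batch size of 2 are assumptions for illustration and are separate from the `CNN` class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.rand(2, 1, 8, 8)  # (batch, channels, height, width)

# Same layer configuration as in the CNN class, applied step by step
steps = [
    ("conv1", nn.Conv2d(1, 10, kernel_size=3)),    # 8 - 3 + 1 = 6
    ("pool", lambda t: F.max_pool2d(t, 2)),        # 6 // 2 = 3
    ("conv2", nn.Conv2d(10, 20, kernel_size=3)),   # 3 - 3 + 1 = 1
    ("flatten", lambda t: t.view(-1, 20 * 1 * 1)),
]

shapes = {}
for name, op in steps:
    x = op(x)
    shapes[name] = tuple(x.shape)
    print(name, shapes[name])
# conv1 (2, 10, 6, 6)
# pool (2, 10, 3, 3)
# conv2 (2, 20, 1, 1)
# flatten (2, 20)
```

The flattened width of 20 is why `fc1` takes `20 * 1 * 1` input features.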

In [None]:
# Define the RNN model (in-class exercise: fill in __init__ and forward)
class RNN(nn.Module):
    def __init__(self, input_size=8, hidden_size=128, num_layers=2, num_classes=10):
        pass
        
    def forward(self, x):
        pass
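One possible way to complete the exercise is sketched below. It is not the course's reference solution: the class name `RowRNN`, the choice of `nn.LSTM`, and reading the image row by row are all assumptions. It keeps the same constructor arguments as the stub and returns log-probabilities so that it pairs with `nn.NLLLoss`.

```python
import torch
import torch.nn as nn

class RowRNN(nn.Module):
    """Hypothetical solution sketch: reads an 8x8 image one row per time step."""

    def __init__(self, input_size=8, hidden_size=128, num_layers=2, num_classes=10):
        super().__init__()
        # batch_first=True means inputs are (batch, seq_len, input_size)
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = x.squeeze(1)              # (batch, 1, 8, 8) -> (batch, 8, 8)
        out, _ = self.lstm(x)         # (batch, 8, hidden_size)
        logits = self.fc(out[:, -1])  # classify from the last time step
        return torch.log_softmax(logits, dim=1)

rnn_sketch = RowRNN()
log_probs = rnn_sketch(torch.rand(4, 1, 8, 8))
print(log_probs.shape)  # torch.Size([4, 10])
```

Using only the last time step is the simplest readout; averaging over all steps is another option worth trying.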

Train & Test

In [4]:
# Initialize the model (a CNN here; swap in RNN() once the exercise above is implemented)
model = CNN().to(device)
criterion = nn.NLLLoss()  # negative log-likelihood loss; expects log-probabilities
optimizer = optim.Adam(model.parameters(), lr=0.001)
# criterion = CustomLoss(lambda_1=2.0, lambda_9=-0.5)  # custom loss from the task at the end; tune the weights
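Because the model ends in `log_softmax`, `nn.NLLLoss` is the matching criterion. The small check below, using random logits as stand-in data, shows that this pairing gives the same value as applying `nn.CrossEntropyLoss` directly to raw logits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)           # stand-in raw scores for 5 samples
targets = torch.randint(0, 10, (5,))  # stand-in labels

# log_softmax followed by NLLLoss ...
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
# ... equals CrossEntropyLoss applied to the raw logits
ce = nn.CrossEntropyLoss()(logits, targets)

print(torch.allclose(nll, ce))  # True
```

Passing log-probabilities into `nn.CrossEntropyLoss` would apply log-softmax a second time, so keep the two pairings separate.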


In [6]:
def train(epochs):
    model.train()
    for epoch in range(epochs):
        start_time = time.time()
        running_loss = 0.0
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)
            #print(data.shape) # [64,1,8,8]
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
        
        end_time = time.time()
        print(f'Epoch {epoch+1}/{epochs}, loss: {running_loss/len(train_loader):.4f}, time: {end_time-start_time:.2f}s')

# Evaluate the model

In [7]:

def test():
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += criterion(output, target).item()
            pred = output.argmax(dim=1, keepdim=True)  # index of the highest log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader)
    accuracy = 100. * correct / len(test_loader.dataset)
    print(f'Test set: average loss: {test_loss:.4f}, accuracy: {correct}/{len(test_loader.dataset)} ({accuracy:.2f}%)')



In [8]:
# Train and evaluate
print("Start training the CNN model...")
train(epochs=20)  # the dataset is small, so train for more epochs
print("\nStart evaluating the model...")
test()

Start training the CNN model...
Epoch 1/20, loss: 2.3023, time: 0.25s
Epoch 2/20, loss: 2.2899, time: 0.13s
Epoch 3/20, loss: 2.2550, time: 0.13s
Epoch 4/20, loss: 2.1995, time: 0.13s
Epoch 5/20, loss: 2.1011, time: 0.13s
Epoch 6/20, loss: 1.9537, time: 0.13s
Epoch 7/20, loss: 1.8108, time: 0.12s
Epoch 8/20, loss: 1.6962, time: 0.12s
Epoch 9/20, loss: 1.6287, time: 0.13s
Epoch 10/20, loss: 1.5551, time: 0.13s
Epoch 11/20, loss: 1.4406, time: 0.13s
Epoch 12/20, loss: 1.4353, time: 0.13s
Epoch 13/20, loss: 1.3782, time: 0.13s
Epoch 14/20, loss: 1.2884, time: 0.13s
Epoch 15/20, loss: 1.3061, time: 0.14s
Epoch 16/20, loss: 1.2825, time: 0.13s
Epoch 17/20, loss: 1.2583, time: 0.13s
Epoch 18/20, loss: 1.2339, time: 0.13s
Epoch 19/20, loss: 1.1933, time: 0.14s
Epoch 20/20, loss: 1.1893, time: 0.14s

Start evaluating the model...
Test set: average loss: 0.6740, accuracy: 300/360 (83.33%)


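Visualizing the training process
TensorBoard & wandb
wandb requires a personal account, so we will not cover it in this session; interested students can explore it on their own.

TensorBoard is a deep learning visualization tool developed by Google and an important part of the TensorFlow ecosystem. It presents data from the training process (loss, accuracy, network structure, and so on) as charts and graphs, helping developers understand training dynamics, debug problems, and optimize models.

Main features (https://www.tensorflow.org/tensorboard?hl=zh-cn)
Track and visualize metrics such as loss and accuracy
Visualize the model graph (ops and layers)
View histograms of weights, biases, or other tensors as they change over time
Project embeddings to a lower-dimensional space
Display image, text, and audio data

https://www.tensorflow.org/tensorboard/get_started?hl=zh-cn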
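Two of the features listed above, scalar metrics and weight histograms, can be logged from PyTorch with `SummaryWriter.add_scalar` and `SummaryWriter.add_histogram`. The following is a minimal sketch: the temporary log directory, the tag names, and the small `nn.Linear` layer are placeholders for illustration, not part of the course code.

```python
import os
import tempfile

import torch.nn as nn
from torch.utils.tensorboard import SummaryWriter

log_dir = tempfile.mkdtemp()   # hypothetical throwaway log directory
demo_writer = SummaryWriter(log_dir)
layer = nn.Linear(8, 4)        # stand-in for a layer of a real model

for step in range(3):
    # A scalar curve: one value per step (fake loss values here)
    demo_writer.add_scalar("demo/loss", 1.0 / (step + 1), step)
    # A histogram of the layer's weights at this step
    demo_writer.add_histogram("demo/weight", layer.weight, step)
demo_writer.close()

event_files = os.listdir(log_dir)
print(event_files)  # one events.out.tfevents.* file
```

Pointing `tensorboard --logdir` at that directory should then show the curve under the Scalars tab and the weight distribution under Histograms.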

In [9]:

from torch.utils.tensorboard import SummaryWriter
# Create a SummaryWriter and set the directory where logs are saved
writer = SummaryWriter('./my_experiment')
def train(epochs):
    model.train()
    for epoch in range(epochs):
        start_time = time.time()
        running_loss = 0.0
        correct = 0
        total = 0
        
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
            
            # Compute accuracy
            _, predicted = output.max(1)
            total += target.size(0)
            correct += predicted.eq(target).sum().item()
        
        end_time = time.time()
        avg_loss = running_loss / len(train_loader)
        accuracy = 100. * correct / total
        
        # Log training loss and accuracy to TensorBoard
        writer.add_scalar('Training Loss', avg_loss, epoch)
        writer.add_scalar('Training Accuracy', accuracy, epoch)
        # To overlay several curves on one chart, use add_scalars, e.g. with a
        # test loss computed elsewhere (the name test_loss is a placeholder):
        # writer.add_scalars('Loss Comparison', {'Train': avg_loss, 'Test': test_loss}, epoch)

        print(f'Epoch {epoch+1}/{epochs}, loss: {avg_loss:.4f}, accuracy: {accuracy:.2f}%, time: {end_time-start_time:.2f}s')

    return accuracy

In [10]:
# Note: model is not re-initialized here, so this continues from the weights trained above
print("Start training the CNN model...")
train(epochs=20)  # the dataset is small, so train for more epochs
print("\nStart evaluating the model...")
test()

Start training the CNN model...
Epoch 1/20, loss: 1.1662, accuracy: 56.58%, time: 0.22s
Epoch 2/20, loss: 1.1116, accuracy: 58.11%, time: 0.15s
Epoch 3/20, loss: 1.1313, accuracy: 59.50%, time: 0.13s
Epoch 4/20, loss: 1.1299, accuracy: 58.66%, time: 0.14s
Epoch 5/20, loss: 1.0866, accuracy: 60.54%, time: 0.14s
Epoch 6/20, loss: 1.0741, accuracy: 61.52%, time: 0.13s
Epoch 7/20, loss: 1.0169, accuracy: 62.91%, time: 0.13s
Epoch 8/20, loss: 1.0083, accuracy: 62.56%, time: 0.13s
Epoch 9/20, loss: 1.0287, accuracy: 62.49%, time: 0.13s
Epoch 10/20, loss: 1.0112, accuracy: 64.16%, time: 0.13s
Epoch 11/20, loss: 0.9769, accuracy: 65.48%, time: 0.13s
Epoch 12/20, loss: 0.9608, accuracy: 65.97%, time: 0.13s
Epoch 13/20, loss: 0.9814, accuracy: 64.02%, time: 0.13s
Epoch 14/20, loss: 0.9657, accuracy: 66.04%, time: 0.13s
Epoch 15/20, loss: 0.9621, accuracy: 65.48%, time: 0.13s
Epoch 16/20, loss: 0.9175, accuracy: 66.46%, time: 0.13s
Epoch 17/20, loss: 0.9229, accuracy: 65.83%, time: 0.13s
Epoch 18/20, loss: 0.9255, accuracy: 66.25%, time: 0.13s
Epoch 19/20, loss: 0.8892, accuracy: 66.32%, time: 0.16s
Epoch 20/20, loss: 0.8884, accuracy: 68.48%, time: 0.13s

Start evaluating the model...
Test set: average loss: 0.3668, accuracy: 3

In [12]:
!tensorboard --logdir=./my_experiment

TensorFlow installation not found - running with reduced feature set.
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.19.0 at http://localhost:6008/ (Press CTRL+C to quit)
^C


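In-class exercises:
1. Use TensorBoard to compare the results of different learning rates
2. Log the test results as well and include them in the analysis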
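One way to approach both exercises, sketched under assumptions: give each learning rate its own log subdirectory so TensorBoard overlays the runs, and log a test loss next to the training loss. The synthetic data, the tiny `nn.Linear` model, the tag names, and the temporary log root below are all placeholders; in the notebook they would be replaced by the real loaders and model.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.tensorboard import SummaryWriter

torch.manual_seed(0)
# Synthetic stand-in data (assumption): 200 samples, 64 features, 10 classes
X = torch.rand(200, 64)
y = torch.randint(0, 10, (200,))
X_tr, y_tr, X_te, y_te = X[:160], y[:160], X[160:], y[160:]

root = tempfile.mkdtemp()  # hypothetical log root; pass it to --logdir
loss_fn = nn.CrossEntropyLoss()

for lr in (0.1, 0.01, 0.001):
    net = nn.Linear(64, 10)  # tiny stand-in model, re-created for every run
    opt = optim.Adam(net.parameters(), lr=lr)
    # One subdirectory per run: TensorBoard draws each as a separate curve
    run_writer = SummaryWriter(os.path.join(root, f"lr_{lr}"))
    for epoch in range(5):
        net.train()
        opt.zero_grad()
        train_loss = loss_fn(net(X_tr), y_tr)
        train_loss.backward()
        opt.step()

        net.eval()
        with torch.no_grad():
            test_loss = loss_fn(net(X_te), y_te)

        run_writer.add_scalar("Loss/train", train_loss.item(), epoch)
        run_writer.add_scalar("Loss/test", test_loss.item(), epoch)
    run_writer.close()

runs = sorted(os.listdir(root))
print(runs)  # ['lr_0.001', 'lr_0.01', 'lr_0.1']
```

Re-creating the model for every learning rate matters here; reusing one model would make later runs start from already-trained weights and the comparison would not be fair.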


TASK: Saving & visualizing the model

In [None]:
# Save the entire model (architecture + weights, via pickle)
torch.save(model, 'model.pth')
# Save only the parameters (state_dict)
torch.save(model.state_dict(), 'model_weights.pth')
# The saved file can be visualized at https://netron.app/
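A `state_dict` file holds only the parameters, so loading it requires first building a model with the same architecture. The round trip below is a sketch using a small stand-in `nn.Linear` and a temporary path (both assumptions), rather than the notebook's `CNN`.

```python
import os
import tempfile

import torch
import torch.nn as nn

net = nn.Linear(64, 10)  # stand-in for the trained model
path = os.path.join(tempfile.mkdtemp(), "model_weights.pth")  # throwaway path
torch.save(net.state_dict(), path)

# Rebuild the same architecture, then load the saved parameters into it
restored = nn.Linear(64, 10)
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()  # switch to inference mode (matters for dropout layers)

x = torch.rand(3, 64)
same = torch.allclose(net(x), restored(x))
print(same)  # True
```

Saving the `state_dict` is generally the more portable of the two options, since a whole-model pickle depends on the class definition being importable at load time.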

TASK: Custom loss. Suppose I want to raise the accuracy on label = 1 and lower the accuracy on label = 9. How should this be implemented?

In [None]:
class CustomLoss(nn.Module):
    def __init__(self, lambda_1=1.5, lambda_9=-0.5):
        super(CustomLoss, self).__init__()
        pass
        

        
    def forward(self, outputs, targets):
        pass
        

criterion = CustomLoss(lambda_1=2.0, lambda_9=0.01)  # adjust the weighting parameters
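One possible direction for this task, offered as a sketch rather than the intended answer, is a per-class weighted negative log-likelihood: samples with label 1 count more, samples with label 9 count less. The class name `WeightedNLLSketch` and the `num_classes` argument are assumptions; the `lambda_1` and `lambda_9` names follow the stub above.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedNLLSketch(nn.Module):
    """Hypothetical sketch: NLL loss with a per-class weight on each sample."""

    def __init__(self, lambda_1=2.0, lambda_9=0.01, num_classes=10):
        super().__init__()
        weights = torch.ones(num_classes)
        weights[1] = lambda_1  # errors on label 1 cost more
        weights[9] = lambda_9  # errors on label 9 cost less
        # A buffer moves with the module on .to(device) but is not trained
        self.register_buffer("weights", weights)

    def forward(self, outputs, targets):
        # outputs are log-probabilities, as produced by log_softmax
        per_sample = F.nll_loss(outputs, targets, reduction="none")
        return (self.weights[targets] * per_sample).mean()

# Uniform predictions: every class has probability 0.1
log_probs = torch.full((3, 10), math.log(0.1))
targets = torch.tensor([1, 9, 0])
loss = WeightedNLLSketch()(log_probs, targets)
expected = math.log(10) * (2.0 + 0.01 + 1.0) / 3
print(abs(loss.item() - expected) < 1e-5)  # True
```

A small positive `lambda_9` makes the model care less about label 9. A negative value, as in the stub's default, would reward mistakes on label 9 and can make the loss unbounded below, so it should be used with care.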
