# TensorBoard Practice
Practice using TensorBoard with PyTorch to log scalar metrics, the network structure, images, and other data.

## Producing TensorBoard logs in PyTorch
In PyTorch, `torch.utils.tensorboard.SummaryWriter` is the main tool for writing the log data that TensorBoard needs to disk.

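- `add_scalar` logs a single scalar
- `add_scalars` logs several scalars
- `add_histogram` logs a histogram
- `add_image` logs a single image
- `add_images` logs a batch of images
- `add_graph` logs graph data (can be used to draw the network structure)
- For more methods, see the [documentation](https://pytorch.org/docs/stable/tensorboard.html)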
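
As a rough illustration of the first few methods in the list, the sketch below writes a handful of values to a temporary directory. The tag names (`demo/loss`, `demo/curves`, `demo/weights`) and the logged values are made up for the example, and it assumes the `tensorboard` package is installed alongside PyTorch.

```python
import math
import os
import tempfile

import torch
from torch.utils.tensorboard import SummaryWriter

# Write to a throwaway directory so the example leaves nothing behind in ./runs
log_dir = tempfile.mkdtemp()
writer = SummaryWriter(log_dir)

for step in range(5):
    # One scalar per call: tag, value, global step
    writer.add_scalar("demo/loss", 1.0 / (step + 1), step)
    # Several scalars under one main tag, drawn on the same chart
    writer.add_scalars("demo/curves", {"sin": math.sin(step), "cos": math.cos(step)}, step)
    # Distribution of a tensor's values at this step
    writer.add_histogram("demo/weights", torch.randn(1000), step)

writer.close()

# The main event file lands directly in log_dir
event_files = [name for name in os.listdir(log_dir) if name.startswith("events.out.tfevents")]
```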

In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
import torchvision
from torchvision import datasets, models, transforms
from tqdm.notebook import tqdm
from datetime import datetime

Initialize the `SummaryWriter`. You can pass a `log_dir`; if you don't, the logs go to `./runs/CURRENT_DATETIME_HOSTNAME`.

In [15]:
now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
log_dir = f"./logs/fashion-mnist-resnet50-{now}"
log_dir

'./logs/fashion-mnist-resnet50-2022-04-10 15:23:59'
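
The timestamp format used here puts a space and colons into the directory name. That works on Linux, but Windows does not allow `:` in file names. If the notebook needs to run there too, a format such as the one sketched below avoids the problem; `safe_log_dir` is a hypothetical helper name, not part of any library.

```python
from datetime import datetime


def safe_log_dir(prefix: str, when: datetime) -> str:
    """Build a log directory name without spaces or colons."""
    stamp = when.strftime("%Y-%m-%d_%H-%M-%S")
    return f"./logs/{prefix}-{stamp}"


example = safe_log_dir("fashion-mnist-resnet50", datetime(2022, 4, 10, 15, 23, 59))
print(example)  # ./logs/fashion-mnist-resnet50-2022-04-10_15-23-59
```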

In [3]:
writer = SummaryWriter(log_dir)

In [4]:
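# Some hyperparameters
batch_size = 128
# Fall back to the CPU when no GPU is available
device = "cuda:0" if torch.cuda.is_available() else "cpu"
lr = 1e-3
epochs = 10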

## Load the dataset and model

For simplicity, this uses a dataset and a model built into `torchvision`.

### Load the FashionMNIST dataset and initialize the DataLoaders

In [5]:
# Convert the PIL image to a tensor and normalize it
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
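
`ToTensor` scales pixel values to the range [0, 1], and `Normalize((0.5,), (0.5,))` then computes `(x - mean) / std` per channel, which maps that range to [-1, 1]. The plain-Python sketch below reproduces only that arithmetic; `normalize` is an illustrative helper, not the torchvision implementation.

```python
def normalize(x: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Apply the same per-value formula as transforms.Normalize."""
    return (x - mean) / std


# Black, mid-gray, and white pixels after ToTensor
for value in (0.0, 0.5, 1.0):
    print(value, "->", normalize(value))
```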

In [6]:
training_data = datasets.FashionMNIST(
    root="./FashionMNIST",
    train=True,
    download=True,
    transform=transform
)
test_data = datasets.FashionMNIST(
    root="./FashionMNIST",
    train=False,
    download=True,
    transform=transform
)

In [7]:
# shuffle=True reshuffles the dataset before every epoch; the test set does not need shuffling
training_dataloader = DataLoader(training_data, batch_size=batch_size, shuffle=True, drop_last=True)
test_dataloader = DataLoader(test_data, batch_size=batch_size, drop_last=True)
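
With `drop_last=True` the final partial batch is discarded, so the number of batches per epoch is the integer quotient of dataset size and batch size. The sketch below works through the arithmetic for FashionMNIST's 60,000 training and 10,000 test images; `batches_and_dropped` is a name invented for this example.

```python
def batches_and_dropped(num_samples: int, batch_size: int) -> tuple:
    """Return (full batches, samples dropped) for a DataLoader with drop_last=True."""
    return num_samples // batch_size, num_samples % batch_size


train_batches, train_dropped = batches_and_dropped(60_000, 128)
test_batches, test_dropped = batches_and_dropped(10_000, 128)
print(train_batches, train_dropped)  # 468 96
print(test_batches, test_dropped)    # 78 16
```

On the test set this means 16 images are never evaluated, so an accuracy computed there should be divided by the 9,984 samples actually seen rather than by 10,000.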

Manually take one batch of data and add it to TensorBoard

In [8]:
images, labels = next(iter(training_dataloader))
images.size()

torch.Size([128, 1, 28, 28])

In [9]:
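# Tile the batch of 128 images into one grid and add it to TensorBoard
# normalize=True rescales the [-1, 1] pixel values into [0, 1], the range add_image expects for float tensors
grid = torchvision.utils.make_grid(images, normalize=True)
writer.add_image('Training Images', grid, 0)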
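
By default `make_grid` places 8 images per row with 2 pixels of padding, and it expands single-channel images to 3 channels. Assuming those defaults, the grid size for this batch can be worked out as below; `grid_shape` is a hypothetical helper that mirrors the layout arithmetic rather than calling torchvision.

```python
import math


def grid_shape(num_images: int, height: int, width: int, nrow: int = 8, padding: int = 2) -> tuple:
    """Shape (C, H, W) of the grid tensor for single-channel input images."""
    cols = min(nrow, num_images)
    rows = math.ceil(num_images / cols)
    # Each cell is the image plus padding, with one extra padding strip at the border
    return 3, rows * (height + padding) + padding, cols * (width + padding) + padding


print(grid_shape(128, 28, 28))  # (3, 482, 242)
```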

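### Initialize the model

This uses the ResNet-18 built into `torchvision`. ResNet expects three-channel RGB images, while FashionMNIST consists of single-channel grayscale images, so the input channel count of the model's first convolutional layer has to be changed to 1.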

In [10]:
model = models.resnet18(
    pretrained=False,
    progress=False,
    num_classes=10,
)
# Change the input channels of the first convolutional layer
model.conv1 = torch.nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
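
The replacement layer keeps the kernel size, stride, and padding of ResNet's original first convolution and only changes the input channels from 3 to 1. Since the network is usually applied to 224×224 inputs, it can help to check how quickly a 28×28 image shrinks. The sketch below applies the standard convolution output-size formula to the downsampling stages of torchvision's ResNet (the stage list is written out by hand here as an assumption, not read from the model).

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution or pooling layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) for each layer that reduces resolution
stages = [
    ("conv1", 7, 2, 3),
    ("maxpool", 3, 2, 1),
    ("layer2", 3, 2, 1),
    ("layer3", 3, 2, 1),
    ("layer4", 3, 2, 1),
]

size = 28
sizes = []
for name, kernel, stride, padding in stages:
    size = conv_out(size, kernel, stride, padding)
    sizes.append(size)
    print(name, size)
```

Under these assumptions the feature map is already 1×1 before the global average pooling layer, which is workable for this exercise but leaves little spatial information in the later stages.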

Write the model structure to TensorBoard. `add_graph` needs the model itself and an example input; the batch taken above serves as the input.

In [11]:
writer.add_graph(model, images)

## Train the model

In [12]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
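
`nn.CrossEntropyLoss` expects raw logits and integer class labels; it applies log-softmax internally and, by default, averages the negative log-likelihood over the batch. A plain-Python sketch of that computation for a single sample follows; `cross_entropy` here is an illustrative function, not the PyTorch one.

```python
import math


def cross_entropy(logits: list, target: int) -> float:
    """Negative log-softmax probability of the target class."""
    # Subtract the max for numerical stability before exponentiating
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[target]


# With equal logits over 10 classes, the loss is ln(10)
uniform_loss = cross_entropy([0.0] * 10, target=3)
print(round(uniform_loss, 4))  # 2.3026
```

A 10-class classifier that has not learned anything yet should therefore start at a loss of roughly 2.3.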

In [13]:
def train_loop(
    model: nn.Module,
    dataloader: DataLoader,
    loss_fn,
    optimizer,
    device: str,
    writer: SummaryWriter,
    epoch: int
    ):
    num_batches = len(dataloader)
    # Training mode: BatchNorm layers use per-batch statistics
    model.train()
    loop = tqdm(enumerate(dataloader), total=num_batches, leave=False)
    # epochs is the global hyperparameter defined above
    loop.set_description(f'Epoch [{epoch + 1}/{epochs}]')
    for batch, (X, y) in loop:
        X, y = X.to(device), y.to(device)
        # Forward pass and loss computation
        pred = model(X)
        loss = loss_fn(pred, y)
        # Backward pass and parameter update
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Show acc and loss on the progress bar
        acc = 100 * (pred.argmax(1) == y).type(torch.float).sum().item() / X.size()[0]
        loop.set_postfix(loss=loss.item(), acc=acc)
        # Write to TensorBoard every 100 batches, using the global batch index as the step
        if batch % 100 == 0:
            writer.add_scalar("Loss/train", loss.item(), epoch * num_batches + batch)
            writer.add_scalar("Acc/train", acc, epoch * num_batches + batch)
        

def test_loop(model: nn.Module, dataloader: DataLoader, loss_fn, device: str, writer: SummaryWriter, epoch):
    num_batches = len(dataloader)
    test_loss, correct, seen = 0, 0, 0

    # Evaluation mode: BatchNorm layers use their running statistics
    model.eval()
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
            seen += y.size(0)

    test_loss /= num_batches
    # Divide by the samples actually evaluated; drop_last=True skips the final partial batch
    correct /= seen
    # After each epoch, log the test Loss and Acc to TensorBoard once
    writer.add_scalar("Loss/test", test_loss, epoch)
    writer.add_scalar("Acc/test", 100 * correct, epoch)
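
The training scalars use `epoch * num_batches + batch` as the step so that points from successive epochs line up on one continuous x-axis, while the test scalars use the epoch index. The sketch below lists the steps that get logged with 468 batches per epoch and one log every 100 batches; `logged_steps` is a name made up for the example.

```python
def logged_steps(epochs: int, num_batches: int, every: int = 100) -> list:
    """Global steps at which the training loop writes a scalar."""
    return [
        epoch * num_batches + batch
        for epoch in range(epochs)
        for batch in range(num_batches)
        if batch % every == 0
    ]


steps = logged_steps(epochs=2, num_batches=468)
print(steps)  # [0, 100, 200, 300, 400, 468, 568, 668, 768, 868]
```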

In [14]:
model.to(device)
for t in range(epochs):
    train_loop(model, training_dataloader, loss_fn, optimizer, device, writer, t)
    test_loop(model, test_dataloader, loss_fn, device, writer, t)
    torch.save(model.state_dict(), f"{log_dir}/epoch{t}.pth")
# Flush pending events to disk and close the log file
writer.close()
print("Done!")

Done!
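
Each epoch saves only the model's `state_dict`, so restoring a checkpoint means rebuilding the same architecture first and then loading the weights into it. A minimal sketch of that round trip, using a small stand-in `nn.Linear` model and a temporary file instead of the ResNet and `log_dir` from this notebook:

```python
import os
import tempfile

import torch
from torch import nn

# Stand-in model; in the notebook this would be the modified ResNet
model = nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), "epoch0.pth")
torch.save(model.state_dict(), path)

# Rebuild the same architecture, then load the saved parameters
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()  # switch to evaluation mode before inference

same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)  # True
```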
