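Batch normalization was proposed precisely to address the challenges of training deep models. During training, it uses the mean and standard deviation of each mini-batch to continually adjust the intermediate outputs of the network, so that these outputs stay numerically more stable at every layer. **Batch normalization and the residual networks introduced in the next section offer two important ideas for training and designing deep models.**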
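As a rough illustration of the idea above, the following sketch normalizes a mini-batch by hand. It assumes the standard formulation (subtract the mini-batch mean, then divide by the square root of the mini-batch variance plus a small `eps`). The helper name `batch_norm_sketch` and the `eps` value are hypothetical choices made for illustration, and the learnable scale and shift parameters are left out.

```python
import torch

def batch_norm_sketch(X, eps=1e-5):
    """Hypothetical helper: normalize X with mini-batch statistics (no scale/shift)."""
    if X.dim() == 2:
        # Fully connected output (batch, features): one statistic per feature.
        dims = (0,)
    else:
        # Convolutional output (batch, channels, H, W): one statistic per channel.
        dims = (0, 2, 3)
    mean = X.mean(dim=dims, keepdim=True)
    var = ((X - mean) ** 2).mean(dim=dims, keepdim=True)  # biased variance
    return (X - mean) / torch.sqrt(var + eps)

torch.manual_seed(0)
fc_out = batch_norm_sketch(torch.randn(8, 4) * 5 + 3)
conv_out = batch_norm_sketch(torch.randn(8, 3, 6, 6) * 5 + 3)

print(fc_out.mean(dim=0))             # close to 0 for every feature
print(conv_out.mean(dim=(0, 2, 3)))   # close to 0 for every channel
```

Whatever the scale and offset of the input, the normalized output has roughly zero mean and unit variance per feature or per channel.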

In [2]:
import torch
import sys
sys.path.append("..")  # let Python find modules in the parent directory (presumably where dl_utils lives)

In [3]:
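# Per-channel mean of a (batch, channels, height, width) tensor; the result has shape (1, 3, 1, 1)
X = torch.randn(2, 3, 32, 32)
mean = X.mean(dim=0, keepdim=True).mean(dim=2, keepdim=True).mean(dim=3, keepdim=True)

# Small 2-D example: averaging over dim 0 and then dim 1 gives the overall mean
X = torch.tensor([[1, 3, 3, 4], [1, 2, 3, 4.0], [1, 3, 3, 4]])
print(X.mean(dim=0, keepdim=True).mean(dim=1, keepdim=True))
print(X.mean(dim=0))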

tensor([[2.6667]])
tensor([1.0000, 2.6667, 3.0000, 4.0000])
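
Averaging one dimension at a time works here because every slice being averaged has the same number of elements, so the chained `mean` calls match a single mean over all the reduced dimensions. A minimal sketch of that equivalence, assuming a PyTorch version whose `mean` accepts a tuple of dimensions:

```python
import torch

torch.manual_seed(0)
X = torch.randn(2, 3, 32, 32)  # (batch, channels, height, width)

# Reduce batch, height and width one dimension at a time.
chained = X.mean(dim=0, keepdim=True).mean(dim=2, keepdim=True).mean(dim=3, keepdim=True)
# Reduce the same three dimensions in a single call.
single = X.mean(dim=(0, 2, 3), keepdim=True)

print(chained.shape)  # one mean per channel, kept broadcastable against X
print(torch.allclose(chained, single, atol=1e-6))
```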


# PyTorch BatchNorm

In [4]:
import torch.nn as nn
import dl_utils
net = nn.Sequential(
    nn.Conv2d(1, 6, 5),        # in_channels, out_channels, kernel_size
    nn.BatchNorm2d(6),         # num_features = number of channels
    nn.Sigmoid(),
    nn.MaxPool2d(2, 2),        # kernel_size, stride
    nn.Conv2d(6, 16, 5),
    nn.BatchNorm2d(16),
    nn.Sigmoid(),
    nn.MaxPool2d(2, 2),
    dl_utils.FlattenLayer(),
    nn.Linear(16*4*4, 120),
    nn.BatchNorm1d(120),       # num_features = number of outputs of the preceding layer
    nn.Sigmoid(),
    nn.Linear(120, 84),
    nn.BatchNorm1d(84),
    nn.Sigmoid(),
    nn.Linear(84, 10)
)
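
The `16*4*4` input size of the first `nn.Linear` can be checked by tracing shapes through the convolutional part. The sketch below assumes 28×28 single-channel inputs (the Fashion-MNIST image size) and uses `nn.Flatten` as a stand-in for `dl_utils.FlattenLayer`, which is assumed here to flatten everything except the batch dimension.

```python
import torch
import torch.nn as nn

# Convolutional part of the network, with nn.Flatten standing in for dl_utils.FlattenLayer.
conv_part = nn.Sequential(
    nn.Conv2d(1, 6, 5), nn.BatchNorm2d(6), nn.Sigmoid(), nn.MaxPool2d(2, 2),
    nn.Conv2d(6, 16, 5), nn.BatchNorm2d(16), nn.Sigmoid(), nn.MaxPool2d(2, 2),
    nn.Flatten(),
)

X = torch.rand(2, 1, 28, 28)  # assumed input: batch of 2 grayscale 28x28 images
for layer in conv_part:
    X = layer(X)
    print(layer.__class__.__name__, tuple(X.shape))
```

Each 5×5 convolution shrinks the side length by 4 and each pooling layer halves it (28 → 24 → 12 → 8 → 4), leaving 16 channels of 4×4 values per image.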

In [10]:
batch_size = 256
train_iter, test_iter = dl_utils.load_data_fashion_mnist(batch_size=batch_size)
device = 'cuda' if torch.cuda.is_available() else 'cpu'  # fall back to the CPU when no GPU is available

torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.enabled = True

lr, num_epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
dl_utils.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)

training on  cuda
epoch 1, loss 0.2503, train acc 0.909, test acc 0.838, time 9.7 sec
epoch 2, loss 0.1216, train acc 0.912, test acc 0.821, time 8.9 sec
epoch 3, loss 0.0789, train acc 0.914, test acc 0.704, time 9.1 sec
epoch 4, loss 0.0577, train acc 0.917, test acc 0.842, time 9.0 sec
epoch 5, loss 0.0451, train acc 0.919, test acc 0.686, time 8.7 sec


training on  cuda
epoch 1, loss 0.2957, train acc 0.897, test acc 0.835, time 9.1 sec
epoch 2, loss 0.1415, train acc 0.900, test acc 0.802, time 9.1 sec
epoch 3, loss 0.0904, train acc 0.904, test acc 0.857, time 9.1 sec
epoch 4, loss 0.0660, train acc 0.906, test acc 0.801, time 9.2 sec
epoch 5, loss 0.0514, train acc 0.908, test acc 0.805, time 9.4 sec

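## Summary

- During training, batch normalization uses the mean and standard deviation of each mini-batch to continually adjust the network's intermediate outputs, keeping them numerically more stable at every layer.
- Batch normalization is applied slightly differently to fully connected and convolutional layers: statistics are computed per feature for the former and per channel for the latter.
- Like a dropout layer, a batch normalization layer computes different results in training mode and in prediction mode.
- PyTorch provides BatchNorm classes, such as `nn.BatchNorm1d` and `nn.BatchNorm2d`, for convenient use.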
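
The difference between training mode and prediction mode can be seen with a small sketch. It relies on the default behaviour of `nn.BatchNorm1d`: in training mode the layer normalizes with the statistics of the current mini-batch and updates its running estimates, while in evaluation mode it normalizes with those running estimates. The input values are arbitrary and chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)
X = torch.randn(4, 3) * 2 + 5  # mini-batch whose features are far from zero mean

bn.train()
out_train = bn(X)  # normalized with this mini-batch's mean and variance
bn.eval()
out_eval = bn(X)   # normalized with the running mean and variance

print(out_train.mean(dim=0))  # close to 0 for every feature
print(out_eval.mean(dim=0))   # generally not 0: running estimates differ from batch statistics
print(torch.allclose(out_train, out_eval))
```

This is why a model containing batch normalization layers is switched with `net.train()` and `net.eval()` before training and evaluation respectively.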