# Dropout

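One idea is to inject noise in an unbiased way, so that with the other layers held fixed, the expected value of each layer equals the value it would take without noise.
In Bishop's work, Gaussian noise was added to the inputs of a linear model. At each training iteration, noise sampled from a zero-mean distribution $\epsilon \sim \mathcal{N}(0, \sigma^2)$ is added to the input $x$, yielding a perturbed point $x' = x + \epsilon$ whose expectation is $E[x'] = x$.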
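
The unbiasedness claim $E[x'] = x$ can be checked numerically. The following is a minimal sketch using only the Python standard library; the values of `x` and `sigma` are arbitrary choices made for illustration and do not come from the original work.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

x = 3.0       # hypothetical input value
sigma = 0.5   # hypothetical noise standard deviation

# Draw many perturbed points x' = x + eps with eps ~ N(0, sigma^2).
samples = [x + random.gauss(0.0, sigma) for _ in range(100_000)]

mean_perturbed = statistics.fmean(samples)
std_perturbed = statistics.pstdev(samples)

print(f"mean of x': {mean_perturbed:.3f}")
print(f"std of x':  {std_perturbed:.3f}")
```

The sample mean should land close to `x` and the sample standard deviation close to `sigma`: the noise spreads the inputs out without shifting their average.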

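In standard dropout regularization, the bias of each layer is removed by normalizing by the fraction of nodes that were retained (not dropped). In other words, each intermediate activation $h$ is replaced, with dropout probability $p$, by a random variable $h'$ defined as follows: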

$$h' = \begin{cases}
0 & \text{with probability } p \\
\dfrac{h}{1-p} & \text{otherwise}
\end{cases}$$
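
With this definition the expectation is unchanged, since $E[h'] = p \cdot 0 + (1-p) \cdot \frac{h}{1-p} = h$. A rough simulation of a single activation illustrates this. The helper `dropout_scalar` and the values of `h` and `p` are hypothetical, introduced only for this sketch.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def dropout_scalar(h, p):
    """Return 0 with probability p, otherwise h / (1 - p). Assumes 0 <= p < 1."""
    if random.random() < p:
        return 0.0
    return h / (1.0 - p)

h, p = 2.0, 0.3  # hypothetical activation and dropout probability
n = 200_000
samples = [dropout_scalar(h, p) for _ in range(n)]

empirical_mean = sum(samples) / n
dropped_fraction = samples.count(0.0) / n

print(f"empirical mean of h': {empirical_mean:.3f} (h = {h})")
print(f"fraction dropped:     {dropped_fraction:.3f} (p = {p})")
```

The empirical mean should sit near `h` and the dropped fraction near `p`. The division by $1-p$ is what compensates for the zeroed samples.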

![title](attachment/dropout.png)

## Implementation from Scratch

In [12]:
import torch
import torch.nn as nn
from d2l import torch as d2l

In [2]:

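def dropout_layer(X, dropout):
    """Zero each element of X with probability `dropout` and rescale the rest."""
    assert 0 <= dropout <= 1
    # In this case, all elements are dropped
    if dropout == 1:
        return torch.zeros_like(X)
    # In this case, all elements are kept
    if dropout == 0:
        return X
    # Keep an element where a uniform sample exceeds the dropout probability
    mask = (torch.rand_like(X) > dropout).float()
    # Rescale the survivors so the expected value is unchanged
    return mask * X / (1.0 - dropout)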

In [3]:
X = torch.arange(16, dtype=torch.float32).reshape((2, 8))
print(dropout_layer(X, 0.))
print(dropout_layer(X, 0.5))
print(dropout_layer(X, 1.))

tensor([[ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11., 12., 13., 14., 15.]])
tensor([[ 0.,  0.,  0.,  6.,  8., 10.,  0., 14.],
        [ 0.,  0.,  0.,  0., 24.,  0.,  0.,  0.]])
tensor([[0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0.]])
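
The middle output above shows that each surviving entry is doubled when the dropout probability is 0.5. The sketch below checks this property and also averages many independent draws to see whether they approach the original tensor. `dropout_layer` is repeated so the cell runs on its own; the seed and the number of draws (5000) are arbitrary assumptions.

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is reproducible

def dropout_layer(X, dropout):
    # Same logic as the function defined earlier, repeated for self-containment.
    assert 0 <= dropout <= 1
    if dropout == 1:
        return torch.zeros_like(X)
    if dropout == 0:
        return X
    mask = (torch.rand_like(X) > dropout).float()
    return mask * X / (1.0 - dropout)

p = 0.5
X = torch.arange(16, dtype=torch.float32).reshape((2, 8))
Y = dropout_layer(X, p)

# Every output entry is either zero (dropped) or the input scaled by 1 / (1 - p).
is_zero = Y == 0
is_scaled = torch.isclose(Y, X / (1.0 - p))
all_valid = bool((is_zero | is_scaled).all())

# Averaging many independent draws should approximately recover X.
draws = torch.stack([dropout_layer(X, p) for _ in range(5000)])
mean_estimate = draws.mean(dim=0)
max_abs_error = (mean_estimate - X).abs().max().item()

print(all_valid)
print(max_abs_error)
```

A single draw looks quite different from `X`, but the average over draws stays close to it, which is the unbiasedness property described at the start of this section.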


## Model

In [4]:
num_inputs, num_outputs, num_hiddens1, num_hiddens2 = 784, 10, 256, 256

In [5]:
net = nn.Module()

In [9]:
dropout1, dropout2 = 0.2, 0.5

class Net(nn.Module):
    def __init__(self, num_inputs, num_outputs, num_hiddens1, num_hiddens2,
                 is_training=True):
        super().__init__()
        self.num_inputs = num_inputs
        # nn.Module already defines `training`; net.train() and net.eval() also toggle it
        self.training = is_training
        self.lin1 = nn.Linear(num_inputs, num_hiddens1)
        self.lin2 = nn.Linear(num_hiddens1, num_hiddens2)
        self.lin3 = nn.Linear(num_hiddens2, num_outputs)
        self.relu = nn.ReLU()

    def forward(self, X):
        H1 = self.relu(self.lin1(X.reshape((-1, self.num_inputs))))
        # Apply dropout only while training the model
        if self.training:
            # Add a dropout layer after the first fully connected layer
            H1 = dropout_layer(H1, dropout1)
        H2 = self.relu(self.lin2(H1))
        if self.training:
            # Add a dropout layer after the second fully connected layer
            H2 = dropout_layer(H2, dropout2)
        out = self.lin3(H2)
        return out
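
The model above applies dropout only while `self.training` is true. PyTorch's built-in `nn.Dropout` layer, which this notebook does not use, follows the same convention. The sketch below uses it as a stand-in to show the difference between training and evaluation mode; the input tensor and `p=0.5` are assumptions chosen for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed so the sketch is reproducible

drop = nn.Dropout(p=0.5)  # p is the probability of zeroing an element
X = torch.ones(2, 8)

drop.train()              # training mode: elements are zeroed, survivors rescaled
train_out = drop(X)

drop.eval()               # evaluation mode: the layer acts as the identity
eval_out = drop(X)

print(train_out)
print(eval_out)
```

In training mode each entry is either 0 or 2 (that is, 1 / (1 - 0.5)), while in evaluation mode the input passes through unchanged. Calling `net.eval()` on the `Net` class above has a comparable effect, because it sets `self.training` to `False`.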

In [10]:
net = Net(num_inputs, num_outputs, num_hiddens1, num_hiddens2)

In [13]:
num_epochs, lr, batch_size = 10, 0.5, 256
loss = nn.CrossEntropyLoss(reduction='none')
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
trainer = torch.optim.SGD(net.parameters(), lr=lr)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)
