# Use Dropout and Regularization

## Dropout

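Dropout is a regularization technique used in neural networks to prevent overfitting. During training, each neuron's output is independently set to zero ("dropped out") with probability $p$, and a fresh random mask is drawn on every forward pass. This forces the network to learn more robust features and reduces its reliance on any individual neuron. At evaluation time dropout is switched off; PyTorch's `nn.Dropout` scales the surviving activations by $1/(1-p)$ during training so that the expected activation is the same in both modes. The result is better generalization and a model that is less sensitive to noise in its inputs.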
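A minimal sketch of the mechanism, assuming the "inverted dropout" convention that `nn.Dropout` uses (survivors scaled by $1/(1-p)$ in training, identity in evaluation). `inverted_dropout` is a hypothetical helper written for illustration, not a PyTorch function; the second half compares it with `nn.Dropout` in `train()` and `eval()` mode.

```python
import torch
from torch import nn


def inverted_dropout(x: torch.Tensor, p: float, training: bool) -> torch.Tensor:
    """Hypothetical helper: zero each element with probability p during
    training and scale the survivors by 1 / (1 - p); identity otherwise."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return x
    # torch.rand_like samples uniformly from [0, 1); an element is kept when
    # its sample is >= p, which happens with probability 1 - p
    keep = (torch.rand_like(x) >= p).to(x.dtype)
    return x * keep / (1.0 - p)


torch.manual_seed(0)
x = torch.ones(4, 8)

y_manual = inverted_dropout(x, p=0.5, training=True)  # entries are 0.0 or 2.0

drop = nn.Dropout(p=0.5)
drop.train()
y_torch = drop(x)                                     # same convention: 0.0 or 2.0

drop.eval()
y_eval = drop(x)                                      # identity at evaluation time

print(y_manual)
print(y_torch)
print(torch.equal(y_eval, x))                         # True
```

Because the rescaling happens during training, nothing has to change at inference time beyond calling `model.eval()`.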

```python
# Use a simple BP (back-propagation) network as an example
import torch
from torch import nn, optim
```

```python
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.dropout = nn.Dropout(0.5)    # Zero each activation with probability 0.5 (training mode only)
        self.fc2 = nn.Linear(256, 10)
        self.activate = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.view(x.size(0), -1)    # Flatten each image into a 784-dimensional vector
        x = self.fc1(x)
        x = self.dropout(x)
        x = self.activate(x)
        x = self.fc2(x)
        return x
```
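Because dropout behaves differently in training and evaluation, it matters which mode the model is in. The sketch below writes the same layer stack with `nn.Sequential` (an equivalent alternative, not the notebook's own definition) and feeds it random tensors shaped like a batch of 28×28 images; the image size is an assumption inferred from the 784 inputs. In `train()` mode two passes over the same batch differ, while in `eval()` mode they match exactly.

```python
import torch
from torch import nn

# Same layers as Net: Flatten -> Linear(784, 256) -> Dropout(0.5) -> ReLU -> Linear(256, 10)
seq_net = nn.Sequential(
    nn.Flatten(),            # (N, 1, 28, 28) -> (N, 784)
    nn.Linear(784, 256),
    nn.Dropout(0.5),
    nn.ReLU(),
    nn.Linear(256, 10),
)

torch.manual_seed(0)
images = torch.randn(32, 1, 28, 28)  # random stand-in for a batch of images

seq_net.train()              # dropout active: each pass samples a new mask
out_a = seq_net(images)
out_b = seq_net(images)

seq_net.eval()               # dropout disabled: passes are deterministic
with torch.no_grad():
    out_c = seq_net(images)
    out_d = seq_net(images)

print(out_a.shape)                # torch.Size([32, 10])
print(torch.equal(out_a, out_b))
print(torch.equal(out_c, out_d))  # True
```

Forgetting to call `eval()` before validation or inference leaves dropout on and makes predictions random.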

## Regularization

In neural networks, regularization methods are used to prevent overfitting, which occurs when a model performs well on the training data but poorly on new, unseen data. Regularization techniques constrain the network's complexity or modify the learning process to favor simpler models that generalize better. Common regularization methods include L1 and L2 regularization, dropout, early stopping, and data augmentation.

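L1 regularization adds the sum of the absolute values of the weights to the loss. It tends to drive many weights to exactly zero, which makes the model parameters sparse: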

$$
C = C_0 + \frac{\lambda}{n}\sum_{w}|w|,
$$
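PyTorch optimizers have no built-in L1 option, so one common approach is to add the penalty to the loss by hand. The sketch below assumes the penalty covers weight tensors only (biases skipped, matching the sum over $w$); `l1_penalty`, the value of $\lambda$, and the training-set size `n_train` are hypothetical choices for illustration, and the data is random.

```python
import torch
from torch import nn


def l1_penalty(model: nn.Module, lam: float, n: int) -> torch.Tensor:
    """Hypothetical helper: return (lam / n) * sum(|w|) over weight tensors.

    Biases are skipped here (an assumption); include them if your setup
    penalizes biases too."""
    total = sum(
        p.abs().sum() for name, p in model.named_parameters() if name.endswith("weight")
    )
    return (lam / n) * total


# Sanity check on known weights: (1.0 / 2) * (|1| + |-2|) = 1.5
check = nn.Linear(2, 1)
with torch.no_grad():
    check.weight.copy_(torch.tensor([[1.0, -2.0]]))
    check.bias.fill_(3.0)  # ignored by the penalty
print(l1_penalty(check, lam=1.0, n=2).item())  # 1.5

# One training step with C = C_0 + (lambda / n) * sum(|w|)
torch.manual_seed(0)
model = nn.Linear(784, 10)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
inputs = torch.randn(16, 784)             # random stand-in batch
targets = torch.randint(0, 10, (16,))
lam, n_train = 0.01, 60_000               # hypothetical lambda and training-set size

optimizer.zero_grad()
loss = criterion(model(inputs), targets) + l1_penalty(model, lam, n_train)
loss.backward()
optimizer.step()
```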

L2 regularization adds the sum of the squared weights instead. It makes the weights decay toward zero (hence the name *weight decay*), keeping parameter values small but usually not exactly zero:

$$
C = C_0 + \frac{\lambda}{2n} \sum_{w}w^2,
$$


where $C_0$ is the original loss function, $n$ is the number of training samples, $\lambda$ is the regularization coefficient, and each sum runs over all weights $w$ in the network.
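Differentiating the two penalized costs shows where their behavior comes from. This worked step assumes plain gradient descent with a learning rate $\eta$ (a symbol not used elsewhere in this section) and writes $\operatorname{sgn}(w)$ for the (sub)gradient of $|w|$, taking $\operatorname{sgn}(0) = 0$:

$$
\text{L2:}\quad w \leftarrow w - \eta\left(\frac{\partial C_0}{\partial w} + \frac{\lambda}{n} w\right) = \left(1 - \frac{\eta\lambda}{n}\right) w - \eta\frac{\partial C_0}{\partial w},
$$

$$
\text{L1:}\quad w \leftarrow w - \frac{\eta\lambda}{n}\operatorname{sgn}(w) - \eta\frac{\partial C_0}{\partial w}.
$$

L2 multiplies each weight by $1 - \eta\lambda/n$ on every step, so large weights shrink fastest while small ones rarely reach exactly zero. L1 instead subtracts a constant amount $\eta\lambda/n$ toward zero regardless of magnitude, which pushes small weights all the way to zero and explains the sparsity.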

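In PyTorch, L2 regularization is commonly applied through the optimizer's `weight_decay` argument, which plays the role of $\lambda/n$ in the formula above:

```python
model = Net()
# weight_decay adds weight_decay * w to the gradient of every parameter passed in (biases included)
optimizer = optim.Adam(model.parameters(), lr=0.01, weight_decay=0.001)  # Set L2 regularization
```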
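To see that `weight_decay` really is an L2 penalty, the sketch below takes one step with two identical models: one uses the optimizer's `weight_decay`, the other adds $(\text{wd}/2)\sum w^2$ to the loss by hand, whose gradient is the same $\text{wd}\cdot w$. It uses plain `optim.SGD` for a clean, easy-to-verify comparison (a swap from the notebook's Adam); the single linear layer and random data are stand-ins.

```python
import copy

import torch
from torch import nn, optim

torch.manual_seed(0)
inputs = torch.randn(16, 784)            # random stand-in batch
targets = torch.randint(0, 10, (16,))
criterion = nn.CrossEntropyLoss()
wd = 0.001                               # same value as in the notebook cell

model_a = nn.Linear(784, 10)
model_b = copy.deepcopy(model_a)         # identical starting weights

# (a) built-in: the optimizer adds wd * w to each parameter's gradient
opt_a = optim.SGD(model_a.parameters(), lr=0.01, weight_decay=wd)
opt_a.zero_grad()
criterion(model_a(inputs), targets).backward()
opt_a.step()

# (b) manual: add (wd / 2) * sum(w^2) to the loss; its gradient is also wd * w
opt_b = optim.SGD(model_b.parameters(), lr=0.01)
opt_b.zero_grad()
penalty = (wd / 2) * sum((p ** 2).sum() for p in model_b.parameters())
(criterion(model_b(inputs), targets) + penalty).backward()
opt_b.step()

same = all(
    torch.allclose(pa, pb, atol=1e-7)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(same)  # True
```

`optim.Adam` applies `weight_decay` the same way, adding `wd * w` to the gradient before its adaptive rescaling. `optim.AdamW` instead uses decoupled weight decay, which for adaptive optimizers is not equivalent to adding an L2 penalty to the loss.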