# Implementation of Dropout from Scratch

To implement the dropout function for a single layer, we must draw as many samples from a Bernoulli
(binary) random variable as our layer has dimensions, where the random variable takes value 1 (keep) with
probability 1-p and 0 (drop) with probability p. One easy way to implement this is to first draw samples
from the uniform distribution U[0, 1]. Then we can keep those nodes for which the corresponding sample is
greater than p, dropping the rest.
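
As a minimal sketch of this sampling step, the mask can be built by comparing uniform samples against p. The names `p`, `num_units`, `uniform_samples`, and `keep_mask` are illustrative assumptions, not part of the notebook's own code.

```python
import torch

torch.manual_seed(0)  # fixed seed only so the sketch is repeatable

p = 0.3            # drop probability (illustrative value)
num_units = 10000  # hypothetical layer width

# torch.rand draws from U[0, 1); a unit is kept when its sample exceeds p,
# which happens with probability 1 - p.
uniform_samples = torch.rand(num_units)
keep_mask = (uniform_samples > p).float()

# The empirical keep rate should be close to 1 - p.
keep_rate = keep_mask.mean().item()
print(f"keep rate: {keep_rate:.3f} (expected about {1 - p:.1f})")
```

With a large number of units, the printed keep rate should land near 0.7 for this choice of p.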

In the following code, we implement a dropout function that drops out the elements of the input tensor
``X`` with probability ``drop_prob``, rescaling the remainder as described above (dividing the survivors by
``1.0 - drop_prob``).

In [None]:
# import packages
import torch
import torchvision
from torch import nn, optim
import numpy as np
import d2l
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
from IPython import display
import utils

In [None]:
def dropout(X, drop_prob):
    # insert your code here
    raise NotImplementedError  # placeholder so the cell parses until the body is written

We can test out the dropout function on a few examples. In the following lines of code, we pass our input
X through the dropout operation, with probabilities 0, 0.5, and 1, respectively.

In [None]:
X = torch.arange(16, dtype=torch.float32).reshape(2, 8)
# test your code
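
One possible way to fill in and exercise the function is sketched below. It is an illustration rather than the reference solution: the name `dropout_sketch` and the early return for `drop_prob == 1` (which avoids dividing by zero) are assumptions made here.

```python
import torch

def dropout_sketch(X, drop_prob):
    """One possible inverted-dropout implementation (illustrative name)."""
    assert 0 <= drop_prob <= 1
    # Dropping everything: return zeros directly to avoid dividing by zero.
    if drop_prob == 1:
        return torch.zeros_like(X)
    # Keep each element with probability 1 - drop_prob.
    mask = (torch.rand_like(X) > drop_prob).float()
    # Rescale the survivors so each element's expected value is unchanged.
    return mask * X / (1.0 - drop_prob)

torch.manual_seed(0)  # fixed seed only so the sketch is repeatable
X = torch.arange(16, dtype=torch.float32).reshape(2, 8)

out_keep_all = dropout_sketch(X, 0)    # nothing is dropped
out_half = dropout_sketch(X, 0.5)      # survivors are doubled
out_drop_all = dropout_sketch(X, 1)    # everything is dropped

print(out_keep_all)
print(out_half)
print(out_drop_all)
```

With probability 0 the input should come back unchanged, with probability 1 the output is all zeros, and with probability 0.5 each entry is either 0 or twice its original value.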

## Define the model

We again use the Fashion-MNIST dataset and define a multilayer perceptron with
two hidden layers, each with 256 outputs.

In [None]:
class ThreeLayerNet(torch.nn.Module):
    def __init__(self, num_inputs, num_hiddens1, 
                 num_hiddens2, num_outputs):
        """
        In the constructor we instantiate three nn.Linear modules and a ReLU
        activation and assign them as member variables.
        """
        super(ThreeLayerNet, self).__init__()
        self.num_inputs = num_inputs
        self.linear1 = torch.nn.Linear(num_inputs, num_hiddens1)
        self.linear2 = torch.nn.Linear(num_hiddens1, num_hiddens2)
        self.linear3 = torch.nn.Linear(num_hiddens2, num_outputs)
        self.nonlinear_func = torch.nn.ReLU()

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary (differentiable) operations on Tensors.
        """
        h_relu1 = self.nonlinear_func(self.linear1(x.reshape(-1, self.num_inputs)))
        # insert your code here
        h_relu2 = self.nonlinear_func(self.linear2(h_relu1))
        # insert your code here
        y_pred = self.linear3(h_relu2)  # output layer: num_hiddens2 -> num_outputs
        return y_pred
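
For comparison, the sketch below shows one way dropout could be applied after each hidden layer. It swaps in PyTorch's built-in `torch.nn.functional.dropout` instead of the from-scratch function, and the class name `DropoutMLPSketch`, the constructor defaults, and the random stand-in batch are assumptions for illustration only.

```python
import torch
from torch import nn
import torch.nn.functional as F

class DropoutMLPSketch(nn.Module):
    """Hypothetical variant of the MLP with dropout after each hidden layer."""

    def __init__(self, num_inputs=784, num_hiddens1=256, num_hiddens2=256,
                 num_outputs=10, drop_prob1=0.2, drop_prob2=0.5):
        super().__init__()
        self.num_inputs = num_inputs
        self.drop_prob1, self.drop_prob2 = drop_prob1, drop_prob2
        self.linear1 = nn.Linear(num_inputs, num_hiddens1)
        self.linear2 = nn.Linear(num_hiddens1, num_hiddens2)
        self.linear3 = nn.Linear(num_hiddens2, num_outputs)

    def forward(self, x):
        h1 = F.relu(self.linear1(x.reshape(-1, self.num_inputs)))
        # self.training is True after .train() and False after .eval(),
        # so dropout is only active while training.
        h1 = F.dropout(h1, p=self.drop_prob1, training=self.training)
        h2 = F.relu(self.linear2(h1))
        h2 = F.dropout(h2, p=self.drop_prob2, training=self.training)
        return self.linear3(h2)

sketch_net = DropoutMLPSketch()
batch = torch.randn(4, 1, 28, 28)  # random stand-in for a Fashion-MNIST batch

# In evaluation mode dropout is disabled, so repeated passes agree.
sketch_net.eval()
with torch.no_grad():
    eval_out_a = sketch_net(batch)
    eval_out_b = sketch_net(batch)
print(eval_out_a.shape)
```

Tying dropout to `self.training` means no units are dropped at prediction time, which is the usual convention; a from-scratch `dropout` function could be gated on the same flag.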


In [None]:
net = ThreeLayerNet(num_inputs=784, num_hiddens1=256,
                    num_hiddens2=256, num_outputs=10)

In [None]:
drop_prob1, drop_prob2 = 0.2, 0.5
batch_size = 256
train_iter, test_iter = utils.load_data_fashion_mnist(batch_size)
num_epochs, lr = 10, 0.5
optimizer = optim.SGD(net.parameters(), lr=lr)
loss = nn.CrossEntropyLoss()

utils.train(net, train_iter, test_iter, loss, num_epochs, 
            optimizer)

In [None]:
utils.predict(net, test_iter)