# PyTorch Lesson 6


Things we will learn in this lesson:

    - Moving from a single-hidden-layer network to building a deep neural network
    - How to automate building a deep neural network with many hidden layers
    - How to apply regularization using dropout
    - Different types of initialization methods in PyTorch
    - Batch normalization in PyTorch

# Deep neural network

Adding more neurons to a single hidden layer tends to cause overfitting.

To overcome this, we add more hidden layers instead, which can improve the performance of the model.

In [1]:
import torch
from torch import nn, optim
from torchvision import datasets, transforms
from torch.utils.data import Dataset, DataLoader

In [11]:
train  = datasets.MNIST(root ='./',train=True, download=False,transform=transforms.ToTensor())
test = datasets.MNIST(root ='./',train=False, download=False,transform=transforms.ToTensor())  # train=False selects the held-out test split

train_dataset = DataLoader(dataset=train, batch_size=100)
test_dataset = DataLoader(dataset=test, batch_size=100)
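
A DataLoader with `batch_size=100` yields image batches of shape `[100, 1, 28, 28]`, while a linear layer expects flat vectors of length 784. The sketch below shows that reshaping on a synthetic stand-in for MNIST (random tensors, an assumption made so that it runs without the downloaded files); `images`, `labels`, and `loader` are illustrative names.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST (assumption): 300 random 1x28x28 "images" with labels 0-9
images = torch.rand(300, 1, 28, 28)
labels = torch.randint(0, 10, (300,))
loader = DataLoader(dataset=TensorDataset(images, labels), batch_size=100)

x, y = next(iter(loader))
flat = x.view(-1, 28 * 28)  # flatten each image into a 784-long vector

print(x.shape)      # torch.Size([100, 1, 28, 28])
print(flat.shape)   # torch.Size([100, 784])
print(len(loader))  # 3
```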

In [12]:
class Net(nn.Module):
    def __init__(self, D_in, H1, H2, D_out):
        super(Net,self).__init__()
        self.Linear1 = nn.Linear(D_in, H1)
        self.Linear2 = nn.Linear(H1, H2)
        self.Linear3 = nn.Linear(H2, D_out)
    def forward(self,x):
        x = torch.relu(self.Linear1(x))
        x = torch.relu(self.Linear2(x))
        x = self.Linear3(x)
        return x

In [13]:
model = Net(784, 100, 100, 10)
loss = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)
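
The network above returns raw scores from its last linear layer, with no softmax. That works because `nn.CrossEntropyLoss` applies log-softmax internally and takes integer class labels. The sketch below checks this on made-up scores; `logits` and `targets` are illustrative values, not MNIST data.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw scores for 4 samples and 10 classes
targets = torch.tensor([3, 0, 9, 1])  # integer class labels, not one-hot vectors

criterion = nn.CrossEntropyLoss()
loss_builtin = criterion(logits, targets)

# Same quantity computed by hand: log-softmax followed by negative log-likelihood
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss_builtin.item(), loss_manual.item())  # the two values agree
```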

In [15]:
## Training
for epoch in range(10):
    total = 0
    for i, (x, y) in enumerate(train_dataset):
        optimizer.zero_grad()              # clear gradients from the previous step
        yhat = model(x.view(-1, 28 * 28))  # flatten each image to 784 values
        loss_ = loss(yhat, y)
        loss_.backward()
        total = total + loss_.item()       # accumulate the batch losses over the epoch
        optimizer.step()
    print('The cost at the end of epoch {} is {}'.format(epoch, total))

The cost at the end of epoch 0 is 1260.1948161125183
The cost at the end of epoch 1 is 572.1300027370453
The cost at the end of epoch 2 is 311.073055498302
The cost at the end of epoch 3 is 249.4968118444085
The cost at the end of epoch 4 is 222.22233516722918
The cost at the end of epoch 5 is 205.63527696952224
The cost at the end of epoch 6 is 193.45121706277132
The cost at the end of epoch 7 is 183.55334806069732
The cost at the end of epoch 8 is 175.06302646175027
The cost at the end of epoch 9 is 167.50499663501978


In [17]:
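correct = 0
model.eval()  # switch layers such as dropout to evaluation mode
with torch.no_grad():  # gradients are not needed for evaluation
    for (x, y) in test_dataset:
        yhat = model(x.view(-1, 784))
        _, pred = torch.max(yhat, 1)  # index of the largest score is the predicted class
        correct = correct + (pred == y).sum().item()

accuracy = (correct / len(test)) * 100
print('Accuracy: {:.2f}%'.format(accuracy))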

## How to automate building a deep neural network with many hidden layers?

Multiple layers need not be defined by hand; their creation can be automated using **nn.ModuleList()**.

The layers are created by iterating over a list that gives the number of neurons in each layer.

In [22]:
class Net(nn.Module):
    def __init__(self, Layer_list):
        super(Net,self).__init__()
        self.hidden = nn.ModuleList()
        # one Linear layer per consecutive pair of sizes in Layer_list
        for i,j in zip(Layer_list,Layer_list[1:]):
            self.hidden.append(nn.Linear(i,j))

    def forward(self,x):
        last = len(self.hidden) - 1
        for k,layer in enumerate(self.hidden):
            x = layer(x)
            if k != last:
                x = torch.relu(x)  # ReLU after every layer except the output layer
        return x
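
As a usage sketch, the block below shows one way to build a network from a list of layer sizes with `nn.ModuleList` and then checks the number of layers, the output shape, and the parameter count. The sizes in `layers` are illustrative assumptions, and the class is repeated here only so that the block runs on its own.

```python
import torch
from torch import nn

class Net(nn.Module):
    def __init__(self, Layer_list):
        super().__init__()
        self.hidden = nn.ModuleList()
        # one Linear layer per consecutive pair of sizes
        for i, j in zip(Layer_list, Layer_list[1:]):
            self.hidden.append(nn.Linear(i, j))

    def forward(self, x):
        last = len(self.hidden) - 1
        for k, layer in enumerate(self.hidden):
            x = layer(x)
            if k != last:
                x = torch.relu(x)  # no activation after the output layer
        return x

layers = [784, 100, 50, 10]      # illustrative sizes: input, two hidden layers, output
model = Net(layers)
out = model(torch.rand(5, 784))  # dummy batch of 5 flattened images

n_params = sum(p.numel() for p in model.parameters())
print(len(model.hidden))  # 3
print(out.shape)          # torch.Size([5, 10])
print(n_params)           # 84060
```

Because the layers live in an `nn.ModuleList`, their weights are registered with the model and show up in `model.parameters()`; a plain Python list would not do that.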

## How to use dropout?

-- Dropout is a regularization technique

-- We apply dropout during training

-- We turn off dropout during test/validation


In [23]:
class Net(nn.Module):
    def __init__(self, D_in, H1, H2, D_out):
        super(Net,self).__init__()
        self.Linear1 = nn.Linear(D_in, H1)
        self.Linear2 = nn.Linear(H1, H2)
        self.Linear3 = nn.Linear(H2, D_out)
        self.drop = nn.Dropout(0.2) # zeroes each element of the layer output with probability 0.2 during training
    def forward(self,x):
        x = torch.relu(self.Linear1(x))
        x = self.drop(x)
        x = torch.relu(self.Linear2(x))
        x = self.drop(x)
        x = self.Linear3(x)
        return x

Dropout comes with one caution:

-- While training, the model has to be put in training mode with **model.train()** :
    
    this command enables dropout during the training process

-- While testing, the model has to be put in evaluation mode with **model.eval()** :
    
    this command disables dropout during the testing process
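
The effect of the two modes can be seen on a standalone dropout layer. The sketch below assumes an input of all ones so that the zeroed and rescaled elements are easy to spot; `drop`, `train_out`, and `eval_out` are illustrative names.

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(0.2)
x = torch.ones(1000)

drop.train()         # training mode: dropout is active
train_out = drop(x)

drop.eval()          # evaluation mode: dropout passes the input through unchanged
eval_out = drop(x)

zero_fraction = (train_out == 0).float().mean().item()
print(zero_fraction)             # close to 0.2
print(train_out.max().item())    # survivors are scaled by 1 / (1 - 0.2)
print(torch.equal(eval_out, x))  # True
```

The rescaling of the surviving elements during training keeps the expected value of the output the same in both modes.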

# Initialization methods in PyTorch

Initialization is very important, because the initial weights are selected at random.

Since the values are selected at random, the corresponding forward pass can feed very large values into the activation.

For saturating activations (sigmoid/tanh) this makes the derivative very close to zero, which leads to the **Vanishing Gradient Problem**.
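
A minimal sketch of this effect, assuming a sigmoid activation and two hand-picked pre-activation values (0 and 10, chosen only for illustration):

```python
import torch

z = torch.tensor([0.0, 10.0], requires_grad=True)  # a moderate and a very large pre-activation
torch.sigmoid(z).sum().backward()                  # z.grad holds the sigmoid derivative at each element

print(z.grad)  # the gradient at z = 10 is thousands of times smaller than at z = 0
```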

Solution: the range of the random value distribution should be chosen with care, based on the size of the layer.
    
    The range is fixed between -1/(number of neurons) and +1/(number of neurons)

#### Default method:

>> $-\frac{1}{\sqrt{L_{in}}}$ to $+\frac{1}{\sqrt{L_{in}}}$, where $L_{in}$ is the number of input neurons of the layer

#### Xavier method:

>> $-\frac{\sqrt{6}}{\sqrt{L_{in} + L_{out}}}$ to $+\frac{\sqrt{6}}{\sqrt{L_{in} + L_{out}}}$, where $L_{out}$ is the number of output neurons of the layer

In [None]:
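in_, out_ = 784, 100  # example layer sizes (illustrative values)
Linear = nn.Linear(in_,out_)

torch.nn.init.xavier_uniform_(Linear.weight)  # re-draws the weights in place from the Xavier uniform range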
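
The two ranges above can be checked numerically. The sketch below assumes that `nn.Linear` uses its current default initialization and picks illustrative layer sizes; `tol` is a small slack for float32 rounding, and all variable names are hypothetical.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)
fan_in, fan_out = 784, 100  # illustrative layer sizes
tol = 1e-6                  # slack for float32 rounding

layer = nn.Linear(fan_in, fan_out)
default_bound = 1 / math.sqrt(fan_in)
default_max = layer.weight.abs().max().item()
print(default_max <= default_bound + tol)  # True

nn.init.xavier_uniform_(layer.weight)      # overwrite the weights in place
xavier_bound = math.sqrt(6) / math.sqrt(fan_in + fan_out)
xavier_max = layer.weight.abs().max().item()
print(xavier_max <= xavier_bound + tol)    # True
```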

#### He method

In [None]:
in_, out_ = 784, 100  # example layer sizes (illustrative values)
Linear = nn.Linear(in_,out_)

torch.nn.init.kaiming_uniform_(Linear.weight, nonlinearity='relu')  # He initialization, suited to layers followed by ReLU
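
In a full model, the same call can be applied to every linear layer at once with `nn.Module.apply`. This is a sketch under assumptions: `init_he` is a hypothetical helper, the layer sizes are illustrative, and zeroing the biases is a choice made here rather than something the lesson prescribes.

```python
import math
import torch
from torch import nn

def init_he(module):
    # Hypothetical helper: He-initialize every nn.Linear it is applied to
    if isinstance(module, nn.Linear):
        nn.init.kaiming_uniform_(module.weight, nonlinearity='relu')
        nn.init.zeros_(module.bias)  # zero biases are an assumption of this sketch

torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(784, 100), nn.ReLU(),
    nn.Linear(100, 100), nn.ReLU(),
    nn.Linear(100, 10),
)
model.apply(init_he)  # apply() visits every submodule recursively

first = model[0]
he_bound = math.sqrt(6 / first.in_features)  # uniform He range for fan-in mode with ReLU
first_max = first.weight.abs().max().item()
print(first_max <= he_bound + 1e-6)          # True
```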

#### Momentum parameter to avoid saddle point/local minima in optimization

In [None]:
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.01)
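
To make the role of the momentum term concrete, the sketch below runs three SGD steps on a one-parameter quadratic and repeats the same update by hand: the velocity accumulates past gradients, and the parameter moves along the velocity. The values `lr = 0.1` and `mu = 0.9` are illustrative assumptions, not the settings used in the cell above.

```python
import torch
from torch import optim

lr, mu = 0.1, 0.9  # illustrative learning rate and momentum
p = torch.tensor([1.0], requires_grad=True)
optimizer = optim.SGD([p], lr=lr, momentum=mu)

# Manual copy of the same update: v = mu * v + grad, then p = p - lr * v
p_manual, v = 1.0, 0.0
for step in range(3):
    optimizer.zero_grad()
    loss = (p ** 2).sum()  # simple quadratic whose gradient is 2 * p
    loss.backward()
    optimizer.step()

    grad = 2 * p_manual
    v = mu * v + grad
    p_manual = p_manual - lr * v

print(p.item(), p_manual)  # both are approximately 0.062
```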

## Batch Normalization

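The outputs of different layers can have different scales of values, so it is important to normalize them.


Batch normalization is used for this. Each feature is normalized with the mean and variance computed over the current batch:

>> $\hat{z} = \frac{z - \mu}{\sqrt{\sigma^2 + \epsilon}}$, where $\mu$ and $\sigma^2$ are the batch mean and variance of $z$ and $\epsilon$ is a small constant for numerical stability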

In [26]:
class Net(nn.Module):
    def __init__(self, D_in, H1, H2, D_out):
        super(Net,self).__init__()
        self.Linear1 = nn.Linear(D_in, H1)
        self.Linear2 = nn.Linear(H1, H2)
        self.Linear3 = nn.Linear(H2, D_out)
        
        self.bn1 = nn.BatchNorm1d(H1)
        self.bn2 = nn.BatchNorm1d(H2)
        
    def forward(self,x):
        x = torch.relu(self.bn1(self.Linear1(x)))
        x = torch.relu(self.bn2(self.Linear2(x)))
        x = self.Linear3(x)
        return x
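
The normalization performed by `nn.BatchNorm1d` can be reproduced by hand. The sketch below assumes a freshly created layer in training mode, whose learnable scale and shift still have their initial values of 1 and 0, and a random batch with an arbitrary scale; `z`, `out`, and `manual` are illustrative names.

```python
import torch
from torch import nn

torch.manual_seed(0)
z = torch.randn(100, 4) * 5 + 3  # batch of 100 samples, 4 features, arbitrary scale
bn = nn.BatchNorm1d(4)
bn.train()                       # training mode: statistics come from the current batch
out = bn(z)

# Same normalization by hand, using the biased batch variance
mean = z.mean(dim=0)
var = z.var(dim=0, unbiased=False)
manual = (z - mean) / torch.sqrt(var + bn.eps)

print(torch.allclose(out, manual, atol=1e-5))  # True
print(out.mean(dim=0))                         # each feature has a mean close to 0
print(out.var(dim=0, unbiased=False))          # each feature has a variance close to 1
```

In evaluation mode the layer uses running estimates of the mean and variance instead of the batch statistics, so switching between `model.train()` and `model.eval()` matters for batch normalization as well.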