<a href="https://colab.research.google.com/github/monicasjsu/Reinforcement-Learning/blob/master/DL_Training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Use the convnet example provided on Canvas as a starting point and make the following three changes:
1. Add He initialization and compare the training results with the base model.
2. Add Nadam optimization and compare the training results with the base model.
3. Combine the two modifications and explain the overall impact of these two enhancements.

In [1]:
import torch 
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

In [2]:
%pip install qhoptim



In [3]:
%pip install pywick



In [4]:
# from torch.optim.optimizer import Optimizer
from qhoptim.pyt import QHM, QHAdam
import qhoptim
import pywick

In [5]:
# Device configuration
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

In [6]:
# Hyper parameters
num_epochs = 5
num_classes = 10
batch_size = 100
learning_rate = 0.001

In [7]:
#Getting MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='../../data/',
                                           train=True, 
                                           transform=transforms.ToTensor(),
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='../../data/',
                                          train=False, 
                                          transform=transforms.ToTensor())

In [8]:
#Data loader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size, 
                                          shuffle=False)

In [9]:
#Convolutional neural network (two convolutional layers)
class ConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.fc = nn.Linear(7*7*32, num_classes)
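        # He initialization, applied to the fully connected layer only;
        # the conv layers keep PyTorch's default initialization
        nn.init.kaiming_normal_(self.fc.weight)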
        
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out
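
The class above applies He initialization only to `self.fc`. A minimal sketch of extending it to the convolutional layers as well, assuming a hypothetical helper `init_he` and a small stand-in network `demo_net` (neither is part of the original notebook), might look like this:

```python
import math
import torch
import torch.nn as nn

def init_he(module):
    """Apply He (Kaiming) normal initialization to conv and linear weights.

    Hypothetical helper for illustration; biases and BatchNorm layers are left untouched.
    """
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(module.weight, mode='fan_in', nonlinearity='relu')

# Stand-in network with the same conv shapes as ConvNet (assumption, for illustration only)
demo_net = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
)

torch.manual_seed(0)
demo_net.apply(init_he)  # nn.Module.apply visits every submodule recursively

# With mode='fan_in' and ReLU gain, the target std is sqrt(2 / fan_in)
conv2 = demo_net[2]
fan_in = 16 * 5 * 5
expected_std = math.sqrt(2.0 / fan_in)
observed_std = conv2.weight.std().item()
print(expected_std, observed_std)
```

With the real model, the equivalent call would be `model.apply(init_he)` right after constructing `ConvNet`. Whether this changes accuracy on MNIST would need to be measured.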

In [10]:
model = ConvNet(num_classes).to(device)

In [11]:
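# Loss and optimizer
criterion = nn.CrossEntropyLoss()
# Base-model optimizer (Adam); it is overridden by the QHAdam assignment below
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
# QHAdam optimizer, used here in place of Nadam.
# Note: according to the QHAdam paper, nus=(1.0, 1.0) reduces QHAdam to plain Adam,
# while Nadam corresponds to nus=(beta1, 1.0).
optimizer = qhoptim.pyt.QHAdam(model.parameters(), lr=learning_rate, betas=(0.9, 0.999), nus=(1.0, 1.0), weight_decay=0.0, decouple_weight_decay=False, eps=1e-08)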
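
As an alternative to a third-party package, recent PyTorch releases include a built-in Nadam optimizer, `torch.optim.NAdam`. The sketch below assumes PyTorch 1.10 or newer and uses a hypothetical `demo_model` in place of `ConvNet`. It only shows that one optimization step runs and is not a tuned configuration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for ConvNet, small enough to run anywhere
demo_model = nn.Linear(4, 2)
demo_criterion = nn.CrossEntropyLoss()

# Built-in Nadam; lr and betas mirror the values used in this notebook
nadam = torch.optim.NAdam(demo_model.parameters(), lr=0.001, betas=(0.9, 0.999))

# Random inputs and labels, for illustration only
inputs = torch.randn(8, 4)
targets = torch.randint(0, 2, (8,))

weights_before = demo_model.weight.detach().clone()

# One forward/backward/step cycle, same pattern as the training loop
loss = demo_criterion(demo_model(inputs), targets)
nadam.zero_grad()
loss.backward()
nadam.step()

weights_changed = not torch.equal(weights_before, demo_model.weight.detach())
print(weights_changed)
```

In the notebook itself, the corresponding line would be `optimizer = torch.optim.NAdam(model.parameters(), lr=learning_rate)`.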

In [12]:
# Train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))



Epoch [1/5], Step [100/600], Loss: 0.1710
Epoch [1/5], Step [200/600], Loss: 0.0791
Epoch [1/5], Step [300/600], Loss: 0.1371
Epoch [1/5], Step [400/600], Loss: 0.0697
Epoch [1/5], Step [500/600], Loss: 0.0713
Epoch [1/5], Step [600/600], Loss: 0.0829
Epoch [2/5], Step [100/600], Loss: 0.0524
Epoch [2/5], Step [200/600], Loss: 0.0453
Epoch [2/5], Step [300/600], Loss: 0.0481
Epoch [2/5], Step [400/600], Loss: 0.0206
Epoch [2/5], Step [500/600], Loss: 0.0197
Epoch [2/5], Step [600/600], Loss: 0.0913
Epoch [3/5], Step [100/600], Loss: 0.0860
Epoch [3/5], Step [200/600], Loss: 0.0355
Epoch [3/5], Step [300/600], Loss: 0.0245
Epoch [3/5], Step [400/600], Loss: 0.0507
Epoch [3/5], Step [500/600], Loss: 0.0088
Epoch [3/5], Step [600/600], Loss: 0.0757
Epoch [4/5], Step [100/600], Loss: 0.0314
Epoch [4/5], Step [200/600], Loss: 0.1525
Epoch [4/5], Step [300/600], Loss: 0.0499
Epoch [4/5], Step [400/600], Loss: 0.0184
Epoch [4/5], Step [500/600], Loss: 0.0309
Epoch [4/5], Step [600/600], Loss:
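
The per-step losses printed above come from single mini-batches and are noisy, so averaging them per epoch makes runs easier to compare. A minimal sketch, assuming a hypothetical helper `mean_loss_per_epoch` that parses log text in the format printed by the training loop and skips lines with no numeric loss:

```python
import re
from collections import defaultdict

# Matches lines such as "Epoch [1/5], Step [100/600], Loss: 0.1710"
LOG_PATTERN = re.compile(r"Epoch \[(\d+)/\d+\], Step \[\d+/\d+\], Loss: (\d+\.\d+)")

def mean_loss_per_epoch(log_text):
    """Return {epoch: mean logged loss}; hypothetical helper for illustration."""
    losses = defaultdict(list)
    for line in log_text.splitlines():
        match = LOG_PATTERN.search(line)
        if match:  # lines without a numeric loss (e.g. truncated ones) are skipped
            losses[int(match.group(1))].append(float(match.group(2)))
    return {epoch: sum(vals) / len(vals) for epoch, vals in sorted(losses.items())}

# A short excerpt in the same format as the log above
sample_log = """Epoch [1/5], Step [100/600], Loss: 0.1710
Epoch [1/5], Step [200/600], Loss: 0.0791
Epoch [2/5], Step [100/600], Loss: 0.0524
Epoch [2/5], Step [200/600], Loss:"""

epoch_means = mean_loss_per_epoch(sample_log)
print(epoch_means)
```

Accumulating `loss.item()` inside the loop and dividing by `total_step` at the end of each epoch would give the same kind of summary without parsing text.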

In [13]:
# Test the model
model.eval()  # eval mode (batchnorm uses moving mean/variance instead of mini-batch mean/variance)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total))

Test Accuracy of the model on the 10000 test images: 98.79 %


In [14]:
# Save the model checkpoint
torch.save(model.state_dict(), 'model.ckpt')
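
To reuse a saved checkpoint, the state dict is loaded back into a freshly constructed model of the same architecture. A minimal round-trip sketch, assuming a hypothetical `demo_model` stand-in and a temporary directory instead of the notebook's `model.ckpt`:

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)
demo_model = nn.Linear(4, 2)  # hypothetical stand-in for ConvNet

with tempfile.TemporaryDirectory() as tmp_dir:
    ckpt_path = os.path.join(tmp_dir, 'model.ckpt')

    # Save only the parameters, as in the cell above
    torch.save(demo_model.state_dict(), ckpt_path)

    # Rebuild the same architecture, then load the saved parameters into it
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(ckpt_path, map_location='cpu'))

restored.eval()  # switch to eval mode before inference

weights_match = torch.equal(demo_model.weight, restored.weight)
print(weights_match)
```

For the notebook's model, this would be `model = ConvNet(num_classes)` followed by `model.load_state_dict(torch.load('model.ckpt'))`.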

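**Conclusion:** 


*   **The base model reached a test accuracy of 99.12%.**
*   **Adding He initialization gave a test accuracy of 99.0%.**
*   **Adding the Nadam optimizer (via QHAdam) gave a test accuracy of 98.94%.**
*   **Combining He initialization with the Nadam optimizer gave a test accuracy of 98.79%.**
*   **All four results lie within 0.33 percentage points of each other, so neither change, alone or combined, showed a clear improvement over the base model in these 5-epoch runs.**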
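
To put the accuracy differences in perspective, the reported percentages can be converted into misclassified-image counts on the 10,000-image test set, together with a rough binomial standard error. This is only a sketch: the helper names are hypothetical, and the standard error assumes independent test images and ignores run-to-run variation from random seeds.

```python
import math

TEST_SET_SIZE = 10000  # number of MNIST test images reported in the evaluation output

# Test accuracies (%) as reported in the conclusion
reported_accuracy = {
    "base": 99.12,
    "he_init": 99.0,
    "nadam": 98.94,
    "he_init_and_nadam": 98.79,
}

def misclassified(accuracy_percent, n=TEST_SET_SIZE):
    """Number of test images classified incorrectly at the given accuracy."""
    return round((100.0 - accuracy_percent) / 100.0 * n)

def standard_error_points(accuracy_percent, n=TEST_SET_SIZE):
    """Binomial standard error of the accuracy, in percentage points."""
    p = accuracy_percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)

error_counts = {name: misclassified(acc) for name, acc in reported_accuracy.items()}

for name, acc in reported_accuracy.items():
    print(f"{name}: {error_counts[name]} errors, "
          f"standard error about {standard_error_points(acc):.2f} points")
```

Repeating each configuration over several random seeds would give a more reliable comparison than a single run per setting.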

