04 PyTorch optim and Intro to Convnets
=====================
### Date: Jan 12 2018
### Author: Farahana

Previously, we updated the weights by hand, manipulating each parameter's **.data** after the backward pass.
The optim package simplifies this step by providing ready-made optimizers such as *Adam*, *Adagrad* and *RMSprop*.

What we have learnt so far:

1. **Tensor** : a multi-dimensional array, similar to a NumPy array, that can also run on a GPU
2. **autograd.Variable**  : wraps the inputs and outputs and automates the gradient (dy/dx) computations
3. **nn**        : provides neural network layers and loss functions
    
Now, the next step is to utilize the **optim** package. 

In [1]:
import torch as tc
from torch.autograd import Variable

import torch.nn as nn

Let us use the previous example and re-define it using **optim**.

In [2]:
# Initialization of the example
N, D_in, H, D_out = 24, 1000, 100, 4
learning_rate = 1e-3
dtype = tc.cuda.FloatTensor

### autograd.Variable ###
x = Variable(tc.randn(N, D_in).type(dtype), requires_grad=False) # input
y = Variable(tc.randn(N, D_out).type(dtype), requires_grad=False) # output

### nn.Module ###
model = nn.Sequential(nn.Linear(D_in,H), nn.ReLU(), nn.Linear(H, D_out)).cuda() 
loss_fn = nn.MSELoss(size_average=False) 

We have to choose and define the optimizer. For now, let us try the Adam optimizer.

In [3]:
### optim modules
import torch.optim as optim
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

The Adam optimizer has its own default hyper-parameters, which we can override when we define it:
    
    tc.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)
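
To give a feel for what these hyper-parameters do, here is a minimal sketch of one Adam update for a single scalar parameter, written in plain Python. It is a simplified illustration of the update rule, not PyTorch's actual implementation. The function name `adam_step` and the toy objective are assumptions made for this example, and `weight_decay` is left out.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, betas=(0.9, 0.999), eps=1e-08):
    """One Adam update for a scalar parameter (illustrative helper, not a PyTorch API)."""
    beta1, beta2 = betas
    m = beta1 * m + (1 - beta1) * grad          # running average of the gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running average of the squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Toy objective: f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
print(w)  # ends up close to the minimum at w = 3
```

Note that on the very first step the parameter moves by roughly `lr`, whatever the size of the gradient, because the bias-corrected averages cancel the gradient's magnitude.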

In [4]:
for t in range(2000):
    y_pred = model(x)

    loss = loss_fn(y_pred, y)
    # Print the loss every 100 iterations
    if (t)%100 == 0:
        print(t, loss.data[0])

    # Zero the gradients before each backward pass, using the optimizer.
    # This replaces the earlier call to 'model.zero_grad()'
    optimizer.zero_grad()
    loss.backward()
    
    # Rather than using parameters.data, we just call optimizer to update the parameters
    optimizer.step()

0 102.30847930908203
100 0.000534254009835422
200 1.150724493470534e-08
300 6.961792253790122e-13
400 4.184430579812215e-13
500 3.8920255907015644e-13
600 6.22668583361019e-13
700 5.70835045898832e-13
800 9.910544607194538e-13
900 9.752892937697766e-13
1000 1.5639156636382268e-12
1100 1.2753686995381486e-12
1200 1.573019492440153e-12
1300 1.9269724704784608e-12
1400 2.28396468404668e-12
1500 3.328087805343216e-12
1600 8.167536091896466e-12
1700 1.2115213394281454e-05
1800 7.141022773105021e-10
1900 0.00040634864126332104
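
The loop above calls `optimizer.zero_grad()` before `loss.backward()` because each backward pass adds to the stored gradients instead of overwriting them. The plain-Python sketch below mimics that accumulation with a toy `Param` class and a hand-written `backward` function. Both are hypothetical stand-ins for illustration and are not PyTorch classes.

```python
class Param:
    """Toy stand-in for a trainable parameter (not a PyTorch class)."""
    def __init__(self, value):
        self.value = value
        self.grad = 0.0

def backward(param, x, target):
    # loss = (param.value * x - target)^2
    # The derivative with respect to param.value is ADDED to param.grad.
    param.grad += 2 * (param.value * x - target) * x

w = Param(1.0)
backward(w, x=2.0, target=6.0)   # contributes 2 * (2 - 6) * 2 = -16
backward(w, x=2.0, target=6.0)   # accumulates on top of the previous call
stale = w.grad

w.grad = 0.0                     # the role played by optimizer.zero_grad()
backward(w, x=2.0, target=6.0)
fresh = w.grad

print(stale, fresh)  # -32.0 -16.0
```

Without the reset, the update step would use the sum of the gradients from every previous iteration.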


As usual, we will compare the expected output with the predicted output.

In [5]:
print(y, y_pred)

Variable containing:
 0.6983 -1.4299  1.2260  0.4276
 0.0464 -0.3840 -0.1521  1.7557
 0.3922 -1.5628  1.5454 -1.3816
-0.3886  1.1611  0.6743  1.2763
 0.1159  0.6454 -0.1520 -0.5812
 1.5397  1.2766 -0.7831  1.1301
 0.9382 -1.3022  0.0480  0.1792
-0.8961  0.9348 -1.1036  1.2779
-0.2001  0.6951  0.9331  1.8725
 0.6215  1.4529 -0.2044 -0.5818
-0.7041 -1.5773 -0.6432  0.6037
 0.3944  1.0887 -0.7971 -2.1805
-0.7417  0.2959 -0.2065 -0.1824
-2.9820  1.5765  1.1189  0.0981
 0.3418  0.1637  0.7485  0.6669
-0.0893 -1.0414  0.2990 -0.4111
-0.5233  0.0322 -0.6136 -0.2325
-2.3458  0.6883 -0.3638 -0.8164
-0.3856  0.4345  1.3909 -1.3805
 1.2436 -2.2302  1.1033  1.4108
 0.1344 -0.7789 -0.1577 -0.6173
-1.0368  0.0850 -0.9124 -0.7727
-0.1032  0.8354  0.4000  0.0246
 0.7474  0.1244 -1.2696 -0.7908
[torch.cuda.FloatTensor of size 24x4 (GPU 0)]
 Variable containing:
 0.6983 -1.4299  1.2260  0.4276
 0.0464 -0.3840 -0.1521  1.7557
 0.3922 -1.5628  1.5454 -1.3816
-0.3886  1.1611  0.6743  1.2763
 0.1158  0.6457

We have now optimized the example using the simplest possible implementation.
***

## Convolutional neural network inside a class using the MNIST dataset
We will use **nn.functional**, whose functions have no trainable or configurable parameters of their own.

In [6]:
import torch.nn.functional as F

Next, let us define a more complex model built from convolution layers, known as a convolutional neural network (CNN).
* The *initialization* or *constructor* is used to initialize the convolutional and fully connected layers
* The **forward** method describes how the input flows through those layers, as an alternative to an **nn.Sequential** definition.

The 2D convolution layer has default parameters too.

`nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0,
 dilation=1, groups=1, bias=True)`

In [7]:
class Net(nn.Module):
    
    def __init__(self):
        super(Net, self).__init__()
        
        self.conv1 = nn.Conv2d(1, 16, kernel_size=5, padding=2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5, padding=2)
        
        self.fc1 = nn.Linear(7*7*32, 10)
        self.fc2 = nn.Linear(10, 84)
        self.fc3 = nn.Linear(84, 10)
        
    def forward(self, x):
        # conv1 -> ReLU -> max pooling over a 2x2 window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2,2)) 
        # conv2 -> ReLU -> max pooling; a single number means a square window
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    
    def num_flat_features(self, x):
        size = x.size()[1:]
        num_features = 1
        for s in size:
            num_features *= s
        return num_features    
            
net = Net()
print(net)

Net(
  (conv1): Conv2d (1, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (conv2): Conv2d (16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (fc1): Linear(in_features=1568, out_features=10)
  (fc2): Linear(in_features=10, out_features=84)
  (fc3): Linear(in_features=84, out_features=10)
)
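
The `in_features=1568` of `fc1` comes from the spatial size left after the two convolution and pooling stages. The sketch below reproduces that arithmetic in plain Python, assuming 28x28 MNIST inputs. The helper names `conv2d_out` and `max_pool2d_out` are made up for this example. They apply the usual output-size formula for convolution and pooling, with the pooling stride defaulting to the window size.

```python
def conv2d_out(size, kernel_size, stride=1, padding=0, dilation=1):
    """Spatial output size of a 2D convolution along one dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

def max_pool2d_out(size, kernel_size, stride=None):
    """Spatial output size of max pooling; stride defaults to the window size."""
    stride = kernel_size if stride is None else stride
    return (size - kernel_size) // stride + 1

size = 28                                  # MNIST images are 28x28
size = conv2d_out(size, 5, padding=2)      # conv1 keeps the size: 28
size = max_pool2d_out(size, 2)             # first pooling halves it: 14
size = conv2d_out(size, 5, padding=2)      # conv2 keeps the size: 14
size = max_pool2d_out(size, 2)             # second pooling halves it: 7

flat_features = 32 * size * size           # 32 output channels from conv2
print(size, flat_features)  # 7 1568
```

With `kernel_size=5` and `padding=2` the convolutions preserve the spatial size, so only the two pooling steps shrink the image.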


In [8]:
net.cuda()

Net(
  (conv1): Conv2d (1, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (conv2): Conv2d (16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (fc1): Linear(in_features=1568, out_features=10)
  (fc2): Linear(in_features=10, out_features=84)
  (fc3): Linear(in_features=84, out_features=10)
)

We have learnt that backpropagation is handled automatically by autograd.
Thus, no backward definition is needed in the **Net** module.

What is still missing:
* Input and expected output
* Loss and optimizer definition
* loss.backward()
* Optimization step
* Training step

### MNIST data loading
We will define two datasets: train and test. In both, the images are the input and the labels are the expected output.

In [9]:
import torchvision.datasets as dsets
import torchvision.transforms as transforms

In [10]:
# hyper parameters
num_epochs = 10
batch_size = 100
learning_rate = 0.001

In [11]:
# when train=True, it is training.pt
train_dataset = dsets.MNIST(root='./data', train=True, transform=transforms.ToTensor(),download=True)

# when train=False, it is test.pt
test_dataset = dsets.MNIST(root='./data', train=False, transform=transforms.ToTensor())

We will use **tc.utils.data** for data loading in PyTorch. We will cover it more extensively in the next part.

`tc.utils.data.DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0, collate_fn=default_collate, pin_memory=False, drop_last=False)`

In [12]:
# For train set
train_set = tc.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)

# For test set
test_set = tc.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)
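
Conceptually, the loader walks over the dataset in chunks of `batch_size`, optionally in a shuffled order. The plain-Python generator below is a rough sketch of that idea only. The name `iterate_batches` and the toy dataset are assumptions for illustration, and the real `DataLoader` additionally collates samples into tensors and can load data in worker processes.

```python
import random

def iterate_batches(dataset, batch_size=1, shuffle=False, seed=None):
    """Yield lists of samples, batch_size at a time (illustrative helper)."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)   # new visiting order for the samples
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

# Toy dataset of (sample, label) pairs standing in for MNIST
toy_dataset = [(i, i % 10) for i in range(1000)]
batches = list(iterate_batches(toy_dataset, batch_size=100, shuffle=True, seed=0))
print(len(batches), len(batches[0]))  # 10 100
```

By the same arithmetic, the 60000 MNIST training images with `batch_size = 100` give 600 iterations per epoch.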

#### Loss and optimizer definition

In [13]:
loss_fn = nn.CrossEntropyLoss() 
optimizer = optim.Adam(net.parameters(), lr=learning_rate)
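
`nn.CrossEntropyLoss` takes the raw scores (logits) from `fc3` together with integer class labels. It combines a log-softmax with the negative log-likelihood, so the network itself needs no softmax layer. The plain-Python sketch below computes the same quantity for a single sample. The function name `cross_entropy` is chosen for this example and is not the PyTorch function.

```python
import math

def cross_entropy(logits, label):
    """Negative log-probability of the correct class for one sample."""
    max_logit = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[label]

# Ten equal scores: the model is undecided, so the loss is log(10)
uniform = cross_entropy([0.0] * 10, 3)

# A large score on the correct class drives the loss towards zero
confident = cross_entropy([10.0 if k == 3 else 0.0 for k in range(10)], 3)

print(uniform, confident)
```

An untrained 10-class model is therefore expected to start near a loss of log(10), about 2.3, and the value should fall as training progresses.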

#### Training step
This is where loss.backward(), the optimization step and the training loop come together.

In [14]:
for epoch in range(num_epochs):
    # wrap the images (input) and labels (expected output) in Variables so autograd can track them
    for i, (images, labels) in enumerate(train_set):
        x = Variable(images).cuda()
        y = Variable(labels).cuda()
        
        y_pred = net(x)
        
        loss = loss_fn(y_pred, y)
        # Print the loss every 100 iterations
        if (i+1) % 100 == 0:
            print ('Epoch [%d/%d], Iter [%d/%d] Loss: %.4f' 
                   %(epoch+1, num_epochs, i+1, len(train_dataset)//batch_size, loss.data[0]))
            
        optimizer.zero_grad()
        loss.backward()
        
        optimizer.step()

Epoch [1/10], Iter [100/600] Loss: 0.4800
Epoch [1/10], Iter [200/600] Loss: 0.2308
Epoch [1/10], Iter [300/600] Loss: 0.3248
Epoch [1/10], Iter [400/600] Loss: 0.2311
Epoch [1/10], Iter [500/600] Loss: 0.0498
Epoch [1/10], Iter [600/600] Loss: 0.0796
Epoch [2/10], Iter [100/600] Loss: 0.0613
Epoch [2/10], Iter [200/600] Loss: 0.1252
Epoch [2/10], Iter [300/600] Loss: 0.0481
Epoch [2/10], Iter [400/600] Loss: 0.0391
Epoch [2/10], Iter [500/600] Loss: 0.0684
Epoch [2/10], Iter [600/600] Loss: 0.0306
Epoch [3/10], Iter [100/600] Loss: 0.0101
Epoch [3/10], Iter [200/600] Loss: 0.0537
Epoch [3/10], Iter [300/600] Loss: 0.1787
Epoch [3/10], Iter [400/600] Loss: 0.1271
Epoch [3/10], Iter [500/600] Loss: 0.0460
Epoch [3/10], Iter [600/600] Loss: 0.0911
Epoch [4/10], Iter [100/600] Loss: 0.0358
Epoch [4/10], Iter [200/600] Loss: 0.0802
Epoch [4/10], Iter [300/600] Loss: 0.0788
Epoch [4/10], Iter [400/600] Loss: 0.0429
Epoch [4/10], Iter [500/600] Loss: 0.0248
Epoch [4/10], Iter [600/600] Loss:

Then we can test the model. We first switch it to evaluation mode with **net.eval()**.

In [15]:
net.eval()

Net(
  (conv1): Conv2d (1, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (conv2): Conv2d (16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (fc1): Linear(in_features=1568, out_features=10)
  (fc2): Linear(in_features=10, out_features=84)
  (fc3): Linear(in_features=84, out_features=10)
)

In [16]:
correct = 0
total = 0
for images, labels in test_set:
    images = Variable(images).cuda()
    outputs = net(images)
    _, predicted = tc.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted.cpu() == labels).sum()

print('Test Accuracy of the model on the 10000 test images: %d %%' % (100 * correct / total))

Test Accuracy of the model on the 10000 test images: 98 %
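
The evaluation loop picks, for each image, the class with the highest score and counts how often it matches the label. The same computation on plain Python lists looks like the sketch below. The function name `accuracy` and the small score table are made up for illustration.

```python
def accuracy(scores, labels):
    """Percentage of rows whose highest-scoring class equals the label."""
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        correct += int(predicted == label)
    return 100.0 * correct / len(labels)

# Four samples, three classes; the last prediction is wrong
scores = [[0.1, 2.0, -1.0],
          [1.5, 0.2, 0.3],
          [0.0, 0.1, 0.9],
          [0.4, 0.3, 0.2]]
labels = [1, 0, 2, 1]
print(accuracy(scores, labels))  # 75.0
```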
