## PyTorch NN module

Deep learning consists of composing linear (affine) maps with non-linearities in clever ways. The non-linearities are what make the models powerful: without them, a stack of affine maps collapses into a single affine map. In this section, we will play with these core components, define an objective function, and see how the model is trained.

### Affine Maps

One of the core workhorses of deep learning is the affine map, which is a function f(x) where

$$f(x) = Ax + b$$

for a matrix A and vectors x, b. The parameters to be learned here are A and b. Often, b is referred to as the bias term.

PyTorch and most other deep learning frameworks do things a little differently than traditional linear algebra. **They map the rows of the input instead of the columns. That is, the i-th row of the output below is the mapping of the i-th row of the input under A, plus the bias term. Look at the example below.**
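
To make the row-wise convention concrete, here is a minimal sketch. The layer sizes, the seed, and the names `lin`, `data`, `manual`, and `row0` are illustrative choices, not part of the original notebook. It checks that `torch.nn.Linear` applied to a batch equals `data @ A^T + b`, and that each output row depends only on the matching input row.

```python
import torch

torch.manual_seed(0)  # arbitrary seed so the sketch is repeatable

lin = torch.nn.Linear(5, 3)   # maps R^5 -> R^3; lin.weight has shape (3, 5)
data = torch.rand(2, 5)       # two examples, one per row

out = lin(data)               # shape (2, 3): one output row per input row

# The whole batch is data @ A^T + b, with A = lin.weight and b = lin.bias.
manual = data @ lin.weight.t() + lin.bias

# Row 0 of the output is the map applied to row 0 of the input alone.
row0 = lin(data[0])

print(out.shape)
print(torch.allclose(out, manual, atol=1e-6))
print(torch.allclose(out[0], row0, atol=1e-6))
```

Storing the weight as `(out_features, in_features)` and multiplying by its transpose is what lets a whole batch of examples be stacked as rows and pushed through the layer in one call.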

In [1]:
%%time
import torch
from torch.autograd import Variable

M, In_size, H_size, Out_size = 1000, 5, 4, 2

x = Variable(torch.rand(M, In_size), requires_grad = False)  # inputs: each row is one example
y = Variable(torch.rand(M, Out_size), requires_grad = False) # targets: one row per example,
                                                            # matching the row-wise convention of nn.Linear

model = torch.nn.Sequential(
        torch.nn.Linear(In_size, H_size),
        torch.nn.ReLU(),
        torch.nn.Linear(H_size, Out_size))

loss = torch.nn.MSELoss(reduction = 'sum')  # summed squared error; replaces the deprecated size_average = False

learning_rate = 1e-6

for t in range(1000):
    out = model(x)
    loss_out = loss(out, y)
    if t%100 == 1:
        print(t, loss_out.item())  # .item() extracts the Python float; .data[0] fails on 0-dim tensors
    model.zero_grad()
    
    loss_out.backward()
    
    for param in model.parameters():
        param.data -= learning_rate * param.grad.data

(1, 436.4664611816406)
(101, 280.26239013671875)
(201, 216.25485229492188)
(301, 192.68450927734375)
(401, 184.41600036621094)
(501, 181.48941040039062)
(601, 180.35386657714844)
(701, 179.81932067871094)
(801, 179.4915008544922)
(901, 179.2390594482422)
CPU times: user 696 ms, sys: 612 ms, total: 1.31 s
Wall time: 5.76 s
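
The manual update in the loop above is plain gradient descent. As a hedged sketch (the tiny model, the seed, and the names `model_a`, `model_b`, and `same` are assumptions made for illustration), the following compares one hand-written step against one step of `torch.optim.SGD` without momentum or weight decay, starting from identical weights.

```python
import copy
import torch

torch.manual_seed(0)  # arbitrary seed for repeatability

x = torch.rand(10, 5)
y = torch.rand(10, 2)
lr = 1e-2

model_a = torch.nn.Linear(5, 2)
model_b = copy.deepcopy(model_a)            # identical starting weights
loss_fn = torch.nn.MSELoss(reduction='sum')

# (a) Manual gradient descent step.
model_a.zero_grad()
loss_fn(model_a(x), y).backward()
with torch.no_grad():                       # do not record the update in the graph
    for p in model_a.parameters():
        p -= lr * p.grad

# (b) The same step through torch.optim.SGD (no momentum, no weight decay).
opt = torch.optim.SGD(model_b.parameters(), lr=lr)
opt.zero_grad()
loss_fn(model_b(x), y).backward()
opt.step()

same = all(torch.allclose(pa, pb, atol=1e-6)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)
```

An optimizer object packages the update rule, so swapping in a different algorithm only changes the line that constructs it.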


### Using Optimizer instead of manual update

In [2]:
%%time
import torch
from torch.autograd import Variable


M, In_size, H_size, Out_size = 1000, 5, 4, 2

x = Variable(torch.rand(M, In_size), requires_grad = False)  # inputs: each row is one example
y = Variable(torch.rand(M, Out_size), requires_grad = False) # targets: one row per example,
                                                            # matching the row-wise convention of nn.Linear

model = torch.nn.Sequential(
        torch.nn.Linear(In_size, H_size),
        torch.nn.ReLU(),
        torch.nn.Linear(H_size, Out_size))

loss = torch.nn.MSELoss(reduction = 'sum')  # summed squared error; replaces the deprecated size_average = False

learning_rate = 1e-6
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)
for t in range(1000):
    out = model(x)
    loss_out = loss(out, y)
    if t%100 == 1:
        print(t, loss_out.item())  # .item() extracts the Python float; .data[0] fails on 0-dim tensors
    model.zero_grad()
    
    loss_out.backward()
    
    optimizer.step()

(1, 495.200439453125)
(101, 494.8741455078125)
(201, 494.54766845703125)
(301, 494.22125244140625)
(401, 493.89483642578125)
(501, 493.568603515625)
(601, 493.2416687011719)
(701, 492.9151306152344)
(801, 492.5890197753906)
(901, 492.2625732421875)
CPU times: user 332 ms, sys: 28 ms, total: 360 ms
Wall time: 400 ms
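
The loss above barely moves over 1000 iterations. A likely reason is that Adam rescales each gradient, so a parameter moves by roughly `lr` at most per step; with `lr = 1e-6`, a thousand steps shift each weight by about `1e-3`. The sketch below is an illustration under assumptions, not a rerun of the cell above: the helper `train`, the seed, and the step count are hypothetical. It compares `lr = 1e-6` with Adam's default of `1e-3` on the same random data and initial weights.

```python
import torch

def train(lr, steps=500, seed=0):
    """Hypothetical helper: fit the two-layer net with Adam, return (first, last) loss."""
    torch.manual_seed(seed)                 # same data and initial weights on every call
    x = torch.rand(1000, 5)
    y = torch.rand(1000, 2)
    model = torch.nn.Sequential(
        torch.nn.Linear(5, 4),
        torch.nn.ReLU(),
        torch.nn.Linear(4, 2))
    loss_fn = torch.nn.MSELoss(reduction='sum')
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    first = last = None
    for t in range(steps):
        loss = loss_fn(model(x), y)
        if t == 0:
            first = loss.item()
        last = loss.item()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return first, last

first_small, last_small = train(lr=1e-6)
first_big, last_big = train(lr=1e-3)

# Print how far the loss dropped in each run.
print(first_small - last_small)
print(first_big - last_big)
```

Because the targets here are random noise, neither run can drive the loss to zero; the comparison only shows how much the step size limits progress.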


## Custom net from NN module

In [11]:
class TwoLayerNet(torch.nn.Module):
    def __init__(self, In_size, H_size, Out_size):
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(In_size, H_size)
        self.linear2 = torch.nn.Linear(H_size, Out_size)
    def forward(self, x):
        h = self.linear1(x).clamp(min = 0)
        out = self.linear2(h)
        return out

M, In_size, H_size, Out_size = 1000, 5, 4, 2

x = Variable(torch.rand(M, In_size), requires_grad = False)  # inputs: each row is one example
y = Variable(torch.rand(M, Out_size), requires_grad = False) # targets: one row per example,
                                                            # matching the row-wise convention of nn.Linear

model = TwoLayerNet(In_size, H_size, Out_size)

loss = torch.nn.MSELoss(reduction = 'sum')  # summed squared error; replaces the deprecated size_average = False

learning_rate = 1e-6
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)
for t in range(1000):
    out = model(x)
    loss_out = loss(out, y)
    if t%100 == 1:
        print(t, loss_out.item())  # .item() extracts the Python float; .data[0] fails on 0-dim tensors
    optimizer.zero_grad()
    
    loss_out.backward()
    
    optimizer.step()

(1, 1700.6502685546875)
(101, 1700.131591796875)
(201, 1699.613525390625)
(301, 1699.096435546875)
(401, 1698.577392578125)
(501, 1698.0589599609375)
(601, 1697.540283203125)
(701, 1697.0238037109375)
(801, 1696.507080078125)
(901, 1695.988525390625)
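
The custom class works with the optimizer because assigning a `torch.nn.Linear` to an attribute inside `__init__` registers its parameters with the enclosing module. A small sketch follows; the variable names `names`, `shapes`, and `total` are illustrative. It lists what `model.parameters()` would hand to the optimizer.

```python
import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, In_size, H_size, Out_size):
        super().__init__()
        # Attribute assignment registers each layer as a submodule.
        self.linear1 = torch.nn.Linear(In_size, H_size)
        self.linear2 = torch.nn.Linear(H_size, Out_size)

    def forward(self, x):
        h = self.linear1(x).clamp(min=0)    # clamp(min=0) acts as ReLU
        return self.linear2(h)

model = TwoLayerNet(5, 4, 2)

names = [name for name, _ in model.named_parameters()]
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
total = sum(p.numel() for p in model.parameters())

print(names)
print(shapes)
print(total)  # 5*4 + 4 + 4*2 + 2 = 34
```

Tensors stored as plain attributes, without `torch.nn.Parameter` or a submodule around them, are not picked up this way and would not be updated by the optimizer.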
