# Neural Networks in PyTorch

- Ref: PyTorch tutorials: https://pytorch.org/tutorials/
- torch.Tensor plays the role of numpy.ndarray, with the addition that it can run on a GPU.

- Ref for this tutorial: https://hsaghir.github.io/data_science/pytorch_starter/

## 1) Basics

In [10]:
# 1) Imports
import torch
import torch.nn as nn # neural network layers
import torch.optim as optim # optimization algorithms
import torch.nn.functional as F # functional ops such as activations
import torch.autograd as autograd # automatic differentiation (computational graph)

from torch.autograd import Variable # legacy tensor wrapper, deprecated since PyTorch 0.4 (tensors track gradients themselves)

In [11]:
# Two useful functions:
# - squeeze: remove dimensions of size 1
# - unsqueeze: insert a dimension of size 1

# squeeze
y = torch.zeros(1,2,1,2)
print('original shape:',y.shape)
torch.squeeze(y).shape

original shape: torch.Size([1, 2, 1, 2])


torch.Size([2, 2])

In [12]:
# unsqueeze
z = torch.zeros(2,3,4,5)
torch.unsqueeze(z,2).shape # insert a dimension of size 1 at position 2

torch.Size([2, 3, 1, 4, 5])
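
Both functions also accept an explicit dimension. The sketch below is an illustration (the variable names are not from the original notebook) showing that `squeeze` with a `dim` argument only removes that dimension, and only when its size is 1, and that `unsqueeze` can undo it.

```python
import torch

y = torch.zeros(1, 2, 1, 2)

# squeeze(dim) removes only the given dimension, and only if its size is 1
only_first = torch.squeeze(y, 0)   # shape (2, 1, 2)
unchanged = torch.squeeze(y, 1)    # dim 1 has size 2, so nothing is removed

# unsqueeze is the inverse: put the size-1 dimension back at position 0
restored = torch.unsqueeze(only_first, 0)

print(only_first.shape, unchanged.shape, restored.shape)
```

Squeezing a specific dimension is usually safer than a bare `squeeze`, which would also drop a batch dimension that happens to have size 1.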

In [13]:
# GPU or CPU
use_cuda = torch.cuda.is_available()
FloatTensor = torch.cuda.FloatTensor if use_cuda else torch.FloatTensor
LongTensor = torch.cuda.LongTensor if use_cuda else torch.LongTensor
ByteTensor = torch.cuda.ByteTensor if use_cuda else torch.ByteTensor
Tensor = FloatTensor
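
Since PyTorch 0.4 the same CPU/GPU switch is more commonly written with a `torch.device` object instead of type aliases. The following is a minimal sketch of that style; the tensor names are illustrative only.

```python
import torch

# pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.zeros(2, 3, device=device)   # created directly on the device
b = torch.ones(2, 3).to(device)        # created on the CPU, then moved
idx = torch.tensor([0, 1], dtype=torch.long, device=device)  # integer tensor

print(a.device, b.dtype, idx.dtype)
```

The dtype is chosen separately from the device here, which replaces the `FloatTensor` / `LongTensor` aliases above.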

## 2) Replace numpy ndarrays

In [14]:
# 2 matrices of size 2x3 into a 3d tensor 2x2x3
d = [ [[1., 2.,3.], [4.,5.,6.]], [[7.,8.,9.], [11.,12.,13.]] ]
d = torch.Tensor(d) # array from python list
print("shape of the tensor:", d.size())

# the first index is the depth
z = d[0] + d[1]
print("adding up the two matrices of the 3d tensor:",z)

shape of the tensor: torch.Size([2, 2, 3])
adding up the two matrices of the 3d tensor: 
  8  10  12
 15  17  19
[torch.FloatTensor of size 2x3]



In [15]:
# a heavily used operation is reshaping of tensors using .view()
print(d.view(2,-1)) #-1 makes torch infer the second dim


  1   2   3   4   5   6
  7   8   9  11  12  13
[torch.FloatTensor of size 2x6]
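
Two properties of `.view()` are worth seeing in code: it shares memory with the original tensor, and it needs a compatible memory layout. The sketch below uses the same values as `d` above; the other names are illustrative.

```python
import torch

d = torch.tensor([[[1., 2., 3.], [4., 5., 6.]],
                  [[7., 8., 9.], [11., 12., 13.]]])

flat = d.view(-1)      # all 12 elements in one dimension
rows = d.view(4, -1)   # -1 is inferred as 12 / 4 = 3

# a view shares storage with the original: writing through it changes d
flat[0] = 100.
print(d[0, 0, 0])

# a transposed tensor is no longer contiguous, so .view() would raise here;
# .reshape() works in both cases and copies only when it has to
t = d.transpose(1, 2)  # shape (2, 3, 2)
r = t.reshape(2, -1)
print(t.is_contiguous(), r.shape)
```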



## 3) Computational graphs:  torch.autograd

- Tensor --> node in the graph
- Operations on tensors --> edges in the graph

In [16]:
# d is a plain tensor, not yet a graph node; wrap it to create a node:
x = autograd.Variable(d, requires_grad=True)
print("the node's data is the tensor:", x.data.size())
print("the node's gradient is empty at creation:", x.grad) # the grad is empty right now

the node's data is the tensor: torch.Size([2, 2, 3])
the node's gradient is empty at creation: None


In [28]:
# apply operations to the node to build a computational graph
y = x + 1
z = x + y
s = z.sum()
print(s.grad_fn)

<SumBackward0 object at 0x0000016B755D6518>


In [22]:
# calculate gradients
s.backward()
print("the variable now has gradients:",x.grad)

the variable now has gradients: Variable containing:
(0 ,.,.) = 
  2  2  2
  2  2  2

(1 ,.,.) = 
  2  2  2
  2  2  2
[torch.FloatTensor of size 2x2x3]
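
The gradient of 2 can be checked by hand: z = x + (x + 1) = 2x + 1, so the derivative of s = sum(z) with respect to every element of x is 2. The sketch below repeats the computation with the post-0.4 API (a tensor created with `requires_grad=True` instead of `Variable`) and also shows that gradients accumulate across `backward()` calls. The tensor of ones is an assumption; any values give the same gradient.

```python
import torch

x = torch.ones(2, 2, 3, requires_grad=True)

y = x + 1
z = x + y            # z = 2x + 1
s = z.sum()
s.backward()
g1 = x.grad.clone()  # ds/dx = 2 for every element

# a second backward pass adds to x.grad instead of overwriting it
s2 = (3 * x).sum()
s2.backward()
g2 = x.grad.clone()  # 2 + 3 = 5 for every element

x.grad.zero_()       # reset by hand (optimizers do this via zero_grad())
print(g1[0, 0, 0].item(), g2[0, 0, 0].item(), x.grad[0, 0, 0].item())
```

This accumulation is the reason training loops zero the gradients before each backward pass.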



## 4) torch.nn contains various NN layers

- Layers are linear mappings of the rows of a tensor, combined with nonlinearities

- torch.nn helps build a neural net computational graph without the hassle of manipulating tensors and parameters manually

In [23]:
# linear transformation of a 2x5 matrix into a 2x3 matrix
linear_map = nn.Linear(5,3) # input size:5, output size:3
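# note: .parameters is a method, so this prints the bound method rather than the values;
# use list(linear_map.parameters()) to see the weight and bias tensors
print("using randomly initialized params:", linear_map.parameters)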

using randomly initialized params: <bound method Module.parameters of Linear(in_features=5, out_features=3, bias=True)>
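
To look at the parameters themselves, iterate over `named_parameters()`. The sketch below also checks, under a fixed seed chosen only for reproducibility, that `nn.Linear` computes `x @ weight.T + bias`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # assumption: fixed seed so reruns print the same values
linear_map = nn.Linear(5, 3)

# weight has shape (out_features, in_features), bias has shape (out_features,)
shapes = {name: tuple(p.shape) for name, p in linear_map.named_parameters()}
print(shapes)

# the layer is an affine map applied to each row of the input
x = torch.randn(2, 5)
manual = x @ linear_map.weight.t() + linear_map.bias
print(torch.allclose(manual, linear_map(x), atol=1e-6))
```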


In [24]:
# data: 2 examples with 5 features each; target: 3 values per example
data = torch.randn(2,5) # training inputs
y = autograd.Variable(torch.randn(2,3)) # target
# make a node
x = autograd.Variable(data, requires_grad=True)

# applying transformations to a node builds a computational graph
a = linear_map(x)
z = F.relu(a)
o = F.softmax(z, dim=1) # normalize over the 3 outputs of each example
print("output of softmax as a probability distribution:", o.data.view(1,-1))

# loss function
loss_func = nn.MSELoss() # instantiate loss function
L = loss_func(z,y) # MSE loss between the ReLU output z and the target
print("Loss:", L)

output of softmax as a probability distribution: 
 0.3886  0.3579  0.2535  0.3448  0.3415  0.3137
[torch.FloatTensor of size 1x6]

Loss: Variable containing:
 2.5171
[torch.FloatTensor of size 1]
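
The softmax output is a probability distribution per example: each row sums to 1 (0.3886 + 0.3579 + 0.2535 = 1 in the output above). The sketch below checks this on a small hand-written input, which is an assumption made for illustration, and shows the relation to `log_softmax`.

```python
import torch
import torch.nn.functional as F

z = torch.tensor([[1.0, 2.0, 3.0],
                  [0.0, 0.0, 0.0]])

o = F.softmax(z, dim=1)          # normalize across the columns of each row
row_sums = o.sum(dim=1)          # one value per row, each equal to 1

log_o = F.log_softmax(z, dim=1)  # same as log(softmax), computed more stably

print(o[1])                      # equal inputs give a uniform distribution
print(row_sums)
```

Passing `dim` explicitly avoids the ambiguity (and the deprecation warning) of letting PyTorch guess the axis.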





When defining a custom layer, two methods need to be implemented:

- `__init__` must first call the parent class constructor, then define the layer's parameters as instance attributes, e.g. self.x

- `forward` is where an input is passed through the layer: it applies operations to the input using the parameters and returns the output. The input needs to be an autograd.Variable() (a plain tensor since PyTorch 0.4) so that PyTorch can build the computational graph of the layer.


In [25]:
class Log_reg_classifier(nn.Module):
    def __init__(self, in_size,out_size):
        super(Log_reg_classifier,self).__init__() # always call the parent's __init__ first
        self.linear = nn.Linear(in_size, out_size) # layer parameters
        
    def forward(self,vect):
        return F.log_softmax(self.linear(vect), dim=1) # log-probabilities over the classes
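
A custom layer is used like any built-in one: calling the instance runs `forward`. The sketch below repeats the class so that it runs on its own; the batch of 4 random examples is an assumption made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Log_reg_classifier(nn.Module):
    def __init__(self, in_size, out_size):
        super().__init__()
        self.linear = nn.Linear(in_size, out_size)

    def forward(self, vect):
        return F.log_softmax(self.linear(vect), dim=1)

model = Log_reg_classifier(10, 2)

batch = torch.randn(4, 10)   # 4 examples with 10 features each
log_probs = model(batch)     # model(...) dispatches to forward
probs = log_probs.exp()      # back to probabilities; each row sums to 1

# parameters registered through self.linear: 10*2 weights + 2 biases
n_params = sum(p.numel() for p in model.parameters())
print(log_probs.shape, n_params)
```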

## 5) Optimization

- torch.optim provides the optimization algorithms (SGD, Adam, RMSprop, ...)

- We build a neural net computational graph with torch.nn, compute gradients with torch.autograd, and torch.optim then uses those gradients to update the network parameters

In [26]:
optimizer = optim.SGD(linear_map.parameters(), lr = 1e-2) # instantiate optimizer with model params + learning rate

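# epoch loop: in practice, repeat the steps below until convergence
optimizer.zero_grad() # reset gradients to zero
L.backward(retain_graph = True) # retain_variables was renamed to retain_graph
optimizer.step()
print(L) # still the pre-update loss: L was computed before step()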

Variable containing:
 2.5171
[torch.FloatTensor of size 1]
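
The single update above prints the same loss it started with, because `L` was computed before `optimizer.step()`. To see the loss fall, the forward pass has to be recomputed on every iteration. The sketch below is a simplified version under stated assumptions: a fixed seed, random inputs and targets, 100 steps, and the ReLU from section 4 left out so that the problem stays a plain linear regression.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # assumption: fixed seed for reproducibility

linear_map = nn.Linear(5, 3)
x = torch.randn(2, 5)  # 2 examples, 5 features
y = torch.randn(2, 3)  # 2 targets, 3 values each

loss_func = nn.MSELoss()
optimizer = optim.SGD(linear_map.parameters(), lr=1e-2)

losses = []
for step in range(100):
    optimizer.zero_grad()               # clear gradients from the last step
    loss = loss_func(linear_map(x), y)  # forward pass rebuilds the graph
    loss.backward()                     # compute gradients
    optimizer.step()                    # update the parameters
    losses.append(loss.item())

print("first loss: %.4f, last loss: %.4f" % (losses[0], losses[-1]))
```

Because the graph is rebuilt on each iteration, `retain_graph=True` is not needed here.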





# Example: Simple Regression

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import matplotlib.pyplot as plt

from torch.autograd import Variable

# if gpu is to be used
use_cuda = torch.cuda.is_available()

device = torch.device("cuda:0" if use_cuda else "cpu")

W = 2
b = 0.3

x = torch.arange(100, dtype=torch.float32).to(device).unsqueeze(1) # float inputs of shape (100, 1); nn.Linear expects floats

y = W * x + b

###### PARAMS ######
learning_rate = 0.01
num_episodes = 1000

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.linear1 = nn.Linear(1,1)
        
    def forward(self, x):
        output = self.linear1(x)
        return output
    
mynn = NeuralNetwork().to(device)
    
loss_func = nn.MSELoss()
#loss_func = nn.SmoothL1Loss()

optimizer = optim.Adam(params=mynn.parameters(), lr=learning_rate)
#optimizer = optim.RMSprop(params=mynn.parameters(), lr=learning_rate)

for i_episode in range(num_episodes):
    
    predicted_value = mynn(x)
    
    loss = loss_func(predicted_value, y)
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    if i_episode % 50 == 0:
        print("Episode %i, loss %.4f " % (i_episode, loss.item()))
    
    
plt.figure(figsize=(12,5))
plt.plot(x.cpu().numpy(), y.cpu().numpy(), alpha=0.6, color='green')
plt.plot(x.cpu().numpy(), predicted_value.detach().cpu().numpy(), alpha=0.6, color='red')

if use_cuda:
    plt.savefig("graph.png")
else:
    plt.show()

Episode 0, loss 6516.1152 
Episode 50, loss 2819.0657 
Episode 100, loss 982.0585 
Episode 150, loss 270.0243 
Episode 200, loss 57.2376 
Episode 250, loss 9.3092 
Episode 300, loss 1.2464 
Episode 350, loss 0.2372 
Episode 400, loss 0.1430 
Episode 450, loss 0.1353 
Episode 500, loss 0.1334 
Episode 550, loss 0.1317 
Episode 600, loss 0.1300 
Episode 650, loss 0.1282 
Episode 700, loss 0.1263 
Episode 750, loss 0.1244 
Episode 800, loss 0.1223 
Episode 850, loss 0.1203 
Episode 900, loss 0.1182 
Episode 950, loss 0.1160 
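
Besides the loss, the fit can be checked by reading the learned parameters and comparing them with the true W = 2 and b = 0.3. The sketch below is a compact CPU-only rerun of the example, with a fixed seed and a bare `nn.Linear` in place of the `NeuralNetwork` class; both are assumptions made to keep it short.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # assumption: fixed seed so reruns give the same numbers

W, b = 2, 0.3
x = torch.arange(100, dtype=torch.float32).unsqueeze(1)
y = W * x + b

model = nn.Linear(1, 1)
loss_func = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

losses = []
for i in range(1000):
    loss = loss_func(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# a 1x1 linear layer holds exactly one weight and one bias
learned_W = model.weight.item()
learned_b = model.bias.item()
print("learned W: %.3f (true %.1f)" % (learned_W, W))
print("learned b: %.3f (true %.1f)" % (learned_b, b))
```

With inputs as large as 99, the slope dominates the loss and is likely to converge well before the intercept, which would be consistent with the slow decrease of the loss after episode 400 in the log above.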


'0.3.1.post2'

## Example (not working...)

In [27]:
# define model
model = Log_reg_classifier(10,2)

# define loss function
loss_func = nn.MSELoss() 

# define optimizer
optimizer = optim.SGD(model.parameters(),lr=1e-1)

# send data through model in minibatches for 10 epochs
for epoch in range(10):
    for minibatch, target in data: # fails: "data" is still the 2x5 tensor from section 4, not a sequence of (minibatch, target) pairs
        model.zero_grad() # pytorch accumulates gradients, making them zero for each minibatch
        
        #forward pass
        out = model(autograd.Variable(minibatch))
        
        #backward pass 
        L = loss_func(out,target) #calculate loss
        L.backward() # calculate gradients
        optimizer.step() # make an update step

ValueError: too many values to unpack (expected 2)
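
The error comes from iterating over a tensor whose rows hold 5 values while the loop unpacks 2. A working variant needs a sequence of (minibatch, target) pairs. The sketch below is one possible fix under loudly stated assumptions: the dataset is synthetic and hypothetical (64 random examples, labelled by the sign of the first feature), and `nn.NLLLoss` is swapped in for `nn.MSELoss`, since it is the loss that pairs with `log_softmax` outputs and integer class labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)  # assumption: fixed seed for reproducibility

class Log_reg_classifier(nn.Module):
    def __init__(self, in_size, out_size):
        super().__init__()
        self.linear = nn.Linear(in_size, out_size)

    def forward(self, vect):
        return F.log_softmax(self.linear(vect), dim=1)

# hypothetical synthetic dataset: class 1 if the first feature is positive
features = torch.randn(64, 10)
labels = (features[:, 0] > 0).long()

# "data" is now a list of (minibatch, target) pairs: 8 batches of 8 examples
data = [(features[i:i + 8], labels[i:i + 8]) for i in range(0, 64, 8)]

model = Log_reg_classifier(10, 2)
loss_func = nn.NLLLoss()  # expects log-probabilities and class indices
optimizer = optim.SGD(model.parameters(), lr=1e-1)

def full_loss():
    # loss over the whole dataset, without tracking gradients
    with torch.no_grad():
        return loss_func(model(features), labels).item()

loss_before = full_loss()
for epoch in range(10):
    for minibatch, target in data:
        optimizer.zero_grad()       # gradients accumulate, so reset them
        out = model(minibatch)      # forward pass
        L = loss_func(out, target)  # calculate loss
        L.backward()                # calculate gradients
        optimizer.step()            # make an update step
loss_after = full_loss()

print("loss before: %.4f, after: %.4f" % (loss_before, loss_after))
```

For real datasets, `torch.utils.data.DataLoader` produces the same kind of (minibatch, target) pairs and handles shuffling and batching.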