## PyTorch Tutorial

IFT6135 – Representation Learning

A Deep Learning Course, January 2020

By Chin-Wei Huang 

(Adapted from Sandeep Subramanian's MILA tutorial)

## An introduction to the PyTorch neural network library

### `torch.nn` & `torch.optim`

In [0]:
import numpy as np

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.init as init
import torch.nn.functional as F

### torch.nn

Neural networks can be constructed using the `torch.nn` package.

It provides most of the building blocks needed for neural networks, such as:

1. Linear layers - `nn.Linear`, `nn.Bilinear`
2. Convolution Layers - `nn.Conv1d`, `nn.Conv2d`, `nn.Conv3d`, `nn.ConvTranspose2d`
3. Nonlinearities - `nn.Sigmoid`, `nn.Tanh`, `nn.ReLU`, `nn.LeakyReLU`
4. Pooling Layers - `nn.MaxPool1d`, `nn.AvgPool2d`
5. Recurrent Networks - `nn.LSTM`, `nn.GRU`
6. Normalization - `nn.BatchNorm2d`
7. Dropout - `nn.Dropout`, `nn.Dropout2d`
8. Embedding - `nn.Embedding`
9. Loss Functions - `nn.MSELoss`, `nn.CrossEntropyLoss`, `nn.NLLLoss`

Instances of these classes have a built-in `__call__` method, so an input can be run through a layer by calling the instance like a function.
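As a small sketch of what this means in practice: calling a layer instance dispatches to its `forward` method (and also runs any registered hooks), so the two calls below should give the same result. The layer sizes and input shape are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(in_features=4, out_features=2)
x = torch.randn(3, 4)

out_call = layer(x)             # preferred: goes through __call__
out_forward = layer.forward(x)  # same computation, but skips hooks

same = torch.equal(out_call, out_forward)
print(same, tuple(out_call.shape))
```

Calling the instance is the usual convention, since `forward` alone bypasses hook handling.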

### Linear, Bilinear & Nonlinearities

In [2]:
x = torch.randn(32, 10)
y = torch.randn(32, 30)

sigmoid = nn.Sigmoid()
# functional equivalent: torch.sigmoid(x)

linear = nn.Linear(in_features=10, out_features=20, bias=True)
output_linear = linear(x)
print('Linear output size : ', output_linear.size())

bilinear = nn.Bilinear(in1_features=10, in2_features=30, out_features=50, bias=True)
output_bilinear = bilinear(x, y)
print('Bilinear output size : ', output_bilinear.size())

Linear output size :  torch.Size([32, 20])
Bilinear output size :  torch.Size([32, 50])
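To make the shapes above less opaque, here is a minimal sketch that reproduces the output of `nn.Linear` by hand as `x @ W.T + b` and applies a sigmoid to the result. It assumes the same sizes as the cell above; the variable names `manual` and `probs` are only illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(32, 10)
linear = nn.Linear(in_features=10, out_features=20, bias=True)

# weight has shape (out_features, in_features), hence the transpose
manual = x @ linear.weight.t() + linear.bias
matches = torch.allclose(linear(x), manual, atol=1e-6)

# nn.Sigmoid squashes every element into the range between 0 and 1
probs = nn.Sigmoid()(linear(x))
print(matches, tuple(probs.shape))
```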


### Convolution, BatchNorm & Pooling Layers

In [5]:
x = torch.randn(10, 3, 28, 28)

conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=(3, 3), stride=1, padding=1, bias=True)
#bn = nn.BatchNorm2d(num_features=32)
pool = nn.MaxPool2d(kernel_size=(2, 2), stride=2)

output_conv = conv(x)
output_pool = pool(output_conv)

print('Conv output size : ', output_conv.size())
print('Pool output size : ', output_pool.size())

Conv output size :  torch.Size([10, 32, 28, 28])
Pool output size :  torch.Size([10, 32, 14, 14])
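The spatial sizes printed above follow the usual output-size formula for convolution and pooling layers, `floor((size + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1`. The sketch below checks this against the layers from the cell above and also inserts a `BatchNorm2d` layer, which leaves the shape unchanged. The helper `conv_out_size` is a hypothetical name introduced here, not part of PyTorch.

```python
import torch
import torch.nn as nn

def conv_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

x = torch.randn(10, 3, 28, 28)
conv = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
bn = nn.BatchNorm2d(num_features=32)   # normalizes each of the 32 channels
pool = nn.MaxPool2d(kernel_size=2, stride=2)

h_conv = conv_out_size(28, kernel_size=3, stride=1, padding=1)
h_pool = conv_out_size(h_conv, kernel_size=2, stride=2)
out = pool(bn(conv(x)))
print(h_conv, h_pool, tuple(out.shape))
```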


### Recurrent, Embedding & Dropout Layers

In [12]:
inputs = [[1, 2, 3], [1, 0, 4], [1, 2, 4], [1, 4, 0], [1, 3, 3]]
x = torch.LongTensor(inputs)

embedding = nn.Embedding(num_embeddings=5, embedding_dim=20, padding_idx=1)
# num_embeddings: vocabulary size
# embedding_dim: dimensionality of the embedding
# padding_idx: if given, the embedding vector at this index is
#              initialized to zeros and receives no gradient updates,
#              so padding tokens map to a zero vector.
drop = nn.Dropout(p=0.5)
rnn = nn.RNN(input_size=20, hidden_size=50, num_layers=2, batch_first=True, bidirectional=True, dropout=0.3)
# batch_first=True -> x: batch_size x sequence_length x embedding_dim

# calling a module is equivalent to calling its .forward method
emb = drop(embedding(x))
rnn_h, rnn_h_t = rnn(emb)

print('Embedding size : ', emb.size())
print('RNN hidden states size : ', rnn_h.size()) # batch_size, sequence_length, num_directions * hidden_size
print('RNN last hidden state size : ', rnn_h_t.size()) # num_layers * num_directions, batch_size, hidden_size
print(emb[1,0])


Embedding size :  torch.Size([5, 3, 20])
RNN hidden states size :  torch.Size([5, 3, 100])
RNN last hidden state size :  torch.Size([4, 5, 50])
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       grad_fn=<SelectBackward>)
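The two RNN outputs above are related: for a bidirectional RNN, the final hidden state of the last layer's forward direction sits at the last time step of the output, and that of the backward direction sits at the first time step. The sketch below checks this, assuming the same sizes as the cell above but with a random tensor standing in for the embedded input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, emb_dim, hidden = 5, 3, 20, 50
rnn = nn.RNN(input_size=emb_dim, hidden_size=hidden, num_layers=2,
             batch_first=True, bidirectional=True)
emb = torch.randn(batch, seq_len, emb_dim)  # stand-in for embedded tokens
out, h_n = rnn(emb)

# Separate layers and directions in the final hidden state:
# (num_layers, num_directions, batch, hidden)
h_n = h_n.reshape(2, 2, batch, hidden)
fwd_last, bwd_last = h_n[-1, 0], h_n[-1, 1]

# Forward direction finishes at the last step, backward at the first
fwd_ok = torch.allclose(out[:, -1, :hidden], fwd_last, atol=1e-6)
bwd_ok = torch.allclose(out[:, 0, hidden:], bwd_last, atol=1e-6)
print(fwd_ok, bwd_ok)
```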


### torch.nn.functional

Using the above classes requires defining an instance of the class and then running inputs through the instance.

The functional API exposes the same operations as plain functions that take their parameters explicitly, for example:

`import torch.nn.functional as F`

1. Linear layers - `F.linear(input=x, weight=W, bias=b)`
2. Convolution Layers - `F.conv2d(input=x, weight=W, bias=b, stride=1, padding=0, dilation=1, groups=1)`
3. Nonlinearities - `F.sigmoid(x)`, `F.tanh(x)`, `F.relu(x)`, `F.softmax(x, dim=-1)`
4. Dropout - `F.dropout(x, p=0.5, training=True)`
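As a quick sketch of how the two styles relate, the functional call below reuses the parameters stored in an `nn.Linear` instance and should produce the same output as the module. It also shows `F.dropout` acting as the identity when `training=False`, and `F.softmax` with an explicit `dim`. The tensor sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(32, 10)
linear = nn.Linear(10, 20)

# The functional call takes the parameters explicitly instead of storing them
out_functional = F.linear(input=x, weight=linear.weight, bias=linear.bias)
same = torch.allclose(out_functional, linear(x))

# With training=False, dropout leaves the input unchanged
no_drop = F.dropout(x, p=0.5, training=False)

# Softmax over the last dimension: each row sums to 1
probs = F.softmax(out_functional, dim=-1)
print(same, torch.equal(no_drop, x))
```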

### A few examples of the functional API

In [14]:
x = torch.randn(10, 3, 28, 28)
filters = torch.randn(32, 3, 3, 3)
conv_out = F.relu(F.dropout(F.conv2d(input=x, weight=filters, padding=1), p=0.5, training=True))

print('Conv output size : ', conv_out.size())

Conv output size :  torch.Size([10, 32, 28, 28])


### torch.nn.init

Provides a set of functions for standard weight initialization techniques

`import torch.nn.init as init`

1. Calculate the gain of a layer based on the activation function - `init.calculate_gain('sigmoid')`
2. Uniform init - `init.uniform_(tensor, a=0.0, b=1.0)`
3. Xavier uniform - `init.xavier_uniform_(tensor, gain=init.calculate_gain('sigmoid'))`
4. Xavier normal - `init.xavier_normal_(tensor, gain=init.calculate_gain('tanh'))`
5. Orthogonal - `init.orthogonal_(tensor, gain=init.calculate_gain('tanh'))`
6. Kaiming normal - `init.kaiming_normal_(tensor, mode='fan_in')`
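To illustrate how the gain enters an initializer, the sketch below fills a weight matrix with Xavier uniform initialization and checks that all values fall within `gain * sqrt(6 / (fan_in + fan_out))`. The matrix shape is an arbitrary choice, interpreted as `(fan_out, fan_in)` as for a linear layer.

```python
import math
import torch
import torch.nn.init as init

torch.manual_seed(0)
fan_in, fan_out = 10, 20
w = torch.empty(fan_out, fan_in)

gain = init.calculate_gain('tanh')    # recommended gain for tanh is 5/3
init.xavier_uniform_(w, gain=gain)    # trailing underscore: modifies w in place

bound = gain * math.sqrt(6.0 / (fan_in + fan_out))
max_abs = w.abs().max().item()
print(gain, bound, max_abs)
```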

### Initializing convolution kernels

In [32]:
conv_layer = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=(3, 3), padding=1)
for k,v in conv_layer.named_parameters():
    print(k)
    if k == 'weight':
        init.xavier_normal_(v)

weight
bias
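For a network with several layers, one common pattern is to pass an initialization function to `Module.apply`, which calls it on the module and each of its submodules. The sketch below is one possible way to do this; `init_weights` and the small `nn.Sequential` network are hypothetical examples, not part of the original tutorial.

```python
import torch.nn as nn
import torch.nn.init as init

def init_weights(module):
    """Xavier-normal weights and zero biases for every Conv2d layer."""
    if isinstance(module, nn.Conv2d):
        init.xavier_normal_(module.weight)
        if module.bias is not None:
            init.zeros_(module.bias)

net = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
)
net.apply(init_weights)   # visits net and all of its submodules

conv_layers = [m for m in net if isinstance(m, nn.Conv2d)]
biases_zero = all(float(m.bias.abs().sum()) == 0.0 for m in conv_layers)
print(len(conv_layers), biases_zero)
```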


### torch.optim

Provides implementations of standard stochastic optimization techniques

`import torch.optim as optim`

    W1 = torch.randn(10, 20, requires_grad=True)
    W2 = torch.randn(10, 20, requires_grad=True)

1. SGD - `optim.SGD([W1, W2], lr=0.01, momentum=0.9, dampening=0, weight_decay=1e-2, nesterov=True)`
2. Adam - `optim.Adam([W1, W2], lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)`
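As a minimal sketch of what an optimizer does with these parameters, the example below takes a single plain SGD step (no momentum or weight decay, to keep the arithmetic transparent) on a toy objective. With loss `sum(W**2)` the gradient is `2 * W`, so with `lr=0.1` the update should scale `W` by 0.8. The objective is an assumption chosen only because its result is easy to verify.

```python
import torch
import torch.optim as optim

torch.manual_seed(0)
W = torch.randn(10, 20, requires_grad=True)
W_before = W.detach().clone()
optimizer = optim.SGD([W], lr=0.1)   # plain SGD, no momentum

optimizer.zero_grad()        # clear gradients from any previous step
loss = (W ** 2).sum()        # toy objective with gradient 2 * W
loss.backward()              # populate W.grad
optimizer.step()             # W <- W - lr * W.grad = 0.8 * W

step_ok = torch.allclose(W.detach(), 0.8 * W_before, atol=1e-6)
print(step_ok)
```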

#### Learning Rate Scheduling

`optim.lr_scheduler`

1. `optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30,80], gamma=0.1)`
2. `optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, threshold=1e-04, threshold_mode='rel', min_lr=1e-05, eps=1e-08)`
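To show what `MultiStepLR` does to the learning rate, the sketch below records the rate used in each of 100 epochs. With `milestones=[30, 80]` and `gamma=0.1`, the rate should drop by a factor of 10 at epochs 30 and 80. The initial rate of 0.1 and the empty training loop body are assumptions for illustration.

```python
import math
import torch
import torch.optim as optim

W = torch.randn(10, 20, requires_grad=True)
optimizer = optim.SGD([W], lr=0.1)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 80], gamma=0.1)

lrs = []
for epoch in range(100):
    lrs.append(optimizer.param_groups[0]['lr'])  # rate used in this epoch
    # ... forward pass and loss.backward() would go here ...
    optimizer.step()     # call before scheduler.step()
    scheduler.step()     # advance the schedule once per epoch

print(lrs[29], lrs[30], lrs[80])
```

`ReduceLROnPlateau` differs in that its `step` takes the monitored quantity as an argument, for example `scheduler.step(val_loss)`, where `val_loss` would be a validation loss computed by the training loop.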

### We'll look at how to use `torch.optim` in the following tutorial