<font color="red" size="10">pytorch (very) fast tutorial </font>

***pytorch*** is a package designed to provide tools for the fast development of gradient-based deep learning (GBDL) models.

The key data structure in pytorch is the multidimensional array, which is called a <font color="green" size="6">***tensor***</font>

Let's play a little with tensors and their most frequent operations.

In [50]:
import torch

In [51]:
a = torch.randn(3,4)
print(a)

tensor([[ 0.8241, -1.3152,  0.3866, -0.6902],
        [-0.7364,  1.2262,  2.2513, -1.3385],
        [-0.7840,  0.4838, -0.6880, -1.9676]])


Tensors use 0-based indexing. Indexing a single element returns a 0-dimensional tensor, not a plain Python number:

In [52]:
a[1,1]

tensor(1.2262)

To obtain the underlying Python number of the element at position (1, 1), call item() on it:

In [53]:
a[1,1].item()

1.226157784461975

Tensors can be accessed with slices (here, the first row and the first column):

In [54]:
print(a[0,:])
print(a[:,0])

tensor([ 0.8241, -1.3152,  0.3866, -0.6902])
tensor([ 0.8241, -0.7364, -0.7840])
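
As a small complementary sketch, slices can also select sub-blocks, use steps, or count from the end. The tensor t below is a hypothetical example (it is not part of the session above), built with torch.arange so that its values are predictable:

```python
import torch

# Hypothetical 3 x 4 tensor holding the values 0..11
t = torch.arange(12).view(3, 4)

block = t[0:2, 1:3]       # rows 0-1, columns 1-2
every_other = t[:, ::2]   # all rows, columns 0 and 2
last_row = t[-1]          # negative indices count from the end

print(block)        # values: [[1, 2], [5, 6]]
print(every_other)  # values: [[0, 2], [4, 6], [8, 10]]
print(last_row)     # values: [8, 9, 10, 11]
```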


Vectorized operations can be run on tensors:

In [55]:
a = torch.randn(2,2)
print('a', a)
b = torch.randn(2,2)
print('b', b)

a tensor([[ 0.2717, -0.3133],
        [ 0.0792,  1.7926]])
b tensor([[-0.3481, -0.2770],
        [-1.7911,  0.7874]])


In [56]:

c = a + b # elementwise sum operation
print('a + b',c)
d = a * b # elementwise product operation
print('a * b',d)
# Compute e, the matrix product of a and b
e = torch.matmul(a, b)
print('a @ b',e)
# Compute f by subtracting the first column of b from a.
# b[:, 0] has shape (2,), so it is broadcast and subtracted from every row of a.
# (A matrix-vector product would be torch.matmul(a, b[:, 0]) instead.)
f = a - b[:, 0]
print('a - b[:, 0] = ',f)

a + b tensor([[-0.0764, -0.5904],
        [-1.7119,  2.5800]])
a * b tensor([[-0.0946,  0.0868],
        [-0.1418,  1.4115]])
a @ b tensor([[ 0.4666, -0.3220],
        [-3.2382,  1.3896]])
a - b[:, 0] =  tensor([[0.6199, 1.4778],
        [0.4273, 3.5837]])
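
The difference between a matrix-vector product and a broadcast elementwise operation is easy to mix up, so here is a minimal sketch that puts both side by side. The values are hypothetical, chosen so that the results can be checked by hand:

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[5., 6.], [7., 8.]])
v = b[:, 0]                  # first column of b, shape (2,)

mv = torch.matmul(a, v)      # matrix-vector product, shape (2,)
diff = a - v                 # broadcast: v is subtracted from each row of a

print(mv)    # values: [19., 43.]
print(diff)  # values: [[-4., -5.], [-2., -3.]]
# The matrix-vector product equals the first column of the matrix product
print(torch.equal(mv, torch.matmul(a, b)[:, 0]))
```

Note that the two results do not even have the same shape: the product is a vector with 2 elements, while the broadcast subtraction keeps the 2 x 2 shape of a.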


A given tensor can be transformed into another tensor with a different shape (its elements can be arranged along other dimensions, e.g. from 1 dimension to 2 or more dimensions).
Several examples are given below. Before running them, try to figure out what the result will be, and then check your guess against the output.

In [57]:
a = torch.randn(2,2) # 2 x 2 dimensions
print('a',a)
b = a.unsqueeze(-1) # 2 x 2 x 1 dimensions
print('a.unsqueeze(-1) =',b)

a tensor([[-0.8182, -0.8174],
        [-0.9671,  0.9302]])
a.unsqueeze(-1) = tensor([[[-0.8182],
         [-0.8174]],

        [[-0.9671],
         [ 0.9302]]])


In [58]:
b = a.unsqueeze(0) # 1 x 2 x 2 dimensions
print('a.unsqueeze(0)',b)

a.unsqueeze(0) tensor([[[-0.8182, -0.8174],
         [-0.9671,  0.9302]]])


In [59]:
c = a.view(4,1) # rearrange data as 4 x 1
print('a.view(4,1)',c)

a.view(4,1) tensor([[-0.8182],
        [-0.8174],
        [-0.9671],
        [ 0.9302]])


In [60]:
a = torch.randn(2) # 1 dimension with 2 elements
print('a\n',a)
b = a.unsqueeze(-1) # 2 x 1 dimensions
print('a.unsqueeze(-1)\n',b)
c = a.expand(3,2) # trailing dimensions must match (or be 1); new leading dimensions can be added
print('a.expand(3,2)\n',c)

a
 tensor([ 1.7559, -0.4647])
a.unsqueeze(-1)
 tensor([[ 1.7559],
        [-0.4647]])
a.expand(3,2)
 tensor([[ 1.7559, -0.4647],
        [ 1.7559, -0.4647],
        [ 1.7559, -0.4647]])
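
One detail worth seeing in code is that view and expand do not copy data: they only describe the same memory with a different shape. The sketch below uses hypothetical tensors (not the ones printed above) to illustrate this:

```python
import torch

a = torch.arange(6.)          # shape (6,)
m = a.view(2, 3)              # same data seen as 2 x 3
m2 = a.view(-1, 2)            # -1 lets pytorch infer that size: 3 x 2
row = torch.tensor([1., 2.])
e = row.expand(3, 2)          # 3 x 2 view that repeats row without copying it

print(m.shape, m2.shape, e.shape)
# view shares storage with the original: modifying m also changes a
m[0, 0] = 100.
print(a[0].item())            # 100.0
# expand uses a stride of 0 along the repeated dimension
print(e.stride())             # (0, 1)
```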


How can we detect if ***CUDA*** is available in pytorch?

In [61]:
do_I_have_cuda = torch.cuda.is_available()
if do_I_have_cuda:
  print("You have CUDA, turning GPU on")
  device = torch.device('cuda')
  a = a.to(device)
else:
  print("You don't have CUDA, :( why?")
  device = torch.device('cpu')
  a = a.to(device)

You have CUDA, turning GPU on
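
A common, more compact pattern is to choose the device once and then create or move tensors to it. This is only a sketch of that idea; the tensors x, y and z are hypothetical:

```python
import torch

# Pick the GPU when available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.ones(2, 3, device=device)   # created directly on the chosen device
y = torch.zeros(2, 3).to(device)      # created on the CPU, then moved
z = x + y                             # both operands must be on the same device

print(z.device.type == device.type)   # True
print(z.cpu().sum().item())           # 6.0
```

The same code then runs unchanged on machines with and without a GPU.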


In ANNs (Artificial Neural Networks) there are three basic ways to schedule the weight updates computed with backpropagation, given n training examples:

*   ***Full Gradient Descent***: one step per epoch, computed over all the n examples
*   ***Stochastic Gradient Descent***: one step for each example (n steps per epoch)
*   ***Mini-batch Stochastic Gradient Descent***: n/m steps per epoch, where m is the size of the mini-batch

Why is mini-batch so commonly used? Because it gives a more stable gradient estimate than single-example updates, and because a whole batch can be processed efficiently in parallel.
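
The number of steps per epoch in each of the three cases can be checked with a DataLoader, since one optimizer step is normally taken per batch. This is a minimal sketch; the dataset is random and the sizes n and m are hypothetical:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

n, m = 100, 20  # hypothetical dataset size and mini-batch size
dataset = TensorDataset(torch.randn(n, 3), torch.randn(n, 1))

def steps_per_epoch(batch_size):
    # One optimizer step would be taken per batch produced by the loader
    return len(DataLoader(dataset, batch_size=batch_size))

full_gd = steps_per_epoch(n)     # 1 step: all n examples at once
sgd = steps_per_epoch(1)         # n steps: one example per step
mini_batch = steps_per_epoch(m)  # n/m steps
print(full_gd, sgd, mini_batch)  # 1 100 5
```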

***Autograd***: Automatic differentiation

pytorch contains methods to compute gradients automatically (they are used in the backpropagation phase during training).

In [62]:
x = torch.randn(1, requires_grad=True)
# x is a tensor that will record gradients
print(x)
y = x.exp() # e^x
print(y)
y.backward() # For every tensor x_i used to compute y,
             #   dy/dx_i is computed and stored in x_i.grad
             # Here, dy/dx = e^x = y
print(x.grad, y)

tensor([0.5945], requires_grad=True)
tensor([1.8121], grad_fn=<ExpBackward0>)
tensor([1.8121]) tensor([1.8121], grad_fn=<ExpBackward0>)
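
The same mechanism works when the input has several elements, as long as the value on which backward is called is a scalar. A minimal sketch with hypothetical values, where the expected gradient is known analytically:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()      # y = x1^2 + x2^2 + x3^2, a scalar
y.backward()            # fills x.grad with dy/dx_i = 2 * x_i

print(x.grad)           # tensor([2., 4., 6.])
print(torch.allclose(x.grad, 2 * x.detach()))  # True
```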


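Calls to backward accumulate: each call adds the new gradients to the ones already stored in grad, so they are usually reset before a new backward pass. Separately, when gradients are not needed at all (for example, during evaluation), gradient tracking can be deactivated: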

In [63]:
with torch.no_grad():
  # Code placed here runs without recording operations for autograd
  pass  # placeholder so that the block is valid Python
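
Both behaviours can be observed on a tiny example. The sketch below uses a hypothetical one-element tensor; g1 and g2 are just copies of the gradient taken after each call to backward:

```python
import torch

x = torch.tensor([2.0], requires_grad=True)

(3 * x).sum().backward()
g1 = x.grad.clone()      # tensor([3.])
(3 * x).sum().backward()
g2 = x.grad.clone()      # tensor([6.]): the second call was added to the first
print(g1, g2)

x.grad.zero_()           # reset the accumulated gradient
print(x.grad)            # tensor([0.])

with torch.no_grad():
    z = 3 * x            # no graph is recorded inside this block
print(z.requires_grad)   # False
```

In a training loop, optimizer.zero_grad() plays the role of the manual reset for all the parameters of the model.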

### Let's see an example of how to program an ANN using pytorch

We are going to use the MNIST database, which contains images of handwritten digits. The goal of the ANN will be to recognize which digit appears in each image.

In [64]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# Hyperparameters
input_size = 28 * 28  # Image size (28x28 pixels)
hidden_size = 128     # Number of neurons in the hidden layer
num_classes = 10      # Number of classes (digits 0-9)
num_epochs = 5        # Number of epochs for training
batch_size = 64       # Number of images per mini-batch
learning_rate = 0.001 # Learning rate (step size of the optimizer)

# Normalize images
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])

# Load MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)

test_dataset = torchvision.datasets.MNIST(root='./data', train=False, transform=transform, download=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)
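
To see what the normalization step does, recall that Normalize computes (x - mean) / std for each channel. With mean 0.5 and std 0.5, the [0, 1] pixel values produced by ToTensor end up in [-1, 1]. The sketch below reproduces that formula with plain torch on a hypothetical 2 x 2 "image", so it does not need to download MNIST:

```python
import torch

# Hypothetical 1 x 2 x 2 "image" with pixel values already scaled to [0, 1],
# as transforms.ToTensor() would produce
img = torch.tensor([[[0.0, 0.25], [0.5, 1.0]]])

mean, std = 0.5, 0.5
normalized = (img - mean) / std   # same formula that Normalize applies per channel

print(normalized)                 # values: -1.0, -0.5, 0.0, 1.0
print(normalized.min().item(), normalized.max().item())  # -1.0 1.0
```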

An ANN can be programmed as a new class that extends nn.Module, which is provided by torch.

In [65]:
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = x.view(x.size(0), -1)  # Flatten each image into a vector (no dependency on a global variable)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x
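
A quick way to check a network like this before training is to pass a random batch through it and look at the output shape and the number of parameters. The sketch below redefines the same architecture so that it runs on its own; the batch called dummy is hypothetical random data, not MNIST:

```python
import torch
import torch.nn as nn

class SimpleNN(nn.Module):  # same architecture as the class above
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = x.view(x.size(0), -1)  # flatten each image into a vector
        return self.fc2(self.relu(self.fc1(x)))

model = SimpleNN(28 * 28, 128, 10)
dummy = torch.randn(4, 1, 28, 28)      # hypothetical batch of 4 random "images"
out = model(dummy)
n_params = sum(p.numel() for p in model.parameters())

print(out.shape)   # torch.Size([4, 10]): one score per class for each image
print(n_params)    # 101770 = (784*128 + 128) + (128*10 + 10)
```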

Create an object of this class, and define the loss function and the optimizer that will be applied.

In [66]:
model = SimpleNN(input_size, hidden_size, num_classes)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
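
Note that nn.CrossEntropyLoss works on the raw scores (logits) of the network, which is why forward does not end with a softmax. As a minimal sketch, with hypothetical logits and labels, the loss matches a log-softmax followed by the negative log-likelihood:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])  # hypothetical raw scores
labels = torch.tensor([0, 2])                                # correct class indices

loss = nn.CrossEntropyLoss()(logits, labels)
# Equivalent two-step computation: log-softmax followed by negative log-likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(loss.item(), manual.item())
print(torch.allclose(loss, manual))  # True
```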

Let's train the ANN!

In [67]:
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Forward pass: feed the images to the network
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backpropagation and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i + 1) % 100 == 0:
            print(f'Epoch [{epoch + 1}/{num_epochs}], Step [{i + 1}/{len(train_loader)}], Loss: {loss.item():.4f}')

Epoch [1/5], Step [100/938], Loss: 0.4862
Epoch [1/5], Step [200/938], Loss: 0.4553
Epoch [1/5], Step [300/938], Loss: 0.2051
Epoch [1/5], Step [400/938], Loss: 0.3954
Epoch [1/5], Step [500/938], Loss: 0.1284
Epoch [1/5], Step [600/938], Loss: 0.2482
Epoch [1/5], Step [700/938], Loss: 0.2438
Epoch [1/5], Step [800/938], Loss: 0.2427
Epoch [1/5], Step [900/938], Loss: 0.4540
Epoch [2/5], Step [100/938], Loss: 0.1868
Epoch [2/5], Step [200/938], Loss: 0.2323
Epoch [2/5], Step [300/938], Loss: 0.1308
Epoch [2/5], Step [400/938], Loss: 0.3629
Epoch [2/5], Step [500/938], Loss: 0.1971
Epoch [2/5], Step [600/938], Loss: 0.1804
Epoch [2/5], Step [700/938], Loss: 0.1233
Epoch [2/5], Step [800/938], Loss: 0.2368
Epoch [2/5], Step [900/938], Loss: 0.2749
Epoch [3/5], Step [100/938], Loss: 0.2148
Epoch [3/5], Step [200/938], Loss: 0.1055
Epoch [3/5], Step [300/938], Loss: 0.0640
Epoch [3/5], Step [400/938], Loss: 0.0650
Epoch [3/5], Step [500/938], Loss: 0.1687
Epoch [3/5], Step [600/938], Loss:

Let's evaluate the ANN!

In [68]:
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        outputs = model(images)
        predicted = outputs.argmax(dim=1)  # index of the highest score = predicted class
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    # The fraction of correctly classified examples is the accuracy
    print(f'Accuracy on test-set: {100 * correct / total:.2f}%')

Accuracy on test-set: 96.90%
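
The way the percentage is computed can be followed on a tiny example. torch.max(outputs, 1) returns both the maximum score of each row and its index, while outputs.argmax(dim=1) returns only the index; the index is the predicted class. The logits and labels below are hypothetical:

```python
import torch

outputs = torch.tensor([[0.1, 2.0, -1.0],
                        [1.5, 0.2, 0.3],
                        [0.0, 0.1, 0.9],
                        [2.2, 0.4, 0.1]])   # hypothetical logits for 4 examples
labels = torch.tensor([1, 0, 1, 0])

values, predicted = torch.max(outputs, 1)   # max value and its index per row
correct = (predicted == labels).sum().item()
accuracy = 100 * correct / labels.size(0)

print(predicted)   # tensor([1, 0, 2, 0])
print(accuracy)    # 75.0
```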


<font color="red" size="6">Try changing the optimizer to Stochastic Gradient Descent (SGD). </font>

Run the code again and observe if there are any changes in the results provided.

In [69]:
model = SimpleNN(input_size, hidden_size, num_classes)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
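
Without momentum or other options, optim.SGD applies the plain update rule w <- w - lr * grad. The sketch below checks this on a hypothetical two-element parameter w with a hypothetical learning rate; it is an illustration of the rule, not part of the MNIST experiment:

```python
import torch
import torch.optim as optim

lr = 0.1  # hypothetical learning rate
w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = optim.SGD([w], lr=lr)

loss = (w ** 2).sum()       # the gradient is 2 * w
optimizer.zero_grad()
loss.backward()
expected = w.detach() - lr * w.grad   # plain SGD rule: w <- w - lr * grad
optimizer.step()

print(w.detach())           # values: [0.8, -1.6]
print(torch.allclose(w.detach(), expected))  # True
```

Adam, in contrast, rescales each step using running statistics of the gradients, so the two optimizers can behave quite differently with the same learning rate.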

Let's train the ANN!

In [70]:
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Forward pass: feed the images to the network
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backpropagation and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i + 1) % 100 == 0:
            print(f'Epoch [{epoch + 1}/{num_epochs}], Step [{i + 1}/{len(train_loader)}], Loss: {loss.item():.4f}')

Epoch [1/5], Step [100/938], Loss: 2.2304
Epoch [1/5], Step [200/938], Loss: 2.1448
Epoch [1/5], Step [300/938], Loss: 2.0534
Epoch [1/5], Step [400/938], Loss: 2.0201
Epoch [1/5], Step [500/938], Loss: 1.9247
Epoch [1/5], Step [600/938], Loss: 1.8383
Epoch [1/5], Step [700/938], Loss: 1.6791
Epoch [1/5], Step [800/938], Loss: 1.7120
Epoch [1/5], Step [900/938], Loss: 1.6006
Epoch [2/5], Step [100/938], Loss: 1.4811
Epoch [2/5], Step [200/938], Loss: 1.4385
Epoch [2/5], Step [300/938], Loss: 1.2962
Epoch [2/5], Step [400/938], Loss: 1.2296
Epoch [2/5], Step [500/938], Loss: 1.1723
Epoch [2/5], Step [600/938], Loss: 1.2354
Epoch [2/5], Step [700/938], Loss: 1.0945
Epoch [2/5], Step [800/938], Loss: 0.9625
Epoch [2/5], Step [900/938], Loss: 1.0768
Epoch [3/5], Step [100/938], Loss: 0.8796
Epoch [3/5], Step [200/938], Loss: 0.8670
Epoch [3/5], Step [300/938], Loss: 0.7339
Epoch [3/5], Step [400/938], Loss: 0.8033
Epoch [3/5], Step [500/938], Loss: 0.8821
Epoch [3/5], Step [600/938], Loss:

Let's evaluate the ANN!

In [71]:
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        outputs = model(images)
        predicted = outputs.argmax(dim=1)  # index of the highest score = predicted class
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    # The fraction of correctly classified examples is the accuracy
    print(f'Accuracy on test-set: {100 * correct / total:.2f}%')

Accuracy on test-set: 87.46%
