### 08. PyTorch Basics

This PyTorch basics tutorial contains more examples of autograd, loading data, input pipelines, pretrained models, and saving and loading models.

#### Table of Contents

- [1. Basic autograd example 1](#1-basic-autograd-example-1)
- [2. Basic autograd example 2](#2-basic-autograd-example-2)
- [3. Loading data from numpy](#3-loading-data-from-numpy)
- [4. 3 ways to stop autograd from tracking history](#4-3-ways-to-stop-autograd-from-tracking-history)
- [5. Empty gradients](#5-empty-gradients)

#### 1. Basic autograd example 1

```python
import torch
import torchvision
import torch.nn as nn
import numpy as np
import torchvision.transforms as transforms
```

```python
# create scalar tensors that track gradients
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

# build a computational graph
# y = 2 * x + 3
y = w * x + b

# compute gradients
y.backward()

# print out the gradients
print(x.grad)
print(w.grad)
print(b.grad)
```

```
tensor(2.)
tensor(1.)
tensor(1.)
```
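Because $y = wx + b$ is linear in each variable, the printed gradients can be checked by hand: each partial derivative simply picks out the coefficient of that variable, evaluated at $x = 1$, $w = 2$, $b = 3$:

$$
\frac{\partial y}{\partial x} = w = 2, \qquad
\frac{\partial y}{\partial w} = x = 1, \qquad
\frac{\partial y}{\partial b} = 1
$$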


#### 2. Basic autograd example 2

```python
# create tensors of shape (10, 3) and (10, 2)
x = torch.randn(10, 3)
y = torch.randn(10, 2)
print(f'x: {x}')
print(f'y: {y}')
```

```
x: tensor([[-0.6839, -0.2573,  0.6411],
        [-0.4957,  0.7591, -1.2835],
        [ 1.6182,  0.5759,  0.9798],
        [ 1.2377, -0.3729, -1.0701],
        [ 0.5785,  1.0547, -0.3503],
        [-0.7050, -1.5306, -1.2690],
        [-1.0282,  1.0397, -0.2188],
        [ 1.8504, -0.2965, -0.8068],
        [-0.0172,  0.4202,  0.8105],
        [-1.2379, -0.7372, -0.9421]])
y: tensor([[-0.6834,  0.3433],
        [-0.0672, -1.7657],
        [ 0.9741,  1.2022],
        [-0.0669, -0.5877],
        [ 1.2479, -0.4725],
        [-0.6621, -2.0225],
        [-1.3513, -1.3836],
        [-1.4928, -0.7222],
        [ 1.9421, -0.4827],
        [ 0.3823,  0.1048]])
```


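`torch.nn.Linear(in_features, out_features, bias=True)` applies the linear transformation $y=xA^{T}+b$ to the incoming data, where the weight $A$ has shape `(out_features, in_features)` and the bias $b$ has shape `(out_features,)`.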
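As a quick sanity check of that formula, a minimal sketch can recompute the layer's output by hand from `linear.weight` and `linear.bias`. The seed and the batch of random inputs are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)      # arbitrary seed so the run is repeatable
linear = nn.Linear(3, 2)  # weight A: (2, 3), bias b: (2,)
x = torch.randn(10, 3)    # a batch of 10 inputs with 3 features each

# y = x A^T + b, written out explicitly
manual = x @ linear.weight.T + linear.bias

print(torch.allclose(linear(x), manual))  # True
```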

```python
# build a fully connected layer
linear = nn.Linear(3, 2)
print('w: ', linear.weight)
print('b: ', linear.bias)
print('shape of w: ', linear.weight.size())
print('shape of b: ', linear.bias.size())
```

```
w:  Parameter containing:
tensor([[ 0.5139,  0.1776, -0.4014],
        [ 0.2545,  0.0679, -0.0469]], requires_grad=True)
b:  Parameter containing:
tensor([-0.0189, -0.0841], requires_grad=True)
shape of w:  torch.Size([2, 3])
shape of b:  torch.Size([2])
```


```python
# build loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

# forward pass
pred = linear(x)
print(pred)
```

```
tensor([[-0.6734, -0.3058],
        [ 0.3763, -0.0985],
        [ 0.5216,  0.3209],
        [ 0.9804,  0.2558],
        [ 0.6063,  0.1512],
        [-0.1437, -0.3079],
        [-0.2749, -0.2650],
        [ 1.2032,  0.4046],
        [-0.2785, -0.0980],
        [-0.4079, -0.4050]], grad_fn=<AddmmBackward>)
```


```python
# compute loss
loss = criterion(pred, y)
print('loss: ', loss.item())
```

```
loss:  1.3554081916809082
```
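`nn.MSELoss()` uses its default `reduction='mean'`, so the loss is the average of the squared differences over all elements of `pred` (here 10 × 2 = 20). A minimal sketch with random stand-in tensors confirms this:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
pred = torch.randn(10, 2)    # stand-in for the model output
target = torch.randn(10, 2)  # stand-in for y

loss = nn.MSELoss()(pred, target)
manual = ((pred - target) ** 2).mean()  # mean over all 20 elements

print(torch.allclose(loss, manual))  # True
```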


```python
# backward pass
loss.backward()

# print out the gradients
print('dL/dw: ', linear.weight.grad)
print('dL/db: ', linear.bias.grad)
```

```
dL/dw:  tensor([[ 0.4499, -0.1818, -0.6026],
        [-0.0053,  0.0010, -0.7078]])
dL/db:  tensor([0.1687, 0.5439])
```
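These gradients can also be derived by hand. With mean-reduced MSE over $n$ elements, $\partial L / \partial \hat{y} = 2(\hat{y} - y)/n$, and backpropagating through $\hat{y} = xA^{T} + b$ gives $\partial L/\partial A = (\partial L/\partial \hat{y})^{T} x$ and $\partial L/\partial b = \sum_i (\partial L/\partial \hat{y})_i$. The sketch below uses fresh random data (not the tensors printed above) to compare that derivation with autograd:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(10, 3)
y = torch.randn(10, 2)
linear = nn.Linear(3, 2)

# autograd version
loss = nn.MSELoss()(linear(x), y)
loss.backward()

# hand-derived version
with torch.no_grad():
    g = 2 * (linear(x) - y) / y.numel()  # dL/dpred, shape (10, 2)
    dW = g.T @ x                         # shape (2, 3), same as linear.weight
    db = g.sum(dim=0)                    # shape (2,), same as linear.bias

print(torch.allclose(linear.weight.grad, dW, atol=1e-6))  # True
print(torch.allclose(linear.bias.grad, db, atol=1e-6))    # True
```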


```python
# 1-step gradient descent; with the optimizer this is simply:
# optimizer.step()

# equivalently, perform the update at a low level: param -= lr * grad
linear.weight.data.sub_(0.01 * linear.weight.grad.data)
linear.bias.data.sub_(0.01 * linear.bias.grad.data)

# print out the loss after 1-step gradient descent
pred = linear(x)
loss = criterion(pred, y)
print('loss after 1-step optimization: ', loss.item())
```

```
loss after 1-step optimization:  1.3412503004074097
```
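Updating through `.data` works, but changes made that way are invisible to autograd, and the pattern is generally discouraged in current PyTorch code. A common alternative is to do the in-place update inside `torch.no_grad()`. The sketch below (random data and seed are arbitrary) checks that this manual update matches one `optimizer.step()` of plain SGD with `lr=0.01`, no momentum and no weight decay:

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(10, 3)
y = torch.randn(10, 2)
criterion = nn.MSELoss()
lr = 0.01

model_a = nn.Linear(3, 2)
model_b = copy.deepcopy(model_a)  # identical starting weights

# Path A: let the optimizer apply the update
opt = torch.optim.SGD(model_a.parameters(), lr=lr)
opt.zero_grad()
criterion(model_a(x), y).backward()
opt.step()

# Path B: manual update inside no_grad instead of via .data
criterion(model_b(x), y).backward()
with torch.no_grad():
    for p in model_b.parameters():
        p -= lr * p.grad

same = all(torch.allclose(pa, pb)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)  # True
```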


#### 3. Loading data from numpy

```python
# create a numpy array
x = np.array([[1, 2], [3, 4]])
print('x: ', x)
print(type(x))
```

```
x:  [[1 2]
 [3 4]]
<class 'numpy.ndarray'>
```


```python
# convert the numpy array to a torch tensor
y = torch.from_numpy(x)
print('y: ', y)
print(type(y))
```

```
y:  tensor([[1, 2],
        [3, 4]], dtype=torch.int32)
<class 'torch.Tensor'>
```


```python
# convert the torch tensor to a numpy array
z = y.numpy()
print('z: ', z)
print(type(z))
```

```
z:  [[1 2]
 [3 4]]
<class 'numpy.ndarray'>
```
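One detail worth knowing: for CPU tensors, `torch.from_numpy()` and `.numpy()` share memory with their source, so changing one changes the other, whereas `torch.tensor()` makes an independent copy. A minimal sketch:

```python
import numpy as np
import torch

a = np.array([[1, 2], [3, 4]])
shared = torch.from_numpy(a)  # shares a's memory
copied = torch.tensor(a)      # independent copy

a[0, 0] = 100  # modify the NumPy array in place

print(shared[0, 0].item())  # 100: the change is visible through the tensor
print(copied[0, 0].item())  # 1: the copy is unaffected

back = shared.numpy()  # also shares the same memory
back[1, 1] = -1
print(a[1, 1])  # -1
```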


#### 4. 3 ways to stop autograd from tracking history

```python
x = torch.randn(3, requires_grad=True)
print('x with gradient: ', x)
```

```
x with gradient:  tensor([-0.9679,  0.3728, -1.5229], requires_grad=True)
```


```python
# 3 ways to stop autograd from tracking history
# 1: x.requires_grad_(False)
# 2: x.detach()
# 3: with torch.no_grad():
with torch.no_grad():
    y = x + 2
print('y without gradient: ', y)
```

```
y without gradient:  tensor([1.0321, 2.3728, 0.4771])
```
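The cell above demonstrates only the third option. A minimal sketch of all three follows; it uses a separate tensor for the first option because `requires_grad_(False)` changes the flag on the tensor itself, while the other two leave `x` untouched:

```python
import torch

# Way 1: requires_grad_(False) switches tracking off on the tensor itself (in place).
a = torch.randn(3, requires_grad=True)
a.requires_grad_(False)
y1 = a + 2

# Way 2: detach() returns a new tensor that shares data with x but is cut from the graph.
x = torch.randn(3, requires_grad=True)
y2 = x.detach() + 2

# Way 3: inside torch.no_grad(), no operations are recorded.
with torch.no_grad():
    y3 = x + 2

print(y1.requires_grad, y2.requires_grad, y3.requires_grad)  # False False False
print(x.requires_grad)  # True: ways 2 and 3 do not modify x itself
```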


#### 5. Empty gradients

Whenever we call `backward()`, the gradient for each leaf tensor is accumulated into its `.grad` attribute, so values from successive calls are summed up rather than overwritten.
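A minimal sketch of that accumulation: two separate `backward()` calls on the same leaf tensor leave the sum of both gradients in `.grad`.

```python
import torch

w = torch.ones(4, requires_grad=True)

(w * 3).sum().backward()  # d/dw = 3 for every element
(w * 5).sum().backward()  # d/dw = 5, added on top of the previous 3

print(w.grad)  # tensor([8., 8., 8., 8.])
```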

```python
weights = torch.ones(4, requires_grad=True)
print('weights: ', weights)

for epoch in range(3):
    print('epoch: %d -------------------' % epoch)
    model_output = (weights*3).sum()
    model_output.backward()

    print('model_output: ', model_output)
    print('weights gradient: ', weights.grad)
```

```
weights:  tensor([1., 1., 1., 1.], requires_grad=True)
epoch: 0 -------------------
model_output:  tensor(12., grad_fn=<SumBackward0>)
weights gradient:  tensor([3., 3., 3., 3.])
epoch: 1 -------------------
model_output:  tensor(12., grad_fn=<SumBackward0>)
weights gradient:  tensor([6., 6., 6., 6.])
epoch: 2 -------------------
model_output:  tensor(12., grad_fn=<SumBackward0>)
weights gradient:  tensor([9., 9., 9., 9.])
```


The gradients are summed across epochs, so from epoch 1 onward `weights.grad` no longer holds the gradient of the current output (which is always 3). Before the next iteration and optimization step, we must empty the gradients:

```python
weights = torch.ones(4, requires_grad=True)
print('weights: ', weights)

for epoch in range(3):
    print('epoch: %d -------------------' % epoch)
    model_output = (weights*3).sum()
    model_output.backward()

    print('model_output: ', model_output)
    print('weights gradient: ', weights.grad)

    # reset the accumulated gradient before the next iteration
    weights.grad.zero_()
```

```
weights:  tensor([1., 1., 1., 1.], requires_grad=True)
epoch: 0 -------------------
model_output:  tensor(12., grad_fn=<SumBackward0>)
weights gradient:  tensor([3., 3., 3., 3.])
epoch: 1 -------------------
model_output:  tensor(12., grad_fn=<SumBackward0>)
weights gradient:  tensor([3., 3., 3., 3.])
epoch: 2 -------------------
model_output:  tensor(12., grad_fn=<SumBackward0>)
weights gradient:  tensor([3., 3., 3., 3.])
```


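When training with an optimizer, empty the gradients with `optimizer.zero_grad()`. Note that the optimizer expects an iterable of parameters, so a single tensor must be wrapped in a list:

```python
optimizer = torch.optim.SGD([weights], lr=0.01)

optimizer.step()

optimizer.zero_grad()
```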
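Putting these pieces together, a minimal training loop over the same toy objective might look like the sketch below. Calling `zero_grad()` before `backward()` in every iteration keeps each gradient fresh; the learning rate of 0.01 is carried over from the snippet above:

```python
import torch

weights = torch.ones(4, requires_grad=True)
optimizer = torch.optim.SGD([weights], lr=0.01)  # params must be an iterable

for epoch in range(3):
    optimizer.zero_grad()               # clear gradients from the previous step
    model_output = (weights * 3).sum()
    model_output.backward()             # weights.grad is [3., 3., 3., 3.] every time
    print(f'epoch {epoch}: grad = {weights.grad}')
    optimizer.step()                    # weights -= 0.01 * grad

print(weights)  # each element has moved down from 1.0 by 3 steps of 0.03
```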