### 08. PyTorch Basics

This PyTorch basics tutorial contains more examples of autograd, loading data, the input pipeline, pretrained models, and saving and loading models.

#### Table of Contents

- [1. Basic autograd example 1](#heading08-1)
- [2. Basic autograd example 2](#heading08-2)
- [3. Loading data from numpy](#heading08-3)
- [4. 3 ways to stop autograd from tracking history](#heading08-4)
- [5. Empty gradients](#heading08-5)

<a id="heading08-1"></a>

#### 1. Basic autograd example 1

In [1]:
import torch
import torchvision
import torch.nn as nn
import numpy as np
import torchvision.transforms as transforms

In [2]:
# create tensors
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

# build a computational graph
# y = 2 * x + 3
y = w * x + b

# compute gradients
y.backward()

# print out the gradients
print(x.grad)
print(w.grad)
print(b.grad)

tensor(2.)
tensor(1.)
tensor(1.)
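As a sanity check, the printed values can be compared with the analytic derivatives of $y = wx + b$: $\partial y/\partial x = w = 2$, $\partial y/\partial w = x = 1$, and $\partial y/\partial b = 1$. The sketch below repeats the computation with `torch.autograd.grad`, which returns the gradients directly instead of storing them in `.grad`. The variable names `dy_dx`, `dy_dw` and `dy_db` are illustrative and not part of the original notebook.

```python
import torch

x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

y = w * x + b

# torch.autograd.grad returns a tuple with one gradient per input
dy_dx, dy_dw, dy_db = torch.autograd.grad(y, [x, w, b])

print(dy_dx)  # equals w
print(dy_dw)  # equals x
print(dy_db)  # equals 1
```

Because the gradients are returned rather than accumulated, `x.grad`, `w.grad` and `b.grad` stay `None` here.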


<a id="heading08-2"></a>

#### 2. Basic autograd example 2

In [3]:
# create tensors of shape (10, 3) and (10, 2)
x = torch.randn(10, 3)
y = torch.randn(10, 2)
print(f'x: {x}')
print(f'y: {y}')

x: tensor([[-1.8100,  0.0178,  1.1026],
        [ 0.0946, -0.1424,  0.9076],
        [ 0.6578,  1.9524,  0.1552],
        [-1.9348,  0.9727,  2.1199],
        [-0.6134, -0.2425,  0.8243],
        [-1.1508, -0.5725, -0.9671],
        [-0.5052,  0.0695, -1.0907],
        [ 1.3945,  1.3826, -0.6763],
        [ 0.8802,  2.0845,  0.4626],
        [-1.5163, -0.2028, -0.8273]])
y: tensor([[ 0.6923, -0.2588],
        [-0.6760, -0.3768],
        [ 0.1214,  0.0410],
        [ 0.4356, -0.4424],
        [-0.1600,  0.3881],
        [-0.6632,  1.7332],
        [-0.4212,  0.9490],
        [-2.0136,  1.0912],
        [-1.4429,  2.8246],
        [-0.4048, -0.1409]])


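`torch.nn.Linear(in_features, out_features, bias=True)` applies a linear transformation to the incoming data: $y = xA^{T} + b$, where $A$ is the layer's weight matrix of shape `(out_features, in_features)` and $b$ is its bias.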

In [4]:
# build a fully connected layer
linear = nn.Linear(3, 2)
print('w: ', linear.weight)
print('b: ', linear.bias)
print('shape of w: ', linear.weight.size())
print('shape of b: ', linear.bias.size())

w:  Parameter containing:
tensor([[ 0.0281,  0.5275,  0.2074],
        [ 0.2333, -0.1504, -0.4774]], requires_grad=True)
b:  Parameter containing:
tensor([0.2383, 0.5099], requires_grad=True)
shape of w:  torch.Size([2, 3])
shape of b:  torch.Size([2])
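The formula $y = xA^{T} + b$ can be checked directly against the layer's output. This is a minimal sketch with its own random input. The seed and the name `manual` are arbitrary choices made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
x = torch.randn(10, 3)
linear = nn.Linear(3, 2)

# nn.Linear stores the weight with shape (out_features, in_features),
# so it has to be transposed before multiplying from the right
manual = x @ linear.weight.T + linear.bias

print(manual.shape)
print(torch.allclose(linear(x), manual, atol=1e-6))
```

The comparison uses `torch.allclose` rather than `==` because the two code paths may round floating-point values slightly differently.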


In [5]:
# build loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

# forward pass
pred = linear(x)
print(pred)

tensor([[ 0.4255, -0.4414],
        [ 0.3540,  0.1201],
        [ 1.3189,  0.2956],
        [ 1.1366, -1.0999],
        [ 0.2641,  0.0098],
        [-0.2965,  0.7893],
        [ 0.0346,  0.9024],
        [ 0.8665,  0.9502],
        [ 1.4585,  0.1809],
        [-0.0828,  0.5817]], grad_fn=<AddmmBackward>)


In [6]:
# compute loss
loss = criterion(pred, y)
print('loss: ', loss.item())

loss:  1.4870531558990479


In [7]:
# backward pass
loss.backward()

# print out the gradients
print('dL/dw: ', linear.weight.grad)
print('dL/db: ', linear.bias.grad)

dL/dw:  tensor([[ 0.5181,  1.2552,  0.0939],
        [-0.0460, -0.5440, -0.2178]])
dL/db:  tensor([ 1.0012, -0.3520])
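The gradients returned by `backward()` can be reproduced by hand. With the default `reduction='mean'`, `nn.MSELoss` averages over all $N$ elements of `pred`, so $\partial L/\partial \text{pred} = 2(\text{pred} - y)/N$. The chain rule through the linear layer then gives $\partial L/\partial A = (\partial L/\partial \text{pred})^{T} x$ and $\partial L/\partial b$ as the sum over the batch dimension. The sketch below uses fresh random data, so its numbers differ from the output above, and the names `dL_dpred`, `dL_dw` and `dL_db` are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
x = torch.randn(10, 3)
y = torch.randn(10, 2)
linear = nn.Linear(3, 2)

pred = linear(x)
loss = nn.MSELoss()(pred, y)
loss.backward()

# analytic gradients, computed without recording a graph
with torch.no_grad():
    dL_dpred = 2 * (pred - y) / pred.numel()  # shape (10, 2)
    dL_dw = dL_dpred.T @ x                    # shape (2, 3), like linear.weight
    dL_db = dL_dpred.sum(dim=0)               # shape (2,), like linear.bias

print(torch.allclose(linear.weight.grad, dL_dw, atol=1e-6))
print(torch.allclose(linear.bias.grad, dL_db, atol=1e-6))
```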


In [8]:
# 1-step gradient descent
# optimizer.step()

# you can also perform gradient descent at the low level;
# torch.no_grad() keeps the in-place update out of the autograd graph
with torch.no_grad():
    linear.weight.sub_(0.01 * linear.weight.grad)
    linear.bias.sub_(0.01 * linear.bias.grad)

# print out the loss after 1-step gradient descent
pred = linear(x)
loss = criterion(pred, y)
print('loss after 1-step optimization: ', loss.item())

loss after 1-step optimization:  1.4540834426879883
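For plain SGD without momentum or weight decay, `optimizer.step()` and the manual update should give the same parameters. The sketch below compares the two on identical copies of a layer. The names `linear_a` and `linear_b`, the seed, and the use of `copy.deepcopy` are choices made for illustration.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
x = torch.randn(10, 3)
y = torch.randn(10, 2)

linear_a = nn.Linear(3, 2)
linear_b = copy.deepcopy(linear_a)  # identical initial parameters
criterion = nn.MSELoss()
lr = 0.01

# Version A: let the optimizer apply the update
optimizer = torch.optim.SGD(linear_a.parameters(), lr=lr)
criterion(linear_a(x), y).backward()
optimizer.step()

# Version B: apply the same update by hand
criterion(linear_b(x), y).backward()
with torch.no_grad():
    for p in linear_b.parameters():
        p -= lr * p.grad

print(torch.allclose(linear_a.weight, linear_b.weight))
print(torch.allclose(linear_a.bias, linear_b.bias))
```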


<a id="heading08-3"></a>

#### 3. Loading data from numpy

In [9]:
# create a numpy array
x = np.array([[1, 2], [3, 4]])
print('x: ', x)
print(type(x))

x:  [[1 2]
 [3 4]]
<class 'numpy.ndarray'>


In [10]:
# convert the numpy array to a torch tensor
y = torch.from_numpy(x)
print('y: ', y)
print(type(y))

y:  tensor([[1, 2],
        [3, 4]], dtype=torch.int32)
<class 'torch.Tensor'>


In [11]:
# convert the torch tensor to a numpy array
z = y.numpy()
print('z: ', z)
print(type(z))

z:  [[1 2]
 [3 4]]
<class 'numpy.ndarray'>
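One detail worth knowing: `torch.from_numpy` does not copy the data, so the tensor and the array share the same memory, whereas `torch.tensor` makes a copy. A minimal sketch follows. The names `a`, `t` and `c` are illustrative.

```python
import numpy as np
import torch

a = np.array([[1, 2], [3, 4]])
t = torch.from_numpy(a)  # shares memory with a
c = torch.tensor(a)      # copies the data

a[0, 0] = 100            # modify the NumPy array in place

print(t[0, 0].item())    # the change is visible through t
print(c[0, 0].item())    # the copy still holds the old value
```

The same sharing applies in the other direction: for a CPU tensor, the array returned by `.numpy()` is a view of the tensor's memory.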


<a id="heading08-4"></a>

#### 4. 3 ways to stop autograd from tracking history

In [12]:
x = torch.randn(3, requires_grad=True)
print('x with gradient: ', x)

x with gradient:  tensor([ 0.8552,  1.8401, -0.2896], requires_grad=True)


In [13]:
# 3 ways to stop autograd from tracking history
# 1: x.requires_grad_(False)
# 2: x.detach()
# 3: with torch.no_grad():
with torch.no_grad():
    y = x + 2
print('y without gradient: ', y)

y without gradient:  tensor([2.8552, 3.8401, 1.7104])
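The cell above only runs the third option. The following sketch exercises all three and checks the `requires_grad` flag after each one. The names `y1` and `y2` are illustrative.

```python
import torch

x = torch.randn(3, requires_grad=True)

# detach() returns a new tensor that shares data with x
# but is cut off from the computational graph
y1 = x.detach()
print(y1.requires_grad)

# operations inside torch.no_grad() are not recorded
with torch.no_grad():
    y2 = x + 2
print(y2.requires_grad)

# requires_grad_(False) flips the flag on x itself, in place
x.requires_grad_(False)
print(x.requires_grad)
```

All three prints show `False`. Note the difference in scope: `detach()` and `torch.no_grad()` leave `x` itself untouched, while `requires_grad_(False)` modifies `x`.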


<a id="heading08-5"></a>

#### 5. Empty gradients

Whenever we call `backward()`, the gradients are accumulated into the tensor's `.grad` attribute, so repeated calls sum their values instead of overwriting them.

In [14]:
weights = torch.ones(4, requires_grad=True)
print('weights: ', weights)

for epoch in range(3):
    print('epoch: %d -------------------' % epoch)
    model_output = (weights*3).sum()
    model_output.backward()

    print('model_output: ', model_output)
    print('weights gradient: ', weights.grad)


weights:  tensor([1., 1., 1., 1.], requires_grad=True)
epoch: 0 -------------------
model_output:  tensor(12., grad_fn=<SumBackward0>)
weights gradient:  tensor([3., 3., 3., 3.])
epoch: 1 -------------------
model_output:  tensor(12., grad_fn=<SumBackward0>)
weights gradient:  tensor([6., 6., 6., 6.])
epoch: 2 -------------------
model_output:  tensor(12., grad_fn=<SumBackward0>)
weights gradient:  tensor([9., 9., 9., 9.])


The gradients are summed across epochs, so from the second epoch on `weights.grad` no longer holds the gradient of the current `model_output`. Before the next iteration and optimization step, we must reset the gradients to zero.

In [15]:
weights = torch.ones(4, requires_grad=True)
print('weights: ', weights)

for epoch in range(3):
    print('epoch: %d -------------------' % epoch)
    model_output = (weights*3).sum()
    model_output.backward()

    print('model_output: ', model_output)
    print('weights gradient: ', weights.grad)

    weights.grad.zero_()

weights:  tensor([1., 1., 1., 1.], requires_grad=True)
epoch: 0 -------------------
model_output:  tensor(12., grad_fn=<SumBackward0>)
weights gradient:  tensor([3., 3., 3., 3.])
epoch: 1 -------------------
model_output:  tensor(12., grad_fn=<SumBackward0>)
weights gradient:  tensor([3., 3., 3., 3.])
epoch: 2 -------------------
model_output:  tensor(12., grad_fn=<SumBackward0>)
weights gradient:  tensor([3., 3., 3., 3.])


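When you use an optimizer, reset the gradients with `optimizer.zero_grad()` after each `optimizer.step()`. Note that the optimizer expects an iterable of tensors, so a single tensor must be wrapped in a list:

optimizer = torch.optim.SGD([weights], lr=0.01)

optimizer.step()

optimizer.zero_grad()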
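Put together, the optimizer calls fit into the loop from the previous cell as sketched below. This is a minimal illustration that reuses the toy `weights` tensor, not a full training setup.

```python
import torch

weights = torch.ones(4, requires_grad=True)
optimizer = torch.optim.SGD([weights], lr=0.01)

for epoch in range(3):
    model_output = (weights * 3).sum()
    model_output.backward()
    print('weights gradient: ', weights.grad)

    optimizer.step()       # update weights using the current gradient
    optimizer.zero_grad()  # reset the gradient before the next iteration

print('weights: ', weights)
```

Because the gradient is reset every iteration, each step subtracts `0.01 * 3` from every weight, so after three epochs the weights are approximately `0.91`.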