## PyTorch Tutorial

PyTorch is a Python library built on top of Torch’s THNN computational
backend.

Its main features are:

- Efficient tensor operations on CPU/GPU,
- automatic on-the-fly differentiation (autograd),
- optimizers,
- data I/O.

“Efficient tensor operations” encompass both standard linear algebra and, as we will see later, deep-learning specific operations (convolution, pooling, etc.).

A distinctive feature of PyTorch is the central role of autograd, which can
compute the derivatives of any quantity built from tensor operations.



In [1]:
import torch

In [2]:
from IPython.display import IFrame

## Lecture 1 - Tensor Basics and Linear Regression

In [3]:
IFrame("https://fleuret.org/ee559/materials/ee559-slides-1-4-tensors-and-linear-regression.pdf#view=Fit", height=500, width="100%")

In [4]:
# In-place operations are suffixed with an underscore, and a 0d tensor can be
# converted back to a Python scalar with item().

x = torch.empty(2, 5)
print(x)

print(x.size())
print(str(x.size()[0]) + " - " + str(x.size()[1]))

x.fill_(1.125)
print(x)

print(x.mean())
print(x.std())
print(x.sum())
print(x.sum().item())


tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])
torch.Size([2, 5])
2 - 5
tensor([[1.1250, 1.1250, 1.1250, 1.1250, 1.1250],
        [1.1250, 1.1250, 1.1250, 1.1250, 1.1250]])
tensor(1.1250)
tensor(0.)
tensor(11.2500)
11.25
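As a minimal sketch of the underscore convention mentioned above, the following compares `add` (out-of-place) with `add_` (in-place). The tensor shape and values are arbitrary choices for illustration.

```python
import torch

x = torch.ones(2, 3)

# Out-of-place: returns a new tensor, x is left unchanged.
y = x.add(1.0)

# In-place: modifies x itself and returns the same object.
z = x.add_(1.0)

print(y is x)          # False
print(z is x)          # True
print(x.sum().item())  # 12.0, a plain Python float
```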


In [5]:
x = torch.tensor([[11., 12., 13.], [21., 22., 23.]])

print(x)
print(x[1,2])

tensor([[11., 12., 13.],
        [21., 22., 23.]])
tensor(23.)


In [6]:
# PyTorch provides operators for component-wise and vector/matrix operations.
x = torch.tensor([10., 20., 30.])
y = torch.tensor([11., 21., 31.])

m = torch.tensor([[0., 0., 3.], [0., 2., 0.], [1., 0., 0.]])

z1 = x + y
z2 = x * y
z3 = x**2

z4 = m.mv(x)
z5 = m @ x

print(z1)
print(z2)
print(z3)
print(z4)
print(z5)


tensor([21., 41., 61.])
tensor([110., 420., 930.])
tensor([100., 400., 900.])
tensor([90., 40., 10.])
tensor([90., 40., 10.])


In [7]:
# As in NumPy, the : symbol defines a range of values for an index and makes it
# possible to slice tensors.

x = torch.empty(2, 4).random_(10)

print(x)

print(x[0])
print(x[0,:])

print(x[:,0])


tensor([[0., 7., 1., 8.],
        [3., 6., 3., 6.]])
tensor([0., 7., 1., 8.])
tensor([0., 7., 1., 8.])
tensor([0., 3.])


In [8]:
# PyTorch provides interfaces to standard linear algebra operations, such as
# linear system solving or eigendecomposition.

y = torch.empty(3).normal_()
m = torch.empty(3, 3).normal_()
print(y)
print(m)

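# torch.lstsq(y, m) has been deprecated since PyTorch 1.9 and removed in later
# releases; torch.linalg.lstsq takes the matrix first and a 2d right-hand side.
q = torch.linalg.lstsq(m, y.unsqueeze(1)).solution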
s = torch.mm(m,q)
print(q)
print(s)

tensor([-0.5098, -0.0341,  1.9124])
tensor([[-0.3325,  0.1958, -0.6358],
        [-0.3749, -0.4426,  0.4036],
        [-0.1004, -0.5164, -0.6872]])
tensor([[ 2.0517],
        [-2.6535],
        [-1.0884]])
tensor([[-0.5098],
        [-0.0341],
        [ 1.9124]])


In [9]:
# Linear Regression
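
The cell above is empty in this notebook. As a hedged sketch of what a least-squares linear regression could look like with the tools introduced so far, the example below fits a line to synthetic data. The data, the noise level, and the variable names are assumptions made for illustration, not taken from the lecture.

```python
import torch

torch.manual_seed(0)

# Hypothetical synthetic data: y = 2 x + 1 plus a little Gaussian noise.
n = 100
x = torch.linspace(0., 1., n)
y = 2. * x + 1. + 0.01 * torch.randn(n)

# Design matrix with a column of ones for the intercept, shape (n, 2).
A = torch.stack((x, torch.ones(n)), dim=1)

# Least-squares solution of A @ w ~= y, shape (2, 1).
w = torch.linalg.lstsq(A, y.unsqueeze(1)).solution

slope, intercept = w[0, 0].item(), w[1, 0].item()
print(slope, intercept)  # close to 2 and 1
```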

## Lecture 2 - High Dimension Tensors

In [10]:
IFrame("https://fleuret.org/ee559/materials/ee559-slides-1-5-high-dimension-tensors.pdf#view=Fit", height=500, width="100%")

A tensor can be of several types:
- torch.float16, torch.float32, torch.float64,
- torch.uint8,
- torch.int8, torch.int16, torch.int32, torch.int64

and can be located either in the CPU’s or in a GPU’s memory.

In [11]:
x = torch.zeros(1,3)
print(x.dtype, x.device)

x = x.long()
print(x.dtype, x.device)

x = x.to('cuda')
print(x.dtype, x.device)


torch.float32 cpu
torch.int64 cpu
torch.int64 cuda:0
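
The call `x.to('cuda')` fails on a machine without a CUDA device. A common way to keep such code portable, sketched here under the assumption that falling back to the CPU is acceptable, is to pick the device once and reuse it:

```python
import torch

# Fall back to the CPU when no CUDA device is visible.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.zeros(1, 3).long().to(device)
print(x.dtype, x.device)
```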


### Here are some examples from the vast library of tensor operations:
#### Creation
- torch.empty(*size, ...)
- torch.zeros(*size, ...)
- torch.full(size, value, ...)
- torch.tensor(sequence, ...)
- torch.eye(n, ...)
- torch.from_numpy(ndarray)

#### Indexing, Slicing, Joining, Mutating
- torch.Tensor.view(*size)
- torch.cat(tensors, dim=0)
- torch.chunk(tensor, nb_chunks, dim=0)
- torch.split(tensor, split_size, dim=0)
- torch.index_select(input, dim, index, out=None)
- torch.t(input, out=None)
- torch.transpose(input, dim0, dim1, out=None)

#### Filling
- Tensor.fill_(value)
- Tensor.bernoulli_(proba)
- Tensor.normal_([mu, [std]])

#### Pointwise math
- torch.abs(input, out=None)
- torch.add()
- torch.cos(input, out=None)
- torch.sigmoid(input, out=None)
- (+ many operators)

#### Math reduction
- torch.dist(input, other, p=2, out=None)
- torch.mean()
- torch.norm()
- torch.std()
- torch.sum()

#### BLAS and LAPACK Operations
- torch.linalg.eig(A) (formerly torch.eig)
- torch.linalg.lstsq(A, B) (formerly torch.lstsq(B, A))
- torch.inverse(input, out=None)
- torch.mm(mat1, mat2, out=None)
- torch.mv(mat, vec, out=None)
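
To make a few entries of the list above concrete, here is a short sketch combining `torch.cat`, `torch.chunk` and `torch.index_select`. The shapes are arbitrary and chosen only so that the results are easy to check by hand.

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(1, 3)

# Join along dimension 0: shapes (2, 3) and (1, 3) give (3, 3).
c = torch.cat((a, b), dim=0)

# Split back into 3 chunks of one row each.
rows = torch.chunk(c, 3, dim=0)

# Keep columns 0 and 2 only.
cols = torch.index_select(c, 1, torch.tensor([0, 2]))

print(c.size(), len(rows), rows[0].size(), cols.size())
```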


In [12]:
x = torch.tensor([ [1, 3, 0], [2, 4, 6] ])

x1 = x.t()
x2 = x.view(-1)
x3 = x.view(3, -1)
x4 = x[:, 1:3]
x5 = x.view(1, 2, 3)
x6 = x.view(1, 2, 3).expand(3, 2, 3)

print(x, x.size())
print(x1, x1.size())
print(x2, x2.size())
print(x3, x3.size())
print(x4, x4.size())
print(x5, x5.size())
print(x6, x6.size())


tensor([[1, 3, 0],
        [2, 4, 6]]) torch.Size([2, 3])
tensor([[1, 2],
        [3, 4],
        [0, 6]]) torch.Size([3, 2])
tensor([1, 3, 0, 2, 4, 6]) torch.Size([6])
tensor([[1, 3],
        [0, 2],
        [4, 6]]) torch.Size([3, 2])
tensor([[3, 0],
        [4, 6]]) torch.Size([2, 2])
tensor([[[1, 3, 0],
         [2, 4, 6]]]) torch.Size([1, 2, 3])
tensor([[[1, 3, 0],
         [2, 4, 6]],

        [[1, 3, 0],
         [2, 4, 6]],

        [[1, 3, 0],
         [2, 4, 6]]]) torch.Size([3, 2, 3])


In [13]:
x = torch.tensor([ [ [ 1, 2, 1 ], [ 2, 1, 2 ] ],
                   [ [ 3, 0, 3 ], [ 0, 3, 0 ] ] ])

x1 = x[0:1, :, :]
x2 = x[:, :, 0:2]
x3 = x.transpose(0, 1)
x4 = x.transpose(0, 2)
x5 = x.transpose(1, 2)

print(x, x.size())
print(x1, x1.size())
print(x2, x2.size())
print(x3, x3.size())
print(x4, x4.size())
print(x5, x5.size())


tensor([[[1, 2, 1],
         [2, 1, 2]],

        [[3, 0, 3],
         [0, 3, 0]]]) torch.Size([2, 2, 3])
tensor([[[1, 2, 1],
         [2, 1, 2]]]) torch.Size([1, 2, 3])
tensor([[[1, 2],
         [2, 1]],

        [[3, 0],
         [0, 3]]]) torch.Size([2, 2, 2])
tensor([[[1, 2, 1],
         [3, 0, 3]],

        [[2, 1, 2],
         [0, 3, 0]]]) torch.Size([2, 2, 3])
tensor([[[1, 3],
         [2, 0]],

        [[2, 0],
         [1, 3]],

        [[1, 3],
         [2, 0]]]) torch.Size([3, 2, 2])
tensor([[[1, 2],
         [2, 1],
         [1, 2]],

        [[3, 0],
         [0, 3],
         [3, 0]]]) torch.Size([2, 3, 2])


In [14]:
# PyTorch offers simple interfaces to standard image databases.


In [15]:
# Broadcasting automagically expands dimensions by replicating coefficients,
# when it is necessary to perform operations that are “intuitively reasonable”.

# Precisely, broadcasting proceeds as follows:
# 1. If one of the tensors has fewer dimensions than the other, it is reshaped by
# adding as many dimensions of size 1 as necessary in the front; then
# 2. for every dimension mismatch, if one of the two tensors is of size one, it
# is expanded along this axis by replicating coefficients.
# If there is a size mismatch for one of the dimensions and neither of the two
# sizes is one, the operation fails.

A = torch.tensor([[1.], [2.], [3.], [4.]])
B = torch.tensor([[5., -5., 5., -5., 5.]])
C = A + B


print("A = \n", A, A.size())
print("B =", B, B.size())
print("-"*20)
print("C = \n", C, C.size())


A = 
 tensor([[1.],
        [2.],
        [3.],
        [4.]]) torch.Size([4, 1])
B = tensor([[ 5., -5.,  5., -5.,  5.]]) torch.Size([1, 5])
--------------------
C = 
 tensor([[ 6., -4.,  6., -4.,  6.],
        [ 7., -3.,  7., -3.,  7.],
        [ 8., -2.,  8., -2.,  8.],
        [ 9., -1.,  9., -1.,  9.]]) torch.Size([4, 5])
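
The two broadcasting steps described in the cell above can be sketched in plain Python on shapes alone. The helper `broadcast_shape` is a hypothetical name introduced for illustration, not a PyTorch function.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast shape of two shapes, or raise ValueError."""
    a, b = list(shape_a), list(shape_b)
    # Step 1: pad the shorter shape with leading dimensions of size 1.
    while len(a) < len(b):
        a.insert(0, 1)
    while len(b) < len(a):
        b.insert(0, 1)
    # Step 2: on a mismatch, the dimension of size 1 is expanded.
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"sizes {da} and {db} cannot be broadcast")
    return tuple(result)

print(broadcast_shape((4, 1), (1, 5)))  # (4, 5), as for A + B above
print(broadcast_shape((3,), (2, 3)))    # (2, 3)
```

Calling `broadcast_shape((2, 3), (4, 3))` raises `ValueError`, which mirrors the failure case of the rule.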


In [16]:
# To deal with complex operations, PyTorch provides a dimension naming mechanism:
seq = torch.empty(100, 3, 1024, names = [ 'n', 'c', 't' ]).normal_()

mean_1 = seq.mean('n')
mean_2 = seq.mean('c')
mean_3 = seq.mean('t')

print(seq.size())
print("-"*5)
print(mean_1.size())
print(mean_2.size())
print(mean_3.size())
print("-"*20)

time_first = seq.align_to('n', 't', 'c')
print(time_first.size())
print("-"*20)

array = seq.flatten([ 'c', 't' ], 'i')
print(array.size())
print(array.names)




torch.Size([100, 3, 1024])
-----
torch.Size([3, 1024])
torch.Size([100, 1024])
torch.Size([100, 3])
--------------------
torch.Size([100, 1024, 3])
--------------------
torch.Size([100, 3072])
('n', 'i')


## Lecture 3 - Tensor Internals

In [17]:
IFrame("https://fleuret.org/ee559/materials/ee559-slides-1-6-tensor-internals.pdf#view=Fit", height=500, width="100%")

In [18]:
# A tensor is a view of a [part of a] storage, which is a low-level 1d vector.
x = torch.zeros(2, 4)
print(x.storage())
print("-"*10)

q = x.storage()
q[4] = 1.0

print(x)


 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
[torch.FloatStorage of size 8]
----------
tensor([[0., 0., 0., 0.],
        [1., 0., 0., 0.]])


In [19]:
# Multiple tensors can share the same storage. It happens when using operations
# such as view(), expand() or transpose().

y = x.view(2, 2, 2)
print(y)
print("-"*10)

y[1, 1, 0] = 7.0
print(x)
print("-"*10)

y.narrow(0, 1, 1).fill_(3.0)
print(x)


tensor([[[0., 0.],
         [0., 0.]],

        [[1., 0.],
         [0., 0.]]])
----------
tensor([[0., 0., 0., 0.],
        [1., 0., 7., 0.]])
----------
tensor([[0., 0., 0., 0.],
        [3., 3., 3., 3.]])
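
One way to check whether two tensors share memory, sketched here as an illustration, is to compare the addresses returned by `data_ptr()`. A `clone()` is used as the contrasting case, since it allocates its own copy.

```python
import torch

x = torch.zeros(2, 4)
y = x.view(2, 2, 2)   # shares the storage of x
z = x.clone()         # owns a fresh copy of the data

y[0, 0, 0] = 1.0      # visible through x, not through z

print(x.data_ptr() == y.data_ptr())    # True
print(x.data_ptr() == z.data_ptr())    # False
print(x[0, 0].item(), z[0, 0].item())  # 1.0 0.0
```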


In [20]:
# The first coefficient of a tensor is the one at storage_offset() in storage().
# Incrementing index k by 1 moves by stride(k) elements in the storage.

q = torch.arange(0, 20).storage().float()

x = torch.empty(0).set_(q, storage_offset = 5, size = (3, 2), stride = (4, 1))
print(x)



tensor([[ 5.,  6.],
        [ 9., 10.],
        [13., 14.]])
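
The offset and stride rule can be reproduced without PyTorch. The sketch below recomputes the tensor printed above from a plain Python list; `storage_index` is a hypothetical helper written for this illustration.

```python
def storage_index(index, offset, stride):
    """Position in the 1d storage of the coefficient at `index`."""
    return offset + sum(i * s for i, s in zip(index, stride))

storage = [float(v) for v in range(20)]  # same content as arange(0, 20)
offset, size, stride = 5, (3, 2), (4, 1)

rows = [[storage[storage_index((i, j), offset, stride)]
         for j in range(size[1])]
        for i in range(size[0])]
print(rows)  # [[5.0, 6.0], [9.0, 10.0], [13.0, 14.0]]
```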


In [21]:
# We can explicitly create different “views” of the same storage
n = torch.linspace(1, 4, 4)
print(n)

n1 = torch.tensor(0.).set_(n.storage(), 1, (3, 3), (0, 1))
print(n1)

n2 = torch.tensor(0.).set_(n.storage(), 1, (2, 4), (1, 0))
print(n2)
print("-"*10)

# This is in particular how transpositions and broadcasting are implemented.
x = torch.empty(100, 100)
print(x.stride())

y = x.t()
print(y.stride())


tensor([1., 2., 3., 4.])
tensor([[2., 3., 4.],
        [2., 3., 4.],
        [2., 3., 4.]])
tensor([[2., 2., 2., 2.],
        [3., 3., 3., 3.]])
----------
(100, 1)
(1, 100)


In [22]:
# This organization explains the following (maybe surprising) error

x = torch.empty(100, 100)
x.t().view(-1)

# The function reshape() combines view() and contiguous().


RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
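
A small sketch of the two usual ways around this error, on a tensor small enough to check by hand: make the data contiguous explicitly before calling `view()`, or let `reshape()` decide whether a copy is needed.

```python
import torch

x = torch.arange(6).view(2, 3)
y = x.t()                     # shape (3, 2), strides (1, 3)

print(y.is_contiguous())      # False

a = y.contiguous().view(-1)   # explicit copy, then view
b = y.reshape(-1)             # copies only when a view is impossible

print(a)                      # tensor([0, 3, 1, 4, 2, 5])
print(torch.equal(a, b))      # True
```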

## Lecture 4 - Autograd

In [23]:
IFrame("https://fleuret.org/ee559/materials/ee559-slides-4-2-autograd.pdf#view=Fit", height=500, width="100%")

Conceptually, the forward pass is a standard tensor computation, and the DAG
of tensor operations is required only to compute derivatives.

__When executing tensor operations, PyTorch can automatically construct
on-the-fly the graph of operations to compute the gradient of any quantity
with respect to any tensor involved.__

This “autograd” mechanism (Paszke et al., 2017) has two main benefits:

- Simpler syntax: one just needs to write the forward pass as a standard sequence of Python operations,
- Greater flexibility: since the graph is not static, the forward pass can be dynamically modulated.
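
To illustrate the second point, here is a minimal sketch in which an ordinary Python loop decides how many operations are recorded. The function `forward` and its argument `n_steps` are hypothetical names, and the gradient is obtained with `torch.autograd.grad`.

```python
import torch

def forward(x, n_steps):
    # Ordinary Python control flow decides which operations are recorded.
    y = x
    for _ in range(n_steps):
        y = y * x
    return y.sum()

x = torch.tensor([1., 2., 3.]).requires_grad_()

# n_steps = 2 computes sum(x^3), whose gradient is 3 x^2 = [3, 12, 27].
g3, = torch.autograd.grad(forward(x, 2), x)

# n_steps = 1 computes sum(x^2), whose gradient is 2 x = [2, 4, 6].
g2, = torch.autograd.grad(forward(x, 1), x)

print(g3, g2)
```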


In [28]:
# A Tensor has a Boolean field requires_grad, set to False by default,
# which states whether PyTorch should build the graph of operations so that
# gradients with respect to it can be computed.

# The result of a tensor operation has this flag set to True if any of its
# operands has it set to True.

x = torch.tensor([ 1., 2. ])
y = torch.tensor([ 4., 5. ])
z = torch.tensor([ 7., 3. ])

print("x:", x.requires_grad)
print("x+y:", (x + y).requires_grad)

z.requires_grad = True
print("x+z:", (x + z).requires_grad)


x: False
x+y: False
x+z: True


In [30]:
# Only floating point type tensors can have their gradient computed.

x = torch.tensor([1., 10.])
x.requires_grad = True

x = torch.tensor([1, 10])
x.requires_grad = True

# The method requires_grad_(value = True) sets requires_grad to value, which is True by default.


RuntimeError: only Tensors of floating point dtype can require gradients
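
When the data starts out as integers, one simple workaround, sketched here, is to convert to a floating point type before enabling gradients:

```python
import torch

x = torch.tensor([1, 10])         # dtype torch.int64
x = x.float().requires_grad_()    # convert first, then enable gradients

print(x.dtype, x.requires_grad)   # torch.float32 True
```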

In [None]:
# torch.autograd.grad(outputs, inputs) computes and returns the gradient
# of outputs with respect to inputs.
# inputs can be a single tensor, but the result is still a [one element] tuple.
# If outputs is a tuple, the result is the sum of the gradients of its elements.

t = torch.tensor([1., 2., 4.]).requires_grad_()
u = torch.tensor([10., 20.]).requires_grad_()
a = t.pow(2).sum() + u.log().sum()
print(torch.autograd.grad(a, (t, u)))


(tensor([2., 4., 8.]), tensor([0.1000, 0.0500]))
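
The two remarks about single inputs and tuple outputs can be checked with a short sketch. The second output `t.sum()` is an arbitrary choice made so that the summed gradient is easy to verify.

```python
import torch

t = torch.tensor([1., 2., 4.]).requires_grad_()

# A single tensor as inputs still yields a one-element tuple.
g = torch.autograd.grad(t.pow(2).sum(), t)
print(type(g), len(g))

# A trailing comma unpacks it directly.
g_t, = torch.autograd.grad(t.pow(2).sum(), t)

# With a tuple of outputs, the gradients are summed: 2 t + 1.
g_sum, = torch.autograd.grad((t.pow(2).sum(), t.sum()), t)

print(g_t, g_sum)
```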


In [None]:
# The function Tensor.backward() accumulates gradients in the grad fields of
# tensors which are not results of operations, the “leaves” in the autograd graph.
# This function is an alternative to torch.autograd.grad(...) and is the
# standard way to compute gradients when training models.

x = torch.tensor([ -3., 2., 5. ]).requires_grad_()
u = x.pow(3).sum()
print(x.grad)
u.backward()
print(x.grad)


None
tensor([27., 12., 75.])
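
Because `backward()` accumulates rather than overwrites, a second call adds to the existing gradient. The sketch below shows this, along with one way to reset the field (here `x.grad.zero_()`) before the next pass.

```python
import torch

x = torch.tensor([-3., 2., 5.]).requires_grad_()

x.pow(3).sum().backward()
first = x.grad.clone()    # values 27, 12, 75

# A second backward pass adds to x.grad instead of overwriting it.
x.pow(3).sum().backward()
second = x.grad.clone()   # values 54, 24, 150

# Reset the accumulated gradient before the next pass.
x.grad.zero_()
print(first, second, x.grad)
```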
