# Thinking in tensors in PyTorch

Hands-on training by [Piotr Migdał](https://p.migdal.pl) (2019). Version 0.3 for Uniwersytet Śląski.


## Notebook 1 (technical): PyTorch arithmetics

<a href="https://colab.research.google.com/github/stared/thinking-in-tensors-writing-in-pytorch/blob/master/1_tech%20PyTorch%20aritmetics.ipynb" target="_parent">
    <img src="https://colab.research.google.com/assets/colab-badge.svg"/>
</a>


In this chapter, we show the basics of the PyTorch API for adding, multiplying and reshaping tensors.

See [Keras or PyTorch as your first deep learning framework](https://deepsense.ai/keras-or-pytorch/) for my (over)view in the Keras vs PyTorch struggle.

For numerics in Python, see:

* [Nicolas P. Rougier's From Python to Numpy](http://www.labri.fr/perso/nrougier/from-python-to-numpy/)
* [SciPy Lecture Notes](http://www.scipy-lectures.org/)

As a general rule, avoid Python loops over tensor elements unless they are strictly necessary; vectorized tensor operations are typically much faster.
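To illustrate the hint about loops, here is a minimal sketch that computes a sum of squares twice: once with an explicit Python loop and once with a single tensor expression. The variable names are made up for this example.

```python
import torch

x = torch.arange(10, dtype=torch.float32)

# Python loop: one interpreter step per element
total_loop = 0.0
for value in x:
    total_loop += float(value) ** 2

# vectorized: the same computation as one tensor expression
total_vec = x.pow(2).sum()

print(total_loop, total_vec.item())
```

Both variants give the same number; the vectorized one delegates the iteration to PyTorch instead of the Python interpreter.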

In [1]:
import torch
from torch import nn

In [2]:
# most likely will be False if run on a generic laptop
torch.cuda.is_available()

True

In [3]:
# originally written for 1.3.1; the output below comes from a later run
torch.__version__

'1.8.1+cu101'

## Arithmetics
PyTorch arithmetic works like `numpy` operations.

See: [What is PyTorch?](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py)

In [4]:
a = torch.tensor([[0.5, -2., 1., 3., 0., 0.9], [-1., 0., 10., -5., 4., 4.2]])
b = torch.randn(2, 6)

In [5]:
b

tensor([[-0.4888,  1.4370,  1.3974,  0.2230,  0.3569,  1.5773],
        [ 1.3555, -1.0024,  0.1264, -0.4826,  1.6111,  0.1796]])

In [6]:
a + b

tensor([[ 0.0112, -0.5630,  2.3974,  3.2230,  0.3569,  2.4773],
        [ 0.3555, -1.0024, 10.1264, -5.4826,  5.6111,  4.3796]])

In [7]:
2 * a 

tensor([[  1.0000,  -4.0000,   2.0000,   6.0000,   0.0000,   1.8000],
        [ -2.0000,   0.0000,  20.0000, -10.0000,   8.0000,   8.4000]])

In [8]:
# matrix transposition
b.t()

tensor([[-0.4888,  1.3555],
        [ 1.4370, -1.0024],
        [ 1.3974,  0.1264],
        [ 0.2230, -0.4826],
        [ 0.3569,  1.6111],
        [ 1.5773,  0.1796]])

In [9]:
# matrix multiplication
a.mm(b.t())

tensor([[ 0.3677,  1.5228],
        [21.4006,  9.5198]])

In [10]:
a.mm(b)

RuntimeError: ignored

Note that error messages are very descriptive.
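Since the notebook output above hides the message, here is a small sketch that catches the exception and prints its text. It uses two fresh tensors of ones (not the `a` and `b` from the cells above), and the exact wording of the message may differ between PyTorch versions.

```python
import torch

a = torch.ones(2, 6)
b = torch.ones(2, 6)

try:
    a.mm(b)  # (2, 6) x (2, 6): inner dimensions 6 and 2 do not match
    message = None
except RuntimeError as error:
    message = str(error)

print(message)

# transposing b gives (2, 6) x (6, 2) -> (2, 2)
product = a.mm(b.t())
print(product.shape)
```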

In [11]:
# equivalent to +
a.add(b)

tensor([[ 0.0112, -0.5630,  2.3974,  3.2230,  0.3569,  2.4773],
        [ 0.3555, -1.0024, 10.1264, -5.4826,  5.6111,  4.3796]])

In [12]:
a

tensor([[ 0.5000, -2.0000,  1.0000,  3.0000,  0.0000,  0.9000],
        [-1.0000,  0.0000, 10.0000, -5.0000,  4.0000,  4.2000]])

In [13]:
# inplace operations
a.add_(b)

tensor([[ 0.0112, -0.5630,  2.3974,  3.2230,  0.3569,  2.4773],
        [ 0.3555, -1.0024, 10.1264, -5.4826,  5.6111,  4.3796]])

In [14]:
a

tensor([[ 0.0112, -0.5630,  2.3974,  3.2230,  0.3569,  2.4773],
        [ 0.3555, -1.0024, 10.1264, -5.4826,  5.6111,  4.3796]])

In [15]:
torch.pow(a, 2)

tensor([[1.2527e-04, 3.1694e-01, 5.7478e+00, 1.0388e+01, 1.2739e-01, 6.1372e+00],
        [1.2637e-01, 1.0048e+00, 1.0254e+02, 3.0059e+01, 3.1485e+01, 1.9181e+01]])

In [16]:
a.pow(2)

tensor([[1.2527e-04, 3.1694e-01, 5.7478e+00, 1.0388e+01, 1.2739e-01, 6.1372e+00],
        [1.2637e-01, 1.0048e+00, 1.0254e+02, 3.0059e+01, 3.1485e+01, 1.9181e+01]])

In [17]:
a.sum(dim=1)

tensor([ 7.9030, 13.9876])

In [18]:
# methods with underscores **change** the object
a.zero_()

tensor([[0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.]])

In [19]:
a

tensor([[0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.]])
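As a compact recap of the difference between `add` and `add_`, the following sketch uses small made-up tensors. It assumes nothing beyond the methods already shown above.

```python
import torch

a = torch.tensor([1., 2., 3.])
b = torch.tensor([10., 20., 30.])

c = a.add(b)            # new tensor; a is unchanged
a_unchanged = a.clone() # snapshot of a before the in-place call

d = a.add_(b)           # modifies a itself and returns it

print(a_unchanged)      # the original values
print(a)                # now equal to c
print(d.data_ptr() == a.data_ptr())  # d refers to the same memory as a
```

In-place methods avoid allocating a new tensor, but they overwrite values that other parts of the code may still rely on, so use them with care.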

## Reshaping

In [20]:
b.size()

torch.Size([2, 6])

In [21]:
# same as b.size()
b.shape

torch.Size([2, 6])

In [23]:
b

tensor([[-0.4888,  1.4370,  1.3974,  0.2230,  0.3569,  1.5773],
        [ 1.3555, -1.0024,  0.1264, -0.4826,  1.6111,  0.1796]])

In [22]:
# to change the shape of a tensor (without copying its data) we use view
b.view(2, 3, -1)

tensor([[[-0.4888,  1.4370],
         [ 1.3974,  0.2230],
         [ 0.3569,  1.5773]],

        [[ 1.3555, -1.0024],
         [ 0.1264, -0.4826],
         [ 1.6111,  0.1796]]])

In [None]:
# flattening a tensor into a 1-d tensor
b.view(-1)

In [24]:
b.view(b.size(0), -1)

tensor([[-0.4888,  1.4370,  1.3974,  0.2230,  0.3569,  1.5773],
        [ 1.3555, -1.0024,  0.1264, -0.4826,  1.6111,  0.1796]])

In [25]:
b

tensor([[-0.4888,  1.4370,  1.3974,  0.2230,  0.3569,  1.5773],
        [ 1.3555, -1.0024,  0.1264, -0.4826,  1.6111,  0.1796]])
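Two properties of `view` are worth seeing explicitly: `-1` lets PyTorch infer one dimension from the number of elements, and the result shares memory with the original tensor. A minimal sketch with a made-up tensor:

```python
import torch

b = torch.arange(12.)   # 12 elements: 0., 1., ..., 11.
m = b.view(3, -1)       # -1 is inferred as 4
print(m.shape)

m[0, 0] = 100.          # writing through the view...
print(b[0])             # ...changes the original tensor too
```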

In [26]:
c = torch.arange(0, 15)
c

tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [27]:
# size
c.size()

torch.Size([15])

In [28]:
# strides: how many elements to skip to move by one along each dimension
c.stride()

(1,)

In [29]:
c2 = c.view(3, 5)
c2

tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14]])

In [30]:
c2.size()

torch.Size([3, 5])

In [31]:
c2.stride()

(5, 1)

What are strides?

* [How to understand numpy strides for layman?](https://stackoverflow.com/questions/53097952/how-to-understand-numpy-strides-for-layman)
* [Memory Buffer and Strides - Numpy Overview from Principles of Performance](https://llllllllll.github.io/principles-of-performance/numpy-overview.html)
* https://fgnt.github.io/python_crashkurs_doc/include/numpy.html
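A short sketch may help here, assuming the same `c2` layout as above: the element `[i, j]` sits at offset `i * stride[0] + j * stride[1]` in the underlying memory, and a transposition only swaps the strides instead of moving data.

```python
import torch

c2 = torch.arange(0, 15).view(3, 5)
print(c2.stride())            # (5, 1)

# element [i, j] lives at offset i * stride[0] + j * stride[1]
i, j = 2, 3
offset = i * c2.stride(0) + j * c2.stride(1)
print(c2.reshape(-1)[offset], c2[i, j])

# transposition swaps the strides; the data stays where it was
c2_t = c2.t()
print(c2_t.stride())          # (1, 5)
print(c2_t.is_contiguous())   # False

# contiguous() copies the data into row-major order, so view works again
flat_t = c2_t.contiguous().view(-1)
print(flat_t)
```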

## Variable broadcasting

In [32]:
b2 = torch.tensor([-100., 100.])

In [33]:
# error!
# to add tensors, their shapes need to match or be broadcastable to a common shape
b + b2

RuntimeError: ignored

In [34]:
b2.size()

torch.Size([2])

In [35]:
b2.view(2, 1)

tensor([[-100.],
        [ 100.]])

In [36]:
b2.unsqueeze(1)

tensor([[-100.],
        [ 100.]])

In [37]:
b2.unsqueeze(1).expand_as(b)

tensor([[-100., -100., -100., -100., -100., -100.],
        [ 100.,  100.,  100.,  100.,  100.,  100.]])

In [38]:
b + b2.unsqueeze(1)

tensor([[-100.4888,  -98.5630,  -98.6026,  -99.7770,  -99.6431,  -98.4227],
        [ 101.3555,   98.9976,  100.1264,   99.5174,  101.6111,  100.1796]])

In [39]:
b - torch.tensor([b.mean()])

tensor([[-1.0130,  0.9128,  0.8732, -0.3012, -0.1673,  1.0531],
        [ 0.8313, -1.5266, -0.3978, -1.0068,  1.0869, -0.3446]])

In [40]:
v = torch.tensor([1., 2., 3.])

v1 = v.unsqueeze(0)
v2 = v.unsqueeze(1)

In [41]:
v1

tensor([[1., 2., 3.]])

In [42]:
v2

tensor([[1.],
        [2.],
        [3.]])

In [43]:
(v1 - v2).pow(2).sum()

tensor(12.)

In [44]:
v1 - v2

tensor([[ 0.,  1.,  2.],
        [-1.,  0.,  1.],
        [-2., -1.,  0.]])

In [45]:
v1.size()

torch.Size([1, 3])

In [46]:
v2.size()

torch.Size([3, 1])

In [47]:
v1 - v

tensor([[0., 0., 0.]])

In [48]:
v2 - v

tensor([[ 0., -1., -2.],
        [ 1.,  0., -1.],
        [ 2.,  1.,  0.]])
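The examples above follow one rule: shapes are aligned from the last dimension, and each pair of dimensions must either be equal or one of them must be 1. A common use is centering rows or columns; here is a sketch with a small made-up matrix.

```python
import torch

b = torch.tensor([[1., 2., 3.],
                  [4., 5., 9.]])           # shape (2, 3)

# keepdim=True keeps the reduced dimension as size 1
row_means = b.mean(dim=1, keepdim=True)    # shape (2, 1)
centered_rows = b - row_means              # (2, 3) - (2, 1) -> (2, 3)

col_means = b.mean(dim=0)                  # shape (3,)
centered_cols = b - col_means              # (2, 3) - (3,) -> (2, 3)

print(centered_rows)
print(centered_cols)
```

Using `keepdim=True` is an alternative to calling `unsqueeze` after the reduction.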

## Data types

We want to work with:

* `torch.float32` (also known as `float`, a floating-point number represented by 32 bits)
* `torch.int64` (also known as `long`, an integer represented by 64 bits)


See also:

* [Floating Point Demystified, Part 1](http://blog.reverberate.org/2014/09/what-every-computer-programmer-should.html) by Josh Haberman
* [Floating point visually explained](http://fabiensanglard.net/floating_point_visually_explained/) by Fabien Sanglard
* http://0.30000000000000004.com/

In [49]:
x = torch.tensor([-23., 54.])
x.dtype

torch.float32

In [50]:
y = torch.tensor([-23, 54])
y.dtype

torch.int64

In [51]:
# it used to be an error in 1.2.x, in 1.3.x it's type promotion.
x + y

tensor([-46., 108.])

In [52]:
import numpy as np

x_np = np.random.randn(3, 2)
torch.from_numpy(x_np)  # float64, NumPy's default floating-point type

tensor([[-1.5291,  0.5705],
        [ 0.7882,  0.6742],
        [-0.5615, -0.4352]], dtype=torch.float64)
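Since we usually want `float32`, a tensor created from a NumPy array often needs an explicit conversion. The sketch below uses a made-up array of zeros and also shows that `torch.from_numpy` shares memory with the source array, and that mixing `float32` with `int64` promotes to `float32`.

```python
import numpy as np
import torch

x_np = np.zeros((3, 2))           # NumPy defaults to float64
x64 = torch.from_numpy(x_np)      # float64 tensor sharing memory with x_np
x_np[0, 0] = 1.0                  # the change is visible through x64

x32 = x64.float()                 # float32 copy
y = torch.tensor([-23, 54])       # int64
y32 = y.to(torch.float32)         # explicit conversion
mixed = x32[0] + y                # float32 + int64 -> float32 (type promotion)

print(x64.dtype, x32.dtype, y.dtype, y32.dtype, mixed.dtype)
```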

## From/to GPU

If you want to use a GPU in Colab, go to: **Runtime** -> **Change runtime type** -> **Hardware accelerator** -> **GPU**.

In [53]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

x = x.to(device)

In [None]:
# much better than x.cuda() as it is more flexible

In [None]:
# if you want to turn it into a NumPy array, move it to the CPU first
x.cpu().numpy()
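Putting the pieces together, a device-agnostic round trip might look like the sketch below. It runs on a CPU-only machine as well; the tensors are made up for the example.

```python
import torch

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

x = torch.tensor([-23., 54.]).to(device)  # move an existing tensor
y = torch.ones(2, device=device)          # or create it directly on the device

z = x + y                                 # both operands are on the same device
z_np = z.cpu().numpy()                    # NumPy arrays live in CPU memory

print(z.device, z_np)
```

Operands of an operation generally need to be on the same device, which is why a single `device` variable used throughout the code is convenient.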