<p style="align: center;"><img align=center src="https://s8.hostingkartinok.com/uploads/images/2018/08/308b49fcfbc619d629fe4604bceb67ac.jpg" width=500 height=450/></p>

<h3 style="text-align: center;"><b>"Deep Learning". Advanced track</b></h3>

<h2 style="text-align: center;"><b>Seminar 6. PyTorch library basics</b></h2>


<h2 style="text-align: center;"><b>PyTorch basics: syntax, torch.cuda and torch.autograd</b></h2>

<p style="align: center;"><img src="https://upload.wikimedia.org/wikipedia/commons/9/96/Pytorch_logo.png" width=400 height=100></p>

Hi! In this notebook we will cover the basics of the **PyTorch deep learning framework**. 

<h3 style="text-align: center;"><b>Intro</b></h3>

A **framework** is a code library with its own internal structure and pipelines.

There are many deep learning frameworks nowadays (02/2019). They differ mainly in how computations are organized internally. For example, in **[Caffe](http://caffe.berkeleyvision.org/)** and **[Caffe2](https://caffe2.ai/)** you assemble a model from ready-made blocks (just like $LEGO^{TM}$ :). In **[TensorFlow](https://www.tensorflow.org/)** and **[Theano](http://deeplearning.net/software/theano/)** you first declare the computation graph, then compile it and use it for inference/training (`tf.Session()`). By the way, TensorFlow (since v1.10) also has [Eager Execution](https://www.tensorflow.org/guide/eager), which can be handy for fast prototyping and debugging. **[Keras](https://keras.io/)** is a very popular and useful DL framework that lets you build networks quickly and has many in-demand features. 

<p style="align: center;"><img src="https://habrastorage.org/web/e3e/c3e/b78/e3ec3eb78d714a7993a6b922911c0866.png" width=500 height=500></p>  
<p style="text-align: center;"><i>Image credit: https://habr.com/post/334380/</i></p>

We will use PyTorch because it is actively developed and supported by the community and [Facebook AI Research](https://research.fb.com/category/facebook-ai-research/).

<h3 style="text-align: center;"><b>Installation</b></h3>

Detailed installation instructions are available on the [official PyTorch website](https://pytorch.org/).

<h3 style="text-align: center;"><b>Syntax</b></h3>

In [1]:
import torch

Some facts about PyTorch:  
- dynamic computation graph
- handy `torch.nn` and `torchvision` modules for fast neural network prototyping
- can be faster than TensorFlow on some tasks
- makes it easy to use the GPU

If PyTorch were a formula, it would be:  

$$PyTorch = NumPy + CUDA + Autograd$$

(CUDA - [wiki](https://en.wikipedia.org/wiki/CUDA))

Let's see how we can use PyTorch to operate with vectors and tensors.  

Recall that **a tensor** is a multidimensional array, e.g.:  

`x = np.array([1,2,3])` -- a vector = a tensor with 1 dimension (to be more precise, its shape is `(3,)`)  
`y = np.array([[1, 2, 3], [4, 5, 6]])` -- a matrix = a tensor with 2 dimensions (`(2, 3)` in this case)  
`z = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],  
               [[1, 2, 3], [4, 5, 6], [7, 8, 9]],  
               [[1, 2, 3], [4, 5, 6], [7, 8, 9]]])` -- "a cube" = a tensor with 3 dimensions (`(3, 3, 3)` in this case)

One real example of a 3-dimensional tensor is **an image**: it has 3 dimensions, `height`, `width` and the `channel depth` (3 for color images, 1 for greyscale). You can think of it as a parallelepiped filled with real numbers.
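As a small illustration of these shapes, the number of dimensions and the shape of an array can be inspected directly. This is only a sketch: the 4x6 image size is an arbitrary assumption, and a zero-filled array stands in for real pixel data.

```python
import numpy as np

x = np.array([1, 2, 3])                 # vector
y = np.array([[1, 2, 3], [4, 5, 6]])    # matrix
image = np.zeros((4, 6, 3))             # hypothetical 4x6 RGB image: height, width, channels

# Print the number of dimensions and the shape of each array
for name, arr in [("x", x), ("y", y), ("image", image)]:
    print(name, "ndim =", arr.ndim, "shape =", arr.shape)
```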

In PyTorch we will use `torch.Tensor` (`FloatTensor`, `IntTensor`, `ByteTensor`) for all the computations.

All tensor types:

In [2]:
torch.HalfTensor      # 16-bit, floating point
torch.FloatTensor     # 32-bit, floating point
torch.DoubleTensor    # 64-bit, floating point

torch.ShortTensor     # 16-bit, integer, signed
torch.IntTensor       # 32-bit, integer, signed
torch.LongTensor      # 64-bit, integer, signed

torch.CharTensor      # 8-bit, integer, signed
torch.ByteTensor      # 8-bit, integer, unsigned

torch.ByteTensor

We will use only `torch.FloatTensor()` and `torch.IntTensor()`. 

Let's begin to do something!

* Creating the tensor:

In [3]:
a = torch.FloatTensor([1, 2])
a

tensor([1., 2.])

In [4]:
a.shape

torch.Size([2])

In [5]:
b = torch.FloatTensor([[1,2,3], [4,5,6]])
b

tensor([[1., 2., 3.],
        [4., 5., 6.]])

In [6]:
b.shape

torch.Size([2, 3])

In [7]:
x = torch.FloatTensor(2,3,4)

In [8]:
x

tensor([[[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
         [0.0000e+00, 1.0102e-38, 7.7052e+31, 7.2148e+22],
         [2.5226e-18, 6.4825e-10, 1.0186e-11, 3.0879e-09]],

        [[1.6902e-04, 4.0746e-11, 2.6076e-09, 2.9572e-18],
         [6.7333e+22, 1.7591e+22, 1.7184e+25, 4.3222e+27],
         [6.1972e-04, 7.2443e+22, 1.7728e+28, 7.0367e+22]]])

In [9]:
x = torch.FloatTensor(100)
x

tensor([2.0535e-19, 4.5080e+21, 2.0616e-19, 3.1771e+30, 2.0022e-19, 4.8405e+30,
        1.8174e+31, 7.1377e+31, 5.3740e+19, 3.0478e+32, 1.2706e+31, 1.9684e-19,
        7.1435e+31, 2.7350e+20, 5.3728e+19, 3.0478e+32, 1.6020e-19, 6.9761e+31,
        1.8177e+31, 2.6908e+20, 4.6886e-14, 1.8463e+25, 2.0616e-19, 6.1211e+19,
        5.6541e+05, 2.1459e+20, 9.2489e-04, 4.6316e+27, 1.2118e+25, 6.4600e+19,
        7.2443e+22, 8.9027e-15, 2.6563e+20, 5.4273e-14, 4.6114e+24, 6.9987e+22,
        1.7665e+22, 7.2714e+31, 1.8892e+31, 6.8608e+22, 1.6930e+22, 1.6926e+22,
        1.8062e+28, 2.7262e+20, 2.8376e+20, 1.9436e-19, 7.0376e+28, 1.8970e+31,
        7.2251e+28, 5.2987e-11, 1.4584e-19, 5.0833e+31, 7.7563e+26, 2.0706e-19,
        2.0284e-19, 7.0072e+22, 7.2250e+28, 4.9657e+28, 2.1459e+20, 9.2489e-04,
        2.9514e+29, 1.9012e-19, 1.9435e-19, 6.8419e+19, 1.7418e+28, 6.4600e+19,
        3.1886e-12, 6.8608e+22, 4.6114e+24, 7.1463e+22, 1.8759e+28, 3.9889e+19,
        1.8490e+20, 4.6230e+19, 3.1021e+

In [10]:
x = torch.IntTensor(45, 57, 14, 2)
x.shape

torch.Size([45, 57, 14, 2])

**Note:** a `torch.Tensor` created with this size-only constructor is not initialized, so it contains whatever values happened to be in memory ("random trash numbers"):

In [11]:
x = torch.IntTensor(3, 2, 4)
x

tensor([[[ 574235236,  808595506,  758329389,  827601714],
         [ 959527481,  775304250,  825701686,  576337719]],

        [[1701061164, 1684956528, 1768124005, 1834972005],
         [ 975336549, 1702195828, 1852121644, 1701734759]],

        [[ 874658338,  845428533,  761411124, 1717843256],
         [1717711917,  862006630, 1697461301, 1647469925]]], dtype=torch.int32)

Here is a way to fill a new tensor with zeroes:

In [12]:
x = torch.IntTensor(3, 2, 4).zero_()
x

tensor([[[0, 0, 0, 0],
         [0, 0, 0, 0]],

        [[0, 0, 0, 0],
         [0, 0, 0, 0]],

        [[0, 0, 0, 0],
         [0, 0, 0, 0]]], dtype=torch.int32)
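Besides calling `.zero_()` on an uninitialized tensor, initialized tensors can be created directly with factory functions. The sketch below uses `torch.zeros`, `torch.ones`, `torch.rand` and `torch.tensor`; the shapes and values are arbitrary choices for illustration.

```python
import torch

zeros = torch.zeros(3, 2, 4, dtype=torch.int32)   # all zeros, like IntTensor(3, 2, 4).zero_()
ones = torch.ones(2, 3)                            # all ones, float32 by default
uniform = torch.rand(2, 3)                         # uniform samples from [0, 1)
from_list = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)  # explicit values

print(zeros.shape, zeros.dtype)
print(ones)
print(from_list)
```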

## How to write torch code if you know numpy

Most NumPy functions have a counterpart in torch. You can look up the correspondence here:

https://github.com/torch/torch7/wiki/Torch-for-Numpy-users

`np.reshape()` == `torch.view()`:

In [13]:
b

tensor([[1., 2., 3.],
        [4., 5., 6.]])

In [14]:
b.view(3, 2)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])

**Note:** `.view()` returns a new tensor object with the requested shape; the shape of the original tensor remains unchanged. The two tensors share the same underlying data.

In [15]:
b.view(-1)

tensor([1., 2., 3., 4., 5., 6.])

In [16]:
b

tensor([[1., 2., 3.],
        [4., 5., 6.]])
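A view shares memory with its source tensor, so an in-place write through the view is visible in the original. A minimal sketch of this behaviour (the values written are arbitrary), with `.clone()` used when an independent copy is wanted:

```python
import torch

b = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
flat = b.view(-1)          # new tensor object, same underlying data

flat[0] = 100.             # in-place write through the view
print(b[0, 0])             # the original sees the change
print(b.shape, flat.shape) # shapes stay independent

c = b.clone().view(3, 2)   # clone() first if an independent copy is needed
c[0, 0] = -1.
print(b[0, 0])             # unchanged by writes to c
```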

* Change a tensor type:

In [17]:
a = torch.FloatTensor([1.5, 3.2, -7])

In [18]:
a.type_as(torch.IntTensor())

tensor([ 1,  3, -7], dtype=torch.int32)

In [19]:
a.type_as(torch.ByteTensor())

tensor([  1,   3, 249], dtype=torch.uint8)

**Note:** `.type_as()` creates a new tensor; the old one remains unchanged

In [20]:
a

tensor([ 1.5000,  3.2000, -7.0000])
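The same conversion can also be written with `.to(dtype)` or with shorthand methods such as `.long()` and `.double()`. This is a sketch of equivalent spellings, not a replacement for `.type_as()`:

```python
import torch

a = torch.tensor([1.5, 3.2, -7.0])

as_int = a.to(torch.int32)   # fractional part is truncated toward zero
as_long = a.long()           # shorthand for .to(torch.int64)
as_double = a.double()       # shorthand for .to(torch.float64)

print(as_int)
print(as_long.dtype, as_double.dtype)
print(a.dtype)               # the source tensor keeps its dtype
```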

* Indexing is just like in `NumPy`:

In [21]:
a = torch.FloatTensor([[100, 20, 35], [15, 163, 534], [52, 90, 66]])
a

tensor([[100.,  20.,  35.],
        [ 15., 163., 534.],
        [ 52.,  90.,  66.]])

In [22]:
a[0, 0]

tensor(100.)

In [23]:
a[0][0]

tensor(100.)

In [24]:
a[0:2, 0:2]

tensor([[100.,  20.],
        [ 15., 163.]])

**Arithmetic operations** and their function equivalents:  

| Operator | Equivalent |
|:-:|:-:|
|`+`| `torch.add()` |
|`-`| `torch.sub()` |
|`*`| `torch.mul()` |
|`/`| `torch.div()` |

* Addition:

In [25]:
a = torch.FloatTensor([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
b = torch.FloatTensor([[-1, -2, -3], [-10, -20, -30], [100, 200, 300]])

In [26]:
a + b

tensor([[  0.,   0.,   0.],
        [  0.,   0.,   0.],
        [200., 400., 600.]])

In [27]:
a.add(b)

tensor([[  0.,   0.,   0.],
        [  0.,   0.,   0.],
        [200., 400., 600.]])

In [28]:
b = -a
b

tensor([[  -1.,   -2.,   -3.],
        [ -10.,  -20.,  -30.],
        [-100., -200., -300.]])

In [29]:
a + b

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

* Subtraction:

In [30]:
a - b

tensor([[  2.,   4.,   6.],
        [ 20.,  40.,  60.],
        [200., 400., 600.]])

In [31]:
a.sub(b)

tensor([[  2.,   4.,   6.],
        [ 20.,  40.,  60.],
        [200., 400., 600.]])

* Multiplication (elementwise):

In [32]:
a * b

tensor([[-1.0000e+00, -4.0000e+00, -9.0000e+00],
        [-1.0000e+02, -4.0000e+02, -9.0000e+02],
        [-1.0000e+04, -4.0000e+04, -9.0000e+04]])

In [33]:
a.mul(b)

tensor([[-1.0000e+00, -4.0000e+00, -9.0000e+00],
        [-1.0000e+02, -4.0000e+02, -9.0000e+02],
        [-1.0000e+04, -4.0000e+04, -9.0000e+04]])

* Division (elementwise):

In [34]:
a = torch.FloatTensor([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
b = torch.FloatTensor([[-1, -2, -3], [-10, -20, -30], [100, 200, 300]])

In [35]:
a / b

tensor([[-1., -1., -1.],
        [-1., -1., -1.],
        [ 1.,  1.,  1.]])

Equivalently, with the method:

In [36]:
a.div(b)

tensor([[-1., -1., -1.],
        [-1., -1., -1.],
        [ 1.,  1.,  1.]])

**Note:** all these operations create new tensors; the old tensors remain unchanged

In [37]:
a

tensor([[  1.,   2.,   3.],
        [ 10.,  20.,  30.],
        [100., 200., 300.]])

In [38]:
b

tensor([[ -1.,  -2.,  -3.],
        [-10., -20., -30.],
        [100., 200., 300.]])
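PyTorch also provides in-place counterparts of these methods, named with a trailing underscore (like the `.zero_()` call used earlier). A short sketch contrasting the two forms, with arbitrary example values:

```python
import torch

a = torch.tensor([1., 2., 3.])
b = torch.tensor([10., 20., 30.])

c = a.add(b)    # out-of-place: returns a new tensor, a is untouched
print(a)

a.add_(b)       # in-place: a itself now holds the sum
print(a)
```

In-place operations save memory, but they overwrite values that autograd may need later, so the out-of-place forms are the safer default.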

* Comparison operators:

In [39]:
a = torch.FloatTensor([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
b = torch.FloatTensor([[-1, -2, -3], [-10, -20, -30], [100, 200, 300]])

In [40]:
a == b

tensor([[False, False, False],
        [False, False, False],
        [ True,  True,  True]])

In [41]:
a != b

tensor([[ True,  True,  True],
        [ True,  True,  True],
        [False, False, False]])

In [42]:
a < b

tensor([[False, False, False],
        [False, False, False],
        [False, False, False]])

In [43]:
a > b

tensor([[ True,  True,  True],
        [ True,  True,  True],
        [False, False, False]])

* Using boolean mask indexing:

In [44]:
a[a > b]

tensor([ 1.,  2.,  3., 10., 20., 30.])

In [45]:
b[a == b]

tensor([100., 200., 300.])

Elementwise application of the **universal functions**:

In [46]:
a = torch.FloatTensor([[1, 2, 3], [10, 20, 30], [100, 200, 300]])

In [47]:
a.sin()

tensor([[ 0.8415,  0.9093,  0.1411],
        [-0.5440,  0.9129, -0.9880],
        [-0.5064, -0.8733, -0.9998]])

In [48]:
torch.sin(a)

tensor([[ 0.8415,  0.9093,  0.1411],
        [-0.5440,  0.9129, -0.9880],
        [-0.5064, -0.8733, -0.9998]])

In [49]:
a.cos()

tensor([[ 0.5403, -0.4161, -0.9900],
        [-0.8391,  0.4081,  0.1543],
        [ 0.8623,  0.4872, -0.0221]])

In [50]:
a.exp()

tensor([[2.7183e+00, 7.3891e+00, 2.0086e+01],
        [2.2026e+04, 4.8517e+08, 1.0686e+13],
        [       inf,        inf,        inf]])

In [51]:
a.log()

tensor([[0.0000, 0.6931, 1.0986],
        [2.3026, 2.9957, 3.4012],
        [4.6052, 5.2983, 5.7038]])

In [52]:
b = -a
b

tensor([[  -1.,   -2.,   -3.],
        [ -10.,  -20.,  -30.],
        [-100., -200., -300.]])

In [53]:
b.abs()

tensor([[  1.,   2.,   3.],
        [ 10.,  20.,  30.],
        [100., 200., 300.]])

* The sum, mean, max, min:

In [54]:
a.sum()

tensor(666.)

In [55]:
a.mean()

tensor(74.)

Along axis:

In [56]:
a

tensor([[  1.,   2.,   3.],
        [ 10.,  20.,  30.],
        [100., 200., 300.]])

In [57]:
a.sum(dim=0)

tensor([111., 222., 333.])

In [58]:
a.sum(1)

tensor([  6.,  60., 600.])

In [59]:
a.max()

tensor(300.)

In [60]:
a.max(0)

torch.return_types.max(
values=tensor([100., 200., 300.]),
indices=tensor([2, 2, 2]))

In [61]:
a.min()

tensor(1.)

In [62]:
a.min(0)

torch.return_types.min(
values=tensor([1., 2., 3.]),
indices=tensor([0, 0, 0]))

**Note:** when a dimension is given, `.max()` and `.min()` return two tensors: the values and the indices of the max/min elements along that axis. E.g. here `a.min(0)` returned the values `(1, 2, 3)`, which are the minimum elements along axis 0 (one per column), and their indices along axis 0 are `(0, 0, 0)`.
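The returned pair can be unpacked into two names, and `argmax`/`argmin` return only the indices. A minimal sketch on the same matrix:

```python
import torch

a = torch.tensor([[1., 2., 3.], [10., 20., 30.], [100., 200., 300.]])

values, indices = a.max(dim=0)   # per-column maxima and their row indices
print(values)
print(indices)

row_argmin = a.argmin(dim=1)     # only the indices, here one per row
print(row_argmin)
```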

**Matrix operations**:

* Transpose a tensor:

In [63]:
a = torch.FloatTensor([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
a

tensor([[  1.,   2.,   3.],
        [ 10.,  20.,  30.],
        [100., 200., 300.]])

In [64]:
a.t()

tensor([[  1.,  10., 100.],
        [  2.,  20., 200.],
        [  3.,  30., 300.]])

This is not an in-place operation either:

In [65]:
a

tensor([[  1.,   2.,   3.],
        [ 10.,  20.,  30.],
        [100., 200., 300.]])

* Dot product of vectors:

In [66]:
a = torch.FloatTensor([1, 2, 3, 4, 5, 6])
b = torch.FloatTensor([-1, -2, -4, -6, -8, -10])

In [67]:
a.dot(b)

tensor(-141.)

In [68]:
a.shape, b.shape

(torch.Size([6]), torch.Size([6]))

In [69]:
a @ b

tensor(-141.)

In [70]:
type(a)

torch.Tensor

In [71]:
type(b)

torch.Tensor

In [72]:
type(a @ b)

torch.Tensor

* Matrix product:

In [73]:
a = torch.FloatTensor([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
b = torch.FloatTensor([[-1, -2, -3], [-10, -20, -30], [100, 200, 300]])

In [74]:
a.mm(b)

tensor([[  279.,   558.,   837.],
        [ 2790.,  5580.,  8370.],
        [27900., 55800., 83700.]])

In [75]:
a @ b

tensor([[  279.,   558.,   837.],
        [ 2790.,  5580.,  8370.],
        [27900., 55800., 83700.]])

Both tensors remain unchanged:

In [76]:
a

tensor([[  1.,   2.,   3.],
        [ 10.,  20.,  30.],
        [100., 200., 300.]])

In [77]:
b

tensor([[ -1.,  -2.,  -3.],
        [-10., -20., -30.],
        [100., 200., 300.]])

In [78]:
a = torch.FloatTensor([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
b = torch.FloatTensor([[-1], [-10], [100]])

In [79]:
print(a.shape, b.shape)

torch.Size([3, 3]) torch.Size([3, 1])


In [80]:
a @ b

tensor([[  279.],
        [ 2790.],
        [27900.]])

If we flatten the tensor `b` into a 1-D vector (`b.view(-1)`), `@` performs a matrix-vector product and returns a 1-D result instead of a column:

In [81]:
b

tensor([[ -1.],
        [-10.],
        [100.]])

In [82]:
b.view(-1)

tensor([ -1., -10., 100.])

In [83]:
a @ b.view(-1)

tensor([  279.,  2790., 27900.])

In [84]:
a.mv(b.view(-1))

tensor([  279.,  2790., 27900.])
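The difference between the two products is easiest to see from the result shapes. A small sketch reusing the same values (the names `col` and `vec` are introduced here only for illustration):

```python
import torch

a = torch.tensor([[1., 2., 3.], [10., 20., 30.], [100., 200., 300.]])
col = torch.tensor([[-1.], [-10.], [100.]])   # column matrix, shape (3, 1)
vec = col.view(-1)                            # 1-D vector, shape (3,)

print((a @ col).shape)   # matrix @ column matrix gives a (3, 1) matrix
print((a @ vec).shape)   # matrix @ 1-D vector gives a (3,) vector

# Same numbers, different layout
print(torch.allclose((a @ col).view(-1), a @ vec))
```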

**Converting from NumPy to PyTorch**:

In [85]:
import numpy as np

a = np.random.rand(3, 3)
a

array([[0.42654808, 0.68812852, 0.04426251],
       [0.08542813, 0.8064269 , 0.63349253],
       [0.07701885, 0.5320509 , 0.98786146]])

In [86]:
b = torch.from_numpy(a)
b

tensor([[0.4265, 0.6881, 0.0443],
        [0.0854, 0.8064, 0.6335],
        [0.0770, 0.5321, 0.9879]], dtype=torch.float64)

**NOTE!** `a` and `b` share the same data storage, so changes to one are reflected in the other:

In [87]:
b -= b
b

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]], dtype=torch.float64)

In [88]:
a

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
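When an independent copy is needed, `torch.tensor(array)` copies the data instead of sharing it. A sketch contrasting the two (the array contents are arbitrary):

```python
import numpy as np
import torch

a = np.ones((2, 2))

shared = torch.from_numpy(a)   # shares memory with the NumPy array
copied = torch.tensor(a)       # copies the data

a[0, 0] = 5.0                  # modify the NumPy array in place
print(shared[0, 0])            # reflects the change
print(copied[0, 0])            # still the old value
```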

**Converting from PyTorch to NumPy:**

In [89]:
a = torch.FloatTensor(2, 3, 4)
a

tensor([[[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
         [0.0000e+00, 1.0102e-38, 7.7052e+31, 7.2148e+22],
         [2.5226e-18, 6.4825e-10, 1.0186e-11, 3.0879e-09]],

        [[1.0374e-08, 4.1199e-11, 1.7662e-04, 2.9573e-18],
         [6.7333e+22, 1.7591e+22, 1.7184e+25, 4.3222e+27],
         [6.1972e-04, 7.2443e+22, 1.7728e+28, 7.0367e+22]]])

In [90]:
type(a)

torch.Tensor

In [91]:
x = a.numpy()
x

array([[[0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00],
        [0.0000000e+00, 1.0101996e-38, 7.7052459e+31, 7.2147959e+22],
        [2.5225907e-18, 6.4824512e-10, 1.0185669e-11, 3.0879161e-09]],

       [[1.0374202e-08, 4.1199245e-11, 1.7662048e-04, 2.9573198e-18],
        [6.7333120e+22, 1.7590538e+22, 1.7184218e+25, 4.3221663e+27],
        [6.1971537e-04, 7.2443192e+22, 1.7728345e+28, 7.0366722e+22]]],
      dtype=float32)

In [92]:
x.shape

(2, 3, 4)

In [93]:
type(x)

numpy.ndarray

Let's write `forward_pass(X, w)` for a single neuron with a sigmoid activation using PyTorch (the bias $w_0$ is treated as part of $w$):

In [94]:
def forward_pass(X, w):
    return torch.sigmoid(X @ w)

In [95]:
X = torch.FloatTensor([[-5, 5], [2, 3], [1, -1]])
w = torch.FloatTensor([[-0.5], [2.5]])
result = forward_pass(X, w)
print('result: {}'.format(result))

result: tensor([[1.0000],
        [0.9985],
        [0.0474]])
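To check the result by hand, recall that the sigmoid is $\sigma(z) = \frac{1}{1 + e^{-z}}$. For the inputs above the pre-activations are $z = Xw = (15,\ 6.5,\ -3)$. The sketch below writes the formula out explicitly and compares it with `torch.sigmoid`:

```python
import torch

X = torch.tensor([[-5., 5.], [2., 3.], [1., -1.]])
w = torch.tensor([[-0.5], [2.5]])

z = X @ w                             # pre-activations, shape (3, 1)
manual = 1 / (1 + torch.exp(-z))      # sigmoid written out explicitly
builtin = torch.sigmoid(z)

print(z.view(-1))
print(torch.allclose(manual, builtin))
```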


<h3 style="text-align: center;"><a href="https://ru.wikipedia.org/wiki/CUDA">CUDA</a></h3>

[CUDA documentation](https://docs.nvidia.com/cuda/)

PyTorch can run computations on both the CPU (Central Processing Unit) and the GPU (Graphics Processing Unit). Switching between them is easy, which is one of the most important features of the framework.

In [96]:
x = torch.FloatTensor(1024, 1024).uniform_()
x

tensor([[0.9731, 0.6161, 0.9821,  ..., 0.5225, 0.0527, 0.0720],
        [0.9962, 0.6491, 0.5754,  ..., 0.5159, 0.0803, 0.7883],
        [0.6841, 0.3443, 0.7369,  ..., 0.7177, 0.2852, 0.0713],
        ...,
        [0.9136, 0.3526, 0.0657,  ..., 0.0989, 0.0727, 0.1867],
        [0.2693, 0.0632, 0.6744,  ..., 0.8068, 0.1094, 0.1529],
        [0.8405, 0.6924, 0.0842,  ..., 0.5012, 0.4305, 0.3297]])

In [97]:
x.is_cuda

False

Place a tensor on GPU:

In [98]:
x = x.cuda()

In [99]:
x

tensor([[0.9731, 0.6161, 0.9821,  ..., 0.5225, 0.0527, 0.0720],
        [0.9962, 0.6491, 0.5754,  ..., 0.5159, 0.0803, 0.7883],
        [0.6841, 0.3443, 0.7369,  ..., 0.7177, 0.2852, 0.0713],
        ...,
        [0.9136, 0.3526, 0.0657,  ..., 0.0989, 0.0727, 0.1867],
        [0.2693, 0.0632, 0.6744,  ..., 0.8068, 0.1094, 0.1529],
        [0.8405, 0.6924, 0.0842,  ..., 0.5012, 0.4305, 0.3297]],
       device='cuda:0')

Let's multiply two tensors on the GPU and then move the result to the CPU:

In [100]:
a = torch.FloatTensor(10000, 10000).uniform_()
b = torch.FloatTensor(10000, 10000).uniform_()
c = a.cuda().mul(b.cuda()).cpu()

In [101]:
c

tensor([[0.0794, 0.0034, 0.5671,  ..., 0.0282, 0.2687, 0.4149],
        [0.3832, 0.0647, 0.1959,  ..., 0.0917, 0.6800, 0.4273],
        [0.4692, 0.0724, 0.3606,  ..., 0.4142, 0.8001, 0.0676],
        ...,
        [0.4155, 0.5637, 0.1971,  ..., 0.1677, 0.4523, 0.2210],
        [0.4467, 0.1803, 0.0439,  ..., 0.5187, 0.1803, 0.7448],
        [0.0276, 0.2328, 0.3567,  ..., 0.8654, 0.5719, 0.2312]])

In [102]:
a

tensor([[0.2746, 0.0070, 0.7635,  ..., 0.0911, 0.5771, 0.6652],
        [0.9912, 0.2958, 0.3038,  ..., 0.2277, 0.7651, 0.4630],
        [0.5510, 0.1387, 0.7332,  ..., 0.4642, 0.8126, 0.2007],
        ...,
        [0.4549, 0.9314, 0.3123,  ..., 0.8497, 0.7493, 0.3070],
        [0.6077, 0.3341, 0.9720,  ..., 0.9893, 0.4934, 0.9062],
        [0.7927, 0.4431, 0.7530,  ..., 0.9017, 0.6757, 0.4972]])

A tensor on the CPU and a tensor on the GPU cannot be combined in one operation; both operands must be on the same device:

In [103]:
a = torch.FloatTensor(10000, 10000).uniform_().cpu()
b = torch.FloatTensor(10000, 10000).uniform_().cuda()

In [104]:
a + b

RuntimeError: expected device cpu but got device cuda:0
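A common way to avoid this error is to pick one device up front and move every operand to it. The sketch below falls back to the CPU when CUDA is not available, so it runs on any machine; the tensor sizes are arbitrary.

```python
import torch

# Fall back to the CPU when no CUDA device is present
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.rand(100, 100)                  # created on the CPU
b = torch.rand(100, 100, device=device)   # created directly on the chosen device

c = a.to(device) + b                      # move `a` first, then add
print(c.device)
```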

Example of working with GPU:

In [105]:
x = torch.FloatTensor(5, 5, 5).uniform_()

# check for CUDA availability (NVIDIA GPU)
if torch.cuda.is_available():
    # create a device object that refers to the GPU
    device = torch.device('cuda')
    y = torch.ones_like(x, device=device)  # create a tensor on GPU
    x = x.to(device)                       # or just `.to("cuda")`
    z = x + y
    print(z)
    # `.to` can also change the dtype in the same call
    print(z.to("cpu", torch.double))

tensor([[[1.6947, 1.1507, 1.5808, 1.2840, 1.7074],
         [1.8771, 1.2812, 1.0935, 1.5061, 1.0148],
         [1.8046, 1.1608, 1.4583, 1.7745, 1.2552],
         [1.4301, 1.4597, 1.7526, 1.9911, 1.4498],
         [1.5138, 1.0290, 1.4968, 1.4551, 1.4548]],

        [[1.4192, 1.7695, 1.1023, 1.5734, 1.8456],
         [1.7558, 1.0259, 1.2331, 1.2090, 1.8653],
         [1.0012, 1.3293, 1.5243, 1.2525, 1.9062],
         [1.1668, 1.4368, 1.4571, 1.0064, 1.4582],
         [1.4004, 1.7266, 1.3100, 1.4954, 1.0988]],

        [[1.6061, 1.7697, 1.8356, 1.6310, 1.2260],
         [1.1264, 1.2361, 1.0584, 1.3190, 1.4769],
         [1.2713, 1.8968, 1.9463, 1.8191, 1.0104],
         [1.6125, 1.6036, 1.4592, 1.3846, 1.4069],
         [1.0058, 1.5535, 1.1059, 1.2294, 1.8487]],

        [[1.0308, 1.5804, 1.6960, 1.4590, 1.9801],
         [1.8521, 1.0146, 1.1709, 1.0046, 1.2896],
         [1.1998, 1.9227, 1.6386, 1.8457, 1.9085],
         [1.5095, 1.1475, 1.3028, 1.1943, 1.2799],
         [1.7321, 1.0516,

In [107]:
torch.cuda.get_device_name(0)

'GeForce GTX 1060 3GB'

In [108]:
torch.cuda.current_device()

0

<h3 style="text-align: center;"><b>Autograd</b></h3>

The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.

Importing the `Variable()` class:

In [109]:
from torch.autograd import Variable

Historically, wrapping a `torch.Tensor` into a `Variable` gave the same tensor plus the ability to calculate gradients with respect to it.

If `a` is a tensor wrapped into `Variable()`, then `a.backward()` calculates the gradients of `a` with respect to all the tensors on which `a` depends (and which require gradients).

**Note:** Since PyTorch 0.4.0, `Variable` and `torch.Tensor` have been merged into `torch.Tensor`, so `Variable()` is no longer needed (it is deprecated). Pass `requires_grad=True` when creating a tensor instead.
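A minimal sketch of autograd without `Variable`, assuming PyTorch 0.4.0 or newer. The function $y = \sum_i x_i^2$ is chosen only because its gradient, $2x$, is easy to verify by hand:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # track operations on x
y = (x ** 2).sum()                                      # y = x1^2 + x2^2 + x3^2

y.backward()     # computes dy/dx and stores it in x.grad
print(x.grad)    # equals 2 * x
```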

The examples:

In [110]:
from torch.autograd import Variable

In [111]:
dtype = torch.float
device = torch.device("cuda:0")
# device = torch.device("cpu")  # Uncomment this to run on CPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 3, 3, 10

# Create random Tensors to hold input and outputs.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Tensors during the backward pass.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

# x = torch.FloatTensor(3, 1).uniform_()
# y = torch.FloatTensor(3, 1).uniform_()
# w = torch.FloatTensor(3, 3).uniform_() 
# b = torch.FloatTensor(3, 1).uniform_()

# x = Variable(x, requires_grad=True)
# y = Variable(x, requires_grad=False)
# w = Variable(w, requires_grad=True)
# b = Variable(b, requires_grad=True)

y_pred = (x @ w1).clamp(min=0).mm(w2)

loss = (y_pred - y).pow(2).sum()
# calculate the gradients
loss.backward()

In [112]:
print( Variable((y_pred - y).pow(2).sum()) )

tensor(6174.2773, device='cuda:0')


In [113]:
loss.grad

In [114]:
w1.grad

tensor([[   97.4570, -2925.8645,   596.7159],
        [  766.4255,   722.7627, -1405.3796],
        [ -195.1360,   232.7071,  -463.4208]], device='cuda:0')

In [115]:
b.grad

In [116]:
y.grad

In [117]:
loss.grad

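**NOTE:** the gradients are stored in the `.grad` field of the tensors with respect to which they were calculated (here `w1` and `w2`, created with `requires_grad=True`). Gradients *are not stored* in `loss` itself, since it is an intermediate (non-leaf) result, and tensors that do not require gradients, such as `y`, also have `.grad` equal to `None`.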
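The distinction can be checked with the `is_leaf` attribute. In the sketch below (a toy computation, not the network above), `retain_grad()` is used to ask autograd to keep the gradient of an intermediate tensor as well:

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)   # leaf tensor
hidden = w * 3                                        # intermediate (non-leaf) result
hidden.retain_grad()                                  # ask autograd to keep its gradient
loss = hidden.pow(2).sum()                            # loss = sum((3 * w)^2)

loss.backward()
print(w.is_leaf, hidden.is_leaf)
print(w.grad)        # d(loss)/dw = 18 * w
print(hidden.grad)   # d(loss)/d(hidden) = 2 * hidden, kept because of retain_grad()
```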

Here is how to get the underlying `torch.Tensor` from a `Variable` (the `.data` field):

In [118]:
x

tensor([[ 5.9332e-01,  3.5973e-01, -1.8444e-01],
        [-1.0679e+00,  2.5892e-01, -1.9140e-01],
        [ 7.7450e-01, -1.2819e+00, -1.2770e-01],
        [-1.3469e+00, -7.6504e-01,  8.5154e-02],
        [-8.8549e-02,  3.9465e-01, -1.0484e-01],
        [-1.7183e+00, -2.9846e-01, -9.0984e-02],
        [-1.2743e+00,  1.7497e+00,  1.9883e-01],
        [ 4.3831e-01, -1.4057e+00,  5.9739e-01],
        [-9.3755e-01,  5.3619e-02,  3.3065e-01],
        [-1.5487e+00,  3.6972e-02, -8.4198e-01],
        [-7.5560e-01, -7.6131e-01,  7.3310e-01],
        [ 1.3656e+00,  5.3750e-01,  8.4192e-01],
        [-6.7979e-01, -1.2945e+00, -3.7669e-01],
        [-1.5007e+00, -5.8721e-01, -2.6131e-01],
        [-1.9024e-01, -5.1019e-01, -1.1752e+00],
        [-1.8995e-01,  4.9888e-01, -7.9220e-01],
        [ 1.9261e+00, -1.2423e-01, -5.5393e-01],
        [-1.9867e+00,  7.6886e-01,  9.1920e-01],
        [-1.4855e-01, -1.1572e+00, -6.0437e-01],
        [-1.5216e-01,  9.0570e-01,  1.2173e-01],
        [ 7.8091e-01

In [119]:
x.data

tensor([[ 5.9332e-01,  3.5973e-01, -1.8444e-01],
        [-1.0679e+00,  2.5892e-01, -1.9140e-01],
        [ 7.7450e-01, -1.2819e+00, -1.2770e-01],
        [-1.3469e+00, -7.6504e-01,  8.5154e-02],
        [-8.8549e-02,  3.9465e-01, -1.0484e-01],
        [-1.7183e+00, -2.9846e-01, -9.0984e-02],
        [-1.2743e+00,  1.7497e+00,  1.9883e-01],
        [ 4.3831e-01, -1.4057e+00,  5.9739e-01],
        [-9.3755e-01,  5.3619e-02,  3.3065e-01],
        [-1.5487e+00,  3.6972e-02, -8.4198e-01],
        [-7.5560e-01, -7.6131e-01,  7.3310e-01],
        [ 1.3656e+00,  5.3750e-01,  8.4192e-01],
        [-6.7979e-01, -1.2945e+00, -3.7669e-01],
        [-1.5007e+00, -5.8721e-01, -2.6131e-01],
        [-1.9024e-01, -5.1019e-01, -1.1752e+00],
        [-1.8995e-01,  4.9888e-01, -7.9220e-01],
        [ 1.9261e+00, -1.2423e-01, -5.5393e-01],
        [-1.9867e+00,  7.6886e-01,  9.1920e-01],
        [-1.4855e-01, -1.1572e+00, -6.0437e-01],
        [-1.5216e-01,  9.0570e-01,  1.2173e-01],
        [ 7.8091e-01
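In current PyTorch versions, `.detach()` and `torch.no_grad()` are the usual ways to work with values outside of gradient tracking. A short sketch with arbitrary values:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

detached = w.detach()          # same data, no gradient tracking
print(detached.requires_grad)

with torch.no_grad():          # operations inside are not recorded
    doubled = w * 2
print(doubled.requires_grad)

as_numpy = w.detach().numpy()  # .numpy() needs a tensor that does not require grad
print(as_numpy)
```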

<h3 style="text-align: center;"><b>Further reading:</b></h3>

*1). Official PyTorch tutorials: https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py*

*2). arXiv article comparing deep learning frameworks: https://arxiv.org/pdf/1511.06435.pdf*

*3). Useful repo with different tutorials: https://github.com/yunjey/pytorch-tutorial*

*4). Facebook AI Research (main contributor to PyTorch) website: https://facebook.ai/developers/tools*