<a href="https://colab.research.google.com/github/mataney/PyTorchCourse/blob/master/1_PyTorch_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![](https://github.com/pytorch/pytorch/raw/master/docs/source/_static/img/pytorch-logo-dark.png) 

# PyTorch Tutorial

## Who is this course for?

* People who know the basics of DL but have no hands-on experience.  
* People who want to switch to PyTorch from other frameworks.  
* People who have already used PyTorch will hopefully enjoy the later parts of this course.

### Prerequisites:
* Deep learning basics.  
* Previous experience with either another DL framework (TF, Keras, Theano) or NumPy.
* Some experience with notebooks (Jupyter or Colab) would be awesome, but is by no means a must.

## What is PyTorch?

PyTorch is a Python-based scientific computing package targeted at two sets of
audiences:

-  A replacement for NumPy to use the power of GPUs
-  A deep learning research platform that provides maximum flexibility
   and speed


## Why PyTorch?

<p align="center">
  <img src="https://drive.google.com/uc?export=view&id=1j6FiFB-qUPTQq5GXVR7wy8_bAFQmfchy" />
</p>

### PyTorch Is Based On Python
Not only is PyTorch based on this popular programming language, it also doesn't reinvent the language as was done in TF 1.0. Models are plain Python classes, and so on.

### Dynamic Approach To Graph Computation

In the [words](https://www.fast.ai/2017/09/08/introducing-pytorch-for-fastai/) of Jeremy Howard of Fast.ai

`"With a static computation graph library like Tensorflow, once you have declaratively expressed your computation, you send it off to the GPU where it gets handled like a black box. But with a dynamic approach, you can fully dive into every level of the computation, and see exactly what is going on."`

This also means you can debug!

### Easier To Learn And Simpler To Code
PyTorch is arguably easier to learn than many other deep learning libraries because it stays close to conventional programming practices. The PyTorch documentation is also very good and helpful for beginners.






Source: https://www.analyticsindiamag.com/9-reasons-why-pytorch-will-become-your-favourite-deep-learning-tool/

## A few words about Colaboratory

Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser.

Go ahead and run the next cell (using the "play" button or `shift+enter`):

In [0]:
x = 5
y = x + 8
y # the value of the last expression in a cell is displayed automatically

Some useful shortcuts:  
(If you are familiar with Jupyter Notebook shortcuts, most of them are the same with an `M` added before the key.)

- `Shift + Enter` -> Run cell and select below
- `cmd\ctrl + M + A` -> Insert cell above
- `cmd\ctrl + M + B` -> Insert cell below
- `cmd\ctrl + M + D` -> Delete cell
- `cmd\ctrl + M + I` -> Interrupt execution
- `cmd\ctrl + M + .` -> Restart kernel
- `cmd\ctrl + M + M` -> Change cell to markdown
- `cmd\ctrl + M + Y` -> Change cell to code

Colab also lets you use GPUs and TPUs as the processing units for notebooks.  
Enable this with `Runtime -> Change runtime type -> Hardware accelerator -> Choose "GPU"`.

In [0]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130


In [0]:
!nvidia-smi

Thu May 30 08:29:51 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   53C    P8    16W /  70W |      0MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No ru

# PyTorch Tensors



While tensors have a deep geometrical meaning, for our purposes:   
A Tensor (similar to NumPy’s ndarray) is an n-dimensional data structure containing some scalar type, e.g., floats, ints, etc.

Examples:

A rank 0 tensor is a scalar.  
A rank 1 tensor is a vector.  
A rank 2 tensor is a matrix.  
A rank 3 tensor is, well, a rank 3 tensor.  
And so on.

In [0]:
import torch

In [0]:
torch.__version__

'1.1.0'

### Tensors of different dimensions

In [0]:
torch.tensor(1)

tensor(1)

In [0]:
torch.tensor([1, 1, 1, 1, 1])

tensor([1, 1, 1, 1, 1])

In [0]:
torch.tensor([[1, 1, 1], [1, 1, 1]])

tensor([[1, 1, 1],
        [1, 1, 1]])
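
The rank of each tensor above can be checked directly. The following is a small sketch that uses `.dim()` to report the number of dimensions; the variable names are chosen only for illustration.

```python
import torch

scalar = torch.tensor(1)                        # rank 0
vector = torch.tensor([1, 1, 1, 1, 1])          # rank 1
matrix = torch.tensor([[1, 1, 1], [1, 1, 1]])   # rank 2

# .dim() returns the number of dimensions, i.e. the rank
print(scalar.dim(), vector.dim(), matrix.dim())  # 0 1 2
```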

Tensors additionally hold metadata describing the size of the tensor, the type of the elements it contains (dtype), and the device the tensor lives on (CPU memory? CUDA memory?).

In [0]:
x = torch.tensor([[1, 1, 1], [1, 1, 1]])

In [0]:
print(x.dtype)
print(x.size())
print(x.device)

torch.int64
torch.Size([2, 3])
cpu


What can we represent with Tensors?

The red value of a pixel is a rank 0 tensor of size `[]`, for example: `x = torch.tensor(211)`.  
A pixel is a rank 1 tensor of size `[3]`, for example: `x = torch.tensor([211, 35, 75])`.  
An image is a rank 3 tensor of size `[3, m, n]`.  
A batch of images is a rank 4 tensor of size `[b, 3, m, n]`.

Correspondingly, `b` sentences with at most `l` words each form a rank 2 tensor of size `[b, l]`. (How do we represent words here?!)
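
To make these shapes concrete, here is a minimal sketch that builds one tensor per case. The sizes `b`, `m`, `n`, `l` are hypothetical values picked for the example, and the zero-filled tensors are placeholders only: what each entry of the sentence tensor should hold is exactly the question left open above.

```python
import torch

b, m, n, l = 4, 8, 8, 10   # hypothetical sizes, chosen only for this example

red = torch.tensor(211)                          # rank 0, size []
pixel = torch.tensor([211, 35, 75])              # rank 1, size [3]
image = torch.zeros(3, m, n)                     # rank 3, size [3, m, n]
batch = torch.zeros(b, 3, m, n)                  # rank 4, size [b, 3, m, n]
sentences = torch.zeros(b, l, dtype=torch.long)  # rank 2, size [b, l], placeholder contents

for t in (red, pixel, image, batch, sentences):
    print(t.dim(), t.size())
```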

### Tensor operations

A full list of all PyTorch Tensor commands is beyond the scope of this short tutorial, but you can find one [here](https://pytorch.org/docs/stable/tensors.html).


####  `.size()` command

In [0]:
x = torch.tensor([[1, 2, 3], [4, 5, 2], [1, 2, 0], [1, 4, 0]])
x.size()

torch.Size([4, 3])

#### `.view()` command
Resizing: if you want to resize/reshape a tensor, you can use its ``.view()`` method:

In [0]:
batch_size = 128
x = torch.Tensor(batch_size, 1, 8, 8).uniform_(0,1) 
# Notice: what does torch.Tensor do, what does .uniform_() do, and what does that trailing _ do?
x

tensor([[[[0.0224, 0.5473, 0.1301,  ..., 0.8104, 0.4795, 0.1169],
          [0.8711, 0.1058, 0.2307,  ..., 0.2592, 0.7447, 0.8252],
          [0.0328, 0.9294, 0.0173,  ..., 0.0965, 0.2975, 0.1814],
          ...,
          [0.5378, 0.6870, 0.0532,  ..., 0.1323, 0.6397, 0.3537],
          [0.9086, 0.7785, 0.4218,  ..., 0.0026, 0.6026, 0.2018],
          [0.9405, 0.4619, 0.4235,  ..., 0.2185, 0.1650, 0.6001]]],


        [[[0.2866, 0.1342, 0.8596,  ..., 0.6903, 0.5981, 0.9573],
          [0.9737, 0.5249, 0.7475,  ..., 0.2851, 0.3029, 0.0200],
          [0.6805, 0.6752, 0.1373,  ..., 0.7908, 0.4116, 0.6273],
          ...,
          [0.8170, 0.4058, 0.0923,  ..., 0.1232, 0.4805, 0.6503],
          [0.3600, 0.8495, 0.1442,  ..., 0.1406, 0.1660, 0.6563],
          [0.8562, 0.3802, 0.2658,  ..., 0.0469, 0.0970, 0.6353]]],


        [[[0.0314, 0.6246, 0.2731,  ..., 0.4140, 0.0973, 0.7565],
          [0.2044, 0.4815, 0.1652,  ..., 0.3925, 0.8297, 0.0747],
          [0.8592, 0.8964, 0.6740,  ..

Think of this as a batch of 128 single-channel images of 8\*8 pixels with values between 0 and 1 (like MNIST).  
If our model expects each picture as one long vector, then:

In [0]:
x.view(batch_size, 1, 64)

tensor([[[0.0224, 0.5473, 0.1301,  ..., 0.2185, 0.1650, 0.6001]],

        [[0.2866, 0.1342, 0.8596,  ..., 0.0469, 0.0970, 0.6353]],

        [[0.0314, 0.6246, 0.2731,  ..., 0.2847, 0.5035, 0.6295]],

        ...,

        [[0.7709, 0.3989, 0.2022,  ..., 0.2984, 0.6023, 0.1148]],

        [[0.7832, 0.1671, 0.0951,  ..., 0.7783, 0.8377, 0.4670]],

        [[0.0278, 0.9214, 0.7377,  ..., 0.7944, 0.5650, 0.2327]]])

Or simply:

In [0]:
print(x.view(batch_size, 1, -1))
print(x.view(batch_size, 1, -1).size())

tensor([[[0.0224, 0.5473, 0.1301,  ..., 0.2185, 0.1650, 0.6001]],

        [[0.2866, 0.1342, 0.8596,  ..., 0.0469, 0.0970, 0.6353]],

        [[0.0314, 0.6246, 0.2731,  ..., 0.2847, 0.5035, 0.6295]],

        ...,

        [[0.7709, 0.3989, 0.2022,  ..., 0.2984, 0.6023, 0.1148]],

        [[0.7832, 0.1671, 0.0951,  ..., 0.7783, 0.8377, 0.4670]],

        [[0.0278, 0.9214, 0.7377,  ..., 0.7944, 0.5650, 0.2327]]])
torch.Size([128, 1, 64])


Or if we want everything to be concatenated:

In [0]:
print(x.view(-1))
print(x.view(-1).size()) # 128 * 1 * 8 * 8 = 8192

tensor([0.0224, 0.5473, 0.1301,  ..., 0.7944, 0.5650, 0.2327])
torch.Size([8192])
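
One point worth seeing in code: `.view()` does not copy data. The returned tensor shares memory with the original, so a write through one is visible through the other. This is a small sketch on a toy tensor (the names are illustrative), not on the batch used above.

```python
import torch

x = torch.zeros(2, 3)
flat = x.view(-1)        # -1 lets PyTorch infer the size: 2 * 3 = 6

flat[0] = 7.0            # write through the view ...
print(x[0, 0].item())    # ... and the original tensor changes too: 7.0
print(flat.size())       # torch.Size([6])
```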


####  Create new Tensors 
We already saw some options, but here are a few more,  
using `rand`, `rand_like`, `zeros`, `ones`, etc.

In [0]:
x = torch.rand(5, 3) #This is similar to what we did before (torch.Tensor(batch_size, 1, 8, 8).uniform_(0,1))
x

tensor([[0.1781, 0.9358, 0.5117],
        [0.6625, 0.5836, 0.0556],
        [0.2777, 0.2435, 0.5239],
        [0.3116, 0.7860, 0.9668],
        [0.4547, 0.5733, 0.1565]])

In [0]:
x = torch.randn(5, 3)
x

tensor([[ 0.7941,  0.3381, -1.7987],
        [-1.2812,  1.2838, -2.0384],
        [ 0.0526,  0.4470, -0.5046],
        [ 0.6519, -1.7166, -0.7014],
        [-0.1864,  0.3468, -0.3179]])

In [0]:
x = torch.zeros(5, 3)
x

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

In [0]:
x = torch.ones(5, 3, dtype=torch.double)
x

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)

In [0]:
x = torch.randn_like(x, dtype=torch.float)    
print(x)
print(x.size())

tensor([[-0.0853, -0.7478,  1.5835],
        [ 1.5661, -0.1871, -1.2070],
        [-1.6379,  0.6363, -1.0752],
        [ 1.9427, -1.4612, -0.0936],
        [-0.8668, -0.7588,  0.0227]])
torch.Size([5, 3])


### Operations

There are multiple syntaxes for operations. In the following example, we will take a look at the addition operation.

In [0]:
x = torch.rand(5, 3)
y = torch.rand(5, 3)
print(x)
print(y)

tensor([[0.5401, 0.8365, 0.6008],
        [0.9074, 0.6449, 0.9891],
        [0.0683, 0.3595, 0.8435],
        [0.5144, 0.1852, 0.5656],
        [0.1754, 0.6969, 0.8043]])
tensor([[0.7263, 0.3191, 0.4224],
        [0.3959, 0.9381, 0.5635],
        [0.0013, 0.9569, 0.1686],
        [0.7091, 0.9666, 0.6392],
        [0.7147, 0.4288, 0.2951]])


In [0]:
print(x + y)                   # operator syntax
print(torch.add(x, y))         # function syntax
result = torch.empty(5, 3)
torch.add(x, y, out=result)    # write the sum into a preallocated tensor
print(result)
y.add_(x)                      # in-place: adds x to y
print(y)

tensor([[1.2665, 1.1556, 1.0232],
        [1.3034, 1.5830, 1.5526],
        [0.0696, 1.3164, 1.0122],
        [1.2235, 1.1518, 1.2049],
        [0.8900, 1.1257, 1.0993]])
tensor([[1.2665, 1.1556, 1.0232],
        [1.3034, 1.5830, 1.5526],
        [0.0696, 1.3164, 1.0122],
        [1.2235, 1.1518, 1.2049],
        [0.8900, 1.1257, 1.0993]])
tensor([[1.2665, 1.1556, 1.0232],
        [1.3034, 1.5830, 1.5526],
        [0.0696, 1.3164, 1.0122],
        [1.2235, 1.1518, 1.2049],
        [0.8900, 1.1257, 1.0993]])
tensor([[1.2665, 1.1556, 1.0232],
        [1.3034, 1.5830, 1.5526],
        [0.0696, 1.3164, 1.0122],
        [1.2235, 1.1518, 1.2049],
        [0.8900, 1.1257, 1.0993]])


Any operation that mutates a tensor in-place is post-fixed with an ``_``.  
For example, ``x.copy_(y)`` and ``x.t_()`` will change ``x``.
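
As a small illustration of the naming rule, the sketch below contrasts `add` with `add_` on toy tensors. The variable names are chosen only for this example.

```python
import torch

x = torch.ones(3)
y = torch.ones(3)

z = x.add(y)             # no underscore: returns a new tensor, x is untouched
x_after_add = x.clone()  # snapshot of x at this point
x.add_(y)                # underscore: x itself is overwritten with x + y

print(x_after_add)       # tensor([1., 1., 1.])
print(z)                 # tensor([2., 2., 2.])
print(x)                 # tensor([2., 2., 2.])
```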


### Slicing:

You can use standard NumPy-like indexing with all bells and whistles!

In [0]:
print(x[:, 1])

tensor([0.8365, 0.6449, 0.3595, 0.1852, 0.6969])
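
A few more indexing patterns, sketched on a small integer matrix so that the results are easy to read. The tensor `m` is made up for this example.

```python
import torch

m = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

print(m[0])        # first row              -> tensor([1, 2, 3])
print(m[:, -1])    # last column            -> tensor([3, 6, 9])
print(m[1:, :2])   # rows 1.., columns 0..1 -> tensor([[4, 5], [7, 8]])
print(m[m > 5])    # boolean mask           -> tensor([6, 7, 8, 9])
```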


If you have a one element tensor, use ``.item()`` to get the value as a Python number

In [0]:
x = torch.randn(1)
print(x)
print(x.item()) # The returned Python number is no longer part of the gradient tree (more on this later)

tensor([0.3262])
0.3262418508529663


### CUDA Tensors



In [0]:
torch.cuda.is_available()

True

Tensors can be moved onto any device using the ``.to`` method.

In [0]:
device = torch.device("cuda")
y = torch.ones_like(x, device=device)
x = x.to(device)
z = x + y
print(z)
print(z.to("cpu"))

tensor([1.3262], device='cuda:0')
tensor([1.3262])
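
The cell above assumes a GPU is present. A common pattern, sketched here under the assumption that the same code should also run on a CPU-only machine, is to pick the device once and pass it around:

```python
import torch

# Fall back to the CPU when no CUDA device is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(3)
y = torch.ones_like(x, device=device)  # created directly on the chosen device
x = x.to(device)                       # .to returns a tensor on the chosen device
z = x + y                              # both operands live on the same device
print(z.device)
print(z.to("cpu"))
```

Operations between tensors on different devices raise an error, which is why both `x` and `y` are placed on `device` before they are added.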


In [0]:
!nvidia-smi

Thu May 30 08:30:05 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   53C    P0    28W /  70W |    771MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
+-------

# Autograd: Automatic Differentiation

The distinguishing characteristic of PyTorch when it was originally released was that it provided automatic differentiation on tensors (these days, we have other cool features like TorchScript; but back then, this was it!)

What does automatic differentiation do? It's the machinery that's responsible for training a neural network.

Neural networks use backpropagation to calculate gradients.  
These gradients are then used by some optimization method to update the network's parameters.

Central to all neural networks in PyTorch is the `autograd` package.  
The `autograd` package provides automatic differentiation for all operations on Tensors. 

`torch.Tensor` is the central class of the package. If you set its attribute `.requires_grad` as `True`, it starts to track all operations on it. When you finish your computation you can call ``.backward()`` and have all the gradients computed automatically.
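
To show where this is heading, here is a minimal sketch of the whole cycle on a single scalar parameter: track operations, call `.backward()`, then use the gradient for a hand-written gradient descent step. The toy loss `(w - 3)^2`, the learning rate `lr`, and the number of steps are assumptions made for this example, not part of the tutorial.

```python
import torch

w = torch.tensor(0.0, requires_grad=True)  # the parameter we want to learn
lr = 0.1                                   # hypothetical learning rate

for _ in range(100):
    loss = (w - 3.0) ** 2    # toy loss, minimized at w = 3
    loss.backward()          # autograd fills w.grad with d(loss)/dw
    with torch.no_grad():    # the update itself should not be tracked
        w -= lr * w.grad
    w.grad.zero_()           # gradients accumulate, so reset them each step

print(w.item())              # close to 3
```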

In [0]:
x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float, requires_grad=True)
x

tensor([[1., 2.],
        [3., 4.]], requires_grad=True)

In [0]:
y = x + 2
y

tensor([[3., 4.],
        [5., 6.]], grad_fn=<AddBackward0>)

In [0]:
y.requires_grad

True

Why does `y` have `requires_grad=True`?  
`y` has two inputs, `x` and `2`.  
If at least one input to an operation requires gradient, its output will also require gradient. 

Changing whether a tensor requires_grad or not is especially useful when you want to freeze part of your model, like when you want to train only the later layers of a network.
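
A small sketch of freezing, using two plain tensors as stand-ins for an "early" and a "late" layer. The names `w_early` and `w_late` are hypothetical, and a real model would hold its weights in modules rather than bare tensors.

```python
import torch

w_early = torch.tensor([1.0, 2.0], requires_grad=True)
w_late = torch.tensor([3.0, 4.0], requires_grad=True)

w_early.requires_grad_(False)   # freeze the early "layer"

x = torch.tensor([1.0, 1.0])
out = ((x * w_early) * w_late).sum()
out.backward()

print(w_early.grad)   # None: no gradient is computed for the frozen tensor
print(w_late.grad)    # tensor([1., 2.]), i.e. x * w_early
```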

In [0]:
print(y.grad_fn)

<AddBackward0 object at 0x7fc7fd82b048>


In [0]:
z = y * y * 3
z

tensor([[ 27.,  48.],
        [ 75., 108.]], grad_fn=<MulBackward0>)

In [0]:
out = z.mean()
out

tensor(64.5000, grad_fn=<MeanBackward0>)

In [0]:
out.backward()

## The Autograd graph we created:

<p align="center">
  <img src="https://drive.google.com/uc?export=view&id=1P-b6SlPNuyTE-4H3Uoa_hkHjXn_JPADa" width="75%" height="75%"/>
</p>

## How is the gradient computed:

In [0]:
x.grad

tensor([[4.5000, 6.0000],
        [7.5000, 9.0000]])

Let’s call the ``out`` *Tensor* “$o$”.  
We want to find the gradient of $o$ with respect to the leaves $x_i$.  
Since ``out`` is the mean of the four entries of ``z``, $o = \frac{1}{4}\sum_j 3(x_j+2)^2$,  

Therefore,
$\frac{\partial o}{\partial x_i} = \frac{1}{4} \cdot 6(x_i+2) = \frac{3}{2}(x_i+2)$,  hence,  
$\frac{\partial o}{\partial x_1} = 4.5$  
$\frac{\partial o}{\partial x_2} = 6$  
$\frac{\partial o}{\partial x_3} = 7.5$  
$\frac{\partial o}{\partial x_4} = 9$
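
The closed form can be checked against autograd. This sketch rebuilds the same computation from scratch and compares `x.grad` with $\frac{3}{2}(x_i+2)$; the name `analytic` is introduced only for this example.

```python
import torch

x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()   # same computation as y, z, out above
out.backward()

analytic = 1.5 * (x.detach() + 2)         # the hand-derived formula
print(x.grad)
print(torch.allclose(x.grad, analytic))   # True
```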

In [0]:
y.grad #Why?

Given all of this, let's go ahead and write our first Neural model.