<a href="https://colab.research.google.com/github/mataney/PyTorchCourse/blob/master/notebooks/1_PyTorch_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [5]:
# Try to run this by pressing the "play" button
print('Working')

Working


![](https://github.com/pytorch/pytorch/raw/master/docs/source/_static/img/pytorch-logo-dark.png) 

## Setting your environment:

Please go to https://bit.ly/2XPl6qX (Case sensitive) or https://github.com/mataney/PyTorchCourse/tree/master/notebooks

Keep this tab open as we will come back to this.

Because we are using a Google product (Google Colaboratory), you should log in to your Google account first (sorry about this, nothing I can do about it).

Choose the first notebook `1_PyTorch_tutorial.ipynb` and click the blue `open in colab` button at the top of the notebook (if for some reason it fails to load, click `raw`, copy the URL from the first `href`, and paste it into your browser).

Then run the first cell by hovering over it and pressing the "play" button.
If you haven't logged in to your Google account yet, it might ask you to log in now.

Then, try to run the cell again. It should print "Working".

Hooray!

# PyTorch Tutorial

## Who is this course for?

* People with DL basics without hands-on experience.  
* People who want to switch from other frameworks to PyTorch.  
* People who already have PyTorch experience will hopefully enjoy the later parts of this course.

### Prerequisites:
* Deep learning basics  
* Previous experience with either another DL framework (TF, Keras, Theano) or NumPy.
* Some experience with notebooks (Jupyter or Colab) will be awesome, but by no means a must.

## What is PyTorch?

PyTorch is a Python-based scientific computing package targeted at two sets of
audiences:

-  A replacement for NumPy to use the power of GPUs
-  A deep learning research platform that provides maximum flexibility
   and speed


## Why PyTorch?

<p align="center">
  <img src="https://drive.google.com/uc?export=view&id=1j6FiFB-qUPTQq5GXVR7wy8_bAFQmfchy" />
</p>

### PyTorch Is Based On Python
Not only is PyTorch based on this popular programming language, it also doesn't reinvent the language the way TF 1.0 did. Models are ordinary Python classes, and so on.

### Dynamic Approach To Graph Computation

In a static computational graph framework (like TensorFlow 1.x) you define the graph statically before a model can run. All communication with the outer world goes through a `tf.Session` object and `tf.placeholder` tensors, which are substituted by external data at runtime.

In a dynamic computational graph framework (like PyTorch) you can define, change and execute nodes as you go, with no special session interfaces or placeholders.

In the [words](https://www.fast.ai/2017/09/08/introducing-pytorch-for-fastai/) of Jeremy Howard of Fast.ai

`"With a static computation graph library like Tensorflow, once you have declaratively expressed your computation, you send it off to the GPU where it gets handled like a black box. But with a dynamic approach, you can fully dive into every level of the computation, and see exactly what is going on."`

This also means you can debug!
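
As a small illustration of "define, change and execute nodes as you go", the sketch below uses an ordinary Python `while` loop whose number of iterations depends on the tensor's values. The function name `dynamic_forward` and the thresholds are made up for this example; they are not part of PyTorch.

```python
import torch

def dynamic_forward(x):
    # Plain Python control flow decides, at run time, how many
    # multiplications end up in the graph.
    steps = 0
    while x.norm() < 10 and steps < 100:
        x = x * 2
        steps += 1
    return x.sum(), steps

x = torch.ones(3, requires_grad=True)
out, steps = dynamic_forward(x)
out.backward()  # gradients flow through exactly the ops that were executed

print(steps)
print(x.grad)
```

Because the graph is built while the code runs, you can put a breakpoint or a `print` inside the loop and inspect intermediate tensors like any other Python object.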

### Easier To Learn And Simpler To Code
PyTorch is considerably easier to learn than most other deep learning libraries because it stays close to conventional programming practices. The PyTorch documentation is also very helpful.






Source: https://www.analyticsindiamag.com/9-reasons-why-pytorch-will-become-your-favourite-deep-learning-tool/

## A few words about Colaboratory

Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser.

Go ahead and run the next cell (using the "play" button or `shift+enter`):

In [6]:
x = 5
y = x + 8
y # the last line is always printed

13

Some useful shortcuts:  
(If you are familiar with Jupyter notebook shortcuts, most of these are the same with an added `M` before the key.)

- `Shift + Enter` -> Run cell and select below
- `cmd\ctrl + M + A` -> Insert cell above
- `cmd\ctrl + M + B` -> Insert cell below
- `cmd\ctrl + M + D` -> Delete cell
- `cmd\ctrl + M + I` -> Interrupt execution
- `cmd\ctrl + M + .` -> Restart kernel
- `cmd\ctrl + M + M` -> Change cell to markdown
- `cmd\ctrl + M + Y` -> Change cell to code

Colab also offers using GPUs and TPUs as the processing units for notebooks.  
Enable this by `Runtime -> Change runtime type -> Hardware accelerator -> Choose "GPU"`

In [7]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130


In [8]:
!nvidia-smi

Wed Jul 10 09:50:04 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P8    16W /  70W |      0MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No ru

# PyTorch Tensors



While Tensors have a deep geometrical meaning, for our case:   
A Tensor (similarly to NumPy’s ndarrays) is an n-dimensional data structure containing some sort of scalar type, e.g., floats, ints, etc.

Examples:

rank 0 tensor is a scalar.  
rank 1 tensor is a vector.  
rank 2 tensor is a matrix.  
rank 3 tensor is, well a rank-3 Tensor.  
And so on..

In [0]:
import torch

In [10]:
torch.__version__

'1.1.0'

### Tensors of different dimensions

In [11]:
torch.tensor(1)

tensor(1)

In [12]:
torch.tensor([1, 1, 1, 1, 1])

tensor([1, 1, 1, 1, 1])

In [13]:
torch.tensor([[1, 1, 1], [1, 1, 1]])

tensor([[1, 1, 1],
        [1, 1, 1]])

What can we represent with Tensors?

The red value of a pixel is a rank 0 tensor of size `[]`. For example: `x = torch.tensor(211)`.  
A pixel is a rank 1 tensor of size `[3]`. For example: `x = torch.tensor([211, 35, 75])`.  
An image is a rank 3 tensor of size `[3, m, n]`.  
A batch of images is a rank 4 tensor of size `[b, 3, m, n]`.
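
The ranks and sizes listed above can be checked directly with `.dim()` and `.shape`. This is only a sketch: the image height and width (`m=4`, `n=5`) and the batch size (`b=2`) are arbitrary values chosen for the example.

```python
import torch

red = torch.tensor(211)              # rank 0, size []
pixel = torch.tensor([211, 35, 75])  # rank 1, size [3]
image = torch.zeros(3, 4, 5)         # rank 3, size [3, m, n] with m=4, n=5
batch = torch.zeros(2, 3, 4, 5)      # rank 4, size [b, 3, m, n] with b=2

for name, t in [("red", red), ("pixel", pixel), ("image", image), ("batch", batch)]:
    # .dim() is the rank, .shape is the size along each dimension
    print(name, t.dim(), tuple(t.shape))
```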

### Tensors also hold metadata

In [14]:
x = torch.zeros(size=(3,2), dtype=torch.float32, device=torch.device('cpu'), requires_grad=False)
print(x)
print(x.dtype)
print(x.size()) #or x.shape
print(x.device)
print(x.requires_grad)

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
torch.float32
torch.Size([3, 2])
cpu
False


#### Possible Types:

<p align="center">
  <img src="https://drive.google.com/uc?export=view&id=1wSAd_92-mrw2YvxHQizBevpPDqVLdOil" />
</p>
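
A short sketch of how types are chosen and converted. It assumes the default floating point type has not been changed from `torch.float32`.

```python
import torch

ints = torch.tensor([1, 2, 3])     # Python ints are stored as 64-bit integers
floats = torch.tensor([1.0, 2.0])  # Python floats use the default float type

print(ints.dtype, floats.dtype)

# Converting returns a new tensor; the original keeps its type
as_double = ints.to(torch.float64)
as_half = floats.to(torch.float16)
print(as_double.dtype, as_half.dtype, ints.dtype)
```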

#### Possible Devices:
The `torch.device` contains a device type ('cpu' or 'cuda') and optional device ordinal for the device type. For example:

In [15]:
torch.device('cpu')

device(type='cpu')

In [16]:
torch.device('cuda')  # current cuda device

device(type='cuda')

In [17]:
torch.device('cuda:0')

device(type='cuda', index=0)

#### Possible requires_grad:

`True` and `False`

More about `requires_grad` later.


You can think of a tensor as a NumPy ndarray (which holds the values, with the corresponding type and shape), but with 2 additional pieces of metadata:  
- Device
- requires_grad

Notice `dtype=torch.float32`, `device=torch.device('cpu')` and `requires_grad=False` are all the default arguments when creating a new tensor.  
So the initialization above is the same as:

In [18]:
x = torch.zeros((3,2))
print(x)
print(x.dtype)
print(x.size())
print(x.device)
print(x.requires_grad)

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
torch.float32
torch.Size([3, 2])
cpu
False


### Tensor operations

Full list [here](https://pytorch.org/docs/stable/tensors.html).
Here we will look at some of the more common operations.

####  `.size()` command

In [19]:
x = torch.tensor([[1, 2, 3], [4, 5, 2], [1, 2, 0], [1, 4, 0]])
print(x.size())  # only the last expression of a cell is echoed, so print explicitly
x

torch.Size([4, 3])
tensor([[1, 2, 3],
        [4, 5, 2],
        [1, 2, 0],
        [1, 4, 0]])

#### `.view()` command
Resizing: if you want to resize/reshape a tensor, you can use ``Tensor.view``:

In [20]:
batch_size = 128
x = torch.empty(batch_size, 1, 8, 8).uniform_(0, 1)
# Notice: what is torch.empty doing, what is .uniform_() doing, and what is that trailing _ doing?
x

tensor([[[[0.4852, 0.7439, 0.1558,  ..., 0.9562, 0.7388, 0.6812],
          [0.2603, 0.4302, 0.0422,  ..., 0.2092, 0.3912, 0.5243],
          [0.8465, 0.5084, 0.5628,  ..., 0.3292, 0.6882, 0.4954],
          ...,
          [0.6174, 0.3276, 0.8180,  ..., 0.2425, 0.5771, 0.3050],
          [0.2918, 0.2426, 0.9365,  ..., 0.8725, 0.4395, 0.6637],
          [0.1778, 0.3941, 0.5332,  ..., 0.4496, 0.8314, 0.4578]]],


        [[[0.7804, 0.5637, 0.7163,  ..., 0.1749, 0.9073, 0.3180],
          [0.8585, 0.5553, 0.4480,  ..., 0.9352, 0.7989, 0.3535],
          [0.7389, 0.9841, 0.5836,  ..., 0.4869, 0.1945, 0.3203],
          ...,
          [0.1719, 0.1185, 0.6804,  ..., 0.6511, 0.2461, 0.6553],
          [0.3626, 0.9266, 0.4068,  ..., 0.6648, 0.2418, 0.7421],
          [0.1657, 0.8024, 0.7070,  ..., 0.4728, 0.8675, 0.3725]]],


        [[[0.0758, 0.4245, 0.2073,  ..., 0.4605, 0.5546, 0.9666],
          [0.3701, 0.6401, 0.9811,  ..., 0.0348, 0.1087, 0.0419],
          [0.8633, 0.4895, 0.0962,  ..

Think of this as a batch of 128 single-channel images of 8\*8 pixels, with values between 0 and 1.  
If our model expects to get each picture as one long tensor, then:

In [21]:
print(x.view(batch_size, 1, -1))
print(x.view(batch_size, 1, -1).size())

tensor([[[0.4852, 0.7439, 0.1558,  ..., 0.4496, 0.8314, 0.4578]],

        [[0.7804, 0.5637, 0.7163,  ..., 0.4728, 0.8675, 0.3725]],

        [[0.0758, 0.4245, 0.2073,  ..., 0.0955, 0.5980, 0.6697]],

        ...,

        [[0.9368, 0.4777, 0.4154,  ..., 0.5935, 0.8413, 0.3498]],

        [[0.5912, 0.6734, 0.9960,  ..., 0.1908, 0.9930, 0.7310]],

        [[0.9980, 0.5925, 0.0476,  ..., 0.9403, 0.9469, 0.2841]]])
torch.Size([128, 1, 64])


Or if we want everything to be concatenated:

In [23]:
print(x.view(-1))
print(x.view(-1).size()) # 128 * 1 * 8 * 8 = 8192

tensor([0.4852, 0.7439, 0.1558,  ..., 0.9403, 0.9469, 0.2841])
torch.Size([8192])


Can I break it?!

In [22]:
x.view(batch_size, 1, 65)

RuntimeError: ignored
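
The call fails because `view` only rearranges existing elements, so the requested shape must contain exactly as many elements as the tensor. The sketch below (using zeros instead of random values so the result is reproducible) catches the error and also shows that a view shares memory with the original tensor.

```python
import torch

x = torch.zeros(128, 1, 8, 8)
print(x.numel())  # total number of elements: 128 * 1 * 8 * 8

view_failed = False
try:
    x.view(128, 1, 65)  # would need 128 * 1 * 65 elements
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

# A view is not a copy: writing through it changes the original tensor
flat = x.view(-1)
flat[0] = 42.0
print(x[0, 0, 0, 0])
```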

####  Create new Tensors 
We already saw some options, but here are a few more:  
using `rand`, `rand_like`, `zeros`, `ones`, etc.

In [26]:
x = torch.zeros(5, 3)
x

tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])

In [27]:
x = torch.ones(5, 3, dtype=torch.double)
x

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)

In [28]:
x = torch.rand(5, 3) #This is similar to what we did before (torch.empty(batch_size, 1, 8, 8).uniform_(0,1))
x

tensor([[0.2135, 0.7315, 0.8543],
        [0.8362, 0.3603, 0.7612],
        [0.7142, 0.9070, 0.2891],
        [0.7088, 0.5226, 0.2651],
        [0.4001, 0.0544, 0.8978]])

In [29]:
x = torch.randn(5, 3)
x

tensor([[ 0.8421,  0.9120,  1.4448],
        [-1.3336,  1.4641, -0.9091],
        [ 0.9500,  0.3838, -0.9774],
        [ 0.6177,  0.8500,  3.1549],
        [ 0.1438, -0.4079,  0.6336]])

#### *_like

In [34]:
x = torch.full_like(x, 3.141592)
x

tensor([[3.1416, 3.1416, 3.1416],
        [3.1416, 3.1416, 3.1416],
        [3.1416, 3.1416, 3.1416],
        [3.1416, 3.1416, 3.1416],
        [3.1416, 3.1416, 3.1416]])

In [31]:
x2 = torch.rand_like(x)
print(x2)
print(x2.size())

tensor([[0.4779, 0.6301, 0.9907],
        [0.4157, 0.5428, 0.2494],
        [0.6278, 0.4905, 0.7291],
        [0.2552, 0.0958, 0.4178],
        [0.9937, 0.2331, 0.4972]])
torch.Size([5, 3])


etc..
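
A minimal sketch of what the `*_like` functions copy from their argument: the shape and the dtype (and the device), unless you override them with keyword arguments. The tensor name `base` is just for illustration.

```python
import torch

base = torch.ones(2, 3, dtype=torch.float64)

z = torch.zeros_like(base)  # same shape and dtype as base, filled with zeros
r = torch.rand_like(base)   # same shape and dtype, uniform samples in [0, 1)
f = torch.full_like(base, 7, dtype=torch.float32)  # shape from base, dtype overridden

print(z.shape, z.dtype)
print(r.shape, r.dtype)
print(f.shape, f.dtype)
```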

### Operations

There are multiple syntaxes for operations. In the following example, we will take a look at the addition operation.

In [43]:
x = torch.rand(5, 3)
y = torch.rand(5, 3)
print(x)
print(y)

tensor([[0.2880, 0.9275, 0.0951],
        [0.2719, 0.1998, 0.0197],
        [0.3824, 0.9755, 0.8949],
        [0.5789, 0.5209, 0.7545],
        [0.8250, 0.7413, 0.0246]])
tensor([[0.9194, 0.4180, 0.4562],
        [0.0152, 0.3840, 0.1168],
        [0.8072, 0.1648, 0.2647],
        [0.7009, 0.7676, 0.2343],
        [0.3691, 0.5354, 0.0801]])


In [44]:
print(x + 1)

tensor([[1.2880, 1.9275, 1.0951],
        [1.2719, 1.1998, 1.0197],
        [1.3824, 1.9755, 1.8949],
        [1.5789, 1.5209, 1.7545],
        [1.8250, 1.7413, 1.0246]])


In [45]:
print(x + y)
print(torch.add(x, y))
print(x.add(y))
result = torch.empty(5, 3)
torch.add(x, y, out=result)  # write the sum into a preallocated tensor
print(result)
y.add_(x)  # in-place: adds x to y
print(y)

tensor([[1.2074, 1.3455, 0.5513],
        [0.2870, 0.5838, 0.1365],
        [1.1896, 1.1404, 1.1596],
        [1.2798, 1.2884, 0.9888],
        [1.1941, 1.2767, 0.1047]])
tensor([[1.2074, 1.3455, 0.5513],
        [0.2870, 0.5838, 0.1365],
        [1.1896, 1.1404, 1.1596],
        [1.2798, 1.2884, 0.9888],
        [1.1941, 1.2767, 0.1047]])
tensor([[1.2074, 1.3455, 0.5513],
        [0.2870, 0.5838, 0.1365],
        [1.1896, 1.1404, 1.1596],
        [1.2798, 1.2884, 0.9888],
        [1.1941, 1.2767, 0.1047]])
tensor([[1.2074, 1.3455, 0.5513],
        [0.2870, 0.5838, 0.1365],
        [1.1896, 1.1404, 1.1596],
        [1.2798, 1.2884, 0.9888],
        [1.1941, 1.2767, 0.1047]])
tensor([[1.2074, 1.3455, 0.5513],
        [0.2870, 0.5838, 0.1365],
        [1.1896, 1.1404, 1.1596],
        [1.2798, 1.2884, 0.9888],
        [1.1941, 1.2767, 0.1047]])


Any operation that mutates a tensor in-place has a trailing ``_`` in its name.  
For example, ``x.copy_(y)`` and ``x.t_()`` will change ``x``.
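
The difference between the two flavours can be seen in a small sketch. It also revisits the earlier `torch.empty(...).uniform_(0, 1)` idiom: `empty` allocates uninitialized memory and `uniform_` fills that same memory with samples.

```python
import torch

x = torch.zeros(3)

y = x.add(1)  # out-of-place: returns a new tensor, x is untouched
print(x, y)

same = x.add_(1)  # in-place: x itself is modified, and returned for chaining
print(x, same.data_ptr() == x.data_ptr())

u = torch.empty(2, 2).uniform_(0, 1)  # allocate, then fill in place
print(u)
```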


In [46]:
x

tensor([[0.2880, 0.9275, 0.0951],
        [0.2719, 0.1998, 0.0197],
        [0.3824, 0.9755, 0.8949],
        [0.5789, 0.5209, 0.7545],
        [0.8250, 0.7413, 0.0246]])

### Slicing:

You can use standard NumPy-like indexing with all bells and whistles!

In [47]:
print(x[2, 1])

tensor(0.9755)
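
A few more of the "bells and whistles", sketched on a small tensor with known values so the results are easy to follow:

```python
import torch

x = torch.arange(15).view(5, 3)  # 5 rows, 3 columns, values 0..14

print(x[0])        # first row
print(x[:, 1])     # second column
print(x[1:3, :2])  # rows 1 and 2, first two columns
print(x[-1, -1])   # last element
print(x[x > 10])   # boolean mask: all elements greater than 10
```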


If you have a one-element tensor, use ``.item()`` to get the value as a Python number:

In [64]:
x = torch.randn(1)
print(x)
print(x.numpy())
print(x.item())

tensor([-0.5040])
[-0.5040146]
-0.5040146112442017
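
One detail worth knowing about `.numpy()`: for CPU tensors, the NumPy array and the tensor share the same memory, so changing one changes the other. The same holds for `torch.from_numpy` in the other direction. A minimal sketch:

```python
import numpy as np
import torch

a = torch.ones(3)
b = a.numpy()  # NumPy array backed by the same memory as a
a.add_(1)      # in-place change to the tensor...
print(b)       # ...is visible through the array

c = np.zeros(2)
d = torch.from_numpy(c)  # tensor backed by the same memory as c
c += 5
print(d)
```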


### CUDA Tensors



In [65]:
torch.cuda.is_available()

True

Tensors can be moved onto any device using the ``.to`` method.

In [0]:
if torch.cuda.is_available():
  device = torch.device("cuda")
else:
  device = torch.device("cpu")

In [67]:
device

device(type='cuda')

In [68]:
x.device

device(type='cpu')

In [0]:
y = torch.ones_like(x, device=device)

In [70]:
y.device

device(type='cuda', index=0)

What happens if we add `x` (on the CPU) and `y` (on the GPU)?

In [72]:
x + y

RuntimeError: ignored

In [73]:
x = x.to(device)
z = x + y
print(z)
print(z.to("cpu"))

tensor([0.4960], device='cuda:0')
tensor([0.4960])
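
A common way to avoid device mismatches is to pick the device once and create or move every tensor with it. The sketch below runs unchanged with or without a GPU.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.randn(3, device=device)  # created directly on the chosen device
b = torch.ones(3).to(device)       # created on the CPU, then moved
c = a + b                          # both operands live on the same device

result = c.cpu()  # bring the result back, e.g. before converting to NumPy
print(c.device, result.device)
```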


In [75]:
!nvidia-smi

Wed Jul 10 10:11:01 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   54C    P0    29W /  70W |    771MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
+-------

# Autograd: Automatic Differentiation

The distinguishing characteristic of PyTorch when it was originally released was that it provided automatic differentiation on tensors (these days, we have other cool features like TorchScript; but back then, this was it!)

#### Automatic differentiation is the machinery that's responsible for training a neural network, but how does it work?

Neural networks use backpropagation to calculate gradients.  
An optimization method then uses these gradients to update the network's parameters.

Central to all neural networks in PyTorch is the `autograd` package.  
The `autograd` package provides automatic differentiation for all operations on Tensors. 

If you set a tensor's `.requires_grad` attribute to `True`, it starts to track all operations on it. When you finish your computation you can call ``.backward()`` and have all the gradients computed automatically.

In [76]:
x = torch.tensor([[1, 2], [3, 4.]], requires_grad=True)
x

tensor([[1., 2.],
        [3., 4.]], requires_grad=True)

In [79]:
y = x + torch.tensor(2) # this can also be y = x + 2
y

tensor([[3., 4.],
        [5., 6.]], grad_fn=<AddBackward0>)

In [0]:
y.requires_grad

True

Why does `y` have `requires_grad=True`?  
`y` has two inputs, `x` and `torch.tensor(2)`.  
If at least one input to an operation requires gradient, its output will also require gradient. 

Changing whether a tensor has `requires_grad` set is especially useful when you want to freeze part of your model, for example when you want to train only the later layers of a network.
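
Freezing can be sketched with two plain tensors standing in for the weights of an early and a later layer. The names `frozen`, `trainable` and `inp` are illustrative only; a real model would hold its weights in layer objects.

```python
import torch

frozen = torch.tensor([1.0, 2.0], requires_grad=False)    # stands in for an early layer
trainable = torch.tensor([3.0, 4.0], requires_grad=True)  # stands in for a later layer
inp = torch.tensor([1.0, 1.0])

out = ((inp * frozen) * trainable).sum()
out.backward()

print(frozen.grad)     # None: nothing was tracked for the frozen tensor
print(trainable.grad)  # d out / d trainable = inp * frozen
```

For a tensor that already exists, `tensor.requires_grad_(False)` switches tracking off in place.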

In [80]:
print(y.grad_fn)

<AddBackward0 object at 0x7f6a083ab198>


In [81]:
z = y * y * 3
z

tensor([[ 27.,  48.],
        [ 75., 108.]], grad_fn=<MulBackward0>)

In [82]:
out = z.mean()
out

tensor(64.5000, grad_fn=<MeanBackward0>)

In [0]:
out.backward()

## The Autograd graph we created:

<p align="center">
  <img src="https://drive.google.com/uc?export=view&id=1P-b6SlPNuyTE-4H3Uoa_hkHjXn_JPADa" width="75%" height="75%"/>
</p>

## How is the gradient computed:

In [84]:
x.grad

tensor([[4.5000, 6.0000],
        [7.5000, 9.0000]])

Let’s call the ``out`` *Tensor* “$o$”.  
We want to find the gradient of $o$ with respect to the leaves $x_i$.  
$o =\frac{1}{4}\sum_j 3(x_j+2)^2$.  

Therefore,
$\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)$, hence, with $x = (1, 2, 3, 4)$,  
$\frac{\partial o}{\partial x_1} = 4.5$  
$\frac{\partial o}{\partial x_2} = 6$  
$\frac{\partial o}{\partial x_3} = 7.5$  
$\frac{\partial o}{\partial x_4} = 9$
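
The hand-derived formula can be compared against what autograd computes. This sketch rebuilds the same computation from scratch so it can run on its own:

```python
import torch

x = torch.tensor([[1, 2], [3, 4.]], requires_grad=True)
out = ((x + 2) ** 2 * 3).mean()  # o = 1/4 * sum_j 3 * (x_j + 2)^2
out.backward()

# The formula derived above: d o / d x_i = 3/2 * (x_i + 2)
manual = 1.5 * (x.detach() + 2)

print(x.grad)
print(manual)
print(torch.allclose(x.grad, manual))
```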

In [0]:
y.grad #Why?
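
One way to explore this question: by default autograd only keeps `.grad` for leaf tensors (tensors you created yourself with `requires_grad=True`), and `y` is the result of an operation. If you do want the gradient of an intermediate tensor, you can ask for it with `retain_grad()` before calling `backward()`. A minimal sketch that rebuilds the computation:

```python
import torch

x = torch.tensor([[1, 2], [3, 4.]], requires_grad=True)
y = x + 2
y.retain_grad()  # keep the gradient of this non-leaf tensor
out = (y * y * 3).mean()
out.backward()

print(x.is_leaf, y.is_leaf)
print(y.grad)  # d o / d y_i = 3/2 * y_i
```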

Given all of this, let's go ahead and write our first Neural model.