<img src="images/pytorch-logo.jpg" width="400" align="center">

# Welcome to PyTorch!

## Motivation
---

<img src="images/andrej_karpathy_on_pytorch.png" width="400" align="center">

## Setup
---

Follow the official installation guide to install PyTorch locally:
- https://pytorch.org/get-started/locally/

Alternatively, use a Google Colab notebook:
- https://colab.research.google.com


### Local installation and versions
---

In [1]:
import torch
torch.__version__

'1.4.0'

In [2]:
import torchvision
torchvision.__version__

'0.5.0'

In [3]:
import numpy as np
np.__version__

'1.18.1'

### Using PyTorch + GPU/TPU in Google's Colab
---

Colaboratory is a Google research project created to help disseminate machine learning education and research. It's a Jupyter notebook environment that requires no setup to use and runs entirely in the cloud.

Colaboratory notebooks are stored in Google Drive and can be shared just as you would with Google Docs or Sheets. Colaboratory is free to use.
 -- https://colab.research.google.com/notebooks/welcome.ipynb

**Setup**
- Go to https://colab.research.google.com
- Create a new Python 3 notebook
- Enable the GPU: "Edit -> Notebook settings -> Hardware accelerator: GPU -> Save"
- Then try the following:

In [4]:
import torch

print(torch.__version__)

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(DEVICE)

1.4.0
cpu


## So, what is PyTorch exactly?
---

"PyTorch - From Research To Production

An open source deep learning platform that provides a seamless path from research prototyping to production deployment."

<img src="images/dynamic_graph.gif" width="400" align="center">

## Understanding Tensors for Deep Learning
---

The inputs, outputs, and transformations within neural networks are all represented using **tensors**, and as a result, neural network programming utilises tensors heavily.

> A tensor is the primary data structure used by neural networks.

The concept of a tensor is a mathematical generalisation of more specific concepts: scalars, vectors and matrices. Let’s look at some specific instances of those concepts to develop the intuition for tensor generalisation.
 

<img src="images/scalar-vector-matrix-tensor.jpeg" width="400" align="center">

In [17]:
from __future__ import print_function
import numpy as np

In [32]:
# scalar
s = 2
print("scalar: \n s =", s)

scalar: 
 s = 2


In [31]:
# vector
v = [1, 2, 3, 4]
print("vector: \n v =", np.array(v))
## accessing first element
print(v[0])

vector: 
 v = [1 2 3 4]
1


In [33]:
# matrix
m = [[1, 4, 5, 12], [-5, 8, 9, 0],[-6, 7, 11, 19]]
print("matrix: \n m =", np.array(m))
## accessing last element of the first list
print(m[0][3])

matrix: 
 m = [[ 1  4  5 12]
 [-5  8  9  0]
 [-6  7 11 19]]
12


In [40]:
# tensor (nd array)
t = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[10, 11, 12], [13, 14, 15], [16, 17, 18]], [[19, 20, 21], [22, 23, 24], [25, 26, 27]]]
print("tensor: \n t =", np.array(t))
## accessing an element
print(t[2][0][2])

tensor: 
 t = [[[ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]]

 [[10 11 12]
  [13 14 15]
  [16 17 18]]

 [[19 20 21]
  [22 23 24]
  [25 26 27]]]
21


**Indexes required to access an element**

Scalars, vectors and matrices are all special cases of the n-dimensional array (tensor). What distinguishes them is the number of indexes required to refer to a specific element within the data structure.


> data structure => indexes required

> * scalar => 0
> * vector => 1
> * matrix => 2
> * ndarray => n
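
As a quick check of the table above, NumPy's `ndim` attribute reports how many indexes an array needs. This is only a minimal sketch; the variable names (`scalar`, `vector`, `matrix`, `cube`) are illustrative and not part of the original notebook.

```python
import numpy as np

# Illustrative arrays: ndim is the number of indexes needed to reach one element
scalar = np.array(2)
vector = np.array([1, 2, 3, 4])
matrix = np.array([[1, 4, 5, 12], [-5, 8, 9, 0], [-6, 7, 11, 19]])
cube = np.arange(1, 28).reshape(3, 3, 3)  # same values as the 3x3x3 example above

for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", cube)]:
    print(name, "=> indexes required:", arr.ndim)

# One index per axis selects a single element
print(cube[2, 0, 2])
```

The last line uses one index per axis, the same element that `t[2][0][2]` selects in the nested-list version.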

## Tensor in PyTorch
---

In [73]:
import torch

**uninitialised**

In [86]:
# scalar/1 element
x = torch.empty(1)
print("scalar: ", x)
# vector
x = torch.empty(3)
print("vector: ", x)
# matrix
x = torch.empty(3, 5)
print("matrix: ", x)
# tensor
x = torch.empty(3, 5, 3)
print("tensor: ", x)


scalar:  tensor([0.])
vector:  tensor([7.0065e-45, 0.0000e+00, 2.3694e-38])
matrix:  tensor([[ 1.1210e-44,  0.0000e+00,  0.0000e+00,  1.8367e-40,  3.9427e+04],
        [-2.0005e+00,  3.9440e+04, -1.5849e+29,  4.2039e-45,  0.0000e+00],
        [ 0.0000e+00,  1.4013e-45,  3.9440e+04, -2.0005e+00,  3.9427e+04]])
tensor:  tensor([[[ 0.0000e+00, -8.5899e+09,  5.4839e+04],
         [ 8.5920e+09,  2.2421e-44,  0.0000e+00],
         [ 0.0000e+00,  0.0000e+00,  0.0000e+00],
         [ 0.0000e+00,  0.0000e+00,  0.0000e+00],
         [ 0.0000e+00,  0.0000e+00,  0.0000e+00]],

        [[ 0.0000e+00,  2.8217e+32,  4.5879e-41],
         [ 2.8217e+32,  4.5879e-41,  2.8217e+32],
         [ 4.5879e-41,  4.9045e-44,  0.0000e+00],
         [ 7.0065e-45,  0.0000e+00,  0.0000e+00],
         [ 0.0000e+00,  0.0000e+00,  0.0000e+00]],

        [[ 0.0000e+00,  7.3755e-40,  2.8217e+32],
         [ 4.5879e-41,  2.8217e+32,  4.5879e-41],
         [ 2.8217e+32,  4.5879e-41,  1.4013e-45],
         [ 0.0000e+00,  1.

**filled with zeros**

In [89]:
# scalar/1 element
x = torch.zeros(1, dtype=torch.int)
print("scalar: ", x)
# vector
x = torch.zeros(3, dtype=torch.int)
print("vector: ", x)
# matrix
x = torch.zeros(3, 5, dtype=torch.int)
print("matrix: ", x)
# tensor
x = torch.zeros(3, 5, 3, dtype=torch.int)
print("tensor: ", x)

scalar:  tensor([0], dtype=torch.int32)
vector:  tensor([0, 0, 0], dtype=torch.int32)
matrix:  tensor([[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]], dtype=torch.int32)
tensor:  tensor([[[0, 0, 0],
         [0, 0, 0],
         [0, 0, 0],
         [0, 0, 0],
         [0, 0, 0]],

        [[0, 0, 0],
         [0, 0, 0],
         [0, 0, 0],
         [0, 0, 0],
         [0, 0, 0]],

        [[0, 0, 0],
         [0, 0, 0],
         [0, 0, 0],
         [0, 0, 0],
         [0, 0, 0]]], dtype=torch.int32)


**randomly initialised and of double data type**

In [96]:
# scalar/1 element
x = torch.rand(1, dtype=torch.double)
print("scalar: ", x)
# vector
x = torch.rand(3, dtype=torch.double)
print("vector: ", x)
# matrix
x = torch.rand(3, 5, dtype=torch.double)
print("matrix: ", x)
# tensor
x = torch.rand(3, 5, 3, dtype=torch.double)
print("tensor: ", x)

scalar:  tensor([0.3055], dtype=torch.float64)
vector:  tensor([0.2178, 0.4272, 0.5825], dtype=torch.float64)
matrix:  tensor([[0.9853, 0.1155, 0.0419, 0.1673, 0.9414],
        [0.5901, 0.8637, 0.1335, 0.5174, 0.4313],
        [0.5631, 0.1608, 0.8215, 0.2788, 0.6948]], dtype=torch.float64)
tensor:  tensor([[[0.7289, 0.3684, 0.0513],
         [0.0770, 0.0869, 0.9821],
         [0.4691, 0.0084, 0.4949],
         [0.0963, 0.7806, 0.6972],
         [0.9902, 0.9085, 0.5838]],

        [[0.8757, 0.3817, 0.1414],
         [0.3053, 0.9752, 0.3242],
         [0.6669, 0.4480, 0.4202],
         [0.8847, 0.0659, 0.8775],
         [0.7223, 0.6830, 0.1865]],

        [[0.8608, 0.6474, 0.1776],
         [0.1161, 0.5399, 0.1326],
         [0.9023, 0.2389, 0.9970],
         [0.4701, 0.8738, 0.8882],
         [0.3988, 0.3489, 0.0029]]], dtype=torch.float64)


Get its size:

In [99]:
print(x.size())

torch.Size([3, 5, 3])
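
The sketch below suggests how the returned `torch.Size` can be used in practice. It assumes a freshly created tensor with the same shape as `x` above; the names `depth`, `rows` and `cols` are illustrative.

```python
import torch

x = torch.rand(3, 5, 3, dtype=torch.double)

# size() and the shape attribute report the same information
print(x.size() == x.shape)

# torch.Size is a subclass of tuple, so it can be unpacked like one
depth, rows, cols = x.shape
print(depth, rows, cols)

# Passing an axis number to size() returns the length of that axis only
print(x.size(0))
```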


## Rank, Axis and Shape of a tensor
---

### Rank of a tensor

The rank of a tensor refers to the number of dimensions present within the tensor. Suppose we are told that we have a rank-2 tensor. This means all of the following:

* We have a matrix
* We have a 2d-array
* We have a 2d-tensor

> The rank of a tensor tells us how many indexes are required to access (refer to) a specific data element contained within the tensor data structure.

### Axes of a tensor

If we have a tensor, and we want to refer to a specific dimension, we use the word axis in deep learning.

> An axis of a tensor is a specific dimension of a tensor.

If we say that a tensor is a rank 2 tensor, we mean that the tensor has 2 dimensions, or equivalently, the tensor has two axes.

> Elements are said to exist or run along an axis. 

This *running* is constrained by the length of each axis. The length of each axis tells us how many indexes are available along each axis.

Let's look at the length of an axis now. Suppose we have a tensor called *t*, and it's defined as follows:

In [42]:
t = [
[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]
]
print(np.array(t))

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In [110]:
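# t is still a nested Python list here, so build a tensor from it first
# (t itself stays a list for the indexing examples below)
t_tensor = torch.tensor(t)

# size/shape
shape = np.array(t_tensor.size())
print("shape: ", shape)

# rank
rank = len(t_tensor.shape)
print("rank: ", rank)

# number of elements
numel = t_tensor.numel()
print("number of elements: ", numel)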

shape:  [3 4]
rank:  2
number of elements:  12


**We can see that the first axis has a length of three while the second axis has a length of four.**

In [53]:
print("t[0][0]: ", t[0][0])
print("t[0][1]: ", t[0][1])
print("t[0][2]: ", t[0][2])
print("t[0][3]: ", t[0][3])

print("t[1][0]: ", t[1][0])
print("t[1][1]: ", t[1][1])
print("t[1][2]: ", t[1][2])
print("t[1][3]: ", t[1][3])

print("t[2][0]: ", t[2][0])
print("t[2][1]: ", t[2][1])
print("t[2][2]: ", t[2][2])
print("t[2][3]: ", t[2][3])

t[0][0]:  1
t[0][1]:  2
t[0][2]:  3
t[0][3]:  4
t[1][0]:  5
t[1][1]:  6
t[1][2]:  7
t[1][3]:  8
t[2][0]:  9
t[2][1]:  10
t[2][2]:  11
t[2][3]:  12


**Each element along the first axis is an array:**

In [49]:
print("t[0]: ", t[0])
print("t[1]: ", t[1])
print("t[2]: ", t[2])

t[0]:  [1, 2, 3, 4]
t[1]:  [5, 6, 7, 8]
t[2]:  [9, 10, 11, 12]


**Each element along the second axis is a scalar:**

In [50]:
print("t[0][0]: ", t[0][0])
print("t[0][1]: ", t[0][1])
print("t[0][2]: ", t[0][2])
print("t[0][3]: ", t[0][3])

t[0][0]:  1
t[0][1]:  2
t[0][2]:  3
t[0][3]:  4


Note that, with tensors, the elements of the last axis are always scalars. Indexing along any earlier axis returns an array of lower rank. This is what we see in this example, and the idea generalises to tensors of any rank.
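
To see how this generalises, the sketch below indexes a rank-3 tensor one axis at a time. The tensor `t3` and its shape `(2, 3, 4)` are illustrative assumptions, not part of the original notebook.

```python
import torch

# Illustrative rank-3 tensor with shape (2, 3, 4)
t3 = torch.arange(24).reshape(2, 3, 4)

print(t3.dim())           # rank of the whole tensor
print(t3[0].shape)        # one index: a rank-2 tensor remains
print(t3[0][0].shape)     # two indexes: a rank-1 tensor remains
print(t3[0][0][0].shape)  # three indexes: an empty shape, i.e. a scalar
print(t3[0][0][0].item()) # the scalar as a plain Python number
```

Each index removes one axis, so only indexing along the last axis yields scalars.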



### Shape of a tensor

The shape of a tensor is determined by the length of each axis, so if we know the shape of a given tensor, then we know the length of each axis, and this tells us how many indexes are available along each axis.


Let’s consider the same tensor t as before. To work with this tensor's shape, we’ll create a `torch.Tensor` object like so:

In [61]:
t = torch.tensor(t)
type(t)
print(t)
t.shape


tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])


torch.Size([3, 4])

The shape of a tensor is important for a few reasons. The first is that the shape allows us to think about, or even visualise, a tensor conceptually. Higher-rank tensors become more abstract, and the shape gives us something concrete to reason about.

Additionally, one of the types of operations we must perform frequently when we are programming our neural networks is called **reshaping**.



## Tensor operations
---

We have the following high-level categories of operations:

* Element-wise operations
* Reshaping operations
* Reduction operations
* Access operations


### Element-wise operations 

In [101]:
x = torch.rand(2, 2)
y = torch.rand(2, 2)
print("x: ", x)
print("y: ", y)
# addition
a = torch.add(x, y)
print("addition: ", a)
# subtraction
s = torch.sub(x, y)
print("subtraction: ", s)
# multiplication
m = torch.mul(x, y)
print("multiplication: ", m)
# division
d = torch.div(x, y)
print("division: ", d)

x:  tensor([[0.9346, 0.3776],
        [0.7256, 0.0598]])
y:  tensor([[0.0562, 0.9949],
        [0.3113, 0.4844]])
addition:  tensor([[0.9908, 1.3725],
        [1.0369, 0.5442]])
subtraction:  tensor([[ 0.8784, -0.6173],
        [ 0.4143, -0.4245]])
multiplication:  tensor([[0.0525, 0.3757],
        [0.2259, 0.0290]])
division:  tensor([[16.6254,  0.3795],
        [ 2.3308,  0.1235]])
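
The same element-wise operations are also available through Python's arithmetic operators. The sketch below uses small fixed tensors (illustrative values, chosen so the results are easy to check by hand) to compare both spellings.

```python
import torch

# Fixed, illustrative values instead of random ones
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.tensor([[10.0, 20.0], [30.0, 40.0]])

# Each operator gives the same result as the corresponding torch function
print(torch.equal(x + y, torch.add(x, y)))
print(torch.equal(x - y, torch.sub(x, y)))
print(torch.equal(x * y, torch.mul(x, y)))
print(torch.equal(x / y, torch.div(x, y)))

# Note that * multiplies matching positions; it is not a matrix product
product = x * y
print(product)
```

Element-wise operations pair up elements at the same position, so the result has the same shape as the operands.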


### Reshaping operations

In [72]:
# tensor reshaping example
print("\n t.reshape(1,12): ", t.reshape(1,12))
print("\n t.reshape(2,6): ", t.reshape(2,6))
print("\n t.reshape(3,4): ", t.reshape(3,4))
print("\n t.reshape(4,3): ", t.reshape(4,3))


 t.reshape(1,12):  tensor([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]])

 t.reshape(2,6):  tensor([[ 1,  2,  3,  4,  5,  6],
        [ 7,  8,  9, 10, 11, 12]])

 t.reshape(3,4):  tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12]])

 t.reshape(4,3):  tensor([[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9],
        [10, 11, 12]])


One thing to notice about reshaping is that the product of the axis lengths in the new shape must equal the total number of elements in the tensor. Here `t` has 3 × 4 = 12 elements, so (1, 12), (2, 6), (3, 4) and (4, 3) are all valid shapes.

This guarantees that the reshaped tensor has exactly enough positions to contain all of the original data elements.

> Reshaping changes the shape but not the underlying data elements.
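
The element-count rule can be checked directly. This is a minimal sketch that rebuilds the same 3 x 4 tensor; the list of candidate shapes and the names `inferred` and `failed` are illustrative.

```python
import math
import torch

t = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Every valid target shape multiplies out to t.numel() == 12
for shape in [(1, 12), (2, 6), (4, 3), (2, 2, 3)]:
    print(shape, math.prod(shape) == t.numel(), t.reshape(shape).shape)

# -1 asks PyTorch to infer one axis length from the element count
inferred = t.reshape(-1, 6)
print(inferred.shape)

# A shape whose product is not 12 is rejected with a RuntimeError
failed = False
try:
    t.reshape(5, 3)
except RuntimeError as err:
    failed = True
    print("reshape failed:", err)
```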

#### Changing shape by `squeezing` and `unsqueezing` a tensor

* Squeezing a tensor removes the dimensions or axes that have a length of one.
* Unsqueezing a tensor adds a dimension with a length of one.

These operations are used to expand or shrink the rank (number of dimensions) of a tensor.

In [111]:
print(t.reshape([1,12]))

tensor([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]])


In [112]:
print(t.reshape([1,12]).shape)

torch.Size([1, 12])


In [114]:
print(t.reshape([1,12]).squeeze())

tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])


In [115]:
print(t.reshape([1,12]).squeeze().shape)

torch.Size([12])


In [117]:
print(t.reshape([1,12]).squeeze().unsqueeze(dim=0))

tensor([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]])


In [119]:
print(t.reshape([1,12]).squeeze().unsqueeze(dim=0).shape)

torch.Size([1, 12])
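
Both operations also accept a `dim` argument that controls which axis is affected. The sketch below uses illustrative tensors (`v`, `padded`) to show the resulting shapes.

```python
import torch

v = torch.arange(1, 13)          # shape (12,)

# unsqueeze inserts a length-one axis at the requested position
row = v.unsqueeze(dim=0)         # shape (1, 12)
col = v.unsqueeze(dim=1)         # shape (12, 1)
print(row.shape, col.shape)

# squeeze() with no argument removes every length-one axis
padded = torch.zeros(1, 3, 1, 4)
print(padded.squeeze().shape)        # (3, 4)

# squeeze(dim) touches only that axis, and only if its length is one
print(padded.squeeze(dim=0).shape)   # (3, 1, 4)
print(padded.squeeze(dim=1).shape)   # unchanged: axis 1 has length 3
```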


Let’s look at a common use case for squeezing a tensor by building a flatten function.


In [120]:
def flatten(t):
    t = t.reshape(1, -1)
    t = t.squeeze()
    return t

In [121]:
flatten(t)

tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

We'll see that flatten operations are required when passing an output tensor from a convolutional layer to a linear layer.
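
As a rough preview of that use case, the sketch below flattens a hypothetical convolutional output while keeping the batch axis. The shape `(8, 16, 5, 5)` and the layer sizes are assumptions made for illustration only.

```python
import torch

# Hypothetical conv-layer output: batch of 8, 16 channels, 5x5 feature maps
conv_out = torch.rand(8, 16, 5, 5)

# Keep the batch axis and merge the remaining axes into one
flat = conv_out.reshape(conv_out.size(0), -1)
print(flat.shape)  # (8, 400)

# torch.flatten with start_dim=1 produces the same result
same = torch.flatten(conv_out, start_dim=1)
print(torch.equal(flat, same))

# A linear layer then expects 16 * 5 * 5 = 400 input features per sample
linear = torch.nn.Linear(in_features=16 * 5 * 5, out_features=10)
out = linear(flat)
print(out.shape)  # (8, 10)
```

Unlike the `flatten` function above, which collapses every axis into one, this version leaves the first (batch) axis intact so that each sample stays separate.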

#### Concatenating tensors

In [126]:
t1 = torch.tensor([
    [1,2],
    [3,4]
])
print(t1)

tensor([[1, 2],
        [3, 4]])


In [127]:
t2 = torch.tensor([
    [5,6],
    [7,8]
])
print(t2)

tensor([[5, 6],
        [7, 8]])


In [124]:
# We can combine t1 and t2 row-wise (axis-0) in the following way:
torch.cat((t1, t2), dim=0)

tensor([[1, 2],
        [3, 4],
        [5, 6],
        [7, 8]])

In [125]:
# We can combine them column-wise (axis-1) like this:
torch.cat((t1, t2), dim=1)

tensor([[1, 2, 5, 6],
        [3, 4, 7, 8]])
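
Concatenation grows the chosen axis and requires the other axes to match. The sketch below repeats `t1` and `t2` and adds a hypothetical `t3` with a different number of rows to show both cases.

```python
import torch

t1 = torch.tensor([[1, 2], [3, 4]])
t2 = torch.tensor([[5, 6], [7, 8]])
t3 = torch.tensor([[9, 10]])  # illustrative tensor with shape (1, 2)

# Only the concatenation axis changes length
print(torch.cat((t1, t2), dim=0).shape)  # (4, 2)
print(torch.cat((t1, t2), dim=1).shape)  # (2, 4)

# Different row counts are fine along axis 0 ...
print(torch.cat((t1, t3), dim=0).shape)  # (3, 2)

# ... but not along axis 1, where the row counts must agree
cat_failed = False
try:
    torch.cat((t1, t3), dim=1)
except RuntimeError as err:
    cat_failed = True
    print("cat failed:", err)
```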

### Torch <-> NumPy

In [131]:
a = torch.rand(5)
print(a)

tensor([0.6033, 0.6615, 0.2307, 0.4885, 0.4076])


In [132]:
b = a.numpy()
print(b)

[0.60334    0.66145104 0.23070425 0.4885323  0.40763372]


In [133]:
a.add_(1)
print(a)
print(b)

tensor([1.6033, 1.6615, 1.2307, 1.4885, 1.4076])
[1.60334   1.6614511 1.2307043 1.4885323 1.4076338]


In [134]:
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
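
The outputs above show that an in-place change on one side is visible on the other, because the tensor and the array share the same memory. The sketch below contrasts this with `torch.tensor`, which copies the data; the names `shared` and `copied` are illustrative.

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)   # shares memory with the NumPy array
copied = torch.tensor(a)       # makes an independent copy

a += 1                         # in-place update of the NumPy array

print(shared)  # follows the change
print(copied)  # keeps the original values
```

When a tensor must not be affected by later changes to the array, copying is the safer choice.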


### CUDA Tensors

Tensors can be moved onto any device using the `.to` method.

In [135]:
# let us run this cell only if CUDA is available
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    device = torch.device("cuda")          # a CUDA device object
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!
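
A common variation is to pick the device once and fall back to the CPU, so the same cell runs with or without a GPU. This is a minimal sketch of that pattern, using a fresh illustrative `x` rather than the one defined earlier.

```python
import torch

# Pick the GPU when one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(2, 2)
y = torch.ones_like(x, device=device)  # created directly on the chosen device
x = x.to(device)                       # moved to the chosen device
z = x + y                              # both operands live on the same device

print(z.device)
z_cpu = z.to("cpu", torch.double)      # move back and change dtype in one call
print(z_cpu.dtype)
```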

## References
---
- Twitter: https://twitter.com/PyTorch
- Forum: https://discuss.pytorch.org/
- Tutorials: https://pytorch.org/tutorials/
- Examples: https://github.com/pytorch/examples
- API Reference: https://pytorch.org/docs/stable/index.html
- Torchvision: https://pytorch.org/docs/stable/torchvision/index.html
- PyTorch Text: https://github.com/pytorch/text
- PyTorch Audio: https://github.com/pytorch/audio
- AllenNLP: https://allennlp.org/
- Object detection/segmentation: https://github.com/facebookresearch/maskrcnn-benchmark
- Facebook AI Research Sequence-to-Sequence Toolkit written in PyTorch: https://github.com/pytorch/fairseq
- FastAI: http://www.fast.ai/
- Stanford CS230 Deep Learning notes: https://cs230-stanford.github.io