# PyTorch - Beginners Tutorial 1

## References and other resources
- [PyTorch Tutorials](https://pytorch.org/tutorials/)
- [Torchvision](https://pytorch.org/docs/stable/torchvision/index.html)

## Alternatives

- [Tensorflow](https://www.tensorflow.org/)
- [Keras](https://keras.io/)
- [Theano](http://deeplearning.net/software/theano/)
- [Caffe](http://caffe.berkeleyvision.org/)
- [Caffe2](https://caffe2.ai/)
- [MXNet](https://mxnet.apache.org/)

## So why PyTorch?

- Simple Python
- Easy to use + debug
- Supported/developed by Facebook
- Nice and extensible interface (modules, etc.)
- A lot of research code is published as PyTorch projects

____

# Import the Libraries `numpy` and `torch`

In [1]:
import torch
import numpy as np

### Check your PyTorch version

In [2]:
print("PyTorch Version:", torch.__version__)

PyTorch Version: 1.8.1


### NOTE: 
#### Although this is a PyTorch tutorial course, we'll also compare and contrast the library with another popular numerical library, NumPy

# Tensors

## Definition of a `Tensor`

A **matrix** is a grid of numbers, say of size 3 x 5. In simple terms, a **tensor** can be seen as a generalization of a matrix to higher dimensions. It can have an arbitrary shape, for example [3 x 2 x 4 x 5].

For now, just think of tensors as multidimensional arrays.
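To make the shape example concrete, here is a small sketch (not part of the original cells) that builds a tensor of shape [3 x 2 x 4 x 5] with `torch.zeros` and inspects it with `.dim()`, `.shape` and `.numel()`:

```python
import torch

# A 4-dimensional tensor with the example shape [3 x 2 x 4 x 5], filled with zeros
T = torch.zeros(3, 2, 4, 5)

print(T.dim())    # number of dimensions: 4
print(T.shape)    # torch.Size([3, 2, 4, 5])
print(T.numel())  # total number of elements: 3 * 2 * 4 * 5 = 120
```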

### Creating our first torch Tensor

In [3]:
X = torch.tensor([1, 2, 3, 4, 5])

In [4]:
print(f'X is represented as;\n {X} \n\nX is of type:\n {type(X)}')

X is represented as;
 tensor([1, 2, 3, 4, 5]) 

X is of type:
 <class 'torch.Tensor'>


Use the `.shape` attribute to access the dimensions of the tensor:

In [5]:
X.shape

torch.Size([5])

#### This indicates a 1-dimensional tensor (an array) of length 5

#### Wrapping the list in an extra pair of square brackets, however, turns it into a 2-dimensional tensor (a 1 x 5 matrix):

In [6]:
X = torch.tensor([[1, 2, 3, 4, 5]])

In [7]:
X

tensor([[1, 2, 3, 4, 5]])

#### The printout looks almost identical to the previous case, but the two objects are distinct, as we can confirm by calling `.shape` on the new X variable:

In [9]:
X.shape

torch.Size([1, 5])

## Syntax for constructing tensors

In [10]:
X = torch.tensor([[1, 2, 3], [4, 5, 6]])

In [11]:
X

tensor([[1, 2, 3],
        [4, 5, 6]])

To create an n x m tensor, we pass a list of `n lists` (the rows), each containing `m elements` (the columns).

In [12]:
X.shape

torch.Size([2, 3])
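The same pattern extends to more dimensions: each extra level of nesting adds one. A short sketch, using made-up values, that also shows that ragged nested lists (rows of different lengths) are rejected with a `ValueError`:

```python
import torch

# 2 "matrices", each with 2 rows of 3 elements -> shape (2, 2, 3)
T = torch.tensor([[[1, 2, 3], [4, 5, 6]],
                  [[7, 8, 9], [10, 11, 12]]])
print(T.shape)  # torch.Size([2, 2, 3])

# All rows must have the same length; ragged nested lists cannot form a tensor
try:
    torch.tensor([[1, 2, 3], [4, 5]])
except ValueError as err:
    print("ValueError:", err)
```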

# `Numpy arrays` and `Torch tensors`

### The `Identity` matrix of dimension n:
(using numpy) Generate a 3x3 Identity:

In [13]:
np_eye = np.eye(3)

In [14]:
np_eye

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

(using torch) Generate a 3x3 Identity:

In [15]:
torch_eye = torch.eye(3)

In [16]:
torch_eye

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])

### Scalar multiplied by a tensor
(using numpy) 

In [17]:
5*np_eye

array([[5., 0., 0.],
       [0., 5., 0.],
       [0., 0., 5.]])

### Scalar multiplied by a tensor
(using torch)

In [18]:
5*torch_eye

tensor([[5., 0., 0.],
        [0., 5., 0.],
        [0., 0., 5.]])

### The `null` (zero) vector of length n:
(using numpy) Generate a zero vector of length n, i.e. a 1-dimensional array of shape `(n,)`:

In [19]:
np_null = np.zeros(5)

In [20]:
np_null

array([0., 0., 0., 0., 0.])

### The `null` (zero) vector of length n:
(using torch) Generate a zero vector of length n, i.e. a 1-dimensional tensor of shape `(n,)`:

In [21]:
torch_null = torch.zeros(5)

## Using `.empty` 
Return an n x m Matrix without initializing entries. 

`empty`, unlike `zeros`, does not set the array values to zero, and may therefore be marginally faster. On the other hand, it requires the user to manually set all the values in the array, and should be used with caution.


(using numpy)

In [22]:
np_empty = np.empty((3, 5))

In [23]:
np_empty

array([[0.        , 0.        , 0.4472136 , 0.0531494 , 0.18257419],
       [0.4472136 , 0.2125976 , 0.36514837, 0.4472136 , 0.4783446 ],
       [0.54772256, 0.4472136 , 0.85039041, 0.73029674, 0.4472136 ]])

(using torch)

In [24]:
torch_empty = torch.empty((3, 5))

In [25]:
torch_empty

tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  6.2218e-33],
        [ 1.4013e-45,  6.1452e-33,  1.4013e-45,  6.2237e-33,  1.4013e-45],
        [ 6.2218e-33,  1.4013e-45,  0.0000e+00, -1.5846e+29, -2.1600e-36]])

What if we want an uninitialized array of integers instead? Pass `dtype=int`; the syntax is identical for torch and numpy:

In [26]:
np_empty = np.empty((3, 5), dtype=int)

In [27]:
torch_empty = torch.empty((3, 5), dtype=int)

In [28]:
np_empty

array([[                  0,                   0, 4601727903846100441,
        4587820456862672466, 4595745948572889240],
       [4601727903846100441, 4596827656117413458, 4600249548200259736,
        4601727903846100441, 4602288710223143708],
       [4603108665757041778, 4601727903846100441, 4605834855372154450,
        4604753147827630232, 4601727903846100441]])

In [29]:
torch_empty

tensor([[-1152921504606846976,  4611694808133567514,                   12,
                            0,                  255],
        [                   0,                    0,                    0,
                            0,                  255],
        [                   0,                    0,                    0,
                            0,                  255]])
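Since `empty` leaves the memory uninitialized (as the arbitrary values above show), every entry has to be written before it is read. A minimal sketch of that pattern, using `fill_` and `torch.full`, which the original section does not cover:

```python
import torch

# torch.empty allocates memory without initializing it: the contents are arbitrary
buf = torch.empty((3, 5))

# Before reading from it, write every entry, e.g. with an in-place fill
buf.fill_(7.0)

# torch.full (like torch.zeros) allocates *and* initializes in one step
assert torch.equal(buf, torch.full((3, 5), 7.0))

# Another safe pattern: allocate with empty, then overwrite every entry explicitly
out = torch.empty(3)
for i in range(3):
    out[i] = i * 2
print(out)  # tensor([0., 2., 4.])
```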

## Create a `Random Matrix` with dimension n x m

In [30]:
Y = torch.rand((4, 7))

In [31]:
Y

tensor([[0.9845, 0.3775, 0.2989, 0.7676, 0.7381, 0.9413, 0.0617],
        [0.8585, 0.6364, 0.8901, 0.7282, 0.0857, 0.3206, 0.9156],
        [0.3841, 0.1259, 0.8554, 0.4420, 0.4261, 0.0206, 0.3840],
        [0.7511, 0.4116, 0.8079, 0.5052, 0.8308, 0.3390, 0.4749]])

In [32]:
Y.shape

torch.Size([4, 7])
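Because the values come from a random number generator, they differ on every run. A small sketch, assuming you want reproducible results, that fixes the seed with `torch.manual_seed` and contrasts `torch.rand` (uniform on [0, 1)) with `torch.randn` (standard normal):

```python
import torch

torch.manual_seed(0)
a = torch.rand(4, 7)

torch.manual_seed(0)  # resetting the seed replays the same random sequence
b = torch.rand(4, 7)

print(torch.equal(a, b))                 # True
print(bool(((a >= 0) & (a < 1)).all()))  # True: every entry lies in [0, 1)

n = torch.randn(4, 7)  # normally distributed entries (can be negative)
print(n.shape)         # torch.Size([4, 7])
```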

___

### So far we've seen that torch tensors and numpy arrays behave essentially identically

## So the question is then:

## <span style="color:red">Why do we even need tensors if we can do exactly the same with numpy arrays?</span>

As we've seen, torch tensors behave much like numpy arrays under mathematical operations. The difference is that torch tensors can keep track of gradients (for automatic differentiation) and can be moved to, and computed on, a GPU.
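A minimal illustration of both points, with an arbitrary function y = x^2 + 2x chosen for the example: `requires_grad=True` makes PyTorch record the operations so `backward()` can compute the derivative, and `device=` places a tensor on the GPU when one is available (falling back to the CPU otherwise):

```python
import torch

# requires_grad=True asks PyTorch to record operations on x for differentiation
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x  # y = x^2 + 2x

y.backward()        # computes dy/dx and stores it in x.grad
print(x.grad)       # tensor(8.), since dy/dx = 2x + 2 = 8 at x = 3

# The same kind of tensor can live on a GPU when one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
z = torch.ones(3, device=device)
print(z.device)
```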

____

## Matrix Operations

In [33]:
X = np.random.rand(2, 7)
X

array([[0.72490666, 0.81159863, 0.61953472, 0.10798165, 0.01962622,
        0.91124195, 0.87604696],
       [0.08312025, 0.98766394, 0.12046817, 0.52930693, 0.65505334,
        0.7253217 , 0.59148155]])

In [34]:
Y = torch.rand(2, 7)
Y

tensor([[0.2215, 0.0823, 0.3400, 0.4802, 0.8337, 0.1834, 0.5305],
        [0.9835, 0.9047, 0.9644, 0.1993, 0.4791, 0.2204, 0.4537]])

## Matrix Multiplication 
### NOTE: The `*` operator performs element-wise multiplication in both torch and numpy; use `@` for matrix multiplication

### Matrix Multiplication 
(numpy way)

In [151]:
output_np = X.T @ X

In [152]:
output_np

array([[0.48297978, 0.40750159, 0.12105153, 0.17923705, 0.5090578 ,
        0.20715088, 0.63431275],
       [0.40750159, 0.88837881, 0.16441592, 0.2692404 , 0.48118361,
        0.34516588, 0.62503161],
       [0.12105153, 0.16441592, 0.03746296, 0.05842039, 0.13349821,
        0.07140662, 0.1692567 ],
       [0.17923705, 0.2692404 , 0.05842039, 0.09209133, 0.20011443,
        0.11380051, 0.2548688 ],
       [0.5090578 , 0.48118361, 0.13349821, 0.20011443, 0.54144831,
        0.23450578, 0.67708841],
       [0.20715088, 0.34516588, 0.07140662, 0.11380051, 0.23450578,
        0.1421601 , 0.30017007],
       [0.63431275, 0.62503161, 0.1692567 , 0.2548688 , 0.67708841,
        0.30017007, 0.84788695]])

In [153]:
output_np.shape

(7, 7)

### Matrix Multiplication 
(torch way)

Y.t() * Y  # raises RuntimeError: shapes (7, 2) and (2, 7) cannot be broadcast for element-wise multiplication

In [154]:
output_torch = Y.t() @ Y

In [155]:
output_torch

tensor([[0.3906, 0.0746, 0.5750, 0.4044, 0.1973, 0.6432, 0.5547],
        [0.0746, 0.0271, 0.0918, 0.1561, 0.0807, 0.1714, 0.1262],
        [0.5750, 0.0918, 0.8721, 0.4842, 0.2299, 0.8785, 0.7881],
        [0.4044, 0.1561, 0.4842, 0.9023, 0.4681, 0.9636, 0.6984],
        [0.1973, 0.0807, 0.2299, 0.4681, 0.2436, 0.4873, 0.3479],
        [0.6432, 0.1714, 0.8785, 0.9636, 0.4873, 1.2424, 0.9898],
        [0.5547, 0.1262, 0.7881, 0.6984, 0.3479, 0.9898, 0.8196]])

In [156]:
output_torch.shape

torch.Size([7, 7])
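To see the difference between `*` and `@` on small, hand-checkable inputs (the 2x2 matrices here are arbitrary examples):

```python
import torch

A = torch.tensor([[1., 2.], [3., 4.]])
B = torch.tensor([[5., 6.], [7., 8.]])

# '*' multiplies matching entries (element-wise / Hadamard product)
print(A * B)  # values [[5., 12.], [21., 32.]]

# '@' is the matrix product: entry (i, j) is row i of A dotted with column j of B
print(A @ B)  # values [[19., 22.], [43., 50.]]

# '@' only needs the inner dimensions to agree: (7, 2) @ (2, 7) -> (7, 7)
C = torch.rand(7, 2)
D = torch.rand(2, 7)
print((C @ D).shape)  # torch.Size([7, 7])
```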

# Using `matmul` 

<h2>We use <code>matmul</code> to compute the matrix product of two tensors.</h2>

<b>NOTE: The behavior depends on the dimensionality of the tensors as follows:</b>
<ul>
<li>If both tensors are 1-dimensional, the dot product (scalar) is returned.</li>

<li>If both arguments are 2-dimensional, the matrix-matrix product is returned.</li>

<li>If the first argument is 1-dimensional and the second argument is 2-dimensional, a 1 is prepended to its dimension for the purpose of the matrix multiply. After the matrix multiply, the prepended dimension is removed.</li>

<li>If the first argument is 2-dimensional and the second argument is 1-dimensional, the matrix-vector product is returned.</li>

<li>If both arguments are at least 1-dimensional and at least one argument is N-dimensional (where N > 2), then a batched matrix multiply is returned. If the first argument is 1-dimensional, a 1 is prepended to its dimension for the purpose of the batched matrix multiply and removed after. If the second argument is 1-dimensional, a 1 is appended to its dimension for the purpose of the batched matrix multiply and removed after. The non-matrix (i.e. batch) dimensions are broadcast (and thus must be broadcastable). For example, if input is a (j×1×n×n) tensor and other is a (k×n×n) tensor, out will be a (j×k×n×n) tensor.<br>
Note that the broadcasting logic only looks at the batch dimensions when determining if the inputs are broadcastable, and not the matrix dimensions. For example, if input is a (j×1×n×m) tensor and other is a (k×m×p) tensor, these inputs are valid for broadcasting even though the final two dimensions (i.e. the matrix dimensions) are different. out will be a (j×k×n×p) tensor.
</ul>
This operator supports TensorFloat32.

<b>NOTE: The 1-dimensional dot product version of this function does not support an <code>out</code> parameter.</b>

In [157]:
Y.t().matmul(Y)

tensor([[0.3906, 0.0746, 0.5750, 0.4044, 0.1973, 0.6432, 0.5547],
        [0.0746, 0.0271, 0.0918, 0.1561, 0.0807, 0.1714, 0.1262],
        [0.5750, 0.0918, 0.8721, 0.4842, 0.2299, 0.8785, 0.7881],
        [0.4044, 0.1561, 0.4842, 0.9023, 0.4681, 0.9636, 0.6984],
        [0.1973, 0.0807, 0.2299, 0.4681, 0.2436, 0.4873, 0.3479],
        [0.6432, 0.1714, 0.8785, 0.9636, 0.4873, 1.2424, 0.9898],
        [0.5547, 0.1262, 0.7881, 0.6984, 0.3479, 0.9898, 0.8196]])
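The shape rules listed above can be checked directly. A sketch with random tensors whose sizes (including j, k, n, m, p) are arbitrary choices for illustration:

```python
import torch

v = torch.rand(3)
w = torch.rand(3)
M = torch.rand(2, 3)
N = torch.rand(3, 4)

print(torch.matmul(v, w).shape)  # 1-D @ 1-D: dot product, a 0-dim tensor -> torch.Size([])
print(torch.matmul(M, N).shape)  # 2-D @ 2-D: matrix product -> torch.Size([2, 4])
print(torch.matmul(w, N).shape)  # 1-D @ 2-D: a 1 is prepended, then removed -> torch.Size([4])
print(torch.matmul(M, v).shape)  # 2-D @ 1-D: matrix-vector product -> torch.Size([2])

# Batched: batch dimensions (j, 1) and (k,) broadcast to (j, k);
# the matrix parts (n x m) @ (m x p) give (n x p)
j, k, n, m, p = 5, 6, 2, 3, 4
a = torch.rand(j, 1, n, m)
b = torch.rand(k, m, p)
print(torch.matmul(a, b).shape)  # torch.Size([5, 6, 2, 4])
```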

## Finding the inverse

(numpy way)

In [158]:
np.linalg.inv(X.T @ X)

array([[-4.46681249e+16, -2.38135441e+16, -8.13627957e+16,
         9.32921300e+16, -1.47873186e+16,  2.31599634e+16,
         4.27794271e+16],
       [-1.85382964e+16, -7.42976274e+15, -3.60830457e+16,
         3.49899205e+16, -4.90011335e+15,  4.51229940e+15,
         1.83464613e+16],
       [-0.00000000e+00, -1.51161781e+15, -0.00000000e+00,
        -0.00000000e+00, -3.60287970e+16, -0.00000000e+00,
         2.98854577e+16],
       [ 6.30503948e+16,  2.41206137e+16,  1.08086391e+17,
        -7.20575940e+16,  2.70215978e+16, -3.60287970e+16,
        -7.36892721e+16],
       [ 2.47697980e+16,  1.41506798e+16,  2.70215978e+16,
        -5.40431955e+16,  6.75539944e+15, -9.00719925e+15,
        -2.03168522e+16],
       [ 4.50359963e+15,  7.87400440e+14,  1.80143985e+16,
        -3.60287970e+16,  4.50359963e+15,  1.80143985e+16,
        -6.68957183e+15],
       [ 6.75539944e+15,  4.76445052e+15,  2.70215978e+16,
        -1.80143985e+16,  6.75539944e+15, -9.00719925e+15,
        -1.0750911

In [159]:
torch.inverse(Y.t() @ Y)

tensor([[-3.9834e+07,  1.5230e+08,  4.4511e+07, -1.6292e+07,  3.9657e+07,
         -2.9735e+07, -6.3358e+06],
        [ 6.8964e+07,  1.3747e+09, -1.4902e+07, -3.8444e+07, -2.2102e+08,
         -1.2524e+08,  3.3822e+07],
        [ 2.2818e+07,  2.0722e+06, -1.0906e+07,  2.5851e+06, -6.1696e+06,
         -2.6806e+06, -1.6207e+06],
        [-1.2449e+07, -2.5013e+07,  9.5530e+06, -2.7622e+06,  2.4217e+07,
         -9.6080e+06,  6.7682e+06],
        [ 1.1587e+07, -1.6137e+08,  2.5322e+06,  1.2371e+07,  1.8453e+07,
          1.7709e+07, -2.5191e+07],
        [ 3.8199e+05, -1.7103e+08, -1.3756e+07, -3.0508e+05,  1.3885e+07,
          2.8792e+07, -1.1027e+06],
        [-3.7347e+05, -2.0379e+07, -9.9441e+06,  1.1930e+07, -3.2111e+07,
          7.8840e+06,  6.8966e+06]])
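A caveat about the two results above: `X` and `Y` have only 2 rows, so `X.T @ X` and `Y.t() @ Y` are 7x7 matrices of rank at most 2. They are singular, so no inverse exists, and the huge entries are floating-point noise rather than a meaningful answer. A sketch that checks the rank with `torch.linalg.matrix_rank` first and inverts a matrix that is invertible (here `Y @ Y.t()`, a 2x2 matrix chosen for illustration) with `torch.linalg.inv`:

```python
import torch

torch.manual_seed(0)
Y = torch.rand(2, 7, dtype=torch.float64)

G_big = Y.t() @ Y    # 7x7, but rank <= 2, so singular
G_small = Y @ Y.t()  # 2x2, invertible when the two rows are linearly independent

print(torch.linalg.matrix_rank(G_big))    # at most 2, far below 7
print(torch.linalg.matrix_rank(G_small))  # 2: full rank

G_inv = torch.linalg.inv(G_small)
# A genuine inverse satisfies G @ G^{-1} = I (up to rounding)
print(torch.allclose(G_small @ G_inv, torch.eye(2, dtype=torch.float64)))  # True
```

Checking that `G @ G_inv` is close to the identity is a cheap sanity test for any computed inverse.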

# Creating Sequential Arrays and Matrices

(numpy way)

In [162]:
np.arange(2, 10, 2)

array([2, 4, 6, 8])

(torch way)

In [163]:
torch.arange(2, 10, 2)

tensor([2, 4, 6, 8])

Using `linspace` to generate N equally spaced values on a given interval:

In [168]:
np.linspace(0, 1, 10)

array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])

In [169]:
torch.linspace(0, 1, 10)

tensor([0.0000, 0.1111, 0.2222, 0.3333, 0.4444, 0.5556, 0.6667, 0.7778, 0.8889,
        1.0000])
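One difference worth noting: `arange` takes a step size and excludes the stop value, while `linspace` takes a number of points and includes both endpoints. A small sketch with values chosen to be exact in floating point:

```python
import torch

# arange(start, stop, step): the stop value is excluded
steps = torch.arange(0, 1, 0.25)
print(steps)   # values 0.0, 0.25, 0.5, 0.75 (1.0 is excluded)

# linspace(start, stop, number_of_points): both endpoints are included
points = torch.linspace(0, 1, 5)
print(points)  # values 0.0, 0.25, 0.5, 0.75, 1.0
```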

## More on PyTorch Tensors

Most operations are available both as a function in the `torch` namespace and as a method on the tensor itself.

In [None]:
X = torch.rand(3, 2)

In [None]:
torch.exp(X)

In [None]:
X.exp()

In [None]:
X.sqrt()

In [None]:
(X.exp() + 2).sqrt() - 2 * X.log().sigmoid()  # be creative :-)

Many more functions are available: sin, cos, tanh, log, etc.
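A quick check, on arbitrary random input, that the function form and the method form give the same result:

```python
import torch

X = torch.rand(3, 2)

# The function form and the method form compute the same values
assert torch.allclose(torch.exp(X), X.exp())
assert torch.allclose(torch.sqrt(X), X.sqrt())

# A few of the other element-wise functions mentioned above
print(torch.sin(X), X.cos(), X.tanh())
```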

In [None]:
A = torch.eye(3)
A

In [None]:
A.add(5)

In [None]:
A

Methods that modify the tensor in place end with an underscore, e.g. *add_*, *div_*, etc.

In [None]:
A.add_(5)

In [None]:
A

In [None]:
A.div_(3)

In [None]:
A

In [None]:
A.uniform_()  # fills the tensor in place with random numbers drawn uniformly from [0, 1)

In [None]:
A
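A compact sketch of the difference between the out-of-place `add` and the in-place `add_`; `data_ptr()` (the address of the tensor's first element) is used here only to show that `add_` reuses the same memory:

```python
import torch

A = torch.eye(3)

B = A.add(5)        # out-of-place: returns a new tensor, A is unchanged
print(A[0, 0].item(), B[0, 0].item())  # 1.0 6.0

ptr_before = A.data_ptr()
A.add_(5)           # in-place: modifies A's own storage (and returns A)
print(A[0, 0].item())                  # 6.0
print(A.data_ptr() == ptr_before)      # True: no new memory was allocated
```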

## Indexing

Again, it works just like in numpy.

In [None]:
A = torch.randint(100, (3, 3))
A

In [None]:
A[0, 0]

In [None]:
A[2, 1]

In [None]:
A[1]

In [None]:
A[:, 1]

In [None]:
A[1:2, :], A[1:2, :].shape

In [None]:
A[1:, 1:]

In [None]:
A[:2, :2]
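Two details that often trip people up: an integer index drops a dimension while a slice keeps it, and basic indexing returns a view that shares memory with the original tensor. A sketch using `torch.arange` so the values are predictable:

```python
import torch

A = torch.arange(9).view(3, 3)  # rows [0, 1, 2], [3, 4, 5], [6, 7, 8]

print(A[1].shape)      # integer index drops the dimension: torch.Size([3])
print(A[1:2].shape)    # a slice keeps it: torch.Size([1, 3])
print(A[-1])           # negative indices count from the end: last row, values 6, 7, 8
print(A[:, -1])        # last column, values 2, 5, 8

# Indexing returns a view: writing through it changes A
row = A[0]
row[0] = 100
print(A[0, 0].item())  # 100
```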

_____

## Reshaping & Expanding

In [None]:
X = torch.tensor([1, 2, 3, 4])
X

In [None]:
X = X.repeat(3, 1) # repeat 3 times along dim 0 and once along dim 1 -> shape (3, 4)
X, X.shape

In [None]:
# similar to numpy's reshape; view returns a tensor that shares memory with X (no copy is made)
Y = X.view(2, 6)
Y

In [None]:
Y = X.view(-1)  # -1 tells PyTorch to infer the number of elements along that dimension
Y, Y.shape

In [None]:
Y = X.view(-1, 2)
Y, Y.shape

In [None]:
Y = X.view(-1, 4)
Y, Y.shape

In [None]:
Y = torch.ones(5)
Y, Y.shape

In [None]:
Y = Y.view(-1, 1)
Y, Y.shape

In [None]:
Y.expand(5, 5)  # similar to repeat but does not actually allocate new memory

In [None]:
X = torch.eye(4)
Y = X[3:, :]
Y, Y.shape

In [None]:
Y = Y.squeeze() # removes all dimensions of size '1'
Y, Y.shape

In [None]:
Y = Y.unsqueeze(1)  # inserts a new dimension of size 1 at position 1
Y, Y.shape
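One caveat about `view`: it only works when the requested shape is compatible with the tensor's memory layout. A sketch, with a small example tensor, showing that it fails on a transposed (non-contiguous) tensor, while `reshape` (or `.contiguous()` followed by `view`) succeeds by copying:

```python
import torch

X = torch.arange(6).view(2, 3)  # rows [0, 1, 2] and [3, 4, 5]
T = X.t()                       # transpose: shape (3, 2), shares memory, not contiguous

print(T.is_contiguous())        # False

# view needs a compatible memory layout, so it fails on the transposed tensor
try:
    T.view(6)
except RuntimeError as err:
    print("view failed:", err)

# reshape falls back to copying when a view is impossible
print(T.reshape(6))             # values 0, 3, 1, 4, 2, 5
print(T.contiguous().view(6))   # same result: make a contiguous copy first, then view
```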

## Reductions

In [None]:
X = torch.randint(10, (3, 4)).float()
X

In [None]:
X.sum()

In [None]:
X.sum().item()

In [None]:
X.sum(0) # column-wise sum (reduces over dim 0)

In [None]:
X.sum(dim=1)  # row-wise sum

In [None]:
X.mean()

In [None]:
X.mean(dim=1)

In [None]:
X.norm(dim=0)
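Reductions drop the reduced dimension by default. Passing `keepdim=True` keeps it with size 1, which lets the result broadcast back against the original tensor. A sketch, with a small hand-picked matrix, that centers each row:

```python
import torch

X = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])

row_means = X.mean(dim=1)                     # shape (2,): the reduced dim is dropped
row_means_kept = X.mean(dim=1, keepdim=True)  # shape (2, 1): the reduced dim is kept with size 1

# The (2, 1) result broadcasts against the (2, 3) matrix, centering each row
centered = X - row_means_kept
print(centered)  # values [[-1., 0., 1.], [-1., 0., 1.]]
print(row_means.shape, row_means_kept.shape)
```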

## Your turn!

Compute the norms of the row-vectors in matrix **X** without using _torch.norm()_.

Remember: $$\|\vec{v}\|_2 = \sqrt{v_1^2 + v_2^2 + \dots + v_n^2}$$

Hint: _X\*\*2_ computes the element-wise square.

In [170]:
X = torch.eye(4) + torch.arange(4).repeat(4, 1).float()

In [173]:
# ANS:?

## Masking

In [None]:
X = torch.randint(100, (5, 3))
X

In [None]:
mask = (X > 25) & (X < 75)
mask

In [None]:
X[mask]  # returns all elements matching the criteria in a 1D-tensor

In [None]:
mask.sum()  # number of elements that fulfill the condition

In [None]:
(X == 25) | (X > 60)
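Masks can also appear on the left-hand side of an assignment, and `torch.where` selects between two tensors element by element. A sketch with fixed values (chosen so the output is easy to verify) rather than random ones:

```python
import torch

X = torch.tensor([[10, 30, 80],
                  [50, 20, 70]])
mask = (X > 25) & (X < 75)

# Overwrite only the selected entries (on a copy, so X is left unchanged)
Y = X.clone()
Y[mask] = 0
print(Y)  # values [[10, 0, 80], [0, 20, 0]]

# torch.where(cond, a, b) takes a where cond is True and b elsewhere
Z = torch.where(mask, X, torch.full_like(X, -1))
print(Z)  # values [[-1, 30, -1], [50, -1, 70]]
```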

## Your turn!

Get the number of non-zeros in **X**

In [None]:
X = torch.tensor([[1, 0, 2], [0, 6, 0]])
# YOUR TURN

Compute the sum of all entries in X that are larger than the mean of all values in X.

In [None]:
# YOUR TURN

______

## Some useful properties of tensors

In [None]:
x = torch.Tensor([[0,1,2], [3,4,5]])

print("x.shape: \n%s\n" % (x.shape,))
print("x.size(): \n%s\n" % (x.size(),))
print("x.size(1): \n%s\n" % x.size(1))
print("x.dim(): \n%s\n" % x.dim())

print("x.dtype: \n%s\n" % x.dtype)
print("x.device: \n%s\n" % x.device)
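Note that the cells above use the legacy constructor `torch.Tensor`, which produces the default floating-point type, whereas the `torch.tensor` function used earlier infers the dtype from the data. A quick comparison:

```python
import torch

a = torch.Tensor([[0, 1, 2], [3, 4, 5]])  # legacy constructor: default float dtype (float32)
b = torch.tensor([[0, 1, 2], [3, 4, 5]])  # factory function: infers int64 from Python ints

print(a.dtype, b.dtype)  # torch.float32 torch.int64
```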

The `nonzero` function returns the indices of the non-zero elements.

In [None]:
x = torch.Tensor([[0,1,2], [3,4,5]])

print("x.nonzero(): \n%s\n" % x.nonzero())

In [None]:
# press tab to autocomplete
# x.

___

## Converting between PyTorch and numpy

In [None]:
X = np.random.random((5,3))
X

In [None]:
# numpy ---> torch
Y = torch.from_numpy(X)  # Y is actually a DoubleTensor (i.e. 64-bit representation)
Y

In [None]:
Y = torch.rand((2,4))
Y

In [None]:
# torch ---> numpy
X = Y.numpy()
X
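Keep in mind that these conversions do not copy: `torch.from_numpy` and `.numpy()` share memory with their source, so changing one changes the other (this holds for CPU tensors; `.numpy()` does not work on GPU tensors directly). `torch.tensor(...)`, by contrast, always copies. A small sketch:

```python
import numpy as np
import torch

a = np.zeros(3)
t = torch.from_numpy(a)   # t shares memory with a (no copy)
a[0] = 5.0
print(t)                  # the change is visible through t: values 5., 0., 0.

u = torch.ones(2)
n = u.numpy()             # n shares memory with u (CPU tensors only)
u.add_(1)
print(n)                  # [2. 2.]

c = torch.tensor(a)       # torch.tensor always copies
a[1] = 7.0
print(c[1].item())        # 0.0: the copy is unaffected
```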

____

## Using GPUs 

Using a **GPU** in PyTorch is as simple as calling **`.cuda()`** on your tensor, which returns a copy of the tensor in GPU memory.

But first, you may want to check:
 - that CUDA can actually be used: `torch.cuda.is_available()`
 - how many GPUs are available: `torch.cuda.device_count()`

In [None]:
torch.cuda.is_available()

In [None]:
torch.cuda.device_count()

In [None]:
x = torch.Tensor([[1,2,3], [4,5,6]])
print(x)

### tensor.cuda

_Note: if you don't have CUDA on your machine, the following examples won't work (and `x.cuda(1)` also requires a second GPU)._

In [None]:
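x.cuda(0)          # returns a copy on GPU 0; x itself is unchanged
print(x.device)    # still cpu
x = x.cuda(0)      # reassign to keep the GPU copy
print(x.device)    # cuda:0
x = x.cuda(1)      # requires a second GPU
print(x.device)    # cuda:1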

In [None]:
x = torch.Tensor([[1,2,3], [4,5,6]])

# This raises an error because both operands of an operation must be on the same device
x + x.cuda()

#### Write an if statement that moves x to the GPU if CUDA is available

In [None]:
# YOUR TURN

These kinds of if statements used to be all over people's PyTorch code. A more flexible approach was later introduced:

### torch.device

A **`torch.device`** is an object representing the device on which a `torch.Tensor` is or will be allocated.

You can easily move a tensor from one device to another with the **`tensor.to()`** method.

In [None]:
cpu = torch.device('cpu')
cuda_0 = torch.device('cuda:0')

x = x.to(cpu)
print(x.device)
x = x.to(cuda_0)
print(x.device)

This is more flexible because you only need to check whether CUDA is available once in your code:

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = x.to(device)  # We don't need to care anymore about whether cuda is available or not
print(x.device)

#### Timing GPU

How much faster is the GPU? See for yourself...

In [None]:
A = torch.rand(100, 1000, 1000)
B = A.cuda()  # copy to the default GPU; use A.cuda(1) to target a second GPU
A.size()

In [None]:
%timeit -n 3 torch.bmm(A, A)

In [None]:
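# CUDA kernels run asynchronously: synchronize so the timing includes the actual computation
%timeit -n 30 torch.bmm(B, B); torch.cuda.synchronize()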
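CUDA kernels are launched asynchronously, so a timer can stop before the GPU has finished. A minimal timing sketch, assuming a helper named `time_bmm` (hypothetical, written for this example) that warms up once and calls `torch.cuda.synchronize()` before reading the clock; it uses a smaller tensor than above and only times the GPU when one is present:

```python
import time
import torch

def time_bmm(x: torch.Tensor, repeats: int = 10) -> float:
    """Hypothetical helper: average seconds per torch.bmm(x, x) call on x's device."""
    torch.bmm(x, x)  # warm-up call (the first CUDA call pays one-off setup costs)
    if x.is_cuda:
        torch.cuda.synchronize()  # wait for queued GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        torch.bmm(x, x)
    if x.is_cuda:
        torch.cuda.synchronize()  # wait until the GPU has actually finished
    return (time.perf_counter() - start) / repeats

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
A = torch.rand(10, 200, 200)  # smaller than the example above so it also runs quickly on CPU

cpu_time = time_bmm(A)
print("cpu:", cpu_time)
if device.type == "cuda":
    print("gpu:", time_bmm(A.to(device)))
```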