![](https://discuss.pytorch.org/uploads/default/original/2X/3/35226d9fbc661ced1c5d17e374638389178c3176.png)

<!--NAVIGATION-->
# | Basics | [Autograd](2-Autograd.ipynb) >

## References and other resources
- [PyTorch Tutorials](https://pytorch.org/tutorials/)
- [Torchvision](https://pytorch.org/docs/stable/torchvision/index.html)

## Alternatives

- [TensorFlow](https://www.tensorflow.org/)
- [Keras](https://keras.io/)
- [Theano](http://deeplearning.net/software/theano/)
- [Caffe](http://caffe.berkeleyvision.org/)
- [Caffe2](https://caffe2.ai/)
- [MXNet](https://mxnet.apache.org/)
- [many more...](https://www.google.com/search?q=deep+learning+frameworks&oq=deep+learning+frame&aqs=chrome.0.0j69i57j69i61l2j0l2.2284j0j1&sourceid=chrome&ie=UTF-8)

## So why PyTorch?

- Simple Python
- Easy to use + debug
- Supported/developed by Facebook
- Nice and extensible interface (modules, etc.)
- A lot of research code is published as PyTorch projects

____

## Google Colab only!

In [3]:
# execute only if you're using Google Colab
# !wget -q https://raw.githubusercontent.com/ahug/amld-pytorch-workshop/master/binder/requirements.txt -O requirements.txt
# !pip install -qr requirements.txt

___

In [4]:
import torch

In [5]:
print("PyTorch Version:", torch.__version__)

PyTorch Version: 1.10.1


In [6]:
import numpy as np

PyTorch's tensor API is very similar to numpy's (if that helps!)

## Tensor Creation 

## First of all, what is a tensor?

A **matrix** is a grid of numbers, let's say (3 x 5). In simple terms, a **tensor** can be seen as a generalization of a matrix to higher dimensions. It can be of arbitrary shape, e.g. (3 x 6 x 2 x 10).

To start with, you can think of tensors as multidimensional arrays.
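
As a minimal sketch of this idea, the snippet below builds tensors with 0, 1, 2 and 4 dimensions and inspects their shapes. The variable names are illustrative only; the 4-dimensional shape is the one mentioned above.

```python
import torch

scalar = torch.tensor(3.0)            # 0 dimensions: a single number
vector = torch.tensor([1.0, 2.0])     # 1 dimension
matrix = torch.zeros(3, 5)            # 2 dimensions: the (3 x 5) grid
tensor4d = torch.zeros(3, 6, 2, 10)   # 4 dimensions

for t in (scalar, vector, matrix, tensor4d):
    # number of dimensions, shape, total number of elements
    print(t.dim(), tuple(t.shape), t.numel())
```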

In [7]:
X = torch.tensor([1, 2, 3, 4, 5])
X

tensor([1, 2, 3, 4, 5])

In [8]:
X.shape

torch.Size([5])

In [9]:
X = torch.tensor([[1, 2, 3], [4, 5, 6]])
X

tensor([[1, 2, 3],
        [4, 5, 6]])

In [10]:
X.shape

torch.Size([2, 3])

In [11]:
# numpy
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [12]:
# torch
torch.eye(3)

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])

In [13]:
# numpy
5 * np.eye(3)

array([[5., 0., 0.],
       [0., 5., 0.],
       [0., 0., 5.]])

In [14]:
# torch
5 * torch.eye(3)

tensor([[5., 0., 0.],
        [0., 5., 0.],
        [0., 0., 5.]])

In [15]:
# numpy
np.ones(5)

array([1., 1., 1., 1., 1.])

In [16]:
# torch
torch.ones(5)

tensor([1., 1., 1., 1., 1.])

In [17]:
# numpy
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [18]:
# torch
torch.zeros(5)

tensor([0., 0., 0., 0., 0.])

In [19]:
# numpy
np.empty((3, 5))

array([[4.68460263e-310, 0.00000000e+000, 0.00000000e+000,
        0.00000000e+000, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
        0.00000000e+000, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
        0.00000000e+000, 0.00000000e+000]])

In [20]:
# torch
torch.empty((3, 5))

tensor([[2.0888e+25, 3.0935e-41, 1.7670e+25, 3.0935e-41, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
        [1.4013e-45, 0.0000e+00, 0.0000e+00, 0.0000e+00, 9.1084e-44]])

In [21]:
# numpy
X = np.random.random((5, 3))
X

array([[0.82773173, 0.1466076 , 0.76437319],
       [0.35993801, 0.48111636, 0.00637678],
       [0.16595733, 0.80979487, 0.60137132],
       [0.37395775, 0.56277321, 0.07196787],
       [0.96795967, 0.37908609, 0.15649552]])

In [22]:
# torch
Y = torch.rand((5, 3))
Y

tensor([[0.4855, 0.5216, 0.5906],
        [0.7241, 0.8283, 0.7959],
        [0.9068, 0.1235, 0.1178],
        [0.6039, 0.6437, 0.0838],
        [0.4621, 0.9000, 0.2131]])

In [23]:
# numpy
X.shape

(5, 3)

In [24]:
# torch
Y.shape

torch.Size([5, 3])

___

## But wait: Why do we even need tensors if we can do exactly the same with numpy arrays?

A `torch.Tensor` behaves like a numpy array under mathematical operations. However, a `torch.Tensor` can additionally keep track of gradients (see next notebook) and provides GPU support.
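
The two extras can be previewed with a small sketch. The function `y = x^2 + 2x` is an arbitrary choice for illustration, and the device line falls back to the CPU when no GPU is present.

```python
import torch

# requires_grad=True asks PyTorch to record the operations applied to x
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x      # y = x^2 + 2x

y.backward()            # compute dy/dx
print(x.grad)           # dy/dx = 2x + 2, evaluated at x = 3

# The same values can live on a GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(x.detach().to(device).device)
```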

____

## Linear Algebra Operations

In [25]:
X = np.random.rand(3, 5)
Y = torch.rand(3, 5)

In [26]:
# numpy (matrix multiplication)
X.T @ X

array([[1.16027753, 1.31713201, 1.00904359, 1.27960924, 0.3789696 ],
       [1.31713201, 2.14310322, 1.85096968, 1.54049549, 0.71894512],
       [1.00904359, 1.85096968, 1.66975961, 1.14587119, 0.68766418],
       [1.27960924, 1.54049549, 1.14587119, 1.58679042, 0.34305865],
       [0.3789696 , 0.71894512, 0.68766418, 0.34305865, 0.33195435]])

In [27]:
Y.shape

torch.Size([3, 5])

In [28]:
# torch (matrix multiplication)
Y.t() @ Y

tensor([[0.5175, 0.4669, 0.5942, 0.7398, 0.6965],
        [0.4669, 0.5656, 0.5984, 0.7584, 0.9329],
        [0.5942, 0.5984, 0.8130, 0.9557, 1.0696],
        [0.7398, 0.7584, 0.9557, 1.1582, 1.2768],
        [0.6965, 0.9329, 1.0696, 1.2768, 1.7644]])

In [29]:
Y.t().matmul(Y)

tensor([[0.5175, 0.4669, 0.5942, 0.7398, 0.6965],
        [0.4669, 0.5656, 0.5984, 0.7584, 0.9329],
        [0.5942, 0.5984, 0.8130, 0.9557, 1.0696],
        [0.7398, 0.7584, 0.9557, 1.1582, 1.2768],
        [0.6965, 0.9329, 1.0696, 1.2768, 1.7644]])

In [30]:
# CAUTION: Operator '*' does element-wise multiplication, just like in numpy!
# Y.t() * Y  # error, dimensions do not match for element-wise multiplication
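
To make the difference concrete, here is a small sketch with two hand-written 2 x 2 matrices (the values are arbitrary), followed by the failing element-wise product of a (5 x 3) and a (3 x 5) tensor.

```python
import torch

A = torch.tensor([[1., 2.], [3., 4.]])
B = torch.tensor([[10., 20.], [30., 40.]])

elementwise = A * B   # multiplies matching entries
matmul = A @ B        # rows of A times columns of B

print(elementwise)
print(matmul)

# Shapes (5, 3) and (3, 5) cannot be broadcast together, so '*' fails
Y = torch.rand(3, 5)
try:
    Y.t() * Y
except RuntimeError as err:
    print("element-wise product failed:", type(err).__name__)
```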

In [31]:
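# Caution: X.T @ X is (5 x 5) with rank at most 3, hence singular; the huge entries below are numerical noise
np.linalg.inv(X.T @ X)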

array([[ 4.42403197e+15, -2.69904948e+15,  6.69105341e+15,
        -3.80428466e+15, -9.13441916e+15],
       [-4.74781993e+14,  5.19615283e+15, -5.00560071e+15,
        -1.25289773e+15,  9.52446999e+14],
       [ 4.74738413e+15, -7.18384741e+15,  1.09267421e+16,
        -2.63073200e+15, -9.77771801e+15],
       [-4.55734471e+15,  1.11921284e+15, -5.44107249e+15,
         4.48133650e+15,  9.41910205e+15],
       [-9.14704366e+15,  5.55265465e+15, -1.38099586e+16,
         7.87509448e+15,  1.88863079e+16]])

In [32]:
torch.inverse(Y.t() @ Y)

tensor([[ 11553690.0000,  -3508588.5000,    657969.0000, -10899183.0000,
           4783271.0000],
        [ -3118191.7500,  -3814708.0000,  -6578392.0000,   9599846.0000,
            288729.8750],
        [  1182758.1250,  -6759991.5000,  -8536887.0000,   7834619.5000,
           2613137.2500],
        [-11445084.0000,  10133906.0000,   8298590.5000,   1486307.0000,
          -6947202.0000],
        [  4653757.5000,    166433.7500,   2388496.2500,  -6599020.0000,
           1402619.0000]])
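
The very large entries in both inverses are a warning sign: a (5 x 5) product built from a (3 x 5) matrix has rank at most 3 and is therefore singular. The sketch below checks this by counting non-negligible singular values. The helper `numerical_rank` and its tolerance are assumptions made for this illustration, not part of PyTorch.

```python
import torch

torch.manual_seed(0)
Y = torch.rand(3, 5, dtype=torch.float64)

gram_5x5 = Y.t() @ Y   # (5 x 5), but built from only 3 rows
gram_3x3 = Y @ Y.t()   # (3 x 3)

def numerical_rank(M, rel_tol=1e-8):
    # count singular values that are not negligibly small
    s = torch.linalg.svdvals(M)
    return int((s > rel_tol * s.max()).sum())

print(numerical_rank(gram_5x5))
print(numerical_rank(gram_3x3))

# The (3 x 3) product is invertible, so inverse times matrix gives the identity
inv_3x3 = torch.linalg.inv(gram_3x3)
identity = torch.eye(3, dtype=torch.float64)
print(torch.allclose(gram_3x3 @ inv_3x3, identity, atol=1e-6))
```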

In [33]:
np.arange(2, 10, 2)

array([2, 4, 6, 8])

In [34]:
torch.arange(2, 10, 2)

tensor([2, 4, 6, 8])

In [35]:
np.linspace(0, 1, 10)

array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])

In [36]:
torch.linspace(0, 1, 10)

tensor([0.0000, 0.1111, 0.2222, 0.3333, 0.4444, 0.5556, 0.6667, 0.7778, 0.8889,
        1.0000])

## Your turn

**_Create the tensor:_**

$ \begin{bmatrix}
5 & 7 & 9 & 11 & 13 & 15 & 17 & 19
\end{bmatrix}  $

In [37]:
# YOUR TURN

## More on PyTorch Tensors

Most operations are available both as a tensor method and as a function in the `torch` namespace.

In [38]:
X = torch.rand(3, 2)

In [39]:
torch.exp(X)

tensor([[1.0063, 1.9142],
        [1.9751, 2.0566],
        [1.5168, 1.7168]])

In [40]:
X.exp()

tensor([[1.0063, 1.9142],
        [1.9751, 2.0566],
        [1.5168, 1.7168]])

In [41]:
X.sqrt()

tensor([[0.0792, 0.8058],
        [0.8250, 0.8492],
        [0.6454, 0.7352]])

In [42]:
(X.exp() + 2).sqrt() - 2 * X.log().sigmoid()  # be creative :-)

tensor([[1.7214, 1.1911],
        [1.1838, 1.1762],
        [1.2872, 1.2262]])

Many more functions available: sin, cos, tanh, log, etc.

In [43]:
A = torch.eye(3)
A

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])

In [44]:
A.add(5)

tensor([[6., 5., 5.],
        [5., 6., 5.],
        [5., 5., 6.]])

In [45]:
A

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])

Methods that modify the tensor in place end with an underscore, e.g. *add_*, *div_*, etc.

In [46]:
A.add_(5)

tensor([[6., 5., 5.],
        [5., 6., 5.],
        [5., 5., 6.]])

In [47]:
A

tensor([[6., 5., 5.],
        [5., 6., 5.],
        [5., 5., 6.]])

In [48]:
A.div_(3)

tensor([[2.0000, 1.6667, 1.6667],
        [1.6667, 2.0000, 1.6667],
        [1.6667, 1.6667, 2.0000]])

In [49]:
A

tensor([[2.0000, 1.6667, 1.6667],
        [1.6667, 2.0000, 1.6667],
        [1.6667, 1.6667, 2.0000]])

In [50]:
A.uniform_()  # fills the tensor with random uniform numbers in [0, 1)

tensor([[0.5130, 0.1512, 0.6868],
        [0.0958, 0.7495, 0.8564],
        [0.2072, 0.1197, 0.2117]])

In [51]:
A

tensor([[0.5130, 0.1512, 0.6868],
        [0.0958, 0.7495, 0.8564],
        [0.2072, 0.1197, 0.2117]])

## Indexing

Again, it works just like in numpy.

In [52]:
A = torch.randint(100, (3, 3))
A

tensor([[64, 39, 30],
        [87, 78, 14],
        [24, 97, 13]])

In [53]:
A[0, 0]

tensor(64)

In [54]:
A[2, 1]

tensor(97)

In [55]:
A[1]

tensor([87, 78, 14])

In [56]:
A[:, 1]

tensor([39, 78, 97])

In [57]:
A[1:2, :], A[1:2, :].shape

(tensor([[87, 78, 14]]), torch.Size([1, 3]))

In [58]:
A[1:, 1:]

tensor([[78, 14],
        [97, 13]])

In [59]:
A[:2, :2]

tensor([[64, 39],
        [87, 78]])

_____

## Reshaping & Expanding

In [60]:
X = torch.tensor([1, 2, 3, 4])
X

tensor([1, 2, 3, 4])

In [61]:
X = X.repeat(3, 1)  # repeat it 3 times along dimension 0 and once along dimension 1
X, X.shape

(tensor([[1, 2, 3, 4],
         [1, 2, 3, 4],
         [1, 2, 3, 4]]),
 torch.Size([3, 4]))

In [62]:
X = torch.tensor([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])
X, X.shape

(tensor([[ 1,  2,  3,  4],
         [ 5,  6,  7,  8],
         [ 9, 10, 11, 12]]),
 torch.Size([3, 4]))

In [63]:
# equivalent of 'reshape' in numpy (view does not allocate new memory!)
Y = X.view(2, 6)
Y

tensor([[ 1,  2,  3,  4,  5,  6],
        [ 7,  8,  9, 10, 11, 12]])
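
Because a view shares memory with the original tensor, writing through one is visible in the other. The sketch below illustrates this, and also shows that `view` refuses to work on a non-contiguous tensor (here a transpose), where `reshape` falls back to copying.

```python
import torch

X = torch.arange(1, 13).view(3, 4)
Y = X.view(2, 6)

Y[0, 0] = 100                         # write through the view ...
print(X[0, 0])                        # ... and the original changes too
print(X.data_ptr() == Y.data_ptr())   # same underlying storage

# view needs a compatible memory layout; a transposed tensor is not contiguous
Xt = X.t()
try:
    Xt.view(-1)
except RuntimeError:
    print("view failed on a non-contiguous tensor")

flat = Xt.reshape(-1)                 # reshape copies when it has to
print(flat.shape)
```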

In [64]:
Y = X.view(-1)  # -1 tells PyTorch to infer the number of elements along that dimension
Y, Y.shape

(tensor([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]), torch.Size([12]))

In [65]:
Y = X.view(-1, 2)
Y, Y.shape

(tensor([[ 1,  2],
         [ 3,  4],
         [ 5,  6],
         [ 7,  8],
         [ 9, 10],
         [11, 12]]),
 torch.Size([6, 2]))

In [66]:
Y = X.view(-1, 4)
Y, Y.shape

(tensor([[ 1,  2,  3,  4],
         [ 5,  6,  7,  8],
         [ 9, 10, 11, 12]]),
 torch.Size([3, 4]))

In [67]:
Y = torch.ones(5)
Y, Y.shape

(tensor([1., 1., 1., 1., 1.]), torch.Size([5]))

In [68]:
Y = Y.view(-1, 1)
Y, Y.shape

(tensor([[1.],
         [1.],
         [1.],
         [1.],
         [1.]]),
 torch.Size([5, 1]))

In [69]:
Y.expand(5, 5)  # similar to repeat but does not actually allocate new memory

tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])
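
One way to see that `expand` does not allocate new memory is to compare strides and data pointers with `repeat`. This is only a sketch; the variable names are illustrative.

```python
import torch

col = torch.ones(5).view(-1, 1)   # shape (5, 1)

expanded = col.expand(5, 5)       # a view onto the same 5 numbers
repeated = col.repeat(1, 5)       # a real copy holding 25 numbers

print(expanded.stride())                       # stride 0 along the expanded dimension
print(repeated.stride())
print(expanded.data_ptr() == col.data_ptr())   # shares memory with col
print(repeated.data_ptr() == col.data_ptr())   # owns its own memory
```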

In [70]:
X = torch.eye(4)
Y = X[3:, :]
Y, Y.shape

(tensor([[0., 0., 0., 1.]]), torch.Size([1, 4]))

In [71]:
Y = Y.squeeze() # removes all dimensions of size '1'
Y, Y.shape

(tensor([0., 0., 0., 1.]), torch.Size([4]))

In [72]:
Y = Y.unsqueeze(1)
Y, Y.shape

(tensor([[0.],
         [0.],
         [0.],
         [1.]]),
 torch.Size([4, 1]))

## Your turn!

**_Create the tensor:_**

$ \begin{bmatrix}
7 & 5 & 5 & 5 & 5 \\
5 & 7 & 5 & 5 & 5 \\
5 & 5 & 7 & 5 & 5 \\
5 & 5 & 5 & 7 & 5 \\
5 & 5 & 5 & 5 & 7 
\end{bmatrix}  $

Hint: You can use matrix sum and scalar multiplication

In [73]:
# YOUR TURN

**_Create the tensor:_**

$ \begin{bmatrix}
4 & 6 & 8 & 10 & 12 \\
14 & 16 & 18 & 20 & 22 \\
24 & 26 & 28 & 30 & 32
\end{bmatrix}$

In [74]:
# YOUR TURN

**_Create the tensor:_**

$ \begin{bmatrix}
2 & 2 & 2 & 2 & 2 \\
4 & 4 & 4 & 4 & 4 \\
6 & 6 & 6 & 6 & 6 \\
8 & 8 & 8 & 8 & 8
\end{bmatrix}  $

In [75]:
# YOUR TURN

_____

## Reductions

In [76]:
X = torch.randint(10, (3, 4)).float()
X

tensor([[7., 4., 7., 2.],
        [5., 1., 7., 4.],
        [6., 4., 1., 0.]])

In [77]:
X.sum()

tensor(48.)

In [78]:
X.sum().item()

48.0

In [79]:
X.sum(0)  # column-wise sum

tensor([18.,  9., 15.,  6.])

In [80]:
X.sum(dim=1)  # row-wise sum

tensor([20., 17., 11.])
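
A way to remember the `dim` argument: it names the dimension that disappears from the shape. The sketch below reuses the values printed above and additionally shows `keepdim=True`, which keeps the reduced dimension with size 1.

```python
import torch

X = torch.tensor([[7., 4., 7., 2.],
                  [5., 1., 7., 4.],
                  [6., 4., 1., 0.]])   # shape (3, 4)

col_sums = X.sum(dim=0)                # dimension 0 is collapsed -> shape (4,)
row_sums = X.sum(dim=1)                # dimension 1 is collapsed -> shape (3,)
kept = X.sum(dim=1, keepdim=True)      # reduced dimension kept   -> shape (3, 1)

print(col_sums, col_sums.shape)
print(row_sums, row_sums.shape)
print(kept.shape)
```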

In [81]:
X.mean()

tensor(4.)

In [82]:
X.mean(dim=1)

tensor([5.0000, 4.2500, 2.7500])

In [83]:
X.norm(dim=0)

tensor([10.4881,  5.7446,  9.9499,  4.4721])

## Your turn!

Compute the norms of the row-vectors in matrix **X** without using _torch.norm()_.

Remember: $$||\vec{v}||_2 = \sqrt{v_1^2 + v_2^2 + \dots + v_n^2}$$

Hint: _X\*\*2_ computes the element-wise square.

In [84]:
X = torch.eye(4) + torch.arange(4).repeat(4, 1).float()

# YOUR TURN

# SOLUTION: tensor([3.8730, 4.1231, 4.3589, 4.5826])

## Masking

In [85]:
X = torch.randint(100, (5, 3))
X

tensor([[13, 14, 44],
        [10, 35,  4],
        [54, 90, 74],
        [82, 84, 71],
        [50, 34, 33]])

In [86]:
mask = (X > 25) & (X < 75)
mask

tensor([[False, False,  True],
        [False,  True, False],
        [ True, False,  True],
        [False, False,  True],
        [ True,  True,  True]])

In [87]:
X[mask]  # returns all elements matching the criteria in a 1D-tensor

tensor([44, 35, 54, 74, 71, 50, 34, 33])

In [88]:
mask.sum()  # number of elements that fulfill the condition

tensor(8)

In [89]:
(X == 25) | (X > 60)

tensor([[False, False, False],
        [False, False, False],
        [False,  True,  True],
        [ True,  True,  True],
        [False, False, False]])
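
Masks can also be used on the left-hand side of an assignment. The following sketch, using the values of **X** printed above, overwrites every selected entry on a copy (the name `Z` is illustrative).

```python
import torch

X = torch.tensor([[13, 14, 44],
                  [10, 35,  4],
                  [54, 90, 74],
                  [82, 84, 71],
                  [50, 34, 33]])

mask = (X > 25) & (X < 75)

Z = X.clone()    # work on a copy so X stays unchanged
Z[mask] = 0      # overwrite every selected entry
print(Z)
```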

## Your turn!

Count the nonzero entries in **X**.

In [90]:
X = torch.tensor([[1, 0, 2], [0, 6, 0]])
# YOUR TURN

Compute the sum of all entries in X that are larger than the mean of all values in X.

In [91]:
# YOUR TURN

______

## Some useful properties of tensors

In [92]:
x = torch.Tensor([[0,1,2], [3,4,5]])

print("x.shape: \n%s\n" % (x.shape,))
print("x.size(): \n%s\n" % (x.size(),))
print("x.size(1): \n%s\n" % x.size(1))
print("x.dim(): \n%s\n" % x.dim())

print("x.dtype: \n%s\n" % x.dtype)
print("x.device: \n%s\n" % x.device)

x.shape: 
torch.Size([2, 3])

x.size(): 
torch.Size([2, 3])

x.size(1): 
3

x.dim(): 
2

x.dtype: 
torch.float32

x.device: 
cpu
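
The `dtype` deserves a closer look, because it depends on how the tensor was created. A short sketch, assuming the default float type has not been changed:

```python
import torch

a = torch.tensor([1, 2, 3])          # integers -> torch.int64
b = torch.tensor([1.0, 2.0, 3.0])    # floats   -> torch.float32 (default float type)
c = torch.Tensor([1, 2, 3])          # torch.Tensor always uses the default float type

print(a.dtype, b.dtype, c.dtype)

# Some reductions need floating-point input
try:
    a.mean()
except RuntimeError:
    print("mean() is not defined for integer tensors")

print(a.float().mean())              # convert first, then reduce
```

This is why some cells above call `.float()` on the result of `torch.randint` before computing a mean.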



The `nonzero` function returns the indices of the nonzero elements.

In [93]:
x = torch.Tensor([[0,1,2], [3,4,5]])

print("x.nonzero(): \n%s\n" % x.nonzero())

x.nonzero(): 
tensor([[0, 1],
        [0, 2],
        [1, 0],
        [1, 1],
        [1, 2]])



In [94]:
# press tab to autocomplete
# x.

___

## Converting between PyTorch and numpy

In [95]:
X = np.random.random((5,3))
X, type(X[0,0])

(array([[0.21101141, 0.70801791, 0.09402548],
        [0.6017359 , 0.02075034, 0.37112562],
        [0.20323175, 0.4549351 , 0.55545013],
        [0.84579597, 0.84134657, 0.80327522],
        [0.77742112, 0.33261911, 0.28625991]]),
 numpy.float64)

In [96]:
# numpy ---> torch
Y = torch.from_numpy(X)  # Y is actually a DoubleTensor (i.e. 64-bit representation)
Y

tensor([[0.2110, 0.7080, 0.0940],
        [0.6017, 0.0208, 0.3711],
        [0.2032, 0.4549, 0.5555],
        [0.8458, 0.8413, 0.8033],
        [0.7774, 0.3326, 0.2863]], dtype=torch.float64)

In [97]:
Y = torch.rand((2,4))
Y

tensor([[0.7975, 0.4272, 0.8645, 0.3960],
        [0.3141, 0.4966, 0.8700, 0.7548]])

In [98]:
# torch ---> numpy
X = Y.numpy()
X

array([[0.79745436, 0.42715847, 0.86452985, 0.3960278 ],
       [0.31407893, 0.4966336 , 0.8699857 , 0.7547784 ]], dtype=float32)
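
Both conversions share memory with the source object instead of copying it, which is easy to overlook. A minimal sketch (variable names are illustrative):

```python
import numpy as np
import torch

arr = np.zeros(3)              # float64 numpy array
t = torch.from_numpy(arr)      # shares memory with arr

arr[0] = 5.0                   # change the numpy array ...
print(t)                       # ... and the tensor sees the change

back = t.numpy()               # also shares memory (CPU tensors only)
t[1] = 7.0
print(back)

t32 = t.float()                # converting to float32 makes a copy
t32[2] = 9.0
print(arr)                     # unchanged by the write to t32
```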

____

## Using GPUs 

Using a **GPU** in PyTorch is as simple as calling **`.cuda()`** on your tensor.

But first, you may want to check:
 - that CUDA can actually be used: `torch.cuda.is_available()`
 - how many GPUs are available: `torch.cuda.device_count()`

In [99]:
torch.cuda.is_available()

False

In [100]:
torch.cuda.device_count()

0

In [101]:
x = torch.Tensor([[1,2,3], [4,5,6]])
print(x)

tensor([[1., 2., 3.],
        [4., 5., 6.]])


### tensor.cuda

_Note: If you don't have CUDA on the machine, the following examples won't work._

In [102]:
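x.cuda(0)        # returns a copy on GPU 0; x itself is not modified
print(x.device)  # still the CPU
x = x.cuda(0)    # rebind x to the GPU copy
print(x.device)
x = x.cuda(1)    # requires a second GPU
print(x.device)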

RuntimeError: The NVIDIA driver on your system is too old (found version 9010). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.

In [None]:
x = torch.Tensor([[1,2,3], [4,5,6]])

# This will raise an error, since you cannot combine tensors that live on different devices
x + x.cuda()

RuntimeError: expected device cpu but got device cuda:0

#### Write an if statement that moves x to the GPU if CUDA is available

In [None]:
# YOUR TURN

These kinds of if statements used to be all over the place in people's PyTorch code. PyTorch 0.4 introduced a more flexible way:

### torch.device

A **`torch.device`** is an object representing the device on which a `torch.Tensor` is or will be allocated.

You can easily move a tensor from one device to another by using the **`tensor.to()`** method.

In [None]:
cpu = torch.device('cpu')
cuda_0 = torch.device('cuda:0')

x = x.to(cpu)
print(x.device)
x = x.to(cuda_0)
print(x.device)

cpu
cuda:0


This is more flexible, since you only need to check whether CUDA is available once in your code.

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = x.to(device)  # We don't need to care anymore about whether cuda is available or not
print(x.device)

cuda:0
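
The same `device` object can also be passed when a tensor is created, which avoids the extra copy from CPU to GPU. A sketch that runs with or without a GPU:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.ones(2, 3, device=device)   # created directly on the chosen device
y = torch.rand(2, 3).to(device)       # created on the CPU, then moved

z = x + y                             # fine: both operands are on the same device
print(z.device)

z_cpu = z.cpu()                       # move back, e.g. before calling .numpy()
print(z_cpu.numpy().shape)
```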


#### Timing GPU

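How much faster is a GPU? See for yourself. Keep in mind that CUDA operations are launched asynchronously, so a GPU timing taken without `torch.cuda.synchronize()` can look far better than it really is.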

In [None]:
A = torch.rand(100, 1000, 1000)
B = A.cuda(0)
A.size()

torch.Size([100, 1000, 1000])

In [None]:
%timeit -n 3 torch.bmm(A, A)

883 ms ± 36.8 ms per loop (mean ± std. dev. of 7 runs, 3 loops each)


In [None]:
%timeit -n 30 torch.bmm(B, B)

32.4 µs ± 12.5 µs per loop (mean ± std. dev. of 7 runs, 30 loops each)
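
A microsecond-scale result for 100 matrix products of size 1000 x 1000 is suspicious: `torch.bmm` on a CUDA tensor returns as soon as the work is queued, not when it is finished. Below is a hedged sketch of a fairer measurement outside of `%timeit`. The helper `time_bmm` is hypothetical, and the tensor is smaller than above so that the example runs quickly on a CPU-only machine.

```python
import time
import torch

def time_bmm(t, repeats=3):
    # average wall-clock seconds for torch.bmm(t, t)
    if t.is_cuda:
        torch.cuda.synchronize()      # finish pending work before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        torch.bmm(t, t)
    if t.is_cuda:
        torch.cuda.synchronize()      # wait until the GPU has really finished
    return (time.perf_counter() - start) / repeats

A = torch.rand(10, 200, 200)
print("cpu :", time_bmm(A))

if torch.cuda.is_available():
    print("cuda:", time_bmm(A.cuda()))
```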


___

## Don't forget to download the notebook, otherwise your changes will be lost!

![Download the notebook](figures/notebook-download.png)
