![](https://discuss.pytorch.org/uploads/default/original/2X/3/35226d9fbc661ced1c5d17e374638389178c3176.png)

## References and other resources
- [PyTorch Tutorials](https://pytorch.org/tutorials/)
- [Torchvision](https://pytorch.org/docs/stable/torchvision/index.html)

## Alternatives

- [TensorFlow](https://www.tensorflow.org/)
- [Keras](https://keras.io/)
- [Theano](http://deeplearning.net/software/theano/)
- [Caffe](http://caffe.berkeleyvision.org/)
- [Caffe2](https://caffe2.ai/)
- [MXNet](https://mxnet.apache.org/)
- [many more...](https://www.google.com/search?q=deep+learning+frameworks&oq=deep+learning+frame&aqs=chrome.0.0j69i57j69i61l2j0l2.2284j0j1&sourceid=chrome&ie=UTF-8)

## So why PyTorch?

- Simple Python
- Easy to use + debug
- Supported/developed by Facebook
- Nice and extensible interface (modules, etc.)
- A lot of research code is published as PyTorch projects

____

## Google Colab only!

In [None]:
# execute only if you're using Google Colab
#!wget -q https://raw.githubusercontent.com/ahug/amld-pytorch-workshop/master/binder/requirements.txt -O requirements.txt
#!pip install -qr requirements.txt

___

In [1]:
import torch

In [2]:
print("PyTorch Version:", torch.__version__)

PyTorch Version: 1.0.0


In [3]:
import numpy as np

The API is very similar to numpy's (if that helps!)

## Tensor Creation 

## First of all, what is a tensor?

A **matrix** is a grid of numbers, say of shape (3 x 5). In simple terms, a **tensor** is a generalization of a matrix to more dimensions. It can have an arbitrary shape, e.g. (3 x 6 x 2 x 10).

To start, you can think of tensors as multidimensional arrays.
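As a rough illustration of this generalization, the sketch below builds a scalar, a vector, a matrix and a 4-dimensional tensor and prints the number of dimensions and the shape of each. The variable names are chosen for illustration only.

```python
import torch

scalar = torch.tensor(3.0)                 # 0 dimensions
vector = torch.tensor([1.0, 2.0, 3.0])     # 1 dimension
matrix = torch.zeros(3, 5)                 # 2 dimensions, like the (3 x 5) grid
tensor4d = torch.zeros(3, 6, 2, 10)        # 4 dimensions

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("tensor4d", tensor4d)]:
    print(name, t.dim(), tuple(t.shape))
```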

In [9]:
X = torch.tensor([1, 2, 3, 4, 5])
X

tensor([1, 2, 3, 4, 5])

In [10]:
X.shape

torch.Size([5])

In [11]:
X = torch.tensor([[1, 2, 3], [4, 5, 6]])
X

tensor([[1, 2, 3],
        [4, 5, 6]])

In [32]:
X.shape

torch.Size([2, 3])

In [13]:
# numpy
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [14]:
# torch
torch.eye(3)

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])

In [15]:
# numpy
5 * np.eye(3)

array([[5., 0., 0.],
       [0., 5., 0.],
       [0., 0., 5.]])

In [16]:
# torch
5 * torch.eye(3)

tensor([[5., 0., 0.],
        [0., 5., 0.],
        [0., 0., 5.]])

In [17]:
# numpy
np.ones(5)

array([1., 1., 1., 1., 1.])

In [18]:
# torch
torch.ones(5)

tensor([1., 1., 1., 1., 1.])

In [19]:
# numpy
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [20]:
# torch
torch.zeros(5)

tensor([0., 0., 0., 0., 0.])

In [21]:
# numpy
np.empty((3, 5))

array([[2.84682285e-316, 0.00000000e+000, 0.00000000e+000,
        0.00000000e+000, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
        0.00000000e+000, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
        0.00000000e+000, 0.00000000e+000]])

In [22]:
# torch
torch.empty((3, 5))

tensor([[-4.5315e-24,  4.5687e-41, -4.5315e-24,  4.5687e-41,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  7.0222e-37,  0.0000e+00],
        [ 7.0222e-37,  0.0000e+00,  7.0222e-37,  0.0000e+00,  0.0000e+00]])

In [23]:
# numpy
X = np.random.random((5, 3))
X

array([[0.95035403, 0.38749927, 0.96907378],
       [0.28689738, 0.63787234, 0.11452607],
       [0.93785669, 0.33011051, 0.91907239],
       [0.73236054, 0.15929606, 0.24060854],
       [0.9788027 , 0.42148306, 0.31876195]])

In [24]:
# torch
Y = torch.rand((5, 3))
Y

tensor([[0.0089, 0.1747, 0.9115],
        [0.3636, 0.4825, 0.2500],
        [0.5644, 0.4245, 0.2499],
        [0.8862, 0.4591, 0.2637],
        [0.6629, 0.0394, 0.4381]])

In [25]:
# numpy
X.shape

(5, 3)

In [26]:
# torch
Y.shape

torch.Size([5, 3])

___

## But wait: Why do we even need tensors if we can do exactly the same with numpy arrays?

A `torch.Tensor` behaves like a numpy array under mathematical operations. In addition, it can keep track of gradients (see next notebook) and it can live on a GPU.
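A minimal preview of both features, assuming nothing beyond `torch` itself: the first part asks PyTorch to track gradients for `x` and differentiates a sum of squares, and the second part allocates a tensor on the GPU when one is present and on the CPU otherwise. The next notebook covers gradients properly.

```python
import torch

# Gradient tracking: y = x_1^2 + x_2^2 + x_3^2, so dy/dx = 2 * x
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()
y.backward()         # computes dy/dx and stores it in x.grad
print(x.grad)

# GPU support: pick the device once, then allocate on it
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
z = torch.ones(2, 2, device=device)
print(z.device)
```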

____

## Linear Algebra Operations

In [39]:
X = np.random.rand(3, 5)
Y = torch.rand(3, 5)

In [40]:
# numpy (matrix multiplication)
X.T @ X

array([[0.92420952, 0.85478557, 0.92700345, 0.64887642, 0.6101654 ],
       [0.85478557, 1.13769941, 1.27679697, 1.05184043, 1.17233079],
       [0.92700345, 1.27679697, 1.53251109, 1.09956116, 1.22641085],
       [0.64887642, 1.05184043, 1.09956116, 1.1416102 , 1.34126516],
       [0.6101654 , 1.17233079, 1.22641085, 1.34126516, 1.61851119]])

In [41]:
Y.shape

torch.Size([3, 5])

In [42]:
# torch (matrix multiplication)
Y.t() @ Y

tensor([[0.1955, 0.3612, 0.5444, 0.3858, 0.3804],
        [0.3612, 1.1458, 0.9059, 1.1131, 0.9163],
        [0.5444, 0.9059, 1.5374, 0.9792, 0.9940],
        [0.3858, 1.1131, 0.9792, 1.2522, 1.2090],
        [0.3804, 0.9163, 0.9940, 1.2090, 1.3370]])

In [43]:
Y.t().matmul(Y)

tensor([[0.1955, 0.3612, 0.5444, 0.3858, 0.3804],
        [0.3612, 1.1458, 0.9059, 1.1131, 0.9163],
        [0.5444, 0.9059, 1.5374, 0.9792, 0.9940],
        [0.3858, 1.1131, 0.9792, 1.2522, 1.2090],
        [0.3804, 0.9163, 0.9940, 1.2090, 1.3370]])

In [44]:
# CAUTION: Operator '*' does element-wise multiplication, just like in numpy!
# Y.t() * Y  # error, dimensions do not match for element-wise multiplication
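The difference between the two operators can be sketched with a small fixed matrix (the values are chosen for illustration): `*` multiplies entry by entry and needs compatible shapes, while `@` performs a matrix product.

```python
import torch

Y = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])   # shape (2, 3)

elementwise = Y * Y          # same shapes, multiplied entry by entry -> (2, 3)
gram = Y.t() @ Y             # (3, 2) @ (2, 3) -> (3, 3)

print(elementwise)
print(gram.shape)

try:
    Y.t() * Y                # (3, 2) * (2, 3): shapes cannot be broadcast
except RuntimeError as err:
    print("element-wise product failed:", err)
```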

In [45]:
np.linalg.inv(X.T @ X)

array([[ 2.12079648e+14, -1.39012887e+15,  5.80898026e+14,
         1.09433403e+15, -4.20092572e+14],
       [-1.52673825e+15,  5.68394409e+15, -2.25388247e+15,
        -1.67120026e+15, -4.48678539e+14],
       [ 6.41815822e+14, -2.27900260e+15,  8.98249596e+14,
         5.44003824e+14,  2.77325668e+14],
       [ 1.29045228e+15, -2.25179981e+15,  7.66845987e+14,
        -2.25179981e+15,  2.42955213e+15],
       [-5.29826408e+14,  0.00000000e+00,  9.74260168e+13,
         2.25179981e+15, -1.74015677e+15]])

In [46]:
torch.inverse(Y.t() @ Y)

tensor([[-84827744.0000,  -8891802.0000,  27863942.0000,  24135774.0000,
         -12313025.0000],
        [ -7808281.0000,   2269471.7500,   2235538.5000,  -3138432.2500,
           1841960.3750],
        [ 27748394.0000,   2579338.7500,  -9079584.0000,  -7323563.0000,
           3710477.2500],
        [ 22254986.0000,  -3027290.2500,  -6738643.0000,   2971984.2500,
          -1934281.7500],
        [-11269010.0000,   1794120.2500,   3384314.0000,  -1958330.1250,
           1231144.3750]])
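The very large entries in the two results above are a warning sign. A (3 x 5) matrix has rank at most 3, so the (5 x 5) product `X.T @ X` is singular and its computed inverse is dominated by rounding error. A minimal sketch of the well-posed case, using a hand-picked matrix `M` (an assumption for illustration, not part of the original notebook), inverts the (3 x 3) product instead and checks the result:

```python
import torch

M = torch.tensor([[1., 0., 0., 1., 0.],
                  [0., 1., 0., 0., 1.],
                  [0., 0., 1., 1., 1.]])   # shape (3, 5), rank 3

G = M @ M.t()                # (3, 3) and invertible
G_inv = torch.inverse(G)

# A valid inverse satisfies G @ G_inv == identity (up to rounding)
print(torch.allclose(G @ G_inv, torch.eye(3), atol=1e-5))
```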

In [49]:
np.arange(2, 10, 2)

array([2, 4, 6, 8])

In [55]:
torch.arange(2, 10, 2)

tensor([2, 4, 6, 8])

In [56]:
np.linspace(0, 1, 10)

array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])

In [68]:
torch.linspace(0, 1, 10)

tensor([0.0000, 0.1111, 0.2222, 0.3333, 0.4444, 0.5556, 0.6667, 0.7778, 0.8889,
        1.0000])

## Your turn

**_Create the tensor:_**

$ \begin{bmatrix}
5 & 7 & 9 & 11 & 13 & 15 & 17 & 19
\end{bmatrix}  $

In [127]:
torch.arange(5, 20, 2)

tensor([ 5,  7,  9, 11, 13, 15, 17, 19])

## More on PyTorch Tensors

Each operation is also available as a function.

In [77]:
X = torch.rand(3, 2)
X

tensor([[0.3161, 0.3117],
        [0.9063, 0.9142],
        [0.6522, 0.0970]])

In [78]:
X + 2

tensor([[2.3161, 2.3117],
        [2.9063, 2.9142],
        [2.6522, 2.0970]])

In [79]:
torch.exp(X)

tensor([[1.3718, 1.3658],
        [2.4753, 2.4948],
        [1.9198, 1.1019]])

In [80]:
X.exp()

tensor([[1.3718, 1.3658],
        [2.4753, 2.4948],
        [1.9198, 1.1019]])

In [81]:
X.sqrt()

tensor([[0.5622, 0.5583],
        [0.9520, 0.9561],
        [0.8076, 0.3115]])

In [82]:
(X.exp() + 2).sqrt() - 2 * X.log().sigmoid()  # be creative :-)

tensor([[1.3559, 1.3593],
        [1.1646, 1.1649],
        [1.1903, 1.5844]])

Many more functions available: sin, cos, tanh, log, etc.

In [74]:
A = torch.eye(3)
A

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])

In [75]:
A.add(5)

tensor([[6., 5., 5.],
        [5., 6., 5.],
        [5., 5., 6.]])

In [76]:
A

tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])

Functions that mutate (in-place) the passed object end with an underscore, e.g. *add_*, *div_*, etc.

In [83]:
A.add_(5)

tensor([[6., 5., 5.],
        [5., 6., 5.],
        [5., 5., 6.]])

In [84]:
A

tensor([[6., 5., 5.],
        [5., 6., 5.],
        [5., 5., 6.]])

In [85]:
A.div_(3)

tensor([[2.0000, 1.6667, 1.6667],
        [1.6667, 2.0000, 1.6667],
        [1.6667, 1.6667, 2.0000]])

In [86]:
A

tensor([[2.0000, 1.6667, 1.6667],
        [1.6667, 2.0000, 1.6667],
        [1.6667, 1.6667, 2.0000]])

In [87]:
A.uniform_()  # fills the tensor with random uniform numbers in [0, 1)

tensor([[0.2454, 0.4780, 0.8237],
        [0.8041, 0.0625, 0.0369],
        [0.0940, 0.0669, 0.9224]])

In [88]:
A

tensor([[0.2454, 0.4780, 0.8237],
        [0.8041, 0.0625, 0.0369],
        [0.0940, 0.0669, 0.9224]])

## Indexing

Again, it works just like in numpy.

In [89]:
A = torch.randint(100, (3, 3))
A

tensor([[78, 32, 60],
        [14, 46, 69],
        [ 5, 11, 65]])

In [90]:
A[0, 0]

tensor(78)

In [97]:
A[2, 1]

tensor(11)

In [92]:
A[1]

tensor([14, 46, 69])

In [93]:
A[:, 1]

tensor([32, 46, 11])

In [94]:
A[1:2, :], A[1:2, :].shape

(tensor([[14, 46, 69]]), torch.Size([1, 3]))

In [95]:
A[1:, 1:]

tensor([[46, 69],
        [11, 65]])

In [96]:
A[:2, :2]

tensor([[78, 32],
        [14, 46]])
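One consequence of numpy-style indexing is worth keeping in mind: basic indexing and slicing return views that share memory with the original tensor. A small sketch, with a fixed tensor chosen for illustration:

```python
import torch

A = torch.arange(9).view(3, 3)
row = A[1]          # basic indexing returns a view, not a copy
row[0] = 100        # writes through to A

print(A)

B = A[1].clone()    # clone() makes an independent copy
B[0] = -1
print(A[1, 0])      # unaffected by the change to B
```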

_____

## Reshaping & Expanding

In [110]:
X = torch.tensor([1, 2, 3, 4])
X

tensor([1, 2, 3, 4])

In [111]:
X = X.repeat(3, 1) # repeat it 3 times along the 0th dimension and once along the first dimension
X, X.shape

(tensor([[1, 2, 3, 4],
         [1, 2, 3, 4],
         [1, 2, 3, 4]]), torch.Size([3, 4]))

In [112]:
# equivalent of 'reshape' in numpy (view does not allocate new memory!)
Y = X.view(2, 6)
Y

tensor([[1, 2, 3, 4, 1, 2],
        [3, 4, 1, 2, 3, 4]])
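Because `view` does not allocate new memory, the result shares storage with the original, and it only works when the memory layout allows it. The sketch below (tensor values are illustrative) shows both effects, and falls back to `reshape`, which copies when a view is not possible:

```python
import torch

X = torch.arange(12).view(3, 4)
Y = X.view(2, 6)
Y[0, 0] = 99                 # Y shares storage with X
print(X[0, 0])               # the change is visible through X

Xt = X.t()                   # transposing makes the tensor non-contiguous
try:
    Xt.view(-1)              # view needs a compatible memory layout
except RuntimeError as err:
    print("view failed:", err)

flat = Xt.reshape(-1)        # reshape copies when a view is not possible
print(flat.shape)
```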

In [113]:
Y = X.view(-1)  # -1 tells PyTorch to infer the number of elements along that dimension
Y, Y.shape

(tensor([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]), torch.Size([12]))

In [114]:
Y = X.view(-1, 2)
Y, Y.shape

(tensor([[1, 2],
         [3, 4],
         [1, 2],
         [3, 4],
         [1, 2],
         [3, 4]]), torch.Size([6, 2]))

In [116]:
Y = X.view(-1, 4)
Y, Y.shape

(tensor([[1, 2, 3, 4],
         [1, 2, 3, 4],
         [1, 2, 3, 4]]), torch.Size([3, 4]))

In [117]:
Y = torch.ones(5)
Y, Y.shape

(tensor([1., 1., 1., 1., 1.]), torch.Size([5]))

In [118]:
Y = Y.view(-1, 1)
Y, Y.shape

(tensor([[1.],
         [1.],
         [1.],
         [1.],
         [1.]]), torch.Size([5, 1]))

In [119]:
Y.expand(5, 5)  # similar to repeat but does not actually allocate new memory

tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])
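The memory claim can be checked directly. In this sketch, `expand` returns a view onto the same storage (the expanded dimension gets a stride of 0), whereas `repeat` allocates a new tensor:

```python
import torch

Y = torch.ones(5, 1)
expanded = Y.expand(5, 5)    # a view: no new memory
repeated = Y.repeat(1, 5)    # a real copy with 25 elements

print(expanded.stride())     # a stride of 0 marks the expanded dimension
print(repeated.stride())

print(expanded.data_ptr() == Y.data_ptr())   # same underlying storage
print(repeated.data_ptr() == Y.data_ptr())   # different storage
```

Since all rows of an expanded tensor alias the same memory, in-place writes to it are best avoided; call `.clone()` first if you need to modify the result.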

In [120]:
X = torch.eye(4)
Y = X[3:, :]
Y, Y.shape

(tensor([[0., 0., 0., 1.]]), torch.Size([1, 4]))

In [121]:
Y = Y.squeeze() # removes all dimensions of size '1'
Y, Y.shape

(tensor([0., 0., 0., 1.]), torch.Size([4]))

In [122]:
Y = Y.unsqueeze(1)
Y, Y.shape

(tensor([[0.],
         [0.],
         [0.],
         [1.]]), torch.Size([4, 1]))

## Your turn!

**_Create the tensor:_**

$ \begin{bmatrix}
7 & 5 & 5 & 5 & 5 \\
5 & 7 & 5 & 5 & 5 \\
5 & 5 & 7 & 5 & 5 \\
5 & 5 & 5 & 7 & 5 \\
5 & 5 & 5 & 5 & 7 
\end{bmatrix}  $

Hint: You can use matrix sum and scalar multiplication

In [196]:
torch.eye(5) * 2 + 5

tensor([[7., 5., 5., 5., 5.],
        [5., 7., 5., 5., 5.],
        [5., 5., 7., 5., 5.],
        [5., 5., 5., 7., 5.],
        [5., 5., 5., 5., 7.]])

**_Create the tensor:_**

$ \begin{bmatrix}
4 & 6 & 8 & 10 & 12 \\
14 & 16 & 18 & 20 & 22 \\
24 & 26 & 28 & 30 & 32
\end{bmatrix}$

In [130]:
torch.arange(4, 34, 2).view(3,5)

tensor([[ 4,  6,  8, 10, 12],
        [14, 16, 18, 20, 22],
        [24, 26, 28, 30, 32]])

**_Create the tensor:_**

$ \begin{bmatrix}
2 & 2 & 2 & 2 & 2 \\
4 & 4 & 4 & 4 & 4 \\
6 & 6 & 6 & 6 & 6 \\
8 & 8 & 8 & 8 & 8
\end{bmatrix}  $

In [179]:
torch.arange(2,10,2).repeat(5,1).t()

tensor([[2, 2, 2, 2, 2],
        [4, 4, 4, 4, 4],
        [6, 6, 6, 6, 6],
        [8, 8, 8, 8, 8]])

_____

## Reductions

In [147]:
X = torch.randint(10, (3, 4)).float()
X

tensor([[5., 0., 9., 6.],
        [4., 8., 2., 7.],
        [1., 7., 6., 2.]])

In [148]:
X.sum()

tensor(57.)

In [149]:
X.sum().item()

57.0

In [152]:
X.sum(0) # column-wise sum

tensor([10., 15., 17., 15.])

In [153]:
X.sum(dim=1)  # row-wise sum

tensor([20., 21., 16.])

In [154]:
X.mean()

tensor(4.7500)

In [155]:
X.mean(dim=1)

tensor([5.0000, 5.2500, 4.0000])

In [156]:
X.norm(dim=0)

tensor([ 6.4807, 10.6301, 11.0000,  9.4340])
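The `dim` argument names the dimension that is collapsed by the reduction. A short sketch on a fixed (2 x 2) matrix, which also uses the `keepdim=True` option to keep the reduced dimension so that the result broadcasts against the input:

```python
import torch

X = torch.tensor([[1., 2.],
                  [3., 4.]])

print(X.sum(dim=0))                  # collapses rows    -> tensor([4., 6.])
print(X.sum(dim=1))                  # collapses columns -> tensor([3., 7.])

row_sums = X.sum(dim=1, keepdim=True)   # shape (2, 1) instead of (2,)
normalized = X / row_sums               # broadcasting divides each row by its sum
print(normalized.sum(dim=1))            # each row now sums to 1
```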

## Your turn!

Compute the norms of the row-vectors in matrix **X** without using _torch.norm()_.

Remember: $$||\vec{v}||_2 = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}$$

Hint: _X\*\*2_ computes the element-wise square.

In [198]:
X = torch.eye(4) + torch.arange(4).repeat(4, 1).float()

# YOUR TURN

(X**2).sum(1).sqrt()

# SOLUTION: tensor([3.8730, 4.1231, 4.3589, 4.5826])

tensor([3.8730, 4.1231, 4.3589, 4.5826])
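As a sanity check, the manual computation can be compared against the built-in norm. This sketch rebuilds the same **X** and confirms that both give the same row norms:

```python
import torch

X = torch.eye(4) + torch.arange(4).repeat(4, 1).float()

manual = (X ** 2).sum(dim=1).sqrt()   # square, sum along each row, take the root
builtin = X.norm(dim=1)               # built-in 2-norm along dim 1

print(torch.allclose(manual, builtin))
```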

## Masking

In [160]:
X = torch.randint(100, (5, 3))
X

tensor([[33, 67, 63],
        [67, 94, 87],
        [44, 13, 38],
        [85, 43, 78],
        [ 4, 83, 98]])

In [161]:
mask = (X > 25) & (X < 75)
mask

tensor([[1, 1, 1],
        [1, 0, 0],
        [1, 0, 1],
        [0, 1, 0],
        [0, 0, 0]], dtype=torch.uint8)

In [162]:
X[mask]  # returns all elements matching the criteria in a 1D-tensor

tensor([33, 67, 63, 67, 44, 38, 43])

In [163]:
mask.sum()  # number of elements that fulfill the condition

tensor(7)

In [164]:
(X == 25) | (X > 60)

tensor([[0, 1, 1],
        [1, 1, 1],
        [0, 0, 0],
        [1, 0, 1],
        [0, 1, 1]], dtype=torch.uint8)
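Masks can also be used on the left-hand side of an assignment to overwrite only the matching entries. A small sketch on a fixed tensor (values chosen for illustration); note that on recent PyTorch releases the mask has dtype `torch.bool` rather than the `torch.uint8` shown in the outputs above.

```python
import torch

X = torch.tensor([[10, 30],
                  [50, 90]])
mask = (X > 25) & (X < 75)

print(X[mask])             # selects 30 and 50
print(int(mask.sum()))     # 2 elements match

X[mask] = 0                # overwrite only the matching entries
print(X)
```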

## Your turn!

Get the number of non-zeros in **X**

In [201]:
X = torch.tensor([[1, 0, 2], [0, 6, 0]])

# YOUR TURN
(X != 0).sum()

tensor(3)

Compute the sum of all entries in X that are larger than the mean of all values in X.

In [202]:
# YOUR TURN
mean = X.float().mean().item()
X[X > mean].sum()

tensor(8)

______

## Some useful properties of tensors

In [174]:
x = torch.Tensor([[0,1,2], [3,4,5]])

print("x.shape: \n%s\n" % (x.shape,))
print("x.size(): \n%s\n" % (x.size(),))
print("x.size(1): \n%s\n" % x.size(1))
print("x.dim(): \n%s\n" % x.dim())

print("x.dtype: \n%s\n" % x.dtype)
print("x.device: \n%s\n" % x.device)

x.shape: 
torch.Size([2, 3])

x.size(): 
torch.Size([2, 3])

x.size(1): 
3

x.dim(): 
2

x.dtype: 
torch.float32

x.device: 
cpu



The `nonzero` function returns the indices of the nonzero elements, one (row, column) pair per row.

In [180]:
x = torch.Tensor([[0,1,2], [3,4,5]])

print("x.nonzero(): \n%s\n" % x.nonzero())

x.nonzero(): 
tensor([[0, 1],
        [0, 2],
        [1, 0],
        [1, 1],
        [1, 2]])



In [None]:
# press tab to autocomplete
#x.

___

## Converting between PyTorch and numpy

In [181]:
X = np.random.random((5,3))
X

array([[0.86311892, 0.02924595, 0.76903581],
       [0.35833978, 0.95312013, 0.54224762],
       [0.45531708, 0.38173709, 0.21400482],
       [0.84817502, 0.06125757, 0.99774725],
       [0.42526364, 0.76116302, 0.12108892]])

In [182]:
# numpy ---> torch
Y = torch.from_numpy(X)  # Y is actually a DoubleTensor (i.e. 64-bit representation)
Y

tensor([[0.8631, 0.0292, 0.7690],
        [0.3583, 0.9531, 0.5422],
        [0.4553, 0.3817, 0.2140],
        [0.8482, 0.0613, 0.9977],
        [0.4253, 0.7612, 0.1211]], dtype=torch.float64)

In [183]:
Y = torch.rand((2,4))
Y

tensor([[0.7002, 0.5147, 0.7586, 0.3482],
        [0.8200, 0.6556, 0.2769, 0.2919]])

In [184]:
# torch ---> numpy
X = Y.numpy()
X

array([[0.70016754, 0.5147412 , 0.75861543, 0.34824252],
       [0.8199515 , 0.6555791 , 0.27687383, 0.29188764]], dtype=float32)
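Both conversions share memory with the source rather than copying it, so a change on one side is visible on the other. The sketch below demonstrates this with small illustrative arrays, and shows that a dtype conversion such as `.float()` produces an independent copy:

```python
import numpy as np
import torch

a = np.ones(3)
t = torch.from_numpy(a)     # shares memory with a, dtype float64
a[0] = 5.0
print(t)                    # the change made through a is visible in t

t32 = t.float()             # converts to float32 and makes a copy
b = t32.numpy()             # shares memory with t32
t32[1] = 7.0
print(b)                    # the change made through t32 is visible in b
print(a)                    # a is not affected by changes to the copy
```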

____

## Using GPUs 

Using a **GPU** in PyTorch is as simple as calling **`.cuda()`** on your tensor.

But first, you may want to check:
 - that CUDA can actually be used: `torch.cuda.is_available()`
 - how many GPUs are available: `torch.cuda.device_count()`

In [185]:
torch.cuda.is_available()

False

In [186]:
torch.cuda.device_count()

0

In [187]:
x = torch.Tensor([[1,2,3], [4,5,6]])
print(x)

tensor([[1., 2., 3.],
        [4., 5., 6.]])


### tensor.cuda

_Note: If you don't have CUDA on your machine, the following examples won't work._

In [None]:
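x.cuda(0)          # returns a copy on GPU 0; x itself is unchanged
print(x.device)    # still cpu
x = x.cuda(0)      # reassign to keep the GPU copy
print(x.device)
x = x.cuda(1)      # requires a second GPU
print(x.device)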

In [None]:
x = torch.Tensor([[1,2,3], [4,5,6]])

# This raises an error: you cannot combine tensors that live on different devices
x + x.cuda()

#### Write an if statement that moves x to the GPU if CUDA is available

In [191]:
if torch.cuda.is_available():
    x = x.cuda()

These kinds of if statements used to be all over the place in people's PyTorch code. Recently, a more flexible way was introduced:

### torch.device

A **`torch.device`** is an object representing the device on which a `torch.Tensor` is or will be allocated.

You can easily move a tensor from one device to another with the **`tensor.to()`** method.

In [194]:
cpu = torch.device('cpu')
cuda_0 = torch.device('cuda:0')

x = x.to(cpu)
print(x.device)
#x = x.to(cuda_0)
#print(x.device)

cpu


This is more flexible because you check whether CUDA is available only once in your code.

In [195]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = x.to(device)  # We don't need to care anymore about whether cuda is available or not
print(x.device)

cpu
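The same `device` object can also be passed to the creation functions, which avoids allocating on the CPU first. A minimal sketch, with tensor names chosen for illustration:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.ones(2, 3, device=device)      # allocated on `device` from the start
b = torch.rand(2, 3).to(device)          # created on the CPU, then moved
c = a + b                                # both operands live on the same device

print(c.device)
```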


#### Timing GPU

How much faster is the GPU? See for yourself...

In [None]:
A = torch.rand(100, 1000, 1000)
B = A.cuda(1)  # uses the second GPU; call A.cuda(0) on a single-GPU machine
A.size()

In [None]:
%timeit -n 3 torch.bmm(A, A)

In [None]:
%timeit -n 30 torch.bmm(B, B)
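One caveat when timing GPU code: CUDA operations are launched asynchronously, so a timer can stop before the GPU has actually finished. The sketch below is one way to account for that outside of a notebook, by calling `torch.cuda.synchronize()` around the timed region. `time_bmm` is a hypothetical helper written for this example, and the tensor is smaller than the one above so that it runs quickly on any machine.

```python
import time
import torch

def time_bmm(t, repeats=3):
    """Average wall-clock seconds for torch.bmm(t, t)."""
    if t.is_cuda:
        torch.cuda.synchronize()      # finish pending GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        torch.bmm(t, t)
    if t.is_cuda:
        torch.cuda.synchronize()      # wait for the GPU to finish before stopping the clock
    return (time.perf_counter() - start) / repeats

A = torch.rand(10, 200, 200)
print("cpu :", time_bmm(A))
if torch.cuda.is_available():
    print("cuda:", time_bmm(A.cuda()))
```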

___

## Don't forget to download the notebook, otherwise your changes will be lost!

![Download the notebook](figures/notebook-download.png)