# 1. PyTorch Basics - Tensor Types, Tensor Operations and Methods 

### About this notebook

This notebook was used in the 50.039 Deep Learning course at the Singapore University of Technology and Design.

**Author:** Matthieu DE MARI (matthieu_demari@sutd.edu.sg)

**Version:** 1.2 (16/06/2023)

**Requirements:**
- Python 3 (tested on v3.11.4)
- Matplotlib (tested on v3.7.1)
- Numpy (tested on v1.24.3)
- Time (default Python library)
- Torch (tested on v2.0.1+cu118)

### Imports

In [1]:
# Matplotlib
import matplotlib.pyplot as plt
from matplotlib import cm
# Numpy
import numpy as np
from numpy.random import default_rng
# Torch
import torch
# Time
import time

### On the benefits of using PyTorch to implement a Neural Network

There are several benefits to using PyTorch over NumPy for implementing neural networks:

- PyTorch provides a more intuitive interface for working with tensors and neural networks. NumPy is primarily a numerical computing library, and while it can be used to perform operations on arrays that are similar to those used in neural networks, PyTorch is specifically designed with deep learning in mind and provides a more natural and convenient interface for defining and training neural networks.
- PyTorch supports GPU acceleration, whereas NumPy only runs on the CPU. If a GPU is available, PyTorch can perform the computations on it, which can significantly speed up the training of our neural network. This is especially useful for training large and complex models.
- PyTorch includes a number of high-level abstractions for building and training neural networks, such as nn.Module, nn.Sequential, and optim. These abstractions make it easier to write and debug code. PyTorch also implements automatic differentiation, which computes gradients for us (which is nice, as we will no longer have to worry about the gradient update rules to use!).
- PyTorch has a large and active community of users, which comes with a wealth of online resources and documentation to help troubleshoot any issues.

Overall, while NumPy is a powerful library for numerical computing, PyTorch is a more effective choice for implementing and training neural networks, especially when taking advantage of GPU acceleration or of more advanced features such as automatic differentiation.

### Setting a GPU/CPU device for computation

You can check for CUDA/GPU capabilities using the line below. If CUDA has not been properly installed or the GPU is not compatible, you will be using a CPU instead.

We strongly advise taking a moment to make sure your machine is CUDA-enabled, assuming your GPU is compatible. When CUDA is properly installed on a compatible GPU, the line below should display *cuda*, otherwise it will print *cpu*.

In [2]:
# Use GPU if available, else use CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda
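A slightly more verbose check, which also reports the GPU name when one is found, could look like the minimal sketch below. It only relies on `torch.cuda.is_available()`, `torch.cuda.device_count()` and `torch.cuda.get_device_name()`, and falls back to the CPU otherwise.

```python
import torch

# Pick the device exactly as above: CUDA if available, else CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

if device.type == "cuda":
    # Number of visible GPUs and the name of the first one
    print(torch.cuda.device_count(), "GPU(s) found:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device found, computations will run on the CPU.")
```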


### The Tensor object

Tensors are a data structure very similar to arrays and matrices. The tensor is PyTorch's basic building block and closely mirrors the NumPy array, which is why most of the concepts and methods will look familiar. However, tensors come with additional features, which will be useful later on when building Neural Networks.

They can be initialized as in NumPy, by using **zeros()** or **ones()** functions, specifying dimensions with tuples. 

In [3]:
# Create a 2D Numpy array and a PyTorch tensor,
# both of size 2 by 5, filled with ones.
ones_array = np.ones((2, 5))
print(ones_array)
ones_tensor = torch.ones(size = (2, 5))
print(ones_tensor)

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])


In [4]:
# Create a 1D tensor of size 3, filled with zeros.
# (Pay attention to the extra comma in the tuple.)
zeros_tensor = torch.zeros(size = (3, ))
print(zeros_tensor)

tensor([0., 0., 0.])
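Besides **zeros()** and **ones()**, PyTorch offers other NumPy-like constructors. The sketch below shows a few of them (**full()**, **arange()**, **linspace()**, **eye()** and **zeros_like()**); note that, unlike NumPy, floating-point tensors are created as *torch.float32* by default (assuming the default dtype has not been changed).

```python
import torch

zeros_tensor = torch.zeros(size = (3, ))
print(zeros_tensor.dtype)                    # torch.float32, PyTorch's default float type

full_tensor = torch.full((2, 3), 7.0)        # 2x3 tensor filled with 7.0
range_tensor = torch.arange(0, 10, 2)        # 0, 2, 4, 6, 8 (end value excluded)
lin_tensor = torch.linspace(0, 1, 5)         # 5 evenly spaced values from 0 to 1
identity = torch.eye(3)                      # 3x3 identity matrix
like_tensor = torch.zeros_like(full_tensor)  # zeros, same shape/dtype as full_tensor

print(full_tensor)
print(range_tensor)
print(lin_tensor)
print(identity)
print(like_tensor.shape)
```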


You could also create a tensor directly from a list (or a list of lists), as shown below.

In [5]:
# Create a Tensor from a list, directly
l = [1, 2, 3, 4]
list_tensor = torch.tensor(l)
print(list_tensor)

tensor([1, 2, 3, 4])
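The same function accepts nested lists, in which case each level of nesting becomes a dimension. PyTorch also infers the dtype from the values: with default settings, Python integers give *torch.int64* and Python floats give *torch.float32*. A small sketch:

```python
import torch

# A list of lists becomes a 2D tensor (one row per inner list)
matrix_tensor = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(matrix_tensor.shape)   # torch.Size([2, 3])
print(matrix_tensor.dtype)   # torch.int64, inferred from Python ints

# A single float in the list is enough to get a floating-point tensor
mixed_tensor = torch.tensor([1, 2.5, 3])
print(mixed_tensor.dtype)    # torch.float32 (default float dtype)

# Inner lists must all have the same length, otherwise torch.tensor() raises an error
try:
    torch.tensor([[1, 2], [3]])
except ValueError as err:
    print("Ragged list rejected:", err)
```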


You can also transform a NumPy array into a tensor, using the **from_numpy()** function.

In [6]:
# From a NumPy array
numpy_array = np.array([0.1, 0.2, 0.3])
numpy_tensor = torch.from_numpy(numpy_array)
print(numpy_tensor)

tensor([0.1000, 0.2000, 0.3000], dtype=torch.float64)
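One detail worth knowing: **from_numpy()** does not copy the data, so the tensor and the NumPy array share the same memory, and modifying one modifies the other. **torch.tensor()** makes a copy instead, and the **.numpy()** method converts a CPU tensor back into an array. A minimal sketch:

```python
import numpy as np
import torch

numpy_array = np.array([0.1, 0.2, 0.3])
shared_tensor = torch.from_numpy(numpy_array)  # shares memory with numpy_array
copied_tensor = torch.tensor(numpy_array)      # independent copy of the data

# Modify the NumPy array in place
numpy_array[0] = 10.0
print(shared_tensor)   # first value is now 10.0
print(copied_tensor)   # still starts with 0.1

# Back to NumPy (only works for tensors stored on the CPU)
back_to_numpy = shared_tensor.numpy()
print(type(back_to_numpy))
```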


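PyTorch tensors have an attribute called **dtype**, which tracks the type of the values stored in the tensor. PyTorch's default floating-point dtype is *torch.float32*; the tensor above is *torch.float64* because **from_numpy()** keeps the dtype of the NumPy array, and NumPy uses 64-bit floats by default. Other dtypes exist (integers of various sizes, booleans, etc.). See https://pytorch.org/docs/stable/tensor_attributes.html for more details on the possible dtypes.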

It is possible to change the **dtype** of a tensor
- by either specifying it during its creation;
- or by using the **type()** method on the tensor, specifying a new dtype to use.

In [7]:
# Create a Tensor from a list, directly
# Forcing dtype to be 32-bit integers.
l = [1, 2, 3, 4]
list_tensor = torch.tensor(l, dtype = torch.int32)
print(list_tensor)
# Converting to 64-bit floats
list_tensor2 = list_tensor.type(torch.float64)
print(list_tensor2)

tensor([1, 2, 3, 4], dtype=torch.int32)
tensor([1., 2., 3., 4.], dtype=torch.float64)
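Besides **type()**, the **.to()** method and shortcut methods such as **.float()**, **.double()**, **.int()** and **.long()** also return a tensor converted to the requested dtype, leaving the original tensor unchanged. A short sketch:

```python
import torch

list_tensor = torch.tensor([1, 2, 3, 4], dtype = torch.int32)

as_float32 = list_tensor.to(torch.float32)  # generic conversion with .to()
as_float64 = list_tensor.double()           # shortcut for torch.float64
as_int64 = list_tensor.long()               # shortcut for torch.int64

print(as_float32.dtype, as_float64.dtype, as_int64.dtype)
print(list_tensor.dtype)  # unchanged: torch.int32

# Converting floats to integers truncates towards zero
truncated = torch.tensor([1.7, -1.7]).int()
print(truncated)
```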


In general, operations between tensors require compatible, and sometimes **identical** dtypes.

In [8]:
# Create a Tensor from a list, directly
# Forcing dtype to be 32-bit integers.
l = [1, 2, 3, 4]
list_tensor = torch.tensor(l, dtype = torch.int32)
print(list_tensor)
# Create a second tensor with 64-bit floats (torch.double is torch.float64)
l2 = [2, 4, 6, 8]
list_tensor2 = torch.tensor(l2, dtype = torch.double)
print(list_tensor2)
# Some operations on tensors with different dtypes are not allowed:
# dot() raises a RuntimeError here (int32 vs float64)
list_tensor3 = torch.dot(list_tensor, list_tensor2)
print(list_tensor3)

tensor([1, 2, 3, 4], dtype=torch.int32)
tensor([2., 4., 6., 8.], dtype=torch.float64)


RuntimeError: dot : expected both vectors to have same dtype, but found Int and Double
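Not every operation is this strict. Element-wise arithmetic operators such as + and * apply type promotion (here, int32 combined with float64 gives float64), while functions like **dot()** require identical dtypes. In the latter case, a simple fix is to cast one operand explicitly, as in this sketch:

```python
import torch

list_tensor = torch.tensor([1, 2, 3, 4], dtype = torch.int32)
list_tensor2 = torch.tensor([2, 4, 6, 8], dtype = torch.double)

# Element-wise addition promotes both operands to a common dtype
sum_tensor = list_tensor + list_tensor2
print(sum_tensor.dtype)                               # torch.float64
print(torch.result_type(list_tensor, list_tensor2))   # same promotion rule

# dot() needs identical dtypes: cast the int32 tensor to float64 first
dot_value = torch.dot(list_tensor.double(), list_tensor2)
print(dot_value)   # 1*2 + 2*4 + 3*6 + 4*8 = 60
```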

Tensors can also be initialized using random generators, as in NumPy. For instance, we can use **rand()** to draw random values from a uniform distribution on $[0, 1)$, or **randn()** to draw values from a normal distribution with zero mean and unit variance.

Functions and methods both exist for calculating mean values of a tensor, its standard deviation/variance, etc.

Seeding is done with **torch.manual_seed()**.

In [9]:
# Create a 3D tensor, of size 3 by 2 by 2, filled with random values
# drawn from a uniform [0, 1] distribution.
rand_unif_tensor = torch.rand(size = (3, 2, 2))
print(rand_unif_tensor)
# Calculate mean with function (should be close to 0.5)
val = torch.mean(rand_unif_tensor)
print(val)

tensor([[[0.7346, 0.9654],
         [0.1103, 0.4782]],

        [[0.6199, 0.3018],
         [0.2884, 0.8768]],

        [[0.3638, 0.1911],
         [0.8069, 0.6448]]])
tensor(0.5318)


In [10]:
# Seeding
torch.manual_seed(17)

# Create a 4D tensor, of size 4 by 2 by 3 by 7, filled with random values
# drawn from a normal distribution with zero mean and variance one.
rand_normal_tensor = torch.randn(size = (4, 2, 3, 7))
print(rand_normal_tensor.shape)

# Calculate mean with method (should be close to 0)
# (With seed 17, should be 0.0865)
val = rand_normal_tensor.mean()
print(val)

torch.Size([4, 2, 3, 7])
tensor(0.0865)
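Two quick checks on the paragraphs above, as a sketch: re-seeding with the same value reproduces exactly the same random draws, and the standard deviation and variance are available both as torch functions and as tensor methods, just like the mean. (The seed value 0 and the sample size below are arbitrary choices.)

```python
import torch

# Same seed, same random numbers
torch.manual_seed(0)
first_draw = torch.rand(size = (3, ))
torch.manual_seed(0)
second_draw = torch.rand(size = (3, ))
print(torch.equal(first_draw, second_draw))   # True

# Statistics: function and method forms give the same result
samples = torch.randn(size = (10000, ))
print(torch.mean(samples), samples.mean())    # both close to 0
print(torch.std(samples), samples.std())      # both close to 1
print(torch.var(samples), samples.var())      # both close to 1
```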


Finally, you can query the shape of a tensor, as in NumPy, using the **shape** attribute.

In [11]:
print(rand_normal_tensor.shape)

torch.Size([4, 2, 3, 7])
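The same information is also available through the **size()** method, while the **ndim** attribute and the **numel()** method give the number of dimensions and the total number of elements, respectively. A short sketch:

```python
import torch

torch.manual_seed(17)
rand_normal_tensor = torch.randn(size = (4, 2, 3, 7))

print(rand_normal_tensor.size())    # torch.Size([4, 2, 3, 7]), same as .shape
print(rand_normal_tensor.size(1))   # size along dimension 1: 2
print(rand_normal_tensor.ndim)      # number of dimensions: 4
print(rand_normal_tensor.numel())   # total number of elements: 4*2*3*7 = 168
```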


Tensors have **(way too) many** functions and methods you could use, just like the NumPy arrays.

You know the drill, RTFM! (https://pytorch.org/docs/stable/torch.html)

In [12]:
print(dir(rand_normal_tensor))

['H', 'T', '__abs__', '__add__', '__and__', '__array__', '__array_priority__', '__array_wrap__', '__bool__', '__class__', '__complex__', '__contains__', '__deepcopy__', '__delattr__', '__delitem__', '__dict__', '__dir__', '__div__', '__dlpack__', '__dlpack_device__', '__doc__', '__eq__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__hash__', '__iadd__', '__iand__', '__idiv__', '__ifloordiv__', '__ilshift__', '__imod__', '__imul__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__', '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__long__', '__lshift__', '__lt__', '__matmul__', '__mod__', '__module__', '__mul__', '__ne__', '__neg__', '__new__', '__nonzero__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdiv__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rfloordiv__', '__rlshift__', '__rmatmul__', ...]

### Accessing, slicing, updating, browsing

As we said, tensors are very similar to NumPy arrays, so all the typical indexing operations work on tensors as well. For instance, we can:
- access elements using the square bracket notation, either with several square brackets in a row or with multiple indexes in a single pair of square brackets;
- slice a tensor using the square bracket notation and the colon symbol;
- update elements of a tensor using the square bracket notation;
- browse through the elements of a tensor using a for loop.

In [13]:
# Create a 3D tensor, of size 3 by 2 by 2, filled with random values
# drawn from a uniform [0, 1] distribution.
torch.manual_seed(17)
rand_unif_tensor = torch.rand(size = (3, 2, 2))
print(rand_unif_tensor)

# Indexing
element1 = rand_unif_tensor[2]
print("Element1: ", element1)
element2 = rand_unif_tensor[2][0]
print("Element2: ", element2)
element3 = rand_unif_tensor[2, 0, 1]
print("Element3: ", element3)

tensor([[[0.4342, 0.5351],
         [0.8302, 0.1239]],

        [[0.0293, 0.5494],
         [0.3825, 0.5463]],

        [[0.4683, 0.0172],
         [0.0214, 0.3664]]])
Element1:  tensor([[0.4683, 0.0172],
        [0.0214, 0.3664]])
Element2:  tensor([0.4683, 0.0172])
Element3:  tensor(0.0172)
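Note that Element3 above is still a tensor, with zero dimensions. To get a plain Python number out of a single-element tensor, one can use the **item()** method. Negative indices also work as in Python lists and NumPy, counting from the end. A minimal sketch:

```python
import torch

torch.manual_seed(17)
rand_unif_tensor = torch.rand(size = (3, 2, 2))

element3 = rand_unif_tensor[2, 0, 1]
print(element3.ndim)           # 0: a zero-dimensional tensor
value = element3.item()        # plain Python float
print(type(value))

# Negative indexing: [-1] is the last sub-tensor along dimension 0
last_block = rand_unif_tensor[-1]
print(torch.equal(last_block, rand_unif_tensor[2]))   # True
```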


In [14]:
# Create a 3D tensor, of size 3 by 2 by 2, filled with random values
# drawn from a uniform [0, 1] distribution.
torch.manual_seed(17)
rand_unif_tensor = torch.rand(size = (3, 2, 2))
print(rand_unif_tensor)

# Slicing
slice1 = rand_unif_tensor[0:2]
print("Slice1: ", slice1)
slice2 = rand_unif_tensor[:2]
print("Slice2: ", slice2)
slice3 = rand_unif_tensor[1:]
print("Slice3: ", slice3)
slice4 = rand_unif_tensor[0, :, :]
print("Slice4: ", slice4)

tensor([[[0.4342, 0.5351],
         [0.8302, 0.1239]],

        [[0.0293, 0.5494],
         [0.3825, 0.5463]],

        [[0.4683, 0.0172],
         [0.0214, 0.3664]]])
Slice1:  tensor([[[0.4342, 0.5351],
         [0.8302, 0.1239]],

        [[0.0293, 0.5494],
         [0.3825, 0.5463]]])
Slice2:  tensor([[[0.4342, 0.5351],
         [0.8302, 0.1239]],

        [[0.0293, 0.5494],
         [0.3825, 0.5463]]])
Slice3:  tensor([[[0.0293, 0.5494],
         [0.3825, 0.5463]],

        [[0.4683, 0.0172],
         [0.0214, 0.3664]]])
Slice4:  tensor([[0.4342, 0.5351],
        [0.8302, 0.1239]])
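As in NumPy, basic slicing returns a view: the slice shares its data with the original tensor, so modifying the slice also modifies the original. The **clone()** method gives an independent copy when one is needed. Slices can also take a (positive) step, as in start:stop:step. A short sketch:

```python
import torch

torch.manual_seed(17)
rand_unif_tensor = torch.rand(size = (3, 2, 2))

view_slice = rand_unif_tensor[0:2]          # view on the first two blocks
copy_slice = rand_unif_tensor[0:2].clone()  # independent copy

view_slice[0, 0, 0] = -1.0
print(rand_unif_tensor[0, 0, 0])   # tensor(-1.): the original was modified
print(copy_slice[0, 0, 0])         # unchanged in the clone

# Step slicing: every second block along dimension 0 (blocks 0 and 2)
every_other = rand_unif_tensor[::2]
print(every_other.shape)           # torch.Size([2, 2, 2])
```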


In [15]:
# Create a 3D tensor, of size 3 by 2 by 2, filled with random values
# drawn from a uniform [0, 1] distribution.
torch.manual_seed(17)
rand_unif_tensor = torch.rand(size = (3, 2, 2))
print(rand_unif_tensor)

# Before
element4 = rand_unif_tensor[2, 1, 1]
print("Element4: ", element4)
# Updating
rand_unif_tensor[2, 1, 1] = 0.5
# After
element4 = rand_unif_tensor[2, 1, 1]
print("New Element4: ", element4)

tensor([[[0.4342, 0.5351],
         [0.8302, 0.1239]],

        [[0.0293, 0.5494],
         [0.3825, 0.5463]],

        [[0.4683, 0.0172],
         [0.0214, 0.3664]]])
Element4:  tensor(0.3664)
New Element4:  tensor(0.5000)
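Updates are not limited to single elements: a whole slice can be assigned at once (the right-hand side is broadcast to the slice shape), and a boolean mask selects every element satisfying a condition. A sketch, where the threshold 0.5 is an arbitrary choice:

```python
import torch

torch.manual_seed(17)
rand_unif_tensor = torch.rand(size = (3, 2, 2))

# Set the whole first 2x2 block to zero
rand_unif_tensor[0] = 0.0
print(rand_unif_tensor[0])

# Boolean mask: clip every value above 0.5 down to 0.5
mask = rand_unif_tensor > 0.5
rand_unif_tensor[mask] = 0.5
print(rand_unif_tensor.max())   # at most 0.5
```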


In [16]:
# Create a 3D tensor, of size 3 by 2 by 2, filled with random values
# drawn from a uniform [0, 1] distribution.
torch.manual_seed(17)
rand_unif_tensor = torch.rand(size = (3, 2, 2))
print(rand_unif_tensor)

# Browsing
for sub_tensor in rand_unif_tensor:
    print("---")
    print(sub_tensor)

tensor([[[0.4342, 0.5351],
         [0.8302, 0.1239]],

        [[0.0293, 0.5494],
         [0.3825, 0.5463]],

        [[0.4683, 0.0172],
         [0.0214, 0.3664]]])
---
tensor([[0.4342, 0.5351],
        [0.8302, 0.1239]])
---
tensor([[0.0293, 0.5494],
        [0.3825, 0.5463]])
---
tensor([[0.4683, 0.0172],
        [0.0214, 0.3664]])
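A for loop always iterates along dimension 0. To iterate along another dimension, one option is **unbind()**, which splits the tensor into a tuple of slices along the chosen dimension; **tolist()** converts a tensor into nested Python lists when plain Python values are preferred. A minimal sketch:

```python
import torch

torch.manual_seed(17)
rand_unif_tensor = torch.rand(size = (3, 2, 2))

# Iterate over dimension 1: each slice has shape (3, 2)
for sub_tensor in rand_unif_tensor.unbind(dim = 1):
    print("---")
    print(sub_tensor.shape)

# Nested Python lists: 3 lists of 2 lists of 2 floats
nested = rand_unif_tensor.tolist()
print(len(nested), len(nested[0]), len(nested[0][0]))
```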


### Operations on Tensors

Most NumPy array operations also work on tensors, usually both as operators (+, -, *, /) and as equivalent torch functions.

In [17]:
# Define two simple 2D tensors
a = torch.tensor([[1, 2, 3], [1, 2, 3]])
b = torch.tensor([[1, 2, 3], [4, 5, 6]])

In [18]:
# Element-wise addition
c = a + b
print(c)
c = torch.add(a, b)
print(c)

tensor([[2, 4, 6],
        [5, 7, 9]])
tensor([[2, 4, 6],
        [5, 7, 9]])


In [19]:
# Element-wise subtraction
c = a - b
print(c)
c = torch.sub(a, b)
print(c)

tensor([[ 0,  0,  0],
        [-3, -3, -3]])
tensor([[ 0,  0,  0],
        [-3, -3, -3]])


In [20]:
# Element-wise multiplication
c = a * b
print(c)
c = torch.mul(a,b)
print(c)

tensor([[ 1,  4,  9],
        [ 4, 10, 18]])
tensor([[ 1,  4,  9],
        [ 4, 10, 18]])


In [21]:
# Element-wise division
c = a / b
print(c)
c = torch.div(a, b)
print(c)

tensor([[1.0000, 1.0000, 1.0000],
        [0.2500, 0.4000, 0.5000]])
tensor([[1.0000, 1.0000, 1.0000],
        [0.2500, 0.4000, 0.5000]])
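Note that / (and **div()**) always performs true division, so dividing two integer tensors returns floats, as above. For integer division, one can use the // operator or **div()** with rounding_mode="floor"; rounding_mode="trunc" rounds towards zero instead, which only differs for negative results. A short sketch:

```python
import torch

a = torch.tensor([[1, 2, 3], [1, 2, 3]])
b = torch.tensor([[1, 2, 3], [4, 5, 6]])

print(b // a)                                    # floor division, stays int64
print(torch.div(b, a, rounding_mode = "floor"))  # same result

# floor and trunc only differ for negative values
neg = torch.tensor([-7])
print(torch.div(neg, 2, rounding_mode = "floor"))  # tensor([-4])
print(torch.div(neg, 2, rounding_mode = "trunc"))  # tensor([-3])
```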


In [22]:
# Transposition 
c = a.T
print(a)
print(c)

tensor([[1, 2, 3],
        [1, 2, 3]])
tensor([[1, 1],
        [2, 2],
        [3, 3]])


In [23]:
# Transpose and swap dimensions 0 and 1
# (could specify other dimensions if ND tensor)
d = b.transpose(0, 1)
print(b)
print(d)

tensor([[1, 2, 3],
        [4, 5, 6]])
tensor([[1, 4],
        [2, 5],
        [3, 6]])
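For tensors with more than two dimensions, **transpose()** swaps exactly two dimensions, while **permute()** reorders all of them at once. The sketch below uses an arbitrary 2 by 3 by 4 tensor:

```python
import torch

x = torch.zeros(2, 3, 4)

# Swap dimensions 0 and 2 only
print(x.transpose(0, 2).shape)    # torch.Size([4, 3, 2])

# Reorder all dimensions: new order is (dim 2, dim 0, dim 1)
print(x.permute(2, 0, 1).shape)   # torch.Size([4, 2, 3])
```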


The matrix multiplication, not to be confused with the element-wise multiplication, is performed using the **matmul()** function.

The dot product, on the other hand, is computed using the **dot()** function. Unlike NumPy's np.dot(), torch.dot() only accepts two 1D tensors (of the same length and dtype) and returns their inner product; for 2D tensors, use **matmul()** instead.

In [24]:
# Matrix multiplication
e = torch.matmul(a, d)
print(e)

tensor([[14, 32],
        [14, 32]])


In [25]:
# Define two simple 1D tensors
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

# Dot operation, used for computing
# the dot product of two 1D tensors.
f = torch.dot(a, b)
print(f)
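# For two 1D tensors, matmul() also returns their dot product
# (no .T needed: transposing a 1D tensor does nothing and is deprecated)
g = torch.matmul(a, b)
print(g)

tensor(32)
tensor(32)
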
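Python's @ operator is a shorthand for **matmul()**. And, unlike NumPy's np.dot(), **torch.dot()** refuses 2D tensors and raises a RuntimeError. A quick sketch, reusing the 2D tensors from the matrix multiplication example:

```python
import torch

a2d = torch.tensor([[1, 2, 3], [1, 2, 3]])
d2d = torch.tensor([[1, 4], [2, 5], [3, 6]])

# @ is equivalent to torch.matmul()
print(a2d @ d2d)                                       # tensor([[14, 32], [14, 32]])
print(torch.equal(a2d @ d2d, torch.matmul(a2d, d2d)))  # True

# torch.dot() refuses 2D inputs
try:
    torch.dot(a2d, d2d)
except RuntimeError as err:
    print("RuntimeError:", err)
```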


### A quick note on broadcasting

Tensors, just like NumPy arrays, support broadcasting. Two tensors are “broadcastable” if the following rules hold:
- Each tensor has at least one dimension.
- When iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes must either be equal, one of them is 1, or one of them does not exist.

If two tensors x, y are “broadcastable”, the resulting tensor size is calculated as follows:
- If the number of dimensions of x and y are not equal, prepend 1 to the dimensions of the tensor with fewer dimensions to make them equal length.
- Then, for each dimension size, the resulting dimension size is the max of the sizes of x and y along that dimension.
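The two rules above translate almost line by line into a small function. In the sketch below, broadcast_shape is a hypothetical helper written for illustration (not a PyTorch function); it computes the resulting shape, or raises an error when the shapes are not broadcastable, and its result can be cross-checked against PyTorch's own **torch.broadcast_shapes()**.

```python
import torch

def broadcast_shape(shape_x, shape_y):
    """Hypothetical helper: resulting shape of broadcasting shape_x with shape_y."""
    # Prepend 1s to the shorter shape so both have the same length
    ndim = max(len(shape_x), len(shape_y))
    x = (1,) * (ndim - len(shape_x)) + tuple(shape_x)
    y = (1,) * (ndim - len(shape_y)) + tuple(shape_y)
    result = []
    for size_x, size_y in zip(x, y):
        # Sizes must be equal, or one of them must be 1;
        # the result is then the larger size (for non-zero sizes)
        if size_x == size_y or size_y == 1:
            result.append(size_x)
        elif size_x == 1:
            result.append(size_y)
        else:
            raise ValueError(f"Shapes {tuple(shape_x)} and {tuple(shape_y)} are not broadcastable")
    return tuple(result)

print(broadcast_shape((5, 3, 4, 1), (3, 1, 1)))         # (5, 3, 4, 1)
print(torch.broadcast_shapes((5, 3, 4, 1), (3, 1, 1)))  # torch.Size([5, 3, 4, 1])
```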

In [26]:
# Same shapes are always broadcastable
# (i.e. the above rules always hold)
x = torch.ones(5, 7, 3)
y = torch.ones(5, 7, 3)
z = (x+y)
print(z.shape)

torch.Size([5, 7, 3])


In [27]:
# Tensors x and y are not broadcastable: x has one dimension
# of size 0, which is neither equal to 2 nor equal to 1.
x = torch.ones((0,))
y = torch.ones(2,2)
z = (x+y)
print(z.shape)

RuntimeError: The size of tensor a (0) must match the size of tensor b (2) at non-singleton dimension 1

In [28]:
# You can line up trailing dimensions
# Tensors x and y are broadcastable.
# 1st trailing dimension: both have size 1
# 2nd trailing dimension: y has size 1, using size of x and broadcasting
# 3rd trailing dimension: x size is same as y size
# 4th trailing dimension: y dimension doesn't exist, using x only
x = torch.ones(5, 3, 4, 1)
y = torch.ones(3, 1, 1)
z = (x+y)
print(z.shape)

torch.Size([5, 3, 4, 1])


In [29]:
# However, x and y are not broadcastable, 
# because of third trailing dimension (2 != 3).
x = torch.ones(5, 2, 4, 1)
y = torch.ones(3, 1, 1)
z = (x+y)
print(z.shape)

RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 1
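When two shapes are not broadcastable as-is, inserting a size-1 dimension with **unsqueeze()** (or equivalently, indexing with None) is often enough to line them up. A sketch with two arbitrary 1D tensors, where adding a column vector to a row vector builds a table of pairwise sums:

```python
import torch

col = torch.arange(3)   # shape (3,)
row = torch.arange(4)   # shape (4,)

# col + row would fail: trailing sizes 3 and 4 differ and neither is 1.
# unsqueeze(1) turns col into shape (3, 1), which broadcasts with (4,)
outer_sum = col.unsqueeze(1) + row
print(outer_sum.shape)   # torch.Size([3, 4])
print(outer_sum)         # outer_sum[i, j] == i + j
```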

### A quick note on tensor locations

By default, all tensors are created on the CPU, and calculations on them are performed by the CPU. If your machine has been enabled for GPU/CUDA computation, you will have to transfer the tensors to the GPU for faster computation. This can be done in two ways:
- The **.to(device)** method transfers the tensor to the device stored in the device variable, which we defined earlier as CUDA when available and CPU otherwise.
- The **.cpu()** and **.cuda()** methods force the transfer to the CPU or to CUDA, respectively. Note that .cuda() will fail if your machine is not CUDA compatible.

When in doubt, you can check the device attribute of your tensors to find where their computations will occur. In general, two tensors stored on different devices cannot be used in the same computation!

In [30]:
# A tensor will by default be hosted on CPU
a = torch.ones(2, 3)
print(a)
print(a.device)

tensor([[1., 1., 1.],
        [1., 1., 1.]])
cpu


In [31]:
# Best option, use GPU/CUDA if available, else use CPU
b = torch.ones(2, 3).to(device)
print(b)
print(b.device)

tensor([[1., 1., 1.],
        [1., 1., 1.]], device='cuda:0')
cuda:0


In [32]:
# Force tensor to CPU
c = torch.ones(2, 3).cpu()
print(c)
print(c.device)

tensor([[1., 1., 1.],
        [1., 1., 1.]])
cpu


In [33]:
# Force tensor to GPU/CUDA
# (will fail if not CUDA compatible)
d = torch.ones(2, 3).cuda()
print(d)
print(d.device)

tensor([[1., 1., 1.],
        [1., 1., 1.]], device='cuda:0')
cuda:0


In [34]:
# Operations require tensors
# to be on same device
c = torch.ones(2, 3).cpu()
print(c)
print(c.device)
d = torch.ones(2, 3).cuda()
print(d)
print(d.device)
f = c + d
print(f)

tensor([[1., 1., 1.],
        [1., 1., 1.]])
cpu
tensor([[1., 1., 1.],
        [1., 1., 1.]], device='cuda:0')
cuda:0


RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
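Two habits that avoid this error, sketched below: create tensors directly on the target device with the device argument, and move one operand to the other's device (here via its device attribute) before combining them. Calling **.cpu()** (followed by **.numpy()** if needed) is also how results are brought back from the GPU. The sketch redefines the device variable as above, so it runs on either CPU or GPU.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the chosen device (no transfer needed)
d = torch.ones(2, 3, device = device)
c = torch.ones(2, 3)                  # on the CPU by default

# Move c to wherever d lives before combining them
f = c.to(d.device) + d
print(f.device)

# Bring the result back to the CPU, e.g. to convert it to a NumPy array
f_numpy = f.cpu().numpy()
print(f_numpy)
```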

### A few practice activities

In order to practice your PyTorch tensor skills, you may try to manually implement your own versions of typical algorithms we ran on lists/NumPy arrays in previous classes, using the basic operations on PyTorch tensors.

For instance, try writing algorithms for:
- Finding the maximum, minimum, average and median values of a given 1D tensor,
- Transposing a given 2D tensor,
- Sorting a given 1D tensor (bubble sort, insertion sort, selection sort, quick sort, merge sort),
- Generating a 1D tensor containing the first K Fibonacci numbers, for a given K,
- Etc.

Later, you can compare their running times with those of the built-in NumPy/PyTorch implementations, running them on both CPU and CUDA (if available).

In which scenarios is it slower to implement said functions and run them on GPU?
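To get started with the timing comparison, the sketch below times a hypothetical hand-written function (loop_max, a naive Python loop, not part of PyTorch) against the built-in **torch.max()**, using time.perf_counter(). Because CUDA operations run asynchronously, GPU timings should call torch.cuda.synchronize() before reading the clock; the time_it helper below (also hypothetical) does so only when CUDA is available. The tensor size and number of repeats are arbitrary.

```python
import time
import torch

def loop_max(values):
    """Hypothetical manual implementation: maximum of a 1D tensor with a Python loop."""
    best = values[0]
    for v in values[1:]:
        if v > best:
            best = v
    return best

def time_it(fn, *args, repeats = 5):
    """Average wall-clock time of fn(*args) over several runs, in seconds."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()   # wait for pending GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        result = fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()   # make sure the GPU has actually finished
    return (time.perf_counter() - start) / repeats, result

values = torch.rand(1000)
t_loop, max_loop = time_it(loop_max, values)
t_torch, max_torch = time_it(torch.max, values)
print(f"Python loop: {t_loop:.6f} s, torch.max: {t_torch:.6f} s")
print(max_loop.item() == max_torch.item())   # both find the same maximum
```

Moving values to the GPU with .to(device) before calling time_it is one way to extend this comparison to CUDA.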

### What's next?

In the next notebook, we will investigate how to use the PyTorch framework and start implementing Neural Networks more efficiently, beginning with the init, forward propagation and loss of our model.