In [1]:
import torch
import numpy as np
print(f'PyTorch version: {torch.__version__}')
print(f'Is MPS available? {torch.backends.mps.is_available()}')
print(f'Is MPS built? {torch.backends.mps.is_built()}')

PyTorch version: 2.6.0
Is MPS available? True
Is MPS built? True


# Introduction

- *Neural networks are powerful tools for tackling a wide variety of problems, thanks to their ability to learn and extract patterns from data. For instance, they can be used to detect tumors in medical images or convert text into audio.*
- *How do we train these models? By example. Imagine you want to teach a neural network to distinguish between dogs and cats. The first step is to collect a large set of images containing both animals, and label each image as either "dog" or "cat." The model uses these image-label pairs to learn the patterns and features that differentiate the two classes. During training, we repeatedly compare the model’s predictions to the actual labels and adjust its parameters to improve accuracy.*
- *Here’s the catch: computers don’t understand images, text, or audio the way humans do—they only process numbers. So, before training a neural network, we need to convert our data into a numerical format that the computer can work with.*
- *PyTorch makes this easy by providing a data structure called a **tensor**. Tensors let us represent all kinds of data (images, text, audio) as numbers. In PyTorch, a tensor is essentially a **multidimensional array** (think NumPy arrays), but with extra features that make it ideal for deep learning tasks.*

<img src="attachments/tensors-in-pytorch.png" width="1400" height="600" style="display: block; margin: 0 auto;" />

## Create tensors

### Scalars

In [2]:
# Create a scalar
scalar = torch.tensor(7.0)
scalar

tensor(7.)

- *Tensors are n-dimensional arrays. The number of dimensions tells us how many indices we need to access a single value. For example, a scalar is a 0-dimensional tensor—no indices required to get its value.*
- *As we add dimensions, we move from scalars to vectors, matrices, and beyond.*

In [6]:
scalar.ndim

0

- *You can extract the number inside the tensor with the `item()` method.*

In [7]:
# Get a tensor back as an integer or float
print(f'Scalar: {scalar.item()}. Type: {type(scalar.item())}')

Scalar: 7.0. Type: <class 'float'>


### Vectors

- *Vectors are **one-dimensional** tensors. Using the `.ndim` attribute, we can see that the tensor we create below has a single dimension (we need exactly one index to reach an individual value).*

In [8]:
# Create a vector
vector = torch.tensor([7, 7])
print(f'Vector = {vector}')
print(f'Dimensions: {vector.ndim}')

Vector = tensor([7, 7])
Dimensions: 1


- *If we use the `.shape` attribute, we can see the size of each dimension. In this case, our tensor has only one dimension with two elements.*

In [9]:
print(f'Shape: {vector.shape}')

Shape: torch.Size([2])


### Matrices

- *Matrices are **two-dimensional** tensors, where the first dimension represents rows and the second dimension represents columns. The size of each dimension refers to the number of rows and columns in the matrix.*

In [13]:
MATRIX = torch.tensor([[7, 8], [8, 9]])
print(f'Matrix =\n {MATRIX}')
print(f'Dimensions: {MATRIX.ndim}')
print(f'Shape: {MATRIX.shape}')

Matrix =
 tensor([[7, 8],
        [8, 9]])
Dimensions: 2
Shape: torch.Size([2, 2])


### Tensors

- *While vectors and matrices are, at least in PyTorch, 1D and 2D tensors, respectively, we call **n-dimensional** arrays (with $n > 2$) tensors. For example, below we have a three-dimensional tensor.*

In [14]:
# Create a tensor
TENSOR = torch.tensor([[[1, 2, 3],
                        [4, 5, 6],
                        [7, 8, 9]]])
TENSOR

tensor([[[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]])

In [15]:
print(f'Dimensions: {TENSOR.ndim}')
print(f'Shape: {TENSOR.shape}')

Dimensions: 3
Shape: torch.Size([1, 3, 3])
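
- *The link between the number of dimensions and the number of indices can be checked directly. The following is a minimal sketch that rebuilds the same three-dimensional tensor under the illustrative name `t` and indexes it one level at a time; each additional index removes one dimension.*

```python
import torch

# Same values as the 3D tensor above: shape (1, 3, 3)
t = torch.tensor([[[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]]])

block = t[0]        # one index    -> 2D tensor (a matrix)
row = t[0, 1]       # two indices  -> 1D tensor (a vector)
value = t[0, 1, 2]  # three indices -> 0D tensor (a scalar)

print(block.ndim, row.ndim, value.ndim)  # 2 1 0
print(row)                               # tensor([4, 5, 6])
print(value.item())                      # 6
```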


- *To build a tensor from other tensors, use functions such as `torch.stack()` or `torch.vstack()`.*
- *The `torch.tensor()` function builds a tensor from array-like data such as Python scalars, (nested) lists, or NumPy arrays; passing it a list of multi-element tensors raises an error. Here we use `torch.stack()`, which joins tensors of the same shape along a new dimension.*

In [16]:
A = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = torch.tensor([[10, 11, 12], [13, 14, 15], [16, 17, 18]])

TENSOR = torch.stack([A, B])

print(f'Dimension of A = {A.ndim}, Shape of A = {A.shape}')
print(f'Dimension of B = {B.ndim}, Shape of B = {B.shape}')
print(f'Dimension of TENSOR = {TENSOR.ndim}, Shape of TENSOR = {TENSOR.shape}')

Dimension of A = 2, Shape of A = torch.Size([3, 3])
Dimension of B = 2, Shape of B = torch.Size([3, 3])
Dimension of TENSOR = 3, Shape of TENSOR = torch.Size([2, 3, 3])


In [17]:
TENSOR

tensor([[[ 1,  2,  3],
         [ 4,  5,  6],
         [ 7,  8,  9]],

        [[10, 11, 12],
         [13, 14, 15],
         [16, 17, 18]]])
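
- *To see how `torch.stack()` differs from functions that join tensors along an existing dimension, the sketch below compares the resulting shapes. The variable names are illustrative; `A` and `B` hold the same values as above.*

```python
import torch

A = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = torch.tensor([[10, 11, 12], [13, 14, 15], [16, 17, 18]])

stacked = torch.stack([A, B])              # new leading dimension
stacked_dim1 = torch.stack([A, B], dim=1)  # new dimension at position 1
vstacked = torch.vstack([A, B])            # rows of B appended below A
concatenated = torch.cat([A, B], dim=0)    # joins along the existing dimension 0

print(stacked.shape)       # torch.Size([2, 3, 3])
print(stacked_dim1.shape)  # torch.Size([3, 2, 3])
print(vstacked.shape)      # torch.Size([6, 3])
print(torch.equal(vstacked, concatenated))  # True
```

- *`torch.stack()` increases the number of dimensions by one, while `torch.vstack()` and `torch.cat()` keep it unchanged for these 2D inputs.*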

## Tensors in memory

- *Unlike native Python data structures such as lists, tensors are stored in memory as a **single contiguous block**. In a list, each element is a separate Python object with its own place in memory; in a tensor, the raw numeric values sit next to each other. No matter how many dimensions a tensor has, its data is laid out in memory as a one-dimensional sequence. This makes sequential access much faster and makes operations much easier to vectorize.*

- *This is related to PyTorch's architecture. When we create an instance of `torch.Tensor`, we also create an instance of `torch.Storage`. This `torch.Storage` instance is nothing more than a one-dimensional vector that contains the data in memory. The `torch.Tensor` instance, which is what we users interact with, is a view of the `torch.Storage` instance, that is, a view of the data in memory. This makes PyTorch very efficient in data handling, because it allows us to create multiple tensors without having to duplicate our data in memory (i.e., we only need to create a new reference to the object in memory).*

- *The following image illustrates how a list (or another native Python object) is stored in memory compared to a tensor:*
<figure>
    <img src="attachments/tensors-in-memory.png" width="1400" height="600" align="center" style="display: block; margin: 0 auto;"/>
    <figcaption style="text-align: center;">Stevens, Eli. (2020). Python object (boxed) numeric values versus tensor (unboxed array) numeric values. In Stevens, Eli, <i>Deep Learning with PyTorch</i> (p. 44).</figcaption>
</figure>
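
- *The idea that several tensors can be views of the same data can be checked with a small experiment. This is a minimal sketch with illustrative names (`base`, `view`, `copy`): it uses `view()` to create a second tensor over the same memory and `clone()` to create an independent copy, then compares memory addresses with `data_ptr()`.*

```python
import torch

base = torch.arange(6)   # 1D tensor: [0, 1, 2, 3, 4, 5]
view = base.view(2, 3)   # 2D view over the same memory, no data is copied
copy = base.clone()      # independent copy with its own memory

view[0, 0] = 100         # write through the view

print(base[0].item())                      # 100: base sees the change
print(copy[0].item())                      # 0: the copy is unaffected
print(base.data_ptr() == view.data_ptr())  # True: same memory address
print(base.data_ptr() == copy.data_ptr())  # False
```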

### Metadata

- *As we already mentioned, tensors are stored in memory as one-dimensional vectors, even though we may have created them with more than one dimension. To be able to view tensors with the desired dimensions, the `torch.Tensor` object contains several important attributes:*
    - *`size`. Tells us how many elements the tensor has in each dimension.*
    - *`stride`. Tells us how many elements in memory we need to move to get the next element of a dimension. For example, in the image below we can see that to get the next element in the second dimension (columns) we need to move one place in memory, and to get the next element in the first dimension (rows) we need to move three places in memory.*
    - *`offset`. Tells us the index in storage at which the tensor's first element is located.*
    - *`storage`. Refers to the one-dimensional block of memory that holds the tensor's data.*
<figure>
    <img src="attachments/tensor-metadata.png" width="900" height="700" align="center" style="display: block; margin: 0 auto;"/>
    <figcaption style="text-align: center;">Stevens, Eli. (2020). Relationship between tensor offset, size, and stride. In Stevens, Eli, <i>Deep Learning with PyTorch</i> (p. 56).</figcaption>
</figure>
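
- *These attributes can be inspected from Python. The sketch below uses a small illustrative matrix `m` and the `size()`, `stride()`, and `storage_offset()` methods; it also shows that selecting a row or transposing only changes the metadata, not the underlying data.*

```python
import torch

m = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

print(m.size())            # torch.Size([2, 3])
print(m.stride())          # (3, 1): next row = 3 steps, next column = 1 step
print(m.storage_offset())  # 0: the tensor starts at the beginning of storage

second_row = m[1]          # a view that starts at the fourth stored element
print(second_row.storage_offset())  # 3
print(second_row.stride())          # (1,)

transposed = m.t()         # same data, with size and stride swapped
print(transposed.size())            # torch.Size([3, 2])
print(transposed.stride())          # (1, 3)
print(transposed.is_contiguous())   # False
```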

## Random tensors

- *Why do we want to learn to generate random tensors? Because neural network weights are initialized this way. So, when starting the training of our neural network, we must pass it a tensor that represents the initial weights, which we need to be able to generate randomly.*

In [20]:
# Two-dimensional tensor of size (2, 3)
random_tensor = torch.rand(size=(2, 3))
print(f'random_tensor =\n {random_tensor}')

random_tensor =
 tensor([[0.6386, 0.5047, 0.0468],
        [0.9551, 0.8437, 0.5572]])


In [22]:
print(f'Dimensions: {random_tensor.ndim}')
print(f'Size: {random_tensor.shape}')

Dimensions: 2
Size: torch.Size([2, 3])


In [23]:
random_tensor = torch.rand(size=(10, 2, 3))
print(f'random_tensor =\n {random_tensor}')

random_tensor =
 tensor([[[0.2895, 0.8592, 0.3154],
         [0.7859, 0.4792, 0.2665]],

        [[0.1918, 0.3231, 0.9610],
         [0.7503, 0.5626, 0.2700]],

        [[0.2092, 0.8292, 0.7639],
         [0.6841, 0.6064, 0.9029]],

        [[0.4516, 0.9106, 0.9012],
         [0.2175, 0.6269, 0.4178]],

        [[0.9009, 0.8265, 0.8337],
         [0.9213, 0.0376, 0.2380]],

        [[0.5426, 0.7731, 0.0437],
         [0.2255, 0.9555, 0.9879]],

        [[0.6404, 0.5305, 0.2738],
         [0.4166, 0.6042, 0.3748]],

        [[0.7394, 0.3000, 0.6479],
         [0.0416, 0.6007, 0.4241]],

        [[0.8982, 0.4961, 0.6603],
         [0.4529, 0.2528, 0.1696]],

        [[0.5100, 0.7112, 0.9630],
         [0.6973, 0.5117, 0.5903]]])


In [24]:
print(f'Dimensions: {random_tensor.ndim}')
print(f'Size: {random_tensor.shape}')

Dimensions: 3
Size: torch.Size([10, 2, 3])


- *We will represent images as three-dimensional tensors, where the first dimension represents the color channels (RGB), while the second and third dimensions represent the height and width of the image in pixels. For example, if we have a 1024x1024 color image and convert it to a tensor, we get a three-dimensional tensor with size `torch.Size([3, 1024, 1024])`.*

In [25]:
random_image_tensor = torch.rand(size=(3, 1024, 1024))
random_image_tensor

tensor([[[0.4814, 0.0632, 0.9753,  ..., 0.1883, 0.0756, 0.9011],
         [0.3321, 0.1456, 0.4719,  ..., 0.8391, 0.7805, 0.9025],
         [0.8482, 0.3054, 0.4833,  ..., 0.8161, 0.4669, 0.0689],
         ...,
         [0.6489, 0.1121, 0.1003,  ..., 0.7453, 0.2007, 0.9964],
         [0.2765, 0.6053, 0.9880,  ..., 0.5661, 0.5327, 0.3379],
         [0.1350, 0.9351, 0.9738,  ..., 0.9875, 0.6422, 0.8706]],

        [[0.3541, 0.7974, 0.2714,  ..., 0.3374, 0.7037, 0.5539],
         [0.7805, 0.4649, 0.8135,  ..., 0.0389, 0.6538, 0.3782],
         [0.9135, 0.1864, 0.4188,  ..., 0.3658, 0.5919, 0.9976],
         ...,
         [0.8582, 0.4580, 0.7376,  ..., 0.6061, 0.8997, 0.4062],
         [0.8784, 0.2916, 0.4300,  ..., 0.4109, 0.7739, 0.8302],
         [0.6331, 0.1060, 0.0243,  ..., 0.3143, 0.0407, 0.0370]],

        [[0.2824, 0.8100, 0.5089,  ..., 0.4428, 0.2192, 0.8499],
         [0.1997, 0.4274, 0.6938,  ..., 0.2850, 0.7926, 0.9036],
         [0.9079, 0.3626, 0.6193,  ..., 0.9576, 0.1364, 0.

In [27]:
print(f'Dimensions: {random_image_tensor.ndim}')
print(f'Size: {random_image_tensor.shape}')

Dimensions: 3
Size: torch.Size([3, 1024, 1024])


- *When we work with random (or pseudo-random) processes, results change from one run to the next. Even with the same seed, the same random process may generate different results if we run it on different devices (i.e., CPU or GPU). For more information about working with non-deterministic processes in PyTorch, read the following [documentation](https://pytorch.org/docs/stable/notes/randomness.html).*

- *We can define a seed that allows us to reproduce the results of an experiment using the `torch.manual_seed()` function (the same seed is defined for all devices). Now, running the same code multiple times returns the same result.*

In [28]:
torch.manual_seed(42)
random_tensor_1 = torch.rand(size=(1, 5))
random_tensor_2 = torch.rand(size=(1, 5))

print(f'Random tensor 1: {random_tensor_1}')
print(f'Random tensor 2: {random_tensor_2}')

# torch.equal() returns a single bool; the == operator compares element-wise and returns a boolean tensor
print(f'Are the tensors equal? {torch.equal(random_tensor_1, random_tensor_2)}')

Random tensor 1: tensor([[0.8823, 0.9150, 0.3829, 0.9593, 0.3904]])
Random tensor 2: tensor([[0.6009, 0.2566, 0.7936, 0.9408, 0.1332]])
Are the tensors equal? False


- *Note that, despite using the same seed, the tensors we generate are not equal to each other. To get two equal tensors we would need to run `torch.manual_seed()` before creating each of the tensors.*

In [29]:
torch.manual_seed(42) # Set the seed for the first tensor
random_tensor_1 = torch.rand(size=(1, 5))
torch.manual_seed(42) # Set the seed for the second tensor
random_tensor_2 = torch.rand(size=(1, 5))

print(f'Random tensor 1: {random_tensor_1}')
print(f'Random tensor 2: {random_tensor_2}')

# torch.equal() returns a single bool; the == operator compares element-wise and returns a boolean tensor
print(f'Are the tensors equal? {torch.equal(random_tensor_1, random_tensor_2)}')

Random tensor 1: tensor([[0.8823, 0.9150, 0.3829, 0.9593, 0.3904]])
Random tensor 2: tensor([[0.8823, 0.9150, 0.3829, 0.9593, 0.3904]])
Are the tensors equal? True
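
- *An alternative to resetting the global seed is to give each random call its own generator. The sketch below is one possible approach, not the only one: it creates two `torch.Generator` objects (the names `gen_a` and `gen_b` are illustrative) with the same seed and passes them through the `generator` argument of `torch.rand()`.*

```python
import torch

# Two independent generators that start from the same seed
gen_a = torch.Generator().manual_seed(42)
gen_b = torch.Generator().manual_seed(42)

x = torch.rand(size=(1, 5), generator=gen_a)
y = torch.rand(size=(1, 5), generator=gen_b)
z = torch.rand(size=(1, 5), generator=gen_a)  # second draw from gen_a

print(torch.equal(x, y))        # True: same seed, same position in the sequence
print(torch.equal(x, z))        # False: gen_a has already advanced
print((x == y).all().item())    # True: element-wise comparison reduced to one bool
```

- *Because each generator keeps its own state, this leaves the global random state untouched.*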


## Tensor-like objects

In [30]:
range_tensor = torch.arange(start=0, end=1000, step=36)
range_tensor

tensor([  0,  36,  72, 108, 144, 180, 216, 252, 288, 324, 360, 396, 432, 468,
        504, 540, 576, 612, 648, 684, 720, 756, 792, 828, 864, 900, 936, 972])

In [31]:
print(f'Dimension: {range_tensor.ndim}')
print(f'Shape: {range_tensor.shape}')
print(f'Dtype: {range_tensor.dtype}')

Dimension: 1
Shape: torch.Size([28])
Dtype: torch.int64


- *When we say we're going to create or that we created an object that is "tensor-like", we refer to creating a new tensor that has the same dimension, size, and data type as another existing tensor, but with different values.*

- *For example, we can create a new tensor from the `range_tensor` tensor that has the same dimension, size, and data type, but contains different information (only zeros).*

In [32]:
zeros = torch.zeros_like(input=range_tensor)
zeros

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0])

In [33]:
print(f'Dimension: {zeros.ndim}')
print(f'Shape: {zeros.shape}')
print(f'Dtype: {zeros.dtype}')

Dimension: 1
Shape: torch.Size([28])
Dtype: torch.int64
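
- *`torch.zeros_like()` has several siblings that follow the same pattern. The sketch below rebuilds the range tensor under the illustrative name `reference` and shows `torch.ones_like()`, `torch.full_like()`, and `torch.rand_like()`. Since random values in [0, 1) need a floating-point type, the last call overrides the inherited `int64` data type.*

```python
import torch

reference = torch.arange(start=0, end=1000, step=36)  # int64, shape (28,)

ones = torch.ones_like(reference)                  # same shape and dtype, filled with 1
sevens = torch.full_like(reference, fill_value=7)  # same shape and dtype, filled with 7
# rand_like needs a floating-point dtype, so we override the inherited int64
noise = torch.rand_like(reference, dtype=torch.float32)

print(ones.shape, ones.dtype)      # torch.Size([28]) torch.int64
print(sevens[:3])                  # tensor([7, 7, 7])
print(noise.shape, noise.dtype)    # torch.Size([28]) torch.float32
```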


## Tensor attributes

### Data type

- *By default, PyTorch creates floating-point tensors with the `torch.float32` data type (integer data defaults to `torch.int64`). There are many more data types [available](https://pytorch.org/docs/stable/tensors.html#data-types), which differ mainly in the numerical precision they provide. For example, `torch.float64` offers roughly twice as many decimal digits of precision as `torch.float32` (about 15 to 16 digits versus about 7), but uses twice as much memory per element.*

- *We can specify the data type we want for our tensor using the `dtype` argument.*

In [34]:
TENSOR_0 = torch.tensor([3., 2., 1.], dtype=torch.float64)
TENSOR_0

tensor([3., 2., 1.], dtype=torch.float64)

In [35]:
TENSOR_0.dtype

torch.float64
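
- *The defaults and the memory cost of each data type can be inspected directly. The following sketch assumes the global default data type has not been changed (for example with `torch.set_default_dtype()`); the variable names are illustrative.*

```python
import torch

ints = torch.tensor([1, 2, 3])        # integer data
floats = torch.tensor([1., 2., 3.])   # floating-point data
doubles = torch.tensor([1., 2., 3.], dtype=torch.float64)

print(ints.dtype)    # torch.int64
print(floats.dtype)  # torch.float32

# Bytes used per element
print(floats.element_size(), doubles.element_size())  # 4 8

# Machine epsilon: the smaller it is, the higher the precision
print(torch.finfo(torch.float32).eps)  # roughly 1.19e-07
print(torch.finfo(torch.float64).eps)  # roughly 2.22e-16
```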

- *We can modify the data type using the `torch.Tensor.type()` method or the `torch.Tensor.to()` method. The latter method is the most recommended, as it also allows us to modify other tensor attributes.*

In [37]:
# Change tensor's datatype
TENSOR_1 = TENSOR_0.to(dtype=torch.float16) # or use torch.Tensor.type()
TENSOR_1

tensor([3., 2., 1.], dtype=torch.float16)

- *When performing mathematical operations between two tensors with different data types, PyTorch uses a **promotion rule** to define the data type of the resulting tensor (it will have the data type with higher precision). For example, if we multiply a tensor with data type `torch.float32` and another with data type `torch.float64`, the tensor we get after the operation is `torch.float64`.*

- *This is something important to keep in mind because:*
    1. *We can end up consuming more memory than expected if the operations we perform are constantly increasing the numerical precision of our tensors.*
    2. *Operations between tensors with different data types are usually less performant because PyTorch also has to handle data conversion.*
    3. *Some devices have limited support for some data types (e.g., the `mps` backend doesn't support `torch.float64` tensors, and `torch.float64` operations are often much slower than `torch.float32` ones on consumer GPUs).*

In [38]:
TENSOR_0 * TENSOR_1

tensor([9., 4., 1.], dtype=torch.float64)
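
- *We can ask PyTorch which data type an operation would produce before running it. The sketch below uses `torch.result_type()` and `torch.promote_types()` with illustrative tensors, and then shows an explicit cast as one way to avoid an unwanted promotion.*

```python
import torch

half = torch.tensor([3., 2., 1.], dtype=torch.float16)
single = torch.tensor([3., 2., 1.], dtype=torch.float32)
double = torch.tensor([3., 2., 1.], dtype=torch.float64)
integers = torch.tensor([3, 2, 1])  # int64

# Predict the result type without performing the operation
print(torch.result_type(half, double))                  # torch.float64
print(torch.promote_types(torch.int64, torch.float32))  # torch.float32

# The actual operations follow the same rule
print((integers * single).dtype)  # torch.float32
print((half + single).dtype)      # torch.float32

# Casting one operand explicitly keeps the result in float32
result = single * double.to(torch.float32)
print(result.dtype)               # torch.float32
```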

### Device

- *We can also define the **device** where a tensor is stored and where operations on it run. There are several types of devices we can use, but the most commonly used are:*
    1. *`cpu`: the tensor lives in our computer's or virtual machine's RAM and operations run on the CPU.*
    2. *`cuda`: the tensor lives in GPU memory and operations run on an NVIDIA GPU. If we have more than one GPU available, we can specify which one we want to use, e.g., `cuda:0` (first GPU), `cuda:1` (second GPU), `cuda:2` (third GPU), etc.*
    3. *`mps`: allows us to use the integrated GPU in computers with Apple Silicon (similar to how `cuda` allows us to use NVIDIA GPUs). For more information about using MPS with PyTorch, read the following [documentation](https://pytorch.org/docs/stable/notes/mps.html).*

    - *Why is it important to define the device?*
    1. *If we have two tensors that are stored on two different devices, we cannot perform operations between them.*
    2. *Moving tensors from one device to another can be computationally expensive.*
    3. *The data type we have available to use depends on the device we use.*

In [39]:
TENSOR.device

device(type='cpu')

- *We can choose the device where we want to store and execute the tensor at the time of creation:*

In [41]:
# Define a torch.device
torch_device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Create a new tensor with specified device
TENSOR_2 = torch.rand(size=(3, 1024, 1024), device=torch_device)
print(f'Tensor: {TENSOR_2}')
print(f'Device: {TENSOR_2.device}')

Tensor: tensor([[[0.2137, 0.1287, 0.7578,  ..., 0.5280, 0.1456, 0.7621],
         [0.8253, 0.4947, 0.8110,  ..., 0.7116, 0.7734, 0.5908],
         [0.3095, 0.3278, 0.5857,  ..., 0.7197, 0.9459, 0.6801],
         ...,
         [0.8766, 0.0069, 0.1601,  ..., 0.7396, 0.1457, 0.8224],
         [0.9497, 0.4182, 0.1513,  ..., 0.2655, 0.8717, 0.3968],
         [0.3618, 0.1220, 0.6681,  ..., 0.9665, 0.2104, 0.3793]],

        [[0.5913, 0.7940, 0.0250,  ..., 0.3615, 0.9619, 0.2593],
         [0.3742, 0.9553, 0.4215,  ..., 0.9176, 0.5058, 0.0018],
         [0.3236, 0.1479, 0.7749,  ..., 0.4080, 0.7298, 0.5154],
         ...,
         [0.1911, 0.9423, 0.8503,  ..., 0.5304, 0.1596, 0.3789],
         [0.2172, 0.7798, 0.9750,  ..., 0.1987, 0.9614, 0.8083],
         [0.8856, 0.7259, 0.3923,  ..., 0.7049, 0.5056, 0.8305]],

        [[0.7707, 0.6171, 0.6775,  ..., 0.1210, 0.4782, 0.9287],
         [0.0903, 0.1733, 0.3913,  ..., 0.9720, 0.0357, 0.6195],
         [0.6506, 0.2457, 0.7716,  ..., 0.9107, 0.

- *We can also use the `torch.Tensor.to()` method to change the data type or device of a tensor. It returns a new tensor rather than modifying the original, so we need to reassign the result to the variable:*

In [42]:
TENSOR_2 = TENSOR_2.to(device='mps')
print(f'Device: {TENSOR_2.device}')

Device: mps:0
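
- *Hardcoding a device name fails on machines that lack that device. One possible way to keep the code portable is sketched below; `pick_device()` is a hypothetical helper written for this example, not part of PyTorch. It prefers `cuda`, then `mps`, and falls back to `cpu`, and it shows that tensors must be on the same device before we operate on them.*

```python
import torch

def pick_device() -> torch.device:
    """Hypothetical helper: prefer CUDA, then MPS, then fall back to the CPU."""
    if torch.cuda.is_available():
        return torch.device('cuda')
    if torch.backends.mps.is_available():
        return torch.device('mps')
    return torch.device('cpu')

device = pick_device()

a = torch.rand(size=(2, 3), device=device)  # created directly on the chosen device
b = torch.rand(size=(2, 3))                 # created on the CPU by default

# Move b to the same device as a before operating on both tensors
b = b.to(device)
total = a + b
print(f'Device: {total.device}')

# .cpu() brings the result back to RAM, e.g. before converting it to NumPy
total_on_cpu = total.cpu()
print(f'Device: {total_on_cpu.device}')  # Device: cpu
```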


### Gradients

- *Another important argument for tensors is `requires_grad`. When we set `requires_grad=True` on a tensor, we ask PyTorch to record the operations performed on it so that gradients can be computed later. This is what makes training a Deep Learning model possible: backpropagation uses this record to calculate the gradients of the loss function with respect to the model weights.*

- *In the following example we create a new variable, `y`, equal to the sum of the elements in `X`. To calculate the gradients of `y` with respect to the elements of `X` (i.e., how the value of `y` changes with an infinitesimal change in any element of `X`), we call the `.backward()` method on `y`. The gradients are not stored in `y`, though: they end up in `X.grad`, which makes sense because they tell us how `y` changes when an element of `X` changes.*

In [43]:
# Creates a tensor and operates on it
X = torch.tensor([3., 2., 1.], requires_grad=True, device='mps')

# Sum all elements of the tensor and compute the gradients
y = X.sum()
y.backward()
print(f'Gradients after automatic differentiation: {X.grad}')

Gradients after automatic differentiation: tensor([1., 1., 1.], device='mps:0')
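
- *With a plain sum every gradient equals 1, which hides what is being computed. The sketch below uses a slightly richer function, the sum of squares, whose gradient with respect to each element is twice that element. It runs on the CPU and uses the illustrative names `x`, `y`, and `z`; it also shows that calling `.backward()` again adds to the existing gradients instead of replacing them.*

```python
import torch

x = torch.tensor([3., 2., 1.], requires_grad=True)

# y = x0^2 + x1^2 + x2^2, so dy/dxi = 2 * xi
y = (x ** 2).sum()
y.backward()
print(x.grad)  # tensor([6., 4., 2.])

# A second backward pass accumulates into x.grad
z = (x ** 2).sum()
z.backward()
print(x.grad)  # tensor([12., 8., 4.])

# Reset the gradients in place before the next computation
x.grad.zero_()
print(x.grad)  # tensor([0., 0., 0.])
```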


## PyTorch and NumPy

- *We can create tensors from NumPy arrays, and vice versa. This is very useful because we can use NumPy functionalities to manipulate our data, and then use PyTorch tensors to train our Deep Learning models.*

- *We can create a tensor from a NumPy array using the `torch.from_numpy(ndarray)` function. If we want to convert a tensor to a NumPy array, we have to use the `torch.Tensor.numpy()` method.*

In [48]:
array = np.arange(1., 10.).reshape((3, 3))
print(f'NumPy array: \n{array}')

NumPy array: 
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]


In [49]:
tensor = torch.from_numpy(array)
print(f'PyTorch tensor: \n{tensor}')

PyTorch tensor: 
tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]], dtype=torch.float64)


- *Something very important to keep in mind is that the default data type for NumPy is different from PyTorch's (NumPy uses `float64` while PyTorch uses `float32`). The tensor we create from the NumPy array inherits the array's data type, which could cause problems when trying to operate between tensors (if the data types don't match).*

In [50]:
print(f"Array's datatype: {array.dtype}")
print(f"Tensor's datatype: {tensor.dtype}")

Array's datatype: float64
Tensor's datatype: torch.float64


In [51]:
tensor = tensor.to(dtype=torch.float32)
print(f"Tensor's datatype: {tensor.dtype}")

Tensor's datatype: torch.float32
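
- *One more detail worth checking when moving between the two libraries: a tensor created with `torch.from_numpy()` shares memory with the source array, while a data type conversion produces a separate copy. The sketch below illustrates this on the CPU with the illustrative names `shared`, `converted`, and `back`.*

```python
import numpy as np
import torch

array = np.arange(1., 10.).reshape((3, 3))
shared = torch.from_numpy(array)   # shares memory with the NumPy array

array[0, 0] = 100.                 # modify the array in place
print(shared[0, 0].item())         # 100.0: the tensor sees the change

# Changing the dtype creates a new tensor with its own memory
converted = shared.to(dtype=torch.float32)
array[0, 0] = -1.
print(shared[0, 0].item())         # -1.0
print(converted[0, 0].item())      # 100.0: the copy is unaffected

# Back to NumPy: the array inherits the tensor's data type
back = converted.numpy()
print(back.dtype)                  # float32
```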
