<a href="https://colab.research.google.com/github/mouha-ndour/PyTorch-fundamentals/blob/main/PyTorch_in_One_Hour.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**I. WHAT IS PYTORCH?**

- An open-source, Python-based deep learning library.
- PyTorch has been the most widely used deep learning library in research papers since 2019, by a wide margin.
- One of the reasons PyTorch is so popular is its user-friendly interface and efficiency.

**1. The three core components of PyTorch**

- Tensor library: extends the array-oriented programming of NumPy with accelerated computation on GPUs, providing a seamless switch between CPUs and GPUs.
- Automatic differentiation engine: also known as autograd, it enables the automatic computation of gradients for tensor operations, simplifying backpropagation and model optimization.
- Deep learning library: modular building blocks (such as layers, loss functions, and optimizers) for designing and training deep learning models.

In [1]:
# Installing PyTorch

# PyTorch is available in two flavors: a leaner version that only supports
# CPU computing and a version that supports both CPU and GPU computing.

!pip install torch
import torch
torch.__version__



'2.9.0+cu126'
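
The `+cu126` suffix in the version string suggests a CUDA-enabled build. As a minimal sketch of the CPU/GPU switch that the tensor library provides (the variable names here are illustrative and not part of the original notebook), the snippet below picks a GPU when PyTorch can see one and falls back to the CPU otherwise:

```python
import torch

# Pick a GPU if PyTorch can see one; otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create two small tensors and move them to the chosen device
a = torch.tensor([1.0, 2.0, 3.0]).to(device)
b = torch.tensor([4.0, 5.0, 6.0]).to(device)

# The same arithmetic runs unchanged on either device
total = a + b
print(total.device.type)
print(total.cpu())
```

Tensors that take part in the same operation need to live on the same device, which is why both inputs are moved before the addition.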

**II. Understanding tensors**

Tensors represent a mathematical concept that generalizes vectors and matrices to potentially higher dimensions. They can be characterized by their order (or rank), which gives the number of dimensions: a scalar is a tensor of rank 0, a vector of rank 1, and a matrix of rank 2.

In [2]:
# Scalars, vectors, matrices, and tensors

# We can create objects of PyTorch's Tensor class using the torch.tensor function as follows:

import torch

# Create a 0D tensor (scalar) from a Python integer
tensor0d = torch.tensor(1)

# Create a 1D tensor (vector) from a Python list
tensor1d = torch.tensor([1, 2, 3])

# Create a 2D tensor (matrix) from a nested Python list
tensor2d = torch.tensor([[1, 2], [3, 4]])

# Create a 3D tensor from a nested Python list
tensor3d = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
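
To connect these objects back to the notion of order (rank), the following sketch inspects each tensor's `.ndim` and `.shape` attributes. It recreates the four tensors so that it runs on its own; the loop and labels are illustrative additions.

```python
import torch

tensor0d = torch.tensor(1)
tensor1d = torch.tensor([1, 2, 3])
tensor2d = torch.tensor([[1, 2], [3, 4]])
tensor3d = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# .ndim gives the number of dimensions (the order/rank);
# .shape gives the size along each dimension
tensors = [("0D", tensor0d), ("1D", tensor1d), ("2D", tensor2d), ("3D", tensor3d)]
for name, t in tensors:
    print(name, "ndim:", t.ndim, "shape:", tuple(t.shape))
```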


In [3]:
# Tensor data types

tensor1d=torch.tensor([1,2,3])
print(tensor1d.dtype)

# If we create tensors from Python floats, PyTorch creates tensors with a 32-bit precision
# by default, as we can see below.
floatvec=torch.tensor([1.0,2.0,3.0])
print(floatvec.dtype)

torch.int64
torch.float32


**III. Tensor Data Types**

PyTorch tensors can hold data of various types. The choice of data type is crucial as it affects memory consumption, computational speed, and precision. PyTorch supports several data types, including:

*   `torch.float32` or `torch.float`: Standard floating-point numbers.
*   `torch.float64` or `torch.double`: Double-precision floating-point numbers.
*   `torch.int8`: Signed 8-bit integers.
*   `torch.int16` or `torch.short`: Signed 16-bit integers.
*   `torch.int32` or `torch.int`: Signed 32-bit integers.
*   `torch.int64` or `torch.long`: Signed 64-bit integers.
*   `torch.bool`: Boolean values (True/False).

By default, PyTorch operations often use `torch.float32` for floating-point tensors and `torch.int64` for integer tensors.

You can check the data type of a tensor using the `.dtype` attribute and change it using the `.to()` method or `.type()` method.

In [4]:
import torch

# Create a tensor with default data type (float32)
float_tensor = torch.tensor([1.0, 2.0, 3.0])
print(f"Default float tensor: {float_tensor} | Data type: {float_tensor.dtype}")

# Create an integer tensor with default data type (int64)
int_tensor = torch.tensor([1, 2, 3])
print(f"Default integer tensor: {int_tensor} | Data type: {int_tensor.dtype}")

# Specify a data type during creation
int16_tensor = torch.tensor([1, 2, 3], dtype=torch.int16)
print(f"Int16 tensor: {int16_tensor} | Data type: {int16_tensor.dtype}")

# Change the data type using .to()
float64_tensor = float_tensor.to(torch.float64)
print(f"Float64 tensor: {float64_tensor} | Data type: {float64_tensor.dtype}")

# Change the data type using .type()
int_from_float = float_tensor.type(torch.int32)
print(f"Int32 tensor from float: {int_from_float} | Data type: {int_from_float.dtype}")

# Create a boolean tensor
bool_tensor = torch.tensor([True, False, True])
print(f"Boolean tensor: {bool_tensor} | Data type: {bool_tensor.dtype}")

Default float tensor: tensor([1., 2., 3.]) | Data type: torch.float32
Default integer tensor: tensor([1, 2, 3]) | Data type: torch.int64
Int16 tensor: tensor([1, 2, 3], dtype=torch.int16) | Data type: torch.int16
Float64 tensor: tensor([1., 2., 3.], dtype=torch.float64) | Data type: torch.float64
Int32 tensor from float: tensor([1, 2, 3], dtype=torch.int32) | Data type: torch.int32
Boolean tensor: tensor([ True, False,  True]) | Data type: torch.bool


In [5]:
# Common PyTorch tensor operations

tensor2d=torch.tensor([[1,2,3], [4,5,6]])
tensor2d



tensor([[1, 2, 3],
        [4, 5, 6]])

In [6]:
# The .shape attribute allows us to access the shape of a tensor
print(tensor2d.shape)

# .view returns the same data arranged in a new shape; it is the more
# common command for reshaping tensors in PyTorch
tensor2d.view(3,2)

torch.Size([2, 3])


tensor([[1, 2],
        [3, 4],
        [5, 6]])
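
A few other operations come up constantly alongside `.view`. The sketch below is an illustrative addition that reuses the same 2×3 tensor to show `.reshape`, the transpose `.T`, and matrix multiplication with `@`:

```python
import torch

tensor2d = torch.tensor([[1, 2, 3], [4, 5, 6]])

# .reshape behaves like .view but also handles non-contiguous memory layouts
reshaped = tensor2d.reshape(3, 2)

# .T transposes a 2D tensor (flips it across its diagonal)
transposed = tensor2d.T

# @ (or .matmul) multiplies two matrices: (2x3) @ (3x2) -> (2x2)
product = tensor2d @ tensor2d.T

print(reshaped)
print(transposed)
print(product)
```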

**IV. SEEING MODELS AS COMPUTATION GRAPHS**


In the previous section, we covered one of the three major components of PyTorch, namely, its tensor library. Next in line is PyTorch's automatic differentiation engine, also known as autograd. Before we dive deeper into computing gradients in the next section, let's define the concept of a computation graph.

A computation graph is a directed graph that allows us to express and visualize mathematical expressions. In the context of deep learning, a computation graph lays out the sequence of calculations needed to compute the output of a neural network.

Let's look at a concrete example to illustrate the concept of a computation graph: a simple logistic regression classifier (which can be seen as a single-layer neural network).

In [7]:
import torch.nn.functional as F

y=torch.tensor([1.0])
x1=torch.tensor([1.1])
w1=torch.tensor([2.2])
b=torch.tensor([0.0])

z=x1*w1 + b # net input
a=torch.sigmoid(z) # activation & output
loss=F.binary_cross_entropy(a, y)
print(loss)



tensor(0.0852)
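
To make each node of this small graph explicit, the same forward pass can be retraced with plain Python floats. This is an illustrative sketch using only the standard `math` module, with the values copied from the cell above:

```python
import math

x1, w1, b, y = 1.1, 2.2, 0.0, 1.0

z = x1 * w1 + b                  # net input
a = 1.0 / (1.0 + math.exp(-z))   # sigmoid activation
# Binary cross-entropy for a single example
loss = -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))

print(round(loss, 4))
```

The result agrees with the value PyTorch printed, up to rounding.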


**V. AUTOMATIC DIFFERENTIATION MADE EASY**


- `requires_grad`: In the previous section, we introduced the concept of computation graphs. If we carry out computations in PyTorch, it builds such a graph internally by default if one of its terminal nodes has the `requires_grad` attribute set to `True`. This is useful if we want to compute gradients. Gradients are required when training neural networks via the popular backpropagation algorithm, which can be thought of as an implementation of the chain rule from calculus for neural networks.

- `retain_graph`: By default, PyTorch destroys the computation graph after calculating the gradients to free memory. However, since we are going to reuse this computation graph shortly, we set `retain_graph=True` so that it stays in memory.

- `.backward` and `.grad`: Instead of calling `grad` once per parameter, we can call `.backward` on the loss, and PyTorch will compute the gradients of all the leaf nodes in the graph, which are then stored in the tensors’ `.grad` attributes:

In [8]:
import torch.nn.functional as F
from torch.autograd import grad

y=torch.tensor([1.0])
x1=torch.tensor([1.1])
w1=torch.tensor([2.2], requires_grad=True)
b=torch.tensor([0.0], requires_grad=True)

z=x1*w1 + b
a=torch.sigmoid(z)

loss=F.binary_cross_entropy(a, y)

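# grad returns a tuple with one gradient tensor per input;
# retain_graph=True keeps the graph alive for further gradient calls
grad_L_w1 = grad(loss, w1, retain_graph=True)
grad_L_b = grad(loss, b, retain_graph=True)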



In [9]:
loss.backward()

print(w1.grad)
print(b.grad)

tensor([-0.0898])
tensor([-0.0817])
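
These numbers can be checked against the chain rule by hand. For a sigmoid output combined with binary cross-entropy, the derivative of the loss with respect to the net input simplifies to `a - y`, so the gradients are `(a - y) * x1` for the weight and `a - y` for the bias. The sketch below is an illustrative addition (the `manual_*` names are not from the original notebook) that compares this against autograd:

```python
import torch
import torch.nn.functional as F

y = torch.tensor([1.0])
x1 = torch.tensor([1.1])
w1 = torch.tensor([2.2], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

a = torch.sigmoid(x1 * w1 + b)
loss = F.binary_cross_entropy(a, y)
loss.backward()

# Chain rule by hand: dL/dz = a - y, so dL/dw1 = (a - y) * x1 and dL/db = a - y
with torch.no_grad():
    manual_w1 = (a - y) * x1
    manual_b = a - y

print(torch.allclose(w1.grad, manual_w1, atol=1e-6))
print(torch.allclose(b.grad, manual_b, atol=1e-6))
```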


**VI. IMPLEMENTING MULTILAYER NEURAL NETWORKS**



In [10]:
class NeuralNetwork(torch.nn.Module):
  def __init__(self, num_inputs, num_outputs):
    super().__init__()

    self.layers = torch.nn.Sequential(
      # 1st hidden layer
      torch.nn.Linear(num_inputs, 30),
      torch.nn.ReLU(),

      # 2nd hidden layer
      torch.nn.Linear(30, 20),
      torch.nn.ReLU(),

      # output layer
      torch.nn.Linear(20, num_outputs),
    )

  def forward(self, x):
    logits = self.layers(x)
    return logits

In [11]:
# We can then instantiate a new neural network object as follows:
model = NeuralNetwork(50, 3)

# Let's see the summary of its structure
print(model)

NeuralNetwork(
  (layers): Sequential(
    (0): Linear(in_features=50, out_features=30, bias=True)
    (1): ReLU()
    (2): Linear(in_features=30, out_features=20, bias=True)
    (3): ReLU()
    (4): Linear(in_features=20, out_features=3, bias=True)
  )
)


In [12]:
# Now let's check the total number of trainable parameters of this model
num_params = sum(
    p.numel() for p in model.parameters() if p.requires_grad
)

print("Total number of trainable model parameters:", num_params)

Total number of trainable model parameters: 2213
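
The total can be reproduced by hand: each `Linear` layer holds one weight per input-output pair plus one bias per output. A small sketch of that arithmetic, with the layer sizes read off the printed model summary:

```python
# (inputs, outputs) for each Linear layer in NeuralNetwork(50, 3)
layer_sizes = [(50, 30), (30, 20), (20, 3)]

# Each Linear layer has inputs * outputs weights plus one bias per output
per_layer = [n_in * n_out + n_out for n_in, n_out in layer_sizes]

print(per_layer)       # [1530, 620, 63]
print(sum(per_layer))  # 2213
```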


In [13]:
# Based on the print(model) call we executed above, we can see that the first
# Linear layer is at index position 0 in the layers attribute. We can access the corresponding weight parameter matrix as follows:

print(model.layers[0].weight)
print(model.layers[0].weight.shape)

Parameter containing:
tensor([[-0.1391, -0.1022,  0.0838,  ...,  0.1101, -0.0009,  0.0779],
        [ 0.0293, -0.0160,  0.0398,  ...,  0.0220,  0.0436, -0.1319],
        [-0.1135,  0.1195, -0.0367,  ..., -0.0077, -0.0801, -0.0291],
        ...,
        [ 0.0975, -0.0606, -0.1159,  ..., -0.0106,  0.0931, -0.0603],
        [ 0.0867, -0.1358, -0.0711,  ...,  0.1199,  0.1392,  0.1003],
        [-0.0667,  0.0932,  0.0806,  ...,  0.1332,  0.0578,  0.0098]],
       requires_grad=True)
torch.Size([30, 50])


In [14]:
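# Layer weights are initialized with small random numbers; seeding PyTorch's
# random number generator makes the initialization reproducible
torch.manual_seed(123)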

model = NeuralNetwork(50, 3)
print(model.layers[0].weight)

Parameter containing:
tensor([[-0.0577,  0.0047, -0.0702,  ...,  0.0222,  0.1260,  0.0865],
        [ 0.0502,  0.0307,  0.0333,  ...,  0.0951,  0.1134, -0.0297],
        [ 0.1077, -0.1108,  0.0122,  ...,  0.0108, -0.1049, -0.1063],
        ...,
        [-0.0787,  0.1259,  0.0803,  ...,  0.1218,  0.1303, -0.1351],
        [ 0.1359,  0.0175, -0.0673,  ...,  0.0674,  0.0676,  0.1058],
        [ 0.0790,  0.1343, -0.0293,  ...,  0.0344, -0.0971, -0.0509]],
       requires_grad=True)


In [15]:
torch.manual_seed(123)

# A single random example with 50 features
X = torch.rand((1, 50))
out=model(X)
print(out)

tensor([[-0.1262,  0.1080, -0.1792]], grad_fn=<AddmmBackward0>)


In [16]:
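# For inference, torch.no_grad() skips building the computation graph,
# which saves memory and computation
with torch.no_grad():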
  out = model(X)
print(out)

tensor([[-0.1262,  0.1080, -0.1792]])


In [17]:
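# The model returns logits; softmax converts them into
# class-membership probabilities that sum to 1 in each row
with torch.no_grad():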
  out = torch.softmax(model(X), dim=1)
print(out)

tensor([[0.3113, 0.3934, 0.2952]])


**VII. SETTING UP EFFICIENT DATA LOADERS**

Let’s start by creating a simple toy dataset of five training examples with two features each. Accompanying the training examples, we also create a tensor containing the corresponding class labels: three examples belong to class 0, and two examples belong to class 1. In addition, we also make a test set consisting of two entries. The code to create this dataset is shown below.

In [18]:

X_train = torch.tensor([
    [-1.2, 3.1],
    [-0.9, 2.9],
    [-0.5, 2.6],
    [2.3, -1.1],
    [2.7, -1.5]
])

y_train = torch.tensor([0, 0, 0, 1, 1])

X_test = torch.tensor([
    [-0.8, 2.8],
    [2.6, -1.6],
])

y_test = torch.tensor([0, 1])

Next, we create a custom dataset class, ToyDataset, by subclassing PyTorch's Dataset parent class, as shown below.

In [19]:
from torch.utils.data import Dataset

class ToyDataset(Dataset):
  def __init__(self, X, y):
    self.features = X
    self.labels = y

  def __getitem__(self, index):
    one_x = self.features[index]
    one_y = self.labels[index]
    return one_x, one_y

  def __len__(self):
    return self.labels.shape[0]

train_ds = ToyDataset(X_train, y_train)
test_ds = ToyDataset(X_test, y_test)

The purpose of this custom ToyDataset class is to instantiate a PyTorch DataLoader. But before we get to this step, let’s briefly go over the general structure of the ToyDataset code.

In PyTorch, the three main components of a custom Dataset class are the `__init__` constructor, the `__getitem__` method, and the `__len__` method, as shown in the ToyDataset code above.

In the `__init__` method, we set up attributes that we can access later in the `__getitem__` and `__len__` methods. This could be file paths, file objects, database connectors, and so on. Since we created a tensor dataset that sits in memory, we are simply assigning X and y to these attributes, which are placeholders for our tensor objects.

In the `__getitem__` method, we define instructions for returning exactly one item from the dataset via an index. This means the features and the class label corresponding to a single training example or test instance. (The data loader will provide this index, which we will cover shortly.)

Finally, the `__len__` method contains instructions for retrieving the length of the dataset. Here, we use the `.shape` attribute of the label tensor to return the number of examples, which equals the number of rows in the feature tensor. In the case of the training dataset, we have five rows, so `len(train_ds)` returns 5.
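
As a quick check of how these methods are invoked, the sketch below rebuilds the toy training set so that it runs on its own, then calls `len()` and indexes into the dataset. The chosen index 3 is arbitrary.

```python
import torch
from torch.utils.data import Dataset

class ToyDataset(Dataset):
    def __init__(self, X, y):
        self.features = X
        self.labels = y

    def __getitem__(self, index):
        return self.features[index], self.labels[index]

    def __len__(self):
        return self.labels.shape[0]

X_train = torch.tensor([[-1.2, 3.1], [-0.9, 2.9], [-0.5, 2.6], [2.3, -1.1], [2.7, -1.5]])
y_train = torch.tensor([0, 0, 0, 1, 1])
train_ds = ToyDataset(X_train, y_train)

print(len(train_ds))           # calls __len__
features, label = train_ds[3]  # calls __getitem__(3)
print(features, label)
```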

In [20]:
from torch.utils.data import DataLoader

torch.manual_seed(123)

train_loader = DataLoader(
    dataset=train_ds,
    batch_size=2,
    shuffle=True,
    num_workers=0
)

In [21]:
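from torch.utils.data import DataLoader

torch.manual_seed(123)

# The test set does not need shuffling: the order of the examples
# has no effect on evaluation metrics
test_loader = DataLoader(
    dataset=test_ds,
    batch_size=2,
    shuffle=False,
    num_workers=0
)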

In [22]:
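# With five examples and batch_size=2, the last batch of each epoch would
# hold a single example; drop_last=True discards that incomplete batch
train_loader = DataLoader(
    dataset=train_ds,
    batch_size=2,
    shuffle=True,
    num_workers=0,
    drop_last=True
)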

In [23]:
# When iterating over the training loader, we can see that the last (incomplete) batch is omitted

for idx, (x,y) in enumerate(train_loader):
  print(f"Batch {idx}:", x,y)

Batch 0: tensor([[ 2.3000, -1.1000],
        [-0.9000,  2.9000]]) tensor([1, 0])
Batch 1: tensor([[-1.2000,  3.1000],
        [-0.5000,  2.6000]]) tensor([0, 0])
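
To see the effect of `drop_last` in isolation, the following sketch compares batch sizes with and without it. It is an illustrative addition: it uses PyTorch's built-in `TensorDataset` in place of ToyDataset, made-up feature values, and no shuffling so that the result is deterministic.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Five examples with two features each, mirroring the size of the toy training set
X = torch.arange(10, dtype=torch.float32).reshape(5, 2)
y = torch.tensor([0, 0, 0, 1, 1])
ds = TensorDataset(X, y)

keep_loader = DataLoader(ds, batch_size=2, shuffle=False, drop_last=False)
drop_loader = DataLoader(ds, batch_size=2, shuffle=False, drop_last=True)

keep_sizes = [len(labels) for _, labels in keep_loader]
drop_sizes = [len(labels) for _, labels in drop_loader]

print(keep_sizes)  # [2, 2, 1]
print(drop_sizes)  # [2, 2]
```

A much smaller final batch can disturb training because its gradient is computed from fewer examples, which is the usual motivation for dropping it.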


**VIII. A typical training loop**

So far, we’ve discussed all the requirements for training neural networks: PyTorch’s tensor library, autograd, the Module API, and efficient data loaders. Let’s now combine all these things and train a neural network on the toy dataset from the previous section. The training code is shown below.

In [24]:
import torch.nn.functional as F

torch.manual_seed(123)
model=NeuralNetwork(num_inputs=2, num_outputs=2)
optimizer=torch.optim.SGD(model.parameters(), lr=0.5)

num_epochs=3

for epoch in range(num_epochs):
  model.train()
  for batch_idx, (features, labels) in enumerate(train_loader):

    logits=model(features)

    loss=F.cross_entropy(logits, labels)  # Loss function

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    ### LOGGING
    print(f"Epoch: {epoch}/{num_epochs:03d}"
          f" | Batch {batch_idx:03d}/{len(train_loader):03d}"
          f" | Train/Val Loss: {loss:.2f}"
    )

  model.eval()
  # Optional model evaluation (runs once per epoch, after all batches)

Epoch: 0/003 | Batch 000/002 | Train/Val Loss: 0.75
Epoch: 0/003 | Batch 001/002 | Train/Val Loss: 0.65
Epoch: 1/003 | Batch 000/002 | Train/Val Loss: 0.44
Epoch: 1/003 | Batch 001/002 | Train/Val Loss: 0.13
Epoch: 2/003 | Batch 000/002 | Train/Val Loss: 0.03
Epoch: 2/003 | Batch 001/002 | Train/Val Loss: 0.00


As we can see, the loss drops to nearly zero (0.00 after rounding) after 3 epochs, a sign that the model converged on the training set. However, before we evaluate the model’s predictions, let’s go over some of the details of the preceding code.

First, note that we initialized a model with two inputs and two outputs. That’s because the toy dataset from the previous section has two input features and two class labels to predict. We used a stochastic gradient descent (SGD) optimizer with a learning rate (lr) of 0.5. The learning rate is a hyperparameter, meaning it’s a tunable setting that we have to experiment with based on observing the loss. Ideally, we want to choose a learning rate such that the loss converges after a certain number of epochs – the number of epochs is another hyperparameter to choose.

In practice, we often use a third dataset, a so-called validation dataset, to find the optimal hyperparameter settings. A validation dataset is similar to a test set. However, while we only want to use a test set precisely once to avoid biasing the evaluation, we usually use the validation set multiple times to tweak the model settings.

We also introduced new settings called model.train() and model.eval(). As these names imply, these settings are used to put the model into a training and an evaluation mode. This is necessary for components that behave differently during training and inference, such as dropout or batch normalization layers. Since we don’t have dropout or other components in our NeuralNetwork class that are affected by these settings, using model.train() and model.eval() is redundant in our code above. However, it’s best practice to include them anyway to avoid unexpected behaviors when we change the model architecture or reuse the code to train a different model.
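
To illustrate why the mode matters for some layers, here is a minimal sketch with a standalone dropout layer. It is not part of the NeuralNetwork class above, and the input of ones is chosen only to make the effect easy to read.

```python
import torch

torch.manual_seed(123)
dropout = torch.nn.Dropout(p=0.5)
x = torch.ones(1, 8)

# Training mode: entries are zeroed at random and the rest are scaled by 1 / (1 - p)
dropout.train()
train_out = dropout(x)

# Evaluation mode: dropout passes the input through unchanged
dropout.eval()
eval_out = dropout(x)

print(train_out)
print(eval_out)
```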

We pass the logits directly into the cross_entropy loss function, which applies the softmax function internally for efficiency and numerical stability reasons. Then, calling loss.backward() will calculate the gradients in the computation graph that PyTorch constructed in the background. The optimizer.step() method will use the gradients to update the model parameters to minimize the loss. In the case of the SGD optimizer, this means multiplying the gradients with the learning rate and adding the scaled negative gradient to the parameters.
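
The SGD update rule can be verified on a tiny example. The sketch below is an illustrative addition: it uses a single `Linear` layer and an arbitrary squared-output loss, computes `parameter - lr * gradient` by hand, and compares it with what `optimizer.step()` produces.

```python
import torch

torch.manual_seed(123)
layer = torch.nn.Linear(2, 1)
lr = 0.5
optimizer = torch.optim.SGD(layer.parameters(), lr=lr)

x = torch.tensor([[1.0, -2.0]])
loss = layer(x).pow(2).sum()  # arbitrary scalar loss, for illustration only
optimizer.zero_grad()
loss.backward()

# What plain SGD should produce: parameter minus learning rate times gradient
expected_weight = layer.weight.detach() - lr * layer.weight.grad
expected_bias = layer.bias.detach() - lr * layer.bias.grad

optimizer.step()

print(torch.allclose(layer.weight, expected_weight))
print(torch.allclose(layer.bias, expected_bias))
```

This holds for SGD without momentum or weight decay, which is the configuration used in the training loop above.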

**Preventing undesired gradient accumulation**

It is important to include an optimizer.zero_grad() call in each update round to reset the gradients to zero. Otherwise, the gradients will accumulate, which may be undesired.
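
The accumulation behavior is easy to observe on a single parameter. In this illustrative sketch, the function `3 * w` is arbitrary and has a constant gradient of 3. Calling `.backward()` twice without a reset doubles the stored gradient, and clearing `.grad` by hand stands in for `optimizer.zero_grad()`.

```python
import torch

w = torch.tensor([2.0], requires_grad=True)

# First backward pass: d(3w)/dw = 3
(3 * w).sum().backward()
first = w.grad.clone()

# Second backward pass without a reset: the new gradient is added on top
(3 * w).sum().backward()
accumulated = w.grad.clone()

# Clearing the gradient by hand, then computing it again
w.grad = None
(3 * w).sum().backward()
after_reset = w.grad.clone()

print(first, accumulated, after_reset)  # tensor([3.]) tensor([6.]) tensor([3.])
```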

In [26]:
model.eval()

with torch.no_grad():
    outputs=model(X_train)

print(outputs)

tensor([[ 2.8569, -4.1618],
        [ 2.5382, -3.7548],
        [ 2.0944, -3.1820],
        [-1.4814,  1.4816],
        [-1.7176,  1.7342]])


In [28]:
# To obtain the class membership probabilities, we can use PyTorch's softmax function, as follows:

torch.set_printoptions(sci_mode=False)
probas=torch.softmax(outputs, dim=1)
print(probas)

tensor([[0.9991, 0.0009],
        [0.9982, 0.0018],
        [0.9949, 0.0051],
        [0.0491, 0.9509],
        [0.0307, 0.9693]])


In [29]:
# We can convert these values into class label predictions using
# PyTorch's argmax function, which returns the index position of the highest value in each row
# if we set dim=1 (setting dim=0 would return the index of the highest value in each column instead)

predictions=torch.argmax(probas, dim=1)
print(predictions)

tensor([0, 0, 0, 1, 1])


In [30]:
# It's unnecessary to compute softmax probas to obtain the class
# labels. We could also apply the argmax function to the logits (outputs) directly

predictions=torch.argmax(outputs, dim=1)
print(predictions)

tensor([0, 0, 0, 1, 1])


In [31]:
# Using torch.sum, we can count the number of correct predictions as follows:

torch.sum(predictions == y_train)

tensor(5)



The compute_accuracy function below iterates over a data loader to compute the fraction of correct predictions. This is because when we work with large datasets, we typically can only call the model on a small part of the dataset at a time due to memory limitations. It is a general method that scales to datasets of arbitrary size since, in each iteration, the dataset chunk that the model receives is the same size as the batch size seen during training.

Notice that the internals of the compute_accuracy function are similar to what we used before when we converted the logits to the class labels.

In [34]:
def compute_accuracy(model, dataloader):

  model=model.eval()
  correct=0.0
  total_examples=0

  for idx, (features, labels) in enumerate(dataloader):

    with torch.no_grad():
      logits=model(features)

    predictions=torch.argmax(logits, dim=1)
    compare = (labels == predictions) # Compare labels and predictions
    correct += torch.sum(compare).item() # Sum the True values (correct predictions)
    total_examples += len(labels) # Add the number of examples in the current batch

  # correct is already a Python float (because of .item() above), so plain division suffices
  return correct / total_examples
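
A usage sketch follows. To keep the result deterministic without rerunning training, it swaps in a hypothetical stand-in model (`toy_model`, a linear layer with hand-set weights) and wraps the test tensors in PyTorch's built-in `TensorDataset`. A compact version of the function is repeated so that the snippet runs on its own.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def compute_accuracy(model, dataloader):
    model.eval()
    correct, total_examples = 0, 0
    for features, labels in dataloader:
        with torch.no_grad():
            logits = model(features)
        predictions = torch.argmax(logits, dim=1)
        correct += torch.sum(labels == predictions).item()
        total_examples += len(labels)
    return correct / total_examples

# Hypothetical stand-in for a trained model: a linear layer with hand-set
# weights that predicts class 1 when the first feature exceeds the second
toy_model = torch.nn.Linear(2, 2, bias=False)
with torch.no_grad():
    toy_model.weight.copy_(torch.tensor([[-1.0, 1.0], [1.0, -1.0]]))

X_test = torch.tensor([[-0.8, 2.8], [2.6, -1.6]])
y_test = torch.tensor([0, 1])
test_loader = DataLoader(TensorDataset(X_test, y_test), batch_size=2)

accuracy = compute_accuracy(toy_model, test_loader)
print(accuracy)  # 1.0
```

With the trained model and the loaders defined earlier, the same call would be `compute_accuracy(model, train_loader)` or `compute_accuracy(model, test_loader)`.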