# Three Core Components of PyTorch

<img src="../asssets/a1-three-components-of-pytorch.png"/>

1. **Tensor Library:** Extends the concept of the array-oriented programming library *`NumPy`* with *`GPU`* support.

2. **Automatic Differentiation Engine `(Autograd)`:** Enables Automatic Calculation of Gradients for Tensor Operations, simplifying the process of backpropagation and model optimization.

3. **Deep Learning Library:** Offers Modular, Flexible and Efficient Building Blocks including Pretrained Models, Loss Functions and Optimizers for designing and training a wide range of deep learning models.


> In the news, **LLMs** are often referred to as **AI models**. However, LLMs are also a type of **deep neural network**, and **PyTorch** is a deep learning library.

<img src="../asssets/ai-ml-dl.png"/>

<img src="../asssets/a3-supervised-learning.png"/>

<img src="../asssets/apple silicon.png"/>

# Tensors

In [17]:
import torch

torch.__version__

'2.2.0'

In [18]:
torch.cuda.is_available()

False

## Understanding Tensors

<img src="../asssets/tensors.png">

## Scalars, Vectors, Matrices and Tensors

<img src="../asssets/create-tensors.png"/>

In [19]:
import torch

tensor0d: torch.Tensor = torch.tensor(data=1)
print(tensor0d)

tensor(1)


In [20]:
import torch

tensor1d: torch.Tensor = torch.tensor(data=[1, 2, 3])
print(tensor1d)

tensor([1, 2, 3])


In [21]:
import torch

tensor2d: torch.Tensor = torch.tensor(data=[[1, 2], [3, 4]])
print(tensor2d)

tensor([[1, 2],
        [3, 4]])


In [22]:
import torch

tensor3d: torch.Tensor = torch.tensor(
    data=[[[1, 2], [3, 4]], [[5, 6], [7, 8]]],
)
print(tensor3d)

tensor([[[1, 2],
         [3, 4]],

        [[5, 6],
         [7, 8]]])


## Tensor Datatypes

- PyTorch adopts the default `64-bit integer` data type from Python. We can access the data type of a tensor via its *`.dtype`* attribute:

In [23]:
import torch

tensor1d: torch.Tensor = torch.tensor(data=[1, 2, 3])
print(tensor1d)
print(tensor1d.dtype)

tensor([1, 2, 3])
torch.int64


- If we create tensors from Python floats, PyTorch creates tensors with a *`32-bit precision`* by default:

In [24]:
import torch

floatvector: torch.Tensor = torch.tensor(data=[1.0, 2.0, 3.0])
print(floatvector)
print(floatvector.dtype)

tensor([1., 2., 3.])
torch.float32


**This choice is primarily due to the *balance between precision and computational efficiency*. A 32-bit floating-point number offers sufficient precision for most deep learning tasks while consuming less memory and computational resources than a 64-bit floating-point number. Moreover, *GPU architectures are optimized for 32-bit* computations, and using this data type can significantly speed up model training and inference.**
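As a rough illustration of the memory side of this trade-off, the sketch below stores the same three values in 32-bit and 64-bit precision and compares the bytes used. The variable names are only illustrative, and the sketch assumes PyTorch's default floating-point dtype has not been changed.

```python
import torch

values = [1.0, 2.0, 3.0]

float32_vector = torch.tensor(values)  # Python floats default to float32
float64_vector = torch.tensor(values, dtype=torch.float64)

# element_size() returns the number of bytes used by a single element
bytes_float32 = float32_vector.element_size() * float32_vector.numel()
bytes_float64 = float64_vector.element_size() * float64_vector.numel()

print(float32_vector.dtype, bytes_float32)  # torch.float32 12
print(float64_vector.dtype, bytes_float64)  # torch.float64 24
```

The 64-bit tensor needs twice the memory for the same values, which adds up quickly for models with millions of parameters.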

> It is also possible to change the precision using a tensor’s **`.to`** method.

In [25]:
import torch

intvector64: torch.Tensor = torch.tensor(data=[1, 2, 3])  # Python ints give int64, not a float type
print(intvector64)
print(intvector64.dtype, "\n")

floatvector32: torch.Tensor = torch.tensor(data=[1, 2, 3]).to(dtype=torch.float32)
print(floatvector32)
print(floatvector32.dtype)


tensor([1, 2, 3])
torch.int64 

tensor([1., 2., 3.])
torch.float32


## Common PyTorch Tensor Operations

- The *`.shape`* attribute allows us to access the shape of a tensor:

In [26]:
import torch

tensor2d: torch.Tensor = torch.tensor(data=[[1, 2, 3], [4, 5, 6]])
print(tensor2d)
print(tensor2d.shape)

tensor([[1, 2, 3],
        [4, 5, 6]])
torch.Size([2, 3])


As you can see, *`.shape`* returns `torch.Size([2, 3])`, meaning the tensor has *2 rows* and *3 columns*. To reshape the tensor into a `3 × 2` tensor, we can use the *`.reshape`* method:

In [27]:
print(tensor2d.reshape(3, 2))

tensor([[1, 2],
        [3, 4],
        [5, 6]])


However, note that the *more common command for reshaping* tensors in PyTorch is *`.view()`*:

In [28]:
print(tensor2d.view(3, 2))

tensor([[1, 2],
        [3, 4],
        [5, 6]])


The key difference between `.view()` and `.reshape()` in PyTorch lies in how they handle memory layout: `.view()` requires the tensor to be **contiguous** (data stored in a continuous block of memory) and will raise an error if it isn’t, as it only *provides a new "view" into the existing data* **without copying it**. In contrast, `.reshape()` works regardless of whether the tensor is contiguous; if needed, it creates a new, contiguous copy of the data to ensure the desired shape. Use `.view()` for efficiency when the tensor is contiguous and `.reshape()` for flexibility.
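The difference can be seen in a small sketch. It reuses the `tensor2d` values from above and uses a transposed tensor as one convenient example of a non-contiguous layout; the helper variable names are only illustrative.

```python
import torch

tensor2d = torch.tensor([[1, 2, 3], [4, 5, 6]])

# .view() shares memory with the original tensor: no data is copied
viewed = tensor2d.view(3, 2)
shares_memory = viewed.data_ptr() == tensor2d.data_ptr()
print(shares_memory)  # True

# Transposing only changes the strides, so the result is non-contiguous
transposed = tensor2d.T
print(transposed.is_contiguous())  # False

# .view() cannot flatten this non-contiguous layout and raises an error
try:
    transposed.view(6)
    view_raised = False
except RuntimeError:
    view_raised = True
print(view_raised)  # True

# .reshape() falls back to copying the data into a new contiguous block
flattened = transposed.reshape(6)
print(flattened)  # tensor([1, 4, 2, 5, 3, 6])
```

Calling `transposed.contiguous().view(6)` would also work, since `.contiguous()` makes the copy explicitly before the view is taken.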


- We can use **`.T`** to transpose a tensor, which means flipping it across its diagonal. Note that this is not the same as reshaping a tensor, as you can see by comparing the following result with the `.reshape(3, 2)` output above:

In [29]:
print(tensor2d)
print(tensor2d.T)

tensor([[1, 2, 3],
        [4, 5, 6]])
tensor([[1, 4],
        [2, 5],
        [3, 6]])


The common way to multiply two matrices in PyTorch is the **`.matmul`** method:

In [30]:
print(tensor2d)
print(tensor2d.T)
print(tensor2d.matmul(other=tensor2d.T))

tensor([[1, 2, 3],
        [4, 5, 6]])
tensor([[1, 4],
        [2, 5],
        [3, 6]])
tensor([[14, 32],
        [32, 77]])


We can also adopt the **`@`** operator, which accomplishes the same thing more compactly:

In [31]:
print(tensor2d)
print(tensor2d.T)
print(tensor2d @ tensor2d.T)

tensor([[1, 2, 3],
        [4, 5, 6]])
tensor([[1, 4],
        [2, 5],
        [3, 6]])
tensor([[14, 32],
        [32, 77]])
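To make the shape rule behind these results explicit, here is a minimal sketch using the same `tensor2d` values. It checks one entry of the product by hand, contrasts `@` with the element-wise `*` operator, and shows what happens when the inner dimensions do not match.

```python
import torch

tensor2d = torch.tensor([[1, 2, 3], [4, 5, 6]])  # shape [2, 3]

# [2, 3] @ [3, 2] -> [2, 2]: the inner dimensions (3 and 3) match
product = tensor2d @ tensor2d.T
print(product.shape)  # torch.Size([2, 2])

# Entry [0, 1] is the dot product of row 0 and row 1 of tensor2d
manual_entry = 1 * 4 + 2 * 5 + 3 * 6
print(manual_entry, product[0, 1].item())  # 32 32

# The * operator multiplies element by element; it is not a matrix product
elementwise = tensor2d * tensor2d
print(elementwise.shape)  # torch.Size([2, 3])

# [2, 3] @ [2, 3] fails because the inner dimensions (3 and 2) differ
try:
    tensor2d @ tensor2d
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True
print(mismatch_raised)  # True
```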


# Autograd

## Seeing Models as Computational Graphs
Now let’s look at PyTorch’s *`automatic differentiation engine`*, also known as *`autograd`*. PyTorch’s autograd system provides *functions to compute gradients* in dynamic computational graphs automatically. 

- A **`computational graph`** is a `directed graph` that allows us to **express** and **visualize mathematical expressions**. In the context of deep learning, a computation graph lays out the sequence of calculations needed to compute the output of a neural network. We will need this to compute the required gradients for backpropagation, the main training algorithm for neural networks.

The code in the following listing implements the **forward pass (prediction step)** of a **simple logistic regression classifier**, which can be seen as a `single-layer neural network`. It returns a score between 0 and 1, which is compared to the true class label (0 or 1) when computing the loss.


<img src="../asssets/logistic-regression-forward-pass.png"/>


In [32]:
import torch
import torch.nn.functional as F

y: torch.Tensor = torch.tensor(data=[1.0])  # True Label

x1: torch.Tensor = torch.tensor(data=[1.1])  # Independent Variable
w1: torch.Tensor = torch.tensor(data=[2.2])  # Weight

b: torch.Tensor = torch.tensor(data=[0.0])  # Bias

z: torch.Tensor = x1 * w1 + b  # Linear Function
a: torch.Tensor = torch.sigmoid(input=z)  # Activation Function

loss: torch.Tensor = F.binary_cross_entropy(input=a, target=y)  # Loss

<img src="../asssets/computational-graph.png">

PyTorch builds such a computation graph in the background, and we can use this to *`calculate gradients (slope) of a loss function with respect to the model parameters`* (here **`w1`** and **`b`**) *`to train the model.`*

## Automatic Differentiation Made Easy
If we carry out computations in PyTorch, it will build a computational graph internally by default if one of its terminal nodes has the **`requires_grad`** attribute set to `True`. This is useful if we want to compute gradients. **Gradients are required when training neural networks** via the popular **`backpropagation algorithm`**, which can be considered an *`implementation of the chain rule`* from calculus for neural networks.
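One way to observe this behavior is to inspect the `requires_grad` and `grad_fn` attributes of a result tensor. The following is a minimal sketch with the same numbers as the logistic regression example; the `_plain`, `_tracked` and `_untracked` names are only illustrative.

```python
import torch

x1 = torch.tensor([1.1])
b = torch.tensor([0.0])

# No input requires gradients, so no graph is recorded
w1_plain = torch.tensor([2.2])
z_plain = x1 * w1_plain + b
print(z_plain.requires_grad, z_plain.grad_fn)  # False None

# One input has requires_grad=True, so the result is tracked
w1_tracked = torch.tensor([2.2], requires_grad=True)
z_tracked = x1 * w1_tracked + b
print(z_tracked.requires_grad)  # True
print(z_tracked.grad_fn is not None)  # True

# torch.no_grad() switches tracking off, e.g. for inference
with torch.no_grad():
    z_untracked = x1 * w1_tracked + b
print(z_untracked.requires_grad)  # False
```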

<img src="../asssets/partial-derivative.png">

## Partial Derivatives and Gradients

- **`Partial Derivatives:`** measure *`the rate at which a function changes with respect to one of its variables`*. 

- A **`gradient (slope)`** is a *`vector containing all of the partial derivatives of a multivariate function`*, *a function with more than one variable as input*.


> On a high level, the Chain Rule is a way to compute the gradients (slope) of a Loss Function with respect to the Model's Parameters in a Computational Graph. This provides the information needed to **update** each of the Model's Parameters using **Gradient Descent** so as to **Minimize** the Loss Function, which serves as a Proxy for measuring the **Performance** of the Model.
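To see what the chain rule looks like without autograd, here is a hand-written sketch in plain Python for the logistic regression example (`x1 = 1.1`, `w1 = 2.2`, `b = 0.0`, `y = 1.0`). The derivative formulas are the standard ones for binary cross-entropy and the sigmoid; the variable names are only illustrative.

```python
import math

x1, w1, b, y = 1.1, 2.2, 0.0, 1.0

# Forward pass
z = x1 * w1 + b                      # linear function
a = 1.0 / (1.0 + math.exp(-z))       # sigmoid activation
loss = -(y * math.log(a) + (1 - y) * math.log(1 - a))  # binary cross-entropy

# Local derivatives of each step in the graph
dloss_da = -(y / a) + (1 - y) / (1 - a)
da_dz = a * (1 - a)
dz_dw1 = x1
dz_db = 1.0

# Chain rule: multiply the local derivatives along the path to each parameter
dloss_dw1 = dloss_da * da_dz * dz_dw1
dloss_db = dloss_da * da_dz * dz_db

print(round(dloss_dw1, 4), round(dloss_db, 4))  # -0.0898 -0.0817
```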

**So, Why Autograd?**

- It **automatically** builds a `computational graph` for us. How? By tracking every operation performed on Tensors.
- It **automatically** computes the `gradients` for us. How? By calling the **`grad`** function, we can compute the `gradients of a loss function` with respect to the model parameters (weights and biases).

<img src="../asssets/compute-gradients-with-autograd.png"/>


In [None]:
from typing import Tuple

import torch
import torch.nn.functional as F
from torch import Tensor
from torch.autograd import grad

y: Tensor = torch.tensor(data=[1.0])  # True Label

x1: Tensor = torch.tensor(data=[1.1])  # Independent Variable
w1: Tensor = torch.tensor(data=[2.2], requires_grad=True)  # Weight

b: Tensor = torch.tensor(data=[0.0], requires_grad=True)  # Bias

z: Tensor = x1 * w1 + b  # Linear Function
a: Tensor = torch.sigmoid(input=z)  # Activation Function

loss: Tensor = F.binary_cross_entropy(input=a, target=y)  # Loss

gradients_of_loss_wrt_w1: Tuple[Tensor, ...] = grad(
    outputs=loss,
    inputs=w1,
    retain_graph=True,
)
gradients_of_loss_wrt_b: Tuple[Tensor, ...] = grad(
    outputs=loss,
    inputs=b,
    retain_graph=True,
)

print(gradients_of_loss_wrt_w1)
print(gradients_of_loss_wrt_b)


# More efficient and compact way to compute gradients of the Loss Function with respect to the Model's Parameters
loss.backward()  # Calculates the Gradients of the Loss Function wrt all those Tensors that have requires_grad=True
print(w1.grad, b.grad)

(tensor([-0.0898]),)
(tensor([-0.0817]),)
tensor([-0.0898]) tensor([-0.0817])


*When calling `loss.backward()`, how does PyTorch know which tensors to compute the `gradients of` **`loss`** with respect to?*

When you call `loss.backward()`, PyTorch walks the computational graph backward from `loss` and stores the gradient in the `.grad` attribute of every leaf tensor in that graph that has `requires_grad=True` (here `w1` and `b`).
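The sketch below illustrates this with the same example: `x1` has no `requires_grad` flag and ends up with no gradient, while `w1` and `b` do. It then applies a single hand-written gradient descent update. The learning rate of `0.1` is a hypothetical value chosen only for illustration.

```python
import torch
import torch.nn.functional as F

y = torch.tensor([1.0])
x1 = torch.tensor([1.1])                      # input: no gradient tracking
w1 = torch.tensor([2.2], requires_grad=True)  # trainable weight
b = torch.tensor([0.0], requires_grad=True)   # trainable bias

loss = F.binary_cross_entropy(torch.sigmoid(x1 * w1 + b), y)
loss.backward()

print(x1.grad)          # None, because x1 does not require gradients
print(w1.grad, b.grad)  # gradients were stored for w1 and b

# One manual gradient descent step (hypothetical learning rate)
learning_rate = 0.1
with torch.no_grad():  # the update itself must not be tracked
    w1 -= learning_rate * w1.grad
    b -= learning_rate * b.grad

# Gradients accumulate across backward() calls, so reset them afterwards
w1.grad.zero_()
b.grad.zero_()

print(w1, b)
```

Because both gradients are negative here, the update increases `w1` and `b` slightly, which moves the prediction closer to the true label `1.0`.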

> **Note:** While the Calculus Jargon is a means to explain PyTorch's *`autograd`* component, all we need to take away is that PyTorch takes care of the Calculus for us via the **`.backward()`** method.

# Implementing Multilayer Neural Networks

> Now we focus on PyTorch as a library for implementing Deep Neural Networks.

<img src="../asssets/multilayer-perceptron.png"/>




> Each Layer can have Multiple Nodes.

When implementing a Neural Network in PyTorch, we can `subclass` the **`torch.nn.Module`** `class` to define our own Custom Network Architecture. This *`Module`* Base Class provides a lot of functionality, making it easier to Build and Train Models. For example, it allows us to *`Encapsulate Layers`* and *`Operations`* and *`Keep Track of the Model's Parameters`*.

Within this Sub Class, we *`define the Network Layers`* in the *`__init__ constructor`* and specify *`how the Layers interact in the Forward Method`*. The *`Forward Method`* `describes how the Input Data passes through the Network` and `comes together as a Computation Graph.` We typically do not need to implement a *`Backward Method`* ourselves: during Training, autograd `computes the Gradients of the Loss Function with respect to the Model's Parameters` when we call `loss.backward()`.

```python
import torch

class Model(torch.nn.Module):
    pass
```

**Inherited Methods:** 
The class inherits numerous methods from the `Module` base class, including:
- `forward()`: Defines the computation performed at every call; the base implementation is only a placeholder that subclasses must override.
- `parameters()`: Returns an iterator over module parameters.
- `state_dict()`: Returns a dictionary containing the module's state.
- `load_state_dict()`: Copies parameters and buffers from a state dict.
- `to()`: Moves and/or casts the parameters and buffers.
- `cuda()`, `cpu()`: Moves all model parameters and buffers to the GPU/CPU.
- `train()`, `eval()`: Sets the module in training/evaluation mode.
- Many other utility methods for registering hooks, buffers, and parameters.
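A few of these inherited methods can be tried on any module. The sketch below uses a single `torch.nn.Linear` layer as a stand-in for a full model; the layer sizes and the `clone` variable are arbitrary choices for illustration.

```python
import torch

model = torch.nn.Linear(in_features=4, out_features=2)

# parameters() iterates over the weight matrix and the bias vector
parameter_shapes = [tuple(p.shape) for p in model.parameters()]
print(parameter_shapes)  # [(2, 4), (2,)]

# state_dict() maps parameter names to tensors
state_keys = list(model.state_dict().keys())
print(state_keys)  # ['weight', 'bias']

# eval() and train() toggle the module's training flag
model.eval()
print(model.training)  # False
model.train()
print(model.training)  # True

# load_state_dict() copies the parameters into another module of the same shape
clone = torch.nn.Linear(in_features=4, out_features=2)
clone.load_state_dict(model.state_dict())
weights_match = torch.equal(clone.weight, model.weight)
print(weights_match)  # True
```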

<img src="../asssets/multilayer-perceptron-with-2-hidden-layers.png" />

In [40]:
import torch


class NeuralNetwork(torch.nn.Module):
    def __init__(self, number_of_inputs: int, number_of_outputs: int):
        super().__init__()
        self.layers = torch.nn.Sequential(
            # 1st Hidden Layer
            torch.nn.Linear(in_features=number_of_inputs, out_features=30),
            torch.nn.ReLU(),
            # 2nd Hidden Layer
            torch.nn.Linear(in_features=30, out_features=20),
            torch.nn.ReLU(),
            # Output Layer
            torch.nn.Linear(in_features=20, out_features=number_of_outputs),
        )

    # Shape of Input Tensor: [Batch Size, Number of Inputs]
    def forward(self, input_tensor: torch.Tensor) -> torch.Tensor:
        logits = self.layers(input_tensor)
        return logits


model = NeuralNetwork(number_of_inputs=50, number_of_outputs=3)

print(model)

NeuralNetwork(
  (layers): Sequential(
    (0): Linear(in_features=50, out_features=30, bias=True)
    (1): ReLU()
    (2): Linear(in_features=30, out_features=20, bias=True)
    (3): ReLU()
    (4): Linear(in_features=20, out_features=3, bias=True)
  )
)


> **Note:** We used the *`Sequential Class`* when defining our NeuralNetwork Class because it makes our life easier if we have a series of Layers we want to execute in a specific order, as we are doing here. This way, after instantiating *`self.layers = torch.nn.Sequential()`* in the *`__init__`* constructor, we just have to call the *`self.layers`* attribute instead of calling each layer individually in the *`NeuralNetwork's forward method`*.
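As a quick check of how data flows through this network, the following sketch repeats the class definition so that it runs on its own and passes a random batch through the model. The batch size of 4, the seed, and the softmax step are assumptions added for illustration.

```python
import torch


class NeuralNetwork(torch.nn.Module):
    def __init__(self, number_of_inputs: int, number_of_outputs: int):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(in_features=number_of_inputs, out_features=30),
            torch.nn.ReLU(),
            torch.nn.Linear(in_features=30, out_features=20),
            torch.nn.ReLU(),
            torch.nn.Linear(in_features=20, out_features=number_of_outputs),
        )

    def forward(self, input_tensor: torch.Tensor) -> torch.Tensor:
        return self.layers(input_tensor)


torch.manual_seed(123)  # arbitrary seed for reproducible random weights
model = NeuralNetwork(number_of_inputs=50, number_of_outputs=3)

# Random batch: 4 examples with 50 features each
batch = torch.rand((4, 50))

# Calling the model invokes forward(); no_grad() skips graph building for inference
with torch.no_grad():
    logits = model(batch)
print(logits.shape)  # torch.Size([4, 3])

# Softmax turns each row of logits into class probabilities that sum to 1
probabilities = torch.softmax(logits, dim=1)
print(probabilities.sum(dim=1))
```

Note that the model is called as `model(batch)` rather than `model.forward(batch)`, which is the usual convention because the call also runs any registered hooks.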

To check the *`Total Number of Trainable Parameters`* in our Model, we can use the `parameters()` Method.

In [47]:
total_number_of_parameters = sum(
    p.numel() for p in model.parameters() if p.requires_grad
)
print(total_number_of_parameters)

2213


> **Note:** Every parameter tensor with `requires_grad=True` is `trainable`, and `numel()` returns the number of elements in a tensor, so the sum counts each individual weight and bias value.
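To see where the total of 2213 comes from, here is a small sketch that rebuilds the same stack of layers directly with `torch.nn.Sequential` (so it runs on its own) and lists the element count of each parameter tensor. Freezing the first layer at the end is an extra illustration of how `requires_grad` affects the count.

```python
import torch

layers = torch.nn.Sequential(
    torch.nn.Linear(in_features=50, out_features=30),
    torch.nn.ReLU(),
    torch.nn.Linear(in_features=30, out_features=20),
    torch.nn.ReLU(),
    torch.nn.Linear(in_features=20, out_features=3),
)

# Element count per parameter tensor, keyed by its name in the Sequential
counts = {name: p.numel() for name, p in layers.named_parameters()}
for name, count in counts.items():
    print(name, count)

total = sum(counts.values())
print(total)  # 2213 = (50*30 + 30) + (30*20 + 20) + (20*3 + 3)

# Freezing the first layer removes its 1530 values from the trainable count
layers[0].requires_grad_(False)
trainable = sum(p.numel() for p in layers.parameters() if p.requires_grad)
print(trainable)  # 683
```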