<a name="0"></a>
<div style="padding: 10px;
            color: #EE4C2C; /* PyTorch color */
            margin: 10px;
            font-size: 150%;
            display: fill;
            border-radius: 1px;
            border-style: solid;
            border-color: #EE4C2C; /* PyTorch color */
            background-color: #313131; /* PyTorch dark theme background color */
            overflow: hidden;">
    <center>
        <a id='top'></a>
        <b>Table of Contents</b>
    </center>
    <br>
    <ul>
        <li>
            <a href="#1" style="color: #EE4C2C;">1 - Overview and What is PyTorch</a>
        </li>
        <li>
            <a href="#2" style="color: #EE4C2C;">2 - Tensors API</a>
            <ul>
                <li>
                    <a href="#2-1" style="color: #EE4C2C;">2.1 - Tensors Creation</a>
                </li>
                <li>
                    <a href="#2-2" style="color: #EE4C2C;">2.2 - Operations on Tensors</a>
                </li>
                <li>
                    <a href="#2-3" style="color: #EE4C2C;">2.3 - Autograd and Gradients</a>
                </li>
                <li>
                    <a href="#2-4" style="color: #EE4C2C;">2.4 - nn.Parameter class</a>
                </li>
            </ul>
        </li>
        <li>
            <a href="#3" style="color: #EE4C2C;">3 - Building Neural Networks (NN API)</a>
            <ul>
                <li>
                    <a href="#3-1" style="color: #EE4C2C;">3.1 - Layers and Modules</a>
                </li>
                <li>
                    <a href="#3-2" style="color: #EE4C2C;">3.2 - Activation Functions</a>
                </li>
                <li>
                    <a href="#3-3" style="color: #EE4C2C;">3.3 - Building Models</a>
                </li>
                <li>
                    <a href="#3-4" style="color: #EE4C2C;">3.4 - Callbacks</a>
                </li>
                <li>
                    <a href="#3-5" style="color: #EE4C2C;">3.5 - Loss Functions</a>
                </li>
                <li>
                    <a href="#3-6" style="color: #EE4C2C;">3.6 - Optimizers</a>
                </li>
            </ul>
        </li>
        <li>
            <a href="#4" style="color: #EE4C2C;">4 - Datasets API</a>
            <ul>
                <li>
                    <a href="#4-1" style="color: #EE4C2C;">4.1 - DataLoaders</a>
                </li>
                <li>
                    <a href="#4-2" style="color: #EE4C2C;">4.2 - Preprocessing Techniques</a>
                </li>
                <li>
                    <a href="#4-3" style="color: #EE4C2C;">4.3 - Creating Custom Datasets</a>
                </li>
            </ul>
        </li>
        <li>
            <a href="#5" style="color: #EE4C2C;">5 - Training Deep Learning Models</a>
            <ul>
               <li>
                    <a href="#5-1" style="color: #EE4C2C;">5.1 - Defining Helper Functions</a>
                </li>
            <li>
                    <a href="#5-2" style="color: #EE4C2C;">5.2 - Training Linear Regression</a>
                </li>
                <li>
                    <a href="#5-3" style="color: #EE4C2C;">5.3 - Training an ANN</a>
                </li>
                <li>
                    <a href="#5-4" style="color: #EE4C2C;">5.4 - Training a CNN</a>
                </li>
                <li>
                    <a href="#5-5" style="color: #EE4C2C;">5.5 - Training an LSTM</a>
                </li>
                <li>
                    <a href="#5-6" style="color: #EE4C2C;">5.6 - Transfer Learning</a>
                </li>
            </ul>
        </li>
<li>
    <a href="#6" style="color: #EE4C2C;">6 - Parallelism and Multi-GPU</a>
    <ul>
        <li>
            <a href="#6-1" style="color: #EE4C2C;">6.1 - Utilizing GPUs</a>
            <ul>
                <li>
                    <a href="#6-1-1" style="color: #EE4C2C;">6.1.1 - GPU Setup </a>
                </li>
                <li>
                    <a href="#6-1-2" style="color: #EE4C2C;">6.1.2 - GPU Usage in PyTorch</a>
                </li>
            </ul>
        </li>
        <li>
            <a href="#6-2" style="color: #EE4C2C;">6.2 - Multi-GPU Training</a>
        </li>
    </ul>
</li>
<li>
    <a href="#7" style="color: #EE4C2C;">7 - Using TPU</a>
</li>
<li>
    <a href="#8" style="color: #EE4C2C;">8 - Thank you</a>
</li>
    </ul>
</div>

<a id="1"></a>
<h1 style='background:#EE4C2C;border:0; color:#313131; /* Updated text color */
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #313131;'>Overview</center></h1>

# Overview
 ![images.png](attachment:bb5c290f-97cc-41dd-bf9e-8679380523a7.png)

**[PyTorch](https://pytorch.org/) is a deep learning framework used for research and development in machine learning and artificial intelligence.**

**In this notebook, we'll dive deep into PyTorch and explore its various features and capabilities. Let's get started!**


In [1]:
import torch
import torch.nn as nn
import numpy as np

<div style="text-align: right;">
  <a href="#top" style="text-decoration: none; color: black;">
    <span style="color: #EE4C2C; font-size: 18px; font-weight: bold;">&uarr;</span> Return to Table of Contents
  </a>
</div>

<a id="2"></a>
<h1 style='background:#EE4C2C;border:0; color:#313131; /* Updated text color */
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #313131;'>Tensors API</center></h1>
    
# [Tensors API](https://pytorch.org/docs/stable/tensors.html)

<a id="2-1"></a>
 
## Tensors Creation

 ![scalar-vector-matrix-tensor.png](attachment:1f8f16f1-b843-4f62-93a2-321e87465ade.png)

**A `tensor` is a fundamental data structure that is similar to `arrays` or `matrices` in other programming languages. Tensors are the building blocks of `neural networks` and are used to represent data in the form of multi-dimensional arrays. They play a crucial role in deep learning because all the operations and computations in neural networks are performed on tensors.**

**Now, let's discuss the different types of tensors in PyTorch:**

* **A `scalar` tensor represents a single value, such as a floating-point number or an integer. It has zero dimensions. In PyTorch, you can create a scalar tensor using the following syntax:** 
```python
scalar = torch.tensor(42.0) # Creates a scalar tensor with the value 42.0
```
****
* **A `vector` tensor is a one-dimensional tensor, often used to represent a list of values. It can be thought of as a row or column of numbers. In PyTorch, you can create a vector using the following syntax:** 
```python
vector = torch.tensor([1, 2, 3, 4, 5])  # Creates a 1-D tensor with 5 elements
```
****
* **A `matrix` tensor is a two-dimensional tensor, typically used to represent tabular data or images. It has rows and columns. In PyTorch, you can create a matrix using the following syntax:** 
```python
matrix = torch.tensor([[1, 2, 3], [4, 5, 6]])  # Creates a 2-D tensor with 2 rows and 3 columns
```
****
* **A `multi-dimensional` tensor is a tensor with more than two dimensions, allowing you to represent complex data structures. For instance, a 3-D tensor can represent a sequence of 2D images over time, and a 4-D tensor can represent a batch of multi-channel 2D images. In PyTorch, you can create a high-dimensional tensor using the following syntax:** 
```python
# Create a 4-D tensor with shape (batch_size, channels, height, width)
four_dim_tensor = torch.randn(32, 3, 64, 64)
```

**Note that `torch.randn()` creates a new tensor filled with random numbers drawn from a standard normal distribution.**

**When creating a tensor with `torch.tensor()`, we can provide several arguments:**

```python
tensor = torch.tensor(data=[[1, 2, 3], [4, 5, 6]],
                      dtype=torch.float32,
                      device='cpu',
                      requires_grad=False)
```
**Where:**

*  `data (required)`: **This argument specifies the data or values you want to store in the tensor. It can be a Python list, a NumPy array, or another iterable. The data will be copied into the tensor.**

* `dtype (optional)`: **This argument specifies the data type (also known as the tensor's element type) of the tensor. Common data types include torch.float32, torch.int64, torch.bool, etc. If not provided, PyTorch will attempt to infer the data type based on the data argument.**

* `device (optional)`: **You can specify the device (CPU or GPU) on which the tensor should be located using this argument. If not provided, the tensor will be created on the CPU by default.**

* `requires_grad (optional)`: **If set to True, the tensor will be set up to track operations on it for automatic differentiation (autograd) during backpropagation. This is useful for gradient-based optimization and training deep learning models.**
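
To see how these arguments show up on the resulting tensor, here is a minimal sketch that inspects `dtype`, `device`, and `requires_grad`, including the inferred defaults. The variable names are illustrative and not part of the original notebook, and the inferred float type assumes PyTorch's default dtype has not been changed.

```python
import torch

# dtype is inferred from the data when it is not given
int_tensor = torch.tensor([1, 2, 3])          # Python ints   -> torch.int64
float_tensor = torch.tensor([1.0, 2.0, 3.0])  # Python floats -> default float dtype (torch.float32)

# Explicit arguments override the inferred defaults
explicit = torch.tensor([[1, 2, 3], [4, 5, 6]],
                        dtype=torch.float32,
                        device="cpu",
                        requires_grad=True)

print(int_tensor.dtype, float_tensor.dtype)
print(explicit.dtype, explicit.device, explicit.requires_grad)
```

Note that `requires_grad=True` is only allowed for floating-point (and complex) tensors, which is one reason to set `dtype` explicitly when the data consists of integers.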

### Other ways of creating tensors

**Creating a tensor from ndarray `torch.from_numpy()`**

In [None]:
np_array = np.arange(0, 10, 1)
torch_tensor = torch.from_numpy(np_array)
np_array, torch_tensor
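
One detail worth knowing about `torch.from_numpy()` is that the resulting tensor shares memory with the NumPy array, whereas `torch.tensor()` copies the data. The sketch below illustrates the difference; the names `shared` and `copied` are chosen here for illustration.

```python
import numpy as np
import torch

np_array = np.arange(0, 5, 1)
shared = torch.from_numpy(np_array)  # shares memory with np_array
copied = torch.tensor(np_array)      # independent copy of the data

np_array[0] = 100                    # modify the NumPy array in place

print(shared)  # the first element reflects the change
print(copied)  # the copy is unaffected
```

If the tensor must not change when the array does, prefer `torch.tensor()` or call `.clone()` on the result of `torch.from_numpy()`.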

**Creating a range tensor `torch.arange()`**

In [None]:
arange_tensor = torch.arange(0, 10, 1)
arange_tensor

**Creating a tensor with the same shape as another tensor**

**`torch.zeros_like()` and `torch.ones_like()`**

In [None]:
zeros_tensor = torch.zeros_like(arange_tensor)
ones_tensor = torch.ones_like(arange_tensor)
zeros_tensor, ones_tensor

**PyTorch provides functions to create tensors filled with random values. These are often used for initializing weights in neural networks.**

**`torch.rand`** **creates a tensor of random values from a uniform distribution on the interval [0, 1)**

**`torch.randn`** **creates a tensor of random values from a standard normal distribution (mean=0, std=1)**

In [None]:
random_uniform = torch.rand(3, 3)  
random_normal = torch.randn(2, 2) 
print(f'Random Uniform Tensor: \n{random_uniform},\n\n Random Normal Tensor: \n{random_normal}')

**Setting a seed**

By specifying a seed, you ensure consistent random numbers for reproducibility and debugging in your PyTorch code. Without setting a seed, PyTorch generates different random values each time you run your code, hindering result reproduction and issue debugging.

To set a seed in PyTorch:

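* For CPU-based random number generation, use `torch.manual_seed(seed)`. In current PyTorch releases this call also seeds the CUDA generators.
* To seed GPU (CUDA) generation explicitly, use `torch.cuda.manual_seed(seed)` for the current device, or `torch.cuda.manual_seed_all(seed)` for all GPUs.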

Note that GPU usage is discussed further in this [section](#6-1)

In [None]:
# Running this cell multiple times gives the same result
torch.manual_seed(42)
random_uniform = torch.rand(3, 3)  
random_normal = torch.randn(2, 2) 
print(f'Random Uniform Tensor: \n{random_uniform},\n\n Random Normal Tensor: \n{random_normal}')
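
As a quick check of the reproducibility claim, the sketch below re-seeds the global generator and compares two draws. It also shows a local `torch.Generator`, which is one way to get reproducible numbers without touching the global random state. The variable names are illustrative.

```python
import torch

# Re-seeding the global generator reproduces the same values
torch.manual_seed(42)
first = torch.rand(3)
torch.manual_seed(42)
second = torch.rand(3)
print(torch.equal(first, second))

# A local generator keeps its own state, independent of the global one
gen = torch.Generator().manual_seed(42)
local_first = torch.rand(3, generator=gen)
gen.manual_seed(42)
local_second = torch.rand(3, generator=gen)
print(torch.equal(local_first, local_second))
```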

**Creating Identity and Diagonal Tensors: You can create identity matrices and diagonal tensors using specific functions.**

**`torch.eye()` and `torch.diag()`**


In [None]:
identity_matrix = torch.eye(3)
diagonal_tensor = torch.diag(torch.tensor([1, 2, 3]))

print(f'Identity Tensor: \n{identity_matrix},\n\n Diagonal Tensor: \n{diagonal_tensor}')

*****

## Operations on Tensors <a id="2-2"></a>


### Tensor Shape Manipulation and Dimension Operations

In [None]:
tensor_a = torch.tensor([[1, 2, 3], [4, 5, 6]]).type(torch.float32)
tensor_a

### Tensor Shape and Size
**You can obtain information about the shape and size of a tensor using the following:**
* ``shape``: **An attribute that returns a `torch.Size` (a tuple subclass) representing the dimensions of the tensor.**
* ``numel()``: **A method that returns the total number of elements in the tensor.**

In [None]:
shape = tensor_a.shape 
total_elements = tensor_a.numel()  

print(f"The Tensor has shape: {shape} and has total_elements of: {total_elements}")

### Reshaping Tensors
**Reshaping can be done using:**
* **``torch.reshape()`` function: Returns a tensor with the same data and number of elements but with the specified shape.**


**Ensure that the total number of elements remains the same after reshaping.**

In [None]:
reshaped_tensor = torch.reshape(tensor_a, (3, 2))

print(f"The Tensor has shape: {tensor_a.shape} Reshaped Tensor has shape: {reshaped_tensor.shape}\n")
print(f"Tensor:\n {tensor_a},\n\n Reshaped Tensor:\n {reshaped_tensor}")

**You can use `-1` as a placeholder in the target shape when reshaping a tensor.**
**PyTorch automatically calculates the size for the -1 dimension based on the total number of elements.**

In [None]:
reshaped_tensor = torch.reshape(tensor_a, (-1, 2))
reshaped_tensor

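`Tensor.view()` can also be used to reshape a tensor. It returns a tensor that shares the same underlying data, which requires the original memory layout to be compatible with the new shape (a contiguous tensor always is).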

**Example: Flattening a Tensor:**

Let's consider an example where we have a tensor x with the shape `(batch_size, channels, height, width)`:

```python
import torch

# Example tensor with shape (batch_size, channels, height, width)
x = torch.randn(32, 3, 64, 64)
```

To flatten this tensor, we can use the view() method:



```python
# Flatten the tensor
x_flattened = x.view(x.size(0), -1)
```
After this operation, `x_flattened` becomes a 2D tensor with the shape `(batch_size, channels * height * width)`.
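
The practical difference between `view()` and `reshape()` can be sketched as follows. This is a small illustration with made-up tensors, not code from the original notebook: `view()` never copies and therefore fails on a non-contiguous tensor, while `reshape()` falls back to copying when needed.

```python
import torch

t = torch.arange(6).reshape(2, 3)

# view() returns a tensor that shares storage with the original
v = t.view(3, 2)
v[0, 0] = 100
print(t[0, 0])  # the original tensor sees the change

# A transposed tensor is not contiguous, so view() cannot flatten it
tt = t.t()
print(tt.is_contiguous())
try:
    tt.view(6)
except RuntimeError as err:
    print("view failed:", err)

# reshape() copies the data when a view is not possible
flat = tt.reshape(6)
print(flat)
```

When in doubt, `reshape()` is the safer choice; `view()` is useful when you want a guarantee that no copy is made.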

### Expanding Dimensions
**You can add dimensions to a tensor using:**

`torch.unsqueeze()`: **Returns a new tensor with a dimension of size one inserted at the specified position.**

**This is useful when you want to prepare a tensor for operations that require a specific shape.**

In [None]:
expanded_tensor = torch.unsqueeze(tensor_a, dim=0)  # Expand along axis 0
print(f"The expanded tensor dimensions are: {expanded_tensor.shape}\n"
      f"The original tensor has shape of: {tensor_a.shape}")

### Squeezing Dimensions
The `torch.squeeze()` **function removes dimensions with size 1 from a tensor.**

**This can be helpful for eliminating unnecessary singleton dimensions**
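
How the `dim` argument changes the result is easiest to see on a tensor with several singleton dimensions. The tensor `x` below is a hypothetical example chosen for this sketch.

```python
import torch

x = torch.zeros(1, 3, 1, 2)

all_squeezed = torch.squeeze(x)         # removes every size-1 dimension
first_only = torch.squeeze(x, dim=0)    # removes only dimension 0
unchanged = torch.squeeze(x, dim=1)     # dimension 1 has size 3, so nothing is removed
restored = torch.unsqueeze(all_squeezed, dim=0)  # adds a leading size-1 dimension back

print(all_squeezed.shape)  # torch.Size([3, 2])
print(first_only.shape)    # torch.Size([3, 1, 2])
print(unchanged.shape)     # torch.Size([1, 3, 1, 2])
print(restored.shape)      # torch.Size([1, 3, 2])
```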

In [None]:
squeezed_tensor = torch.squeeze(expanded_tensor, dim=0)  # Remove the size-1 dimension at index 0
print(f"The expanded tensor has shape: {expanded_tensor.shape}\n"
      f"The squeezed tensor has shape: {squeezed_tensor.shape}")

### Transposing a Tensor with permute()

Transposing a tensor involves rearranging its dimensions, effectively changing the order of its axes. This operation is essential for various mathematical and data manipulation tasks. In PyTorch, you can achieve this using the `permute()` function.

**The `permute()` function allows you to rearrange dimensions in a tensor, providing you with the flexibility to change the shape and orientation of your data**. You can specify the new order of dimensions using a list of indices.

Here's a simple example to illustrate transposing a tensor with `permute()`:


In [None]:
permuted_tensor = tensor_a.permute(1, 0)  # Swap dimensions 0 and 1
permuted_tensor.shape, tensor_a.shape
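
`permute()` becomes more interesting with more than two dimensions. A common case, sketched here with illustrative names, is converting a batch of images from channels-first layout `(batch, channels, height, width)` to channels-last layout `(batch, height, width, channels)`.

```python
import torch

images_nchw = torch.randn(32, 3, 64, 64)

# New order: keep batch first, move channels to the end
images_nhwc = images_nchw.permute(0, 2, 3, 1)
print(images_nhwc.shape)  # torch.Size([32, 64, 64, 3])

# permute() returns a view: the same element is reachable under reordered indices
same_element = images_nchw[5, 2, 10, 20] == images_nhwc[5, 10, 20, 2]
print(same_element.item())

# The view is usually not contiguous; call .contiguous() if a later op requires it
print(images_nhwc.is_contiguous())
```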

### Transposing a Tensor (Swapping Rows and Columns) in PyTorch

Transposing a tensor means swapping its rows and columns, effectively changing the order of its axes. In PyTorch, you can achieve this using the `torch.transpose()` function or the `.t()` method (which works on tensors with at most two dimensions).

**`torch.transpose()` swaps the two dimensions `dim0` and `dim1` that you pass to it, and `.t()` does the same for the two dimensions of a matrix**, effectively exchanging rows and columns.

Here's a simple example to illustrate transposing a tensor using `torch.transpose()`:


In [None]:
transposed_tensor_1 = tensor_a.t()
transposed_tensor_2 = torch.transpose(tensor_a, 0, 1)  # Swap axes 0 and 1

print(f'The original tensor shape is: {tensor_a.shape},\n' 
      f'The shape of the tensor transposed with .t() is: {transposed_tensor_1.shape},\n' 
      f'The shape of the tensor transposed with torch.transpose() is: {transposed_tensor_2.shape}')

****

### Arithmetic Operations on Tensors

In [None]:
# Create tensors 
tensor_a = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
tensor_b = torch.tensor([[5, 6], [7, 8]], dtype=torch.float32)

In [None]:
print(f'tensor a is: {tensor_a} and of shape {tensor_a.shape}\n\n')

print(f'tensor b is: {tensor_b} and of shape {tensor_b.shape}')

### 1. Addition and Subtraction


**`Element-wise` addition/subtraction is performed between two tensors of the `same shape`.
Each element in the result is the sum/difference of the corresponding elements from the input tensors.**

In [None]:
addition_result = tensor_a + tensor_b
subtraction_result = tensor_a - tensor_b

print(f'addition_result   is: {addition_result} and of shape {addition_result.shape}\n\n')
print(f'subtraction_result is: {subtraction_result} and of shape {subtraction_result.shape}')

### 2. Multiplication

#### 2.1 Element-wise Multiplication (*)

**`Element-wise` multiplication is performed between two tensors of the `same shape`.
Each element in the result is the product of the corresponding elements from the input tensors.**

In [None]:
result_mul  = tensor_a * tensor_b
print(f'element wise multiplication result  is: {result_mul} and of shape {result_mul.shape}')

#### 2.2 Matrix Multiplication (@ or torch.matmul())

**`Matrix multiplication` is performed between two tensors where the inner dimensions match `(the number of columns in the first tensor equals the number of rows in the second tensor)`. Each entry of the result is the dot product of a row of the first tensor with a column of the second.**

In [None]:
result_matmul = torch.matmul(tensor_a, tensor_b)
print(f'Matrix multiplication result  is: {result_matmul} and of shape {result_matmul.shape}')
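
The shape rule is easier to see with non-square matrices. The sketch below uses illustrative tensors to show the `@` operator, the row-by-column dot product behind one entry, and the batched form of `torch.matmul()`.

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[5., 6.], [7., 8.]])

product = a @ b  # the @ operator is equivalent to torch.matmul(a, b)

# Entry [0, 0] is the dot product of row 0 of a and column 0 of b
entry_00 = torch.dot(a[0], b[:, 0])
print(product, entry_00)

# (2, 3) @ (3, 4) -> (2, 4): the inner dimensions (3 and 3) must match
left = torch.randn(2, 3)
right = torch.randn(3, 4)
print((left @ right).shape)

# Leading dimensions are treated as a batch: (10, 2, 3) @ (10, 3, 4) -> (10, 2, 4)
batch_out = torch.matmul(torch.randn(10, 2, 3), torch.randn(10, 3, 4))
print(batch_out.shape)
```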

### 3. Division

#### Element-wise Division (/)

**`Element-wise` division is performed between two tensors of the `same shape`.**

**Each element in the result is the quotient of the corresponding elements from the input**

In [None]:
result_div = tensor_a / tensor_b
print(f'Element-wise division result  is: {result_div} and of shape {result_div.shape}')

### 4. Exponentiation  

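`` ** operator or torch.pow()``

**Each element in the result is the corresponding element of the first tensor raised to the power of the corresponding element of the second tensor. This differs from `torch.exp()`, which computes e raised to each element of a single tensor.**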

In [None]:
result_exp = tensor_a ** tensor_b

print(f'Exponentiation result  is: {result_exp} and of shape {result_exp.shape}')
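
Because `**` and `torch.exp()` are easy to confuse, here is a short side-by-side sketch. The tensors `base` and `exponent` are illustrative stand-ins for the two input tensors.

```python
import math
import torch

base = torch.tensor([[1., 2.], [3., 4.]])
exponent = torch.tensor([[5., 6.], [7., 8.]])

# Element-wise power: base[i, j] raised to exponent[i, j]
power = base ** exponent
same_as_pow = torch.pow(base, exponent)

# Element-wise exponential: e raised to base[i, j]; takes a single tensor
exponential = torch.exp(base)

print(power)
print(exponential)
```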

### 5. Square Root  

 
**Element-wise Square Root `torch.sqrt()`**
 

**Each element in the result is the square root of the corresponding element from the input tensor.**

In [None]:
result_sqrt = torch.sqrt(tensor_a)

print(f'Square root result  is: {result_sqrt} and of shape {result_sqrt.shape}')

### 6. Natural Logarithm

**Element-wise Logarithm `torch.log()`**

**Each element in the result is the natural logarithm (base e) of the corresponding element from the input tensor.**

In [None]:
result_log = torch.log(tensor_a)

print(f'Logarithm result  is: {result_log} and of shape {result_log.shape}')

### Reduction Operations in PyTorch

Reduction operations allow you to compute a single value by aggregating or summarizing the elements within a tensor. Common reduction operations include `sum`, `mean`, `max`, and `min`.
You can perform reduction operations along specified axes or dimensions of the tensor.


**Here's a simple example to illustrate reduction operations:**


In [None]:
# Compute the sum of all elements in the tensor
total_sum = torch.sum(tensor_a)

# Compute the mean along axis 1 (rows)
mean_along_rows = torch.mean(tensor_a, dim=1)

# Compute the maximum value along axis 0 (columns)
max_along_columns = torch.max(tensor_a, dim=0)

# Compute the minimum value along axis 1 (rows)
min_along_rows = torch.min(tensor_a, dim=1)

print("Total Sum:", total_sum)
print()
print("Mean Along Rows:", mean_along_rows)
print()
print("Max Along Columns:", max_along_columns)
print()
print("Min Along Rows:", min_along_rows)
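
When `dim` is given, `torch.max()` and `torch.min()` return both the values and the positions where they occur, which is why the output above contains two tensors. A minimal sketch, using an illustrative tensor `t`:

```python
import torch

t = torch.tensor([[1., 2.], [3., 4.]])

# With dim given, max/min return a named tuple (values, indices)
max_result = torch.max(t, dim=0)
print(max_result.values)   # tensor([3., 4.])
print(max_result.indices)  # tensor([1, 1])

# The result can also be unpacked directly
min_values, min_indices = torch.min(t, dim=1)

# keepdim=True keeps the reduced dimension with size 1,
# which is convenient for broadcasting against the original tensor
row_means = torch.mean(t, dim=1, keepdim=True)
print(row_means.shape)     # torch.Size([2, 1])
```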

### Broadcasting in PyTorch

Broadcasting is a powerful concept in PyTorch that simplifies element-wise operations between tensors of different shapes. It allows you to perform operations on tensors that don't have exactly the same shape but can be made compatible through broadcasting.

**The key idea behind broadcasting is that the smaller tensor is "broadcasted" or expanded to match the shape of the larger one**, so you can perform element-wise operations without explicitly replicating data. This not only simplifies your code but also makes it more memory-efficient.

Let's illustrate broadcasting with a simple example:

In [None]:
scalar = 2
result_broadcast = tensor_a + scalar
print(f'broadcast results is: {result_broadcast} and of shape {result_broadcast.shape}')
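
Broadcasting also applies between tensors, not just between a tensor and a scalar. The sketch below uses illustrative names. Shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1.

```python
import torch

matrix = torch.tensor([[1., 2.], [3., 4.]])
row = torch.tensor([10., 20.])         # shape (2,)
column = torch.tensor([[10.], [20.]])  # shape (2, 1)

plus_row = matrix + row        # the row is added to every row of the matrix
plus_column = matrix + column  # the column is added to every column of the matrix
print(plus_row)
print(plus_column)

# (2, 1) and (3,) broadcast to (2, 3)
outer_shape = (column + torch.zeros(3)).shape
print(outer_shape)

# (2, 2) and (3,) are incompatible: trailing sizes 2 and 3 differ and neither is 1
try:
    matrix + torch.zeros(3)
    failed = False
except RuntimeError:
    failed = True
print("incompatible shapes raised an error:", failed)
```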

### Concatenation in PyTorch

Concatenation is a fundamental operation in PyTorch that enables you to `combine tensors` (multi-dimensional arrays) along a specified axis or dimension. In PyTorch, you can perform this operation using the `torch.cat()` function. All tensors must have the same shape except in the dimension being concatenated.

**Let's consider the following scenario:**


You have two 2x2 tensors, `tensor_a` and `tensor_b`, and you want to concatenate them along dimension 0 to create a new tensor with a shape of 4x2.

Let's illustrate Concatenation with a simple example:

In [None]:
concatenated_tensor = torch.cat((tensor_a, tensor_b), dim=0)
print(f'concatenated tensor is: {concatenated_tensor} and of shape {concatenated_tensor.shape}')
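
The choice of `dim` determines which axis grows. The sketch below, with illustrative tensors `a` and `b`, contrasts concatenation along each dimension with `torch.stack()`, which adds a new dimension instead of extending an existing one.

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[5., 6.], [7., 8.]])

rows = torch.cat((a, b), dim=0)       # stacks rows:    shape (4, 2)
cols = torch.cat((a, b), dim=1)       # extends columns: shape (2, 4)
stacked = torch.stack((a, b), dim=0)  # new leading dimension: shape (2, 2, 2)

print(rows.shape, cols.shape, stacked.shape)
print(cols)
```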

****

## Autograd and Gradients <a id="2-3"></a>


`Autograd`, short for Automatic Differentiation, is a key feature of PyTorch that allows for automatic computation of gradients (derivatives) of tensors. It is an essential component for training deep learning models through backpropagation.

**Here are the key concepts related to Autograd:**

**1. `Gradient Calculation`**
In deep learning, we often need to compute gradients of a loss function with respect to model parameters. Autograd simplifies this process.
When you perform operations on tensors that require gradients, PyTorch automatically tracks these operations and constructs a computation graph.

**2. `Computation Graph`**
A computation graph is a directed acyclic graph (DAG) that represents the sequence of operations applied to tensors.
Each operation in the graph is a node, and tensors flowing through these nodes are edges.
The graph allows PyTorch to trace how input tensors influence the output tensors, which is crucial for gradient calculation.

**3. `Dynamic Computational Graph`**
PyTorch uses a dynamic computation graph, which means the graph is built on-the-fly as operations are executed.
This dynamic nature allows flexibility and is well-suited for models with varying architectures or inputs of different shapes.

**4. `Gradients`**
Once you have a computation graph, you can compute gradients by backpropagating through the graph.
Gradients represent how a small change in each input tensor would affect the final output.
The gradients are computed using the chain rule of calculus, and they indicate the direction and magnitude of parameter updates during optimization.

### Working with Gradients

#### 1. Enabling Gradient Tracking
By default, PyTorch tensors do not track gradients. To enable gradient tracking, set `requires_grad=True` when creating a tensor, as in the following example:


In [None]:
x = torch.tensor([3.0, 2.0, 3.0], requires_grad=True)

#### 2.  Forward Pass
During the `forward pass`, you perform operations on tensors. PyTorch records these operations in the computation graph.

`y = x * 2`: You create a new tensor y by multiplying each element of x by 2, so y becomes [6.0, 4.0, 6.0].

`z = y.mean()`: You calculate the mean of the tensor y, which is (6.0 + 4.0 + 6.0) / 3 = 16.0 / 3 ≈ 5.3333.


In [None]:
y = x * 2
z = y.mean()

#### 3. Backward Pass
To compute gradients, you initiate the backward pass by calling the `backward()` method on a scalar tensor (usually a loss).

Chain Rule: The backward pass uses the `chain rule of calculus` to calculate the gradients. It starts from the final scalar value z and works backward through the computation graph, computing the gradient of z with respect to each intermediate tensor and finally with respect to the leaf tensor (x in this case).

* It first computes ∂z/∂y, the gradient of z with respect to y. Since z is the mean of three elements, each entry is 1/3.
* Then it computes ∂y/∂x, the gradient of y with respect to x. Since y = x * 2, each entry is 2.
* Multiplying the two gives ∂z/∂x = 2/3 ≈ 0.6667 for every element of x.

In [None]:
z.backward()

The result of the backward pass is stored in the .grad attribute of the tensors with requires_grad=True. In this case, x.grad will contain the gradient of z with respect to x.
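
As a self-contained check of the chain rule for this example, the sketch below compares `x.grad` with the hand-computed value 2/3 (from ∂z/∂y = 1/3 and ∂y/∂x = 2). It also illustrates that gradients accumulate across `backward()` calls until they are reset. The helper names `first_grad`, `expected`, and `accumulated` are illustrative.

```python
import torch

x = torch.tensor([3.0, 2.0, 3.0], requires_grad=True)

z = (x * 2).mean()
z.backward()
first_grad = x.grad.clone()

# Hand-computed gradient: dz/dy_i = 1/3 and dy_i/dx_i = 2, so dz/dx_i = 2/3
expected = torch.full((3,), 2.0 / 3.0)
print(first_grad, torch.allclose(first_grad, expected))

# A second forward/backward pass adds to the existing .grad instead of replacing it
z = (x * 2).mean()
z.backward()
accumulated = x.grad.clone()
print(accumulated)  # twice the first gradient

# Reset the gradient before the next step (optimizers do this via zero_grad())
x.grad.zero_()
```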

In [None]:
x.grad

#### Turning Autograd Off and On
There are situations where you will need fine-grained control over whether autograd is enabled. There are multiple ways to do this, depending on the situation.

The simplest is to change the `requires_grad` flag on a tensor directly:

In [None]:
# Create a tensor with Autograd enabled (requires_grad=True)
x = torch.tensor([2.0], requires_grad=True)

# Perform some operations with Autograd enabled
y = x * 3
z = y ** 2
w = z.mean()

# Compute gradients while Autograd is enabled
w.backward()

# Access the gradient of x
gradient_with_autograd = x.grad

# Print the gradient
print("Gradient with Autograd:", gradient_with_autograd.item())

# Now, let's turn Autograd off for a specific tensor
x.requires_grad_(False)

# Perform operations without Autograd (Autograd is off for x)
y = x * 3
z = y ** 2
w = z.mean()
try:
    # Attempt to compute gradients; this fails because nothing in the graph requires grad
    w.backward()
except RuntimeError:
    print("This tensor doesn't have requires_grad set to True")

You may have a tensor that requires gradient tracking, but you want a version of it that does not. For this we have the `Tensor` object's `detach()` method. It returns a new tensor that shares the same data but is *detached* from the computation history:

In [None]:
x = torch.rand((2,2), requires_grad=True)
y = x.detach()

print(x)
print()
print(y)
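
Another common way to switch autograd off temporarily is the `torch.no_grad()` context manager, typically used for evaluation or inference. The sketch below, with illustrative variable names, contrasts it with normal tracking and with `detach()`.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

# Operations inside no_grad() are not recorded in the computation graph
with torch.no_grad():
    untracked = x * 2
print(untracked.requires_grad)  # False

# Outside the block, tracking is active again
tracked = x * 2
print(tracked.requires_grad)    # True

# detach() gives an untracked tensor backed by the same memory as x
detached = x.detach()
print(detached.requires_grad)                 # False
print(detached.data_ptr() == x.data_ptr())    # True
```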

### Autograd and In-place Operations

When using Autograd, it's crucial to avoid `in-place operations`, as they can remove essential information needed for gradient computation during the `backward()` pass. PyTorch is designed to detect and prevent in-place operations on leaf variables that require Autograd, as shown below.

Note: The following code cell will intentionally raise a runtime error, which is expected.

![image.png](attachment:da0d8eca-4279-43e7-aeaa-9a9279ce83a1.png)
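
A minimal sketch of this behaviour is given below. It is an illustration written for this section rather than the exact cell from the screenshot, and it catches the error so that execution can continue.

```python
import torch

x = torch.ones(3, requires_grad=True)  # a leaf tensor that requires grad

error_message = None
try:
    x.add_(1)  # in-place update of a leaf that requires grad
except RuntimeError as err:
    error_message = str(err)
    print("In-place update rejected:", error_message)

# The out-of-place version creates a new tensor and is tracked normally
y = x + 1

# Inside torch.no_grad() the in-place update is allowed; this is the pattern
# used when updating weights by hand
with torch.no_grad():
    x.add_(1)
print(x)
```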

### Autograd Profiler

Autograd keeps a detailed record of every step in your computation. This record, along with timing information, can be used as a helpful profiler. Autograd includes a built-in feature for this purpose. 
Here's a straightforward example of how to use it:

In [None]:
x = torch.randn(2, 3, requires_grad=True)
y = torch.rand(2, 3, requires_grad=True)
z = torch.ones(2, 3, requires_grad=True)

with torch.autograd.profiler.profile(use_cuda=False) as prf:
    for _ in range(1000):
        z = (z / x) * y
        
print(prf.key_averages().table(sort_by='self_cpu_time_total'))

****

## nn.Parameter <a id="2-4"></a>


In PyTorch, `nn.Parameter` is a subclass of `torch.Tensor`. It is designed to mark tensors that should be treated as parameters of a PyTorch `nn.Module`. Parameters are tensors that are meant to be learned during the training process, such as `weights` and `biases` in a neural network.

**Why is nn.Parameter useful?**

* Requires Grad Calculation: When an nn.Parameter is assigned as an attribute of an nn.Module, it is automatically registered as a parameter of that module, and it has `requires_grad=True` by default, so PyTorch tracks it for gradient computation during backpropagation. This means that operations involving these tensors will have gradients computed, allowing them to be updated during training using optimization techniques like stochastic gradient descent (SGD).

* Initialization: nn.Parameter wraps whatever tensor you pass to it, so its initial values are the ones you supply (for example `torch.randn(...)` or `torch.zeros(...)`). Built-in layers such as nn.Linear initialize their own parameters with random values, and you can customize the initialization if needed.

* Access: You can easily access the parameters of a PyTorch module using the parameters() method, which returns an iterable containing all the nn.Parameter objects within the module.

Here's a simple example of how you might create and use an nn.Parameter in a PyTorch module:

In [None]:
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        # Create an nn.Parameter for weight and bias
        self.weight = nn.Parameter(torch.randn(10, 5))
        self.bias = nn.Parameter(torch.zeros(10))

    def forward(self, x):
        # Use the parameters in the forward pass
        z = torch.matmul(x, self.weight.t()) + self.bias
        return z

# Instantiate the model
model = MyModel()

# Access and print the parameters
for param in model.parameters():
    print(param)
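
To make the registration behaviour concrete, the sketch below defines a hypothetical module `Demo` with one `nn.Parameter` and one plain tensor attribute. Only the former is returned by `named_parameters()`.

```python
import torch
import torch.nn as nn

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(3, 2))  # registered as a parameter
        self.scale = torch.ones(3)                     # plain tensor: not registered

demo = Demo()
names = [name for name, _ in demo.named_parameters()]

print(names)                       # ['weight']
print(demo.weight.requires_grad)   # True
print(demo.scale.requires_grad)    # False
```

A plain tensor attribute like `scale` is invisible to optimizers built from `model.parameters()`. For non-trainable state that should still move with the module, `register_buffer()` is the usual tool.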

****

<div style="text-align: right;">
  <a href="#top" style="text-decoration: none; color: black;">
    <span style="color: #EE4C2C; font-size: 18px; font-weight: bold;">&uarr;</span> Return to Table of Contents
  </a>
</div>

<a id="3"></a>
<h1 style='background:#EE4C2C;border:0; color:#313131; /* Updated text color */
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #313131;'>Building Neural Networks (NN API)</center></h1>
    
# [Building Neural Networks (NN API)](https://pytorch.org/docs/stable/nn.html)

## 3.1 - Layers and Modules <a id="3-1"></a>


**Layers are the building blocks of neural networks. In PyTorch, you can create layers using predefined classes provided by the `torch.nn` module. Common layer types include:**

### [Linear Layer ](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html?highlight=linear+layer) 

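The Linear layer in a PyTorch model receives input from every neuron of its preceding layer (it is fully connected). It applies the affine transformation `y = x @ W.T + b`, where the weight `W` has shape `(out_features, in_features)`.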

![neural_net2.jpg](attachment:f784d293-213e-49e0-9604-6435f2898630.jpg)


#### Usage

In PyTorch, we can create a Linear Layer using the following syntax:

```python
import torch.nn as nn

nn.Linear(
    in_features,
    out_features,
    bias=True
)
```

**Where**

* **in_features**: The number of input features or the size of the input vector.

* **out_features**: The number of output features or the size of the output vector.

* **bias**: Whether to include a bias term or not.
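
A minimal sketch of the layer in use, with sizes chosen arbitrarily for illustration. It checks the output shape and reproduces the layer's result by hand from its `weight` and `bias`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(in_features=4, out_features=2)
x = torch.randn(8, 4)  # a batch of 8 samples with 4 features each

out = layer(x)
print(out.shape)            # torch.Size([8, 2])
print(layer.weight.shape)   # torch.Size([2, 4]): (out_features, in_features)
print(layer.bias.shape)     # torch.Size([2])

# The same computation written out explicitly
manual = x @ layer.weight.t() + layer.bias
print(torch.allclose(out, manual, atol=1e-6))
```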


### [BatchNorm2d](https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html?highlight=batch+normalization)

Batch normalization applies a transformation that maintains the mean output close to `0` and the output standard deviation close to `1`.

![0_pSSzicm1IH4hXOHc.png](attachment:deeb32f3-72b4-4388-8cde-fbb70a18e41f.png)


#### Usage

In PyTorch, we can create a BatchNorm2d layer using the following syntax:

```python
import torch.nn as nn

nn.BatchNorm2d(
    num_features,
    eps=1e-05,
    momentum=0.1,
)
```

**Where**

* **num_features**: The number of features (channels) in the input.

* **eps**: A small value added to the denominator for numerical stability (default is 1e-05).

* **momentum**: The value used for the running mean and variance computation (default is 0.1).
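
The normalization effect can be checked numerically. The sketch below uses an arbitrary input with mean 10 and standard deviation 5 and assumes the layer is in training mode with its default affine parameters. The per-channel statistics of the output come out close to 0 and 1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(num_features=3)
x = torch.randn(8, 3, 4, 4) * 5 + 10  # (batch, channels, height, width)

bn.train()  # training mode: normalize with the statistics of the current batch
out = bn(x)

# Gather all values belonging to each channel: shape (3, 8 * 4 * 4)
per_channel = out.transpose(0, 1).reshape(3, -1)
channel_mean = per_channel.mean(dim=1)
channel_var = ((per_channel - channel_mean[:, None]) ** 2).mean(dim=1)

print(channel_mean)  # close to 0 for every channel
print(channel_var)   # close to 1 for every channel
```

In evaluation mode (`bn.eval()`), the layer uses the running mean and variance collected during training instead of the batch statistics.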


### [Dropout](https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html?highlight=dropout#torch.nn.Dropout)

The Dropout layer randomly sets input units to 0 with a specified probability (rate) at each step during training time. This regularization technique helps prevent overfitting by reducing the reliance on specific neurons.
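
The training-time behaviour can be sketched as follows, using an input of ones chosen for illustration. Surviving elements are scaled by `1 / (1 - p)` so that the expected value stays the same, and in evaluation mode the layer passes its input through unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()          # training mode: elements are dropped at random
train_out = dropout(x)
print(train_out.unique())                    # only 0 and 1 / (1 - 0.5) = 2 occur
print((train_out == 0).float().mean())       # fraction dropped, roughly 0.5

dropout.eval()           # evaluation mode: dropout is disabled
eval_out = dropout(x)
print(torch.equal(eval_out, x))
```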

![dropout.gif](attachment:514188d9-01b5-434c-af36-5c7fff006339.gif)

#### Usage

In PyTorch, we can use the Dropout Layer using the following syntax:

```python
import torch.nn as nn

nn.Dropout(
    p=rate,
    inplace=False
)
```


**Where**

* **p**: The probability of dropping an input unit (rate). It should be a float between 0 and 1.

* **inplace**:  A boolean indicating whether the operation is performed in-place (default is False).


### [Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html)

The `Conv2d` layer in PyTorch creates a convolution kernel that is applied to the layer input to produce a tensor of outputs. It is a fundamental building block of convolutional neural networks (CNNs) used for tasks such as image classification and feature extraction.

![convgif.gif](attachment:1b14bee2-fbd0-4c92-943f-7802afc83dd5.gif)

#### Usage

In PyTorch, we can create a Conv2d layer using the following syntax:

```python
import torch.nn as nn

nn.Conv2d(
    in_channels,
    out_channels,
    kernel_size,
    stride=1,
    padding=0,
    bias=True
)
```

**Where**

* **in_channels**: The number of input channels or the depth of the input feature map.

* **out_channels**:  The number of output channels or the depth of the output feature map.

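* **kernel_size**: The size of the convolutional kernel, given as a single int for a square kernel or as a tuple (height, width).

* **stride**: The stride of the convolution along the height and width (default is 1).

* **padding**: The amount of zero padding added to both sides of each spatial dimension, given as an int or a tuple (default is 0). The strings "valid" and "same" are also accepted: "valid" means no padding, while "same" pads the input with zeros so that the output has the same spatial size as the input (supported only for `stride=1`).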

* **bias**: Whether the layer uses a bias term or not (default is `True`).
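To make the effect of `kernel_size`, `stride`, and `padding` concrete, here is a small sketch that compares the output shapes of two layers with the usual size formula `floor((size + 2 * padding - kernel_size) / stride) + 1` (dilation assumed to be 1). The helper `conv_output_size` and the input shape are hypothetical and only serve this illustration.

```python
import torch
import torch.nn as nn


def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution (dilation assumed to be 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


x = torch.randn(8, 3, 32, 32)  # (batch, channels, height, width), illustrative

# padding=1 with a 3x3 kernel keeps the spatial size
same_size = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
# stride=2 without padding roughly halves the spatial size
downsample = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=2)

print(same_size(x).shape)   # torch.Size([8, 16, 32, 32])
print(downsample(x).shape)  # torch.Size([8, 16, 15, 15])
print(conv_output_size(32, kernel_size=3, stride=2))  # 15
```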


### [MaxPool2d](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html)

The `MaxPool2d` layer in PyTorch downsamples the input along its height and width by taking the maximum value over an input window. It is a common operation used in convolutional neural networks (CNNs) for reducing spatial dimensions while preserving important features.

![MaxpoolSample2.png](attachment:fc0ec514-42f4-4d88-b468-9fbeabe681a1.png)

#### Usage

In PyTorch, we can create a `MaxPool2d` layer using the following syntax:

```python
import torch.nn as nn

nn.MaxPool2d(
    kernel_size,
    stride=None,
    padding=0
)
```

**Where**

* **kernel_size**: The size of the pooling window, given as a single int for a square window or as a tuple (height, width).

* **stride**: The stride of the pooling operation along the height and width. The default of None means the stride equals `kernel_size`.

* **padding**: The implicit padding added to both sides of each spatial dimension, given as an int or a tuple (default is 0). For max pooling, padded positions are treated as negative infinity, so they never become the maximum.
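A tiny worked example may help here. The 4x4 input below is made up so that the result can be checked by hand: each non-overlapping 2x2 window is reduced to its largest value.

```python
import torch
import torch.nn as nn

# Values 0..15 arranged as one 4x4 single-channel image: (batch, channels, H, W)
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

pool = nn.MaxPool2d(kernel_size=2)  # stride defaults to kernel_size
y = pool(x)

print(y)
# tensor([[[[ 5.,  7.],
#           [13., 15.]]]])
```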

### [AvgPool2d](https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html)

The `AvgPool2d` layer in PyTorch downsamples the input along its height and width by taking the average value over an input window. It is a pooling operation commonly used in convolutional neural networks (CNNs) for reducing spatial dimensions while smoothing the features.

![Avg-Pooling.png](attachment:26bfb165-c7f2-4e7e-88bc-51797c6bbfd3.png)

#### Usage

In PyTorch, we can create an `AvgPool2d` layer using the following syntax:

```python
import torch.nn as nn

nn.AvgPool2d(
    kernel_size,
    stride=None,
    padding=0
)
```

**Where**

* **kernel_size**: The size of the pooling window, given as a single int for a square window or as a tuple (height, width).

* **stride**: The stride of the pooling operation along the height and width. The default of None means the stride equals `kernel_size`.

* **padding**: The implicit zero padding added to both sides of each spatial dimension, given as an int or a tuple (default is 0).
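The same kind of hand-checkable sketch works for average pooling. The 4x4 input is made up for illustration; each 2x2 window is replaced by the mean of its four values.

```python
import torch
import torch.nn as nn

# Values 0..15 arranged as one 4x4 single-channel image: (batch, channels, H, W)
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

pool = nn.AvgPool2d(kernel_size=2)  # stride defaults to kernel_size
y = pool(x)

print(y)
# tensor([[[[ 2.5000,  4.5000],
#           [10.5000, 12.5000]]]])
```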

### [RNN (Recurrent Neural Network)](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html)
The Recurrent Neural Network (RNN) layer in PyTorch is a fundamental building block for sequence modeling tasks. It processes sequential data by maintaining hidden states and updating them at each time step, allowing it to capture dependencies across time.

![image.png](attachment:1743bc43-c201-44ff-ac83-e2c31ba7bd0f.png)

#### Usage
In PyTorch, you can create an RNN layer using the following syntax:


```python
import torch.nn as nn

nn.RNN(
    input_size,
    hidden_size,
    num_layers=1,
    batch_first=False,
    dropout=0,
    bidirectional=False
)
```

#### Where

**input_size**: The number of expected features in the input.

**hidden_size**: The number of features in the hidden state.

**num_layers**: The number of recurrent layers (default is 1).

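**batch_first**: If True, the input and output tensors are provided as (batch_size, seq_length, feature) instead of (seq_length, batch_size, feature) (default is False).

**dropout**: If non-zero, applies dropout to the outputs of each RNN layer except the last one (default is 0). It therefore only has an effect when `num_layers > 1`.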

**bidirectional**: If True, makes the RNN layer bidirectional (default is False).
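The shapes returned by the layer are often the main source of confusion, so here is a minimal sketch with made-up sizes (batch of 4, sequence length 7, 10 input features, 20 hidden units).

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=2, batch_first=True)

x = torch.randn(4, 7, 10)  # (batch, seq, feature) because batch_first=True
output, h_n = rnn(x)

print(output.shape)  # torch.Size([4, 7, 20]): last layer's hidden state at every step
print(h_n.shape)     # torch.Size([2, 4, 20]): final hidden state of each layer

# For a unidirectional RNN, the last time step of `output` is the
# final hidden state of the last layer
print(torch.allclose(output[:, -1], h_n[-1]))  # True
```

Note that `h_n` keeps the layer dimension first even when `batch_first=True`.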

### [LSTM (Long Short-Term Memory)](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html)
The Long Short-Term Memory (LSTM) layer in PyTorch is a type of RNN layer designed to capture long-range dependencies in sequential data. It uses a memory cell and gates to control the flow of information through the network.

![image.png](attachment:3f309dec-0b08-43e9-8fc5-3e71e619e2b6.png)

#### Usage
In PyTorch, you can create an LSTM layer using the following syntax:


```python
import torch.nn as nn

nn.LSTM(
    input_size,
    hidden_size,
    num_layers=1,
    batch_first=False,
    dropout=0,
    bidirectional=False
)
```

#### Where

**input_size**: The number of expected features in the input.

**hidden_size**: The number of features in the hidden state.

**num_layers**: The number of LSTM layers (default is 1).

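**batch_first**: If True, the input and output tensors are provided as (batch_size, seq_length, feature) instead of (seq_length, batch_size, feature) (default is False).

**dropout**: If non-zero, applies dropout to the outputs of each LSTM layer except the last one (default is 0). It therefore only has an effect when `num_layers > 1`.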

**bidirectional**: If True, makes the LSTM layer bidirectional (default is False).
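Compared with a plain RNN, an LSTM also returns a cell state, and a bidirectional layer doubles the feature size of its output. The sizes in this sketch are illustrative assumptions.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=1,
               batch_first=True, bidirectional=True)

x = torch.randn(4, 7, 10)  # (batch, seq, feature)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([4, 7, 40]): forward and backward states concatenated
print(h_n.shape)     # torch.Size([2, 4, 20]): one final hidden state per direction
print(c_n.shape)     # torch.Size([2, 4, 20]): one final cell state per direction
```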

### [GRU (Gated Recurrent Unit)](https://pytorch.org/docs/stable/generated/torch.nn.GRU.html)
The Gated Recurrent Unit (GRU) layer in PyTorch is another type of RNN layer that is computationally efficient and can capture long-range dependencies. It uses update and reset gates to control information flow.

![image.png](attachment:7760e22e-5770-4530-a5f0-5e8608c8fc65.png)

#### Usage
In PyTorch, you can create a GRU layer using the following syntax:


```python
import torch.nn as nn

nn.GRU(
    input_size,
    hidden_size,
    num_layers=1,
    batch_first=False,
    dropout=0,
    bidirectional=False
)
```

#### Where

**input_size**: The number of expected features in the input.

**hidden_size**: The number of features in the hidden state.

**num_layers**: The number of GRU layers (default is 1).

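**batch_first**: If True, the input and output tensors are provided as (batch_size, seq_length, feature) instead of (seq_length, batch_size, feature) (default is False).

**dropout**: If non-zero, applies dropout to the outputs of each GRU layer except the last one (default is 0). It therefore only has an effect when `num_layers > 1`.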

**bidirectional**: If True, makes the GRU layer bidirectional (default is False).
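This sketch uses the default tensor layout (`batch_first=False`) and passes an explicit initial hidden state. All sizes are made up for illustration.

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=10, hidden_size=20)  # one layer, unidirectional

x = torch.randn(7, 4, 10)   # (seq, batch, feature): the default layout
h0 = torch.zeros(1, 4, 20)  # (num_layers, batch, hidden_size)
output, h_n = gru(x, h0)

print(output.shape)  # torch.Size([7, 4, 20])
print(h_n.shape)     # torch.Size([1, 4, 20])
```

If `h0` is omitted, the initial hidden state defaults to zeros.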

### [Multi-Head Attention](https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html)
Multi-Head Attention is a crucial component of transformer-based neural networks, such as the Transformer model and its variants (e.g., BERT, GPT). It enables the model to focus on different parts of the input sequence simultaneously, allowing it to capture complex relationships and dependencies within the data.

![image.png](attachment:e1785254-4c3a-4d01-b9d6-868bf1a8cee5.png)

#### Usage
In PyTorch, you can create a Multi-Head Attention layer using the following syntax:

```python
import torch.nn as nn

nn.MultiheadAttention(
    embed_dim,
    num_heads,
    dropout=0.0,
    bias=True,
    add_bias_kv=False,
    add_zero_attn=False,
    kdim=None,
    vdim=None
)
```

#### Where

**embed_dim**: The dimension of the input embeddings.

**num_heads**: The number of parallel attention heads. `embed_dim` is split across the heads, so it must be divisible by `num_heads`.

**dropout**: The dropout probability applied to the attention weights (default is 0.0).

**bias**: If True, adds bias to the input and output projection layers (default is True).

**add_bias_kv**: If True, adds bias to the key and value sequences (default is False).

**add_zero_attn**: If True, adds a new batch of zeros to the key and value sequences (default is False).

**kdim**: The dimension of the key vectors. By default, it's set to embed_dim, but you can specify a different dimension if needed.

**vdim**: The dimension of the value vectors. By default, it's set to embed_dim, but you can specify a different dimension if needed.
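A minimal self-attention sketch, in which the query, key, and value are the same tensor, shows what the layer returns. The sizes are illustrative, and `batch_first=True` is an assumption of this example; the default layout is (seq, batch, embed).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# embed_dim=16 is divisible by num_heads=4, so each head works on 4 features
mha = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)

x = torch.randn(2, 5, 16)  # (batch, seq, embed)
attn_output, attn_weights = mha(x, x, x)  # self-attention: query = key = value

print(attn_output.shape)   # torch.Size([2, 5, 16])
print(attn_weights.shape)  # torch.Size([2, 5, 5]), averaged over the heads
print(attn_weights.sum(dim=-1))  # each row of attention weights sums to 1
```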

### [Embedding Layer](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html)

The Embedding Layer in PyTorch is used to create dense representations of categorical variables, commonly used in natural language processing tasks where words are converted into numerical vectors.

In PyTorch, you can use the Embedding Layer with the following syntax:

```python

import torch.nn as nn

embedding_layer = nn.Embedding(
    num_embeddings, embedding_dim, padding_idx=None,
    max_norm=None, norm_type=2.0, scale_grad_by_freq=False,
    sparse=False, _weight=None
)
```

**Where:**

**num_embeddings**: Integer, the size of the vocabulary, i.e., the total number of unique categories.

**embedding_dim**: Integer, the dimension of the dense embedding.

**padding_idx**: Optional integer, indicating the padding index. If specified, the embedding vector at this index is initialized to all zeros and is not updated during training.

**max_norm**: Optional float. If specified, each embedding vector with a norm larger than this value is renormalized to have a norm of `max_norm`.

**norm_type**: Float, the p of the p-norm used for the `max_norm` option (e.g., 2.0 for L2 norm).

**scale_grad_by_freq**: Boolean, whether to scale gradients by the inverse of the frequency of the words in the mini-batch.

**sparse**: Boolean, indicating whether to use sparse gradients for embeddings.

**_weight**: Optional pre-trained embedding weights (a tensor).
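As a small illustration, the sketch below looks up a padded batch of token ids. The vocabulary size, embedding dimension, and the ids themselves are made up; index 0 is assumed to be the padding token.

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=10, embedding_dim=4, padding_idx=0)

# Two sequences of length 3, padded with index 0
token_ids = torch.tensor([[1, 2, 0],
                          [4, 0, 0]])
vectors = embedding(token_ids)

print(vectors.shape)  # torch.Size([2, 3, 4]): one 4-dimensional vector per token
print(vectors[0, 2])  # the padding index maps to a vector of zeros
```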

### Custom Layer in PyTorch

Custom layers in PyTorch allow you to define your own neural network components with custom behavior. You can create custom layers by subclassing `nn.Module`. Implement the `__init__` method to set up any learnable parameters and other configuration options. Then, implement the `forward` method to define the forward pass logic.

#### Usage

In PyTorch, we can create a custom layer using the following syntax, shown here for a custom linear layer:

```python
import torch
import torch.nn as nn

class CustomLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super(CustomLinear, self).__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.randn(out_features))

    def forward(self, x):
        out = torch.matmul(x, self.weight.t()) + self.bias
        return out
```


```python
# Instantiate the custom linear layer
custom_layer = CustomLinear(64, 32)  # Example: input size 64, output size 32

# Example input data
input_data = torch.randn(16, 64)

# Use the custom layer
output = custom_layer(input_data)
print(output.shape)  # Print the output shape
```

****

## 3.2 - Activation Functions <a id="3-2"></a>


**Activation functions are essential components in neural networks that introduce non-linearity, allowing neural networks to model complex relationships in data.**

### Sigmoid
Sigmoid is a common activation function that maps input values to the range `(0, 1)`. It is often used in `binary classification` problems where the output represents probabilities.

#### Usage
In PyTorch, you can apply the Sigmoid function using the following syntax:

```python
input_tensor = torch.tensor([2.0, 1.0, -2.0])

# Define the Sigmoid activation function
sigmoid = nn.Sigmoid()

# Apply Sigmoid to the input
output = sigmoid(input_tensor)

output
```

### Softmax
Softmax is used in `multi-class classification` problems to convert a vector of raw scores into a `probability distribution over multiple classes`. It exponentiates each score and normalizes them to sum to 1.

```python
# Create a sample input tensor (raw scores)
input_tensor = torch.tensor([2.0, 1.0, 0.1])

# Define the Softmax activation function
softmax = nn.Softmax(dim=0)

# Apply Softmax to the input
output = softmax(input_tensor)

print(output)
```

### ReLU (Rectified Linear Activation)
ReLU (Rectified Linear Unit) is one of the most commonly used activation functions in neural networks. It replaces all negative values in the input with zero and keeps positive values unchanged.

```python
# Create a sample input tensor
input_tensor = torch.tensor([-1.0, 2.0, -0.5, 3.0])

# Define the ReLU activation function
relu = nn.ReLU()

# Apply ReLU to the input
output = relu(input_tensor)

print(output)
```

### Custom Activation Function in PyTorch

Custom activation functions in PyTorch allow you to define your own activation functions with custom behavior. You can create custom activation functions by subclassing `nn.Module`. Implement the `forward` method to apply your custom function to the input tensor.

#### Usage

In PyTorch, we can create a custom activation function using the following syntax:

```python
import torch
import torch.nn as nn

class SwishActivation(nn.Module):
    def forward(self, x):
        return x * torch.sigmoid(x)
```
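A custom activation module can be used like any built-in one. The sketch below repeats the class so that it runs on its own, applies it to a few sample values, and compares the result with PyTorch's built-in `nn.SiLU`, which computes the same `x * sigmoid(x)` function. The sample values and the small `nn.Sequential` model are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SwishActivation(nn.Module):
    def forward(self, x):
        return x * torch.sigmoid(x)


x = torch.linspace(-3.0, 3.0, steps=7)
swish = SwishActivation()
y = swish(x)

# The built-in SiLU activation implements the same function
print(torch.allclose(y, nn.SiLU()(x)))  # True

# Custom activations can be placed inside nn.Sequential like any other layer
model = nn.Sequential(nn.Linear(7, 4), SwishActivation(), nn.Linear(4, 1))
print(model(x.unsqueeze(0)).shape)  # torch.Size([1, 1])
```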

****

## 3.3 - Building Models <a id="3-3"></a>




In PyTorch, creating models typically involves several main steps. These steps are crucial for defining the architecture of your neural network, specifying how it should learn from data.

**Steps to Create a Custom Model**

1. **Import Dependencies**: Import the necessary PyTorch modules and packages, such as `torch.nn` for defining neural network components.

2. **Define the Model Class**: Create a custom Python class that inherits from `nn.Module`. This class will represent your neural network model. Define the network's architecture by adding layers and specifying their forward pass in the `forward` method.

3. **Initialize Layers**: In the `__init__` method of your custom model class, initialize the layers (e.g., convolutional layers, fully connected layers) that you'll use in your neural network.

4. **Forward Pass**: Implement the forward pass in the `forward` method of your model class. This method defines how the input data passes through the layers of your model to produce an output.

5. **Training and Optimization**: After defining your custom model, you can use it for training and optimization tasks. You'll need to define a loss function and choose an optimization algorithm (e.g., stochastic gradient descent) to train your model on your dataset.


Here's an example of how to create a custom model in PyTorch:

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        # Define your layers here
        self.fc1 = nn.Linear(64, 128)
        self.fc2 = nn.Linear(128, 10)
        
    def forward(self, x):
        # Implement the forward pass
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Create an instance of the custom model
model = MyModel()
```
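Once defined, the model is called like a function on a batch of inputs. The following sketch repeats the class so that it is self-contained and runs a forward pass on random data; the batch size of 8 is an arbitrary choice.

```python
import torch
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(64, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)


model = MyModel()
model.eval()  # switch layers such as dropout or batch norm to inference behavior

batch = torch.randn(8, 64)  # illustrative batch of 8 samples with 64 features
with torch.no_grad():       # gradients are not needed for inference
    logits = model(batch)

predicted_classes = logits.argmax(dim=1)
print(logits.shape)             # torch.Size([8, 10])
print(predicted_classes.shape)  # torch.Size([8])
```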

### More Complex Example

In this example, we'll create a more complex Convolutional Neural Network (CNN) model using PyTorch. This model is designed for image classification tasks and consists of both convolutional and fully connected layers. Let's break down the code and explain its components:

```python
import torch
import torch.nn as nn

class ComplexCNN(nn.Module):
    def __init__(self, num_classes):
        super(ComplexCNN, self).__init__()

        # Sequential block for convolutional layers
        self.conv_layers = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.MaxPool2d(kernel_size=2, stride=2),
            
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(128),
            nn.MaxPool2d(kernel_size=2, stride=2),
            
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(256),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

        # Sequential block for fully connected layers
        self.fc_layers = nn.Sequential(
            nn.Linear(256 * 4 * 4, 512),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(512, num_classes)
        )

    def forward(self, x):
        # Forward pass through convolutional layers
        x = self.conv_layers(x)

        # Flatten the feature maps
        x = x.view(x.size(0), -1)

        # Forward pass through fully connected layers
        x = self.fc_layers(x)

        return x
```

#### Note:

The use of `nn.Sequential` allows us to group layers together in a sequential manner, making the code more concise and readable. 
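The first fully connected layer expects `256 * 4 * 4` input features. That number is consistent with 32x32 input images, which is an assumption inferred from the layer sizes rather than something stated in the code. The arithmetic can be sketched in plain Python, with `pooled_size` being a hypothetical helper:

```python
def pooled_size(size, kernel_size=2, stride=2):
    """Spatial size after a pooling layer without padding."""
    return (size - kernel_size) // stride + 1


size = 32  # assumed input height and width
for _ in range(3):
    # Each 3x3 convolution with padding=1 keeps the size;
    # each 2x2 max pooling with stride 2 halves it: 32 -> 16 -> 8 -> 4
    size = pooled_size(size)

flattened_features = 256 * size * size  # 256 channels after the last conv block
print(size, flattened_features)  # 4 4096
```

For a different input resolution, the first `nn.Linear` layer would need a correspondingly different number of input features.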

### Visualizing Model Architecture with torchsummary

The `torchsummary` library is a powerful tool for summarizing and inspecting the architecture of PyTorch neural network models. Here's a step-by-step guide on how to use `torchsummary`:

1. **Installation**: Begin by installing the `torchsummary` library using the following command:

```python
!pip install torchsummary
```


2. **Importing Libraries**: Import the required libraries into your Python script or notebook:

```python
import torch
import torch.nn as nn
from torchsummary import summary
```

3. **Defining Model Architecture**: Define your PyTorch model as usual. This includes specifying the layers, architecture, and any custom components specific to your neural network.

4. **Visualizing the Model Architecture**: To visualize the architecture of your model, use the following code:

```python
summary(model, input_size)
```

**Where**:

**model** represents the PyTorch model for which you want to visualize the architecture.

**input_size** is the expected input shape of the model.
For instance, if your model expects input data with 3 channels and an image size of 32x32 pixels, you would specify input_size=(3, 32, 32).
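If installing an extra package is not an option, a rough cross-check of the parameter count is possible with plain PyTorch. This sketch is an alternative to the summary table, not a replacement for it, and the small two-layer model is an illustrative assumption.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 128),  # 64 * 128 weights + 128 biases = 8320 parameters
    nn.ReLU(),
    nn.Linear(128, 10),  # 128 * 10 weights + 10 biases = 1290 parameters
)

total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(total_params, trainable_params)  # 9610 9610
print(model)  # printing a module lists its layers
```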

****

## 3.4 - Callbacks <a id="3-4"></a>


A callback is a mechanism that lets you customize and extend a training process while a neural network is being trained. It specifies actions to run at defined points during training, such as `at the end of an epoch` or `after each batch`. Callbacks are often used for `monitoring training progress`, `saving model checkpoints`, `applying learning rate schedules`, early stopping, and more. Unlike Keras in `TensorFlow`, core `PyTorch` does not ship a built-in callback API, so callbacks are usually plain Python objects called from your own training loop or provided by higher-level libraries.

**Here are some key aspects of callbacks:**

**Customizable Actions**: Callbacks enable you to define custom actions or functions that will be executed at specific points during training. For example, you can specify that you want to save the model's weights after each epoch or display training metrics periodically.

**Modular and Reusable**: Callbacks are modular and reusable pieces of code. You can create your own custom callbacks to perform specific tasks tailored to your project's requirements.

**Non-Intrusive**: Callbacks don't interfere with the core training loop of the deep learning framework. They complement the training process without altering the fundamental training algorithm.

**Monitoring and Logging**: Callbacks are commonly used for monitoring training metrics such as loss, accuracy, or custom evaluation metrics. You can log these metrics to track how your model is performing over time.

**Early Stopping**: One common use of callbacks is early stopping, where training is halted when a certain condition is met, such as when the validation loss stops improving, to prevent overfitting.

**Model Checkpoints**: You can use callbacks to save model checkpoints at specific intervals, ensuring that you can restore the model to a particular state if needed.

**Learning Rate Scheduling**: Callbacks can be used to adjust the learning rate during training, enabling you to fine-tune the training process for better convergence.

**TensorBoard Integration**: Callbacks can integrate with tools like TensorBoard for visualizing and analyzing training progress.

**Callback Chains**: Multiple callbacks can be chained together to perform a sequence of actions during training.
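To make the idea concrete, here is a minimal sketch of an early stopping callback written as a plain Python class. The class name, the `on_epoch_end` method, and the validation losses are all hypothetical; in a real project the losses would come from evaluating the model after each epoch.

```python
class EarlyStopping:
    """Hypothetical callback: signals a stop when the validation loss stops improving."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience    # epochs to wait without improvement
        self.min_delta = min_delta  # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def on_epoch_end(self, epoch, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        # Returning True asks the training loop to stop
        return self.bad_epochs >= self.patience


# Made-up validation losses standing in for a real training run
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.60]

early_stopping = EarlyStopping(patience=2)
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    # ... training and validation for this epoch would happen here ...
    if early_stopping.on_epoch_end(epoch, val_loss):
        stopped_at = epoch
        break

print(stopped_at)                # 4
print(early_stopping.best_loss)  # 0.65
```

The training loop stays in control: the callback only reports a decision, which keeps it reusable across projects.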

### TensorBoard 

TensorBoard is a powerful visualization tool primarily associated with TensorFlow. PyTorch supports it directly through `torch.utils.tensorboard.SummaryWriter` (the separate TensorBoardX library offers a similar interface). The writer logs events that TensorBoard can read, enabling you to visualize and monitor your PyTorch models.

#### Usage

In PyTorch, you can log to TensorBoard with `SummaryWriter`, which requires the `tensorboard` package to be installed. Here's how you can set it up:

```python
from torch.utils.tensorboard import SummaryWriter

# Create a SummaryWriter for TensorBoard
writer = SummaryWriter(log_dir="logs")

# Inside your training loop or callback
for epoch in range(num_epochs):
    # ... Training code that computes `loss` and `accuracy` for this epoch ...

    # Add scalars to TensorBoard, using the epoch as the step on the x-axis
    writer.add_scalar("Loss", loss, global_step=epoch)
    writer.add_scalar("Accuracy", accuracy, global_step=epoch)

# Close the SummaryWriter when done
writer.close()
```

**Where**

**log_dir**: The path to the directory where log files are saved, which can be parsed by TensorBoard.

### Learning Rate Scheduler

A Learning Rate Scheduler is a callback that dynamically adjusts the learning rate during training to improve model convergence. In PyTorch, you can easily set up a Learning Rate Scheduler using built-in scheduler classes like `StepLR`.

#### Usage

In PyTorch, you can create a Learning Rate Scheduler callback as follows:

```python
import torch.optim as optim
from torch.optim import lr_scheduler

# Create an optimizer (e.g., SGD or Adam)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Define a learning rate scheduler using StepLR
scheduler = lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

# Inside your training loop or callback
for epoch in range(num_epochs):
    # ... Training code, including optimizer.step() ...

    # Update the learning rate once per epoch, after the optimizer updates
    scheduler.step()
```

**Where:**

**optimizer**: The PyTorch optimizer used for training.

**step_size**: The number of epochs after which the learning rate is adjusted. For example, with step_size=10, the learning rate will be updated every 10 epochs.

**gamma**: The factor by which the learning rate is multiplied when adjusted. For example, with gamma=0.1, the learning rate is reduced to 10% of its current value.
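The resulting schedule can be inspected without training a real model. In this sketch a single dummy parameter stands in for `model.parameters()`, and the learning rate is recorded at the start of each epoch; all names and the 30-epoch horizon are illustrative.

```python
import torch
import torch.optim as optim
from torch.optim import lr_scheduler

# A dummy parameter stands in for a real model in this sketch
param = torch.nn.Parameter(torch.zeros(1))
optimizer = optim.SGD([param], lr=0.1)
scheduler = lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

learning_rates = []
for epoch in range(30):
    learning_rates.append(optimizer.param_groups[0]["lr"])
    optimizer.step()  # no gradients here, so the parameter is left unchanged
    scheduler.step()  # called after optimizer.step(), once per epoch

# Approximately 0.1 for epochs 0-9, 0.01 for epochs 10-19, 0.001 for epochs 20-29
print(learning_rates[0], learning_rates[10], learning_rates[20])
```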

### Create a Custom Callback
The `CheckpointCallback` class is a custom callback for `saving checkpoints` during training.
It accepts two optional parameters during initialization:

**`checkpoint_dir`**: The directory where checkpoints will be saved (default is './checkpoints').

**`save_interval`**: The interval (in epochs) at which checkpoints should be saved (default is 5).

The `on_epoch_end` method is called at the end of each training epoch. It checks whether the number of completed epochs (`epoch + 1`) is a multiple of `save_interval` and, if so, calls the `save_checkpoint` method to save a checkpoint.




```python
import os

import torch

# Create a custom callback class to save checkpoints
class CheckpointCallback:
    def __init__(self, checkpoint_dir='./checkpoints', save_interval=5):
        self.checkpoint_dir = checkpoint_dir
        self.save_interval = save_interval
        self.epoch = 0
        # Make sure the target directory exists before the first save
        os.makedirs(self.checkpoint_dir, exist_ok=True)

    def on_epoch_end(self, epoch, model, optimizer, logs=None):
        self.epoch = epoch
        # Epochs are zero-based, so epoch + 1 is the number of completed epochs
        if (epoch + 1) % self.save_interval == 0:
            self.save_checkpoint(model, optimizer, epoch)

    def save_checkpoint(self, model, optimizer, epoch):
        checkpoint = {
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
        }
        checkpoint_path = os.path.join(self.checkpoint_dir, f'model_checkpoint_epoch{epoch}.pt')
        torch.save(checkpoint, checkpoint_path)
        print(f"Model checkpoint saved at epoch {epoch}: {checkpoint_path}")
```

The `CheckpointCallback` class saves a checkpoint at the end of every `save_interval`-th epoch.

#### Usage

```python
checkpoint_dir = './checkpoints'
save_interval = 5
# Instantiate the custom checkpoint callback with the specified interval
checkpoint_callback = CheckpointCallback(checkpoint_dir, save_interval)

# Training loop
num_epochs = 20  # Total number of epochs
for epoch in range(num_epochs):
    # Perform training steps

    # Call the custom checkpoint callback at the end of each epoch
    checkpoint_callback.on_epoch_end(epoch, model, optimizer)
```
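Saving is only half of the workflow. The sketch below shows one way to restore training from a checkpoint dictionary with the same keys as above (`epoch`, `model_state_dict`, `optimizer_state_dict`). The tiny linear model, the temporary directory, and the file name are assumptions made so that the example runs on its own.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_checkpoint_epoch4.pt")
    torch.save(
        {
            "epoch": 4,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        },
        path,
    )

    # Later (for example after a restart): rebuild the objects, then load their state
    restored_model = nn.Linear(4, 2)
    restored_optimizer = optim.SGD(restored_model.parameters(), lr=0.1)

    checkpoint = torch.load(path)
    restored_model.load_state_dict(checkpoint["model_state_dict"])
    restored_optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    start_epoch = checkpoint["epoch"] + 1  # resume with the next epoch

print(start_epoch)  # 5
print(torch.equal(restored_model.weight, model.weight))  # True
```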

****

## 3.5 - Loss Functions <a id="3-5"></a>

A loss function serves as a tool for assessing the difference between the predicted output and the actual target (ground truth) for a given set of input data. The purpose of a loss function is to provide a measure of how well or poorly a machine learning model is performing on a specific task.

### Cross-Entropy Loss (Log Loss)
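Cross-Entropy Loss, also known as Log Loss, is a common loss function used in `classification` problems, especially when dealing with multiple classes. It measures the dissimilarity between the predicted class distribution and the true class labels. Note that `nn.CrossEntropyLoss` expects raw, unnormalized scores (logits) and applies log-softmax internally, so the model should not apply `Softmax` before it.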

#### Usage
In PyTorch, you can use the Cross-Entropy Loss function as follows:

```python
import torch.nn as nn

# Define the Cross-Entropy Loss function
criterion = nn.CrossEntropyLoss()

# Inside your training loop or forward pass
output = model(input)
loss = criterion(output, target)
```
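A short sketch with made-up numbers shows the expected inputs: a tensor of raw scores (logits) of shape `(batch, num_classes)` and a tensor of integer class indices of shape `(batch,)`. It also checks that the result matches log-softmax followed by the negative log-likelihood loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Raw scores for 2 samples and 3 classes (illustrative values)
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.5, 0.3]])
targets = torch.tensor([0, 1])  # class indices, not one-hot vectors

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)

# Equivalent computation: log-softmax followed by negative log-likelihood
manual_loss = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss.item())
print(torch.allclose(loss, manual_loss))  # True
```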

### Mean Squared Error (MSE) Loss

Mean Squared Error Loss is a common loss function used in `regression` problems. It measures the average squared difference between predicted and actual values.

#### Usage
In PyTorch, you can use the Mean Squared Error Loss function as follows:
```python
import torch.nn as nn

# Define the Mean Squared Error Loss function
criterion = nn.MSELoss()

# Inside your training loop or forward pass
output = model(input)
loss = criterion(output, target)
```
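The value is easy to verify by hand on a tiny example. With the made-up predictions and targets below, the squared differences are 0.25, 0.25, and 0, so the mean is 0.5 / 3.

```python
import torch
import torch.nn as nn

predictions = torch.tensor([2.5, 0.0, 2.0])
targets = torch.tensor([3.0, -0.5, 2.0])

criterion = nn.MSELoss()  # the default reduction averages over all elements
loss = criterion(predictions, targets)

# Same computation written out explicitly
manual_loss = ((predictions - targets) ** 2).mean()

print(loss.item())  # approximately 0.1667
print(torch.allclose(loss, manual_loss))  # True
```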

### Binary Cross-Entropy Loss

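Binary Cross-Entropy Loss is used in `binary classification` problems to measure the dissimilarity between predicted binary class probabilities and true binary labels. `nn.BCELoss` expects probabilities in the range `[0, 1]`, so the model output should pass through a `Sigmoid` first; `nn.BCEWithLogitsLoss` combines both steps and accepts raw logits.

#### Usage
In PyTorch, you can use the Binary Cross-Entropy Loss function as follows: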
```python
import torch.nn as nn

# Define the Binary Cross-Entropy Loss function
criterion = nn.BCELoss()

# Inside your training loop or forward pass
output = model(input)
loss = criterion(output, target)
```
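The relationship between the two binary cross-entropy variants in PyTorch can be sketched as follows. The logits and labels are made up; note that the targets are floats, not integer class indices.

```python
import torch
import torch.nn as nn

logits = torch.tensor([0.8, -1.2, 2.0])   # raw model outputs (illustrative)
targets = torch.tensor([1.0, 0.0, 1.0])   # binary labels as floats

# nn.BCELoss expects probabilities, so apply a sigmoid first
probabilities = torch.sigmoid(logits)
loss_from_probs = nn.BCELoss()(probabilities, targets)

# nn.BCEWithLogitsLoss applies the sigmoid internally
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)

print(loss_from_probs.item(), loss_from_logits.item())  # the two values agree
```

Working directly on logits is usually the numerically safer choice, because the sigmoid and the logarithm are combined in one step.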

****

## 3.6 - Optimizers <a id="3-6"></a>

Optimizers are algorithms or techniques used to adjust the parameters of a model in order to minimize the loss function. The main goal of an optimizer is to find the set of model parameters that results in the lowest possible value of the loss function.


### How Optimizers Work
* **Forward Pass**: During the forward pass, the model computes predictions given the input data.

* **Loss Calculation**: The loss function quantifies the difference between the model's predictions and the ground truth.

* **Backpropagation**: PyTorch automatically computes gradients of the loss with respect to model parameters using the backward pass.

* **Optimizer Step**: The optimizer updates model parameters using the computed gradients, effectively adjusting them to minimize the loss.

### Training Loop Example

* In each training iteration, the model performs a forward pass to make predictions.
* The loss is computed based on the predictions and ground truth.
* The optimizer's `zero_grad()` method is used to clear previous gradients.
* Backpropagation computes gradients of the loss with respect to model parameters.
* The optimizer's `step()` method updates the model parameters using the computed gradients.

#### Example 

```python
import torch.optim as optim

# Define optimizer
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    for inputs, targets in dataloader:
        # Forward Pass
        outputs = model(inputs)

        # Calculate Loss
        loss = loss_function(outputs, targets)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()

        # Optimizer Step
        optimizer.step()
```
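For a version of this loop that runs end to end, the sketch below fits a single linear layer to synthetic data. The data (a noisy line `y = 3x + 1`), the model, and the hyperparameters are all assumptions chosen to keep the example small.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic data for y = 3x + 1 plus a little noise (made up for this sketch)
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 3 * x + 1 + 0.05 * torch.randn_like(x)
dataloader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

model = nn.Linear(1, 1)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.05)

epoch_losses = []
for epoch in range(50):
    running_loss = 0.0
    for inputs, targets in dataloader:
        outputs = model(inputs)                 # forward pass
        loss = loss_function(outputs, targets)  # calculate loss

        optimizer.zero_grad()  # clear gradients from the previous step
        loss.backward()        # backpropagation
        optimizer.step()       # update parameters

        running_loss += loss.item()
    epoch_losses.append(running_loss / len(dataloader))

print(epoch_losses[0], epoch_losses[-1])  # the average loss should decrease
```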

**Here are some of the most common optimizers:**

### Stochastic Gradient Descent (SGD)
SGD is the most basic optimizer. It updates model parameters by computing the gradients of the loss with respect to the parameters and moving in the direction that reduces the loss.

#### Usage
In PyTorch, you can use SGD as follows:

```python
import torch.optim as optim

optimizer = optim.SGD(model.parameters(), lr=learning_rate)
```
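The update rule behind plain SGD is `w <- w - lr * grad`. A one-parameter sketch with made-up values makes this easy to verify by hand:

```python
import torch
import torch.optim as optim

w = torch.nn.Parameter(torch.tensor(1.0))
optimizer = optim.SGD([w], lr=0.1)

loss = w ** 2  # d(loss)/dw = 2 * w = 2.0 at w = 1.0

optimizer.zero_grad()
loss.backward()
optimizer.step()  # w <- 1.0 - 0.1 * 2.0

print(w.item())  # approximately 0.8
```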

### RMSprop
RMSprop adjusts the learning rates for each parameter based on a moving average of the squared gradients. It helps stabilize training.

#### Usage
In PyTorch, you can use RMSprop as follows:

```python
import torch.optim as optim

optimizer = optim.RMSprop(model.parameters(), lr=learning_rate)
```

### Adam
Adam combines the benefits of both the AdaGrad and RMSprop optimizers. It adapts the learning rates for each parameter individually based on the history of gradients.

#### Usage
In PyTorch, you can use Adam as follows:

```python
import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=learning_rate)

```
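One visible consequence of the per-parameter adaptation is that the size of Adam's first update is roughly the learning rate, almost independent of the scale of the gradient. The sketch below demonstrates this with a hypothetical helper, `first_step_size`, and default Adam settings.

```python
import torch
import torch.optim as optim


def first_step_size(grad_scale, lr=0.001):
    """Return how far a single parameter moves in Adam's first step."""
    w = torch.nn.Parameter(torch.tensor(1.0))
    optimizer = optim.Adam([w], lr=lr)

    loss = grad_scale * w  # the gradient with respect to w equals grad_scale
    loss.backward()
    optimizer.step()
    return 1.0 - w.item()


# Gradients that differ by a factor of 10,000 lead to nearly the same step
print(first_step_size(100.0))  # approximately 0.001
print(first_step_size(0.01))   # approximately 0.001
```

With plain SGD the two steps would differ by the same factor of 10,000 as the gradients.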

### Creating a Custom Optimizer



**Custom optimizers are implemented as Python classes that inherit from the base `torch.optim.Optimizer` class.**

**Here are the general steps to create a custom optimizer in PyTorch:**

* Define Your Custom Optimizer Class:
Create a Python class that inherits from `Optimizer`. This class will contain the optimization logic.

```python
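from torch.optim import Optimizer

class MyCustomOptimizer(Optimizer):
    def __init__(self, params, lr=0.01, **kwargs):
        # Hyperparameters passed as defaults are stored in every parameter group
        defaults = dict(lr=lr, **kwargs)
        super(MyCustomOptimizer, self).__init__(params, defaults)

    def step(self, closure=None):
        # Implement the optimization step here
        pass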
```

`Constructor (__init__ Method):`
Inside your custom optimizer class, define the constructor (`__init__`) method so that it:

* accepts `params`: an iterable of parameters to optimize or dictionaries defining parameter groups;
* accepts other hyperparameters (e.g., learning rate and any custom hyperparameters) as keyword arguments;
* calls the superclass constructor to set up the optimizer's initial state.



`step Method`:
Implement the `step` method, which is responsible for performing a single optimization step. Inside this method, you should update the model's parameters based on the gradients computed during the backward pass. You can access the parameter values, their gradients (`p.grad`), the hyperparameters of each parameter group (`self.param_groups`), and per-parameter state (`self.state`) within this method.


`Optional Methods`:
Depending on your custom optimizer's requirements, you might also need to implement other methods like `__setstate__` for handling saved state when loading models. These methods are optional and depend on the specific needs of your custom optimizer.
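Putting these steps together, here is a minimal sketch of a complete custom optimizer. `PlainSGD` is a hypothetical name, and the update rule is plain gradient descent (`p <- p - lr * grad`), chosen only because its result is easy to check by hand.

```python
import torch
from torch.optim import Optimizer


class PlainSGD(Optimizer):
    """Hypothetical minimal optimizer implementing p <- p - lr * grad."""

    def __init__(self, params, lr=0.01):
        if lr <= 0.0:
            raise ValueError(f"Invalid learning rate: {lr}")
        defaults = dict(lr=lr)
        super().__init__(params, defaults)

    @torch.no_grad()  # parameter updates must not be tracked by autograd
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():  # the closure recomputes the loss
                loss = closure()

        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                p.add_(p.grad, alpha=-group["lr"])
        return loss


w = torch.nn.Parameter(torch.tensor([1.0, -2.0]))
optimizer = PlainSGD([w], lr=0.1)

loss = (w ** 2).sum()  # the gradient is 2 * w = [2.0, -4.0]
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(w.data)  # tensor([ 0.8000, -1.6000])
```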


### Example 
Note that this example uses the code from this [GitHub](https://github.com/rahulkidambi/AccSGD/blob/master/AccSGD.py) Repo and implements this [Paper](https://arxiv.org/pdf/1704.08227.pdf)


```python
import copy

from torch.optim.optimizer import Optimizer, required


class AccSGD(Optimizer):
    r"""Implements the algorithm proposed in https://arxiv.org/pdf/1704.08227.pdf, which is a provably accelerated method 
    for stochastic optimization. This has been employed in https://openreview.net/forum?id=rJTutzbA- for training several 
    deep learning models of practical interest. This code has been implemented by building on the construction of the SGD 
    optimization module found in pytorch codebase.
    Args:
        params (iterable): iterable of parameters to optimize or dicts defining
            parameter groups
        lr (float): learning rate (required)
        kappa (float, optional): ratio of long to short step (default: 1000)
        xi (float, optional): statistical advantage parameter (default: 10)
        smallConst (float, optional): any value <=1 (default: 0.7)
    Example:
        >>> from AccSGD import *
        >>> optimizer = AccSGD(model.parameters(), lr=0.1, kappa = 1000.0, xi = 10.0)
        >>> optimizer.zero_grad()
        >>> loss_fn(model(input), target).backward()
        >>> optimizer.step()
    """

    def __init__(self, params, lr=required, kappa = 1000.0, xi = 10.0, smallConst = 0.7, weight_decay=0):
        defaults = dict(lr=lr, kappa=kappa, xi=xi, smallConst=smallConst,
                        weight_decay=weight_decay)
        super(AccSGD, self).__init__(params, defaults)

    def __setstate__(self, state):
        super(AccSGD, self).__setstate__(state)

    def step(self, closure=None):
        """ Performs a single optimization step.
        Arguments:
            closure (callable, optional): A closure that reevaluates the model
                and returns the loss.
        """
        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:
            weight_decay = group['weight_decay']
            large_lr = (group['lr']*group['kappa'])/(group['smallConst'])
            Alpha = 1.0 - ((group['smallConst']*group['smallConst']*group['xi'])/group['kappa'])
            Beta = 1.0 - Alpha
            zeta = group['smallConst']/(group['smallConst']+Beta)
            for p in group['params']:
                if p.grad is None:
                    continue
                d_p = p.grad.data
                if weight_decay != 0:
                    # The `alpha` keyword replaces the deprecated add_(scalar, tensor) form
                    d_p.add_(p.data, alpha=weight_decay)
                param_state = self.state[p]
                if 'momentum_buffer' not in param_state:
                    param_state['momentum_buffer'] = copy.deepcopy(p.data)
                buf = param_state['momentum_buffer']
                buf.mul_((1.0/Beta)-1.0)
                buf.add_(d_p, alpha=-large_lr)
                buf.add_(p.data)
                buf.mul_(Beta)

                p.data.add_(d_p, alpha=-group['lr'])
                p.data.mul_(zeta)
                p.data.add_(buf, alpha=1.0-zeta)

        return loss
 ```
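
The same structure (a constructor that passes `defaults` to the base class, plus a `step` method that walks `self.param_groups`) can be seen in a much smaller optimizer. The sketch below implements plain gradient descent; the class name `PlainSGD` and the toy objective are illustrative assumptions and are not part of the AccSGD repository.

```python
import torch
from torch.optim import Optimizer


class PlainSGD(Optimizer):
    """Hypothetical minimal optimizer: p <- p - lr * grad."""

    def __init__(self, params, lr=0.1):
        defaults = dict(lr=lr)  # per-group hyperparameters
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            # The closure needs gradients enabled to re-evaluate the model
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                p.add_(p.grad, alpha=-group['lr'])
        return loss


# Toy objective: minimize (w - 3)^2, whose minimum is at w = 3.
w = torch.nn.Parameter(torch.tensor([0.0]))
optimizer = PlainSGD([w], lr=0.1)
for _ in range(100):
    optimizer.zero_grad()
    loss = ((w - 3.0) ** 2).sum()
    loss.backward()
    optimizer.step()

print(w.item())  # close to 3.0
```

Any per-parameter state, such as the momentum buffer in `AccSGD`, would be stored in `self.state[p]` in the same loop.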

****

<div style="text-align: right;">
  <a href="#top" style="text-decoration: none; color: black;">
    <span style="color: #EE4C2C; font-size: 18px; font-weight: bold;">&uarr;</span> Return to Table of Contents
  </a>
</div>

<a id="4"></a>
<h1 style='background:#EE4C2C;border:0; color:#313131; /* Updated text color */
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #313131;'>Datasets API</center></h1>

# Datasets API

PyTorch's Datasets API (`torch.utils.data`, together with the ready-made datasets in `torchvision.datasets`) is the standard tool for managing and preprocessing data in machine learning and deep learning projects. It simplifies loading and manipulating data, making it easier for researchers and practitioners to work with a variety of datasets.

## Why Use the Datasets API?

The management of data is a crucial aspect of any machine learning project. Here are some reasons why you should consider using PyTorch's Datasets API:

1. **Efficiency:** The Datasets API provides efficient data loading and batching capabilities, which can significantly speed up model training.

2. **Flexibility:** It allows you to work with both built-in datasets and custom datasets, giving you the freedom to handle various types of data.

3. **Data Augmentation:** You can easily apply data augmentation techniques to improve model generalization.

4. **Customization:** When working with custom datasets, you have full control over data preprocessing and transformation.

5. **Integration:** It integrates seamlessly with other PyTorch components like the DataLoader and neural network modules.



## 4.1 Loading Datasets <a id="4-1"></a>




### Built-in Datasets
All built-in datasets are subclasses of `torch.utils.data.Dataset`, i.e., they implement the `__getitem__` and `__len__` methods. Hence, they can all be passed to a `torch.utils.data.DataLoader`. For example:

```python
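import torch
import torchvision

# Assumes the ImageNet archives are already present in this directory
imagenet_data = torchvision.datasets.ImageNet('path/to/imagenet_root/')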
data_loader = torch.utils.data.DataLoader(imagenet_data,
                                          batch_size=4,
                                          shuffle=True)
```

### Loading Dataset from Folder using ImageFolder


`ImageFolder` is a class from the `torchvision.datasets` module that is commonly used for image datasets stored on disk. It loads images from a directory tree and infers each image's label from the name of the subdirectory that contains it.

**Here's how ImageFolder works:**

Dataset Organization:
The ImageFolder class assumes that your dataset is organized in a specific way on your file system. Each category or class of images should be stored in a separate subdirectory, and the images belonging to that category should be contained within that directory. For example:


```text
dataset/
├── class1/
│   ├── image1.jpg
│   ├── image2.jpg
│   ├── ...
├── class2/
│   ├── image1.jpg
│   ├── image2.jpg
│   ├── ...
├── ...
```

**Creating an ImageFolder Dataset:**

```python
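from torchvision import datasets, transforms

# Define the root directory where your dataset is stored
data_dir = 'path/to/dataset/'

# Placeholder transform; substitute your own preprocessing pipeline
my_transform = transforms.ToTensor()

# Create an ImageFolder dataset; labels are inferred from the subdirectory names
dataset = datasets.ImageFolder(data_dir, transform=my_transform)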
```

### Data Loader

`DataLoader` is a utility class that wraps a dataset in an iterable, so that batches can be consumed directly in a training loop. It is especially useful when the dataset is too large to process in one piece, and it is used across tasks such as image classification, object detection, and natural language processing. Here's how a `DataLoader` works and why it's useful:


* **Dataset**: The foundation of a `DataLoader` is a PyTorch dataset. A dataset is an abstraction that represents your data and provides methods to access and manipulate it. PyTorch provides several built-in datasets, and you can also create custom datasets by subclassing the torch.utils.data.Dataset class. Datasets are usually responsible for loading and preprocessing the data samples.

* **Batching**: Training deep learning models is often done in batches of data samples rather than individual samples. Batching allows for efficient vectorized operations on a GPU, which speeds up training. The DataLoader takes care of dividing your dataset into batches of a specified size.

* **Shuffling**: It's common practice to shuffle the data before each epoch (a complete pass through the entire dataset). Shuffling helps in reducing any ordering bias that may exist in the data. The DataLoader can shuffle your dataset automatically if you set the shuffle parameter to True.

* **Parallel Data Loading**: On a multi-core CPU, you can take advantage of parallel data loading by setting the `num_workers` parameter. The DataLoader then uses several worker processes to prepare batches in the background while the model trains, which can significantly speed up the training process.

* **Iterability**: DataLoader provides an iterable object that you can use in a for loop during training. It automatically takes care of loading and batching the data, making it easy to integrate into your training loop.


**Example**


```python
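from torch.utils.data import DataLoader

# Create a DataLoader with a custom batch size and shuffling.
# CustomCIFAR10 and custom_transform are assumed to be defined elsewhere
# (a Dataset subclass and a transform pipeline, respectively).
batch_size = 64
shuffle = True

custom_dataset = CustomCIFAR10(transform=custom_transform)
data_loader = DataLoader(custom_dataset, batch_size=batch_size, shuffle=shuffle)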
```
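
A self-contained variant of the same idea, using a small in-memory `TensorDataset` in place of a real dataset (the ten toy samples are an assumption for illustration), shows how the number and size of batches follow from `batch_size`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten samples with one feature each, labelled 0..9.
features = torch.arange(10, dtype=torch.float32).unsqueeze(1)
labels = torch.arange(10)
toy_dataset = TensorDataset(features, labels)

# num_workers=0 loads data in the main process; larger values use worker processes.
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=True, num_workers=0)

batch_sizes = [x.shape[0] for x, y in toy_loader]
print(len(toy_loader))  # 3
print(batch_sizes)      # [4, 4, 2]
```

The last batch is smaller because 10 is not a multiple of 4; passing `drop_last=True` would discard it instead.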


****

## 4.2 - Preprocessing Techniques <a id="4-2"></a>



Efficient data preprocessing and loading are essential for training robust machine learning models. In this section, we'll cover various techniques to preprocess your data effectively using PyTorch.


### Data Batching and Shuffling

During model training, it's common to process data in batches rather than individually. Batching improves computational efficiency, and the noise in mini-batch gradient estimates can also act as a mild form of regularization. Additionally, reshuffling the data at the start of each epoch helps prevent the model from picking up on the order of the samples. PyTorch's `DataLoader` makes it easy to handle batching and shuffling.

#### Batching Data

To create data batches using `DataLoader`, you specify the `batch_size` argument when initializing the data loader. Here's an example:

```python
from torch.utils.data import DataLoader

# Create a DataLoader with a batch size
batch_size = 32
data_loader = DataLoader(dataset, batch_size=batch_size)
```


#### Shuffling Data

To enable shuffling using `DataLoader`, you pass `shuffle=True` when initializing the data loader. Here's an example:

```python
from torch.utils.data import DataLoader

# Create a DataLoader that reshuffles the data at every epoch
batch_size = 32
data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
```
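
Shuffling is random by default, which can make runs hard to reproduce. One option is to pass a seeded `torch.Generator` through the `generator` argument of `DataLoader`. The sketch below, built on a toy dataset of eight integers (an assumption for illustration), checks that two loaders with the same seed produce the same order.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

toy_dataset = TensorDataset(torch.arange(8))


def first_epoch_order(seed):
    # A seeded generator fixes the permutation drawn for shuffling.
    generator = torch.Generator().manual_seed(seed)
    loader = DataLoader(toy_dataset, batch_size=4, shuffle=True, generator=generator)
    return [int(v) for (batch,) in loader for v in batch]


order_a = first_epoch_order(0)
order_b = first_epoch_order(0)
print(order_a == order_b)  # True
print(sorted(order_a))     # [0, 1, 2, 3, 4, 5, 6, 7]
```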
    


### PyTorch Transforms 
The companion `torchvision` package provides a set of transformation classes in the `torchvision.transforms` module that can be applied to your datasets. These transformations are particularly useful for preprocessing and augmenting image data.

#### Common Transforms

PyTorch transforms can perform a wide range of operations, including resizing, cropping, rotation, and normalization. Here are some common transforms you can use:

##### Resize

```python
import torchvision.transforms as transforms

# Resize images to a specific size
resize_transform = transforms.Resize((128, 128))

```

##### Random Horizontal Flip

```python
import torchvision.transforms as transforms

# Randomly flip images horizontally 
flip_transform = transforms.RandomHorizontalFlip()


```

##### ToTensor

```python
import torchvision.transforms as transforms

# Convert images to PyTorch tensors
tensor_transform = transforms.ToTensor()

```

##### Normalize

```python
import torchvision.transforms as transforms

# Normalize image values with mean and standard deviation
normalize_transform = transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))

```




### Combining Transforms
You can chain several transforms into a single pipeline using the `transforms.Compose` class, which applies them in the order given:

```python
import torchvision.transforms as transforms

# Define individual transformations
resize_transform = transforms.Resize((256, 256))  # Resize the image to 256x256 pixels
flip_transform = transforms.RandomHorizontalFlip()  # Randomly flip the image horizontally

# Combine multiple transforms into a single transform pipeline
transform = transforms.Compose([resize_transform, flip_transform])

```

### Data Augmentation

Data augmentation is a technique used to increase the diversity of the training dataset by applying random transformations such as rotations, flips, and crops. This helps the model generalize better and become more robust to variations in the data. Here's an example of data augmentation for CIFAR-10:

```python
import torchvision
import torchvision.transforms as transforms

# Define data augmentation and transformation
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),  # Randomly flip images horizontally
    transforms.RandomRotation(10),       # Randomly rotate images by up to 10 degrees
    transforms.RandomCrop(32, padding=4), # Randomly crop images with padding
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Apply transformations (including augmentation) to the dataset
train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, transform=transform, download=True)

```

### PyTorch's nn.utils.rnn for Sequential Data


When dealing with sequential data like text, time series, or audio, it's crucial to maintain the temporal structure of the data. PyTorch's `nn.utils.rnn` module provides tools to efficiently work with sequences, including padding and packing sequences for RNNs.




#### Sequence Padding

In many real-world scenarios, sequences may have varying lengths. This can be problematic when you want to batch sequences for deep learning models. PyTorch's `pad_sequence` function is a valuable tool for handling sequences of different lengths.

##### Example:

In [None]:
import torch
import torch.nn.utils.rnn as rnn_utils
sequences = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6, 7, 8, 9])]
padded_sequences = rnn_utils.pad_sequence(sequences, batch_first=True, padding_value=0)
print(sequences)
print()
print(padded_sequences)
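
Padding makes the batch rectangular, but an RNN would then also process the padding values. Packing, the other tool mentioned above, tells the RNN the true length of each sequence. The following is a minimal sketch, assuming sequences with a single feature per time step and an arbitrary hidden size of 8:

```python
import torch
import torch.nn as nn
import torch.nn.utils.rnn as rnn_utils

# Three sequences of lengths 3, 2, and 4, each step with one feature.
sequences = [torch.randn(3, 1), torch.randn(2, 1), torch.randn(4, 1)]
lengths = torch.tensor([len(s) for s in sequences])

padded = rnn_utils.pad_sequence(sequences, batch_first=True)  # shape (3, 4, 1)
# enforce_sorted=False lets the batch stay in its original order.
packed = rnn_utils.pack_padded_sequence(
    padded, lengths, batch_first=True, enforce_sorted=False)

lstm = nn.LSTM(input_size=1, hidden_size=8, batch_first=True)
packed_output, (h_n, c_n) = lstm(packed)

# Unpack back to a padded tensor plus the original lengths.
output, output_lengths = rnn_utils.pad_packed_sequence(packed_output, batch_first=True)
print(output.shape)    # torch.Size([3, 4, 8])
print(output_lengths)  # tensor([3, 2, 4])
print(h_n.shape)       # torch.Size([1, 3, 8])
```

With packed input, `h_n` holds the hidden state at each sequence's last real time step rather than at a padded position.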

****

## 4.3 - Creating Custom Datasets <a id="4-3"></a>


In some machine learning and deep learning projects, you may need to work with custom datasets that are not available as built-in datasets. PyTorch allows you to create custom datasets by subclassing the `torch.utils.data.Dataset` class. In this section, we'll explore how to create custom datasets tailored to your specific needs.

### Dataset Class

To create a custom dataset, you'll need to define a Python class that inherits from `torch.utils.data.Dataset`. This class should implement the following methods:

- `__init__(self)`: Initialize the dataset and load data if necessary.
- `__len__(self)`: Return the total number of samples in the dataset.
- `__getitem__(self, idx)`: Return a data sample and its corresponding label for the given index `idx`.

The following example creates a custom dataset for classifying images of cats and dogs:

```python
import torch
from torch.utils.data import Dataset
from torchvision import transforms
from PIL import Image
import os

class CustomCatDogDataset(Dataset):
    def __init__(self, data_dir, transform=None):
        self.data_dir = data_dir
        self.transform = transform
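        # List the directory once so that image paths and labels stay aligned
        filenames = sorted(os.listdir(data_dir))
        self.image_paths = [os.path.join(data_dir, filename) for filename in filenames]
        # Label 0 for files with 'cat' in the name, 1 otherwise (assumed to be dogs)
        self.labels = [0 if 'cat' in filename else 1 for filename in filenames]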

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image_path = self.image_paths[idx]
        label = self.labels[idx]
        image = Image.open(image_path).convert('RGB')  # ensure three channels

        if self.transform:
            image = self.transform(image)

        return image, label

# Usage example:
data_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])

custom_dataset = CustomCatDogDataset(data_dir='path/to/custom_dataset', transform=data_transform)
```
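
The three methods can also be exercised without any image files. The sketch below uses a hypothetical in-memory dataset, `SquaresDataset`, whose samples are pairs of a number and its square; it exists only to show how `__len__` and `__getitem__` are used by indexing and by a `DataLoader`.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is (i, i**2)."""

    def __init__(self, n):
        self.inputs = torch.arange(n, dtype=torch.float32)

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        x = self.inputs[idx]
        return x, x ** 2


squares = SquaresDataset(5)
print(len(squares))  # 5
print(squares[3])    # (tensor(3.), tensor(9.))

# The DataLoader calls __getitem__ for each index and stacks the results.
loader = DataLoader(squares, batch_size=2)
for x, y in loader:
    print(x, y)
```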

<div style="text-align: right;">
  <a href="#top" style="text-decoration: none; color: black;">
    <span style="color: #EE4C2C; font-size: 18px; font-weight: bold;">&uarr;</span> Return to Table of Contents
  </a>
</div>

<a id="5"></a>
<h1 style='background:#EE4C2C;border:0; color:#313131; /* Updated text color */
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #313131;'>Training Deep Learning Models</center></h1>

# Training Deep Learning Models

## Helper Functions <a id="5-1"></a>


 

In [None]:
!pip -q install torchsummary

In [None]:
# Natural Language Processing (NLP) related imports
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# Data manipulation and analysis
import numpy as np
import pandas as pd

# Regular expressions
import re

# Machine learning and deep learning frameworks
import torch
import torchvision.transforms as transforms
from torchsummary import summary
from torch import nn, optim
from torch.utils.data import DataLoader, Dataset

# Data preprocessing for machine learning
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer


### You can find more information about using GPUs in this [section](#6).

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [None]:
def load_data(X_train, y_train, X_val=None, y_val=None, dataset_class=None, batch_size=64):
    """
    Load and preprocess data into PyTorch data loaders for training and validation.

    Args:
        X_train (numpy.ndarray): Training data features.
        y_train (numpy.ndarray): Training data labels.
        X_val (numpy.ndarray, optional): Validation data features.
        y_val (numpy.ndarray, optional): Validation data labels.
        dataset_class (class): Dataset class used to build the datasets. Required, despite the None default.
        batch_size (int, optional): Batch size for data loaders. Default is 64.

    Returns:
        train_loader (DataLoader): Data loader for training data.
        val_loader (DataLoader or None): Data loader for validation data, or None if validation data is not provided.
    """
    if dataset_class is None:
        raise ValueError("dataset_class must be provided to build the datasets")

    train_data = dataset_class(data=X_train, labels=y_train)
    train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)

    val_loader = None

    if X_val is not None and y_val is not None and dataset_class is not None:
        val_data = dataset_class(data=X_val, labels=y_val)
        val_loader = DataLoader(val_data, batch_size=batch_size, shuffle=False)

    return train_loader, val_loader

In [None]:
def validate(model, val_loader, criterion, device='cpu'):
    """
    Validate a trained deep learning model on a validation dataset.

    Args:
        model (torch.nn.Module): The trained neural network model to validate.
        val_loader (torch.utils.data.DataLoader): DataLoader for the validation dataset.
        criterion (torch.nn.Module): The loss function used for validation.
        device (str or torch.device, optional): Device on which to run validation. Default is 'cpu'.

    Returns:
        tuple: A tuple containing the validation loss and validation accuracy (in percentage).
    """
    model.eval()
    total_val_loss = 0.0
    correct_val_predictions = 0
    total_val_samples = 0

    with torch.no_grad():
        for data, labels in val_loader:
            val_inputs = data.to(device)
            val_labels = labels.to(device)

            val_outputs = model(val_inputs)
            val_loss = criterion(val_outputs, val_labels)

            total_val_loss += val_loss.item()

            # Calculate validation accuracy
            _, predicted_val = torch.max(val_outputs, 1)
            correct_val_predictions += (predicted_val == val_labels).sum().item()
            total_val_samples += val_labels.size(0)

    validation_accuracy = correct_val_predictions / total_val_samples
    return total_val_loss / len(val_loader), validation_accuracy * 100

In [None]:
def train(model, train_loader, optimizer, criterion,
          scheduler=None, val_loader=None, epochs=10, device='cpu'):
    """
    Train a deep learning model.

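    Args:
        model (torch.nn.Module): The neural network model to train.
        train_loader (torch.utils.data.DataLoader): DataLoader for the training dataset.
        optimizer (torch.optim.Optimizer): The optimizer used for training.
        criterion (torch.nn.Module): The loss function used for training.
        scheduler (optional): Learning-rate scheduler. Its step() is called after every batch, so its schedule is counted in batches rather than epochs.
        val_loader (torch.utils.data.DataLoader, optional): DataLoader for the validation dataset.
        epochs (int, optional): The number of training epochs. Default is 10.
        device (str or torch.device, optional): Device on which to train. Default is 'cpu'.

    Returns:
        None
    """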
    for epoch in range(epochs):
        # Training loop
        model.train()
        total_loss = 0.0
        correct_train_predictions = 0
        total_train_samples = 0

        for data, labels in train_loader:
            inputs = data.to(device)
            labels = labels.to(device)

            optimizer.zero_grad()

            # Forward pass
            outputs = model(inputs)
        
            # Compute the loss
            loss = criterion(outputs, labels)

            # Backpropagation and optimization
            loss.backward()
            optimizer.step()
            if scheduler:
                scheduler.step()

            total_loss += loss.item()

            # Calculate training accuracy
            _, predicted_train = torch.max(outputs, 1)
            correct_train_predictions += (predicted_train == labels).sum().item()
            total_train_samples += labels.size(0)

        training_accuracy = correct_train_predictions / total_train_samples

        print(f'Epoch [{epoch + 1}/{epochs}], Training Loss: {total_loss / len(train_loader):.3f}, Training Accuracy: {training_accuracy * 100:.3f}%')
        if val_loader is not None:
            # Evaluate on the validation set after each epoch
            val_loss, val_accuracy = validate(model, val_loader, criterion,device=device)
            print(f'Epoch [{epoch + 1}/{epochs}], Validation Loss: {val_loss:.3f}, Validation Accuracy: {val_accuracy:.3f}%\n')

    print('Training finished')
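
Because `train` calls `scheduler.step()` after every batch, a scheduler such as `StepLR` counts its `step_size` in batches. The sketch below, which uses a dummy parameter and arbitrary hyperparameters as assumptions, shows how the learning rate evolves over four such calls.

```python
import torch
from torch import optim

param = torch.nn.Parameter(torch.zeros(1))
optimizer = optim.SGD([param], lr=0.1)
# StepLR multiplies the learning rate by gamma every `step_size` calls to step().
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)

learning_rates = []
for _ in range(4):
    optimizer.step()   # the optimizer steps first, then the scheduler
    scheduler.step()
    learning_rates.append(scheduler.get_last_lr()[0])

print(learning_rates)  # approximately [0.1, 0.01, 0.01, 0.001]
```

If the decay is meant to happen once per epoch instead, `step_size` can be expressed in batches, for example as a multiple of `len(train_loader)`.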

## Training Linear Regression Model <a id="5-2"></a>

### Make sure to run these [helper functions](#5-1) before running the following code

In [None]:
# Generate some sample data
np.random.seed(42)
X = np.random.rand(100, 1)  # Input features (100 samples)
y = 2 * X + 1 + 0.1 * np.random.randn(100, 1)  # Corresponding labels with some noise

# Convert data to PyTorch tensors
X_tensor = torch.tensor(X, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.float32)

In [None]:
# Define the linear regression model
class LinearRegression(nn.Module):
    def __init__(self):
        super(LinearRegression, self).__init__()
        self.linear = nn.Linear(1, 1)  # One input feature, one output

    def forward(self, x):
        return self.linear(x)

# Instantiate the model
model = LinearRegression()

# Define the loss function (Mean Squared Error)
criterion = nn.MSELoss()

# Define the optimizer (Stochastic Gradient Descent)
optimizer = optim.SGD(model.parameters(), lr=0.01)

In [None]:
# Training loop
num_epochs = 5000
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(X_tensor)
    
    # Compute the loss
    loss = criterion(outputs, y_tensor)
    
    # Backpropagation and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    if (epoch + 1) % 500 == 0:
        print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')
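
As a sanity check on the trained model, the same data can be fitted in closed form. The sketch below regenerates the sample data and solves the least-squares problem with `torch.linalg.lstsq`; the trained `nn.Linear` weight and bias should end up close to these values, which are in turn near the true slope 2 and intercept 1.

```python
import numpy as np
import torch

np.random.seed(42)
X = np.random.rand(100, 1)
y = 2 * X + 1 + 0.1 * np.random.randn(100, 1)

X_tensor = torch.tensor(X, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.float32)

# Append a column of ones so the intercept is fitted along with the slope.
A = torch.cat([X_tensor, torch.ones_like(X_tensor)], dim=1)
solution = torch.linalg.lstsq(A, y_tensor).solution

slope, intercept = solution[0].item(), solution[1].item()
print(f'slope={slope:.3f}, intercept={intercept:.3f}')  # near 2 and 1
```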

## Training a Neural Network on MNIST <a id="5-3"></a>

### Make sure to run these [helper functions](#5-1) before running the following code

In [None]:
df = pd.read_csv('/kaggle/input/digit-recognizer/train.csv') # read training dataframe

In [None]:
X = df.iloc[:,1:].values  # set training data
y = df.iloc[:,0].values # set training labels
X = X / 255.0

In [None]:
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
class CustomMNISTDataset(Dataset):
    def __init__(self, data, labels, transform=None):
        self.data = data.astype(np.float32)
        self.labels = labels.astype(np.int64)
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        data = self.data[idx]
        label = self.labels[idx]

        if self.transform:
            data = self.transform(data)

        return data, label

In [None]:
train_loader, val_loader =  load_data(X_train, y_train, X_val, y_val, CustomMNISTDataset, batch_size=64)

In [None]:
class SimpleNN(nn.Module):
    def __init__(self, input_size, num_classes):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_size, 128)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

In [None]:
input_size = X_train.shape[1]
num_classes = 10
num_epochs = 10

model = SimpleNN(input_size, num_classes).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

In [None]:
# Per-sample input shape (without the batch dimension), as torchsummary expects
shape = (input_size,)
# summary(model, input_size=shape)

In [None]:
train(model, train_loader, optimizer, criterion,
          val_loader=val_loader, epochs=num_epochs, device=device)

## Training a Convolutional Neural Network <a id="5-4"></a>

### Make sure to run these [helper functions](#5-1) before running the following code

In [None]:
X_train = X_train.reshape(-1, 1,28, 28)
X_val = X_val.reshape(-1, 1, 28, 28)

In [None]:
cnn_train_loader, cnn_val_loader = load_data(X_train, y_train, X_val, y_val, CustomMNISTDataset, batch_size=64)

In [None]:
def calculate_max_pool_output_shape(input_height, input_width, pool_size=2):
    """
    Calculate the output shape after applying max pooling to an input image.

    Parameters:
        input_height (int): The height of the input image.
        input_width (int): The width of the input image.
        pool_size (int, optional): The size of the pooling window. Defaults to 2.

    Returns:
        tuple: A tuple containing the output height and width after max pooling.
    """
    output_height = input_height // pool_size
    output_width = input_width // pool_size
    return output_height, output_width


In [None]:
def find_conv2d_output_shape(height, width, conv):
    """
    Calculate the output shape of a 2D convolutional layer.

    Parameters:
        height (int): The height of the input feature map.
        width (int): The width of the input feature map.
        conv (nn.Conv2d): The convolutional layer for which to calculate the output shape.

    Returns:
        tuple: A tuple containing the output height and width after applying the convolutional layer.
    """
    # Get convolutional layer arguments
    kernel_size = conv.kernel_size
    stride = conv.stride
    padding = conv.padding
    dilation = conv.dilation

    # Calculate output height and width
    height = np.floor((height + 2 * padding[0] - dilation[0] * (kernel_size[0] - 1) - 1) / stride[0] + 1)
    width = np.floor((width + 2 * padding[1] - dilation[1] * (kernel_size[1] - 1) - 1) / stride[1] + 1)

    return int(height), int(width)


In [None]:
class ConvModel(nn.Module):
    def __init__(self, config):
        
        super(ConvModel, self).__init__()
        c,h,w = config["input_shape"]
        classes = config["classes"]
        
        # Define convolutional layers
        self.conv1 = nn.Conv2d(c, 32, kernel_size=3)
        self.relu1 = nn.ReLU()
        h,w = find_conv2d_output_shape(h, w, self.conv1)
 
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3)
        self.relu2 = nn.ReLU()
        h,w = find_conv2d_output_shape(h, w, self.conv2)

        # Max-pooling layer
        self.maxpool = nn.MaxPool2d(kernel_size=2)
        h,w = calculate_max_pool_output_shape(h, w, 2)
        # Flatten the output
        self.flatten = nn.Flatten()

        self.linear1 = nn.Linear(h*w*32, classes)  # Update the input size here

    def forward(self, x):
        # Define the forward pass
        x = self.conv1(x)
        x = self.relu1(x)
        
        x = self.conv2(x)
        x = self.relu2(x)
        
        x = self.maxpool(x)
        
        x = self.flatten(x)
        
        x = self.linear1(x)
        
        return x
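
For a 28x28 input, the shape helpers give the following sizes: each 3x3 convolution without padding removes 2 pixels per dimension (28 to 26, then 26 to 24), and the 2x2 max pooling halves the result to 12x12. With 32 channels, the linear layer therefore receives 32 * 12 * 12 = 4608 features. The sketch below confirms this by passing a dummy tensor through the same layer stack.

```python
import torch
from torch import nn

layers = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
)

dummy = torch.zeros(1, 1, 28, 28)  # one grayscale 28x28 image
features = layers(dummy)
print(features.shape)                # torch.Size([1, 32, 12, 12])
print(features.flatten(1).shape[1])  # 4608
```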

In [None]:
config = {
    "input_shape": (1, 28, 28),
    "classes": 10 
}

In [None]:
cnn_model = ConvModel(config).to(device)
optimizer = optim.Adam(cnn_model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

In [None]:
summary(cnn_model, input_size=(1,28,28))

In [None]:
train(cnn_model, cnn_train_loader, optimizer, criterion, 
          val_loader=cnn_val_loader, epochs=10, device=device)

## Training an LSTM <a id="5-5"></a>


### Make sure to run these [helper functions](#5-1) before running the following code


In [None]:
df = pd.read_csv("/kaggle/input/ag-news-classification-dataset/train.csv")

In [None]:
# Load stopwords and initialize the stemmer
stop_words = set(stopwords.words('english'))
stemmer = PorterStemmer()

# Function to preprocess a single text
def preprocess_single_text(text):
    # Lowercase the text
    text = text.lower()
    
    # Remove non-alphanumeric characters
    text = re.sub(r'[^\w\s]', '', text)
    
    # Tokenize the text
    tokens = word_tokenize(text)
    
    # Remove stopwords and apply stemming
    tokens = [stemmer.stem(token) for token in tokens if token not in stop_words]
    
    # Join the tokens back into a string
    preprocessed_text = ' '.join(tokens)
    return preprocessed_text

# Apply preprocessing to each row of 'Title' and 'Description' columns
df['preprocessed_text'] = df['Title'].str.lower() + ' ' + df['Description'].str.lower()
df['preprocessed_text'] = df['preprocessed_text'].apply(preprocess_single_text)

In [None]:
# Split the dataset into training and validation sets
X = df['preprocessed_text']
y = df['Class Index']
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
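# Simple whitespace tokenization, useful for inspecting the preprocessed text.
# These token lists are not used later; the Keras Tokenizer in the next cell
# builds the integer sequences that the model is trained on.
whitespace_tokenizer = lambda x: x.split()
X_train_tokens = [whitespace_tokenizer(text) for text in X_train]
X_val_tokens = [whitespace_tokenizer(text) for text in X_val]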

In [None]:
# Tokenize the text data and convert it into sequences
tokenizer = Tokenizer()
tokenizer.fit_on_texts(X_train)
X_train_sequences = tokenizer.texts_to_sequences(X_train)
X_val_sequences = tokenizer.texts_to_sequences(X_val)

In [None]:
# Pad the sequences to ensure they have the same length
max_sequence_length = 100
X_train_padded = pad_sequences(X_train_sequences, maxlen=max_sequence_length)
X_val_padded = pad_sequences(X_val_sequences, maxlen=max_sequence_length)

In [None]:
y_train = y_train.values
y_val = y_val.values

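# Class indices in the CSV run from 1 to 4; remap class 4 to 0 so that
# all labels fall in [0, 3], as nn.CrossEntropyLoss expects
y_train[y_train == 4] = 0
y_val[y_val == 4] = 0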

In [None]:
class AGNewsDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data.astype(np.int64)
        self.labels = labels.astype(np.int64)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        data = self.data[idx]
        label = self.labels[idx]


        return data, label

In [None]:
train_data = AGNewsDataset(data=X_train_padded, labels=y_train)
val_data = AGNewsDataset(data=X_val_padded, labels=y_val)

# Create data loaders for training and validation
train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
val_loader = DataLoader(val_data, batch_size=64, shuffle=False)

In [None]:
class LSTMModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim, max_sequence_length, num_classes):
        super().__init__()
        # padding_idx is not set, so the padding token (index 0) gets a trainable embedding
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, 128, batch_first=True)
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.embedding(x)
        x, _ = self.lstm(x)
        x = x[:, -1, :]
        x = self.fc(x)
        return x
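
The shapes flowing through `forward` can be traced with random token ids. Note that `pad_sequences` pads at the start of each sequence by default, so the last time step selected by `x[:, -1, :]` corresponds to a real token. The vocabulary size and embedding dimension below are illustrative assumptions, not the values used for AG News.

```python
import torch
from torch import nn

vocab_size, embedding_dim, hidden_size, num_classes = 50, 16, 128, 4  # illustrative sizes
embedding = nn.Embedding(vocab_size, embedding_dim)
lstm = nn.LSTM(embedding_dim, hidden_size, batch_first=True)
fc = nn.Linear(hidden_size, num_classes)

tokens = torch.randint(0, vocab_size, (2, 100))  # 2 sequences of 100 token ids
embedded = embedding(tokens)    # (2, 100, 16)
outputs, _ = lstm(embedded)     # (2, 100, 128): one hidden state per time step
last_step = outputs[:, -1, :]   # (2, 128): hidden state at the final time step
logits = fc(last_step)          # (2, 4): one score per class
print(embedded.shape, outputs.shape, last_step.shape, logits.shape)
```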

In [None]:
# Create the PyTorch model
rnn_model = LSTMModel(
    vocab_size=len(tokenizer.word_index) + 1,
    embedding_dim=100,   
    max_sequence_length=max_sequence_length,
    num_classes=4
).to(device)

criterion = nn.CrossEntropyLoss()  
optimizer = optim.Adam(rnn_model.parameters())   


In [None]:
train(rnn_model, train_loader, optimizer, criterion, 
          val_loader=val_loader, epochs=10,device=device)

## Transfer Learning <a id="5-6"></a>


### Make sure to run these [helper functions](#5-1) before running the following code


Training deep learning models from scratch demands significant computational resources and a substantial amount of time.

A practical way to mitigate these costs is transfer learning: reusing a network that was pre-trained on a similar task, so that the features it has already learned carry over to the new problem.

This gives a valuable head start in our specific application, since only part of the model needs to be adapted.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy


In [None]:
data_transforms = {
    'train': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=np.array([0.5, 0.5, 0.5]), 
                             std=np.array([0.25, 0.25, 0.25]))
    ]),
    'test': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=np.array([0.5, 0.5, 0.5]), 
                             std=np.array([0.25, 0.25, 0.25]))
    ]),
    
}

In [None]:
data_dir = '/kaggle/input/covid19-xray-dataset-train-test-sets/xray_dataset_covid19'

In [None]:
train_dataset = datasets.ImageFolder(os.path.join(data_dir, 'train'), data_transforms['train'])
test_dataset = datasets.ImageFolder(os.path.join(data_dir, 'test'), data_transforms['test'])

In [None]:
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=8, shuffle=False)

In [None]:
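# Load VGG16 with its pre-trained ImageNet weights (torchvision >= 0.13 API).
model = models.vgg16(weights=models.VGG16_Weights.DEFAULT)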

In [None]:
model

In [None]:
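# Replace the last classifier layer with a new one that outputs 2 classes.
# (Only changing `out_features` on the existing layer would not resize its weights.)
num_features = model.classifier[6].in_features
model.classifier[6] = nn.Linear(num_features, 2)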

# Move the model to the specified device (e.g., GPU or CPU).
model = model.to(device)

# Define the loss function, which is Cross Entropy Loss, used for classification tasks.
criterion = nn.CrossEntropyLoss()

# Set up the optimizer for training.  
optimizer = optim.SGD(model.parameters(), lr=0.001)

In [None]:
train(model, train_loader, optimizer, criterion, scheduler=None,
          val_loader=test_loader, epochs=2, device=device)
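
A common variant of transfer learning is to freeze the pre-trained layers and train only the new output layer. The sketch below illustrates the idea on a tiny stand-in network so that it runs without downloading any weights; the names `backbone` and `head` and the layer sizes are assumptions made for illustration, not part of the VGG16 code in this notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical stand-in for a pre-trained network: feature extractor + classifier head
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Linear(8, 1000)  # e.g. the original 1000-class output layer
model = nn.Sequential(backbone, head)

# Freeze every existing parameter so it is not updated during training
for param in model.parameters():
    param.requires_grad = False

# Replace the head with a new 2-class layer; new layers are trainable by default
model[1] = nn.Linear(head.in_features, 2)

# Hand only the trainable parameters to the optimizer
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable, lr=0.001)

logits = model(torch.randn(4, 3, 32, 32))
num_trainable = sum(p.numel() for p in trainable)
print(logits.shape, num_trainable)
```

Freezing usually trains faster and needs less memory, while fine-tuning all layers (as the cells above do) can reach higher accuracy when enough data is available.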

****

<div style="text-align: right;">
  <a href="#top" style="text-decoration: none; color: black;">
    <span style="color: #EE4C2C; font-size: 18px; font-weight: bold;">&uarr;</span> Return to Table of Contents
  </a>
</div>

<a id="6"></a>
<h1 style='background:#EE4C2C;border:0; color:#313131; /* Updated text color */
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #313131;'>Parallelism and Multi-GPU</center></h1>

# Parallelism and Multi-GPU

Graphics Processing Units (GPUs) are specialized hardware that can dramatically accelerate deep learning tasks. In PyTorch, leveraging GPU acceleration can significantly speed up model training and inference. This introduction will cover the basics of using GPUs in PyTorch.

**Why Use GPUs?**

**Speed**: GPUs are designed for parallel processing and excel at performing large matrix operations required for neural network training. They are much faster than CPUs for these tasks.

**Parallelism**: Deep learning models often involve millions of parameters. GPUs handle this workload efficiently by processing all the examples in a batch, and many elements of each layer's computation, at the same time.

**Memory**: GPUs come with their own dedicated high-bandwidth memory, which keeps model weights and batches close to the compute units. Its size also determines how large a model or batch you can process.




## Utilizing GPUs  <a id="6-1"></a>


### GPU Setup  <a id="6-1-1"></a>

* Check Your GPU Compatibility: First, verify whether your GPU is compatible with CUDA by visiting the official NVIDIA CUDA-supported GPUs list, which you can find at this [URL](https://developer.nvidia.com/cuda-gpus).

* Download the Appropriate CUDA Version: Next, find out which CUDA versions your PyTorch release supports. They are listed in the selector on the PyTorch installation page linked in the next step. If you need a system-wide CUDA toolkit, download the matching version from the NVIDIA CUDA Toolkit archive at this [URL](https://developer.nvidia.com/cuda-toolkit-archive).

* Retrieve the Installation Command: Visit the [PyTorch Documentation](https://pytorch.org/get-started/locally/) and select the installation method suitable for your operating system and CUDA version. Look for the installation command or instructions provided on this page. Here, you will find the exact command or steps required to install PyTorch with GPU support on your system.

* Select the Appropriate Configuration: While on the PyTorch documentation page, carefully choose the installation configuration that matches your operating system, CUDA version, and preferred installation method. PyTorch offers various installation options, so be sure to pick the one that suits your requirements.

Here's an example of how I installed PyTorch with CUDA support:
![image.png](attachment:5fa14310-0828-40c9-bbe2-dbbf5c27fb45.png)

###  GPU Usage in PyTorch  <a id="6-1-2"></a>


1. Check for GPU Availability
You can check if a GPU is available on your system using:

```python
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU available")
else:
    device = torch.device("cpu")
    print("No GPU available, using CPU")
```

2. Move Tensors to GPU
To perform operations on the GPU, you need to move your tensors and models to the GPU device:

```python
# Move a tensor to the GPU
tensor_on_gpu = tensor_on_cpu.to(device)

# Move a model to the GPU
model.to(device)
```

3. Perform GPU Operations

Once tensors and models are on the GPU, operations are automatically performed on the GPU:
```python
result = model(tensor_on_gpu)  # Operations are GPU-accelerated
```
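
Steps 1 to 3 can be combined into one self-contained sketch. The toy `nn.Linear` model and the random input tensor are assumptions chosen only so that the snippet runs anywhere; on a machine without a GPU it falls back to the CPU.

```python
import torch
import torch.nn as nn

# 1. Pick the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy tensor and model, both created on the CPU
tensor_on_cpu = torch.randn(8, 4)
model = nn.Linear(4, 2)

# 2. Move them to the device
tensor_on_device = tensor_on_cpu.to(device)  # returns a new tensor; the original stays on the CPU
model.to(device)                             # moves the module's parameters in place

# 3. The forward pass now runs on the selected device
result = model(tensor_on_device)
print(result.device, result.shape)
```

Note the asymmetry: `Tensor.to` returns a copy, so its result must be assigned, whereas `Module.to` modifies the module itself.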

**GPU Training Loop**

Here's how to modify your training loop to use GPU acceleration:


```python
import torch

# Define the device (CPU or GPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move the model to the selected device once, before training
model.to(device)

# Training loop
for epoch in range(num_epochs):
    for inputs, targets in dataloader:
        # Move each batch to the same device as the model
        inputs = inputs.to(device)
        targets = targets.to(device)

        # Forward Pass
        outputs = model(inputs)

        # Calculate Loss
        loss = loss_function(outputs, targets)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()

        # Optimizer Step
        optimizer.step()
        
```
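
For a version that can be run as-is, the following sketch fills in the missing pieces with synthetic data. The dataset, the linear model, and the hyperparameters are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic classification data: 64 samples, 10 features, 3 classes
features = torch.randn(64, 10)
labels = torch.randint(0, 3, (64,))
dataloader = DataLoader(TensorDataset(features, labels), batch_size=16, shuffle=True)

model = nn.Linear(10, 3).to(device)
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(3):
    running_loss = 0.0
    for inputs, targets in dataloader:
        # Each batch must live on the same device as the model
        inputs = inputs.to(device)
        targets = targets.to(device)

        optimizer.zero_grad()
        loss = loss_function(model(inputs), targets)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    epoch_losses.append(running_loss / len(dataloader))

print(epoch_losses)
```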

****

## Multi-GPU Training <a id="6-2"></a>


Multi-GPU training, also known as parallel training, is a technique in deep learning where multiple Graphics Processing Units (GPUs) are used simultaneously to train a single neural network model. There are several reasons for using multi-GPU training:

* **Accelerated Training Speed**: One of the most prominent advantages of multi-GPU training is the significant reduction in training time. By distributing the computational workload across multiple GPUs, deep learning models can process data and update their parameters much faster than with a single GPU. This accelerated training speed is particularly beneficial for large-scale models and extensive datasets.

* **Handling Larger Datasets**: Multi-GPU setups are particularly useful when dealing with massive datasets. In data-parallel training, each GPU processes a different subset of every batch, and the gradients are aggregated across GPUs so that all copies of the model stay in sync.
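
The gradient aggregation can be checked on a single CPU. The sketch below, which uses a toy linear model and random data as assumptions, splits one batch into two equal shards (as two GPUs would see them) and confirms that averaging the per-shard gradients gives the same result as computing the gradient on the full batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(5, 1)
loss_fn = nn.MSELoss()  # mean reduction over the batch

x = torch.randn(8, 5)
y = torch.randn(8, 1)

def weight_grad(inputs, targets):
    """Return the gradient of the loss w.r.t. the weight for one batch."""
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()
    return model.weight.grad.clone()

# Gradient computed on the whole batch (single-GPU view)
full_grad = weight_grad(x, y)

# Gradients computed on two equal shards (what each of two GPUs would compute)
shard_grads = [weight_grad(x[:4], y[:4]), weight_grad(x[4:], y[4:])]
averaged_grad = torch.stack(shard_grads).mean(dim=0)

grads_match = torch.allclose(full_grad, averaged_grad, atol=1e-6)
print(grads_match)
```

The equality relies on the shards having the same size and on the loss being a mean over samples.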



## Steps for Multi-GPU Training

### Import the required DDP (Distributed Data Parallel):

```python

# os and torch are needed by the setup function below.
import os
import torch

# torch.multiprocessing from PyTorch will be used for setting up multi-process training.
import torch.multiprocessing as mp

# DistributedSampler class is used to sample data in a distributed manner.
from torch.utils.data.distributed import DistributedSampler

# DistributedDataParallel is a wrapper for training models in a distributed manner.
from torch.nn.parallel import DistributedDataParallel as DDP

# init_process_group, destroy_process_group are functions for initializing and destroying the distributed training environment.
from torch.distributed import init_process_group, destroy_process_group
```

### Initialize distributed process group

```python
def ddp_setup():
    init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
    
```

**Where** 


`init_process_group(backend="nccl")`: Initializes the communication backend for distributed training. 

`torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))`: Sets the current CUDA device for the current process. 

`os.environ["LOCAL_RANK"]` reads the `LOCAL_RANK` environment variable, which `torchrun` sets for every process it launches. Each process on a machine gets a unique local rank (0, 1, 2, ...), so using it as the CUDA device index gives each process its own GPU. Note that in DDP every process holds a full copy of the model: the rank determines which GPU and which portion of the data a process works on, not which portion of the model.
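
As a small sketch of how these variables can be read defensively, the hypothetical helper below (`read_torchrun_env` is not part of PyTorch) falls back to single-process defaults when the script is started with plain `python` instead of `torchrun`.

```python
import os

def read_torchrun_env():
    """Read the rank variables exported by torchrun, with single-process defaults."""
    return {
        "local_rank": int(os.environ.get("LOCAL_RANK", 0)),  # process index on this machine
        "rank": int(os.environ.get("RANK", 0)),              # process index across all machines
        "world_size": int(os.environ.get("WORLD_SIZE", 1)),  # total number of processes
    }

info = read_torchrun_env()
print(info)
```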

### Add DistributedSampler to the DataLoader

```python

train_loader = DataLoader(
    dataset,
    batch_size=batch_size,
    pin_memory=True,
    shuffle=False,  # shuffling is delegated to the sampler
    sampler=DistributedSampler(dataset)
)
```

**Where**

**DistributedSampler** is used to distribute the dataset among multiple GPUs in a way that each GPU processes a different subset of the data. It helps ensure that each GPU gets a unique and non-overlapping portion of the dataset. This is essential for training neural networks across multiple GPUs without redundancy or data leakage.
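
The partitioning can be inspected without any GPUs by passing `num_replicas` and `rank` explicitly, which avoids the need for an initialized process group. The 10-item dataset and the two simulated ranks below are assumptions for illustration.

```python
import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(10))

# Simulate the index lists that two processes (rank 0 and rank 1) would receive
shards = [
    list(DistributedSampler(dataset, num_replicas=2, rank=rank, shuffle=False))
    for rank in range(2)
]
print(shards)  # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
```

With the default `shuffle=True`, call `sampler.set_epoch(epoch)` at the start of every epoch, as the full example below does, so that each epoch uses a different ordering.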

### Wrap the model with DDP

```python
model = MyCustomModel()
model.to(gpu_id)
model = DDP(model, device_ids=[gpu_id])
```

**Where**

`model = MyCustomModel()`: This line initializes an instance of your custom neural network model. MyCustomModel() should be replaced with the actual class that defines your neural network architecture. This model is initially created on the CPU.

``model.to(gpu_id)``: This line moves the model to a specific GPU identified by gpu_id. The to() method is used to transfer the model's parameters and operations to the specified GPU. gpu_id should be an integer representing the GPU device you want to use.

`model = DDP(model, device_ids=[gpu_id])`: Here, you wrap the model with DDP, which stands for DistributedDataParallel. DDP is a wrapper provided by PyTorch that keeps one full copy of the model in every process and synchronizes (averages) the gradients across processes during the backward pass. It takes two important arguments:

**model**: This is the model you want to parallelize. In this case, it's the model you created earlier and moved to a GPU.

**device_ids**=[gpu_id]: This argument lists the GPU used by the current process. With one process per GPU, it contains a single GPU ID (gpu_id). The parallelism comes from running several such processes, one per GPU, not from listing several devices here.


### Example

**Note:**

I attempted to execute these code snippets directly in code cells, but encountered errors when running them in this environment. Saving them as .py files and launching them with `torchrun` worked without any issues.

In [None]:
%%writefile model.py

import torch.nn as nn
import numpy as np

def calculate_max_pool_output_shape(input_height, input_width, pool_size=2):
    """
    Calculate the output shape after applying max pooling to an input image.

    Parameters:
        input_height (int): The height of the input image.
        input_width (int): The width of the input image.
        pool_size (int, optional): The size of the pooling window. Defaults to 2.

    Returns:
        tuple: A tuple containing the output height and width after max pooling.
    """
    output_height = int(input_height / pool_size)
    output_width = int(input_width / pool_size)
    return output_height, output_width

def find_conv2d_output_shape(height, width, conv):
    """
    Calculate the output shape of a 2D convolutional layer.

    Parameters:
        height (int): The height of the input feature map.
        width (int): The width of the input feature map.
        conv (nn.Conv2d): The convolutional layer for which to calculate the output shape.

    Returns:
        tuple: A tuple containing the output height and width after applying the convolutional layer.
    """
    # Get convolutional layer arguments
    kernel_size = conv.kernel_size
    stride = conv.stride
    padding = conv.padding
    dilation = conv.dilation

    # Calculate output height and width
    height = np.floor((height + 2 * padding[0] - dilation[0] * (kernel_size[0] - 1) - 1) / stride[0] + 1)
    width = np.floor((width + 2 * padding[1] - dilation[1] * (kernel_size[1] - 1) - 1) / stride[1] + 1)

    return int(height), int(width)

class ConvModel(nn.Module):
    def __init__(self, config):
        super(ConvModel, self).__init__()
        c, h, w = config["input_shape"]
        classes = config["classes"]

        # Define convolutional layers
        self.conv1 = nn.Conv2d(c, 32, kernel_size=3)
        self.relu1 = nn.ReLU()
        h, w = find_conv2d_output_shape(h, w, self.conv1)

        self.conv2 = nn.Conv2d(32, 32, kernel_size=3)
        self.relu2 = nn.ReLU()
        h, w = find_conv2d_output_shape(h, w, self.conv2)

        # Max-pooling layer
        self.maxpool = nn.MaxPool2d(kernel_size=2)
        h, w = calculate_max_pool_output_shape(h, w, 2)
        # Flatten the output
        self.flatten = nn.Flatten()

        self.linear1 = nn.Linear(h * w * 32, classes)  # Update the input size here

    def forward(self, x):
        # Define the forward pass
        x = self.conv1(x)
        x = self.relu1(x)

        x = self.conv2(x)
        x = self.relu2(x)

        x = self.maxpool(x)

        x = self.flatten(x)

        x = self.linear1(x)

        return x
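
As a quick sanity check of the shape arithmetic used in `ConvModel`, the sketch below recomputes the flattened feature size for a 1×28×28 input and compares it with what the layers actually produce. The helper `conv_out` is a hypothetical integer-only version of the formula in `find_conv2d_output_shape`.

```python
import torch
import torch.nn as nn

def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

# 28 -> 26 (conv 3x3) -> 24 (conv 3x3) -> 12 (max pool 2x2)
side = conv_out(conv_out(28, 3), 3) // 2
expected_features = side * side * 32

# The same stack of layers as in ConvModel, without the final linear layer
layers = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Flatten(),
)
actual_features = layers(torch.zeros(1, 1, 28, 28)).shape[1]
print(expected_features, actual_features)
```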


In [None]:
%%writefile data.py

from torch.utils.data import DataLoader, Dataset
import pandas as pd
import numpy as np


def preprocess_dataset(path='/kaggle/input/digit-recognizer/train.csv'):
    df = pd.read_csv(path)
    x = df.iloc[:, 1:].values  # set training data
    y = df.iloc[:, 0].values  # set training labels
    x = x / 255.0
    x = x.reshape(-1, 1, 28, 28)
    return x, y


class CustomMNISTDataset(Dataset):
    def __init__(self, data, labels, transform=None):
        self.data = data.astype(np.float32)
        self.labels = labels.astype(np.int64)
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        data = self.data[idx]
        label = self.labels[idx]

        if self.transform:
            data = self.transform(data)

        return data, label

In [None]:
%%writefile main.py

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from model import ConvModel
from data import preprocess_dataset, CustomMNISTDataset

from torch.utils.data.distributed import DistributedSampler
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed import init_process_group, destroy_process_group
import os


def ddp_setup():
    init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))


def prepare_dataloader(dataset: Dataset, batch_size: int):
    return DataLoader(
        dataset,
        batch_size=batch_size,
        pin_memory=True,
        shuffle=False,
        sampler=DistributedSampler(dataset)
    )


def load_train_objs():
    config = {
        "input_shape": (1, 28, 28),
        "classes": 10
    }
    x, y = preprocess_dataset(path='/kaggle/input/digit-recognizer/train.csv')
    train_data = CustomMNISTDataset(data=x, labels=y)
    model = ConvModel(config)  # load your model
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

    return train_data, model, optimizer, criterion


class Trainer:
    def __init__(
            self,
            model: torch.nn.Module,
            train_data: DataLoader,
            optimizer: torch.optim.Optimizer,
            criterion: nn.Module,
            save_every: int,
            snapshot_path: str,
            load_snapshot:bool = False
    ) -> None:
        self.gpu_id = int(os.environ["LOCAL_RANK"])
        self.model = model.to(self.gpu_id)
        self.criterion = criterion
        self.train_data = train_data
        self.optimizer = optimizer
        self.save_every = save_every
        self.epochs_run = 0
        self.snapshot_path = snapshot_path
        if load_snapshot and os.path.exists(snapshot_path):
            print("Loading snapshot")
            self._load_snapshot(snapshot_path)
        # Always wrap the model in DDP, whether or not a snapshot was loaded
        self.model = DDP(self.model, device_ids=[self.gpu_id])

    def _load_snapshot(self, snapshot_path):
        loc = f"cuda:{self.gpu_id}"
        snapshot = torch.load(snapshot_path, map_location=loc)
        self.model.load_state_dict(snapshot["MODEL_STATE"])
        self.epochs_run = snapshot["EPOCHS_RUN"]
        print(f"Resuming training from snapshot at Epoch {self.epochs_run}")

    def _run_batch(self, source, targets):
        self.optimizer.zero_grad()
        output = self.model(source)
        loss = self.criterion(output, targets)
        loss.backward()
        self.optimizer.step()
        return loss

    def _run_epoch(self, epoch):
        self.train_data.sampler.set_epoch(epoch)
        total_loss = 0.0

        for source, targets in self.train_data:
            source = source.to(self.gpu_id)
            targets = targets.to(self.gpu_id)
            loss = self._run_batch(source, targets)
            total_loss += loss.item()

        avg_loss = total_loss / len(self.train_data)
        print(f"[GPU{self.gpu_id}] Epoch {epoch} | Loss: {avg_loss}")

    def _save_snapshot(self, epoch):
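        # Save the weights of the underlying model (without DDP's "module." key prefix)
        # so that _load_snapshot can load them before the model is wrapped in DDP.
        raw_model = self.model.module if isinstance(self.model, DDP) else self.model
        snapshot = {
            "MODEL_STATE": raw_model.state_dict(),
            "EPOCHS_RUN": epoch,
        }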
        torch.save(snapshot, self.snapshot_path)
        print(f"Epoch {epoch} | Training snapshot saved at {self.snapshot_path}")

    def train(self, max_epochs: int):
        for epoch in range(self.epochs_run, max_epochs):
            self._run_epoch(epoch)
            if self.gpu_id == 0 and epoch % self.save_every == 0:
                self._save_snapshot(epoch)

def main(save_every: int, total_epochs: int, batch_size: int, snapshot_path="snapshot.pt",
         load_snapshot=False):
    ddp_setup()
    dataset, model, optimizer, criterion = load_train_objs()
    train_data = prepare_dataloader(dataset, batch_size)
    trainer = Trainer(model=model, train_data=train_data, optimizer=optimizer, criterion=criterion,
                      save_every=save_every, snapshot_path=snapshot_path, load_snapshot=load_snapshot)
    trainer.train(total_epochs)
    destroy_process_group()

if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description='simple distributed training job')
    parser.add_argument('total_epochs', type=int, help='Total epochs to train the model')
    parser.add_argument('save_every', type=int, help='How often to save a snapshot')
    parser.add_argument('--batch_size', default=32, type=int, help='Input batch size on each device (default: 32)')
    # A flag is used here because argparse's type=bool treats any non-empty string as True
    parser.add_argument('--load_snapshot', action='store_true',
                        help='Whether to load snapshot from the last saved or not')
    args = parser.parse_args()

    main(save_every=args.save_every, total_epochs=args.total_epochs, batch_size=args.batch_size,
         load_snapshot=args.load_snapshot)

In [None]:
!torchrun  --standalone --nproc_per_node=gpu  main.py 50 10
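
The snapshot mechanism used by `Trainer` can also be tried in isolation on a CPU. The sketch below is a simplified stand-in, assuming a toy `nn.Linear` model and a temporary file; it saves a dictionary with the same two keys and restores it into a fresh model.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
snapshot = {"MODEL_STATE": model.state_dict(), "EPOCHS_RUN": 7}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "snapshot.pt")
    torch.save(snapshot, path)
    # map_location controls which device the saved tensors are loaded onto
    restored = torch.load(path, map_location="cpu")

# Restore the weights into a freshly initialized model and recover the epoch counter
fresh_model = nn.Linear(4, 2)
fresh_model.load_state_dict(restored["MODEL_STATE"])
epochs_run = restored["EPOCHS_RUN"]
print(epochs_run)
```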

<div style="text-align: right;">
  <a href="#top" style="text-decoration: none; color: black;">
    <span style="color: #EE4C2C; font-size: 18px; font-weight: bold;">&uarr;</span> Return to Table of Contents
  </a>
</div>


<a id="7"></a>
<h1 style='background:#EE4C2C;border:0; color:#313131; /* Updated text color */
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #313131;'>Using TPU</center></h1>
    
# Using TPU

A TPU, or Tensor Processing Unit, is a specialized hardware accelerator developed by Google for deep learning and machine learning workloads. TPUs are designed to significantly accelerate the training and inference processes of neural networks, which are fundamental to many artificial intelligence (AI) and machine learning tasks. Here's a more detailed explanation of what TPUs are and why they are beneficial:



* **High Performance**: TPUs are known for their exceptional performance when it comes to training and running deep learning models. They can perform matrix multiplications and other tensor operations much faster than traditional CPUs and, for many workloads, faster than GPUs.

* **Efficiency**: TPUs are designed to be highly energy-efficient. This means that they can provide a significant boost in computational power while using less energy compared to traditional computing hardware. This efficiency is especially crucial for large-scale machine learning tasks and data centers.

* **Scalability**: TPUs are highly scalable and can be used in clusters or data center environments. This allows organizations to scale up their deep learning workloads as needed, making them suitable for training large models on vast datasets.

## XLA Library (Accelerated Linear Algebra):

XLA is a domain-specific compiler developed by Google, designed for optimizing and accelerating linear algebra operations commonly used in machine learning and deep learning.

It's specifically tailored for hardware accelerators, including TPUs, and is part of the TensorFlow ecosystem. PyTorch can also be used with XLA through the PyTorch/XLA package (`torch_xla`).

XLA compiles computational graphs into optimized kernels that can be executed efficiently on hardware accelerators. This makes it possible to harness the full power of TPUs for deep learning tasks.

XLA provides an abstraction layer that enables deep learning frameworks like TensorFlow and PyTorch to target a wide range of hardware accelerators, including TPUs, GPUs, and CPUs, without needing to write specialized low-level code for each.

## Using TPUs in Kaggle Notebooks


### Step 1: Choose TPU from Accelerators

The first step is to select a TPU as your accelerator in Kaggle. To do this, follow these steps:

1. Click on the "Accelerator" dropdown in the Kaggle notebook settings.
2. Select "TPU" as your accelerator type.
3. Save your settings.

Your Kaggle notebook is now configured to use a TPU for your machine learning tasks.

### Step 2: Import Necessary Modules

In your Python code cell, you need to import the necessary modules to work with TPUs. Here's the code to import these modules:

```python
import torch
import torch_xla
import torch_xla.distributed.data_parallel as dp
import torch_xla.distributed.parallel_loader as pl
import torch_xla.core.xla_model as xm

```
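
Once the modules are imported, the TPU is addressed through an XLA device object, much like `cuda` for GPUs. The sketch below assumes that `xm.xla_device()` is available in the installed `torch_xla` version, as the imports above suggest, and falls back to the CPU when no XLA runtime is present so that it can be run anywhere.

```python
import torch

try:
    import torch_xla.core.xla_model as xm
    device = xm.xla_device()       # XLA device backed by the TPU
except (ImportError, RuntimeError):
    device = torch.device("cpu")   # assumption: fall back to the CPU without a TPU

# Tensors are created on, or moved to, the XLA device just like with "cuda"
x = torch.ones(2, 3, device=device)
total = (x * 2).sum().item()
print(device, total)
```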

### Step 3: Adapt Your Data Loading for TPUs
To take full advantage of TPUs, you need to make some adjustments in your data loading code. Specifically, you should use a distributed data sampler and configure data loaders for efficient training. Here's the code to achieve this:

```python
train_data = dataset_class(data=X_train, labels=y_train)

# Create a distributed data sampler for training data
train_sampler = torch.utils.data.DistributedSampler(train_data,
                                                   num_replicas=xm.xrt_world_size(),
                                                   rank=xm.get_ordinal(),
                                                   shuffle=True)

# Create a data loader with the distributed sampler
train_loader = torch.utils.data.DataLoader(train_data,
                                           batch_size=batch_size,
                                           num_workers=4,
                                           sampler=train_sampler,
                                           drop_last=True)
```


**Explanation:**

- `num_replicas`: This parameter determines the total number of processing units or replicas participating in the training process.

- `rank`: The rank signifies the unique identifier assigned to the current processing unit within the distributed setup.

- `xm.xrt_world_size`: This function returns the total number of devices (TPU cores) taking part in training, which is the information needed to distribute the workload.

- `xm.get_ordinal`: This function returns the rank (index) of the current device within the distributed setup.
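
To see what `num_replicas` and `rank` do when the dataset size is not a multiple of the number of replicas, the sketch below hard-codes both values instead of querying the TPU, so that it runs on any machine. The 10-item dataset and the 4 simulated replicas are assumptions for illustration.

```python
import torch
from torch.utils.data import DistributedSampler, TensorDataset

dataset = TensorDataset(torch.arange(10))
num_replicas = 4

# Index lists that each of the 4 replicas would receive
shards = [
    list(DistributedSampler(dataset, num_replicas=num_replicas, rank=rank, shuffle=False))
    for rank in range(num_replicas)
]
shard_sizes = [len(shard) for shard in shards]
covered = {index for shard in shards for index in shard}
print(shards)
print(shard_sizes)
```

By default the sampler pads the index list by repeating a few samples so that every replica gets the same number of items. This is separate from the `drop_last=True` option of the `DataLoader`, which discards an incomplete final batch on each replica.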



During the training process, it is essential to make efficient use of parallel loading to handle the data. 
The following code demonstrates how to accomplish this:

```python
# Get the XLA (TPU) device for the current process
device = xm.xla_device()

# Initialize a Parallel Loader with the training DataLoader and the device
para_loader = pl.ParallelLoader(train_loader, [device])

# Utilize the Parallel Loader as the Data Loader for Training
train_loss, accuracy = train_step(para_loader.per_device_loader(device))
```

**Explanation**:

**Parallel Loader Initialization**:

First, we initialize a ParallelLoader named para_loader. This loader wraps an existing DataLoader and feeds its batches to the device in the background.
The training DataLoader (train_loader) is provided to the ParallelLoader, which is responsible for efficiently handling the data distribution.
The [device] argument specifies the device where the data should be loaded. This step ensures that each batch is already on the TPU when the training step needs it.

**Training with Parallel Loader:**

After setting up the ParallelLoader, we utilize it as the data loader for our training process.
The train_step function is called, passing in the data loader obtained from para_loader using para_loader.per_device_loader(device).
This train_step function contains the logic for conducting a forward and backward pass, calculating the training loss, and computing accuracy.

In [None]:
!pip -q install torch-xla

## Example

In [None]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
from sklearn.model_selection import train_test_split

try:
    import torch_xla
    import torch_xla.distributed.data_parallel as dp
    import torch_xla.distributed.parallel_loader as pl
    import torch_xla.core.xla_model as xm

except ImportError:
    print("Please activate TPU")

In [None]:
class CustomMNISTDataset(Dataset):
    def __init__(self, data, labels, transform=None):
        self.data = data.astype(np.float32)
        self.labels = labels.astype(np.int64)
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        data = self.data[idx]
        label = self.labels[idx]

        if self.transform:
            data = self.transform(data)

        return data, label
    

def preprocess_dataset(path='/kaggle/input/digit-recognizer/train.csv'):
    df = pd.read_csv(path)
    x = df.iloc[:, 1:].values  # set training data
    y = df.iloc[:, 0].values  # set training labels
    x = x / 255.0
    x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.2, random_state=42)
    x_train = x_train.reshape(-1, 1, 28, 28)
    x_val = x_val.reshape(-1, 1, 28, 28)
    return x_train, x_val, y_train, y_val


def load_data(X_train, y_train, X_val=None, y_val=None, dataset_class=None, batch_size=64):
    """
    Load and preprocess data into PyTorch data loaders for training and validation.

    Args:
        X_train (numpy.ndarray): Training data features.
        y_train (numpy.ndarray): Training data labels.
        X_val (numpy.ndarray): Validation data features.
        y_val (numpy.ndarray): Validation data labels.
        dataset_class (class, optional): Custom dataset class to create datasets.
        batch_size (int, optional): Batch size for data loaders. Default is 64.

    Returns:
        train_loader (DataLoader): Data loader for training data.
        val_loader (DataLoader or None): Data loader for validation data, or None if validation data is not provided.
    """

    train_data = dataset_class(data=X_train, labels=y_train)
    
    train_sampler = torch.utils.data.DistributedSampler(train_data,
                                                       num_replicas=xm.xrt_world_size(),
                                                       rank= xm.get_ordinal(),
                                                       shuffle=True)
    train_loader = torch.utils.data.DataLoader(train_data,batch_size=batch_size,num_workers=4,
                                               sampler=train_sampler,drop_last=True)
    
    val_loader = None
    if X_val is not None and y_val is not None:

        val_data = dataset_class(data=X_val, labels=y_val)
        
        val_sampler = torch.utils.data.DistributedSampler(val_data,
                                       num_replicas=xm.xrt_world_size(),
                                       rank= xm.get_ordinal(),
                                       shuffle=False)
        
        val_loader = torch.utils.data.DataLoader(val_data,batch_size=batch_size,num_workers=4,
                                               sampler=val_sampler,drop_last=True)
    return train_loader, val_loader
    

In [None]:
def calculate_max_pool_output_shape(input_height, input_width, pool_size=2):
    """
    Calculate the output shape after applying max pooling to an input image.

    Parameters:
        input_height (int): The height of the input image.
        input_width (int): The width of the input image.
        pool_size (int, optional): The size of the pooling window. Defaults to 2.

    Returns:
        tuple: A tuple containing the output height and width after max pooling.
    """
    output_height = int(input_height / pool_size)
    output_width = int(input_width / pool_size)
    return output_height, output_width

def find_conv2d_output_shape(height, width, conv):
    """
    Calculate the output shape of a 2D convolutional layer.

    Parameters:
        height (int): The height of the input feature map.
        width (int): The width of the input feature map.
        conv (nn.Conv2d): The convolutional layer for which to calculate the output shape.

    Returns:
        tuple: A tuple containing the output height and width after applying the convolutional layer.
    """
    # Get convolutional layer arguments
    kernel_size = conv.kernel_size
    stride = conv.stride
    padding = conv.padding
    dilation = conv.dilation

    # Calculate output height and width
    height = np.floor((height + 2 * padding[0] - dilation[0] * (kernel_size[0] - 1) - 1) / stride[0] + 1)
    width = np.floor((width + 2 * padding[1] - dilation[1] * (kernel_size[1] - 1) - 1) / stride[1] + 1)

    return int(height), int(width)

class ConvModel(nn.Module):
    def __init__(self, config):
        super(ConvModel, self).__init__()
        c, h, w = config["input_shape"]
        classes = config["classes"]

        # Define convolutional layers
        self.conv1 = nn.Conv2d(c, 32, kernel_size=3)
        self.relu1 = nn.ReLU()
        h, w = find_conv2d_output_shape(h, w, self.conv1)

        self.conv2 = nn.Conv2d(32, 32, kernel_size=3)
        self.relu2 = nn.ReLU()
        h, w = find_conv2d_output_shape(h, w, self.conv2)

        # Max-pooling layer
        self.maxpool = nn.MaxPool2d(kernel_size=2)
        h, w = calculate_max_pool_output_shape(h, w, 2)
        # Flatten the output
        self.flatten = nn.Flatten()

        self.linear1 = nn.Linear(h * w * 32, classes)  # Update the input size here

    def forward(self, x):
        # Define the forward pass
        x = self.conv1(x)
        x = self.relu1(x)

        x = self.conv2(x)
        x = self.relu2(x)

        x = self.maxpool(x)

        x = self.flatten(x)

        x = self.linear1(x)

        return x


In [None]:
class Trainer:
    """
    A class for training and validating a PyTorch model.

    Args:
        model (torch.nn.Module): The neural network model to be trained.
        train_data (DataLoader): Dataloader for the training dataset.
        val_data (DataLoader): Dataloader for the validation dataset.
        optimizer (torch.optim.Optimizer): The optimizer for updating model parameters.
        criterion (nn.Module): The loss function for training.
        device: The device (TPU) to perform computations on.

    Attributes:
        model (torch.nn.Module): The neural network model.
        criterion (nn.Module): The loss function.
        train_data (DataLoader): Dataloader for the training dataset.
        val_data (DataLoader): Dataloader for the validation dataset.
        optimizer (torch.optim.Optimizer): The optimizer.
        device: The device (TPU) computations run on.

    Methods:
        forward_pass(data, label):
            Perform a forward pass through the model and compute loss and accuracy.

        train_step(train_loader):
            Perform a training step, including forward and backward passes.

        validation_step(val_loader):
            Evaluate the model on the validation dataset.

        train(epochs):
            Train the model for a specified number of epochs.
    """

    def __init__(
            self,
            model: torch.nn.Module,
            train_data: DataLoader,
            val_data: DataLoader,
            optimizer: torch.optim.Optimizer,
            criterion: nn.Module,
            device
    ) -> None:
        """
        Initialize the Trainer with the provided model and data.

        Args:
            model (torch.nn.Module): The neural network model to be trained.
            train_data (DataLoader): Dataloader for the training dataset.
            val_data (DataLoader): Dataloader for the validation dataset.
            optimizer (torch.optim.Optimizer): The optimizer for updating model parameters.
            criterion (nn.Module): The loss function for training.
            device: The device to perform computations on (an XLA/TPU device in this notebook).
        """
        self.model = model.to(device)
        self.criterion = criterion
        self.train_data = train_data
        self.val_data = val_data
        self.optimizer = optimizer
        self.device = device

    def forward_pass(self, data, label):
        """
        Perform a forward pass through the model and compute loss and accuracy.

        Args:
            data: Input data.
            label: Target labels.

        Returns:
            loss: The computed loss.
            correct_predictions: The number of correct predictions.
        """
        # Forward pass
        outputs = self.model(data)

        # Compute the loss
        loss = self.criterion(outputs, label)

        # Calculate accuracy
        _, predicted = torch.max(outputs, 1)
        correct_predictions = (predicted == label).sum().item()
        return loss, correct_predictions

    def train_step(self, train_loader):
        """
        Perform a training step, including forward and backward passes.

        Args:
            train_loader: Dataloader for the training dataset.

        Returns:
            average_loss: Average loss per batch.
            training_accuracy: Training accuracy.
        """
        self.model.train()
        total_loss = 0.0
        correct_train_predictions = 0
        total_train_samples = 0

        for data, label in train_loader:
            data = data.to(self.device)
            label = label.to(self.device)

            self.optimizer.zero_grad()

            loss, correct = self.forward_pass(data, label)

            # Backpropagation and optimization
            loss.backward()
            self.optimizer.step()

            total_loss += loss.item()
            correct_train_predictions += correct
            total_train_samples += label.size(0)

        # Avoid division by zero
        training_accuracy = correct_train_predictions / total_train_samples if total_train_samples > 0 else 0
        average_loss = total_loss / len(train_loader)  # Compute the average loss per batch
        return average_loss, training_accuracy

    def validation_step(self, val_loader):
        """
        Evaluate the model on the validation dataset.

        Args:
            val_loader: Dataloader for the validation dataset.

        Returns:
            val_loss: Average validation loss.
            validation_accuracy: Validation accuracy.
        """
        self.model.eval()
        total_val_loss = 0.0
        correct_val_predictions = 0
        total_val_samples = 0

        with torch.no_grad():
            for data, label in val_loader:
                data = data.to(self.device)
                label = label.to(self.device)

                loss, correct = self.forward_pass(data, label)

                total_val_loss += loss.item()
                correct_val_predictions += correct
                total_val_samples += label.size(0)

        # Avoid division by zero, mirroring the guard in train_step
        validation_accuracy = correct_val_predictions / total_val_samples if total_val_samples > 0 else 0

        return total_val_loss / len(val_loader), validation_accuracy

    def train(self, epochs):
        """
        Train the model for a specified number of epochs.

        Args:
            epochs: Number of training epochs.
        """
        for epoch in range(epochs):
            # Wrap the training DataLoader so batches are fed to the XLA device
            para_loader = pl.ParallelLoader(self.train_data, [self.device])
            train_loss, accuracy = self.train_step(para_loader.per_device_loader(self.device))

            # Wrap the validation DataLoader in the same way
            para_loader = pl.ParallelLoader(self.val_data, [self.device])
            val_loss, val_accuracy = self.validation_step(para_loader.per_device_loader(self.device))

            print(
                f'Epoch: {epoch + 1} '
                f'train_loss: {train_loss:.4f} train_accuracy: {accuracy:.4f} '
                f'val_loss: {val_loss:.4f} val_accuracy: {val_accuracy:.4f}'
            )
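
The accuracy bookkeeping in `forward_pass` can be checked by hand on a tiny batch. The sketch below is only an illustration: the logits and labels are made-up values, not taken from the notebook's dataset. `torch.max(outputs, 1)` returns both the maximum values and their indices along the class dimension, and the indices serve as the predicted classes.

```python
import torch

# Hypothetical logits for a batch of 4 samples and 3 classes (made-up values).
outputs = torch.tensor([
    [2.0, 0.5, 0.1],
    [0.1, 0.2, 3.0],
    [0.3, 1.5, 0.2],
    [1.0, 0.9, 0.8],
])
labels = torch.tensor([0, 2, 0, 0])

# torch.max along dim=1 returns (values, indices); the indices are the predicted classes.
_, predicted = torch.max(outputs, 1)

# Count matches with the labels, then divide by the number of samples in the batch.
correct = (predicted == labels).sum().item()
accuracy = correct / labels.size(0)

print(predicted.tolist(), correct, accuracy)  # [0, 2, 1, 0] 3 0.75
```

Note that `Trainer` divides the summed loss by the number of batches but divides the correct-prediction count by the number of samples, so the two metrics are averaged over different units.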


In [None]:
X_train, X_val, y_train, y_val = preprocess_dataset(path='/kaggle/input/digit-recognizer/train.csv')

In [None]:
try:
    train_loader, val_loader = load_data(X_train, y_train, X_val, y_val, dataset_class=CustomMNISTDataset, batch_size=128)

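except Exception as e:
    # Catch Exception rather than using a bare except, and surface the underlying error
    print(f"Could not create the data loaders ({e}). Please activate TPU")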

In [None]:
try:
    device = xm.xla_device()
except Exception:
    # Fall back to CPU when no XLA/TPU device can be acquired
    print("Please activate TPU")
    device = 'cpu'

In [None]:
config = {
    "input_shape": (1, 28, 28),
    "classes": 10
}

In [None]:
model = ConvModel(config).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

In [None]:
trainer = Trainer(model, train_loader, val_loader, optimizer, criterion, device)

In [None]:
# trainer.train(10)
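
Because the training call above is commented out and `Trainer.train` relies on the XLA `ParallelLoader`, it can help to smoke-test the same zero-grad, forward, backward, step pattern on CPU first. The following is a minimal sketch under stated assumptions, not the notebook's pipeline: the data is random noise shaped like MNIST, and `smoke_model`, `smoke_loader`, `smoke_loss_fn`, and `smoke_opt` are hypothetical names introduced only for this illustration.

```python
import math

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for MNIST (assumption): 64 random "images" of shape (1, 28, 28), 10 classes.
images = torch.randn(64, 1, 28, 28)
targets = torch.randint(0, 10, (64,))
smoke_loader = DataLoader(TensorDataset(images, targets), batch_size=16)

# Hypothetical tiny model, much simpler than the notebook's ConvModel.
smoke_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
smoke_loss_fn = nn.CrossEntropyLoss()
smoke_opt = torch.optim.Adam(smoke_model.parameters(), lr=0.01)

smoke_model.train()
total_loss, correct, seen = 0.0, 0, 0
for data, label in smoke_loader:
    smoke_opt.zero_grad()                 # clear gradients from the previous batch
    outputs = smoke_model(data)           # forward pass
    loss = smoke_loss_fn(outputs, label)  # compute the loss
    loss.backward()                       # backpropagation
    smoke_opt.step()                      # parameter update

    total_loss += loss.item()
    correct += (outputs.argmax(dim=1) == label).sum().item()
    seen += label.size(0)

avg_loss = total_loss / len(smoke_loader)  # average loss per batch
acc = correct / seen                       # accuracy per sample
print(f"avg_loss: {avg_loss:.4f} accuracy: {acc:.4f}")
```

With random labels the accuracy is not meaningful; the point is only to confirm that the loop runs end to end and produces a finite loss before switching to the TPU.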

<div style="text-align: right;">
  <a href="#top" style="text-decoration: none; color: black;">
    <span style="color: #EE4C2C; font-size: 18px; font-weight: bold;">&uarr;</span> Return to Table of Contents
  </a>
</div>


<a id="8"></a>
<h1 style='background:#EE4C2C;border:0; color:#313131; /* Updated text color */
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #313131;'>Thank you</center></h1>
    
    
**Thank you so much for going through this notebook.**

**If you have any feedback or suggestions, please let me know.**
  
