📝 **Author:** Amirhossein Heydari - 📧 **Email:** amirhosseinheydari78@gmail.com - 📍 **Linktree:** [linktr.ee/mr_pylin](https://linktr.ee/mr_pylin)

---

**Table of contents**<a id='toc0_'></a>    
- [Dependencies](#toc1_)    
- [Dataset](#toc2_)    
  - [Load Iris Dataset](#toc2_1_)    
- [Torch Dataset](#toc3_)    
- [Torch DataLoader](#toc4_)    
  - [Strategies for updating weights](#toc4_1_)    
    - [Batch Gradient Descent](#toc4_1_1_)    
    - [Stochastic Gradient Descent](#toc4_1_2_)    
    - [Mini-Batch Gradient Descent](#toc4_1_3_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Dependencies](#toc0_)

In [None]:
import numpy as np
import pandas as pd
import torch
import torch.utils.data

# <a id='toc2_'></a>[Dataset](#toc0_)

$
X = \begin{bmatrix}
        x_{1}^1 & x_{1}^2 & \cdots & x_{1}^n \\
        x_{2}^1 & x_{2}^2 & \cdots & x_{2}^n \\
        \vdots & \vdots & \ddots & \vdots \\
        x_{m}^1 & x_{m}^2 & \cdots & x_{m}^n \\
    \end{bmatrix}_{m \times n} \quad \text{(m: number of samples, n: number of features)}
$

$
Y = \begin{bmatrix}
        y_{1} \\
        y_{2} \\
        \vdots \\
        y_{m} \\
    \end{bmatrix}_{m \times 1} \quad \text{(m: number of samples)}
$

## <a id='toc2_1_'></a>[Load Iris Dataset](#toc0_)

In [None]:
iris_dataset_url = (
    r"https://raw.githubusercontent.com/mr-pylin/datasets/refs/heads/main/data/tabular-data/iris/dataset.csv"
)

# pandas data-frame
df = pd.read_csv(iris_dataset_url, encoding="utf-8")

# log
df.head()

In [None]:
classes = df["class"].unique()
class_to_idx = {l: i for i, l in enumerate(classes)}

# split dataset into features and labels
X, y = df.iloc[:, :4].values, df.iloc[:, 4].values

# convert categorical labels into indices
y = np.array([class_to_idx[l] for l in y])

# properties of the dataset
num_samples, num_features = X.shape
classes, samples_per_class = np.unique(y, return_counts=True)

# log
print(f"X.shape: {X.shape}")
print(f"X.dtype: {X.dtype}")
print(f"y.shape: {y.shape}")
print(f"y.dtype: {y.dtype}")
print("-" * 50)
print(f"classes          : {classes}")
print(f"samples per class: {samples_per_class}")

In [None]:
# convert numpy.ndarray to torch.Tensor
X = torch.from_numpy(X.astype(np.float32))
y = torch.from_numpy(y.astype(np.float32)).view(-1, 1)

# log
print(f"X.shape: {X.shape}")
print(f"X.dtype: {X.dtype}")
print(f"X.ndim : {X.ndim}\n")
print(f"y.shape: {y.shape}")
print(f"y.dtype: {y.dtype}")
print(f"y.ndim : {y.ndim}")

# <a id='toc3_'></a>[Torch Dataset](#toc0_)
   - `torch.utils.data.TensorDataset` stores one or more tensors that share the same first dimension and returns the matching slice of each tensor when indexed.
   - It is a subclass of `torch.utils.data.Dataset`.
   - A `DataLoader` accepts any map-style or iterable-style `Dataset`, not only tensor-based ones; `TensorDataset` is simply the most convenient choice when the data is already stored in tensors.
   - To build highly customizable and versatile datasets, refer to the [**custom-classes.ipynb**](./custom-classes.ipynb) notebook.
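
As a rough illustration of what `TensorDataset` does, the sketch below compares it with a hand-written map-style dataset. The `PairDataset` class and the toy tensors are hypothetical and exist only for this example; they are not part of PyTorch or of this notebook's data.

```python
import torch
from torch.utils.data import Dataset, TensorDataset


class PairDataset(Dataset):
    """Hypothetical stand-in that mimics TensorDataset for two tensors."""

    def __init__(self, features, labels):
        # both tensors must agree on the number of samples (first dimension)
        assert features.shape[0] == labels.shape[0]
        self.features = features
        self.labels = labels

    def __len__(self):
        return self.features.shape[0]

    def __getitem__(self, idx):
        # return the idx-th row of every stored tensor
        return self.features[idx], self.labels[idx]


# toy data (made up for illustration): 6 samples, 4 features
toy_X = torch.arange(24, dtype=torch.float32).view(6, 4)
toy_y = torch.arange(6, dtype=torch.float32).view(-1, 1)

builtin = TensorDataset(toy_X, toy_y)
custom = PairDataset(toy_X, toy_y)

# both datasets report the same length and return the same sample
print(len(builtin), len(custom))
print(torch.equal(builtin[2][0], custom[2][0]))
print(torch.equal(builtin[2][1], custom[2][1]))
```

Both objects can be passed to a `DataLoader`, since it only relies on `__len__` and `__getitem__` for map-style datasets.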

In [None]:
# a torch dataset
dataset = torch.utils.data.TensorDataset(X, y)

# log
print(f"dataset.tensors[0].shape : {dataset.tensors[0].shape}")
print(f"dataset.tensors[1].shape : {dataset.tensors[1].shape}")
print("-" * 50)
print(f"first sample:")
print(f"    -> X: {dataset[0][0]}")
print(f"    -> y: {dataset[0][1]}")

# <a id='toc4_'></a>[Torch DataLoader](#toc0_)
   - A DataLoader (`torch.utils.data.DataLoader`) is a utility for training and evaluation in deep learning tasks that provides:
      - efficient dataset loading
      - automatic batching
      - shuffling
      - parallel data loading (via `num_workers`)
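
The features listed above can be seen on a small toy dataset. This is a minimal sketch: the tensors, the seed, and the batch size of 4 are made up for illustration, and `num_workers` is left at its default so the example stays single-process.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# toy data (made up for illustration): 10 samples, 2 features
toy_X = torch.arange(20, dtype=torch.float32).view(10, 2)
toy_y = torch.arange(10, dtype=torch.float32).view(-1, 1)
toy_dataset = TensorDataset(toy_X, toy_y)

# a seeded generator makes the shuffled order reproducible
generator = torch.Generator().manual_seed(0)
loader = DataLoader(toy_dataset, batch_size=4, shuffle=True, generator=generator)

batch_sizes = []
seen_labels = []
for xb, yb in loader:
    batch_sizes.append(xb.shape[0])
    seen_labels.extend(yb.view(-1).tolist())

print(f"number of batches: {len(loader)}")
print(f"batch sizes      : {batch_sizes}")
print(f"labels in order  : {seen_labels}")

# drop_last=True discards the final, smaller batch
loader_drop = DataLoader(toy_dataset, batch_size=4, shuffle=False, drop_last=True)
print(f"batches with drop_last=True: {len(loader_drop)}")
```

Every sample appears exactly once per epoch; only the order and the grouping into batches change.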

## <a id='toc4_1_'></a>[Strategies for updating weights](#toc0_)

### <a id='toc4_1_1_'></a>[Batch Gradient Descent](#toc0_)
   - Uses the **entire** dataset to compute the **gradient** of the loss function and **update** the **weights**.
   - **Pros**: Provides stable convergence, because every update uses the gradient computed over the full dataset.
   - **Cons**: Can be very slow and computationally expensive for large datasets.
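
To show what "one update per epoch" looks like in practice, here is a minimal sketch of full-batch gradient descent. The linear model, the toy regression data, the learning rate, and the number of epochs are all assumptions chosen for illustration, not the setup used elsewhere in this notebook.

```python
import torch

torch.manual_seed(0)

# toy regression data (made up for illustration): y = 2x + 1
toy_X = torch.linspace(-1, 1, 20).view(-1, 1)
toy_y = 2 * toy_X + 1

model = torch.nn.Linear(1, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(50):
    optimizer.zero_grad()
    loss = criterion(model(toy_X), toy_y)  # loss over the ENTIRE dataset
    loss.backward()
    optimizer.step()  # exactly one weight update per epoch
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}")
print(f"last loss : {losses[-1]:.4f}")
```

Note that `torch.optim.SGD` is used here even though the strategy is batch gradient descent: the optimizer only applies the update rule, while the batch size decides which strategy is in effect.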

🌟 **Example**:
| #Epoch | batch size | #batch per epoch                    | #iteration per epoch                |
|:------:|:----------:|:-----------------------------------:|:-----------------------------------:|
| $ 2 $  | $ 150 $    | $ \lceil\frac{150}{150}\rceil = 1 $ | $ \lceil\frac{150}{150}\rceil = 1 $ |

In [None]:
epochs = 2
batch_size = dataset.tensors[0].shape[0]
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=2)

# log
for epoch in range(epochs):
    print(f"epoch {epoch+1:0{len(str(epochs))}}/{epochs}")
    for i, (x, y) in enumerate(dataloader):
        print(f"    iteration {i}")
        print(f"        x.shape: {x.shape}")
        print(f"        y.shape: {y.shape}")
        print("    weights are updated.")
    print(f"model saw the entire dataset.")
    print("-" * 50)

### <a id='toc4_1_2_'></a>[Stochastic Gradient Descent](#toc0_)
   - Uses only **one data point** at a time to compute the **gradient** and **update** the **weights**.
   - **Pros**: Each update is cheap, and the noise in the gradient can help the model escape shallow local minima.
   - **Cons**: Updates are noisy, so convergence may be less smooth than with batch gradient descent.

🌟 **Example**:
| #Epoch | batch size | #batch per epoch                    | #iteration per epoch                |
|:------:|:----------:|:-----------------------------------:|:-----------------------------------:|
| $ 2 $  | $ 1 $      | $ \lceil\frac{150}{1}\rceil = 150 $ | $ \lceil\frac{150}{1}\rceil = 150 $ |

In [None]:
epochs = 2
batch_size = 1
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=2)

# log
for epoch in range(epochs):
    print(f"epoch {epoch+1:0{len(str(epochs))}}/{epochs}")
    for i, (x, y) in enumerate(dataloader):
        if i % 25 == 0 or i == len(dataloader) - 1:
            print(f"    iteration {i}")
            print(f"        x.shape: {x.shape}")
            print(f"        y.shape: {y.shape}")
            print("    weights are updated.")
    print(f"model saw the entire dataset.")
    print("-" * 50)

### <a id='toc4_1_3_'></a>[Mini-Batch Gradient Descent](#toc0_)
   - The model updates its weights after processing a small batch of samples (for example, 32) from the training dataset.
   - This method balances the efficiency of SGD and the stability of Batch Gradient Descent.
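
The three strategies differ only in the batch size passed to the `DataLoader`. The sketch below counts the batches (and therefore the weight updates) per epoch for each one; it uses a random stand-in with the same number of samples as Iris (150), so the values themselves are made up for illustration.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# random stand-in for Iris (made up for illustration): 150 samples, 4 features
num_samples = 150
toy_dataset = TensorDataset(
    torch.randn(num_samples, 4),
    torch.randint(0, 3, (num_samples, 1)).float(),
)

strategies = {"batch": 150, "stochastic": 1, "mini-batch": 32}
updates_per_epoch = {}
last_batch_size = {}

for name, batch_size in strategies.items():
    loader = DataLoader(toy_dataset, batch_size=batch_size, shuffle=True)
    sizes = [xb.shape[0] for xb, _ in loader]  # one weight update per batch
    updates_per_epoch[name] = len(sizes)
    last_batch_size[name] = sizes[-1]
    expected = math.ceil(num_samples / batch_size)
    print(f"{name:<10} | batch_size={batch_size:<3} | updates/epoch={len(sizes):<3} (ceil -> {expected}) | last batch={sizes[-1]}")
```

With a batch size of 32, the last batch holds only the remaining 22 samples (150 - 4 * 32), unless `drop_last=True` is set.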

🌟 **Example**:
| #Epoch | batch size | #batch per epoch                   | #iteration per epoch               |
|:------:|:----------:|:----------------------------------:|:----------------------------------:|
| $ 2 $  | $ 32 $     | $ \lceil\frac{150}{32}\rceil = 5 $ | $ \lceil\frac{150}{32}\rceil = 5 $ |

In [None]:
epochs = 2
batch_size = 32
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=2)

# log
for epoch in range(epochs):
    print(f"epoch {epoch+1:0{len(str(epochs))}}/{epochs}")
    for i, (x, y) in enumerate(dataloader):
        print(f"    iteration {i}")
        print(f"        x.shape: {x.shape}")
        print(f"        y.shape: {y.shape}")
        print("    weights are updated.")
    print(f"model saw the entire dataset.")
    print("-" * 50)