📝 **Author:** Amirhossein Heydari - 📧 **Email:** amirhosseinheydari78@gmail.com - 📍 **Linktree:** [linktr.ee/mr_pylin](https://linktr.ee/mr_pylin)

---

# Dependencies

In [1]:
import numpy as np
import pandas as pd
import torch
import torch.utils.data

# Dataset

$
X = \begin{bmatrix}
        x_{1}^1 & x_{1}^2 & \cdots & x_{1}^n \\
        x_{2}^1 & x_{2}^2 & \cdots & x_{2}^n \\
        \vdots & \vdots & \ddots & \vdots \\
        x_{m}^1 & x_{m}^2 & \cdots & x_{m}^n \\
    \end{bmatrix}_{m \times n} \quad \text{(m: number of samples, n: number of features)}
$

$
Y = \begin{bmatrix}
        y_{1} \\
        y_{2} \\
        \vdots \\
        y_{m} \\
    \end{bmatrix}_{m \times 1} \quad \text{(m: number of samples)}
$

## Load Iris Dataset

In [2]:
iris_dataset_url = r"https://raw.githubusercontent.com/mr-pylin/datasets/refs/heads/main/data/tabular-data/iris/dataset.csv"

# pandas data-frame
df = pd.read_csv(iris_dataset_url, encoding='utf-8')

# log
df.head()

Unnamed: 0,sepal-length,sepal-width,petal-length,petal-width,class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


In [3]:
classes = df['class'].unique()
class_to_idx = {l: i for i, l in enumerate(classes)}

# split dataset into features and labels
X, y = df.iloc[:, :4].values, df.iloc[:, 4].values

# convert categorical labels into indices
y = np.array([class_to_idx[l] for l in y])

# properties of the dataset
num_samples, num_features = X.shape
classes, samples_per_class = np.unique(y, return_counts=True)

# log
print(f"X.shape: {X.shape}")
print(f"X.dtype: {X.dtype}")
print(f"y.shape: {y.shape}")
print(f"y.dtype: {y.dtype}")
print('-' * 50)
print(f"classes          : {classes}")
print(f"samples per class: {samples_per_class}")

X.shape: (150, 4)
X.dtype: float64
y.shape: (150,)
y.dtype: int32
--------------------------------------------------
classes          : [0 1 2]
samples per class: [50 50 50]


In [4]:
# convert numpy.ndarray to torch.Tensor
X = torch.from_numpy(X.astype(np.float32))
y = torch.from_numpy(y.astype(np.float32)).view(-1, 1)

# log
print(f"x.shape: {X.shape}")
print(f"x.dtype: {X.dtype}")
print(f"x.ndim : {X.ndim}\n")
print(f"y.shape: {y.shape}")
print(f"y.dtype: {y.dtype}")
print(f"y.ndim : {y.ndim}")

x.shape: torch.Size([150, 4])
x.dtype: torch.float32
x.ndim : 2

y.shape: torch.Size([150, 1])
y.dtype: torch.float32
y.ndim : 2


# Torch Dataset
   - A `torch.utils.data.Dataset` stores samples and their labels and exposes them through indexing.
   - A `DataLoader` in PyTorch wraps a `Dataset` (map-style or iterable-style); the samples do not have to be stored as a single tensor.
   - `torch.utils.data.TensorDataset` is a subclass of `torch.utils.data.Dataset` that wraps tensors sharing the same first dimension and indexes them together.
   - To build highly customizable and versatile datasets, refer to the [**customs.ipynb**](./customs.ipynb) notebook.
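
The properties of `TensorDataset` can be checked on a small example. This is a minimal sketch on hypothetical toy tensors (`features`, `labels`, and `toy_dataset` are illustrative names, not part of the Iris pipeline):

```python
import torch
from torch.utils.data import Dataset, TensorDataset

# hypothetical toy data: 6 samples, 4 features, one label per sample
features = torch.arange(24, dtype=torch.float32).view(6, 4)
labels = torch.tensor([0, 0, 1, 1, 2, 2], dtype=torch.float32).view(-1, 1)

toy_dataset = TensorDataset(features, labels)

# TensorDataset is a map-style Dataset: it supports len() and integer indexing
is_dataset_subclass = issubclass(TensorDataset, Dataset)
num_items = len(toy_dataset)

# indexing returns a tuple with one entry per wrapped tensor
first_x, first_y = toy_dataset[0]

print(f"subclass of Dataset: {is_dataset_subclass}")
print(f"len(toy_dataset)   : {num_items}")
print(f"first_x.shape      : {first_x.shape}")
print(f"first_y.shape      : {first_y.shape}")
```

Each indexed item drops the leading (sample) dimension, so a single sample has shape `[4]` for the features and `[1]` for the label.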

In [5]:
# a torch dataset
dataset = torch.utils.data.TensorDataset(X, y)

# log
print(f"dataset.tensors[0].shape : {dataset.tensors[0].shape}")
print(f"dataset.tensors[1].shape : {dataset.tensors[1].shape}")
print('-' * 50)
print(f"first sample:")
print(f"    -> X: {dataset[0][0]}")
print(f"    -> y: {dataset[0][1]}")

dataset.tensors[0].shape : torch.Size([150, 4])
dataset.tensors[1].shape : torch.Size([150, 1])
--------------------------------------------------
first sample:
    -> X: tensor([5.1000, 3.5000, 1.4000, 0.2000])
    -> y: tensor([0.])


# Torch DataLoader
   - A DataLoader (`torch.utils.data.DataLoader`) is a utility for training and evaluation in deep learning tasks that enables:
      - efficient dataset loading
      - batching
      - shuffling
      - parallel data loading
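
Batching and shuffling can be illustrated on a small example. This sketch assumes a hypothetical 10-sample dataset (`toy_dataset`) and uses the `drop_last` and `generator` arguments of `DataLoader`; the helper names `batch_sizes` and `shuffled_order` are made up for this illustration:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# hypothetical toy data: 10 samples whose single feature equals the sample index
toy_x = torch.arange(10, dtype=torch.float32).view(-1, 1)
toy_y = torch.zeros(10, 1)
toy_dataset = TensorDataset(toy_x, toy_y)


def batch_sizes(loader):
    """Collect the number of samples in every batch of one epoch."""
    return [xb.shape[0] for xb, _ in loader]


# batching: 10 samples with batch_size=4 give batches of 4, 4 and 2 samples
sizes_keep = batch_sizes(DataLoader(toy_dataset, batch_size=4))

# drop_last=True discards the final, smaller batch
sizes_drop = batch_sizes(DataLoader(toy_dataset, batch_size=4, drop_last=True))


def shuffled_order(seed):
    """Return the sample order of one shuffled epoch for a given seed."""
    g = torch.Generator().manual_seed(seed)
    loader = DataLoader(toy_dataset, batch_size=4, shuffle=True, generator=g)
    return torch.cat([xb for xb, _ in loader]).flatten().tolist()


# shuffling: a seeded generator makes the shuffled order reproducible
order_a = shuffled_order(seed=0)
order_b = shuffled_order(seed=0)

print(f"batch sizes (keep last): {sizes_keep}")
print(f"batch sizes (drop last): {sizes_drop}")
print(f"same order for same seed: {order_a == order_b}")
```

The example keeps the default `num_workers=0`, so data is loaded in the main process; parallel loading only changes how batches are fetched, not their contents.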

## Strategies for updating weights

### 1. Batch Gradient Descent
   - Uses the **entire** dataset to compute the **gradient** of the loss function and **update** the **weights** once per epoch.
   - **Pros**: Stable convergence, because every update uses the exact gradient over the full dataset.
   - **Cons**: Can be very slow and computationally expensive for large datasets.
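
What "one update per epoch" means can be sketched with a tiny training loop. The example below is a minimal illustration, assuming a hypothetical one-feature linear model with mean squared error on synthetic data; `toy_x`, `toy_y`, `w`, `b`, and `lr` are illustrative names and values, not part of this notebook:

```python
import torch

# hypothetical toy regression data following toy_y = 2 * toy_x + 1
toy_x = torch.linspace(-1, 1, 20).view(-1, 1)
toy_y = 2 * toy_x + 1

# illustrative parameters of a one-feature linear model
w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.5
num_epochs = 50
losses = []

for epoch in range(num_epochs):
    # the loss (and therefore the gradient) is computed over ALL samples at once
    loss = ((toy_x * w + b - toy_y) ** 2).mean()
    loss.backward()

    # exactly one weight update per epoch
    with torch.no_grad():
        w -= lr * w.grad
        b -= lr * b.grad
    w.grad.zero_()
    b.grad.zero_()
    losses.append(loss.item())

print(f"updates performed: {len(losses)}")
print(f"w: {w.item():.4f}, b: {b.item():.4f}")
```

Because every step sees the whole dataset, the number of weight updates equals the number of epochs.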

🌟 **Example**:
| #Epoch | batch size | #batch per epoch                    | #iteration per epoch                |
|:------:|:----------:|:-----------------------------------:|:-----------------------------------:|
| $ 2 $  | $ 150 $    | $ \lceil\frac{150}{150}\rceil = 1 $ | $ \lceil\frac{150}{150}\rceil = 1 $ |

In [10]:
epochs = 2
batch_size = dataset.tensors[0].shape[0]
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=2)

# log
for epoch in range(epochs):
    print(f"epoch {epoch}")
    for i, (x, y) in enumerate(dataloader):
        print(f"    iteration {i}")
        print(f"        x.shape: {x.shape}")
        print(f"        y.shape: {y.shape}")
        print("    weights are updated.")
    print(f"model saw the entire dataset.")
    print('-' * 50)

epoch 0
    iteration 0
        x.shape: torch.Size([150, 4])
        y.shape: torch.Size([150, 1])
    weights are updated.
model saw the entire dataset.
--------------------------------------------------
epoch 1
    iteration 0
        x.shape: torch.Size([150, 4])
        y.shape: torch.Size([150, 1])
    weights are updated.
model saw the entire dataset.
--------------------------------------------------


### 2. Stochastic Gradient Descent
   - The model **updates** the **weights** using only **one data point** at a time.
   - **Pros**: Each update is cheap, and the gradient noise can help the model escape shallow local minima.
   - **Cons**: Updates are noisy, so convergence is less smooth than with batch gradient descent.
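
The noise of single-sample updates can be made concrete by comparing gradients. The sketch below assumes a hypothetical linear model with mean squared error on random toy data (`toy_x`, `toy_y`, `w`, and `grad_of_mse` are illustrative names). It shows that each per-sample gradient deviates from the full-batch gradient, while their average matches it:

```python
import torch

torch.manual_seed(0)

# hypothetical toy data: 8 samples, 3 features, and a random linear model
toy_x = torch.randn(8, 3)
toy_y = torch.randn(8, 1)
w = torch.randn(3, 1, requires_grad=True)


def grad_of_mse(xb, yb):
    """Gradient of the mean squared error w.r.t. w for the given batch."""
    loss = ((xb @ w - yb) ** 2).mean()
    (grad,) = torch.autograd.grad(loss, w)
    return grad


# gradient used by batch gradient descent (all 8 samples)
full_grad = grad_of_mse(toy_x, toy_y)

# gradients used by stochastic gradient descent (one sample each)
per_sample = torch.stack(
    [grad_of_mse(toy_x[i : i + 1], toy_y[i : i + 1]) for i in range(len(toy_x))]
)

# averaging the noisy per-sample gradients recovers the full-batch gradient
mean_of_per_sample = per_sample.mean(dim=0)

# distance of every per-sample gradient from the full-batch gradient
spread = (per_sample - full_grad).flatten(1).norm(dim=1)

print(f"matches full-batch gradient on average: {torch.allclose(mean_of_per_sample, full_grad, atol=1e-5)}")
print(f"per-sample deviation from full gradient: {spread}")
```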

🌟 **Example**:
| #Epoch | batch size | #batch per epoch                    | #iteration per epoch                |
|:------:|:----------:|:-----------------------------------:|:-----------------------------------:|
| $ 2 $  | $ 1 $      | $ \lceil\frac{150}{1}\rceil = 150 $ | $ \lceil\frac{150}{1}\rceil = 150 $ |

In [11]:
epochs = 2
batch_size = 1
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=2)

# log
for epoch in range(epochs):
    print(f"epoch {epoch}")
    for i, (x, y) in enumerate(dataloader):
        if i % 25 == 0 or i == len(X) - 1:
            print(f"    iteration {i}")
            print(f"        x.shape: {x.shape}")
            print(f"        y.shape: {y.shape}")
            print("    weights are updated.")
    print(f"model saw the entire dataset.")
    print('-' * 50)

epoch 0
    iteration 0
        x.shape: torch.Size([1, 4])
        y.shape: torch.Size([1, 1])
    weights are updated.
    iteration 25
        x.shape: torch.Size([1, 4])
        y.shape: torch.Size([1, 1])
    weights are updated.
    iteration 50
        x.shape: torch.Size([1, 4])
        y.shape: torch.Size([1, 1])
    weights are updated.
    iteration 75
        x.shape: torch.Size([1, 4])
        y.shape: torch.Size([1, 1])
    weights are updated.
    iteration 100
        x.shape: torch.Size([1, 4])
        y.shape: torch.Size([1, 1])
    weights are updated.
    iteration 125
        x.shape: torch.Size([1, 4])
        y.shape: torch.Size([1, 1])
    weights are updated.
    iteration 149
        x.shape: torch.Size([1, 4])
        y.shape: torch.Size([1, 1])
    weights are updated.
model saw the entire dataset.
--------------------------------------------------
epoch 1
    iteration 0
        x.shape: torch.Size([1, 4])
        y.shape: torch.Size([1, 1])
    weights are updated.
    ...

### 3. Mini-Batch Gradient Descent
   - The model updates its weights after processing a small batch of samples (e.g., 32) rather than a single sample or the entire training dataset.
   - This method combines the advantages of both SGD and Batch Gradient Descent by balancing efficiency and stability during training.

🌟 **Example**:
| #Epoch | batch size | #batch per epoch                   | #iteration per epoch               |
|:------:|:----------:|:----------------------------------:|:----------------------------------:|
| $ 2 $  | $ 32 $     | $ \lceil\frac{150}{32}\rceil = 5 $ | $ \lceil\frac{150}{32}\rceil = 5 $ |
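
The iteration counts in the three example tables can be checked with `len()` on a `DataLoader`. This is a small sketch, assuming a placeholder dataset with the same 150 samples as Iris (the zero-filled tensors and the names `placeholder` and `iterations` are illustrative):

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# stand-in for the 150-sample Iris tensors (the values are placeholders)
num_samples = 150
placeholder = TensorDataset(torch.zeros(num_samples, 4), torch.zeros(num_samples, 1))

iterations = {}
for batch_size in (150, 1, 32):
    loader = DataLoader(placeholder, batch_size=batch_size)
    # len(loader) is the number of batches, i.e. iterations per epoch
    iterations[batch_size] = len(loader)
    assert len(loader) == math.ceil(num_samples / batch_size)

# with batch_size=32 the last batch holds the leftover samples
last_batch_size = num_samples - 32 * (iterations[32] - 1)

print(f"iterations per epoch: {iterations}")
print(f"size of the last mini-batch: {last_batch_size}")
```

Since 150 is not a multiple of 32, the fifth mini-batch contains only 22 samples, unless `drop_last=True` is passed to the `DataLoader`.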

In [12]:
epochs = 2
batch_size = 32
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=2)

# log
for epoch in range(epochs):
    print(f"epoch {epoch}")
    for i, (x, y) in enumerate(dataloader):
        print(f"    iteration {i}")
        print(f"        x.shape: {x.shape}")
        print(f"        y.shape: {y.shape}")
        print("    weights are updated.")
    print(f"model saw the entire dataset.")
    print('-' * 50)

epoch 0
    iteration 0
        x.shape: torch.Size([32, 4])
        y.shape: torch.Size([32, 1])
    weights are updated.
    iteration 1
        x.shape: torch.Size([32, 4])
        y.shape: torch.Size([32, 1])
    weights are updated.
    iteration 2
        x.shape: torch.Size([32, 4])
        y.shape: torch.Size([32, 1])
    weights are updated.
    iteration 3
        x.shape: torch.Size([32, 4])
        y.shape: torch.Size([32, 1])
    weights are updated.
    iteration 4
        x.shape: torch.Size([22, 4])
        y.shape: torch.Size([22, 1])
    weights are updated.
model saw the entire dataset.
--------------------------------------------------
epoch 1
    iteration 0
        x.shape: torch.Size([32, 4])
        y.shape: torch.Size([32, 1])
    weights are updated.
    iteration 1
        x.shape: torch.Size([32, 4])
        y.shape: torch.Size([32, 1])
    weights are updated.
    iteration 2
        x.shape: torch.Size([32, 4])
        y.shape: torch.Size([32, 1])
    weig