📝 **Author:** Amirhossein Heydari - 📧 **Email:** amirhosseinheydari78@gmail.com - 📍 **Linktree:** [linktr.ee/mr_pylin](https://linktr.ee/mr_pylin)

---

# Dependencies

In [1]:
import numpy as np
import torch
import torch.utils.data
from sklearn.datasets import load_iris

# Dataset

$
X = \begin{bmatrix}
        x_{1}^1 & x_{1}^2 & \cdots & x_{1}^n \\
        x_{2}^1 & x_{2}^2 & \cdots & x_{2}^n \\
        \vdots & \vdots & \ddots & \vdots \\
        x_{m}^1 & x_{m}^2 & \cdots & x_{m}^n \\
    \end{bmatrix}_{m \times n} \quad \text{(m: number of samples, n: number of features)}
$

$
Y = \begin{bmatrix}
        y_{1} \\
        y_{2} \\
        \vdots \\
        y_{m} \\
    \end{bmatrix}_{m \times 1} \quad \text{(m: number of samples)}
$

In [2]:
# load iris dataset
x, y = load_iris(return_X_y=True)

# properties of the dataset
num_samples, num_features = x.shape
classes, samples_per_class = np.unique(y, return_counts=True)

# log
for name, array in [('x', x), ('y', y)]:
    print(f"{name}.shape: {array.shape}")
    print(f"{name}.dtype: {array.dtype}")
print('-' * 50)
print(f"classes          : {classes}")
print(f"samples per class: {samples_per_class}")

x.shape: (150, 4)
x.dtype: float64
y.shape: (150,)
y.dtype: int32
--------------------------------------------------
classes          : [0 1 2]
samples per class: [50 50 50]


In [3]:
# convert numpy.ndarray to torch.Tensor
x = torch.from_numpy(x.astype(np.float32))
y = torch.from_numpy(y.astype(np.float32)).view(-1, 1)

# log
print(f"x.shape: {x.shape}")
print(f"x.dtype: {x.dtype}")
print(f"x.ndim : {x.ndim}\n")
print(f"y.shape: {y.shape}")
print(f"y.dtype: {y.dtype}")
print(f"y.ndim : {y.ndim}")

x.shape: torch.Size([150, 4])
x.dtype: torch.float32
x.ndim : 2

y.shape: torch.Size([150, 1])
y.dtype: torch.float32
y.ndim : 2


# Torch Dataset
<p style="font-family: consolas;">TensorDataset : <span style="color: tomato">torch.utils.data.TensorDataset</span> inherits from <span style="color: cyan">torch.utils.data.Dataset</span>. It wraps tensors that share the same first dimension, and indexing it returns one tuple per sample.</p>
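To make the inheritance relationship concrete, here is a minimal sketch of a map-style dataset written by hand. The class name `PairDataset` and the toy tensors are hypothetical and only for illustration; the sketch assumes that a map-style `Dataset` needs just `__len__` and `__getitem__`, which is also what `TensorDataset` provides.

```python
import torch
from torch.utils.data import Dataset, TensorDataset


class PairDataset(Dataset):
    """Hypothetical map-style dataset wrapping a feature tensor and a label tensor."""

    def __init__(self, features: torch.Tensor, labels: torch.Tensor) -> None:
        if features.shape[0] != labels.shape[0]:
            raise ValueError("features and labels must have the same number of samples")
        self.features = features
        self.labels = labels

    def __len__(self) -> int:
        # number of samples = size of the first dimension
        return self.features.shape[0]

    def __getitem__(self, index):
        # one sample is a (features, label) tuple
        return self.features[index], self.labels[index]


# toy data: 6 samples with 2 features each
features = torch.arange(12, dtype=torch.float32).view(6, 2)
labels = torch.arange(6, dtype=torch.float32).view(-1, 1)

custom = PairDataset(features, labels)
builtin = TensorDataset(features, labels)

print(f"len(custom) : {len(custom)}")
print(f"len(builtin): {len(builtin)}")
print(f"same sample : {torch.equal(custom[3][0], builtin[3][0])}")
```

For plain tensors the built-in `TensorDataset` is usually enough; a hand-written subclass becomes useful once samples need per-item work such as reading files or applying transforms.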

In [4]:
# a torch dataset
dataset = torch.utils.data.TensorDataset(x, y)

# log
print(f"dataset.tensors[0].shape : {dataset.tensors[0].shape}")
print(f"dataset.tensors[1].shape : {dataset.tensors[1].shape}")
print('-' * 50)
print("first sample:")
print(f"    -> x: {dataset[0][0]}")
print(f"    -> y: {dataset[0][1]}")

dataset.tensors[0].shape : torch.Size([150, 4])
dataset.tensors[1].shape : torch.Size([150, 1])
--------------------------------------------------
first sample:
    -> x: tensor([5.1000, 3.5000, 1.4000, 0.2000])
    -> y: tensor([0.])


# Torch DataLoader
<ul>
    <li>
        A DataLoader (<span style="font-family: consolas;color: tomato;">torch.utils.data.DataLoader</span>) is a utility that wraps a dataset and provides:
        <ul>
            <li>efficient dataset loading,</li>
            <li>batching,</li>
            <li>shuffling,</li>
            <li>parallel data loading</li>
        </ul>for training and evaluation in deep learning tasks.
    </li>
</ul>
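A small sketch of batching and shuffling in action, assuming a toy dataset of ten integer ids (the names `toy_dataset` and `epoch_orders` are made up for this example). It is meant to show that `shuffle=True` changes the order in which samples arrive while each sample still appears exactly once per epoch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# toy dataset: ten samples whose only "feature" is their own id
ids = torch.arange(10)
toy_dataset = TensorDataset(ids)

# a seeded generator makes the shuffling reproducible
generator = torch.Generator().manual_seed(0)
loader = DataLoader(toy_dataset, batch_size=4, shuffle=True, generator=generator)

epoch_orders = []
for epoch in range(2):
    # each batch is a one-element list because the dataset wraps a single tensor
    seen = torch.cat([batch for (batch,) in loader])
    epoch_orders.append(seen.tolist())
    print(f"epoch {epoch}: {epoch_orders[-1]}")

print(f"batches per epoch: {len(loader)}")
```

Parallel loading is left out here (`num_workers` keeps its default of 0) so the sketch runs in a single process.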

## Strategies for updating weights

### 1. Stochastic Gradient Descent
   - the model updates its weights after processing each individual sample from the training dataset (batch size = 1).
   - each update is cheap to compute, but the updates are noisy because a gradient estimated from a single sample has high variance.

| #Epoch | batch size | #batch per epoch                    | #iteration per epoch                |
|:------:|:----------:|:-----------------------------------:|:-----------------------------------:|
| $ 2 $  | $ 1 $      | $ \lceil\frac{150}{1}\rceil = 150 $ | $ \lceil\frac{150}{1}\rceil = 150 $ |

In [None]:
epochs = 2
batch_size = 1
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=2)

# log
for epoch in range(epochs):
    print(f"epoch {epoch}")
    # separate batch names keep the full x and y tensors from being overwritten
    for i, (x_batch, y_batch) in enumerate(dataloader):
        print(f"    iteration {i}")
        print(f"        x.shape: {x_batch.shape}")
        print(f"        y.shape: {y_batch.shape}")
        print("    weights are updated.\n")
    print("model saw the entire dataset")
    print('-' * 50)

### 2. Batch Gradient Descent
   - the model updates its weights after processing the entire training dataset (all samples).
   - this method provides a more stable update direction, but it can be computationally expensive for large datasets.

| #Epoch | batch size | #batch per epoch                    | #iteration per epoch                |
|:------:|:----------:|:-----------------------------------:|:-----------------------------------:|
| $ 2 $  | $ 150 $    | $ \lceil\frac{150}{150}\rceil = 1 $ | $ \lceil\frac{150}{150}\rceil = 1 $ |

In [None]:
epochs = 2
batch_size = len(dataset)  # all samples in a single batch
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=2)

# log
for epoch in range(epochs):
    print(f"epoch {epoch}")
    # separate batch names keep the full x and y tensors from being overwritten
    for i, (x_batch, y_batch) in enumerate(dataloader):
        print(f"    iteration {i}")
        print(f"        x.shape: {x_batch.shape}")
        print(f"        y.shape: {y_batch.shape}")
        print("    weights are updated.\n")
    print("model saw the entire dataset")
    print('-' * 50)

### 3. Mini-Batch Gradient Descent
   - the model updates its weights after processing a small batch of $b$ samples ($1 < b < m$) from the training dataset.
   - this method balances the cheap updates of SGD against the stable update direction of Batch Gradient Descent.

| #Epoch | batch size | #batch per epoch                   | #iteration per epoch               |
|:------:|:----------:|:----------------------------------:|:----------------------------------:|
| $ 2 $  | $ 4 $      | $ \lceil\frac{150}{4}\rceil = 38 $ | $ \lceil\frac{150}{4}\rceil = 38 $ |
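The batch counts used in this section follow from $\lceil m / \text{batch size} \rceil$ with $m = 150$. As a quick check, the sketch below compares that formula with `len(DataLoader)` on a zero-filled stand-in dataset (the name `toy` is hypothetical). It also shows that, with a batch size of 4, the last batch holds only the 2 leftover samples unless `drop_last=True` is set.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

num_samples = 150
# zero-filled stand-in with the same shapes as the iris tensors
toy = TensorDataset(torch.zeros(num_samples, 4), torch.zeros(num_samples, 1))

batches_per_epoch = {}
for batch_size in (1, 150, 4):
    loader = DataLoader(toy, batch_size=batch_size)
    batches_per_epoch[batch_size] = len(loader)
    expected = math.ceil(num_samples / batch_size)
    print(f"batch_size={batch_size:>3} -> len(loader)={len(loader):>3}, ceil={expected:>3}")

# the last mini-batch only contains the remainder: 150 - 37 * 4 = 2 samples
last_x, last_y = list(DataLoader(toy, batch_size=4))[-1]
last_batch_size = last_x.shape[0]
print(f"last batch size: {last_batch_size}")

# drop_last=True discards the incomplete batch, giving floor(150 / 4) batches
full_batches_only = len(DataLoader(toy, batch_size=4, drop_last=True))
print(f"batches with drop_last=True: {full_batches_only}")
```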

In [None]:
epochs = 2
batch_size = 4
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=2)

# log
for epoch in range(epochs):
    print(f"epoch {epoch}")
    # separate batch names keep the full x and y tensors from being overwritten
    for i, (x_batch, y_batch) in enumerate(dataloader):
        print(f"    iteration {i}")
        print(f"        x.shape: {x_batch.shape}")
        print(f"        y.shape: {y_batch.shape}")
        print("    weights are updated.\n")
    print("model saw the entire dataset")
    print('-' * 50)
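The loops in this section only print "weights are updated" and do not train anything. Below is a minimal sketch of what one real update per batch could look like. Everything in it is an assumption made for illustration and is not part of the notebook: a single `nn.Linear` layer as the model, `nn.CrossEntropyLoss`, plain `torch.optim.SGD` with a learning rate of 0.1, and random synthetic data in place of iris. Note that the labels here are 1-D `int64` class indices, which is what `CrossEntropyLoss` expects, unlike the `float32` column vector `y` built earlier.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# synthetic stand-in for iris: 150 samples, 4 features, 3 classes (assumption)
features = torch.randn(150, 4)
labels = torch.randint(0, 3, (150,))  # int64 class indices
toy_dataset = TensorDataset(features, labels)

model = nn.Linear(4, 3)  # hypothetical model: one linear layer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# batch_size=1 would give SGD, batch_size=150 batch gradient descent
dataloader = DataLoader(toy_dataset, batch_size=4, shuffle=True)

initial_weight = model.weight.detach().clone()
num_updates = 0
for epoch in range(2):
    for x_batch, y_batch in dataloader:
        optimizer.zero_grad()                      # clear gradients of the previous batch
        loss = criterion(model(x_batch), y_batch)  # forward pass on this batch only
        loss.backward()                            # gradients w.r.t. this batch
        optimizer.step()                           # one weight update per batch
        num_updates += 1
    print(f"epoch {epoch}: {num_updates} updates so far")
```

Only `batch_size` changes between the three strategies; the loop body stays the same, and the number of updates per epoch equals the number of batches.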