<div style="text-align: center; font-size: 32px; font-weight: bold;">
    Dataset, DataLoader Classes, Dataset Transforms - Batch Training
</div>

In this section we see how to use PyTorch's built-in `Dataset` and `DataLoader` classes and improve our pipeline with batch training. We cover:
- Dataset and DataLoader
- Automatic batch calculation
- Batch optimization in training loop

---

#### Conventional Data loading
So far our code has been simple:
- We loaded the dataset from a CSV file.
- We have a training loop.
- We optimize the model on the whole dataset at once (forward + backward + weight update). Computing gradients over the whole dataset can be time consuming, and loading the full data at once can be inefficient. A better approach for large datasets is to divide the samples into small batches.

```python
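import numpy as np

data = np.loadtxt('wine.csv', delimiter=',', skiprows=1)  # first row is a header
# Training loop
for epoch in range(100):
    X, y = data[:, 1:], data[:, [0]]  # features, labels (first column)
    # Forward + backward + weight update on the whole dataset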
```
---
#### Batch Datasets
In this case the training loop becomes:
```python
# Training Loop
for epoch in range(100):
    # Loop over all batches
    for i in range(total_batches):
        X_batch, y_batch = ...
        # We do optimization only for the current batch of data

    # --> Use Dataset and Dataloader to load `wine.csv`
```
We can use PyTorch's `Dataset` and `DataLoader` classes for this. They handle the batch calculations and the iteration for us.
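
To make the inner loop concrete, here is a minimal sketch of manual batching with plain Python slicing, which is the bookkeeping `DataLoader` takes over. The toy data, the `batch_size` value, and the variable names are assumptions for illustration and are not taken from `wine.csv`.

```python
import math

# Toy data (hypothetical): 10 samples with one feature each, plus labels
X = [[float(i)] for i in range(10)]
y = [i % 2 for i in range(10)]

batch_size = 4
total_batches = math.ceil(len(X) / batch_size)  # round up so the last partial batch is kept

batch_sizes = []
for i in range(total_batches):
    start = i * batch_size
    X_batch = X[start:start + batch_size]
    y_batch = y[start:start + batch_size]
    batch_sizes.append(len(X_batch))
    # an optimization step on (X_batch, y_batch) would go here

print(total_batches, batch_sizes)
```

The last batch is smaller than the others whenever the number of samples is not a multiple of `batch_size`.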

### Terminology in batch training
- epoch = one forward and backward pass of ALL training samples
- batch_size = number of training samples used in one forward/backward pass
- number of iterations = number of passes, each pass (forward + backward) using `batch_size` samples
- e.g.: 100 samples, batch_size=20 -> 100/20 = 5 iterations for 1 epoch
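
The arithmetic above can be checked with a few lines of Python. `iterations_per_epoch` is a hypothetical helper name used only for this sketch. It rounds up, which assumes the final, smaller batch is kept (this matches `DataLoader`'s default of `drop_last=False`).

```python
import math

def iterations_per_epoch(n_samples: int, batch_size: int) -> int:
    # The last batch may be smaller than batch_size, so round up
    return math.ceil(n_samples / batch_size)

print(iterations_per_epoch(100, 20))  # 100 / 20 = 5
print(iterations_per_epoch(178, 4))   # 178 / 4 = 44.5, rounded up to 45
```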

### --> DataLoader can do the batch computation for us
```python
# Implement a custom Dataset:
# - inherit from Dataset
# - implement __init__, __getitem__, and __len__
```
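
Before loading a file, the three methods can be tried on data that lives in memory. This is a minimal sketch: `ToyDataset` and its values are hypothetical and only show the shape of the interface.

```python
import torch
from torch.utils.data import Dataset

class ToyDataset(Dataset):
    """Hypothetical in-memory dataset: 6 samples with 2 features each."""

    def __init__(self):
        # features: shape (6, 2); labels: shape (6, 1)
        self.x = torch.arange(12, dtype=torch.float32).reshape(6, 2)
        self.y = torch.tensor([[0.], [1.], [0.], [1.], [0.], [1.]])

    def __getitem__(self, index):
        # enables toy[i]; returns a (features, label) tuple
        return self.x[index], self.y[index]

    def __len__(self):
        # enables len(toy)
        return self.x.shape[0]

toy = ToyDataset()
features, label = toy[0]
print(len(toy), features, label)
```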

In [1]:
import torch
import torchvision
from torch.utils.data import Dataset, DataLoader #  A base class for custom datasets in PyTorch.
import numpy as np
import math

# Dataset: we use the wine dataset.
# The first row is a header.
# We want to predict the wine category. There are three categories (classes): 1, 2, 3.
# The class labels are in the first column; the features are in the remaining columns.

# Here we create a custom dataset class. `__init__` loads the data and performs any
# initial transformations. The other methods defined in the class, `__getitem__` and
# `__len__`, then give access to individual samples and to the dataset size.

# implement custom dataset.
class WineDataset(Dataset):
    def __init__(self):
        # Initialize data, download, etc.
        # Data Loading: # read with numpy or pandas
        xy = np.loadtxt('./data/wine.csv', delimiter=',', dtype=np.float32, skiprows=1)
        
        # split whole dataset into x and y. Here the first column is the class label, the rest are the features
        self.x = torch.from_numpy(xy[:, 1:])  # all columns except the first: features, shape (n_samples, n_features)
        self.y = torch.from_numpy(xy[:, [0]])  # first column only; [0] keeps it 2D with shape (n_samples, 1), which is convenient later
        self.n_samples = xy.shape[0]  # first dimension is the number of samples
    
    # support indexing such that dataset[i] can be used to get the i-th sample
    def __getitem__(self, index): 
        # Returns a tuple: (features, label).
        return self.x[index], self.y[index]

    # we can call len(dataset) to return the size
    def __len__(self):
        # len(dataset)
        return self.n_samples

# Create a dataset object from WineDataset
dataset = WineDataset()

# Look at the dataset: get the first sample
first_data = dataset[0]
# unpack it into features and labels
features, labels = first_data
print(features, labels)



In [None]:
# DataLoader: load the whole dataset with DataLoader
# shuffle: shuffle data, good for training
# num_workers: faster loading with multiple subprocesses
# !!! IF YOU GET AN ERROR DURING LOADING, SET num_workers TO 0 !!!

train_loader = DataLoader(dataset=dataset, batch_size=4, shuffle=True, num_workers=2)
# num_workers makes loading faster because it uses multiple processes

# convert to an iterator and look at one random batch
dataiter = iter(train_loader)
data = next(dataiter)  # use the built-in next(); dataiter.next() fails on recent PyTorch versions
features, labels = data
print(features, labels)

# Dummy Training loop
num_epochs = 2
total_samples = len(dataset)
n_iterations = math.ceil(total_samples/4)
print(total_samples, n_iterations)

for epoch in range(num_epochs):
    for i, (inputs, labels) in enumerate(train_loader): # enumerate gives us the batch index
        # here: 178 samples, batch_size = 4, n_iters = 178/4 = 44.5 -> 45 iterations
        # Run your training process: forward + backward + update weights
        if (i+1) % 5 == 0:
            print(f'Epoch: {epoch+1}/{num_epochs}, Step {i+1}/{n_iterations}| Inputs {inputs.shape} | Labels {labels.shape}')
            # batch size is 4 and there are 13 features
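
The dummy loop leaves the training step as a comment. One way to fill it in is sketched below. The random data standing in for `wine.csv`, the linear model, the loss, the optimizer, and the learning rate are all assumptions for illustration. Note that `CrossEntropyLoss` expects integer class indices starting at 0, so the sketch uses labels 0, 1, 2 rather than the 1, 2, 3 stored in the file.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for the wine data (assumption): 178 samples, 13 features
X = torch.randn(178, 13)
y = torch.randint(0, 3, (178,))  # class indices 0, 1, 2

loader = DataLoader(TensorDataset(X, y), batch_size=4, shuffle=True)

model = torch.nn.Linear(13, 3)            # hypothetical model: 13 features -> 3 classes
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(2):
    n_steps = 0
    for inputs, labels in loader:
        outputs = model(inputs)            # forward pass on the current batch
        loss = criterion(outputs, labels)
        optimizer.zero_grad()              # clear gradients from the previous step
        loss.backward()                    # backward pass
        optimizer.step()                   # weight update
        n_steps += 1
    print(f'Epoch {epoch + 1}: {n_steps} steps, last loss {loss.item():.4f}')
```

Each epoch runs 45 optimization steps, one per batch, instead of a single step on the full dataset.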

### Other datasets in PyTorch

In [None]:
# some famous datasets are available in torchvision.datasets
# e.g. MNIST, Fashion-MNIST, CIFAR10, COCO

train_dataset = torchvision.datasets.MNIST(root='./data',
                                           train=True, 
                                           transform=torchvision.transforms.ToTensor(),  
                                           download=True)

train_loader = DataLoader(dataset=train_dataset,
                          batch_size=3,
                          shuffle=True)

# look at one random batch
dataiter = iter(train_loader)
data = next(dataiter)
inputs, targets = data
print(inputs.shape, targets.shape)

<div style="text-align: center; font-size: 32px; font-weight: bold;">
    Dataset Transforms
</div>
Here we see how to use dataset transforms together with the built-in `Dataset` class. We can apply built-in transforms to images, arrays, and tensors, or write our own custom transform classes.

- Dataset Transforms
- Use built-in Transforms
- Implement custom Transforms


- Transforms can be applied to PIL images, tensors, ndarrays, or custom data during creation of the Dataset
    - complete list of built-in transforms: https://pytorch.org/docs/stable/torchvision/transforms.html

1. On Images: `CenterCrop`, `Grayscale`, `Pad`, `RandomAffine`, `RandomCrop`, `RandomHorizontalFlip`, `RandomRotation`, `Resize`, `Scale`

2. On Tensors: `LinearTransformation`, `Normalize`, `RandomErasing`

3. Conversion: `ToPILImage`: from tensor or ndarray; `ToTensor`: from numpy.ndarray or PIL Image

4. Generic: use `Lambda`

5. Custom: write your own class

6. Compose multiple Transforms: `composed = transforms.Compose([Rescale(256), RandomCrop(224)])`
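
The idea behind composing transforms is that each one is a callable and the output of one becomes the input of the next. The sketch below illustrates that chaining in plain Python. `SimpleCompose`, `add_one`, and `double` are hypothetical names for this example, not torchvision classes.

```python
class SimpleCompose:
    """Hypothetical stand-in that illustrates how transforms are chained."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, sample):
        # apply the transforms in order, feeding each result to the next one
        for t in self.transforms:
            sample = t(sample)
        return sample

def add_one(x):
    return x + 1

def double(x):
    return x * 2

composed = SimpleCompose([add_one, double])
print(composed(3))  # (3 + 1) * 2 = 8
```

The order of the list matters: `SimpleCompose([double, add_one])(3)` gives 7 instead of 8.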


- Earlier we used a built-in dataset and data loader. We can pass a built-in transform to the dataset so that it is applied to each sample. In the transform below we convert images to tensors. The different PyTorch transforms are listed on the PyTorch website: https://pytorch.org/docs/stable/torchvision/transforms.html

In [None]:
import torch
import torchvision

# Transform MNIST data to tensor
dataset = torchvision.datasets.MNIST(
    root='./data', transform=torchvision.transforms.ToTensor())

Earlier we implemented a custom `WineDataset`. Now let's extend this class to support transforms and write our own transform classes. This is the code from earlier, where we implemented the `__getitem__` and `__len__` methods, which allow indexing and length queries.
```python
# implement custom WineDataset.
class WineDataset(Dataset):
    def __init__(self):
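        xy = np.loadtxt('./data/wine.csv', delimiter=',', dtype=np.float32, skiprows=1)
        self.n_samples = xy.shape[0]

        # in this version the conversion to tensors happens directly in __init__
        self.x = torch.from_numpy(xy[:, 1:])  # features: all columns except the first
        self.y = torch.from_numpy(xy[:, [0]])  # labels: first column, shape (n_samples, 1)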
    
    def __getitem__(self, index): 
        return self.x[index], self.y[index] 

    def __len__(self):
        return self.n_samples

dataset = WineDataset()
```
Let's extend this dataset class to support a `transform` argument. We add it to the constructor: `__init__(self, transform=None)`.

In [None]:
# Above we implemented a custom wine dataset. Let's extend it by adding transforms
import torch
import torchvision
from torch.utils.data import Dataset
import numpy as np

class WineDataset(Dataset):

    def __init__(self, transform=None): # Added  transform=None. transform is optional
        xy = np.loadtxt('./data/wine/wine.csv', delimiter=',', dtype=np.float32, skiprows=1)
        self.n_samples = xy.shape[0]

        # note that we do not convert to tensor here
        self.x_data = xy[:, 1:]
        self.y_data = xy[:, [0]]

        self.transform = transform # for transform

    def __getitem__(self, index):
        sample = self.x_data[index], self.y_data[index]

        if self.transform: # if transform is not None
            sample = self.transform(sample)

        return sample

    def __len__(self):
        return self.n_samples

# Write our own transform and apply it to our dataset.
# Let's create a custom transform class. Earlier we converted to tensors inside the dataset; now the dataset keeps
# NumPy arrays and the transform does the conversion. The only thing we need to implement is __call__(self, sample).
class ToTensor:
    # Convert ndarrays to Tensors
    def __call__(self, sample):  # callable object
        inputs, targets = sample # unpack our samples
        return torch.from_numpy(inputs), torch.from_numpy(targets)

print('Without Transform')
dataset = WineDataset()
first_data = dataset[0]
features, labels = first_data
print(type(features), type(labels))
print(features, labels)

print('\nWith Tensor Transform')
dataset = WineDataset(transform=ToTensor())
first_data = dataset[0]
features, labels = first_data
print(type(features), type(labels))
print(features, labels)

print('\nWith None Transform')
dataset = WineDataset(transform=None)
first_data = dataset[0]
features, labels = first_data
print(type(features), type(labels))
print(features, labels)

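# Write another custom transform to perform multiplication
class MulTransform:  # multiply inputs by a given factor
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, sample):
        inputs, targets = sample
        # multiply out of place: `inputs *= self.factor` would also modify the array stored in the
        # dataset, because the tensor created by torch.from_numpy shares memory with it
        inputs = inputs * self.factor
        return inputs, targets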

        
print('\nWith Tensor and Multiplication Transform')
composed = torchvision.transforms.Compose([ToTensor(), MulTransform(4)])
dataset = WineDataset(transform=composed)
first_data = dataset[0]
features, labels = first_data
print(type(features), type(labels))
print(features, labels)
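
One detail worth checking in a transform like `MulTransform` is whether the multiplication happens in place. Indexing a NumPy array with an integer returns a view, and `torch.from_numpy` shares memory with its source array, so an in-place operation inside a transform can change the data stored in the dataset each time a sample is read. The NumPy-only sketch below shows the difference; the array values are made up for illustration.

```python
import numpy as np

data = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)

# In-place multiplication on a row view changes the stored array
row = data[0]           # basic indexing returns a view, not a copy
row *= 4
print(data[0])          # the stored row is now [4. 8.]

# Out-of-place multiplication leaves the stored array untouched
row = data[1]
scaled = row * 4        # allocates a new array
print(data[1], scaled)  # stored row is still [3. 4.]; scaled is [12. 16.]
```

Returning a new array or tensor from `__call__` keeps repeated reads of the same sample consistent.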