<a href="https://colab.research.google.com/github/mroshan454/Replicating-ViT-Research-Paper/blob/main/Modular_Functions_For_PyTorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Modular Code Structure for ViT Replication

This notebook explains the modular Python scripts used in my Vision Transformer (ViT) replication project. Each script handles one specific part of the machine learning pipeline in a clear, reusable way, as in production codebases.


## 1. 📦`data_setup.py` - Dataset and Dataloader Builder

**Purpose** - This script loads image data into PyTorch `DataLoader`s. It is written modularly so that it can be reused with any image classification dataset stored in the `ImageFolder` directory layout, simply by passing in the training and testing directory paths.



In [None]:
%%writefile data_setup.py
"""
Contains functionality for creating PyTorch DataLoaders for
image classification data.
"""

import os
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Default to one worker per CPU core reported by the OS.
NUM_WORKERS = os.cpu_count()

def create_dataloaders(
    train_dir: str,
    test_dir: str,
    transform: transforms.Compose,
    batch_size: int,
    num_workers: int = NUM_WORKERS
):
    """Creates training and testing DataLoaders.

    Takes in a training directory path and a testing directory path, turns
    them into PyTorch Datasets, and then into PyTorch DataLoaders.

    Args:
        train_dir: Path to training directory.
        test_dir: Path to testing directory.
        transform: torchvision transforms to perform on training and testing data.
        batch_size: Number of samples per batch in each of the DataLoaders.
        num_workers: An integer for number of workers per DataLoader.

    Returns:
        A tuple of (train_dataloader, test_dataloader, class_names),
        where class_names is a list of the target classes.

    Example usage:
        train_dataloader, test_dataloader, class_names = create_dataloaders(
            train_dir="path/to/train_dir",
            test_dir="path/to/test_dir",
            transform=some_transform,
            batch_size=32,
            num_workers=4)
    """
    # Use ImageFolder to create the datasets (one subdirectory per class)
    train_data = datasets.ImageFolder(train_dir, transform=transform)
    test_data = datasets.ImageFolder(test_dir, transform=transform)
    # Get class names from the training dataset
    class_names = train_data.classes
    # Turn the datasets into DataLoaders
    train_dataloader = DataLoader(train_data,
                                  batch_size=batch_size,
                                  shuffle=True,  # reshuffle the training data every epoch
                                  num_workers=num_workers,
                                  pin_memory=True)
    test_dataloader = DataLoader(test_data,
                                 batch_size=batch_size,
                                 shuffle=False,  # keep evaluation order fixed
                                 num_workers=num_workers,
                                 pin_memory=True)
    return train_dataloader, test_dataloader, class_names
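
`datasets.ImageFolder` expects one subdirectory per class under each root directory and orders the classes by sorted subdirectory name, which is where `class_names` comes from. The sketch below uses only the standard library to illustrate that layout and the resulting label order. The class folders (`pizza`, `steak`, `sushi`) and the helper `list_class_names` are hypothetical and are not part of `data_setup.py`.

```python
import tempfile
from pathlib import Path


def list_class_names(root: str) -> list[str]:
    """Returns sorted subdirectory names, mirroring how ImageFolder orders classes."""
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())


# Build a throwaway directory tree in the layout ImageFolder expects:
# train/<class_name>/<image files>
with tempfile.TemporaryDirectory() as tmp:
    train_dir = Path(tmp) / "train"
    for class_name in ["sushi", "pizza", "steak"]:  # hypothetical classes
        class_dir = train_dir / class_name
        class_dir.mkdir(parents=True)
        (class_dir / "example.jpg").touch()  # empty placeholder, not a real image

    class_names = list_class_names(str(train_dir))
    # The position of a name in this list is the integer label for that class.
    class_to_idx = {name: i for i, name in enumerate(class_names)}

print(class_names)   # ['pizza', 'steak', 'sushi']
print(class_to_idx)  # {'pizza': 0, 'steak': 1, 'sushi': 2}
```

Because the order depends only on the folder names, the training and testing directories should contain the same class subdirectories so that labels match across both DataLoaders.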

## 2. ⚙️`engine.py` - Training and Evaluation Engine

This script contains all the training and evaluation logic for the model. It includes modular functions for:
* One-epoch training (`train_step`)
* One-epoch testing (`test_step`)
* The full training pipeline over multiple epochs (`train`)
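
The contents of `engine.py` are not shown at this point, so the following is only a minimal sketch of what a one-epoch `train_step` of this kind could look like, assuming a classification model that outputs logits. The function signature, the toy `nn.Linear` model, and the random tensors standing in for image batches are illustrative assumptions, not the project's actual code.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_step(model, dataloader, loss_fn, optimizer, device):
    """Trains for one epoch; returns (mean batch loss, mean batch accuracy)."""
    model.train()  # switch on training behaviour such as dropout
    total_loss, total_acc = 0.0, 0.0
    for X, y in dataloader:
        X, y = X.to(device), y.to(device)
        logits = model(X)          # forward pass
        loss = loss_fn(logits, y)  # loss for this batch
        optimizer.zero_grad()      # clear gradients left from the previous batch
        loss.backward()            # backpropagate
        optimizer.step()           # update the weights
        total_loss += loss.item()
        total_acc += (logits.argmax(dim=1) == y).float().mean().item()
    # Average the per-batch metrics over the number of batches.
    return total_loss / len(dataloader), total_acc / len(dataloader)


# Toy data standing in for image batches (assumption: 4 features, 3 classes).
torch.manual_seed(0)
features = torch.randn(16, 4)
labels = torch.randint(0, 3, (16,))
loader = DataLoader(TensorDataset(features, labels), batch_size=4)

model = nn.Linear(4, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
train_loss, train_acc = train_step(
    model, loader, nn.CrossEntropyLoss(), optimizer, torch.device("cpu")
)
print(f"train_loss={train_loss:.4f} | train_acc={train_acc:.2f}")
```

A `test_step` would typically mirror this loop with `model.eval()` and `torch.inference_mode()` and without the optimizer calls, and `train` would call both once per epoch while collecting the returned metrics.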