# Custom Dataset Building in PyTorch

## What You'll Learn
This notebook teaches you how to:
1. **Build Custom Datasets** for images with labels (Cats & Dogs)
2. **Handle Text Datasets** for image captioning (Flickr8k)
3. **Create DataLoaders** for efficient batch processing
4. **Build Vocabularies** for text processing

---

## Why Do We Need Custom Datasets?

**The Restaurant Analogy:**
- **PyTorch's Built-in Datasets** = Fast food menu (limited options: MNIST, CIFAR10)
- **Custom Datasets** = Your own restaurant menu (any dish you want!)

**Real Problem:**
Most real-world projects need custom data (medical images, company photos, specific text), not just standard datasets.

---

## Two Examples Covered

### Example 1: Cats & Dogs Classification
**Goal:** Given an image → predict "cat" or "dog"

**Files needed:**
- `cats_dogs/` folder with images (`cat_001.jpg`, `dog_001.jpg`, etc.)
- `cats_dogs.csv` file with labels (filename, class)

### Example 2: Image Captioning
**Goal:** Given an image → generate a text description

**Files needed:**
- `flickr8k_images/` folder with photos
- `captions.txt` file with image-caption pairs

Let's build both!



1. Top panel: dataset assets

* A folder named **cats_dogs_resized** → the directory that holds the prepared (resized) images. The code below uses `root_dir='cats_dogs'`, so point it at whichever folder holds your images.
* A file named **cats_dogs** (~394 KB), shown by Windows with an Excel icon → the label spreadsheet (e.g., filename → class). The code below reads it as `cats_dogs.csv`.

<figure>
  <img src="asset/cat_dog_directory.png" alt="File Directory containing cat dog image and csv file with their identity" width="800">
</figure>

2. Middle panel: the images themselves

* Inside **cats_dogs_resized**, Windows Explorer shows a grid of **dog photos** (filenames like `dog_3286.jpg`, `dog_3297.jpg`, …, `dog_3312.jpg`).
* The pictures vary (different breeds, poses, backgrounds) but are all **resized to a uniform size**, which suits an ML dataset.
<figure>
  <img src="asset/cat_dog_image_directory.png" alt="inside of folder" width="800">
</figure>


3. Bottom panel: the label spreadsheet

* An Excel sheet with **filenames in Column A** (e.g., `dog_2511.jpg`, `dog_2512.jpg`, ‚Ä¶).
* **Column B contains numeric class labels** (shown as `1` for these rows). Since the filenames start with "dog_", this implies a mapping such as **1 = dog** (and likely **0 = cat** elsewhere in the sheet).
* Columns C/D are empty in the visible portion (reserved for other info if needed).

<figure>
  <img src="asset/cat_dog_csv_structure.png" alt="inside of the csv" width="400">
</figure>

In short, the screenshots show a typical classification dataset setup: (1) a folder of resized images, (2) a preview of those images, and (3) a spreadsheet mapping each filename to a class label.
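Before writing the Dataset class, a quick sanity check on the labels can catch a swapped or missing mapping. The sketch below is a hypothetical helper, not part of the notebook: it assumes the CSV has a header row `filename,class` (as in the Step 3 example) and that 0 = cat, 1 = dog, which is inferred from the screenshot rather than stated anywhere. If your file has no header row, pass `header=None` (and `names=[...]`) to `pd.read_csv`; otherwise the first data row is silently used as the header. `StringIO` is imported by name so it doesn't shadow the `io` module imported from `skimage` later.

```python
from io import StringIO

import pandas as pd

# In-memory stand-in for cats_dogs.csv (hypothetical rows; assumes a header row)
csv_text = """filename,class
cat_001.jpg,0
dog_001.jpg,1
cat_002.jpg,0
dog_2511.jpg,1
"""
labels = pd.read_csv(StringIO(csv_text))

# Assumed mapping from filename prefix to class id
prefix_to_class = {"cat": 0, "dog": 1}
expected = labels["filename"].str.split("_").str[0].map(prefix_to_class)

# Rows whose label disagrees with their filename prefix
mismatches = labels[expected != labels["class"]]

print(labels["class"].value_counts().sort_index())  # class balance
print(f"{len(mismatches)} mismatched rows")
```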

---

# 🐱 Part 1: Cats & Dogs Classification Dataset

## 📂 Understanding the Data Structure

```python
import torch
from torch.utils.data import DataLoader, Dataset
from skimage import io
import pandas as pd
import os
```

---

## Step 1: Import Required Libraries

```python
import torch                        # Main PyTorch library
from torch.utils.data import DataLoader, Dataset  # Data handling tools
from skimage import io             # Image reading
import pandas as pd                # CSV file handling
import os                          # File path operations
```

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```

---

## Step 2: Set Device (CPU or GPU)

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```

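**Purpose:** Use the GPU when one is available and fall back to the CPU otherwise. Training on a GPU is typically much faster, though the speed-up depends on the model and hardware. `device` only takes effect once tensors and models are moved to it with `.to(device)`; the cells in this notebook build datasets and loaders and never do so.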
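Defining `device` doesn't move anything by itself. A minimal sketch of how it is typically used; the `nn.Linear` layer and the random batch are hypothetical stand-ins for your model and a batch from a `DataLoader`:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)  # hypothetical stand-in model, moved to the device
batch = torch.randn(8, 4)           # created on the CPU, as a DataLoader yields it
batch = batch.to(device)            # model and inputs must live on the same device

out = model(batch)
print(out.shape, out.device)        # torch.Size([8, 2]) and the chosen device
```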

```python
class CatsandDogsDataset(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        self.annotations = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        img_path = os.path.join(self.root_dir, self.annotations.iloc[index, 0])  # row `index`, column 0 (filename)
        image = io.imread(img_path)
        y_label = torch.tensor(int(self.annotations.iloc[index, 1]))  # row `index`, column 1 (class)

        if self.transform:
            image = self.transform(image)

        return image, y_label
```

---

## Step 3: Build the Custom Dataset Class

### Requirements

A map-style PyTorch Dataset implements:
1. **`__init__`**: Setup (read CSV, store paths)
2. **`__len__`**: Return the total number of items (`len(dataset)` and the `DataLoader`'s default samplers rely on it)
3. **`__getitem__`**: Return one item (image + label) by index
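To see the protocol without any files, here is a minimal, hypothetical map-style dataset that generates its items on the fly. `len(ds)` calls `__len__`, and `ds[i]` calls `__getitem__`:

```python
import torch
from torch.utils.data import Dataset


class SquaresDataset(Dataset):
    """Toy map-style dataset (hypothetical): item i is (tensor([i]), i**2)."""

    def __init__(self, n):
        self.n = n  # setup: here we only remember the size

    def __len__(self):
        return self.n  # used by len() and by DataLoader samplers

    def __getitem__(self, index):
        if index < 0 or index >= self.n:
            raise IndexError(index)  # also lets plain `for x in ds` stop cleanly
        x = torch.tensor([float(index)])
        y = torch.tensor(index ** 2)
        return x, y


ds = SquaresDataset(5)
print(len(ds))  # 5
print(ds[3])    # (tensor([3.]), tensor(9))
```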

### How It Works

```python
class CatsandDogsDataset(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file: Path to labels CSV (filename, class)
            root_dir: Folder containing images
            transform: Image transformations
        """
        self.annotations = pd.read_csv(csv_file)  # Load labels
        self.root_dir = root_dir                  # Image folder path
        self.transform = transform                # Transforms to apply
    
    def __len__(self):
        """Return total number of samples"""
        return len(self.annotations)
    
    def __getitem__(self, index):
        """Get one sample (image, label)"""
        # 1. Build image path
        img_path = os.path.join(self.root_dir, self.annotations.iloc[index, 0])
        
        # 2. Load image
        image = io.imread(img_path)
        
        # 3. Get label
        y_label = torch.tensor(int(self.annotations.iloc[index, 1]))
        
        # 4. Apply transforms
        if self.transform:
            image = self.transform(image)
        
        return image, y_label
```

### Example

**CSV file (`cats_dogs.csv`):**
```
filename,class
cat_001.jpg,0
dog_001.jpg,1
cat_002.jpg,0
```

**Usage:**
```python
dataset = CatsandDogsDataset('cats_dogs.csv', 'cats_dogs/')
image, label = dataset[0]  # Get first item
# image: numpy array (H, W, 3), since no transform was passed
# label: tensor(0) for cat, tensor(1) for dog
```
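To try this logic without the real images, you can generate a tiny fake dataset. The sketch below writes a few solid-colour JPEGs and a matching CSV to a temporary folder, then reads them back. To stay self-contained it swaps `skimage.io.imread` for `np.array(PIL.Image.open(...))`, which likewise gives an `(H, W, 3)` uint8 array for RGB files; the class name `TinyCatsDogs` and the file names are hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset


class TinyCatsDogs(Dataset):
    """Same logic as CatsandDogsDataset, but loads images with PIL instead of skimage."""

    def __init__(self, csv_file, root_dir, transform=None):
        self.annotations = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        img_path = os.path.join(self.root_dir, self.annotations.iloc[index, 0])
        image = np.array(Image.open(img_path).convert("RGB"))  # (H, W, 3) uint8
        y_label = torch.tensor(int(self.annotations.iloc[index, 1]))
        if self.transform:
            image = self.transform(image)
        return image, y_label


# Write three fake images (width 64, height 48) and a labels CSV with a header row
tmp = tempfile.mkdtemp()
rows = [("cat_001.jpg", 0), ("dog_001.jpg", 1), ("cat_002.jpg", 0)]
for name, label in rows:
    colour = (200, 50, 50) if label == 0 else (50, 50, 200)
    Image.new("RGB", (64, 48), colour).save(os.path.join(tmp, name))
pd.DataFrame(rows, columns=["filename", "class"]).to_csv(
    os.path.join(tmp, "labels.csv"), index=False
)

ds = TinyCatsDogs(os.path.join(tmp, "labels.csv"), tmp)
image, label = ds[1]
print(len(ds), image.shape, image.dtype, label)  # 3 (48, 64, 3) uint8 tensor(1)
```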

```python
import torchvision.transforms as transforms

dataset = CatsandDogsDataset(csv_file='cats_dogs.csv', root_dir='cats_dogs', transform=transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((128, 128)),
    transforms.ToTensor()
]))
```


---

## Step 4: Apply Transforms

### Why Transforms?

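1. **Standardize input size**: The default `DataLoader` collation stacks samples into one tensor, so every image in a batch needs the same shape; most CNNs with fully connected heads also expect a fixed input size
2. **Scale values**: Convert pixel values from [0, 255] → [0.0, 1.0]
3. **Data augmentation**: Random crops and flips add variety to the training data (the pipeline below does not use any)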

### Transform Pipeline

```python
import torchvision.transforms as transforms

dataset = CatsandDogsDataset(
    csv_file='cats_dogs.csv',
    root_dir='cats_dogs',
    transform=transforms.Compose([
        transforms.ToPILImage(),      # numpy → PIL Image
        transforms.Resize((128, 128)), # Resize to 128×128
        transforms.ToTensor()          # PIL → tensor, [0-255] → [0.0-1.0]
    ])
)
```

### Transformation Flow

**Input:** `numpy array (H, W, 3)` with values `[0-255]`

↓ `ToPILImage()`

**PIL Image** (`Resize` accepts PIL images or tensors, not NumPy arrays)

↓ `Resize((128, 128))`

**PIL Image (128, 128)**

↓ `ToTensor()`

**Output:** `torch.Tensor (3, 128, 128)` with values `[0.0, 1.0]`

**Note:** For a `uint8` image, `ToTensor()` automatically:
- Changes shape from `(H, W, C)` → `(C, H, W)`
- Scales values: `pixel / 255`

```python
train_set, test_set = torch.utils.data.random_split(dataset, [800, 200])

train_loader = DataLoader(dataset=train_set, batch_size=32, shuffle=True)
test_loader = DataLoader(dataset=test_set, batch_size=32, shuffle=False)  # no need to shuffle for evaluation
```

---

## Step 5: Split into Train/Test Sets

### Purpose

- **Training set**: Model learns from this (80%)
- **Test set**: Evaluate performance on unseen data (20%)

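### Implementation

`random_split` needs lengths that add up to `len(dataset)` (here 800 + 200 = 1000 images) and returns two `Subset` objects that index into the original dataset.

```python
train_set, test_set = torch.utils.data.random_split(dataset, [800, 200])

train_loader = DataLoader(dataset=train_set, batch_size=32, shuffle=True)
test_loader = DataLoader(dataset=test_set, batch_size=32, shuffle=False)  # evaluation order doesn't matter
```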

### Why Batching?

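**Without batching (batch_size=1):**
- Process 1 image → update weights → repeat 800 times per epoch
- Slow: each step does very little work, so the hardware's parallelism goes unused

**With batching (batch_size=32):**
- Process 32 images → update weights → repeat 25 times per epoch (800 / 32)
- Much faster: each step runs one vectorized computation over the whole batch

### Parameters

- **`batch_size=32`**: Number of samples per batch
- **`shuffle=True`**: Draw samples in a new random order each epoch, so batches differ between epochs and the model never sees the data in a fixed order (for example, all cats followed by all dogs if the CSV is sorted by class)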
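A sketch of the same split with two tweaks that are often useful: a seeded `generator` (an optional argument of `random_split`) so the split is identical across runs, and `shuffle=False` for the test loader. A `TensorDataset` of random tensors stands in for `CatsandDogsDataset` so the snippet runs without the image files:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in for the 1000-image dataset: random "images" and 0/1 labels
full = TensorDataset(torch.randn(1000, 3, 8, 8), torch.randint(0, 2, (1000,)))

# A seeded generator makes the split reproducible across runs
g = torch.Generator().manual_seed(42)
train_set, test_set = random_split(full, [800, 200], generator=g)

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
test_loader = DataLoader(test_set, batch_size=32, shuffle=False)

print(len(train_set), len(test_set))        # 800 200
print(len(train_loader), len(test_loader))  # 25 7  (200 / 32 -> 6 full batches + 1 of 8)
images, labels = next(iter(train_loader))
print(images.shape, labels.shape)           # torch.Size([32, 3, 8, 8]) torch.Size([32])
```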

---

# Part 2: Image Captioning Dataset

## What's Different?

**Cats & Dogs:**
- Input: Image ‚Üí Output: Single number (0 or 1)

**Image Captioning:**
- Input: Image ‚Üí Output: **Sentence** ("A dog playing in the park")

**New Challenges:**
1. Convert words to numbers (vocabulary)
2. Handle variable-length captions
3. Pad sequences to same length for batching



###  Folder structure
<figure>
  <img src="asset/image_caption_directory.png" alt="File Directory containing  image and text file with their caption" width="800">
</figure>

* **`images`** → A folder containing all the image files.
* **`captions`** → A text file (`.txt`) containing all the image-caption pairs.
  This setup is part of the **Flickr8k dataset**, which is widely used for *image captioning* tasks in deep learning.


---

###  Folder content preview
<figure>
  <img src="asset/image_caption_image_directory.png" alt="inside of folder" width="800">
</figure>
* The path is `Desktop > customdata > flickr8k > images`.
* It shows thumbnails of several **JPEG images**, each named with an ID like `667626_18933d13e`, `10815824_2997e03d76`, etc.
* The images show various human and animal activities (children playing, dogs, people near water, etc.).
  These are the **input images** used for training and evaluation in the captioning model.



---

###  Captions file opened in Notepad
<figure>
  <img src="asset/image_caption_caption_file.png" alt="inside of the txt" width="800">
</figure>

* The text file has **two columns**:

  * **Column 1:** Image filename (e.g., `1000268201_693b08cb0e.jpg`)
  * **Column 2:** The **caption** (natural language description).
* Example lines:

  * `1000268201_693b08cb0e.jpg, A child in a pink dress is climbing up a set of stairs in an entry way.`
  * `1000268201_693b08cb0e.jpg, A girl going into a wooden building.`
  * `1000268201_693b08cb0e.jpg, A little girl climbing into a wooden playhouse.`
* The same image appears several times, each with a **different caption**. This is by design: Flickr8k provides *five human-written captions per image*.
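Since `FlickerDataset` (below) reads the `image` and `caption` columns with `pd.read_csv`, the file needs an `image,caption` header row. The sketch loads three lines in that format from a string (a stand-in for `captions.txt`) and groups the captions by image. Each row becomes one sample, so the dataset's length counts captions, roughly five times the number of images:

```python
from io import StringIO

import pandas as pd

# Three lines in the captions.txt format shown above, with the header FlickerDataset expects
captions_txt = """image,caption
1000268201_693b08cb0e.jpg,A child in a pink dress is climbing up a set of stairs in an entry way.
1000268201_693b08cb0e.jpg,A girl going into a wooden building.
1000268201_693b08cb0e.jpg,A little girl climbing into a wooden playhouse.
"""
df = pd.read_csv(StringIO(captions_txt))

per_image = df.groupby("image")["caption"].apply(list)
print(df.shape)              # (3, 2): one row per (image, caption) pair
print(len(per_image))        # 1 unique image
print(per_image.iloc[0][1])  # A girl going into a wooden building.
```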


**In short:**
This image shows the structure of the **Flickr8k Image Captioning dataset**, where each photo (in `/images`) is linked to multiple natural-language descriptions (in `/captions`). The task is to build a model that can look at a new image and describe it in words.


| Component       | Description                               | Purpose                              |
| --------------- | ----------------------------------------- | ------------------------------------ |
| `images` folder | Set of photos (8,000 total)               | Visual input                         |
| `captions.txt`  | Image filename + 5 human captions         | Ground-truth text labels             |
| Combined usage  | Each image paired with multiple sentences | For training Image Captioning models |


---

```python
import spacy

# Needs the small English model: python -m spacy download en_core_web_sm
spacy_eng = spacy.load("en_core_web_sm")

class Vocabulary:
    def __init__(self, freq_threshold):
        self.freq_threshold = freq_threshold
        self.itos = {0: "<PAD>", 1: "<SOS>", 2: "<EOS>", 3: "<UNK>"}  # itos: index to string
        self.stoi = {v: k for k, v in self.itos.items()}              # stoi: string to index

    def __len__(self):
        return len(self.itos)

    @staticmethod
    def tokenizer(text):
        # "I love dogs" -> ['i', 'love', 'dogs']
        return [tok.text.lower() for tok in spacy_eng.tokenizer(text)]

    def build_vocabulary(self, sentence_list):
        frequencies = {}
        idx = 4  # first index after the special tokens

        for sentence in sentence_list:
            for word in self.tokenizer(sentence):
                if word not in frequencies:
                    frequencies[word] = 1
                else:
                    frequencies[word] += 1

                # add the word once its count reaches the threshold
                if frequencies[word] == self.freq_threshold:
                    self.stoi[word] = idx
                    self.itos[idx] = word
                    idx += 1

    def numericalize(self, text):
        tokenized_text = self.tokenizer(text)
        return [
            self.stoi.get(token, self.stoi["<UNK>"])
            for token in tokenized_text
        ]
```

```python
import spacy
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset
from PIL import Image
```

---

## Step 6: Build a Vocabulary Class

### Why We Need This

Neural networks can't process text directly. They need numbers.

**Problem:** Convert "A dog running" → `[4, 5, 6]`

**Solution:** Build a vocabulary that maps each word to a unique number.

### Key Features

**Special Tokens:**
- `<PAD>` = 0: Padding for variable-length sequences
- `<SOS>` = 1: Start of sequence
- `<EOS>` = 2: End of sequence
- `<UNK>` = 3: Unknown words (not in vocabulary)

**Frequency Threshold:**
- Only include words appearing ≥ `freq_threshold` times
- Rare words ‚Üí `<UNK>` (prevents vocabulary explosion)

### Example

**Input sentences (freq_threshold=3):**
```python
sentences = [
    "A dog playing in park",
    "A cat sleeping on sofa",
    "A dog running in park",
    "A bird flying in sky",
    "A dog jumping in park"
]
```

**Word frequencies:**
```python
{'a': 5, 'dog': 3, 'in': 4, 'park': 3, 'playing': 1, 'cat': 1, ...}
```

**Vocabulary (only words with freq ≥ 3):**
```python
vocab.stoi = {
    '<PAD>': 0, '<SOS>': 1, '<EOS>': 2, '<UNK>': 3,
    'a': 4, 'in': 5, 'dog': 6, 'park': 7
}
```

**Numericalize:**
```python
vocab.numericalize("A dog playing in park")
# Output: [4, 6, 3, 5, 7]
#         'a' 'dog' '<UNK>' 'in' 'park'
# Note: 'playing' → '<UNK>' (frequency too low)
```
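The example can be checked end to end. This sketch copies the `Vocabulary` logic but swaps spaCy for a plain `str.split()` tokenizer (an assumption made so it runs without `en_core_web_sm`). For these punctuation-free sentences both tokenizers produce the same tokens, but spaCy would also split off punctuation such as a trailing `.`:

```python
class SimpleVocabulary:
    """Same logic as Vocabulary, but with a whitespace tokenizer instead of spaCy."""

    def __init__(self, freq_threshold):
        self.freq_threshold = freq_threshold
        self.itos = {0: "<PAD>", 1: "<SOS>", 2: "<EOS>", 3: "<UNK>"}
        self.stoi = {v: k for k, v in self.itos.items()}

    def __len__(self):
        return len(self.itos)

    @staticmethod
    def tokenizer(text):
        return text.lower().split()  # stand-in for spaCy tokenization

    def build_vocabulary(self, sentence_list):
        frequencies = {}
        idx = 4  # first free index after the special tokens
        for sentence in sentence_list:
            for word in self.tokenizer(sentence):
                frequencies[word] = frequencies.get(word, 0) + 1
                # a word gets its index the moment its count reaches the threshold
                if frequencies[word] == self.freq_threshold:
                    self.stoi[word] = idx
                    self.itos[idx] = word
                    idx += 1

    def numericalize(self, text):
        return [self.stoi.get(tok, self.stoi["<UNK>"]) for tok in self.tokenizer(text)]


sentences = [
    "A dog playing in park",
    "A cat sleeping on sofa",
    "A dog running in park",
    "A bird flying in sky",
    "A dog jumping in park",
]
vocab = SimpleVocabulary(freq_threshold=3)
vocab.build_vocabulary(sentences)
print(vocab.stoi)  # {'<PAD>': 0, '<SOS>': 1, '<EOS>': 2, '<UNK>': 3, 'a': 4, 'in': 5, 'dog': 6, 'park': 7}
print(vocab.numericalize("A dog playing in park"))  # [4, 6, 3, 5, 7]
```

Note that indices follow the order in which words *reach* the threshold ('in' hits 3 in the fourth sentence, before 'dog' does in the fifth), not alphabetical order.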

```python
class FlickerDataset(Dataset):
    def __init__(self, root_dir, captions_file, transform=None, freq_threshold=5):
        self.root_dir = root_dir
        self.df = pd.read_csv(captions_file)
        self.transform = transform

        self.imgs = self.df['image']
        self.captions = self.df['caption']

        self.vocab = Vocabulary(freq_threshold)
        self.vocab.build_vocabulary(self.captions.tolist())

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        caption = self.captions[index]
        img_id = self.imgs[index]
        img_path = os.path.join(self.root_dir, img_id)
        image = Image.open(img_path).convert("RGB")

        if self.transform is not None:
            image = self.transform(image)

        numericalized_caption = [self.vocab.stoi["<SOS>"]]
        numericalized_caption += self.vocab.numericalize(caption)
        numericalized_caption.append(self.vocab.stoi["<EOS>"])

        return image, torch.tensor(numericalized_caption)
```

---

## Step 7: Import Additional Libraries

```python
import spacy                        # Text tokenization
import torch
from torch.nn.utils.rnn import pad_sequence  # Padding sequences
from torch.utils.data import DataLoader, Dataset
from PIL import Image              # Image handling
```

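**New additions:**
- **`spacy`**: Natural language processing (tokenization). `spacy.load("en_core_web_sm")` also needs the model itself, installed with `python -m spacy download en_core_web_sm`
- **`pad_sequence`**: Pads captions to the same length for batching
- **`PIL`**: Loads images as PIL objects, which torchvision transforms such as `Resize` accept directly (no `ToPILImage()` step needed); `.convert("RGB")` also turns grayscale or RGBA files into 3-channel images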

---

## Step 8: Build Flickr Dataset Class

### Dataset Structure

**Files needed:**
- `flickr8k_images/`: Folder with 8,000 images
- `captions.txt`: CSV with columns `[image, caption]`

**Each image has 5 different captions** (written by different people).

### What `__getitem__` Returns

```python
image, caption = dataset[0]

# image: Transformed tensor (3, 224, 224)
# caption: Numericalized caption with special tokens
#          [<SOS>, word1, word2, ..., <EOS>]
#          e.g., [1, 4, 45, 6, 89, 2]
```

### Example Flow

**CSV file:**
```
image,caption
dog_001.jpg,A brown dog running in the park
cat_002.jpg,A white cat sleeping on a sofa
```

**Getting item 0:**
1. `caption = "A brown dog running in the park"`
2. `img_path = "flickr8k_images/dog_001.jpg"`
3. Load image → apply transforms → tensor `(3, 224, 224)`
4. Numericalize caption:
   ```python
   [1] + [4, 45, 6, 89, 5, 12, 7] + [2]
   # <SOS> + words + <EOS>
   ```
5. Return `(image_tensor, caption_tensor)`
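Going the other way, from IDs back to words, is handy when inspecting batches or model predictions. A sketch with a small hand-written vocabulary using the IDs from this example; `encode` mirrors the `<SOS>` + words + `<EOS>` logic of `__getitem__` (with a whitespace tokenizer instead of spaCy), and `decode` is a hypothetical helper, not part of the notebook:

```python
# Hand-written vocabulary matching the example IDs above (illustrative only)
stoi = {"<PAD>": 0, "<SOS>": 1, "<EOS>": 2, "<UNK>": 3,
        "a": 4, "in": 5, "dog": 6, "park": 7, "the": 12, "brown": 45, "running": 89}
itos = {i: s for s, i in stoi.items()}


def encode(caption):
    """<SOS> + word ids + <EOS>, mirroring FlickerDataset.__getitem__ (whitespace tokens)."""
    ids = [stoi.get(w, stoi["<UNK>"]) for w in caption.lower().split()]
    return [stoi["<SOS>"]] + ids + [stoi["<EOS>"]]


def decode(ids):
    """Hypothetical helper: ids -> text, skipping <SOS>/<PAD> and stopping at <EOS>."""
    words = []
    for i in ids:
        if i == stoi["<EOS>"]:
            break
        if i in (stoi["<SOS>"], stoi["<PAD>"]):
            continue
        words.append(itos.get(i, "<UNK>"))
    return " ".join(words)


ids = encode("A brown dog running in the park")
print(ids)          # [1, 4, 45, 6, 89, 5, 12, 7, 2]
print(decode(ids))  # a brown dog running in the park
```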

```python
class MyCollate:
    def __init__(self, pad_idx):
        self.pad_idx = pad_idx

    def __call__(self, batch):
        images = [item[0].unsqueeze(0) for item in batch]  # (3, H, W) -> (1, 3, H, W)
        images = torch.cat(images, dim=0)                  # (batch_size, 3, H, W)
        captions = [item[1] for item in batch]
        captions = pad_sequence(captions, batch_first=False, padding_value=self.pad_idx)  # (max_len, batch_size)

        return images, captions
```

```python
def get_loader(
    root_folder,
    annotation_file,
    transform,
    batch_size=32,
    num_workers=2,
    shuffle=True,
    pin_memory=True,
):
    dataset = FlickerDataset(
        root_dir=root_folder,
        captions_file=annotation_file,
        transform=transform,
    )

    pad_idx = dataset.vocab.stoi["<PAD>"]

    loader = DataLoader(
        dataset=dataset,
        batch_size=batch_size,
        num_workers=num_workers,
        shuffle=shuffle,
        pin_memory=pin_memory,
        collate_fn=MyCollate(pad_idx=pad_idx),
    )

    return loader
```


---

## Step 9: Custom Collate Function (Padding)

### Why Do We Need This?

Captions have different lengths:
- Caption 1: `[<SOS>, "a", "dog", <EOS>]` → 4 tokens
- Caption 2: `[<SOS>, "a", "cat", "on", "sofa", <EOS>]` → 6 tokens

**Problem:** The default collate function combines samples with `torch.stack`, which requires every tensor in the batch to have the same shape.

**Solution:** Pad shorter captions with `<PAD>` token to match the longest caption.

### How Padding Works

**Batch of 3 samples:**
```python
batch = [
    (image1, tensor([1, 4, 6, 2])),           # Length 4: <SOS> a dog <EOS>
    (image2, tensor([1, 4, 8, 9, 10, 2])),    # Length 6 (longest): <SOS> a cat on sofa <EOS>
    (image3, tensor([1, 4, 6, 5, 7, 2]))      # Length 6: <SOS> a dog in park <EOS>
]
```

**After `pad_sequence`:**
```python
captions = tensor([
    [1, 1, 1],      # <SOS> for all
    [4, 4, 4],      # 'a' for all
    [6, 8, 6],      # 'dog', 'cat', 'dog'
    [2, 9, 5],      # <EOS>, 'on', 'in'
    [0, 10, 7],     # <PAD>, 'sofa', 'park'
    [0, 2, 2]       # <PAD>, <EOS>, <EOS>
])
# Shape: (max_length, batch_size) = (6, 3)
```
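`pad_sequence` can be tried directly on a batch like this one. In this sketch the IDs for "cat", "on" and "sofa" (8, 9, 10) are illustrative; the others follow the vocabulary example from Step 6 (a=4, in=5, dog=6, park=7):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

captions = [
    torch.tensor([1, 4, 6, 2]),         # <SOS> a dog <EOS>
    torch.tensor([1, 4, 8, 9, 10, 2]),  # <SOS> a cat on sofa <EOS>
    torch.tensor([1, 4, 6, 5, 7, 2]),   # <SOS> a dog in park <EOS>
]

padded = pad_sequence(captions, batch_first=False, padding_value=0)
print(padded.shape)     # torch.Size([6, 3]): (max_length, batch_size)
print(padded[:, 0])     # tensor([1, 4, 6, 2, 0, 0]): first caption, padded with <PAD>=0

padded_bf = pad_sequence(captions, batch_first=True, padding_value=0)
print(padded_bf.shape)  # torch.Size([3, 6]): (batch_size, max_length)
```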

### Implementation

The `MyCollate` class:
1. Takes a batch of `(image, caption)` pairs
2. Stacks images into a single tensor
3. Pads captions to the same length
4. Returns both as tensors

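**Note:** `batch_first=False` produces captions of shape `(sequence_length, batch_size)`, matching the default layout of `nn.RNN`/`nn.LSTM`, which expect inputs of shape `(sequence_length, batch_size, embedding_dim)` unless created with `batch_first=True`.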
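To see the `(sequence_length, batch_size, ...)` layout in action, here is a sketch that feeds padded captions through an embedding layer and an LSTM. The sizes are arbitrary illustrative values, and `padding_idx` is an optional `nn.Embedding` argument this notebook doesn't use (it keeps the `<PAD>` embedding at zero and excludes it from gradient updates):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

pad_idx, vocab_size, embed_dim, hidden_dim = 0, 11, 16, 32  # illustrative sizes

captions = pad_sequence(
    [torch.tensor([1, 4, 6, 2]), torch.tensor([1, 4, 8, 9, 10, 2])],
    batch_first=False, padding_value=pad_idx,
)  # shape (seq_len=6, batch=2)

embed = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
lstm = nn.LSTM(embed_dim, hidden_dim)  # batch_first=False is the default

x = embed(captions)            # (6, 2, 16)
output, (h_n, c_n) = lstm(x)   # output: (6, 2, 32)
print(x.shape, output.shape)   # torch.Size([6, 2, 16]) torch.Size([6, 2, 32])

mask = captions != pad_idx     # True where a real token is present
print(mask.sum(dim=0))         # tensor([4, 6]): true length of each caption
```

The LSTM still steps over the padded positions here; if that matters for your model, `torch.nn.utils.rnn.pack_padded_sequence` with these lengths is the usual way to skip them.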

```python
import torchvision.transforms as transforms

dataloader = get_loader(
    root_folder="flickr8k_images",
    annotation_file="captions.txt",
    transform=transforms.Compose(
        [
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
        ]
    ),
    batch_size=32,
)
```

---

## Step 10: Create DataLoader with Custom Collate

### What `get_loader` Does

This function combines everything we've built:
1. Creates a `FlickerDataset` instance
2. Gets the `<PAD>` token index from vocabulary
3. Creates a `DataLoader` with custom padding function

### Parameters Explained

```python
def get_loader(
    root_folder,        # Path to images folder
    annotation_file,    # Path to captions CSV
    transform,          # Image transformations
    batch_size=32,      # Number of samples per batch
    num_workers=2,      # Worker processes for parallel loading
    shuffle=True,       # Randomize order each epoch
    pin_memory=True,    # Faster CPU → GPU transfer
):
```

**Key Parameters:**
- **`batch_size=32`**: Load 32 image-caption pairs at once (faster than one-by-one)
- **`num_workers=2`**: Use 2 worker processes to load and transform data in parallel while the GPU trains (`0` loads everything in the main process)
- **`shuffle=True`**: Randomize the order each epoch so the model doesn't see the captions in a fixed order
- **`pin_memory=True`**: Put batches in page-locked (pinned) host memory, which speeds up copies to the GPU; it brings no benefit when training on the CPU
- **`collate_fn=MyCollate(pad_idx)`**: Use our custom padding function for variable-length captions

### What It Returns

`get_loader` returns only the `DataLoader`. The dataset, and with it the vocabulary, is still reachable through the loader's `dataset` attribute:

```python
loader = get_loader(...)
vocab_size = len(loader.dataset.vocab)  # access the vocabulary through the loader
```

---

## Step 11: Using the DataLoader

### Basic Usage

```python
import torchvision.transforms as transforms

dataloader = get_loader(
    root_folder="flickr8k_images",
    annotation_file="captions.txt",
    transform=transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ]),
    batch_size=32,
)
```

### Looping Through Data

```python
for images, captions in dataloader:
    # images.shape = (32, 3, 224, 224)
    #   32 images, 3 channels (RGB), 224×224 pixels

    # captions.shape = (max_length, 32)
    #   max_length = longest caption in this batch
    #   32 captions (one per image); the last batch may be smaller

    # Feed to your model here (`model` is your captioning model, not defined in this notebook)
    outputs = model(images, captions)
```

### Note About Running

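If you don't have the Flickr8k dataset downloaded, running this cell fails as soon as `FlickerDataset` tries to read the captions file:
```
FileNotFoundError: [Errno 2] No such file or directory: 'captions.txt'
```

**This is expected!** The code structure is correct. To run it, you need the actual dataset. If `captions.txt` is present but the image folder is missing, the error appears later instead, when the first batch is loaded and an image is opened.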

---

## Summary

### What We Built

**Part 1: Cats & Dogs Classification**
1. Custom `CatsandDogsDataset` class
2. Image loading and transforms
3. Train/test split
4. DataLoader for batch processing

**Part 2: Image Captioning**
1. `Vocabulary` class for text processing
2. `FlickerDataset` with special tokens
3. Custom collate function for padding
4. Complete DataLoader pipeline

---

## Key Concepts

### Custom Dataset Requirements
Every map-style PyTorch Dataset needs:
```python
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self):           # Setup
        ...
    def __len__(self):            # Return total count
        ...
    def __getitem__(self, idx):   # Return one item
        ...
```

### Text Processing Pipeline
1. **Tokenization**: Split text into words
2. **Vocabulary building**: Map words to numbers
3. **Special tokens**: `<SOS>`, `<EOS>`, `<PAD>`, `<UNK>`
4. **Numericalization**: Convert text to numbers

### Handling Variable-Length Sequences
- Use `pad_sequence` to make all sequences same length
- Custom collate function applies padding automatically
- Essential for batch processing in RNNs/LSTMs

### DataLoader Benefits
- **Batching**: Process multiple samples at once
- **Shuffling**: Randomize order for better training
- **Parallel loading**: Load data while GPU trains
- **Pinned memory**: `pin_memory=True` speeds up host-to-GPU copies

---

## What's Next?

Now you can:
1. Build CNN classifiers for custom image datasets
2. Create image captioning models (CNN + RNN)
3. Process text data for NLP tasks
4. Handle any custom dataset structure

**You're ready to train models on your own data!**