# Lab 2: ImageFolder & DataLoaders

In this lab, we will learn how to transform images and load them efficiently using PyTorch.

## Learning Objectives

By the end of this lab, you will be able to:
- Create image transformation pipelines using `transforms.Compose`
- Understand what each transform does (Resize, Flip, ToTensor)
- Load data using `ImageFolder` dataset
- Create efficient DataLoaders for batched training

## 0. Setup

### Import Required Libraries

We'll import essential PyTorch libraries for data loading and transformations:
- `torch` - Core PyTorch library
- `DataLoader` - For creating batched, iterable datasets
- `datasets` - Pre-built dataset classes (like ImageFolder)
- `transforms` - Image transformation utilities

In [2]:
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from pathlib import Path
import matplotlib.pyplot as plt

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'Using device: {device}')

### Working with Vision Data

Since we're working with a vision problem, we'll use:
- **`torchvision.datasets`** - Contains data loading functions for images
- **`torchvision.transforms`** - Provides tools for preprocessing and augmenting image data

These modules are part of the `torchvision` library, PyTorch's computer vision toolkit.

## 1. Setup Data Paths

In [3]:
import requests
import zipfile

data_path = Path('data/')
image_path = data_path / 'pizza_steak_sushi'

if image_path.is_dir():
    print(f'{image_path} directory exists.')
else:
    print(f'Creating {image_path} directory...')
    image_path.mkdir(parents=True, exist_ok=True)
    zip_path = data_path / 'pizza_steak_sushi.zip'
    print('Downloading data...')
    response = requests.get('https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip')
    response.raise_for_status()  # stop on a failed download instead of saving an error page
    with open(zip_path, 'wb') as f:
        f.write(response.content)
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        print('Unzipping data...')
        zip_ref.extractall(image_path)
    print('Done!')

train_dir = image_path / 'train'
test_dir = image_path / 'test'

print(f'Train dir: {train_dir}')
print(f'Test dir: {test_dir}')

## 2. Transforming Data

Now what if we wanted to load our image data into PyTorch?

### Requirements for Using Images in PyTorch

Before we can use our image data with PyTorch, we need to:

1. **Convert to tensors** - Turn images into numerical representations
2. **Create a Dataset** - Use `torch.utils.data.Dataset`
3. **Create a DataLoader** - Use `torch.utils.data.DataLoader` for batching

### Transforming Data with torchvision.transforms

We've got folders of images, but before we can use them with PyTorch, we need to convert them into tensors.

One of the ways we can do this is by using the **`torchvision.transforms`** module.

`torchvision.transforms` contains many pre-built methods for:
- Formatting images
- Converting them into tensors
- Data augmentation (artificially increasing dataset variety)

### Why Transform Images?

Raw images can't be directly fed to neural networks. We need a series of transform steps:

1. **Resize** - Make all images the same size (neural networks expect fixed input dimensions)
2. **Augment** - Artificially increase dataset variety (helps prevent overfitting)
3. **ToTensor** - Convert a PIL Image to a PyTorch tensor (scales pixel values from 0-255 to 0.0-1.0)

Let's create a transformation pipeline:

In [4]:
data_transform = transforms.Compose([
    transforms.Resize(size=(64, 64)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor()
])

print('Transform pipeline created!')

### Transform Pipeline

`transforms.Compose()` chains multiple transforms together:

| Transform | What it Does |
|-----------|--------------|
| `Resize((64, 64))` | Resizes image to 64x64 pixels |
| `RandomHorizontalFlip(p=0.5)` | 50% chance to flip image horizontally |
| `ToTensor()` | Converts PIL Image → Tensor, scales values to [0, 1] |

### Visualize Transform Effect

Let's see how transforms modify our images. Notice:
- **Original**: Variable sizes, PIL Image format
- **Transformed**: Fixed 64x64 size, PyTorch tensor format [C, H, W]

In [5]:
from PIL import Image
import random

def plot_transformed_images(image_paths, transform, n=3, seed=42):
    random.seed(seed)
    random_image_paths = random.sample(image_paths, k=n)
    for image_path in random_image_paths:
        with Image.open(image_path) as f:
            fig, ax = plt.subplots(1, 2)
            ax[0].imshow(f)
            ax[0].set_title(f"Original \nSize: {f.size}")
            ax[0].axis("off")
            
            transformed_image = transform(f).permute(1, 2, 0)
            ax[1].imshow(transformed_image)
            ax[1].set_title(f"Transformed \nSize: {transformed_image.shape}")
            ax[1].axis("off")
            
            fig.suptitle(f"Class: {image_path.parent.stem}", fontsize=16)
            plt.show()



### Understanding Tensor Dimensions

**Important dimension ordering:**
- **PyTorch format**: `[C, H, W]` (Channels, Height, Width)
- **Matplotlib format**: `[H, W, C]` (Height, Width, Channels)

We use `.permute(1, 2, 0)` to rearrange tensor dimensions for visualization:
```python
# PyTorch: [3, 64, 64] → Matplotlib: [64, 64, 3]
tensor.permute(1, 2, 0)
```
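
A quick way to convince yourself that `.permute` only reorders the axes is to try it on a random tensor. This is a sketch on made-up data; the non-square size 64x48 is chosen only so that height and width are easy to tell apart.

```python
import torch

chw = torch.rand(3, 64, 48)     # [C, H, W]
hwc = chw.permute(1, 2, 0)      # [H, W, C]

print(tuple(chw.shape), "->", tuple(hwc.shape))

# The same pixel is reachable in both layouts, just with reordered indices
same_pixel = (chw[1, 10, 20] == hwc[10, 20, 1]).item()
print("Same value at matching position:", same_pixel)
```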

In [6]:
image_path_list = list(image_path.glob("*/*/*.jpg"))
print(f"Found {len(image_path_list)} images")

plot_transformed_images(image_path_list, data_transform, n=2)

### Understanding Transforms

Nice! We've now got a way to convert our images to tensors using `torchvision.transforms`.

We can also manipulate their size and orientation if needed (some models prefer images of different sizes and shapes).

### Image Size Trade-offs

**Image size matters:**
- **Larger images** = More information for the model to learn from
- **Smaller images** = Faster computation but less detail

**Example:**
- An image of size `[256, 256, 3]` will have **16x more pixels** than `[64, 64, 3]`
- Calculation: `(256×256×3) / (64×64×3) = 16`

**The trade-off:**
- ✅ More pixels = More information
- ⚠️ More pixels = More computations (slower training)

**Practical tip:** Start with smaller images (64x64) to prototype quickly, then scale up if needed.
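
The arithmetic behind the 16x figure can be reproduced in a few lines. `num_values` is a hypothetical helper written for this example, not part of PyTorch.

```python
def num_values(height, width, channels=3):
    """Count the numbers stored in one image of the given size (hypothetical helper)."""
    return height * width * channels

small = num_values(64, 64)
large = num_values(256, 256)
ratio = large / small

print(f"64x64x3:   {small} values")
print(f"256x256x3: {large} values")
print(f"Ratio:     {ratio:.0f}x")
```

Doubling both height and width quadruples the number of values, so the cost grows much faster than the side length.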

## 3. Load Data with ImageFolder

Alright, time to turn our image data into a `Dataset` capable of being used with PyTorch.

### What is ImageFolder?

Since our data is in standard image classification format, we can use the class **`torchvision.datasets.ImageFolder`**.

`ImageFolder` is PyTorch's built-in dataset class for image classification. It:
- Automatically reads images from folder structure
- Assigns labels based on folder names
- Applies transforms to each image
- Returns `(image_tensor, label)` pairs

### How to Use ImageFolder

We pass it:
1. **`root`** - File path of the target image directory
2. **`transform`** - Series of transforms to perform on images

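Let's test it out on our `train_dir` and `test_dir`, passing in `transform=data_transform` to turn our images into tensors. For simplicity both splits share one pipeline here; in practice the test split usually omits random augmentation such as `RandomHorizontalFlip`: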

In [7]:
train_data = datasets.ImageFolder(root=train_dir, transform=data_transform)
test_data = datasets.ImageFolder(root=test_dir, transform=data_transform)

print(f'Train data: {train_data}')
print(f'Test data: {test_data}')

### Dataset Created Successfully!

Beautiful! It looks like PyTorch has registered our `Dataset`s.

Let's inspect them by checking:
- `classes` - List of class names
- `class_to_idx` - Dictionary mapping class names to indices
- Dataset lengths

In [8]:
class_names = train_data.classes
class_dict = train_data.class_to_idx

print(f'Class names: {class_names}')
print(f'Class to idx: {class_dict}')
print(f'Number of training images: {len(train_data)}')
print(f'Number of test images: {len(test_data)}')

In [14]:
img, label = train_data[0]

print(f"Image tensor:\n{img}")
print(f"Image shape: {img.shape}")
print(f"Image datatype: {img.dtype}")
print(f"Image label: {label}")
print(f"Label datatype: {type(label)}")

### Understanding the Output

Our images are now in the form of a **tensor** with:
- **Shape**: `[3, 64, 64]` → [Channels, Height, Width]
- **Data type**: `torch.float32` (pixel values scaled to between 0 and 1)

The labels are **integers** relating to a specific class:
- Referenced by the `class_to_idx` attribute
- `0` = pizza, `1` = steak, `2` = sushi

How about we plot a single image tensor using matplotlib?

### Plot a Sample Image

**Important**: PyTorch tensors have shape `[C, H, W]` (Channels, Height, Width), but matplotlib expects `[H, W, C]`. We use `.permute(1, 2, 0)` to rearrange dimensions.

In [10]:
# Permute for matplotlib: [C, H, W] -> [H, W, C]
img_permute = img.permute(1, 2, 0)

plt.figure(figsize=(8, 6))
plt.imshow(img_permute)
plt.title(f'Class: {class_names[label]}')
plt.axis(False)
plt.show()

### Image Quality vs. Resolution

Notice the image is now **more pixelated** (lower quality).

**Why?**
- Original size: up to `512×512` pixels (sizes vary by image)
- Resized to: `64×64` pixels
- Information loss: ~98.4% fewer pixels for a `512×512` original (`1 - 64² / 512²`)

**Important insight:**
> If you think the image is harder to recognize, chances are a model will find it harder to understand too.

**Balance to consider:**
- Higher resolution = Better quality = Slower training
- Lower resolution = Faster training = Potentially lower accuracy

For our food classification task, 64×64 is a good starting point!

## 4. Create DataLoaders

We've got our images as PyTorch `Dataset`s, but now let's turn them into `DataLoader`s.

### Why DataLoaders?

Training on one image at a time is **inefficient**. **DataLoaders** provide:

1. **Batching** - Group multiple images together (e.g., 32 images per batch)
   - GPUs excel at parallel computation on batches
   - Averaging gradients over a batch gives less noisy updates than single-image updates

2. **Shuffling** - Randomize order each epoch
   - Prevents model from memorizing the order
   - Improves generalization

3. **Parallel Loading** - Use multiple CPU workers to load data faster
   - Loads next batch while GPU processes current batch
   - Reduces training bottlenecks

### Creating DataLoaders

We'll use `torch.utils.data.DataLoader` to turn our `Dataset`s into `DataLoader`s.

This makes them **iterable**, so a model can go through and learn the relationships between samples and targets (features and labels).
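
To see batching and shuffling without the image files, here is a sketch that wraps a small synthetic `TensorDataset` in two loaders. The 40 numbered samples and the names `ordered_loader` and `shuffled_loader` are assumptions chosen so the behaviour is easy to inspect.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 40 fake samples whose single feature is just their index 0..39
features = torch.arange(40).float().unsqueeze(1)
targets = torch.arange(40) % 3
demo_dataset = TensorDataset(features, targets)

ordered_loader = DataLoader(demo_dataset, batch_size=16, shuffle=False)
shuffled_loader = DataLoader(demo_dataset, batch_size=16, shuffle=True)

# 40 samples with batch_size=16 gives two full batches and one partial batch
batch_sizes = [x.shape[0] for x, _ in ordered_loader]
print("Number of batches:", len(ordered_loader))
print("Batch sizes:      ", batch_sizes)

# Without shuffling, the first batch is simply the first 16 samples
first_ordered = next(iter(ordered_loader))[0].flatten().tolist()
print("First ordered batch:", first_ordered)

# With shuffling, the order changes but every sample is still visited once
seen = sorted(v for x, _ in shuffled_loader for v in x.flatten().tolist())
print("All samples visited:", seen == list(range(40)))
```

The last batch is smaller whenever the dataset size is not a multiple of `batch_size`; `DataLoader` also accepts `drop_last=True` to skip it.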

We'll use:
- `batch_size=16` - Process 16 images at once
- `num_workers=0` - Use main process for data loading (set to 1+ for parallel loading)

### Understanding num_workers

**What's `num_workers`?**

It defines how many **subprocesses** will be created to load your data.

Think of it like this:
- **`num_workers=0`**: Main process loads data (simplest, but slower)
- **`num_workers=1`**: One subprocess loads data in parallel
- **`num_workers=4`**: Four subprocesses load data simultaneously

**Performance tip:**
The higher `num_workers` is set, the more compute power PyTorch uses to load data.

**Common starting point:**
Set it to the number of CPUs on your machine, then tune from there:
```python
import os
num_workers = os.cpu_count()
```

This lets the DataLoader use as many cores as are available. The fastest setting depends on your hardware and dataset, so it is worth timing a few values.

**For this tutorial:** We use `num_workers=0` for simplicity (cross-platform compatibility).
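
If you do want parallel loading, one cautious variant of the idea above is sketched here. `pick_num_workers` and its cap of 4 are hypothetical choices for this example, not a PyTorch API or an official recommendation.

```python
import os

def pick_num_workers(max_workers=4):
    """Hypothetical helper: choose a conservative num_workers value."""
    cpu_count = os.cpu_count() or 1  # os.cpu_count() can return None
    # Leave one core for the main process and never exceed the cap
    return max(0, min(max_workers, cpu_count - 1))

num_workers = pick_num_workers()
print(f"CPUs reported: {os.cpu_count()}, num_workers chosen: {num_workers}")
```

On a single-core machine this returns `0`, which falls back to loading in the main process.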

In [20]:
import os
os.cpu_count()

In [21]:
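BATCH_SIZE = 16

train_dataloader = DataLoader(
    dataset=train_data,
    batch_size=BATCH_SIZE,
    shuffle=True,    # reshuffle the training data every epoch
    num_workers=0    # load in the main process; raise for parallel loading
)

test_dataloader = DataLoader(
    dataset=test_data,
    batch_size=BATCH_SIZE,
    shuffle=False,   # keep evaluation order fixed
    num_workers=0
)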

print(f'Train dataloader: {train_dataloader}')
print(f'Test dataloader: {test_dataloader}')

### DataLoaders Created Successfully!

Wonderful! Now our data is **iterable**.

Let's try it out and check the shapes of a batch:

In [23]:
img, label = next(iter(train_dataloader))

# With batch_size=16 each batch holds 16 images; try changing BATCH_SIZE above and see what happens
print(f"Image shape: {img.shape} -> [batch_size, color_channels, height, width]")
print(f"Label shape: {label.shape}")

We could now use these `DataLoader`s with a training and testing loop to train a model.
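
As a rough idea of what such a loop iterates over, the skeleton below walks through one epoch of a stand-in loader and moves each batch to the selected device. The random data and the `demo_*` names are assumptions, and the model, loss, and optimizer steps are deliberately left out.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Stand-in data: 40 random "images" with labels in {0, 1, 2}
demo_dataset = TensorDataset(torch.rand(40, 3, 64, 64), torch.randint(0, 3, (40,)))
demo_loader = DataLoader(demo_dataset, batch_size=16, shuffle=True)

samples_seen = 0
for batch_idx, (images, labels) in enumerate(demo_loader):
    # Move the batch to the same device the model would live on
    images, labels = images.to(device), labels.to(device)
    # A real loop would run the forward pass, loss, and optimizer step here
    samples_seen += images.shape[0]
    print(f"Batch {batch_idx}: images {tuple(images.shape)}, labels {tuple(labels.shape)}")

print(f"Samples seen in one epoch: {samples_seen}")
```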

### Visualize a Batch

A batch is a group of images processed together. The grid below shows up to 8 images from one training batch, each titled with its class name.

In [18]:
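# Grab one batch of images and labels to plot
img_batch, label_batch = next(iter(train_dataloader))

fig, axes = plt.subplots(2, 4, figsize=(12, 6))
for i, ax in enumerate(axes.flatten()):
    if i < len(img_batch):
        # Permute [C, H, W] -> [H, W, C] for matplotlib
        ax.imshow(img_batch[i].permute(1, 2, 0))
        ax.set_title(class_names[label_batch[i].item()])
        ax.axis('off')
plt.tight_layout()
plt.show()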

## Key Takeaways

| Concept | Shape/Format | Description |
|---------|--------------|-------------|
| **PIL Image** | `.size` = (W, H) | Original image format; as an array it is (H, W, C) |
| **Image Tensor** | [C, H, W] | Single image: [3, 64, 64] |
| **Batch Tensor** | [N, C, H, W] | Batch of N images: [16, 3, 64, 64] |

### Transform Summary

```
PIL Image (variable size) 
    → Resize (64x64) 
    → RandomFlip (augmentation) 
    → ToTensor [3, 64, 64] (values 0-1)
```

### What's Next?

In **Lab 3**, we'll build a **custom Dataset class** from scratch to understand what happens under the hood!