# Performing data augmentation on a batch of images and the need for collate_fn

We have already seen that it is preferable to apply a different augmentation to the
same image in each iteration.

If the augmentation pipeline is applied in the `__init__` method, augmentation is
performed only once on the input set of images. Every iteration would then see the
same augmented copy of each image, so we would not get different augmentations in
different iterations.
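The difference between the two placements can be made concrete with a toy, framework-free sketch. It counts how often a hypothetical `random_flip` augmenter runs. Augmenting in `__init__` touches each image exactly once, no matter how many epochs we train, while augmenting in `__getitem__` runs on every fetch. The class names and the flip are illustrative assumptions, not part of the original code; a real version would subclass `torch.utils.data.Dataset`.

```python
import numpy as np

def random_flip(image, rng):
    """Hypothetical stand-in augmenter: flip left-right with probability 0.5."""
    return image[:, ::-1] if rng.random() < 0.5 else image

class AugmentInInit:
    """Augments each image once, when the dataset is built."""
    def __init__(self, images, seed=0):
        self.rng = np.random.default_rng(seed)
        self.aug_calls = 0
        self.images = [self._augment(img) for img in images]

    def _augment(self, img):
        self.aug_calls += 1
        return random_flip(img, self.rng)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, ix):
        return self.images[ix]  # the same augmented copy on every fetch

class AugmentInGetitem(AugmentInInit):
    """Augments an image every time it is fetched."""
    def __init__(self, images, seed=0):
        self.rng = np.random.default_rng(seed)
        self.aug_calls = 0
        self.images = list(images)  # keep the un-augmented originals

    def __getitem__(self, ix):
        return self._augment(self.images[ix])  # a fresh augmentation per fetch

images = np.random.default_rng(1).integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
init_ds, getitem_ds = AugmentInInit(images), AugmentInGetitem(images)
for epoch in range(3):  # simulate three passes over the data
    for ds in (init_ds, getitem_ds):
        _ = [ds[i] for i in range(len(ds))]
print(init_ds.aug_calls, getitem_ds.aug_calls)  # 4 12
```

Four images over three epochs cost 4 augmentation calls in the first design and 12 in the second; only the second gives new variations each epoch, but every one of those calls handles a single image.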

If instead the augmentation is applied in the `__getitem__` method, which is ideal
because we want a different set of augmentations each time an image is fetched, the
major bottleneck is that augmentation is performed one image at a time. It would be
much faster to augment a whole batch of images in a single call. Let's understand this
in detail by looking at two scenarios, each working on 32 images:

- Augmenting 32 images, one at a time
- Augmenting 32 images as a batch in one go

```python
from torchvision import datasets
import torch

data_folder = '/Data/FMNIST/'  # This can be any directory you want to download FMNIST to
fmnist = datasets.FashionMNIST(data_folder, download=True, train=True)
```

```python
tr_images = fmnist.data
tr_targets = fmnist.targets
```

```python
val_fmnist = datasets.FashionMNIST(data_folder, download=True, train=False)
val_images = val_fmnist.data
val_targets = val_fmnist.targets
```

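```python
from imgaug import augmenters as iaa

# Translate each image horizontally by a random number of pixels in [-10, 10],
# filling the vacated pixels with a constant value
aug = iaa.Sequential([
    iaa.Affine(translate_px={'x': (-10, 10)}, mode='constant'),
])
```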
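To make the pipeline's behavior concrete, here is a minimal NumPy sketch of what one integer horizontal shift does to a single image. It is not imgaug's implementation (`iaa.Affine` performs a general affine warp); it assumes that for a pure integer translation with `mode='constant'` the visible effect is a column shift with the vacated border filled by a constant, taken to be 0 here. The function name `translate_x` and the "positive means right" convention are our own assumptions; with a symmetric range like (-10, 10) the direction convention does not matter.

```python
import numpy as np

def translate_x(image, dx):
    """Shift a 2-D image dx pixels right (left if dx < 0), filling vacated columns with 0."""
    out = np.zeros_like(image)
    w = image.shape[1]
    if abs(dx) >= w:
        return out  # the image is shifted completely out of frame
    if dx >= 0:
        out[:, dx:] = image[:, :w - dx]
    else:
        out[:, :dx] = image[:, -dx:]
    return out

rng = np.random.default_rng(0)
image = np.zeros((28, 28), dtype=np.uint8)
image[:, 14] = 255                  # a vertical line in the middle column
dx = int(rng.integers(-10, 11))     # uniform over -10..10 inclusive
shifted = translate_x(image, dx)
print(dx, np.flatnonzero(shifted[0]))  # the line moves from column 14 to column 14 + dx
```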

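First, augment the 32 images one at a time:

```python
%%time
# Scenario 1: augment the 32 images one at a time
for i in range(32):
    aug.augment_image(tr_images[i].numpy())  # imgaug expects NumPy arrays, not torch tensors
```

```
CPU times: user 85.4 ms, sys: 0 ns, total: 85.4 ms
Wall time: 85.9 ms
```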

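Now augment the same 32 images as a single batch:

```python
%%time
# Scenario 2: augment all 32 images in one call;
# imgaug treats an (N, H, W) array as N grayscale images
x = aug.augment_images(tr_images[:32].numpy())
```

```
CPU times: user 11.7 ms, sys: 0 ns, total: 11.7 ms
Wall time: 12.3 ms
```

In this run, augmenting the batch in one go took 12.3 ms of wall time against 85.9 ms
for one image at a time, roughly 7x faster.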
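Since one batched call is much cheaper than 32 single-image calls, a natural place for augmentation is a `collate_fn`: `torch.utils.data.DataLoader` passes it the list of samples that make up a batch, so the whole batch can be augmented at once. The sketch below is NumPy-only and uses a hypothetical vectorized `translate_x_batch` as a stand-in for `aug.augment_images`; it still draws a different shift for every image, so per-image variety is preserved. The names, the seed, and the zero fill value are illustrative assumptions.

```python
import numpy as np

def translate_x_batch(images, shifts):
    """Shift each image of an (N, H, W) batch right by shifts[n] pixels
    (left if negative), zero-filling vacated columns, with no Python loop."""
    n, h, w = images.shape
    # For every image and every output column, the source column it reads from: (N, W)
    src = np.arange(w)[None, :] - np.asarray(shifts)[:, None]
    valid = (src >= 0) & (src < w)          # columns that come from inside the image
    src = np.clip(src, 0, w - 1)            # safe indices; invalid ones are masked below
    gathered = images[np.arange(n)[:, None, None], np.arange(h)[None, :, None], src[:, None, :]]
    return np.where(valid[:, None, :], gathered, 0).astype(images.dtype)

_rng = np.random.default_rng(0)

def collate_fn(batch):
    """Receives a list of (image, label) samples, as a DataLoader would pass it."""
    images = np.stack([img for img, _ in batch])
    labels = np.array([label for _, label in batch])
    shifts = _rng.integers(-10, 11, size=len(batch))  # one random shift per image
    return translate_x_batch(images, shifts), labels

samples = [(np.full((28, 28), 255, dtype=np.uint8), i % 10) for i in range(32)]
x, y = collate_fn(samples)
print(x.shape, y.shape)  # (32, 28, 28) (32,)
```

In a PyTorch pipeline this would be passed as `DataLoader(dataset, batch_size=32, collate_fn=collate_fn)`, with `dataset` returning `(image, label)` pairs, and the outputs converted with `torch.tensor`. With the imgaug pipeline defined earlier, the body could call `aug.augment_images(images)` in place of the stand-in.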
