Now that we have a dataset and a collation function, we're ready to create a `DataLoader`. We'll add two more things here: an optional `shuffle` for the training set, and a `ProcessPoolExecutor` to do our preprocessing in parallel. A parallel data loader is very important, because opening and decoding a JPEG image is slow: a single CPU core cannot decode images fast enough to keep a modern GPU busy. Here's our `DataLoader` class:

In [1]:
from fastai.vision.all import *

In [2]:
class DataLoader:
    def __init__(self, ds, bs=128, shuffle=False, n_workers=1):
        self.ds, self.bs, self.shuffle, self.n_workers = ds, bs, shuffle, n_workers
    
    def __len__(self):
        return (len(self.ds) - 1) // self.bs + 1
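
The class above stores `shuffle` and `n_workers` but does not yet show how they are used during iteration. As a rough, standard-library-only sketch of one way iteration could work: split the indices into batches, optionally shuffle them, and hand each batch of indices to a pool of worker processes. The names `make_chunks` and `iter_batches`, the `seed` argument, and the serial `n_workers == 0` path are assumptions made for illustration, not part of the class above.

```python
import random
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def make_chunks(n, bs, shuffle=False, seed=None):
    """Split the indices 0..n-1 into batches of at most `bs` items."""
    idxs = list(range(n))
    if shuffle:
        # Shuffle once, before batching, so each epoch can see a new order
        random.Random(seed).shuffle(idxs)
    return [idxs[i:i + bs] for i in range(0, n, bs)]


def iter_batches(ds, collate_fn, bs=128, shuffle=False, n_workers=1):
    """Yield collated batches; `collate_fn(idxs, ds)` builds one batch."""
    chunks = make_chunks(len(ds), bs, shuffle)
    fn = partial(collate_fn, ds=ds)
    if n_workers == 0:
        # Hypothetical convenience: build batches serially in this process
        yield from map(fn, chunks)
        return
    # `collate_fn` and `ds` must be picklable to be sent to worker processes
    with ProcessPoolExecutor(n_workers) as ex:
        yield from ex.map(fn, chunks)
```

The number of chunks produced matches `__len__`: for 10 items and a batch size of 4, `(10 - 1) // 4 + 1` gives 3 batches, the last of which holds only 2 items.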

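Next, the preliminary definitions needed for the demo: a minimal `Dataset`, the Imagenette file lists with their labels, and the `collate` function.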

In [3]:
class Dataset:
    def __init__(self, fns_ys):
        self.fns, self.ys = fns_ys
    def __len__(self):
        return len(self.fns)
    def __getitem__(self, i):
        im = Image.open(self.fns[i]).resize((64,64)).convert('RGB')
        return tensor(im).float()/255, tensor(self.ys[i])

In [4]:
path = untar_data(URLs.IMAGENETTE_160)
t = get_image_files(path)
lbls = t.map(Self.parent.name()).unique()
lidx = lbls.val2idx()
y = L(lidx[o.parent.name] for o in t)
train_filt = L(o.parent.parent.name == 'train' for o in t)
train, valid = t[train_filt], t[~train_filt]
train_y, valid_y = y[train_filt], y[~train_filt]
train_ds, valid_ds = Dataset((train, train_y)), Dataset((valid, valid_y))
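
To make the intent of the cell above easier to follow, here is a plain-Python sketch of the same three steps: map each label name to an integer, look up the integer for every file, and split files and labels with a boolean mask. The file paths are made up for illustration, and the list comprehensions stand in for the `L` methods used above.

```python
from pathlib import PurePosixPath

# Hypothetical file layout: <root>/<split>/<label>/<file>
files = [PurePosixPath(p) for p in [
    "data/train/cat/1.jpg",
    "data/train/dog/2.jpg",
    "data/val/dog/3.jpg",
    "data/val/cat/4.jpg",
]]

# Unique label names in first-seen order, then a name -> index lookup
labels = list(dict.fromkeys(f.parent.name for f in files))
label_to_idx = {lbl: i for i, lbl in enumerate(labels)}

# One integer label per file
y = [label_to_idx[f.parent.name] for f in files]

# Boolean mask: True where the file lives under the 'train' folder
is_train = [f.parent.parent.name == "train" for f in files]

train = [f for f, m in zip(files, is_train) if m]
valid = [f for f, m in zip(files, is_train) if not m]
train_y = [lbl for lbl, m in zip(y, is_train) if m]
valid_y = [lbl for lbl, m in zip(y, is_train) if not m]
```

Note that both label lists are selected from `y`, using the same mask that selects the files, so each file stays aligned with its label.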

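In [5]:
def collate(idxs, ds):
    # Fetch each (image, label) sample, then regroup into a tuple of images and a tuple of labels
    xb, yb = zip(*[ds[i] for i in idxs])
    # Stack each tuple along a new leading batch dimension
    return torch.stack(xb), torch.stack(yb)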
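
The `zip(*...)` idiom in `collate` is what turns a list of `(x, y)` samples into one group of inputs and one group of targets. A minimal sketch with plain Python lists in place of tensors (the helper name `transpose_samples` and the sample values are made up for illustration):

```python
def transpose_samples(samples):
    """Turn [(x0, y0), (x1, y1), ...] into ((x0, x1, ...), (y0, y1, ...))."""
    xs, ys = zip(*samples)
    return xs, ys


# Three toy samples: a two-element "image" and an integer label each
samples = [([0.1, 0.2], 3), ([0.3, 0.4], 7), ([0.5, 0.6], 1)]
xs, ys = transpose_samples(samples)
# xs holds the three inputs, ys holds the three labels, in the same order
```

In `collate`, `torch.stack` then joins each of these tuples into a single batched tensor.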