### The Image Classification Dataset

This lecture will be a bit different from the others: we will dive deep into an image classification dataset.

Imagine you are an ML scientist 20 years ago, looking for a dataset to benchmark image classification models.
The first dataset that comes to mind is very likely MNIST. It was created by LeCun and his colleagues in 1998, and it is one of the most widely used datasets for image classification.

However, today even a simple model can achieve a classification accuracy over 95% on MNIST, which makes it unsuitable for distinguishing stronger models from weaker ones.
As a result, MNIST now serves more as a sanity check than as a benchmark.


In [None]:
%matplotlib inline
from d2l import torch as d2l
import torch
import torchvision
from torchvision import transforms
from torch.utils import data

d2l.use_svg_display() 

To up the ante just a bit, this lecture focuses on a comparatively more complex dataset released in 2017: Fashion-MNIST.

We can download the Fashion-MNIST dataset and read it into memory via the built-in functions in PyTorch's `torchvision` package.


`ToTensor` converts the image data from PIL type to 32-bit floating point tensors. It divides all numbers by 255 so that all pixel values are between 0 and 1.

In [None]:
trans = transforms.ToTensor()
mnist_train = torchvision.datasets.FashionMNIST(
    root="../data", train=True, transform=trans, download=True)
mnist_test = torchvision.datasets.FashionMNIST(
    root="../data", train=False, transform=trans, download=True)
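
To make the effect of `ToTensor` more concrete, here is a minimal pure-Python sketch of the two things the text describes: scaling 8-bit values into the range 0 to 1 and producing a channel-first layout. The function `to_tensor_like` and the tiny 2 x 2 image are made up for illustration. This is not how torchvision implements the transform, and it returns nested lists rather than a tensor.

```python
def to_tensor_like(image):
    """Mimic the value scaling and layout change described for `ToTensor`.

    `image` is a hypothetical grayscale image given as a list of rows,
    each row a list of integers in [0, 255].
    """
    scaled = [[pixel / 255 for pixel in row] for row in image]
    # A grayscale image has a single channel, so wrap it in one extra
    # dimension: (height, width) -> (1, height, width).
    return [scaled]


raw = [[0, 51], [102, 255]]  # a made-up 2 x 2 image
converted = to_tensor_like(raw)
print(converted)
```

The smallest possible pixel value maps to 0.0 and the largest to 1.0, which is the range the models in later sections will see.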

Fashion-MNIST consists of images from 10 categories. Each category has 6,000 images in the training set and 1,000 images in the test set.

For those unfamiliar with the term, a test dataset is used for evaluating model performance, not for training.

So let's print the total number of images. 

In total, we have 60,000 images in the training set and 10,000 images in the test set.

In [None]:
len(mnist_train), len(mnist_test)
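
The totals follow directly from the per-category counts, and the same idea can be turned into a quick balance check. The sketch below is only an illustration: the helper `is_balanced` and the synthetic `toy_labels` list are hypothetical and stand in for the real labels so that the snippet runs without downloading anything.

```python
from collections import Counter

NUM_CLASSES = 10
TRAIN_PER_CLASS = 6000
TEST_PER_CLASS = 1000

# Totals implied by a balanced dataset.
train_total = NUM_CLASSES * TRAIN_PER_CLASS
test_total = NUM_CLASSES * TEST_PER_CLASS
print(train_total, test_total)


def is_balanced(labels):
    """Return True if every label occurs the same number of times."""
    counts = Counter(labels)
    return len(set(counts.values())) == 1


# Synthetic stand-in for the real label list: 3 examples per class.
toy_labels = [label for label in range(NUM_CLASSES) for _ in range(3)]
print(is_balanced(toy_labels))
```

With the real data, one way to build the label list would be `[int(y) for _, y in mnist_train]`, at the cost of one full pass over the dataset.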

Let's take a look at the shape of each image.

Each image is 28 pixels high and 28 pixels wide.
Since this is a grayscale dataset, the number of color channels is 1.


In [None]:
mnist_train[0][0].shape
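
As a rough back-of-the-envelope sketch, the shape also tells us how many numbers each image holds and how much memory the training images would need. The calculation assumes a channel-first shape of (1, 28, 28) as described in the text, and it assumes every image is held as 32-bit floats at once, which is an upper-bound scenario rather than a statement about how the dataset is stored.

```python
import math

image_shape = (1, 28, 28)   # (channels, height, width), as described in the text
values_per_image = math.prod(image_shape)

num_train_images = 60_000
bytes_per_float32 = 4

# Rough upper bound, assuming every training image were held as float32.
total_bytes = num_train_images * values_per_image * bytes_per_float32
print(values_per_image)
print(f"{total_bytes / 2**20:.1f} MiB")
```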

We can also list the categories of the Fashion-MNIST dataset: t-shirt, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot.

The function `get_fashion_mnist_labels` returns text labels for the Fashion-MNIST dataset. 

In [None]:
def get_fashion_mnist_labels(labels):  #@save
    """Return text labels for the Fashion-MNIST dataset."""
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]
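
A quick way to see the mapping in action is to call the function on a few plain integers. The snippet repeats the function so that it runs on its own, and the indices are arbitrary examples rather than labels read from the dataset.

```python
def get_fashion_mnist_labels(labels):
    """Standalone copy of the function above, repeated so this runs alone."""
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]


# Arbitrary example indices, not taken from the dataset.
names = get_fashion_mnist_labels([0, 9, 5])
print(names)  # ['t-shirt', 'ankle boot', 'sandal']
```

Because each element is passed through `int`, the function accepts plain Python integers as well as anything else that `int` can convert.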

And the function `show_images` visualizes these examples.

In [None]:
def show_images(imgs, num_rows, num_cols, titles=None, scale=1.5):  #@save
    """Plot a list of images."""
    figsize = (num_cols * scale, num_rows * scale)
    _, axes = d2l.plt.subplots(num_rows, num_cols, figsize=figsize)
    axes = axes.flatten()
    for i, (ax, img) in enumerate(zip(axes, imgs)):
        if torch.is_tensor(img):
            # Tensor Image
            ax.imshow(img.numpy())
        else:
            # PIL Image
            ax.imshow(img)
        ax.axes.get_xaxis().set_visible(False)
        ax.axes.get_yaxis().set_visible(False)
        if titles:
            ax.set_title(titles[i])
    return axes  

Here is a visualization of the images and their corresponding text labels for the first few examples in the training dataset.



In [None]:
X, y = next(iter(data.DataLoader(mnist_train, batch_size=18)))
show_images(X.reshape(18, 28, 28), 2, 9, titles=get_fashion_mnist_labels(y));

## Reading a Minibatch



To make our life easier when reading from the training and test sets, we use the built-in data iterator rather than creating one from scratch. Recall that at each iteration, a data loader reads a minibatch of data with size `batch_size`.

We also randomly shuffle the examples through the iterator. Note that shuffling is only needed for the training dataset.

In [None]:
batch_size = 256

def get_dataloader_workers():  #@save
    """Use 4 processes to read the data."""
    return 4

train_iter = data.DataLoader(mnist_train, batch_size, shuffle=True,
                             num_workers=get_dataloader_workers())
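
For intuition about what the built-in iterator spares us, here is a minimal from-scratch sketch of shuffled minibatching in pure Python. The generator `minibatch_indices` is a hypothetical helper that only yields example indices. It is not how `DataLoader` is implemented, and it ignores worker processes entirely.

```python
import math
import random


def minibatch_indices(num_examples, batch_size, shuffle, seed=0):
    """Yield lists of example indices, one list per minibatch."""
    indices = list(range(num_examples))
    if shuffle:
        # Seeded only to make this sketch reproducible.
        random.Random(seed).shuffle(indices)
    for start in range(0, num_examples, batch_size):
        yield indices[start:start + batch_size]


batches = list(minibatch_indices(60_000, 256, shuffle=True))
print(len(batches), math.ceil(60_000 / 256))   # number of minibatches
print(len(batches[0]), len(batches[-1]))       # first and last batch sizes
```

Since 60,000 is not a multiple of 256, the last minibatch is smaller than the others. `DataLoader` behaves the same way unless `drop_last=True` is passed.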

How long does it take to read the full training dataset? Let's take a look.

In [None]:
timer = d2l.Timer()
for X, y in train_iter:
    continue
f'{timer.stop():.2f} sec'
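
The same measurement can be sketched with the standard library alone, which may be handy outside the `d2l` package. The helper `time_one_pass` is a hypothetical name, and a plain `range` stands in for the data iterator so that the snippet runs anywhere.

```python
import time


def time_one_pass(iterable):
    """Return the seconds taken to iterate once over `iterable`."""
    start = time.perf_counter()
    for _ in iterable:
        pass
    return time.perf_counter() - start


# Stand-in iterable; with the notebook's objects one would pass `train_iter`.
elapsed = time_one_pass(range(100_000))
print(f'{elapsed:.4f} sec')
```

Timing a full pass like this helps check that data loading is not the bottleneck; the result will depend on the machine and on the number of worker processes.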

## Putting All Things Together



Now it is time to put everything together. We define the `load_data_fashion_mnist` function, which downloads and reads the Fashion-MNIST dataset. It returns the data iterators for both the training set and the test set. In addition, it accepts an optional argument to resize images to another shape.

In [None]:
def load_data_fashion_mnist(batch_size, resize=None):  #@save
    """Download the Fashion-MNIST dataset and then load it into memory."""
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans) 
    mnist_train = torchvision.datasets.FashionMNIST(
        root="../data", train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.FashionMNIST(
        root="../data", train=False, transform=trans, download=True)
    return (data.DataLoader(mnist_train, batch_size, shuffle=True,
                            num_workers=get_dataloader_workers()),
            data.DataLoader(mnist_test, batch_size, shuffle=False,
                            num_workers=get_dataloader_workers()))
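
One detail worth spelling out is the `trans.insert(0, ...)` call: it places the resize step in front of `ToTensor`, so resizing happens on the original image before the conversion to a tensor. The sketch below illustrates this ordering with a hypothetical `compose` helper and stand-in steps that only record their names. It mirrors the pattern, not the actual `transforms.Compose` implementation.

```python
def compose(steps):
    """Return a function that applies `steps` in list order."""
    def apply(value):
        for step in steps:
            value = step(value)
        return value
    return apply


# Hypothetical stand-ins that only record which step ran.
def fake_resize(trace):
    return trace + ['resize']


def fake_to_tensor(trace):
    return trace + ['to_tensor']


steps = [fake_to_tensor]
steps.insert(0, fake_resize)      # same pattern as `trans.insert(0, ...)`
order = compose(steps)([])
print(order)  # ['resize', 'to_tensor']
```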

Let's test the image resizing feature of the `load_data_fashion_mnist` function by setting the `resize` argument to 64, which resizes each 28 x 28 image to 64 x 64 pixels.

In [None]:
train_iter, test_iter = load_data_fashion_mnist(32, resize=64)
for X, y in train_iter:
    print(X.shape, X.dtype, y.shape, y.dtype)
    break

Great job! The Fashion-MNIST dataset is now ready for training, and we will move on to the more complex topic of model training in the next section.