# Loading datasets which are available in the framework
## Problem
You want to load a dataset (e.g., MNIST) that ships with the framework.

## Solution in TensorFlow

TensorFlow provides the datasets through the `tensorflow.keras.datasets` module, which contains one submodule per dataset. With TensorFlow 2.0 and 1.4, the following dataset modules are available:

* Boston housing price regression dataset (`boston_housing`).
* CIFAR10 small images classification dataset (`cifar10`).
* CIFAR100 small images classification dataset (`cifar100`).
* Fashion-MNIST dataset (`fashion_mnist`).
* IMDB sentiment classification dataset (`imdb`).
* MNIST handwritten digits dataset (`mnist`).
* Reuters topic classification dataset (`reuters`).

Each submodule's `load_data()` function returns two tuples, the training set first and the test set second, each holding data and labels. Depending on the dataset, the function accepts different arguments, for example, `test_split` in the Boston housing dataset, which sets the fraction of the data to reserve as the test set.

```python
import tensorflow.keras.datasets as datasets

(train_x, train_y), (test_x, test_y) = datasets.mnist.load_data()
```
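
The `test_split` argument mentioned above is a fraction of the whole dataset. The following is a minimal sketch of what such a fraction means, using NumPy on synthetic data. The helper `split_train_test` and the stand-in arrays are hypothetical, and the code is not the Keras implementation; it only mirrors the shape of the value that `load_data()` returns.

```python
import numpy as np


def split_train_test(x, y, test_split=0.2, seed=113):
    """Hypothetical helper: shuffle, then hold out the last `test_split` fraction."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))  # same random order for data and labels
    x, y = x[order], y[order]
    n_train = int(len(x) * (1 - test_split))
    return (x[:n_train], y[:n_train]), (x[n_train:], y[n_train:])


# Synthetic stand-in data: 50 samples with 2 features each.
features = np.arange(100, dtype=np.float32).reshape(50, 2)
targets = np.arange(50, dtype=np.float32)

(train_x, train_y), (test_x, test_y) = split_train_test(features, targets, test_split=0.2)
print(train_x.shape, test_x.shape)  # (40, 2) (10, 2)
```

With `test_split=0.2`, 10 of the 50 samples end up in the test set and the remaining 40 in the training set.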

## Solution in PyTorch

PyTorch comes with an optional `torchvision` package, which contains image and video datasets. A dataset is accessed through a `Dataset` class (e.g., `MNIST`) together with the `DataLoader` class. The `Dataset` class determines which dataset is used (including the choice between training and test set) and which transformations are applied. The `DataLoader` wraps the dataset in an iterable and lets you configure the batch size and the sampling strategy. The `torchvision` package comes with an extensive list of datasets in the `torchvision.datasets` package. The following dataset classes are available:

* MNIST dataset of handwritten digits (`MNIST`).
* Fashion-MNIST dataset (`FashionMNIST`).
* Kuzushiji-MNIST dataset of handwritten Japanese characters (`KMNIST`).
* Extension of MNIST dataset to handwritten letters (`EMNIST`).
* Recreation of the MNIST dataset (`QMNIST`).
* COCO captions dataset (`CocoCaptions`).
* COCO detection dataset (`CocoDetection`).
* Large scale image understanding dataset (`LSUN`).
* ImageNet 2012 classification dataset (`ImageNet`).
* CIFAR10 small images classification dataset (`CIFAR10`).
* CIFAR100 small images classification dataset (`CIFAR100`).
* STL10 small images for unsupervised feature learning dataset (`STL10`).
* Street View House Numbers dataset (`SVHN`).
* PhotoTour dataset (`PhotoTour`).
* SBU captions dataset (`SBU`).
* Flickr8k captions dataset (`Flickr8k`).
* Flickr30k captions dataset (`Flickr30k`).
* Pascal VOC segmentation dataset (`VOCSegmentation`).
* Pascal VOC detection dataset (`VOCDetection`).
* Cityscapes segmentation dataset (`Cityscapes`).
* SBD semantic contours dataset (`SBDataset`).
* USPS handwritten digits dataset (`USPS`).
* Kinetics-400 action recognition videos dataset (`Kinetics400`).
* HMDB51 action recognition videos dataset (`HMDB51`).
* UCF101 action recognition videos dataset (`UCF101`).


```python
import torch.utils.data as data
import torchvision.datasets as datasets
import torchvision.transforms as transforms

# download=True fetches MNIST into `root` on first use; without it, loading fails if the files are missing.
dataset = datasets.MNIST(root='~/.torch/MNIST', train=True, download=True, transform=transforms.ToTensor())
dataloader = data.DataLoader(dataset)

for train_x, train_y in dataloader:
    # Training loop
    pass
```

The `train` argument selects the training set (`True`) or the test set (`False`). The `transform` argument specifies which transformations to apply to each sample. In this example, it only converts an image into a tensor. More complex transformation chains can be built with `transforms.Compose`.

## Discussion

With the Boston housing, IMDB, and Reuters datasets, TensorFlow offers more variety, since it also covers tabular and text data, whereas the `torchvision` datasets are limited to images and video. A significant difference between TensorFlow and PyTorch is that TensorFlow provides a training loop via the model's `fit` method, whereas in PyTorch users have to write the loop themselves and therefore need an iterable over batches (`DataLoader`).
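
To make the contrast concrete, here is a minimal sketch of a hand-written PyTorch training loop driven by a `DataLoader`. It assumes synthetic random data in place of MNIST so that it runs offline; the model, loss, optimizer, and hyperparameters are illustrative choices and not part of the recipe above.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for MNIST: 64 random 1x28x28 "images" with labels 0-9.
images = torch.rand(64, 1, 28, 28)
labels = torch.randint(0, 10, (64,))

# batch_size groups samples into batches; shuffle=True reorders them every epoch.
loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

num_batches = 0
for epoch in range(2):
    for batch_x, batch_y in loader:
        optimizer.zero_grad()                    # clear gradients from the previous step
        loss = loss_fn(model(batch_x), batch_y)  # forward pass and loss
        loss.backward()                          # backpropagation
        optimizer.step()                         # parameter update
        num_batches += 1

print(num_batches)  # 8 (4 batches per epoch, 2 epochs)
```

In Keras, the two nested loops would roughly correspond to a single call to the model's `fit` method with `epochs=2` and `batch_size=16`.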