# Torchvision Built-in datasets

Torchvision provides many built-in datasets in the `torchvision.datasets` module, as well as utility classes for building your own datasets.

All datasets are subclasses of `torch.utils.data.Dataset`, i.e., they implement the `__getitem__` and `__len__` methods. Hence, they can all be passed to a `torch.utils.data.DataLoader`, which can load multiple samples in parallel using `torch.multiprocessing` workers. First, import the required libraries:

In [None]:
import os
import torch
import torchvision
import matplotlib.pyplot as plt
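To make the `__getitem__`/`__len__` contract concrete, here is a minimal sketch of a toy map-style dataset that a `DataLoader` batches the same way it batches a built-in torchvision dataset. `SquaresDataset` is a hypothetical class written for illustration, not part of torchvision. `num_workers=0` keeps loading in the main process; setting it above zero would load batches in worker processes, as described above.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (tensor([i]), i**2)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # Number of samples; the DataLoader uses this to generate indices.
        return self.n

    def __getitem__(self, idx):
        # Return one (input, label) pair, like torchvision datasets do.
        x = torch.tensor([float(idx)])
        y = idx ** 2
        return x, y


dataset = SquaresDataset(10)
# num_workers=0 loads in the main process; use num_workers > 0 for parallel workers.
loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0)

batches = list(loader)
for xb, yb in batches:
    # The default collate function stacks inputs and turns integer labels into a tensor.
    print(xb.shape, yb.tolist())
```

With 10 samples and `batch_size=4`, the loader yields two full batches and a final batch of 2.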


## Example: MNIST

In [None]:
data_folder = '../data/'
dataset_name = 'MNIST'
dataset_folder = os.path.join(data_folder, dataset_name)

In [None]:
if os.path.exists(dataset_folder):
    print(f"Using existing dataset folder: {dataset_folder}")
else:
    print(f"Dataset folder '{dataset_folder}' does not exist. Downloading dataset...")

# download=True is safe to pass every time: torchvision checks whether the MNIST
# files already exist under data_folder and only downloads them if they are missing.
# (Without it, an existing but incomplete folder would raise a RuntimeError.)
mnist_data = torchvision.datasets.MNIST(data_folder,
                                        train=True,
                                        transform=torchvision.transforms.ToTensor(),
                                        download=True)
# NOTE:
# transform=torchvision.transforms.ToTensor(): Converts the PIL Image or NumPy array to a
# C x H x W torch.FloatTensor and scales uint8 pixel values from [0, 255] to [0.0, 1.0].

In [None]:
print(mnist_data)

In [None]:
# Get the first image and its label
image, label = mnist_data[0] 

print("Image shape:", image.shape)
print("Label:", label)
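Torchvision image tensors are laid out as channels x height x width, so an MNIST sample has shape `torch.Size([1, 28, 28])`. Matplotlib's `imshow` expects H x W (grayscale) or H x W x C (color), which is why the plotting cell that follows calls `image.squeeze()`. That only works for single-channel images; for an RGB dataset such as CIFAR-10 (3 x 32 x 32) the channel axis has to be moved last instead. A minimal sketch, using a hypothetical helper `to_imshow_array` and random tensors in place of real samples:

```python
import torch


def to_imshow_array(image):
    """Hypothetical helper: convert a C x H x W tensor into a layout imshow accepts."""
    if image.dim() != 3:
        raise ValueError(f"expected a C x H x W tensor, got shape {tuple(image.shape)}")
    if image.shape[0] == 1:
        return image.squeeze(0).numpy()  # 1 x H x W -> H x W (plot with cmap='gray')
    return image.permute(1, 2, 0).numpy()  # C x H x W -> H x W x C


gray = torch.rand(1, 28, 28)  # shaped like an MNIST sample
rgb = torch.rand(3, 32, 32)  # shaped like a CIFAR-10 sample

print(to_imshow_array(gray).shape)  # (28, 28)
print(to_imshow_array(rgb).shape)  # (32, 32, 3)
```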

In [None]:
# Plotting the first 10 images in the dataset

fig, ax = plt.subplots(2, 5, figsize=(12, 5))

for i in range(10):
    image, label = mnist_data[i]
    print(f"Image {i} - Label: {label}")
    ax[i // 5, i % 5].imshow(image.squeeze(), cmap='gray')
    ax[i // 5, i % 5].set_title(f"Label: {label}")
    ax[i // 5, i % 5].axis('off')
plt.show()


## Exercises

Explore the list of available datasets in `Torchvision`:

https://docs.pytorch.org/vision/stable/datasets.html

Choose one dataset and inspect/visualize its contents:
- How many examples does it contain?
- What is the dimensionality of each input example?
- What are the labels?
- What type of study can you do with this dataset?
