# What is a custom dataset?

A custom dataset is a collection of data that was not created specifically as input for neural networks, so we need to adjust it before we can use it to train and test our models.

We use different PyTorch domain libraries depending on our problem space. These include:

- Vision - `torchvision`
- Text - `torchtext`
- Audio - `torchaudio`
- Recommendation Systems - `torchrec`

TorchData (`torchdata`) is also a newer library we can use for all sorts of data.
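Before picking a library, it can help to see which of these packages are available in the current environment. The sketch below is not part of the original notebook; it uses only the standard library's `importlib.util.find_spec`, and the `DOMAIN_LIBRARIES` mapping and `check_domain_libraries` helper are names made up for illustration.

```python
from importlib.util import find_spec

# Domain -> importable package name, taken from the list above (plus TorchData).
DOMAIN_LIBRARIES = {
    "vision": "torchvision",
    "text": "torchtext",
    "audio": "torchaudio",
    "recommendation systems": "torchrec",
    "general data loading": "torchdata",
}


def check_domain_libraries(libraries):
    """Return {package_name: bool} saying whether each package can be found."""
    # find_spec locates a top-level package without importing it.
    return {package: find_spec(package) is not None for package in libraries.values()}


availability = check_domain_libraries(DOMAIN_LIBRARIES)
for package, is_installed in availability.items():
    status = "installed" if is_installed else "not installed"
    print(f"{package}: {status}")
```

Any package reported as not installed needs to be installed separately before it can be imported.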

## FoodVisionMini

FoodVisionMini will be a CNN trained on custom data, following the PyTorch workflow below:

- Get data ready (turn into tensors)
- Pick a loss function and optimizer
- Build or pick a pretrained model to suit our problem
- Fit the model to the data and make a prediction
- Evaluate the model
- Improve the model through experimentation
- Save and reload the trained model

In [2]:
import torch
from torch import nn

# Set up device-agnostic code: use the GPU if one is available, otherwise the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

## 1. Get Data

Food101 contains 101 different classes of food and 1,000 images per class.

Our dataset, a subset of Food101, contains just 3 classes and 100 images per class.

When starting out on ML projects, it's important to try things on a small scale first and increase the scale only when necessary. The point is to speed up how fast we can experiment: if every model trains on the full dataset, each experiment takes far longer.
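To make the idea of "starting small" concrete, here is a rough sketch of cutting a larger dataset down to a few classes and a fraction of the items in each. It is not how the subset used here was actually produced; `sample_subset` is a hypothetical helper, and the file names (including the extra `ramen` and `tacos` classes) are placeholders standing in for real image paths.

```python
import random


def sample_subset(paths_by_class, target_classes, fraction, seed=42):
    """Keep only target_classes and a random fraction of each class's items."""
    rng = random.Random(seed)  # local RNG so the same subset is drawn every run
    subset = {}
    for class_name in target_classes:
        paths = sorted(paths_by_class[class_name])
        keep = max(1, round(len(paths) * fraction))
        subset[class_name] = rng.sample(paths, keep)
    return subset


# Hypothetical stand-in for a full dataset: 5 classes with 1000 file names each.
full_dataset = {
    name: [f"{name}/{i:04d}.jpg" for i in range(1000)]
    for name in ["pizza", "steak", "sushi", "ramen", "tacos"]
}

small_dataset = sample_subset(full_dataset, ["pizza", "steak", "sushi"], fraction=0.1)
for class_name, paths in small_dataset.items():
    print(class_name, len(paths))
```

Fixing the seed keeps the subset stable between runs, so experiments on the small dataset stay comparable.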

In [3]:
import requests
import zipfile
from pathlib import Path


# Set up the path to a data folder
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"

if image_path.is_dir():
    print(f"{image_path} directory already exists... skipping download")
else:
    print(f"{image_path} does not exist, creating one...")
    image_path.mkdir(parents=True, exist_ok=True)

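    # Download the zip archive (the /raw/ URL serves the file itself; /blob/ serves an HTML page)
    request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip")
    request.raise_for_status()  # stop early if the download failed
    with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
        f.write(request.content)

    # Unzip the archive into pizza_steak_sushi
    with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:
        zip_ref.extractall(image_path)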

data/pizza_steak_sushi directory already exists... skipping download
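A download can succeed at the HTTP level and still leave us with something that is not a zip archive (for example, an HTML page saved under a `.zip` name). The following sketch shows one way to check a file before extracting it, using the standard library's `zipfile.is_zipfile`. The `is_usable_zip` helper is a hypothetical name, and the demo runs on temporary placeholder files rather than the real download.

```python
import tempfile
import zipfile
from pathlib import Path


def is_usable_zip(archive_path):
    """Return True if archive_path exists and looks like a readable zip archive."""
    archive_path = Path(archive_path)
    return archive_path.is_file() and zipfile.is_zipfile(archive_path)


# Demonstration on temporary files instead of the real dataset archive.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # A real (tiny) zip archive with one placeholder entry.
    good = tmp / "good.zip"
    with zipfile.ZipFile(good, "w") as zf:
        zf.writestr("train/pizza/example.txt", "placeholder")

    # A file with a .zip name that actually contains HTML.
    bad = tmp / "bad.zip"
    bad.write_text("<html>not a zip archive</html>")

    good_ok = is_usable_zip(good)
    bad_ok = is_usable_zip(bad)
    with zipfile.ZipFile(good) as zf:
        good_names = zf.namelist()  # list entries without extracting them

print(good_ok, bad_ok, good_names)
```

Running a check like this on `data_path / "pizza_steak_sushi.zip"` before calling `extractall` gives a clearer failure than an exception raised partway through extraction.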
