# 04. PyTorch Custom Datasets 

We've used some datasets with PyTorch before, but how do we get our own data into PyTorch?

One way to do this is with custom datasets.

## Domain Libraries

Depending on what we're working on (e.g. vision, text, audio), we can look into the PyTorch domain libraries (such as torchvision, torchtext and torchaudio) for existing data loading functions and customizable data loading functions.

In [2]:
# Set up device-agnostic code
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

## 1. Get Data

In [3]:
import requests
import zipfile
from pathlib import Path

# Set path to the data folder
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"

# Check whether the image directory already exists
if image_path.is_dir():
    print(f"{image_path} directory already exists")
else:
    print(f"{image_path} directory is being created")
    image_path.mkdir(parents=True, exist_ok=True)

# Download the dataset
print("Downloading dataset")
request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip")
request.raise_for_status()  # fail early on an HTTP error instead of saving an error page
with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
    f.write(request.content)

# Extract the data from the zip file
with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:
    print("Unzipping pizza, steak and sushi data")
    zip_ref.extractall(image_path)

data\pizza_steak_sushi directory is being created
Downloading dataset
Unzipping pizza, steak and sushi data
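
Before extracting an archive, it can be useful to peek at what it contains. The sketch below is only an illustration: `count_zip_entries` is a hypothetical helper (not part of the notebook), and the tiny in-memory archive with its file names is made up to stand in for the real download.

```python
import io
import zipfile
from collections import Counter


def count_zip_entries(zip_bytes: bytes) -> dict:
    """Count the files under each top-level folder of a zip archive."""
    counts = Counter()
    with zipfile.ZipFile(io.BytesIO(zip_bytes), "r") as zip_ref:
        for name in zip_ref.namelist():
            if name.endswith("/"):  # skip explicit directory entries
                continue
            top_level = name.split("/")[0]
            counts[top_level] += 1
    return dict(counts)


# Build a tiny in-memory archive (made-up file names) as a stand-in
# for the real pizza_steak_sushi.zip download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zip_ref:
    zip_ref.writestr("train/pizza/0.jpg", b"fake")
    zip_ref.writestr("train/steak/1.jpg", b"fake")
    zip_ref.writestr("test/sushi/2.jpg", b"fake")

zip_counts = count_zip_entries(buffer.getvalue())
print(zip_counts)
```

With the real download, the same helper could presumably be called on `request.content` before writing anything to disk.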


## 2. Becoming One With the Data (Data Preparation and Exploration)

In [6]:
import os


def walk_through_dir(dir_path):
    """Print the number of subdirectories and images in each directory under dir_path."""
    for dirpath, dirnames, filenames in os.walk(dir_path):
        print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'")

In [7]:
walk_through_dir(image_path)

There are 2 directories and 0 images in 'data\pizza_steak_sushi'
There are 3 directories and 0 images in 'data\pizza_steak_sushi\test'
There are 0 directories and 25 images in 'data\pizza_steak_sushi\test\pizza'
There are 0 directories and 19 images in 'data\pizza_steak_sushi\test\steak'
There are 0 directories and 31 images in 'data\pizza_steak_sushi\test\sushi'
There are 3 directories and 0 images in 'data\pizza_steak_sushi\train'
There are 0 directories and 78 images in 'data\pizza_steak_sushi\train\pizza'
There are 0 directories and 75 images in 'data\pizza_steak_sushi\train\steak'
There are 0 directories and 72 images in 'data\pizza_steak_sushi\train\sushi'
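
The printed walk is easy to read but hard to reuse. As a rough sketch, the same per-class counts can be collected into a dictionary with `pathlib`. `count_images_per_class` is a hypothetical helper, the `*.jpg` pattern is an assumption about the file extension, and the temporary folder tree only imitates the `train/<class>/` layout shown above.

```python
import tempfile
from pathlib import Path


def count_images_per_class(split_dir, pattern="*.jpg"):
    """Return {class_name: image_count} for each class subfolder of split_dir."""
    split_dir = Path(split_dir)
    return {
        class_dir.name: len(list(class_dir.glob(pattern)))
        for class_dir in sorted(split_dir.iterdir())
        if class_dir.is_dir()
    }


# Build a throwaway folder tree that imitates train/<class>/<image>.jpg.
with tempfile.TemporaryDirectory() as tmp:
    demo_train = Path(tmp) / "train"
    for class_name, n_images in [("pizza", 3), ("steak", 2)]:
        (demo_train / class_name).mkdir(parents=True)
        for i in range(n_images):
            (demo_train / class_name / f"{i}.jpg").touch()
    demo_counts = count_images_per_class(demo_train)

print(demo_counts)
```

On the real data this would be called as `count_images_per_class(image_path / "train")`, assuming the images really are `.jpg` files.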


In [8]:
# Set up training and test paths
train_dir = image_path / "train"
test_dir = image_path / "test"

train_dir, test_dir

(WindowsPath('data/pizza_steak_sushi/train'),
 WindowsPath('data/pizza_steak_sushi/test'))
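
Because each class lives in its own subfolder, the class names can be read straight off the directory names. The following is a minimal sketch under that assumption. `find_class_names` is a hypothetical helper and the temporary directory is a stand-in for `train_dir`.

```python
import tempfile
from pathlib import Path


def find_class_names(directory):
    """Return the sorted class names and a class-name-to-index mapping."""
    class_names = sorted(
        entry.name for entry in Path(directory).iterdir() if entry.is_dir()
    )
    if not class_names:
        raise FileNotFoundError(f"No class folders found in {directory}")
    class_to_idx = {name: idx for idx, name in enumerate(class_names)}
    return class_names, class_to_idx


# Stand-in for train_dir: one empty folder per class, created out of order.
with tempfile.TemporaryDirectory() as tmp:
    demo_train_dir = Path(tmp) / "train"
    for name in ["sushi", "pizza", "steak"]:
        (demo_train_dir / name).mkdir(parents=True)
    demo_class_names, demo_class_to_idx = find_class_names(demo_train_dir)

print(demo_class_names)
print(demo_class_to_idx)
```

Sorting the names keeps the index assigned to each class stable across runs and operating systems, which matters once labels are turned into integers.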