<a href="https://colab.research.google.com/github/nmermigas/PyTorch/blob/main/05_PyTorch_Going_Modular.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Overview:

In this section, we take a condensed version of the 04_custom_datasets notebook and turn it into Python scripts that can be run in script mode.

The code below comes from notebook 04.

## 1. Get Data

In [1]:
import os
import requests
import zipfile
from pathlib import Path

# Setup path to data folder
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"

# If the image folder doesn't exist, download it and prepare it...
if image_path.is_dir():
    print(f"{image_path} directory exists.")
else:
    print(f"Did not find {image_path} directory, creating one...")
    image_path.mkdir(parents=True, exist_ok=True)

    # Download pizza, steak, sushi data
    with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
        print("Downloading pizza, steak, sushi data...")
        request = requests.get("https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip")
        f.write(request.content)

    # Unzip pizza, steak, sushi data
    with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:
        print("Unzipping pizza, steak, sushi data...")
        zip_ref.extractall(image_path)

    # Remove zip file
    os.remove(data_path / "pizza_steak_sushi.zip")

Did not find data/pizza_steak_sushi directory, creating one...
Downloading pizza, steak, sushi data...
Unzipping pizza, steak, sushi data...


In [8]:

# Set up training and testing paths
train_dir = image_path / "train"
test_dir = image_path / "test"

In [9]:

from torchvision import transforms

# Write a transform for the images
data_transform = transforms.Compose([
    # Resize our images to 64x64
    transforms.Resize(size=(64,64)),
    # Flip the images randomly on the horizontal
    transforms.RandomHorizontalFlip(p=0.5),
    # Turn the image into a torch.Tensor
    transforms.ToTensor()
])

## 2. Create Datasets and DataLoaders (script mode)

Let's use a Jupyter magic command to create a `.py` file for creating DataLoaders.

We can save a code cell's contents to a file using the Jupyter cell magic `%%writefile filename`.
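
As a rough illustration of what `%%writefile` does, the sketch below reproduces the same idea outside a notebook: it writes a string of source code to a `.py` file with `pathlib` and then imports the result. The directory `demo_modular`, the module `hello.py`, and the function `greet` are hypothetical names chosen only for this example; they are not part of the notebook's code.

```python
import importlib
import sys
from pathlib import Path

# Hypothetical package directory and module name, used only for this sketch
package_dir = Path("demo_modular")
package_dir.mkdir(parents=True, exist_ok=True)

# The text a notebook cell would contain below the %%writefile line
source = '''"""A tiny example module."""

def greet(name: str) -> str:
    return f"Hello, {name}!"
'''

# Writing the text to disk is essentially what %%writefile does with a cell
module_path = package_dir / "hello.py"
module_path.write_text(source)

# Make the current directory importable, then import the new module
sys.path.insert(0, str(Path.cwd()))
importlib.invalidate_caches()
hello = importlib.import_module("demo_modular.hello")

print(hello.greet("PyTorch"))
```

Once the file is on disk, any other script or notebook can import it, which is the point of going modular.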

In [4]:
# Create a directory for going modular scripts
import os
os.makedirs("going_modular", exist_ok=True)  # Don't fail if the directory already exists

In [14]:
%%writefile going_modular/data_setup.py

"""
Contains functionality for creating PyTorch DataLoaders for
image classification data.

"""
import os
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# os.cpu_count() can return None, so fall back to 0 (load data in the main process)
NUM_WORKERS = os.cpu_count() or 0

def create_dataloaders(
  train_dir: str,
  test_dir: str,
  transform: transforms.Compose,
  batch_size: int,
  num_workers: int = NUM_WORKERS
):
  """Creates training and testing DataLoaders.

  Takes in a training directory and a testing directory path and turns
  them into PyTorch Datasets and then into PyTorch DataLoaders.

  Args:
    train_dir: Path to training directory.
    test_dir: Path to testing directory.
    transform: torchvision transforms to perform on training and testing data.
    batch_size: Number of samples per batch in each of the DataLoaders.
    num_workers: An integer for number of workers per DataLoader.

  Returns:
    A tuple of (train_dataloader, test_dataloader, class_names),
    where class_names is a list of the target classes.

  Example usage:
    train_dataloader, test_dataloader, class_names = create_dataloaders(
        train_dir, test_dir, transform, batch_size, num_workers)
  """
  # Use ImageFolder to create datasets
  train_data = datasets.ImageFolder(train_dir, transform = transform)
  test_data = datasets.ImageFolder(test_dir, transform = transform)

  # Get class names
  class_names = train_data.classes

  # Turn datasets into DataLoaders
  train_dataloader = DataLoader(
      train_data,
      batch_size = batch_size,
      shuffle = True,
      pin_memory=True,
      num_workers = num_workers
  )

  test_dataloader = DataLoader(
      test_data,
      batch_size = batch_size,
      shuffle = False,
      pin_memory=True,
      num_workers = num_workers
  )

  return train_dataloader, test_dataloader, class_names

Overwriting going_modular/data_setup.py


In [17]:
from going_modular import data_setup

train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(
    train_dir=train_dir,
    test_dir=test_dir,
    transform=data_transform,
    batch_size=32,
    num_workers=2,
)
print(train_dataloader)

<torch.utils.data.dataloader.DataLoader object at 0x7d5dfbfee4a0>
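
To see where `class_names` comes from: `datasets.ImageFolder` expects one subdirectory per class and uses the sorted subdirectory names as the class labels. The standard-library sketch below mimics that behaviour on a temporary folder. `find_class_names` is a hypothetical helper written for illustration, not a torchvision function, and the `pizza`/`steak`/`sushi` subfolders are an assumption about how the training directory is laid out.

```python
import tempfile
from pathlib import Path


def find_class_names(directory):
    """Return sorted subdirectory names and a name -> index mapping.

    Hypothetical helper that imitates how ImageFolder derives its classes.
    """
    class_names = sorted(p.name for p in Path(directory).iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(class_names)}
    return class_names, class_to_idx


# Build a throwaway folder with one (assumed) subdirectory per class
with tempfile.TemporaryDirectory() as tmp:
    for name in ["sushi", "pizza", "steak"]:
        (Path(tmp) / name).mkdir()
    demo_class_names, demo_class_to_idx = find_class_names(tmp)

print(demo_class_names)   # ['pizza', 'steak', 'sushi']
print(demo_class_to_idx)  # {'pizza': 0, 'steak': 1, 'sushi': 2}
```

Because the names are sorted, the label indices stay the same between the training and testing folders as long as both contain the same class subdirectories.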
