# 05. PyTorch Going Modular

This section is about turning Jupyter notebook code into Python scripts.

`Jupyter Notebook -> Python Scripts`

Going modular means turning a notebook into a series of Python scripts that offer the same functionality.

For example, we can turn our notebook code into the following Python files:
- `data_setup.py` - a file to prepare and download data if needed
- `engine.py` - a file containing various training functions
- `model_builder.py` - a file to create a PyTorch model
- `train.py` - a file to leverage all other files and train a target PyTorch model
- `utils.py` - a file dedicated to helpful utility functions

Libraries such as fast.ai's `nbdev` make it possible to write an entire Python library in Jupyter notebooks.

Workflow:
1. Start with a Jupyter notebook for quick experiments and visualization.
2. When something works, move the most useful code into Python scripts.
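
Step 2 of the workflow can be sketched in plain Python: the body of a working cell is saved as a module file and then imported. The file name `example_utils.py`, the function `add_one`, and the temporary folder are hypothetical and used only for illustration. Inside a notebook, the `%%writefile` cell magic serves the same purpose of saving a cell to a file.

```python
import importlib.util
import tempfile
from pathlib import Path

# Source of a "useful cell" we want to keep (hypothetical example function).
cell_source = '''
def add_one(x: int) -> int:
    """Return x + 1."""
    return x + 1
'''

# Write the cell's code to a script file (a temporary folder is used here).
script_dir = Path(tempfile.mkdtemp())
script_path = script_dir / "example_utils.py"
script_path.write_text(cell_source)

# Load the script as a module and call the function it defines.
spec = importlib.util.spec_from_file_location("example_utils", script_path)
example_utils = importlib.util.module_from_spec(spec)
spec.loader.exec_module(example_utils)

result = example_utils.add_one(41)
print(result)
```

Once the code lives in a file, any other script or notebook can import it instead of copying the cell.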

## What we are going to cover
1. Going Modular: Part 1 (cell mode)
2. Going Modular: Part 2 (script mode)

We will work towards converting our notebook into the files in the following folder structure:
```
going_modular/
├── going_modular/
│   ├── data_setup.py
│   ├── engine.py
│   ├── model_builder.py
│   ├── train.py
│   └── utils.py
├── models/
│   ├── 05_going_modular_cell_mode_tinyvgg_model.pth
│   └── 05_going_modular_script_mode_tinyvgg_model.pth
└── data/
    └── pizza_steak_sushi/
        ├── train/
        │   ├── pizza/
        │   │   ├── image01.jpeg
        │   │   └── ...
        │   ├── steak/
        │   └── sushi/
        └── test/
            ├── pizza/
            ├── steak/
            └── sushi/
```

## 1. Get Data

```python
import os
from pathlib import Path
from zipfile import ZipFile

import requests

data_path = Path('data')
image_data_path = data_path / 'pizza_steak_sushi'

# Create the image folder if it does not exist yet
if image_data_path.is_dir():
    print(f'{image_data_path} already exists')
else:
    print(f'Creating folders: {image_data_path}')
    image_data_path.mkdir(parents=True, exist_ok=True)

# Download the zipped dataset (this runs even if the folder already exists)
data_url = 'https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip'
zip_file_name = 'pizza_steak_sushi.zip'
with open(image_data_path / zip_file_name, 'wb') as f:
    print('downloading...')
    request = requests.get(data_url)
    request.raise_for_status()  # stop here if the server returned an error
    f.write(request.content)
    print('Done.')

# Extract the archive into the image folder
with ZipFile(image_data_path / zip_file_name, 'r') as zip_file:
    print('extracting...')
    zip_file.extractall(image_data_path)
    print('Extraction done')

# Remove the zip file once its contents are extracted
os.remove(image_data_path / zip_file_name)
```


```
data/pizza_steak_sushi already exists
downloading...
Done.
extracting...
Extraction done
```
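
The output above shows that the archive is fetched again even though the folder is already there. If re-downloading is unwanted, a guard along the following lines could wrap the download and extraction steps. This is a minimal sketch and not part of the original notebook; `needs_download` is a hypothetical helper name, and the demo runs on a temporary folder rather than the real `data/` directory.

```python
import tempfile
from pathlib import Path


def needs_download(image_dir: Path) -> bool:
    """Return True if image_dir is missing or contains no files at all."""
    if not image_dir.is_dir():
        return True
    # rglob("*") walks the folder recursively; any() stops at the first file.
    return not any(p.is_file() for p in image_dir.rglob("*"))


# Demonstration on a temporary folder.
root = Path(tempfile.mkdtemp())
demo_dir = root / "pizza_steak_sushi"

missing = needs_download(demo_dir)        # folder does not exist yet

(demo_dir / "train" / "pizza").mkdir(parents=True)
empty = needs_download(demo_dir)          # folders exist, but hold no files

(demo_dir / "train" / "pizza" / "image01.jpeg").touch()
populated = needs_download(demo_dir)      # at least one file is present

print(missing, empty, populated)
```

Checking for files rather than only for the folder avoids treating an empty folder left behind by a failed download as a complete dataset.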


Now we have this structure:

```
data/
└── pizza_steak_sushi/
    ├── train/
    │   ├── pizza/
    │   │   ├── train_image01.jpeg
    │   │   ├── train_image02.jpeg
    │   │   └── ...
    │   ├── steak/
    │   │   └── ...
    │   └── sushi/
    │       └── ...
    └── test/
        ├── pizza/
        │   ├── test_image01.jpeg
        │   └── test_image02.jpeg
        ├── steak/
        └── sushi/
```
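
A quick way to check a layout like this is to count the files in each `split/class` folder. The sketch below builds a tiny stand-in dataset in a temporary folder so that it runs without the real images. `count_images` is a hypothetical helper, and the file counts are made up for the demo rather than taken from the actual dataset.

```python
import tempfile
from pathlib import Path


def count_images(data_dir: Path) -> dict:
    """Map 'split/class' to the number of files found in that folder."""
    counts = {}
    for split_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            n_files = sum(1 for f in class_dir.iterdir() if f.is_file())
            counts[f"{split_dir.name}/{class_dir.name}"] = n_files
    return counts


# Build a small stand-in dataset with empty placeholder files.
demo_root = Path(tempfile.mkdtemp()) / "pizza_steak_sushi"
for split, n_images in [("train", 2), ("test", 1)]:
    for class_name in ["pizza", "steak", "sushi"]:
        class_dir = demo_root / split / class_name
        class_dir.mkdir(parents=True)
        for i in range(n_images):
            (class_dir / f"image{i:02d}.jpeg").touch()

counts = count_images(demo_root)
for folder, n in counts.items():
    print(f"{folder}: {n}")
```

Pointing `count_images` at `Path("data") / "pizza_steak_sushi"` would report the real per-class counts after the download step.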

## 2. Create Datasets and DataLoaders (`data_setup.py`)

```python
import os

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Default to one worker per CPU core (fall back to 0 if the count is unknown)
NUM_WORKERS = os.cpu_count() or 0


def create_dataloaders(
    train_dir: str,
    test_dir: str,
    transform: transforms.Compose,
    batch_size: int,
    num_workers: int = NUM_WORKERS
):
    """Creates training and testing DataLoaders.

    Args:
        train_dir: Path to the training directory
        test_dir: Path to the testing directory
        transform: torchvision transforms to perform on training and testing data
        batch_size: Number of samples per batch in each of the DataLoaders
        num_workers: An integer for number of workers per DataLoader

    Returns:
        A tuple of (train_dataloader, test_dataloader, class_names)
    """
    # Build datasets from folders laid out as <dir>/<class_name>/<image>
    train_data = datasets.ImageFolder(train_dir, transform=transform)
    test_data = datasets.ImageFolder(test_dir, transform=transform)

    # Get class names (ImageFolder exposes them as `classes`)
    class_names = train_data.classes

    # Turn the datasets into DataLoaders; only the training data is shuffled
    train_dataloader = DataLoader(dataset=train_data,
                                  batch_size=batch_size,
                                  num_workers=num_workers,
                                  shuffle=True)
    test_dataloader = DataLoader(dataset=test_data,
                                 batch_size=batch_size,
                                 num_workers=num_workers,
                                 shuffle=False)
    return train_dataloader, test_dataloader, class_names
```
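
`ImageFolder` takes its class names from the subfolder names of the directory it is given, sorted alphabetically, and exposes them through its `classes` attribute (with a matching `class_to_idx` mapping). The sketch below reproduces that idea with the standard library only, so it runs without torchvision or the image data. `find_class_names` is a hypothetical helper written for illustration, not a torchvision function.

```python
import tempfile
from pathlib import Path


def find_class_names(directory: Path):
    """Return (class_names, class_to_idx) based on sorted subfolder names."""
    class_names = sorted(p.name for p in directory.iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(class_names)}
    return class_names, class_to_idx


# Create empty class folders in a temporary training directory.
train_dir = Path(tempfile.mkdtemp()) / "train"
for name in ["sushi", "pizza", "steak"]:
    (train_dir / name).mkdir(parents=True)

class_names, class_to_idx = find_class_names(train_dir)
print(class_names)
print(class_to_idx)
```

Because the names are sorted, the label indices do not depend on the order in which the folders were created.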