# PyTorch Quickstart Tutorial

<i>This tutorial introduces the basics of using PyTorch.</i>

In [1]:
# For tips on running notebooks in Google Colab, see
# https://pytorch.org/tutorials/beginner/colab
%matplotlib inline

## Table of Contents
1. [Working with data](#working-with-data)
2. [Creating Models](#creating-models)
3. [Optimizing the Model Parameters](#optimizing-the-model-parameters)
4. [Saving Models](#saving-models)
5. [Loading Models](#loading-models)

## Working with data
- [Dataset](#dataset)
- [DataLoader](#dataloader)

##### Import Modules

In [4]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

### Dataset

#### What is `Dataset`?
- Storage of the samples and their corresponding labels
- `torchvision.datasets` module contains `Dataset` objects

##### Example Code

In [9]:
# Download training data from open datasets.

### FashionMNIST: 28x28 grayscale images in 10 categories (60000 training, 10000 test) ###
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),  ## "transform" the data to "tensor" format
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

##### For better understanding...

In [11]:
### The training set contains 60000 samples ###
print(len(training_data))

60000


In [21]:
### Each item of training_data is a tuple: (image tensor, label) ###
return_something = training_data[0]
# print(return_something)
print(type(return_something))  ## a 'tuple' of the form (img, label)

<class 'tuple'>


In [28]:
### ToTensor(): convert PIL Image or np.ndarray ---> torch.FloatTensor ###
    # It converts
    # PIL Image or np.ndarray -----------> torch.FloatTensor
    #   in the range [0, 255]           in the range [0.0, 1.0]
    #       (H x W x C)                       (C x H x W)
    # (Height x Width x Channel)      (Channel x Height x Width)

    # The scaling to [0.0, 1.0] applies if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1)
    # or if the numpy.ndarray has dtype = np.uint8

# Inspect one sample (index 0)
img, label = training_data[0]
print(img.shape)  # torch.Size([channel, height, width])
print(label)  # this image has label 9 (class "Ankle boot")

torch.Size([1, 28, 28])
9


##### Example: a custom `Dataset`
- Every TorchVision `Dataset` accepts two arguments that modify the samples and the labels, respectively:
    - `transform` (applied to each sample)
    - `target_transform` (applied to each label)

In [None]:
# import os
# import pandas as pd
# from torch.utils.data import Dataset
# from torchvision.io import read_image

# class CustomImageDataset(Dataset):
#     def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
#         self.img_labels = pd.read_csv(annotations_file)
#         self.img_dir = img_dir
#         self.transform = transform
#         self.target_transform = target_transform
    
#     def __len__(self):
#         return len(self.img_labels)
    
#     def __getitem__(self, idx):
#         img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
#         image = read_image(img_path)
#         label = self.img_labels.iloc[idx, 1]
#         if self.transform:
#             image = self.transform(image)
#         if self.target_transform:
#             label = self.target_transform(label)
#         return image, label
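
The commented-out class above needs image files and an annotations CSV to run. The same `transform` / `target_transform` pattern can be tried on in-memory tensors. In this sketch, `TinyTensorDataset`, `one_hot_10`, and the toy data are hypothetical names and values chosen for illustration.

```python
import torch
from torch.utils.data import Dataset


class TinyTensorDataset(Dataset):
    """Hypothetical in-memory dataset with the same two transform hooks."""

    def __init__(self, samples, labels, transform=None, target_transform=None):
        self.samples = samples
        self.labels = labels
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        sample, label = self.samples[idx], self.labels[idx]
        if self.transform:
            sample = self.transform(sample)  # modifies the sample only
        if self.target_transform:
            label = self.target_transform(label)  # modifies the label only
        return sample, label


def one_hot_10(label):
    # Turn an integer class index into a one-hot float vector of length 10.
    return torch.nn.functional.one_hot(torch.tensor(label), num_classes=10).float()


# Three made-up 1x2x2 "images" and their integer labels.
samples = torch.arange(12, dtype=torch.float32).reshape(3, 1, 2, 2)
labels = [9, 0, 3]

dataset = TinyTensorDataset(
    samples,
    labels,
    transform=lambda x: x / 11.0,  # rescale the toy pixel values to [0, 1]
    target_transform=one_hot_10,
)

sample, target = dataset[0]
print(sample.shape)
print(target)
```

Keeping the two hooks separate means the label encoding can change (for example, integer index versus one-hot vector) without touching the image pipeline.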

### DataLoader

#### What is `DataLoader`?
- Wraps an iterable around the `Dataset`
- Supports automatic batching, sampling, shuffling, and multiprocess data loading
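
As a small illustration of batching and shuffling, the sketch below wraps ten toy samples in a `TensorDataset` (a stand-in for a real dataset) and compares the default order with `shuffle=True`. The seeded `generator` is only there to make the shuffled order repeatable.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten toy samples whose label equals their index, so the order is easy to read.
features = torch.arange(10, dtype=torch.float32).unsqueeze(1)
labels = torch.arange(10)
toy_dataset = TensorDataset(features, labels)

# Default: samples are served in dataset order.
ordered_loader = DataLoader(toy_dataset, batch_size=4)

# shuffle=True draws a new permutation of the indices for every pass.
shuffled_loader = DataLoader(
    toy_dataset,
    batch_size=4,
    shuffle=True,
    generator=torch.Generator().manual_seed(0),
)

ordered = [y.tolist() for _, y in ordered_loader]
shuffled = [y.tolist() for _, y in shuffled_loader]

print(ordered)   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(shuffled)  # same ten labels, in a permuted order
```

Shuffling is typically enabled for the training loader and left off for the test loader, where order does not affect the metrics.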

##### Example Code

In [31]:
# Define a batch size of 64
batch_size = 64  ## each element in the dataloader iterable will return a batch of 64 features & labels

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

# Inspect the first batch to see what the dataloader yields
for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

# Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28]) => the dataloader fetches 64 samples and stacks them along a new leading batch dimension
# (C, H, W) ---------> (B, C, H, W)
#           (Batch_size, Channel, Height, Width)
# N and B both denote the batch size (64 here)

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
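
To make the (C, H, W) to (B, C, H, W) step concrete, the sketch below builds a batch by hand with `torch.stack` and compares it with what the `DataLoader` returns. The random tensors are placeholders with the FashionMNIST image shape, not real data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Eight placeholder "images" of shape (1, 28, 28) with random labels.
images = torch.rand(8, 1, 28, 28)
labels = torch.randint(0, 10, (8,))
toy_dataset = TensorDataset(images, labels)

# Take the first batch of four samples from the loader.
loader = DataLoader(toy_dataset, batch_size=4)
X, y = next(iter(loader))

# Build the same batch by hand: fetch four samples, stack along a new dim 0.
manual_X = torch.stack([toy_dataset[i][0] for i in range(4)])
manual_y = torch.stack([toy_dataset[i][1] for i in range(4)])

print(X.shape)                   # (B, C, H, W) with B = 4
print(torch.equal(X, manual_X))  # True
print(torch.equal(y, manual_y))  # True
```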


##### For better understanding...

In [32]:
print(f"Len DataLoader: {len(test_dataloader)}")   # test set: ceil(10000 / 64) = 157 batches
print(f"Len DataLoader: {len(train_dataloader)}")  # training set: ceil(60000 / 64) = 938 batches

Len DataLoader: 157
Len DataLoader: 938


In [33]:
60000 / 64  # not a whole number: the last training batch holds the remaining 32 samples

937.5
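
The 937.5 above explains the 938 training batches: by default the `DataLoader` keeps the final, smaller batch, so the count is rounded up. The sketch below checks this on a placeholder dataset of the same length and shows the effect of `drop_last=True`. Only the length of `toy_dataset` matters; its contents are zeros.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

batch_size = 64

# Placeholder with as many items as the FashionMNIST training split.
toy_dataset = TensorDataset(torch.zeros(60000, 1))

# Default (drop_last=False): the final partial batch is kept.
keep_last = DataLoader(toy_dataset, batch_size=batch_size)

# drop_last=True: the final partial batch is discarded.
drop_last = DataLoader(toy_dataset, batch_size=batch_size, drop_last=True)

print(len(keep_last), math.ceil(60000 / batch_size))  # 938 938
print(len(drop_last), 60000 // batch_size)            # 937 937
print(60000 - 937 * batch_size)                       # 32 samples in the last batch
```

Dropping the last batch can be useful when a model or loss assumes a fixed batch size, at the cost of skipping a few samples per epoch.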