# architecture of CNN

In [None]:
import os
import cv2
import matplotlib.pyplot as plt

# Relative path to the image directory
# './' refers to the current directory
image_dir = os.path.join('.', 'image')

# Construct the path to the image using os.path.join
image_path = os.path.join(image_dir, 'CNN_architecture.jpg')

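# Read the image (OpenCV loads color images in BGR channel order)
image = cv2.imread(image_path)

# cv2.imread returns None instead of raising when the file is missing or unreadable
if image is None:
    raise ValueError(f"Failed to load the image at {image_path}. The file may be missing, corrupted or in an unsupported format.")

# Convert BGR -> RGB so matplotlib displays the colors correctly
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

plt.axis('off')  # Hide the axes
plt.imshow(image)
plt.show()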

## What we're going to cover

We're going to apply the PyTorch Workflow we've been learning in the past couple of sections to computer vision.

![a PyTorch workflow with a computer vision focus](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/03-pytorch-computer-vision-workflow.png)

Specifically, we're going to cover:

| **Topic** | **Contents** |
| ----- | ----- |
| **0. Computer vision libraries in PyTorch** | PyTorch has a bunch of built-in helpful computer vision libraries, let's check them out.  |
| **1. Load data** | To practice computer vision, we'll start with some images of different pieces of clothing from [FashionMNIST](https://github.com/zalandoresearch/fashion-mnist). |
| **2. Prepare data** | We've got some images, let's load them in with a [PyTorch `DataLoader`](https://pytorch.org/docs/stable/data.html) so we can use them with our training loop. |
| **3. Model 0: Building a baseline model** | Here we'll create a multi-class classification model to learn patterns in the data, we'll also choose a **loss function**, **optimizer** and build a **training loop**. | 
| **4. Making predictions and evaluating model 0** | Let's make some predictions with our baseline model and evaluate them. |
| **5. Setup device agnostic code for future models** | It's best practice to write device-agnostic code, so let's set it up. |
| **6. Model 1: Adding non-linearity** | Experimenting is a large part of machine learning, let's try and improve upon our baseline model by adding non-linear layers. |
| **7. Model 2: Convolutional Neural Network (CNN)** | Time to get computer vision specific and introduce the powerful convolutional neural network architecture. |
| **8. Comparing our models** | We've built three different models, let's compare them. |
| **9. Evaluating our best model** | Let's make some predictions on random images and evaluate our best model. |
| **10. Making a confusion matrix** | A confusion matrix is a great way to evaluate a classification model, let's see how we can make one. |
| **11. Saving and loading the best performing model** | Since we might want to use our model for later, let's save it and make sure it loads back in correctly. |

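# 0. Computer vision libraries in PyTorch

> **Note:** How is `torchvision.datasets` different from `pip install datasets`? `torchvision.datasets` is a module that ships with `torchvision`, whereas Hugging Face `datasets` is a separate package that has to be installed and imported on its own.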

Before we get started writing code, let's talk about some PyTorch computer vision libraries you should be aware of.

| PyTorch module | What does it do? |
| ----- | ----- |
| [`torchvision`](https://pytorch.org/vision/stable/index.html) | Contains datasets, model architectures and image transformations often used for computer vision problems. |
| [`torchvision.datasets`](https://pytorch.org/vision/stable/datasets.html) | Here you'll find many example computer vision datasets for a range of problems from image classification, object detection, image captioning, video classification and more. It also contains [a series of base classes for making custom datasets](https://pytorch.org/vision/stable/datasets.html#base-classes-for-custom-datasets). |
| [`torchvision.models`](https://pytorch.org/vision/stable/models.html) | This module contains well-performing and commonly used computer vision model architectures implemented in PyTorch, you can use these with your own problems. | 
| [`torchvision.transforms`](https://pytorch.org/vision/stable/transforms.html) | Often images need to be transformed (turned into numbers/processed/augmented) before being used with a model, common image transformations are found here. | 
| [`torch.utils.data.Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) | Base dataset class for PyTorch.  | 
| [`torch.utils.data.DataLoader`](https://pytorch.org/docs/stable/data.html#module-torch.utils.data) | Creates a Python iterable over a dataset (created with `torch.utils.data.Dataset`). |

> **Note:** The `torch.utils.data.Dataset` and `torch.utils.data.DataLoader` classes aren't only for computer vision in PyTorch, they are capable of dealing with many different types of data.
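To illustrate the note above, here is a minimal sketch of a custom `Dataset` over non-image data, wrapped in a `DataLoader`. The class name `SquaresDataset` and its contents are hypothetical and made up purely for illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Hypothetical toy dataset: each sample is the pair (x, x squared)."""

    def __init__(self, n: int):
        self.x = torch.arange(n, dtype=torch.float32)

    def __len__(self) -> int:
        # Number of samples in the dataset
        return len(self.x)

    def __getitem__(self, idx: int):
        # Return one (feature, target) pair
        return self.x[idx], self.x[idx] ** 2

toy_data = SquaresDataset(n=10)
toy_loader = DataLoader(toy_data, batch_size=4, shuffle=False)

# Each iteration yields one batch of features and targets
for features, targets in toy_loader:
    print(features.shape, targets.shape)
```

Any class that implements `__len__` and `__getitem__` in this way can be batched by a `DataLoader`, regardless of what kind of data it holds.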

Now we've covered some of the most important PyTorch computer vision libraries, let's import the relevant dependencies.

In [None]:
import torch
from torch import nn

import torchvision
from torchvision import datasets                   # torchvision's own datasets module, not the Hugging Face `datasets` package (pip install datasets)
from torchvision import transforms
from torchvision.transforms import ToTensor


#matplotlib
import matplotlib.pyplot as plt

#check version
print(torch.__version__)
print(torchvision.__version__)

# 1. getting dataset

We use `torchvision.transforms.ToTensor()` because we want the data as a tensor. It converts a PIL Image or ndarray to a tensor and scales the values accordingly.

In [None]:
# setup dataset
from torchvision import datasets
train_data = datasets.FashionMNIST(
    root = "data",   # where to download
    train = True,
    download = True,
    # How to transform the images: ToTensor converts a PIL Image or ndarray to a tensor and scales the values accordingly
    # https://pytorch.org/vision/stable/generated/torchvision.transforms.ToTensor.html
    transform = torchvision.transforms.ToTensor(),
    target_transform = None  # how to transform the labels/targets
)


test_data = datasets.FashionMNIST(
    root = "data",
    train = False,
    download = True,
    transform = ToTensor(),
    target_transform = None
)

In [None]:
len(train_data), len(test_data)

In [None]:
train_data[0]

In [None]:
class_names = train_data.classes
class_names

In [None]:
class_to_idx = train_data.class_to_idx
class_to_idx

In [None]:
train_data.targets

In [None]:
# See first training sample
image, label = train_data[0]
image, label

In [None]:
print(f"image shape: {image.shape} -> [color_channels, height, width]")
print(f"image label: {class_names[label]}")

## 1.1 check input output shape

In [None]:
image.shape

## 1.2 visualize our data

In [None]:
import matplotlib.pyplot as plt
image, label = train_data[0]
print(image.shape)
print(label)
print(image.squeeze().shape)

In [None]:
plt.imshow(image.squeeze())
plt.title(class_names[label])  # show the class name rather than the integer label
plt.axis(False)

In [None]:
# Plot more images
torch.manual_seed(42)
fig = plt.figure(figsize=(9, 9))
rows, cols = 4, 4
for i in range(1, rows * cols + 1):
    random_idx = torch.randint(0, len(train_data), size=[1]).item()
    img, label = train_data[random_idx]
    fig.add_subplot(rows, cols, i)
    plt.imshow(img.squeeze(), cmap="gray")
    plt.title(class_names[label])
    plt.axis(False);

In [None]:
train_data, test_data

# 2. prepare data loader

Right now, our data is in the form of a PyTorch `Dataset`.

A `DataLoader` turns our dataset into a Python iterable.

More specifically, we want to turn our data into batches (or mini-batches).

Why do this?

Because it's more computationally efficient.

In an ideal world you could do the forward pass and backward pass across all of your data at once.

But once you start using really large datasets, unless you've got infinite computing power, it's easier to break them up into batches.

It also gives your model more opportunities to improve.

With mini-batches (small portions of the data), gradient descent is performed more often per epoch (once per mini-batch rather than once per epoch).

What's a good batch size?

32 is a good place to start for a fair amount of problems.

But since this is a value you can set (a hyperparameter) you can try all different kinds of values, though generally powers of 2 are used most often (e.g. 32, 64, 128, 256, 512).

For more, see the [PyTorch `DataLoader` tutorial](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html).
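To make the batch-size trade-off concrete, the sketch below counts how many batches (and therefore how many gradient descent updates per epoch) different batch sizes give for the FashionMNIST splits of 60,000 training and 10,000 test images. The helper name `batches_per_epoch` is hypothetical, and the rounding up assumes the `DataLoader` default of keeping the final, smaller batch.

```python
import math

def batches_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of batches when the final partial batch is kept."""
    return math.ceil(n_samples / batch_size)

n_train, n_test = 60_000, 10_000  # FashionMNIST split sizes

for batch_size in (32, 64, 128):
    train_batches = batches_per_epoch(n_train, batch_size)
    test_batches = batches_per_epoch(n_test, batch_size)
    print(f"batch_size={batch_size}: {train_batches} train batches, {test_batches} test batches")
```

With a batch size of 32 this gives 1875 training batches, so the model's parameters are updated 1875 times per epoch instead of once.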

In [None]:
from torch.utils.data import DataLoader

BATCH_SIZE = 32
train_dataloader = DataLoader(dataset=train_data,
                              batch_size=BATCH_SIZE,
                              shuffle=True)

test_dataloader = DataLoader(dataset=test_data,
                             batch_size=BATCH_SIZE,
                             shuffle=False)  # no need to shuffle the test data

train_dataloader, test_dataloader

> **Note:** A `DataLoader` yields batches of data: each iteration over it returns one batch.
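A small sketch of that behaviour, using random stand-in tensors rather than the real FashionMNIST data (the names `fake_images`, `fake_labels` and `fake_loader` are made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 10 fake grayscale "images" of shape [1, 28, 28] with integer labels
fake_images = torch.rand(10, 1, 28, 28)
fake_labels = torch.randint(0, 10, (10,))
fake_loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=4)

# Collect the shape of every batch the loader yields
batch_shapes = [(X.shape, y.shape) for X, y in fake_loader]
for X_shape, y_shape in batch_shapes:
    print(X_shape, y_shape)
```

Ten samples with a batch size of 4 give three batches, and the last one holds only the 2 leftover samples.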

In [None]:
print(len(train_dataloader))
print(len(test_dataloader))

In [None]:
# Check out what's inside the training dataloader
train_features_batch, train_labels_batch = next(iter(train_dataloader))             #example here https://pytorch.org/tutorials/beginner/basics/data_tutorial.html
train_features_batch.shape, train_labels_batch.shape

In [None]:
print(train_labels_batch[0])

# 3. Build a simple baseline model

In [None]:
# create a flatten layer
flatten_model = nn.Flatten()

#get a single sample
x = train_features_batch[0]
print("shape before flatten is", x.shape)

#Flatten the sample
output = flatten_model(x)
print("shape after flatten is", output.shape)

In [None]:
from torch import nn

class FashionMNISTModelV0(nn.Module) :
    def __init__(self,
                 input_shape: int,
                 hidden_units: int,
                 output_shape: int) :
        
        super().__init__()
        
        self.layer_stack = nn.Sequential(
            nn.Flatten(),
            
            nn.Linear(in_features=input_shape,
                      out_features=hidden_units),
            
            nn.Linear(in_features=hidden_units,
                      out_features=output_shape)
            
        )
        
    def forward(self, x) :
        return self.layer_stack(x)
    

## same as 
'''
class FashionMNISTModelV0(nn.Module) :
    def __init__(self,
                 input_shape: int,
                 hidden_units: int,
                 output_shape: int) :
        
        super().__init__()
        
        self.layer_0 = nn.Flatten()
        self.layer_1 = nn.Linear(in_features=input_shape, out_features=hidden_units)
        self.layer_2 = nn.Linear(in_features=hidden_units, out_features=output_shape)
        
    def forward(self, x) :
        return self.layer_2(self.layer_1(self.layer_0(x)))
        
'''

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

torch.manual_seed(42)

#setup model 
model_0 = FashionMNISTModelV0(
    input_shape=784,   # from 28*28 image
    hidden_units=15,
    output_shape=len(class_names)
)

model_0.to("cpu")  # keep model_0 on the CPU: the loops in section 3 don't move data to a device
model_0

In [None]:
dummy_x = torch.rand([1,1,28,28])
print(model_0(dummy_x))
print()


dummy_x = torch.rand([1,28,28])
print(model_0(dummy_x))

## 3.1 Setup loss, optimizer and evaluation metrics
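Before setting these up, here is a hedged sketch of how the loss and accuracy are computed from raw model outputs. The function `accuracy_percent` is a hypothetical stand-in that assumes the downloaded `accuracy_fn` returns the percentage of correct predictions; the logits and labels are made up.

```python
import torch
from torch import nn

def accuracy_percent(y_true: torch.Tensor, y_pred: torch.Tensor) -> float:
    """Hypothetical stand-in for accuracy_fn: percent of labels predicted correctly."""
    correct = torch.eq(y_true, y_pred).sum().item()
    return correct / len(y_pred) * 100

# Raw model outputs (logits) for 4 samples and 3 classes
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 1.5, 0.3],
                       [0.1, 0.2, 3.0],
                       [1.0, 2.0, 0.5]])
labels = torch.tensor([0, 1, 2, 0])

loss = nn.CrossEntropyLoss()(logits, labels)  # expects raw logits and class indices
pred_labels = logits.argmax(dim=1)            # logits -> predicted class index
print(f"loss={loss.item():.4f}, accuracy={accuracy_percent(labels, pred_labels):.1f}%")
```

Note that `nn.CrossEntropyLoss` takes the logits directly, while the accuracy needs predicted labels, which is why the loops below call `argmax(dim=1)` before computing accuracy.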


In [None]:
import requests
from pathlib import Path 

# Download helper functions from Learn PyTorch repo (if not already downloaded)
if Path("helper_functions.py").is_file():
  print("helper_functions.py already exists, skipping download")
else:
  print("Downloading helper_functions.py")
  # Note: you need the "raw" GitHub URL for this to work
  request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py")
  with open("helper_functions.py", "wb") as f:
    f.write(request.content)

In [None]:
# Import accuracy metric
from helper_functions import accuracy_fn # Note: could also use torchmetrics.Accuracy(task = 'multiclass', num_classes=len(class_names)).to(device)

# Setup loss function and optimizer
loss_fn = nn.CrossEntropyLoss() # this is also called "criterion"/"cost function" in some places
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.1)

## 3.2 create a function to time our experiment

In [None]:
from timeit import default_timer as timer 
def print_train_time(start: float, end: float, device: torch.device = None):
    """Prints difference between start and end time.

    Args:
        start (float): Start time of computation (preferred in timeit format). 
        end (float): End time of computation.
        device ([type], optional): Device that compute is running on. Defaults to None.

    Returns:
        float: time between start and end in seconds (higher is longer).
    """
    total_time = end - start
    print(f"Train time on {device}: {total_time:.3f} seconds")
    return total_time

## 3.3 Creating a training loop and training a model on batches of data (cpu)

Let's step through it:

1. Loop through epochs.
2. Loop through training batches, perform training steps, calculate the train loss per batch.
3. Loop through testing batches, perform testing steps, calculate the test loss per batch.
4. Print out what's happening.
5. Time it all (for fun).

In [None]:
# import tqdm
from tqdm.auto import tqdm

#set the seed
torch.manual_seed(42)
train_time_start_on_cpu = timer()

#set epochs
epochs = 3

#create training loop
for epoch in tqdm(range(epochs)) :
    print(f"Epoch: {epoch}\n-------")
    
    #training
    train_loss = 0
    #loop through batches
    for batch, (X,y) in enumerate(train_dataloader) :
        
        
        model_0.train()
        #1. forward pass
        y_pred = model_0(X)
        
        #2. calculate loss
        loss = loss_fn(y_pred, y)
        train_loss += loss.item()  # .item() gives a Python float, so no autograd history is accumulated
        
        #3. optimizer zero grad
        optimizer.zero_grad()
        
        #4. loss backward
        loss.backward()
        
        #5. optimizer step
        optimizer.step()
        
        # Print out how many samples have been seen
        if batch % 400 == 0:
            print(f"Looked at {batch * len(X)}/{len(train_dataloader.dataset)} samples")
            
    # (average loss per batch per epoch)
    train_loss /= len(train_dataloader)
    
    
    
    
    ### testing
    test_loss, test_acc = 0, 0
    model_0.eval()
    with torch.inference_mode() :
        for X, y in test_dataloader :
            #1. forward pass
            test_pred = model_0(X)
            
            #2. calculate loss
            test_loss += loss_fn(test_pred, y)
            
            #3. calculate accuracy
            test_acc += accuracy_fn(y_true=y, y_pred=test_pred.argmax(dim=1))
            
        #divide total test loss
        test_loss /= len(test_dataloader)
        
        #divide total accuracy
        test_acc /= len(test_dataloader) 
        
    ## Print out what's happening
    print(f"\nTrain loss: {train_loss:.5f} | Test loss: {test_loss:.5f}, Test acc: {test_acc:.2f}%\n")
    
    
    
# Calculate training time      
train_time_end_on_cpu = timer()
total_train_time_model_0 = print_train_time(start=train_time_start_on_cpu, 
                                           end=train_time_end_on_cpu,
                                           device=str(next(model_0.parameters()).device))
                                           

                                           


## 3.4 make predictions and get Model0 result (cpu)

In [None]:
torch.manual_seed(42)
def eval_model(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               accuracy_fn) :
    """
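    Returns a dictionary containing the results of model predicting on data_loader.

    Args:
        model (torch.nn.Module): A PyTorch model capable of making predictions on data_loader.
        data_loader (torch.utils.data.DataLoader): The target dataset to predict on.
        loss_fn (torch.nn.Module): The loss function of model.
        accuracy_fn (Callable): A function comparing the model's predictions to the truth labels.

    Returns:
        (dict): Results of model making predictions on data_loader.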
    """
    
    loss, acc = 0, 0
    model.eval()
    with torch.inference_mode() :
        for X, y in data_loader :
            
            #1. make prediction
            y_pred = model(X)  # use the model passed in, not the global model_0
            
            #2. loss and accuracy
            loss += loss_fn(y_pred, y)
            acc += accuracy_fn(y_true = y, y_pred = y_pred.argmax(dim=1))
            
        # average
        loss /= len(data_loader)
        acc /= len(data_loader)
    
    return {"model_name": model.__class__.__name__, # only works when model was created with a class
            "model_loss": loss.item(),
            "model_acc": acc}
    
    
# Calculate model 0 results on test dataset
model_0_results = eval_model(model=model_0, data_loader=test_dataloader,
    loss_fn=loss_fn, accuracy_fn=accuracy_fn
)
model_0_results

# 4. Setup device-agnostic code (for using a GPU if there is one)

In [None]:
# Setup device agnostic code
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
device

# 5. Model 1: Building a better model with non-linearity

In [None]:
from torch import nn

class FashionMNISTModelV1(nn.Module) :
    def __init__(self,
                 input_shape: int,
                 hidden_units: int,
                 output_shape: int) :
        
        super().__init__()
        
        self.layer_stack = nn.Sequential(
            nn.Flatten(),
            
            nn.Linear(in_features=input_shape,
                      out_features=hidden_units),
            
            nn.ReLU(),
            
            nn.Linear(in_features=hidden_units,
                      out_features=output_shape),
            
            nn.ReLU()
            
        )
        
    def forward(self, x) :
        return self.layer_stack(x)

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

torch.manual_seed(42)

#setup model 
model_1 = FashionMNISTModelV1(
    input_shape=784,   # from 28*28 image
    hidden_units=15,
    output_shape=len(class_names)
)

model_1.to(device)
model_1

## 5.1 Setup loss, optimizer and evaluation metrics

In [None]:
from helper_functions import accuracy_fn
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model_1.parameters(), 
                            lr=0.1)

## 5.2 Functionizing training and test loops

In [None]:
def train_step(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               accuracy_fn,
               device: torch.device = device) :
    
    train_loss = 0
    train_acc = 0
    model.to(device)
    model.train()  # test_step puts the model in eval mode, so switch back each epoch
    for batch, (X, y) in enumerate(data_loader) :
        X, y = X.to(device), y.to(device)
        
        #1. forward pass
        y_pred = model(X)
        
        #2. Calculate loss
        loss = loss_fn(y_pred, y)
        accuracy = accuracy_fn(y_true=y,
                                 y_pred=y_pred.argmax(dim=1))
        
        train_loss += loss.item()  # accumulate a Python float instead of a tensor with autograd history
        train_acc += accuracy
        
        #3. optimizer zero grad
        optimizer.zero_grad()
        
        #4. loss backward
        loss.backward()
        
        #5. optimizer step
        optimizer.step()
        
        
    # Calculate loss and accuracy per epoch and print out what's happening
    train_loss /= len(data_loader)
    train_acc /= len(data_loader)
    print(f"Train loss: {train_loss:.5f} | Train accuracy: {train_acc:.2f}%")




def test_step(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               accuracy_fn,
               device: torch.device = device) :
    
    test_loss = 0
    test_acc = 0
    model.to(device)
    model.eval()
    
    with torch.inference_mode() :
        for X, y in data_loader :
            
            #send data to gpu
            X, y = X.to(device), y.to(device)
            
            #1. forward pass
            test_pred = model(X)
            
            #2. calculate loss and accuracy
            loss = loss_fn(test_pred, y)
            accuracy = accuracy_fn(y_true=y,
                y_pred=test_pred.argmax(dim=1))
            
            test_loss += loss
            test_acc += accuracy
            
            
        #average
        test_loss /= len(data_loader)
        test_acc /= len(data_loader)
        print(f"Test loss: {test_loss:.5f} | Test accuracy: {test_acc:.2f}%\n")




In [None]:
torch.manual_seed(42)

# Measure time
from timeit import default_timer as timer
train_time_start_on_cpu = timer()

epochs = 3
for epoch in tqdm(range(epochs)) :
    print(f"Epoch: {epoch}\n---------")
    
    train_step(data_loader=train_dataloader,
               model=model_1,
               loss_fn=loss_fn,
               optimizer=optimizer,
               accuracy_fn=accuracy_fn)

    test_step(data_loader=test_dataloader,
              model=model_1,
              loss_fn=loss_fn,
              accuracy_fn=accuracy_fn)

# Calculate training time      
train_time_end_on_cpu = timer()
total_train_time_model_1 = print_train_time(start=train_time_start_on_cpu, 
                                           end=train_time_end_on_cpu,
                                           device=str(next(model_1.parameters()).device))




## 5.3 result

In [None]:
torch.manual_seed(42)
def eval_model(model: torch.nn.Module, 
               data_loader: torch.utils.data.DataLoader, 
               loss_fn: torch.nn.Module, 
               accuracy_fn,
               device: torch.device = device):
    """Evaluate a given model on a given dataset

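    Args:
        model (torch.nn.Module): A PyTorch model capable of making predictions on data_loader.
        data_loader (torch.utils.data.DataLoader): The target dataset to predict on.
        loss_fn (torch.nn.Module): The loss function of model.
        accuracy_fn (Callable): A function comparing the model's predictions to the truth labels.
        device (torch.device, optional): Device to run the evaluation on. Defaults to the global `device`.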
        
    Returns:
        (dict): Results of model making predictions on data_loader.
        
    """
    
    eval_loss = 0
    eval_acc = 0
    model.eval()
    
    with torch.inference_mode() :
        for X, y in data_loader :
            
            #send data to device
            X, y = X.to(device), y.to(device)
            y_pred = model(X)
            
            loss = loss_fn(y_pred, y)
            accuracy = accuracy_fn(y_true=y, y_pred=y_pred.argmax(dim=1))
            eval_loss += loss
            eval_acc += accuracy
            
        #average
        eval_loss /= len(data_loader)
        eval_acc /= len(data_loader)
        
    return {"model_name": model.__class__.__name__, # only works when model was created with a class
            "model_loss": eval_loss.item(),
            "model_acc": eval_acc}
            
    
# Calculate model 1 results with device-agnostic code 
model_1_results = eval_model(model=model_1, data_loader=test_dataloader,
    loss_fn=loss_fn, accuracy_fn=accuracy_fn,
    device=device
)
model_1_results
    

# 6. Model 2: CNN model

The CNN model we're going to be using is known as TinyVGG from the [CNN Explainer](https://poloclub.github.io/cnn-explainer/) website.

It follows the typical structure of a convolutional neural network:

`Input layer -> [Convolutional layer -> activation layer -> pooling layer] -> Output layer`

Where the contents of `[Convolutional layer -> activation layer -> pooling layer]` can be upscaled and repeated multiple times, depending on requirements. 
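As a sketch of what repeating that block does to the image size, the snippet below applies the standard output-size formula (with dilation 1) to a 28x28 input. The hyperparameters are assumptions chosen to match the model defined later in this notebook: two 3x3 convolutions with stride 1 and padding 1, followed by a 2x2 max pool whose stride equals its kernel size. The helper name `conv_out_size` is hypothetical.

```python
def conv_out_size(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28  # FashionMNIST images are 28x28
for block in range(2):
    size = conv_out_size(size, kernel_size=3, stride=1, padding=1)  # first 3x3 conv keeps the size
    size = conv_out_size(size, kernel_size=3, stride=1, padding=1)  # second 3x3 conv keeps the size
    size = conv_out_size(size, kernel_size=2, stride=2)             # 2x2 max pool halves the size
    print(f"after block {block + 1}: {size}x{size}")
```

Two blocks take the image from 28x28 to 14x14 to 7x7, which is where the `hidden_units * 7 * 7` input size of the classifier layer comes from.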

Let's now build a CNN that replicates the model on the [CNN Explainer website](https://poloclub.github.io/cnn-explainer/).

![TinyVGG architecture, as setup by CNN explainer website](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/03-cnn-explainer-model.png)

To do so, we'll leverage the [`nn.Conv2d()`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html) and [`nn.MaxPool2d()`](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html) layers from `torch.nn`.

> **Note:** What does a CNN (and most neural networks) do? It compresses the input into some representation of the data.
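A quick way to see these two layers in action is to pass a dummy batch through them and watch the shapes. This is only a sketch: the tensor `dummy_batch` and the choice of 10 output channels are assumptions for illustration.

```python
import torch
from torch import nn

torch.manual_seed(42)
dummy_batch = torch.rand(size=(1, 1, 28, 28))  # [batch, channels, height, width]

conv = nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3, stride=1, padding=1)
pool = nn.MaxPool2d(kernel_size=2)

conv_out = conv(dummy_batch)  # more channels, same height and width
pool_out = pool(conv_out)     # same channels, half the height and width
print(conv_out.shape)
print(pool_out.shape)
```

The convolution grows the channel dimension from 1 to 10, and the pooling layer halves the height and width, so the spatial information is progressively compressed.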

In [None]:
#  create a CNN
class FashionMNISTModelV2(nn.Module) :
    """
    Model architecture that replicates the TinyVGG
    """
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int) :
        super().__init__()
        
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape,                  #nn.Conv2d for data with 2d (image)
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        
        
        self.conv2 = nn.Sequential(
            nn.Conv2d(in_channels=hidden_units,                  #nn.Conv2d for data with 2d (image)
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units,
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        
        
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=hidden_units * 7 * 7,  # two MaxPool2d(2) layers shrink 28x28 -> 14x14 -> 7x7
                      out_features=output_shape)
        )
        
        
    def forward(self, x):
        x = self.conv1(x)
        #print(x.shape)
        x = self.conv2(x)
        #print(x.shape)
        x = self.classifier(x)
        #print(x.shape)
        return x
        
        
torch.manual_seed(42)
model_2 = FashionMNISTModelV2(input_shape=1, 
    hidden_units=10, 
    output_shape=len(class_names)).to(device)
model_2


In [None]:
torch.manual_seed(42)
model_2 = FashionMNISTModelV2(input_shape=1,
                              hidden_units=15,
                              output_shape=len(class_names))
model_2.to(device)

## 7.0 flatten and unsqueeze (you might read 7.1 and 7.2 first)

In [None]:
rand_image_tensor = torch.rand(size=(10,7,7))
flatten_layer = nn.Flatten()

print("rand_image_tensor shape                             :", rand_image_tensor.shape)
print("rand_image_tensor.unsqueeze(0) shape                :", rand_image_tensor.unsqueeze(0).shape)
print("flatten_layer_rand_image_tensor shape               :", flatten_layer(rand_image_tensor).shape)
print("flatten_layer(rand_image_tensor.unsqueeze(0)) shape :", flatten_layer(rand_image_tensor.unsqueeze(0)).shape)



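According to the [`nn.Flatten()` documentation](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html), flattening starts at dimension 1 by default (`start_dim=1`) because dimension 0 is treated as the batch dimension.
A single image therefore needs a batch dimension added with `unsqueeze(0)` before it is passed through the model; otherwise its channel dimension is treated as the batch.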
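
The shape rule can be sketched without PyTorch. The helper `flatten_shape` below is hypothetical; it only mimics how `nn.Flatten` (defaults `start_dim=1`, `end_dim=-1`) changes a shape, and it models `unsqueeze(0)` as prepending a dimension of size 1.

```python
from math import prod


def flatten_shape(shape: tuple, start_dim: int = 1, end_dim: int = -1) -> tuple:
    """Hypothetical helper: the shape that flattening dims start_dim..end_dim would produce."""
    n = len(shape)
    start, end = start_dim % n, end_dim % n
    return shape[:start] + (prod(shape[start:end + 1]),) + shape[end + 1:]


no_batch = (10, 7, 7)          # (channels, height, width)
with_batch = (1,) + no_batch   # same shape change as unsqueeze(0)

print(flatten_shape(no_batch))    # (10, 49): channels mistaken for the batch
print(flatten_shape(with_batch))  # (1, 490): one sample with 490 features
```

Only the second shape matches a linear layer that expects `10 * 7 * 7 = 490` input features.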

In [None]:
rand_image_tensor = torch.rand(size=(1,28,28))
model_2(rand_image_tensor.unsqueeze(0).to(device))

## 7.1 stepping through nn.Conv2d() and nn.ReLU()
See the [`nn.Conv2d()` documentation](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html).
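
Before stepping through the real layers, here is a rough plain-Python sketch of what a convolution followed by ReLU computes for one input channel and one kernel. It is a simplification: `nn.Conv2d` also sums over all input channels, adds a bias, and learns its kernel values. The function names and the toy numbers are made up for illustration.

```python
def conv2d_single_channel(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj] for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace every negative value with zero."""
    return [[max(0, value) for value in row] for row in feature_map]


image = [[1, 2, 0],
         [0, 1, 3],
         [2, 1, 0]]
kernel = [[1, -1],
          [-1, 1]]

feature_map = conv2d_single_channel(image, kernel)
print(feature_map)        # [[0, 4], [-2, -3]]
print(relu(feature_map))  # [[0, 4], [0, 0]]
```

A 3x3 input with a 2x2 kernel gives a 2x2 output. In the same way, a 64x64 input with a 3x3 kernel and no padding gives a 62x62 output.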

In [None]:
torch.manual_seed(42)

#create a batch of images
batch_images = torch.rand(size=(32,3,64,64))
test_image = batch_images[0]

print("Single image shape:", test_image.shape)
print("batch images shape:", batch_images.shape)
#print(f"Test image:\n {test_image}")

In [None]:
# create a single conv2d layer

conv_layer = nn.Conv2d(in_channels=3,
                       out_channels=10,
                       kernel_size=(3,3),
                       stride=1,
                       padding=0)

conv_output_single_image = conv_layer(test_image)
print("conv output of a single image shape  :", conv_output_single_image.shape)
#print(conv_output)


#also works on batch of image
conv_output_batch_images = conv_layer(batch_images)
print("conv output of batch images shape    :", conv_output_batch_images.shape)



relu_layer = nn.ReLU()  # Define the ReLU layer

relu_output_single_image = relu_layer(conv_output_single_image)  # Apply ReLU using nn.ReLU
print("relu output of a single image shape  :", relu_output_single_image.shape)
# Note: print(nn.ReLU(conv_output_single_image)) does not apply ReLU to the tensor.
# nn.ReLU is a module class, so passing the tensor to its constructor only sets a constructor argument.
# Instantiate it first (relu_layer = nn.ReLU()) and then call the instance on the tensor, as above.

relu_output_batch_images = relu_layer(conv_output_batch_images)
print("relu output of a batch images shape  :", relu_output_batch_images.shape)


## 7.2 stepping through nn.MaxPool2d()
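
As a rough illustration of what max pooling does, the plain-Python sketch below keeps only the largest value in each non-overlapping 2x2 window. The function name `max_pool2d` and the toy feature map are made up for this example. The stride is set equal to the kernel size, which is also the default of `nn.MaxPool2d` when no stride is given.

```python
def max_pool2d(feature_map, kernel_size=2):
    """Take the maximum of each non-overlapping kernel_size x kernel_size window."""
    stride = kernel_size  # same as the nn.MaxPool2d default when stride is not given
    out_h = (len(feature_map) - kernel_size) // stride + 1
    out_w = (len(feature_map[0]) - kernel_size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + di][j * stride + dj]
                for di in range(kernel_size)
                for dj in range(kernel_size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [3, 2, 1, 6]]

pooled = max_pool2d(feature_map, kernel_size=2)
print(pooled)  # [[4, 5], [3, 7]]
```

Height and width are halved (4x4 -> 2x2), so a 62x62 convolution output becomes 31x31 after `nn.MaxPool2d(kernel_size=2)`.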

In [None]:
test_image.shape

In [None]:
max_pool_layer = nn.MaxPool2d(kernel_size=2)
single_image_through_conv = conv_layer(test_image)
print("single_image_through_conv shape               :", single_image_through_conv.shape)

single_image_through_conv_and_max_pool = max_pool_layer(single_image_through_conv)
print("single_image_through_conv_and_max_pool shape  :", single_image_through_conv_and_max_pool.shape)


batch_images_through_conv = conv_layer(batch_images)
print("batch_images_through_conv shape               :", batch_images_through_conv.shape)

batch_images_through_conv_and_max_pool = max_pool_layer(batch_images_through_conv)
print("batch_images_through_conv_and_max_pool shape  :", batch_images_through_conv_and_max_pool.shape)

## 7.3 Setup a loss function and optimizer for model_2
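
For intuition, here is a small plain-Python sketch of the two pieces being set up: cross-entropy on raw logits for one sample, and a single plain SGD update (no momentum or weight decay). The function names and numbers are illustrative assumptions, not PyTorch APIs. `nn.CrossEntropyLoss` additionally averages over the batch by default.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def sgd_step(param, grad, lr=0.1):
    """Plain SGD: move the parameter against its gradient."""
    return param - lr * grad


uniform_loss = cross_entropy([0.0, 0.0, 0.0], target=1)    # equal logits -> log(3)
confident_loss = cross_entropy([0.0, 5.0, 0.0], target=1)  # large logit on the right class -> small loss
print(round(uniform_loss, 4), round(confident_loss, 4))

new_param = sgd_step(param=1.0, grad=2.0, lr=0.1)
print(new_param)  # 0.8
```

The loss falls as the logit of the correct class grows relative to the others, and the optimizer nudges each parameter in the direction that reduces it.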

In [None]:
from helper_functions import accuracy_fn
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model_2.parameters(),  # optimize model_2's parameters, not model_1's
                            lr=0.1)

## 7.4 Training and testing model_2 using our training and test functions

In [None]:
# Setup loss and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model_2.parameters(), 
                             lr=0.1)

**Notes (read these):**

1. The optimizer must be given `model_2`'s parameters: `optimizer = torch.optim.SGD(params=model_2.parameters(), lr=0.1)`.
2. `.to(device)` and device mismatch: calling `dummy_x.to(device)` returns a new tensor that has been moved to the specified device; it doesn't change the original `dummy_x` tensor. Either reassign `dummy_x` to the returned tensor or do the move inline when passing it to the model.
3. Remember `model.train()` for training and `model.eval()` for evaluation.
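
The last two points can be checked with a minimal sketch. `toy_model` and `dummy_x` are stand-ins invented for this example (not the notebook's `model_2`), and the code falls back to the CPU when no GPU is available.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

dummy_x = torch.rand(1, 1, 28, 28)
dummy_x.to(device)            # return value discarded: dummy_x itself is not moved
dummy_x = dummy_x.to(device)  # reassign so the moved tensor is the one used afterwards

# Unlike tensors, nn.Module.to() moves the module's parameters in place (and returns the module)
toy_model = nn.Sequential(
    nn.Flatten(),
    nn.Dropout(p=0.5),
    nn.Linear(28 * 28, 10),
).to(device)

toy_model.train()
print(toy_model.training)  # True: dropout is active

toy_model.eval()
print(toy_model.training)  # False: dropout is disabled

with torch.inference_mode():
    logits = toy_model(dummy_x)
print(logits.shape)  # torch.Size([1, 10])
```

Layers such as dropout and batch norm behave differently in the two modes, which is why the mode is switched before each phase.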

In [None]:
# import tqdm for progress bar
from tqdm.auto import tqdm

#set the seed
torch.manual_seed(42)
train_time_start_on_cpu = timer()

#set the number of epochs
epochs = 10

# create training and testing loop
for epoch in tqdm(range(epochs)) :
    print("Epoch:", epoch)
    
    ### training
    train_loss = 0
    # loop through
    for batch, (X, y) in enumerate(train_dataloader) :
        model_2.train()
        # move the batch to the same device as the model
        X, y = X.to(device), y.to(device)
        
        #1. forward pass
        y_pred = model_2(X)
        
        #2. calculate loss
        loss = loss_fn(y_pred, y)
        train_loss += loss
        
        #3. optimizer zero grad
        optimizer.zero_grad()
        
        #4. loss backward
        loss.backward()
        
        #5. optimizer step
        optimizer.step()
        
        
        #print out
        if batch % 400 == 0 :
            print(f"Looked at {batch * len(X)}/{len(train_dataloader.dataset)} samples")
            

    # divide total train loss by length of train dataloader
    train_loss /= len(train_dataloader)
    
    ### testing
    test_loss, test_acc = 0, 0
    model_2.eval()
    with torch.inference_mode() :
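        for x_test, y_test in test_dataloader :
            # move the batch to the same device as the model
            x_test, y_test = x_test.to(device), y_test.to(device)
            
            #1. forward pass
            test_pred = model_2(x_test)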
            
            #2. calculate loss
            test_loss += loss_fn(test_pred, y_test)
            
            #3. calculate accuracy
            test_acc += accuracy_fn(y_true=y_test, y_pred=test_pred.argmax(dim=1))
        
        #calculate the test loss
        test_loss /= len(test_dataloader)
        
        #calculate test acc
        test_acc /= len(test_dataloader)
    
    #print out
    print(f"\nTrain loss: {train_loss:.5f} | Test loss: {test_loss:.5f}, Test acc: {test_acc:.2f}%\n")
        
# Calculate training time      
train_time_end_on_cpu = timer()
total_train_time_model_2 = print_train_time(start=train_time_start_on_cpu, 
                                           end=train_time_end_on_cpu,
                                           device=str(next(model_2.parameters()).device))