# CNN for Image Classification

Hello, everyone! 👋🏻

Welcome to the practitioner lecture "Jaringan Syaraf Tiruan", or *Artificial Neural Networks*. Let's start by getting to know each other!

* I'm **Syahrul Bahar Hamdani**, but just call me **Dani**
* Studied Mathematics (class of 2012) and graduated in 2016. I took the ROK track and used PSO for meeting scheduling in my undergraduate thesis, supervised by Mr. Herry and Ms. Auli 🙏🏻
* Started a master's in Computational Science (Sains Komputasi) at ITB in 2017 and graduated in 2019. My thesis, "**_Predictive Maintenance_ Mesin Pesawat dengan Pendekatan _Machine Learning_**" (*Predictive Maintenance of Aircraft Engines with a Machine Learning Approach*, [proof](https://digilib.itb.ac.id/index.php/gdl/view/35771)), was supervised by Ms. Nuning Nuraini
* Currently working as a **Lead Data Scientist** at [KoinWorks](https://koinworks.com/)

In this notebook, we will use [PyTorch](https://pytorch.org/) as the main library.

## Agenda

In this notebook, we will look at how to build a CNN deep learning model for image classification. We will take a _top-down_ approach: we start from the finished model and then dissect the components that make it up.

Today's agenda:
* CNN with PyTorch
* Why use CNNs?
* How do CNNs work?

In [None]:
import pickle
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from PIL import Image

## Datasets

We will use the CIFAR-10 dataset collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. CIFAR-10 contains a total of 60,000 images of **32x32** pixels in **10 classes**, with 6,000 images per class, split into 50,000 training images and 10,000 test images.

In [None]:
DATA_DIR = Path("data")

In [None]:
BATCH_SIZE = 4
transformers = torchvision.transforms.Compose(
    [
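        # convert a PIL image (H, W, C) in [0, 255] to a float tensor (C, H, W) in [0, 1]
        torchvision.transforms.ToTensor(),
        # normalize each channel: (x - 0.5) / 0.5 maps [0, 1] to [-1, 1]
        torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))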
    ]
)

# training set
train_dataset = torchvision.datasets.CIFAR10(
    DATA_DIR,
    train=True,
    download=True,
    transform=transformers
)
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
)

# test set
test_dataset = torchvision.datasets.CIFAR10(
    DATA_DIR, train=False, download=True,
    transform=transformers
)
test_loader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True
)

class_names = train_dataset.classes

What is happening here?

We have just downloaded the dataset (if it was not already present) and loaded the data so that it can be used. So what does the dataset look like? Let's visualize it.

In [None]:
images, labels = next(iter(train_loader))

Notice the shape of the tensor returned by `train_loader` below. Why does it have 4 dimensions with shape `4x3x32x32`?

In [None]:
images.shape
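
The four dimensions follow PyTorch's `(N, C, H, W)` layout for image batches: `N` is the batch size (`BATCH_SIZE = 4`), `C` is the number of color channels (3 for RGB), and `H` and `W` are the image height and width (32 each). Here is a minimal sketch with a dummy tensor of the same shape; it also shows why `imshow` reorders the axes, since matplotlib expects a single image as `(H, W, C)`:

```python
import torch

# Dummy batch with the same shape as one batch from train_loader: (N, C, H, W)
batch = torch.zeros(4, 3, 32, 32)
n, c, h, w = batch.shape
print(f"batch size={n}, channels={c}, height={h}, width={w}")

# matplotlib wants a single image as (H, W, C), so move the channel axis last
single = batch[0].permute(1, 2, 0)
print(single.shape)
```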

To visualize the images, we use the following function.

In [None]:
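def imshow(images, labels, grid_size=(2, 2)):
    """Visualize a batch of normalized images together with their class labels."""
    fig = plt.figure(figsize=(3, 3), tight_layout=True)
    # show at most as many images as fit in the grid (or as many as the batch has)
    n_images = min(len(images), grid_size[0] * grid_size[1])
    for idx in range(n_images):
        # undo Normalize((0.5,)*3, (0.5,)*3): x * 0.5 + 0.5 maps [-1, 1] back to [0, 1]
        np_img = images[idx].numpy() * 0.5 + 0.5
        fig.add_subplot(grid_size[0], grid_size[1], idx + 1)
        # PyTorch stores images as (C, H, W); matplotlib expects (H, W, C)
        plt.imshow(np.transpose(np_img, (1, 2, 0)))
        plt.title(class_names[int(labels[idx])])
        plt.xticks([])
        plt.yticks([])
    plt.show()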

We can run the cell below whenever we want to visualize a sample from the dataset. Because `train_loader` shuffles the data, each run shows a different random batch.

In [None]:
for images, labels in train_loader:
    break
imshow(images, labels)

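If we were not using PyTorch, we could load the dataset with the following function instead. It reads the raw Python pickle batches in `cifar-10-batches-py`, so the files must already be on disk (for example, downloaded by `torchvision.datasets.CIFAR10` above).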

<details>
<summary>Function to load the data</summary>

```python
def load_cifar(data_dir, is_train=True):
    train_list = [
        "data_batch_1",
        "data_batch_2",
        "data_batch_3",
        "data_batch_4",
        "data_batch_5",
    ]
    test_list = [
        "test_batch",
    ]
    meta = {
        "filename": "batches.meta",
        "key": "label_names",
    }

    if is_train:
        downloaded_list = train_list
    else:
        downloaded_list = test_list

    data = []
    labels = []

    # load image data
    for file_name in downloaded_list:
        file_path = data_dir / file_name
        with open(file_path, "rb") as f:
            entry = pickle.load(f, encoding="latin1")
            data.append(entry["data"])
            if "labels" in entry:
                labels.extend(entry["labels"])
            else:
                labels.extend(entry["fine_labels"])

    data = np.vstack(data).reshape(-1, 3, 32, 32)
    data = data.transpose(0, 2, 3, 1)

    # load metadata
    meta_file_path = data_dir / meta["filename"]
    with open(meta_file_path, "rb") as infile:
        metadata = pickle.load(infile, encoding="latin1")
        classes = metadata[meta["key"]]
    class_to_idx = {_class: i for i, _class in enumerate(classes)}

    return data, labels, classes, class_to_idx


CIFAR_DIR = DATA_DIR / "cifar-10-batches-py"
train_data, train_labels, train_classes, train_class2idx = load_cifar(CIFAR_DIR, is_train=True)
```
</details>
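
In the CIFAR-10 Python batches, each row of `entry["data"]` stores one image as 3,072 `uint8` values: 1,024 red values, then 1,024 green, then 1,024 blue, each channel in row-major order. The `reshape` and `transpose` in `load_cifar` turn those rows into `(N, H, W, C)` images. A minimal sketch on two fake rows (the constant channel values are made up for illustration):

```python
import numpy as np

# Two fake CIFAR-style rows: 3 * 32 * 32 = 3072 values each,
# laid out as [1024 red | 1024 green | 1024 blue], each plane row-major
rows = np.zeros((2, 3072), dtype=np.uint8)
rows[:, :1024] = 200      # red plane
rows[:, 1024:2048] = 100  # green plane
rows[:, 2048:] = 50       # blue plane

# Same steps as load_cifar: (N, 3072) -> (N, C, H, W) -> (N, H, W, C)
images_nchw = rows.reshape(-1, 3, 32, 32)
images_nhwc = images_nchw.transpose(0, 2, 3, 1)

print(images_nhwc.shape)     # (2, 32, 32, 3)
print(images_nhwc[0, 0, 0])  # RGB values of the top-left pixel of the first image
```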

## Digital Image Representation

Every digital image is a matrix of pixels; in the common 8-bit format, each pixel value lies between 0 and 255. Most images have 3 color channels: red, green, and blue (RGB). In special cases, such as *grayscale* images, there is only 1 channel.
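
The notebook's `transformers` pipeline first converts these 0 to 255 values into floats in [0, 1] (`ToTensor`) and then normalizes each channel with mean 0.5 and standard deviation 0.5, which gives values in [-1, 1]. Below is a minimal sketch of that arithmetic in plain PyTorch on a hypothetical 2x2 RGB image, together with the inverse that turns normalized values back into pixels:

```python
import torch

# Hypothetical 2x2 RGB image stored as 8-bit pixels, shape (H, W, C)
pixels = torch.tensor(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [128, 128, 128]]],
    dtype=torch.uint8,
)

# ToTensor-style step: (H, W, C) uint8 in [0, 255] -> (C, H, W) float in [0, 1]
x = pixels.permute(2, 0, 1).float() / 255

# Normalize with mean 0.5 and std 0.5 per channel: (x - 0.5) / 0.5 maps [0, 1] to [-1, 1]
x_norm = (x - 0.5) / 0.5
print(x_norm.shape, x_norm.min().item(), x_norm.max().item())

# Inverse: x_norm / 2 + 0.5 -> [0, 1], then * 255 -> [0, 255]; round before casting
restored = ((x_norm / 2 + 0.5) * 255).round().to(torch.uint8).permute(1, 2, 0)
print(torch.equal(restored, pixels))
```

Rounding before the `uint8` cast is a defensive choice: the cast truncates, so a float such as 127.99999 would otherwise become 127.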

In [None]:
unnormalized_img = ((images[0] / 2 + .5) * 255).round().type(torch.uint8)
unnormalized_img

This is easiest to see in an image with just 1 channel, i.e. a *grayscale* image, like the example below.

<div align='center'>
<img src="https://sylabs.notion.site/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F6e7f1241-9620-4dab-a730-015c02a24aef%2Fpixel-of-5.png?table=block&id=03028777-803d-4186-9c2c-65bb82aeb26c&spaceId=685593da-9b2b-4a94-b296-d52808c79757&width=1120&userId=&cache=v2" width="40%"/>
</div>

In the image above, 0 represents black, 255 represents white, and the values in between represent shades of gray going from black to white.

## How Do Neural Nets Classify Images?

How can you tell that the following image is a cat?

<div align='center'>
    <img src="https://sylabs.notion.site/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2Fbb386b65-c344-483c-b3cb-9640eea19bc7%2Fcat.jpg?table=block&id=82f2b8e6-ccd4-4df0-a5f8-d078341726be&spaceId=685593da-9b2b-4a94-b296-d52808c79757&width=2000&userId=&cache=v2" width=50%/>
</div>
    
Much like we do, a computer “looks at” **abstract features** of an object, except that it has to work from the pixel values. So how can a computer pick up these features accurately? One way is to use a deep learning model, specifically a convolutional neural network (CNN).
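
To make “features” more concrete, here is a minimal sketch of what a single convolution kernel does. It assumes a hand-crafted, Sobel-like vertical-edge kernel and a tiny synthetic image (both hypothetical); a trained CNN learns its kernels from data instead. The kernel responds strongly where dark pixels meet bright pixels and stays at zero on flat regions:

```python
import torch
import torch.nn.functional as F

# Hypothetical 1-channel 6x6 image, shape (N, C, H, W):
# dark left half (0.0) and bright right half (1.0)
image = torch.zeros(1, 1, 6, 6)
image[..., 3:] = 1.0

# Hand-crafted 3x3 vertical-edge kernel, shape (out_channels, in_channels, kH, kW)
kernel = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]]).reshape(1, 1, 3, 3)

# F.conv2d slides the kernel over the image (no padding, stride 1) -> 4x4 feature map
feature_map = F.conv2d(image, kernel)
print(feature_map[0, 0])
```

Every row of the resulting feature map has large values only in the columns whose 3x3 window straddles the dark-to-bright boundary.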
    
However, a neural network built from *fully-connected layers* is very wasteful in terms of the number of weights to train. A CIFAR-10 image of `32x32x3` pixels gives **3,072** units in the input layer, and every unit in the first hidden layer needs its own weight for each of those 3,072 inputs.

<div align='center'>
    <img src="https://sylabs.notion.site/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2Fe903f7be-0e20-44d3-982a-3e189f893bce%2FScreen_Shot_2022-11-13_at_12.35.43.png?table=block&id=0f993f10-3594-490c-b897-2d1f3ebebd6d&spaceId=685593da-9b2b-4a94-b296-d52808c79757&width=2000&userId=&cache=v2" width=50%/>
</div>

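CNNs were designed to solve this problem: a convolutional layer slides small kernels over the image and reuses the same weights at every position, so its number of weights depends on the kernel size rather than on the image size.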
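
A quick way to see the difference is to count parameters. This sketch compares a fully-connected layer from the 3,072 flattened inputs to 120 hidden units (120 is chosen only to mirror `fc1` in the `SimpleCNN` model of the next section) with that model's first convolutional layer, `nn.Conv2d(3, 6, 5)`. The two layers are not interchangeable; the point is only how their weight counts scale. `count_params` is a hypothetical helper written for this comparison:

```python
import torch.nn as nn


def count_params(layer):
    """Number of trainable weights and biases in a layer."""
    return sum(p.numel() for p in layer.parameters() if p.requires_grad)


# Fully-connected: every one of the 3,072 inputs connects to every hidden unit
fc = nn.Linear(32 * 32 * 3, 120)
# Convolution: six 5x5 kernels spanning 3 channels, shared across all positions
conv = nn.Conv2d(3, 6, 5)

print("Linear(3072, 120):", count_params(fc))   # 3072 * 120 + 120 = 368,760
print("Conv2d(3, 6, 5):  ", count_params(conv))  # 6 * 3 * 5 * 5 + 6 = 456
```

The convolution's count stays at 456 for any input resolution, whereas the fully-connected layer's count grows with the number of pixels.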

### CNN

Let's build a simple CNN model with PyTorch using the following architecture:

* **2 convolutional layers**, each followed by a **max pooling layer**, with kernels of the following sizes (output channels x height x width), respectively:
    * `6x5x5`
    * `16x5x5`
* **3 fully-connected layers** with the following number of units, respectively:
    * 120 units
    * 84 units
    * 10 units (why?)

In [None]:
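class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # 3 input channels (RGB) -> 6 feature maps, 5x5 kernel
        self.conv1 = nn.Conv2d(3, 6, 5)
        # 2x2 max pooling with stride 2 halves the height and width
        self.pool = nn.MaxPool2d(2, 2)
        # 6 feature maps -> 16 feature maps, 5x5 kernel
        self.conv2 = nn.Conv2d(6, 16, 5)
        # after two conv + pool stages, a 32x32 image becomes 16 maps of 5x5
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # (N, 3, 32, 32) -> (N, 6, 14, 14)
        x = self.pool(F.relu(self.conv2(x)))  # (N, 6, 14, 14) -> (N, 16, 5, 5)
        x = torch.flatten(x, 1)               # (N, 16, 5, 5) -> (N, 400)
        x = F.relu(self.fc1(x))               # (N, 120)
        x = F.relu(self.fc2(x))               # (N, 84)
        x = self.fc3(x)                       # (N, 10) raw scores (logits)
        return x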

What did we do above? 🤔

In [None]:
conv_layer = nn.Conv2d(3, 6, 5)
pool = nn.MaxPool2d(2, 2)
print(conv_layer)
print(pool)

print("original shape:", images[0].shape)
print("shape after convolution:", conv_layer(images[0]).shape)
print("shape after max-pooling:", pool(conv_layer(images[0])).shape)
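
The shapes printed above follow the usual output-size formula for convolution and pooling layers, `out = (in - kernel + 2 * padding) // stride + 1`: a 5x5 kernel without padding turns 32 into 28, and 2x2 pooling with stride 2 halves 28 to 14. Tracing a dummy batch through layers with the same hyperparameters as `SimpleCNN` shows where the `16*5*5` input size of `fc1` comes from. `conv_output_size` is a hypothetical helper for this sketch and assumes no dilation:

```python
import torch
import torch.nn as nn


def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv/pool layer (assumes dilation=1, floor rounding)."""
    return (size - kernel_size + 2 * padding) // stride + 1


# Same hyperparameters as SimpleCNN, rebuilt here so the cell stands alone
demo_layers = [
    ("conv1", nn.Conv2d(3, 6, 5), 5, 1),
    ("pool1", nn.MaxPool2d(2, 2), 2, 2),
    ("conv2", nn.Conv2d(6, 16, 5), 5, 1),
    ("pool2", nn.MaxPool2d(2, 2), 2, 2),
]

x = torch.randn(1, 3, 32, 32)  # dummy batch with one 32x32 RGB image
size = 32
for name, layer, kernel_size, stride in demo_layers:
    x = layer(x)
    size = conv_output_size(size, kernel_size, stride)
    # the formula should agree with the shape PyTorch actually produces
    assert x.shape[-1] == size, "formula and PyTorch disagree"
    print(f"{name}: {tuple(x.shape)}")

print("flattened features:", x.flatten(1).shape[1])  # 16 * 5 * 5 = 400
```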

### Train CNN

In [None]:
def train_model(model, optimizer, criterion, train_loader, valid_loader, num_epochs=2):
    valid_loss_min = np.inf  # np.Inf was removed in NumPy 2.0

    for epoch in range(1, num_epochs+1):  # loop over the dataset multiple times
        train_loss = 0
        valid_loss = 0

        # train model
        model.train()
        for batch, data in enumerate(train_loader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = model(inputs)

            # calculate batch loss
            loss = criterion(outputs, labels)

            # do backprop
            loss.backward()

            # update weights
            optimizer.step()

            # update training loss
            train_loss += loss.item()
        
        # validate model
        model.eval()
        # no gradients are needed for validation, which saves memory and time
        with torch.no_grad():
            for batch, data in enumerate(valid_loader, 0):
                inputs, labels = data
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                valid_loss += loss.item()

        # calculate average loss
        train_loss = train_loss / len(train_loader)
        valid_loss = valid_loss / len(valid_loader)

        print(
            'Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
                epoch, train_loss, valid_loss
            )
        )
        if valid_loss <= valid_loss_min:
            print('Validation loss decreased ({:.6f} --> {:.6f})...'.format(
                valid_loss_min, valid_loss
            ))
            valid_loss_min = valid_loss

    print('Finished Training & Validating')
    return model

In [None]:
torch.manual_seed(11)

# instantiate model
model = SimpleCNN()

# define loss function (CrossEntropyLoss applies log-softmax internally,
# so the model outputs raw logits)
criterion = nn.CrossEntropyLoss()

# define optimization algorithm
optimizer = optim.SGD(model.parameters(), lr=1e-2)
model = train_model(model, optimizer, criterion, train_loader, test_loader, num_epochs=10)

In [None]:
total_params = 0
# named_parameters() yields the trainable tensors; state_dict() returns detached
# copies whose requires_grad is always False
for layer_name, layer_param in model.named_parameters():
    if layer_param.requires_grad:
        print(f"{layer_name}: {layer_param.numel()}")
        total_params += layer_param.numel()
print("Total trainable weights:", total_params)  # 62,006 for SimpleCNN

### Performance Metrics

In [None]:
for images, labels in test_loader:
    break

imshow(images, labels)

In [None]:
outputs = model(images)
outputs

In [None]:
_, predictions = outputs.max(1)

imshow(images, predictions)

In [None]:
correct = 0
total = 0

# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():
    for data in test_loader:
        images, labels = data
        # calculate outputs by running images through the network
        outputs = model(images)
        # the class with the highest energy is what we choose as prediction
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the model on the test images: {100 * correct // total}%')

In [None]:
criterion = nn.CrossEntropyLoss()

test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

# set model mode to evaluation
model.eval()

# no gradients are needed for evaluation
with torch.no_grad():
    for inputs, labels in test_loader:
        # forward pass: compute predicted outputs by passing inputs to the model
        outputs = model(inputs)

        # calculate the loss
        loss = criterion(outputs, labels)

        # update test loss (weighted by batch size, averaged over the dataset below)
        test_loss += loss.item() * inputs.size(0)

        # convert output logits to the predicted class
        _, preds = torch.max(outputs, 1)

        # compare predictions to true labels (boolean tensor of shape [batch_size])
        correct = preds.eq(labels)

        # calculate test accuracy for each object class; use the actual batch size,
        # since the last batch can be smaller than BATCH_SIZE
        for i in range(labels.size(0)):
            label = labels[i].item()
            class_correct[label] += correct[i].item()
            class_total[label] += 1

# calculate and print avg test loss
test_loss = test_loss / len(test_loader.dataset)
print('Test Loss: {:.6f}\n'.format(test_loss))

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %10s: %2d%% (%2d/%2d)' % (
            class_names[i],
            100 * class_correct[i] / class_total[i],
            np.sum(class_correct[i]),
            np.sum(class_total[i])
        ))
    else:
        print('Test Accuracy of %10s: N/A (no test examples)' % (class_names[i]))

print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100. * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct),
    np.sum(class_total)
))

## Model Usage

### Save Model

In [None]:
MODEL_DIR = Path("models")

if not MODEL_DIR.exists():
    MODEL_DIR.mkdir(parents=True, exist_ok=False)

In [None]:
torch.save(model, MODEL_DIR / "model.pth")

In [None]:
torch.save(model.state_dict(), MODEL_DIR / "model_state.pth")

### Load Model

In [None]:
model.state_dict()

In [None]:
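# loading a full pickled model needs weights_only=False on PyTorch >= 2.6,
# where torch.load defaults to weights_only=True
new_model = torch.load(MODEL_DIR / "model.pth", weights_only=False)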

In [None]:
new_model.state_dict()

In [None]:
cnn = SimpleCNN()
cnn

In [None]:
cnn.state_dict()

In [None]:
trained_state_dict = torch.load(MODEL_DIR / "model_state.pth")
cnn.load_state_dict(trained_state_dict)

In [None]:
cnn.state_dict()

### Use Models

In [None]:
# convert to RGB so the image has exactly 3 channels, like CIFAR-10
img = Image.open(DATA_DIR / "garuda.jpeg").convert("RGB")

In [None]:
img

In [None]:
transformers(img).shape

In [None]:
new_transformers = torchvision.transforms.Compose(
    [
        # transform PIL to tensor
        torchvision.transforms.ToTensor(),
        # normalize image
        torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
        torchvision.transforms.Resize((32, 32))
    ]
)

In [None]:
transformed_img = new_transformers(img)
unnormalized_img = ((transformed_img*.5 + .5)*255).type(torch.uint8)
plt.figure(figsize=(3, 3))
plt.imshow(np.transpose(unnormalized_img, (1, 2, 0)))
plt.show()

In [None]:
transformed_img.shape

In [None]:
torch.unsqueeze(transformed_img, 0).shape

In [None]:
new_model.eval()

output = new_model(torch.unsqueeze(transformed_img, 0))
_, prediction = output.max(1)

plt.figure(figsize=(3, 3))
plt.imshow(np.transpose(unnormalized_img, (1, 2, 0)))
plt.title(class_names[prediction.item()])
plt.show()

## References

* [Learning Multiple Layers of Features from Tiny Images](https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf), Alex Krizhevsky, 2009.
* [Training a Classifier](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html), PyTorch Tutorials.
* [Save and Load the Model](https://pytorch.org/tutorials/beginner/basics/saveloadrun_tutorial.html), PyTorch Tutorials.