<a href="https://colab.research.google.com/github/kameda-yoshinari/IMISToolExeA/blob/main/700_PyPorchPractice.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 7. PyTorch Practice of Image Classification by Computer Vision

## Preface  

**Let's actually build a new image classifier.**  

Before you proceed beyond this point, think for a while about how you would go about it if you were actually asked to do this.  


This lesson is adapted from the ["Transfer learning tutorial"](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html) in the PyTorch tutorials.  
As you can see, finding the right tutorial gives you a head start.

Since learning from scratch is time-consuming, we use a method called Transfer Learning to shorten the learning time.  

(Note that we are going to make an image classifier that can classify images into the classes defined here.)


## Preparation on Google Colab

All the files will be placed on your Google Drive.

In [None]:
!echo "Start mounting your Google Drive."
from google.colab import drive
drive.mount('/content/drive')

In [None]:
!echo "Make a working folder and move to there."
%cd /content/drive/My\ Drive/
%mkdir -p IMIS_Tool-A/Work700
%cd       IMIS_Tool-A/Work700
!ls

In [None]:
!echo "For tips on running notebooks in Google Colab, see https://pytorch.org/tutorials/beginner/colab "
%matplotlib inline


# Transfer Learning for Computer Vision Tutorial
**Author**: [Sasank Chilamkurthy](https://chsasank.github.io)  
**Author(minor changes)**: KAMEDA Yoshinari

## Introduction

In this tutorial, you will learn how to train a convolutional neural network for image classification using transfer learning.  
You can read more about transfer learning in the [cs231n notes](https://cs231n.github.io/transfer-learning/).

Quoting these notes,

    In practice, very few people train an entire Convolutional Network
    from scratch (with random initialization), because it is relatively
    rare to have a dataset of sufficient size. Instead, it is common to
    pretrain a ConvNet on a very large dataset (e.g. ImageNet, which
    contains 1.2 million images with 1000 categories), and then use the
    ConvNet either as an initialization or a fixed feature extractor for
    the task of interest.

These two major transfer learning scenarios look as follows:

-  **Finetuning the ConvNet (model_ft)** : Instead of random initialization, we initialize the network with a pretrained network, such as one trained on the ImageNet 1000-class dataset. The rest of the training proceeds as usual.
-  **ConvNet as fixed feature extractor (model_conv)**: Here, we will freeze the weights for all of the network except that of the final fully connected layer. This last fully connected layer is replaced with a new one with random weights and only this layer is trained.





## ImageNet

You should learn about ImageNet and understand how large an image dataset needs to be to build a good classifier.

* ImageNet (official)  
   https://www.image-net.org/download.php  
* An explanation in Japanese (of somewhat questionable quality)  
   https://cvml-expertguide.net/terms/dataset/image-dataset/imagenet/

## Preparation

Load the necessary libraries (PyTorch and related packages).

In [None]:
# License: BSD
# Author: Sasank Chilamkurthy

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torch.backends.cudnn as cudnn
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
from PIL import Image
from tempfile import TemporaryDirectory

cudnn.benchmark = True
plt.ion()   # interactive mode

## Load Data (from the Internet)

We will use torchvision and torch.utils.data packages for loading the data.

The problem we're going to solve today is to train a model to classify **ants** and **bees**. We have about 120 training images each for ants and bees.
There are 75 validation images for each class. Usually, this is a very small dataset to generalize upon, if trained from scratch.
Since we are using transfer learning, we should be able to generalize reasonably well.

This dataset is a very small subset of ImageNet.


In [None]:
%%shell
# Download the data and unzip it with shell commands (the cell magic must be the first line of the cell)
rm -rf data/
wget -P ./data https://download.pytorch.org/tutorial/hymenoptera_data.zip
unzip ./data/hymenoptera_data.zip -d ./data

Note that you do not need to run this step a second time, because the dataset stays on your Google Drive.  

You should inspect the data carefully (this time, one data file is actually corrupted).

## Dataset tuning for use here

This process is sometimes critical for the recognition performance.

* Augmentation and normalization for training
* Normalization for validation

(Why is augmentation not applied to validation?)  
(What is the normalized image size?)  
(What happens if the image is larger or smaller? Or not square?)  
(What is done about the color balance?)  
(What are the magic numbers [0.485, 0.456, 0.406] and [0.229, 0.224, 0.225]?)  
(What are the "augmentations"?)  

Check the actual commands to do these.
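
As a starting point for the questions above, here is a minimal sketch of the arithmetic behind the validation transform, written in plain Python rather than with torchvision. It assumes that `Resize(256)` scales the shorter side to 256 while keeping the aspect ratio, that `CenterCrop(224)` takes the central 224x224 region, and that `Normalize` computes `(x - mean) / std` per channel on values already scaled to [0, 1] by `ToTensor`. The helper names and the 500x375 example image are hypothetical, and the crop offsets may differ by one pixel from torchvision's own rounding.

```python
# Channel statistics used by the transforms in this lesson (R, G, B).
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def resize_shorter_side(width, height, size=256):
    """Size after resizing so that the shorter side becomes `size` (aspect ratio kept)."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


def center_crop_box(width, height, crop=224):
    """(left, top, right, bottom) of a central crop of size crop x crop."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


def normalize_pixel(rgb):
    """Map an RGB pixel in [0, 1] to normalized values: (x - mean) / std."""
    return [(x - m) / s for x, m, s in zip(rgb, MEAN, STD)]


def denormalize_pixel(values):
    """Inverse mapping, std * x + mean, as done when displaying a tensor."""
    return [s * v + m for v, m, s in zip(values, MEAN, STD)]


w, h = resize_shorter_side(500, 375)   # a hypothetical 500x375 landscape photo
print((w, h))                          # size after the resize step
print(center_crop_box(w, h))           # region kept by the center crop
print(normalize_pixel(MEAN))           # a pixel equal to the mean maps to zeros
```

Whatever the input size or aspect ratio, the network therefore always receives a 224x224 image; what changes is how much of the original picture survives the crop.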

In [None]:
!ls ./data/hymenoptera_data/train
!ls ./data/hymenoptera_data/val

In [None]:
# Dataset definition for use

# Data augmentation and normalization for training
# Just normalization for validation

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = 'data/hymenoptera_data'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
                  for x in ['train', 'val']}



## Data loader  

You should use a Google Colab runtime with a GPU.  

(What is batch_size?)  
(What does "shuffle=True" control?)  
(What is num_workers?)  
(What is the role of "device"?)  
  
(What is the warning saying? ... See lesson 200.)
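
To make `batch_size` and `shuffle` concrete, the following plain-Python sketch mimics how a data loader groups sample indices into batches. It is not how `torch.utils.data.DataLoader` is implemented; `make_batches` and the sample count of 10 are hypothetical and chosen only so that the last batch comes out smaller than the others.

```python
import math
import random


def make_batches(num_samples, batch_size, shuffle, seed=None):
    """Return a list of index batches, similar in spirit to what a data loader yields."""
    indices = list(range(num_samples))
    if shuffle:
        # Shuffling changes the order in which samples are visited, not the samples.
        random.Random(seed).shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, num_samples, batch_size)]


batches = make_batches(num_samples=10, batch_size=4, shuffle=True, seed=0)
print(len(batches) == math.ceil(10 / 4))  # iterations per epoch = ceil(N / batch_size)
print([len(b) for b in batches])          # the last batch may be smaller
```

With shuffling enabled every sample is still visited exactly once per epoch; only the grouping into batches differs from run to run.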


In [None]:
# Data loader for train and val

dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4, shuffle=True, num_workers=4)
              for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

## Check what you have done

You can inspect a variable by just typing its name.

In [None]:
# Uncomment one of these
#print("abc")
#data_transforms
#data_transforms['train']
#data_transforms['train'].transforms
#data_transforms['val']
#data_dir
#image_datasets
#image_datasets['train']
#image_datasets['train'].classes
#image_datasets['train'].imgs
#image_datasets['train'].transforms
#image_datasets['val']
#dataloaders
class_names
#dataset_sizes

In [None]:
!ls -1 data/hymenoptera_data/train/ants | wc -l
!ls -1 data/hymenoptera_data/train/bees | wc -l
!ls -1 data/hymenoptera_data/val/ants | wc -l
!ls -1 data/hymenoptera_data/val/bees | wc -l

You should find that something is wrong here!  

(What is the "wrong" issue?)  
(Find the reason why it happens, and discuss whether or not it will harm the subsequent processing.)  
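
One way to investigate is to count the files in each folder by extension instead of only counting lines of `ls` output. The sketch below uses only the standard library; `count_by_extension` and `report` are hypothetical helper names, and the default root simply repeats the `data_dir` path used in this lesson.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(folder):
    """Count the files directly inside `folder`, grouped by lower-cased extension."""
    return Counter(p.suffix.lower() for p in Path(folder).iterdir() if p.is_file())


def report(root='data/hymenoptera_data'):
    """Print the per-extension counts for every split/class folder under `root`."""
    for split in ['train', 'val']:
        for class_dir in sorted(Path(root, split).glob('*')):
            if class_dir.is_dir():
                print(class_dir, dict(count_by_extension(class_dir)))


# Usage (run it in the working folder that contains ./data):
# report()
```

Compare the numbers with `dataset_sizes` to see whether every file on disk actually ends up in the dataset.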

In [None]:
# Uncomment one of these to see the image lists
# Or open Google Drive to check the images

!ls -1 data/hymenoptera_data/train/ants
#!ls -1 data/hymenoptera_data/train/bees
#!ls -1 data/hymenoptera_data/val/ants
#!ls -1 data/hymenoptera_data/val/bees


## Visualize a few images

Let's visualize a few training images so as to understand the data augmentations.  

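(Note that the iter/next pair is a Python idiom: `iter` creates an iterator over the data loader and `next` fetches one batch from it. Thanks to this pair, and because the loader shuffles the data, you see new images every time you run the second code cell.)  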
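
The `iter`/`next` pair can be tried on its own with a small generator. This is only a sketch: `batches` is a hypothetical stand-in for a data loader and yields fixed lists instead of image tensors.

```python
def batches():
    """A stand-in generator for a data loader: yields one batch per step."""
    yield ['img0', 'img1']
    yield ['img2', 'img3']


it = iter(batches())
first = next(it)     # fetch the first batch
second = next(it)    # fetch the following batch from the same iterator
print(first, second)

# A fresh iterator starts over from the beginning.
again = next(iter(batches()))
print(again)

# next() accepts a default that is returned once the iterator is exhausted.
print(next(it, None))
```

Calling `next(iter(...))` in one expression, as the cell below does, therefore always returns the first batch of a fresh pass over the data.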


In [None]:
def imshow(inp, title=None):
    """Display image for Tensor."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated


In [None]:
# Get a batch of training data

print(dataloaders['train'])
inputs, classes = next(iter(dataloaders['train']))

# Make a grid from batch
out = torchvision.utils.make_grid(inputs)
print(out.shape)

imshow(out, title=[class_names[x] for x in classes])

# 0...ants, 1...bees
print(classes)

In [None]:
# See the dimensions (shape) of images here.
# What do 4, 3, 224, and 224 mean?

inputs.shape

In [None]:
# See how the true labels are given

print(classes[0])
print(classes[1])
print(classes[2])
print(classes[3])

## Generic preparation

### On training the model

Now, let's write a **general** function to train a model. Here, we will illustrate:

-  Scheduling the learning rate
-  Saving the best model

In the following, parameter ``scheduler`` is an LR scheduler object from ``torch.optim.lr_scheduler``.
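
As an illustration of what such a scheduler does, the sketch below reproduces step decay in plain Python. It assumes the scheduler is stepped once per epoch and uses the values that appear later in this lesson (initial learning rate 0.001, `step_size=7`, `gamma=0.1`); `step_lr` is a hypothetical helper, not part of `torch.optim.lr_scheduler`.

```python
def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate in effect during `epoch` (0-based) under step decay."""
    return base_lr * gamma ** (epoch // step_size)


# Print the learning rate at a few epochs of a 25-epoch run.
for epoch in [0, 6, 7, 14, 21, 24]:
    print(epoch, f'{step_lr(0.001, epoch):.0e}')
```

Under these assumptions the learning rate drops by a factor of 10 at epochs 7, 14 and 21, so the last epochs of a 25-epoch run make only very small updates.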



In [None]:
def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    # Create a temporary directory to save training checkpoints
    with TemporaryDirectory() as tempdir:
        best_model_params_path = os.path.join(tempdir, 'best_model_params.pt')

        torch.save(model.state_dict(), best_model_params_path)
        best_acc = 0.0

        for epoch in range(num_epochs):
            print(f'Epoch {epoch}/{num_epochs - 1}')
            print('-' * 10)

            # Each epoch has a training and validation phase
            for phase in ['train', 'val']:
                if phase == 'train':
                    model.train()  # Set model to training mode
                else:
                    model.eval()   # Set model to evaluate mode

                running_loss = 0.0
                running_corrects = 0

                # Iterate over data.
                for inputs, labels in dataloaders[phase]:
                    inputs = inputs.to(device)
                    labels = labels.to(device)

                    # zero the parameter gradients
                    optimizer.zero_grad()

                    # forward
                    # track history if only in train
                    with torch.set_grad_enabled(phase == 'train'):
                        outputs = model(inputs)
                        _, preds = torch.max(outputs, 1)
                        loss = criterion(outputs, labels)

                        # backward + optimize only if in training phase
                        if phase == 'train':
                            loss.backward()
                            optimizer.step()

                    # statistics
                    running_loss += loss.item() * inputs.size(0)
                    running_corrects += torch.sum(preds == labels.data)
                if phase == 'train':
                    scheduler.step()

                epoch_loss = running_loss / dataset_sizes[phase]
                epoch_acc = running_corrects.double() / dataset_sizes[phase]

                print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

                # save a checkpoint of the best model so far
                if phase == 'val' and epoch_acc > best_acc:
                    best_acc = epoch_acc
                    torch.save(model.state_dict(), best_model_params_path)

            print()

        time_elapsed = time.time() - since
        print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
        print(f'Best val Acc: {best_acc:.4f}')

        # load best model weights
        model.load_state_dict(torch.load(best_model_params_path))
    return model
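
The statistics step in `train_model` multiplies each batch loss by the batch size before summing. With its default settings `nn.CrossEntropyLoss` returns the mean loss over a batch, so this weighting is what turns the running sum into a true per-sample average when the last batch is smaller. The sketch below shows the difference with hypothetical numbers; `epoch_average` is an illustrative helper only.

```python
def epoch_average(batch_means, batch_sizes):
    """Per-sample average from per-batch means, weighting each batch by its size."""
    total = sum(m * n for m, n in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)


# Hypothetical numbers: two full batches of 4 and a final batch of 2.
means = [0.50, 0.70, 0.20]
sizes = [4, 4, 2]
print(epoch_average(means, sizes))   # weighted by batch size
print(sum(means) / len(means))       # naive mean of batch means gives the small batch too much weight
```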

### On visualizing the model predictions

Generic function to display predictions for a few images.  
Note that the images are taken from "val" dataset.  




In [None]:
def visualize_model(model, num_images=6):
    was_training = model.training
    model.eval()
    images_so_far = 0
    fig = plt.figure()

    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['val']):
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images//2, 2, images_so_far)
                ax.axis('off')
                ax.set_title(f'predicted: {class_names[preds[j]]}')
                # 0 ant, 1 bee
                # print(preds[j])
                imshow(inputs.cpu().data[j])

                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training)

## Finetuning the ConvNet (model_ft)



### A pretrained model

Load a pretrained model (from storage into memory) and reset the final fully connected layer.

In [None]:
model_ft = models.resnet18(weights='IMAGENET1K_V1')
num_ftrs = model_ft.fc.in_features
# Here the size of each output sample is set to 2.
# Alternatively, it can be generalized to ``nn.Linear(num_ftrs, len(class_names))``.
model_ft.fc = nn.Linear(num_ftrs, 2)

model_ft = model_ft.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)

### Train !!

It should take around 15-25 min on a CPU. On a GPU it is much faster and takes roughly a minute or two.

FYI, timings from a test run:

* 25:03 on CPU (Intel(R) Xeon(R) CPU @ 2.20GHz x 2 processor)  
*  1:30 on GPU (Intel(R) Xeon(R) CPU @ 2.30GHz x 2 + Tesla-T4)


In [None]:
!cat /proc/cpuinfo

In [None]:
!nvidia-smi

In [None]:
# Warning: it takes time!

model_ft   = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler, num_epochs=25)


### Check the result on "val" dataset

In [None]:
visualize_model(model_ft)

## ConvNet as fixed feature extractor (model_conv)



### Learning at the final layer only

Here, we need to freeze all the network except the final layer.   
We need to set ``requires_grad = False`` to freeze the parameters so that the gradients are not computed in ``backward()``.

You can read more about this in the documentation [here](https://pytorch.org/docs/notes/autograd.html#excluding-subgraphs-from-backward).
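
To check that freezing worked as intended, you can count trainable and frozen parameters. The sketch below does this on a tiny stand-in network instead of ResNet-18, so that it runs instantly without downloading weights; the layer sizes and the helper `count_parameters` are hypothetical.

```python
import torch.nn as nn

# A tiny stand-in for a pretrained backbone plus classifier head (not ResNet-18).
model = nn.Sequential(
    nn.Linear(8, 4),
    nn.ReLU(),
    nn.Linear(4, 3),
)

# Freeze every existing parameter.
for param in model.parameters():
    param.requires_grad = False

# Replace the last layer; a newly constructed module is trainable by default.
model[2] = nn.Linear(4, 2)


def count_parameters(module):
    """Return (trainable, frozen) parameter counts of a module."""
    trainable = sum(p.numel() for p in module.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in module.parameters() if not p.requires_grad)
    return trainable, frozen


print(count_parameters(model))  # only the new 4 -> 2 layer (weights + biases) is trainable
```

The same function can be applied to a real model after its final layer has been replaced, to confirm that only that layer will be updated.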


In [None]:
model_conv = torchvision.models.resnet18(weights='IMAGENET1K_V1')
for param in model_conv.parameters():
    param.requires_grad = False

# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)

model_conv = model_conv.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that only parameters of final layer are being optimized as
# opposed to before.
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)

### Train !!

On a CPU this will take about half the time compared to the previous scenario.
This is expected, as gradients do not need to be computed for most of the network. The forward pass, however, still has to be computed.

FYI, timing from a test run:

*  1:34 on GPU (Intel(R) Xeon(R) CPU @ 2.30GHz x 2 + Tesla-T4)



In [None]:
# Warning: it takes time!

model_conv = train_model(model_conv, criterion, optimizer_conv, exp_lr_scheduler, num_epochs=25)

### Check the results on "val" dataset

In [None]:
visualize_model(model_conv)

plt.ioff()
plt.show()

## Inference on custom images

Use the trained model to make predictions on custom images and visualize
the predicted class labels along with the images.




In [None]:
def visualize_model_predictions(model,img_path):
    was_training = model.training
    model.eval()

    img = Image.open(img_path)
    img = data_transforms['val'](img)
    img = img.unsqueeze(0)
    img = img.to(device)

    with torch.no_grad():
        outputs = model(img)
        _, preds = torch.max(outputs, 1)

        ax = plt.subplot(2,2,1)
        ax.axis('on')
        ax.set_title(f'Predicted: {class_names[preds[0]], preds[0]}')
        imshow(img.cpu().data[0])

        model.train(mode=was_training)

Let's see the results on the val dataset. (You can specify any image.)

In [None]:
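# Only the last assignment takes effect; comment out the lines you do not need.
# Make sure that the chosen file actually exists in the working folder.
img_path='data/hymenoptera_data/val/bees/72100438_73de9f17af.jpg'
img_path='data/hymenoptera_data/val/ants/319494379_648fb5a1c6.jpg'
img_path='usb_2.jpg'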


visualize_model_predictions(
    model_ft,
    img_path
)

visualize_model_predictions(
    model_conv,
    img_path
)

plt.ioff()
plt.show()

## Further Learning

If you would like to learn more about the applications of transfer learning,
checkout our [Quantized Transfer Learning for Computer Vision Tutorial](https://pytorch.org/tutorials/intermediate/quantized_transfer_learning_tutorial.html).





---


# Building your own classifier with your images

Let's build your own classifier.  

The basic strategy is to keep the original structure as much as possible.

* Two classes (replacing ants and bees)
* Images from a USB camera
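
Keeping the original structure means keeping the folder layout that `datasets.ImageFolder` reads: one sub-folder per class under each of `train` and `val`, with class indices assigned from the sorted folder names. The sketch below creates such a layout with the standard library. `make_dataset_folders` is a hypothetical helper, the label `hand` is only an example, and the demo writes into a temporary directory; pass your own data folder instead to create the real one.

```python
import tempfile
from pathlib import Path


def make_dataset_folders(root, class_names, splits=('train', 'val')):
    """Create root/<split>/<class_name>/ for every split and class; return the paths."""
    created = []
    for split in splits:
        for name in class_names:
            folder = Path(root) / split / name
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created


# Demo in a temporary directory with two hypothetical labels.
with tempfile.TemporaryDirectory() as tmp:
    for folder in make_dataset_folders(tmp, ['face', 'hand']):
        print(folder.relative_to(tmp))
```

After the folders are filled with images, the rest of the notebook can be reused by pointing `data_dir` at the new root.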

## Folders to cook

In [None]:
# Delete the unused folders and add new folders (named after the new labels)

# !rm -rf ./data/hymenoptera_data/train/ants
# !mkdir  ./data/hymenoptera_data/train/face


## Tips to get images from camera capture

Taken from the help -> code snippet of Google Colab.

In [None]:
from IPython.display import display, Javascript
from google.colab.output import eval_js
from base64 import b64decode

def take_photo(filename='photo.jpg', quality=0.8):
  js = Javascript('''
    async function takePhoto(quality) {
      const div = document.createElement('div');
      const capture = document.createElement('button');
      capture.textContent = 'Capture';
      div.appendChild(capture);

      const video = document.createElement('video');
      video.style.display = 'block';
      const stream = await navigator.mediaDevices.getUserMedia({video: true});

      document.body.appendChild(div);
      div.appendChild(video);
      video.srcObject = stream;
      await video.play();

      // Resize the output to fit the video element.
      google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);

      // Wait for Capture to be clicked.
      await new Promise((resolve) => capture.onclick = resolve);

      const canvas = document.createElement('canvas');
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      canvas.getContext('2d').drawImage(video, 0, 0);
      stream.getVideoTracks()[0].stop();
      div.remove();
      return canvas.toDataURL('image/jpeg', quality);
    }
    ''')
  display(js)
  data = eval_js('takePhoto({})'.format(quality))
  binary = b64decode(data.split(',')[1])
  with open(filename, 'wb') as f:
    f.write(binary)
  return filename

In [None]:
# Set the initial number
n=100

In [None]:
# IPython's Image is imported under an alias so that it does not
# shadow PIL's Image, which was imported at the top of this notebook.
from IPython.display import Image as IPyImage

while n < 1000:

    n = n+1

    try:
        # set the appropriate path to save images
        filename = take_photo('./usb_{}.jpg'.format(n))
        print('Saved to {}'.format(filename))

        # Show the image which was just taken.
        display(IPyImage(filename))

    except Exception as err:
        # Errors will be thrown if the user does not have a webcam or if they do not
        # grant the page permission to access it.
        print(str(err))

## Further study

* More than two classes
* Realtime USB camera image recognition on the fly
* Investigating the factors that influence recognition performance
* Code cleaning (as this course's code is a mix of snippets ...)

---
Tools and Practices for Intelligent Interaction Systems A  
Master's and Doctoral programs in intelligent and mechanical interaction systems, University of Tsukuba, Japan.  
KAMEDA Yoshinari, SHIBUYA Takeshi  

知能システムツール演習a  
知能機能システム学位プログラム (筑波大学大学院)  
担当：亀田能成，澁谷長史  

2023/08/07.  
