<a href="https://colab.research.google.com/github/komazawa-deep-learning/komazawa-deep-learning.github.io/blob/master/2021notebooks/2021_0925face_dataset_transfer_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# -*- coding: utf8 -*-

---
* License: BSD
* Author: Sasank Chilamkurthy
* Demo of a face recognition model using a face / non-face classification dataset
---

- source: file:///Users/asakawa/study/2020pytorch_tutorials.git/beginner_source/transfer_learning_tutorial.py
- date: 2021_0924
- author: 浅川伸一
- memo: I may talk about face perception at Komazawa next week,
since Prof. Takeichi talked about the fusiform gyrus this week (2021_0924).


<!-- # Transfer Learning for Computer Vision Tutorial 
* Author: [Sasank Chilamkurthy](https://chsasank.github.io)
-->

In this tutorial, you will learn how to train a convolutional neural network for image classification using transfer learning.
You can read more about transfer learning in the [cs231n notes](https://cs231n.github.io/transfer-learning/).

Quoting these notes,

> In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest.

These two major transfer learning scenarios look as follows:

- **Finetuning the ConvNet**: Instead of random initialization, we initialize the network with a pretrained network, such as one trained on the ImageNet 1000-class dataset.
The rest of the training proceeds as usual.
- **ConvNet as fixed feature extractor**: Here, we freeze the weights of the entire network except for the final fully connected layer.
This last fully connected layer is replaced with a new one with random weights, and only this layer is trained.


In [None]:
!wget https://komazawa-deep-learning.github.io/2021komazawa_faces.tgz -O 2021komazawa_faces.tgz
!tar xzf 2021komazawa_faces.tgz

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy

plt.ion()   # interactive mode

In [None]:
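# Training data: data augmentation and normalization
# Validation data: resizing, center cropping, and normalization
data_transforms = {
    'train': transforms.Compose([
        #transforms.RandomResizedCrop(224),  # disabled in this notebook
        transforms.RandomHorizontalFlip(),  # random horizontal flip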
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

#data_dir = 'data/hymenoptera_data'
#data_dir = 'trainingdata'
data_dir = 'data'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                             shuffle=True, num_workers=4)
              for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [None]:
print(f'class_names:{class_names}')
print(f'dataset_sizes:{dataset_sizes}')
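
`datasets.ImageFolder` expects one subdirectory per class under each split and, as its documentation describes, assigns label indices following the sorted order of the folder names. The sketch below mimics that convention with the standard library only. The folder names `face` and `nonface` are hypothetical placeholders, since the actual class names come from the downloaded archive, and `find_classes` here is an illustrative stand-in rather than the library function.

```python
import os
import tempfile


def find_classes(split_dir):
    """Mimic the ImageFolder convention: sorted subfolder names become class indices."""
    classes = sorted(entry.name for entry in os.scandir(split_dir) if entry.is_dir())
    class_to_idx = {name: index for index, name in enumerate(classes)}
    return classes, class_to_idx


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: <root>/train/<class_name>/ and <root>/val/<class_name>/
    for split in ('train', 'val'):
        for cls in ('nonface', 'face'):
            os.makedirs(os.path.join(root, split, cls))
    classes, class_to_idx = find_classes(os.path.join(root, 'train'))

print(classes)
print(class_to_idx)
```

The integer labels yielded by the data loaders are indices into this sorted list, which is what `class_names[x]` relies on later in the notebook.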

# 1. Visualizing the data

In [None]:
def imshow(inp, title=None):
    """Show a normalized image tensor of shape (C, H, W) on the current axes."""
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean  # undo the Normalize transform
    inp = np.clip(inp, 0, 1)
    # Draw on the current axes; creating a new figure here would leave the
    # subplots prepared by visualize_model empty.
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated


# Get a batch of training data
inputs, classes = next(iter(dataloaders['train']))

# Make a grid from batch
out = torchvision.utils.make_grid(inputs)

plt.figure(figsize=(10, 10))  # large figure for the batch grid only
imshow(out, title=[class_names[x] for x in classes])
print(classes)
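
The `imshow` helper has to undo the `Normalize` transform before plotting. The following NumPy-only sketch walks through that round trip on a small random array. The 4x4 image is a hypothetical stand-in for a real photo, and the mean and standard deviation are the values used in `data_transforms`.

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

rng = np.random.default_rng(0)
image_hwc = rng.random((4, 4, 3))  # hypothetical RGB image in [0, 1], shape (H, W, C)

# Forward direction (what ToTensor followed by Normalize amounts to):
# per-channel (x - mean) / std, then channels first.
normalized_chw = ((image_hwc - mean) / std).transpose((2, 0, 1))

# Inverse direction (what imshow does): channels last, then x * std + mean.
restored_hwc = std * normalized_chw.transpose((1, 2, 0)) + mean
restored_hwc = np.clip(restored_hwc, 0, 1)

round_trip_ok = np.allclose(restored_hwc, image_hwc)
print(normalized_chw.shape, restored_hwc.shape, round_trip_ok)
```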

# 2. Training the model

Now, let's write a general function to train a model.
Here, we will illustrate:

- Scheduling the learning rate
- Saving the best model

In the following, the parameter ``scheduler`` is an LR scheduler object from ``torch.optim.lr_scheduler``.
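
To make the effect of the scheduler concrete, here is a minimal sketch in plain Python of the step decay rule that `StepLR` is documented to apply: the learning rate is multiplied by `gamma` once every `step_size` epochs. The function name `step_lr` is hypothetical, and the values are illustrative rather than a recommendation.

```python
def step_lr(base_lr, epoch, step_size, gamma):
    """Learning rate in effect during `epoch` under step decay."""
    return base_lr * gamma ** (epoch // step_size)


schedule = [step_lr(0.001, epoch, step_size=7, gamma=0.1) for epoch in range(15)]
for epoch, lr in enumerate(schedule):
    print(f'epoch {epoch:2d}: lr = {lr:.6f}')
```

With `step_size=7`, a run shorter than 7 epochs trains at the base learning rate throughout, whereas `step_size=1` shrinks the rate tenfold after every epoch.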

In [None]:
def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print(f'Epoch {epoch}/{num_epochs - 1}')

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        #print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model
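
`train_model` wraps `model.state_dict()` in `copy.deepcopy` when it records the best weights. The small sketch below, using a hypothetical two-input linear layer, suggests why: the tensors returned by `state_dict()` share storage with the live parameters, so a plain reference would silently follow later updates.

```python
import copy

import torch
import torch.nn as nn

model = nn.Linear(2, 1)  # hypothetical tiny model, for illustration only

reference = model.state_dict()                # shares storage with the live parameters
snapshot = copy.deepcopy(model.state_dict())  # independent copy of the current weights

# Simulate a later training step that changes the weights in place.
with torch.no_grad():
    model.weight.add_(1.0)

reference_tracks_model = torch.equal(reference['weight'], model.weight)
snapshot_tracks_model = torch.equal(snapshot['weight'], model.weight)
print(reference_tracks_model, snapshot_tracks_model)
```

Only the deep copy still holds the weights from before the update, which is what is needed to restore the best epoch at the end of training.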

# 3. Visualizing the model predictions

A generic function to display predictions for a few images:

In [None]:
def visualize_model(model, num_images=6):
    was_training = model.training
    model.eval()
    images_so_far = 0
    fig = plt.figure()

    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['val']):
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images//2, 2, images_so_far)
                ax.axis('off')
                ax.set_title('predicted: {}'.format(class_names[preds[j]]))
                imshow(inputs.cpu().data[j])

                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training)
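
Both `train_model` and `visualize_model` turn raw network outputs into predicted labels with `torch.max(outputs, 1)`. As a small illustration, the sketch below applies it to a hand-written batch of scores. The scores, labels, and the class names `face` and `nonface` are hypothetical.

```python
import torch

example_class_names = ['face', 'nonface']  # hypothetical class names
outputs = torch.tensor([[2.0, -1.0],
                        [0.1, 0.3],
                        [-0.5, -0.2]])     # one row of scores per image
labels = torch.tensor([0, 1, 0])

# torch.max along dim 1 returns (largest score, index of largest score) per row;
# the index is the predicted class.
values, preds = torch.max(outputs, 1)

predicted_names = [example_class_names[i] for i in preds.tolist()]
num_correct = torch.sum(preds == labels).item()
print(predicted_names, num_correct)
```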

# 4. Finetuning the ConvNet

Load a pretrained model and reset the final fully connected layer.


In [None]:
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
# Replace the final layer so that it has 2 output units (one per class).
# Alternatively, this can be generalized to nn.Linear(num_ftrs, len(class_names)).
model_ft.fc = nn.Linear(num_ftrs, 2)
model_ft = model_ft.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every epoch (step_size=1)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=1, gamma=0.1)

In [None]:
# Train and evaluate.
# The original tutorial reports around 15-25 min on CPU and less than a minute
# on GPU; timings for this dataset and number of epochs may differ.

model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=3)

In [None]:
visualize_model(model_ft)

# 5. ConvNet as fixed feature extractor

Here, we need to freeze all of the network except the final layer.
We need to set ``requires_grad = False`` on the parameters to freeze them, so that their gradients are not computed in ``backward()``.

You can read more about this in the documentation:
<https://pytorch.org/docs/notes/autograd.html#excluding-subgraphs-from-backward>
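
As a minimal check of what freezing does, the sketch below runs one backward pass through a toy two-layer model. The names `backbone` and `head` and the layer sizes are hypothetical stand-ins for the pretrained layers and the new final layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

backbone = nn.Linear(4, 3)  # hypothetical stand-in for the pretrained layers
head = nn.Linear(3, 2)      # hypothetical stand-in for the new final layer

# Freeze the backbone: autograd will not compute gradients for these parameters.
for p in backbone.parameters():
    p.requires_grad = False

model = nn.Sequential(backbone, nn.ReLU(), head)
loss = model(torch.randn(5, 4)).sum()
loss.backward()

frozen_grads = [p.grad for p in backbone.parameters()]
head_grads = [p.grad for p in head.parameters()]
print('backbone grads:', frozen_grads)
print('head grads present:', [g is not None for g in head_grads])
```

The frozen parameters end up with no gradient at all, while the head receives gradients as usual, so an optimizer built from the head's parameters is sufficient.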

In [None]:
model_conv = torchvision.models.resnet18(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = False

# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)

model_conv = model_conv.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that only parameters of final layer are being optimized as
# opposed to before.
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)

# 6. Train and evaluate

On CPU this takes about half the time compared to the previous scenario.
This is expected, as gradients do not need to be computed for most of the network.
However, the forward pass still needs to be computed.

In [None]:
model_conv = train_model(model_conv, 
                         criterion, 
                         optimizer_conv,
                         exp_lr_scheduler, 
                         num_epochs=5)

In [None]:
visualize_model(model_conv)

plt.ioff()
plt.show()