Homework 13 - Network Compression
===

> Author: Arvin Liu (r09922071@ntu.edu.tw), this colab is modified from ML2021-HW3

If you have any questions, feel free to ask: ntu-ml-2021spring-ta@googlegroups.com

## **Intro**

HW13 is about network compression.

There are many approaches to network/model compression. Here we introduce two:
* Knowledge Distillation
* Architecture Design


This notebook proceeds as follows: <br/>
1. Introduce the depthwise, pointwise, and group convolutions used in MobileNet.
2. Design the model for this colab.
3. Introduce knowledge distillation.
4. Set up TeacherNet, which helps train the student model.


## **About the Dataset**  *(same as HW3)*

The dataset used here is food-11, a collection of food images in 11 classes.

To meet the homework requirements, the TAs slightly modified the data.
Please DO NOT access the original fully-labeled training data or testing labels.

Also, the modified dataset is for this course only, and any further distribution or commercial use is forbidden.

In [1]:
!nvidia-smi

Tue Jun 29 06:22:28 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.27       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    26W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
#from google.colab import drive
#drive.mount('/content/gdrive')


#import os

# your workspace in your drive
#workspace = 'ML2021-hw13'


#try:
#  os.chdir(os.path.join('/content/gdrive/MyDrive/', workspace))
#except:
#  os.mkdir(os.path.join('/content/gdrive/MyDrive/', workspace))
#  os.chdir(os.path.join('/content/gdrive/MyDrive/', workspace))

#os.makedirs('/content/gdrive/MyDrive/ML2021-hw13/checkpoints', exist_ok=True)

In [3]:
### This block is same as HW3 ###
# Download the dataset
# You may choose where to download the data.

# Google Drive
# !gdown --id '1qdyNN0Ek4S5yi-pAqHes1yjj5cNkENCc' --output food-11.zip
# !gdown --id '1c0Q1EP6yIx0O2rqVMIVInIt8wFjLxmRh' --output food-11.zip
# !gdown --id '1hKO054nT1R8egcXY2-tgQbwX4EjowRLz' --output food-11.zip
# !gdown --id '1_7_uC1WUvX6H51gQaYmI4q3AezdQJhud' --output food-11.zip
# !gdown --id '12bz82Zpx0_7BDGXq4nRt7E_fMFmILoc9' --output food-11.zip
# !gdown --id '1oiqRKrDQXVBM5y63MeEaHxFmCIzNXx1Q' --output food-11.zip
#!gdown --id '1qaL43sl4qUMeCT1OVpk4aOFycnLL5ZJX' --output food-11.zip

# If you cannot successfully gdown, you can change a link. (Backup link is provided at the bottom of this colab tutorial).

# Dropbox
#!wget https://www.dropbox.com/s/m9q6273jl3djall/food-11.zip -O food-11.zip

# MEGA
#!sudo apt install megatools
#!megadl "https://mega.nz/#!zt1TTIhK!ZuMbg5ZjGWzWX1I6nEUbfjMZgCmAgeqJlwDkqdIryfg"

# Unzip the dataset.
# This may take some time.
#!unzip -q food-11.zip

Downloading...
From: https://drive.google.com/uc?id=1qaL43sl4qUMeCT1OVpk4aOFycnLL5ZJX
To: /content/food-11.zip
963MB [00:08, 114MB/s] 
replace food-11/training/unlabeled/00/5176.jpg? [y]es, [n]o, [A]ll, [N]one, [r]ename: 

## **Import Packages**  *(same as HW3)*

First, we need to import packages that will be used later.

In this homework, we rely heavily on **torchvision**, a library of PyTorch.

In [4]:
### This block is same as HW3 ###
# Import necessary packages.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
import torchvision.models as models

from PIL import Image
# "ConcatDataset" and "Subset" are possibly useful when doing semi-supervised learning.
from torch.utils.data import ConcatDataset, DataLoader, Subset
from torchvision.datasets import DatasetFolder

# This is for the progress bar.
from tqdm.auto import tqdm

## **Dataset, Data Loader, and Transforms** *(similar to HW3)*

Torchvision provides lots of useful utilities for image preprocessing, data wrapping as well as data augmentation.

Here, since our data are stored in folders by class labels, we can directly apply **torchvision.datasets.DatasetFolder** for wrapping data without much effort.

Please refer to [PyTorch official website](https://pytorch.org/vision/stable/transforms.html) for details about different transforms.

---
**The only difference from HW3 is that the transform functions are different.**

In [5]:
### This block is similar to HW3 ###
# It is important to do data augmentation in training.
# However, not every augmentation is useful.
# Please think about what kind of augmentation is helpful for food recognition.

train_tfm = transforms.Compose([
    # Resize the image into a fixed shape (height = width = 142)
    transforms.Resize((142, 142)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.RandomCrop(128),
    transforms.ToTensor(),
])

# We don't need augmentations in testing and validation.
# All we need here is to resize the PIL image and transform it into Tensor.
test_tfm = transforms.Compose([
    # Resize the image into a fixed shape (height = width = 142)
    transforms.Resize((142, 142)),
    transforms.CenterCrop(128),
    transforms.ToTensor(),
])


In [6]:
### This block is similar to HW3 ###
# Batch size for training, validation, and testing.
# A greater batch size usually gives a more stable gradient.
# But the GPU memory is limited, so please adjust it carefully.
batch_size = 64

# Construct datasets.
# The argument "loader" tells how torchvision reads the data.
train_set = DatasetFolder("food-11/training/labeled", loader=lambda x: Image.open(x), extensions="jpg", transform=train_tfm)
valid_set = DatasetFolder("food-11/validation", loader=lambda x: Image.open(x), extensions="jpg", transform=test_tfm)
unlabeled_set = DatasetFolder("food-11/training/unlabeled", loader=lambda x: Image.open(x), extensions="jpg", transform=train_tfm)
test_set = DatasetFolder("food-11/testing", loader=lambda x: Image.open(x), extensions="jpg", transform=test_tfm)

# Construct data loaders.
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
valid_loader = DataLoader(valid_set, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)

# **Architecture / Model Design**
The following convolution layer designs use fewer parameters than a normal convolution.

## **Depthwise & Pointwise Convolution**
![](https://i.imgur.com/FBgcA0s.png)
> Blue: the connection between layers \
> Green: the expansion of **receptive field** \
> (reference: arxiv:1810.04231)

(a) Normal convolution layer: every output channel is connected to every input channel, as in a fully connected layer. The only difference from a fully connected layer is the operation on each connection (multiplication --> convolution).

(b) Depthwise convolution layer (DW): each feature map passes through its own filter. A pointwise convolution layer (PW) then follows to combine the information across feature maps at each pixel.


(c) Group convolution layer (GC): the feature maps are split into groups. Each group passes through its own filters, and the results are concatenated. If the number of groups equals the number of input feature maps, GC becomes DW (channels are independent). If the number of groups is 1, GC becomes a normal convolution (fully connected across channels).
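
The connectivity rule in (c) can be made concrete with a small sketch. The helper `group_connectivity` below is hypothetical (it is not part of PyTorch). It simply lists which input channels each output channel reads for a given `groups` setting, assuming PyTorch's convention that consecutive channels are grouped together.

```python
def group_connectivity(in_chs, out_chs, groups):
    """Return, for each output channel, the input channels it is connected to."""
    if in_chs % groups or out_chs % groups:
        raise ValueError("in_chs and out_chs must both be divisible by groups")
    in_per_group = in_chs // groups
    out_per_group = out_chs // groups
    connections = []
    for out_ch in range(out_chs):
        g = out_ch // out_per_group  # group this output channel belongs to
        start = g * in_per_group
        connections.append(list(range(start, start + in_per_group)))
    return connections

# groups = 1: every output channel sees every input channel (normal convolution).
print(group_connectivity(4, 4, groups=1))
# groups = 2: each output channel sees only the two channels of its own group.
print(group_connectivity(4, 4, groups=2))
# groups = in_chs: each output channel sees exactly one input channel (depthwise).
print(group_connectivity(4, 4, groups=4))
```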

<img src="https://i.imgur.com/Hqhg0Q9.png" width="500px">


## **Implementation details**
```python
# Regular Convolution, # of weights = in_chs * out_chs * kernel_size^2
nn.Conv2d(in_chs, out_chs, kernel_size, stride, padding)

# Group Convolution, "groups" controls the connections between inputs and
# outputs. in_chs and out_chs must both be divisible by groups.
# # of weights = in_chs * out_chs * kernel_size^2 / groups
nn.Conv2d(in_chs, out_chs, kernel_size, stride, padding, groups=groups)

# Depthwise Convolution, out_chs = in_chs = groups, # of weights = in_chs * kernel_size^2
nn.Conv2d(in_chs, in_chs, kernel_size, stride, padding, groups=in_chs)

# Pointwise Convolution, a.k.a. 1 by 1 convolution, # of weights = in_chs * out_chs
nn.Conv2d(in_chs, out_chs, 1)

# Merge Depthwise and Pointwise Convolution (no normalization or activation in between)
def dwpw_conv(in_chs, out_chs, kernel_size, stride, padding):
    return nn.Sequential(
        nn.Conv2d(in_chs, in_chs, kernel_size, stride, padding, groups=in_chs),
        nn.Conv2d(in_chs, out_chs, 1),
    )
```
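
As a rough check of the formulas above, the parameter counts can be computed by hand. This is a minimal sketch: `conv_params` is a hypothetical helper, and it assumes each convolution also has a bias term (the `nn.Conv2d` default), which the formulas in the comments leave out.

```python
def conv_params(in_chs, out_chs, kernel_size, groups=1, bias=True):
    """Number of learnable parameters in a 2D convolution layer."""
    weights = (in_chs // groups) * out_chs * kernel_size ** 2
    return weights + (out_chs if bias else 0)

in_chs, out_chs, k = 64, 128, 3

regular = conv_params(in_chs, out_chs, k)
depthwise = conv_params(in_chs, in_chs, k, groups=in_chs)
pointwise = conv_params(in_chs, out_chs, 1)

print("regular 3x3 conv:     ", regular)
print("depthwise + pointwise:", depthwise + pointwise)
```

For 64 input and 128 output channels, the depthwise + pointwise pair uses 8,960 parameters, compared with 73,856 for a regular 3x3 convolution.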

## **Model**

The basic model here is simply a stack of convolutional layers followed by some fully connected layers. You can take advantage of depthwise & pointwise convolution to make your model deeper while still meeting the size constraint.

In [7]:
def dwpw_conv(in_chs, out_chs, kernel_size, stride, padding):
    return nn.Sequential(
        nn.Conv2d(in_chs, in_chs, kernel_size, stride, padding, groups=in_chs),
        nn.BatchNorm2d(in_chs),
        nn.ReLU(),
        nn.Conv2d(in_chs, out_chs, 1),
        nn.MaxPool2d(2),
    )

class StudentNet(nn.Module):
    def __init__(self):
      super(StudentNet, self).__init__()

      # ---------- TODO ----------
      # Modify your model architecture

      self.cnn =  nn.Sequential(
            nn.Sequential(
                nn.Conv2d(3, 64, 3, 1, 0),
                nn.BatchNorm2d(64),
                nn.ReLU(),
                nn.MaxPool2d(2),
            ),
            dwpw_conv(64, 128, 3, 1, 0),
            dwpw_conv(128, 256, 3, 1, 0),
           
            nn.Sequential(
                nn.Conv2d(256, 256, 3, 1, 0, groups=256),
                nn.BatchNorm2d(256),
                nn.ReLU(),
                nn.Conv2d(256, 150, 1),
            ),
            # Here we adopt Global Average Pooling for various input size.
            nn.AdaptiveAvgPool2d((1, 1)),
      )
      self.fc = nn.Sequential(
        nn.Linear(150, 64),
        nn.ReLU(),
        nn.Linear(64, 11),
      )
      
    def forward(self, x):
      out = self.cnn(x)
      out = out.view(out.size()[0], -1)
      return self.fc(out)
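
To see how the spatial size shrinks through `self.cnn`, the standard output-size formula for convolution and pooling, floor((n + 2p - k) / s) + 1, can be traced by hand. This is only a sketch: `out_size` is a hypothetical helper, and a 128 x 128 input (the crop size used by the transforms above) is assumed.

```python
def out_size(n, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (n + 2 * padding - kernel_size) // stride + 1

n = 128
trace = [n]
# Three blocks of 3x3 convolution (stride 1, no padding) followed by 2x2 max pooling.
# The 1x1 pointwise convolutions do not change the spatial size, so they are skipped.
for _ in range(3):
    n = out_size(n, kernel_size=3)            # 3x3 convolution
    trace.append(n)
    n = out_size(n, kernel_size=2, stride=2)  # MaxPool2d(2)
    trace.append(n)
n = out_size(n, kernel_size=3)                # last depthwise conv, no pooling after it
trace.append(n)

print(trace)
```

The global average pooling layer then reduces the final feature map to 1 x 1 regardless of its size, which is why the model accepts various input sizes.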

## **Model Analysis**

Use `torchsummary` to get your model architecture (a screenshot or pasted text is allowed) and the number of parameters. Both pieces of information should be submitted in your NTU Cool questions.

Note that the number of parameters **must not exceed 100,000**, or you will be penalized in this homework.
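
As a cross-check on `torchsummary`, the parameter count can also be computed directly from the model. A minimal sketch, assuming PyTorch is installed: `count_parameters` and the tiny stand-in model are hypothetical, and you would pass your own `StudentNet()` instead.

```python
import torch.nn as nn

def count_parameters(model):
    """Total number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Hypothetical stand-in model; replace it with StudentNet() in the notebook.
toy_net = nn.Sequential(
    nn.Conv2d(3, 64, 3),   # 3 * 64 * 9 weights + 64 biases = 1,792
    nn.BatchNorm2d(64),    # 64 scales + 64 shifts          = 128
)

n_params = count_parameters(toy_net)
print(n_params)
assert n_params <= 100_000, "model exceeds the parameter budget"
```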


In [8]:
from torchsummary import summary

student_net = StudentNet()
summary(student_net, (3, 128, 128), device="cpu")

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 126, 126]           1,792
       BatchNorm2d-2         [-1, 64, 126, 126]             128
              ReLU-3         [-1, 64, 126, 126]               0
         MaxPool2d-4           [-1, 64, 63, 63]               0
            Conv2d-5           [-1, 64, 61, 61]             640
       BatchNorm2d-6           [-1, 64, 61, 61]             128
              ReLU-7           [-1, 64, 61, 61]               0
            Conv2d-8          [-1, 128, 61, 61]           8,320
         MaxPool2d-9          [-1, 128, 30, 30]               0
           Conv2d-10          [-1, 128, 28, 28]           1,280
      BatchNorm2d-11          [-1, 128, 28, 28]             256
             ReLU-12          [-1, 128, 28, 28]               0
           Conv2d-13          [-1, 256, 28, 28]          33,024
        MaxPool2d-14          [-1, 256,



## **Knowledge Distillation**

<img src="https://i.imgur.com/H2aF7Rv.png=100x" width="500px">

Since we have a trained large model, we let it teach a small model. In practice, the training target includes the large model's prediction rather than only the ground truth.

## **Why does it work?**
* If the data is not clean, the large model's predictions can smooth over the noise from wrongly labeled samples.
* The labels may be related to one another. For example, the digit 8 looks more similar to 6, 9, and 0 than to 1 or 7.


## **How to implement it?**
* $Loss = \alpha T^2 \times KL(\text{softmax}(\frac{\text{Teacher's Logits}}{T}) || \text{softmax}(\frac{\text{Student's Logits}}{T})) + (1-\alpha)(\text{Original Loss})$
* Note that the temperature-scaled logits must pass through softmax before the KL divergence is computed.
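
To make the formula concrete, here is a minimal pure-Python sketch of the loss for a single sample. The function names and the example logits are made up for illustration; in the notebook itself the same computation is done on batches with PyTorch.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; a larger T gives a flatter distribution."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_div(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kd_loss(student_logits, teacher_logits, label, T=20.0, alpha=0.5):
    # Soft loss: KL between the softened teacher and student distributions.
    soft = kl_div(softmax(teacher_logits, T), softmax(student_logits, T))
    # Hard loss: ordinary cross-entropy against the ground-truth label (T = 1).
    hard = -math.log(softmax(student_logits)[label])
    return alpha * T * T * soft + (1 - alpha) * hard

teacher = [8.0, 2.0, 1.0]  # made-up logits
student = [3.0, 2.5, 0.5]  # made-up logits

print(softmax(teacher, T=1))   # sharply peaked
print(softmax(teacher, T=20))  # much flatter, exposing relations between classes
print(kd_loss(student, teacher, label=0))
```

The $T^2$ factor is commonly included because the gradients of the softened KL term shrink roughly as $1/T^2$; multiplying by $T^2$ keeps the soft and hard terms on a comparable scale.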

In [9]:
def loss_fn_kd(outputs, labels, teacher_outputs, T=20, alpha=0.5):
    hard_loss = F.cross_entropy(outputs, labels) * (1. - alpha) 
    # ---------- TODO ----------
    # Complete soft loss in knowledge distillation
    # Reference from last year's hw7, as suggested from TA
    soft_loss = nn.KLDivLoss(reduction='batchmean')(F.log_softmax(outputs/T, dim=1), F.softmax(teacher_outputs/T, dim=1)) * (alpha * T * T) 
    return hard_loss + soft_loss

## **Teacher Model Setting**
We provide a well-trained teacher model to help you distill knowledge into the student model.
Note that if you want to change the transform function, you should consider whether it is still suitable for this well-trained teacher model.
* If gdown fails, you can switch to another link. (A backup link is provided at the bottom of this colab tutorial.)


In [10]:
# Download teacherNet
!gdown --id '1zH1x39Y8a0XyOORG7TWzAnFf_YPY8e-m' --output teacher_net.ckpt
# Load teacherNet
teacher_net = torch.load('./teacher_net.ckpt')
teacher_net.eval()

Downloading...
From: https://drive.google.com/uc?id=1zH1x39Y8a0XyOORG7TWzAnFf_YPY8e-m
To: /content/teacher_net.ckpt
44.8MB [00:00, 79.2MB/s]


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

## **Generate Pseudo Labels for Unlabeled Data**

Since we have a well-trained model, we can use it to predict pseudo-labels for the unlabeled data and help the student network train well. Note that you **CANNOT** use the well-trained model to pseudo-label the test data.


---

**AGAIN, DO NOT USE THE TEST DATA FOR ANY PURPOSE OTHER THAN INFERENCE**

* If you use the teacher network to pseudo-label the test data, the student network can simply overfit those pseudo-labels without using the training/unlabeled data at all. Your Kaggle accuracy would then be as high as the teacher network's, but in fact you would have merely overfit the test data, and your true testing accuracy would be very low.
* This contradicts the purpose of this assignment (network compression); therefore, you should not misuse the test data.
* If you have any concerns, you can email us.


In [11]:
# "cuda" only when GPUs are available.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Initialize a model, and put it on the device specified.
student_net = student_net.to(device)
teacher_net = teacher_net.to(device)

# Whether to do pseudo label.
do_semi = True

def get_pseudo_labels(dataset, model):
    loader = DataLoader(dataset, batch_size=batch_size*3, shuffle=False, pin_memory=True)
    pseudo_labels = []
    for batch in tqdm(loader):
        # A batch consists of image data and corresponding labels.
        img, _ = batch

        # Forward the data
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
            logits = model(img.to(device))
            # The pseudo label is the class with the largest logit.
            # (Softmax preserves the ordering, so it is not needed for argmax.)
            pseudo_labels.append(logits.argmax(dim=-1).detach().cpu())
    pseudo_labels = torch.cat(pseudo_labels)
    # Update the labels by replacing with pseudo labels.
    for idx, ((img, _), pseudo_label) in enumerate(zip(dataset.samples, pseudo_labels)):
        dataset.samples[idx] = (img, pseudo_label.item())
    return dataset

if do_semi:
    # Generate new trainloader with unlabeled set.
    unlabeled_set = get_pseudo_labels(unlabeled_set, teacher_net)
    concat_dataset = ConcatDataset([train_set, unlabeled_set])
    train_loader = DataLoader(concat_dataset, batch_size=batch_size, shuffle=True, pin_memory=True, drop_last=True)

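
The labeling step inside `get_pseudo_labels` can be sketched in isolation as follows. This is a toy illustration, assuming PyTorch is installed: the linear "teacher" and the random inputs are hypothetical stand-ins for `teacher_net` and the unlabeled images.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny "teacher" and a batch of unlabeled feature vectors.
teacher = nn.Linear(4, 3).eval()
unlabeled = torch.randn(5, 4)

with torch.no_grad():  # no gradients are needed for labeling
    logits = teacher(unlabeled)

# The pseudo label is the index of the largest logit for each sample.
pseudo_labels = logits.argmax(dim=-1)

# Softmax preserves the ordering of the logits, so it yields the same labels.
same = torch.equal(pseudo_labels, logits.softmax(dim=-1).argmax(dim=-1))
print(pseudo_labels.tolist(), same)
```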




## **Training** *(similar to HW3)*

You can finish supervised learning by simply running the provided code without any modification.

The function "get_pseudo_labels" is used for semi-supervised learning.
It is expected to get better performance if you use unlabeled data for semi-supervised learning.
However, you have to implement the function on your own and need to adjust several hyperparameters manually.

For more details about semi-supervised learning, please refer to [Prof. Lee's slides](https://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/semi%20(v3).pdf).

Again, please notice that utilizing external data (or pre-trained model) for training is **prohibited**.

---
**The only difference from HW3 is that you should use the knowledge distillation loss.**




In [12]:
# For the classification task, we use cross-entropy as the measurement of performance.
criterion = nn.CrossEntropyLoss()

# Initialize optimizer, you may fine-tune some hyperparameters such as learning rate on your own.
optimizer = torch.optim.Adam(student_net.parameters(), lr=0.001, weight_decay=1e-4)

# The number of training epochs.
n_epochs = 100
best_valid_acc = 0

for epoch in range(n_epochs):
    # ---------- Training ----------
    # Make sure the model is in train mode before training.
    student_net.train()

    # These are used to record information in training.
    train_loss = []
    train_accs = []

    # Iterate the training set by batches.
    for batch in tqdm(train_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch

        # Forward the data. (Make sure data and model are on the same device.)
        logits = student_net(imgs.to(device))
        # The teacher net is not updated, so we use torch.no_grad() to tell
        # torch not to retain the intermediate values (which are only
        # needed for backpropagation) and thus save memory.
        with torch.no_grad():
          soft_labels = teacher_net(imgs.to(device))
        
        # Calculate the loss in knowledge distillation method.
        loss = loss_fn_kd(logits, labels.to(device), soft_labels)

        # Gradients stored in the parameters in the previous step should be cleared out first.
        optimizer.zero_grad()

        # Compute the gradients for parameters.
        loss.backward()

        # Clip the gradient norms for stable training.
        grad_norm = nn.utils.clip_grad_norm_(student_net.parameters(), max_norm=10)

        # Update the parameters with computed gradients.
        optimizer.step()

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().mean()

        # Record the loss and accuracy.
        train_loss.append(loss.item())
        train_accs.append(acc)

    # The average loss and accuracy of the training set is the average of the recorded values.
    train_loss = sum(train_loss) / len(train_loss)
    train_acc = sum(train_accs) / len(train_accs)

    # Print the information.
    print(f"[ Train | {epoch + 1:03d}/{n_epochs:03d} ] loss = {train_loss:.5f}, acc = {train_acc:.5f}")


    # ---------- Validation ----------
    # Make sure the model is in eval mode so that some modules like dropout are disabled and work normally.
    student_net.eval()

    # These are used to record information in validation.
    valid_loss = []
    valid_accs = []

    # Iterate the validation set by batches.
    for batch in tqdm(valid_loader):

        # A batch consists of image data and corresponding labels.
        imgs, labels = batch

        # We don't need gradient in validation.
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
          logits = student_net(imgs.to(device))
          soft_labels = teacher_net(imgs.to(device))
        # We can still compute the loss (but not the gradient).
        loss = loss_fn_kd(logits, labels.to(device), soft_labels)

        # Compute the accuracy for current batch.
        acc = (logits.argmax(dim=-1) == labels.to(device)).float().detach().cpu().view(-1).numpy()

        # Record the loss and accuracy.
        valid_loss.append(loss.item())
        valid_accs += list(acc)

    # The average loss and accuracy for entire validation set is the average of the recorded values.
    valid_loss = sum(valid_loss) / len(valid_loss)
    valid_acc = sum(valid_accs) / len(valid_accs)

    # Print the information.
    print(f"[ Valid | {epoch + 1:03d}/{n_epochs:03d} ] loss = {valid_loss:.5f}, acc = {valid_acc:.5f}")
    if valid_acc > best_valid_acc:
        print('Validation accuracy improve from %.5f to %.5f at epoch %d'%(best_valid_acc, valid_acc, epoch+1))
        best_valid_acc = valid_acc
        torch.save(student_net.state_dict(), './student_net_best.ckpt')

[ Train | 001/100 ] loss = 16.51040, acc = 0.30570
[ Valid | 001/100 ] loss = 21.71022, acc = 0.30758
Validation accuracy improve from 0.00000 to 0.30758 at epoch 1
[ Train | 002/100 ] loss = 14.08812, acc = 0.43537
[ Valid | 002/100 ] loss = 19.84700, acc = 0.38333
Validation accuracy improve from 0.30758 to 0.38333 at epoch 2
[ Train | 003/100 ] loss = 12.81396, acc = 0.49817
[ Valid | 003/100 ] loss = 18.55648, acc = 0.46061
Validation accuracy improve from 0.38333 to 0.46061 at epoch 3
[ Train | 004/100 ] loss = 11.93420, acc = 0.52810
[ Valid | 004/100 ] loss = 17.80410, acc = 0.45152
[ Train | 005/100 ] loss = 11.32260, acc = 0.56118
[ Valid | 005/100 ] loss = 18.19297, acc = 0.45606
[ Train | 006/100 ] loss = 10.87045, acc = 0.57711
[ Valid | 006/100 ] loss = 16.69445, acc = 0.52576
Validation accuracy improve from 0.46061 to 0.52576 at epoch 6
[ Train | 007/100 ] loss = 10.37777, acc = 0.60055
[ Valid | 007/100 ] loss = 17.11928, acc = 0.49848
[ Train | 008/100 ] loss = 10.01241, acc = 0.61719
[ Valid | 008/100 ] loss = 15.49512, acc = 0.57879
Validation accuracy improve from 0.52576 to 0.57879 at epoch 8
[ Train | 009/100 ] loss = 9.57598, acc = 0.63809
[ Valid | 009/100 ] loss = 14.41175, acc = 0.59848
Validation accuracy improve from 0.57879 to 0.59848 at epoch 9
[ Train | 010/100 ] loss = 9.40475, acc = 0.64692
[ Valid | 010/100 ] loss = 14.41014, acc = 0.59848
[ Train | 011/100 ] loss = 9.10836, acc = 0.66011
[ Valid | 011/100 ] loss = 14.37741, acc = 0.59848
[ Train | 012/100 ] loss = 8.89876, acc = 0.66670
[ Valid | 012/100 ] loss = 15.42904, acc = 0.55303
[ Train | 013/100 ] loss = 8.77226, acc = 0.66954
[ Valid | 013/100 ] loss = 15.38512, acc = 0.55606
[ Train | 014/100 ] loss = 8.55073, acc = 0.68476
[ Valid | 014/100 ] loss = 14.81720, acc = 0.55152
[ Train | 015/100 ] loss = 8.37397, acc = 0.68202
[ Valid | 015/100 ] loss = 13.32174, acc = 0.64697
Validation accuracy improve from 0.59848 to 0.64697 at epoch 15
[ Train | 016/100 ] loss = 8.21232, acc = 0.68983
[ Valid | 016/100 ] loss = 13.98996, acc = 0.61212
[ Train | 017/100 ] loss = 8.15472, acc = 0.69531
[ Valid | 017/100 ] loss = 13.77832, acc = 0.61818
[ Train | 018/100 ] loss = 7.86903, acc = 0.70201
[ Valid | 018/100 ] loss = 13.30925, acc = 0.63788
[ Train | 019/100 ] loss = 7.80421, acc = 0.71073
[ Valid | 019/100 ] loss = 13.65086, acc = 0.63182
[ Train | 020/100 ] loss = 7.67392, acc = 0.71449
[ Valid | 020/100 ] loss = 14.16575, acc = 0.61364
[ Train | 021/100 ] loss = 7.68762, acc = 0.71337
[ Valid | 021/100 ] loss = 12.83877, acc = 0.60909
[ Train | 022/100 ] loss = 7.44034, acc = 0.72362
[ Valid | 022/100 ] loss = 13.14419, acc = 0.61970
[ Train | 023/100 ] loss = 7.41759, acc = 0.72443
[ Valid | 023/100 ] loss = 13.16685, acc = 0.63788
[ Train | 024/100 ] loss = 7.31886, acc = 0.73204
[ Valid | 024/100 ] loss = 13.02099, acc = 0.63939
[ Train | 025/100 ] loss = 7.23110, acc = 0.73640
[ Valid | 025/100 ] loss = 12.67849, acc = 0.65909
Validation accuracy improve from 0.64697 to 0.65909 at epoch 25
[ Train | 026/100 ] loss = 7.07415, acc = 0.73722
[ Valid | 026/100 ] loss = 11.81242, acc = 0.68182
Validation accuracy improve from 0.65909 to 0.68182 at epoch 26
[ Train | 027/100 ] loss = 7.02258, acc = 0.74107
[ Valid | 027/100 ] loss = 12.02225, acc = 0.67576
[ Train | 028/100 ] loss = 6.91355, acc = 0.74503
[ Valid | 028/100 ] loss = 14.97043, acc = 0.59545
[ Train | 029/100 ] loss = 6.91083, acc = 0.74858
[ Valid | 029/100 ] loss = 13.76401, acc = 0.61970
[ Train | 030/100 ] loss = 6.88408, acc = 0.74868
[ Valid | 030/100 ] loss = 13.87575, acc = 0.61818
[ Train | 031/100 ] loss = 6.70494, acc = 0.75538
[ Valid | 031/100 ] loss = 13.80333, acc = 0.60303
[ Train | 032/100 ] loss = 6.66964, acc = 0.76157
[ Valid | 032/100 ] loss = 14.67462, acc = 0.56667
[ Train | 033/100 ] loss = 6.61600, acc = 0.76147
[ Valid | 033/100 ] loss = 12.86426, acc = 0.65606
[ Train | 034/100 ] loss = 6.53751, acc = 0.76045
[ Valid | 034/100 ] loss = 11.67053, acc = 0.68030
[ Train | 035/100 ] loss = 6.52540, acc = 0.76847
[ Valid | 035/100 ] loss = 13.24499, acc = 0.64697
[ Train | 036/100 ] loss = 6.44966, acc = 0.76887
[ Valid | 036/100 ] loss = 11.34032, acc = 0.68030
[ Train | 037/100 ] loss = 6.36206, acc = 0.77445
[ Valid | 037/100 ] loss = 11.50668, acc = 0.71212
Validation accuracy improve from 0.68182 to 0.71212 at epoch 37
[ Train | 038/100 ] loss = 6.33979, acc = 0.77121
[ Valid | 038/100 ] loss = 12.13734, acc = 0.65909
[ Train | 039/100 ] loss = 6.36596, acc = 0.77466
[ Valid | 039/100 ] loss = 10.99648, acc = 0.69394
[ Train | 040/100 ] loss = 6.20080, acc = 0.77831
[ Valid | 040/100 ] loss = 11.71375, acc = 0.68333
[ Train | 041/100 ] loss = 6.24038, acc = 0.77881
[ Valid | 041/100 ] loss = 12.78312, acc = 0.64545
[ Train | 042/100 ] loss = 6.07140, acc = 0.78287
[ Valid | 042/100 ] loss = 12.87270, acc = 0.63939
[ Train | 043/100 ] loss = 6.04975, acc = 0.78500
[ Valid | 043/100 ] loss = 11.76929, acc = 0.65152
[ Train | 044/100 ] loss = 6.03444, acc = 0.78531
[ Valid | 044/100 ] loss = 11.59460, acc = 0.66667
[ Train | 045/100 ] loss = 5.93470, acc = 0.79414
[ Valid | 045/100 ] loss = 12.13996, acc = 0.67273
[ Train | 046/100 ] loss = 5.95460, acc = 0.78774
[ Valid | 046/100 ] loss = 11.47066, acc = 0.69697


[ Train | 047/100 ] loss = 5.94354, acc = 0.78977


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 047/100 ] loss = 11.18528, acc = 0.72727
Validation accuracy improve from 0.71212 to 0.72727 at epoch 47


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 048/100 ] loss = 5.80104, acc = 0.79616


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 048/100 ] loss = 12.09116, acc = 0.66364


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 049/100 ] loss = 5.77885, acc = 0.79647


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 049/100 ] loss = 10.64614, acc = 0.70152


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 050/100 ] loss = 5.67847, acc = 0.80306


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 050/100 ] loss = 10.53787, acc = 0.71515


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 051/100 ] loss = 5.66570, acc = 0.80489


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 051/100 ] loss = 11.75496, acc = 0.66212


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 052/100 ] loss = 5.60709, acc = 0.80519


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 052/100 ] loss = 11.12235, acc = 0.68788


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 053/100 ] loss = 5.72816, acc = 0.80438


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 053/100 ] loss = 11.89360, acc = 0.65152


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 054/100 ] loss = 5.62161, acc = 0.80651


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 054/100 ] loss = 12.38700, acc = 0.64697


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 055/100 ] loss = 5.53050, acc = 0.80743


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 055/100 ] loss = 10.79861, acc = 0.68939


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 056/100 ] loss = 5.48783, acc = 0.81128


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 056/100 ] loss = 10.38682, acc = 0.72121


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 057/100 ] loss = 5.47180, acc = 0.81047


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 057/100 ] loss = 10.78655, acc = 0.71364


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 058/100 ] loss = 5.36211, acc = 0.81575


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 058/100 ] loss = 10.71666, acc = 0.69545


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 059/100 ] loss = 5.43785, acc = 0.81615


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 059/100 ] loss = 10.63000, acc = 0.70606


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 060/100 ] loss = 5.41136, acc = 0.81473


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 060/100 ] loss = 11.53435, acc = 0.68333


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 061/100 ] loss = 5.39300, acc = 0.81666


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 061/100 ] loss = 11.52030, acc = 0.63485


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 062/100 ] loss = 5.34296, acc = 0.81646


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 062/100 ] loss = 11.44986, acc = 0.66515


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 063/100 ] loss = 5.26625, acc = 0.82325


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 063/100 ] loss = 10.14248, acc = 0.73485
Validation accuracy improve from 0.72727 to 0.73485 at epoch 63


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 064/100 ] loss = 5.21427, acc = 0.82518


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 064/100 ] loss = 11.16812, acc = 0.67424


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 065/100 ] loss = 5.27998, acc = 0.82173


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 065/100 ] loss = 10.07425, acc = 0.71212


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 066/100 ] loss = 5.18534, acc = 0.82549


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 066/100 ] loss = 9.98901, acc = 0.73485


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 067/100 ] loss = 5.13582, acc = 0.82843


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 067/100 ] loss = 12.47201, acc = 0.63939


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 068/100 ] loss = 5.17111, acc = 0.83015


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 068/100 ] loss = 9.73876, acc = 0.74091
Validation accuracy improve from 0.73485 to 0.74091 at epoch 68


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 069/100 ] loss = 5.03885, acc = 0.83391


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 069/100 ] loss = 10.98396, acc = 0.66970


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 070/100 ] loss = 4.96504, acc = 0.84091


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 070/100 ] loss = 10.88610, acc = 0.70758


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 071/100 ] loss = 5.05230, acc = 0.83005


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 071/100 ] loss = 10.02645, acc = 0.74697
Validation accuracy improve from 0.74091 to 0.74697 at epoch 71


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 072/100 ] loss = 4.98124, acc = 0.83310


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 072/100 ] loss = 11.03803, acc = 0.67424


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 073/100 ] loss = 4.97445, acc = 0.83442


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 073/100 ] loss = 11.55823, acc = 0.64697


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 074/100 ] loss = 4.89834, acc = 0.84010


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 074/100 ] loss = 11.58262, acc = 0.65606


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 075/100 ] loss = 4.92195, acc = 0.83847


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 075/100 ] loss = 10.56584, acc = 0.70303


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 076/100 ] loss = 4.93631, acc = 0.83827


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 076/100 ] loss = 10.64046, acc = 0.69848


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 077/100 ] loss = 4.89592, acc = 0.84213


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 077/100 ] loss = 11.21321, acc = 0.70152


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 078/100 ] loss = 4.85859, acc = 0.83979


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 078/100 ] loss = 10.46632, acc = 0.70152


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 079/100 ] loss = 4.78982, acc = 0.84791


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 079/100 ] loss = 9.88113, acc = 0.72576


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 080/100 ] loss = 4.74666, acc = 0.84395


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 080/100 ] loss = 11.23703, acc = 0.69394


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 081/100 ] loss = 4.80998, acc = 0.84943


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 081/100 ] loss = 9.88233, acc = 0.73485


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 082/100 ] loss = 4.80262, acc = 0.85014


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 082/100 ] loss = 10.40276, acc = 0.70455


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 083/100 ] loss = 4.70982, acc = 0.84740


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 083/100 ] loss = 10.63998, acc = 0.69545


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 084/100 ] loss = 4.74104, acc = 0.84862


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 084/100 ] loss = 12.25920, acc = 0.66364


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 085/100 ] loss = 4.66158, acc = 0.84730


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 085/100 ] loss = 11.17517, acc = 0.67576


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 086/100 ] loss = 4.70745, acc = 0.84903


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 086/100 ] loss = 10.13957, acc = 0.72879


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 087/100 ] loss = 4.56025, acc = 0.85704


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 087/100 ] loss = 11.61453, acc = 0.66061


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 088/100 ] loss = 4.69555, acc = 0.84984


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 088/100 ] loss = 11.30451, acc = 0.67727


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 089/100 ] loss = 4.65016, acc = 0.85217


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 089/100 ] loss = 10.45978, acc = 0.70758


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 090/100 ] loss = 4.56040, acc = 0.86059


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 090/100 ] loss = 10.95646, acc = 0.68182


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 091/100 ] loss = 4.55623, acc = 0.85694


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 091/100 ] loss = 10.45857, acc = 0.69394


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 092/100 ] loss = 4.57665, acc = 0.86140


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 092/100 ] loss = 11.57801, acc = 0.67424


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 093/100 ] loss = 4.51355, acc = 0.86080


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 093/100 ] loss = 11.13937, acc = 0.68636


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 094/100 ] loss = 4.51625, acc = 0.85785


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 094/100 ] loss = 10.54026, acc = 0.71515


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 095/100 ] loss = 4.56687, acc = 0.85948


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 095/100 ] loss = 10.04887, acc = 0.73333


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 096/100 ] loss = 4.48855, acc = 0.86191


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 096/100 ] loss = 10.59498, acc = 0.68182


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 097/100 ] loss = 4.40393, acc = 0.86353


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 097/100 ] loss = 10.36945, acc = 0.72273


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 098/100 ] loss = 4.48965, acc = 0.86232


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 098/100 ] loss = 10.21368, acc = 0.72273


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 099/100 ] loss = 4.46189, acc = 0.85897


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 099/100 ] loss = 11.06420, acc = 0.66061


HBox(children=(FloatProgress(value=0.0, max=154.0), HTML(value='')))


[ Train | 100/100 ] loss = 4.43767, acc = 0.86029


HBox(children=(FloatProgress(value=0.0, max=11.0), HTML(value='')))


[ Valid | 100/100 ] loss = 9.69810, acc = 0.73333
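
The log lines above follow a fixed pattern, so they can be summarized programmatically. The following is a minimal sketch and not part of the original notebook: the helper names `parse_log` and `best_valid` are hypothetical, and `SAMPLE_LOG` is just the last two epochs copied from the output above.

```python
import re

# Sample text copied from the last two epochs of the training log above.
SAMPLE_LOG = """\
[ Train | 099/100 ] loss = 4.46189, acc = 0.85897
[ Valid | 099/100 ] loss = 11.06420, acc = 0.66061
[ Train | 100/100 ] loss = 4.43767, acc = 0.86029
[ Valid | 100/100 ] loss = 9.69810, acc = 0.73333
"""

# Matches lines such as "[ Valid | 100/100 ] loss = 9.69810, acc = 0.73333".
LINE_RE = re.compile(
    r"\[ (Train|Valid) \| (\d+)/(\d+) \] loss = ([\d.]+), acc = ([\d.]+)"
)


def parse_log(text):
    """Hypothetical helper: turn log text into a list of per-line records."""
    records = []
    for match in LINE_RE.finditer(text):
        split, epoch, _total, loss, acc = match.groups()
        records.append(
            {"split": split, "epoch": int(epoch), "loss": float(loss), "acc": float(acc)}
        )
    return records


def best_valid(records):
    """Hypothetical helper: return the validation record with the highest accuracy."""
    valid = [r for r in records if r["split"] == "Valid"]
    return max(valid, key=lambda r: r["acc"])


records = parse_log(SAMPLE_LOG)
best = best_valid(records)
print(f"best validation epoch: {best['epoch']}, acc = {best['acc']}")
```

Pasting the full log into `SAMPLE_LOG` would let the same helpers report the best validation epoch of the whole run; on ties, `max` keeps the earliest epoch.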


## **Testing** *(same as HW3)*

For inference, make sure the model is in eval mode and that the dataset is not shuffled ("shuffle=False" in test_loader), so that the predictions stay in the same order as the test images.
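
The two requirements above can be illustrated on toy data. This is a minimal sketch under stated assumptions: `toy_net`, `toy_inputs`, `toy_ids`, and `toy_loader` are hypothetical stand-ins, not the notebook's `student_net` or `test_loader`.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical toy model with a Dropout layer, standing in for the real network.
toy_net = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 3))

# Six toy samples, each tagged with its index so the iteration order is visible.
toy_inputs = torch.arange(6 * 4, dtype=torch.float32).reshape(6, 4)
toy_ids = torch.arange(6)
toy_loader = DataLoader(TensorDataset(toy_inputs, toy_ids), batch_size=4, shuffle=False)

# In eval mode Dropout is a no-op, so repeated forward passes agree exactly.
toy_net.eval()
seen_ids, first_pass, second_pass = [], [], []
with torch.no_grad():
    for x, ids in toy_loader:
        seen_ids.extend(ids.tolist())
        first_pass.append(toy_net(x))
        second_pass.append(toy_net(x))

deterministic = torch.equal(torch.cat(first_pass), torch.cat(second_pass))
print("ids in iteration order:", seen_ids)
print("eval-mode outputs repeatable:", deterministic)
```

With `shuffle=False` the ids come back in dataset order, which is what lets row `i` of the prediction file be matched to test image `i`. In training mode the Dropout layer would sample a fresh random mask on every call, so repeated passes would generally not agree.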

Last but not least, don't forget to save the predictions into a single CSV file.
The format of the CSV file should follow the rules mentioned in the slides.
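
Once the prediction file has been written, a quick sanity check can catch formatting mistakes before submission. The sketch below is an assumption-laden example, not the official checker: it assumes the format this notebook itself writes (an `Id,Category` header, ids counting up from 0, and a class index in the range 0 to 10 for food-11), and `check_submission` is a hypothetical helper name. The slides remain the authoritative source for the format.

```python
import csv
import os
import tempfile

NUM_CLASSES = 11  # food-11 has 11 classes


def _is_valid_class(text):
    """Return True if text is an integer class index in [0, NUM_CLASSES)."""
    try:
        return 0 <= int(text) < NUM_CLASSES
    except ValueError:
        return False


def check_submission(path):
    """Hypothetical helper: return a list of problems; an empty list means none found."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))

    if not rows or rows[0] != ["Id", "Category"]:
        return ["header must be exactly 'Id,Category'"]

    problems = []
    for expected_id, row in enumerate(rows[1:]):
        if len(row) != 2:
            problems.append(f"row {expected_id}: expected 2 fields, got {len(row)}")
            continue
        image_id, category = row
        if image_id != str(expected_id):
            problems.append(f"row {expected_id}: unexpected id {image_id!r}")
        if not _is_valid_class(category):
            problems.append(f"row {expected_id}: invalid category {category!r}")
    return problems


# Small demonstration on two throwaway files (not the real predict.csv).
with tempfile.TemporaryDirectory() as tmp_dir:
    good_path = os.path.join(tmp_dir, "good.csv")
    with open(good_path, "w") as f:
        f.write("Id,Category\n0,3\n1,10\n")
    good_problems = check_submission(good_path)

    bad_path = os.path.join(tmp_dir, "bad.csv")
    with open(bad_path, "w") as f:
        f.write("Id, Category\n0,3\n")  # stray space in the header
    bad_problems = check_submission(bad_path)

print("good file problems:", good_problems)
print("bad file problems:", bad_problems)
```

Calling `check_submission("predict.csv")` after the file is saved would apply the same checks to the real predictions.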

### **WARNING -- Keep in Mind**

Cheating includes, but is not limited to:
1.   using testing labels,
2.   submitting results to previous Kaggle competitions,
3.   sharing predictions with others,
4.   copying code from any creature on Earth,
5.   asking other people to do it for you.

Any violation will be penalized, with punishments ranging from a deduction on the final grade to failing the course.

It is your responsibility to check whether your code violates the rules.
When citing code from the Internet, you should know exactly what that code does.
Breaking the rules will **NOT** be tolerated, even if you claim you did not know what the code does.


In [13]:
### This block is same as HW3 ###
# Make sure the model is in eval mode.
# Some modules, such as Dropout and BatchNorm, behave differently in training mode.
student_net = student_net.to(device)
student_net.load_state_dict(torch.load('./student_net_best.ckpt'))
student_net.eval()

# Initialize a list to store the predictions.
predictions = []

# Iterate the testing set by batches.
for batch in tqdm(test_loader):
    # A batch consists of image data and corresponding labels.
    # Here the variable "labels" is unused because the test set has no ground truth.
    # If you print the labels, you will find that they are always 0.
    # This is because the wrapper (DatasetFolder) returns images and labels for each batch,
    # so fake labels have to be created to make it work normally.
    imgs, labels = batch

    # We don't need gradients in testing, and we don't even have labels to compute a loss.
    # Using torch.no_grad() accelerates the forward process.
    with torch.no_grad():
        logits = student_net(imgs.to(device))

    # Take the class with the greatest logit as the prediction and record it.
    predictions.extend(logits.argmax(dim=-1).cpu().numpy().tolist())



In [14]:
### This block is same as HW3 ###
# Save predictions into the file.
with open("predict.csv", "w") as f:

    # The first row must be "Id,Category".
    f.write("Id,Category\n")

    # For the rest of the rows, each image id corresponds to a predicted class.
    for i, pred in enumerate(predictions):
        f.write(f"{i},{pred}\n")

In [15]:
from google.colab import files

files.download('predict.csv')


## **Statistics**

|Baseline|Accuracy|Training Time|
|-|-|-|
|Simple Baseline |0.59856|2 Hours|
|Medium Baseline |0.65412|2 Hours|
|Strong Baseline |0.72819|4 Hours|
|Boss Baseline |0.81003|Unmeasurable|

## **Learning Curve**

![img](https://lh5.googleusercontent.com/amMLGa7dkqvXGmsJlrVN49VfSjClk5d-n7nCi_Y3ROK4himsBSHhB7SpdWe7Zm06ctRO77VdDkD9u_aKfAh1tMW-KcyYX7vF7LPlKqOo2fVtt3SyfsLv0KTYDB0YbAk6ZhyOIKT8Zfg)



## **Q&A**

If you have any questions about this colab, please send an email to ntu-ml-2021spring-ta@googlegroups.com

## **Backup Links**

In [16]:
# resnet_model 
# !gdown --id '1zH1x39Y8a0XyOORG7TWzAnFf_YPY8e-m' --output resnet_model.ckpt
# !gdown --id '1VBIeQKH4xRHfToUxuDxtEPsqz0MHvrgd' --output resnet_model.ckpt
# !gdown --id '1Er2azErvXWS5m1jboKN7BLxNXnuAatYw' --output resnet_model.ckpt
# !gdown --id '1Qya0vmf3nRl11IyxxF7nudDpZI_Q4Amh' --output resnet_model.ckpt
# !gdown --id '1fGOOb5ndljraBIkRkLp3bW9orR4YN97U' --output resnet_model.ckpt
# !gdown --id '1apHLvZBZ3GYEMxXxToGKF7qDLn1XbOfJ' --output resnet_model.ckpt
# !gdown --id '1vsDylNsLaAqxonop7Mw3dBAig0EO7tlF' --output resnet_model.ckpt
# !gdown --id '1V_hXJM_V9-10i6wldRyl0SOiivPp4SNt' --output resnet_model.ckpt
# !gdown --id '11HzaJM2M2yg6KYhLaWpWy8WmPIIvJgnk' --output resnet_model.ckpt

# food-11
# !gdown --id '1qdyNN0Ek4S5yi-pAqHes1yjj5cNkENCc' --output food-11.zip
# !gdown --id '1c0Q1EP6yIx0O2rqVMIVInIt8wFjLxmRh' --output food-11.zip
# !gdown --id '1hKO054nT1R8egcXY2-tgQbwX4EjowRLz' --output food-11.zip
# !gdown --id '1_7_uC1WUvX6H51gQaYmI4q3AezdQJhud' --output food-11.zip
# !gdown --id '12bz82Zpx0_7BDGXq4nRt7E_fMFmILoc9' --output food-11.zip
# !gdown --id '1oiqRKrDQXVBM5y63MeEaHxFmCIzNXx1Q' --output food-11.zip
# !gdown --id '1qaL43sl4qUMeCT1OVpk4aOFycnLL5ZJX' --output food-11.zip

## **References**

For the KD loss implementation, I referred to the following materials:
1. HW7 video and code from last year, as suggested by the TA
2. https://arxiv.org/abs/1503.02531
3. https://github.com/peterliht/knowledge-distillation-pytorch/issues/2