[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/intel/e2eAIOK/blob/main/demo/ma/finetuner/Model_Adapter_Finetuner_Walkthrough_ResNet50_CIFAR100.ipynb)

# Model Adapter Finetuner Walkthrough DEMO
Model Adapter is a framework that reduces training and inference time, or data labeling cost, by efficiently reusing public pretrained models and datasets from many domains. It contains three components that serve different cases: Finetuner, Distiller, and Domain Adapter.

This demo introduces the Finetuner. Using image classification as an example, it shows how to apply Finetuner to ResNet50 on the CIFAR100 dataset and how to integrate it into a general training pipeline. A simplified built-in demo is available [here](./Model_Adapter_Finetuner_builtin_ResNet50_CIFAR100.ipynb).

# Content

* [Model Adapter Finetuner Overview](#Model-Adapter-Finetuner-Overview)
* [1. Environment Setup](#1.-Environment-Setup)
* [2. Training with Finetuner](#2.-Training-with-Finetuner)
    * [2.1 Prepare data](#2.1-Prepare-data)
    * [2.2 Create transferrable Model](#2.2-Create-transferrable-Model)
    * [2.3 Train and evaluate](#2.3-Train-and-evaluate)

# Model Adapter Finetuner Overview
Finetuner is based on the pretrain-and-finetune technique: it transfers knowledge from a pretrained model to a target model with the same network structure.

Pretrained models are usually produced by a pretraining process, that is, training a specific model on a specific dataset, which has typically already been done with DE-NAS, PyTorch, TensorFlow, or HuggingFace. Finetuner retrieves a pretrained model with the same network structure and copies its weights to the corresponding layers of the target model, instead of initializing the target model randomly. With Finetuner, training is usually much faster and often reaches better performance.

<img src="https://github.com/zhouyu5/e2eAIOK/blob/da-demo/demo/ma/imgs/finetuner.png?raw=1" width="50%">
<center>Model Adapter Finetuner Structure</center>
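
The weight transfer described above can be sketched in plain PyTorch. This is a minimal illustration under stated assumptions, not the e2eAIOK implementation: the helper `copy_matching_weights` and the two toy models are hypothetical, and the real Finetuner may handle more cases.

```python
import torch
from torch import nn


def copy_matching_weights(target: nn.Module, pretrained: nn.Module):
    """Copy every pretrained tensor whose name and shape match the target.

    Hypothetical helper for illustration; returns (copied, skipped) key lists.
    """
    target_state = target.state_dict()
    copied, skipped = [], []
    for name, tensor in pretrained.state_dict().items():
        if name in target_state and target_state[name].shape == tensor.shape:
            target_state[name] = tensor.clone()
            copied.append(name)
        else:
            skipped.append(name)
    target.load_state_dict(target_state)
    return copied, skipped


def make_toy_model(num_classes: int) -> nn.Sequential:
    # Same structure for both models; only the classifier width differs.
    return nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, num_classes))


toy_pretrained = make_toy_model(num_classes=10)  # stands in for the pretrained model
toy_target = make_toy_model(num_classes=3)       # stands in for the target model
copied, skipped = copy_matching_weights(toy_target, toy_pretrained)
print("copied:", copied)    # the shared feature layer
print("skipped:", skipped)  # the classifier head, whose shape differs
```

Layers that are skipped keep their random initialization and are learned from scratch on the target dataset.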

## 1. Environment Setup

### (Option 1) Use Pip install - recommended

In [None]:
!pip install e2eAIOK-ModelAdapter --pre

### (Option 2) Use Docker 

Step1. prepare code
   ``` bash
   git clone https://github.com/intel/e2eAIOK.git
   cd e2eAIOK
   git submodule update --init --recursive
   ```
    
Step2. build docker image
   ``` bash
   python3 scripts/start_e2eaiok_docker.py -b pytorch112 --dataset_path ${dataset_path} -w ${host0} ${host1} ${host2} ${host3} --proxy  "http://addr:ip"
   ```
   
Step3. run docker and start conda env
   ``` bash
   sshpass -p docker ssh ${host0} -p 12347
   conda activate pytorch-1.12.0
   ```
  
Step4. Start the jupyter notebook and tensorboard service
   ``` bash
   nohup jupyter notebook --notebook-dir=/home/vmagent/app/e2eaiok --ip=${hostname} --port=8899 --allow-root &
   nohup tensorboard --logdir /home/vmagent/app/data/tensorboard --host=${hostname} --port=6006 & 
   ```
   Now you can visit the demos at `http://${hostname}:8899/` and view the TensorBoard logs at `http://${hostname}:6006`.

# 2. Training with Finetuner

Import libraries

In [None]:
import torch
from torchvision import transforms,datasets
from torch.utils.data import DataLoader
import torch.optim as optim
from timm.utils import accuracy
import timm
import transformers
import datetime

## 2.1 Prepare data

### Define transforms and dataset

In [None]:
CIFAR_TRAIN_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR_TRAIN_STD = (0.2023, 0.1994, 0.2010)

train_transform = transforms.Compose([
  transforms.RandomCrop(32, padding=4),
  transforms.RandomHorizontalFlip(),
  transforms.Resize(112),
  transforms.ToTensor(),
  transforms.Normalize(CIFAR_TRAIN_MEAN, CIFAR_TRAIN_STD)
])

test_transform = transforms.Compose([
  # No random augmentation at test time, so evaluation is deterministic
  transforms.Resize(112),
  transforms.ToTensor(),
  transforms.Normalize(CIFAR_TRAIN_MEAN, CIFAR_TRAIN_STD)
])
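
As a rough illustration of what the last two transform steps do to a single pixel, the arithmetic can be written out in plain Python. The helper `normalize_pixel` is hypothetical and only mirrors the per-channel scaling of `ToTensor` (for 8-bit images) followed by `Normalize`; it is not part of torchvision.

```python
CIFAR_TRAIN_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR_TRAIN_STD = (0.2023, 0.1994, 0.2010)


def normalize_pixel(rgb, mean=CIFAR_TRAIN_MEAN, std=CIFAR_TRAIN_STD):
    """Normalize one RGB pixel given as 0-255 integers (illustrative helper)."""
    # ToTensor scales 8-bit values into the range [0, 1].
    scaled = [value / 255.0 for value in rgb]
    # Normalize then applies (x - mean) / std independently per channel.
    return [(s - m) / d for s, m, d in zip(scaled, mean, std)]


black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
print([round(v, 3) for v in black])
print([round(v, 3) for v in white])
```

After normalization the inputs are roughly zero-centered per channel, which is the distribution the network sees during training and evaluation.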

In [None]:
data_folder='./data' # dataset location
train_set = datasets.CIFAR100(root=data_folder, train=True, download=True, transform=train_transform)
test_set = datasets.CIFAR100(root=data_folder, train=False, download=True, transform=test_transform)

Files already downloaded and verified
Files already downloaded and verified


### Prepare dataloader

In [None]:
train_loader = DataLoader(dataset=train_set, batch_size=128, shuffle=True, num_workers=1, drop_last=False)
validate_loader = DataLoader(dataset=test_set, batch_size=128, shuffle=False, num_workers=1, drop_last=False) # no shuffling needed for evaluation
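
To get a feel for how many steps one epoch takes with these settings, the batch counts can be computed by hand. The sketch below assumes the standard CIFAR100 split of 50,000 training and 10,000 test images; the helper names are illustrative only.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches a loader yields per epoch."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def last_batch_size(num_samples, batch_size):
    """Size of the final batch when drop_last=False."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size


train_steps = batches_per_epoch(50_000, 128)
valid_steps = batches_per_epoch(10_000, 128)
print(train_steps, last_batch_size(50_000, 128))
print(valid_steps, last_batch_size(10_000, 128))
```

Because `drop_last=False`, the final batch of each loader is smaller than 128, which matters when per-batch metrics are averaged.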

## 2.2 Create transferrable Model

### Create Backbone model

In [None]:
backbone = timm.create_model('resnet50', pretrained=False, num_classes=100)

### Create pretrained model
To use finetuner, we need to prepare a pretrained model to initialize the backbone model. Here we use ResNet50 pretrained on ImageNet21k.

In [None]:
! wget https://miil-public-eu.oss-eu-central-1.aliyuncs.com/model-zoo/ImageNet_21K_P/models/resnet50_miil_21k.pth && mv resnet50_miil_21k.pth data/ 

--2023-03-19 22:57:49--  https://miil-public-eu.oss-eu-central-1.aliyuncs.com/model-zoo/ImageNet_21K_P/models/resnet50_miil_21k.pth
Resolving child-prc.intel.com (child-prc.intel.com)... 10.239.120.56
Connecting to child-prc.intel.com (child-prc.intel.com)|10.239.120.56|:913... connected.
Proxy request sent, awaiting response... 200 OK
Length: 186531247 (178M) [application/octet-stream]
Saving to: ‘resnet50_miil_21k.pth’

 5% [=>                                     ] 9,744,954   72.6KB/s  eta 9m 18s ^C


In [None]:
pretrain_path = './data/resnet50_miil_21k.pth'
pretrained_model = timm.create_model('resnet50', pretrained=False, num_classes=11221)
pretrained_model.load_state_dict(torch.load(pretrain_path)['state_dict'], strict=True)

<All keys matched successfully>
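
The loading step relies on two details: the checkpoint keeps its weights under a `state_dict` key, and `strict=True` requires the key sets to match exactly. The sketch below reproduces both with toy layers and an in-memory buffer; the toy models and the checkpoint layout are assumptions for illustration, not the contents of the downloaded file.

```python
import io

import torch
from torch import nn

# Toy stand-in for the pretrained network.
toy_source = nn.Sequential(nn.Linear(4, 2))

# Save a checkpoint whose weights sit under a "state_dict" key (assumed layout).
buffer = io.BytesIO()
torch.save({"state_dict": toy_source.state_dict()}, buffer)
buffer.seek(0)
checkpoint = torch.load(buffer, map_location="cpu")

# Identical structure: strict loading succeeds and reports no problem keys.
same = nn.Sequential(nn.Linear(4, 2))
strict_result = same.load_state_dict(checkpoint["state_dict"], strict=True)
print(strict_result)

# Extra layer in the target: strict loading raises, non-strict reports the gap.
bigger = nn.Sequential(nn.Linear(4, 2), nn.Linear(2, 2))
try:
    bigger.load_state_dict(checkpoint["state_dict"], strict=True)
    strict_failed = False
except RuntimeError:
    strict_failed = True
loose_result = bigger.load_state_dict(checkpoint["state_dict"], strict=False)
print(strict_failed, loose_result.missing_keys)
```

Note that `load_state_dict` raises on a shape mismatch even with `strict=False`, which is why the pretrained model is created with the same number of classes as the checkpoint.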

### Create Finetuner

In [None]:
from e2eAIOK.ModelAdapter.engine_core.finetunner import BasicFinetunner
finetuner = BasicFinetunner(pretrained_model, is_frozen=False) # is_frozen=False: keep the transferred weights trainable

### Make backbone model transferrable with Finetuner

Copy the weights from the pretrained model held by the finetuner into the backbone model. An "ERROR" message appears because the last layer (`fc`) has a different shape and cannot be matched; this is expected and can be ignored.

In [None]:
from e2eAIOK.ModelAdapter.engine_core.transferrable_model import make_transferrable_with_finetune
loss_fn = torch.nn.CrossEntropyLoss()
model = make_transferrable_with_finetune(backbone, loss_fn, finetuner)

ERROR:root:could not load layer: fc.weight; mismatch shape: target [torch.Size([100, 2048])] != pretrained [torch.Size([11221, 2048])]
ERROR:root:could not load layer: fc.bias; mismatch shape: target [torch.Size([100])] != pretrained [torch.Size([11221])]


could not load layer: fc.weight; mismatch shape: target [torch.Size([100, 2048])] != pretrained [torch.Size([11221, 2048])]
could not load layer: fc.bias; mismatch shape: target [torch.Size([100])] != pretrained [torch.Size([11221])]


## 2.3 Train and evaluate

### Create optimizer
To optimize learning, we set a smaller learning rate for the finetuned layers, which start from pretrained weights, and a normal learning rate for the remaining layers.

In [None]:
finetuned_lr = 0.00445 
learning_rate = 0.00753

In [None]:
finetuned_state_keys = ["backbone.%s"%name for name in finetuner.finetuned_state_keys] # add component prefix
finetuner_params = {'params':[p for (name, p) in model.named_parameters() if p.requires_grad and name in finetuned_state_keys],
                    'lr': finetuned_lr}
remain_params = {'params':[p for (name, p) in model.named_parameters() if p.requires_grad and name not in finetuned_state_keys],
                    'lr': learning_rate}
print("[%s] params set finetuner finetuned learning rate[%s]" % (len(finetuner_params['params']), finetuned_lr))
print("[%s] params set common learning rate [%s]" % (len(remain_params['params']), learning_rate))
assert len(finetuner_params['params']) > 0,"Empty finetuner_params"
parameters = [finetuner_params,remain_params]

[159] params set finetuner finetuned learning rate[0.00445]
[2] params set common learning rate [0.00753]


In [None]:
weight_decay = 0.00115
momentum = 0.9
optimizer = optim.SGD(parameters,lr=learning_rate, weight_decay=weight_decay,momentum=momentum)
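
The per-group learning rates can be checked on a toy model. In the sketch below, `ToyWrapper` is a hypothetical stand-in for a wrapper that stores the network as `backbone` (hence the `backbone.` name prefix), and the set of pretrained keys is assumed for illustration.

```python
import torch
from torch import nn, optim


class ToyWrapper(nn.Module):
    """Toy stand-in for a wrapper that stores the network as `self.backbone`."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone

    def forward(self, x):
        return self.backbone(x)


toy_model = ToyWrapper(nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 3)))

# Assumption: only the first layer received pretrained weights.
pretrained_keys = {"backbone.0.weight", "backbone.0.bias"}

finetuned = [p for n, p in toy_model.named_parameters() if n in pretrained_keys]
remaining = [p for n, p in toy_model.named_parameters() if n not in pretrained_keys]

toy_optimizer = optim.SGD(
    [
        {"params": finetuned, "lr": 0.00445},  # per-group lr overrides the default
        {"params": remaining},                 # falls back to the default lr below
    ],
    lr=0.00753,
    momentum=0.9,
    weight_decay=0.00115,
)

for group in toy_optimizer.param_groups:
    print(len(group["params"]), group["lr"])
```

Options that a group does not set, such as `momentum` and `weight_decay` here, are taken from the optimizer's keyword arguments.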

### Create scheduler

In [None]:
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
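
The closed-form schedule documented for `CosineAnnealingLR` can be evaluated by hand to see how the learning rate evolves. This is a sketch of the formula only, assuming `eta_min=0` (the default) and one scheduler step per epoch; the function name is illustrative.

```python
import math


def cosine_annealing_lr(base_lr, epoch, t_max, eta_min=0.0):
    """Closed-form cosine annealing value after `epoch` scheduler steps."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2


for epoch in (0, 1, 100, 200):
    print(epoch, cosine_annealing_lr(0.00753, epoch, t_max=200))
```

With `T_max=200`, the rate after a single epoch is almost unchanged, so a one-epoch run effectively trains at the initial learning rates. The scheduler applies the same curve to each parameter group, starting from that group's own learning rate.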

### Create Trainer

In [None]:
max_epoch = 1 # max 1 epoch
print_interval = 10

In [None]:
class Trainer:
    def __init__(self, model, optimizer, scheduler):
        self._model = model
        self._optimizer = optimizer
        self._scheduler = scheduler
        
    def train(self, train_dataloader, valid_dataloader, max_epoch):
        ''' 
        :param train_dataloader: train dataloader
        :param valid_dataloader: validation dataloader
        :param max_epoch: maximum number of training epochs
        '''
        for epoch in range(0, max_epoch):
            ################## train #####################
            self._model.train()  # set training flag
            for (cur_step,(data, label)) in enumerate(train_dataloader):
                self._optimizer.zero_grad()
                output = self._model(data)
                loss_value = self._model.loss(output, label) # transferrable model has loss attribute
                loss_value.backward()       
                if cur_step%print_interval == 0:
                    batch_acc = accuracy(output.backbone_output,label)[0] # use output.backbone_output instead of output
                    dt = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S') # date time
                    print("[{}] epoch {} step {} : training batch loss {:.4f}, training batch acc {:.4f}".format(
                      dt, epoch, cur_step, loss_value.backbone_loss.item(), batch_acc.item()))
                self._optimizer.step()
            self._scheduler.step()
            ################## evaluate ######################
            self.evaluate(valid_dataloader)
            
    def evaluate(self, valid_dataloader):
        with torch.no_grad():
            self._model.eval()  
            backbone = self._model.backbone # use backbone in evaluation
            loss_cum = 0.0
            sample_num = 0
            acc_cum = 0.0
            total_step = len(valid_dataloader)
            for (cur_step,(data, label)) in enumerate(valid_dataloader):
                output = backbone(data)
                batch_size = data.size(0)
                sample_num += batch_size
                loss_cum += loss_fn(output, label).item() * batch_size
                acc_cum += accuracy(output, label)[0].item() * batch_size
                if cur_step%print_interval == 0:
                    print(f"step {cur_step}/{total_step}")
            dt = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S') # date time
            loss_value = loss_cum/sample_num
            acc_value = acc_cum/sample_num

            print("[{}] evaluation loss {:.4f}, evaluation acc {:.4f}".format(
                dt, loss_value, acc_value))
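
Two details of `evaluate` are easy to miss: the accuracy is reported as a percentage, and each batch metric is weighted by its batch size because the last batch can be smaller. The following plain-Python sketch illustrates both on toy data; the helper names are hypothetical and do not reproduce timm's implementation.

```python
def top1_accuracy_percent(logits, labels):
    """Percentage of rows whose arg-max logit equals the label."""
    correct = sum(
        1
        for row, label in zip(logits, labels)
        if max(range(len(row)), key=row.__getitem__) == label
    )
    return 100.0 * correct / len(labels)


def weighted_mean(batch_values, batch_sizes):
    """Average per-batch values, weighting each batch by its sample count."""
    total = sum(v * n for v, n in zip(batch_values, batch_sizes))
    return total / sum(batch_sizes)


# Two toy batches of different size (3-class logits, integer labels).
batch_a = ([[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.9, 0.8, 0.1], [0.1, 0.2, 3.0]], [0, 1, 1, 2])
batch_b = ([[0.3, 0.2, 0.1], [0.1, 0.9, 0.2]], [2, 2])

accs = [top1_accuracy_percent(*batch_a), top1_accuracy_percent(*batch_b)]
sizes = [len(batch_a[1]), len(batch_b[1])]
print("per-batch:", accs)
print("weighted:", weighted_mean(accs, sizes))
print("unweighted:", sum(accs) / len(accs))
```

The weighted value equals the accuracy over all samples, while the unweighted mean over-counts the smaller batch.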

### Train and Evaluate

In [None]:
trainer = Trainer(model, optimizer, scheduler)
trainer.train(train_loader,validate_loader,max_epoch)

[2023-02-09 06:29:42] epoch 0 step 0 : training batch loss 4.6263, training batch acc 0.0000
[2023-02-09 06:30:04] epoch 0 step 10 : training batch loss 4.4492, training batch acc 6.2500
[2023-02-09 06:30:24] epoch 0 step 20 : training batch loss 4.1754, training batch acc 19.5312
[2023-02-09 06:30:43] epoch 0 step 30 : training batch loss 3.6634, training batch acc 32.8125
[2023-02-09 06:31:03] epoch 0 step 40 : training batch loss 3.1619, training batch acc 35.1562
[2023-02-09 06:31:23] epoch 0 step 50 : training batch loss 2.6438, training batch acc 49.2188
[2023-02-09 06:31:43] epoch 0 step 60 : training batch loss 1.8105, training batch acc 67.1875
[2023-02-09 06:32:03] epoch 0 step 70 : training batch loss 1.7793, training batch acc 65.6250
[2023-02-09 06:32:22] epoch 0 step 80 : training batch loss 1.5857, training batch acc 64.8438
[2023-02-09 06:32:42] epoch 0 step 90 : training batch loss 1.4807, training batch acc 64.0625
[2023-02-09 06:33:02] epoch 0 step 100 : training bat