[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/intel/e2eAIOK/blob/main/demo/ma/distiller/Model_Adapter_Distiller_builtin_VIT_to_ResNet18_CIFAR100.ipynb)

# Model Adapter Distiller Built-in DEMO
Model Adapter is a convenient framework that can be used to reduce training and inference time, or data labeling cost, by efficiently utilizing publicly available advanced models and datasets from many domains. It mainly contains three components serving different cases: Finetuner, Distiller, and Domain Adapter.

This demo mainly introduces the usage of Distiller. Taking image classification as an example, it shows how to distill knowledge from ViT into ResNet18 on the CIFAR100 dataset. This demo covers the built-in usage; a detailed customized walkthrough is available [here](./Model_Adapter_Distiller_Walkthrough_VIT_to_ResNet18_CIFAR100.ipynb).

# Content

* [Overview](#Overview)
    * [Model Adapter Distiller Overview](#Model-Adapter-Distiller-Overview)
* [Getting Started](#Getting-Started)
    * [1. Environment Setup](#1.-Environment-Setup)
    * [2. Launch training on baseline](#2.-Launch-training-on-baseline)
    * [3. Launch training with Distiller](#3.-Launch-training-with-Distiller)

# Overview

## Model Adapter Distiller Overview
Distiller is based on knowledge distillation: it transfers knowledge from a heavy model (teacher) to a light one (student), which may have a different structure. The teacher is a large model pretrained on a specific dataset and contains sufficient knowledge for the task, while the student model has a much smaller structure. Distiller trains the student not only on the dataset's labels but also with guidance from the teacher's knowledge. With Distiller, we can make use of the knowledge in existing pretrained large models while spending much less training time. It can also significantly improve the convergence speed and prediction accuracy of a small model, which is very helpful when that small model is used for inference.

<img src="../imgs/distiller.png" width="60%">
<center>Model Adapter Distiller Structure</center>
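To make the idea concrete, here is a minimal sketch of the classic (Hinton-style) knowledge-distillation loss: the student matches the teacher's temperature-softened class distribution (a KL-divergence term scaled by T²) in addition to the usual cross-entropy on the hard label, and the two terms are combined with weights like those in the `loss_weight` section of the distiller configuration used in this demo. It assumes the built-in `kd` distiller follows this standard formulation; the temperature (4.0) and all function names are illustrative assumptions, not Model Adapter's API. It uses only the Python standard library so it runs without PyTorch.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]


def cross_entropy(student_logits, label):
    """Hard-label loss: negative log-probability of the true class."""
    return -log_softmax(student_logits)[label]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    so its gradient magnitude stays comparable across temperatures."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    return kl * temperature ** 2


def total_loss(student_logits, teacher_logits, label,
               w_backbone=0.1, w_distiller=0.9, temperature=4.0):
    """Weighted sum of the hard-label (backbone) and distillation terms."""
    backbone = cross_entropy(student_logits, label)
    distiller = kd_loss(student_logits, teacher_logits, temperature)
    return w_backbone * backbone + w_distiller * distiller


# Usage: one sample with 3 classes whose true label is 0.
student = [2.0, 0.5, -1.0]
teacher = [3.0, 0.2, -2.0]
print(f"loss = {total_loss(student, teacher, label=0):.4f}")
```

Setting `w_distiller=0.0` reduces this to ordinary supervised training, which corresponds to the baseline run below.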

# Getting Started

## 1. Environment Setup

### (Option 1) Install with pip (recommended)
We can directly install the ModelAdapter module from Intel® End-to-End AI Optimization Kit with the following command.

!pip install e2eAIOK-ModelAdapter --pre

### (Option 2) Use Docker 

We can also use Docker, which contains a complete environment.

Step 1. Prepare the code
   ``` bash
   git clone https://github.com/intel/e2eAIOK.git
   cd e2eAIOK
   git submodule update --init --recursive
   ```
    
Step 2. Build the Docker image
   ``` bash
   python3 scripts/start_e2eaiok_docker.py -b pytorch112 --dataset_path ${dataset_path} -w ${host0} ${host1} ${host2} ${host3} --proxy  "http://addr:ip"
   ```
   
Step 3. Enter the Docker container and activate the conda environment
   ``` bash
   sshpass -p docker ssh ${host0} -p 12347
   conda activate pytorch-1.12.0
   ```
  
Step 4. Start the Jupyter Notebook and TensorBoard services
   ``` bash
   nohup jupyter notebook --notebook-dir=/home/vmagent/app/e2eaiok --ip=${hostname} --port=8899 --allow-root &
   nohup tensorboard --logdir /home/vmagent/app/data/tensorboard --host=${hostname} --port=6006 & 
   ```
   Now you can visit the demos at `http://${hostname}:8899/` and view the TensorBoard logs at `http://${hostname}:6006`.

## 2. Launch training on baseline
First, we train a vanilla ResNet18 on CIFAR100 as a baseline for comparison.

### 2.1 Configuration
Let's download the configuration for training ResNet18 on CIFAR100.

!wget https://raw.githubusercontent.com/intel/e2eAIOK/main/conf/ma/demo/baseline/cifar100_res18.yaml

--2023-03-19 23:11:25--  https://raw.githubusercontent.com/intel/e2eAIOK/main/conf/ma/demo/baseline/cifar100_res18.yaml
Resolving child-prc.intel.com (child-prc.intel.com)... 10.239.120.56
Connecting to child-prc.intel.com (child-prc.intel.com)|10.239.120.56|:913... connected.
Proxy request sent, awaiting response... 200 OK
Length: 505 [text/plain]
Saving to: ‘cifar100_res18.yaml’


2023-03-19 23:11:26 (14.5 MB/s) - ‘cifar100_res18.yaml’ saved [505/505]



Let's take a detailed look at the configuration.

! cat cifar100_res18.yaml

experiment:
    project: "demo"
    tag: "cifar100_res18"

output_dir: "./data"
train_epochs: 1

data_set: "cifar100"
data_path:  "./data"
num_workers: 0
input_size: 224

model_type: "resnet18"

optimizer: "SGD"
learning_rate: 0.1
weight_decay: 0.0001
momentum: 0.9

lr_scheduler: "ReduceLROnPlateau"
lr_scheduler_config:
    decay_rate: 0.2
    decay_patience: 10 # for ReduceLROnPlateau
  
early_stop: "EarlyStopping"
early_stop_config:
    tolerance_epoch: 15
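To see how the `lr_scheduler_config` and `early_stop_config` values interact, here is a plain-Python simulation that replays a sequence of per-epoch validation accuracies. It assumes that `decay_rate` and `decay_patience` behave like the `factor` and `patience` arguments of PyTorch's `ReduceLROnPlateau` (monitoring accuracy, so higher is better) and that `tolerance_epoch` is the number of epochs without improvement before early stopping; both mappings are assumptions, and PyTorch's improvement threshold and cooldown are omitted. `simulate` is a hypothetical helper, not part of Model Adapter.

```python
def simulate(metrics, lr=0.1, decay_rate=0.2, decay_patience=10, tolerance_epoch=15):
    """Replay per-epoch validation accuracy (higher is better).

    Returns (learning rate in effect after each epoch's scheduler step,
    epoch at which training stops early, or None if it never does).
    """
    best = float("-inf")
    bad_epochs_lr = 0    # epochs without improvement since the last LR decay
    bad_epochs_stop = 0  # epochs without improvement, for early stopping
    lrs = []
    for epoch, metric in enumerate(metrics):
        if metric > best:
            best = metric
            bad_epochs_lr = 0
            bad_epochs_stop = 0
        else:
            bad_epochs_lr += 1
            bad_epochs_stop += 1
        if bad_epochs_lr > decay_patience:
            lr *= decay_rate   # e.g. 0.1 -> 0.02 with the values above
            bad_epochs_lr = 0  # start counting again after each decay
        lrs.append(lr)
        if bad_epochs_stop >= tolerance_epoch:
            return lrs, epoch
    return lrs, None


# Accuracy improves for 3 epochs, then plateaus.
lrs, stop_epoch = simulate([10.0, 20.0, 30.0] + [30.0] * 20)
print(stop_epoch)  # 17
```

With `train_epochs: 1`, as in this demo, neither mechanism can trigger; they only matter for longer runs.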


### 2.2 Launch training
**Training ResNet18 on CIFAR100 from scratch (training for one epoch as an example):**

We can train the model directly with a single command.

! python -u /usr/local/lib/python3.9/dist-packages/e2eAIOK/ModelAdapter/main.py --cfg cifar100_res18.yaml



Please cite the following paper when using nnUNet:

Isensee, F., Jaeger, P.F., Kohl, S.A.A. et al. "nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation." Nat Methods (2020). https://doi.org/10.1038/s41592-020-01008-z


If you have questions or suggestions, feel free to open an issue at https://github.com/MIC-DKFZ/nnUNet

No Trial
configurations:
{'distiller': {'type': '', 'teacher': {'type': '', 'initial_pretrain': '', 'pretrain': '', 'frozen': True}, 'save_logits': False, 'use_saved_logits': False, 'check_logits': False, 'logits_path': '', 'logits_topk': 0, 'save_logits_start_epoch': 0}, 'learning_rate': 0.1, 'weight_decay': 0.0001, 'warmup_scheduler_epoch': 0, 'loss_weight': {'backbone': 1.0, 'distiller': 0.0, 'adapter': 0.0}, 'metric_threshold': 100.0, 'profiler': False, 'test_transform': 'default', 'data_set': 'cifar100', 'pretrain': '', 'tensorboard_dir': '/home/vmagent/app/data/tensorboard/cifar100_res18_resnet18_cifar100', 'dist_backend': '

## 3. Launch training with Distiller
Next, we train ResNet18 on CIFAR100 with Distiller to show the performance improvement.

### 3.1 Prepare teacher model
To use Distiller, we need to prepare a teacher model to guide the training. Here we select the pretrained [vit_base-224-in21k-ft-cifar100 from Hugging Face](https://huggingface.co/edumunozsala/vit_base-224-in21k-ft-cifar100).

### 3.2 Configuration

Now we download the configuration for training ResNet18 on CIFAR100 with Distiller.

!wget https://raw.githubusercontent.com/intel/e2eAIOK/main/conf/ma/demo/distiller/cifar100_kd_vit_res18.yaml

--2023-03-19 23:13:29--  https://raw.githubusercontent.com/intel/e2eAIOK/main/conf/ma/demo/distiller/cifar100_kd_vit_res18.yaml
Resolving child-prc.intel.com (child-prc.intel.com)... 10.239.120.56
Connecting to child-prc.intel.com (child-prc.intel.com)|10.239.120.56|:913... connected.
Proxy request sent, awaiting response... 200 OK
Length: 992 [text/plain]
Saving to: ‘cifar100_kd_vit_res18.yaml’


2023-03-19 23:13:30 (28.3 MB/s) - ‘cifar100_kd_vit_res18.yaml’ saved [992/992]



Let's take a detailed look at the configuration.

! cat cifar100_kd_vit_res18.yaml

experiment:
  project: "demo"
  tag: "cifar100_kd_vit_res18"
  strategy: "OnlyDistillationStrategy"
  
output_dir: "./data"
train_epochs: 1

### dataset
data_set: "cifar100"
data_path:  "./data"
num_workers: 4
train_transform: "vit"
test_transform: "vit"
input_size: 224

### model
model_type: "resnet18"

# loss
loss_weight:
    backbone: 0.1
    distiller: 0.9

## distiller
distiller:
    type: "kd"
    teacher: 
        type: "huggingface_vit_base_224_in21k_ft_cifar100"
        initial_pretrain: True

## optimizer
optimizer: "SGD"
learning_rate: 0.1
weight_decay: 0.0001
momentum: 0.9

### scheduler
lr_scheduler: "ReduceLROnPlateau"
lr_scheduler_config:
    decay_rate: 0.2
    decay_patience: 10 # for ReduceLROnPlateau
  
### early stop
early_stop: "EarlyStopping"
early_stop_config:
    tolerance_epoch: 15
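Comparing the two YAML files shows exactly what enabling distillation changes. The sketch below transcribes both configurations as written (before the framework fills in defaults) into Python dicts and reports every differing key as a dotted path. `diff_configs` is a hypothetical helper, not part of Model Adapter; if PyYAML is installed, `yaml.safe_load` on each downloaded file should yield equivalent dicts.

```python
def diff_configs(a, b, prefix=""):
    """Return (dotted_key, value_in_a, value_in_b) for every entry that differs.

    Nested dicts present in both configs are compared key by key; a key that
    is missing on one side shows up as None.
    """
    changes = []
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}{key}"
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            changes.extend(diff_configs(va, vb, path + "."))
        elif va != vb:
            changes.append((path, va, vb))
    return changes


# Settings that appear identically in both YAML files.
shared = {
    "output_dir": "./data", "train_epochs": 1,
    "data_set": "cifar100", "data_path": "./data", "input_size": 224,
    "model_type": "resnet18",
    "optimizer": "SGD", "learning_rate": 0.1, "weight_decay": 0.0001, "momentum": 0.9,
    "lr_scheduler": "ReduceLROnPlateau",
    "lr_scheduler_config": {"decay_rate": 0.2, "decay_patience": 10},
    "early_stop": "EarlyStopping",
    "early_stop_config": {"tolerance_epoch": 15},
}
# cifar100_res18.yaml
baseline = {**shared,
            "experiment": {"project": "demo", "tag": "cifar100_res18"},
            "num_workers": 0}
# cifar100_kd_vit_res18.yaml
distill = {**shared,
           "experiment": {"project": "demo", "tag": "cifar100_kd_vit_res18",
                          "strategy": "OnlyDistillationStrategy"},
           "num_workers": 4,
           "train_transform": "vit", "test_transform": "vit",
           "loss_weight": {"backbone": 0.1, "distiller": 0.9},
           "distiller": {"type": "kd",
                         "teacher": {"type": "huggingface_vit_base_224_in21k_ft_cifar100",
                                     "initial_pretrain": True}}}

for path, before, after in diff_configs(baseline, distill):
    print(f"{path}: {before!r} -> {after!r}")
```

The differences are `distiller`, `experiment.strategy`, `experiment.tag`, `loss_weight`, `num_workers`, `test_transform`, and `train_transform`; the model, optimizer, scheduler, and early-stop settings are shared.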


### 3.3 Launch Training with Distiller
**Training ResNet18 on CIFAR100 with Distiller (training for one epoch as an example):**

This will take some time (~45 min), so have a break and get a coffee!

You can further optimize and accelerate training with the logits-saving feature; refer to the [logits saving demo](Model_Adapter_Distiller_customized_ResNet18_CIFAR100_train_with_logits.ipynb) and the [training with saved logits demo](./Model_Adapter_Distiller_customized_ResNet18_CIFAR100_train_with_logits.ipynb) for more details.
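The printed default configuration of the baseline run includes `save_logits`, `use_saved_logits`, `logits_path`, and `logits_topk` options. The general idea behind caching logits is that the teacher's forward pass, a large share of the cost when the teacher is a ViT, does not have to be recomputed, and keeping only the top-k logits per sample shrinks the cache. The sketch below illustrates that idea with hypothetical helpers (`compress_topk`, `expand_topk`, `storage_bytes`); how Model Adapter actually stores and restores logits is not shown in this demo and may differ.

```python
import heapq


def compress_topk(logits, k):
    """Keep only the k largest logits as (class_index, value) pairs."""
    return heapq.nlargest(k, enumerate(logits), key=lambda pair: pair[1])


def expand_topk(pairs, num_classes, fill=None):
    """Rebuild a dense logit vector from top-k pairs.

    Classes outside the top-k get `fill` (default: the smallest kept value,
    so a dropped class never outranks a kept one).
    """
    if fill is None:
        fill = min(value for _, value in pairs)
    dense = [fill] * num_classes
    for index, value in pairs:
        dense[index] = value
    return dense


def storage_bytes(num_samples, num_classes, topk=0, value_bytes=4, index_bytes=4):
    """Bytes to cache one pass of logits; in this sketch topk=0 means dense."""
    if topk == 0:
        return num_samples * num_classes * value_bytes
    return num_samples * topk * (value_bytes + index_bytes)


# CIFAR100 training set: 50,000 images, 100 classes, float32 values, int32 indices.
print(storage_bytes(50_000, 100), storage_bytes(50_000, 100, topk=10))  # 20000000 4000000
```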

! python -u /usr/local/lib/python3.9/dist-packages/e2eAIOK/ModelAdapter/main.py --cfg cifar100_kd_vit_res18.yaml



No Trial
configurations:
{'eval_metric': 'accuracy', 'dist_backend': 'gloo', 'eval_batch_size': 128, 'warmup_scheduler': '', 'eval_epochs': 1, 'experiment': {'project': 'demo', 'tag': 'cifar100_kd_vit_res18', 'strategy': 'OnlyDistillationStrategy'}, 'optimizer': 'SGD', 'loss_weight': {'adapter': 0.0, 'backbone': 0.1, 'distiller': 0.9}, 'momentum': 0.9, 'num_workers': 4, 'profiler_config': {'skip_first': 1, 'wait': 1, 'warmup': 1, 'active': 2, 'repeat': 1, 'trace_file': '/home/vmagent/app/data/model/demo/cifar100_kd_vit_res18/profile/profile_resnet18_OnlyDistillationStrategy_cifar100_1676258899'}, 'drop_last': False, 'lr_scheduler_

[2023-02-13 04:10:20] rank(0) epoch(0) step (370/391) Train: total_loss = 2.0250;	backbone_loss = 3.6623;	distiller_loss = 1.8431;	accuracy = 11.7188
[2023-02-13 04:11:28] rank(0) epoch(0) step (380/391) Train: total_loss = 2.0208;	backbone_loss = 3.5435;	distiller_loss = 1.8516;	accuracy = 17.9688
[2023-02-13 04:12:33] rank(0) epoch(0) step (390/391) Train: total_loss = 1.9110;	backbone_loss = 3.4433;	distiller_loss = 1.7407;	accuracy = 18.7500
2023-02-13 04:12:44 0/391
2023-02-13 04:12:49 10/391
2023-02-13 04:12:53 20/391
2023-02-13 04:12:58 30/391
2023-02-13 04:13:03 40/391
2023-02-13 04:13:07 50/391
2023-02-13 04:13:12 60/391
2023-02-13 04:13:16 70/391
[2023-02-13 04:13:20] rank(0) epoch(0) Validation: accuracy = 17.3100;	loss = 3.5807
Best Epoch: 0, accuracy: 17.309999465942383
Epoch 0 took 2708.9147386550903 seconds
Total seconds:2708.916575
Totally take 2713.286825656891 seconds
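The logged `total_loss` is consistent with the `loss_weight` setting in `cifar100_kd_vit_res18.yaml`, i.e. `0.1 * backbone_loss + 0.9 * distiller_loss`. A quick check against the last three training steps, with the values copied from the log above:

```python
# (total_loss, backbone_loss, distiller_loss) from steps 370, 380 and 390.
log_rows = [
    (2.0250, 3.6623, 1.8431),
    (2.0208, 3.5435, 1.8516),
    (1.9110, 3.4433, 1.7407),
]
weights = {"backbone": 0.1, "distiller": 0.9}  # loss_weight from the YAML config

for total, backbone, distiller in log_rows:
    recomputed = weights["backbone"] * backbone + weights["distiller"] * distiller
    print(f"logged={total:.4f} recomputed={recomputed:.4f}")
    # The log prints 4 decimals, so compare with a matching tolerance.
    assert abs(recomputed - total) < 1e-3

print(f"epoch time: {2708.9147386550903 / 60:.1f} min")  # 45.1 min
```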
