# Model Adapter Distiller Built-in Demo
Model Adapter is a convenient framework for reducing training and inference time, or data labeling cost, by efficiently utilizing public advanced models and datasets from many domains. It contains three components that serve different cases: Finetuner, Distiller, and Domain Adapter.

This demo introduces the usage of Distiller. Taking image classification as an example, it shows how to distill knowledge from ViT into ResNet18 on the CIFAR100 dataset. This demo covers the built-in usage; you can find a detailed customized demo [here](./Model_Adapter_Distiller_customized_ResNet18_CIFAR100.ipynb).

# Content

* [Model Adapter Distiller Overview](#Model-Adapter-Distiller-Overview)
* [Getting Started](#Getting-Started)
    * [Environment Setup](#Environment-Setup)
    * [Launch Training on baseline](#Launch-Training-on-baseline)
    * [Launch Training with Distiller](#Launch-Training-with-Distiller)
* [Performance](#Performance)

## Model Adapter Distiller Overview
Distiller is based on knowledge distillation: it transfers knowledge from a heavy model (the teacher) to a light one (the student), which may have a different structure. The teacher is a large model pretrained on a specific dataset and contains sufficient knowledge for the task, while the student has a much smaller structure. Distiller trains the student not only on the dataset but also with the help of the teacher's knowledge. With Distiller, we can make use of the knowledge in existing large pretrained models while spending much less training time. It can also significantly improve the convergence speed and prediction accuracy of a small model, which is very helpful for inference.

<img src="../imgs/distiller.png" width="60%">
<center>Model Adapter Distiller Structure</center>
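
To make the idea concrete, here is a minimal pure-Python sketch of a classic knowledge-distillation objective: cross-entropy on the true label (the "backbone" loss) plus a temperature-softened KL divergence between teacher and student predictions (the "distiller" loss). It is an illustration under assumptions, not the e2eAIOK implementation. The function names and the temperature value are hypothetical, and the 0.1/0.9 weights simply mirror the `loss_weight` setting used in the demo configuration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def backbone_loss(student_logits, label):
    """Cross-entropy between the student's prediction and the true label."""
    return -math.log(softmax(student_logits)[label])


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened distributions.

    The temperature (4.0) is an assumed value for illustration only.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


def total_loss(student_logits, teacher_logits, label,
               w_backbone=0.1, w_distiller=0.9, temperature=4.0):
    """Weighted sum of the label loss and the teacher-guided loss."""
    return (
        w_backbone * backbone_loss(student_logits, label)
        + w_distiller * distillation_loss(student_logits, teacher_logits, temperature)
    )


# Hypothetical logits for a 3-class problem.
student = [2.0, 1.0, 0.1]
teacher = [3.0, 0.5, 0.2]
print(total_loss(student, teacher, label=0))
```

When the student already matches the teacher, the distillation term vanishes and only the weighted label loss remains, which is why the weighting between the two terms matters most early in training.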

# Getting Started

## Environment Setup

1. Prepare the code
    ``` bash
    git clone https://github.com/intel/e2eAIOK.git
    cd e2eAIOK
    git submodule update --init --recursive
    ```
2. Build the Docker image. The `--proxy` argument is only needed when working behind a proxy.
   ``` bash
   python3 scripts/start_e2eaiok_docker.py -b pytorch112 --dataset_path ${dataset_path} -w ${host0} ${host1} ${host2} ${host3} --proxy  "http://addr:ip"
   ```
3. Log in to the running Docker container
   ``` bash
   sshpass -p docker ssh ${host0} -p 12347
   ```
4. Activate the conda environment and install e2eAIOK
   ```bash
   conda activate pytorch-1.12.0
   python setup.py sdist && pip install dist/e2eAIOK-*.*.*.tar.gz
   ```
5. Start the Jupyter Notebook and TensorBoard services
   ``` bash
   nohup jupyter notebook --notebook-dir=/home/vmagent/app/e2eaiok --ip=${hostname} --port=8899 --allow-root &
   nohup tensorboard --logdir /home/vmagent/app/data/tensorboard --host=${hostname} --port=6006 & 
   ```
   Now you can visit the demos at `http://${hostname}:8899/` and view the TensorBoard logs at `http://${hostname}:6006`.
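
Before launching any training, it can help to confirm that the packages are importable from the active conda environment. The following is a small sketch using only the standard library; the import name `e2eAIOK` is an assumption inferred from the repository path `e2eAIOK/ModelAdapter/main.py`, and `is_installed` is a hypothetical helper, not part of the toolkit.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# "e2eAIOK" is an assumed import name (see the note above).
for name in ("torch", "e2eAIOK"):
    status = "found" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```

If either package is reported as missing, re-check that the `pytorch-1.12.0` environment is active and that the `pip install` step completed.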

## Launch Training on baseline
First we train a vanilla ResNet18 on CIFAR100.

### Configuration
Create a configuration for ResNet18 with CIFAR100.
```yaml
# basic experiment setting
experiment:
    project: "demo"
    tag: "cifar100_res18"
output_dir: "/home/vmagent/app/data/model"

# dataset and model setting
model_type: "resnet18"
data_set: "cifar100"
data_path:  "/home/vmagent/app/data/dataset/cifar"
num_workers: 0
input_size: 224

# training setting
train_epochs: 1
optimizer: "SGD"
learning_rate: 0.1
weight_decay: 0.0001
momentum: 0.9

lr_scheduler: "ReduceLROnPlateau"
lr_scheduler_config:
    decay_rate: 0.2
    decay_patience: 10 # for ReduceLROnPlateau
  
early_stop: "EarlyStopping"
early_stop_config:
    tolerance_epoch: 15

```
You can also find this configuration [here](../../../conf/ma/demo/baseline/cifar100_res18_demo.yaml).
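
To build intuition for how `decay_rate`, `decay_patience`, and `tolerance_epoch` interact over a longer run, here is a simplified simulation. It is a sketch under assumptions: it follows the common "reduce on plateau" convention (decay once the metric has failed to improve for more than `decay_patience` epochs) and a plain patience-based early stop, ignores any improvement threshold, and may not match the toolkit's exact behavior. The function name and the accuracy values are hypothetical.

```python
def simulate_schedule(val_accuracies, lr=0.1, decay_rate=0.2,
                      decay_patience=10, tolerance_epoch=15):
    """Return a list of (epoch, learning_rate) pairs until training would stop."""
    best = float("-inf")
    stale_for_lr = 0    # epochs without improvement since the last LR decay
    stale_for_stop = 0  # epochs without improvement since the best epoch
    history = []
    for epoch, accuracy in enumerate(val_accuracies):
        if accuracy > best:
            best = accuracy
            stale_for_lr = 0
            stale_for_stop = 0
        else:
            stale_for_lr += 1
            stale_for_stop += 1
        if stale_for_lr > decay_patience:
            lr *= decay_rate  # e.g. 0.1 -> 0.02 with decay_rate 0.2
            stale_for_lr = 0
        history.append((epoch, lr))
        if stale_for_stop >= tolerance_epoch:
            break  # early stop: no improvement for tolerance_epoch epochs
    return history


# Hypothetical run: accuracy peaks at epoch 0 and never improves afterwards.
history = simulate_schedule([10.0] + [9.0] * 30)
for epoch, lr in history:
    print(epoch, round(lr, 4))
```

With the demo's `train_epochs: 1`, neither mechanism has a chance to trigger; they only matter once the epoch count is raised for a real training run.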

### Launch training
**Training ResNet18 on CIFAR100 from scratch (one epoch as an example):**

! python /home/vmagent/app/e2eaiok/e2eAIOK/ModelAdapter/main.py --cfg /home/vmagent/app/e2eaiok/conf/ma/demo/baseline/cifar100_res18_demo.yaml



Please cite the following paper when using nnUNet:

Isensee, F., Jaeger, P.F., Kohl, S.A.A. et al. "nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation." Nat Methods (2020). https://doi.org/10.1038/s41592-020-01008-z


If you have questions or suggestions, feel free to open an issue at https://github.com/MIC-DKFZ/nnUNet

configurations:
{'weight_decay': 0.0001, 'warmup_scheduler_epoch': 0, 'device': 'cpu', 'dist_backend': 'gloo', 'train_epochs': 1, 'pin_mem': False, 'eval_epochs': 1, 'early_stop_config': {'delta': 0.0001, 'is_max': True, 'tolerance_epoch': 15}, 'finetuner': {'type': '', 'initial_pretrain': '', 'pretrain': '', 'pretrained_num_classes': 10, 'finetuned_lr': 0.01, 'frozen': False}, 'learning_rate': 0.1, 'momentum': 0.9, 'seed': 0, 'enable_ipex': False, 'start_epoch': 0, 'tensorboard_dir': '/home/vmagent/app/data/tensorboard/cifar100_res18_resnet18_cifar100', 'warmup_scheduler': '', 'train_batch_size': 128, 'output_dir': '/home/vmagent

## Launch Training with Distiller
Then we train ResNet18 on CIFAR100 with Distiller to show the performance improvement.

### Prepare teacher model or logits
To use Distiller, we need a teacher model to guide the training. Download the ViT teacher model pretrained on CIFAR100 from [here](https://huggingface.co/edumunozsala/vit_base-224-in21k-ft-cifar100) and put it at `${pretrain}`.

### Configuration

Create a configuration for training ResNet18 on CIFAR100 with Distiller.

```yaml
# basic experiment setting
experiment:
  project: "demo"
  tag: "cifar100_kd_vit_res18"
  strategy: "OnlyDistillationStrategy"
output_dir: "/home/vmagent/app/data/model"

### dataset and model setting
data_set: "cifar100"
data_path:  "/home/vmagent/app/data/dataset/cifar"
num_workers: 4
train_transform: "vit"
test_transform: "vit"
input_size: 224

model_type: "resnet18"

# distiller setting
loss_weight:
    backbone: 0.1
    distiller: 0.9

## distiller
distiller:
    type: "kd"
    teacher: 
        type: "huggingface_vit_base_224_in21k_ft_cifar100"
        initial_pretrain: True

## training setting
train_epochs: 1
optimizer: "SGD"
learning_rate: 0.1
weight_decay: 0.0001
momentum: 0.9

lr_scheduler: "ReduceLROnPlateau"
lr_scheduler_config:
    decay_rate: 0.2
    decay_patience: 10 # for ReduceLROnPlateau

early_stop: "EarlyStopping"
early_stop_config:
    tolerance_epoch: 15
```

You can also find this configuration [here](../../../conf/ma/demo/distiller/cifar100_kd_vit_res18_demo.yaml).
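
To see at a glance what the Distiller configuration adds on top of the baseline, the two YAML files can be compared key by key. The sketch below transcribes both configurations as Python dictionaries (values copied from the YAML shown in this demo) and diffs them. The `flatten` helper is hypothetical and written only for this illustration.

```python
def flatten(config, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Settings that are identical in both YAML files.
shared = {
    "output_dir": "/home/vmagent/app/data/model",
    "model_type": "resnet18",
    "data_set": "cifar100",
    "data_path": "/home/vmagent/app/data/dataset/cifar",
    "input_size": 224,
    "train_epochs": 1,
    "optimizer": "SGD",
    "learning_rate": 0.1,
    "weight_decay": 0.0001,
    "momentum": 0.9,
    "lr_scheduler": "ReduceLROnPlateau",
    "lr_scheduler_config": {"decay_rate": 0.2, "decay_patience": 10},
    "early_stop": "EarlyStopping",
    "early_stop_config": {"tolerance_epoch": 15},
}

baseline = {
    **shared,
    "experiment": {"project": "demo", "tag": "cifar100_res18"},
    "num_workers": 0,
}

distiller = {
    **shared,
    "experiment": {
        "project": "demo",
        "tag": "cifar100_kd_vit_res18",
        "strategy": "OnlyDistillationStrategy",
    },
    "num_workers": 4,
    "train_transform": "vit",
    "test_transform": "vit",
    "loss_weight": {"backbone": 0.1, "distiller": 0.9},
    "distiller": {
        "type": "kd",
        "teacher": {
            "type": "huggingface_vit_base_224_in21k_ft_cifar100",
            "initial_pretrain": True,
        },
    },
}

base_flat, dist_flat = flatten(baseline), flatten(distiller)
added = sorted(set(dist_flat) - set(base_flat))
changed = sorted(
    key for key in base_flat.keys() & dist_flat.keys()
    if base_flat[key] != dist_flat[key]
)
print("added:  ", added)
print("changed:", changed)
```

The optimizer, scheduler, and early-stopping settings are untouched; the Distiller run differs only in the strategy, the ViT-style transforms, the loss weights, the teacher definition, the tag, and the worker count.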

### Launch Training with Distiller
**Training ResNet18 on CIFAR100 with Distiller (one epoch as an example):**

This may take some time, so feel free to take a break.

! python /home/vmagent/app/e2eaiok/e2eAIOK/ModelAdapter/main.py --cfg /home/vmagent/app/e2eaiok/conf/ma/demo/distiller/cifar100_kd_vit_res18_demo.yaml



configurations:
{'model_save_interval': 40, 'output_dir': '/home/vmagent/app/data/model', 'data_set': 'cifar100', 'optimizer': 'SGD', 'input_size': 112, 'learning_rate': 0.1, 'model_type': 'resnet18', 'loss_weight': {'distiller': 0.9, 'adapter': 0.0, 'backbone': 0.1}, 'lr_scheduler_config': {'T_max': 0, 'decay_rate': 0.2, 'decay_patience': 10, 'decay_stages': []}, 'profiler': False, 'train_epochs': 1, 'early_stop': 'EarlyStopping', 'criterion': 'CrossEntropyLoss', 'experiment': {'tag': 'cifar100_kd_res50_res18', 'strategy': 'OnlyDistillationStrategy', 'project': 'demo'}, 'warmup_scheduler': '', 'initial_pretrain': '', 'early_stop_

[2023-02-07 08:44:53] rank(0) epoch(0) step (360/391) Train: total_loss = 5.4664;	backbone_loss = 3.4247;	distiller_loss = 5.6932;	accuracy = 21.8750
[2023-02-07 08:44:57] rank(0) epoch(0) step (370/391) Train: total_loss = 5.1615;	backbone_loss = 3.1988;	distiller_loss = 5.3796;	accuracy = 23.4375
[2023-02-07 08:45:00] rank(0) epoch(0) step (380/391) Train: total_loss = 5.8639;	backbone_loss = 3.5364;	distiller_loss = 6.1225;	accuracy = 17.9688
[2023-02-07 08:45:04] rank(0) epoch(0) step (390/391) Train: total_loss = 5.2738;	backbone_loss = 3.4408;	distiller_loss = 5.4774;	accuracy = 22.5000
2023-02-07 08:45:10 0/391
2023-02-07 08:45:11 10/391
2023-02-07 08:45:12 20/391
2023-02-07 08:45:13 30/391
2023-02-07 08:45:14 40/391
2023-02-07 08:45:15 50/391
2023-02-07 08:45:16 60/391
2023-02-07 08:45:17 70/391
[2023-02-07 08:45:18] rank(0) epoch(0) Validation: accuracy = 19.1100;	loss = 3.6392
Best Epoch: 0, accuracy: 19.110000610351562
Epoch 0 took 277.65441823005676 seconds
Total seconds:27
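
The logged losses can be sanity-checked against the `loss_weight` setting. The sketch below parses the four training lines shown above and checks that `total_loss` agrees, up to rounding, with `0.1 * backbone_loss + 0.9 * distiller_loss`. It also checks that 391 steps per epoch matches CIFAR100's 50,000 training images at a batch size of 128; that batch size is taken from the baseline run's logged configuration and is assumed to apply here as well. The helper names are hypothetical.

```python
import math
import re

# Loss values copied from the training log above (steps 360 to 390).
LOG_LINES = [
    "total_loss = 5.4664; backbone_loss = 3.4247; distiller_loss = 5.6932",
    "total_loss = 5.1615; backbone_loss = 3.1988; distiller_loss = 5.3796",
    "total_loss = 5.8639; backbone_loss = 3.5364; distiller_loss = 6.1225",
    "total_loss = 5.2738; backbone_loss = 3.4408; distiller_loss = 5.4774",
]

PATTERN = re.compile(
    r"total_loss = ([\d.]+);\s*backbone_loss = ([\d.]+);\s*distiller_loss = ([\d.]+)"
)


def parse_losses(line):
    """Extract (total, backbone, distiller) losses from one log line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"no loss values found in: {line!r}")
    total, backbone, distiller = (float(group) for group in match.groups())
    return total, backbone, distiller


def weighted_sum(backbone, distiller, w_backbone=0.1, w_distiller=0.9):
    """Combine the two losses with the weights from the demo configuration."""
    return w_backbone * backbone + w_distiller * distiller


for line in LOG_LINES:
    total, backbone, distiller = parse_losses(line)
    expected = weighted_sum(backbone, distiller)
    # The log rounds to four decimals, so allow a small tolerance.
    print(f"logged={total:.4f} recomputed={expected:.4f}")

# 50,000 training images with an assumed batch size of 128.
steps_per_epoch = math.ceil(50_000 / 128)
print("steps per epoch:", steps_per_epoch)
```

The agreement suggests the total loss is a plain weighted sum of the two terms, so the much larger distiller weight dominates what the student optimizes in this run.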