[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/intel/e2eAIOK/blob/main/demo/ma/distiller/Model_Adapter_Distiller_Walkthrough_VIT_to_ResNet18_on_CIFAR100_save_logits.ipynb)

# Model Adapter Distiller Walkthrough DEMO - Save logits
Model Adapter is a convenient framework that can be used to reduce training and inference time, or data labeling cost, by efficiently utilizing public advanced models and datasets. It mainly contains three components that serve different cases: Finetuner, Distiller, and Domain Adapter. 

Distiller is based on knowledge distillation. It transfers knowledge from a heavy model (teacher) to a light one (student) with a different structure. However, the teacher's forward pass usually takes a lot of time during distillation. The logits-saving function in Distiller stores the teacher's predictions in advance, which saves much of that time during student training. 
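
The saving comes from running the teacher once per sample instead of once per sample per epoch. The following is a minimal, framework-free sketch of that idea. `CountingTeacher`, `train_without_cache`, and `train_with_cache` are hypothetical names made up for illustration, and the sketch ignores data augmentation for simplicity; it is not how Distiller is implemented.

```python
class CountingTeacher:
    """Hypothetical stand-in for an expensive teacher model; counts forward passes."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return [0.1 * x, 0.2 * x, 0.3 * x]  # fake "logits"


def train_without_cache(teacher, samples, epochs):
    """The teacher runs for every sample in every epoch."""
    for _ in range(epochs):
        for x in samples:
            teacher(x)
    return teacher.calls


def train_with_cache(teacher, samples, epochs):
    """The teacher runs once per sample; later epochs only look up saved logits."""
    saved_logits = [teacher(x) for x in samples]
    for _ in range(epochs):
        for idx in range(len(samples)):
            _ = saved_logits[idx]  # no teacher forward pass here
    return teacher.calls


samples = [1.0, 2.0, 3.0, 4.0]
print(train_without_cache(CountingTeacher(), samples, epochs=5))  # 20
print(train_with_cache(CountingTeacher(), samples, epochs=5))     # 4
```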

This demo introduces the logits-saving function of Distiller. Taking image classification as an example, it shows how to use Distiller to save logits from a pretrained ViT model, which will be used to guide the learning of ResNet18 in the next [demo](./Model_Adapter_Distiller_Walkthrough_VIT_to_ResNet18_CIFAR100_train_with_logits.ipynb).

To enable logits saving, we only need to add two steps to the original pipeline:
- Wrap train_dataset with DataWrapper
- Call prepare_logits() in Distiller

  
# Content

* [Overview](#Overview)
    * [Model Adapter Distiller Overview](#Model-Adapter-Distiller-Overview)
* [Getting Started](#Getting-Started)
    * [1. Environment Setup](#1.-Environment-Setup)
    * [2. Data Prepare](#2.-Data-Prepare)
    * [3. Model Prepare](#3.-Model-Prepare)
    * [4. Save Logits](#4.-Save-Logits)

# Overview

## Model Adapter Distiller Overview
Distiller is based on knowledge distillation. It transfers knowledge from a heavy model (teacher) to a light one (student) with a different structure. The teacher is a large model pretrained on a specific dataset and contains sufficient knowledge for the task, while the student model has a much smaller structure. Distiller trains the student not only on the dataset, but also with the help of the teacher's knowledge. With Distiller, we can make use of the knowledge in existing large pretrained models while spending much less training time. It can also significantly improve the convergence speed and prediction accuracy of a small model, which is very helpful for inference.

<img src="../imgs/distiller.png" width="60%">
<center>Model Adapter Distiller Structure</center>
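
To make "training with the help of the teacher's knowledge" concrete, here is a minimal sketch of the classic knowledge distillation loss: cross-entropy on the true label, combined with the KL divergence between temperature-softened teacher and student distributions. It is plain Python so the arithmetic is visible. The function names and the `temperature` and `alpha` defaults are assumptions chosen for illustration; the exact loss used by the library's KD class may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Weighted sum of a hard-label term and a soft teacher-matching term."""
    # Hard-label term: cross-entropy of the student against the true class.
    ce = -math.log(softmax(student_logits)[label])
    # Soft term: KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # The T^2 factor keeps the soft term's gradient scale comparable across temperatures.
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl


# A student that disagrees with the teacher is penalized by the soft term.
print(kd_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0], label=2))
```

When the student's logits equal the teacher's, the soft term vanishes and only the cross-entropy term remains.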

# Getting Started

## 1. Environment Setup

### (Option 1) Use Pip install
We can install the ModelAdapter module from Intel® End-to-End AI Optimization Kit directly with the following command.

In [None]:
!pip install e2eAIOK-ModelAdapter --pre

### (Option 2) Use Docker 

We can also use Docker, which contains a complete environment.

Step1. prepare code
   ``` bash
   git clone https://github.com/intel/e2eAIOK.git
   cd e2eAIOK
   git submodule update --init --recursive
   ```
    
Step2. build docker image
   ``` bash
   python3 scripts/start_e2eaiok_docker.py -b pytorch112 --dataset_path ${dataset_path} -w ${host0} ${host1} ${host2} ${host3} --proxy  "http://addr:ip"
   ```
   
Step3. run docker and start conda env
   ``` bash
   sshpass -p docker ssh ${host0} -p 12347
   conda activate pytorch-1.12.0
   ```
  
Step4. Start the jupyter notebook and tensorboard service
   ``` bash
   nohup jupyter notebook --notebook-dir=/home/vmagent/app/e2eaiok --ip=${hostname} --port=8899 --allow-root &
   nohup tensorboard --logdir /home/vmagent/app/data/tensorboard --host=${hostname} --port=6006 & 
   ```
   Now you can visit the demos at `http://${hostname}:8899/` and see the TensorBoard logs at `http://${hostname}:6006`.

## 2. Data Prepare

Let's import the required modules. We will use ResNet from the timm library in this notebook.

In [None]:
import torch
from torchvision import transforms,datasets
from torch.utils.data import DataLoader
import timm
import sys,os

First, let's define the transform for the dataset, which is needed to augment the input images. 

For the teacher, the pretrained model was trained on a larger image size, so we scale the 32\*32 images to 224\*224.

In [None]:
IMAGE_MEAN = [0.5, 0.5, 0.5]
IMAGE_STD = [0.5, 0.5, 0.5]

train_transform = transforms.Compose([
  transforms.RandomCrop(32, padding=4),
  transforms.RandomHorizontalFlip(),
  transforms.Resize(224),  # the pretrained model was trained on a larger image size; scale 32x32 to 224x224
  transforms.ToTensor(),
  transforms.Normalize(IMAGE_MEAN, IMAGE_STD)
])
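
As a quick check of what the last step does: `transforms.ToTensor()` scales pixel values to the range [0, 1], and `transforms.Normalize` then computes `(value - mean) / std` per channel. With a mean and std of 0.5, this maps [0, 1] to [-1, 1]. The sketch below reproduces that arithmetic in plain Python; `normalize_pixel` is a hypothetical helper written only for illustration.

```python
IMAGE_MEAN = [0.5, 0.5, 0.5]
IMAGE_STD = [0.5, 0.5, 0.5]


def normalize_pixel(rgb, mean=IMAGE_MEAN, std=IMAGE_STD):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]


# Darkest, middle, and brightest channel values after ToTensor().
print(normalize_pixel([0.0, 0.5, 1.0]))  # [-1.0, 0.0, 1.0]
```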

Then let's define the CIFAR100 dataset and download it with the torchvision library.

In [None]:
data_folder='./data' # dataset location
train_set = datasets.CIFAR100(root=data_folder, train=True, download=True, transform=train_transform)

Files already downloaded and verified


**Wrap dataset with DataWrapper**

Wrap the train dataset with DataWrapper, which helps to save data augmentation information during the forward pass of the teacher model.

In [None]:
from e2eAIOK.ModelAdapter.engine_core.distiller.utils import logits_wrap_dataset
logits_path = "./data" # path to save the logits
train_set = logits_wrap_dataset(train_set, logits_path=logits_path, num_classes=100, save_logits=True)
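
Why does augmentation information need to be saved? The teacher's logits are only valid for the exact augmented view it saw, so the student must later be fed the same view. One way to achieve this is to record a per-sample random seed and replay it. The toy class below sketches that idea in plain Python. `ToyLogitsWrapper` and its seeding scheme are assumptions made for illustration and may not match what `logits_wrap_dataset` actually does internally.

```python
import random


class ToyLogitsWrapper:
    """Hypothetical sketch: wraps a dataset and returns the seed used to augment
    each sample, so the same augmentation can be replayed later."""

    def __init__(self, data, base_seed=0):
        self.data = data
        self.base_seed = base_seed

    def __len__(self):
        return len(self.data)

    @staticmethod
    def augment(sample, seed):
        """Toy augmentation: reverse the sequence (a "flip") with probability 0.5."""
        rng = random.Random(seed)
        return sample[::-1] if rng.random() < 0.5 else sample

    def __getitem__(self, idx):
        seed = self.base_seed + idx  # deterministic per-sample seed
        return self.augment(self.data[idx], seed), seed


wrapped = ToyLogitsWrapper([[1, 2, 3], [4, 5, 6]])
augmented, seed = wrapped[1]
# Replaying the stored seed reproduces exactly the view the teacher saw.
print(ToyLogitsWrapper.augment([4, 5, 6], seed) == augmented)  # True
```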

**Create dataloader**

Note: We need to save logits for all the data without any sampling. Make sure you have disabled "channel_last" and "sampler" in the dataloader, which avoids data loss when the logits are used later.

In [None]:
train_loader = DataLoader(dataset=train_set, batch_size=128, shuffle=True, num_workers=1, drop_last=False)
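
The effect of `drop_last=False` can be checked with a little arithmetic. The CIFAR100 training split has 50,000 images, so a batch size of 128 gives 391 batches, the last one partial. This matches the `/391` counter in the saving log in section 4. With `drop_last=True`, the final 80 images would never reach the teacher. The variable names below are chosen only for illustration.

```python
import math

num_train_images = 50_000  # size of the CIFAR100 training split
batch_size = 128

batches_keep_last = math.ceil(num_train_images / batch_size)  # drop_last=False
batches_drop_last = num_train_images // batch_size            # drop_last=True
images_lost = num_train_images - batches_drop_last * batch_size

print(batches_keep_last, batches_drop_last, images_lost)  # 391 390 80
```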

## 3. Model Prepare

**Prepare teacher model**

To use Distiller, we need to prepare a teacher model to guide the training. Here we select the pretrained [vit_base-224-in21k-ft-cifar100 from HuggingFace](https://huggingface.co/edumunozsala/vit_base-224-in21k-ft-cifar100).

In [None]:
from transformers import ViTForImageClassification
teacher_model = ViTForImageClassification.from_pretrained('edumunozsala/vit_base-224-in21k-ft-cifar100')

**Define Distiller**

Here we define a distiller using the KD algorithm, which takes a teacher model as input. If the teacher comes from Hugging Face, specify a "teacher_type" whose name starts with "huggingface"; otherwise it is not needed.

In [None]:
from e2eAIOK.ModelAdapter.engine_core.distiller import KD
distiller= KD(teacher_model,teacher_type="huggingface_vit")
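
A likely reason the teacher type matters: Hugging Face classification models such as `ViTForImageClassification` return an output object whose predictions sit in a `.logits` attribute, whereas a plain PyTorch model typically returns the logits tensor directly. The sketch below shows how a wrapper could dispatch on the name prefix. `extract_logits` is a hypothetical helper, and the dispatch logic is an assumption about the library's behavior rather than its actual code.

```python
from types import SimpleNamespace


def extract_logits(model_output, teacher_type=None):
    """Hypothetical helper: return raw logits from a teacher's forward output."""
    if teacher_type is not None and teacher_type.startswith("huggingface"):
        return model_output.logits  # Hugging Face style output object
    return model_output             # plain model: the output already is the logits


# Stand-ins for the two output styles (no real models involved).
hf_style_output = SimpleNamespace(logits=[0.1, 0.9])
plain_output = [0.1, 0.9]

print(extract_logits(hf_style_output, teacher_type="huggingface_vit"))  # [0.1, 0.9]
print(extract_logits(plain_output))                                     # [0.1, 0.9]
```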

## 4. Save Logits

Call prepare_logits() of distiller to save the logits.

In [None]:
distiller.prepare_logits(train_loader, epochs=1)

2023-02-13 06:48:11 save 0/391
2023-02-13 06:49:03 save 10/391
2023-02-13 06:49:55 save 20/391
2023-02-13 06:50:47 save 30/391
2023-02-13 06:51:40 save 40/391
2023-02-13 06:52:31 save 50/391
2023-02-13 06:53:24 save 60/391
2023-02-13 06:54:15 save 70/391
2023-02-13 06:55:05 save 80/391
2023-02-13 06:55:56 save 90/391
2023-02-13 06:56:47 save 100/391
2023-02-13 06:57:39 save 110/391
2023-02-13 06:58:29 save 120/391
2023-02-13 06:59:20 save 130/391
2023-02-13 07:00:11 save 140/391
2023-02-13 07:01:02 save 150/391
2023-02-13 07:01:52 save 160/391
2023-02-13 07:02:44 save 170/391
2023-02-13 07:03:36 save 180/391
2023-02-13 07:04:27 save 190/391
2023-02-13 07:05:17 save 200/391
2023-02-13 07:06:08 save 210/391
2023-02-13 07:06:59 save 220/391
2023-02-13 07:07:48 save 230/391
2023-02-13 07:08:38 save 240/391
2023-02-13 07:09:31 save 250/391
2023-02-13 07:10:22 save 260/391
2023-02-13 07:11:13 save 270/391
2023-02-13 07:12:03 save 280/391
2023-02-13 07:12:54 save 290/391
2023-02-13 07:13:46 s