[![open in colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/intel/e2eAIOK/blob/68a5a83f25022d0f1ba6610b18f620d0cdde9d68/demo/denas/computer_vision/DENAS_ViT_DEMO.ipynb)

# AIOK DE-NAS for ViT Demo

DE-NAS is a multi-model, hardware-aware, train-free neural architecture search (NAS) method that constructs compact model architectures directly for a target platform. It includes a CNN-based search space for the CV domain and a Transformer-based search space for the CV, NLP, and ASR domains, and it uses a hardware-aware, train-free scoring method to evaluate candidate architectures without training them.

This demo shows how DE-NAS is integrated with the CV domain to search for lighter, faster, higher-performance Transformer-based models in a training-free way.

# Content
* [Overview](#overview)
    * [DE-NAS on CV(ViT) Domain](#de-nas-on-cvvit-domain)
    * [Performance](#performance)
* [Getting Started](#getting-started)
    * [1. Environment Setup](#1-environment-setup)
    * [2. Workflow Prepare](#2-workflow-prepare)
    * [3. Data Prepare](#3-data-prepare)
    * [4. Launch Search](#4-launch-search)
    * [5. Launch Training with Best Searched Model Structure](#5-launch-training-with-best-searched-model-structure)

# Overview
## DE-NAS on CV(ViT) Domain
ViT models are Transformer-based and are generated from the unified Transformer-based search space. The search space for the ViT domain is as follows:

``` yaml
SUPERNET:
  MLP_RATIO: 4.0 # linear layer ratio
  NUM_HEADS: 10 # number of attention heads
  EMBED_DIM: 640 # Q, K, V embedding dimension
  DEPTH: 16 # number of transformer layers
SEARCH_SPACE:
  MLP_RATIO: [3.0, 3.5, 4.0]
  NUM_HEADS: [3, 4, 5, 6, 7, 8, 9, 10]
  DEPTH: [12, 13, 14, 15, 16]
  EMBED_DIM: [192, 216, 240, 320, 384, 448, 528, 576, 624]
```
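To make the search space concrete, the following is a minimal sketch that samples one candidate and counts how many candidates exist. It assumes that each layer picks its own MLP ratio and head count independently, and that a candidate is laid out as `(depth, mlp_ratio per layer, num_heads per layer, embed_dim)`, which matches the tuples printed in the search log later in this demo. The helper names `sample_candidate` and `search_space_size` are hypothetical and are not part of e2eAIOK.

```python
import random

# Search space copied from the YAML above.
SEARCH_SPACE = {
    "MLP_RATIO": [3.0, 3.5, 4.0],
    "NUM_HEADS": [3, 4, 5, 6, 7, 8, 9, 10],
    "DEPTH": [12, 13, 14, 15, 16],
    "EMBED_DIM": [192, 216, 240, 320, 384, 448, 528, 576, 624],
}


def sample_candidate(space, rng):
    """Draw one architecture: depth, per-layer choices, then embed dim (assumed layout)."""
    depth = rng.choice(space["DEPTH"])
    mlp_ratios = [rng.choice(space["MLP_RATIO"]) for _ in range(depth)]
    num_heads = [rng.choice(space["NUM_HEADS"]) for _ in range(depth)]
    embed_dim = rng.choice(space["EMBED_DIM"])
    return (depth, *mlp_ratios, *num_heads, embed_dim)


def search_space_size(space):
    """Count candidates, assuming independent per-layer choices."""
    per_layer = len(space["MLP_RATIO"]) * len(space["NUM_HEADS"])
    return sum(len(space["EMBED_DIM"]) * per_layer ** d for d in space["DEPTH"])


rng = random.Random(0)
candidate = sample_candidate(SEARCH_SPACE, rng)
print(candidate)
print(f"{search_space_size(SEARCH_SPACE):.3e} candidate architectures")
```

Under these assumptions the space is far too large to train exhaustively, which is why a train-free score is used to rank candidates.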

## Performance
<center>
<img src="./img/denas_vit_stock.png" width="800"/><figure>DE-NAS ViT performance over stock model</figure>
</center>

<center>
<img src="./img/denas_vit_autoformer.png" width="800"/><figure>DE-NAS ViT performance over Autoformer</figure>
</center>

* Testing methodology
  * Dataset: CIFAR10; metric: top-1 accuracy
  * Baseline: AutoFormer (top-1 accuracy 0.9482)
  * The search selects the model structure with the highest DE-score
  * The DE-NAS model is the highest DE-score model among 1000 searched candidates
  * Training epochs: 200
* DE-NAS ViT delivered a 35.63x search speedup and a 4.44x training speedup over the SOTA NAS baseline (AutoFormer), with about 5% accuracy loss (0.90 vs. 0.9482).

# Getting Started

## 1. Environment Setup

### Option 1: Set Up Environment with Pip

In [None]:
! pip install e2eAIOK-denas --pre

### Option 2: Set Up Environment with Docker

Step 1. Prepare the code
``` bash
git clone https://github.com/intel/e2eAIOK.git
cd e2eAIOK
git submodule update --init --recursive
```

Step 2. Build the Docker image
```bash
python3 scripts/start_e2eaiok_docker.py -b pytorch112 -w ${host0} ${host1} ${host2} ${host3} --proxy ""
```

Step 3. Run Docker and start the conda environment
``` bash
sshpass -p docker ssh ${host0} -p 12347
```

## 2. Workflow Prepare

### Configuration

* Create the config file for ViT model search

``` yaml
model_type: transformer
search_engine: EvolutionarySearchEngine # Options: random search, Evolution algorithm, SigOpt search
batch_size: 64
random_max_epochs: 1000
max_epochs: 10
select_num: 50
population_num: 50
m_prob: 0.2
s_prob: 0.4
crossover_num: 25
mutation_num: 25
max_param_limits: 100
min_param_limits: 1
supernet_cfg: /home/vmagent/app/e2eaiok/conf/denas/cv/supernet_vit/supernet_large.conf
img_size: 224
patch_size: 16 # patch size for input image
drop_rate: 0.0 # dropout ratio
drop_path_rate: 0.1
max_relative_position: 14 #max distance in relative position embedding
gp: True
relative_position: True
change_qkv: True
abs_pos: True
seed: 0
expressivity_weight: 0 # weight for the train-free expressivity score
complexity_weight: 0 # weight for the train-free complexity score
diversity_weight: 1 # weight for the train-free diversity score
saliency_weight: 1 # weight for the train-free saliency score
latency_weight: 10000 # weight for latency setup according to different platforms
```
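The `m_prob` and `s_prob` settings control how an evolutionary search perturbs a candidate. The sketch below shows one plausible mutation step, modeled on AutoFormer-style evolution: with probability `s_prob` the depth and the embedding dimension are resampled, and each per-layer choice is resampled with probability `m_prob`. This is an assumption for illustration, not the confirmed behavior of `EvolutionarySearchEngine`, and the function `mutate` is hypothetical.

```python
import random

# Hypothetical lower-case copy of the ViT search space shown in the Overview.
SEARCH_SPACE = {
    "mlp_ratio": [3.0, 3.5, 4.0],
    "num_heads": [3, 4, 5, 6, 7, 8, 9, 10],
    "depth": [12, 13, 14, 15, 16],
    "embed_dim": [192, 216, 240, 320, 384, 448, 528, 576, 624],
}


def mutate(candidate, space, m_prob, s_prob, rng):
    """Return a mutated copy of (depth, mlp ratios..., heads..., embed_dim)."""
    depth = candidate[0]
    mlp = list(candidate[1:1 + depth])
    heads = list(candidate[1 + depth:1 + 2 * depth])
    embed_dim = candidate[-1]

    # Structural mutation: resample the depth, then pad or truncate the layers.
    if rng.random() < s_prob:
        new_depth = rng.choice(space["depth"])
        extra = max(new_depth - depth, 0)
        mlp = (mlp + [rng.choice(space["mlp_ratio"]) for _ in range(extra)])[:new_depth]
        heads = (heads + [rng.choice(space["num_heads"]) for _ in range(extra)])[:new_depth]
        depth = new_depth

    # Per-layer mutation: each choice is resampled with probability m_prob.
    for i in range(depth):
        if rng.random() < m_prob:
            mlp[i] = rng.choice(space["mlp_ratio"])
        if rng.random() < m_prob:
            heads[i] = rng.choice(space["num_heads"])

    # Structural mutation of the embedding dimension.
    if rng.random() < s_prob:
        embed_dim = rng.choice(space["embed_dim"])

    return (depth, *mlp, *heads, embed_dim)


# Parent taken from the "mutation 17/25" line of the search log in this demo.
parent = (12, 3.0, 3.5, 4.0, 3.0, 4.0, 3.5, 4.0, 3.0, 3.5, 3.0, 3.0, 4.0,
          7, 3, 10, 7, 5, 7, 6, 5, 5, 3, 7, 3, 448)
child = mutate(parent, SEARCH_SPACE, m_prob=0.2, s_prob=0.4, rng=random.Random(0))
print(child)
```

In a full evolutionary loop, `mutation_num` children like this and `crossover_num` crossover children would presumably be scored each generation, with the top `select_num` kept as parents.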

* Create the config file for ViT model training

``` yaml
domain: vit
train_epochs: 1
eval_epochs: 1
input_size: 32
best_model_structure: /home/vmagent/app/e2eaiok/e2eAIOK/DeNas/best_vit_model_structure.txt
num_classes: 10
dist_backend: gloo
train_batch_size: 128
eval_batch_size: 128
data_path: ~/data/pytorch_cifar10
data_set: CIFAR10
output_dir: ./
num_workers: 10
pin_mem: True
eval_metric: "accuracy"
learning_rate: 0.001
momentum: 0.9
weight_decay: 0.01
optimizer: "SGD"
criterion: "CrossEntropyLoss"
lr_scheduler: "CosineAnnealingLR"
print_freq: 10
mode: "train"
gp: True
change_qkv: True 
relative_position: True # whether to use relative position embedding
drop_path: 0.1 #Drop path rate
max_relative_position: 14 #max distance in relative position embedding
no_abs_pos: False
patch_size: 16 # patch size for input image
drop: 0.0
metric_threshold: 94 # early stop target accuracy
SUPERNET:
  MLP_RATIO: 4.0
  NUM_HEADS: 10
  EMBED_DIM: 640
  DEPTH: 16
```
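A few quantities follow directly from these settings and are useful for sanity-checking a run. The sketch below computes the number of patches per image and the number of training iterations per epoch. It assumes the CIFAR10 training split of 50,000 images, two training processes (the launcher output later in this demo shows `-np 2`), and a distributed sampler that gives each process an equal share without dropping the last partial batch. The helper names are hypothetical.

```python
import math


def num_patches(image_size, patch_size):
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def steps_per_epoch(num_samples, batch_size, world_size):
    """Iterations per epoch on each process (assumes no dropped last batch)."""
    samples_per_rank = math.ceil(num_samples / world_size)
    return math.ceil(samples_per_rank / batch_size)


# Training config: input_size 32, patch_size 16.
print(num_patches(32, 16))              # 4
# Search config: img_size 224, patch_size 16.
print(num_patches(224, 16))             # 196
# train_batch_size 128, 50,000 images, 2 processes.
print(steps_per_epoch(50_000, 128, 2))  # 196
```

The last value agrees with the `[  0/196]` progress counter in the training output shown in section 5.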

## 3. Data Prepare

### CIFAR(10/100) Dataset Download

In [None]:
from torchvision import datasets

data_folder = "~/data/pytorch_cifar"
is_train = True
transform = None

# Download Cifar10 Dataset
dataset = datasets.CIFAR10(data_folder, train=is_train, transform=transform, download=True)

# Download Cifar100 Dataset
dataset = datasets.CIFAR100(data_folder, train=is_train, transform=transform, download=True)

## 4. Launch Search

### Launch Search for ViT
The input is the search configuration for the ViT domain. Edit the configuration file `e2eaiok_denas_vit.conf` for ViT model search, then run the code below.

In [12]:
from e2eAIOK.DeNas.search.utils import parse_config
from e2eAIOK.DeNas.search.SearchEngineFactory import SearchEngineFactory
from e2eAIOK.DeNas.cv.supernet_transformer import Vision_TransformerSuper


params = parse_config('/home/vmagent/app/e2eaiok/conf/denas/cv/e2eaiok_denas_vit.conf')

# construct supernet and search space
super_net = Vision_TransformerSuper(img_size=params.img_size,
                                    patch_size=params.patch_size,
                                    embed_dim=params.SUPERNET.EMBED_DIM, depth=params.SUPERNET.DEPTH,
                                    num_heads=params.SUPERNET.NUM_HEADS,mlp_ratio=params.SUPERNET.MLP_RATIO,
                                    qkv_bias=True, drop_rate=params.drop_rate,
                                    drop_path_rate=params.drop_path_rate,
                                    gp=params.gp,
                                    num_classes=params.num_classes,
                                    max_relative_position=params.max_relative_position,
                                    relative_position=params.relative_position,
                                    change_qkv=params.change_qkv, abs_pos=params.abs_pos)
search_space = {
    'num_heads': params.SEARCH_SPACE.NUM_HEADS,
    'mlp_ratio': params.SEARCH_SPACE.MLP_RATIO,
    'embed_dim': params.SEARCH_SPACE.EMBED_DIM,
    'depth': params.SEARCH_SPACE.DEPTH,
}

# create the DE-NAS searcher
searcher = SearchEngineFactory.create_search_engine(params=params, super_net=super_net, search_space=search_space)

# trigger the search process
searcher.search()
best_structure = searcher.get_best_structures()
print(f"DE-NAS completed, best structure is {best_structure}")


12/02/2022 02:02:02 - INFO - DENAS -   random 30/50 structure (14, 4.0, 3.5, 3.0, 3.0, 3.5, 4.0, 3.5, 4.0, 4.0, 3.0, 3.5, 3.5, 3.0, 3.5, 9, 5, 6, 4, 5, 5, 6, 3, 4, 10, 3, 5, 3, 7, 240) nas_score 260.6387023925781 params 10.64161
12/02/2022 02:02:09 - INFO - DENAS -   random 31/50 structure (15, 3.5, 3.0, 3.5, 3.5, 3.0, 3.0, 3.5, 3.5, 4.0, 3.0, 3.5, 3.5, 4.0, 4.0, 3.0, 6, 10, 4, 5, 3, 10, 9, 6, 7, 6, 3, 4, 5, 3, 10, 384) nas_score 289.3059997558594 params 24.696586
12/02/2022 02:02:16 - INFO - DENAS -   random 32/50 structure (16, 4.0, 3.0, 3.5, 4.0, 3.5, 3.0, 4.0, 3.5, 3.0, 4.0, 3.5, 3.5, 4.0, 3.0, 3.0, 3.5, 9, 3, 3, 5, 6, 7, 7, 9, 6, 6, 3, 9, 5, 5, 5, 10, 320) nas_score 244.2968292236328 params 20.000586
12/02/2022 02:02:26 - INFO - DENAS -   random 33/50 structure (16, 3.0, 3.0, 3.0, 3.5, 3.5, 4.0, 3.5, 3.0, 4.0, 3.0, 3.0, 3.0, 3.5, 3.5, 3.5, 4.0, 4, 7, 7, 9, 4, 7, 3, 6, 9, 7, 6, 9, 10, 10, 5, 5, 576) nas_score 235.0663604736328 params 52.550986
12/02/2022 02:02:31 - INFO - DENAS -  

12/02/2022 02:06:00 - INFO - DENAS -   mutation 16/25 structure (13, 3.5, 4.0, 3.0, 3.0, 4.0, 4.0, 3.5, 3.0, 3.5, 3.5, 3.5, 4.0, 4.0, 5, 7, 3, 4, 7, 9, 5, 10, 4, 4, 3, 9, 5, 448) nas_score 369.3968505859375 params 27.87569
12/02/2022 02:06:06 - INFO - DENAS -   mutation 17/25 structure (12, 3.0, 3.5, 4.0, 3.0, 4.0, 3.5, 4.0, 3.0, 3.5, 3.0, 3.0, 4.0, 7, 3, 10, 7, 5, 7, 6, 5, 5, 3, 7, 3, 448) nas_score 416.1216735839844 params 25.051882
12/02/2022 02:06:12 - INFO - DENAS -   mutation 18/25 structure (12, 4.0, 3.5, 4.0, 3.0, 3.0, 4.0, 3.5, 3.0, 3.0, 4.0, 3.5, 3.0, 5, 5, 6, 3, 3, 3, 5, 9, 6, 7, 6, 6, 448) nas_score 309.3833923339844 params 24.592362
12/02/2022 02:06:17 - INFO - DENAS -   mutation 19/25 structure (12, 3.5, 3.0, 4.0, 4.0, 3.0, 4.0, 3.0, 3.0, 3.5, 4.0, 3.5, 3.5, 3, 4, 3, 9, 3, 4, 10, 9, 7, 7, 3, 4, 216) nas_score 297.0036926269531 params 7.90957
12/02/2022 02:06:25 - INFO - DENAS -   mutation 20/25 structure (16, 3.0, 3.0, 3.0, 3.5, 3.5, 4.0, 3.5, 3.0, 4.0, 3.0, 3.0, 3.0, 3.5
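Each log line above packs the whole architecture into one tuple. The following sketch parses such a line into named fields. It assumes the layout `(depth, mlp_ratio per layer, num_heads per layer, embed_dim)`, which is inferred from the tuple lengths in the log rather than taken from the e2eAIOK source, and the `params` value is presumably in millions of parameters. `parse_search_log` is a hypothetical helper.

```python
import re

# One complete line copied from the search log above.
LOG_LINE = (
    "12/02/2022 02:06:06 - INFO - DENAS -   mutation 17/25 structure "
    "(12, 3.0, 3.5, 4.0, 3.0, 4.0, 3.5, 4.0, 3.0, 3.5, 3.0, 3.0, 4.0, "
    "7, 3, 10, 7, 5, 7, 6, 5, 5, 3, 7, 3, 448) "
    "nas_score 416.1216735839844 params 25.051882"
)

PATTERN = re.compile(
    r"structure \((?P<structure>[^)]*)\) "
    r"nas_score (?P<score>\S+) params (?P<params>\S+)"
)


def parse_search_log(line):
    """Split a DE-NAS search log line into named fields, or return None."""
    match = PATTERN.search(line)
    if match is None:
        return None  # e.g. a truncated line without a closing parenthesis
    values = [float(v) for v in match.group("structure").split(",")]
    depth = int(values[0])
    if len(values) != 2 * depth + 2:
        return None  # tuple does not follow the assumed layout
    return {
        "depth": depth,
        "mlp_ratio": values[1:1 + depth],
        "num_heads": [int(v) for v in values[1 + depth:1 + 2 * depth]],
        "embed_dim": int(values[-1]),
        "nas_score": float(match.group("score")),
        "params": float(match.group("params")),
    }


result = parse_search_log(LOG_LINE)
print(result["depth"], result["embed_dim"], result["nas_score"])
```

Applying this to every line of a saved log and sorting by `nas_score` gives a quick way to inspect the top candidates outside the search engine.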

## 5. Launch Training with Best Searched Model Structure

### Train the best searched ViT model
The input is the training configuration for the best ViT model. Edit the configuration file `e2eaiok_denas_train_vit.conf`, then run the code below.

In [19]:
import e2eAIOK.common.trainer.utils.utils as utils
from e2eAIOK.DeNas.cv.model_builder_denas_cv import ModelBuilderCVDeNas
from e2eAIOK.common.trainer.data.cv.data_builder_cifar import DataBuilderCIFAR
from e2eAIOK.common.trainer.data.cv.data_builder_imagenet import DataBuilderImageNet
from e2eAIOK.DeNas.cv.cv_trainer import CVTrainer
from e2eAIOK.DeNas.search.utils import parse_config

# parse the DE-NAS training configuration
cfg = parse_config("/home/vmagent/app/e2eaiok/conf/denas/cv/e2eaiok_denas_train_vit.conf")

# construct model, dataloader, optimizer, criterion, scheduler and metric
model = ModelBuilderCVDeNas(cfg).create_model()
train_dataloader, eval_dataloader = (DataBuilderImageNet(cfg) if cfg.data_set == 'ImageNet' else DataBuilderCIFAR(cfg)).get_dataloader()
optimizer = utils.create_optimizer(model, cfg)
criterion = utils.create_criterion(cfg)
scheduler = utils.create_scheduler(optimizer, cfg)
metric = utils.create_metric(cfg)

# create DE-NAS trainer
trainer = CVTrainer(cfg, model, train_dataloader, eval_dataloader, optimizer, criterion, scheduler, metric)

# trigger the training process
trainer.fit()

2022-12-02 02:33:35,564 - __main__ - INFO - MASTER_ADDR=127.0.0.1
2022-12-02 02:33:35,564 - __main__ - INFO - MASTER_PORT=29500
2022-12-02 02:33:35,565 - __main__ - INFO - I_MPI_PIN_DOMAIN=[0xfffff0,0xfffff0000000,]
2022-12-02 02:33:35,565 - __main__ - INFO - OMP_NUM_THREADS=20
2022-12-02 02:33:35,566 - __main__ - INFO - Using Intel OpenMP
2022-12-02 02:33:35,566 - __main__ - INFO - KMP_AFFINITY=granularity=fine,compact,1,0
2022-12-02 02:33:35,566 - __main__ - INFO - KMP_BLOCKTIME=1
2022-12-02 02:33:35,566 - __main__ - INFO - LD_PRELOAD=/opt/intel/oneapi/intelpython/latest/envs/pytorch-1.10.0/lib/libiomp5.so
2022-12-02 02:33:35,566 - __main__ - INFO - CCL_WORKER_COUNT=4
2022-12-02 02:33:35,566 - __main__ - INFO - CCL_WORKER_AFFINITY=0,1,2,3,24,25,26,27
2022-12-02 02:33:35,566 - __main__ - INFO - ['mpiexec.hydra', '-l', '-np', '2', '-ppn', '2', '-genv', 'I_MPI_PIN_DOMAIN=[0xfffff0,0xfffff0000000,]', '-genv', 'OMP_NUM_THREADS=20', '/opt/intel/oneapi/intelpython/latest/envs/pytorch-1.10.0

[0] model parameters size: 26354346
[0] 12/02/2022 02:33:39 - INFO - Trainer -   model created: Vision_TransformerSuper(
[0]   (patch_embed_super): PatchembedSuper(
[0]     (proj): Conv2d(3, 640, kernel_size=(16, 16), stride=(16, 16))
[0]   )
[0]   (blocks): ModuleList(
[0]     (0): TransformerEncoderLayer(
[0]       (drop_path): Identity()
[0]       (attn): AttentionSuper(
[0]         (qkv): qkv_super(in_features=640, out_features=1920, bias=True)
[0]         (rel_pos_embed_k): RelativePosition2D_super()
[0]         (rel_pos_embed_v): RelativePosition2D_super()
[0]         (proj): LinearSuper(in_features=640, out_features=640, bias=True)
[0]         (attn_drop): Dropout(p=0.0, inplace=False)
[0]         (proj_drop): Dropout(p=0.0, inplace=False)
[0]       )
[0]       (attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
[0]       (ffn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
[0]       (fc1): LinearSuper(in_featur

[1] Files already downloaded and verified
[0] Files already downloaded and verified
[1] Files already downloaded and verified
[0] Files already downloaded and verified
[1] model:Vision_TransformerSuper(
[1]   (patch_embed_super): PatchembedSuper(
[1]     (proj): Conv2d(3, 640, kernel_size=(16, 16), stride=(16, 16))
[1]   )
[1]   (blocks): ModuleList(
[1]     (0): TransformerEncoderLayer(
[1]       (drop_path): Identity()
[1]       (attn): AttentionSuper(
[1]         (qkv): qkv_super(in_features=640, out_features=1920, bias=True)
[1]         (rel_pos_embed_k): RelativePosition2D_super()
[1]         (rel_pos_embed_v): RelativePosition2D_super()
[1]         (proj): LinearSuper(in_features=640, out_features=640, bias=True)
[1]         (attn_drop): Dropout(p=0.0, inplace=False)
[1]         (proj_drop): Dropout(p=0.0, inplace=False)
[1]       )
[1]       (attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
[1]       (ffn_layer_norm): LayerNormSuper((640,), eps=1e-05,

[0] model:Vision_TransformerSuper(
[0]   (patch_embed_super): PatchembedSuper(
[0]     (proj): Conv2d(3, 640, kernel_size=(16, 16), stride=(16, 16))
[0]   )
[0]   (blocks): ModuleList(
[0]     (0): TransformerEncoderLayer(
[0]       (drop_path): Identity()
[0]       (attn): AttentionSuper(
[0]         (qkv): qkv_super(in_features=640, out_features=1920, bias=True)
[0]         (rel_pos_embed_k): RelativePosition2D_super()
[0]         (rel_pos_embed_v): RelativePosition2D_super()
[0]         (proj): LinearSuper(in_features=640, out_features=640, bias=True)
[0]         (attn_drop): Dropout(p=0.0, inplace=False)
[0]         (proj_drop): Dropout(p=0.0, inplace=False)
[0]       )
[0]       (attn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
[0]       (ffn_layer_norm): LayerNormSuper((640,), eps=1e-05, elementwise_affine=True)
[0]       (fc1): LinearSuper(in_features=640, out_features=2560, bias=True)
[0]       (fc2): LinearSuper(in_features=2560, o

[1]   distance_mat_v = (range_vec_k[None, :] // int(length_q ** 0.5 )  - range_vec_q[:, None] // int(length_q ** 0.5 ))
[0]   distance_mat_v = (range_vec_k[None, :] // int(length_q ** 0.5 )  - range_vec_q[:, None] // int(length_q ** 0.5 ))
[1] Epoch: [1]  [  0/196]  eta: 0:22:23  lr: 0.001000  loss: 2.3453 (2.3453)  time: 6.8559  data: 5.2393
[0] Epoch: [1]  [  0/196]  eta: 0:22:25  lr: 0.001000  loss: 2.3404 (2.3404)  time: 6.8649  data: 5.5334
[1] Epoch: [1]  [ 10/196]  eta: 0:03:40  lr: 0.001000  loss: 2.2744 (2.2759)  time: 1.1846  data: 0.4948
[0] Epoch: [1]  [ 10/196]  eta: 0:03:40  lr: 0.001000  loss: 2.2699 (2.2903)  time: 1.1846  data: 0.5214
[1] Epoch: [1]  [ 20/196]  eta: 0:02:40  lr: 0.001000  loss: 2.2375 (2.2317)  time: 0.6148  data: 0.0201
[0] Epoch: [1]  [ 20/196]  eta: 0:02:40  lr: 0.001000  loss: 2.2439 (2.2509)  time: 0.6144  data: 0.0201
[1] Epoch: [1]  [ 30/196]  eta: 0:02:16  lr: 0.001000  loss: 2.1719 (2.2012)  time: 0.6193  data: 0.0195
[0] Epoch: [1]  [ 30/196]