[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/intel/e2eAIOK/blob/main/demo/denas/computer_vision/DENAS_CNN_DEMO.ipynb)

# AIOK DE-NAS for CNN Demo

DE-NAS is a multi-model, hardware-aware, train-free neural architecture search (NAS) solution that constructs compact model architectures directly for the target platform. DE-NAS provides a CNN-based search space for the CV domain and a Transformer-based search space for the CV/NLP/ASR domains, and uses a hardware-aware, train-free scoring method to evaluate candidate architectures without training them.

This demo shows how to use DE-NAS in the CV domain to search, in a training-free way, for lighter, faster, and higher-performance CNN-based models.

# Content
* [Overview](#Overview)
    * [DE-NAS on CV(CNN) Domain](#DE-NAS-on-CV(CNN)-Domain)
    * [Performance](#Performance)
* [Getting Started](#Getting-Started)
    * [1. Environment Setup](#1.-Environment-Setup)
    * [2. Workflow Prepare](#2.-Workflow-Prepare)
    * [3. Data Prepare](#3.-Data-Prepare)
    * [4. Search](#4.-Search)
    * [5. Train](#5.-Train)

# Overview

## DE-NAS on CV(CNN) Domain

For CNN models, the basic structure is built from Residual or Bottleneck blocks. The search space for the CNN domain is as follows:
``` yaml
SUPERNET structure:
    "SuperConvK3BNRELU(3,8,1,1)SuperResK3K3(8,16,1,8,1)SuperResK3K3(16,32,2,16,1)SuperResK3K3(32,64,2,32,1)SuperResK3K3(64,64,2,32,1)SuperConvK1BNRELU(64,128,1,1)"
SEARCH SPACE:
    number of layer:[18, 35, 50, 101]
    Convolutional layer kernel size:[1x1, 3x3, 5x5, 7x7, 9x9, 11x11]
    Number of filters:[8,16,32,64,128]
```
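The supernet (and every searched candidate) is written as a PlainNet-style string: a sequence of block names, each followed by integer arguments in parentheses. A small parser makes such strings easier to inspect. This is a sketch, not part of the DE-NAS API: following the Zen-NAS naming convention, we assume the first three arguments of each block are input channels, output channels, and stride, and leave the remaining arguments uninterpreted.

```python
import re
from typing import List, Tuple

# A block looks like Name(int,int,...), e.g. "SuperResK3K3(8,16,1,8,1)".
BLOCK_RE = re.compile(r"([A-Za-z0-9]+)\(([^)]*)\)")


def parse_plainnet(struct: str) -> List[Tuple[str, Tuple[int, ...]]]:
    """Split a PlainNet-style structure string into (block_name, int_args) pairs."""
    return [
        (name, tuple(int(a) for a in args.split(",")))
        for name, args in BLOCK_RE.findall(struct)
    ]


SUPERNET = (
    "SuperConvK3BNRELU(3,8,1,1)SuperResK3K3(8,16,1,8,1)"
    "SuperResK3K3(16,32,2,16,1)SuperResK3K3(32,64,2,32,1)"
    "SuperResK3K3(64,64,2,32,1)SuperConvK1BNRELU(64,128,1,1)"
)

blocks = parse_plainnet(SUPERNET)
for name, args in blocks:
    # Assumption (Zen-NAS convention): args[0] = in channels, args[1] = out channels, args[2] = stride
    print(f"{name:<20} in={args[0]:<4} out={args[1]:<4} stride={args[2]}")

# Sanity check: each block's output channels feed the next block's input channels.
assert all(a[1][1] == b[1][0] for a, b in zip(blocks, blocks[1:]))
```

The same helper works on any searched structure printed by the search step, which makes it easy to compare a candidate against the supernet block by block.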

## Performance

<img src="./img/denas_cnn_stock.png" width="900"/>
<img src="./img/denas_cnn_zennas.png" width="900"/>

* Testing methodology
    * Dataset: CIFAR10; metric: Top-1 accuracy
    * Baselines: ResNet50, Zen-NAS
    * Training epochs: 200
* DE-NAS CNN delivered a 9.86x training speedup over ResNet50.
* DE-NAS CNN delivered a 40.73x search speedup and an 82.57x training speedup over Zen-NAS, with a 39% model size reduction and a 5% accuracy loss (top-1 accuracy 0.91 vs. 0.9579).
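The 5% accuracy loss against Zen-NAS is a relative figure. Working it out from the two reported accuracies shows the difference between the absolute drop (in accuracy points) and the relative drop (as a share of the baseline):

```python
baseline_acc = 0.9579  # Zen-NAS top-1 accuracy on CIFAR10, from the results above
denas_acc = 0.91       # DE-NAS CNN top-1 accuracy, from the results above

absolute_drop = baseline_acc - denas_acc       # difference in accuracy
relative_drop = absolute_drop / baseline_acc   # difference as a fraction of the baseline

print(f"absolute drop: {absolute_drop * 100:.2f} points")  # absolute drop: 4.79 points
print(f"relative drop: {relative_drop * 100:.1f}%")        # relative drop: 5.0%
```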

# Getting Started

## 1. Environment Setup

### Option 1: Set Up the Environment with Pip

``` bash
pip install e2eAIOK-denas --pre
pip install torchsummary joblib
```

### Option 2: Set Up the Environment with Docker

``` bash
# Setup ENV
git clone https://github.com/intel/e2eAIOK.git
cd e2eAIOK
python3 scripts/start_e2eaiok_docker.py -b pytorch112 -w ${host0} ${host1} ${host2} ${host3} --proxy ""
# Enter Docker
sshpass -p docker ssh ${host0} -p 12347
```
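Whichever option you choose, a quick way to confirm the environment is to check that the top-level modules used in this demo can be found. The helper below uses `importlib.util.find_spec`; the module list is an assumption assembled from the pip installs and import statements in this notebook (PyYAML is imported as `yaml`).

```python
import importlib.util

# Top-level modules this demo relies on (assumed list, taken from the notebook's imports).
REQUIRED = ["e2eAIOK", "torch", "torchvision", "torchsummary", "joblib", "yaml", "easydict"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


missing = missing_modules(REQUIRED)
print("all dependencies found" if not missing else f"missing: {missing}")
```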

## 2. Workflow Prepare

### Search Configuration

``` yaml
model_type: cnn
search_engine: EvolutionarySearchEngine # Options: random search, Evolution algorithm, SigOpt search
batch_size: 64
random_max_epochs: 1000
max_epochs: 1
select_num: 50
population_num: 50 # population size for EA search
crossover_num: 0
mutation_num: 50 # number of mutations for EA search
budget_num_layers: 18 # maximum number of layers
budget_model_size: 1000000 # maximum number of model parameters
budget_flops: 10000000 # maximum model FLOPs
img_size: 32 # input image size
num_classes: 100
plainnet_struct: "SuperConvK3BNRELU(3,8,1,1)SuperResK3K3(8,16,1,8,1)SuperResK3K3(16,32,2,16,1)SuperResK3K3(32,64,2,32,1)SuperResK3K3(64,64,2,32,1)SuperConvK1BNRELU(64,128,1,1)" # supernet architecture
no_reslink: False
no_BN: False
use_se: False
seed: 0
expressivity_weight: 1 # weight of the expressivity train-free score
complexity_weight: 0 # weight of the complexity train-free score
diversity_weight: 0 # weight of the diversity train-free score
saliency_weight: 1 # weight of the saliency train-free score
latency_weight: 0 # weight of the latency score, set according to the target platform
```
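To make `population_num`, `select_num`, `mutation_num`, `crossover_num`, `max_epochs`, and the `*_weight` keys more concrete, here is a toy evolutionary loop. It is a sketch under stated assumptions: the candidate encoding, the score proxies, and the selection scheme are hypothetical and do not reproduce `EvolutionarySearchEngine` or the DE-NAS train-free metrics; only the role each parameter plays is meant to carry over.

```python
import random

# Hypothetical toy encoding: a candidate is a list of (kernel_size, width) pairs, one per block.
KERNELS = [3, 5, 7]
WIDTHS = [8, 16, 32, 64, 128]
NUM_BLOCKS = 4

# Mirrors the *_weight keys in the search configuration above.
WEIGHTS = {"expressivity": 1.0, "complexity": 0.0, "diversity": 0.0,
           "saliency": 1.0, "latency": 0.0}


def random_candidate(rng):
    return [(rng.choice(KERNELS), rng.choice(WIDTHS)) for _ in range(NUM_BLOCKS)]


def toy_scores(cand):
    # Placeholder proxies only; these are NOT the DE-NAS train-free metrics.
    width_sum = sum(w for _, w in cand)
    kernel_sum = sum(k for k, _ in cand)
    return {
        "expressivity": width_sum / 100.0,
        "complexity": -kernel_sum / 10.0,
        "diversity": len(set(cand)) / NUM_BLOCKS,
        "saliency": kernel_sum / 10.0,
        "latency": -width_sum / 1000.0,
    }


def fitness(cand):
    # Weighted sum of the individual scores; a weight of 0 disables that score.
    scores = toy_scores(cand)
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def mutate(cand, rng):
    # Re-sample one randomly chosen block.
    child = list(cand)
    child[rng.randrange(NUM_BLOCKS)] = (rng.choice(KERNELS), rng.choice(WIDTHS))
    return child


def crossover(a, b, rng):
    # Uniform crossover: each block is taken from one of the two parents.
    return [rng.choice(pair) for pair in zip(a, b)]


def evolutionary_search(max_epochs=10, population_num=50, select_num=10,
                        mutation_num=20, crossover_num=20, seed=0):
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(population_num)]
    for _ in range(max_epochs):
        # Keep the select_num fittest candidates as parents (elitism).
        parents = sorted(population, key=fitness, reverse=True)[:select_num]
        children = [mutate(rng.choice(parents), rng) for _ in range(mutation_num)]
        children += [crossover(rng.choice(parents), rng.choice(parents), rng)
                     for _ in range(crossover_num)]
        population = parents + children
    return max(population, key=fitness)


best = evolutionary_search()
print(best, round(fitness(best), 3))
```

Because the parents survive into each new generation, the best fitness never decreases. Setting `crossover_num=0`, as in the configuration above, simply means every child comes from mutation.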

### Training Configuration

``` yaml
domain: cnn
train_epochs: 1
eval_epochs: 1
input_size: 32
best_model_structure: /home/vmagent/app/e2eaiok/e2eAIOK/DeNas/best_cnn_model_structure.txt
num_classes: 10
dist_backend: gloo
train_batch_size: 128
eval_batch_size: 128
data_path: ~/data/pytorch_cifar10
data_set: CIFAR10
output_dir: ./
num_workers: 10
pin_mem: True
eval_metric: "accuracy"
learning_rate: 0.01
momentum: 0.9
weight_decay: 0.0005
optimizer: "SGD"
criterion: "CrossEntropyLoss"
lr_scheduler: "CosineAnnealingLR"
print_freq: 10
metric_threshold: 94
mode: "train"
```
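The training configuration selects `CosineAnnealingLR`. For reference, the closed-form cosine annealing schedule documented for this PyTorch scheduler can be computed directly. This sketch assumes `T_max` equals the number of training epochs, `eta_min` is 0, and the scheduler steps once per epoch; how the trainer actually configures the scheduler is not shown here.

```python
import math


def cosine_annealing_lr(base_lr: float, epoch: int, total_epochs: int, eta_min: float = 0.0) -> float:
    """Closed-form cosine annealing: decays base_lr to eta_min over total_epochs."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / total_epochs)) / 2


# Example: base learning rate 0.01 over 200 epochs (the epoch count used in the testing methodology).
for epoch in (0, 50, 100, 150, 200):
    print(epoch, round(cosine_annealing_lr(0.01, epoch, 200), 6))
```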

## 3. Data Prepare

``` python
from torchvision import datasets

# Use the same folder as data_path in the training configuration
data_folder = "~/data/pytorch_cifar10"
is_train = True
transform = None

# Download the CIFAR-10 training split
dataset = datasets.CIFAR10(data_folder, train=is_train, transform=transform, download=True)
```
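CIFAR-10 has 50,000 training and 10,000 test images. With `train_batch_size` and `eval_batch_size` both set to 128, the number of iterations per epoch follows from a ceiling division, assuming the data loaders keep the last partial batch (no `drop_last`). This matches the `[0/391]` and `[0/79]` counters in the training and evaluation logs.

```python
import math

CIFAR10_TRAIN_IMAGES = 50_000  # size of the CIFAR-10 training split
CIFAR10_TEST_IMAGES = 10_000   # size of the CIFAR-10 test split


def iterations_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a data loader yields per epoch."""
    return num_samples // batch_size if drop_last else math.ceil(num_samples / batch_size)


print(iterations_per_epoch(CIFAR10_TRAIN_IMAGES, 128))  # 391
print(iterations_per_epoch(CIFAR10_TEST_IMAGES, 128))   # 79
```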

## 4. Search
The search takes the CNN-domain configuration as input. Edit the configuration file `e2eaiok_denas_cnn.conf` to set up the CNN model search, then run the cell below.

``` python
import yaml
from easydict import EasyDict as edict
from e2eAIOK.DeNas.cv.third_party.ZenNet import DeSearchSpaceXXBL, DeMainNet
from e2eAIOK.DeNas.search.SearchEngineFactory import SearchEngineFactory

# create common settings
settings = {}
settings["domain"] = "cnn"
# load search settings
with open("/home/vmagent/app/e2eaiok/conf/denas/cv/e2eaiok_denas_cnn.conf") as f:
    conf = yaml.load(f, Loader=yaml.FullLoader)
settings.update(conf)
# override the EA settings to keep this demo run short
settings["max_epochs"] = 1
settings["population_num"] = 1
settings["crossover_num"] = 1
settings["mutation_num"] = 1
params = edict(settings)

# create supernet and search space
super_net = DeMainNet
search_space = DeSearchSpaceXXBL

# create search engine and launch search
searcher = SearchEngineFactory.create_search_engine(params = params, super_net = super_net, search_space = search_space)
searcher.search()
# get best searched structure
best_structure = searcher.get_best_structures()
print(f"DE-NAS completed, best structure is {best_structure}")
```

03/27/2023 01:37:08 - INFO - DENAS -   epoch = 0
03/27/2023 01:37:10 - INFO - DENAS -   random 1/1 structure SuperConvK3BNRELU(3,8,1,1)SuperResK3K3(8,16,1,8,1)SuperResK3K3(16,32,2,16,1)SuperResK3K3(32,64,2,32,1)SuperResK5K5(64,64,2,48,2)SuperConvK1BNRELU(64,128,1,1) nas_score -7.420200824737549 params 373164
03/27/2023 01:37:10 - INFO - DENAS -   random_num = 1
03/27/2023 01:37:12 - INFO - DENAS -   mutation 1/1 structure SuperConvK3BNRELU(3,8,1,1)SuperResK3K3(8,16,1,8,1)SuperResK1K3K1

DE-NAS completed, best structure is SuperConvK3BNRELU(3,8,1,1)SuperResK3K3(8,16,1,8,1)SuperResK1K3K1(16,40,2,16,1)SuperResK5K5(40,96,2,40,2)SuperResK3K3(96,64,2,32,1)SuperConvK1BNRELU(64,128,1,1)
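The search log reports a parameter count for each candidate (`params 373164` for the random candidate above), and the search configuration sets `budget_model_size`, `budget_flops`, and `budget_num_layers`. A minimal sketch of checking a candidate against those budgets follows; treating the budgets as hard upper bounds applied this way is an assumption, not a description of the search engine's internals, and the metric names in the dictionaries are hypothetical.

```python
# Budgets copied from the search configuration (hypothetical key names).
BUDGETS = {"params": 1_000_000, "flops": 10_000_000, "layers": 18}


def within_budget(metrics: dict, budgets: dict) -> bool:
    """True if every metric present in both dicts is at or below its budget."""
    return all(metrics[k] <= budgets[k] for k in budgets if k in metrics)


# Parameter count logged for the random candidate in the search output.
print(within_budget({"params": 373164}, BUDGETS))  # True
```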


## 5. Train
Training takes the configuration for the best CNN model found by the search as input. Edit the configuration file `e2eaiok_denas_train_cnn.conf` to set up training of the best CNN model, then run the cell below.
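The training cell reads the architecture from `best_model_structure.txt`. If your run of the search step did not leave that file behind, one way to create it from `best_structure` is to write the structure string to disk. This is a hypothetical helper that assumes the model builder expects the bare structure string as plain text; check the file produced by your DE-NAS version before relying on this format.

```python
from pathlib import Path


def save_structure(structure: str, path: str = "best_model_structure.txt") -> Path:
    """Hypothetical helper: write a searched structure string to a text file."""
    out = Path(path)
    out.write_text(structure.strip() + "\n")
    return out


# Usage, continuing from the search cell:
# save_structure(best_structure)
```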

``` python
import yaml
from easydict import EasyDict as edict
import torch
from e2eAIOK.DeNas.cv.model_builder_denas_cv import ModelBuilderCVDeNas
from e2eAIOK.common.trainer.data.cv.data_builder_imagenet import DataBuilderImageNet
from e2eAIOK.common.trainer.data.cv.data_builder_cifar import DataBuilderCIFAR
import e2eAIOK.common.trainer.utils.utils as utils
from e2eAIOK.DeNas.cv.cv_trainer import CVTrainer

# create common settings
settings = {}
settings["domain"] = "cnn"
# load training settings
with open("/home/vmagent/app/e2eaiok/conf/denas/cv/e2eaiok_denas_train_cnn.conf") as f:
    conf = yaml.load(f, Loader=yaml.FullLoader)
settings.update(conf)
settings["train_epochs"] = 1
settings["best_model_structure"] = "best_model_structure.txt"
cfg = edict(settings)

# create CNN model builder and create CNN model
model = ModelBuilderCVDeNas(cfg).create_model()
# get training and evaluation dataloader
train_dataloader, eval_dataloader = (DataBuilderImageNet(cfg) if cfg.data_set == 'ImageNet' else DataBuilderCIFAR(cfg)).get_dataloader()
# create optimizer
optimizer = utils.create_optimizer(model, cfg)
criterion = utils.create_criterion(cfg)
scheduler = utils.create_scheduler(optimizer, cfg)
metric = utils.create_metric(cfg)
# create CNN trainer
trainer = CVTrainer(cfg, model, train_dataloader, eval_dataloader, optimizer, criterion, scheduler, metric)
# start model training and evaluation
trainer.fit()
```

03/27/2023 01:50:13 - INFO - Trainer -   building model
03/27/2023 01:50:13 - INFO - Trainer -   model created


Files already downloaded and verified
Files already downloaded and verified


03/27/2023 01:50:14 - INFO - Trainer -   Trainer config: {'domain': 'cnn', 'train_epochs': 1, 'eval_epochs': 1, 'input_size': 32, 'best_model_structure': 'best_model_structure.txt', 'num_classes': 10, 'dist_backend': 'gloo', 'train_batch_size': 128, 'eval_batch_size': 128, 'data_path': '~/data/pytorch_cifar10', 'data_set': 'CIFAR10', 'output_dir': './', 'num_workers': 10, 'pin_mem': True, 'eval_metric': 'accuracy', 'learning_rate': 0.01, 'momentum': 0.9, 'weight_decay': 0.0005, 'optimizer': 'SGD', 'criterion': 'CrossEntropyLoss', 'lr_scheduler': 'CosineAnnealingLR', 'print_freq': 10, 'metric_threshold': 94, 'mode': 'train'}


Epoch: [1]  [  0/391]  eta: 0:16:27  lr: 0.010000  loss: 2.3024 (2.3024)  time: 2.5243  data: 2.1748
Epoch: [1]  [ 10/391]  eta: 0:01:47  lr: 0.010000  loss: 2.2757 (2.2825)  time: 0.2826  data: 0.1989
Epoch: [1]  [ 20/391]  eta: 0:01:04  lr: 0.010000  loss: 2.2494 (2.2445)  time: 0.0570  data: 0.0011
Epoch: [1]  [ 30/391]  eta: 0:00:49  lr: 0.010000  loss: 2.1454 (2.1947)  time: 0.0559  data: 0.0011
Epoch: [1]  [ 40/391]  eta: 0:00:40  lr: 0.010000  loss: 2.0676 (2.1526)  time: 0.0554  data: 0.0013
Epoch: [1]  [ 50/391]  eta: 0:00:35  lr: 0.010000  loss: 1.9885 (2.1133)  time: 0.0567  data: 0.0013
Epoch: [1]  [ 60/391]  eta: 0:00:32  lr: 0.010000  loss: 1.9324 (2.0797)  time: 0.0578  data: 0.0013
Epoch: [1]  [ 70/391]  eta: 0:00:29  lr: 0.010000  loss: 1.8920 (2.0458)  time: 0.0559  data: 0.0014
Epoch: [1]  [ 80/391]  eta: 0:00:27  lr: 0.010000  loss: 1.8261 (2.0187)  time: 0.0561  data: 0.0014
Epoch: [1]  [ 90/391]  eta: 0:00:25  lr: 0.010000  loss: 1.8120 (1.9946)  time: 0.0591  dat



Test:  [ 0/79]  eta: 0:03:20  loss: 1.3118 (1.3118)  acc1: 52.3438 (52.3438)  time: 2.5371  data: 2.1760
Test:  [10/79]  eta: 0:00:17  loss: 1.3664 (1.3635)  acc1: 51.5625 (50.7812)  time: 0.2604  data: 0.1996
Test:  [20/79]  eta: 0:00:08  loss: 1.3579 (1.3512)  acc1: 50.0000 (50.3720)  time: 0.0254  data: 0.0013
Test:  [30/79]  eta: 0:00:05  loss: 1.3553 (1.3536)  acc1: 50.0000 (50.3024)  time: 0.0221  data: 0.0047
Test:  [40/79]  eta: 0:00:03  loss: 1.3501 (1.3513)  acc1: 50.7812 (50.7812)  time: 0.0241  data: 0.0068
Test:  [50/79]  eta: 0:00:02  loss: 1.3693 (1.3575)  acc1: 50.0000 (50.6281)  time: 0.0231  data: 0.0052
Test:  [60/79]  eta: 0:00:01  loss: 1.3693 (1.3553)  acc1: 50.7812 (50.8197)  time: 0.0234  data: 0.0057
Test:  [70/79]  eta: 0:00:00  loss: 1.3627 (1.3589)  acc1: 51.5625 (50.7923)  time: 0.0206  data: 0.0038


03/27/2023 01:50:43 - INFO - Trainer -   Evaluate time:4.431807279586792
03/27/2023 01:50:43 - INFO - Trainer -   Epoch 1 training time:29.34877920150757
03/27/2023 01:50:43 - INFO - Trainer -   Total time:29.349584817886353
03/27/2023 01:50:43 - INFO - Trainer -   Trainer complete


Test:  [78/79]  eta: 0:00:00  loss: 1.3352 (1.3558)  acc1: 52.3438 (50.8999)  time: 0.0199  data: 0.0016
Test: Total time: 0:00:04 (0.0560 s / it)
* Acc@1 50.900 loss 1.356


50.89992088607595
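The training progress lines above follow a fixed layout (current loss followed by the running average in parentheses), so tracking loss over a longer run can be scripted. A small parser for those lines is sketched below; the regular expression is written against the lines shown in this notebook and is an assumption that may need adjusting for other versions of the trainer.

```python
import re

# Matches training progress lines such as:
# "Epoch: [1]  [ 90/391]  eta: 0:00:25  lr: 0.010000  loss: 1.8120 (1.9946)  time: 0.0591"
TRAIN_RE = re.compile(
    r"Epoch: \[(\d+)\]\s+\[\s*(\d+)/(\d+)\].*?lr: ([\d.]+)\s+loss: ([\d.]+) \(([\d.]+)\)"
)


def parse_train_line(line: str):
    """Return the fields of a training progress line as a dict, or None if it does not match."""
    m = TRAIN_RE.search(line)
    if m is None:
        return None
    epoch, step, total, lr, loss, avg_loss = m.groups()
    return {"epoch": int(epoch), "step": int(step), "total": int(total),
            "lr": float(lr), "loss": float(loss), "avg_loss": float(avg_loss)}


line = "Epoch: [1]  [ 90/391]  eta: 0:00:25  lr: 0.010000  loss: 1.8120 (1.9946)  time: 0.0591"
print(parse_train_line(line))
```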