[![open in colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/intel/e2eAIOK/blob/main/demo/denas/bert/DENAS_BERT_DEMO.ipynb)

# AIOK DE-NAS BERT Demo

DE-NAS is a multi-model, hardware-aware, train-free NAS approach that constructs compact model architectures directly for the target platform. DE-NAS provides a CNN-based search space for the CV domain and Transformer-based search spaces for the CV/NLP/ASR domains, and uses a hardware-aware, train-free scoring method to evaluate candidate architectures without training them.

This demo introduces the NLP integration of DE-NAS, which searches for lighter, faster, higher-performance transformer-based NLP models in a training-free way.

# Content

* [Overview](#overview)
    * [DE-NAS on NLP BERT Domain](#DE-NAS-on-NLP-BERT-Domain)
    * [Performance](#Performance)
* [Getting Started](#Getting-Started)
    * [1. Environment Setup](#1-environment-setup)
    * [2. Workflow Prepare](#2-workflow-prepare)
    * [3. Data Prepare](#3-data-prepare)
    * [4. Launch Search](#4-launch-search)
    * [5. Launch Training with Best Searched Model Structure](#5-launch-training-with-best-searched-model-structure)

# Overview 

## DE-NAS on NLP BERT Domain

### DE-NAS on BERT Search Space and Supernet
The Transformer-based search space consists of the number of transformer layers, the number of attention heads, the size of query/key/value, the size of the MLP, and the dimension of the embeddings. The supernet of DE-NAS on BERT is a BERT-based structure. Both are shown in the figure below.

<center>
<img src="./img/NLP_Search_Space.png" width="800"/><figure>DE-NAS on BERT search space</figure>
</center>

### DE-NAS Searched BERT Architecture
By running the train-free evolutionary (EA) search engine on the DE-NAS BERT search space and supernet, DE-NAS found an architecture that is more compact than the BERT-Base model, as shown in the figure below:

<center>
<img src="./img/DENAS BERT Architecture.png" width="500"/><figure>DE-NAS Searched BERT Architecture</figure>
</center>

## Performance

<center>
<img src="./img/Performance.png" width="500"/><figure>DE-NAS BERT Performance</figure>
</center>

As shown in the figure above, the model found by DE-NAS BERT delivered a total training speedup of 7.68x over the stock BERT model, with no more than 5% F1 score regression.
* DE-NAS model optimization delivered a 2.30x training speedup, which comes from the more compact model (1.61x fewer parameters and 1.51x fewer FLOPs).
* Hardware scaling from 1 node to 4 nodes delivered a 3.34x training speedup.
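The two factors are independent, so they compose multiplicatively into the 7.68x total. A quick arithmetic check (the 4-node scaling efficiency is derived here for illustration; it is not reported in the figure):

```python
# Speedup factors quoted in the performance section
model_speedup = 2.30    # compact DE-NAS model vs. stock BERT on the same hardware
scaling_speedup = 3.34  # scaling from 1 node to 4 nodes
nodes = 4

# Independent speedups multiply
total_speedup = model_speedup * scaling_speedup
# Fraction of ideal linear scaling achieved on 4 nodes (derived, not reported)
scaling_efficiency = scaling_speedup / nodes

print(f"total speedup: {total_speedup:.2f}x")                   # total speedup: 7.68x
print(f"4-node scaling efficiency: {scaling_efficiency:.1%}")   # 4-node scaling efficiency: 83.5%
```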

# Getting Started
Note: you need to download the dataset and the pretrained model manually to run this demo.

## 1. Environment Setup

### (Option 1) Use pip install

```shell
pip install e2eAIOK-denas --pre
```

### (Option 2) Use Docker

Step 1. Prepare the code

``` shell
# clone the e2eAIOK repo and its submodules
git clone https://github.com/intel/e2eAIOK.git
cd e2eAIOK
git submodule update --init --recursive
```

Step 2. Build the docker image

```shell
python3 scripts/start_e2eaiok_docker.py -b pytorch112 -w ${host0} ${host1} ${host2} ${host3} --proxy ""
```

Step 3. Run the docker container and start the conda env

``` shell
sshpass -p docker ssh ${host0} -p 12347
```

## 2. Workflow Prepare

* Conf for BERT DE-NAS Search

```yaml
# conf for bert
model_type: bert
search_engine: EvolutionarySearchEngine # supported search engines: Random/Evolutionary/SigoptSearchEngine
batch_size: 32
supernet_cfg: ../../conf/denas/nlp/supernet-bert-base.yaml
pretrained_bert: /home/vmagent/app/dataset/bert-base-uncased
pretrained_bert_config: /home/vmagent/app/dataset/bert-base-uncased/config.json

# conf for evolutionary search engine
random_max_epochs: 1000 #random search max epochs
max_epochs: 10 #search epoch
select_num: 50
population_num: 50
m_prob: 0.2
s_prob: 0.4
crossover_num: 25
mutation_num: 25
img_size: 128
max_param_limits: 110
min_param_limits: 55
seed: 0

# enable/disable each DE-Score
expressivity_weight: 0
complexity_weight: 0
diversity_weight: 0.00001
saliency_weight: 1
latency_weight: 0.01
```

The above yaml-format file shows the BERT configuration for the DE-NAS search and is located at `/home/vmagent/app/e2eaiok/conf/denas/nlp/e2eaiok_denas_bert.conf`. It determines the type of search engine, the search hyperparameters (e.g., batch_size, select_num and population_num), the DE-Score parameters (e.g., expressivity score weight and latency weight) and the supernet/search space configuration (e.g., supernet_cfg).
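The five `*_weight` keys switch individual DE-Score terms on or off and scale them. As a minimal sketch, assuming the terms are combined linearly and latency acts as a penalty (the exact formula used by e2eAIOK is not shown in this demo and may differ), one candidate's score could be computed like this. The `combined_score` function and the component values are hypothetical placeholders, not real DE-Score outputs:

```python
# DE-Score weights copied from the search config above
WEIGHTS = {
    "expressivity": 0.0,
    "complexity": 0.0,
    "diversity": 0.00001,
    "saliency": 1.0,
    "latency": 0.01,
}


def combined_score(components: dict, weights: dict = WEIGHTS) -> float:
    """Hypothetical linear DE-Score: reward terms are added, latency is subtracted.

    `components` maps each term name to a raw, train-free measurement of one
    candidate architecture. This combination rule is an assumption for
    illustration, not e2eAIOK's implementation.
    """
    reward = sum(weights[name] * components[name]
                 for name in ("expressivity", "complexity", "diversity", "saliency"))
    return reward - weights["latency"] * components["latency"]


# Made-up measurements for a single candidate
candidate = {"expressivity": 35.0, "complexity": 12.0, "diversity": 900.0,
             "saliency": 4.2, "latency": 18.5}
print(combined_score(candidate))
```

Under this assumed form, setting `expressivity_weight` and `complexity_weight` to 0 drops those terms entirely, so the configuration above ranks candidates mostly by saliency, with small diversity and latency adjustments.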

* Conf for BERT Supernet and Search Space

```yaml
# BERT supernet definition
SUPERNET:
    LAYER_NUM: 12
    NUM_ATTENTION_HEADS: 12
    HIDDEN_SIZE: 768
    INTERMEDIATE_SIZE: 3072
    QKV_SIZE: 768

# BERT search space definition
SEARCH_SPACE:
    LAYER_NUM:
        bounds:
            min: 4
            max: 12
            step: 1
        type: int
    HIDDEN_SIZE:
        bounds:
            min: 128
            max: 768
            step: 16
        type: int
    QKV_SIZE:
        bounds:
            min: 180
            max: 768
            step: 12
        type: int
    HEAD_NUM:
        bounds:
            min: 8
            max: 12
            step: 1
        type: int
    INTERMEDIATE_SIZE:
        bounds:
            min: 128
            max: 3072
            step: 32
        type: int
```

The above yaml-format file describes the BERT-base supernet and the search space configuration, which is also located at `/home/vmagent/app/e2eaiok/conf/denas/nlp/e2eaiok_denas_bert.conf`. The `SUPERNET` section fixes the "LAYER_NUM", "NUM_ATTENTION_HEADS", "HIDDEN_SIZE", "INTERMEDIATE_SIZE" and "QKV_SIZE" of the BERT-base supernet, and the `SEARCH_SPACE` section gives the range (min, max, step) of each architecture parameter explored during the DE-NAS search.
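To get a feel for how large this search space is, the bounds can be expanded into explicit choice lists. This sketch assumes `max` is inclusive and that every dimension is sampled independently; it ignores any validity constraints DE-NAS may apply (for example the parameter limits in the search config), so the product is an upper bound on the number of distinct candidates:

```python
import math
import random

# SEARCH_SPACE bounds copied from the yaml above: (min, max, step)
SEARCH_SPACE = {
    "LAYER_NUM": (4, 12, 1),
    "HIDDEN_SIZE": (128, 768, 16),
    "QKV_SIZE": (180, 768, 12),
    "HEAD_NUM": (8, 12, 1),
    "INTERMEDIATE_SIZE": (128, 3072, 32),
}


def choices(bounds):
    """Expand inclusive (min, max, step) bounds into the list of allowed values."""
    lo, hi, step = bounds
    return list(range(lo, hi + 1, step))


space = {name: choices(b) for name, b in SEARCH_SPACE.items()}

# Size of the unconstrained Cartesian product of all choices
num_candidates = math.prod(len(values) for values in space.values())


def sample(rng: random.Random) -> dict:
    """Draw one candidate, picking each dimension uniformly and independently."""
    return {name: rng.choice(values) for name, values in space.items()}


for name, values in space.items():
    print(f"{name}: {len(values)} choices, {values[0]}..{values[-1]}")
print(f"total combinations: {num_candidates:,}")  # total combinations: 8,579,250
print(sample(random.Random(0)))
```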

* Conf for BERT DE-NAS Train

```yaml
# model configuration
domain: bert
best_model_structure: /home/vmagent/app/e2eaiok/e2eAIOK/DeNas/best_model_structure.txt
model: /home/vmagent/app/dataset/bert-base-uncased/ #pretrained-model config dir
model_dir: /home/vmagent/app/dataset/bert-base-uncased/ #pretrained-model weight dir

# task/data configuration
task_name: squad1
data_set: SQuADv1.1
num_train_examples: 87599
data_dir: /home/vmagent/app/dataset/SQuAD/
output_dir: /home/vmagent/app/e2eaiok/e2eAIOK/DeNas/nlp/
eval_metric: "qa_f1"
do_lower_case: True
version_2_with_negative: 0
null_score_diff_threshold: 0.0
num_labels: 2

# training hyper-parameters
dist_backend: gloo
gradient_accumulation_steps: 1
warmup_proportion: 0.1
learning_rate: 0.00006
weight_decay: 0.01
train_epochs: 2
max_seq_length: 384
doc_stride: 128
train_batch_size: 32
eval_batch_size: 8
eval_step: 500
n_best_size: 20
max_answer_length: 30
max_query_length: 64
criterion: "CrossEntropyQALoss"
optimizer: "BertAdam"
lr_scheduler: "warmup_linear"
num_workers: 1
pin_mem: True
verbose_logging: False
no_cuda: True
```

The above yaml-format file is used in the DE-NAS training process and is located at `/home/vmagent/app/e2eaiok/conf/denas/nlp/e2eaiok_denas_train_bert.conf`. It describes the dataset/task settings (e.g., task_name and data_dir), the model settings (e.g., model and model_dir) and the training hyperparameters (e.g., learning_rate and train_epochs).
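The `learning_rate`, `warmup_proportion` and `lr_scheduler: "warmup_linear"` keys describe a warmup-then-decay schedule. The sketch below assumes the common BERT-style shape (linear ramp over the first 10% of training, then linear decay to zero); the demo's `bert_create_scheduler` may differ in detail. The step count is a rough estimate from `num_train_examples`: SQuAD preprocessing with `doc_stride` usually splits long contexts into several features, so the real number of steps is higher.

```python
import math

# Values copied from the training config above
LEARNING_RATE = 6e-5
WARMUP_PROPORTION = 0.1
NUM_TRAIN_EXAMPLES = 87599
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 1
TRAIN_EPOCHS = 2


def warmup_linear(progress: float, warmup: float = WARMUP_PROPORTION) -> float:
    """LR multiplier for training progress in [0, 1] (assumed BERT-style shape)."""
    if progress < warmup:
        return progress / warmup                          # linear ramp up
    return max((1.0 - progress) / (1.0 - warmup), 0.0)  # linear decay to 0


# Rough optimizer-step estimate (ignores doc_stride windowing)
steps_per_epoch = math.ceil(NUM_TRAIN_EXAMPLES / TRAIN_BATCH_SIZE) // GRAD_ACCUM_STEPS
total_steps = steps_per_epoch * TRAIN_EPOCHS
warmup_steps = int(total_steps * WARMUP_PROPORTION)

for step in (0, warmup_steps, total_steps // 2, total_steps):
    lr = LEARNING_RATE * warmup_linear(step / total_steps)
    print(f"step {step:>5}: lr = {lr:.2e}")
```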

* Download pre-trained model from Hugging Face
    * Download and extract one of the BERT-Base-Uncased pretrained models from the [Hugging Face repository](https://huggingface.co/bert-base-uncased/tree/main) to `/home/vmagent/app/dataset/bert-base-uncased/`
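Before launching, it can help to confirm that the download landed where the configs expect it. A minimal sketch, assuming the directory should contain `config.json` (referenced by `pretrained_bert_config`), `vocab.txt` (the tokenizer vocabulary) and a `pytorch_model.bin` weights file; the exact file set e2eAIOK requires is an assumption here, and `missing_files` is a hypothetical helper:

```python
from pathlib import Path

# Hypothetical list of files expected in the pretrained-model directory
EXPECTED_FILES = ("config.json", "vocab.txt", "pytorch_model.bin")


def missing_files(model_dir: str, expected=EXPECTED_FILES) -> list:
    """Return the expected file names that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


missing = missing_files("/home/vmagent/app/dataset/bert-base-uncased")
print("all files present" if not missing else f"missing: {missing}")
```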

## 3. Data Prepare

* Prepare Dataset
    * Download Dataset: the Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage (unanswerable questions were only added in SQuAD 2.0). SQuAD 1.1 contains 100,000+ question-answer pairs on 500+ articles.
    * Download the files below to `/home/vmagent/app/dataset/SQuAD`
        * Train Data: [train-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json)
        * Test Data: [dev-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json)

Data format (one flattened record per question):

```json
{
    "answers": {
        "answer_start": [1],
        "text": ["This is a test text"]
    },
    "context": "This is a test context.",
    "id": "1",
    "question": "Is this a test?",
    "title": "train test"
}
```
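The record above is a flattened view. The downloaded `train-v1.1.json` and `dev-v1.1.json` files nest questions under articles and paragraphs (`data` → `paragraphs` → `qas`). A minimal sketch for inspecting the raw files, flattening them into records shaped like the example; the helper names are hypothetical and this is not part of the e2eAIOK pipeline:

```python
import json


def flatten_squad_v1(raw: dict):
    """Yield one flat record per question from a SQuAD v1.1 JSON document."""
    for article in raw["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield {
                    "answers": {
                        "answer_start": [a["answer_start"] for a in qa["answers"]],
                        "text": [a["text"] for a in qa["answers"]],
                    },
                    "context": paragraph["context"],
                    "id": qa["id"],
                    "question": qa["question"],
                    "title": article["title"],
                }


def load_squad_v1(path: str) -> list:
    """Read a raw SQuAD v1.1 file and return the flattened records."""
    with open(path, encoding="utf-8") as f:
        return list(flatten_squad_v1(json.load(f)))


# Tiny in-memory example in the raw SQuAD v1.1 layout
raw = {"version": "1.1", "data": [{"title": "train test", "paragraphs": [{
    "context": "This is a test context.",
    "qas": [{"id": "1", "question": "Is this a test?",
             "answers": [{"answer_start": 10, "text": "test"}]}]}]}]}
records = list(flatten_squad_v1(raw))
print(records[0])
```

Applied to `/home/vmagent/app/dataset/SQuAD/train-v1.1.json`, `load_squad_v1` should return one record per question, 87,599 for the training split, matching `num_train_examples` in the training config.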

## 4. Launch Search

Launch the DE-NAS search process for BERT in the NLP domain, using the overall search configuration `/home/vmagent/app/e2eaiok/conf/denas/nlp/e2eaiok_denas_bert.conf` as input. The search writes the best model structure as a tuple `(layer_num, head_num, qkv_size, hidden_size, intermediate_size)` to the `best_model_structure.txt` file.
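`searcher.search()` runs the engine selected by `search_engine` in the config. For intuition only, the toy loop below sketches a generic evolutionary search wired to the configured `population_num`, `select_num`, `mutation_num`, `crossover_num`, `m_prob`, `max_epochs` and parameter limits. Several pieces are assumptions rather than e2eAIOK's implementation: `m_prob` is treated as a per-dimension mutation probability, `s_prob` and `random_max_epochs` are not modeled, the parameter limits are assumed to be in millions, `params_m` is a rough estimate, and `toy_score` is a hypothetical placeholder for the DE-Score.

```python
import random

# Choices expanded from SEARCH_SPACE (inclusive min..max by step)
SPACE = {
    "LAYER_NUM": list(range(4, 13, 1)),
    "HEAD_NUM": list(range(8, 13, 1)),
    "QKV_SIZE": list(range(180, 769, 12)),
    "HIDDEN_SIZE": list(range(128, 769, 16)),
    "INTERMEDIATE_SIZE": list(range(128, 3073, 32)),
}
# EA settings copied from e2eaiok_denas_bert.conf (s_prob is not modeled here)
CFG = dict(max_epochs=10, select_num=50, population_num=50, m_prob=0.2,
           crossover_num=25, mutation_num=25,
           min_param_limits=55, max_param_limits=110)


def params_m(c):
    """Rough parameter count in millions; ignores biases, LayerNorm and HEAD_NUM."""
    h, q = c["HIDDEN_SIZE"], c["QKV_SIZE"]
    embeddings = (30522 + 512 + 2) * h                      # token + position + segment
    per_layer = 4 * h * q + 2 * h * c["INTERMEDIATE_SIZE"]  # Q/K/V/out + MLP
    return (embeddings + c["LAYER_NUM"] * per_layer + h * h) / 1e6  # + pooler


def valid(c):
    return CFG["min_param_limits"] <= params_m(c) <= CFG["max_param_limits"]


def toy_score(c):
    # Hypothetical stand-in for the train-free DE-Score: prefer ~80M parameters
    return -abs(params_m(c) - 80.0)


def random_candidate(rng):
    while True:  # rejection-sample until the parameter limits are satisfied
        c = {k: rng.choice(v) for k, v in SPACE.items()}
        if valid(c):
            return c


def mutate(parent, rng):
    # Resample each dimension independently with probability m_prob
    return {k: rng.choice(SPACE[k]) if rng.random() < CFG["m_prob"] else v
            for k, v in parent.items()}


def crossover(a, b, rng):
    # Uniform crossover: take each dimension from either parent
    return {k: rng.choice((a[k], b[k])) for k in SPACE}


def evolve(seed=0):
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(CFG["population_num"])]
    for _ in range(CFG["max_epochs"]):
        parents = sorted(population, key=toy_score, reverse=True)[:CFG["select_num"]]
        children = []
        while len(children) < CFG["mutation_num"]:
            child = mutate(rng.choice(parents), rng)
            if valid(child):
                children.append(child)
        while len(children) < CFG["mutation_num"] + CFG["crossover_num"]:
            child = crossover(*rng.sample(parents, 2), rng)
            if valid(child):
                children.append(child)
        # Survivor selection: keep the best population_num of parents + children
        population = sorted(parents + children, key=toy_score,
                            reverse=True)[:CFG["population_num"]]
    return max(population, key=toy_score)


best = evolve(seed=0)
print(best, f"~{params_m(best):.1f}M params")
```

Keeping parents alongside children during survivor selection means the best candidate found so far is never lost between generations; the real engine may fill the population differently (for example with fresh random samples).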

```python
from e2eAIOK.DeNas.search.utils import parse_config
from e2eAIOK.DeNas.nlp.supernet_bert import SuperBertModel, BertConfig
from e2eAIOK.DeNas.nlp.utils import generate_search_space
from e2eAIOK.DeNas.search.SearchEngineFactory import SearchEngineFactory

# parse DE-NAS search configure
params = parse_config('/home/vmagent/app/e2eaiok/conf/denas/nlp/e2eaiok_denas_bert.conf')

# construct supernet and search space
config = BertConfig.from_json_file(params.pretrained_bert_config)
super_net = SuperBertModel.from_pretrained(params.pretrained_bert, config)
search_space = generate_search_space(params["SEARCH_SPACE"])

# create DE-NAS searcher
searcher = SearchEngineFactory.create_search_engine(params = params, super_net = super_net, search_space = search_space)

# trigger the search process
searcher.search()
best_structure = searcher.get_best_structures()
print(f"DE-NAS completed, best structure is {best_structure}")
```

```text
loading archive file /home/vmagent/app/dataset/bert-base-uncased
12/01/2022 13:43:12 - INFO - nlp.super
```

## 5. Launch Training with Best Searched Model Structure

Launch the DE-NAS training process for BERT in the NLP domain, using the overall training configuration `/home/vmagent/app/e2eaiok/conf/denas/nlp/e2eaiok_denas_train_bert.conf` as input. It fine-tunes the best searched model structure and reports its performance on the SQuADv1.1 task.
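The training config's `best_model_structure` key points at the file written by the search step. If that file holds the tuple as a Python literal (an assumption; check the actual file contents before relying on this), it can be read back and labeled with the field order documented for the search output. `parse_structure` and `load_structure` are hypothetical helpers, and the example tuple is made up, not a DE-NAS result:

```python
import ast
from pathlib import Path

# Field order as documented for the search output
FIELDS = ("layer_num", "head_num", "qkv_size", "hidden_size", "intermediate_size")


def parse_structure(text: str) -> dict:
    """Parse a '(layer_num, head_num, qkv_size, hidden_size, intermediate_size)' literal."""
    values = ast.literal_eval(text.strip())
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


def load_structure(path: str) -> dict:
    """Read and parse best_model_structure.txt (file format assumed, see above)."""
    return parse_structure(Path(path).read_text())


# Made-up structure for illustration only
print(parse_structure("(8, 12, 540, 640, 1984)"))
```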

Note: the performance result below uses a sample dataset and a small number of iterations to demonstrate functionality. For actual performance results, please refer to the [performance section](#performance).

```python
from e2eAIOK.DeNas.nlp.model_builder_denas_nlp import ModelBuilderNLPDeNas
from e2eAIOK.common.trainer.data.nlp.data_builder_squad import DataBuilderSQuAD
from e2eAIOK.DeNas.nlp.utils import bert_create_optimizer, bert_create_criterion, bert_create_scheduler, bert_create_metric
from e2eAIOK.DeNas.nlp.bert_trainer import BERTTrainer
from e2eAIOK.DeNas.search.utils import parse_config


# parse DE-NAS train configure
cfg = parse_config("/home/vmagent/app/e2eaiok/conf/denas/nlp/e2eaiok_denas_train_bert.conf")

# construct model, dataloader, optimizer, criterion, scheduler and metric
model = ModelBuilderNLPDeNas(cfg).create_model()
train_dataloader, eval_dataloader, other_data = DataBuilderSQuAD(cfg).get_dataloader()
cfg.num_train_steps = len(train_dataloader)
optimizer = bert_create_optimizer(model, cfg)
criterion = bert_create_criterion(cfg)
scheduler = bert_create_scheduler(cfg)
metric = bert_create_metric(cfg)

# create DE-NAS trainer
trainer = BERTTrainer(cfg, model, train_dataloader, eval_dataloader, other_data, optimizer, criterion, scheduler, metric)

# trigger the training process
trainer.fit()
```

```text
2022-12-01 13:52:41,414 - __main__ - INFO - MASTER_ADDR=127.0.0.1
2022-12-01 13:52:41,414 - __main__ - INFO - MASTER_PORT=29500
2022-12-01 13:52:41,415 - __main__ - INFO - I_MPI_PIN_DOMAIN=[0xffffffffffff0,]
2022-12-01 13:52:41,415 - __main__ - INFO - OMP_NUM_THREADS=48
2022-12-01 13:52:41,415 - __main__ - INFO - Using Intel OpenMP
2022-12-01 13:52:41,416 - __main__ - INFO - KMP_AFFINITY=granularity=fine,compact,1,0
2022-12-01 13:52:41,416 - __main__ - INFO - KMP_BLOCKTIME=1
2022-12-01 13:52:41,416 - __main__ - INFO - LD_PRELOAD=/opt/intel/oneapi/intelpython/latest/lib/libiomp5.so
2022-12-01 13:52:41,416 - __main__ - INFO - CCL_WORKER_COUNT=4
2022-12-01 13:52:41,416 - __main__ - INFO - CCL_WORKER_AFFINITY=0,1,2,3
2022-12-01 13:52:41,416 - __main__ - INFO - mpiexec.hydra -l -np 1 -ppn 1 -genv I_MPI_PIN_DOMAIN=[0xffffffffffff0,] -genv OMP_NUM_THREADS=48 /opt/intel/oneapi/intelpython/latest/envs/pytorch-1.12.0/bin/python -u train.py --domain bert --conf /home/vmagent/app/e2eaiok/conf/dena

[0] 12/01/2022 13:52:44 - INFO - e2eAIOK.common.trainer.data.data_builder_squad -   load 1027 examples!
[0] 12/01/2022 13:52:46 - INFO - e2eAIOK.DeNas.module.nlp.tokenization -   loading vocabulary file
[0] 12/01/2022 13:52:47 - INFO - e2eAIOK.common.trainer.data.data_builder_squad -   load 1680 examples!
[0] 12/01/2022 13:52:49 - INFO - Trainer -   Trainer config: {'domain': 'bert', 'task_name': 'squad1', 'data_set': 'SQuADv1.1', 'num_train_examples': 87599, 'best_model_structure': '/home/vmagent/app/e2eaiok/e2eAIOK/DeNas/best_model_structure.txt', 'model': '/home/vmagent/app/dataset/bert-base-uncased/', 'model_dir': '/home/vmagent/app/dataset/bert-base-uncased/', 'data_dir': '/home/vmagent/app/dataset/SQuAD/', 'output_dir': '/home/vmagent/app/e2eaiok/e2eAIOK/DeNas/nlp/', 'dist_backend': 'gloo', 'gradient_accumulation_steps': 1, 'warmup_proportion': 0.1, 'learning_rate': 3e-05, 'weight_decay': 0.0001, 'train_epochs': 4, 'max_seq_length': 384, 'doc_stride': 128, 'train_batch_size': 12,
```