[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/intel/e2eAIOK/blob/main/demo/denas/asr/DENAS_ASR_DEMO.ipynb)

# AIOK DE-NAS ASR DEMO

DE-NAS is a multi-model, hardware-aware, train-free neural architecture search (NAS) method that constructs compact model architectures directly for a target platform. It provides a CNN-based search space for the CV domain and a Transformer-based search space for the CV, NLP, and ASR domains, and uses a hardware-aware, train-free scoring method to evaluate candidate architectures without training them.

This demo shows how ASR is integrated with DE-NAS to search for a lighter, faster, higher-performance Transformer-based ASR model in a training-free way.

# Content
* [Overview](#Overview)
    * [DE-NAS on ASR Domain](#DE-NAS-on-ASR-Domain)
    * [Performance](#Performance)
* [Getting Started](#Getting-Started)
    * [1. Environment Setup](#1.-Environment-Setup)
    * [2. Workflow Prepare](#2.-Workflow-Prepare)
    * [3. Data Prepare](#3.-Data-Prepare)
    * [4. Search](#4.-Search)
    * [5. Train](#5.-Train)

# Overview

## DE-NAS on ASR Domain

Recently, the Transformer has achieved remarkable success in several automatic speech recognition tasks. This progress is closely tied to architecture design, so it is worthwhile to use Transformer-based Neural Architecture Search to find better architectures automatically. DE-NAS applies a unified, train-free scoring method based on the synaptic diversity of multi-head self-attention (MSA) and the synaptic saliency of the MLP, which are the basic components of the Transformer.

The Transformer-based search space consists of attention layers, layer normalization, and feed-forward layers. The search space is controlled by four settings: network depth, number of attention heads, MLP ratio, and layer dimension.

<center>
<img src="./img/asr_search_space.png" width="80%"/><figure>DE-NAS ASR Search Space and Supernet</figure>
</center>

## Performance

<img src="./img/denas_asr_perf.png" width="900"/>

* Testing methodology
    * Dataset: LibriSpeech; metric: word error rate (WER), target 5.8%
    * Baseline: RNN-T model
    * Early stop at WER 5.8%
* The DE-NAS ASR searched model delivered a 59.12x training speedup over the stock model (RNN-T).
* Distributed training delivered a 3.81x speedup when scaling hardware from 1 node to 4 nodes.
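
The distributed-training figure can be turned into a scaling efficiency with a line of arithmetic. The snippet below is only an illustration of that calculation; `scaling_efficiency` is a hypothetical helper name, and the only inputs are the 3.81x speedup and the 4 nodes quoted above.

```python
def scaling_efficiency(speedup: float, nodes: int) -> float:
    """Fraction of ideal linear speedup achieved when scaling to `nodes` nodes."""
    return speedup / nodes


# Numbers taken from the performance summary: 3.81x speedup on 4 nodes.
efficiency = scaling_efficiency(3.81, 4)
print(f"Scaling efficiency: {efficiency:.2%}")
```

In other words, going from 1 to 4 nodes retains roughly 95% of the ideal linear speedup.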

# Getting Started

## 1. Environment Setup

### Option 1: Set Up Environment with Pip

In [None]:
%%bash
pip install e2eAIOK-denas --pre
pip install torchsummary joblib

### Option 2: Set Up Environment with Docker

``` bash
# Setup ENV
git clone https://github.com/intel/e2eAIOK.git
cd e2eAIOK
python3 scripts/start_e2eaiok_docker.py -b pytorch112 -w ${host0} ${host1} ${host2} ${host3} --proxy ""
# Enter Docker
sshpass -p docker ssh ${host0} -p 12347
```

## 2. Workflow Prepare

### Search Configuration

```yaml
# conf for transformer based asr
model_type: asr
search_engine: RandomSearchEngine # supported search engines are Random/Evolutionary/SigoptSearchEngine
batch_size: 32
random_max_epochs: 10 #random search max epochs

#evolutionary search engine configs
max_epochs: 10
select_num: 50
population_num: 50
m_prob: 0.2
s_prob: 0.4
crossover_num: 25
mutation_num: 25

#searched model parameter limit
max_param_limits: 40
min_param_limits: 1

supernet_cfg: ../../conf/denas/asr/supernet_large.conf
img_size: 224
seed: 0

#enable/disable NAS scores
expressivity_weight: 0
complexity_weight: 0
diversity_weight: 1
saliency_weight: 1
latency_weight: 0
```
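
This demo does not spell out how the five weights are combined. The following is a minimal sketch, assuming the overall NAS score is a plain weighted sum of per-metric scores and that the parameter limits are expressed in millions of parameters (the search log later in this demo reports values such as `params 11.750592`). `combine_scores` and `within_param_budget` are hypothetical helpers, not e2eAIOK APIs, and the per-metric scores are made-up numbers.

```python
# Weights copied from the search configuration above; 0 disables a score.
WEIGHTS = {
    "expressivity": 0,
    "complexity": 0,
    "diversity": 1,
    "saliency": 1,
    "latency": 0,
}


def combine_scores(scores, weights):
    """Assumed combination rule: weighted sum of the per-metric scores."""
    return sum(weight * scores.get(name, 0.0) for name, weight in weights.items())


def within_param_budget(params_m, min_limit=1, max_limit=40):
    """Check a candidate size (assumed to be in millions) against the limits."""
    return min_limit <= params_m <= max_limit


# Made-up per-metric scores for one candidate, for illustration only.
example_scores = {
    "expressivity": 2.0,
    "complexity": 5.0,
    "diversity": 0.4,
    "saliency": 0.2,
    "latency": 9.0,
}
total = combine_scores(example_scores, WEIGHTS)
print(f"combined score: {total:.3f}, in budget: {within_param_budget(11.750592)}")
```

With the weights shown in the configuration, only the diversity and saliency terms contribute under this assumed rule; the real scoring code may normalize or combine the terms differently.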

### Supernet and Search Space

```yaml
SUPERNET:
  MLP_RATIO: 4.0
  NUM_HEADS: 4
  EMBED_DIM: 512
  DEPTH: 12
SEARCH_SPACE:
  MLP_RATIO:
    - 3.0
    - 3.5
    - 4.0
    - 4.5
    - 5.0
  NUM_HEADS:
    - 2
    - 3
    - 4
  DEPTH:
    - 5
    - 6
    - 7
    - 8
    - 9
    - 10
    - 11
    - 12
  EMBED_DIM:
    - 192
    - 216
    - 240
    - 324
    - 384
    - 444
```
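
To make the search space concrete, the sketch below draws one random candidate from the values listed above. It assumes a candidate is encoded as `(depth, one MLP ratio per layer, one head count per layer, embed_dim)`, a layout inferred from the structure tuples printed in the search log of this demo rather than from the DE-NAS source. `sample_candidate` and `count_candidates` are hypothetical helpers.

```python
import random

# Values copied from the SEARCH_SPACE section of the supernet configuration.
SEARCH_SPACE = {
    "mlp_ratio": [3.0, 3.5, 4.0, 4.5, 5.0],
    "num_heads": [2, 3, 4],
    "depth": [5, 6, 7, 8, 9, 10, 11, 12],
    "embed_dim": [192, 216, 240, 324, 384, 444],
}


def sample_candidate(space, rng):
    """Draw one candidate; per-layer choices are sampled independently."""
    depth = rng.choice(space["depth"])
    mlp_ratios = [rng.choice(space["mlp_ratio"]) for _ in range(depth)]
    num_heads = [rng.choice(space["num_heads"]) for _ in range(depth)]
    embed_dim = rng.choice(space["embed_dim"])
    # Assumed tuple layout: depth, per-layer MLP ratios, per-layer heads, embed dim.
    return (depth, *mlp_ratios, *num_heads, embed_dim)


def count_candidates(space):
    """Number of distinct candidates under the assumed per-layer encoding."""
    per_layer = len(space["mlp_ratio"]) * len(space["num_heads"])
    return len(space["embed_dim"]) * sum(per_layer ** d for d in space["depth"])


candidate = sample_candidate(SEARCH_SPACE, random.Random(0))
print(candidate)
print(f"search space size: {count_candidates(SEARCH_SPACE):.3e}")
```

The size of this space is what makes a train-free score attractive: training every candidate is not feasible, while scoring them without training is.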

### Training Configuration

```yaml
# edit /home/vmagent/app/e2eaiok/conf/denas/asr/e2eaiok_denas_train_asr.conf
train_csv: "/home/vmagent/app/dataset/LibriSpeech/dev-clean.csv"
valid_csv: "/home/vmagent/app/dataset/LibriSpeech/dev-clean.csv"
test_csv: "/home/vmagent/app/dataset/LibriSpeech/dev-clean.csv"
tokenizer_ckpt: "/home/vmagent/app/dataset/LibriSpeech/tokenizer.ckpt"
train_epochs: 1
```
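
Because training fails late if a manifest or the tokenizer is missing, it can help to check the configured paths once the data preparation step is done. The following is a minimal sketch using only the standard library; `missing_paths` is a hypothetical helper, not part of e2eAIOK, and the dictionary simply mirrors the keys shown above.

```python
from pathlib import Path


def missing_paths(cfg, keys=("train_csv", "valid_csv", "test_csv", "tokenizer_ckpt")):
    """Return the config keys whose files do not exist on disk."""
    return [key for key in keys if not Path(cfg[key]).is_file()]


# Paths copied from the training configuration above.
train_cfg = {
    "train_csv": "/home/vmagent/app/dataset/LibriSpeech/dev-clean.csv",
    "valid_csv": "/home/vmagent/app/dataset/LibriSpeech/dev-clean.csv",
    "test_csv": "/home/vmagent/app/dataset/LibriSpeech/dev-clean.csv",
    "tokenizer_ckpt": "/home/vmagent/app/dataset/LibriSpeech/tokenizer.ckpt",
}

missing = missing_paths(train_cfg)
if missing:
    print(f"Missing files for: {', '.join(missing)}")
else:
    print("All configured files are present.")
```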

## 3. Data Prepare

``` bash
# Download Dataset
# Download and unzip dataset from https://www.openslr.org/12 to /home/vmagent/app/dataset/LibriSpeech
# Download tokenizer from https://huggingface.co/speechbrain/asr-transformer-transformerlm-librispeech/blob/main/tokenizer.ckpt to /home/vmagent/app/dataset/LibriSpeech

# Process audio data
cd ${e2eaiok_install_dir}/e2eAIOK/DeNas/asr
conda activate pytorch
bash scripts/preprocess_librispeech.sh
```

## 4. Search

Launch the DE-NAS search on the ASR domain using the configuration in `${e2eAIOK_install_dir}/conf/denas/asr/e2eaiok_denas_asr.conf`. The best searched model structure will be saved to `/home/vmagent/app/e2eaiok/e2eAIOK/DeNas/best_model_structure.txt`.

In [2]:
import yaml
from easydict import EasyDict as edict
from e2eAIOK.DeNas.asr.supernet_asr import TransformerASRSuper
from e2eAIOK.DeNas.search.SearchEngineFactory import SearchEngineFactory

# create common settings
settings = {}
settings["domain"] = "asr"
# load search settings
with open("/home/vmagent/app/e2eaiok/conf/denas/asr/e2eaiok_denas_asr.conf") as f:
    conf = yaml.load(f, Loader=yaml.FullLoader)
settings.update(conf)
settings["max_epochs"] = 1
settings["population_num"] = 1
settings["crossover_num"] = 1
settings["mutation_num"] = 1
params = edict(settings)

# create supernet and search space
super_net = TransformerASRSuper
search_space = {
    "num_heads": params.SEARCH_SPACE.NUM_HEADS,
    "mlp_ratio": params.SEARCH_SPACE.MLP_RATIO,
    "embed_dim": params.SEARCH_SPACE.EMBED_DIM,
    "depth": params.SEARCH_SPACE.DEPTH,
}

# create search engine and launch search
searcher = SearchEngineFactory.create_search_engine(params=params, super_net=super_net, search_space=search_space)
searcher.search()
# get best searched structure
best_structure = searcher.get_best_structures()
print(f"DE-NAS completed, best structure is {best_structure}")

03/23/2023 08:38:34 - INFO - DENAS -   epoch = 0
03/23/2023 08:38:35 - INFO - DENAS -   random 1/1 structure (5, 3.0, 5.0, 4.0, 5.0, 3.0, 3, 3, 3, 4, 2, 216) nas_score 0.6014334317005705 params 11.750592
03/23/2023 08:38:35 - INFO - DENAS -   random_num = 1
03/23/2023 08:38:36 - INFO - DENAS -   mutation 1/1 structure (5, 4.5, 4.5, 4.0, 4.5, 3.0, 3, 3, 3, 4, 2, 216) nas_score 0.29840024128498044 params 11.797356
03/23/2023 08:38:36 - INFO - DENAS -   mutation_num = 1
03/23/2023 08:38:36 - INFO - DENAS -   crossover_num = 0
03/23/2023 08:38:36 - INFO - DENAS -   best structure (5, 3.0, 5.0, 4.0, 5.0, 3.0, 3, 3, 3, 4, 2, 216) nas_score 0.6014334317005705 params 11.750592


DE-NAS completed, best structure is (5, 3.0, 5.0, 4.0, 5.0, 3.0, 3, 3, 3, 4, 2, 216)
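
The structure tuple in the log is easier to read once it is split into named fields. The sketch below assumes the layout `(depth, one MLP ratio per layer, one head count per layer, embed_dim)`, which is inferred from the tuple length and value ranges in the log above, not taken from the DE-NAS source. `decode_structure` is a hypothetical helper.

```python
def decode_structure(structure):
    """Split a structure tuple into named fields under the assumed layout."""
    depth = structure[0]
    if len(structure) != 2 * depth + 2:
        raise ValueError(
            f"expected {2 * depth + 2} entries for depth {depth}, got {len(structure)}"
        )
    return {
        "depth": depth,
        "mlp_ratio": list(structure[1:1 + depth]),
        "num_heads": list(structure[1 + depth:1 + 2 * depth]),
        "embed_dim": structure[-1],
    }


# Best structure copied from the search log above.
best = (5, 3.0, 5.0, 4.0, 5.0, 3.0, 3, 3, 3, 4, 2, 216)
decoded = decode_structure(best)
for name, value in decoded.items():
    print(f"{name}: {value}")
```

Read this way, the best structure is a 5-layer encoder with an embedding dimension of 216, per-layer MLP ratios between 3.0 and 5.0, and 2 to 4 attention heads per layer.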


## 5. Train

Load the best searched model structure from `/home/vmagent/app/e2eaiok/e2eAIOK/DeNas/best_model_structure.txt` and launch training with the training configuration in `${e2eAIOK_install_dir}/conf/denas/asr/e2eaiok_denas_train_asr.conf`.

In [5]:
import yaml
from easydict import EasyDict as edict
import sentencepiece as sp
import torch
from e2eAIOK.DeNas.asr.model_builder_denas_asr import ModelBuilderASRDeNas
from e2eAIOK.common.trainer.data.asr.data_builder_librispeech import DataBuilderLibriSpeech
from e2eAIOK.DeNas.asr.trainer.schedulers import NoamScheduler
from e2eAIOK.DeNas.asr.trainer.losses import ctc_loss, kldiv_loss
from e2eAIOK.DeNas.asr.utils.metric_stats import ErrorRateStats
from e2eAIOK.DeNas.asr.asr_trainer import ASRTrainer

# create common settings
settings = {}
settings["domain"] = "asr"
# load training settings
with open("/home/vmagent/app/e2eaiok/conf/denas/asr/e2eaiok_denas_train_asr.conf") as f:
    conf = yaml.load(f, Loader=yaml.FullLoader)
settings.update(conf)
settings["train_epochs"] = 1
settings["best_model_structure"] = "best_model_structure.txt"
cfg = edict(settings)

# create ASR model builder and create ASR model
model = ModelBuilderASRDeNas(cfg).create_model()
tokenizer = sp.SentencePieceProcessor()
# get training and evaluation dataloader
train_dataloader, eval_dataloader = DataBuilderLibriSpeech(cfg, tokenizer).get_dataloader()
# create optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=cfg["lr_adam"], betas=(0.9, 0.98), eps=1e-9)
criterion = {"ctc_loss": ctc_loss, "seq_loss": kldiv_loss}
scheduler = NoamScheduler(lr_initial=cfg["lr_adam"], n_warmup_steps=cfg["n_warmup_steps"])
metric = ErrorRateStats()
# create ASR trainer
trainer = ASRTrainer(cfg, model, train_dataloader, eval_dataloader, optimizer, criterion, scheduler, metric, tokenizer)
# start model training and evaluation
trainer.fit()

03/23/2023 08:49:03 - INFO - Trainer -   building model
03/23/2023 08:49:03 - INFO - Trainer -   model created
03/23/2023 08:49:03 - INFO - Trainer -   Trainer config: {'domain': 'asr', 'seed': '74443', 'output_folder': 'results/transformer/74443', 'save_folder': 'results/transformer/74443/save', 'device': 'cpu', 'dist_backend': 'gloo', 'mode': 'train', 'best_model_structure': 'best_model_structure.txt', 'data_folder': '/home/vmagent/app/dataset/LibriSpeech', 'skip_prep': False, 'train_csv': '/home/vmagent/app/dataset/LibriSpeech/train-clean-100.csv', 'valid_csv': '/home/vmagent/app/dataset/LibriSpeech/dev-clean.csv', 'test_csv': '/home/vmagent/app/dataset/LibriSpeech/test-clean.csv', 'tokenizer_ckpt': '/home/vmagent/app/dataset/LibriSpeech/tokenizer.ckpt', 'ckpt_interval_minutes': 30, 'train_epochs': 1, 'eval_epochs': 1, 'train_batch_size': 32, 'eval_batch_size': 1, 'num_workers': 1, 'ctc_weight': 0.3, 'grad_accumulation_factor': 1, 'max_grad_norm': 5.0, 'loss_reduction': 'batchmean',