[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/intel/e2eAIOK/blob/main/demo/denas/asr/DENAS_ASR_DEMO.ipynb)

# AIOK DE-NAS ASR DEMO

DE-NAS is a multi-model, hardware-aware, train-free neural architecture search (NAS) method that constructs compact model architectures directly for a target platform. It includes a CNN-based search space for the CV domain and a Transformer-based search space for the CV, NLP, and ASR domains, and it uses a hardware-aware, train-free scoring method to evaluate candidate architectures without training them.

This demo shows how ASR is integrated with DE-NAS to search for a lighter, faster, higher-performance Transformer-based ASR model without training.

# Content

* [DE-NAS on ASR Domain](#DE-NAS-on-ASR-Domain)
* [Getting Started](#Getting-Started)
    * [Environment Setup](#Environment-Setup)
    * [Workflow Prepare](#Workflow-Prepare)
    * [Configuration](#Configuration)
    * [Launch Search](#Launch-Search)
    * [Launch Training with Best Searched Model Structure](#Launch-Training-with-Best-Searched-Model-Structure)

# DE-NAS on ASR Domain

Transformers have recently achieved remarkable success in several automatic speech recognition (ASR) tasks. This progress depends heavily on architecture design, so it is worthwhile to use Transformer-based neural architecture search to find better architectures automatically. We use a unified, effective method that measures the synaptic diversity of multi-head self-attention (MSA) and the synaptic saliency of the MLP, which are the basic components of a Transformer.

The Transformer-based search space consists of attention layers, layer normalization, and feed-forward layers. It is controlled by four settings: network depth, number of attention heads, MLP ratio, and layer dimension.

<center>
<img src="./img/asr_search_space.png" width="80%"/><figure>DE-NAS ASR Search Space and Supernet</figure>
</center>
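
As a rough illustration of what a candidate in this search space looks like, the sketch below draws one at random. The choice lists mirror the `SEARCH_SPACE` block shown under Configuration. The flat tuple layout `(depth, per-layer MLP ratios, per-layer head counts, embedding dimension)` is an assumption inferred from the structures printed in the search log, and `sample_candidate` and `count_candidates` are hypothetical helper names, not part of the e2eAIOK API.

```python
import random

# Choices mirror the SEARCH_SPACE block shown under Configuration.
SEARCH_SPACE = {
    "MLP_RATIO": [3.0, 3.5, 4.0, 4.5, 5.0],
    "NUM_HEADS": [2, 3, 4],
    "DEPTH": [5, 6, 7, 8, 9, 10, 11, 12],
    "EMBED_DIM": [192, 216, 240, 324, 384, 444],
}


def sample_candidate(space, rng):
    """Draw one candidate: a depth, per-layer choices, and one embedding size."""
    depth = rng.choice(space["DEPTH"])
    mlp_ratios = [rng.choice(space["MLP_RATIO"]) for _ in range(depth)]
    num_heads = [rng.choice(space["NUM_HEADS"]) for _ in range(depth)]
    embed_dim = rng.choice(space["EMBED_DIM"])
    # Assumed layout, inferred from the search log:
    # (depth, *mlp_ratios, *num_heads, embed_dim)
    return (depth, *mlp_ratios, *num_heads, embed_dim)


def count_candidates(space):
    """Count distinct candidates under the per-layer sampling assumed above."""
    per_layer = len(space["MLP_RATIO"]) * len(space["NUM_HEADS"])
    return len(space["EMBED_DIM"]) * sum(per_layer ** d for d in space["DEPTH"])


rng = random.Random(0)  # fixed seed so the draw is repeatable
print(sample_candidate(SEARCH_SPACE, rng))
print(count_candidates(SEARCH_SPACE))
```

Under these assumptions a candidate of depth `d` has `2 * d + 2` entries, and the size of the space grows exponentially with depth, which is why a train-free score is attractive for ranking candidates.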

# Getting Started

## Environment Setup

### Option 1 Setup Environment with Docker

``` bash
# Setup ENV
git clone https://github.com/intel/e2eAIOK.git
cd e2eAIOK
python3 scripts/start_e2eaiok_docker.py -b pytorch112 -w ${host0} ${host1} ${host2} ${host3} --proxy ""
# Enter Docker
sshpass -p docker ssh ${host0} -p 12347
```

### Option 2 Setup Environment with Pip

In [1]:
!pip install e2eAIOK-denas --pre

Collecting e2eAIOK-denas
  Using cached e2eAIOK_denas-1.0.1b2023031303-py3-none-any.whl (258 kB)
Collecting sentencepiece
  Using cached sentencepiece-0.1.97-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting easydict
  Using cached easydict-1.10-py3-none-any.whl
Collecting scikit-image
  Using cached scikit_image-0.20.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.2 MB)
Collecting opencv-python
  Using cached opencv_python-4.7.0.72-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (61.8 MB)
Collecting transformers
  Using cached transformers-4.27.1-py3-none-any.whl (6.7 MB)
Collecting sigopt
  Using cached sigopt-8.7.0-py2.py3-none-any.whl (211 kB)
Collecting tensorboard
  Using cached tensorboard-2.12.0-py3-none-any.whl (5.6 MB)
Collecting torchaudio~=0.12.0
  Using cached torchaudio-0.12.1-cp39-cp39-manylinux1_x86_64.whl (3.7 MB)
Collecting thop
  Using cached thop-0.1.1.post2209072238-py3-none-any.whl (15 kB)
Collecting timm
  Using 

  Attempting uninstall: PyYAML
    Found existing installation: PyYAML 6.0
    Uninstalling PyYAML-6.0:
      Successfully uninstalled PyYAML-6.0
Successfully installed GitPython-3.1.31 PyWavelets-1.4.1 PyYAML-5.4.1 absl-py-1.4.0 backoff-1.11.1 boto3-1.26.92 botocore-1.29.92 cachetools-5.3.0 charset-normalizer-3.1.0 click-8.1.3 e2eAIOK-denas-1.0.1b2023031303 easydict-1.10 filelock-3.10.0 gitdb-4.0.10 google-auth-2.16.2 google-auth-oauthlib-0.4.6 grpcio-1.53.0rc2 huggingface-hub-0.13.2 imageio-2.26.0 jmespath-1.0.1 lazy_loader-0.1 markdown-3.4.1 networkx-3.0 numpy-1.24.2 oauthlib-3.2.2 opencv-python-4.7.0.72 pillow-9.4.0 protobuf-4.22.1 ptflops-0.6.9 pyasn1-0.5.0rc2 pyasn1-modules-0.3.0rc1 pypng-0.20220715.0 regex-2022.10.31 requests-2.28.2 requests-oauthlib-1.3.1 rsa-4.9 s3transfer-0.6.0 safetensors-0.3.0 scikit-image-0.20.0 scipy-1.9.1 sentencepiece-0.1.97 sigopt-8.7.0 smmap-5.0.0 tensorboard-2.12.0 tensorboard-data-server-0.7.0 tensorboard-plugin-wit-1.8.1 thop-0.1.1.post2209072238 t

## Workflow Prepare

``` bash
# Download Dataset
# Download and unzip dataset from https://www.openslr.org/12 to /home/vmagent/app/dataset/LibriSpeech
# Download tokenizer from https://huggingface.co/speechbrain/asr-transformer-transformerlm-librispeech/blob/main/tokenizer.ckpt to /home/vmagent/app/dataset/LibriSpeech

# Process audio data
cd ${e2eAIOK_install_dir}/e2eAIOK/DeNas/asr
conda activate pytorch
bash scripts/preprocess_librispeech.sh
```
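
Before running the preprocessing script, it can help to confirm that the downloads landed where the comments above say they should. The following is a minimal pre-flight sketch, not part of e2eAIOK: `missing_inputs` is a hypothetical helper, and it only checks that the dataset folder and `tokenizer.ckpt` exist, not that their contents are valid.

```python
from pathlib import Path


def missing_inputs(data_folder, required=("tokenizer.ckpt",)):
    """Return the expected paths under data_folder that do not exist yet."""
    root = Path(data_folder)
    expected = [root] + [root / name for name in required]
    return [str(path) for path in expected if not path.exists()]


if __name__ == "__main__":
    # Dataset location taken from the download comments above.
    missing = missing_inputs("/home/vmagent/app/dataset/LibriSpeech")
    if missing:
        print("Missing before preprocessing:", ", ".join(missing))
    else:
        print("Dataset folder and tokenizer found.")
```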

## Configuration

* DE-NAS ASR search configuration

```yaml
# conf for transformer based asr
model_type: asr
search_engine: RandomSearchEngine #supported search engine are Random/Evolutionary/SigoptSearchEngine
batch_size: 32
random_max_epochs: 10 #random search max epochs

#evolutionary search engine configs
max_epochs: 10
select_num: 50
population_num: 50
m_prob: 0.2
s_prob: 0.4
crossover_num: 25
mutation_num: 25

#searched model parameter limit
max_param_limits: 40
min_param_limits: 1

supernet_cfg: ../../conf/denas/asr/supernet_large.conf
img_size: 224
seed: 0

#enable/disable NAS scores
expressivity_weight: 0
complexity_weight: 0
diversity_weight: 1
saliency_weight: 1
latency_weight: 0
```

* DE-NAS ASR supernet and search space

```yaml
SUPERNET:
  MLP_RATIO: 4.0
  NUM_HEADS: 4
  EMBED_DIM: 512
  DEPTH: 12
SEARCH_SPACE:
  MLP_RATIO:
    - 3.0
    - 3.5
    - 4.0
    - 4.5
    - 5.0
  NUM_HEADS:
    - 2
    - 3
    - 4
  DEPTH:
    - 5
    - 6
    - 7
    - 8
    - 9
    - 10
    - 11
    - 12
  EMBED_DIM:
    - 192
    - 216
    - 240
    - 324
    - 384
    - 444
```

* DE-NAS ASR training configuration

```yaml
#edit /home/vmagent/app/e2eaiok/conf/denas/asr/e2eaiok_denas_train_asr.conf
train_csv: "/home/vmagent/app/dataset/LibriSpeech/dev-clean.csv"
valid_csv: "/home/vmagent/app/dataset/LibriSpeech/dev-clean.csv"
test_csv: "/home/vmagent/app/dataset/LibriSpeech/dev-clean.csv"
tokenizer_ckpt: "/home/vmagent/app/dataset/LibriSpeech/tokenizer.ckpt"
train_epochs: 1
```
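
To make the score weights and parameter limits in the search configuration more concrete, here is a minimal sketch of how they could interact. It assumes the overall NAS score is a weighted sum of the individual scores (so a weight of 0 disables a score) and that the limits are inclusive bounds on the parameter count in millions. Both are assumptions for illustration; the actual DE-NAS scoring code may differ, and all function names and example values are hypothetical.

```python
# Weights and limits copied from the search configuration above.
WEIGHTS = {
    "expressivity": 0,
    "complexity": 0,
    "diversity": 1,
    "saliency": 1,
    "latency": 0,
}
MAX_PARAM_LIMITS = 40
MIN_PARAM_LIMITS = 1


def within_param_limits(params, low=MIN_PARAM_LIMITS, high=MAX_PARAM_LIMITS):
    """Assumption: limits are inclusive and expressed in millions of parameters."""
    return low <= params <= high


def combined_score(scores, weights=WEIGHTS):
    """Assumption: the NAS score is a plain weighted sum of the per-metric scores."""
    return sum(weight * scores.get(name, 0.0) for name, weight in weights.items())


def best_candidate(candidates):
    """Drop candidates outside the parameter limits, then keep the highest score."""
    eligible = [c for c in candidates if within_param_limits(c["params"])]
    if not eligible:
        return None
    return max(eligible, key=lambda c: combined_score(c["scores"]))


# Hypothetical candidates; the per-metric scores are made up for illustration.
candidates = [
    {"name": "a", "params": 18.1, "scores": {"diversity": 5.0, "saliency": 6.0}},
    {"name": "b", "params": 45.0, "scores": {"diversity": 9.0, "saliency": 9.0}},
    {"name": "c", "params": 18.6, "scores": {"diversity": 6.0, "saliency": 6.5}},
]
print(best_candidate(candidates)["name"])
```

With the weights shown, only the diversity and saliency scores contribute, which matches the "enable/disable NAS scores" comment in the configuration.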

## Launch Search

Launch the DE-NAS search on the ASR domain using the configuration in `${e2eAIOK_install_dir}/conf/denas/asr/e2eaiok_denas_asr.conf`. The best searched model structure is saved to `/home/vmagent/app/e2eaiok/e2eAIOK/DeNas/best_model_structure.txt`.

In [4]:
%%bash
cd ${e2eAIOK_install_dir}/e2eAIOK/DeNas
sed -i '/max_epochs:/ s/:.*/: 1/' ../../conf/denas/asr/e2eaiok_denas_asr.conf
sed -i '/population_num:/ s/:.*/: 1/' ../../conf/denas/asr/e2eaiok_denas_asr.conf
sed -i '/crossover_num:/ s/:.*/: 1/' ../../conf/denas/asr/e2eaiok_denas_asr.conf
sed -i '/mutation_num:/ s/:.*/: 1/' ../../conf/denas/asr/e2eaiok_denas_asr.conf
python -u search.py --domain asr --conf ../../conf/denas/asr/e2eaiok_denas_asr.conf

03/16/2023 07:29:18 - INFO - DENAS -   epoch = 0
03/16/2023 07:29:22 - INFO - DENAS -   random 1/1 structure (11, 4.5, 3.0, 4.0, 5.0, 4.5, 4.5, 4.0, 4.5, 4.0, 5.0, 3.5, 4, 2, 3, 2, 2, 4, 3, 4, 4, 4, 2, 240) nas_score 11.435986093827523 params 18.131448
03/16/2023 07:29:22 - INFO - DENAS -   random_num = 1
03/16/2023 07:29:25 - INFO - DENAS -   mutation 1/1 structure (12, 4.5, 3.0, 4.0, 5.0, 4.5, 4.5, 4.0, 4.5, 4.0, 3.0, 3.5, 4.0, 3, 2, 3, 2, 2, 4, 3, 4, 4, 4, 2, 2, 240) nas_score 12.15141583251534 params 18.594888
03/16/2023 07:29:25 - INFO - DENAS -   mutation_num = 1
03/16/2023 07:29:25 - INFO - DENAS -   crossover_num = 0
03/16/2023 07:29:25 - INFO - DENAS -   best structure (12, 4.5, 3.0, 4.0, 5.0, 4.5, 4.5, 4.0, 4.5, 4.0, 3.0, 3.5, 4.0, 3, 2, 3, 2, 2, 4, 3, 4, 4, 4, 2, 2, 240) nas_score 12.15141583251534 params 18.594888


DE-NAS search best structure took 6.812132492021192 sec
DE-NAS completed, best structure is (12, 4.5, 3.0, 4.0, 5.0, 4.5, 4.5, 4.0, 4.5, 4.0, 3.0, 3.5, 4.0, 3, 2, 3, 2, 2, 4, 3, 4, 4, 4, 2, 2, 240)
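
The structure tuples in the log above are easier to read once split into named fields. The sketch below does that for the best structure, assuming the layout `(depth, depth MLP ratios, depth head counts, embedding dimension)`. That layout is inferred from the logged tuples, whose lengths equal `2 * depth + 2`, and is not taken from the DE-NAS source; `parse_structure` is a hypothetical helper.

```python
import ast


def parse_structure(text):
    """Split a logged structure tuple into named fields (assumed layout)."""
    values = ast.literal_eval(text)
    depth = values[0]
    if len(values) != 2 * depth + 2:
        raise ValueError(f"expected {2 * depth + 2} entries, got {len(values)}")
    return {
        "depth": depth,
        "mlp_ratio": list(values[1:1 + depth]),
        "num_heads": list(values[1 + depth:1 + 2 * depth]),
        "embed_dim": values[-1],
    }


# Best structure copied from the search log above.
best = parse_structure(
    "(12, 4.5, 3.0, 4.0, 5.0, 4.5, 4.5, 4.0, 4.5, 4.0, 3.0, 3.5, 4.0, "
    "3, 2, 3, 2, 2, 4, 3, 4, 4, 4, 2, 2, 240)"
)
for field, value in best.items():
    print(field, value)
```

The length check guards against feeding the helper a tuple that does not follow the assumed layout.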


## Launch Training with Best Searched Model Structure

Edit `${e2eAIOK_install_dir}/conf/denas/asr/e2eaiok_denas_train_asr.conf`:

```yaml
train_csv: "/home/vmagent/app/dataset/LibriSpeech/dev-clean.csv"
valid_csv: "/home/vmagent/app/dataset/LibriSpeech/dev-clean.csv"
test_csv: "/home/vmagent/app/dataset/LibriSpeech/dev-clean.csv"
tokenizer_ckpt: "/home/vmagent/app/dataset/LibriSpeech/tokenizer.ckpt"
train_epochs: 1
```

Load the best searched model structure from `/home/vmagent/app/e2eaiok/e2eAIOK/DeNas/best_model_structure.txt` and launch training with the configuration in `${e2eAIOK_install_dir}/conf/denas/asr/e2eaiok_denas_train_asr.conf`:

In [10]:
%%bash
cd ${e2eAIOK_install_dir}/e2eAIOK/DeNas
sed -i '/train_epochs:/ s/:.*/: 1/' ../../conf/denas/asr/e2eaiok_denas_train_asr.conf
python train.py --domain asr --conf ../../conf/denas/asr/e2eaiok_denas_train_asr.conf --random_seed 74443

03/16/2023 07:40:54 - INFO - Trainer -   building model
03/16/2023 07:40:54 - INFO - Trainer -   model created
03/16/2023 07:40:54 - INFO - Trainer -   Trainer config: {'domain': 'asr', 'conf': '../../conf/denas/asr/e2eaiok_denas_train_asr.conf', 'random_seed': 74443, 'seed': '74443', 'output_folder': 'results/transformer/74443', 'save_folder': 'results/transformer/74443/save', 'device': 'cpu', 'dist_backend': 'gloo', 'mode': 'train', 'best_model_structure': '/home/vmagent/app/e2eaiok/e2eAIOK/DeNas/best_model_structure.txt', 'data_folder': '/home/vmagent/app/dataset/LibriSpeech', 'skip_prep': False, 'train_csv': '/home/vmagent/app/dataset/LibriSpeech/train-clean-100.csv', 'valid_csv': '/home/vmagent/app/dataset/LibriSpeech/dev-clean.csv', 'test_csv': '/home/vmagent/app/dataset/LibriSpeech/test-clean.csv', 'tokenizer_ckpt': '/home/vmagent/app/dataset/LibriSpeech/tokenizer.ckpt', 'ckpt_interval_minutes': 30, 'train_epochs': 1, 'eval_epochs': 1, 'train_batch_size': 32, 'eval_batch_size': 