<a href="https://colab.research.google.com/github/shunsukemlab/espnet_onnx/blob/master/demo/simple_asr_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# espnet_onnx demonstration

This notebook provides a simple demonstration of how to export your trained model and use it for inference.

See also:
- ESPnet: https://github.com/espnet/espnet
- espnet_onnx: https://github.com/Masao-Someki/espnet_onnx

Author: [Masao Someki](https://github.com/Masao-Someki)


## Table of Contents

- Install Dependency
- Export your model
- Inference with onnxruntime

# Install Dependency
To run this demo, you need to install the following packages.
- espnet_onnx
- torch >= 1.11.0 (already installed in Colab)
- espnet
- espnet_model_zoo
- onnx

`torch`, `espnet`, `espnet_model_zoo`, and `onnx` are required to run the export demo.
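Before exporting, it can help to confirm which of the packages listed above are actually installed. The following is a minimal sketch using only the standard library; the helper names `installed_versions` and `version_tuple` are hypothetical and not part of `espnet_onnx`, and the version comparison is deliberately rough.

```python
from importlib import metadata
from itertools import takewhile

# Packages listed in this section.
REQUIRED = ["espnet_onnx", "torch", "espnet", "espnet_model_zoo", "onnx"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def version_tuple(version):
    """Turn a string such as '2.1.0+cu121' into (2, 1, 0) for a rough comparison."""
    core = version.split("+")[0]  # drop a local suffix such as '+cu121'
    parts = []
    for piece in core.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))  # leading digits only
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


versions = installed_versions(REQUIRED)
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")

# This demo expects torch >= 1.11.0.
torch_version = versions["torch"]
if torch_version is not None and version_tuple(torch_version) < (1, 11, 0):
    print("torch is older than 1.11.0; consider upgrading")
```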

In [1]:
!pip install -U espnet_onnx espnet espnet_model_zoo onnx --no-cache-dir

# in this demo, we need to update scipy to avoid an error
!pip install -U scipy numpy==1.23.5 pyworld==0.3.2

Collecting scipy
  Using cached scipy-1.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
Collecting pyworld==0.3.2
  Using cached pyworld-0.3.2.tar.gz (214 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Downloading scipy-1.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (41.2 MB)
   41.2/41.2 MB 8.0 MB/s eta 0:00:00
Building wheels for collected packages: pyworld
  Building wheel for pyworld (pyproject.toml) ... done
  Created wheel for pyworld: filename=pyworld-0.3.2-cp310-cp310-linux_x86_64.whl size=859798 sha256=283e3bdd851ea3d2bf011284cf0de2936e18b8438fde8f7bde8d358d762a5132
  Stored in directory: /root/.cache/pip/wheels/35/48/7e/e25bdd25fda4326d47010c157709436a6ee7a1423e18a24195
Successfully built pyworld
Insta

We also need the additional dependency `onnxruntime-gpu` to run inference on the GPU.

In [2]:
!pip install onnxruntime-gpu

Collecting onnxruntime-gpu
  Using cached onnxruntime_gpu-1.20.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.5 kB)
Using cached onnxruntime_gpu-1.20.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (291.5 MB)
Installing collected packages: onnxruntime-gpu
Successfully installed onnxruntime-gpu-1.20.1
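Installing `onnxruntime-gpu` does not guarantee that a GPU is usable in the current runtime. As a hedged sketch, the snippet below asks onnxruntime which execution providers are available and picks a provider list that could be passed as the `providers` argument used later in this notebook. `choose_providers` is a hypothetical helper, not part of `espnet_onnx` or onnxruntime.

```python
def choose_providers(available):
    """Prefer CUDA when it is available, and always fall back to the CPU."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]


try:
    import onnxruntime as ort

    # Lists the providers supported by the installed onnxruntime build.
    available = ort.get_available_providers()
except ImportError:
    # onnxruntime is not installed; assume only the CPU is usable.
    available = []

print("available providers:", available)
print("providers to use:", choose_providers(available))
```

If `CUDAExecutionProvider` is missing from the first line of output, the Colab runtime is probably not a GPU runtime, or the CPU-only `onnxruntime` package is shadowing the GPU build.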


# Export your model

## Export model from espnet_model_zoo

The easiest way to export a model is to use `espnet_model_zoo`. You can download, unpack, and export a pretrained model with the `export_from_pretrained` method.
`espnet_onnx` saves the exported ONNX models into a cache directory, which is `${HOME}/.cache/espnet_onnx` by default.
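The cache location matters later, because inference needs the directory of the exported model. The sketch below builds that path, assuming the exported files sit in a subdirectory named after the model tag under the default cache directory, which is how this notebook addresses them. `default_export_dir` is a hypothetical helper and not an `espnet_onnx` function.

```python
from pathlib import Path


def default_export_dir(tag_name, cache_root=None):
    """Return the assumed export directory for a model tag.

    cache_root defaults to ~/.cache/espnet_onnx, the default cache directory.
    """
    root = Path(cache_root) if cache_root else Path.home() / ".cache" / "espnet_onnx"
    return root / tag_name


export_dir = default_export_dir("reazon-research/reazonspeech-espnet-v2")
print(export_dir)
print("exists:", export_dir.exists())
```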

In [4]:
!wget https://raw.githubusercontent.com/espnet/espnet_onnx/master/espnet_onnx/export/convert_map.yml -O /usr/local/lib/python3.10/dist-packages/espnet_onnx/export/convert_map.yml


--2024-12-23 20:21:25--  https://raw.githubusercontent.com/espnet/espnet_onnx/master/espnet_onnx/export/convert_map.yml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1961 (1.9K) [text/plain]
Saving to: ‘/usr/local/lib/python3.10/dist-packages/espnet_onnx/export/convert_map.yml’


2024-12-23 20:21:25 (22.1 MB/s) - ‘/usr/local/lib/python3.10/dist-packages/espnet_onnx/export/convert_map.yml’ saved [1961/1961]



In [15]:
# export the model.
from espnet_onnx.export import ASRModelExport

tag_name = 'reazon-research/reazonspeech-espnet-v2'

m = ASRModelExport()
m.set_export_config(
    max_seq_len=5000,
)
m.export_from_pretrained(
    tag_name,
    optimize=True,
    quantize=False,
)

Fetching 8 files:   0%|          | 0/8 [00:00<?, ?it/s]

  torch.load(model_file, map_location=device),


# Inference with onnxruntime
Now let's use the exported model for inference.
Please enable a GPU runtime before running the following code. Note that the cell below passes `CPUExecutionProvider`, so it runs on the CPU as written.

In [22]:
# Provide the tag_name to locate the exported model.
tag_name = 'reazon-research/reazonspeech-espnet-v2'
export_dir = f'/root/.cache/espnet_onnx/{tag_name}'

# Upload a wav file and load it at 16 kHz.
import librosa
from google.colab import files

wav_file = files.upload()
y, sr = librosa.load(list(wav_file.keys())[0], sr=16000)

# Run inference with the exported ONNX model.
from espnet_onnx import Speech2Text

speech2text = Speech2Text(model_dir=export_dir, providers=['CPUExecutionProvider'])
nbest = speech2text(y)
print(nbest[0][0])

Saving common_voice_ja_19482491.wav to common_voice_ja_19482491 (5).wav


RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Add node. Name:'/model/encoders/encoders.0/self_attn/Add_4' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/math/element_wise_ops.h:560 void onnxruntime::BroadcastIterator::Append(ptrdiff_t, ptrdiff_t) axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 146 by 291


In [14]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [18]:
!ls /root/.cache/espnet_onnx/reazon-research/reazonspeech-espnet-v2/

config.yaml  feats_stats.npz  full  quantize


'common_voice_ja_19482491 (1).wav'   common_voice_ja_19482491.wav   sample_data
