Zipformer-transducer-based Models

Hint

Please refer to install_sherpa_onnx to install sherpa-onnx before you read this section.

sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12 (Chinese)

Training code for this model can be found at k2-fsa/icefall#1369. It supports only Chinese.

Please refer to https://github.com/k2-fsa/icefall/tree/master/egs/multi_zh-hans/ASR#included-training-sets for detailed information about the training data. In total, there are 14k hours of training data.

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12.tar.bz2

# For Chinese users, you can use the following mirror
# wget https://hub.nuaa.cf/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12.tar.bz2

tar xf sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12.tar.bz2
rm sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12.tar.bz2
ls -lh sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12

The output is given below:

$ ls -lh sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12
total 668864
-rw-r--r--  1 fangjun  staff    28B Dec 12 18:59 README.md
-rw-r--r--  1 fangjun  staff   131B Dec 12 18:59 bpe.model
-rw-r--r--  1 fangjun  staff   1.2M Dec 12 18:59 decoder-epoch-20-avg-1-chunk-16-left-128.int8.onnx
-rw-r--r--  1 fangjun  staff   4.9M Dec 12 18:59 decoder-epoch-20-avg-1-chunk-16-left-128.onnx
-rw-r--r--  1 fangjun  staff    67M Dec 12 18:59 encoder-epoch-20-avg-1-chunk-16-left-128.int8.onnx
-rw-r--r--  1 fangjun  staff   249M Dec 12 18:59 encoder-epoch-20-avg-1-chunk-16-left-128.onnx
-rw-r--r--  1 fangjun  staff   1.0M Dec 12 18:59 joiner-epoch-20-avg-1-chunk-16-left-128.int8.onnx
-rw-r--r--  1 fangjun  staff   3.9M Dec 12 18:59 joiner-epoch-20-avg-1-chunk-16-left-128.onnx
drwxr-xr-x  8 fangjun  staff   256B Dec 12 18:59 test_wavs
-rw-r--r--  1 fangjun  staff    18K Dec 12 18:59 tokens.txt
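To confirm that the archive was extracted completely, you can check for each model file from the listing above. The following is a minimal sketch; the function name `check_model_files` is hypothetical and not part of sherpa-onnx, and the file names are copied from the listing.

```shell
# Hypothetical helper: report any model file from the listing that is missing.
# Usage: check_model_files <model-dir>
check_model_files() {
  dir=$1
  suffix=epoch-20-avg-1-chunk-16-left-128
  missing=0
  for f in tokens.txt \
           "encoder-$suffix.onnx" "encoder-$suffix.int8.onnx" \
           "decoder-$suffix.onnx" "decoder-$suffix.int8.onnx" \
           "joiner-$suffix.onnx"  "joiner-$suffix.int8.onnx"; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  # Exit status 0 means every file was found.
  return "$missing"
}
```

For example, `check_model_files ./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12` prints nothing and succeeds when all files are present.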

Decode a single wave file

Hint

It supports decoding only wave files of a single channel with 16-bit encoded samples, while the sampling rate does not need to be 16 kHz.
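If you are not sure whether a file meets these requirements, you can inspect its header before decoding. The sketch below is an illustration only: the helper names are hypothetical, and it assumes the canonical RIFF layout in which the "fmt " chunk starts at byte 12, so files with extra chunks before "fmt " will be misread.

```shell
# Print the little-endian 16-bit integer at byte offset $2 of file $1.
read_u16_le() {
  # od prints the two bytes as unsigned decimals, e.g. "   1   0".
  set -- $(od -An -tu1 -j"$2" -N2 "$1")
  echo $(( ${1:-0} + 256 * ${2:-0} ))
}

# Hypothetical helper: succeed only for single-channel, 16-bit wave files.
# Assumes the channel count is at offset 22 and bits per sample at offset 34.
is_mono_16bit_wav() {
  channels=$(read_u16_le "$1" 22)
  bits=$(read_u16_le "$1" 34)
  [ "$channels" -eq 1 ] && [ "$bits" -eq 16 ]
}
```

For example, `is_mono_16bit_wav foo.wav || echo "convert foo.wav first"` warns about a file that needs conversion. The sampling rate is deliberately not checked, since it does not need to be 16 kHz.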

fp32

The following code shows how to use fp32 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/encoder-epoch-20-avg-1-chunk-16-left-128.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/decoder-epoch-20-avg-1-chunk-16-left-128.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/joiner-epoch-20-avg-1-chunk-16-left-128.onnx \
  ./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/test_wavs/DEV_T0000000000.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

./code-zipformer/sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12.txt

int8

The following code shows how to use int8 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/encoder-epoch-20-avg-1-chunk-16-left-128.int8.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/decoder-epoch-20-avg-1-chunk-16-left-128.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/joiner-epoch-20-avg-1-chunk-16-left-128.int8.onnx \
  ./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/test_wavs/DEV_T0000000000.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

./code-zipformer/sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12-int8.txt
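The fp32 and int8 commands differ only in the encoder and joiner file names, so they can be wrapped in one function. This is a sketch, not part of sherpa-onnx: the function name `decode_wav` and the `SHERPA_ONNX_BIN` variable are assumptions introduced here, while the flags and paths are the ones used in the commands above.

```shell
# Hypothetical wrapper around the two commands shown in this section.
# SHERPA_ONNX_BIN (an assumed variable) overrides the default binary path.
# Usage: decode_wav <fp32|int8> <wave-file>
decode_wav() {
  precision=$1
  wav=$2
  model=./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12
  name=epoch-20-avg-1-chunk-16-left-128
  case $precision in
    fp32) ext=onnx ;;
    int8) ext=int8.onnx ;;
    *) echo "unknown precision: $precision" >&2; return 1 ;;
  esac
  # As in the int8 command above, the decoder stays in fp32 in both cases.
  "${SHERPA_ONNX_BIN:-./build/bin/sherpa-onnx}" \
    --tokens="$model/tokens.txt" \
    --encoder="$model/encoder-$name.$ext" \
    --decoder="$model/decoder-$name.onnx" \
    --joiner="$model/joiner-$name.$ext" \
    "$wav"
}
```

Run it from /path/to/sherpa-onnx, for example `decode_wav int8 ./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/test_wavs/DEV_T0000000000.wav`.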

Real-time speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone \
  --tokens=./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/encoder-epoch-20-avg-1-chunk-16-left-128.int8.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/decoder-epoch-20-avg-1-chunk-16-left-128.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-multi-zh-hans-2023-12-12/joiner-epoch-20-avg-1-chunk-16-left-128.int8.onnx

Hint

If your system is Linux (including embedded Linux), you can also use sherpa-onnx-alsa to do real-time speech recognition with your microphone if sherpa-onnx-microphone does not work for you.

pkufool/icefall-asr-zipformer-streaming-wenetspeech-20230615 (Chinese)

This model is from

https://huggingface.co/pkufool/icefall-asr-zipformer-streaming-wenetspeech-20230615

which supports only Chinese as it is trained on the WenetSpeech corpus.

If you are interested in how the model is trained, please refer to k2-fsa/icefall#1130.

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/icefall-asr-zipformer-streaming-wenetspeech-20230615.tar.bz2

# For Chinese users, you can use the following mirror
# wget https://hub.nuaa.cf/k2-fsa/sherpa-onnx/releases/download/asr-models/icefall-asr-zipformer-streaming-wenetspeech-20230615.tar.bz2

tar xvf icefall-asr-zipformer-streaming-wenetspeech-20230615.tar.bz2
rm icefall-asr-zipformer-streaming-wenetspeech-20230615.tar.bz2

Please check that the file sizes of the pre-trained models are correct. See the file sizes of *.onnx files below.

icefall-asr-zipformer-streaming-wenetspeech-20230615 fangjun$ ls -lh exp/*chunk-16-left-128.*onnx
-rw-r--r--  1 fangjun  staff    11M Jun 26 15:42 exp/decoder-epoch-12-avg-4-chunk-16-left-128.int8.onnx
-rw-r--r--  1 fangjun  staff    12M Jun 26 15:42 exp/decoder-epoch-12-avg-4-chunk-16-left-128.onnx
-rw-r--r--  1 fangjun  staff    68M Jun 26 15:42 exp/encoder-epoch-12-avg-4-chunk-16-left-128.int8.onnx
-rw-r--r--  1 fangjun  staff   250M Jun 26 15:43 exp/encoder-epoch-12-avg-4-chunk-16-left-128.onnx
-rw-r--r--  1 fangjun  staff   2.7M Jun 26 15:42 exp/joiner-epoch-12-avg-4-chunk-16-left-128.int8.onnx
-rw-r--r--  1 fangjun  staff    11M Jun 26 15:42 exp/joiner-epoch-12-avg-4-chunk-16-left-128.onnx
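A size check like this can also be scripted. The helper below is a minimal sketch with a hypothetical name; it only tests a lower bound that you choose yourself, and does not compare against exact published sizes.

```shell
# Hypothetical helper: fail if a file is smaller than the given number of bytes.
# Usage: check_min_size <file> <min-bytes>
check_min_size() {
  # The arithmetic expansion strips any padding that wc adds around the count.
  size=$(( $(wc -c < "$1") ))
  if [ "$size" -lt "$2" ]; then
    echo "$1: only $size bytes, expected at least $2"
    return 1
  fi
}
```

For example, since the fp32 encoder is listed at about 250M, `check_min_size exp/encoder-epoch-12-avg-4-chunk-16-left-128.onnx 200000000` would flag a truncated download. The threshold of 200000000 is an arbitrary loose bound chosen for illustration.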

Decode a single wave file

Hint

It supports decoding only wave files of a single channel with 16-bit encoded samples, while the sampling rate does not need to be 16 kHz.

fp32

The following code shows how to use fp32 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./icefall-asr-zipformer-streaming-wenetspeech-20230615/data/lang_char/tokens.txt \
  --encoder=./icefall-asr-zipformer-streaming-wenetspeech-20230615/exp/encoder-epoch-12-avg-4-chunk-16-left-128.onnx \
  --decoder=./icefall-asr-zipformer-streaming-wenetspeech-20230615/exp/decoder-epoch-12-avg-4-chunk-16-left-128.onnx \
  --joiner=./icefall-asr-zipformer-streaming-wenetspeech-20230615/exp/joiner-epoch-12-avg-4-chunk-16-left-128.onnx \
  ./icefall-asr-zipformer-streaming-wenetspeech-20230615/test_wavs/DEV_T0000000000.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

./code-zipformer/icefall-asr-zipformer-streaming-wenetspeech-20230615.txt

int8

The following code shows how to use int8 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./icefall-asr-zipformer-streaming-wenetspeech-20230615/data/lang_char/tokens.txt \
  --encoder=./icefall-asr-zipformer-streaming-wenetspeech-20230615/exp/encoder-epoch-12-avg-4-chunk-16-left-128.int8.onnx \
  --decoder=./icefall-asr-zipformer-streaming-wenetspeech-20230615/exp/decoder-epoch-12-avg-4-chunk-16-left-128.onnx \
  --joiner=./icefall-asr-zipformer-streaming-wenetspeech-20230615/exp/joiner-epoch-12-avg-4-chunk-16-left-128.int8.onnx \
  ./icefall-asr-zipformer-streaming-wenetspeech-20230615/test_wavs/DEV_T0000000000.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

./code-zipformer/icefall-asr-zipformer-streaming-wenetspeech-20230615-int8.txt

Real-time speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone \
  --tokens=./icefall-asr-zipformer-streaming-wenetspeech-20230615/data/lang_char/tokens.txt \
  --encoder=./icefall-asr-zipformer-streaming-wenetspeech-20230615/exp/encoder-epoch-12-avg-4-chunk-16-left-128.int8.onnx \
  --decoder=./icefall-asr-zipformer-streaming-wenetspeech-20230615/exp/decoder-epoch-12-avg-4-chunk-16-left-128.onnx \
  --joiner=./icefall-asr-zipformer-streaming-wenetspeech-20230615/exp/joiner-epoch-12-avg-4-chunk-16-left-128.int8.onnx

Hint

If your system is Linux (including embedded Linux), you can also use sherpa-onnx-alsa to do real-time speech recognition with your microphone if sherpa-onnx-microphone does not work for you.

csukuangfj/sherpa-onnx-streaming-zipformer-en-2023-06-26 (English)

This model is converted from

https://huggingface.co/Zengwei/icefall-asr-librispeech-streaming-zipformer-2023-05-17

which supports only English as it is trained on the LibriSpeech corpus.

If you are interested in how the model is trained, please refer to k2-fsa/icefall#1058.

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-en-2023-06-26.tar.bz2

# For Chinese users, you can use the following mirror
# wget https://hub.nuaa.cf/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-en-2023-06-26.tar.bz2

tar xvf sherpa-onnx-streaming-zipformer-en-2023-06-26.tar.bz2
rm sherpa-onnx-streaming-zipformer-en-2023-06-26.tar.bz2

Please check that the file sizes of the pre-trained models are correct. See the file sizes below.

-rw-r--r-- 1 1001 127  240K Apr 23 06:45 bpe.model
-rw-r--r-- 1 1001 127  1.3M Apr 23 06:45 decoder-epoch-99-avg-1-chunk-16-left-128.int8.onnx
-rw-r--r-- 1 1001 127  2.0M Apr 23 06:45 decoder-epoch-99-avg-1-chunk-16-left-128.onnx
-rw-r--r-- 1 1001 127   68M Apr 23 06:45 encoder-epoch-99-avg-1-chunk-16-left-128.int8.onnx
-rw-r--r-- 1 1001 127  250M Apr 23 06:45 encoder-epoch-99-avg-1-chunk-16-left-128.onnx
-rwxr-xr-x 1 1001 127   814 Apr 23 06:45 export-onnx-zipformer-online.sh
-rw-r--r-- 1 1001 127  254K Apr 23 06:45 joiner-epoch-99-avg-1-chunk-16-left-128.int8.onnx
-rw-r--r-- 1 1001 127 1003K Apr 23 06:45 joiner-epoch-99-avg-1-chunk-16-left-128.onnx
-rw-r--r-- 1 1001 127   216 Apr 23 06:45 README.md
drwxr-xr-x 2 1001 127  4.0K Apr 23 06:45 test_wavs
-rw-r--r-- 1 1001 127  5.0K Apr 23 06:45 tokens.txt

Decode a single wave file

Hint

It supports decoding only wave files of a single channel with 16-bit encoded samples, while the sampling rate does not need to be 16 kHz.

fp32

The following code shows how to use fp32 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-zipformer-en-2023-06-26/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-en-2023-06-26/encoder-epoch-99-avg-1-chunk-16-left-128.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-en-2023-06-26/decoder-epoch-99-avg-1-chunk-16-left-128.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-en-2023-06-26/joiner-epoch-99-avg-1-chunk-16-left-128.onnx \
  ./sherpa-onnx-streaming-zipformer-en-2023-06-26/test_wavs/0.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

You should see the following output:

./code-zipformer/sherpa-onnx-streaming-zipformer-en-2023-06-26.txt

int8

The following code shows how to use int8 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-zipformer-en-2023-06-26/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-en-2023-06-26/encoder-epoch-99-avg-1-chunk-16-left-128.int8.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-en-2023-06-26/decoder-epoch-99-avg-1-chunk-16-left-128.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-en-2023-06-26/joiner-epoch-99-avg-1-chunk-16-left-128.int8.onnx \
  ./sherpa-onnx-streaming-zipformer-en-2023-06-26/test_wavs/0.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

You should see the following output:

./code-zipformer/sherpa-onnx-streaming-zipformer-en-2023-06-26-int8.txt

Real-time speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone \
  --tokens=./sherpa-onnx-streaming-zipformer-en-2023-06-26/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-en-2023-06-26/encoder-epoch-99-avg-1-chunk-16-left-128.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-en-2023-06-26/decoder-epoch-99-avg-1-chunk-16-left-128.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-en-2023-06-26/joiner-epoch-99-avg-1-chunk-16-left-128.onnx

Hint

If your system is Linux (including embedded Linux), you can also use sherpa-onnx-alsa to do real-time speech recognition with your microphone if sherpa-onnx-microphone does not work for you.

csukuangfj/sherpa-onnx-streaming-zipformer-en-2023-06-21 (English)

This model is converted from

https://huggingface.co/marcoyang/icefall-libri-giga-pruned-transducer-stateless7-streaming-2023-04-04

which supports only English as it is trained on the LibriSpeech and GigaSpeech corpora.

If you are interested in how the model is trained, please refer to k2-fsa/icefall#984.

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-en-2023-06-21.tar.bz2

# For Chinese users, you can use the following mirror
# wget https://hub.nuaa.cf/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-en-2023-06-21.tar.bz2

tar xvf sherpa-onnx-streaming-zipformer-en-2023-06-21.tar.bz2
rm sherpa-onnx-streaming-zipformer-en-2023-06-21.tar.bz2
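All models in this document are downloaded from the same release URL pattern, so the steps can be wrapped in a small helper that skips the download when the model directory already exists. This is a sketch with hypothetical function names; it assumes the archive extracts into a directory with the same name as the archive, which is the case for the listings shown in this document.

```shell
# Hypothetical helper: build the release URL for a model archive.
# The URL pattern is copied from the wget commands in this document.
model_url() {
  echo "https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/$1.tar.bz2"
}

# Hypothetical helper: download and extract a model unless its directory exists.
# Usage: fetch_model <model-name>
fetch_model() {
  if [ -d "$1" ]; then
    echo "$1 already present, skipping download"
    return 0
  fi
  # Each step runs only if the previous one succeeded.
  wget "$(model_url "$1")" && tar xf "$1.tar.bz2" && rm "$1.tar.bz2"
}
```

For example, `fetch_model sherpa-onnx-streaming-zipformer-en-2023-06-21` reproduces the commands above on the first run and does nothing on later runs.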

Please check that the file sizes of the pre-trained models are correct. See the file sizes of *.onnx files below.

sherpa-onnx-streaming-zipformer-en-2023-06-21 fangjun$ ls -lh *.onnx
-rw-r--r--  1 fangjun  staff   1.2M Jun 21 15:34 decoder-epoch-99-avg-1.int8.onnx
-rw-r--r--  1 fangjun  staff   2.0M Jun 21 15:34 decoder-epoch-99-avg-1.onnx
-rw-r--r--  1 fangjun  staff   179M Jun 21 15:36 encoder-epoch-99-avg-1.int8.onnx
-rw-r--r--  1 fangjun  staff   337M Jun 21 15:37 encoder-epoch-99-avg-1.onnx
-rw-r--r--  1 fangjun  staff   253K Jun 21 15:34 joiner-epoch-99-avg-1.int8.onnx
-rw-r--r--  1 fangjun  staff   1.0M Jun 21 15:34 joiner-epoch-99-avg-1.onnx

Decode a single wave file

Hint

It supports decoding only wave files of a single channel with 16-bit encoded samples, while the sampling rate does not need to be 16 kHz.

fp32

The following code shows how to use fp32 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-zipformer-en-2023-06-21/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-en-2023-06-21/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-en-2023-06-21/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-en-2023-06-21/joiner-epoch-99-avg-1.onnx \
  ./sherpa-onnx-streaming-zipformer-en-2023-06-21/test_wavs/0.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

You should see the following output:

./code-zipformer/sherpa-onnx-streaming-zipformer-en-2023-06-21.txt

int8

The following code shows how to use int8 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-zipformer-en-2023-06-21/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-en-2023-06-21/encoder-epoch-99-avg-1.int8.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-en-2023-06-21/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-en-2023-06-21/joiner-epoch-99-avg-1.int8.onnx \
  ./sherpa-onnx-streaming-zipformer-en-2023-06-21/test_wavs/0.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

You should see the following output:

./code-zipformer/sherpa-onnx-streaming-zipformer-en-2023-06-21-int8.txt

Real-time speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone \
  --tokens=./sherpa-onnx-streaming-zipformer-en-2023-06-21/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-en-2023-06-21/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-en-2023-06-21/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-en-2023-06-21/joiner-epoch-99-avg-1.onnx

Hint

If your system is Linux (including embedded Linux), you can also use sherpa-onnx-alsa to do real-time speech recognition with your microphone if sherpa-onnx-microphone does not work for you.

csukuangfj/sherpa-onnx-streaming-zipformer-en-2023-02-21 (English)

This model is converted from

https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29

which supports only English as it is trained on the LibriSpeech corpus.

You can find the training code at

https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

GitHub

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-en-2023-02-21.tar.bz2

# For Chinese users, you can use the following mirror
# wget https://hub.nuaa.cf/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-en-2023-02-21.tar.bz2

tar xvf sherpa-onnx-streaming-zipformer-en-2023-02-21.tar.bz2
rm sherpa-onnx-streaming-zipformer-en-2023-02-21.tar.bz2

ModelScope

cd /path/to/sherpa-onnx

GIT_LFS_SKIP_SMUDGE=1 git clone https://www.modelscope.cn/pkufool/sherpa-onnx-streaming-zipformer-en-2023-02-21.git
cd sherpa-onnx-streaming-zipformer-en-2023-02-21
git lfs pull --include "*.onnx"

Please check that the file sizes of the pre-trained models are correct. See the file sizes of *.onnx files below.

sherpa-onnx-streaming-zipformer-en-2023-02-21$ ls -lh *.onnx
-rw-r--r-- 1 kuangfangjun root  1.3M Mar 31 23:06 decoder-epoch-99-avg-1.int8.onnx
-rw-r--r-- 1 kuangfangjun root  2.0M Feb 21 20:51 decoder-epoch-99-avg-1.onnx
-rw-r--r-- 1 kuangfangjun root  180M Mar 31 23:07 encoder-epoch-99-avg-1.int8.onnx
-rw-r--r-- 1 kuangfangjun root  338M Feb 21 20:51 encoder-epoch-99-avg-1.onnx
-rw-r--r-- 1 kuangfangjun root  254K Mar 31 23:06 joiner-epoch-99-avg-1.int8.onnx
-rw-r--r-- 1 kuangfangjun root 1003K Feb 21 20:51 joiner-epoch-99-avg-1.onnx
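If you cloned with GIT_LFS_SKIP_SMUDGE=1 and a file is only a few hundred bytes, it is probably still a Git LFS pointer rather than the real model. The check below is a minimal sketch with a hypothetical function name; it relies on the fact that LFS pointer files are text files whose first line begins with `version https://git-lfs.github.com/spec/`.

```shell
# Hypothetical helper: succeed if the file is still a Git LFS pointer,
# i.e. a small text file whose first line names the LFS spec.
is_lfs_pointer() {
  head -c 200 "$1" | grep -q '^version https://git-lfs.github.com/spec/'
}

# Example loop (commented out so that sourcing this file has no side effects):
# for f in *.onnx; do
#   is_lfs_pointer "$f" && echo "$f is an LFS pointer; run: git lfs pull --include \"*.onnx\""
# done
```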

Decode a single wave file

Hint

It supports decoding only wave files of a single channel with 16-bit encoded samples, while the sampling rate does not need to be 16 kHz.

fp32

The following code shows how to use fp32 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-zipformer-en-2023-02-21/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-en-2023-02-21/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-en-2023-02-21/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-en-2023-02-21/joiner-epoch-99-avg-1.onnx \
  ./sherpa-onnx-streaming-zipformer-en-2023-02-21/test_wavs/0.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

You should see the following output:

./code-zipformer/sherpa-onnx-streaming-zipformer-en-2023-02-21.txt

int8

The following code shows how to use int8 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-zipformer-en-2023-02-21/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-en-2023-02-21/encoder-epoch-99-avg-1.int8.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-en-2023-02-21/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-en-2023-02-21/joiner-epoch-99-avg-1.int8.onnx \
  ./sherpa-onnx-streaming-zipformer-en-2023-02-21/test_wavs/0.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

You should see the following output:

./code-zipformer/sherpa-onnx-streaming-zipformer-en-2023-02-21-int8.txt

Real-time speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone \
  --tokens=./sherpa-onnx-streaming-zipformer-en-2023-02-21/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-en-2023-02-21/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-en-2023-02-21/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-en-2023-02-21/joiner-epoch-99-avg-1.onnx

Hint

If your system is Linux (including embedded Linux), you can also use sherpa-onnx-alsa to do real-time speech recognition with your microphone if sherpa-onnx-microphone does not work for you.

csukuangfj/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20 (Bilingual, Chinese + English)

This model is converted from

https://huggingface.co/csukuangfj/k2fsa-zipformer-chinese-english-mixed

which supports both Chinese and English. The model is contributed by the community and is trained on tens of thousands of hours of internal data.

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2

# For Chinese users, you can use the following mirror
# wget https://hub.nuaa.cf/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2

tar xvf sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2
rm sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.tar.bz2

Please check that the file sizes of the pre-trained models are correct. See the file sizes of *.onnx files below.

sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20$ ls -lh *.onnx
-rw-r--r-- 1 kuangfangjun root  13M Mar 31 21:11 decoder-epoch-99-avg-1.int8.onnx
-rw-r--r-- 1 kuangfangjun root  14M Feb 20 20:13 decoder-epoch-99-avg-1.onnx
-rw-r--r-- 1 kuangfangjun root 174M Mar 31 21:11 encoder-epoch-99-avg-1.int8.onnx
-rw-r--r-- 1 kuangfangjun root 315M Feb 20 20:13 encoder-epoch-99-avg-1.onnx
-rw-r--r-- 1 kuangfangjun root 3.1M Mar 31 21:11 joiner-epoch-99-avg-1.int8.onnx
-rw-r--r-- 1 kuangfangjun root  13M Feb 20 20:13 joiner-epoch-99-avg-1.onnx

Decode a single wave file

Hint

It supports decoding only wave files of a single channel with 16-bit encoded samples, while the sampling rate does not need to be 16 kHz.

fp32

The following code shows how to use fp32 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/joiner-epoch-99-avg-1.onnx \
  ./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/test_wavs/1.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

./code-zipformer/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20.txt

int8

The following code shows how to use int8 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/encoder-epoch-99-avg-1.int8.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/decoder-epoch-99-avg-1.int8.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/joiner-epoch-99-avg-1.int8.onnx \
  ./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/test_wavs/1.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

./code-zipformer/sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20-int8.txt

Real-time speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone \
  --tokens=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20/joiner-epoch-99-avg-1.onnx

Hint

If your system is Linux (including embedded Linux), you can also use sherpa-onnx-alsa to do real-time speech recognition with your microphone if sherpa-onnx-microphone does not work for you.

shaojieli/sherpa-onnx-streaming-zipformer-fr-2023-04-14 (French)

This model is converted from

https://huggingface.co/shaojieli/icefall-asr-commonvoice-fr-pruned-transducer-stateless7-streaming-2023-04-02

which supports only French as it is trained on the CommonVoice corpus. In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-fr-2023-04-14.tar.bz2

# For Chinese users, you can use the following mirror
# wget https://hub.nuaa.cf/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-fr-2023-04-14.tar.bz2

tar xvf sherpa-onnx-streaming-zipformer-fr-2023-04-14.tar.bz2
rm sherpa-onnx-streaming-zipformer-fr-2023-04-14.tar.bz2

Please check that the file sizes of the pre-trained models are correct. See the file sizes of *.onnx files below.

sherpa-onnx-streaming-zipformer-fr-2023-04-14 shaojieli$ ls -lh *.onnx
-rw-r--r-- 1 lishaojie Students  1.3M Apr 14 14:09 decoder-epoch-29-avg-9-with-averaged-model.int8.onnx
-rw-r--r-- 1 lishaojie Students  2.0M Apr 14 14:09 decoder-epoch-29-avg-9-with-averaged-model.onnx
-rw-r--r-- 1 lishaojie Students  121M Apr 14 14:09 encoder-epoch-29-avg-9-with-averaged-model.int8.onnx
-rw-r--r-- 1 lishaojie Students  279M Apr 14 14:09 encoder-epoch-29-avg-9-with-averaged-model.onnx
-rw-r--r-- 1 lishaojie Students  254K Apr 14 14:09 joiner-epoch-29-avg-9-with-averaged-model.int8.onnx
-rw-r--r-- 1 lishaojie Students 1003K Apr 14 14:09 joiner-epoch-29-avg-9-with-averaged-model.onnx

Decode a single wave file

Hint

It supports decoding only wave files of a single channel with 16-bit encoded samples, while the sampling rate does not need to be 16 kHz.

fp32

The following code shows how to use fp32 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-zipformer-fr-2023-04-14/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-fr-2023-04-14/encoder-epoch-29-avg-9-with-averaged-model.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-fr-2023-04-14/decoder-epoch-29-avg-9-with-averaged-model.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-fr-2023-04-14/joiner-epoch-29-avg-9-with-averaged-model.onnx \
  ./sherpa-onnx-streaming-zipformer-fr-2023-04-14/test_wavs/common_voice_fr_19364697.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

./code-zipformer/sherpa-onnx-streaming-zipformer-fr-2023-04-14.txt

int8

The following code shows how to use int8 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-zipformer-fr-2023-04-14/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-fr-2023-04-14/encoder-epoch-29-avg-9-with-averaged-model.int8.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-fr-2023-04-14/decoder-epoch-29-avg-9-with-averaged-model.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-fr-2023-04-14/joiner-epoch-29-avg-9-with-averaged-model.int8.onnx \
  ./sherpa-onnx-streaming-zipformer-fr-2023-04-14/test_wavs/common_voice_fr_19364697.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

./code-zipformer/sherpa-onnx-streaming-zipformer-fr-2023-04-14-int8.txt

Real-time speech recognition from a microphone

cd /path/to/sherpa-onnx
./build/bin/sherpa-onnx-microphone \
  --tokens=./sherpa-onnx-streaming-zipformer-fr-2023-04-14/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-fr-2023-04-14/encoder-epoch-29-avg-9-with-averaged-model.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-fr-2023-04-14/decoder-epoch-29-avg-9-with-averaged-model.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-fr-2023-04-14/joiner-epoch-29-avg-9-with-averaged-model.onnx

Hint

If your system is Linux (including embedded Linux), you can also use sherpa-onnx-alsa to do real-time speech recognition with your microphone if sherpa-onnx-microphone does not work for you.

sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16 (Bilingual, Chinese + English)

Hint

It is a small model.

This model is converted from

https://huggingface.co/csukuangfj/k2fsa-zipformer-bilingual-zh-en-t

which supports both Chinese and English. The model is contributed by the community and is trained on tens of thousands of hours of internal data.

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16.tar.bz2

# For Chinese users, you can use the following mirror
# wget https://hub.nuaa.cf/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16.tar.bz2

tar xf sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16.tar.bz2
rm sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16.tar.bz2

Please check that the file sizes of the pre-trained models are correct. See the file sizes below.

sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16 fangjun$ ls -lh
total 158M
drwxr-xr-x 2 1001 127 4.0K Mar 20 13:11 64
drwxr-xr-x 2 1001 127 4.0K Mar 20 13:11 96
-rw-r--r-- 1 1001 127 240K Mar 20 13:11 bpe.model
-rw-r--r-- 1 1001 127 3.4M Mar 20 13:11 decoder-epoch-99-avg-1.int8.onnx
-rw-r--r-- 1 1001 127  14M Mar 20 13:11 decoder-epoch-99-avg-1.onnx
-rw-r--r-- 1 1001 127  41M Mar 20 13:11 encoder-epoch-99-avg-1.int8.onnx
-rw-r--r-- 1 1001 127  85M Mar 20 13:11 encoder-epoch-99-avg-1.onnx
-rw-r--r-- 1 1001 127 3.1M Mar 20 13:11 joiner-epoch-99-avg-1.int8.onnx
-rw-r--r-- 1 1001 127  13M Mar 20 13:11 joiner-epoch-99-avg-1.onnx
drwxr-xr-x 2 1001 127 4.0K Mar 20 13:11 test_wavs
-rw-r--r-- 1 1001 127  55K Mar 20 13:11 tokens.txt
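A truncated or failed download usually shows up as a file that is much smaller than expected. The size check described above can be sketched as a small shell helper. The function names `has_min_size` and `check_model_dir` and the byte threshold are illustrative assumptions, not part of sherpa-onnx.

```shell
# Succeeds if file $1 exists and is at least $2 bytes long.
has_min_size() {
  [ -f "$1" ] || return 1
  size=$(wc -c < "$1" | tr -d '[:space:]')
  [ "$size" -ge "$2" ]
}

# Report every *.onnx file in directory $1 that is smaller than $2 bytes.
# If the directory contains no *.onnx file at all, the unexpanded pattern
# itself is reported as missing. Returns non-zero if anything was reported.
check_model_dir() {
  status=0
  for f in "$1"/*.onnx; do
    if ! has_min_size "$f" "$2"; then
      echo "suspiciously small or missing: $f"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_model_dir ./sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16 1000000` flags any *.onnx file under roughly 1 MB, which is below the smallest *.onnx size in the listing above (3.1M).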

Hint

There are two sub-folders in the model directory: 64 and 96. Each number is a chunk size. The larger the chunk size, the lower the real-time factor (RTF). The default chunk size is 32.
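RTF is conventionally the processing time divided by the duration of the audio, so a value below 1 means decoding runs faster than real time. A larger chunk lets the encoder process more audio per call, which typically lowers the RTF at the cost of higher latency. The arithmetic can be sketched as follows. The helper name `rtf` and the numbers are hypothetical and are not measurements of this model.

```shell
# Real-time factor: processing time divided by audio duration (both in seconds).
rtf() {
  awk -v elapsed="$1" -v duration="$2" 'BEGIN { printf "%.3f\n", elapsed / duration }'
}

# Hypothetical numbers for illustration only: 0.5 s spent decoding a 10 s recording.
rtf 0.5 10   # prints 0.050
```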

Decode a single wave file

Hint

Only single-channel wave files with 16-bit encoded samples are supported. The sampling rate does not need to be 16 kHz.
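Before decoding, you may want to confirm that a file meets these requirements. The following is a minimal sketch that reads the channel count and bits per sample straight from the WAV header. The helper names `le16` and `wav_is_mono_16bit` are hypothetical, and the byte offsets assume the canonical layout in which the "fmt " chunk directly follows the RIFF header.

```shell
# Read an unsigned little-endian 16-bit integer at byte offset $2 of file $1.
le16() {
  od -An -t u1 -j "$2" -N 2 "$1" | awk '{ print $1 + $2 * 256 }'
}

# Succeeds only for mono, 16-bit files. Assumes the canonical WAV layout in
# which the "fmt " chunk directly follows the 12-byte RIFF header, so the
# channel count sits at offset 22 and the bits per sample at offset 34.
wav_is_mono_16bit() {
  channels=$(le16 "$1" 22)
  bits=$(le16 "$1" 34)
  [ "${channels:-0}" -eq 1 ] && [ "${bits:-0}" -eq 16 ]
}
```

If the check fails, a general-purpose converter can usually produce a suitable file, for example `ffmpeg -i input.mp3 -ac 1 -c:a pcm_s16le output.wav`.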

fp32

The following code shows how to use fp32 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner-epoch-99-avg-1.onnx \
  ./sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16/test_wavs/0.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

./code-zipformer/sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16.txt

int8

The following code shows how to use int8 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder-epoch-99-avg-1.int8.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner-epoch-99-avg-1.int8.onnx \
  ./sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16/test_wavs/0.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

./code-zipformer/sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16.int8.txt
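The fp32 and int8 commands above differ only in which encoder and joiner files they pass. A small wrapper can make that choice explicit. This is a sketch under assumptions: the helper name `sherpa_decode_cmd` is hypothetical, it assumes the `epoch-99-avg-1` file naming used by this model, and it only prints the command rather than running it.

```shell
# Print the sherpa-onnx command for a model directory.
#   $1: model directory   $2: "fp32" or "int8"   $3: wave file
# Mirrors the commands above: "int8" selects the quantized encoder and joiner
# and keeps the fp32 decoder.
sherpa_decode_cmd() {
  dir=$1
  suffix=""
  if [ "$2" = "int8" ]; then
    suffix=".int8"
  fi
  printf '%s\n' "./build/bin/sherpa-onnx --tokens=$dir/tokens.txt --encoder=$dir/encoder-epoch-99-avg-1$suffix.onnx --decoder=$dir/decoder-epoch-99-avg-1.onnx --joiner=$dir/joiner-epoch-99-avg-1$suffix.onnx $3"
}
```

Calling `sherpa_decode_cmd ./sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16 int8 ./sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16/test_wavs/0.wav` prints the int8 command shown above as a single line, which you can inspect and then paste into your shell.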

Real-time speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone \
  --tokens=./sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder-epoch-99-avg-1.int8.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner-epoch-99-avg-1.int8.onnx

Hint

If your system is Linux (including embedded Linux), you can also use sherpa-onnx-alsa to do real-time speech recognition with your microphone if sherpa-onnx-microphone does not work for you.

csukuangfj/sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23 (Chinese)

Hint

It is a small model.

This model is from

https://huggingface.co/marcoyang/sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/

which supports only Chinese as it is trained on the WenetSpeech corpus.

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23.tar.bz2

# For Chinese users, you can use the following mirror
# wget https://hub.nuaa.cf/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23.tar.bz2

tar xvf sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23.tar.bz2
rm sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23.tar.bz2

Please check that the file sizes of the pre-trained models are correct. See the file sizes of *.onnx files below.

sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23 fangjun$ ls -lh *.onnx
-rw-r--r--  1 fangjun  staff   1.8M Sep 10 15:31 decoder-epoch-99-avg-1.int8.onnx
-rw-r--r--  1 fangjun  staff   7.2M Sep 10 15:31 decoder-epoch-99-avg-1.onnx
-rw-r--r--  1 fangjun  staff    21M Sep 10 15:31 encoder-epoch-99-avg-1.int8.onnx
-rw-r--r--  1 fangjun  staff    39M Sep 10 15:31 encoder-epoch-99-avg-1.onnx
-rw-r--r--  1 fangjun  staff   1.7M Sep 10 15:31 joiner-epoch-99-avg-1.int8.onnx
-rw-r--r--  1 fangjun  staff   6.8M Sep 10 15:31 joiner-epoch-99-avg-1.onnx

Decode a single wave file

Hint

Only single-channel wave files with 16-bit encoded samples are supported. The sampling rate does not need to be 16 kHz.

fp32

The following code shows how to use fp32 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23/joiner-epoch-99-avg-1.onnx \
  ./sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23/test_wavs/0.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

./code-zipformer/sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23.txt

int8

The following code shows how to use int8 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23/encoder-epoch-99-avg-1.int8.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23/joiner-epoch-99-avg-1.int8.onnx \
  ./sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23/test_wavs/0.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You should see the following output:

./code-zipformer/sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23-int8.txt

Real-time speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone \
  --tokens=./sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23/joiner-epoch-99-avg-1.onnx

Hint

If your system is Linux (including embedded Linux), you can also use sherpa-onnx-alsa to do real-time speech recognition with your microphone if sherpa-onnx-microphone does not work for you.

csukuangfj/sherpa-onnx-streaming-zipformer-en-20M-2023-02-17 (English)

Hint

It is a small model.

This model is from

https://huggingface.co/desh2608/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-small

which supports only English as it is trained on the LibriSpeech corpus.

In the following, we describe how to download it and use it with sherpa-onnx.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-onnx

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-en-20M-2023-02-17.tar.bz2

# For Chinese users, you can use the following mirror
# wget https://hub.nuaa.cf/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-streaming-zipformer-en-20M-2023-02-17.tar.bz2

tar xvf sherpa-onnx-streaming-zipformer-en-20M-2023-02-17.tar.bz2
rm sherpa-onnx-streaming-zipformer-en-20M-2023-02-17.tar.bz2

Please check that the file sizes of the pre-trained models are correct. See the file sizes of *.onnx files below.

sherpa-onnx-streaming-zipformer-en-20M-2023-02-17 fangjun$ ls -lh *.onnx
-rw-r--r--  1 fangjun  staff   527K Sep 10 17:06 decoder-epoch-99-avg-1.int8.onnx
-rw-r--r--  1 fangjun  staff   2.0M Sep 10 17:06 decoder-epoch-99-avg-1.onnx
-rw-r--r--  1 fangjun  staff    41M Sep 10 17:06 encoder-epoch-99-avg-1.int8.onnx
-rw-r--r--  1 fangjun  staff    85M Sep 10 17:06 encoder-epoch-99-avg-1.onnx
-rw-r--r--  1 fangjun  staff   253K Sep 10 17:06 joiner-epoch-99-avg-1.int8.onnx
-rw-r--r--  1 fangjun  staff   1.0M Sep 10 17:06 joiner-epoch-99-avg-1.onnx

Decode a single wave file

Hint

Only single-channel wave files with 16-bit encoded samples are supported. The sampling rate does not need to be 16 kHz.

fp32

The following code shows how to use fp32 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/joiner-epoch-99-avg-1.onnx \
  ./sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/test_wavs/0.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

You should see the following output:

./code-zipformer/sherpa-onnx-streaming-zipformer-en-20M-2023-02-17.txt

int8

The following code shows how to use int8 models to decode a wave file:

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/encoder-epoch-99-avg-1.int8.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/joiner-epoch-99-avg-1.int8.onnx \
  ./sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/test_wavs/0.wav

Note

Please use ./build/bin/Release/sherpa-onnx.exe for Windows.

You should see the following output:

./code-zipformer/sherpa-onnx-streaming-zipformer-en-20M-2023-02-17-int8.txt

Real-time speech recognition from a microphone

cd /path/to/sherpa-onnx

./build/bin/sherpa-onnx-microphone \
  --tokens=./sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/tokens.txt \
  --encoder=./sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-streaming-zipformer-en-20M-2023-02-17/joiner-epoch-99-avg-1.onnx

Hint

If your system is Linux (including embedded Linux), you can also use sherpa-onnx-alsa to do real-time speech recognition with your microphone if sherpa-onnx-microphone does not work for you.