LSTM-transducer-based Models

Hint

Please refer to install_sherpa_ncnn to install sherpa-ncnn before you read this section.

marcoyang/sherpa-ncnn-lstm-transducer-small-2023-02-13 (Bilingual, Chinese + English)

This model is a small version of lstm-transducer trained in icefall.

It has only 13.3 million parameters, so it can be deployed on embedded devices for real-time speech recognition. You can find the models in fp16 format at https://huggingface.co/marcoyang/sherpa-ncnn-lstm-transducer-small-2023-02-13.

The model is trained on the bilingual dataset tal_csasr (Chinese + English), so it can be used for both Chinese and English.

In the following, we show you how to download it and deploy it with sherpa-ncnn.

Please use the following commands to download it.

cd /path/to/sherpa-ncnn

wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-lstm-transducer-small-2023-02-13.tar.bz2
tar xvf sherpa-ncnn-lstm-transducer-small-2023-02-13.tar.bz2
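
After extraction, it can help to confirm that the directory contains every file the decoding commands in this section pass to the binaries. The following is a minimal sketch; check_model_dir is a hypothetical helper, not part of sherpa-ncnn, and the file list is taken from the commands shown in this section.

```shell
# Hypothetical helper (not part of sherpa-ncnn): verify that a model
# directory contains the seven files used by the commands in this section.
# Prints each missing file to stderr and returns non-zero if any is absent.
check_model_dir() {
  dir=$1
  missing=0
  for f in \
    tokens.txt \
    encoder_jit_trace-pnnx.ncnn.param \
    encoder_jit_trace-pnnx.ncnn.bin \
    decoder_jit_trace-pnnx.ncnn.param \
    decoder_jit_trace-pnnx.ncnn.bin \
    joiner_jit_trace-pnnx.ncnn.param \
    joiner_jit_trace-pnnx.ncnn.bin
  do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_model_dir ./sherpa-ncnn-lstm-transducer-small-2023-02-13 prints nothing and returns 0 when the archive was extracted completely.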

Note

Please refer to sherpa-ncnn-embedded-linux-arm-install for how to compile sherpa-ncnn for a 32-bit ARM platform.

Decode a single wave file with ./build/bin/sherpa-ncnn

Hint

Only single-channel wave files with a sampling rate of 16 kHz are supported.

cd /path/to/sherpa-ncnn

./build/bin/sherpa-ncnn \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/tokens.txt \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/encoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/decoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/joiner_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-lstm-transducer-small-2023-02-13/test_wavs/0.wav
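
Before decoding your own recordings, you may want to check that they meet the single-channel, 16 kHz requirement. The sketch below reads the two fields directly from the file header with od. Both helpers are hypothetical. They assume a canonical RIFF/WAVE layout with the fmt chunk first (channel count at byte offset 22, sample rate at offset 24) and a little-endian host, because od interprets multi-byte integers in host byte order. Files with extra chunks before fmt need a proper audio tool instead.

```shell
# Hypothetical helper: print "<channels> <sample_rate>" for a wave file.
# Assumes a canonical RIFF/WAVE header and a little-endian host.
wav_info() {
  channels=$(od -An -t u2 -j 22 -N 2 "$1" | tr -dc '0-9')
  rate=$(od -An -t u4 -j 24 -N 4 "$1" | tr -dc '0-9')
  echo "$channels $rate"
}

# Hypothetical helper: succeeds only for single-channel, 16 kHz input.
is_supported_wav() {
  [ "$(wav_info "$1")" = "1 16000" ]
}
```

For example, is_supported_wav my.wav || echo "convert my.wav to 16 kHz mono first" flags a file that needs resampling or downmixing.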

Note

The default option uses 4 threads and greedy_search for decoding.
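
To override these defaults, append the number of threads and the decoding method after the wave file, as the commands for the other models in this section do. The wrapper below is a minimal sketch. The function name decode and the SHERPA_NCNN_BIN variable are hypothetical conveniences, not part of sherpa-ncnn.

```shell
# Hypothetical wrapper around the invocation shown in this section.
# usage: decode MODEL_DIR WAV [NUM_THREADS] [METHOD]
decode() {
  model_dir=$1
  wav=$2
  num_threads=${3:-4}          # 4 threads is the documented default
  method=${4:-greedy_search}   # greedy_search is the documented default
  # SHERPA_NCNN_BIN is a hypothetical override for the binary location,
  # e.g. ./build/bin/Release/sherpa-ncnn.exe on Windows.
  "${SHERPA_NCNN_BIN:-./build/bin/sherpa-ncnn}" \
    "$model_dir/tokens.txt" \
    "$model_dir/encoder_jit_trace-pnnx.ncnn.param" \
    "$model_dir/encoder_jit_trace-pnnx.ncnn.bin" \
    "$model_dir/decoder_jit_trace-pnnx.ncnn.param" \
    "$model_dir/decoder_jit_trace-pnnx.ncnn.bin" \
    "$model_dir/joiner_jit_trace-pnnx.ncnn.param" \
    "$model_dir/joiner_jit_trace-pnnx.ncnn.bin" \
    "$wav" \
    "$num_threads" \
    "$method"
}
```

For example, decode ./sherpa-ncnn-lstm-transducer-small-2023-02-13 ./sherpa-ncnn-lstm-transducer-small-2023-02-13/test_wavs/0.wav 2 modified_beam_search decodes the test file with 2 threads and modified_beam_search.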

Note

Please use ./build/bin/Release/sherpa-ncnn.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

csukuangfj/sherpa-ncnn-2022-09-05 (English)

This model is trained on the GigaSpeech and LibriSpeech datasets.

Please see k2-fsa/icefall#558 for how the model is trained.

You can find the training code at

https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/lstm_transducer_stateless2

In the following, we describe how to download it and use it with sherpa-ncnn.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-ncnn

wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-2022-09-05.tar.bz2
tar xvf sherpa-ncnn-2022-09-05.tar.bz2

Decode a single wave file

Hint

Only single-channel wave files with a sampling rate of 16 kHz are supported.

cd /path/to/sherpa-ncnn

for method in greedy_search modified_beam_search; do
  ./build/bin/sherpa-ncnn \
    ./sherpa-ncnn-2022-09-05/tokens.txt \
    ./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-2022-09-05/test_wavs/1089-134686-0001.wav \
    2 \
    $method
done

The expected output is recorded in the following file, referenced relative to this document's source:

./code-lstm/2022-09-05.txt

Note

Please use ./build/bin/Release/sherpa-ncnn.exe for Windows.

Real-time speech recognition from a microphone

cd /path/to/sherpa-ncnn

./build/bin/sherpa-ncnn-microphone \
  ./sherpa-ncnn-2022-09-05/tokens.txt \
  ./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-2022-09-05/encoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-2022-09-05/decoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-2022-09-05/joiner_jit_trace-pnnx.ncnn.bin \
  2 \
  greedy_search

Note

Please use ./build/bin/Release/sherpa-ncnn-microphone.exe for Windows.

The program prints something like the following (the thread count and the device list depend on your arguments and your machine):

Number of threads: 4
num devices: 4
Use default device: 2
  Name: MacBook Pro Microphone
  Max input channels: 1
Started

Speak and it will show you the recognition result in real-time.

You can find a demo below:

https://www.youtube.com/watch?v=m6ynSxycpX0

csukuangfj/sherpa-ncnn-2022-09-30 (Chinese)

This is a model trained using the WenetSpeech dataset.

Please see k2-fsa/icefall#595 for how the model is trained.

In the following, we describe how to download it and use it with sherpa-ncnn.

Download the model

Please use the following commands to download it.

cd /path/to/sherpa-ncnn

wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-2022-09-30.tar.bz2
tar xvf sherpa-ncnn-2022-09-30.tar.bz2
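
The three archives in this section share one release URL pattern, so the download and extraction steps can be wrapped in a small helper. This is a minimal sketch. model_url and fetch_model are hypothetical names, and the pattern is inferred only from the three download links shown in this section.

```shell
# Hypothetical helper: build the release URL for a model archive name,
# following the pattern of the download links in this section.
model_url() {
  echo "https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/$1.tar.bz2"
}

# Hypothetical helper: download one model archive into the current
# directory and extract it. Requires wget and tar.
fetch_model() {
  wget "$(model_url "$1")" && tar xvf "$1.tar.bz2"
}
```

For example, fetch_model sherpa-ncnn-2022-09-30 runs the same two commands as above.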

Decode a single wave file

Hint

Only single-channel wave files with a sampling rate of 16 kHz are supported.

cd /path/to/sherpa-ncnn

for method in greedy_search modified_beam_search; do
  ./build/bin/sherpa-ncnn \
    ./sherpa-ncnn-2022-09-30/tokens.txt \
    ./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.param \
    ./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.bin \
    ./sherpa-ncnn-2022-09-30/test_wavs/0.wav \
    2 \
    $method
done

The expected output is recorded in the following file, referenced relative to this document's source:

./code-lstm/2022-09-30.txt

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

Real-time speech recognition from a microphone

cd /path/to/sherpa-ncnn

./build/bin/sherpa-ncnn-microphone \
  ./sherpa-ncnn-2022-09-30/tokens.txt \
  ./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-2022-09-30/encoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-2022-09-30/decoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-2022-09-30/joiner_jit_trace-pnnx.ncnn.bin \
  2 \
  greedy_search

Note

Please use ./build/bin/Release/sherpa-ncnn-microphone.exe for Windows.

Caution

If you use Windows and get encoding issues, please run:

CHCP 65001

in your command line.

You can find a demo below:

https://www.youtube.com/watch?v=bbQfoRT75oM
