<a href="https://colab.research.google.com/github/k2-fsa/colab/blob/master/sherpa-onnx/sherpa_onnx_streaming_paraformer_gpu.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

This Colab notebook shows how to use [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) to run streaming [Paraformer](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary) models on **GPU**.

Please refer to
https://github.com/k2-fsa/colab/blob/master/sherpa-onnx/sherpa_onnx_streaming_paraformer_cpu.ipynb
for how to run on CPU.

# Install sherpa-onnx

Please refer to
https://k2-fsa.github.io/sherpa/onnx/install/linux.html
for how to build sherpa-onnx with GPU support.

In [1]:
%%shell


git clone https://github.com/k2-fsa/sherpa-onnx
cd sherpa-onnx
mkdir build
cd build
cmake \
  -DCMAKE_BUILD_TYPE=Release \
  -DBUILD_SHARED_LIBS=ON \
  -DSHERPA_ONNX_ENABLE_GPU=ON \
  -DSHERPA_ONNX_ENABLE_PYTHON=ON ..
make -j6


Cloning into 'sherpa-onnx'...
remote: Enumerating objects: 2855, done.
remote: Counting objects: 100% (1332/1332), done.
remote: Compressing objects: 100% (532/532), done.
remote: Total 2855 (delta 915), reused 1006 (delta 780), pack-reused 1523
Receiving objects: 100% (2855/2855), 2.02 MiB | 3.02 MiB/s, done.
Resolving deltas: 100% (1647/1647), done.
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
  Compiling for NVIDIA GPU is enabled.  Please make sure cudatoolkit

  is installed on your system.  Otherwise, you will get err
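
The CMake message above asks that the CUDA toolkit be installed before building with `-DSHERPA_ONNX_ENABLE_GPU=ON`. The following is a minimal sketch of a pre-build check, assuming the driver and toolkit expose `nvidia-smi` and `nvcc` on `PATH` (an assumption; some installations keep them elsewhere). The helper name `find_cuda_tools` is hypothetical and not part of sherpa-onnx.

```python
import shutil


def find_cuda_tools(names=("nvidia-smi", "nvcc")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Assumption: a missing tool only means it is not on PATH,
    # not necessarily that CUDA is absent from the machine.
    for tool, path in find_cuda_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

If either tool is reported as missing, it may be worth confirming that the Colab runtime type is set to GPU before starting the build.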



Check that `sherpa-onnx` is built successfully:

In [2]:
%%shell

export PATH=$PWD/sherpa-onnx/build/bin:$PATH
export PYTHONPATH=$PWD/sherpa-onnx/build/lib:$PYTHONPATH
export PYTHONPATH=$PWD/sherpa-onnx/sherpa-onnx/python:$PYTHONPATH

python3 -c "import sherpa_onnx; print(sherpa_onnx.__file__)"

sherpa-onnx --help

/content/sherpa-onnx/sherpa-onnx/python/sherpa_onnx/__init__.py
/content/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:PrintUsage:402 

Usage:

  ./bin/sherpa-onnx \
    --tokens=/path/to/tokens.txt \
    --encoder=/path/to/encoder.onnx \
    --decoder=/path/to/decoder.onnx \
    --joiner=/path/to/joiner.onnx \
    --provider=cpu \
    --num-threads=2 \
    --decoding-method=greedy_search \
    /path/to/foo.wav [bar.wav foobar.wav ...]

Note: It supports decoding multiple files in batches

Default value for num_threads is 2.
Valid values for decoding_method: greedy_search (default), modified_beam_search.
Valid values for provider: cpu (default), cuda, coreml.
foo.wav should be of single channel, 16-bit PCM encoded wave file; its
sampling rate can be arbitrary and does not need to be 16kHz.

Please refer to
https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html
for a list of pre-trained models to download.

Options:
  --max-active-paths          : beam size used in modified b
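
The `%%shell` cells in this notebook repeat their `export PATH=...` lines, which suggests that exports made in one cell do not carry over to the next. As a sketch of an alternative, assuming child processes inherit the environment of the Python kernel, the same directories can be prepended once from a Python cell. `prepend_path` is a hypothetical helper, and the directories are the ones used in the cell above.

```python
import os


def prepend_path(env, var, directory):
    """Prepend `directory` to the os.pathsep-separated list in env[var]."""
    current = env.get(var, "")
    # Drop empty entries and any earlier copy of `directory` to avoid duplicates.
    parts = [p for p in current.split(os.pathsep) if p and p != directory]
    env[var] = os.pathsep.join([directory] + parts)
    return env[var]


if __name__ == "__main__":
    # Assumption: the repository was cloned into the current working directory.
    root = os.path.join(os.getcwd(), "sherpa-onnx")
    prepend_path(os.environ, "PATH", os.path.join(root, "build", "bin"))
    prepend_path(os.environ, "PYTHONPATH", os.path.join(root, "build", "lib"))
    prepend_path(os.environ, "PYTHONPATH", os.path.join(root, "sherpa-onnx", "python"))
    print(os.environ["PYTHONPATH"])
```

Note that changing `PYTHONPATH` this way affects only processes started afterwards; it does not alter `sys.path` of the kernel that is already running.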



# Download pre-trained streaming Paraformer model

Please see
https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-paraformer/index.html
for details.

In [4]:
%%shell

sudo apt-get install -y git-lfs
git clone https://huggingface.co/csukuangfj/sherpa-onnx-streaming-paraformer-bilingual-zh-en
ls -lh sherpa-onnx-streaming-paraformer-bilingual-zh-en

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
git-lfs is already the newest version (3.0.2-1ubuntu0.2).
0 upgraded, 0 newly installed, 0 to remove and 16 not upgraded.
Cloning into 'sherpa-onnx-streaming-paraformer-bilingual-zh-en'...
remote: Enumerating objects: 18, done.
remote: Counting objects: 100% (18/18), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 18 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (18/18), 949.80 KiB | 10.10 MiB/s, done.
Filtering content: 100% (4/4), 1.02 GiB | 71.57 MiB/s, done.
total 1.1G
-rw-r--r-- 1 root root  69M Aug 14 13:26 decoder.int8.onnx
-rw-r--r-- 1 root root 218M Aug 14 13:26 decoder.onnx
-rw-r--r-- 1 root root 158M Aug 14 13:26 encoder.int8.onnx
-rw-r--r-- 1 root root 607M Aug 14 13:26 encoder.onnx
-rw-r--r-- 1 root root  415 Aug 14 13:26 README.md
drwxr-xr-x 2 root root 4.0K Aug 14 13:26 test_wavs
-rw-r--r-- 1 root root  74K Aug 14 13:26 
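
The usage text printed by `sherpa-onnx --help` says that input files should be single-channel, 16-bit PCM WAV files with an arbitrary sampling rate. The sketch below checks a file against that description using Python's standard `wave` module. The helper names `describe_wav` and `is_mono_16bit` are hypothetical, and the check covers only what the usage text states.

```python
import wave


def describe_wav(path):
    """Return channel count, bits per sample, sample rate and duration of a WAV file."""
    with wave.open(path, "rb") as f:
        frames = f.getnframes()
        rate = f.getframerate()
        return {
            "channels": f.getnchannels(),
            "bits_per_sample": 8 * f.getsampwidth(),  # sample width is in bytes
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


def is_mono_16bit(info):
    """True if the file is single channel with 16-bit samples."""
    return info["channels"] == 1 and info["bits_per_sample"] == 16
```

For example, `is_mono_16bit(describe_wav("./sherpa-onnx-streaming-paraformer-bilingual-zh-en/test_wavs/0.wav"))` would be one way to check the test file used later in this notebook. The `wave` module reads only PCM data and raises `wave.Error` for other encodings.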



# Real-time factor (RTF) test
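
The real-time factor is conventionally defined as the time spent decoding divided by the duration of the audio, so a value below 1 means decoding runs faster than real time. A minimal sketch of that arithmetic follows; the function name and the numbers are illustrative only and are not measurements from this notebook.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent decoding / duration of the audio decoded."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Illustrative numbers only, not results from the runs below.
    rtf = real_time_factor(processing_seconds=2.0, audio_seconds=10.0)
    print(f"RTF = {rtf:.3f}")  # values below 1 mean faster than real time
```

When timing a whole process from the outside, the elapsed time also includes model loading, so an RTF computed that way would overestimate the decoding cost.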

## float32 (CPU)

In [5]:
%%shell
export PATH=$PWD/sherpa-onnx/build/bin:$PATH

sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt \
  --paraformer-encoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.onnx \
  --paraformer-decoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.onnx \
  --num-threads=1 \
  --provider=cpu \
  ./sherpa-onnx-streaming-paraformer-bilingual-zh-en/test_wavs/0.wav


/content/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx --tokens=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt --paraformer-encoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.onnx --paraformer-decoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.onnx --num-threads=1 --provider=cpu ./sherpa-onnx-streaming-paraformer-bilingual-zh-en/test_wavs/0.wav 

OnlineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OnlineModelConfig(transducer=OnlineTransducerModelConfig(encoder="", decoder="", joiner=""), paraformer=OnlineParaformerModelConfig(encoder="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.onnx", decoder="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.onnx"), tokens="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type=""), lm_config=OnlineLMConfig(model="", scale=0.5), endpoint_con



## float32 (GPU)

In [6]:
%%shell
export PATH=$PWD/sherpa-onnx/build/bin:$PATH

sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt \
  --paraformer-encoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.onnx \
  --paraformer-decoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.onnx \
  --num-threads=1 \
  --provider=cuda \
  ./sherpa-onnx-streaming-paraformer-bilingual-zh-en/test_wavs/0.wav

/content/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx --tokens=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt --paraformer-encoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.onnx --paraformer-decoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.onnx --num-threads=1 --provider=cuda ./sherpa-onnx-streaming-paraformer-bilingual-zh-en/test_wavs/0.wav 

OnlineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OnlineModelConfig(transducer=OnlineTransducerModelConfig(encoder="", decoder="", joiner=""), paraformer=OnlineParaformerModelConfig(encoder="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.onnx", decoder="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.onnx"), tokens="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt", num_threads=1, debug=False, provider="cuda", model_type=""), lm_config=OnlineLMConfig(model="", scale=0.5), endpoint_c



## int8 (CPU)

In [7]:
%%shell
export PATH=$PWD/sherpa-onnx/build/bin:$PATH

sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt \
  --paraformer-encoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.int8.onnx \
  --paraformer-decoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.int8.onnx \
  --num-threads=1 \
  --provider=cpu \
  ./sherpa-onnx-streaming-paraformer-bilingual-zh-en/test_wavs/0.wav

/content/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx --tokens=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt --paraformer-encoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.int8.onnx --paraformer-decoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.int8.onnx --num-threads=1 --provider=cpu ./sherpa-onnx-streaming-paraformer-bilingual-zh-en/test_wavs/0.wav 

OnlineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OnlineModelConfig(transducer=OnlineTransducerModelConfig(encoder="", decoder="", joiner=""), paraformer=OnlineParaformerModelConfig(encoder="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.int8.onnx", decoder="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.int8.onnx"), tokens="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt", num_threads=1, debug=False, provider="cpu", model_type=""), lm_config=OnlineLMConfig(model="", scal



## int8 (GPU)

In [8]:
%%shell
export PATH=$PWD/sherpa-onnx/build/bin:$PATH

sherpa-onnx \
  --tokens=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt \
  --paraformer-encoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.int8.onnx \
  --paraformer-decoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.int8.onnx \
  --num-threads=1 \
  --provider=cuda \
  ./sherpa-onnx-streaming-paraformer-bilingual-zh-en/test_wavs/0.wav

/content/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 sherpa-onnx --tokens=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt --paraformer-encoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.int8.onnx --paraformer-decoder=./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.int8.onnx --num-threads=1 --provider=cuda ./sherpa-onnx-streaming-paraformer-bilingual-zh-en/test_wavs/0.wav 

OnlineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OnlineModelConfig(transducer=OnlineTransducerModelConfig(encoder="", decoder="", joiner=""), paraformer=OnlineParaformerModelConfig(encoder="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/encoder.int8.onnx", decoder="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/decoder.int8.onnx"), tokens="./sherpa-onnx-streaming-paraformer-bilingual-zh-en/tokens.txt", num_threads=1, debug=False, provider="cuda", model_type=""), lm_config=OnlineLMConfig(model="", sc
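
The four runs above differ only in the model file suffix (`.onnx` versus `.int8.onnx`) and the `--provider` value. As a sketch, the argument list for any of them can be generated from those two choices. `build_command` is a hypothetical helper; the flags and file names are the ones used in the cells above, and the provider list is limited to the two values exercised in this notebook.

```python
import shlex


def build_command(model_dir, precision, provider, wav, num_threads=1):
    """Build the sherpa-onnx argument list for one precision/provider pair."""
    if precision not in ("float32", "int8"):
        raise ValueError(f"unsupported precision: {precision}")
    if provider not in ("cpu", "cuda"):
        raise ValueError(f"unsupported provider: {provider}")
    suffix = ".int8" if precision == "int8" else ""
    return [
        "sherpa-onnx",
        f"--tokens={model_dir}/tokens.txt",
        f"--paraformer-encoder={model_dir}/encoder{suffix}.onnx",
        f"--paraformer-decoder={model_dir}/decoder{suffix}.onnx",
        f"--num-threads={num_threads}",
        f"--provider={provider}",
        wav,
    ]


if __name__ == "__main__":
    model_dir = "./sherpa-onnx-streaming-paraformer-bilingual-zh-en"
    wav = f"{model_dir}/test_wavs/0.wav"
    for precision in ("float32", "int8"):
        for provider in ("cpu", "cuda"):
            # Print only; pass the list to subprocess.run to actually execute it.
            print(shlex.join(build_command(model_dir, precision, provider, wav)))
```

Returning a list rather than a single string keeps each argument separate, which avoids quoting problems if the command is later passed to `subprocess.run`.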



Please refer to
https://github.com/k2-fsa/colab/blob/master/sherpa-onnx/sherpa_onnx_streaming_paraformer_cpu.ipynb
for other use cases.