<a href="https://colab.research.google.com/github/csukuangfj/colab/blob/master/sherpa_standalone_offline_transducer_speech_recognition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

This colab notebook demonstrates how to use [sherpa][sherpa]
for offline (i.e., non-streaming) speech recognition.

It includes:
- How to set up the environment
- How to download pre-trained models
- How to use the pre-trained models for speech recognition


[sherpa]: https://github.com/k2-fsa/sherpa

# Set up the environment

## Install PyTorch

Colab has already installed PyTorch for us. Let us check its version.

In [1]:
! python3 -c "import torch; print(torch.__version__)"

2.0.1+cu118


## Install k2

We follow https://k2-fsa.github.io/k2/installation/from_wheels.html to install [k2][k2].

Since the installed torch is of version `2.0.1+cu118`, we have to install a version of `k2` that is compiled against `torch 2.0.1+cu118`.

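From https://k2-fsa.github.io/k2/cuda.html we know the latest version built for this combination is `1.24.3.dev20230718+cuda11.8.torch2.0.1`. Colab runs Python 3.10, so `pip` selects the wheel `k2-1.24.3.dev20230718+cuda11.8.torch2.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl`. We use the following command to install [k2][k2]: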

[k2]: https://github.com/k2-fsa/k2
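
The version matching described above can be sketched in a few lines of Python. This is only an illustration: `k2_requirement` is a hypothetical helper, not part of k2, and it assumes that the last digit of the `cuXYZ` suffix reported by torch is the CUDA minor version. The k2 release itself (`1.24.3.dev20230718` here) still has to be looked up on the wheel index.

```python
import re


def k2_requirement(torch_version: str, k2_version: str) -> str:
    """Build a pip requirement for the k2 wheel matching a CUDA build of torch.

    torch_version: value of torch.__version__, e.g. "2.0.1+cu118".
    k2_version: the k2 release to install, e.g. "1.24.3.dev20230718"
                (taken from the k2 wheel index; it is not derived here).
    """
    match = re.fullmatch(r"(\d+\.\d+\.\d+)\+cu(\d+)", torch_version)
    if match is None:
        raise ValueError(f"not a CUDA build of torch: {torch_version!r}")
    base, cuda_digits = match.groups()
    # Assumption: the last digit is the CUDA minor version ("118" -> "11.8").
    cuda = f"{cuda_digits[:-1]}.{cuda_digits[-1]}"
    return f"k2=={k2_version}+cuda{cuda}.torch{base}"


print(k2_requirement("2.0.1+cu118", "1.24.3.dev20230718"))
# prints: k2==1.24.3.dev20230718+cuda11.8.torch2.0.1
```

The printed string is the requirement passed to `pip install` in the next cell.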

In [2]:
! pip install k2==1.24.3.dev20230718+cuda11.8.torch2.0.1 -f https://k2-fsa.github.io/k2/cuda.html

Looking in links: https://k2-fsa.github.io/k2/cuda.html
Collecting k2==1.24.3.dev20230718+cuda11.8.torch2.0.1
  Downloading https://huggingface.co/csukuangfj/k2/resolve/main/cuda/k2-1.24.3.dev20230718%2Bcuda11.8.torch2.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (117.9 MB)
Installing collected packages: k2
Successfully installed k2-1.24.3.dev20230718+cuda11.8.torch2.0.1


Check that [k2][k2] has been successfully installed:

[k2]: https://github.com/k2-fsa/k2

In [3]:
! python3 -m k2.version

Collecting environment information...

k2 version: 1.24.3
Build type: Release
Git SHA1: e400fa3b456faf8afe0ee5bfe572946b4921a3db
Git date: Sat Jul 15 04:21:50 2023
Cuda used to build k2: 11.8
cuDNN used to build k2: 
Python version used to build k2: 3.10
OS used to build k2: CentOS Linux release 7.9.2009 (Core)
CMake version: 3.26.4
GCC version: 9.3.1
CMAKE_CUDA_FLAGS:  -Wno-deprecated-gpu-targets   -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w  --expt-extended-lambda -gencode arch=compute_35,code=sm_35  -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w  --expt-extended-lambda -gencode arch=compute_50,code=sm_50  -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w  --expt-extended-lambda -gencode arch=compute_60,code=sm_60  -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w  --expt-extended-lambda -gencode arch=compute_61,code=sm_61  -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w  --expt-extended-lambda -gencode arch=compute_70,code=sm_7

## Install Kaldifeat

We use the method from https://csukuangfj.github.io/kaldifeat/installation/from_wheels.html#linux-cuda
to install `kaldifeat`.

In [4]:
! pip install kaldifeat==1.24.dev20230722+cuda11.8.torch2.0.1 -f https://csukuangfj.github.io/kaldifeat/cuda.html

Looking in links: https://csukuangfj.github.io/kaldifeat/cuda.html
Collecting kaldifeat==1.24.dev20230722+cuda11.8.torch2.0.1
  Downloading https://huggingface.co/csukuangfj/kaldifeat/resolve/main/ubuntu-cuda/kaldifeat-1.24.dev20230722%2Bcuda11.8.torch2.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (574 kB)
Installing collected packages: kaldifeat
Successfully installed kaldifeat-1.24.dev20230722+cuda11.8.torch2.0.1


Check that `kaldifeat` has been installed successfully:

In [5]:
! python3 -c "import kaldifeat; print(kaldifeat.__version__)"

1.24.dev20230722+cuda11.8.torch2.0.1


## Install sherpa

In [6]:
! pip install https://huggingface.co/csukuangfj/sherpa/resolve/main/ubuntu-cuda/k2_sherpa-1.3.dev20230725%2Bcuda11.8.torch2.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Collecting k2-sherpa==1.3.dev20230725+cuda11.8.torch2.0.1
  Downloading https://huggingface.co/csukuangfj/sherpa/resolve/main/ubuntu-cuda/k2_sherpa-1.3.dev20230725%2Bcuda11.8.torch2.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB)
Installing collected packages: k2-sherpa
Successfully installed k2-sherpa-1.3.dev20230725+cuda11.8.torch2.0.1


Verify that `sherpa` has been installed successfully:

In [7]:
! sherpa-offline --help

[I] /var/www/sherpa/csrc/parse-options.cc:536:void sherpa::ParseOptions::PrintUsage(bool) const 2023-07-25 07:32:42.318 

Offline (non-streaming) automatic speech recognition with sherpa.

Usage:
(1) View help information.

  ./bin/sherpa-offline --help

(2) Use a pretrained model for recognition

  ./bin/sherpa-offline \
    --nn-model=/path/to/cpu_jit.pt \
    --tokens=/path/to/tokens.txt \
    --use-gpu=false \
    foo.wav \
    bar.wav

(3) Decode wav.scp

  ./bin/sherpa-offline \
    --nn-model=/path/to/cpu_jit.pt \
    --tokens=/path/to/tokens.txt \
    --use-gpu=false \
    --use-wav-scp=true \
    scp:wav.scp \
    ark,scp,t:results.ark,results.scp

(4) Decode feats.scp

  ./bin/sherpa-offline \
    --nn-model=/path/to/cpu_jit.pt \
    --tokens=/path/to/tokens.txt \
    --use-gpu=false \
    --use-feats-scp=true \
    scp:feats.scp \
    ark,scp,t:results.ark,results.scp

Caution: Models from icefall use normalized audio samples, i.e., samples in
the range [-1, 1), to compute f
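
The caution at the end of the help text refers to audio samples in the range [-1, 1). As a small illustration of what that range means for 16-bit PCM audio, the sketch below scales integer samples by 32768. `normalize_int16` is a hypothetical helper written for this notebook; it only shows the scaling and makes no claim about how `sherpa-offline` reads wave files internally.

```python
def normalize_int16(samples):
    """Scale 16-bit PCM integers to floats in [-1, 1)."""
    # int16 spans [-32768, 32767], so dividing by 32768 maps it into [-1, 1).
    return [s / 32768.0 for s in samples]


print(normalize_int16([-32768, 0, 16384]))
# prints: [-1.0, 0.0, 0.5]
```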

In [8]:
! python3 -c "import sherpa; print(sherpa.__file__)"

/usr/local/lib/python3.10/dist-packages/sherpa/__init__.py


In [9]:
! python3 -c "import sherpa; print(sherpa.__version__)"

1.3.dev20230725+cuda11.8.torch2.0.1
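
All three packages must be built against the same CUDA and torch versions. A quick way to double-check this is to compare the part of each version string after the `+`. The sketch below uses the version strings printed above; `build_tag` is a hypothetical helper, not an API of any of these packages.

```python
def build_tag(version: str) -> str:
    """Return the local build tag, i.e. the part after '+', or '' if absent."""
    return version.partition("+")[2]


# Version strings copied from the pip and version outputs above.
versions = {
    "k2": "1.24.3.dev20230718+cuda11.8.torch2.0.1",
    "kaldifeat": "1.24.dev20230722+cuda11.8.torch2.0.1",
    "sherpa": "1.3.dev20230725+cuda11.8.torch2.0.1",
}
tags = {name: build_tag(v) for name, v in versions.items()}
print(tags)

# All packages should share one build tag.
assert len(set(tags.values())) == 1, f"mismatched builds: {tags}"
```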


In [10]:
! which sherpa-online

/usr/local/bin/sherpa-online


# Download pre-trained models

Many pre-trained models are available for download at
https://k2-fsa.github.io/sherpa/cpp/pretrained_models/offline_transducer.html

In the following, we use
<https://huggingface.co/WeijiZhuang/icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02>
for demonstration. You can replace it with other pre-trained models if you like.

In [11]:
! GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/WeijiZhuang/icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02
! cd icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02 && \
  git lfs pull --include "exp/cpu_jit-torch-1.10.pt" && \
  git lfs pull --include "data/lang_bpe_500/LG.pt"


Cloning into 'icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02'...
remote: Enumerating objects: 2217, done.[K
remote: Total 2217 (delta 0), reused 0 (delta 0), pack-reused 2217[K
Receiving objects: 100% (2217/2217), 15.14 MiB | 24.27 MiB/s, done.
Resolving deltas: 100% (1861/1861), done.
Updating files: 100% (2530/2530), done.
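
The download pattern in the cell above (clone without LFS blobs, then pull only the files that are needed) can be reused for other model repositories. Below is a minimal sketch that only builds the command strings; `lfs_download_commands` is a hypothetical helper, and the file paths to pull differ from model to model.

```python
def lfs_download_commands(repo_url, files):
    """Return shell commands that clone a repo without LFS blobs,
    then fetch only the listed LFS files."""
    # The clone directory is the last component of the repository URL.
    repo_dir = repo_url.rstrip("/").rsplit("/", 1)[-1]
    commands = [f"GIT_LFS_SKIP_SMUDGE=1 git clone {repo_url}"]
    for path in files:
        commands.append(f'cd {repo_dir} && git lfs pull --include "{path}"')
    return commands


for command in lfs_download_commands(
    "https://huggingface.co/WeijiZhuang/icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02",
    ["exp/cpu_jit-torch-1.10.pt", "data/lang_bpe_500/LG.pt"],
):
    print(command)
```

Each printed line can be run in a notebook cell with a leading `!`.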


# Speech recognition

## greedy_search

In [12]:
! sherpa-offline \
  --nn-model=./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/exp/cpu_jit-torch-1.10.pt \
  --tokens=./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/tokens.txt \
  --decoding-method=greedy_search \
  --num-active-paths=4 \
  --use-gpu=false \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0002.wav

[I] /var/www/sherpa/csrc/parse-options.cc:495:int sherpa::ParseOptions::Read(int, const char* const*) 2023-07-25 07:33:43.089 sherpa-offline --nn-model=./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/exp/cpu_jit-torch-1.10.pt --tokens=./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/tokens.txt --decoding-method=greedy_search --num-active-paths=4 --use-gpu=false ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1089-134686-0001.wav ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0001.wav ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0002.wav 

[I] /var/www/sherpa/cpp_api/bin/offline-recognizer.cc:125:int main(int, char**) 2023-07-25 07:33:43.089 OfflineRecognizerConfig(ctc_decoder_config=OfflineCtcDecoderConfig(modified=True, hlg="", lm_scale=1, search_beam=20, output_beam=8, min_active_states=30, max_active_states=10000), feat_
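
The decoding commands in this notebook differ only in `--decoding-method` and the optional `--lg` flag, so the long repeated paths can be generated instead of typed. The following is a sketch under the assumption that the model directory has the layout used in this notebook (`exp/cpu_jit-torch-1.10.pt`, `data/lang_bpe_500/tokens.txt`, `test_wavs/`); `sherpa_offline_command` is a hypothetical helper, and it uses only flags that appear in the cells of this notebook.

```python
import shlex


def sherpa_offline_command(model_dir, wavs, decoding_method="greedy_search",
                           num_active_paths=4, lg=None):
    """Build the argument list for a CPU run of sherpa-offline."""
    args = [
        "sherpa-offline",
        f"--nn-model={model_dir}/exp/cpu_jit-torch-1.10.pt",
        f"--tokens={model_dir}/data/lang_bpe_500/tokens.txt",
        f"--decoding-method={decoding_method}",
        f"--num-active-paths={num_active_paths}",
        "--use-gpu=false",
    ]
    if lg is not None:
        # lg is a path relative to model_dir, e.g. "data/lang_bpe_500/LG.pt".
        args.append(f"--lg={model_dir}/{lg}")
    args.extend(f"{model_dir}/test_wavs/{name}" for name in wavs)
    return args


model_dir = "./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02"
wavs = ["1089-134686-0001.wav", "1221-135766-0001.wav", "1221-135766-0002.wav"]
print(shlex.join(sherpa_offline_command(model_dir, wavs)))
```

The returned list can also be passed to `subprocess.run(args, check=True)` to run the decoder from Python.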

## modified_beam_search

In [16]:
! sherpa-offline \
  --nn-model=./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/exp/cpu_jit-torch-1.10.pt \
  --tokens=./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/tokens.txt \
  --decoding-method=modified_beam_search \
  --num-active-paths=4 \
  --use-gpu=false \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0002.wav

[I] /var/www/sherpa/csrc/parse-options.cc:495:int sherpa::ParseOptions::Read(int, const char* const*) 2023-07-25 07:35:51.541 sherpa-offline --nn-model=./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/exp/cpu_jit-torch-1.10.pt --tokens=./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/tokens.txt --decoding-method=modified_beam_search --num-active-paths=4 --use-gpu=false ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1089-134686-0001.wav ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0001.wav ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0002.wav 

[I] /var/www/sherpa/cpp_api/bin/offline-recognizer.cc:125:int main(int, char**) 2023-07-25 07:35:51.541 OfflineRecognizerConfig(ctc_decoder_config=OfflineCtcDecoderConfig(modified=True, hlg="", lm_scale=1, search_beam=20, output_beam=8, min_active_states=30, max_active_states=10000)
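
To decode many files at once, the help text of `sherpa-offline` lists a `wav.scp` mode (`--use-wav-scp=true`). The sketch below generates the lines of such a file, assuming the usual Kaldi script-file layout of one `<utterance-id> <path>` pair per line. `wav_scp_lines` is a hypothetical helper, and using the file stem as the utterance id is a choice made here for illustration.

```python
from pathlib import Path


def wav_scp_lines(wav_paths):
    """Return one "<utterance-id> <path>" line per wave file.

    The utterance id is the file name without its extension.
    """
    return [f"{Path(p).stem} {p}" for p in wav_paths]


test_wavs = "./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs"
lines = wav_scp_lines([
    f"{test_wavs}/1089-134686-0001.wav",
    f"{test_wavs}/1221-135766-0001.wav",
    f"{test_wavs}/1221-135766-0002.wav",
])
print("\n".join(lines))
```

Saving the printed lines to `wav.scp` gives a file that can be passed as `scp:wav.scp`, as in usage (3) of the help text.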

## fast_beam_search (without LG)

In [17]:
! sherpa-offline \
  --nn-model=./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/exp/cpu_jit-torch-1.10.pt \
  --tokens=./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/tokens.txt \
  --decoding-method=fast_beam_search \
  --num-active-paths=4 \
  --use-gpu=false \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0002.wav

[I] /var/www/sherpa/csrc/parse-options.cc:495:int sherpa::ParseOptions::Read(int, const char* const*) 2023-07-25 07:36:26.071 sherpa-offline --nn-model=./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/exp/cpu_jit-torch-1.10.pt --tokens=./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/tokens.txt --decoding-method=fast_beam_search --num-active-paths=4 --use-gpu=false ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1089-134686-0001.wav ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0001.wav ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0002.wav 

[I] /var/www/sherpa/cpp_api/bin/offline-recognizer.cc:125:int main(int, char**) 2023-07-25 07:36:26.071 OfflineRecognizerConfig(ctc_decoder_config=OfflineCtcDecoderConfig(modified=True, hlg="", lm_scale=1, search_beam=20, output_beam=8, min_active_states=30, max_active_states=10000), fe

## fast_beam_search (with LG)

In [20]:
! sherpa-offline \
  --nn-model=./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/exp/cpu_jit-torch-1.10.pt \
  --tokens=./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/tokens.txt \
  --decoding-method=fast_beam_search \
  --num-active-paths=4 \
  --use-gpu=false \
  --lg=./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/LG.pt \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0002.wav

[I] /var/www/sherpa/csrc/parse-options.cc:495:int sherpa::ParseOptions::Read(int, const char* const*) 2023-07-25 07:37:44.988 sherpa-offline --nn-model=./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/exp/cpu_jit-torch-1.10.pt --tokens=./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/tokens.txt --decoding-method=fast_beam_search --num-active-paths=4 --use-gpu=false --lg=./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/LG.pt ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1089-134686-0001.wav ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0001.wav ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0002.wav 

[I] /var/www/sherpa/cpp_api/bin/offline-recognizer.cc:125:int main(int, char**) 2023-07-25 07:37:44.988 OfflineRecognizerConfig(ctc_decoder_config=OfflineCtcDecoderConfig(modified=True, hlg=""