<a href="https://colab.research.google.com/github/csukuangfj/colab/blob/sherpa-offline-transducer-standalone-2023-01-05/sherpa_standalone_offline_transducer_speech_recognition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

This Colab notebook demonstrates how to use [sherpa][sherpa]
for offline (i.e., non-streaming) speech recognition.

It includes:
- How to set up the environment
- How to download pre-trained models
- How to use the pre-trained models for speech recognition

**Caution**: We use the CPU for this demonstration, although [sherpa][sherpa] also supports GPU.
We avoid the GPU here because Google Colab frequently changes its preinstalled PyTorch and CUDA versions, which would break this notebook if it depended on them.

[sherpa]: https://github.com/k2-fsa/sherpa

# Set up the environment

## Install PyTorch

Instead of using the PyTorch version preinstalled by Colab, we install `torch 1.13.1`, the latest release as of today (2023.01.05), so that this notebook keeps working if Google changes the default PyTorch version.

In [1]:
! python3 --version

Python 3.8.16


First, uninstall the PyTorch packages preinstalled by Google Colab:

In [2]:
! pip uninstall -y torch torchaudio torchvision torchtext fastai

Found existing installation: torch 1.13.0+cu116
Uninstalling torch-1.13.0+cu116:
  Successfully uninstalled torch-1.13.0+cu116
Found existing installation: torchaudio 0.13.0+cu116
Uninstalling torchaudio-0.13.0+cu116:
  Successfully uninstalled torchaudio-0.13.0+cu116
Found existing installation: torchvision 0.14.0+cu116
Uninstalling torchvision-0.14.0+cu116:
  Successfully uninstalled torchvision-0.14.0+cu116
Found existing installation: torchtext 0.14.0
Uninstalling torchtext-0.14.0:
  Successfully uninstalled torchtext-0.14.0
Found existing installation: fastai 2.7.10
Uninstalling fastai-2.7.10:
  Successfully uninstalled fastai-2.7.10


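Second, install PyTorch. As of today (2023.01.05), the command below resolves to the latest release, `1.13.1`. Because the command is unpinned, a later run may pick up a newer release; append `==1.13.1` to `torch` and `==0.13.1` to `torchaudio` to reproduce this notebook exactly. You can use a different version if you wish, as long as the k2 version installed in the next step matches it.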

In [3]:
! pip install torch torchaudio --extra-index-url https://download.pytorch.org/whl/cpu

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/, https://download.pytorch.org/whl/cpu
Collecting torch
  Downloading https://download.pytorch.org/whl/cpu/torch-1.13.1%2Bcpu-cp38-cp38-linux_x86_64.whl (199.1 MB)
Collecting torchaudio
  Downloading https://download.pytorch.org/whl/cpu/torchaudio-0.13.1%2Bcpu-cp38-cp38-linux_x86_64.whl (4.0 MB)
Installing collected packages: torch, torchaudio
Successfully installed torch-1.13.1+cpu torchaudio-0.13.1+cpu


## Install k2

Since we have installed torch 1.13.1, we need a k2 build compiled against torch 1.13.1. If you change the torch version, change the k2 version in the following command to match.

In [4]:
! pip install k2==1.23.3.dev20230105+cpu.torch1.13.1 -f https://k2-fsa.org/nightly/index.html

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in links: https://k2-fsa.org/nightly/index.html
Collecting k2==1.23.3.dev20230105+cpu.torch1.13.1
  Downloading https://k2-fsa.org/nightly/whl/k2-1.23.3.dev20230105%2Bcpu.torch1.13.1-cp38-cp38-linux_x86_64.whl (2.5 MB)
Installing collected packages: k2
Successfully installed k2-1.23.3.dev20230105+cpu.torch1.13.1
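
The version string in the command above encodes the k2 version, the device, and the torch version. The sketch below shows how such a requirement string could be assembled from a torch version. The helper name `k2_requirement` is hypothetical, and the naming pattern is inferred only from the single command used in this notebook; whether a wheel actually exists for another combination has to be checked against the nightly index.

```python
def k2_requirement(torch_version, k2_version="1.23.3.dev20230105", device="cpu"):
    """Build a pip requirement shaped like the one used in this notebook.

    The pattern is an assumption inferred from
    k2==1.23.3.dev20230105+cpu.torch1.13.1
    """
    # Drop a local build suffix such as "+cpu" or "+cu116".
    base_torch = torch_version.split("+", 1)[0]
    return f"k2=={k2_version}+{device}.torch{base_torch}"


print(k2_requirement("1.13.1+cpu"))
```

Passing the string reported by `torch.__version__` keeps the k2 requirement in step with whichever torch version is installed.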


## Install kaldifeat

In [5]:
! pip install --verbose kaldifeat

Using pip 22.0.4 from /usr/local/lib/python3.8/dist-packages/pip (python 3.8)
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting kaldifeat
  Downloading kaldifeat-1.21.tar.gz (482 kB)
  Running command python setup.py egg_info
  running egg_info
  creating /tmp/pip-pip-egg-info-ne9cb9an/kaldifeat.egg-info
  writing /tmp/pip-pip-egg-info-ne9cb9an/kaldifeat.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-pip-egg-info-ne9cb9an/kaldifeat.egg-info/dependency_links.txt
  writing top-level names to /tmp/pip-pip-egg-info-ne9cb9an/kaldifeat.egg-info/top_level.txt
  writing manifest file '/tmp/pip-pip-egg-info-ne9cb9an/kaldifeat.egg-info/SOURCES.txt'
  Generating grammar tables from /usr/lib/python3.8/lib2to3/Grammar.txt
  Generating grammar tables from /usr/lib/python3.8/lib2to3/PatternGrammar.txt
  rea

## Install sherpa

In [6]:
! git clone https://github.com/k2-fsa/sherpa && \
  cd sherpa && \
  pip install -r ./requirements.txt && \
  python3 setup.py install

Cloning into 'sherpa'...
remote: Enumerating objects: 5106, done.
remote: Counting objects: 100% (1363/1363), done.
remote: Compressing objects: 100% (594/594), done.
remote: Total 5106 (delta 892), reused 1014 (delta 720), pack-reused 3743
Receiving objects: 100% (5106/5106), 13.02 MiB | 20.22 MiB/s, done.
Resolving deltas: 100% (3058/3058), done.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting websockets
  Downloading websockets-10.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (106 kB)
Collecting sentencepiece>=0.1.96
  Downloading sentencepiece-0.1.97-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Install

Verify that `sherpa` has been installed successfully:

In [7]:
! python3 -c "import sherpa; print(sherpa.__version__)"

1.1
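
If the import above fails, it can help to see which of the packages installed so far cannot be located at all. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical. It only checks that each package can be found on the import path, not that its compiled extensions load, so the `import sherpa` check above remains the real test.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages installed in the steps above.
required = ["torch", "torchaudio", "k2", "kaldifeat", "sherpa"]
missing = missing_packages(required)
print("missing:", ", ".join(missing) if missing else "none")
```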


# Download pre-trained models

Many pre-trained models are listed at
https://k2-fsa.github.io/sherpa/cpp/pretrained_models/offline_transducer.html#
and are available for download.

In the following, we use
<https://huggingface.co/WeijiZhuang/icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02>
for demonstration. You can replace it with other pre-trained models if you like.

In [8]:
! GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/WeijiZhuang/icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02
! cd icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02 && \
  git lfs pull --include "exp/cpu_jit-torch-1.10.pt" && \
  git lfs pull --include "data/lang_bpe_500/LG.pt"


Cloning into 'icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02'...
remote: Enumerating objects: 2217, done.
remote: Counting objects: 100% (2217/2217), done.
remote: Compressing objects: 100% (105/105), done.
remote: Total 2217 (delta 1861), reused 2201 (delta 1857), pack-reused 0
Receiving objects: 100% (2217/2217), 15.14 MiB | 17.76 MiB/s, done.
Resolving deltas: 100% (1861/1861), done.
Checking out files: 100% (2530/2530), done.
Git LFS: (1 of 1 files) 268.69 MB / 268.69 MB
Git LFS: (1 of 1 files) 238.28 MB / 238.28 MB


In [9]:
! mv icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02 sherpa/
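
Because the repository was cloned with `GIT_LFS_SKIP_SMUDGE=1`, any large file that was not fetched with `git lfs pull` remains a small text pointer rather than the real model. The sketch below checks the two files pulled above for that case. The helper names are hypothetical, and the check relies on Git LFS pointer files starting with the line `version https://git-lfs.github.com/spec/v1`.

```python
from pathlib import Path

# First bytes of a Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like an un-downloaded Git LFS pointer."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def check_model_files(repo_dir, relative_paths):
    """Map each relative path to 'missing', 'pointer', or 'ok'."""
    status = {}
    for rel in relative_paths:
        path = Path(repo_dir) / rel
        if not path.is_file():
            status[rel] = "missing"
        elif is_lfs_pointer(path):
            status[rel] = "pointer"
        else:
            status[rel] = "ok"
    return status


# Location after the `mv` command above, relative to the notebook's working directory.
repo = "sherpa/icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02"
print(check_model_files(repo, ["exp/cpu_jit-torch-1.10.pt", "data/lang_bpe_500/LG.pt"]))
```

A `pointer` result means the corresponding `git lfs pull --include` step needs to be run again for that file.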

# Speech recognition

## greedy_search

In [10]:
! cd sherpa && python3 ./sherpa/bin/offline_transducer_asr.py \
  --nn-model ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/exp/cpu_jit-torch-1.10.pt \
  --tokens ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/tokens.txt \
  --decoding-method greedy_search \
  --num-active-paths 4 \
  --use-gpu false \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0002.wav

2023-01-05 08:53:16,842 INFO [offline_transducer_asr.py:317] {'nn_model': './icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/exp/cpu_jit-torch-1.10.pt', 'tokens': './icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/tokens.txt', 'decoding_method': 'greedy_search', 'num_active_paths': 4, 'max_contexts': 8, 'max_states': 64, 'allow_partial': True, 'LG': '', 'ngram_lm_scale': 0.01, 'beam': 4, 'use_gpu': False, 'sound_files': ['./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1089-134686-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0002.wav']}
[I] /content/sherpa/sherpa/cpp_api/offline-recognizer-transducer-impl.h:127:void sherpa::OfflineRecognizerTransducerImpl::WarmUp() 2023-01-05 08:53:17 WarmUp begins
[I] /content/sherpa/sherpa/cpp_api/offline-recognizer-trans

## modified_beam_search

In [11]:
! cd sherpa && python3 ./sherpa/bin/offline_transducer_asr.py \
  --nn-model ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/exp/cpu_jit-torch-1.10.pt \
  --tokens ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/tokens.txt \
  --decoding-method modified_beam_search \
  --num-active-paths 4 \
  --use-gpu false \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0002.wav

2023-01-05 08:53:25,515 INFO [offline_transducer_asr.py:317] {'nn_model': './icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/exp/cpu_jit-torch-1.10.pt', 'tokens': './icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/tokens.txt', 'decoding_method': 'modified_beam_search', 'num_active_paths': 4, 'max_contexts': 8, 'max_states': 64, 'allow_partial': True, 'LG': '', 'ngram_lm_scale': 0.01, 'beam': 4, 'use_gpu': False, 'sound_files': ['./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1089-134686-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0002.wav']}
[I] /content/sherpa/sherpa/cpp_api/offline-recognizer-transducer-impl.h:127:void sherpa::OfflineRecognizerTransducerImpl::WarmUp() 2023-01-05 08:53:26 WarmUp begins
[I] /content/sherpa/sherpa/cpp_api/offline-recognize

## fast_beam_search (without LG)

In [12]:
! cd sherpa && ./sherpa/bin/offline_transducer_asr.py \
  --nn-model ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/exp/cpu_jit-torch-1.10.pt \
  --tokens ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/tokens.txt \
  --decoding-method fast_beam_search \
  --max-contexts 8 \
  --max-states 64 \
  --allow-partial true \
  --beam 4 \
  --use-gpu false \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0002.wav

2023-01-05 08:53:33,089 INFO [offline_transducer_asr.py:317] {'nn_model': './icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/exp/cpu_jit-torch-1.10.pt', 'tokens': './icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/tokens.txt', 'decoding_method': 'fast_beam_search', 'num_active_paths': 4, 'max_contexts': 8, 'max_states': 64, 'allow_partial': True, 'LG': '', 'ngram_lm_scale': 0.01, 'beam': 4.0, 'use_gpu': False, 'sound_files': ['./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1089-134686-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0002.wav']}
[I] /content/sherpa/sherpa/cpp_api/offline-recognizer-transducer-impl.h:127:void sherpa::OfflineRecognizerTransducerImpl::WarmUp() 2023-01-05 08:53:34 WarmUp begins
[I] /content/sherpa/sherpa/cpp_api/offline-recognizer-

## fast_beam_search (with LG)

In [13]:
! cd sherpa && ./sherpa/bin/offline_transducer_asr.py \
  --nn-model ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/exp/cpu_jit-torch-1.10.pt \
  --tokens ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/tokens.txt \
  --decoding-method fast_beam_search \
  --max-contexts 8 \
  --max-states 64 \
  --allow-partial true \
  --beam 4 \
  --LG ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/LG.pt \
  --ngram-lm-scale 0.01 \
  --use-gpu false \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0002.wav

2023-01-05 08:53:40,615 INFO [offline_transducer_asr.py:317] {'nn_model': './icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/exp/cpu_jit-torch-1.10.pt', 'tokens': './icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/tokens.txt', 'decoding_method': 'fast_beam_search', 'num_active_paths': 4, 'max_contexts': 8, 'max_states': 64, 'allow_partial': True, 'LG': './icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/data/lang_bpe_500/LG.pt', 'ngram_lm_scale': 0.01, 'beam': 4.0, 'use_gpu': False, 'sound_files': ['./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1089-134686-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0001.wav', './icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02/test_wavs/1221-135766-0002.wav']}
[I] /content/sherpa/sherpa/cpp_api/offline-recognizer-transducer-impl.h:127:void sherpa::OfflineRecognizerTransducerImpl::WarmUp()
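
The four runs above share the model, token file, and test waves, and differ only in their decoding flags. As a rough sketch, the commands can be generated from one place so that the differences are easy to compare. The flags and paths are copied from the cells above; the names `SETUPS` and `build_command`, and the label `fast_beam_search_LG`, are made up for this illustration.

```python
import shlex

# Paths are relative to the `sherpa` checkout, as in the cells above.
MODEL_DIR = "./icefall-asr-librispeech-pruned-transducer-stateless8-2022-12-02"
WAVS = ["1089-134686-0001.wav", "1221-135766-0001.wav", "1221-135766-0002.wav"]

FAST_BEAM = [
    "--decoding-method", "fast_beam_search",
    "--max-contexts", "8",
    "--max-states", "64",
    "--allow-partial", "true",
    "--beam", "4",
]

# Flags that differ between the four runs (keys are illustrative labels).
SETUPS = {
    "greedy_search": ["--decoding-method", "greedy_search", "--num-active-paths", "4"],
    "modified_beam_search": ["--decoding-method", "modified_beam_search", "--num-active-paths", "4"],
    "fast_beam_search": FAST_BEAM,
    "fast_beam_search_LG": FAST_BEAM + [
        "--LG", f"{MODEL_DIR}/data/lang_bpe_500/LG.pt",
        "--ngram-lm-scale", "0.01",
    ],
}


def build_command(setup):
    """Return the argument list for one of the decoding setups above."""
    return [
        "python3", "./sherpa/bin/offline_transducer_asr.py",
        "--nn-model", f"{MODEL_DIR}/exp/cpu_jit-torch-1.10.pt",
        "--tokens", f"{MODEL_DIR}/data/lang_bpe_500/tokens.txt",
        *SETUPS[setup],
        "--use-gpu", "false",
        *[f"{MODEL_DIR}/test_wavs/{wav}" for wav in WAVS],
    ]


for name in SETUPS:
    print(shlex.join(build_command(name)))
```

Each list could then be passed to `subprocess.run(cmd, cwd="sherpa", check=True)` to run it from the `sherpa` directory, mirroring the `cd sherpa &&` prefix used in the cells.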