<a href="https://colab.research.google.com/github/k2-fsa/colab/blob/sherpa/sherpa-offline-ctc-standalone-2023-01-07/sherpa/sherpa_standalone_offline_ctc_speech_recognition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

This colab notebook demonstrates how to use [sherpa][sherpa]
for offline (i.e., non-streaming) speech recognition.

It includes:
- How to set up the environment
- How to download pre-trained models
- How to use the pre-trained models for speech recognition

**Caution**: We use the CPU for demonstration, although [sherpa][sherpa] also supports GPU.
We avoid the GPU in this notebook because Google Colab frequently changes its preinstalled PyTorch and CUDA versions, which would break a GPU-based notebook.

[sherpa]: https://github.com/k2-fsa/sherpa

# Set up the environment

## Install PyTorch

Instead of using the PyTorch version preinstalled on Colab, we pin `torch 1.13.1`, the latest release as of today (2023.01.06), so that this notebook keeps working when Google changes the default PyTorch version.

In [1]:
! python3 --version

Python 3.8.16
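The wheels downloaded in the following steps carry the tag `cp38` in their file names, which corresponds to the Python 3.8 interpreter reported above. As a minimal sketch, assuming a CPython interpreter, the tag can be derived from the running interpreter like this (`cpython_tag` is a hypothetical helper name, not part of pip or sherpa):

```python
import sys


def cpython_tag(version_info=sys.version_info):
    """Return the CPython wheel tag, e.g. 'cp38' for Python 3.8.x."""
    major, minor = version_info[0], version_info[1]
    return f"cp{major}{minor}"


# Tag of the interpreter running this cell.
print(cpython_tag())
```

If the printed tag differs from the one in a wheel file name, pip will not select that wheel for this interpreter.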


First, uninstall the default PyTorch installed by Google colab:

In [2]:
! pip uninstall -y torch torchaudio torchvision torchtext fastai

Found existing installation: torch 1.13.0+cu116
Uninstalling torch-1.13.0+cu116:
  Successfully uninstalled torch-1.13.0+cu116
Found existing installation: torchaudio 0.13.0+cu116
Uninstalling torchaudio-0.13.0+cu116:
  Successfully uninstalled torchaudio-0.13.0+cu116
Found existing installation: torchvision 0.14.0+cu116
Uninstalling torchvision-0.14.0+cu116:
  Successfully uninstalled torchvision-0.14.0+cu116
Found existing installation: torchtext 0.14.0
Uninstalling torchtext-0.14.0:
  Successfully uninstalled torchtext-0.14.0
Found existing installation: fastai 2.7.10
Uninstalling fastai-2.7.10:
  Successfully uninstalled fastai-2.7.10


Second, install PyTorch `1.13.1`, the latest release as of today (2023.01.06). You can change the version if you wish.

In [3]:
! pip install torch==1.13.1 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cpu

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/, https://download.pytorch.org/whl/cpu
Collecting torch==1.13.1
  Downloading https://download.pytorch.org/whl/cpu/torch-1.13.1%2Bcpu-cp38-cp38-linux_x86_64.whl (199.1 MB)
     199.1/199.1 MB 6.4 MB/s eta 0:00:00
Collecting torchaudio==0.13.1
  Downloading https://download.pytorch.org/whl/cpu/torchaudio-0.13.1%2Bcpu-cp38-cp38-linux_x86_64.whl (4.0 MB)
     4.0/4.0 MB 74.5 MB/s eta 0:00:00
Installing collected packages: torch, torchaudio
Successfully installed torch-1.13.1+cpu torchaudio-0.13.1+cpu


## Install k2

Since we have installed torch 1.13.1, we need a k2 build compiled against torch 1.13.1. If you change the torch version, please change the following command accordingly.

In [4]:
! pip install k2==1.23.3.dev20230106+cpu.torch1.13.1 -f https://k2-fsa.org/nightly/index.html

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in links: https://k2-fsa.org/nightly/index.html
Collecting k2==1.23.3.dev20230106+cpu.torch1.13.1
  Downloading https://k2-fsa.org/nightly/whl/k2-1.23.3.dev20230106%2Bcpu.torch1.13.1-cp38-cp38-linux_x86_64.whl (2.5 MB)
     2.5/2.5 MB 22.2 MB/s eta 0:00:00
Installing collected packages: k2
Successfully installed k2-1.23.3.dev20230106+cpu.torch1.13.1
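The k2 version string used above encodes the k2 version, a build date, the device, and the torch version. The sketch below assembles such a string for the CPU case. It assumes the naming pattern seen in this one command holds for other versions; whether a given combination actually exists has to be checked on https://k2-fsa.org/nightly/index.html. `k2_requirement` is a hypothetical helper, not part of k2.

```python
def k2_requirement(k2_version: str, build_date: str, torch_version: str) -> str:
    """Build a pip requirement for a CPU build of k2.

    The pattern is inferred from the single command in this notebook;
    it is an assumption that other versions follow the same scheme.
    """
    return f"k2=={k2_version}.dev{build_date}+cpu.torch{torch_version}"


# Reproduces the requirement installed in this notebook.
print(k2_requirement("1.23.3", "20230106", "1.13.1"))
```

The resulting string would be passed to `pip install` together with `-f https://k2-fsa.org/nightly/index.html`, as in the cell above.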


## Install Kaldifeat

In [5]:
! pip install --verbose kaldifeat

Using pip 22.0.4 from /usr/local/lib/python3.8/dist-packages/pip (python 3.8)
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting kaldifeat
  Downloading kaldifeat-1.21.tar.gz (482 kB)
     482.4/482.4 KB 17.3 MB/s eta 0:00:00
  Running command python setup.py egg_info
  running egg_info
  creating /tmp/pip-pip-egg-info-5gv1cxur/kaldifeat.egg-info
  writing /tmp/pip-pip-egg-info-5gv1cxur/kaldifeat.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-pip-egg-info-5gv1cxur/kaldifeat.egg-info/dependency_links.txt
  writing top-level names to /tmp/pip-pip-egg-info-5gv1cxur/kaldifeat.egg-info/top_level.txt
  writing manifest file '/tmp/pip-pip-egg-info-5gv1cxur/kaldifeat.egg-info/SOURCES.txt'
  Generating grammar tables from /usr/lib/python3.8/lib2to3/Grammar.txt
  Generating grammar tables from /usr/lib/python3.8/lib2to3/PatternGrammar.txt
  rea

## Install sherpa

In [None]:
! git clone https://github.com/k2-fsa/sherpa && \
  cd sherpa && \
  pip install -r ./requirements.txt && \
  python3 setup.py install

Cloning into 'sherpa'...
remote: Enumerating objects: 5280, done.
remote: Counting objects: 100% (31/31), done.
remote: Compressing objects: 100% (31/31), done.
remote: Total 5280 (delta 3), reused 4 (delta 0), pack-reused 5249
Receiving objects: 100% (5280/5280), 13.11 MiB | 21.58 MiB/s, done.
Resolving deltas: 100% (3188/3188), done.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting websockets
  Downloading websockets-10.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (106 kB)
     107.0/107.0 KB 5.8 MB/s eta 0:00:00
Collecting sentencepiece>=0.1.96
  Downloading sentencepiece-0.1.97-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
     1.3/1.3 MB 47.8 MB/s eta 0:00:00
Installing collected

Verify that we have installed `sherpa` successfully:

In [None]:
! python3 -c "import sherpa; print(sherpa.__version__)"
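If the import fails, it can help to see which of the packages installed in this notebook are missing. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper. It only checks whether a module can be located, not whether it loads correctly, so a k2 build compiled against a different torch version would not be detected here.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages installed in the steps above; an empty list means all were found.
print(missing_modules(["torch", "torchaudio", "k2", "kaldifeat", "sherpa"]))
```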

# Usage

Many pre-trained models are available for download. They are listed at
https://k2-fsa.github.io/sherpa/cpp/pretrained_models/offline_ctc.html

In the following, we demonstrate how to use models from
- [icefall](https://github.com/k2-fsa/icefall)
- [wenet](https://github.com/wenet-e2e/wenet)
- [torchaudio](https://github.com/pytorch/audio) (wav2vec 2.0)

## Models from icefall

### English

#### icefall-asr-gigaspeech-conformer-ctc

In [None]:
! cd sherpa && \
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/wgb14/icefall-asr-gigaspeech-conformer-ctc && \
cd icefall-asr-gigaspeech-conformer-ctc && \
git lfs pull --include "exp/cpu_jit.pt" && \
git lfs pull --include "data/lang_bpe_500/HLG.pt" && \
git lfs pull --include "data/lang_bpe_500/tokens.txt" && \
mkdir test_wavs && \
cd test_wavs && \
wget https://huggingface.co/csukuangfj/wav2vec2.0-torchaudio/resolve/main/test_wavs/1089-134686-0001.wav && \
wget https://huggingface.co/csukuangfj/wav2vec2.0-torchaudio/resolve/main/test_wavs/1221-135766-0001.wav && \
wget https://huggingface.co/csukuangfj/wav2vec2.0-torchaudio/resolve/main/test_wavs/1221-135766-0002.wav
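Because the repository is cloned with `GIT_LFS_SKIP_SMUDGE=1`, every LFS-tracked file that is not pulled explicitly remains a small text pointer instead of the real data. The sketch below checks for that case by looking for the first line that Git LFS pointer files start with. `is_lfs_pointer` is a hypothetical helper, and the path assumes the directory layout created by the cell above.

```python
from pathlib import Path

# Git LFS pointer files begin with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path) -> bool:
    """Return True if the file at `path` starts like a Git LFS pointer."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


# Assumed location of the model pulled in the cell above.
model = Path("sherpa/icefall-asr-gigaspeech-conformer-ctc/exp/cpu_jit.pt")
if model.exists():
    state = "is still an LFS pointer" if is_lfs_pointer(model) else "looks like real data"
    print(model, state)
```

A file reported as a pointer needs another `git lfs pull --include ...` before it can be used for decoding.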


##### Decoding with H

In [None]:
! cd sherpa && python3 ./sherpa/bin/offline_ctc_asr.py \
  --nn-model ./icefall-asr-gigaspeech-conformer-ctc/exp/cpu_jit.pt \
  --tokens ./icefall-asr-gigaspeech-conformer-ctc/data/lang_bpe_500/tokens.txt \
  --use-gpu false \
  ./icefall-asr-gigaspeech-conformer-ctc/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-gigaspeech-conformer-ctc/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-gigaspeech-conformer-ctc/test_wavs/1221-135766-0002.wav

##### Decoding with HLG

In [None]:
! cd sherpa && python3 ./sherpa/bin/offline_ctc_asr.py \
  --nn-model ./icefall-asr-gigaspeech-conformer-ctc/exp/cpu_jit.pt \
  --tokens ./icefall-asr-gigaspeech-conformer-ctc/data/lang_bpe_500/tokens.txt \
  --HLG ./icefall-asr-gigaspeech-conformer-ctc/data/lang_bpe_500/HLG.pt \
  --use-gpu false \
  ./icefall-asr-gigaspeech-conformer-ctc/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-gigaspeech-conformer-ctc/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-gigaspeech-conformer-ctc/test_wavs/1221-135766-0002.wav
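The two decoding commands above differ only in the `--HLG` argument, and the same pattern repeats for every model in this notebook. As a rough sketch, the argument list can be assembled in Python. `build_decode_command` is a hypothetical helper that uses only flags appearing in this notebook; it builds the list and does not run anything.

```python
def build_decode_command(nn_model, tokens, wavs, hlg=None, normalize_samples=True):
    """Assemble the argument list for sherpa/bin/offline_ctc_asr.py on CPU."""
    cmd = [
        "python3",
        "./sherpa/bin/offline_ctc_asr.py",
        "--nn-model", nn_model,
        "--tokens", tokens,
    ]
    if hlg is not None:
        # With an HLG graph the script decodes with HLG instead of H.
        cmd += ["--HLG", hlg]
    if not normalize_samples:
        # Used for the wenet models later in this notebook.
        cmd += ["--normalize-samples", "false"]
    cmd += ["--use-gpu", "false"]
    cmd += list(wavs)
    return cmd


model_dir = "./icefall-asr-gigaspeech-conformer-ctc"
print(" ".join(build_decode_command(
    f"{model_dir}/exp/cpu_jit.pt",
    f"{model_dir}/data/lang_bpe_500/tokens.txt",
    [f"{model_dir}/test_wavs/1089-134686-0001.wav"],
    hlg=f"{model_dir}/data/lang_bpe_500/HLG.pt",
)))
```

The returned list could be passed to `subprocess.run(cmd, cwd="sherpa", check=True)` to mirror the `cd sherpa && ...` form used in the cells.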

#### icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09

In [None]:
! cd sherpa && \
  GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09 && \
  cd icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09 && \
  git lfs pull --include "exp/cpu_jit.pt" && \
  git lfs pull --include "data/lang_bpe_500/tokens.txt" && \
  git lfs pull --include "data/lang_bpe_500/HLG.pt"


##### Decoding with H

In [None]:
! cd sherpa && python3 ./sherpa/bin/offline_ctc_asr.py \
  --nn-model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
  --tokens ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/tokens.txt \
  --use-gpu false \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav

##### Decoding with HLG

In [None]:
! cd sherpa && python3 ./sherpa/bin/offline_ctc_asr.py \
  --nn-model ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/exp/cpu_jit.pt \
  --tokens ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/tokens.txt \
  --HLG ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/data/lang_bpe_500/HLG.pt \
  --lm-scale 0.9 \
  --use-gpu false \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1089-134686-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0001.wav \
  ./icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/test_wavs/1221-135766-0002.wav

#### icefall-asr-tedlium3-conformer-ctc2

In [None]:
! cd sherpa && \
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/videodanchik/icefall-asr-tedlium3-conformer-ctc2 && \
cd icefall-asr-tedlium3-conformer-ctc2 && \
git lfs pull --include "exp/cpu_jit.pt" && \
git lfs pull --include "data/lang_bpe/HLG.pt" && \
git lfs pull --include "data/lang_bpe/tokens.txt" && \
git lfs pull --include "test_wavs/DanBarber_2010-219.wav" && \
git lfs pull --include "test_wavs/DanielKahneman_2010-157.wav" && \
git lfs pull --include "test_wavs/RobertGupta_2010U-15.wav"

##### Decoding with H

In [None]:
! cd sherpa && python3 ./sherpa/bin/offline_ctc_asr.py \
  --nn-model ./icefall-asr-tedlium3-conformer-ctc2/exp/cpu_jit.pt \
  --tokens ./icefall-asr-tedlium3-conformer-ctc2/data/lang_bpe/tokens.txt \
  --use-gpu false \
  ./icefall-asr-tedlium3-conformer-ctc2/test_wavs/DanBarber_2010-219.wav \
  ./icefall-asr-tedlium3-conformer-ctc2/test_wavs/DanielKahneman_2010-157.wav \
  ./icefall-asr-tedlium3-conformer-ctc2/test_wavs/RobertGupta_2010U-15.wav

##### Decoding with HLG

In [None]:
! cd sherpa && python3 ./sherpa/bin/offline_ctc_asr.py \
  --nn-model ./icefall-asr-tedlium3-conformer-ctc2/exp/cpu_jit.pt \
  --tokens ./icefall-asr-tedlium3-conformer-ctc2/data/lang_bpe/tokens.txt \
  --HLG ./icefall-asr-tedlium3-conformer-ctc2/data/lang_bpe/HLG.pt \
  --use-gpu false \
  ./icefall-asr-tedlium3-conformer-ctc2/test_wavs/DanBarber_2010-219.wav \
  ./icefall-asr-tedlium3-conformer-ctc2/test_wavs/DanielKahneman_2010-157.wav \
  ./icefall-asr-tedlium3-conformer-ctc2/test_wavs/RobertGupta_2010U-15.wav

#### icefall_asr_librispeech_conformer_ctc

In [None]:
! cd sherpa && \
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/pkufool/icefall_asr_librispeech_conformer_ctc && \
cd icefall_asr_librispeech_conformer_ctc && \
git lfs pull --include "exp/cpu_jit.pt" && \
git lfs pull --include "data/lang_bpe/HLG.pt"

##### Decoding with H

In [None]:
! cd sherpa && python3 ./sherpa/bin/offline_ctc_asr.py \
  --nn-model ./icefall_asr_librispeech_conformer_ctc/exp/cpu_jit.pt \
  --tokens ./icefall_asr_librispeech_conformer_ctc/data/lang_bpe/tokens.txt \
  --use-gpu false \
  ./icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.wav \
  ./icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.wav \
  ./icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.wav

##### Decoding with HLG

In [None]:
! cd sherpa && python3 ./sherpa/bin/offline_ctc_asr.py \
  --nn-model ./icefall_asr_librispeech_conformer_ctc/exp/cpu_jit.pt \
  --tokens ./icefall_asr_librispeech_conformer_ctc/data/lang_bpe/tokens.txt \
  --HLG ./icefall_asr_librispeech_conformer_ctc/data/lang_bpe/HLG.pt \
  --use-gpu false \
  ./icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.wav \
  ./icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.wav \
  ./icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.wav

### Chinese

#### icefall_asr_aishell_conformer_ctc


In [None]:
! cd sherpa && \
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/pkufool/icefall_asr_aishell_conformer_ctc && \
cd icefall_asr_aishell_conformer_ctc && \
git lfs pull --include "exp/cpu_jit.pt" && \
git lfs pull --include "data/lang_char/HLG.pt"

##### Decoding with H

In [None]:
! cd sherpa && python3 ./sherpa/bin/offline_ctc_asr.py \
  --nn-model ./icefall_asr_aishell_conformer_ctc/exp/cpu_jit.pt \
  --tokens ./icefall_asr_aishell_conformer_ctc/data/lang_char/tokens.txt \
  --use-gpu false \
  ./icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav \
  ./icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav \
  ./icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav

##### Decoding with HLG

In [None]:
! cd sherpa && python3 ./sherpa/bin/offline_ctc_asr.py \
  --nn-model ./icefall_asr_aishell_conformer_ctc/exp/cpu_jit.pt \
  --tokens ./icefall_asr_aishell_conformer_ctc/data/lang_char/tokens.txt \
  --HLG ./icefall_asr_aishell_conformer_ctc/data/lang_char/HLG.pt \
  --use-gpu false \
  ./icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0121.wav \
  ./icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0122.wav \
  ./icefall_asr_aishell_conformer_ctc/test_waves/BAC009S0764W0123.wav

### Arabic

#### icefall-asr-mgb2-conformer_ctc-2022-27-06

In [23]:
! cd sherpa && \
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/AmirHussein/icefall-asr-mgb2-conformer_ctc-2022-27-06 && \
cd icefall-asr-mgb2-conformer_ctc-2022-27-06 && \
git lfs pull --include "exp/cpu_jit.pt" && \
git lfs pull --include "data/lang_bpe_5000/HLG.pt" && \
git lfs pull --include "data/lang_bpe_5000/tokens.txt"

Cloning into 'icefall-asr-mgb2-conformer_ctc-2022-27-06'...
remote: Enumerating objects: 2471, done.
remote: Counting objects: 100% (2471/2471), done.
remote: Compressing objects: 100% (224/224), done.
remote: Total 2471 (delta 2259), reused 2389 (delta 2238), pack-reused 0
Receiving objects: 100% (2471/2471), 94.72 MiB | 22.07 MiB/s, done.
Resolving deltas: 100% (2259/2259), done.
Checking out files: 100% (2419/2419), done.
Git LFS: (1 of 1 files) 366.09 MB / 366.09 MB
Git LFS: (1 of 1 files) 2.97 GB / 2.97 GB
Git LFS: (1 of 1 files) 83.03 KB / 83.03 KB


##### Decoding with H

In [24]:
! cd sherpa && python3 ./sherpa/bin/offline_ctc_asr.py \
  --nn-model ./icefall-asr-mgb2-conformer_ctc-2022-27-06/exp/cpu_jit.pt \
  --tokens ./icefall-asr-mgb2-conformer_ctc-2022-27-06/data/lang_bpe_5000/tokens.txt \
  --use-gpu false \
  ./icefall-asr-mgb2-conformer_ctc-2022-27-06/test_wavs/94D37D38-B203-4FC0-9F3A-538F5C174920_spk-0001_seg-0053813:0054281.wav \
  ./icefall-asr-mgb2-conformer_ctc-2022-27-06/test_wavs/94D37D38-B203-4FC0-9F3A-538F5C174920_spk-0001_seg-0051454:0052244.wav \
  ./icefall-asr-mgb2-conformer_ctc-2022-27-06/test_wavs/94D37D38-B203-4FC0-9F3A-538F5C174920_spk-0001_seg-0052244:0053004.wav

2023-01-07 05:16:34,792 INFO [offline_ctc_asr.py:312] {'nn_model': './icefall-asr-mgb2-conformer_ctc-2022-27-06/exp/cpu_jit.pt', 'tokens': './icefall-asr-mgb2-conformer_ctc-2022-27-06/data/lang_bpe_5000/tokens.txt', 'HLG': None, 'lm_scale': 1.0, 'modified': True, 'search_beam': 20.0, 'output_beam': 8.0, 'min_active_states': 30, 'max_active_states': 10000, 'use_gpu': False, 'normalize_samples': True, 'sound_files': ['./icefall-asr-mgb2-conformer_ctc-2022-27-06/test_wavs/94D37D38-B203-4FC0-9F3A-538F5C174920_spk-0001_seg-0053813:0054281.wav', './icefall-asr-mgb2-conformer_ctc-2022-27-06/test_wavs/94D37D38-B203-4FC0-9F3A-538F5C174920_spk-0001_seg-0051454:0052244.wav', './icefall-asr-mgb2-conformer_ctc-2022-27-06/test_wavs/94D37D38-B203-4FC0-9F3A-538F5C174920_spk-0001_seg-0052244:0053004.wav']}
[I] /content/sherpa/sherpa/cpp_api/offline-recognizer-ctc-impl.h:155:void sherpa::OfflineRecognizerCtcImpl::WarmUp() 2023-01-07 05:16:38 WarmUp begins
[I] /content/sherpa/sherpa/cpp_api/offline-recog

##### Decoding with HLG

In [None]:
! cd sherpa && python3 ./sherpa/bin/offline_ctc_asr.py \
  --nn-model ./icefall-asr-mgb2-conformer_ctc-2022-27-06/exp/cpu_jit.pt \
  --tokens ./icefall-asr-mgb2-conformer_ctc-2022-27-06/data/lang_bpe_5000/tokens.txt \
  --HLG ./icefall-asr-mgb2-conformer_ctc-2022-27-06/data/lang_bpe_5000/HLG.pt \
  --use-gpu false \
  ./icefall-asr-mgb2-conformer_ctc-2022-27-06/test_wavs/94D37D38-B203-4FC0-9F3A-538F5C174920_spk-0001_seg-0053813:0054281.wav \
  ./icefall-asr-mgb2-conformer_ctc-2022-27-06/test_wavs/94D37D38-B203-4FC0-9F3A-538F5C174920_spk-0001_seg-0051454:0052244.wav \
  ./icefall-asr-mgb2-conformer_ctc-2022-27-06/test_wavs/94D37D38-B203-4FC0-9F3A-538F5C174920_spk-0001_seg-0052244:0053004.wav

## Models from wenet

### English

#### wenet-english-model

In [26]:
! cd sherpa && \
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/csukuangfj/wenet-english-model && \
cd wenet-english-model && \
git lfs pull --include "final.zip"

Cloning into 'wenet-english-model'...
remote: Enumerating objects: 19, done.
remote: Counting objects: 100% (19/19), done.
remote: Compressing objects: 100% (18/18), done.
remote: Total 19 (delta 3), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (19/19), done.
Git LFS: (1 of 1 files) 221.22 MB / 221.22 MB


##### Decoding with H

In [None]:
! cd sherpa && python3 ./sherpa/bin/offline_ctc_asr.py \
  --nn-model ./wenet-english-model/final.zip \
  --tokens ./wenet-english-model/units.txt \
  --use-gpu false \
  --normalize-samples false \
  ./wenet-english-model/test_wavs/1089-134686-0001.wav \
  ./wenet-english-model/test_wavs/1221-135766-0001.wav \
  ./wenet-english-model/test_wavs/1221-135766-0002.wav

**Caution**: The above command uses `--normalize-samples false` because models from wenet expect audio samples in the range `[-32768, 32767]`, so we do not normalize them to the range `[-1, 1]`.
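To make the two sample ranges concrete, the sketch below converts between them in plain Python. It assumes the common convention of scaling by 32768 and is only an illustration of the ranges, not sherpa's or wenet's actual code; both function names are hypothetical.

```python
def to_unit_range(int16_samples):
    """Scale 16-bit samples in [-32768, 32767] to roughly [-1, 1]."""
    return [s / 32768.0 for s in int16_samples]


def to_int16_range(unit_samples):
    """Scale samples in [-1, 1] back to the 16-bit value range."""
    return [s * 32768.0 for s in unit_samples]


samples = [-32768, 0, 16384]
normalized = to_unit_range(samples)
print(normalized)                  # values a model expecting [-1, 1] would see
print(to_int16_range(normalized))  # values a wenet model would expect
```

Feeding a model samples in the wrong range typically degrades recognition badly, which is why the flag matters for the wenet models.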

### Chinese

#### wenet-chinese-model

In [None]:
! cd sherpa && \
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/csukuangfj/wenet-chinese-model && \
cd wenet-chinese-model && \
git lfs pull --include "final.zip"

##### Decoding with H

In [29]:
! cd sherpa && python3 ./sherpa/bin/offline_ctc_asr.py \
  --nn-model ./wenet-chinese-model/final.zip \
  --tokens ./wenet-chinese-model/units.txt \
  --use-gpu false \
  --normalize-samples false \
  ./wenet-chinese-model/test_wavs/BAC009S0764W0121.wav \
  ./wenet-chinese-model/test_wavs/BAC009S0764W0122.wav \
  ./wenet-chinese-model/test_wavs/BAC009S0764W0123.wav \
  ./wenet-chinese-model/test_wavs/DEV_T0000000000.wav \
  ./wenet-chinese-model/test_wavs/DEV_T0000000001.wav \
  ./wenet-chinese-model/test_wavs/DEV_T0000000002.wav

2023-01-07 05:18:11,065 INFO [offline_ctc_asr.py:312] {'nn_model': './wenet-chinese-model/final.zip', 'tokens': './wenet-chinese-model/units.txt', 'HLG': None, 'lm_scale': 1.0, 'modified': True, 'search_beam': 20.0, 'output_beam': 8.0, 'min_active_states': 30, 'max_active_states': 10000, 'use_gpu': False, 'normalize_samples': False, 'sound_files': ['./wenet-chinese-model/test_wavs/BAC009S0764W0121.wav', './wenet-chinese-model/test_wavs/BAC009S0764W0122.wav', './wenet-chinese-model/test_wavs/BAC009S0764W0123.wav', './wenet-chinese-model/test_wavs/DEV_T0000000000.wav', './wenet-chinese-model/test_wavs/DEV_T0000000001.wav', './wenet-chinese-model/test_wavs/DEV_T0000000002.wav']}
[I] /content/sherpa/sherpa/cpp_api/offline-recognizer-ctc-impl.h:155:void sherpa::OfflineRecognizerCtcImpl::WarmUp() 2023-01-07 05:18:14 WarmUp begins
[I] /content/sherpa/sherpa/cpp_api/offline-recognizer-ctc-impl.h:168:void sherpa::OfflineRecognizerCtcImpl::WarmUp() 2023-01-07 05:18:14 WarmUp ended
./wenet-chines

## Models from torchaudio

### English

In [None]:
! cd sherpa && \
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/csukuangfj/wav2vec2.0-torchaudio && \
cd wav2vec2.0-torchaudio && \
git lfs pull --include "wav2vec2_asr_base_10m.pt"

##### Decoding with H

In [None]:
! cd sherpa && python3 ./sherpa/bin/offline_ctc_asr.py \
  --nn-model ./wav2vec2.0-torchaudio/wav2vec2_asr_base_10m.pt \
  --tokens ./wav2vec2.0-torchaudio/tokens.txt \
  --use-gpu false \
  ./wav2vec2.0-torchaudio/test_wavs/1089-134686-0001.wav \
  ./wav2vec2.0-torchaudio/test_wavs/1221-135766-0001.wav \
  ./wav2vec2.0-torchaudio/test_wavs/1221-135766-0002.wav

### German

Note: The repository was already cloned in the English section above, so we only need to pull the German model file.

In [32]:
! cd sherpa/wav2vec2.0-torchaudio && \
git lfs pull --include "voxpopuli_asr_base_10k_de.pt"

Git LFS: (1 of 1 files) 360.19 MB / 360.19 MB


##### Decoding with H

In [None]:
! cd sherpa && python3 ./sherpa/bin/offline_ctc_asr.py \
  --nn-model ./wav2vec2.0-torchaudio/voxpopuli_asr_base_10k_de.pt \
  --tokens ./wav2vec2.0-torchaudio/tokens-de.txt \
  --use-gpu false \
  ./wav2vec2.0-torchaudio/test_wavs/20120315-0900-PLENARY-14-de_20120315.wav \
  ./wav2vec2.0-torchaudio/test_wavs/20170517-0900-PLENARY-16-de_20170517.wav