<a href="https://colab.research.google.com/github/k2-fsa/colab/blob/master/sherpa-onnx/sherpa_onnx_python_api_examples.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

This Colab notebook shows how to use the Python API of [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) for speech recognition.

# Install sherpa-onnx

In [1]:
! pip install sherpa-onnx

Collecting sherpa-onnx
  Downloading sherpa_onnx-1.7.11-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.8/13.8 MB 38.7 MB/s eta 0:00:00
Collecting sentencepiece==0.1.96 (from sherpa-onnx)
  Downloading sentencepiece-0.1.96-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 47.9 MB/s eta 0:00:00
Installing collected packages: sentencepiece, sherpa-onnx
Successfully installed sentencepiece-0.1.96 sherpa-onnx-1.7.11
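
The log above shows that this notebook was run with sherpa-onnx 1.7.11. Since the Python API may differ between releases, it can help to print the version that is actually installed before running the cells below. The following is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration and is not part of sherpa-onnx.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version pip installed, or None if the package is missing.
print("sherpa-onnx:", installed_version("sherpa-onnx"))
```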


# With non-streaming transducer

In [2]:
%%shell

# Download a model
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/csukuangfj/sherpa-onnx-zipformer-en-2023-06-26
cd sherpa-onnx-zipformer-en-2023-06-26
git lfs pull --include "*.onnx"

Cloning into 'sherpa-onnx-zipformer-en-2023-06-26'...
remote: Enumerating objects: 20, done.
remote: Total 20 (delta 0), reused 0 (delta 0), pack-reused 20
Unpacking objects: 100% (20/20), 666.78 KiB | 3.57 MiB/s, done.
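
In the cell above, `GIT_LFS_SKIP_SMUDGE=1` makes `git clone` check out small Git LFS pointer files instead of the real model data, and `git lfs pull --include "*.onnx"` then fetches only the ONNX files. If that second step fails, the `.onnx` files stay as text pointers and cannot be loaded as models. The sketch below checks for that case by looking for the standard LFS pointer header; the helper name `is_lfs_pointer` is hypothetical, and the directory name is the one cloned above.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like a Git LFS pointer rather than real data."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


model_dir = Path("sherpa-onnx-zipformer-en-2023-06-26")
for onnx_file in sorted(model_dir.glob("*.onnx")):
    status = "LFS pointer (not downloaded)" if is_lfs_pointer(onnx_file) else "ok"
    print(onnx_file.name, status)
```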




In [3]:
import torchaudio # for reading wave files
import sherpa_onnx

recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
    tokens="sherpa-onnx-zipformer-en-2023-06-26/tokens.txt",
    encoder="sherpa-onnx-zipformer-en-2023-06-26/encoder-epoch-99-avg-1.onnx",
    decoder="sherpa-onnx-zipformer-en-2023-06-26/decoder-epoch-99-avg-1.onnx",
    joiner="sherpa-onnx-zipformer-en-2023-06-26/joiner-epoch-99-avg-1.onnx",
)
samples, sample_rate = torchaudio.load("sherpa-onnx-zipformer-en-2023-06-26/test_wavs/0.wav")

s = recognizer.create_stream()
s.accept_waveform(sample_rate, samples[0].contiguous().numpy())
recognizer.decode_stream(s)

print(s.result.text)


 AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
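
In the cell above, `torchaudio` is used only to read the wave file, and `samples[0]` selects the first channel. Where installing `torchaudio` is not desirable, a sketch like the one below could read the file with the standard `wave` module and NumPy instead. It assumes a 16-bit PCM WAV file, and `read_wave` is a hypothetical helper name, not a sherpa-onnx function. Dividing by 32768 mirrors the normalized float range that `torchaudio.load` returns by default.

```python
import wave

import numpy as np


def read_wave(path):
    """Read a 16-bit PCM WAV file.

    Returns (samples, sample_rate), where samples is a 1-D float32 array
    holding the first channel scaled to [-1, 1).
    """
    with wave.open(path, "rb") as f:
        if f.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        num_channels = f.getnchannels()
        sample_rate = f.getframerate()
        raw = f.readframes(f.getnframes())

    # WAV data is little-endian; channels are interleaved frame by frame.
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    first_channel = samples.reshape(-1, num_channels)[:, 0]
    return np.ascontiguousarray(first_channel), sample_rate


# Assumed usage with the recognizer created above:
# samples, sample_rate = read_wave("sherpa-onnx-zipformer-en-2023-06-26/test_wavs/0.wav")
# s = recognizer.create_stream()
# s.accept_waveform(sample_rate, samples)
```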


In [4]:
# to decode multiple files in parallel
samples0, sample_rate0 = torchaudio.load("sherpa-onnx-zipformer-en-2023-06-26/test_wavs/0.wav")
s0 = recognizer.create_stream()
s0.accept_waveform(sample_rate0, samples0[0].contiguous().numpy())

samples1, sample_rate1 = torchaudio.load("sherpa-onnx-zipformer-en-2023-06-26/test_wavs/1.wav")
s1 = recognizer.create_stream()
s1.accept_waveform(sample_rate1, samples1[0].contiguous().numpy())

recognizer.decode_streams([s0, s1])
print('0.wav', s0.result.text)
print('1.wav', s1.result.text)


0.wav  AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS
1.wav  GOD AS A DIRECT CONSEQUENCE OF THE SIN WHICH MAN THUS PUNISHED HAD GIVEN HER A LOVELY CHILD WHOSE PLACE WAS ON THAT SAME DISHONORED BOSOM TO CONNECT HER PARENT FOREVER WITH THE RACE AND DESCENT OF MORTALS AND TO BE FINALLY A BLESSED SOUL IN HEAVEN
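
The pattern in the cell above (one stream per file, then a single `decode_streams` call) can be wrapped in a small helper so it scales past two files. This is only a sketch: `decode_files` and `load_wave` are hypothetical names, and the helper assumes nothing about the recognizer beyond the `create_stream`, `accept_waveform`, `decode_streams`, and `result.text` members already used in this notebook.

```python
def decode_files(recognizer, paths, load_wave):
    """Decode several files in one batch and return {path: text}.

    `load_wave` is any callable that maps a path to (samples, sample_rate),
    with samples as a 1-D float32 array.
    """
    streams = []
    for path in paths:
        samples, sample_rate = load_wave(path)
        stream = recognizer.create_stream()
        stream.accept_waveform(sample_rate, samples)
        streams.append(stream)

    # One call decodes all streams together, as in the cell above.
    recognizer.decode_streams(streams)
    return {path: stream.result.text for path, stream in zip(paths, streams)}


# Assumed usage with torchaudio as the loader:
# def load_wave(path):
#     samples, sample_rate = torchaudio.load(path)
#     return samples[0].contiguous().numpy(), sample_rate
#
# texts = decode_files(recognizer, ["a.wav", "b.wav"], load_wave)
```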


# With non-streaming Whisper

In [5]:
%%shell

# Download a model
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/csukuangfj/sherpa-onnx-whisper-tiny.en
cd sherpa-onnx-whisper-tiny.en
git lfs pull --include "*.onnx"

Cloning into 'sherpa-onnx-whisper-tiny.en'...
remote: Enumerating objects: 46, done.
remote: Counting objects: 100% (46/46), done.
remote: Compressing objects: 100% (45/45), done.
remote: Total 46 (delta 5), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (46/46), 1.00 MiB | 4.96 MiB/s, done.




In [6]:
import torchaudio # for reading wave files
import sherpa_onnx

recognizer = sherpa_onnx.OfflineRecognizer.from_whisper(
    tokens="./sherpa-onnx-whisper-tiny.en/tiny.en-tokens.txt",
    encoder="./sherpa-onnx-whisper-tiny.en/tiny.en-encoder.onnx",
    decoder="./sherpa-onnx-whisper-tiny.en/tiny.en-decoder.onnx",
)
samples, sample_rate = torchaudio.load("./sherpa-onnx-whisper-tiny.en/test_wavs/0.wav")

s = recognizer.create_stream()
s.accept_waveform(sample_rate, samples[0].contiguous().numpy())
recognizer.decode_stream(s)

print(s.result.text)

 After early nightfall, the yellow lamps would light up here and there, the squalid quarter of the brothels.


In [7]:
# to decode multiple files in parallel
samples0, sample_rate0 = torchaudio.load("./sherpa-onnx-whisper-tiny.en/test_wavs/0.wav")
s0 = recognizer.create_stream()
s0.accept_waveform(sample_rate0, samples0[0].contiguous().numpy())

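# NOTE: this loads 0.wav a second time, so the line labelled "1.wav" in the
# output repeats the 0.wav transcript. Change the path to decode another file.
samples1, sample_rate1 = torchaudio.load("./sherpa-onnx-whisper-tiny.en/test_wavs/0.wav")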
s1 = recognizer.create_stream()
s1.accept_waveform(sample_rate1, samples1[0].contiguous().numpy())

recognizer.decode_streams([s0, s1])
print('0.wav', s0.result.text)
print('1.wav', s1.result.text)

0.wav  After early nightfall, the yellow lamps would light up here and there, the squalid quarter of the brothels.
1.wav  After early nightfall, the yellow lamps would light up here and there, the squalid quarter of the brothels.


# With non-streaming NeMo CTC model

In [8]:
%%shell

# Download a model
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/csukuangfj/sherpa-onnx-nemo-ctc-en-conformer-medium
cd sherpa-onnx-nemo-ctc-en-conformer-medium
git lfs pull --include "*.onnx"

Cloning into 'sherpa-onnx-nemo-ctc-en-conformer-medium'...
remote: Enumerating objects: 20, done.
remote: Total 20 (delta 0), reused 0 (delta 0), pack-reused 20
Unpacking objects: 100% (20/20), 671.36 KiB | 2.57 MiB/s, done.




In [9]:
import torchaudio # for reading wave files
import sherpa_onnx

recognizer = sherpa_onnx.OfflineRecognizer.from_nemo_ctc(
    tokens="sherpa-onnx-nemo-ctc-en-conformer-medium/tokens.txt",
    model="sherpa-onnx-nemo-ctc-en-conformer-medium/model.onnx",
)
samples, sample_rate = torchaudio.load("sherpa-onnx-nemo-ctc-en-conformer-medium/test_wavs/0.wav")

s = recognizer.create_stream()
s.accept_waveform(sample_rate, samples[0].contiguous().numpy())
recognizer.decode_stream(s)

print(s.result.text)

 after early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels


In [10]:
# to decode multiple files in parallel
samples0, sample_rate0 = torchaudio.load("./sherpa-onnx-nemo-ctc-en-conformer-medium/test_wavs/0.wav")
s0 = recognizer.create_stream()
s0.accept_waveform(sample_rate0, samples0[0].contiguous().numpy())

samples1, sample_rate1 = torchaudio.load("./sherpa-onnx-nemo-ctc-en-conformer-medium/test_wavs/1.wav")
s1 = recognizer.create_stream()
s1.accept_waveform(sample_rate1, samples1[0].contiguous().numpy())

recognizer.decode_streams([s0, s1])
print('0.wav', s0.result.text)
print('1.wav', s1.result.text)

0.wav  after early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels
1.wav  god as a direct consequence of the sin which man thus punished had given her a lovely child whose place was on that same dishonored bosom to connect her parent for ever with the race and descent of mortals and to be finally a blessed soul in heaven
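
The three models format their text differently: the transducer prints upper case, Whisper adds capitalization and punctuation, and the NeMo CTC model prints lower case. All of them also emit a leading space. To compare transcripts across models, one option is to normalize the text first. The sketch below is an assumption about what a reasonable normalization might look like, not something sherpa-onnx provides. It lowercases, strips ASCII punctuation, and collapses whitespace, using the `0.wav` outputs recorded above as input.

```python
import string


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


# The 0.wav transcripts printed earlier in this notebook.
transducer = " AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS"
whisper = " After early nightfall, the yellow lamps would light up here and there, the squalid quarter of the brothels."
nemo_ctc = " after early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels"

# True when all three models agree after normalization.
print(normalize(transducer) == normalize(whisper) == normalize(nemo_ctc))
```

Normalization of this kind does not hide genuine word-level differences. For `1.wav`, the transducer prints `FOREVER` while the NeMo CTC model prints `for ever`, and those still compare as different.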
