<a href="https://colab.research.google.com/github/k2-fsa/colab/blob/master/sherpa-onnx/RTF_comparison_betwen_whisper_and_moonshine.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

This Colab notebook compares the real-time factor (RTF) of Whisper tiny.en and Moonshine tiny
by using each model to generate subtitles, with silero-vad detecting the speech segments.

We use [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) as the runtime. Its core is implemented in C++.

|Metric|Moonshine tiny|Whisper tiny.en|
|---|---|---|
|RTF (1 thread, lower is faster)|0.058|0.191|
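
The RTF values in the table are the elapsed processing time divided by the audio duration, as printed by the runs later in this notebook. A minimal sketch of that arithmetic follows; the helper name `real_time_factor` is hypothetical and not part of sherpa-onnx.

```python
def real_time_factor(elapsed_s: float, audio_duration_s: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_duration_s <= 0:
        raise ValueError("audio duration must be positive")
    return elapsed_s / audio_duration_s


# Timings copied from the runs in this notebook (one thread, Obama.wav).
AUDIO_DURATION_S = 335.235
moonshine_rtf = real_time_factor(19.581, AUDIO_DURATION_S)
whisper_rtf = real_time_factor(64.083, AUDIO_DURATION_S)

print(f"Moonshine tiny:  RTF = {moonshine_rtf:.3f}")
print(f"Whisper tiny.en: RTF = {whisper_rtf:.3f}")
print(f"Whisper / Moonshine = {whisper_rtf / moonshine_rtf:.2f}")
```

On this particular run, Whisper tiny.en takes roughly 3.3 times as long as Moonshine tiny. The ratio will vary with hardware and thread count.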

# Install sherpa-onnx

Please see
https://k2-fsa.github.io/sherpa/onnx/install/index.html
and
https://k2-fsa.github.io/sherpa/onnx/python/index.html


In [1]:
%%shell

pip install sherpa-onnx

Collecting sherpa-onnx
  Downloading sherpa_onnx-1.10.30-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (23 kB)
Downloading sherpa_onnx-1.10.30-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (21.3 MB)
Installing collected packages: sherpa-onnx
Successfully installed sherpa-onnx-1.10.30




# Download model files

In [2]:
%%shell

wget -q https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
wget -q https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.en.tar.bz2
wget -q https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-tiny-en-int8.tar.bz2

wget -q https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/Obama.wav

tar xf sherpa-onnx-whisper-tiny.en.tar.bz2
tar xf sherpa-onnx-moonshine-tiny-en-int8.tar.bz2

ls -lh sherpa-onnx-whisper-tiny.en
echo "---"
ls -lh sherpa-onnx-moonshine-tiny-en-int8

total 244M
drwxr-xr-x 2 501 staff 4.0K Jul 13 13:31 test_wavs
-rw-r--r-- 1 501 staff  86M Jul 13 13:31 tiny.en-decoder.int8.onnx
-rw-r--r-- 1 501 staff 110M Jul 13 13:31 tiny.en-decoder.onnx
-rw-r--r-- 1 501 staff  13M Jul 13 13:31 tiny.en-encoder.int8.onnx
-rw-r--r-- 1 501 staff  36M Jul 13 13:31 tiny.en-encoder.onnx
-rw-r--r-- 1 501 staff 816K Jul 13 13:31 tiny.en-tokens.txt
---
total 119M
-rw-r--r-- 1 501 staff  44M Oct 26 01:42 cached_decode.int8.onnx
-rw-r--r-- 1 501 staff  18M Oct 26 01:42 encode.int8.onnx
-rw-r--r-- 1 501 staff 1.1K Oct 26 01:42 LICENSE
-rw-r--r-- 1 501 staff 6.5M Oct 26 01:42 preprocess.onnx
-rw-r--r-- 1 501 staff  175 Oct 26 01:42 README.md
drwxr-xr-x 2 501 staff 4.0K Oct 26 01:42 test_wavs
-rw-r--r-- 1 501 staff 427K Oct 26 01:42 tokens.txt
-rw-r--r-- 1 501 staff  51M Oct 26 01:42 uncached_decode.int8.onnx




In [3]:
# We will use ffmpeg to decode Obama.wav
! sudo apt-get install -q ffmpeg

Reading package lists...
Building dependency tree...
Reading state information...
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 49 not upgraded.


# Run it!

In [4]:
! git clone https://github.com/k2-fsa/sherpa-onnx

Cloning into 'sherpa-onnx'...
remote: Enumerating objects: 16828, done.
remote: Counting objects: 100% (5835/5835), done.
remote: Compressing objects: 100% (1470/1470), done.
remote: Total 16828 (delta 4746), reused 4854 (delta 4293), pack-reused 10993 (from 1)
Receiving objects: 100% (16828/16828), 7.20 MiB | 12.85 MiB/s, done.
Resolving deltas: 100% (11458/11458), done.


## Moonshine tiny

In [5]:
%%shell

python3 sherpa-onnx/python-api-examples/generate-subtitles.py  \
  --silero-vad-model=./silero_vad.onnx \
  --moonshine-preprocessor=./sherpa-onnx-moonshine-tiny-en-int8/preprocess.onnx \
  --moonshine-encoder=./sherpa-onnx-moonshine-tiny-en-int8/encode.int8.onnx \
  --moonshine-uncached-decoder=./sherpa-onnx-moonshine-tiny-en-int8/uncached_decode.int8.onnx \
  --moonshine-cached-decoder=./sherpa-onnx-moonshine-tiny-en-int8/cached_decode.int8.onnx \
  --tokens=./sherpa-onnx-moonshine-tiny-en-int8/tokens.txt \
  --num-threads=1 \
  ./Obama.wav

Started!
Saved to Obama.srt
Audio duration:	335.235 s
Elapsed:	19.581 s
RTF = 19.581/335.235 = 0.058
Done!
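
To collect timings from several runs without copying numbers by hand, the output above can be parsed. This is a rough sketch that assumes the log format matches the text printed above; other versions of `generate-subtitles.py` may print something different, and the helper name `parse_rtf_log` is hypothetical.

```python
import re


def parse_rtf_log(log: str) -> dict:
    """Extract audio duration and elapsed time (in seconds) from the script output."""
    duration = re.search(r"Audio duration:\s*([\d.]+)\s*s", log)
    elapsed = re.search(r"Elapsed:\s*([\d.]+)\s*s", log)
    if duration is None or elapsed is None:
        raise ValueError("log does not contain the duration and elapsed lines")
    duration_s = float(duration.group(1))
    elapsed_s = float(elapsed.group(1))
    return {
        "duration_s": duration_s,
        "elapsed_s": elapsed_s,
        "rtf": elapsed_s / duration_s,
    }


# Output of the Moonshine tiny run, copied from this notebook.
sample_log = """Started!
Saved to Obama.srt
Audio duration:\t335.235 s
Elapsed:\t19.581 s
RTF = 19.581/335.235 = 0.058
Done!
"""

result = parse_rtf_log(sample_log)
print(f"RTF = {result['rtf']:.3f}")
```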




In [6]:
%%shell

cat Obama.srt

1
0:00:09,286 --> 0:00:12,486
 everybody on everybody go ahead and have a seat

2
0:00:13,094 --> 0:00:15,014
 How's everybody doing today?

3
0:00:18,694 --> 0:00:20,742
 How about Tim Spicer?

4
0:00:25,894 --> 0:00:31,452
 I am here with students at Wade Field High School in Arlington, Virginia.

5
0:00:32,710 --> 0:00:39,580
 And we've got students tuning in from all across America, from kindergarten through 12th grade.

6
0:00:40,166 --> 0:00:48,060
 And I am just so glad that all could join us today. And I want to thank Wakefield for being such an outstanding host. Give yourselves a big honor to us.

7
0:00:54,406 --> 0:00:59,324
 Now I know that for many of you today is the first day of school.

8
0:00:59,590 --> 0:01:05,916
 And for those of you in kindergarten or starting middle or high school, it's your first day in a new school.

9
0:01:06,310 --> 0:01:09,798
 So it's understandable if you're a little nervous.

10
0:01:10,630 --> 0:01:15,676
 I imagine there's some seniors o
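
Each subtitle entry above carries a timing line of the form `start --> end`. As a small sketch, the timestamps can be converted to seconds to measure how long each detected speech segment is. The function names are hypothetical, and the parser assumes the `H:MM:SS,mmm` layout shown in this output.

```python
import re

# Matches "H:MM:SS,mmm"; the hour field may have one or more digits.
_TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def srt_time_to_seconds(stamp: str) -> float:
    """Convert one SRT timestamp such as '0:00:09,286' to seconds."""
    match = _TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def segment_duration(timing_line: str) -> float:
    """Return end minus start, in seconds, for a 'start --> end' line."""
    start, end = timing_line.split("-->")
    return srt_time_to_seconds(end) - srt_time_to_seconds(start)


# Timing line of the first subtitle entry in this notebook.
first_segment = segment_duration("0:00:09,286 --> 0:00:12,486")
print(f"First segment lasts {first_segment:.3f} s")
```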



## Whisper tiny.en

In [7]:
%%shell

python3 sherpa-onnx/python-api-examples/generate-subtitles.py  \
  --silero-vad-model=./silero_vad.onnx \
  --whisper-encoder=./sherpa-onnx-whisper-tiny.en/tiny.en-encoder.int8.onnx \
  --whisper-decoder=./sherpa-onnx-whisper-tiny.en/tiny.en-decoder.int8.onnx \
  --tokens=./sherpa-onnx-whisper-tiny.en/tiny.en-tokens.txt \
  --num-threads=1 \
  ./Obama.wav

Started!
Saved to Obama.srt
Audio duration:	335.235 s
Elapsed:	64.083 s
RTF = 64.083/335.235 = 0.191
Done!




In [8]:
%%shell
cat Obama.srt

1
0:00:09,286 --> 0:00:12,486
 everybody. All right, everybody go ahead and have a seat.

2
0:00:13,094 --> 0:00:15,014
 How's everybody doing today?

3
0:00:18,694 --> 0:00:20,742
 How about Tim Spicer?

4
0:00:25,894 --> 0:00:31,452
 I am here with students at Wakefield High School in Arlington, Virginia.

5
0:00:32,710 --> 0:00:39,580
 And we've got students tuning in from all across America, from kindergarten through 12th grade.

6
0:00:40,166 --> 0:00:48,060
 And I am just so glad that all could join us today. And I want to thank Wakefield for being such an outstanding host. Give yourselves a big round of applause.

7
0:00:54,406 --> 0:00:59,324
 And I know that for many of you, today is the first day of school.

8
0:00:59,590 --> 0:01:05,916
 And for those of you in kindergarten or starting middle or high school, it's your first day in a new school.

9
0:01:06,310 --> 0:01:09,798
 So it's understandable if you're a little nervous.

10
0:01:10,630 --> 0:01:15,676
 I imagine there'
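
Because both runs use the same VAD segments, the two transcripts can be compared entry by entry. The sketch below uses Python's standard `difflib` to list the word-level differences for one segment. The helper name `word_diff` is hypothetical, and neither transcript is treated as ground truth here.

```python
import difflib


def word_diff(left: str, right: str) -> list:
    """Return (tag, left_words, right_words) for each span where the texts differ."""
    left_words = left.split()
    right_words = right.split()
    matcher = difflib.SequenceMatcher(a=left_words, b=right_words, autojunk=False)
    return [
        (tag, left_words[i1:i2], right_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Segment 4 as transcribed by each model in this notebook.
whisper = "I am here with students at Wakefield High School in Arlington, Virginia."
moonshine = "I am here with students at Wade Field High School in Arlington, Virginia."

differences = word_diff(whisper, moonshine)
for tag, whisper_words, moonshine_words in differences:
    print(tag, whisper_words, moonshine_words)
```

For this segment the only difference is that Whisper tiny.en produces "Wakefield" where Moonshine tiny produces "Wade Field".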

