<a href="https://colab.research.google.com/github/SlangLabs/asr-wer-bench/blob/main/asr_wer_bench.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ASR WER Benchmark

Infrastructure to measure the Word Error Rate (WER) of an offline ASR engine on a given audio dataset.
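
For orientation, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The following is a minimal Python sketch of that definition using a word-level edit distance. The function name `word_error_rate` is hypothetical and not part of asr-wer-bench, and `sclite` performs its own alignment, so treat this as an illustration rather than a replacement for it.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("experience proves this", "experience proves this"))
```

Because the denominator is the reference length, a hypothesis with many inserted words can score above 1.0 (that is, above 100%).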

## Setup

In [1]:
!python --version

Python 3.6.9


### SCTK

Build and install `sclite`, the scoring tool that ships with NIST's SCTK.

In [2]:
!git clone https://github.com/usnistgov/SCTK.git

Cloning into 'SCTK'...
remote: Enumerating objects: 5115, done.
remote: Total 5115 (delta 0), reused 0 (delta 0), pack-reused 5115
Receiving objects: 100% (5115/5115), 7.26 MiB | 3.84 MiB/s, done.
Resolving deltas: 100% (3658/3658), done.


In [3]:
!ls -l SCTK

total 52
-rw-r--r--  1 root root 16498 Oct 20 15:52 CHANGELOG
-rw-r--r--  1 root root   788 Oct 20 15:52 DISCLAIMER
drwxr-xr-x  4 root root  4096 Oct 20 15:52 doc
-rw-r--r--  1 root root  2273 Oct 20 15:52 LICENSE.md
-rw-r--r--  1 root root  1673 Oct 20 15:52 makefile
-rw-r--r--  1 root root  6440 Oct 20 15:52 README.md
drwxr-xr-x 26 root root  4096 Oct 20 15:52 src
-rw-r--r--  1 root root  1484 Oct 20 15:52 TODO


In [4]:
!cd SCTK && make config &> /dev/null

In [5]:
!cd SCTK && make all &> /dev/null

In [6]:
!cd SCTK && make check &> /dev/null

In [7]:
!cd SCTK && make install &> /dev/null

In [8]:
!ls -la SCTK/bin/sclite

-rwxr-xr-x 1 root root 344296 Oct 20 15:54 SCTK/bin/sclite


### ASR WER Bench

Clone the repository and install its dependencies.

In [9]:
!git clone https://github.com/SlangLabs/asr-wer-bench.git

Cloning into 'asr-wer-bench'...
remote: Enumerating objects: 29, done.
remote: Counting objects: 100% (29/29), done.
remote: Compressing objects: 100% (18/18), done.
remote: Total 29 (delta 2), reused 25 (delta 2), pack-reused 0
Unpacking objects: 100% (29/29), done.


In [10]:
# Run for CPU
!pip install -r asr-wer-bench/requirements.txt

Collecting deepspeech==0.8.1
  Downloading https://files.pythonhosted.org/packages/29/71/f03e10d4c4141436d5be5c050fdd3a732e81709905e8081e5bbfe6b7dde0/deepspeech-0.8.1-cp36-cp36m-manylinux1_x86_64.whl (8.3MB)
     |████████████████████████████████| 8.3MB 6.9MB/s
Installing collected packages: deepspeech
Successfully installed deepspeech-0.8.1


In [None]:
# Run for GPU
!pip install -r asr-wer-bench/requirements-gpu.txt

### Get DeepSpeech Models

In [11]:
!mkdir -p ./models/deepspeech/en-US/
!cd ./models/deepspeech/en-US/ && curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.8.1/deepspeech-0.8.1-models.pbmm
!cd ./models/deepspeech/en-US/ && curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.8.1/deepspeech-0.8.1-models.scorer

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   652  100   652    0     0   2950      0 --:--:-- --:--:-- --:--:--  2936
100  180M  100  180M    0     0  28.7M      0  0:00:06  0:00:06 --:--:-- 35.9M
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   654  100   654    0     0   2893      0 --:--:-- --:--:-- --:--:--  2893
100  909M  100  909M    0     0  34.5M      0  0:00:26  0:00:26 --:--:-- 36.2M


In [12]:
!ls -l ./models/deepspeech/en-US/

total 1115516
-rw-r--r-- 1 root root 188915984 Oct 20 15:55 deepspeech-0.8.1-models.pbmm
-rw-r--r-- 1 root root 953363776 Oct 20 15:56 deepspeech-0.8.1-models.scorer


In [13]:
# Verify DeepSpeech

!deepspeech \
  --model models/deepspeech/en-US/deepspeech-0.8.1-models.pbmm \
  --scorer models/deepspeech/en-US/deepspeech-0.8.1-models.scorer \
  --audio asr-wer-bench/data/en-US/audio/2830-3980-0043.wav

Loading model from file models/deepspeech/en-US/deepspeech-0.8.1-models.pbmm
TensorFlow: v2.2.0-24-g1c1b2b9
DeepSpeech: v0.8.1-0-gfa883eb
2020-10-20 15:56:46.150464: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.00878s.
Loading scorer from files models/deepspeech/en-US/deepspeech-0.8.1-models.scorer
Loaded scorer in 0.000213s.
Running inference.
experience proves this
Inference took 1.665s for 1.975s audio file.


In [14]:
# Expected transcript
!cat asr-wer-bench/data/en-US/audio/2830-3980-0043.txt

experience proves this
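
The two cells above suggest that the sample data stores each recording as `<id>.wav` next to an `<id>.txt` file holding its expected transcript. Assuming that layout holds for the whole directory, the pairs could be gathered as in the sketch below. `collect_pairs` is a hypothetical helper for illustration, not a function from asr-wer-bench.

```python
from pathlib import Path
from typing import List, Tuple


def collect_pairs(input_dir: str) -> List[Tuple[Path, str]]:
    """Pair every .wav file with the transcript in the same-named .txt file.

    Assumes the <id>.wav / <id>.txt layout seen in the sample data.
    Audio files without a transcript are skipped.
    """
    pairs = []
    for wav_path in sorted(Path(input_dir).glob("*.wav")):
        txt_path = wav_path.with_suffix(".txt")
        if txt_path.is_file():
            transcript = txt_path.read_text(encoding="utf-8").strip()
            pairs.append((wav_path, transcript))
    return pairs


if __name__ == "__main__":
    for wav, text in collect_pairs("asr-wer-bench/data/en-US/audio"):
        print(wav.name, "->", text)
```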


## Run Test Bench

In [15]:
!PYTHONPATH=asr-wer-bench python asr-wer-bench/werbench/asr/engine.py \
  --engine deepspeech \
  --model-path-prefix ./models/deepspeech/en-US/deepspeech-0.8.1-models \
  --input-dir ./asr-wer-bench/data/en-US/audio \
  --output-path-prefix ./deepspeech-out

TensorFlow: v2.2.0-24-g1c1b2b9
DeepSpeech: v0.8.1-0-gfa883eb
2020-10-20 15:56:59.366816: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
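
The scoring step reads `deepspeech-out.ref` and `deepspeech-out.hyp` in `trn` format, in which each line holds one transcript followed by its utterance ID in parentheses. The sketch below shows how such lines could be produced. It is an assumption about the file contents, not a copy of what `engine.py` does: `to_trn_line` and `write_trn` are hypothetical names, and the exact ID convention expected by `sclite -i rm` is not verified here.

```python
from typing import Iterable, Tuple


def to_trn_line(transcript: str, utterance_id: str) -> str:
    """Format one transcript as a trn line: 'words ... (utterance_id)'."""
    words = " ".join(transcript.split())  # collapse newlines and repeated spaces
    return f"{words} ({utterance_id})"


def write_trn(path: str, items: Iterable[Tuple[str, str]]) -> None:
    """Write (utterance_id, transcript) pairs to a trn file, one per line."""
    with open(path, "w", encoding="utf-8") as trn_file:
        for utterance_id, transcript in items:
            trn_file.write(to_trn_line(transcript, utterance_id) + "\n")


if __name__ == "__main__":
    print(to_trn_line("experience proves this", "2830-3980-0043"))
```

The reference and hypothesis files must use the same utterance IDs so that `sclite` can match each hypothesis line to its reference.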


Compare using `sclite`:

In [16]:
!./SCTK/bin/sclite -r deepspeech-out.ref trn -h deepspeech-out.hyp trn -i rm

sclite: 2.10 TK Version 1.3
Begin alignment of Ref File: 'deepspeech-out.ref' and Hyp File: 'deepspeech-out.hyp'
    Alignment# 1 for speaker 2830          
    Alignment# 1 for speaker 8455          
    Alignment# 1 for speaker 4507          




                     SYSTEM SUMMARY PERCENTAGES by SPEAKER                      

       ,----------------------------------------------------------------.
       |                       deepspeech-out.hyp                       |
       |----------------------------------------------------------------|
       | SPKR   | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
       |--------+-------------+-----------------------------------------|
       | 2830   |    1      3 |100.0    0.0    0.0    0.0    0.0    0.0 |
       |--------+-------------+-----------------------------------------|
       | 8455   |    1      6 |100.0    0.0    0.0    0.0    0.0    0.0 |
       |--------+-------------+-----------------------------------------|
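
`sclite` prints this summary as a text table, where `Err` is the sum of the `Sub`, `Del`, and `Ins` percentages and `S.Err` is the percentage of sentences containing an error. If the figures are needed programmatically, a speaker row can be split on its `|` separators. The following is a minimal sketch that assumes the row layout shown above. `parse_summary_row` is a hypothetical helper, not part of SCTK or asr-wer-bench.

```python
from typing import Dict, Optional, Union

RATE_COLUMNS = ("corr", "sub", "del", "ins", "err", "s_err")


def parse_summary_row(line: str) -> Optional[Dict[str, Union[str, int, float]]]:
    """Parse one speaker row of the summary table, or return None for other lines."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    if len(cells) != 3:
        return None  # title, separator, or blank line
    speaker, counts, rates = cells[0], cells[1].split(), cells[2].split()
    if len(counts) != 2 or len(rates) != len(RATE_COLUMNS):
        return None  # header row
    try:
        row = {"speaker": speaker,
               "sentences": int(counts[0]),
               "words": int(counts[1])}
        for name, value in zip(RATE_COLUMNS, rates):
            row[name] = float(value)
    except ValueError:
        return None  # non-numeric cell, so not a data row
    return row


if __name__ == "__main__":
    sample = "| 2830   |    1      3 |100.0    0.0    0.0    0.0    0.0    0.0 |"
    print(parse_summary_row(sample))
```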


---
&copy; 2020 Slang Labs Private Limited. All rights reserved.