# Inference Using a Multilingual ASR Model

(Adapted from FAIR's notebook: https://colab.research.google.com/github/facebookresearch/flashlight/blob/master/flashlight/app/asr/tutorial/notebooks/InferenceAndAlignmentCTC.ipynb) 

## Install `Flashlight`
First, install `Flashlight` and its dependencies. `Flashlight` is built from source, which takes **~16 minutes**.

To install outside of a Colab notebook, follow the [build instructions](https://github.com/flashlight/flashlight#building).

In [2]:
# First, choose backend to build with
backend = 'CPU' #@param ["CPU", "CUDA"]
# Clone Flashlight
!git clone https://github.com/flashlight/flashlight.git
# install all dependencies for colab notebook
!source flashlight/scripts/colab/colab_install_deps.sh

Cloning into 'flashlight'...
remote: Enumerating objects: 19957, done.
remote: Counting objects: 100% (3365/3365), done.
remote: Compressing objects: 100% (660/660), done.
remote: Total 19957 (delta 2928), reused 2711 (delta 2703), pack-reused 16592
Receiving objects: 100% (19957/19957), 14.15 MiB | 18.95 MiB/s, done.
Resolving deltas: 100% (14223/14223), done.
Reading package lists... Done
Building dependency tree       
Reading state information... Done
libboost-all-dev is already the newest version (1.65.1.0ubuntu1).
libopenmpi-dev is already the newest version (2.1.1-8).
libsndfile1-dev is already the newest version (1.0.28-4ubuntu0.18.04.2).
The following packages were automatically installed and are no longer required:
  cuda-command-line-tools-10-0 cuda-command-line-tools-10-1
  cuda-command-line-tools-11-0 cuda-compiler-10-0 cuda-compiler-10-1
  cuda-compiler-11-0 cuda-cuobjdump-10-0 cuda-cuobjdump-10-1
  cuda-cuobjdump-11-0 cuda-cupti-10-0 cuda-cupti-10-1 cuda-cupt

Build the ASR app from the pinned commit that the cell below checks out. The resulting binaries are placed in `/content/flashlight/build/bin/asr`.

If using a GPU Colab runtime, build the CUDA backend; otherwise, build the CPU backend.
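
The backend choice could also be made automatically instead of through the form field. The sketch below is one possible approach, not part of the original notebook: the helper name `detect_backend` is hypothetical, and it assumes that a working `nvidia-smi -L` call means the runtime has a usable GPU.

```python
import shutil
import subprocess


def detect_backend() -> str:
    """Return "CUDA" if an NVIDIA GPU seems to be available, else "CPU"."""
    # Assumption: a working `nvidia-smi` implies a usable GPU runtime.
    if shutil.which("nvidia-smi") is None:
        return "CPU"
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # lists the GPUs visible to the driver
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return "CPU"
    has_gpu = result.returncode == 0 and "GPU" in result.stdout
    return "CUDA" if has_gpu else "CPU"


backend = detect_backend()
print(f"Selected backend: {backend}")
```

If this is used, it should run before the build cell so that `backend` holds the detected value.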

In [3]:
# export necessary env variables
%env MKLROOT=/opt/intel/mkl
%env ArrayFire_DIR=/opt/arrayfire/share/ArrayFire/cmake
%env DNNL_DIR=/opt/dnnl/dnnl_lnx_2.0.0_cpu_iomp/lib/cmake/dnnl

if backend == "CUDA":
  # Total time: ~13 minutes
#   !cd flashlight && git checkout d2e1924cb2a2b32b48cc326bb7e332ca3ea54f67 && mkdir -p build && cd build && \
  !cd flashlight && git checkout 8f7af9ec1188bfd7050c47abfac528d21650890f && mkdir -p build && cd build && \
  cmake .. -DCMAKE_BUILD_TYPE=Release \
           -DFL_BUILD_TESTS=OFF \
           -DFL_BUILD_EXAMPLES=OFF \
           -DFL_BUILD_APP_ASR=ON && \
  make -j$(nproc)
elif backend == "CPU":
  # Total time: ~14 minutes
#   !cd flashlight && git checkout d2e1924cb2a2b32b48cc326bb7e332ca3ea54f67 && mkdir -p build && cd build && \
  !cd flashlight && git checkout 8f7af9ec1188bfd7050c47abfac528d21650890f && mkdir -p build && cd build && \
  cmake .. -DFL_BACKEND=CPU \
           -DCMAKE_BUILD_TYPE=Release \
           -DFL_BUILD_TESTS=OFF \
           -DFL_BUILD_EXAMPLES=OFF \
           -DFL_BUILD_APP_ASR=ON && \
  make -j$(nproc)
else:
  raise ValueError(f"Unknown backend {backend}")


env: MKLROOT=/opt/intel/mkl
env: ArrayFire_DIR=/opt/arrayfire/share/ArrayFire/cmake
env: DNNL_DIR=/opt/dnnl/dnnl_lnx_2.0.0_cpu_iomp/lib/cmake/dnnl
Note: checking out '8f7af9ec1188bfd7050c47abfac528d21650890f'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 8f7af9ec Fixed typo in README.md (#609)
-- The CXX compiler identification is GNU 7.5.0
-- The C compiler identification is GNU 7.5.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX comp

Let's take a look around.

In [4]:
# Binaries are located in
!ls flashlight/build/bin/asr

fl_asr_align	       fl_asr_sfx_apply  fl_asr_tutorial_finetune_ctc
fl_asr_arch_benchmark  fl_asr_test	 fl_asr_tutorial_inference_ctc
fl_asr_decode	       fl_asr_train	 fl_asr_voice_activity_detection_ctc


## Inference Step 0: Preparation


### Download Models
Download acoustic model, tokens, and a few audio files for testing.

In [5]:
!wget https://dl.fbaipublicfiles.com/wav2letter/mling_pl/tokens-all.lst
!wget https://dl.fbaipublicfiles.com/wav2letter/mling_pl/checkpoint_cv_finetune.bin # acoustic model (large)
!mkdir -p audio  # -p: do not fail if the directory already exists
for i in range(5):
  path = "https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/audio/116-288045-000{}.flac".format(i)
  !cd audio && wget $path

--2022-02-21 15:14:19--  https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/lexicon.txt
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4965720 (4.7M) [text/plain]
Saving to: ‘lexicon.txt.1’


2022-02-21 15:14:20 (6.18 MB/s) - ‘lexicon.txt.1’ saved [4965720/4965720]

--2022-02-21 15:14:20--  https://dl.fbaipublicfiles.com/wav2letter/mling_pl/tokens-all.lst
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 31534 (31K) [binary/octet-stream]
Saving to: ‘tokens-all.lst.1’


2022-02-21 15:14:21 (304 KB/s) - ‘tokens-all.lst.1’ saved [31534/31534]

--2022-02-21 15:14:21--  https:
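
Because the model checkpoint is large, it may be worth confirming that every download finished before moving on. The following is a minimal sketch of such a check; the helper name `missing_files` is hypothetical, and the file names are the ones requested by the download cell above.

```python
from pathlib import Path


def missing_files(paths):
    """Return the subset of `paths` that is absent or empty on disk."""
    missing = []
    for p in map(Path, paths):
        if not p.is_file() or p.stat().st_size == 0:
            missing.append(str(p))
    return missing


# File names taken from the download cell; adjust if other files are used.
expected = ["tokens-all.lst", "checkpoint_cv_finetune.bin"]
expected += [f"audio/116-288045-000{i}.flac" for i in range(5)]

missing = missing_files(expected)
print("Missing or empty files:", missing if missing else "none")
```

An empty result only shows that the files exist and are non-empty; it does not verify their contents.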

### Install dependencies to record/process audio

In [6]:
!apt-get install sox
!pip install ffmpeg-python sox

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  cuda-command-line-tools-10-0 cuda-command-line-tools-10-1
  cuda-command-line-tools-11-0 cuda-compiler-10-0 cuda-compiler-10-1
  cuda-compiler-11-0 cuda-cuobjdump-10-0 cuda-cuobjdump-10-1
  cuda-cuobjdump-11-0 cuda-cupti-10-0 cuda-cupti-10-1 cuda-cupti-11-0
  cuda-cupti-dev-11-0 cuda-documentation-10-0 cuda-documentation-10-1
  cuda-documentation-11-0 cuda-documentation-11-1 cuda-gdb-10-0 cuda-gdb-10-1
  cuda-gdb-11-0 cuda-gpu-library-advisor-10-0 cuda-gpu-library-advisor-10-1
  cuda-libraries-10-0 cuda-libraries-10-1 cuda-libraries-11-0
  cuda-memcheck-10-0 cuda-memcheck-10-1 cuda-memcheck-11-0 cuda-nsight-10-0
  cuda-nsight-10-1 cuda-nsight-11-0 cuda-nsight-11-1 cuda-nsight-compute-10-0
  cuda-nsight-compute-10-1 cuda-nsight-compute-11-0 cuda-nsight-compute-11-1
  cuda-nsight-systems-10-1 cuda-nsight-systems-

## Inference Step 1: Record Audio from Your Microphone and Run Inference

### Let's record!

In [7]:
from flashlight.scripts.colab.record import record_audio
record_audio("recorded_audio") # result --> "recorded_audio.wav"
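
After recording, it can help to inspect the resulting file before running inference. The sketch below uses only Python's standard `wave` module; the helper name `wav_info` is hypothetical, and it assumes the recording is an uncompressed PCM WAV file, which is the only kind `wave` can read.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_s) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / float(rate)
    return rate, channels, duration


try:
    rate, channels, duration = wav_info("recorded_audio.wav")
    print(f"{rate} Hz, {channels} channel(s), {duration:.2f} s")
except (FileNotFoundError, EOFError, wave.Error) as err:
    # Raised when the file is missing, empty, or not PCM WAV.
    print(f"Could not read recording: {err}")
```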

## Inference Step 2: Run on a Set of Audio Files Listed in a Text File

### Prepare the file with all audio paths

In [8]:
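# !ls audio/*.flac > audio.lst
import glob
from subprocess import check_output

# Write one "<id> <path> <duration in seconds>" line per audio file.
with open("audio.lst", "w") as f:
    for i, audio in enumerate(glob.glob("audio/*.flac") + ["recorded_audio.wav"]):
        # `soxi -D` prints the duration in seconds; a list argument avoids shell quoting issues.
        duration = float(check_output(["soxi", "-D", audio]))
        f.write("%d %s %s\n" % (i, audio, duration))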

In [19]:
!cat audio.lst

0 audio/116-288045-0000.flac 10.65
1 audio/116-288045-0004.flac 3.72
2 audio/116-288045-0001.flac 8.635
3 audio/116-288045-0003.flac 3.66
4 audio/116-288045-0002.flac 9.625
5 recorded_audio.wav 6.48
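
The list file written above has three whitespace-separated columns: an id, an audio path, and a duration in seconds. As a quick sanity check, the sketch below parses that layout and sums the durations. The names `Sample` and `parse_audio_list` are hypothetical, and the parser assumes that paths contain no spaces.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    sample_id: str
    path: str
    duration: float  # seconds, as reported by `soxi -D`


def parse_audio_list(text: str) -> List[Sample]:
    """Parse lines of the form "<id> <path> <duration>"."""
    samples = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        sample_id, path, duration = fields
        samples.append(Sample(sample_id, path, float(duration)))
    return samples


# A few lines copied from the listing above.
example = """\
0 audio/116-288045-0000.flac 10.65
1 audio/116-288045-0004.flac 3.72
5 recorded_audio.wav 6.48
"""
samples = parse_audio_list(example)
total = sum(s.duration for s in samples)
print(f"{len(samples)} files, {total:.2f} s of audio")
```

To check the real file, pass `open("audio.lst").read()` instead of `example`.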


## Compile the Multilingual Model `.so`

In [21]:
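%cd /content/flashlight/build
# On the first run, uncomment the next two lines to fetch the model plugin source and save it as mling.cpp:
# !wget https://raw.githubusercontent.com/flashlight/wav2letter/49087d575ddf77aa5a99a01fee980fc00e92c802/recipes/mling_pl/model_with_externally_controlled_reshaping_big_lid.cpp
# !mv model_with_externally_controlled_reshaping_big_lid.cpp mling.cpp
# Configure and build the plugin; the resulting mling.so is passed to --arch at inference time.
!cmake .. -DFL_PLUGIN_MODULE_SRC_PATH=mling.cpp
# !cmake .. -DFL_PLUGIN_MODULE_SRC_PATH=mling_large.cpp
!make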

/content/flashlight/build
-- -rdynamic supported.
-- Will build flashlight libraries.
-- MKL_THREADING = OMP
-- Checking for [mkl_intel_lp64 - mkl_gnu_thread - mkl_core - gomp - pthread - m - dl]
--   Library mkl_intel_lp64: /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so
--   Library mkl_gnu_thread: /opt/intel/mkl/lib/intel64/libmkl_gnu_thread.so
--   Library mkl_core: /opt/intel/mkl/lib/intel64/libmkl_core.so
--   Library gomp: -fopenmp
--   Library pthread: /usr/lib/x86_64-linux-gnu/libpthread.so
--   Library m: /usr/lib/x86_64-linux-gnu/libm.so
--   Library dl: /usr/lib/x86_64-linux-gnu/libdl.so
-- MKL library found
-- CBLAS found (include: /opt/intel/mkl/include, library: /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so;/opt/intel/mkl/lib/intel64/libmkl_gnu_thread.so;/opt/intel/mkl/lib/intel64/libmkl_core.so;-fopenmp;/usr/lib/x86_64-linux-gnu/libpthread.so;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libdl.so)
-- Could NOT find FFTW3 (missing: FFTW3_DIR)
-- FindFFTW 

## Multilingual Model Inference

In [17]:
# we need a dummy lexicon:
!echo 'a a |' > lexicon.txt

In [22]:
%cd /content
# --am must point to an acoustic model checkpoint available in /content,
# e.g. checkpoint_base_cvft.bin or checkpoint_large.bin
!./flashlight/build/bin/asr/fl_asr_test \
    --test=audio.lst \
    --am=checkpoint_base_cvft.bin \
    --arch=flashlight/build/mling.so \
    --tokens=tokens-all.lst \
    --lexicon=lexicon.txt \
    --datadir=''  \
    --emission_dir=''  \
    --show \
    --logtostderr=1 \
    --minloglevel=0

/content
I0221 16:16:05.997638 10328 Test.cpp:76] [Network] Reading acoustic model from checkpoint_base_cvft.bin
I0221 16:16:15.735110 10328 Test.cpp:90] [Network] Model myModel: Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> output]
	(0): View (-1 1 80 0)
	(1): LayerNorm ( axis : { 0 1 2 } , size : -1)
	(2): Conv2D (80->3072, 7x1, 3,1, SAME,0, 1, 1) (with bias)
	(3): GatedLinearUnit (2)
	(4): Dropout (0.300000)
	(5): Reorder (2,0,3,1)
(reshaping happens here)
Transformer (nHeads: 4), (pDropout: 0.3), (pLayerdrop: 0.3), (bptt: 920), (useMask: 0), (preLayerNorm: 0)
Transformer (nHeads: 4), (pDropout: 0.3), (pLayerdrop: 0.3), (bptt: 920), (useMask: 0), (preLayerNorm: 0)
Transformer (nHeads: 4), (pDropout: 0.3), (pLayerdrop: 0.3), (bptt: 920), (useMask: 0), (preLayerNorm: 0)
Transformer (nHeads: 4), (pDropout: 0.3), (pLayerdrop: 0.3), (bptt: 920), (useMask: 0), (preLayerNorm: 0)
Transformer (nHeads: 4), (pDropout: 0.3), (pLayerdrop: 0.3), (bptt: 920), (useMask: 0), (preLay
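
To try different checkpoints or plugin builds, the `fl_asr_test` invocation above can be assembled in Python. This is only a sketch: the helper name `build_test_command` is hypothetical, and the flags and paths are copied unchanged from the inference cell rather than taken from the Flashlight documentation.

```python
import shlex


def build_test_command(binary, am, arch, tokens, lexicon, test_list):
    """Assemble the fl_asr_test invocation as an argument list."""
    return [
        binary,
        f"--test={test_list}",
        f"--am={am}",
        f"--arch={arch}",
        f"--tokens={tokens}",
        f"--lexicon={lexicon}",
        "--datadir=",        # empty value, same as --datadir='' in the shell
        "--emission_dir=",   # empty value, same as --emission_dir=''
        "--show",
        "--logtostderr=1",
        "--minloglevel=0",
    ]


cmd = build_test_command(
    binary="./flashlight/build/bin/asr/fl_asr_test",
    am="checkpoint_base_cvft.bin",
    arch="flashlight/build/mling.so",
    tokens="tokens-all.lst",
    lexicon="lexicon.txt",
    test_list="audio.lst",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
# To actually run it from /content: subprocess.run(cmd, check=True)
```

Passing the arguments as a list keeps each flag intact without relying on shell quoting.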