Rhasspy ASR Kaldi

Automated speech recognition for the Rhasspy voice assistant using Kaldi.

Requirements

Python 3.7
Kaldi
- Expects $KALDI_DIR in environment
Opengrm
- Expects ngram* in $PATH
Phonetisaurus
- Expects phonetisaurus-apply in $PATH

See pre-built apps for pre-compiled binaries.
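
The requirements above can be checked from Python before running anything. This is a minimal sketch and not part of rhasspy-asr-kaldi: check_requirements is a hypothetical helper, and the OpenGrm program names (ngramcount, ngrammake) are assumptions standing in for the ngram* tools.

```python
import os
import shutil


def check_requirements(programs=("ngramcount", "ngrammake", "phonetisaurus-apply"),
                       environ=None, which=shutil.which):
    """Return a list of problems; an empty list means every check passed.

    The default program names are illustrative assumptions.
    """
    environ = os.environ if environ is None else environ
    problems = []

    # Kaldi is located through the KALDI_DIR environment variable.
    kaldi_dir = environ.get("KALDI_DIR")
    if not kaldi_dir:
        problems.append("KALDI_DIR is not set")
    elif not os.path.isdir(kaldi_dir):
        problems.append(f"KALDI_DIR is not a directory: {kaldi_dir}")

    # OpenGrm and Phonetisaurus programs are looked up in PATH.
    for program in programs:
        if which(program) is None:
            problems.append(f"{program} not found in PATH")

    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

The environ and which parameters exist only so the check can be exercised without touching the real environment.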

Installation

$ git clone https://github.com/rhasspy/rhasspy-asr-kaldi
$ cd rhasspy-asr-kaldi
$ ./configure
$ make
$ make install

Transcribing

Use python3 -m rhasspyasr_kaldi transcribe <ARGS>

usage: rhasspy-asr-kaldi transcribe [-h] --model-dir MODEL_DIR
                                    [--graph-dir GRAPH_DIR]
                                    [--model-type MODEL_TYPE]
                                    [--frames-in-chunk FRAMES_IN_CHUNK]
                                    [wav_file [wav_file ...]]

positional arguments:
  wav_file              WAV file(s) to transcribe

optional arguments:
  -h, --help            show this help message and exit
  --model-dir MODEL_DIR
                        Path to Kaldi model directory (with conf, data)
  --graph-dir GRAPH_DIR
                        Path to Kaldi graph directory (with HCLG.fst)
  --model-type MODEL_TYPE
                        Either nnet3 or gmm (default: nnet3)
  --frames-in-chunk FRAMES_IN_CHUNK
                        Number of frames to process at a time

For nnet3 models, the online2-tcp-nnet3-decode-faster program is used to handle streaming audio. For gmm models, audio is buffered and packaged as a WAV file before being transcribed.
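
To call the transcribe command from a script, the usage text above can be turned into an argument list. This is a sketch under assumptions: build_transcribe_command and transcribe are hypothetical helpers, sys.executable is used in place of python3, and the command's output is returned unparsed because its format is not described here.

```python
import subprocess
import sys


def build_transcribe_command(model_dir, wav_files, graph_dir=None,
                             model_type=None, frames_in_chunk=None):
    """Build the argument list for the transcribe command from the usage text."""
    command = [sys.executable, "-m", "rhasspyasr_kaldi", "transcribe",
               "--model-dir", str(model_dir)]
    if graph_dir is not None:
        command += ["--graph-dir", str(graph_dir)]
    if model_type is not None:
        command += ["--model-type", model_type]
    if frames_in_chunk is not None:
        command += ["--frames-in-chunk", str(frames_in_chunk)]
    # WAV files are positional arguments and come last.
    command += [str(path) for path in wav_files]
    return command


def transcribe(model_dir, wav_files, **options):
    """Run the command and return whatever it writes to standard output."""
    command = build_transcribe_command(model_dir, wav_files, **options)
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout
```

Building the command as a list avoids shell quoting problems with paths that contain spaces.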

Training

Use python3 -m rhasspyasr_kaldi train <ARGS>

usage: rhasspy-asr-kaldi train [-h] --model-dir MODEL_DIR
                               [--graph-dir GRAPH_DIR]
                               [--intent-graph INTENT_GRAPH]
                               [--dictionary DICTIONARY]
                               [--dictionary-casing {upper,lower,ignore}]
                               [--language-model LANGUAGE_MODEL]
                               --base-dictionary BASE_DICTIONARY
                               [--g2p-model G2P_MODEL]
                               [--g2p-casing {upper,lower,ignore}]

optional arguments:
  -h, --help            show this help message and exit
  --model-dir MODEL_DIR
                        Path to Kaldi model directory (with conf, data)
  --graph-dir GRAPH_DIR
                        Path to Kaldi graph directory (with HCLG.fst)
  --intent-graph INTENT_GRAPH
                        Path to intent graph JSON file (default: stdin)
  --dictionary DICTIONARY
                        Path to write custom pronunciation dictionary
  --dictionary-casing {upper,lower,ignore}
                        Case transformation for dictionary words (training,
                        default: ignore)
  --language-model LANGUAGE_MODEL
                        Path to write custom language model
  --base-dictionary BASE_DICTIONARY
                        Paths to pronunciation dictionaries
  --g2p-model G2P_MODEL
                        Path to Phonetisaurus grapheme-to-phoneme FST model
  --g2p-casing {upper,lower,ignore}
                        Case transformation for g2p words (training, default:
                        ignore)

This will generate a custom HCLG.fst from an intent graph created using rhasspy-nlu. Your Kaldi model directory should be laid out like this:

my_model/ (--model-dir)
- conf/
  - mfcc_hires.conf
- data/
  - local/
    - dict/
      - lexicon.txt (copied from --dictionary)
    - lang/
      - lm.arpa.gz (copied from --language-model)
- graph/ (--graph-dir)
  - HCLG.fst (generated)
- model/
  - final.mdl
- phones/
  - extra_questions.txt
  - nonsilence_phones.txt
  - optional_silence.txt
  - silence_phones.txt
- online/ (nnet3 only)
- extractor/ (nnet3 only)
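
The layout above can be checked before training. This sketch assumes that every entry not marked as copied or generated must already exist; missing_model_paths is a hypothetical helper, not part of rhasspy-asr-kaldi.

```python
from pathlib import Path

# Files from the layout that are neither copied nor generated by training.
REQUIRED_FILES = [
    "conf/mfcc_hires.conf",
    "model/final.mdl",
    "phones/extra_questions.txt",
    "phones/nonsilence_phones.txt",
    "phones/optional_silence.txt",
    "phones/silence_phones.txt",
]
# Directories the layout marks as "nnet3 only".
NNET3_DIRS = ["online", "extractor"]


def missing_model_paths(model_dir, model_type="nnet3"):
    """Return the relative paths from the layout that are absent under model_dir."""
    model_dir = Path(model_dir)
    missing = [rel for rel in REQUIRED_FILES if not (model_dir / rel).is_file()]
    if model_type == "nnet3":
        missing += [rel + "/" for rel in NNET3_DIRS if not (model_dir / rel).is_dir()]
    return missing
```

An empty result means the directory matches the layout as far as this check goes; it does not validate the contents of any file.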

When using the train command, you will typically specify the following arguments (those marked optional may be omitted):

--intent-graph - path to graph JSON file generated using rhasspy-nlu
--model-type - either nnet3 or gmm
--model-dir - path to top-level model directory (my_model in example above)
--graph-dir - path to directory where HCLG.fst should be written (my_model/graph in example above)
--base-dictionary - pronunciation dictionary with all words from intent graph (can be used multiple times)
--dictionary - path to write custom pronunciation dictionary (optional)
--language-model - path to write custom ARPA language model (optional)
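
The train arguments can be assembled the same way from a script. This is a sketch under assumptions: build_train_command and train are hypothetical helpers, only flags that appear in the usage text are emitted (so --model-type is left out), and the intent graph is treated as an opaque JSON string.

```python
import subprocess
import sys


def build_train_command(model_dir, graph_dir, base_dictionaries,
                        dictionary=None, language_model=None, intent_graph=None):
    """Build the argument list for the train command from the usage text."""
    command = [sys.executable, "-m", "rhasspyasr_kaldi", "train",
               "--model-dir", str(model_dir),
               "--graph-dir", str(graph_dir)]
    # --base-dictionary can be used multiple times, once per dictionary.
    for path in base_dictionaries:
        command += ["--base-dictionary", str(path)]
    if dictionary is not None:
        command += ["--dictionary", str(dictionary)]
    if language_model is not None:
        command += ["--language-model", str(language_model)]
    # Without --intent-graph the command reads the graph JSON from stdin.
    if intent_graph is not None:
        command += ["--intent-graph", str(intent_graph)]
    return command


def train(graph_json, model_dir, graph_dir, base_dictionaries, **options):
    """Run training, passing the intent graph JSON text on standard input."""
    command = build_train_command(model_dir, graph_dir, base_dictionaries, **options)
    subprocess.run(command, input=graph_json, text=True, check=True)
```

Passing the graph on stdin matches the documented default for --intent-graph and avoids writing a temporary file.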

Building From Source

rhasspy-asr-kaldi depends on the following programs that must be compiled:

Kaldi
- Speech to text engine
Opengrm
- Create ARPA language models
Phonetisaurus
- Guesses pronunciations for unknown words
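
To illustrate what "unknown words" means here, the sketch below finds vocabulary words that have no entry in a pronunciation dictionary; those are the words whose pronunciations would need to be guessed. It assumes a dictionary format of one "word phoneme phoneme ..." entry per line, and all function names are hypothetical. The casing values mirror the upper, lower, and ignore choices of the casing options.

```python
def apply_casing(word, casing="ignore"):
    """Apply a case transformation named like the casing option choices."""
    if casing == "upper":
        return word.upper()
    if casing == "lower":
        return word.lower()
    return word  # "ignore" leaves the word untouched


def read_dictionary(lines, casing="ignore"):
    """Parse lines of the assumed form 'word phoneme phoneme ...' into a dict."""
    pronunciations = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word = apply_casing(parts[0], casing)
        # A word may appear more than once with different pronunciations.
        pronunciations.setdefault(word, []).append(parts[1:])
    return pronunciations


def unknown_words(vocabulary, pronunciations):
    """Return the words that have no dictionary entry, in sorted order."""
    return sorted(word for word in set(vocabulary) if word not in pronunciations)
```

For example, with a dictionary containing only "hello" and "world", unknown_words(["hello", "rhasspy"], ...) returns ["rhasspy"].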

Kaldi

Make sure you have the necessary dependencies installed:

sudo apt-get install \
    build-essential \
    libatlas-base-dev libatlas3-base gfortran \
    automake autoconf unzip sox libtool subversion \
    python3 python \
    git zlib1g-dev

Download Kaldi and extract it:

wget -O kaldi-master.tar.gz \
    'https://github.com/kaldi-asr/kaldi/archive/master.tar.gz'
tar -xvf kaldi-master.tar.gz

First, build Kaldi's tools:

cd kaldi-master/tools
make

If you have multiple CPU cores, use make -j 4 to build in parallel. Expect the build to take a long time.

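Next, build Kaldi itself from its src directory (the path below is relative to the directory where you extracted the archive):

cd kaldi-master/src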
./configure --shared --mathlib=ATLAS
make depend
make

If you have multiple CPU cores, use make depend -j 4 and make -j 4 to build in parallel. Expect the build to take a long time.

There is no installation step. The kaldi-master directory contains all the libraries and programs that Rhasspy will need to access.
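
Because there is no installation step, programs have to be located inside the build tree. The sketch below searches a Kaldi directory for a program by name instead of assuming where the build placed it; find_kaldi_program is a hypothetical helper, and the fallback to a kaldi-master directory when KALDI_DIR is unset is an assumption.

```python
import os


def find_kaldi_program(kaldi_dir, program):
    """Search kaldi_dir recursively for an executable file named program."""
    for root, _dirs, files in os.walk(kaldi_dir):
        if program in files:
            candidate = os.path.join(root, program)
            if os.access(candidate, os.X_OK):
                return candidate
    return None  # nothing executable with that name was found


if __name__ == "__main__":
    kaldi_dir = os.environ.get("KALDI_DIR", "kaldi-master")
    print(find_kaldi_program(kaldi_dir, "online2-tcp-nnet3-decode-faster"))
```

A result of None after a build usually means the build did not finish or the directory passed in is not the Kaldi root.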

See docker-kaldi for a Docker build script.

Phonetisaurus

Make sure you have the necessary dependencies installed:

sudo apt-get install build-essential

First, download and build OpenFST 1.6.2:

wget http://www.openfst.org/twiki/pub/FST/FstDownload/openfst-1.6.2.tar.gz
tar -xvf openfst-1.6.2.tar.gz
cd openfst-1.6.2
./configure \
    "--prefix=$(pwd)/build" \
    --enable-static --enable-shared \
    --enable-far --enable-ngram-fsts
make
make install

If you have multiple CPU cores, use make -j 4 to build in parallel. Expect the build to take a long time.

Next, download and extract Phonetisaurus:

wget -O phonetisaurus-master.tar.gz \
    'https://github.com/AdolfVonKleist/Phonetisaurus/archive/master.tar.gz'
tar -xvf phonetisaurus-master.tar.gz

Finally, build Phonetisaurus (where /path/to/openfst is the openfst-1.6.2 directory from above):

cd Phonetisaurus-master
./configure \
    --with-openfst-includes=/path/to/openfst/build/include \
    --with-openfst-libs=/path/to/openfst/build/lib
make
make install

If you have multiple CPU cores, use make -j 4 to build in parallel. Expect the build to take a long time.

You should now be able to run the phonetisaurus-align program.

See docker-phonetisaurus for a Docker build script.
