Kaldi Align

A basic forced aligner using Kaldi and gruut for multiple human languages.

Installation:

git clone https://github.com/rhasspy/kaldi-align
cd kaldi-align

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install --upgrade wheel setuptool
pip3 install -r requirements.txt

kaldi-align --version

You will also need some system libraries for Kaldi and ffmpeg to convert audio:

$ sudo apt-get install libopenblas-base libgfortran5 ffmpeg

Alignment

Create a JSONL alignment file from WAV files and a CSV file with the format id|text:

kaldi-align \
  --model en-us \
  --metadata /path/to/metadata.csv \
  --audio-files <(find /path/to/wavs -name '*.wav' -type f) \
  --output-file alignments.jsonl

Text from your metadata.csv file will be automatically cleaned using gruut (punctuation and non-words removed). You can save the cleaned metadata by providing a path to --clean-metadata (optional).

If your metadata CSV file has the format id|speaker|text, pass --has-speaker to kaldi-align.

With the alignment JSONL file, you can create:

A CSV file with phoneme ids using gruut that is suitable for training a Larynx
Trimmed versions of your WAV files with silence removed from front and back

Phonemes CSV File

Create a CSV file with the format id|P P P where each P is a phoneme id.

align2csv \
  --language <LANG> \
  --alignments alignments.json \
  --phoneme-ids /path/to/phonemes.txt \
  > /path/to/phonemes.csv

where <LANG> is one of gruut's supported languages. The align2csv script runs python3 -m kaldi_align.align2csv under the hood.

The --phoneme-ids path is optional, but recommended. It will write a text file with the map between IPA text phonemes and the integer ids used in the CSV output.

Trimmed WAV Files

Trim silence from WAV files:

align2wavs \
  --metadata /path/to/metadata.csv \
  --audio-files <(find /path/to/wavs -name '*.wav' -type f) \
  --alignments alignments.json \
  --output-dir /path/to/aligned/wavs/

Trimmed versions of all WAV files with at least one word will writen to --output-dir along with the metadata from --metadata. The align2wavs script runs python3 -m kaldi_align.align2wavs under the hood.

Note that --audio-files accepts a file path with an audio file path on each line. These paths should not contain spaces.

If your metadata CSV file has the format id|speaker|text, pass --has-speaker to align2wavs.

Supported languages

Kaldi models will be automatically downloaded on first use and stored in $HOME/.local/share/kaldi_align. You may also manually download them.

Czech (cs-cz)
German (de-de)
English (en-us)
Spanish (es-es)
Persian/Farsi (fa)
French (fr-fr)
Italian (it-it)
Dutch (nl)
Russian (ru-ru)
Swedish (sv-se)

Dependencies

gruut
ffmpeg
pydub
kaldi
- Automatically downloaded on first use

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
bin		bin
etc		etc
kaldi_align		kaldi_align
scripts		scripts
.gitignore		.gitignore
.isort.cfg		.isort.cfg
.projectile		.projectile
CHANGELOG		CHANGELOG
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.md		README.md
mypy.ini		mypy.ini
pylintrc		pylintrc
requirements.txt		requirements.txt
requirements_dev.txt		requirements_dev.txt
setup.cfg		setup.cfg
setup.py		setup.py

License

rhasspy/kaldi-align

Folders and files

Latest commit

History

Repository files navigation

Kaldi Align

Installation:

Alignment

Phonemes CSV File

Trimmed WAV Files

Supported languages

Dependencies

About

Resources

License

Stars

Watchers

Forks

Languages