We'll use the [High quality TTS data for Khmer (OpenSLR 42)](https://www.openslr.org/42/) dataset to train the acoustic model.
conda create -n aligner python=3.8 --yes
conda activate aligner
conda install -c conda-forge montreal-forced-aligner --yes
# audio dataset
wget -O km_kh_male.zip https://www.openslr.org/resources/42/km_kh_male.zip
# pronouncing dictionary
wget -O lexicon.txt https://github.com/seanghay/khmer-acoustic-model-mfa/raw/main/lexicon.txt
# uncompress
unzip km_kh_male.zip
Create a transcription file for each audio file.
python preprocess.py
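Here is a minimal sketch of what preprocess.py does, assuming the corpus ships a line_index.tsv file mapping each file ID to its transcript (check the unzipped folder for the actual file name and layout); MFA picks up a .lab file placed next to each .wav:

import csv
from pathlib import Path

wav_dir = Path("km_kh_male/wavs")

# line_index.tsv: one tab-separated row per utterance (file ID, transcript)
with open("km_kh_male/line_index.tsv", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="\t"):
        file_id, text = row[0].strip(), row[-1].strip()
        # write the transcript next to its .wav so MFA can find it
        (wav_dir / f"{file_id}.lab").write_text(text, encoding="utf-8")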
mfa train --clean --speaker_characters 8 km_kh_male/wavs lexicon.txt khm_model.zip
This will take quite some time. Once it's done, there will be a khm_model.zip
file, which you can then use for forced alignment.
Each audio file name looks like this: khm_0308_0011865648.
MFA requires speaker labels for speaker-adapted training (SAT), so the --speaker_characters 8
argument tells MFA to take the first 8 characters of each file name (khm_0308_0011865648) as the speaker ID.
It looks like this in Python
speaker_characters = 8
file_name = "khm_0308_0011865648"
speaker_id = file_name[0:speaker_characters]
print(f"{speaker_id=}")
# => speaker_id='khm_0308'
The pronunciation dictionary lexicon.txt
has a limited number of words, which leads to out-of-vocabulary (OOV) errors for missing words, so to be able to generate pronunciations for unseen words we have to train a G2P (grapheme-to-phoneme) model.
mfa train_g2p --phonetisaurus lexicon.txt khm_g2p.zip
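Once trained, the G2P model can generate pronunciations for any missing words, which you can then append to lexicon.txt. The file names below are placeholders, and the positional argument order has changed between MFA releases, so confirm with mfa g2p --help:
mfa g2p oov_words.txt khm_g2p.zip oov_dict.txt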
Now run the alignment. The output files will be in Praat TextGrid format.
mfa align --clean --speaker_characters 8 km_kh_male/wavs lexicon.txt khm_model.zip outputs
For some reason, the program crashes without --beam 100
(most likely the default beam width is too narrow for this data).
mfa align --clean --g2p_model_path khm_g2p.zip sample_audio lexicon.txt khm_model.zip sample_audio --beam 100
This will create a TextGrid file for each input audio file, e.g. ./sample_audio/audio.TextGrid
Preview the result in [Praat: doing phonetics by computer](https://www.praat.org/).
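To consume the alignment programmatically instead, here is a minimal sketch using the third-party textgrid package (pip install textgrid); MFA typically emits a words tier and a phones tier, but verify the tier names in your own output:

import textgrid

tg = textgrid.TextGrid.fromFile("sample_audio/audio.TextGrid")
for tier in tg.tiers:  # typically "words" and "phones"
    print(f"== {tier.name} ==")
    for interval in tier:
        if interval.mark:  # skip empty/silence intervals
            print(f"{interval.minTime:.3f}\t{interval.maxTime:.3f}\t{interval.mark}")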