<a href="https://colab.research.google.com/github/exphon/exphon2026/blob/main/MFA_LJSpeech.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Force-aligning the LJSpeech dataset with Montreal Forced Aligner (MFA)


**Note**: The notebook takes about 20 minutes to run from start to finish.

**DATA**: https://keithito.com/LJ-Speech-Dataset/

Expected results:

![english_mfa](https://github.com/exphon/exphon2026/blob/main/fig/english_mfa.png?raw=1)

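## STEP 1: Write `install_mfa.sh` to install Miniconda

The script below installs Miniconda, creates a conda environment named `aligner`, and installs Montreal Forced Aligner (MFA) into it.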

In [None]:
%%writefile install_mfa.sh
#!/bin/bash

## a script to install Montreal Forced Aligner (MFA)

root_dir=${1:-/tmp/mfa}

# Clean up previous installation
if [ -d "$root_dir" ]; then
    echo "Removing existing MFA installation at $root_dir"
    rm -rf "$root_dir"
fi
mkdir -p "$root_dir"
cd "$root_dir"

# download miniconda3
wget -q --show-progress https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p "$root_dir/miniconda3" -f

# Initialize conda for the current shell to enable 'conda activate' etc.
eval "$($root_dir/miniconda3/bin/conda shell.bash hook)"

# Accept Conda Terms of Service (ToS)
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

# Create MFA environment with a specific Python version (e.g., 3.9) for compatibility
conda create -n aligner python=3.9 -y

# Activate the environment
conda activate aligner

# Install Montreal Forced Aligner into the activated environment
conda install -c conda-forge montreal-forced-aligner -y

echo -e "\n======== DONE =========="
echo -e "\nTo activate MFA, run: source $root_dir/miniconda3/bin/activate aligner"
echo -e "\nTo delete MFA, run: rm -rf $root_dir"
echo -e "\nSee: https://montreal-forced-aligner.readthedocs.io/en/latest/aligning.html for how to use MFA"

## STEP 2: Install MFA

In [None]:
# download and install mfa
INSTALL_DIR="/tmp/mfa" # path to install directory

!bash ./install_mfa.sh {INSTALL_DIR}
# The following command needs to be executed in a way that the conda environment is properly activated
# Using `bash -c "source ... && mfa ..."` ensures it runs in a single shell context
!bash -c "source {INSTALL_DIR}/miniconda3/bin/activate aligner && export MPLBACKEND=Agg && mfa align --help"
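
Before moving on, it can help to confirm that the install step really produced an `mfa` command. The following is a minimal sketch, not part of the original notebook. It assumes the standard conda layout (`<prefix>/envs/<env>/bin/<command>`), and `mfa_executable` and `mfa_is_installed` are hypothetical helper names introduced here for illustration.

```python
from pathlib import Path


def mfa_executable(install_dir, env_name="aligner"):
    """Return the path where the `mfa` entry point is expected to live."""
    # Assumption: standard conda layout, <prefix>/envs/<env>/bin/<command>.
    return Path(install_dir) / "miniconda3" / "envs" / env_name / "bin" / "mfa"


def mfa_is_installed(install_dir, env_name="aligner"):
    """True if the expected `mfa` entry point exists as a file."""
    return mfa_executable(install_dir, env_name).is_file()


# Report the expected location and whether it is present on this machine.
print(mfa_executable("/tmp/mfa").as_posix())
print("installed:", mfa_is_installed("/tmp/mfa"))
```

If this reports `installed: False`, scrolling back through the installer output for a failed `conda install` is probably the quickest way to find the cause.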

## STEP 3: Download the English LJSpeech dataset

In [None]:
# download and unpack ljs dataset
! echo "download and unpack ljs dataset"
! mkdir -p ./ljs
! cd ./ljs && wget -q --show-progress https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
! cd ./ljs && tar xjf LJSpeech-1.1.tar.bz2

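## STEP 4: Install SoX (resampling to 16 kHz)

This cell uses `sox`, an audio processing tool, to convert the `.wav` files. `--norm=-3` normalizes each clip's volume to -3 dBFS so that levels stay consistent across clips. `-r 16k` sets the sampling rate to 16 kHz, and `-c 1` converts the audio to a single (mono) channel. Finally, `` `pwd`/wav/{} `` writes each processed file to the `./wav` directory under the current working directory.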

In [None]:
# install sox tool
!sudo apt install -q -y sox
# convert to 16k audio clips
!mkdir -p ./wav
!echo "normalize audio clips to sample rate of 16k"
!find ./ljs -name "*.wav" -type f -execdir sox --norm=-3 {} -r 16k -c 1 `pwd`/wav/{} \;
!echo "Number of clips" $(ls ./wav/ | wc -l)
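
To make the flags in the cell above concrete, here is a small Python sketch. It builds the same `sox` argument list for a single clip and uses the standard-library `wave` module to check that a converted file is 16 kHz mono. The helper names `sox_command` and `is_16k_mono` are hypothetical and not part of the original notebook.

```python
import wave
from pathlib import Path


def sox_command(src, out_dir):
    """Build the sox call used above for one clip (as an argument list)."""
    src, out_dir = Path(src), Path(out_dir)
    return [
        "sox", "--norm=-3", str(src),   # normalize the input to -3 dBFS
        "-r", "16k",                    # output sampling rate: 16 kHz
        "-c", "1",                      # output channels: mono
        str(out_dir / src.name),        # same file name, inside out_dir
    ]


def is_16k_mono(path):
    """Check the WAV header for a 16 kHz sampling rate and one channel."""
    with wave.open(str(path), "rb") as w:
        return w.getframerate() == 16000 and w.getnchannels() == 1
```

Passing the list to `subprocess.run(..., check=True)` would run the conversion for that clip, and calling `is_16k_mono('./wav/LJ001-0001.wav')` afterwards is a quick way to confirm that resampling took effect.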

## STEP 5: Prepare the transcript (.txt) files

In [None]:
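# create one transcript file per clip from metadata.csv
from tqdm.auto import tqdm

with open('./ljs/LJSpeech-1.1/metadata.csv', 'r', encoding='utf-8') as f:
    lines = f.readlines()

for line in tqdm(lines):
    # each row is: file id | raw transcript | normalized transcript
    ident, _, transcript = line.strip().split('|')
    # write the transcript next to the clip, under the same base name
    with open(f'./wav/{ident}.txt', 'w', encoding='utf-8') as out:
        out.write(transcript)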

# this is an example transcript for LJ001-0001.wav
!cat ./wav/LJ001-0001.txt
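
Because each clip is matched to its transcript by file name, a quick check that every `.wav` has a same-named `.txt` can catch problems before alignment. This is a minimal sketch under that assumption, and `unpaired_clips` is a hypothetical helper name introduced here.

```python
from pathlib import Path


def unpaired_clips(corpus_dir):
    """Return the stems of .wav files that have no same-named .txt file."""
    corpus = Path(corpus_dir)
    wav_stems = {p.stem for p in corpus.glob("*.wav")}
    txt_stems = {p.stem for p in corpus.glob("*.txt")}
    return sorted(wav_stems - txt_stems)
```

Calling `unpaired_clips('./wav')` should return an empty list if every clip received a transcript.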

In [None]:
from IPython.display import Audio
Audio('./wav/LJ001-0001.wav')

## STEP 6: Download the English acoustic model and lexicon

In [None]:
# download a pretrained english acoustic model, and english lexicon
!wget -q --show-progress https://github.com/MontrealCorpusTools/mfa-models/raw/main/acoustic/english.zip
!wget -q --show-progress http://www.openslr.org/resources/11/librispeech-lexicon.txt

In [None]:
! head librispeech-lexicon.txt

### Clean up the lexicon

In [None]:
# see: https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/pull/480
import re

with open("librispeech-lexicon.txt") as f:
    lexicon = f.readlines()

# rewrite each entry as "WORD<TAB>PH1 PH2 ..." (word and phonemes separated by one tab)
with open("modified_librispeech-lexicon.txt", "w") as f:
    for line in lexicon:
        word, *phonemes = re.split(r"\s+", line.strip())
        phonemes = " ".join(phonemes)
        f.write(f"{word}\t{phonemes}\n")
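
To see what the rewrite does to a single entry, the same logic can be applied to one line. The sample entry below is only illustrative of the lexicon's whitespace-separated format, and `to_tab_separated` is a hypothetical helper name.

```python
import re


def to_tab_separated(line):
    """Turn 'WORD  PH1 PH2 ...' into 'WORD<TAB>PH1 PH2 ...' plus a newline."""
    word, *phonemes = re.split(r"\s+", line.strip())
    joined = " ".join(phonemes)
    return f"{word}\t{joined}\n"


# Illustrative input: a word followed by its phonemes, separated by spaces.
print(repr(to_tab_separated("HELLO   HH AH0 L OW1\n")))
# → 'HELLO\tHH AH0 L OW1\n'
```

The word is now separated from its pronunciation by a single tab, while the phonemes stay space-separated.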

## STEP 7: Align with MFA

In [None]:
# FINALLY, align phonemes and speech
! source {INSTALL_DIR}/miniconda3/bin/activate aligner && \
  export MPLBACKEND=Agg && \
  mfa align -t ./temp -j 4 ./wav modified_librispeech-lexicon.txt ./english.zip ./ljs_aligned

# output files are at ./ljs_aligned
!echo "See output files at ./ljs_aligned"
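
After alignment finishes, it is worth checking whether every clip received an output file. The sketch below assumes MFA writes one `.TextGrid` per aligned clip somewhere under the output directory (possibly inside speaker subfolders, hence the recursive search). `missing_alignments` is a hypothetical helper name.

```python
from pathlib import Path


def missing_alignments(corpus_dir, aligned_dir):
    """Return the stems of input clips that have no .TextGrid output."""
    wav_stems = {p.stem for p in Path(corpus_dir).glob("*.wav")}
    # rglob also finds TextGrids nested in subdirectories of aligned_dir.
    grid_stems = {p.stem for p in Path(aligned_dir).rglob("*.TextGrid")}
    return sorted(wav_stems - grid_stems)
```

For this notebook the call would be `missing_alignments('./wav', './ljs_aligned')`. A non-empty result lists clips that may have failed to align, for example because of out-of-vocabulary words or audio that was too noisy.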

## STEP 8: Compress the results into a zip file and download them

In [None]:
!zip -r ljs_aligned.zip ./ljs_aligned

In [None]:
from google.colab import files
files.download('ljs_aligned.zip')

# References

- https://gist.github.com/NTT123/12264d15afad861cb897f7a20a01762e (the original Colab code that this tutorial is based on)
- [Montreal Forced Aligner documentation home page](https://montreal-forced-aligner.readthedocs.io/en/latest/)
- [MFA github](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner)
