# Montreal Forced Aligner

This notebook illustrates how to use the [Montreal Forced Aligner (MFA)](https://montreal-forced-aligner.readthedocs.io/en/latest/) in the BPM. It is intended to show you how to work with the MFA tools in the BPM environment, and it covers only the basics. For more advanced usage of the MFA, consult its [documentation](https://montreal-forced-aligner.readthedocs.io/en/latest/).

## The utilities

The MFA comes with several command-line utilities. The first two are demonstrated in this notebook.
1. **`mfa_align`**: Performs forced alignment.
1. **`mfa_validate_dataset`**: Validates the aligner inputs, including audio files, transcripts, and the pronunciation dictionary. It is a good idea to run this command on a corpus directory before running `mfa_align` on it.
1. **`mfa_generate_dictionary`**: Creates a pronunciation dictionary from a grapheme-to-phoneme (G2P) model or from orthography.
1. **`mfa_train_and_align`**: Trains a new acoustic model on your corpus and aligns it. Use this to bootstrap alignment for a language that has no pretrained model.
1. **`mfa_train_g2p`**: Trains a G2P model from an existing dictionary.

All of these utilities can be called with the `--help` argument to list their available arguments and options. Symlinks to all of them are on the `$PATH`, so they can be called directly without path information.
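To confirm that the utilities listed above are reachable before you start, the following minimal sketch looks each one up with the bash builtin `command -v` and prints its resolved path, or `MISSING`. The function name `check_mfa_tools` is a hypothetical helper, not part of the MFA.

```bash
# Hypothetical helper: report where each MFA utility resolves on $PATH.
# Returns a non-zero status if at least one utility cannot be found.
check_mfa_tools() {
    local missing=0 tool
    for tool in mfa_align mfa_validate_dataset mfa_generate_dictionary \
                mfa_train_and_align mfa_train_g2p; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%-24s %s\n' "$tool" "$(command -v "$tool")"
        else
            printf '%-24s MISSING\n' "$tool"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

check_mfa_tools || echo "Some MFA utilities were not found on \$PATH."
```

Any utility that is found can then be queried with `--help`, for example `mfa_align --help`.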

## MFA location

In the BPM the MFA is installed in `/opt/montreal-forced-aligner`.

## Preparing to align

The MFA aligner requires several inputs:

1. A corpus to align. The corpus is one or more .wav files in a directory, each paired with a transcript file of the same name: for example, a file named xxx.wav should have a corresponding [xxx.txt (or xxx.lab) file containing the audio transcript](https://montreal-forced-aligner.readthedocs.io/en/latest/data_format.html#prosodylab-aligner-format). Alternatively, the transcript can be an [xxx.TextGrid file if you want to align multiple labelled regions of the audio](https://montreal-forced-aligner.readthedocs.io/en/latest/data_format.html#textgrid-format).
1. A [dictionary file](https://montreal-forced-aligner.readthedocs.io/en/latest/dictionary.html).
1. An [acoustic model](https://montreal-forced-aligner.readthedocs.io/en/latest/pretrained_models.html).

We'll cover how to prepare each of these inputs in reverse order.

## Installing a pretrained model

The MFA has utilities for training new acoustic models, but this notebook will only cover how to use existing pretrained models.

[Pretrained acoustic models](https://montreal-forced-aligner.readthedocs.io/en/latest/pretrained_models.html) can be installed as subdirectories of `/opt/montreal-forced-aligner/pretrained_models`. When installed to this location you can simply use the subdirectory name (e.g. `english`) for the `acoustic_model_path` argument required by `mfa_align`. The `english` and `mandarin` subdirectories are already included in the BPM, and you can use them as examples for installing additional models.

Execute the following cell to see which models are currently available in your copy of the BPM:

```bash
!find /opt/montreal-forced-aligner/pretrained_models/* -maxdepth 0 -type d
```

```
/opt/montreal-forced-aligner/pretrained_models/english
/opt/montreal-forced-aligner/pretrained_models/mandarin
```


The following cell shows how to install a pretrained model from the [MFA collection](https://montreal-forced-aligner.readthedocs.io/en/latest/pretrained_models.html). The example is for Spanish; models for other languages install in a similar way. (The cell may take a while to execute if the download is slow, and `wget` progress is not shown until the cell has finished executing.)

```bash
%%bash
# Execute in a terminal instead of the notebook if you want to see the result
# of each command as it executes.
cd /opt/montreal-forced-aligner/pretrained_models
wget http://mlmlab.org/mfa/mfa-models/spanish.zip
unzip spanish.zip   # Unzip to subdirectory `spanish`
rm spanish.zip      # Optional cleanup
```

```
Archive:  spanish.zip
   creating: spanish/
  inflating: spanish/final.occs
  inflating: spanish/tree
  inflating: spanish/final.mdl
  inflating: spanish/meta.yaml
```



Once the model is installed in this location, it can be referred to as `spanish` when calling `mfa_align`:

```bash
mfa_align corpus_directory dictionary_path spanish output_directory
```
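Before passing a model name to `mfa_align`, you may want to check that the unpacked model looks complete. This minimal sketch assumes a model directory contains the four files shown in the `unzip` listing for Spanish (`final.mdl`, `final.occs`, `tree`, `meta.yaml`); other models may be packaged differently. The `check_model` function and the `MODELS_DIR` variable are hypothetical.

```bash
# Hypothetical helper: check that a model subdirectory contains the files
# seen when spanish.zip was unpacked. Assumption: other models use the same layout.
MODELS_DIR="${MODELS_DIR:-/opt/montreal-forced-aligner/pretrained_models}"

check_model() {
    local dir="$MODELS_DIR/$1" f status=0
    if [ ! -d "$dir" ]; then
        echo "No model directory: $dir"
        return 1
    fi
    for f in final.mdl final.occs tree meta.yaml; do
        if [ ! -f "$dir/$f" ]; then
            echo "Missing $dir/$f"
            status=1
        fi
    done
    if [ "$status" -eq 0 ]; then
        echo "Model '$1' looks complete: $dir"
    fi
    return "$status"
}

check_model spanish || true
```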

## Installing a dictionary

Dictionary files can be installed in any location in the BPM. No dictionary is installed in the BPM by default. A good choice for English is the [LibriSpeech](http://www.openslr.org/11/) lexicon from [openslr.org](http://openslr.org). The following cell downloads this lexicon to `/opt/montreal-forced-aligner/dictionaries`.

```bash
%%bash
# Execute in a terminal instead of the notebook if you want to see the result
# of each command as it executes.
cd /opt/montreal-forced-aligner/
mkdir -p dictionaries
cd dictionaries
wget http://www.openslr.org/resources/11/librispeech-lexicon.txt
# The dictionary is now at /opt/montreal-forced-aligner/dictionaries/librispeech-lexicon.txt
```

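Every word in your transcripts needs an entry in the dictionary. As a quick spot check, this sketch looks words up in the downloaded lexicon with `awk`. It assumes one entry per line, with the word in the first whitespace-separated field followed by its phones, and upper-case words as in the LibriSpeech lexicon. The `lookup_words` function and `DICT` variable are hypothetical; `mfa_validate_dataset` remains the authoritative check.

```bash
# Hypothetical helper: print the dictionary entries for each word given.
# Assumptions: "WORD PHONE PHONE ..." per line, words stored in upper case.
DICT="${DICT:-/opt/montreal-forced-aligner/dictionaries/librispeech-lexicon.txt}"

lookup_words() {
    local word upper found
    for word in "$@"; do
        upper="$(printf '%s' "$word" | tr '[:lower:]' '[:upper:]')"
        # Print every line whose first field is exactly the word.
        found="$(awk -v w="$upper" '$1 == w' "$DICT")"
        if [ -n "$found" ]; then
            printf '%s\n' "$found"
        else
            printf '%s\tNOT IN DICTIONARY\n' "$upper"
        fi
    done
}

if [ -f "$DICT" ]; then
    lookup_words two plus
fi
```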

## Prepare your corpus

To prepare your corpus, place one or more .wav files into a directory, each with a transcript file in [.txt or .TextGrid format](https://montreal-forced-aligner.readthedocs.io/en/latest/data_format.html). Each transcript file should have the same basename as its .wav file, and the audio should have a sample rate of 16 kHz or higher.

In the next cell we create a project directory and copy a .wav file and its matching .TextGrid file into the `corpus` subdirectory.

```bash
%%bash
mkdir -p /home/ubuntu/myproject/corpus
cp ../resources/two_plus_two_16.wav /home/ubuntu/myproject/corpus
cp ../resources/two_plus_two_16.TextGrid /home/ubuntu/myproject/corpus
```
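With many files, the same-basename rule is easy to break. This sketch lists .wav files in a corpus directory that have no .txt, .lab, or .TextGrid transcript with the same basename. `check_corpus` is a hypothetical helper; it does not check sample rates or transcript contents.

```bash
# Hypothetical helper: report .wav files without a same-named transcript.
check_corpus() {
    local corpus="$1" wav base ext ok missing=0
    for wav in "$corpus"/*.wav; do
        [ -e "$wav" ] || continue   # the glob matched nothing
        base="${wav%.wav}"
        ok=0
        for ext in txt lab TextGrid; do
            if [ -f "$base.$ext" ]; then ok=1; fi
        done
        if [ "$ok" -eq 0 ]; then
            echo "No transcript for: $wav"
            missing=$((missing + 1))
        fi
    done
    echo "$missing .wav file(s) without a transcript"
    [ "$missing" -eq 0 ]
}

check_corpus /home/ubuntu/myproject/corpus || true
```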

## Validate your inputs

Before performing alignment you may want to validate your inputs. Use the `mfa_validate_dataset` utility to check your corpus directory and dictionary. Review the output for errors before proceeding.

```bash
%%bash
mfa_validate_dataset /home/ubuntu/myproject/corpus /opt/montreal-forced-aligner/dictionaries/librispeech-lexicon.txt
```

```
Setting up corpus information...
Creating dictionary information...
Setting up corpus_data directory...
Generating base features (mfcc)...
Calculating CMVN...

    1 sound files
    0 sound files .lab transcription files
    1 sound files with TextGrids transcription files
    0 additional sound files ignored (see below)
    1 speakers
    2 utterances
    0.910 seconds total duration

    DICTIONARY
    ----------
    There were no missing words from the dictionary. If you plan on using the a model trained on this dataset to align other datasets in the future, it is recommended that there be at least some missing words.

    SOUND FILE READ ERRORS
    ----------------------
    There were no sound files that could not be read.

    FEATURE CALCULATION
    -------------------
    There were no utterances missing features.

    FILES WITHOUT TRANSCRIPTIONS
    ----------------------------
    There were no sound files missing transcriptions.

    TRANSCRIPTIONS WITHO

100%|##########| 39/39 [00:07<00:00,  5.48it/s]
```
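The validation report can be long. This sketch runs the validator and also writes the complete report to a log file with `tee`, so it can be reviewed afterward. The `validate_to_log` function and the log path are hypothetical; the function returns the validator's own exit status (via bash's `PIPESTATUS` array) rather than that of `tee`.

```bash
# Hypothetical helper: run mfa_validate_dataset and keep a full copy of its report.
# stdout and stderr are both captured, so progress messages also land in the log.
validate_to_log() {
    local corpus="$1" dict="$2" log="${3:-validate.log}"
    mfa_validate_dataset "$corpus" "$dict" 2>&1 | tee "$log"
    # Return the validator's exit status rather than tee's.
    return "${PIPESTATUS[0]}"
}

if command -v mfa_validate_dataset >/dev/null 2>&1; then
    validate_to_log \
        /home/ubuntu/myproject/corpus \
        /opt/montreal-forced-aligner/dictionaries/librispeech-lexicon.txt \
        /home/ubuntu/myproject/validate.log
fi
```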


## Run the aligner

If your corpus passes the validation stage, then you should be able to run the aligner. Provide the corpus location, dictionary location, model name, and output directory to `mfa_align` to align your corpus. The output directory will be created if it doesn't already exist.

```bash
%%bash
# The following command is a single statement, and the \ characters indicate that the
# command continues on the following line. The line breaks make it easier to read.
mfa_align \
    /home/ubuntu/myproject/corpus \
    /opt/montreal-forced-aligner/dictionaries/librispeech-lexicon.txt \
    english \
    /home/ubuntu/myproject/mfa
```

```
Setting up corpus information...
Number of speakers in corpus: 1, average number of utterances per speaker: 2.0
Creating dictionary information...
Done with setup.
Done! Everything took 2.0869534015655518 seconds
```


The output of `mfa_align` should now be in `/home/ubuntu/myproject/mfa`.
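To see what the aligner produced, this sketch lists the TextGrid files under the output directory in sorted order. It assumes the results are .TextGrid files somewhere below that directory (possibly in per-speaker subdirectories); `list_alignments` and `OUT_DIR` are hypothetical names.

```bash
# Hypothetical helper: list the TextGrid files written by mfa_align.
# Assumption: results are .TextGrid files somewhere under the output directory.
OUT_DIR="${OUT_DIR:-/home/ubuntu/myproject/mfa}"

list_alignments() {
    find "$1" -type f -name '*.TextGrid' | sort
}

if [ -d "$OUT_DIR" ]; then
    list_alignments "$OUT_DIR"
fi
```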