# Prep audio for Dr.VOT and add results to textgrids

[Dr.VOT is a tool for automatic measurement of voice onset time (VOT)](https://github.com/MLSpeech/Dr.VOT).

Dr.VOT has requirements that make it challenging to fit into larger workflows. Specifically, it runs as a command line tool in its own Python environment and requires short audio files that begin with the target area to analyze, organized into its own folder structure.

This notebook illustrates one way of integrating Dr.VOT with a corpus of longer audio files. The small sample corpus used here contains word- and phone-aligned textgrids of the kind produced by a forced aligner such as the Montreal Forced Aligner (MFA). The steps for integrating Dr.VOT are:

  1. Use the input textgrids to find audio to extract for Dr.VOT to process
  1. Run Dr.VOT (outside the notebook)
  1. Combine the Dr.VOT results with the input textgrids, adding `VOT` as a new textgrid tier. Write the new combined textgrids under a folder named `votadded`. 

Modify this notebook to match your own corpus organization and analysis requirements.

In [1]:
import re
from pathlib import Path
import pandas as pd
import parselmouth
from phonlab.utils import dir2df
from audiolabel import read_label, df2tg

## Use the input textgrids to find audio to extract for Dr.VOT to process

Load MFA input textgrids from speaker directories. The .wav files are in the same folders as the textgrids.

In [2]:
mfadir = Path('../resource/drvot/')
mfatg = dir2df(mfadir, fnpat=r'\.TextGrid$', addcols=['barename'])
mfatg

| | relpath | fname | barename |
|---|---|---|---|
| 0 | spkr1 | two_plus_two.TextGrid | two_plus_two |
| 1 | spkr2 | three_plus_five.TextGrid | three_plus_five |


Now load each textgrid into a phone-word dataframe, find all the phones of interest, and extract and save the audio that starts at the beginning of each phone and ends at the right edge of the word that contains it.

**NOTE** Dr.VOT's input location doesn't seem to be configurable, so it's easiest to write the extracted audio to `data/raw` in the Dr.VOT repository directory. Specify the repository's location in the `drvotdir` variable.

In [3]:
# Set values of both of these variables for your project.
phones_of_interest = ['p', 't']  # Phones to inspect for VOT. Use your own values here
drvotdir = Path.home() / 'src/Dr.VOT'   # Directory where you cloned the Dr.VOT GitHub repo

In [4]:
# Iterate over the MFA output textgrids and extract audio portions.
for tg in mfatg.itertuples():
    # Load .wav file corresponding to the MFA textgrid.
    snd = parselmouth.Sound(str(mfadir / tg.relpath / f'{tg.barename}.wav'))

    # Load MFA output into phone and word dataframes, then merge into phone-word dataframe,
    # as described at https://github.com/rsprouse/audiolabel/blob/master/doc/working_with_phonetic_dataframes.ipynb
    [wddf, phdf] = read_label(
        mfadir / tg.relpath / tg.fname,
        tiers=['word', 'phone'],
        ftype='praat'
    )
    phwddf = pd.merge_asof(
        phdf.rename({'t1': 't1_ph', 't2': 't2_ph'}, axis='columns'),
        wddf.drop('fname', axis='columns') \
            .rename({'t1': 't1_wd', 't2': 't2_wd'}, axis='columns'),
        left_on='t1_ph',
        right_on='t1_wd'
    )

    # For every phone that is of interest, extract from start of phone to end of word and
    # save as a .wav file that includes the time offset in the filename.
    for ph in phwddf[phwddf['phone'].isin(phones_of_interest)].itertuples():
        extsnd = snd.extract_part(from_time=ph.t1_ph, to_time=ph.t2_wd)
        outwav = drvotdir / 'data/raw' / tg.relpath / f'{tg.barename}.{ph.t1_ph:0.8f}.wav'  # Include 8 digits after decimal in filename
        outwav.parent.mkdir(parents=True, exist_ok=True)  # Make sure spkr output directory exists!
        extsnd.save(file_path=str(outwav), format='WAV')
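
To see what the `merge_asof` step in the loop above produces, the following sketch runs it on two small hand-made dataframes. The times and labels are hypothetical stand-ins for what `read_label` returns, not values from the sample corpus. Each phone is matched to the most recent word whose start time is at or before the phone's start time, so every phone row picks up the `t2_wd` of its containing word.

```python
import pandas as pd

# Hypothetical word and phone tiers for the words "two plus"
wddf = pd.DataFrame({
    't1': [0.25, 0.46],
    't2': [0.46, 0.80],
    'word': ['two', 'plus'],
})
phdf = pd.DataFrame({
    't1': [0.25, 0.35, 0.46, 0.55, 0.62, 0.70],
    't2': [0.35, 0.46, 0.55, 0.62, 0.70, 0.80],
    'phone': ['t', 'uw', 'p', 'l', 'ah', 's'],
})

# Both inputs must already be sorted by their merge keys
phwddf = pd.merge_asof(
    phdf.rename({'t1': 't1_ph', 't2': 't2_ph'}, axis='columns'),
    wddf.rename({'t1': 't1_wd', 't2': 't2_wd'}, axis='columns'),
    left_on='t1_ph',
    right_on='t1_wd'
)

# Rows for the phones of interest; the clip to extract for each row
# would run from `t1_ph` to `t2_wd`
targets = phwddf[phwddf['phone'].isin(['p', 't'])]
print(targets[['phone', 'word', 't1_ph', 't2_wd']])
```

Note that `merge_asof` requires both dataframes to be sorted by their merge keys, which is normally already true of tiers read from a textgrid.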

Assuming your audio files were created in `data/raw`, you can now use Dr.VOT to find your VOT measurements.

## Run Dr.VOT

Open a terminal window outside of this notebook and follow the [instructions for running Dr.VOT](https://github.com/MLSpeech/Dr.VOT) in the GitHub repository. The previous steps should have written your extracted audio files to `data/raw` under the cloned Dr.VOT directory.
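
Before switching to the terminal, it may help to confirm that the clips landed where expected. The helper below is a hypothetical convenience function, not part of Dr.VOT or of the libraries used in this notebook. It counts the `.wav` files found under each subdirectory of a given folder.

```python
from collections import Counter
from pathlib import Path

def count_clips(rawdir):
    """Count .wav files per subdirectory (relative path) under `rawdir`."""
    rawdir = Path(rawdir)
    return Counter(
        str(wav.parent.relative_to(rawdir)) for wav in rawdir.rglob('*.wav')
    )

# Hypothetical usage with the `drvotdir` variable defined earlier:
# count_clips(drvotdir / 'data/raw')
```

The result maps each speaker directory to the number of clips extracted for it, which should equal the number of phones of interest found for that speaker.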

Once Dr.VOT has run successfully, follow the next steps to add its results to your MFA input textgrids and write the combined textgrids to a new folder.

## Combine Dr.VOT results with MFA textgrids

First, collect all the Dr.VOT output textgrid filenames into a dataframe. The regular expression passed as the `fnpat` parameter uses named capture groups to extract two values from each filename: the time offset (`toffset`) and the `barename`, which matches the `barename` column of the source MFA textgrid dataframe `mfatg`.

Since the `toffset` column is created by string matching, we need to convert the column from a string representation to a floating point number using `astype`. This allows `toffset` to be added to the `t1` and `t2` values from the `VOT` textgrids.

In [5]:
# Regular expression to match filenames of interest. The `barename` and `toffset`
# capture groups will be added as columns to the output of `dir2df`.
fnpatre = re.compile(
    r'''
        (?P<barename>.+)       # capture `barename` to match `mfatg` column
        \.                     # `.` separator
        (?P<toffset>\d+\.\d+)  # capture time offset as `toffset`
        _pred(?:NEG|POS)       # _predNEG|_predPOS string added by Dr.VOT, not captured
        \.TextGrid$            # textgrid extension
    ''',
    re.VERBOSE
)
drvotdf = dir2df(drvotdir / 'data/out_tg', fnpat=fnpatre)
drvotdf['toffset'] = drvotdf['toffset'].astype(float)   # Cast from str to float
drvotdf

| | relpath | fname | barename | toffset |
|---|---|---|---|---|
| 0 | spkr1 | two_plus_two.0.25219823_predPOS.TextGrid | two_plus_two | 0.252198 |
| 1 | spkr1 | two_plus_two.0.46425836_predNEG.TextGrid | two_plus_two | 0.464258 |
| 2 | spkr1 | two_plus_two.0.86794082_predNEG.TextGrid | two_plus_two | 0.867941 |
| 3 | spkr2 | three_plus_five.0.43762015_predPOS.TextGrid | three_plus_five | 0.43762 |
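
If the pattern does not match your filenames, it can be easier to debug it on its own with Python's `re` module before handing it to `dir2df`. The sketch below applies an equivalent pattern to one of the filenames shown above and to a filename that should not match. How `dir2df` applies the pattern internally is not shown here.

```python
import re

fnpatre = re.compile(
    r'''
        (?P<barename>.+)       # everything before the time offset
        \.                     # `.` separator
        (?P<toffset>\d+\.\d+)  # time offset written into the clip filename
        _pred(?:NEG|POS)       # suffix added by Dr.VOT, not captured
        \.TextGrid$            # textgrid extension
    ''',
    re.VERBOSE
)

m = fnpatre.search('two_plus_two.0.25219823_predPOS.TextGrid')
barename = m.group('barename')       # captured as a string
toffset = float(m.group('toffset'))  # captured as a string, so cast to float

# An MFA input textgrid has no offset or `_pred` suffix and is not matched
nomatch = fnpatre.search('two_plus_two.TextGrid')
print(barename, toffset, nomatch)
```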


Read the `VOT` tier from each textgrid into a dataframe and collect the rows with non-empty labels, which mark negative and positive VOT durations. The result is a dataframe of VOT values. (Dr.VOT produces another tier named `Window` which is not discussed in this notebook.)

Also add the `relpath`, `barename`, and `toffset` values to each row.

In [7]:
votdfs = []
for row in drvotdf.itertuples():
    [df] = read_label(
        drvotdir / 'data/out_tg' / row.relpath / row.fname,
        tiers=['VOT'],
        ftype='praat_long'
    )
    df[['relpath', 'barename', 'toffset']] = row.relpath, row.barename, row.toffset
    votdfs.append(df[df['VOT'] != ''])
votdf = pd.concat(votdfs).reset_index(drop=True)
votdf

| | t1 | t2 | VOT | fname | relpath | barename | toffset |
|---|---|---|---|---|---|---|---|
| 0 | 0.032 | 0.098 | POS_VOT | /Users/ronald/src/Dr.VOT/data/out_tg/spkr1/two... | spkr1 | two_plus_two | 0.252198 |
| 1 | 0.061 | 0.122 | NEG_VOT | /Users/ronald/src/Dr.VOT/data/out_tg/spkr1/two... | spkr1 | two_plus_two | 0.464258 |
| 2 | 0.042 | 0.131 | NEG_VOT | /Users/ronald/src/Dr.VOT/data/out_tg/spkr1/two... | spkr1 | two_plus_two | 0.867941 |
| 3 | 0.077 | 0.082 | POS_VOT | /Users/ronald/src/Dr.VOT/data/out_tg/spkr2/thr... | spkr2 | three_plus_five | 0.43762 |
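
If you want a single numeric VOT value per row rather than an interval, one possible approach is sketched below using the first two rows of the output above. It assumes that the length of each labeled interval is the VOT magnitude and that negative VOT should be reported as a negative number. The `vot_ms` column name is a hypothetical choice.

```python
import pandas as pd

# First two rows of the VOT dataframe shown above
votdf = pd.DataFrame({
    't1': [0.032, 0.061],
    't2': [0.098, 0.122],
    'VOT': ['POS_VOT', 'NEG_VOT'],
})

# Assumption: interval duration is the VOT magnitude; NEG_VOT gets a minus sign
sign = votdf['VOT'].map({'POS_VOT': 1, 'NEG_VOT': -1})
votdf['vot_ms'] = sign * (votdf['t2'] - votdf['t1']) * 1000
print(votdf)
```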


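The `t1` and `t2` values that Dr.VOT reports are relative to the start of each extracted audio clip. Add the `toffset` values to convert them to times in the original recordings. The corrected values will be used when adding VOT to the MFA textgrids.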

In [8]:
votdf['t1'] = votdf['t1'] + votdf['toffset']        # Correct the `t1` values
votdf['t2'] = votdf['t2'] + votdf['toffset']        # Correct the `t2` values
votdf

| | t1 | t2 | VOT | fname | relpath | barename | toffset |
|---|---|---|---|---|---|---|---|
| 0 | 0.284198 | 0.350198 | POS_VOT | /Users/ronald/src/Dr.VOT/data/out_tg/spkr1/two... | spkr1 | two_plus_two | 0.252198 |
| 1 | 0.525258 | 0.586258 | NEG_VOT | /Users/ronald/src/Dr.VOT/data/out_tg/spkr1/two... | spkr1 | two_plus_two | 0.464258 |
| 2 | 0.909941 | 0.998941 | NEG_VOT | /Users/ronald/src/Dr.VOT/data/out_tg/spkr1/two... | spkr1 | two_plus_two | 0.867941 |
| 3 | 0.51462 | 0.51962 | POS_VOT | /Users/ronald/src/Dr.VOT/data/out_tg/spkr2/thr... | spkr2 | three_plus_five | 0.43762 |
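
The offset makes a round trip through the clip filename, where it is written with 8 digits after the decimal point. The sketch below traces one hypothetical phone start time through that round trip, using plain string handling rather than `dir2df`, to show that the recovered offset differs from the original by less than 10 nanoseconds before it is added to a clip-relative time.

```python
from pathlib import Path

t1_ph = 0.2521982312345  # hypothetical phone start time in the full recording

# Encode the offset in the clip filename, as in the extraction loop
fname = f'two_plus_two.{t1_ph:0.8f}.wav'

# Recover the offset from the filename (simplified string parsing)
toffset = float(Path(fname).stem.split('.', 1)[1])

# Convert a clip-relative time to a time in the full recording
t1_clip = 0.032
t1_corrected = t1_clip + toffset
print(fname, toffset, t1_corrected)
```

A rounding error of this size is far smaller than a single audio sample at common sampling rates, so the 8-digit format loses nothing that matters for VOT measurement.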


The `barename` and `relpath` columns of `votdf` should contain values that match the same columns in the `mfatg` dataframe of MFA textgrids. Use these values to select the VOT measures that belong to each textgrid and add them as a new textgrid tier. Change the value of `outtg` to specify where you would like the new textgrids to be saved.

In [9]:
# Loop over all the MFA textgrids, one at a time
for tg in mfatg.itertuples():
    # Load all tiers from this MFA textgrid into a list of dataframes.
    # Since we do not specify which tiers to load via the `tiers` param,
    # the label content columns will have the default name `label`.
    dflist, lm = read_label(mfadir / tg.relpath / tg.fname, ftype='praat', return_lm=True)

    # Create subset of `votdf` that matches this MFA textgrid and sort by time
    votsubset = votdf[
        (votdf['relpath'] == tg.relpath) & (votdf['barename'] == tg.barename)
    ].sort_values('t1')  # Sort by `t1` to ensure proper order!

    # Add VOT tier and write the output textgrid
    outtg = mfadir.parent / 'votadded' / tg.relpath / f'{tg.barename}.TextGrid'  # Output textgrid name. Set to your own location
    outtg.parent.mkdir(parents=True, exist_ok=True)  # Make sure textgrid output directory exists!
    df2tg(
        dflist + [votsubset],                   # Dataframes to write as textgrid tiers
        tnames=list(lm.names) + ['VOT'],        # Tier names in output textgrid
        lbl=['label'] * len(dflist) + ['VOT'],  # Column names in dataframes that have label content
        outfile=outtg
    )
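
The subset selection inside the loop can be tried in isolation on a small dataframe. The sketch below uses corrected times from the output shown earlier, deliberately out of order, and selects the rows for one textgrid. The final overlap check is an optional addition that is not part of the loop above. It assumes, as is usual for an interval tier, that the intervals written to the `VOT` tier should not overlap.

```python
import pandas as pd

# Corrected VOT rows, deliberately unsorted
votdf = pd.DataFrame({
    't1': [0.909941, 0.284198, 0.51462],
    't2': [0.998941, 0.350198, 0.51962],
    'VOT': ['NEG_VOT', 'POS_VOT', 'POS_VOT'],
    'relpath': ['spkr1', 'spkr1', 'spkr2'],
    'barename': ['two_plus_two', 'two_plus_two', 'three_plus_five'],
})

# Rows must match on both speaker directory and file barename
votsubset = votdf[
    (votdf['relpath'] == 'spkr1') & (votdf['barename'] == 'two_plus_two')
].sort_values('t1')

# Optional sanity check: does any interval end after the next one begins?
overlaps = (votsubset['t2'] > votsubset['t1'].shift(-1)).any()
print(votsubset)
print('overlapping intervals:', overlaps)
```

Matching on `relpath` as well as `barename` matters when different speakers have recordings with the same filename.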