In [1]:
import importlib.resources
from pathlib import Path
import subprocess
import shutil
import pandas as pd
import phonlab as phon

# Integrating other tools - [opensauce-python](https://github.com/voicesauce/opensauce-python) for voice quality

It is possible to integrate other tools that are not part of `phonlab` itself, particularly if the tool provides compatible input and output interfaces:

* Input is array-like, e.g. an audio file.
* Output is table-like, e.g. a csv file with timepoints and measurement columns.

The external tool can be installed in its own `conda` environment or as a standalone command-line tool, and then executed from a notebook in the `phonlab` environment via the Python `subprocess` module.
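A minimal sketch of this pattern, assuming the tool writes its table to a file whose path you know in advance: run the command with `subprocess.run`, then load the table with `pandas.read_csv`. The helper name `run_tool_to_dataframe` is hypothetical and is not part of `phonlab`.

```python
import subprocess
from pathlib import Path

import pandas as pd


def run_tool_to_dataframe(cmd, out_path, sep=',', **run_kwargs):
    """Run an external command that writes a table to out_path, then load it.

    Hypothetical helper. cmd is a list of arguments (str or Path objects).
    check=True makes subprocess.run raise CalledProcessError on a nonzero
    exit status instead of failing silently.
    """
    subprocess.run(cmd, check=True, capture_output=True, text=True, **run_kwargs)
    # The tool's table-like output becomes a DataFrame.
    return pd.read_csv(Path(out_path), sep=sep)
```

Extra keyword arguments such as `cwd=` are passed through to `subprocess.run`, which is how the `opensauce` calls in this notebook set their working directory.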

## Make a new environment for `opensauce-python`

Use `opensauce-environment.yml` to create a separate environment from the `phonlab` environment. The `opensauce-python` library will be installed in this environment, and we will not have to worry about potential dependency conflicts with `phonlab` because each environment is self-contained.

To create the environment:

```
conda env create -f opensauce-environment.yml
```

Then run `conda env list` and note the path to the `opensauce_env` environment. On Linux and macOS the environment's Python executable is in its `bin` subdirectory; on Windows, `python.exe` is in the environment's root directory.

In [2]:
opensauceenv = Path('/Users/kjohnson/miniconda3/envs/opensauce_env')  # from `conda env list`
pythonbin = opensauceenv / 'bin' / 'python'
pythonbin

PosixPath('/Users/kjohnson/miniconda3/envs/opensauce_env/bin/python')
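Instead of copying the path by hand, you could look it up in the JSON that `conda env list --json` prints. A sketch, assuming that output has the form `{"envs": [...]}` with one path per environment; `env_python` is a hypothetical helper.

```python
import json
from pathlib import Path


def env_python(envs_json, env_name, windows=False):
    """Return the Python executable of the conda environment named env_name.

    envs_json is the text printed by `conda env list --json`, assumed to look
    like {"envs": ["/.../miniconda3", "/.../miniconda3/envs/opensauce_env"]}.
    """
    for env in json.loads(envs_json)['envs']:
        env_path = Path(env)
        if env_path.name == env_name:
            # Linux/macOS environments keep python in bin/;
            # Windows environments keep python.exe in the root directory.
            return env_path / 'python.exe' if windows else env_path / 'bin' / 'python'
    raise ValueError(f'conda environment {env_name!r} not found')


# Example use (requires conda on the PATH; may need adjusting on Windows):
# import subprocess, sys
# listing = subprocess.run(['conda', 'env', 'list', '--json'],
#                          capture_output=True, text=True, check=True).stdout
# pythonbin = env_python(listing, 'opensauce_env', windows=sys.platform == 'win32')
```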

## Install Praat

`opensauce-python` uses Praat for voice quality analysis. [Download and install the Praat application](https://www.fon.hum.uva.nl/praat/) and take note of the path to the application executable.

Common paths for different platforms are:
* OS X: '/Applications/Praat.app/Contents/MacOS/Praat'
* Windows: 'C:/Program Files/Praat/Praat.exe'
* Linux: '/usr/bin/praat'

`opensauce-python` can also use `snack` and `pyreaper` for voice analysis. Installing those tools is more complicated than installing Praat and is not illustrated in this notebook. If you are interested, see the [`opensauce-python` documentation for suggestions on how to install those tools](https://github.com/voicesauce/opensauce-python#installation).

In [3]:
praatbin = Path('/Applications/Praat.app/Contents/MacOS/Praat')
praatbin

PosixPath('/Applications/Praat.app/Contents/MacOS/Praat')
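The platform defaults listed above can also be chosen in code. A sketch using `sys.platform`; the dictionary only restates the common paths from the list, so adjust it if Praat is installed elsewhere. `default_praat_path` is a hypothetical helper.

```python
import sys
from pathlib import Path

# Default Praat locations from the list above, keyed by sys.platform prefix.
DEFAULT_PRAAT_PATHS = {
    'darwin': '/Applications/Praat.app/Contents/MacOS/Praat',
    'win32': 'C:/Program Files/Praat/Praat.exe',
    'linux': '/usr/bin/praat',
}


def default_praat_path(platform=sys.platform):
    """Return the usual Praat executable path for a sys.platform value."""
    for prefix, path in DEFAULT_PRAAT_PATHS.items():
        if platform.startswith(prefix):
            return Path(path)
    raise ValueError(f'no default Praat path known for platform {platform!r}')
```

`default_praat_path().exists()` is a quick check that Praat is where you expect it before setting `praatbin`.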

## Install `opensauce-python`

Activate the environment with `conda activate opensauce_env`. Navigate to the directory where you want to put the `opensauce-python` code and run:

```
git clone https://github.com/voicesauce/opensauce-python.git
```

Take note of the path to the `opensauce-python` directory.

In [8]:
opensaucerepo = Path.home() / 'Documents' / 'src' / 'opensauce-python'
opensaucerepo

PosixPath('/Users/kjohnson/Documents/src/opensauce-python')

## Run `opensauce`

### Test run
Do a test run of opensauce [using the `subprocess.run` function](https://docs.python.org/3/library/subprocess.html#subprocess.run) to ensure the `opensauce_env` environment exists and the `opensauce-python` repository is found. The following should print the help message that describes how to run `opensauce`.

In [9]:
sauceproc = subprocess.run(
    [pythonbin, '-m', 'opensauce', '--help'],
    cwd=opensaucerepo,
    capture_output=True,
    text=True
)
print(sauceproc.stderr)
print(sauceproc.stdout)

  default_praat_path = 'C:\Program Files\Praat\Praat.exe'
  "'C:\Program Files\Praat\Praat.exe'. On Linux, "

usage: __main__.py [-h] [-s SETTINGS] [-m DEFAULT_MEASUREMENTS_FILE]
                   [--measurements {snackF0,praatF0,shrF0,reaperF0,snackFormants,praatFormants,SHR} [{snackF0,praatF0,shrF0,reaperF0,snackFormants,praatFormants,SHR} ...]]
                   [--no-f0-column] [--include-f0-column] [--no-formant-cols]
                   [--include-formant-cols] [--no-textgrid] [--use-textgrid]
                   [--no-labels] [--include-labels] [--include-empty-labels]
                   [--ignore-label IGNORE_LABEL] [--time-starts-at-zero]
                   [--time-starts-at-frameshift] [--include-interval-endpoint]
                   [--exclude-interval-endpoint] [--NaN NAN]
                   [-o OUTPUT_FILEPATH] [--output-delimiter {comma,tab}]
                   [--no-output-settings] [--output-settings]
                   [--output-settings-path OUTPUT_SETTINGS_PATH]
    

Note the presence of `--praat-path` among the available parameters in the full help message. Its description includes default Praat paths for various platforms. If your `praatbin` matches the default for your platform, you should not need `--praat-path`; this notebook includes it anyway to illustrate its usage.

### Run an analysis

Here is an example of how to use `opensauce`. It is not a full tutorial, and you should consult the [`opensauce-python`](https://github.com/voicesauce/opensauce-python) repository for more details.

While `opensauce` can include labels provided by a textgrid, a textgrid is not required: the analysis can be run on a complete audio file. The output csv file includes a `t_ms` column that gives the time (in ms) of the measurements in each row.

File handling in `opensauce` is not robust, and the input file is expected to be a `.wav` file that does not include any other '.' characters in its name or filepath. The call to `shutil` makes a copy of the input file in a location where '.' is unlikely to be found in the filepath (unless your username includes '.'). If the name of your audio file contains a '.' (e.g. `subj1.sentence1.wav`) then you will have to rename it before running `opensauce`.

After copying, the path of the input file is stored in `wavfile` and the path of the output file in `outcsv`. The directory named by `exampledir` must exist before the file can be copied into it.

In [10]:
example_file = importlib.resources.files('phonlab') / 'data' / 'example_audio' / 'mono_16bit_integer.wav'
exampledir = Path.home() / 'opensauce-exampledir'
exampledir.mkdir(parents=True, exist_ok=True)  # create the directory if it does not already exist
wavfile = exampledir / example_file.name
shutil.copyfile(example_file, wavfile)  # copy to a path without extra '.' characters
outcsv = exampledir / 'opensauce-example.csv'
wavfile, outcsv
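Before copying, you can check a path for the extra '.' characters that `opensauce` does not handle and build a safe file name. A minimal sketch; `has_extra_dots` and `dot_free_name` are hypothetical helpers, and replacing '.' with '_' is just one possible renaming scheme.

```python
from pathlib import Path


def has_extra_dots(path):
    """True if '.' appears anywhere in the path other than the file extension."""
    path = Path(path)
    # path.stem drops only the final suffix, so any '.' left in the stem or
    # in a parent directory name is an extra dot.
    return '.' in path.stem or any('.' in part for part in path.parent.parts)


def dot_free_name(path):
    """Return the file name with extra dots in the stem replaced by '_'."""
    path = Path(path)
    return path.stem.replace('.', '_') + path.suffix
```

With `wavfile = exampledir / dot_free_name(example_file.name)`, a file named `subj1.sentence1.wav` would be copied as `subj1_sentence1.wav`, while names like `mono_16bit_integer.wav` are left unchanged.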

Run `opensauce` on `wavfile` and write the results to `outcsv`. Use Praat instead of `snack` for F0 estimation since `snack` is not installed.

Note that some options are followed by one value (`--praat-path`, `--F0`, `--num-formants`, `--output-filepath`) and some by multiple values (`--measurements`); others are simple flags with no following value (`--include-f0-column`, `--include-formant-cols`). Follow the pattern in this example for other options you might want to use, according to the option type.
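If you assemble many options, a small helper can apply this pattern for you. A sketch in which the hypothetical helper `build_opensauce_args` maps `None` to a bare flag, a list or tuple to several values, and anything else to one value; it does not check whether an option actually exists in `opensauce`.

```python
def build_opensauce_args(options):
    """Flatten {option: value} into a list of command-line arguments.

    value None     -> flag with no following value, e.g. '--include-f0-column'
    list or tuple  -> option followed by several values, e.g. '--measurements'
    anything else  -> option followed by one value, converted with str()
    """
    args = []
    for option, value in options.items():  # dicts keep insertion order
        args.append(option)
        if value is None:
            continue
        if isinstance(value, (list, tuple)):
            args.extend(str(v) for v in value)
        else:
            args.append(str(value))
    return args
```

Building the command as `[pythonbin, '-m', 'opensauce', wavfile] + build_opensauce_args({...})` keeps the input file first in the argument list, as the `opensauce-python` documentation recommends.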

In [None]:
# The opensauce-python documentation recommends putting the name of the
# input file first in the argument list, before the options, to avoid
# problems in parsing the command.

sauceproc = subprocess.run(
    [
        pythonbin,
        '-m', 'opensauce',
        wavfile,
        '--praat-path', praatbin,
        '--measurements', 'praatF0', 'praatFormants', 'SHR', 'shrF0',
        '--F0', 'praatF0',
        '--num-formants', '2',
        '--include-f0-column',
        '--include-formant-cols',
        '--output-filepath', outcsv
    ],
    cwd=opensaucerepo,
    capture_output=True,
    text=True
)
print(sauceproc.stderr)
print(sauceproc.stdout)
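`subprocess.run` does not raise an error when the command fails unless `check=True` is passed, so a failed analysis may only show up as text in `stderr`. A sketch of an explicit check; `check_tool_run` is a hypothetical helper.

```python
from pathlib import Path


def check_tool_run(proc, out_path):
    """Raise an informative error if an external tool run did not succeed.

    proc is the CompletedProcess from subprocess.run(..., capture_output=True,
    text=True); out_path is the file the tool was expected to write.
    """
    if proc.returncode != 0:
        raise RuntimeError(f'exit status {proc.returncode}; stderr:\n{proc.stderr}')
    if not Path(out_path).exists():
        raise FileNotFoundError(f'{out_path} was not written; stderr:\n{proc.stderr}')
```

Calling `check_tool_run(sauceproc, outcsv)` after the cell above would stop the notebook before `pd.read_csv` tries to load a missing or incomplete file.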

## Load the results

Load the `opensauce` results, which use '\t' (tab) as the field separator rather than ',' (comma). The display shows a selection of the first and last rows; the NaN values there are expected because there is no voicing at these times in the audio file.

In [None]:
saucedf = pd.read_csv(outcsv, sep='\t')
saucedf

Add a new column `sec` that gives the measurement times in seconds instead of ms, then show the first few rows in which `praatF0` is not NaN.

In [None]:
saucedf['sec'] = saucedf['t_ms'] * 1e-3
saucedf[saucedf['praatF0'].notna()].head()  # rows that have a Praat F0 estimate

## Combine with a textgrid

Since the `saucedf` dataframe fits the interface used by `interpolate_measures`, it is simple to construct a new dataframe from the audio file's annotated textgrid and combine it with the measurements provided by `opensauce`.

The following steps for reading and combining the textgrid tiers are the same as in the `Textgrids.ipynb` notebook.

In [None]:
example_tg = importlib.resources.files('phonlab') / 'data' / 'example_audio' / 'im_twelve.TextGrid'

phdf, wddf = phon.tg_to_df(example_tg, tiersel=['phone', 'word'])
tgdf = phon.merge_tiers(inner_df=phdf, outer_df=wddf, suffixes=['', '_wd'])
tgdf = tgdf.drop(columns=['t1_wd','t2_wd'])  # don't need to keep the times from the word tier

tgdf = phon.add_context(tgdf, 'phone', nprev=1, nnext=1)  # add columns for the preceding and following phone
tgdf
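Conceptually, adding context columns amounts to shifting the `phone` column down and up by one row. A sketch with `pandas.Series.shift`; it treats the dataframe as one continuous sequence, and the column names `phone_prev1` and `phone_next1` are hypothetical and may differ from the names `phon.add_context` produces.

```python
import pandas as pd


def add_neighbor_columns(df, col, nprev=1, nnext=1):
    """Return a copy of df with the preceding and following values of col.

    Column names such as 'phone_prev1' and 'phone_next1' are hypothetical.
    """
    df = df.copy()
    for k in range(1, nprev + 1):
        df[f'{col}_prev{k}'] = df[col].shift(k)    # value k rows earlier
    for k in range(1, nnext + 1):
        df[f'{col}_next{k}'] = df[col].shift(-k)   # value k rows later
    return df
```

Rows at the start and end of the dataframe get missing values where no neighbor exists.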

In [None]:
vowels = ['ay', 'eh', 'iy', 'aa', 'aw']
vdf = tgdf.query(f'phone in {vowels}').copy()  # make a dataframe that just has vowels
vdf = phon.explode_intervals([0.2, 0.5, 0.8], ts=['t1', 't2'], df=vdf)  # get times for observations
vdf['vowel_dur'] = vdf['t2'] - vdf['t1']  # add a vowel duration column

vdf.head()
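The observation times can be pictured as fixed proportions of each vowel interval. A sketch, assuming each time is `t1 + p * (t2 - t1)` for proportion `p`; this is an illustration, not the `phon.explode_intervals` implementation, and the `obs_id` column name is hypothetical (only `obs_t` appears in this notebook).

```python
import pandas as pd


def explode_at_proportions(df, props, t1='t1', t2='t2'):
    """Repeat each row once per proportion and add an observation time.

    obs_t = t1 + p * (t2 - t1); obs_id records the proportion p used.
    """
    pieces = []
    for p in props:
        piece = df.copy()
        piece['obs_id'] = p
        piece['obs_t'] = piece[t1] + p * (piece[t2] - piece[t1])
        pieces.append(piece)
    # A stable sort on the original index keeps each interval's rows
    # together, in the order the proportions were given.
    return pd.concat(pieces).sort_index(kind='mergesort')
```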

Combine with the formant measurements from the `Textgrids.ipynb` notebook, read here from `im_twelve.csv` in the current working directory.

In [None]:
fmtsdf = pd.read_csv('im_twelve.csv')  # read in the csv of formants measurements

vdf = phon.interpolate_measures(
    meas_df=fmtsdf[['sec', 'F1', 'F2', 'F3', 'F4']],  # meas_ts and cols to interpolate only
    meas_ts='sec',      # time index in the measurements dataframe
    interp_df=vdf,      # textgrid dataframe
    interp_ts='obs_t',  # target observation times in the textgrid
    overwrite=True
)
vdf.head()

Now combine with the measurements from `opensauce`.

In [None]:
vdf = phon.interpolate_measures(
    meas_df=saucedf[['sec', 'praatF0', 'pF1', 'pF2', 'SHR', 'shrF0']],  # meas_ts and cols to interpolate only
    meas_ts='sec',      # time index in the measurements dataframe
    interp_df=vdf,      # textgrid dataframe
    interp_ts='obs_t',  # target observation times in the textgrid
    overwrite=True
)
vdf.head()
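Each `interpolate_measures` call looks up measurement values at the `obs_t` times, which generally fall between the rows of the measurement table. A sketch of that idea with `numpy.interp`; it assumes linear interpolation, which may differ from what `phon.interpolate_measures` actually does, and `interpolate_at` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def interpolate_at(meas_df, meas_ts, cols, targets):
    """Linearly interpolate measurement columns at the given target times.

    meas_df[meas_ts] must be increasing. Targets outside the measured time
    range get NaN instead of the first/last measured value. NaN
    measurements are not handled specially here.
    """
    out = pd.DataFrame({meas_ts: np.asarray(targets, dtype=float)})
    for col in cols:
        out[col] = np.interp(
            out[meas_ts], meas_df[meas_ts], meas_df[col], left=np.nan, right=np.nan
        )
    return out
```

Setting `left` and `right` to NaN avoids silently extending the first or last measurement to observation times outside the analyzed range.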