# Parselmouth Example

This notebook demonstrates some basic functionality of `parselmouth` and shows how to convert existing Praat scripts using the library. The logic and approach in this example closely follow those of Praat; a later example discusses other approaches that `parselmouth` and Python make possible.

This notebook should be accompanied by the following files:
1. parselmouth_ex.wav
1. parselmouth_ex.TextGrid

# General Notes

If you know how to do something in Praat but not the name of the corresponding function in `parselmouth`, you can:

1. do the action manually in Praat
1. open a blank script and press `Paste History`
1. find the name of the command for the action you want
1. look in the `parselmouth` API documentation (`https://parselmouth.readthedocs.io/en/latest/api_reference.html`) for the matching command

# Opening and interacting with .wav files

In [None]:
import parselmouth as pm # implementing praat functionality
import numpy as np # for numerical functions
import audiolabel as al # for interacting with LabelManager objects (TextGrids)

# Read in wav file; note the information provided in summary printouts
wav = pm.Sound('parselmouth_ex.wav')
print(wav)


Often we'll want to make measurements on subsections of audio, and save those subsections for future reference.

Objects of the `Sound` class in `parselmouth` have a number of methods (functions defined on the class).

One such method is `extract_part()`, which takes `from_time` and `to_time` arguments (among others).
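
To make the time arguments concrete: extracting by time amounts to converting seconds into sample positions. The sketch below uses a plain Python list as a stand-in for audio samples. The helper `extract_samples` and the toy sampling rate are hypothetical, and this is only a conceptual illustration, not how `extract_part()` is implemented in `parselmouth`.

```python
def extract_samples(samples, sampling_rate, from_time, to_time):
    """Return the samples between from_time and to_time (both in seconds)."""
    # convert times to sample indices, clamped to the available range
    start = max(0, round(from_time * sampling_rate))
    end = min(len(samples), round(to_time * sampling_rate))
    return samples[start:end]

# one second of "audio" at a toy sampling rate of 10 Hz
samples = list(range(10))
part = extract_samples(samples, sampling_rate=10, from_time=0.2, to_time=0.5)
print(part)
```

A request that runs past the end of the recording simply returns whatever samples are left, which is one reasonable choice for a sketch like this.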

In [None]:
# establish start and end times
s = 10
e = 15

# extract section
new_wav = wav.extract_part(from_time=s,to_time=e)

# note the information has changed for our new_wav object
print(new_wav)


Another method of the Sound class is `save()`, which allows you to ... save the object. 

The following call has an error in it; what's the issue? Once you've identified it, correct it and save our `new_wav` object.

In [None]:
pm.save(new_wav, "my_new_sound.wav", 'WAV') 
# many different formats, bit-depths, encodings possible; see class parselmouth.SoundFileFormat in the API reference

Let's see what it looks like if we want to get the mean f0 of a given stretch of audio.

In [None]:
sub = new_wav
pitch = sub.to_pitch()
mean_f0 = np.mean(pitch.to_matrix().values) # no .get_mean() method for parselmouth.Pitch class
mean_f0

That seems a touch low... Let's plot the Pitch object's values.

In [None]:
import matplotlib.pyplot as plt
plt.plot(pitch.to_matrix().values[0])

Aha! There are a lot of zero values, likely from silence or unvoiced frames, and many values that are clearly too high. We can set a floor and ceiling and try again.
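
A small synthetic example shows how much zeros and outliers can distort a mean. The frame values below are made up for illustration and do not come from `parselmouth_ex.wav`.

```python
from statistics import mean

# hypothetical per-frame f0 values in Hz: 0.0 marks frames with no pitch,
# and 480.0 stands in for an implausibly high tracking error
frames = [0.0, 0.0, 0.0, 0.0, 110.0, 120.0, 130.0, 480.0]

floor, ceiling = 50, 200

# mean over every frame, including zeros and the outlier
raw_mean = mean(frames)

# mean over frames strictly between floor and ceiling
in_range = [f for f in frames if floor < f < ceiling]
filtered_mean = mean(in_range)

print(raw_mean)       # 105.0
print(filtered_mean)  # 120.0
```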

In [None]:
(floor, ceiling) = (50,200)
mean_f0 = np.mean([f for f in pitch.to_matrix().values[0] if floor < f < ceiling])
mean_f0

That's more like it!  

The specifics about classes, methods, and their arguments can be found in the API reference for `parselmouth` linked above.  And in fact, `to_pitch()` accepts `pitch_floor` and `pitch_ceiling` optional arguments, but the behavior does not replicate the above cell.  Check it out and see why!

# Common Workflow Example

## Common Framework
For many types of measurements in Praat, we have the following common workflow:

1. Select a Sound and Force-Aligned TextGrid
1. Loop over all phone intervals in the TextGrid
1. Select those phones and contexts that match what you care about
1. Make some measurements
1. Write those measurements to a .csv file
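
The steps above can be sketched in plain Python before bringing in any audio libraries. Everything here is a stand-in: the `phones` list, the `measure` placeholder, and `run_workflow` are hypothetical names, and an in-memory buffer takes the place of a real .csv file.

```python
import csv
import io

# hypothetical phone intervals: (label, start, end), times in seconds
phones = [("dh", 0.00, 0.05), ("iy", 0.05, 0.20),
          ("s", 0.20, 0.31), ("iy", 0.31, 0.43)]

def measure(start, end):
    """Placeholder measurement: just the duration of the interval."""
    return round(end - start, 3)

def run_workflow(intervals, target, out):
    """Write one row per interval whose label equals target."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["label", "start", "end", "measurement"])
    for label, start, end in intervals:   # loop over all phone intervals
        if label != target:               # keep only the phones we care about
            continue
        # make a measurement and write it out as a row
        writer.writerow([label, start, end, measure(start, end)])

buffer = io.StringIO()  # stands in for an open .csv file
run_workflow(phones, "iy", buffer)
print(buffer.getvalue())
```

Using `csv.writer` rather than joining strings by hand means labels that happen to contain commas are quoted automatically.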

## Application

In [None]:
(floor, ceiling) = (50,200)

def measure_f0(wav,tg,out_file):
    # read in wav and textgrid
    wav = pm.Sound(wav)
    tg = al.LabelManager(from_file=tg,from_type='praat')
    
    # create header
    out_lines = []
    out_lines.append('start,end,duration,f0')
    # find all instances of 'iy' in phone tier
    matches = tg.tier('phone').search('iy')
    
    for match in matches:
        # get time of match
        s = match.t1
        e = match.t2
        dur = e - s
        
        # extract section of wav file and measure f0
        sub = wav.extract_part(from_time = s, to_time = e)
        pitch = sub.to_pitch()
        mean_f0 = np.mean([f for f in pitch.to_matrix().values[0] if floor < f < ceiling]) # no .get_mean() method for parselmouth.Pitch class

        # add to lines to write
        out_lines.append(','.join(map(str,[s,e,dur,mean_f0]))) #map(str, list) b/c join expects strings, not floats.
        
    # write output lines to file, one per line
    with open(out_file, 'w') as output:
        output.write('\n'.join(out_lines) + '\n')
    
    
measure_f0('parselmouth_ex.wav','parselmouth_ex.TextGrid','f0_meas.csv')
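
One edge case worth keeping in mind: if no frame in an interval falls between `floor` and `ceiling` (for instance in a fully unvoiced interval), the list passed to `np.mean` is empty, and NumPy returns `nan` with a warning. A minimal sketch of a helper that makes this case explicit is below; `mean_in_range` is a hypothetical name, not part of `parselmouth` or NumPy.

```python
import math
from statistics import mean

def mean_in_range(values, floor, ceiling):
    """Mean of the values strictly between floor and ceiling.

    Returns NaN when no value qualifies, so the caller can decide
    whether to skip the row or write a missing value.
    """
    kept = [v for v in values if floor < v < ceiling]
    return mean(kept) if kept else math.nan

print(mean_in_range([0.0, 110.0, 130.0], 50, 200))
print(mean_in_range([0.0, 0.0], 50, 200))
```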

# Context

Typically, we'll also want to consider other factors, such as the preceding and following phones and the lexical context. As in Praat, you might calculate these during your for-loop over the `LabelManager` object.

Let's look at an example that does the same as above, but also captures the preceding phone, the following phone, and the word frame.
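
The lookup logic itself is simple enough to sketch with plain tuples: neighbors come from adjacent positions in the phone tier, and the word is whichever word interval contains the phone's midpoint. The tiers and the helpers `label_at` and `context` below are hypothetical stand-ins written for this illustration, not the `audiolabel` API.

```python
# hypothetical tiers: (label, start, end), times in seconds
phones = [("s", 0.00, 0.10), ("iy", 0.10, 0.25), ("t", 0.25, 0.30)]
words = [("seat", 0.00, 0.30)]

def label_at(tier, time):
    """Return the label of the interval containing time, or None."""
    for label, start, end in tier:
        if start <= time < end:
            return label
    return None

def context(phones, words, index):
    """Return (previous phone, following phone, word) for phones[index]."""
    _, start, end = phones[index]
    # neighbors are None at the edges of the tier
    prev = phones[index - 1][0] if index > 0 else None
    foll = phones[index + 1][0] if index + 1 < len(phones) else None
    # look the word up at the phone's midpoint to avoid boundary ambiguity
    mid = start + (end - start) / 2
    return prev, foll, label_at(words, mid)

print(context(phones, words, 1))  # ('s', 't', 'seat')
```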

In [None]:
def measure_f0(wav,tg,out_file):
    # read in wav and textgrid
    wav = pm.Sound(wav)
    tg = al.LabelManager(from_file=tg,from_type='praat')
    
    # create header
    out_lines = []
    out_lines.append('start,end,duration,f0,prev,foll,word')
    # find all instances of 'iy' in phone tier
    matches = tg.tier('phone').search('iy')
    
    for match in matches:
        # get time of match
        s = match.t1
        e = match.t2
        dur = e - s
        mid = s + (dur/2)
        
        # get previous and following
        prev = tg.tier('phone').prev(match).text
        foll = tg.tier('phone').next(match).text
        
        # get word
        word = tg.tier('word').label_at(mid).text
        
        # extract section of wav file and measure f0
        sub = wav.extract_part(from_time = s, to_time = e)
        pitch = sub.to_pitch()
        mean_f0 = np.mean([f for f in pitch.to_matrix().values[0] if floor < f < ceiling]) # no .get_mean() method for parselmouth.Pitch class

        # add to lines to write
        out_lines.append(','.join(map(str,[s,e,dur,mean_f0,prev,foll,word]))) #map(str, list) b/c join expects strings, not floats.
        
    # write output lines to file, one per line
    with open(out_file, 'w') as output:
        output.write('\n'.join(out_lines) + '\n')
    
    
measure_f0('parselmouth_ex.wav','parselmouth_ex.TextGrid','f0_meas_context.csv')