# praatIO Workshop 

### What is praatIO?
"A python library for working with praat, textgrids, time aligned audio transcripts, and audio files. It is primarily used for extracting features from and making manipulations on audio files given hierarchical time-aligned transcriptions (utterance > word > syllable > phone, etc)."
It was developed by Tim Mahrt, a former student in the Department of Linguistics at UIUC. 
In this workshop I work with dummy data that contains three tiers: phonetic transcription, orthography, and syllable. 
With the script below I shift the time stamps of the phonetic transcription tier, create a new interval tier containing /p, t, k/, and create a new point tier with the same sounds. 
I hope that what is shown here can serve as a starting point for your own data analysis. 

### Importing libraries and getting file names 
The code below imports the libraries needed for the segmentation proposed here and collects the paths of all TextGrid files in the current working directory. 

In [25]:
from praatio import tgio #Tim Mahrt's library to parse textgrids
import os #to get the current directory
import glob #to find all files that match a pattern
from collections import namedtuple #named tuples are the data structure that
#praatIO uses to represent the entries of a tier.
cwd = os.getcwd() #getting current directory
files = glob.glob(os.path.join(cwd, '*.TextGrid')) #getting paths of all textgrid files
print(cwd)
print(files)

/Users/marcofonseca/Library/Mobile Documents/com~apple~CloudDocs/praatIO_wokshop
['/Users/marcofonseca/Library/Mobile Documents/com~apple~CloudDocs/praatIO_wokshop/test2.TextGrid', '/Users/marcofonseca/Library/Mobile Documents/com~apple~CloudDocs/praatIO_wokshop/test3.TextGrid', '/Users/marcofonseca/Library/Mobile Documents/com~apple~CloudDocs/praatIO_wokshop/test1.TextGrid']
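
Note that `glob.glob` returns paths in no guaranteed order, as the output above shows. The following is a minimal sketch of one way to get a stable order, assuming you want the files processed alphabetically. The helper name `find_textgrids` and the file names in the throwaway directory are made up for illustration.

```python
import glob
import os
import tempfile

def find_textgrids(directory):
    """Return the TextGrid paths in `directory`, sorted for a stable order."""
    pattern = os.path.join(directory, "*.TextGrid")
    return sorted(glob.glob(pattern))

# Demonstration in a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["test2.TextGrid", "test1.TextGrid", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in find_textgrids(tmp)]

print(found)  # only the two TextGrid files, in alphabetical order
```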


### Creating empty lists 
The code below defines the variables needed for the data segmentation proposed here.
praatIO represents each entry of a tier as a named tuple. A tuple is an ordered, immutable sequence of values, and a named tuple additionally lets you access each value by a field name. Because named tuples are immutable, their fields cannot be reassigned, so updating a value means creating a new tuple. 
The empty lists created here will be populated in later cells and then used to build the new tiers. 

In [63]:
Interval = namedtuple("Interval", ["start", "end", "label"])
Point = namedtuple("Point", ["time", "label"]) #defining named tuple blueprints, analogous to classes
voicelessPlosiveList = ["t", "k", "p"] #list of voiceless plosives 
plosiveIntervals = [] #creating empty lists to be populated later. 
plosiveStartPoints = [] 
updatedPhonetic = []
duration = []
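
To make the point about immutability concrete, here is a small sketch that uses only the standard library. The values in `entry` are made up. It shows that fields can be read by name or position, that assigning to a field fails, and that `_replace` builds a new tuple with the changed fields.

```python
from collections import namedtuple

Interval = namedtuple("Interval", ["start", "end", "label"])

# A made-up entry for illustration.
entry = Interval(0.5, 0.8, "t")

# Fields can be read by name or by position.
print(entry.start, entry[1], entry.label)

# Assigning to a field raises AttributeError because the tuple is immutable.
try:
    entry.start = 0.6
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# _replace returns a new tuple; the original entry is left unchanged.
shifted = entry._replace(start=0.6, end=0.9)
print(shifted)
```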

### Looping through each text grid 

In [64]:
for file in files: #looping through all files 
    tg = tgio.openTextgrid(file) #opening text grid
    #print(tg)
    #print(tg.tierDict['phonetic_transcription'].entryList)
    #print(tg.tierDict['ortography'].entryList)
    print(tg.tierDict['syllable'].entryList) 

[Interval(start=0.37650906388920924, end=0.7998951688059928, label='is'), Interval(start=0.7998951688059928, end=1.0468703966741164, label='to'), Interval(start=1.0468703966741164, end=1.399692150771436, label='e'), Interval(start=1.399692150771436, end=1.6334365628609102, label='a'), Interval(start=1.6334365628609102, end=2.1450281063020236, label='te'), Interval(start=2.1450281063020236, end=2.6742607374480025, label='nas'), Interval(start=2.6742607374480025, end=3.185852280889116, label='tes'), Interval(start=3.185852280889116, end=3.446058324535889, label='te')]
[Interval(start=0.37650906388920924, end=0.7998951688059928, label='is'), Interval(start=0.7998951688059928, end=1.0468703966741164, label='to'), Interval(start=1.0468703966741164, end=1.399692150771436, label='e'), Interval(start=1.399692150771436, end=1.6334365628609102, label='a'), Interval(start=1.6334365628609102, end=2.1450281063020236, label='ke'), Interval(start=2.1450281063020236, end=2.6742607374480025, label='nas

### Accessing the start time, end time and label of the text grids

In [65]:
for file in files: #looping through all files 
    tg = tgio.openTextgrid(file) #opening text grid
    phoneticTranscriptionTier = tg.tierDict['phonetic_transcription'] #getting phonetic
    #transcription tier. 
    orthographyTier = tg.tierDict['ortography'] #getting orthography tier.
    syllableTier = tg.tierDict['syllable'] #getting syllable tier.
    #print(orthographyTier.entryList)
    #print(phoneticTranscriptionTier.entryList[1].start) #[0].label)
    #print(orthographyTier.entryList)
    #print(syllableTier.entryList)
    #print(phoneticTranscriptionTier.entryList)


### Adding 0.1 seconds to the Phonetic tier, calculating duration and saving the output

In [67]:
for file in files: #looping through all files 
    tg = tgio.openTextgrid(file) #opening text grid
    phoneticTranscriptionTier = tg.tierDict['phonetic_transcription'] #getting phonetic
    #transcription tier. 
    for entry in phoneticTranscriptionTier.entryList:
        duration.append(entry.end - entry.start) #calculating duration
        newInterval = Interval(entry.start + 0.1, entry.end + 0.1, 
                               entry.label) #creating a new
        #interval and adding 0.1 seconds to the beginning and the end. 
        updatedPhonetic.append(newInterval) #appending it to the list.
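
The same shift-and-measure step can be tried without any TextGrid files. The sketch below is a stand-alone illustration, not part of praatIO: the function name `shift_entries` and the two entries are made up, and plain named tuples stand in for a tier's `entryList`.

```python
from collections import namedtuple

Interval = namedtuple("Interval", ["start", "end", "label"])

def shift_entries(entries, offset):
    """Return new intervals moved by `offset` seconds, plus the durations."""
    shifted = [Interval(e.start + offset, e.end + offset, e.label) for e in entries]
    durations = [e.end - e.start for e in entries]
    return shifted, durations

# Made-up entries standing in for one tier's entryList.
entries = [Interval(0.0, 0.25, "i"), Interval(0.25, 0.75, "s")]
shifted, durations = shift_entries(entries, 0.1)

for old, new, dur in zip(entries, shifted, durations):
    print(old.label, old.start, "->", new.start, "duration:", dur)
```

Shifting both boundaries by the same offset leaves each duration unchanged, which is why the durations can be computed from the original entries.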



In [60]:
with open('duration.txt', 'w') as f: #saving results
    f.write(str(duration))
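
`str(duration)` writes the list in Python's printed form, which is fine for a quick look. If the values need to be read back later, one option is JSON. This is a minimal sketch with made-up durations; the file name `duration.json` and the temporary directory are only for illustration.

```python
import json
import os
import tempfile

# Made-up durations standing in for the list computed above.
durations = [0.25, 0.5, 0.125]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "duration.json")
    # Write the list as JSON so it can be parsed again later.
    with open(path, "w") as f:
        json.dump(durations, f)
    # Read it back to check that the values survive the round trip.
    with open(path) as f:
        loaded = json.load(f)

print(loaded == durations)
```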

### Looking for plosives 
The code below finds matches for the plosive list that I created earlier. 
It also populates the lists with plosive intervals and points built as named tuples, in a process similar to the one used for adding 0.1 seconds to the phonetic tier. 

In [36]:
for file in files: #looping through all files 
    tg = tgio.openTextgrid(file) #opening text grid
    phoneticTranscriptionTier = tg.tierDict['phonetic_transcription'] #getting phonetic tier
    for name in voicelessPlosiveList: #looping through the plosive list
        findMatches = phoneticTranscriptionTier.find(name) #indices of the entries whose label matches
        for i in findMatches: #for each match 
            entry = phoneticTranscriptionTier.entryList[i]
            newInterval = Interval(entry.start, entry.end, entry.label) #creating a new interval
            plosiveIntervals.append(newInterval) #appending it to the interval list
            newPoint = Point(entry.start, entry.label) #doing the same thing for 
            #the point tier  
            plosiveStartPoints.append(newPoint)
print(plosiveIntervals)
print(plosiveStartPoints)

[Interval(start=0.7998951688059928, end=0.95425468622357, label='t'), Interval(start=1.6334365628609102, end=1.9421555976960647, label='t'), Interval(start=2.6742607374480025, end=2.855081886422879, label='t'), Interval(start=3.1814420089628994, end=3.3181604386756107, label='t'), Interval(start=0.7998951688059928, end=0.95425468622357, label='t'), Interval(start=2.6742607374480025, end=2.855081886422879, label='t'), Interval(start=3.1814420089628994, end=3.3181604386756107, label='t'), Interval(start=1.6334365628609102, end=1.9421555976960647, label='k'), Interval(start=0.7998951688059928, end=0.95425468622357, label='t'), Interval(start=2.6742607374480025, end=2.855081886422879, label='t'), Interval(start=3.1814420089628994, end=3.3181604386756107, label='t'), Interval(start=1.6334365628609102, end=1.9421555976960647, label='p'), Interval(start=0.7998951688059928, end=0.95425468622357, label='t'), Interval(start=1.6334365628609102, end=1.9421555976960647, label='t'), Interval(start=2
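
The matching logic can also be sketched on plain named tuples, without opening a TextGrid. This is an illustration under assumptions: the function name `plosive_entries` and the entries are made up, and a set membership test replaces the tier's `find` method. Unlike the loop above, which groups matches by plosive, this version keeps the matches in time order.

```python
from collections import namedtuple

Interval = namedtuple("Interval", ["start", "end", "label"])
Point = namedtuple("Point", ["time", "label"])

voiceless_plosives = {"p", "t", "k"}

def plosive_entries(entries):
    """Return the plosive intervals and one point at the start of each."""
    intervals = [e for e in entries if e.label in voiceless_plosives]
    points = [Point(e.start, e.label) for e in intervals]
    return intervals, points

# Made-up entries standing in for a phonetic tier's entryList.
entries = [
    Interval(0.0, 0.2, "i"),
    Interval(0.2, 0.3, "t"),
    Interval(0.3, 0.5, "o"),
    Interval(0.5, 0.6, "k"),
]
intervals, points = plosive_entries(entries)
print(intervals)
print(points)
```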

### Saving the new tiers
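Now we will finally save the changes that we made here. This is done by building new tiers from the populated lists, adding them to each TextGrid, and saving the result. Because `tg.save(file)` writes to the original path, the original TextGrid files are overwritten.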

In [44]:
for file in files: #looping through all files
    tg = tgio.openTextgrid(file) #opening text grid
    plosivesTier = tgio.IntervalTier(name = "plosives_tier", entryList = plosiveIntervals)
    plosivesPointTier = tgio.PointTier(name = "plosive_point", entryList = plosiveStartPoints)
    newPhoneticTier = tgio.IntervalTier(name = "phonetic_transcription", entryList = updatedPhonetic)
    tg.replaceTier('phonetic_transcription', newPhoneticTier)
    tg.addTier(plosivesTier)
    tg.addTier(plosivesPointTier)
    tg.save(file)
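
The lists `plosiveIntervals`, `plosiveStartPoints` and `updatedPhonetic` are filled across all files in the earlier cells, so the loop above gives every TextGrid the same combined entries. If each file should receive only its own entries, one option is to keep one list per file in a dictionary. The sketch below shows the idea on made-up data. The names `entries_by_file` and `plosives_by_file` are hypothetical, and plain named tuples stand in for the tiers.

```python
from collections import namedtuple

Interval = namedtuple("Interval", ["start", "end", "label"])
voiceless_plosives = {"p", "t", "k"}

# Made-up entries for two files, standing in for each phonetic tier's entryList.
entries_by_file = {
    "test1.TextGrid": [Interval(0.0, 0.2, "i"), Interval(0.2, 0.3, "t")],
    "test2.TextGrid": [Interval(0.0, 0.1, "k"), Interval(0.1, 0.4, "a")],
}

# One list of plosive intervals per file instead of one shared list.
plosives_by_file = {
    file_name: [e for e in entries if e.label in voiceless_plosives]
    for file_name, entries in entries_by_file.items()
}

for file_name, plosives in plosives_by_file.items():
    print(file_name, [e.label for e in plosives])
```

Inside the saving loop, the entries for the current file would then be looked up with the file's path as the key before building the tier.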