
Extract vowels using TextGrids

Santiago Barreda edited this page Nov 10, 2020 · 23 revisions

[To access the tools, select 'Fast Track > Tools' in Praat.]

This function automatically extracts vowels from larger sound files based on information in TextGrids. It can process a single file or an entire folder of files at once. To use it, go to "Fast Track > Tools" in Praat and select the option "Extract vowels with TextGrids". The options menu will open.

Before running the function (this is important)

This function relies on data in the /dat/ folder to decide which segments to extract. A file called 'arpabet.csv' contains all of the vowels in the commonly used ARPAbet symbol set (AA AE AH AO AW AX AY EH ER EY IH IX IY OW OY UH UW UX). By default, all segments whose labels appear in this CSV file will be extracted. To extract a different set of vowels, place a file called 'vowelstoextract.csv' in the /dat/ folder, using the same formatting as 'arpabet.csv'.
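The label matching described above can be illustrated with a short Python sketch. This is not Fast Track's own code (Fast Track is a set of Praat scripts), and the helper `vowel_symbol` is hypothetical. It assumes that segment labels may end in an ARPAbet stress digit (0, 1, or 2), as in the output of many forced aligners, and that a label counts as a vowel when its bare symbol is in the vowel set.

```python
import re

# ARPAbet vowel symbols, as listed above for arpabet.csv.
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AX", "AY", "EH", "ER",
    "EY", "IH", "IX", "IY", "OW", "OY", "UH", "UW", "UX",
}

# Assumption: a label is an uppercase symbol optionally followed by one
# stress digit, e.g. "AE1" or "IY0".
_LABEL = re.compile(r"^([A-Z]+)([012])?$")


def vowel_symbol(label, vowels=ARPABET_VOWELS):
    """Return the bare vowel symbol for a segment label, or None if it is not a target vowel.

    Passing a different `vowels` set plays the role of a custom
    vowelstoextract.csv in this sketch.
    """
    match = _LABEL.match(label.strip())
    if match is None:
        return None
    symbol = match.group(1)
    return symbol if symbol in vowels else None
```

For example, `vowel_symbol("AE1")` gives `"AE"`, while a consonant label such as `"T"` gives `None`.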

Options Menu


Folder Options

  • Sound folder: The path to a folder containing wav files.

  • TextGrid folder: The path to a folder containing TextGrid files. Only wav files with corresponding TextGrid files will be processed.

  • Folder: The output folder for all vowel files and CSV files.

All three folders can be the same, or users can use three separate folders for the sake of organization.
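The rule that only wav files with a corresponding TextGrid are processed can be sketched in Python as follows. The function `paired_files` is hypothetical, and it assumes that a sound file and its TextGrid share the same base name (e.g. "speaker1.wav" and "speaker1.TextGrid") and that extensions match exactly.

```python
from pathlib import Path


def paired_files(sound_folder, textgrid_folder):
    """Return (wav, TextGrid) path pairs that share a file stem.

    A wav file with no TextGrid of the same name is skipped. The two
    folders may be the same folder.
    """
    # Index TextGrids by base name for quick lookup.
    grids = {p.stem: p for p in Path(textgrid_folder).glob("*.TextGrid")}
    pairs = []
    for wav in sorted(Path(sound_folder).glob("*.wav")):
        grid = grids.get(wav.stem)
        if grid is not None:
            pairs.append((wav, grid))
    return pairs
```

Note that `glob` is case-sensitive on most Linux file systems, so a file named "speaker1.WAV" would not be picked up by this sketch.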

Tier Options

  • Segment tier: Which tier has segmental information? This is mandatory.

  • Word tier: Which tier has word information? This is optional and ignored if equal to 0.

  • Comment tier: Do you want to collect comments from a specific tier? This is optional and ignored if equal to 0.
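One way to picture how the word and comment tiers relate to each vowel is sketched below in Python. This is an assumption for illustration, not a description of Fast Track's internals: it looks up the interval that contains the vowel's midpoint, represents each tier as a list of (start, end, label) tuples, and uses 1-based tier numbers with 0 meaning "not used", as in the options above. The names `label_at` and `context_for_vowel` are hypothetical.

```python
def label_at(tier, time):
    """Return the label of the interval in `tier` that contains `time`, or "" if none does.

    `tier` is a list of (start, end, label) tuples, a simplified stand-in
    for a Praat interval tier.
    """
    for start, end, label in tier:
        if start <= time < end:
            return label
    return ""


def context_for_vowel(vowel, tiers, word_tier=0, comment_tier=0):
    """Look up the word and comment labels at a vowel's midpoint.

    Tier numbers are 1-based; a value of 0 skips that tier and returns "".
    """
    start, end, _ = vowel
    midpoint = (start + end) / 2
    word = label_at(tiers[word_tier - 1], midpoint) if word_tier else ""
    comment = label_at(tiers[comment_tier - 1], midpoint) if comment_tier else ""
    return word, comment
```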

Collection Options

  • Select stress: Collection of vowels can be limited to those with primary or secondary stress.

  • Words to skip: Vowels from any words entered here will be skipped (must be an exact match). This lets 'frame' words from a carrier phrase be skipped (e.g., 'the', 'please say').

  • Buffer: Vowels can be 'padded' with extra sound on either side so that they can be analyzed right up to the segment boundaries. Please see "Preparing sounds" for more information. If you set this to 0, you will lose the first and last 25 ms of each segment.
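The collection options can be sketched as two small Python helpers. Both are hypothetical (`keep_vowel` and `extraction_window` are not Fast Track functions). The sketch assumes that stress is read from a trailing ARPAbet digit on the segment label (1 = primary, 2 = secondary), that the buffer is given in seconds, and that the padded window is clipped to the start and end of the sound file.

```python
def keep_vowel(label, word, allowed_stress=None, words_to_skip=()):
    """Decide whether a vowel should be extracted.

    label: segment label such as "AE1"; a trailing digit is read as stress.
    allowed_stress: set of stress digits to keep (e.g. {"1"} or {"1", "2"}),
        or None to keep vowels regardless of stress.
    words_to_skip: words whose vowels are skipped; matching is exact.
    """
    if word in words_to_skip:
        return False
    if allowed_stress is not None:
        stress = label[-1] if label and label[-1].isdigit() else None
        if stress not in allowed_stress:
            return False
    return True


def extraction_window(start, end, buffer, file_duration):
    """Return (from, to) times of a vowel padded by `buffer` seconds on each side.

    The window is clipped so it never extends past the edges of the file.
    """
    return max(0.0, start - buffer), min(file_duration, end + buffer)
```

With a 25 ms buffer, a vowel from 1.0 s to 1.2 s would be extracted from roughly 0.975 s to 1.225 s.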

Outputs

The outputs are:

  • Sound files: wav files named filename_N, where N is a four-digit number (e.g., 0001, 0002) associated with each vowel. Vowels are numbered sequentially from the start of the file to the end, and skipped vowels are not numbered. Numbers begin at 0000 for each file.

  • Segmentation information: Named filename_segmentation_info.csv. These files contain information about the context of the extracted sounds, vowel durations, stress, comments, and more. A file called 'segmentation_information.csv' contains this information for all processed files.

  • File information: Named filename_file_information.csv. These files contain information about the extracted sounds to be analyzed and can be used to guide a folder analysis. A file called 'file_information.csv' contains this information for all processed files.
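The naming scheme for the extracted sound files can be sketched in Python. The helper `vowel_file_names` is hypothetical. Because this page gives 0001 as an example number while also stating that numbering begins at 0000, the first number is left as a parameter (`start`) rather than fixed. Only kept vowels are counted, so skipped vowels leave no gaps in the sequence.

```python
def vowel_file_names(stem, n_vowels, start=1):
    """Return output names such as "speaker1_0001.wav" for the kept vowels of one file.

    stem: base name of the original sound file (e.g. "speaker1").
    n_vowels: number of vowels kept after skipping.
    start: first number in the sequence; numbering restarts for each file.
    """
    # Zero-pad each index to four digits.
    return [f"{stem}_{i:04d}.wav" for i in range(start, start + n_vowels)]
```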
