# Using `audiolabel`

The recommended way to use `audiolabel` is to call its `read_label` function, which reads label files into Pandas dataframes.

Use this notebook to read label files and view the results. See the related ['Working with phonetic dataframes' notebook](working_with_phonetic_dataframes.ipynb) for useful ways to use the resulting dataframes.

The next cell loads all tiers from a Praat textgrid, in the order they appear in the file, and assigns them to the separate variables `phdf`, `wddf`, and `ctxdf`.

**`read_label` always returns a list of tiers.** Always enclose the left-hand side of a `read_label` assignment in brackets `[ ]`, even when only one tier is returned, so that each variable holds the tier itself rather than the list that contains it.

In [1]:
from audiolabel import read_label
[phdf, wddf, ctxdf] = read_label('../test/this_is_a_label_file.TextGrid', ftype='praat')
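
The effect of the brackets can be seen without any label files. The sketch below uses `fake_read_label`, a hypothetical stand-in (not part of `audiolabel`) that, like `read_label`, returns a list with one item per tier. Plain strings stand in for the tier dataframes.

```python
def fake_read_label(n_tiers):
    """Hypothetical stand-in for read_label: returns a list with one item per tier."""
    return [f'tier{i}' for i in range(n_tiers)]

# Without brackets the variable holds the whole list, even for a single tier.
as_list = fake_read_label(1)

# With brackets the single element is unpacked into the variable.
[as_tier] = fake_read_label(1)

print(type(as_list).__name__)   # a list that contains the tier
print(as_tier)                  # the tier itself

# Unpacking also checks the tier count: a mismatch raises ValueError.
try:
    [only_one] = fake_read_label(3)
except ValueError as err:
    print('ValueError:', err)
```

A side benefit of bracketed unpacking is that the assignment fails loudly if the file contains a different number of tiers than you expected.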

The tier dataframe contains a row for each label and four columns by default:

- `t1`: start time of the label
- `t2`: end time of the label
- `label`: label content
- `fname`: filename where label was read from
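
As a rough illustration of how these columns can be used, the sketch below builds a small dataframe by hand with the same four columns (the values are copied from the 'word' tier in this notebook) and then computes label durations and looks up the label at a time point. The names `wddf_demo`, `dur`, and `at_t` are made up for this example.

```python
import pandas as pd

# Hand-built stand-in for a tier dataframe with the default four columns.
wddf_demo = pd.DataFrame({
    't1': [0.012472, 0.441497, 0.611111],
    't2': [0.441497, 0.611111, 0.660998],
    'label': ['THIS', 'IS', 'A'],
    'fname': ['../test/this_is_a_label_file.TextGrid'] * 3,
})

# Duration of each label, in the same time units as t1 and t2.
wddf_demo['dur'] = wddf_demo['t2'] - wddf_demo['t1']

# Select the label whose interval contains a given time point.
t = 0.5
at_t = wddf_demo[(wddf_demo['t1'] <= t) & (t < wddf_demo['t2'])]
print(at_t['label'].tolist())   # ['IS']
```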

In [2]:
wddf   # 'word' tier

Unnamed: 0,t1,t2,label,fname
0,0.012472,0.441497,THIS,../test/this_is_a_label_file.TextGrid
1,0.441497,0.611111,IS,../test/this_is_a_label_file.TextGrid
2,0.611111,0.660998,A,../test/this_is_a_label_file.TextGrid
3,0.660998,1.139909,LABEL,../test/this_is_a_label_file.TextGrid
4,1.139909,1.628798,FILE,../test/this_is_a_label_file.TextGrid
5,1.628798,1.648753,sp,../test/this_is_a_label_file.TextGrid


In [3]:
phdf   # 'phone' tier

Unnamed: 0,t1,t2,label,fname
0,0.012472,0.192063,DH,../test/this_is_a_label_file.TextGrid
1,0.192063,0.291837,IH1,../test/this_is_a_label_file.TextGrid
2,0.291837,0.441497,S,../test/this_is_a_label_file.TextGrid
3,0.441497,0.501361,IH1,../test/this_is_a_label_file.TextGrid
4,0.501361,0.611111,Z,../test/this_is_a_label_file.TextGrid
5,0.611111,0.660998,AH0,../test/this_is_a_label_file.TextGrid
6,0.660998,0.80068,L,../test/this_is_a_label_file.TextGrid
7,0.80068,0.970295,EY1,../test/this_is_a_label_file.TextGrid
8,0.970295,1.000227,B,../test/this_is_a_label_file.TextGrid
9,1.000227,1.030159,AH0,../test/this_is_a_label_file.TextGrid


In [4]:
ctxdf   # 'context' tier

Unnamed: 0,t1,t2,label,fname
0,0.012472,0.611111,happy,../test/this_is_a_label_file.TextGrid
1,0.611111,1.139909,sad,../test/this_is_a_label_file.TextGrid
2,1.139909,1.648753,happy,../test/this_is_a_label_file.TextGrid


## The `tiers` parameter

You can read specific tiers from a label file by including the `tiers` parameter. Tiers can be identified by the name they have in the label file or by index number.

**NOTE: It is recommended that you always use the `tiers` parameter to make your code self-documenting.** An additional benefit is that `read_label` will use the tier names you specify as the names of the label columns.

In [5]:
[phdf, wddf, ctxdf] = read_label(
    '../test/this_is_a_label_file.TextGrid',
    ftype='praat',
    tiers=['phone', 'word', 'context']   # Tier names become the label column names
)
ctxdf   # Label column is named 'context' instead of 'label'

Unnamed: 0,t1,t2,context,fname
0,0.012472,0.611111,happy,../test/this_is_a_label_file.TextGrid
1,0.611111,1.139909,sad,../test/this_is_a_label_file.TextGrid
2,1.139909,1.648753,happy,../test/this_is_a_label_file.TextGrid


Praat textgrids usually have named tiers, but some other label filetypes do not. You can identify tiers by index number instead of name, in which case the label column does not get a special name.

In [6]:
# Label column is named 'label'
[phdf2, wddf2, ctxdf2] = read_label(
    '../test/this_is_a_label_file.TextGrid',
    ftype='praat',
    tiers=[0, 1, 2]   # Label content columns will get generic name 'label'
)
ctxdf2

Unnamed: 0,t1,t2,label,fname
0,0.012472,0.611111,happy,../test/this_is_a_label_file.TextGrid
1,0.611111,1.139909,sad,../test/this_is_a_label_file.TextGrid
2,1.139909,1.648753,happy,../test/this_is_a_label_file.TextGrid
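
If you read tiers by index but still want a descriptive column name, one option is to rename the generic `label` column afterwards with the standard pandas `rename` method. The sketch below works on a hand-built stand-in dataframe (`ctxdf_demo`, a name made up for this example) rather than on real `read_label` output.

```python
import pandas as pd

# Hand-built stand-in for a tier read by index, so its label column is named 'label'.
ctxdf_demo = pd.DataFrame({
    't1': [0.012472, 0.611111, 1.139909],
    't2': [0.611111, 1.139909, 1.648753],
    'label': ['happy', 'sad', 'happy'],
})

# rename() returns a new dataframe; the original keeps its 'label' column.
ctx_named = ctxdf_demo.rename(columns={'label': 'context'})
print(list(ctx_named.columns))   # ['t1', 't2', 'context']
```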


## The `ftype` parameter

The `read_label` function can read several types of label files. The 'praat' type reads Praat textgrids. You can explicitly specify 'praat_short' or 'praat_long' for short or long textgrids, but the 'praat' `ftype` value usually detects the format automatically.

Other valid `ftype` values are 'eaf' for ELAN annotation files, 'esps' for [ESPS](https://github.com/rsprouse/espsfree) label files, or 'wavesurfer' for Wavesurfer files.

In [7]:
# ELAN
[eafdf] = read_label('../test/v2test.eaf', ftype='eaf', tiers=[0])
eafdf

Unnamed: 0,t1,t2,label,fname
0,114585.0,115485.0,áa,../test/v2test.eaf
1,159040.0,159980.0,kaa,../test/v2test.eaf
2,188675.0,189775.0,áas,../test/v2test.eaf
3,225150.0,226240.0,//púr,../test/v2test.eaf
4,240100.0,241020.0,pth,../test/v2test.eaf
5,916255.0,917645.0,t+ée?,../test/v2test.eaf
6,921465.0,922905.0,tt-ha,../test/v2test.eaf
7,923190.0,923970.0,ã.,../test/v2test.eaf
8,984230.0,985620.0,nh,../test/v2test.eaf
9,1465165.0,1466945.0,pha?,../test/v2test.eaf
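
The times in the ELAN output above are far larger than those in the Praat examples, which suggests they are in milliseconds (the unit EAF files use for time slots) rather than seconds. This is an inference from the values shown, not documented `read_label` behavior, so check it against your own files. Under that assumption, a conversion to seconds might look like the sketch below, which uses a hand-built stand-in dataframe named `eaf_demo`.

```python
import pandas as pd

# Hand-built stand-in using the first two rows of the ELAN output.
eaf_demo = pd.DataFrame({
    't1': [114585.0, 159040.0],
    't2': [115485.0, 159980.0],
    'label': ['áa', 'kaa'],
})

# Assumption: t1 and t2 are in milliseconds; divide by 1000 to get seconds.
eaf_sec = eaf_demo.copy()
eaf_sec[['t1', 't2']] = eaf_sec[['t1', 't2']] / 1000.0
print(eaf_sec)
```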


In [8]:
# ESPS
[espsdf] = read_label('../test/sample.esps', ftype='esps', tiers=[0])
espsdf

Unnamed: 0,t1,t2,label,fname
0,0.0,2.609,{B_TRANS},../test/sample.esps
1,2.609,2.709,IVER,../test/sample.esps
2,2.709,2.753,eh,../test/sample.esps
3,2.753,2.892,s,../test/sample.esps
4,2.892,3.238,IVER,../test/sample.esps
5,3.238,3.247,d,../test/sample.esps
6,3.247,3.327,ae,../test/sample.esps
7,3.327,3.439,s,../test/sample.esps
8,3.439,3.555,ae,../test/sample.esps
9,3.555,3.604,t,../test/sample.esps
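
When a script handles several kinds of label files, it can be convenient to choose the `ftype` value from the filename. The helper below is a hypothetical sketch, not part of `audiolabel`. Its extension mapping is an assumption based only on the test files used in this notebook, and it leaves out 'wavesurfer' because no Wavesurfer file appears here.

```python
from pathlib import Path

# Assumed mapping, based only on the test files used in this notebook.
EXT_TO_FTYPE = {
    '.textgrid': 'praat',
    '.eaf': 'eaf',
    '.esps': 'esps',
}

def guess_ftype(path):
    """Return an ftype string for path, or raise ValueError if the extension is unknown."""
    ext = Path(path).suffix.lower()
    try:
        return EXT_TO_FTYPE[ext]
    except KeyError:
        raise ValueError(
            f'no ftype known for extension {ext!r}; pass ftype explicitly'
        ) from None

print(guess_ftype('../test/this_is_a_label_file.TextGrid'))   # praat
print(guess_ftype('../test/v2test.eaf'))                      # eaf
```

The returned string could then be passed as the `ftype` argument of `read_label`. Raising an error for unknown extensions, rather than guessing, keeps a wrong file type from being silently misread.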


## The `codec` parameter

Use the `codec` parameter to specify the label file's encoding for the `ftype`s 'praat' and 'eaf'. Any of the [standard encodings](https://docs.python.org/3/library/codecs.html#standard-encodings) from the Python `codecs` module may be used. If `codec` is not given for a 'praat' textgrid, `read_label` attempts to autodetect the encoding and normally defaults to 'utf-8'. Malformed label files can make the detection fail, in which case you can specify the encoding yourself.

In [9]:
[prdf] = read_label('../test/utf8_no_BOM.TextGrid', ftype='praat', tiers=[0], codec='utf-8')
prdf

Unnamed: 0,t1,t2,label,fname
0,0.0,3.390004,,../test/utf8_no_BOM.TextGrid
1,3.390004,4.456577,bat,../test/utf8_no_BOM.TextGrid
2,4.456577,7.666173,,../test/utf8_no_BOM.TextGrid
3,7.666173,9.720314,bet bít,../test/utf8_no_BOM.TextGrid
4,9.720314,10.266122,,../test/utf8_no_BOM.TextGrid
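
To illustrate why the encoding matters, the sketch below uses only the Python standard library, not `audiolabel`. It checks that a codec name is one Python recognizes and then decodes the same bytes with two different codecs, using the label 'bet bít' from the output above.

```python
import codecs

# codecs.lookup() raises LookupError for an unknown encoding name,
# so it can validate a codec value before a file is read.
print(codecs.lookup('utf8').name)   # normalized name: utf-8

try:
    codecs.lookup('not-a-codec')
except LookupError as err:
    print('LookupError:', err)

# The same bytes decode to different text under different codecs.
raw = 'bet bít'.encode('utf-8')
good = raw.decode('utf-8')      # round-trips to the original label
bad = raw.decode('latin-1')     # the two bytes of 'í' become two wrong characters
print(good)
print(repr(bad))
```

Decoding with the wrong codec does not always raise an error, so garbled labels in a dataframe are a hint that the `codec` value needs to be set explicitly.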
