# Organizing the FastTrack Data
This notebook is a guide and template for organizing the `aggregated_data.csv` file that FastTrack gives you and using the data there to find average formant values and vocal tract lengths.

## Packages needed

Before you start using the template, note that this notebook requires one additional Python package: `pandas`.

If you don't already have this package, you can install it with:

`pip install pandas`

## Importing the data
First, we need to load the `aggregated_data.csv` file into the notebook.

In [1]:
import pandas as pd

In [2]:
# Change the path so that it finds your 'aggregated_data.csv' file, wherever it is on your computer

data = pd.read_csv('/home/sage/College/phonlab/Tests/brown-fox_output/processed_data/aggregated_data.csv')
data

Unnamed: 0,file,f0,duration,label,group,color,number,cutoff,f11,f21,...,f33,f43,f14,f24,f34,f44,f15,f25,f35,f45
0,brown-fox_0001,239.4,0.1,AH,3,Black,1,5750,505,1758,...,2741,4102,328,1820,2730,4033,363,1812,2923,4147
1,brown-fox_0002,312.0,0.07,IH,11,Teal,2,6050,605,1876,...,2919,4301,633,2360,2934,4316,604,2579,2922,4489
2,brown-fox_0003,244.9,0.14,AW,5,Olive,3,6200,747,1621,...,2626,3666,695,1695,2951,3581,534,1347,2832,3498
3,brown-fox_0004,217.6,0.168,AA,1,Red,4,7250,791,1163,...,2938,4262,823,1186,2832,4137,665,1251,2738,4138
4,brown-fox_0005,223.6,0.06,AH,3,Black,5,6200,561,2113,...,2641,3637,651,1404,2611,3539,543,1247,2623,3525
5,brown-fox_0006,231.0,0.09,OW,14,Maroon,6,6200,546,1548,...,2619,3928,518,1289,2587,3939,492,1153,2507,3993
6,brown-fox_0007,233.4,0.048,ER,9,Lime,7,6800,456,1561,...,2990,4342,457,1837,3082,4539,443,2028,3082,4613
7,brown-fox_0008,222.6,0.06,AH,3,Black,8,7550,559,2289,...,3255,4347,564,1027,3315,4282,479,986,3437,4343
8,brown-fox_0009,220.0,0.138,EY,10,Purple,9,6800,599,2218,...,3078,4349,437,2678,2975,4502,433,2342,2950,4516
9,brown-fox_0010,227.5,0.14,IY,13,Pink,10,6200,433,2517,...,3076,4474,420,2710,2951,4467,353,1660,2870,4142


In `aggregated_data.csv`,
- 'file' refers to the vowel sound file extracted by FastTrack
- 'f0' is the pitch
- 'duration' is the length of the vowel
- 'label' is which vowel, using whatever label was in the TextGrid used to extract the vowels
- 'group' is which vowel again, with a different number corresponding to each vowel type
- 'color' is the color FastTrack used in plotting the vowels
- 'number' is the number of the vowel, with a different number for each vowel extracted
- 'cutoff' is the point above which FastTrack stopped looking for formants

The rest of the columns are the formant measurements. By default, FastTrack measures the formants of a vowel at five evenly spaced points along its duration (called "temporal bins"), so there are five measurements of each formant.

The first number after 'f' is the formant number, i.e. whether it's F1, F2, F3, or F4. The second number is which temporal bin the measurement belongs to, i.e. whether it's the first, second, third, fourth, or fifth measurement along the vowel duration.

So 'f12' is the second measurement of the first formant, and 'f21' is the first measurement of the second formant.
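
The naming scheme can also be spelled out in code. This is only a sketch: the helper names `formant_column` and `parse_formant_column` are made up for illustration and are not part of FastTrack or pandas, and the list assumes the default of four formants and five temporal bins.

```python
def formant_column(formant, time_bin):
    """Build a column name such as 'f12' (formant 1, temporal bin 2)."""
    return f"f{formant}{time_bin}"


def parse_formant_column(name):
    """Split a column name such as 'f21' into (formant, temporal bin)."""
    return int(name[1]), int(name[2])


# All 20 formant columns, bin by bin, assuming 4 formants and 5 bins
formant_columns = [formant_column(f, b) for b in range(1, 6) for f in range(1, 5)]

print(formant_columns[:5])
print(parse_formant_column("f21"))
```

A list like `formant_columns` can be handy for checking that a freshly loaded file has every column you expect.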

### Combining multiple `aggregated_data.csv`s

If you have multiple related sound files that you've fed through FastTrack and you want one big data file with all of them, you can append them here.

First, import the other `aggregated_data.csv`s.

In [3]:
# Change the path so that it finds your 'aggregated_data.csv' file, wherever it is on your computer

data2 = pd.read_csv('/home/sage/College/phonlab/Tests/test1_output/processed_data/aggregated_data.csv')
data2

Unnamed: 0,file,f0,duration,label,group,color,number,cutoff,f11,f21,...,f33,f43,f14,f24,f34,f44,f15,f25,f35,f45
0,red-fox_0001,290.9,0.08,IH,11,Teal,1,7550,559,1821,...,2926,4370,552,2646,3005,4444,383,2492,2870,4515
1,red-fox_0002,246.3,0.12,EH,8,Black,2,7550,656,2269,...,2998,4361,730,2017,3090,4479,595,2011,3109,4462
2,red-fox_0003,179.0,0.18,AA,1,Red,3,7250,761,1174,...,3042,4113,836,1183,2976,4152,788,1288,2748,4045
3,red-fox_0004,232.0,0.038,AH,3,Black,4,6950,670,1437,...,2734,3672,612,1285,2772,3650,574,1278,2748,3632
4,red-fox_0005,246.8,0.19,OW,14,Maroon,5,5750,535,1596,...,2778,3794,512,1222,2696,3920,502,1224,2449,3922
5,red-fox_0006,240.9,0.16,ER,9,Lime,6,7250,546,1405,...,2338,4123,518,1731,2440,4108,467,1974,2948,4435
6,red-fox_0007,234.5,0.038,AH,3,Black,7,7400,461,1209,...,3197,4223,472,1123,3221,4226,473,1117,3224,4196
7,red-fox_0008,236.2,0.12,EY,10,Purple,8,7550,500,1834,...,3094,4374,446,2729,2980,4430,431,2434,2983,4437
8,red-fox_0009,247.0,0.068,IY,13,Pink,9,7400,417,2546,...,3082,4478,381,2502,3022,4353,388,2518,3562,4389
9,red-fox_0010,217.8,0.12,AW,5,Olive,10,5750,854,1785,...,2629,3604,839,1540,2705,3598,712,1482,2715,3412


Then, append the other csvs to the first with the cell below.

In [4]:
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
data = pd.concat([data, data2], ignore_index=True)
# # If you have more csvs, add them all to the list instead
# data = pd.concat([data, data2, data3, data4], ignore_index=True)
data

Unnamed: 0,file,f0,duration,label,group,color,number,cutoff,f11,f21,...,f33,f43,f14,f24,f34,f44,f15,f25,f35,f45
0,brown-fox_0001,239.4,0.1,AH,3,Black,1,5750,505,1758,...,2741,4102,328,1820,2730,4033,363,1812,2923,4147
1,brown-fox_0002,312.0,0.07,IH,11,Teal,2,6050,605,1876,...,2919,4301,633,2360,2934,4316,604,2579,2922,4489
2,brown-fox_0003,244.9,0.14,AW,5,Olive,3,6200,747,1621,...,2626,3666,695,1695,2951,3581,534,1347,2832,3498
3,brown-fox_0004,217.6,0.168,AA,1,Red,4,7250,791,1163,...,2938,4262,823,1186,2832,4137,665,1251,2738,4138
4,brown-fox_0005,223.6,0.06,AH,3,Black,5,6200,561,2113,...,2641,3637,651,1404,2611,3539,543,1247,2623,3525
5,brown-fox_0006,231.0,0.09,OW,14,Maroon,6,6200,546,1548,...,2619,3928,518,1289,2587,3939,492,1153,2507,3993
6,brown-fox_0007,233.4,0.048,ER,9,Lime,7,6800,456,1561,...,2990,4342,457,1837,3082,4539,443,2028,3082,4613
7,brown-fox_0008,222.6,0.06,AH,3,Black,8,7550,559,2289,...,3255,4347,564,1027,3315,4282,479,986,3437,4343
8,brown-fox_0009,220.0,0.138,EY,10,Purple,9,6800,599,2218,...,3078,4349,437,2678,2975,4502,433,2342,2950,4516
9,brown-fox_0010,227.5,0.14,IY,13,Pink,10,6200,433,2517,...,3076,4474,420,2710,2951,4467,353,1660,2870,4142


### Cleaning up the data

When working with your data, there may be times when you want to refer to all the rows that came from one original sound file.

Right now the name of the original sound file and the number of the extracted vowel are combined in the 'file' column, but we can split them into two columns to make things easier.

**WARNING**
\
The next cell separates the 'file' column using the underscore as the dividing line, so it won't work well if the 'file' column of your 'aggregated_data.csv', which has the format `[file name]_[sound number]`, has more than one underscore. In other words, it won't work if there's an underscore in the .wav and .TextGrid files passed through FastTrack.

In [5]:
data.insert(1, 'sound', '')
data[['file', 'sound']] = data['file'].str.split('_',expand=True)
data

Unnamed: 0,file,sound,f0,duration,label,group,color,number,cutoff,f11,...,f33,f43,f14,f24,f34,f44,f15,f25,f35,f45
0,brown-fox,1,239.4,0.1,AH,3,Black,1,5750,505,...,2741,4102,328,1820,2730,4033,363,1812,2923,4147
1,brown-fox,2,312.0,0.07,IH,11,Teal,2,6050,605,...,2919,4301,633,2360,2934,4316,604,2579,2922,4489
2,brown-fox,3,244.9,0.14,AW,5,Olive,3,6200,747,...,2626,3666,695,1695,2951,3581,534,1347,2832,3498
3,brown-fox,4,217.6,0.168,AA,1,Red,4,7250,791,...,2938,4262,823,1186,2832,4137,665,1251,2738,4138
4,brown-fox,5,223.6,0.06,AH,3,Black,5,6200,561,...,2641,3637,651,1404,2611,3539,543,1247,2623,3525
5,brown-fox,6,231.0,0.09,OW,14,Maroon,6,6200,546,...,2619,3928,518,1289,2587,3939,492,1153,2507,3993
6,brown-fox,7,233.4,0.048,ER,9,Lime,7,6800,456,...,2990,4342,457,1837,3082,4539,443,2028,3082,4613
7,brown-fox,8,222.6,0.06,AH,3,Black,8,7550,559,...,3255,4347,564,1027,3315,4282,479,986,3437,4343
8,brown-fox,9,220.0,0.138,EY,10,Purple,9,6800,599,...,3078,4349,437,2678,2975,4502,433,2342,2950,4516
9,brown-fox,10,227.5,0.14,IY,13,Pink,10,6200,433,...,3076,4474,420,2710,2951,4467,353,1660,2870,4142
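
If your file names do contain underscores, one possible workaround is to split only at the last underscore. The sketch below uses pandas' `str.rsplit` with `n=1`; the file names in it are hypothetical, and it assumes the sound number always follows the final underscore.

```python
import pandas as pd

# Hypothetical 'file' values, one of which has extra underscores
names = pd.Series(["brown_fox_take_2_0001", "red-fox_0002"])

# Split once, starting from the right, so only the last underscore counts
parts = names.str.rsplit("_", n=1, expand=True)
parts.columns = ["file", "sound"]

print(parts)
```

Everything before the last underscore stays together as the file name, however many underscores it contains.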


At this point, since 'group', 'color', and 'number' really only have to do with FastTrack's automatic vowel plotting and 'cutoff' isn't important to the vowels themselves (only to FastTrack's process), you can safely delete those columns (unless they would be helpful for plotting vowels again later).

In [6]:
del data['group']
del data['color']
del data['number']
del data['cutoff']
data

Unnamed: 0,file,sound,f0,duration,label,f11,f21,f31,f41,f12,...,f33,f43,f14,f24,f34,f44,f15,f25,f35,f45
0,brown-fox,1,239.4,0.1,AH,505,1758,2876,4119,486,...,2741,4102,328,1820,2730,4033,363,1812,2923,4147
1,brown-fox,2,312.0,0.07,IH,605,1876,2928,4234,612,...,2919,4301,633,2360,2934,4316,604,2579,2922,4489
2,brown-fox,3,244.9,0.14,AW,747,1621,2329,3733,822,...,2626,3666,695,1695,2951,3581,534,1347,2832,3498
3,brown-fox,4,217.6,0.168,AA,791,1163,2832,4228,822,...,2938,4262,823,1186,2832,4137,665,1251,2738,4138
4,brown-fox,5,223.6,0.06,AH,561,2113,2724,3874,650,...,2641,3637,651,1404,2611,3539,543,1247,2623,3525
5,brown-fox,6,231.0,0.09,OW,546,1548,2863,4085,538,...,2619,3928,518,1289,2587,3939,492,1153,2507,3993
6,brown-fox,7,233.4,0.048,ER,456,1561,2484,4133,499,...,2990,4342,457,1837,3082,4539,443,2028,3082,4613
7,brown-fox,8,222.6,0.06,AH,559,2289,3161,4496,579,...,3255,4347,564,1027,3315,4282,479,986,3437,4343
8,brown-fox,9,220.0,0.138,EY,599,2218,3155,4383,492,...,3078,4349,437,2678,2975,4502,433,2342,2950,4516
9,brown-fox,10,227.5,0.14,IY,433,2517,3046,4408,428,...,3076,4474,420,2710,2951,4467,353,1660,2870,4142


## Average vowel formants

To find a vowel's average formants, take the mean of the middle three temporal bins. In other words, 
- the average of F1 is the average of 'f12', 'f13', and 'f14'
- the average of F2 is the average of 'f22', 'f23', and 'f24'
- the average of F3 is the average of 'f32', 'f33', and 'f34'
- the average of F4 is the average of 'f42', 'f43', and 'f44'

In [7]:
data['f1_mean'] = round(data[['f12', 'f13', 'f14']].mean(axis=1), 0)
data['f2_mean'] = round(data[['f22', 'f23', 'f24']].mean(axis=1), 0)
data['f3_mean'] = round(data[['f32', 'f33', 'f34']].mean(axis=1), 0)
data['f4_mean'] = round(data[['f42', 'f43', 'f44']].mean(axis=1), 0)
data

Unnamed: 0,file,sound,f0,duration,label,f11,f21,f31,f41,f12,...,f34,f44,f15,f25,f35,f45,f1_mean,f2_mean,f3_mean,f4_mean
0,brown-fox,1,239.4,0.1,AH,505,1758,2876,4119,486,...,2730,4033,363,1812,2923,4147,414.0,1847.0,2727.0,4042.0
1,brown-fox,2,312.0,0.07,IH,605,1876,2928,4234,612,...,2934,4316,604,2579,2922,4489,627.0,2216.0,2907.0,4316.0
2,brown-fox,3,244.9,0.14,AW,747,1621,2329,3733,822,...,2951,3581,534,1347,2832,3498,810.0,1651.0,2653.0,3614.0
3,brown-fox,4,217.6,0.168,AA,791,1163,2832,4228,822,...,2832,4137,665,1251,2738,4138,831.0,1180.0,2913.0,4224.0
4,brown-fox,5,223.6,0.06,AH,561,2113,2724,3874,650,...,2611,3539,543,1247,2623,3525,661.0,1576.0,2648.0,3645.0
5,brown-fox,6,231.0,0.09,OW,546,1548,2863,4085,538,...,2587,3939,492,1153,2507,3993,527.0,1316.0,2650.0,3920.0
6,brown-fox,7,233.4,0.048,ER,456,1561,2484,4133,499,...,3082,4539,443,2028,3082,4613,480.0,1871.0,2917.0,4378.0
7,brown-fox,8,222.6,0.06,AH,559,2289,3161,4496,579,...,3315,4282,479,986,3437,4343,570.0,1110.0,3279.0,4330.0
8,brown-fox,9,220.0,0.138,EY,599,2218,3155,4383,492,...,2975,4502,433,2342,2950,4516,457.0,2634.0,3048.0,4392.0
9,brown-fox,10,227.5,0.14,IY,433,2517,3046,4408,428,...,2951,4467,353,1660,2870,4142,421.0,2756.0,3033.0,4444.0


## Vocal tract length

Another value that FastTrack doesn't give us that we want is vocal tract length.

Vocal tract length is important to vowels because how long or short the vocal tract is changes which harmonics will resonate and become formants, in the same way that tubes of different sizes produce different sounds, or glasses with different amounts of liquid make different sounds.

Since men tend to have longer vocal tracts and women tend to have shorter ones, estimating vocal tract length helps you predict how participants will react to the voices.

### Equations

We can basically treat the vocal tract like a tube that's closed at one end (the glottis) during voiced sounds, so we can use equations designed for closed-at-one-end tubes.

The equation for resonances in a tube that's closed at one end is $\boxed{f_n = \frac{(2n-1)c}{4L}}$, with the following variables:
- $f$ = frequency (Hz)
- $n$ = resonance number (e.g. 1=first formant, 2=second formant, etc.)
- $c$ = speed of sound (35,000 cm/s in warm, moist air)
- $L$ = vocal tract length (cm)

Solving for length, the equation is $\boxed{L = \frac{(2n-1)c}{4f_n}}$
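
As a quick check on the algebra, the rearrangement and one worked case can be written out. The 17.5 cm length is only a convenient illustrative value, not a measurement from this data.

$$f_n = \frac{(2n-1)c}{4L} \;\Longrightarrow\; 4L f_n = (2n-1)c \;\Longrightarrow\; L = \frac{(2n-1)c}{4f_n}$$

$$f_1 = \frac{(2\cdot 1-1)\cdot 35000}{4\cdot 17.5} = 500\ \text{Hz}, \qquad f_2 = \frac{(2\cdot 2-1)\cdot 35000}{4\cdot 17.5} = 1500\ \text{Hz}$$

In an idealized tube, then, each resonance is an odd multiple of the first one: $f_n = (2n-1)f_1$.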

These equations can be turned into Python commands that let you enter values to find the missing variable. Those commands are defined in the two cells below.

In [8]:
def vt_formant(n, l, c=35000):
    return (((2 * n) - 1) * c) / (4 * l)

In [9]:
def vt_length(n, f, c=35000):
    return (((2 * n) - 1) * c) / (4 * f)

**Given the formant number and the vocal tract length (cm) in the parentheses, `vt_formant()` returns the frequency of the formant.**

So to find the first formant of a vocal tract that is 18 cm long, you run the command `vt_formant(1, 18)`.

In [10]:
vt_formant(1, 18)

486.1111111111111

**Given the formant number and the frequency of a resonance in the parentheses, `vt_length()` returns the length of the vocal tract in cm.**

So to find the length of a vocal tract whose first formant is 722 Hz, you run the command `vt_length(1, 722)`.

In [11]:
vt_length(1, 722)

12.119113573407203
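
Because the two functions are rearrangements of the same equation, feeding the output of one into the other should give back the starting value. The sketch below repeats the definitions from cells [8] and [9] so it runs on its own, then does that round trip for the first four resonances of an 18 cm tube.

```python
def vt_formant(n, l, c=35000):
    return ((2 * n - 1) * c) / (4 * l)


def vt_length(n, f, c=35000):
    return ((2 * n - 1) * c) / (4 * f)


for n in range(1, 5):
    f = vt_formant(n, 18)          # resonance n of an 18 cm tube, in Hz
    recovered = vt_length(n, f)    # should come back as 18 cm
    print(n, round(f, 1), round(recovered, 2))
```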

**To find frequency or length of a vocal tract in a different air density, temperature, or humidity, set the value of $c$ in the function for the speed of sound in that air density, temperature, or humidity.**

[Speed of sound in air at different temperatures and humidities](https://www.engineeringtoolbox.com/air-speed-sound-d_603.html)

For example, if we were taking measurements of mutants with ice powers whose lungs could handle freezing-cold air while talking ($c = 33{,}150$ cm/s), we would get slightly different results for the same calculations as in [10] and [11].

In [12]:
vt_formant(1, 18, c=33150)

460.4166666666667

In [13]:
vt_length(1, 722, c=33150)

11.478531855955678
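
Both results shrink by the same proportion, because $c$ appears only as a multiplier in each equation. The sketch below (again repeating the definitions from cells [8] and [9] so it runs on its own) compares the cold-air and default results against the ratio of the two speeds of sound.

```python
def vt_formant(n, l, c=35000):
    return ((2 * n - 1) * c) / (4 * l)


def vt_length(n, f, c=35000):
    return ((2 * n - 1) * c) / (4 * f)


# Cold-air speed of sound relative to the default value
ratio = 33150 / 35000

formant_ratio = vt_formant(1, 18, c=33150) / vt_formant(1, 18)
length_ratio = vt_length(1, 722, c=33150) / vt_length(1, 722)

print(ratio, formant_ratio, length_ratio)
```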

### Finding vocal tract lengths in the data

Human vocal tracts are not perfect tubes, however. By changing the shape of our vocal tract, we can get different resonances. That's how we produce different vowels.

In vowel production, F1 correlates inversely with vowel height and F2 correlates directly with vowel frontness. Because of this, those formants are not reliable indicators of vocal tract length.

So for finding vocal tract length in our data, in the end we want two measures: (1) the mean of the vocal tract lengths estimated from each of a vowel's four formants, and (2) the length estimated from F4 alone, because F4 is the least affected by vowel quality.

First, we estimate a vocal tract length from every mean formant of each vowel.

In [14]:
data['length_f1'] = vt_length(1, data['f1_mean'])
data['length_f2'] = vt_length(2, data['f2_mean'])
data['length_f3'] = vt_length(3, data['f3_mean'])
data['length_f4'] = vt_length(4, data['f4_mean'])
data

Unnamed: 0,file,sound,f0,duration,label,f11,f21,f31,f41,f12,...,f35,f45,f1_mean,f2_mean,f3_mean,f4_mean,length_f1,length_f2,length_f3,length_f4
0,brown-fox,1,239.4,0.1,AH,505,1758,2876,4119,486,...,2923,4147,414.0,1847.0,2727.0,4042.0,21.135266,14.212236,16.043271,15.153389
1,brown-fox,2,312.0,0.07,IH,605,1876,2928,4234,612,...,2922,4489,627.0,2216.0,2907.0,4316.0,13.955343,11.845668,15.04988,14.191381
2,brown-fox,3,244.9,0.14,AW,747,1621,2329,3733,822,...,2832,3498,810.0,1651.0,2653.0,3614.0,10.802469,15.899455,16.490765,16.94798
3,brown-fox,4,217.6,0.168,AA,791,1163,2832,4228,822,...,2738,4138,831.0,1180.0,2913.0,4224.0,10.529483,22.245763,15.018881,14.500473
4,brown-fox,5,223.6,0.06,AH,561,2113,2724,3874,650,...,2623,3525,661.0,1576.0,2648.0,3645.0,13.237519,16.656091,16.521903,16.803841
5,brown-fox,6,231.0,0.09,OW,546,1548,2863,4085,538,...,2507,3993,527.0,1316.0,2650.0,3920.0,16.603416,19.946809,16.509434,15.625
6,brown-fox,7,233.4,0.048,ER,456,1561,2484,4133,499,...,3082,4613,480.0,1871.0,2917.0,4378.0,18.229167,14.029931,14.998286,13.990407
7,brown-fox,8,222.6,0.06,AH,559,2289,3161,4496,579,...,3437,4343,570.0,1110.0,3279.0,4330.0,15.350877,23.648649,13.342482,14.145497
8,brown-fox,9,220.0,0.138,EY,599,2218,3155,4383,492,...,2950,4516,457.0,2634.0,3048.0,4392.0,19.146608,9.965831,14.353675,13.945811
9,brown-fox,10,227.5,0.14,IY,433,2517,3046,4408,428,...,2870,4142,421.0,2756.0,3033.0,4444.0,20.783848,9.524673,14.424662,13.782628


Now we get the average vocal tract length.

In [15]:
data['length_mean'] = data[['length_f1', 'length_f2', 'length_f3', 'length_f4']].mean(axis=1)
data

Unnamed: 0,file,sound,f0,duration,label,f11,f21,f31,f41,f12,...,f45,f1_mean,f2_mean,f3_mean,f4_mean,length_f1,length_f2,length_f3,length_f4,length_mean
0,brown-fox,1,239.4,0.1,AH,505,1758,2876,4119,486,...,4147,414.0,1847.0,2727.0,4042.0,21.135266,14.212236,16.043271,15.153389,16.636041
1,brown-fox,2,312.0,0.07,IH,605,1876,2928,4234,612,...,4489,627.0,2216.0,2907.0,4316.0,13.955343,11.845668,15.04988,14.191381,13.760568
2,brown-fox,3,244.9,0.14,AW,747,1621,2329,3733,822,...,3498,810.0,1651.0,2653.0,3614.0,10.802469,15.899455,16.490765,16.94798,15.035167
3,brown-fox,4,217.6,0.168,AA,791,1163,2832,4228,822,...,4138,831.0,1180.0,2913.0,4224.0,10.529483,22.245763,15.018881,14.500473,15.57365
4,brown-fox,5,223.6,0.06,AH,561,2113,2724,3874,650,...,3525,661.0,1576.0,2648.0,3645.0,13.237519,16.656091,16.521903,16.803841,15.804839
5,brown-fox,6,231.0,0.09,OW,546,1548,2863,4085,538,...,3993,527.0,1316.0,2650.0,3920.0,16.603416,19.946809,16.509434,15.625,17.171165
6,brown-fox,7,233.4,0.048,ER,456,1561,2484,4133,499,...,4613,480.0,1871.0,2917.0,4378.0,18.229167,14.029931,14.998286,13.990407,15.311947
7,brown-fox,8,222.6,0.06,AH,559,2289,3161,4496,579,...,4343,570.0,1110.0,3279.0,4330.0,15.350877,23.648649,13.342482,14.145497,16.621876
8,brown-fox,9,220.0,0.138,EY,599,2218,3155,4383,492,...,4516,457.0,2634.0,3048.0,4392.0,19.146608,9.965831,14.353675,13.945811,14.352981
9,brown-fox,10,227.5,0.14,IY,433,2517,3046,4408,428,...,4142,421.0,2756.0,3033.0,4444.0,20.783848,9.524673,14.424662,13.782628,14.628953
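
As a worked check, the sketch below recomputes the first row of the table (the 'AH' vowel, `brown-fox` sound 1) from its four mean formants. It repeats the `vt_length` definition from cell [9] so it runs on its own; the variable names are only for illustration.

```python
def vt_length(n, f, c=35000):
    return ((2 * n - 1) * c) / (4 * f)


# Mean formants (Hz) of row 0, copied from the table above
means = {1: 414.0, 2: 1847.0, 3: 2727.0, 4: 4042.0}

# One vocal tract length estimate (cm) per formant
lengths = {n: vt_length(n, f) for n, f in means.items()}
for n, length in lengths.items():
    print(f"F{n}: {length:.2f} cm")

# Mean of the four estimates, and how far apart they are
length_mean = sum(lengths.values()) / len(lengths)
spread = max(lengths.values()) - min(lengths.values())
print(f"mean: {length_mean:.2f} cm, spread: {spread:.2f} cm")
```

For this vowel the F1 and F2 estimates differ by several centimetres, which illustrates why neither is trusted on its own.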


Now we delete the unneeded vocal tract lengths for F1, F2, and F3.

In [16]:
del data['length_f1']
del data['length_f2']
del data['length_f3']
data

Unnamed: 0,file,sound,f0,duration,label,f11,f21,f31,f41,f12,...,f15,f25,f35,f45,f1_mean,f2_mean,f3_mean,f4_mean,length_f4,length_mean
0,brown-fox,1,239.4,0.1,AH,505,1758,2876,4119,486,...,363,1812,2923,4147,414.0,1847.0,2727.0,4042.0,15.153389,16.636041
1,brown-fox,2,312.0,0.07,IH,605,1876,2928,4234,612,...,604,2579,2922,4489,627.0,2216.0,2907.0,4316.0,14.191381,13.760568
2,brown-fox,3,244.9,0.14,AW,747,1621,2329,3733,822,...,534,1347,2832,3498,810.0,1651.0,2653.0,3614.0,16.94798,15.035167
3,brown-fox,4,217.6,0.168,AA,791,1163,2832,4228,822,...,665,1251,2738,4138,831.0,1180.0,2913.0,4224.0,14.500473,15.57365
4,brown-fox,5,223.6,0.06,AH,561,2113,2724,3874,650,...,543,1247,2623,3525,661.0,1576.0,2648.0,3645.0,16.803841,15.804839
5,brown-fox,6,231.0,0.09,OW,546,1548,2863,4085,538,...,492,1153,2507,3993,527.0,1316.0,2650.0,3920.0,15.625,17.171165
6,brown-fox,7,233.4,0.048,ER,456,1561,2484,4133,499,...,443,2028,3082,4613,480.0,1871.0,2917.0,4378.0,13.990407,15.311947
7,brown-fox,8,222.6,0.06,AH,559,2289,3161,4496,579,...,479,986,3437,4343,570.0,1110.0,3279.0,4330.0,14.145497,16.621876
8,brown-fox,9,220.0,0.138,EY,599,2218,3155,4383,492,...,433,2342,2950,4516,457.0,2634.0,3048.0,4392.0,13.945811,14.352981
9,brown-fox,10,227.5,0.14,IY,433,2517,3046,4408,428,...,353,1660,2870,4142,421.0,2756.0,3033.0,4444.0,13.782628,14.628953


Finally, to make the data easier to read, we round the length values to two decimal places.

In [17]:
data['length_f4'] = round(data['length_f4'], 2)
data['length_mean'] = round(data['length_mean'], 2)
data

Unnamed: 0,file,sound,f0,duration,label,f11,f21,f31,f41,f12,...,f15,f25,f35,f45,f1_mean,f2_mean,f3_mean,f4_mean,length_f4,length_mean
0,brown-fox,1,239.4,0.1,AH,505,1758,2876,4119,486,...,363,1812,2923,4147,414.0,1847.0,2727.0,4042.0,15.15,16.64
1,brown-fox,2,312.0,0.07,IH,605,1876,2928,4234,612,...,604,2579,2922,4489,627.0,2216.0,2907.0,4316.0,14.19,13.76
2,brown-fox,3,244.9,0.14,AW,747,1621,2329,3733,822,...,534,1347,2832,3498,810.0,1651.0,2653.0,3614.0,16.95,15.04
3,brown-fox,4,217.6,0.168,AA,791,1163,2832,4228,822,...,665,1251,2738,4138,831.0,1180.0,2913.0,4224.0,14.5,15.57
4,brown-fox,5,223.6,0.06,AH,561,2113,2724,3874,650,...,543,1247,2623,3525,661.0,1576.0,2648.0,3645.0,16.8,15.8
5,brown-fox,6,231.0,0.09,OW,546,1548,2863,4085,538,...,492,1153,2507,3993,527.0,1316.0,2650.0,3920.0,15.62,17.17
6,brown-fox,7,233.4,0.048,ER,456,1561,2484,4133,499,...,443,2028,3082,4613,480.0,1871.0,2917.0,4378.0,13.99,15.31
7,brown-fox,8,222.6,0.06,AH,559,2289,3161,4496,579,...,479,986,3437,4343,570.0,1110.0,3279.0,4330.0,14.15,16.62
8,brown-fox,9,220.0,0.138,EY,599,2218,3155,4383,492,...,433,2342,2950,4516,457.0,2634.0,3048.0,4392.0,13.95,14.35
9,brown-fox,10,227.5,0.14,IY,433,2517,3046,4408,428,...,353,1660,2870,4142,421.0,2756.0,3033.0,4444.0,13.78,14.63


## Exporting the data

To export the DataFrame here to a .csv file on your computer, use the command below. If you want the organized data to show up wherever you've put this script, delete `'insert_path_here/'`. If you want the organized data to show up somewhere else, replace `'insert_path_here/'` with the path to wherever you want the data.

In [18]:
# Change the path and/or file name as necessary
data.to_csv(path_or_buf='insert_path_here/organized_data.csv')
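
By default, `to_csv` also writes the DataFrame's row index as an unnamed first column, which pandas labels `Unnamed: 0` when the file is read back in. If you would rather not have that extra column, passing `index=False` is one option. The sketch below uses a tiny stand-in table and in-memory buffers rather than real files.

```python
import io

import pandas as pd

# Tiny stand-in for the organized data
table = pd.DataFrame({"file": ["brown-fox"], "length_f4": [15.15]})

# Default: the row index is written as an unnamed first column
with_index = io.StringIO()
table.to_csv(with_index)

# index=False leaves the row index out of the file
without_index = io.StringIO()
table.to_csv(without_index, index=False)

print(with_index.getvalue())
print(without_index.getvalue())
```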