# How to capture the output of command line tools to Pandas DataFrames

Here's a general problem: How do you run a command line tool and use its output in a script?

One useful answer is to use Python's `subprocess` module to execute the command and then read its output into a Pandas DataFrame. We'll illustrate this approach using the `ifcformant` tool found in the Berkeley Phonetics Machine. `ifcformant` represents the simple case in which a command line tool creates text output directly to a text file or STDOUT.

For tools that create binary output see the command_line_tools_to_df_2 notebook.

TODO: link to part 2 notebook

## Overview

In order to successfully run a command line tool and capture its output it is necessary to understand a few concepts, each of which is covered in this notebook:

1. Basics of running a command line tool
1. Basics of running ***your*** command line tool
1. How to run a command line tool from Python
1. Capturing output data into a DataFrame
1. Combining multiple DataFrames

Feel free to skip a section if you think you already understand it. If something seems unclear in a later section, though, you might need to review a section you skipped.

## Basics of running a command line tool

The normal way to explore commands on the command line is to open a terminal and start typing commands. Because this is a notebook, not a terminal window, we'll use the `%%bash` cell magic and `!` line magic for running commands. Any notebook cell that begins with `%%bash` runs in a `bash` shell, and the commands that follow `%%bash` execute in the same way as they do at the command line. The `!` line magic is much the same but executes only a single line in a `bash` shell.

### Example commands

This example runs the `ls` command to list the contents of the current directory:

In [1]:
!ls

command_line_tools_to_df_1.ipynb


Options and arguments can be provided in the same way as they are at the command line. Here the combination of `-a` and `-l` options cause both regular and dotfiles files to be printed in detailed format.

In [2]:
!ls -al

total 224
drwxrwxr-x 3 ubuntu ubuntu   4096 Sep 25 18:40 .
drwxrwxr-x 5 ubuntu ubuntu   4096 Sep 21 13:46 ..
-rw-rw-r-- 1 ubuntu ubuntu 213667 Sep 25 18:40 command_line_tools_to_df_1.ipynb
drwxr-xr-x 2 ubuntu ubuntu   4096 Sep 25 10:23 .ipynb_checkpoints


This example runs `ls` twice, first without options, then with `-al`.

In [3]:
%%bash
ls
ls -al

command_line_tools_to_df_1.ipynb
total 224
drwxrwxr-x 3 ubuntu ubuntu   4096 Sep 25 18:40 .
drwxrwxr-x 5 ubuntu ubuntu   4096 Sep 21 13:46 ..
-rw-rw-r-- 1 ubuntu ubuntu 213667 Sep 25 18:40 command_line_tools_to_df_1.ipynb
drwxr-xr-x 2 ubuntu ubuntu   4096 Sep 25 10:23 .ipynb_checkpoints


### Piping the output of one command to the input of another

It is possible to combine commands into a series, where the output of one command is used as the input of another command. These series can be used to create more complicated behaviors from a set of simple tools. For example, the `echo` command simply outputs its arguments as a string. Notice in this example that the `\n` represents the newline character, with the '2' and '100' appearing on separate lines:

In [4]:
!echo '2\n100'

2
100


Now we'll make things slightly more complicated by piping (sending) `echo`'s output to the `sort` command, which sorts input lines and prints out the result. The pipe symbol `|` is used to link the output of the first command (aka STDOUT) to the input of the second (aka STDIN), and the two lines produced by `echo` become the input to `sort`:

In [5]:
!echo '2\n100' | sort

100
2


The lines are now printed in lexicographic (not numeric) order. To get numeric order we can use `-n`:

In [6]:
!echo '101\n2\n100' | sort -n

2
100
101


The concept of piping the output of one command into another command is a useful one that we will exploit in some of our examples.

## Basics of running ***your*** command line tool (`ifcformant`)

In order to use a command line tool effectively you should understand what it does. What kind of inputs does it require? What kind of outputs does it create? What options does it provide to tune its behavior? If you can't answer these questions you can't use the tool well.

There are a number of ways you can learn about most commands:

* Read the `man` page with `man <command>`.
* Read documentation found wherever the tool is hosted (e.g. github)
* Use `<command> --help` to see available options
* Read other documentation that might be available

Our illustration uses the `ifcformant` command, which performs formant analysis on an input audio file. This tool does not have a `man` page or public repository. It is [briefly described at the UC Berkeley PhonLab wiki](http://linguistics.berkeley.edu/plab/guestwiki/index.php?title=IFC_formant_tracker).

`ifcformant` also has a `--help` option:

In [7]:
%%bash
ifcformant --help


ifcformant - IFC formant tracker

ifcformant computes formants, rms, and f0 measures of an input audio file at
10ms intervals and prints a table of results.

The analysis input requires audio with a 12kHz sample rate, and ifcformant
invokes the sox utility to convert the input audio file to the correct rate.
Additional effects can be performed by sox during conversion with the
--effects parameter.

Usage:

    ifcformant --speaker=male|female|child [--output=outfile]
      [--fields=fieldlist] [--fpfmt=floating_point_format]
      [--timefmt=time_format] [--sep=separator] [--sox=soxcmd]
      [--effects=soxeffects] [--print-header] [--no-midpt]
      audiofile

    ifcformant -s male|female|child [-o outfile]
      [-f fieldlist] [-p floating_point_format] [-t time_format] [-S separator]
      [-x soxcmd] [-e soxeffects] [--print-header] [--no-midpt] audiofile

    ifcformant --version|-v

    ifcformant --help|-h

Required arguments:

  --speaker=male|female|child
  -s male|female|ch

It's always important to understand what kind of outputs your tools create. Execute `ifcformant` directly and observe what it produces. We include a value for `--speaker` and the input audio file, as required and described by the documentation.

In [8]:
%%bash
ifcformant --speaker male --print-header ../resources/two_plus_two.wav

sec	rms	f1	f2	f3	f4	f0
0.0050	96.5	264.6	1456.6	3314.5	3694.6	255.3
0.0150	60.1	259.2	1327.6	3376.1	3790.2	0.0
0.0250	49.5	271.7	1100.8	2774.3	3632.4	69.8
0.0350	49.5	249.0	1084.6	2689.8	3568.3	69.8
0.0450	75.5	234.6	1401.1	3068.3	3454.2	85.7
0.0550	429.1	327.8	1572.1	3163.7	3474.9	352.9
0.0650	817.1	359.9	1645.9	2800.7	3499.2	0.0
0.0750	1152.5	356.5	1617.0	2601.8	3450.1	0.0
0.0850	1152.5	350.3	1597.6	2460.5	3472.1	106.2
0.0950	1192.9	362.9	1595.5	2467.6	3526.0	78.4
0.1050	1286.3	352.8	1577.0	2439.2	3523.1	80.5
0.1150	1363.7	334.8	1641.5	2429.4	3512.3	79.6
0.1250	1363.7	329.7	1622.6	2476.5	3534.8	77.6
0.1350	1231.4	319.4	1638.6	2556.7	3551.9	78.2
0.1450	1099.3	343.2	1221.1	2352.4	3482.1	79.5
0.1550	1019.9	325.6	1310.1	2398.8	3558.2	79.3
0.1650	839.0	307.2	1616.7	2422.4	3564.1	80.5
0.1750	497.3	275.2	1659.6	2433.8	3522.7	78.7
0.1850	405.2	289.0	1511.5	2386.6	3410.9	77.6
0.1950	394.5	343.5	1279.6	2346.5	3431.3	0.0
0.2050	245.6	363.5	1394.9	2364.8	3507.0	149.2
0.2150	134.5	335.9	1588.0	24

As you can see, `ifcformant` creates a table of measurements at 10ms intervals. In addition to the required arguments we used the `--print-header` option to include the names of the columns as the first row of output.

The columns are tab-separated by default. See `ifcformant --help` if you want a different separator.

## How to run a command line tool from Python

Now that we understand how to run `ifcformant` at the command line we can explore how to run it from the context of a Python script (or notebook). The `subprocess` module provides several methods for running executables external to a script.

In [9]:
import subprocess

We'll use three of `subprocess`'s methods for running external commands, `check_output()`, `check_call()`, and `Popen()`. All methods require a list of command line arguments, beginning with the name of the command you want to run.

In the next cell we create an argument list that matches the arguments we used when we ran `ifcformant` as a `bash` command. The speaker type and filename are provided by variables.

In [10]:
speaker = 'male'
fname = '../resources/two_plus_two.wav'

ifcargs = [
    'ifcformant',
    '--speaker',
    speaker,
    '--print-header',
    fname
]
ifcargs

['ifcformant',
 '--speaker',
 'male',
 '--print-header',
 '../resources/two_plus_two.wav']

### Run `ifcformant` with `check_output()`

The first way we'll run `ifcformant` is via `subprocess`'s `check_output()` method, which executes the command defined by `ifcargs` and returns its output as a series of bytes. Output is stored in `ifcout`. Note that `ifcout` contains the `bytes` type, as indicated by the 'b' prefix.

In [11]:
ifcout = subprocess.check_output(ifcargs)
print(ifcout)

b'sec\trms\tf1\tf2\tf3\tf4\tf0\n0.0050\t96.5\t264.6\t1456.6\t3314.5\t3694.6\t255.3\n0.0150\t60.1\t259.2\t1327.6\t3376.1\t3790.2\t0.0\n0.0250\t49.5\t271.7\t1100.8\t2774.3\t3632.4\t69.8\n0.0350\t49.5\t249.0\t1084.6\t2689.8\t3568.3\t69.8\n0.0450\t75.5\t234.6\t1401.1\t3068.3\t3454.2\t85.7\n0.0550\t429.1\t327.8\t1572.1\t3163.7\t3474.9\t352.9\n0.0650\t817.1\t359.9\t1645.9\t2800.7\t3499.2\t0.0\n0.0750\t1152.5\t356.5\t1617.0\t2601.8\t3450.1\t0.0\n0.0850\t1152.5\t350.3\t1597.6\t2460.5\t3472.1\t106.2\n0.0950\t1192.9\t362.9\t1595.5\t2467.6\t3526.0\t78.4\n0.1050\t1286.3\t352.8\t1577.0\t2439.2\t3523.1\t80.5\n0.1150\t1363.7\t334.8\t1641.5\t2429.4\t3512.3\t79.6\n0.1250\t1363.7\t329.7\t1622.6\t2476.5\t3534.8\t77.6\n0.1350\t1231.4\t319.4\t1638.6\t2556.7\t3551.9\t78.2\n0.1450\t1099.3\t343.2\t1221.1\t2352.4\t3482.1\t79.5\n0.1550\t1019.9\t325.6\t1310.1\t2398.8\t3558.2\t79.3\n0.1650\t839.0\t307.2\t1616.7\t2422.4\t3564.1\t80.5\n0.1750\t497.3\t275.2\t1659.6\t2433.8\t3522.7\t78.7\n0.1850\t405.2\t289.0\t1511.5

To make `ifcout` more useful for our purposes we can use the `decode()` method to interpret the series of `bytes` with an encoding and thereby convert it to type `str`. After conversion the `print()` results in the familiar tabular output.

In [12]:
ifcout = ifcout.decode('utf-8')
print(ifcout)

sec	rms	f1	f2	f3	f4	f0
0.0050	96.5	264.6	1456.6	3314.5	3694.6	255.3
0.0150	60.1	259.2	1327.6	3376.1	3790.2	0.0
0.0250	49.5	271.7	1100.8	2774.3	3632.4	69.8
0.0350	49.5	249.0	1084.6	2689.8	3568.3	69.8
0.0450	75.5	234.6	1401.1	3068.3	3454.2	85.7
0.0550	429.1	327.8	1572.1	3163.7	3474.9	352.9
0.0650	817.1	359.9	1645.9	2800.7	3499.2	0.0
0.0750	1152.5	356.5	1617.0	2601.8	3450.1	0.0
0.0850	1152.5	350.3	1597.6	2460.5	3472.1	106.2
0.0950	1192.9	362.9	1595.5	2467.6	3526.0	78.4
0.1050	1286.3	352.8	1577.0	2439.2	3523.1	80.5
0.1150	1363.7	334.8	1641.5	2429.4	3512.3	79.6
0.1250	1363.7	329.7	1622.6	2476.5	3534.8	77.6
0.1350	1231.4	319.4	1638.6	2556.7	3551.9	78.2
0.1450	1099.3	343.2	1221.1	2352.4	3482.1	79.5
0.1550	1019.9	325.6	1310.1	2398.8	3558.2	79.3
0.1650	839.0	307.2	1616.7	2422.4	3564.1	80.5
0.1750	497.3	275.2	1659.6	2433.8	3522.7	78.7
0.1850	405.2	289.0	1511.5	2386.6	3410.9	77.6
0.1950	394.5	343.5	1279.6	2346.5	3431.3	0.0
0.2050	245.6	363.5	1394.9	2364.8	3507.0	149.2
0.2150	134.5	335.9	1588.0	24

`check_output()` also raises an error if execution of the external command results in a failure. The try/except blocks print a useful error message if that occurs. Notice what happens when the argument to `ifcformant` names a file that doesn't exist.

In [13]:
badargs = ['ifcformant', '--speaker', 'male', 'missing_file']
try:
    ifcout = subprocess.check_output(badargs)
except subprocess.CalledProcessError:
    print('ifcformant failed with args ', badargs)

ifcformant failed with args  ['ifcformant', '--speaker', 'male', 'missing_file']


## Run `ifcformant` with `check_call()`

The `check_call()` method is similar to `check_output()` in that it waits for the external command to complete before continuing and raises an error if the command does not exit cleanly. Unlike `check_output()` it does not return the output.

If you need access to a command's output it does not make sense to use `check_call()` unless the command provides another way to get its output. In the case of `ifcformant` you can use the `--output` option to specify an output text file that you can read later. 

In [14]:
ccargs = [
    'ifcformant',
    '--speaker',
    speaker,
    '--print-header',
    '--output',
    fname.replace('.wav', '.ifc'),
    fname
]
ccargs

['ifcformant',
 '--speaker',
 'male',
 '--print-header',
 '--output',
 '../resources/two_plus_two.ifc',
 '../resources/two_plus_two.wav']

The argument list now contains the `--output` option, and `ifcformant` output will be written to a file with the same name as the input audio file, with the `.wav` extension replaced by `.ifc`.

In [15]:
try:
    subprocess.check_call(ccargs)
except subprocess.CalledProcessError:
    print('ifcformant failed with args ', ccargs)

Running the command defined by `ccargs` should create a file in the `resources` directory that we can look at with the `head` external command to show the first few lines:

In [16]:
!head ../resources/two_plus_two.ifc

sec	rms	f1	f2	f3	f4	f0
0.0050	96.5	264.6	1456.6	3314.5	3694.6	255.3
0.0150	60.1	259.2	1327.6	3376.1	3790.2	0.0
0.0250	49.5	271.7	1100.8	2774.3	3632.4	69.8
0.0350	49.5	249.0	1084.6	2689.8	3568.3	69.8
0.0450	75.5	234.6	1401.1	3068.3	3454.2	85.7
0.0550	429.1	327.8	1572.1	3163.7	3474.9	352.9
0.0650	817.1	359.9	1645.9	2800.7	3499.2	0.0
0.0750	1152.5	356.5	1617.0	2601.8	3450.1	0.0
0.0850	1152.5	350.3	1597.6	2460.5	3472.1	106.2


## Run `ifcformant` with `Popen()`

The final way to run `ifcformant` that we'll look at uses the `Popen()` method. Unlike `check_output()` and `check_call()`, the `Popen()` method does not wait for the external command to finish executing. Since it does not wait for command completion it also cannot return command output directly. Instead it returns a handle to the running process.

In [17]:
ifcproc = subprocess.Popen(ifcargs)

To cause your script to wait for the command to complete, use `communicate()`. Since `communicate()` will not raise an error if the process does not exit cleanly, you should check the process's returncode to verify your command worked correctly. A non-zero returncode indicates an error condition.

In [18]:
ifcproc.communicate()
if ifcproc.returncode != 0:
    print('ifcformant failed with args ', ifcargs)

In this example the output data is lost, as it is not recorded in an ouput file, and it is not read into a variable. The reason you might use `Popen()` to run your command will become more apparent in the next section, where it can be used with a pipe to send data directly to a DataFrame.

## Capturing output data into a DataFrame

We've now seen several ways to execute `ifcformant`, each of which has different behavior with respect to where output data is written. Now we turn to the question of how to capture the output data into a DataFrame.

For commands that produce tabular data the answer is to use the `read_csv()` method. Often it reads from a real text file, but it can also read from any file-like object, which includes pipes and `StringIO` objects created from strings.

We'll look at how to combine the various `subprocess` options for running commands with the various inputs `read_csv()` can use to create a DataFrame. First we import our libraries.

In [19]:
import pandas as pd     # Provides DataFrame objects
from io import StringIO # Converts strings to file-like objects
                        # that pandas can read.   

### `check_call()` plus filename

The first approach we'll use is to combine `check_call()` with a call to `read_csv()` that reads output from a file named by a filename. The first step is to create an output file when running `ifcformant`. In this example the output file has the same name as the audio file, with the `.wav` extension replaced by `.ifc`.

In [20]:
outfile = fname.replace('.wav', '.ifc')
ccargs = [
    'ifcformant',
    '--speaker',
    speaker,
    '--print-header',
    '--output',
    outfile,
    fname
]
try:
    subprocess.check_call(ccargs)
except subprocess.CalledProcessError:
    print('ifcformant failed with args ', ccargs)

If `check_call()` does not raise an error, then we can read the file it created with `read_csv()`. The `sep` option specifies the separator character.

In [21]:
cc_df = pd.read_csv(outfile, sep='\t')
cc_df

Unnamed: 0,sec,rms,f1,f2,f3,f4,f0
0,0.005,96.5,264.6,1456.6,3314.5,3694.6,255.3
1,0.015,60.1,259.2,1327.6,3376.1,3790.2,0.0
2,0.025,49.5,271.7,1100.8,2774.3,3632.4,69.8
3,0.035,49.5,249.0,1084.6,2689.8,3568.3,69.8
4,0.045,75.5,234.6,1401.1,3068.3,3454.2,85.7
5,0.055,429.1,327.8,1572.1,3163.7,3474.9,352.9
6,0.065,817.1,359.9,1645.9,2800.7,3499.2,0.0
7,0.075,1152.5,356.5,1617.0,2601.8,3450.1,0.0
8,0.085,1152.5,350.3,1597.6,2460.5,3472.1,106.2
9,0.095,1192.9,362.9,1595.5,2467.6,3526.0,78.4


### `check_output()` plus `StringIO`

The second approach we'll look at combines `check_output()` with `StringIO` input to `read_csv()`. Since we are going to capture the output of `ifcformant` into a variable, we do not include the `--output` argument.

In [22]:
coargs = [
    'ifcformant',
    '--speaker',
    speaker,
    '--print-header',
    fname
]
try:
    ifcout = subprocess.check_output(coargs)
except subprocess.CalledProcessError:
    print('ifcformant failed with args ', coargs)

Recall that the output returned by `check_output()` is a series of bytes. The `read_csv()` method does not read bytes directly, and we must convert them to a file-like object. We do this by converting bytes to a string with `decode()`. The `StringIO` wrapper around the string makes it file-like for the needs of `read_csv()`.

In [23]:
co_df = pd.read_csv(
    StringIO(                  # make file-like
        ifcout.decode('utf-8') # convert to string; 'ascii' also works for ifcformant
    ),
    sep='\t'
)
co_df

Unnamed: 0,sec,rms,f1,f2,f3,f4,f0
0,0.005,96.5,264.6,1456.6,3314.5,3694.6,255.3
1,0.015,60.1,259.2,1327.6,3376.1,3790.2,0.0
2,0.025,49.5,271.7,1100.8,2774.3,3632.4,69.8
3,0.035,49.5,249.0,1084.6,2689.8,3568.3,69.8
4,0.045,75.5,234.6,1401.1,3068.3,3454.2,85.7
5,0.055,429.1,327.8,1572.1,3163.7,3474.9,352.9
6,0.065,817.1,359.9,1645.9,2800.7,3499.2,0.0
7,0.075,1152.5,356.5,1617.0,2601.8,3450.1,0.0
8,0.085,1152.5,350.3,1597.6,2460.5,3472.1,106.2
9,0.095,1192.9,362.9,1595.5,2467.6,3526.0,78.4


### `Popen()` plus pipe

The final way to perform our task is to use `Popen()` with a pipe that `read_csv()` can read. A pipe provides a file-like object like `StringIO` does. The argument list we use for this task is the same as for `check_output()`. The `stdout` argument to `Popen()` specifies that output should be sent to a pipe.

In [24]:
poargs = [
    'ifcformant',
    '--speaker',
    speaker,
    '--print-header',
    fname
]

try:
    poproc = subprocess.Popen(poargs, stdout=subprocess.PIPE)
except subprocess.CalledProcessError:
    print('ifcformant failed with args ', poargs)

At this point `read_csv()` can read directly from STDOUT of the `ifcformant` subprocess.

In [25]:
po_df = pd.read_csv(poproc.stdout, sep='\t')
po_df

Unnamed: 0,sec,rms,f1,f2,f3,f4,f0
0,0.005,96.5,264.6,1456.6,3314.5,3694.6,255.3
1,0.015,60.1,259.2,1327.6,3376.1,3790.2,0.0
2,0.025,49.5,271.7,1100.8,2774.3,3632.4,69.8
3,0.035,49.5,249.0,1084.6,2689.8,3568.3,69.8
4,0.045,75.5,234.6,1401.1,3068.3,3454.2,85.7
5,0.055,429.1,327.8,1572.1,3163.7,3474.9,352.9
6,0.065,817.1,359.9,1645.9,2800.7,3499.2,0.0
7,0.075,1152.5,356.5,1617.0,2601.8,3450.1,0.0
8,0.085,1152.5,350.3,1597.6,2460.5,3472.1,106.2
9,0.095,1192.9,362.9,1595.5,2467.6,3526.0,78.4


As you can see, `read_csv()` reads until there is no more data available. While `ifcformant` has completed all the computation it will do, it is still running as a subprocess, and you should make sure that it completes correctly. To do this, use `communicate()`, which waits until `ifcformant` exits, after which you should check the process returncode to ensure that it exited without an error.

In [26]:
poproc.communicate()
if poproc.returncode != 0:
    print('ifcformant failed with args ', poargs)

### Which approach should you use?

We have seen three different ways to capture command output and read with `read_csv()`. Which one should you use?

The first decision you should make is whether you want to cache output files from your external command or not. These files potentially clutter your data directories and require hard disk space, which you might find undesirable.

On the other hand, cached output files can be extremely useful, especially if your external command takes a while to execute. Cached results can save you a lot of time if you need to
run your script multiple times on the same input datafiles (like during debugging!). You
can check for cached results in your script and read those if they exist rather than
rerunning the external command.

*Caveat*: you have to take some care that the cached results do not have stale data. For instance, if you cache analysis results and then change the analysis parameters used by your external command, then your cached files might not match your current intended analysis.

**If you want to cache your results and your external command provides an option
to create an output text file, then '`check_call()` plus filename' is the simple approach.**

 **If you want to cache your results and your external command does not provide an option
to create an output text file, then '`check_output()` plus `StringIO`' will help.** After
you save the command output to a variable you can write its content to a cache file yourself.
Alternatively, write the DataFrame you create to a file with the `to_csv()` method.

**If you do not wish to cache your results, then '`check_output()` plus `StringIO`' is the
simple and short approach.**

**If you do not wish to cache your results, then '`Popen()` plus pipe' is the less memory-intensive approach.** `read_csv()` reads directly from a pipe and no intermediate output variable is required. Use this approach if your external command produces a large amount of
output and memory is in issue.

## Combining multiple DataFrames

Now that we've learned how to collect a command output in a DataFrame, let's learn how to combine DataFrames produced from multiple input files.

The first step in the process is to add some metadata to our DataFrame that associates each row with its input file. Let's take a look at one of our existing DataFrames. Notice that the columns do not include any metadata that can be used to identify the rows other than 'sec'.

In [27]:
co_df.columns

Index(['sec', 'rms', 'f1', 'f2', 'f3', 'f4', 'f0'], dtype='object')

The `assign()` method can be used to add additional columns to the DataFrame. Provide the new column names as keyword arguments to `assign()`. Here we use a single value for each column, and this value is automatically repeated as necessary to fill all the rows in the column. This convenience is known in the pandas world as 'broadcasting'.

In [28]:
co_df = co_df.assign(filename=fname, speakertype=speaker)
co_df

Unnamed: 0,sec,rms,f1,f2,f3,f4,f0,filename,speakertype
0,0.005,96.5,264.6,1456.6,3314.5,3694.6,255.3,../resources/two_plus_two.wav,male
1,0.015,60.1,259.2,1327.6,3376.1,3790.2,0.0,../resources/two_plus_two.wav,male
2,0.025,49.5,271.7,1100.8,2774.3,3632.4,69.8,../resources/two_plus_two.wav,male
3,0.035,49.5,249.0,1084.6,2689.8,3568.3,69.8,../resources/two_plus_two.wav,male
4,0.045,75.5,234.6,1401.1,3068.3,3454.2,85.7,../resources/two_plus_two.wav,male
5,0.055,429.1,327.8,1572.1,3163.7,3474.9,352.9,../resources/two_plus_two.wav,male
6,0.065,817.1,359.9,1645.9,2800.7,3499.2,0.0,../resources/two_plus_two.wav,male
7,0.075,1152.5,356.5,1617.0,2601.8,3450.1,0.0,../resources/two_plus_two.wav,male
8,0.085,1152.5,350.3,1597.6,2460.5,3472.1,106.2,../resources/two_plus_two.wav,male
9,0.095,1192.9,362.9,1595.5,2467.6,3526.0,78.4,../resources/two_plus_two.wav,male


Now we can write a loop that runs `ifcformant` on multiple input files and collects their outputs in a single DataFrame in which each row is related to its input file. The `append()`
method does the work of adding new rows, and the `ignore_index` parameter tells `append()` not to use the index when combining the DataFrames.

In [29]:
fnames = ['../resources/three_plus_five.wav', '../resources/two_plus_two.wav']
speaker = 'male'
coargs = [
    'ifcformant',
    '--speaker',
    speaker,
    '--print-header'
    # Note filename is not included
]

df = pd.DataFrame()
for f in fnames:
    try:
        args = coargs + [fname]   # Add filename to args.
        ifcout = subprocess.check_output(args)
    except subprocess.CalledProcessError:
        print('ifcformant failed with args ', args)
    co_df = pd.read_csv(StringIO(ifcout.decode('utf-8')), sep='\t')
    co_df = co_df.assign(filename=f, speaker=speaker)
    df = df.append(co_df, ignore_index=True)
df

Unnamed: 0,sec,rms,f1,f2,f3,f4,f0,filename,speaker
0,0.005,96.5,264.6,1456.6,3314.5,3694.6,255.3,../resources/three_plus_five.wav,male
1,0.015,60.1,259.2,1327.6,3376.1,3790.2,0.0,../resources/three_plus_five.wav,male
2,0.025,49.5,271.7,1100.8,2774.3,3632.4,69.8,../resources/three_plus_five.wav,male
3,0.035,49.5,249.0,1084.6,2689.8,3568.3,69.8,../resources/three_plus_five.wav,male
4,0.045,75.5,234.6,1401.1,3068.3,3454.2,85.7,../resources/three_plus_five.wav,male
5,0.055,429.1,327.8,1572.1,3163.7,3474.9,352.9,../resources/three_plus_five.wav,male
6,0.065,817.1,359.9,1645.9,2800.7,3499.2,0.0,../resources/three_plus_five.wav,male
7,0.075,1152.5,356.5,1617.0,2601.8,3450.1,0.0,../resources/three_plus_five.wav,male
8,0.085,1152.5,350.3,1597.6,2460.5,3472.1,106.2,../resources/three_plus_five.wav,male
9,0.095,1192.9,362.9,1595.5,2467.6,3526.0,78.4,../resources/three_plus_five.wav,male
