# Adding rows and columns

In this chapter we'll learn how to add rows and columns to an existing dataframe. First we'll read in some dataframes.

In [1]:
import pandas as pd
from audiolabel import read_label

In [2]:
flist = ['resource/two_plus_two_1.tg', 'resource/three_plus_five_1.tg']
[phdf0, wddf0] = read_label(flist[0], 'praat', addcols=['barename'], ignore_index=False)
[phdf1, wddf1] = read_label(flist[1], 'praat', addcols=['barename'], ignore_index=False)
phdf0.head()

Unnamed: 0,t1,t2,label,barename,fname
0,0.0125,0.3417,T,two_plus_two_1,resource/two_plus_two_1.tg
1,0.3417,0.4914,UW1,two_plus_two_1,resource/two_plus_two_1.tg
2,0.4914,0.5912,P,two_plus_two_1,resource/two_plus_two_1.tg
3,0.5912,0.6211,L,two_plus_two_1,resource/two_plus_two_1.tg
4,0.6211,0.6909,AH1,two_plus_two_1,resource/two_plus_two_1.tg


In [3]:
phdf1.head()

Unnamed: 0,t1,t2,label,barename,fname
0,0.0125,0.1222,TH,three_plus_five_1,resource/three_plus_five_1.tg
1,0.1222,0.222,R,three_plus_five_1,resource/three_plus_five_1.tg
2,0.222,0.4116,IY1,three_plus_five_1,resource/three_plus_five_1.tg
3,0.4116,0.5113,P,three_plus_five_1,resource/three_plus_five_1.tg
4,0.5113,0.5512,L,three_plus_five_1,resource/three_plus_five_1.tg


# Adding rows

You can combine the rows of two or more dataframes that have matching columns. Think of this as combining observations of the same kind (i.e. that are described with the same variables).

There are (at least) two ways to add rows. The first way appends rows from one dataframe to an existing dataframe, and the second way concatenates a list of dataframes.

## adding rows with `append()`

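If you want to add the rows from one dataframe to another, use `append()`. Note that `DataFrame.append()` was deprecated in pandas 1.4 and removed in pandas 2.0, so the next cell runs only on older versions.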

In [4]:
adf = wddf0.append(wddf1)
adf

Unnamed: 0,t1,t2,label,barename,fname
0,0.0125,0.4914,TWO,two_plus_two_1,resource/two_plus_two_1.tg
1,0.4914,0.8805,PLUS,two_plus_two_1,resource/two_plus_two_1.tg
2,0.8805,1.3195,TWO,two_plus_two_1,resource/two_plus_two_1.tg
3,1.3195,1.3594,sp,two_plus_two_1,resource/two_plus_two_1.tg
4,1.3594,1.7585,EQUALS,two_plus_two_1,resource/two_plus_two_1.tg
5,1.7585,1.8283,sp,two_plus_two_1,resource/two_plus_two_1.tg
6,1.8283,2.1975,FOUR,two_plus_two_1,resource/two_plus_two_1.tg
0,0.0125,0.4116,THREE,three_plus_five_1,resource/three_plus_five_1.tg
1,0.4116,0.8107,PLUS,three_plus_five_1,resource/three_plus_five_1.tg
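2,0.8107,1.2696,FIVE,three_plus_five_1,resource/three_plus_five_1.tg
3,1.2696,1.4592,sp,three_plus_five_1,resource/three_plus_five_1.tg
4,1.4592,1.8583,EQUALS,three_plus_five_1,resource/three_plus_five_1.tg
5,1.8583,2.2274,EIGHT,three_plus_five_1,resource/three_plus_five_1.tg
6,2.2274,2.5966,sp,three_plus_five_1,resource/three_plus_five_1.tg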
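
For pandas installations that lack `DataFrame.append()`, the same result can be had from `pd.concat()` with a two-item list. This is a minimal sketch; the two small dataframes are made-up stand-ins for `wddf0` and `wddf1`, not the real label data.

```python
import pandas as pd

# Hypothetical stand-ins for wddf0 and wddf1 (values are illustrative only)
first = pd.DataFrame({'t1': [0.0, 0.5], 'label': ['TWO', 'PLUS']})
second = pd.DataFrame({'t1': [0.0], 'label': ['THREE']})

# Equivalent to first.append(second) on pandas versions that still provide append()
adf = pd.concat([first, second])
print(adf)
```

As with `append()`, neither input is modified; the combined rows come back as a new dataframe.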


## adding rows with `pd.concat()`

Use `pd.concat()` to combine a list of dataframes. Note that `pd.concat()` is not limited to combining only two dataframes at a time. In fact, it is more efficient to use `pd.concat()` on a list of dataframes than it is to iteratively use `append()` (or `pd.concat()`) to add one dataframe at a time.

In [5]:
phwddf = pd.concat([wddf0, wddf1, phdf0, phdf1])
phwddf.head()

Unnamed: 0,t1,t2,label,barename,fname
0,0.0125,0.4914,TWO,two_plus_two_1,resource/two_plus_two_1.tg
1,0.4914,0.8805,PLUS,two_plus_two_1,resource/two_plus_two_1.tg
2,0.8805,1.3195,TWO,two_plus_two_1,resource/two_plus_two_1.tg
3,1.3195,1.3594,sp,two_plus_two_1,resource/two_plus_two_1.tg
4,1.3594,1.7585,EQUALS,two_plus_two_1,resource/two_plus_two_1.tg
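
Row concatenation keeps each input's original index labels, so the combined dataframe can contain repeated labels. If those labels carry no meaning, passing `ignore_index=True` asks `pd.concat()` to number the rows afresh. A small sketch with made-up dataframes:

```python
import pandas as pd

# Illustrative dataframes, each with its own default index starting at 0
a = pd.DataFrame({'label': ['TWO', 'PLUS']})
b = pd.DataFrame({'label': ['THREE', 'PLUS', 'FIVE']})

kept = pd.concat([a, b])                      # index labels: 0, 1, 0, 1, 2
fresh = pd.concat([a, b], ignore_index=True)  # index labels: 0, 1, 2, 3, 4

print(list(kept.index))
print(list(fresh.index))
```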


Note an important syntactic difference between `pd.concat()` and `append()`. `append()` is an instance method, which means we call it on a specific dataframe object. That object is not modified in place; the method returns a new dataframe, and we keep the result by assigning it to a variable. `pd.concat()` is a function in the `pandas` namespace rather than a dataframe method, so we do not call it on a specific dataframe. If you explore the available methods for the `phdf0` dataframe via tab completion in the next cell you will find `append()` (on pandas versions before 2.0) but not `concat()`.

In [None]:
phdf0.
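
The same check can be made without tab completion. A short sketch using Python's built-in `hasattr()` on a made-up dataframe; whether `append` is present depends on the installed pandas version, so that result is printed rather than assumed.

```python
import pandas as pd

df = pd.DataFrame({'label': ['T', 'UW1']})  # illustrative data only

print(hasattr(df, 'append'))  # depends on the installed pandas version
print(hasattr(df, 'concat'))  # False: dataframes have no concat() method
print(hasattr(pd, 'concat'))  # True: concat() is found in the pandas namespace
```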

# Adding columns

There are (at least) two ways to add columns to an existing dataframe, depending on what kind of values you want to add.

Think of adding columns to a dataframe as modifying the kind of observations by adding new variables.

## adding columns with `assign()`

You can use `assign()` to add columns to a dataframe from a single value or series. The `assign()` method takes its `key=value` arguments and uses the keys for the new column names and the values for the column values.

If the value provided is a scalar, that value is repeated for each row of the dataframe. This process is known as broadcasting.

In [6]:
phdf0 = phdf0.assign(subject=1, context='spooky')
phdf0.tail()

Unnamed: 0,t1,t2,label,barename,fname,context,subject
14,1.6488,1.7585,Z,two_plus_two_1,resource/two_plus_two_1.tg,spooky,1
15,1.7585,1.8283,sp,two_plus_two_1,resource/two_plus_two_1.tg,spooky,1
16,1.8283,1.8683,F,two_plus_two_1,resource/two_plus_two_1.tg,spooky,1
17,1.8683,2.0678,AO1,two_plus_two_1,resource/two_plus_two_1.tg,spooky,1
18,2.0678,2.1975,R,two_plus_two_1,resource/two_plus_two_1.tg,spooky,1


Like `append()`, `assign()` is an instance method, and we call it on a specific dataframe object. If you explore the available methods for the `phdf0` dataframe via tab completion in the next cell you will find `assign()` but not `concat()`, which is a function in the `pandas` namespace rather than a dataframe method.

In [None]:
phdf0.

We can also use `assign()` to overwrite existing columns rather than adding new columns.

In [7]:
phdf0 = phdf0.assign(subject=2, context='creepy')
phdf0.tail()

Unnamed: 0,t1,t2,label,barename,fname,context,subject
14,1.6488,1.7585,Z,two_plus_two_1,resource/two_plus_two_1.tg,creepy,2
15,1.7585,1.8283,sp,two_plus_two_1,resource/two_plus_two_1.tg,creepy,2
16,1.8283,1.8683,F,two_plus_two_1,resource/two_plus_two_1.tg,creepy,2
17,1.8683,2.0678,AO1,two_plus_two_1,resource/two_plus_two_1.tg,creepy,2
18,2.0678,2.1975,R,two_plus_two_1,resource/two_plus_two_1.tg,creepy,2


The value assigned can also be a list-like object (such as a list or a `range`) or a `pd.Series` that is the same length as the existing dataframe.

In [8]:
nrows = len(wddf0)
newcol = range(nrows)
list(newcol)

[0, 1, 2, 3, 4, 5, 6]

In [9]:
wddf0 = wddf0.assign(cnt=newcol)
wddf0

Unnamed: 0,t1,t2,label,barename,fname,cnt
0,0.0125,0.4914,TWO,two_plus_two_1,resource/two_plus_two_1.tg,0
1,0.4914,0.8805,PLUS,two_plus_two_1,resource/two_plus_two_1.tg,1
2,0.8805,1.3195,TWO,two_plus_two_1,resource/two_plus_two_1.tg,2
3,1.3195,1.3594,sp,two_plus_two_1,resource/two_plus_two_1.tg,3
4,1.3594,1.7585,EQUALS,two_plus_two_1,resource/two_plus_two_1.tg,4
5,1.7585,1.8283,sp,two_plus_two_1,resource/two_plus_two_1.tg,5
6,1.8283,2.1975,FOUR,two_plus_two_1,resource/two_plus_two_1.tg,6
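
The assigned value can also be computed from existing columns, since arithmetic on two columns yields a `pd.Series` with one value per row. A minimal sketch, using a made-up dataframe and a hypothetical `dur` column that holds each label's duration:

```python
import pandas as pd

# Illustrative data; the column names mirror t1/t2/label from the label files
df = pd.DataFrame({
    't1': [0.0, 0.5, 1.0],
    't2': [0.5, 1.0, 2.0],
    'label': ['TWO', 'PLUS', 'TWO'],
})

# t2 - t1 produces one value per row, so it fits as a new column
df = df.assign(dur=df.t2 - df.t1)
print(df)
```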


## adding columns with `pd.concat(axis=1)`

If the columns you want to add are already in another dataframe, rather than being scalar values or a simple series of values, then use `pd.concat()`.

We already saw `pd.concat()` used to combine dataframe rows. By default `pd.concat()` operates on `axis=0`, which means row concatenation. If we specify `axis=1` then we get concatenation of columns.

Here is a new dataframe that we will `pd.concat()` with `wddf1`.

In [10]:
newdf = pd.DataFrame({'cnt': range(len(wddf1)), 'context': 'weird'})
newdf

Unnamed: 0,cnt,context
0,0,weird
1,1,weird
2,2,weird
3,3,weird
4,4,weird
5,5,weird
6,6,weird


In [11]:
pd.concat([wddf1, newdf], axis=1)

Unnamed: 0,t1,t2,label,barename,fname,cnt,context
0,0.0125,0.4116,THREE,three_plus_five_1,resource/three_plus_five_1.tg,0,weird
1,0.4116,0.8107,PLUS,three_plus_five_1,resource/three_plus_five_1.tg,1,weird
2,0.8107,1.2696,FIVE,three_plus_five_1,resource/three_plus_five_1.tg,2,weird
3,1.2696,1.4592,sp,three_plus_five_1,resource/three_plus_five_1.tg,3,weird
4,1.4592,1.8583,EQUALS,three_plus_five_1,resource/three_plus_five_1.tg,4,weird
5,1.8583,2.2274,EIGHT,three_plus_five_1,resource/three_plus_five_1.tg,5,weird
6,2.2274,2.5966,sp,three_plus_five_1,resource/three_plus_five_1.tg,6,weird


The previous cell shows the result of the concatenation but does not save it to a variable. This is intentional so that we can demonstrate the effect of the indexes on `pd.concat()` in the next section.
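
To see the difference between the two axes, it can help to compare the shapes of the results. A sketch with two made-up three-row dataframes:

```python
import pandas as pd

left = pd.DataFrame({'label': ['TWO', 'PLUS', 'TWO']})
right = pd.DataFrame({'cnt': [0, 1, 2]})

rows = pd.concat([left, right])          # axis=0: stacks rows; columns missing from an input are filled with NaN
cols = pd.concat([left, right], axis=1)  # axis=1: places the columns side by side

print(rows.shape)  # (6, 2)
print(cols.shape)  # (3, 2)
```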

# Ignoring the index

We haven't used indexes much yet, but you need to pay attention to them when you use `assign()` or `pd.concat()` with a `pd.DataFrame` or `pd.Series` object, since these objects carry an index. By default pandas aligns rows on the index in these combining operations, and if your index is not meaningful, you might not get the result you intend.

In this section we'll learn how to avoid using the index when it is not meaningful for your operation.

Let's start by making a copy of `newdf` and altering its index so that the first label is `2` rather than `0`.

In [12]:
twodf = newdf.copy()
twodf.index = twodf.index + 2
twodf

Unnamed: 0,cnt,context
2,0,weird
3,1,weird
4,2,weird
5,3,weird
6,4,weird
7,5,weird
8,6,weird


Observe what happens when we `assign()` the columns of `twodf` to `wddf1`. In this and following examples we do not save the return value in order to leave the original dataframe unaltered for future examples.

In [13]:
wddf1.assign(cnt=twodf.cnt, context=twodf.context)

Unnamed: 0,t1,t2,label,barename,fname,cnt,context
0,0.0125,0.4116,THREE,three_plus_five_1,resource/three_plus_five_1.tg,,
1,0.4116,0.8107,PLUS,three_plus_five_1,resource/three_plus_five_1.tg,,
2,0.8107,1.2696,FIVE,three_plus_five_1,resource/three_plus_five_1.tg,0.0,weird
3,1.2696,1.4592,sp,three_plus_five_1,resource/three_plus_five_1.tg,1.0,weird
4,1.4592,1.8583,EQUALS,three_plus_five_1,resource/three_plus_five_1.tg,2.0,weird
5,1.8583,2.2274,EIGHT,three_plus_five_1,resource/three_plus_five_1.tg,3.0,weird
6,2.2274,2.5966,sp,three_plus_five_1,resource/three_plus_five_1.tg,4.0,weird


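The rows are matched by index label, so the `cnt` and `context` values from the first row of `twodf` (label `2`) land in the third row of `wddf1`. The labels of the first two rows of `wddf1` have no match in the `twodf` index, so the new columns of these rows receive the null value `NaN`, and `cnt` becomes a float column to accommodate it. The last two rows of `twodf` (labels `7` and `8`) have no match in `wddf1` and are dropped.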

If the column you `assign()` does not have an index, then the column values are assigned to the dataframe rows in order. **To avoid using a `pd.Series` index, simply use its `values` attribute when you `assign()`.** The `values` attribute returns a numpy ndarray, which does not have an index.

In [14]:
wddf1.assign(cnt=twodf.cnt.values, context=twodf.context.values)

Unnamed: 0,t1,t2,label,barename,fname,cnt,context
0,0.0125,0.4116,THREE,three_plus_five_1,resource/three_plus_five_1.tg,0,weird
1,0.4116,0.8107,PLUS,three_plus_five_1,resource/three_plus_five_1.tg,1,weird
2,0.8107,1.2696,FIVE,three_plus_five_1,resource/three_plus_five_1.tg,2,weird
3,1.2696,1.4592,sp,three_plus_five_1,resource/three_plus_five_1.tg,3,weird
4,1.4592,1.8583,EQUALS,three_plus_five_1,resource/three_plus_five_1.tg,4,weird
5,1.8583,2.2274,EIGHT,three_plus_five_1,resource/three_plus_five_1.tg,5,weird
6,2.2274,2.5966,sp,three_plus_five_1,resource/three_plus_five_1.tg,6,weird
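
Recent pandas documentation recommends `Series.to_numpy()` over the `values` attribute for getting the underlying array; both discard the index. A minimal sketch with made-up data that contrasts index-aligned and positional assignment:

```python
import pandas as pd

df = pd.DataFrame({'label': ['THREE', 'PLUS', 'FIVE']})  # index labels 0, 1, 2
shifted = pd.Series([0, 1, 2], index=[2, 3, 4])          # index labels offset by 2

aligned = df.assign(cnt=shifted)                # matched by index label
positional = df.assign(cnt=shifted.to_numpy())  # matched by row order

print(aligned.cnt.tolist())     # [nan, nan, 0.0]
print(positional.cnt.tolist())  # [0, 1, 2]
```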


When you use `pd.concat()` you have to do things a little differently. You can't just use the `values` attribute because `pd.concat()` accepts only pandas objects (`pd.Series` or `pd.DataFrame`), not numpy arrays. Instead, you reset the index of each input dataframe so that it starts at `0`. As long as the dataframes have the same number of rows, rows in the same position then share an index label, and these rows will be combined.

First, look at `twodf` with its index starting at `2`.

In [15]:
twodf

Unnamed: 0,cnt,context
2,0,weird
3,1,weird
4,2,weird
5,3,weird
6,4,weird
7,5,weird
8,6,weird


Calling `reset_index()` returns a copy whose index starts at `0`. The `drop=True` parameter prevents `reset_index()` from creating an extra column named `index` that preserves the old index.

In [16]:
twodf.reset_index(drop=True)

Unnamed: 0,cnt,context
0,0,weird
1,1,weird
2,2,weird
3,3,weird
4,4,weird
5,5,weird
6,6,weird


With `reset_index()`, the rows from both dataframes combine from top to bottom. The `wddf1` index already starts at `0`, but we'll call `reset_index()` on it anyway just to be sure.

In [17]:
pd.concat([
    wddf1.reset_index(drop=True),
    twodf.reset_index(drop=True)
], axis=1)

Unnamed: 0,t1,t2,label,barename,fname,cnt,context
0,0.0125,0.4116,THREE,three_plus_five_1,resource/three_plus_five_1.tg,0,weird
1,0.4116,0.8107,PLUS,three_plus_five_1,resource/three_plus_five_1.tg,1,weird
2,0.8107,1.2696,FIVE,three_plus_five_1,resource/three_plus_five_1.tg,2,weird
3,1.2696,1.4592,sp,three_plus_five_1,resource/three_plus_five_1.tg,3,weird
4,1.4592,1.8583,EQUALS,three_plus_five_1,resource/three_plus_five_1.tg,4,weird
5,1.8583,2.2274,EIGHT,three_plus_five_1,resource/three_plus_five_1.tg,5,weird
6,2.2274,2.5966,sp,three_plus_five_1,resource/three_plus_five_1.tg,6,weird
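
For comparison, here is a sketch of what `pd.concat(axis=1)` does when the indexes are left misaligned. With its default outer join, the result covers the union of the index labels and fills the gaps with `NaN`. The dataframes are small made-up stand-ins for `wddf1` and `twodf`.

```python
import pandas as pd

words = pd.DataFrame({'label': ['THREE', 'PLUS', 'FIVE']})  # index labels 0, 1, 2
extra = pd.DataFrame({'cnt': [0, 1, 2]}, index=[2, 3, 4])   # index labels 2, 3, 4

# Without reset_index(): rows are matched by label, unmatched cells become NaN
misaligned = pd.concat([words, extra], axis=1)

# With reset_index(drop=True): rows are matched top to bottom
aligned = pd.concat([
    words.reset_index(drop=True),
    extra.reset_index(drop=True)
], axis=1)

print(misaligned.shape)  # (5, 2): one row per label in the union 0..4
print(aligned.shape)     # (3, 2)
```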
