## Concatenating columns rather than rows

As noted earlier, `pd.concat()` is flexible in how it combines inputs: as well as stacking rows, it can join DataFrames side by side, column by column.

Consider this example, where we have different data about the same participants in different files. One file contains each participant's favourite colour, and the other their birthday month. What we want is to end up with a DataFrame with one row per participant, with columns for participant number, `Fav Colour`, and `Birthday Month`. However, when we read in the two input files and concatenate them, we get a column for colour and a column for month, each with lots of NaN values. This happens because the input files have different column names, but we've stacked their rows:

In [15]:
fav_colour = pd.read_csv('fav_colour.csv')
birthday_month = pd.read_csv('birthday_months.csv')

df = pd.concat([fav_colour, birthday_month])

df

,Participant num,Fav Colour,Birthday Month
0,1,blue,
1,2,red,
2,3,green,
3,4,purple,
4,5,red,
5,6,green,
6,7,orange,
7,8,yellow,
8,9,yellow,
9,10,pink,
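
The stacking behaviour can be reproduced without the CSV files. The sketch below is illustrative only: it assumes small in-memory DataFrames (three made-up participants) standing in for `fav_colour.csv` and `birthday_months.csv`, and the name `stacked` is introduced here for the example.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files (three participants only)
fav_colour = pd.DataFrame({
    "Participant num": [1, 2, 3],
    "Fav Colour": ["blue", "red", "green"],
})
birthday_month = pd.DataFrame({
    "Participant num": [1, 2, 3],
    "Birthday Month": ["may", "june", "january"],
})

# The default (axis=0) stacks rows; a column missing from an input is filled with NaN
stacked = pd.concat([fav_colour, birthday_month])

print(stacked.shape)         # (6, 3): twice as many rows as participants
print(stacked.isna().sum())  # number of NaN values in each column
```

Each participant appears twice in `stacked`: once with a colour and no month, and once with a month and no colour.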


You can see above that there's also a `Participant num` column, which indicates how we can match colours to months. What we actually want is to combine the two inputs "horizontally", such that we have 10 rows (one for each participant), with the colour and month corresponding to each participant in the same row. 

The default when concatenating DataFrames is to do so vertically, as we saw above. However, `pd.concat()` allows us to concatenate horizontally as well. To do this, you must specify either `axis=1` or `axis='columns'` (note the quotes: `'columns'` is a string). Note that in the example below, rows with identical index values are combined into a single row.

In [16]:
df = pd.concat([fav_colour, birthday_month], axis=1)
df

,Participant num,Fav Colour,Participant num,Birthday Month
0,1,blue,1,may
1,2,red,2,june
2,3,green,3,january
3,4,purple,4,february
4,5,red,5,september
5,6,green,6,july
6,7,orange,7,may
7,8,yellow,8,may
8,9,yellow,9,august
9,10,pink,10,december
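
As a quick check of the two spellings of the axis argument, the following sketch uses hypothetical in-memory data in place of the CSV files. The names `by_number` and `by_name` are introduced only for this illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files
fav_colour = pd.DataFrame({
    "Participant num": [1, 2, 3],
    "Fav Colour": ["blue", "red", "green"],
})
birthday_month = pd.DataFrame({
    "Participant num": [1, 2, 3],
    "Birthday Month": ["may", "june", "january"],
})

# axis=1 and axis="columns" request the same horizontal concatenation
by_number = pd.concat([fav_colour, birthday_month], axis=1)
by_name = pd.concat([fav_colour, birthday_month], axis="columns")

print(list(by_number.columns))
print(list(by_name.columns))
# Both results keep every input column, so "Participant num" appears twice
print(by_number.columns.duplicated().any())
```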


We're still not quite where we want to be, as we have two redundant `Participant num` columns. When concatenating, pandas plays it safe and doesn't assume that two columns with the same name are redundant. One way to fix this is to make the `Participant num` column the index of each input DataFrame when we first load the data. Since indexes are essentially row labels, making `Participant num` the index tells pandas that these two columns with the same name really are the same thing, and that rows should be matched on them.

In [17]:
fav_colour = pd.read_csv('fav_colour.csv', index_col='Participant num')
birthday_month = pd.read_csv('birthday_months.csv', index_col='Participant num')

df = pd.concat([fav_colour, birthday_month], axis=1)
df

Participant num,Fav Colour,Birthday Month
1,blue,may
2,red,june
3,green,january
4,purple,february
5,red,september
6,green,july
7,orange,may
8,yellow,may
9,yellow,august
10,pink,december


Alternatively, we could make one of the `Participant num` columns the index after concatenation, but specifying the index when we read in the data is safer. Your data might not be in the same order in both files (e.g., one file might not be sorted by `Participant num`), or one file might have missing data. By making `Participant num` the index of each input before we concatenate, we ensure that pandas matches rows by participant number rather than by the order in which they happen to appear in each file.
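
To make the risk concrete, here is a minimal sketch with made-up data in which the birthday rows are in a different order and participant 3 has no birthday recorded. It uses `set_index()` on in-memory DataFrames in place of `index_col` in `pd.read_csv()`. The names `by_position` and `by_participant` are hypothetical.

```python
import pandas as pd

# Hypothetical data: months are in a different order, and participant 3 is missing
fav_colour = pd.DataFrame(
    {"Participant num": [1, 2, 3], "Fav Colour": ["blue", "red", "green"]}
)
birthday_month = pd.DataFrame(
    {"Participant num": [2, 1], "Birthday Month": ["june", "may"]}
)

# With the default 0, 1, 2 index, rows are paired by where they sit in each file,
# so participant 1's colour ends up beside participant 2's month
by_position = pd.concat([fav_colour, birthday_month], axis=1)
print(by_position)

# With Participant num as the index, rows are paired by participant,
# and the missing birthday for participant 3 becomes NaN
by_participant = pd.concat(
    [
        fav_colour.set_index("Participant num"),
        birthday_month.set_index("Participant num"),
    ],
    axis=1,
)
print(by_participant)
```

In `by_position` the mismatch is silent: nothing warns you that colours and months from different participants share a row. That is why setting the index first is the safer habit.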

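Importantly, this is a case where we would *not* want to include the `ignore_index=True` argument to `pd.concat()`. That argument discards the labels along the axis being concatenated, so with `axis=1` it would replace our column names with `0` and `1`. Here both the column names and the `Participant num` index are important and meaningful, so we keep them.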
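
A small sketch of what `ignore_index=True` does when concatenating along columns, again using hypothetical in-memory data indexed by `Participant num`. The names `kept` and `dropped` are introduced only for this example.

```python
import pandas as pd

# Hypothetical inputs, already indexed by participant number
index = pd.Index([1, 2, 3], name="Participant num")
fav_colour = pd.DataFrame({"Fav Colour": ["blue", "red", "green"]}, index=index)
birthday_month = pd.DataFrame(
    {"Birthday Month": ["may", "june", "january"]}, index=index
)

kept = pd.concat([fav_colour, birthday_month], axis=1)
dropped = pd.concat([fav_colour, birthday_month], axis=1, ignore_index=True)

print(list(kept.columns))     # ['Fav Colour', 'Birthday Month']
print(list(dropped.columns))  # [0, 1]: the column names are gone
print(list(dropped.index))    # [1, 2, 3]: the row labels are unchanged
```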