# 2. Combining Data

### Objectives

+ Concatenate multiple DataFrames vertically and horizontally

### Resources
+ [Merge and concatenate](http://pandas.pydata.org/pandas-docs/stable/merging.html)

### Introduction
Most data analyses will use multiple different datasets or at least multiple datasets created from the same source. Pandas has tools to combine DataFrames in a wide variety of ways.

In [None]:
import pandas as pd

## Concatenating Data
[Concatenating data](http://pandas.pydata.org/pandas-docs/stable/merging.html) in Pandas refers to stacking DataFrames either one on top of the other or side by side. The **`pd.concat`** function is flexible, with many arguments that let you combine two or more datasets at the same time.


### Concatenating very similar DataFrames
**`pd.concat`** provides many different and sometimes confusing arguments. We can use the IEX trading API to get five years of daily stock data for Amazon and Apple. We select just three columns and the first five rows of each. We will use these small datasets to illustrate how the `concat` function works.

In [None]:
url = 'https://api.iextrading.com/1.0/stock/{}/chart/5y'
cols = ['date', 'close', 'volume']
amzn = pd.read_json(url.format('amzn'))[cols]
aapl = pd.read_json(url.format('aapl'))[cols]

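# Copy the slices so that adding a column later does not raise SettingWithCopyWarning
amzn_head = amzn.head().copy()
aapl_head = aapl.head().copy()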

In [None]:
aapl_head

In [None]:
amzn_head
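
If the API cannot be reached, the rest of the section can still be followed with two small hand-built DataFrames that have the same three columns. The sketch below is only a stand-in: every date, price, and volume in it is a made-up placeholder, not real market data.

```python
import pandas as pd

# Placeholder values for illustration only; these are NOT real prices or volumes.
dates = pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06',
                        '2020-01-07', '2020-01-08'])

amzn_head = pd.DataFrame({
    'date': dates,
    'close': [100.0, 101.0, 102.0, 103.0, 104.0],
    'volume': [1000, 1100, 1200, 1300, 1400],
})

aapl_head = pd.DataFrame({
    'date': dates,
    'close': [50.0, 51.0, 52.0, 53.0, 54.0],
    'volume': [2000, 2100, 2200, 2300, 2400],
})

print(amzn_head)
print(aapl_head)
```

Both frames have the default `RangeIndex` from 0 to 4, which matches what `head()` returns for freshly loaded data.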

## Stacking data one on top of the other
The first argument for `concat` needs to be a list of DataFrames. As usual in Pandas, the default is to act along the rows (`axis=0`), so the DataFrames are stacked vertically. We stack them with the following command:

In [None]:
pd.concat([amzn_head, aapl_head])

Notice that each DataFrame kept its original index, so the labels 0 through 4 now appear twice. Use `ignore_index=True` to make a completely new `RangeIndex` from 0 to n-1.

In [None]:
pd.concat([amzn_head, aapl_head], ignore_index=True)
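
The effect of `ignore_index` is easiest to see on the index itself. This is a minimal, self-contained sketch; the frames `top` and `bottom` and their values are hypothetical and stand in for the stock data.

```python
import pandas as pd

# Hypothetical values, used only to show how the index behaves
top = pd.DataFrame({'close': [10.0, 11.0], 'volume': [100, 200]})
bottom = pd.DataFrame({'close': [20.0, 21.0], 'volume': [300, 400]})

# Default: each piece keeps its own labels, so the result has duplicates
stacked = pd.concat([top, bottom])
print(stacked.index.tolist())     # [0, 1, 0, 1]

# A duplicated label selects every matching row
print(len(stacked.loc[0]))        # 2

# ignore_index=True discards the old labels and numbers the rows 0..n-1
renumbered = pd.concat([top, bottom], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Duplicate labels are legal, but they make label-based selection return several rows, which is often a surprise later in an analysis.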

### Label each piece of the DataFrame with the `keys` parameter
You can use the `keys` parameter to label each piece of the DataFrame. This creates a `MultiIndex` whose outer level holds the keys and whose inner level holds the original index.

In [None]:
pd.concat([amzn_head, aapl_head], keys=['amzn', 'aapl'])
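
One benefit of the `keys` labels is that each original piece can be pulled back out by its key. A small sketch, assuming two hypothetical frames named `first` and `second` with made-up values:

```python
import pandas as pd

# Hypothetical values for illustration
first = pd.DataFrame({'close': [10.0, 11.0]})
second = pd.DataFrame({'close': [20.0, 21.0]})

labeled = pd.concat([first, second], keys=['first', 'second'])

# The result has a two-level index: (key, original row label)
print(labeled.index.nlevels)  # 2

# Selecting on the outer level returns the rows that came from that piece
piece = labeled.loc['second']
print(piece['close'].tolist())  # [20.0, 21.0]
```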

### Perhaps it's better to just make a new column beforehand

In [None]:
amzn_head['symbol'] = 'amzn'
aapl_head['symbol'] = 'aapl'
pd.concat([amzn_head, aapl_head])
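
An alternative that leaves the original DataFrames unchanged is `DataFrame.assign`, which returns a new DataFrame with the extra column. The sketch below uses hypothetical frames and values, and is one possible way to do this rather than the only one.

```python
import pandas as pd

# Hypothetical values for illustration
first = pd.DataFrame({'close': [10.0, 11.0]})
second = pd.DataFrame({'close': [20.0, 21.0]})

# assign returns a copy with the new column; first and second are not modified
combined = pd.concat(
    [first.assign(symbol='first'), second.assign(symbol='second')],
    ignore_index=True,
)

print(combined['symbol'].tolist())  # ['first', 'first', 'second', 'second']
print('symbol' in first.columns)    # False
```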

## Beware! Automatic Alignment of Index
Of extreme importance to **`pd.concat`** (and all of pandas) is the automatic alignment of indexes that happens behind the scenes. For instance, let's rename the second column of `amzn_head` and concatenate once again.

In [None]:
amzn_head2 = amzn_head.rename(columns={'close': 'closing price'})
pd.concat([amzn_head2, aapl_head])

## Column names align first
`pd.concat` does automatic alignment on the columns and by default does an outer join, which keeps every column that appears in any of the DataFrames. Notice the missing values where the column names do not match. We can force an inner join with `join='inner'`, where only the columns in common are kept.

In [None]:
pd.concat([amzn_head2, aapl_head], join='inner')
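
The difference between the two join types shows up in the resulting columns. A minimal sketch with hypothetical one-row frames; the dates and numbers are made up:

```python
import pandas as pd

# Hypothetical values; the two frames disagree on one column name
left = pd.DataFrame({'date': ['2020-01-02'], 'closing price': [10.0], 'volume': [100]})
right = pd.DataFrame({'date': ['2020-01-02'], 'close': [20.0], 'volume': [300]})

# Outer join (the default): union of the columns, gaps filled with NaN
outer = pd.concat([left, right], ignore_index=True)
print(sorted(outer.columns))
print(outer['close'].isna().tolist())  # [True, False]

# Inner join: only the columns present in every frame
inner = pd.concat([left, right], join='inner', ignore_index=True)
print(sorted(inner.columns))  # ['date', 'volume']
```

With the outer join, the row from `left` has no value for `close` and the row from `right` has none for `closing price`, so pandas fills both gaps with `NaN`.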

## Use `axis=1` to change the direction of concatenation
Automatic alignment still happens here, this time on the index: rows that share an index label are placed side by side.

In [None]:
pd.concat([amzn_head, aapl_head], axis=1)
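
Because both stock DataFrames share the index 0 through 4, the alignment is invisible in the cell above. The sketch below makes it visible by using two hypothetical frames whose indexes only partly overlap; the column names `close_a` and `close_b` and all values are made up.

```python
import pandas as pd

# Hypothetical values; the indexes overlap only on labels 1 and 2
a = pd.DataFrame({'close_a': [10.0, 11.0, 12.0]}, index=[0, 1, 2])
b = pd.DataFrame({'close_b': [20.0, 21.0, 22.0]}, index=[1, 2, 3])

# Outer join on the index: every label appears, gaps become NaN
side_by_side = pd.concat([a, b], axis=1)
print(side_by_side)

# Inner join on the index: only labels found in both frames survive
common = pd.concat([a, b], axis=1, join='inner')
print(common)
```

Rows are matched by label, not by position: the value 20.0 from `b` lands next to 11.0 from `a` because both sit at index label 1.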