# Combining Data

Most data analyses will use multiple different datasets or at least multiple datasets created from the same source. pandas has tools to combine DataFrames in a wide variety of ways.

## Concatenating Data

Concatenating data in pandas means stacking DataFrames either one on top of another or side by side. `pd.concat` is a function, not a DataFrame method, and its many arguments let you combine two or more datasets in a single call.


### Concatenating very similar DataFrames

The `pd.concat` function has many arguments, some of them confusing. We read in two small DataFrames, each with just three columns and three rows, and use them to illustrate how `concat` works.

In [None]:
import pandas as pd
amzn = pd.read_csv('../data/stocks/amzn_sample.csv', parse_dates=['date'])
aapl = pd.read_csv('../data/stocks/aapl_sample.csv', parse_dates=['date'])

### Stacking data one on top of the other
The first argument to `concat` needs to be a list of DataFrames. As usual in pandas, the default is to operate vertically (`axis=0`). We stack them with the following command:

In [None]:
pd.concat([amzn, aapl])

Notice that each DataFrame kept its original index, so the index labels repeat. Use `ignore_index=True` to create a completely new `RangeIndex` from 0 to n-1.

In [None]:
pd.concat([amzn, aapl], ignore_index=True)
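
In case the CSV files are not at hand, here is a minimal self-contained sketch of the same behavior. The DataFrames `left` and `right` and all of their values are invented for illustration; they are not the stock samples read in above.

```python
import pandas as pd

# Hypothetical stand-ins for the two stock samples (values are made up)
left = pd.DataFrame({'date': pd.to_datetime(['2020-01-02', '2020-01-03']),
                     'close': [10.0, 11.0]})
right = pd.DataFrame({'date': pd.to_datetime(['2020-01-02', '2020-01-03']),
                      'close': [20.0, 21.0]})

# Default behavior: each piece keeps its own index labels, so they repeat
stacked = pd.concat([left, right])
print(stacked.index.tolist())      # [0, 1, 0, 1]

# ignore_index=True discards the old labels and numbers the rows 0 to n-1
renumbered = pd.concat([left, right], ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]
```

Repeated index labels are legal in pandas, but selecting with `.loc[0]` on `stacked` would return two rows, which is often a surprise.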

### Label each piece of the DataFrame with the `keys` parameter
You can use the `keys` parameter to label each piece of the DataFrame. This creates a `MultiIndex`.

In [None]:
pd.concat([amzn, aapl], keys=['amzn', 'aapl'])
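
A short sketch of what the resulting `MultiIndex` makes possible, using two invented one-column DataFrames (`left` and `right` are hypothetical, as are the level names passed to `names`). The outer level holds the keys, so one piece can be pulled back out with `.loc`.

```python
import pandas as pd

# Hypothetical data; the values are made up
left = pd.DataFrame({'close': [10.0, 11.0]})
right = pd.DataFrame({'close': [20.0, 21.0]})

# keys labels each piece; names (optional) labels the index levels
labeled = pd.concat([left, right], keys=['amzn', 'aapl'],
                    names=['symbol', 'row'])
print(labeled.index.nlevels)         # 2

# Select a single piece by its outer label
amzn_part = labeled.loc['amzn']
print(amzn_part['close'].tolist())   # [10.0, 11.0]
```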

### Perhaps it's better to just make a new column beforehand

In [None]:
amzn['symbol'] = 'amzn'
aapl['symbol'] = 'aapl'
pd.concat([amzn, aapl])
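
One possible variation, sketched with invented data: `DataFrame.assign` returns a copy with the new column, so the original DataFrames are left unmodified. The names `left` and `right` and their values are hypothetical.

```python
import pandas as pd

# Hypothetical data; the values are made up
left = pd.DataFrame({'close': [10.0, 11.0]})
right = pd.DataFrame({'close': [20.0, 21.0]})

# assign returns new DataFrames, so left and right are not changed
combined = pd.concat(
    [left.assign(symbol='amzn'), right.assign(symbol='aapl')],
    ignore_index=True,
)
print(combined['symbol'].tolist())   # ['amzn', 'amzn', 'aapl', 'aapl']
print('symbol' in left.columns)      # False
```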

## Beware! Automatic Alignment of Index
Of extreme importance to `pd.concat` (and all of pandas) is the automatic alignment of indexes that happens behind the scenes. For instance, let's rename the `Adj. Close` column of `amzn` and concatenate once again.

In [None]:
amzn2 = amzn.rename(columns={'Adj. Close': 'close'})
pd.concat([amzn2, aapl])

## Column names align first
`pd.concat` aligns automatically on the column names and, by default, does an outer join, keeping every column that appears in any of the DataFrames. Notice the missing values where the columns do not line up. We can force an inner join with `join='inner'`, which keeps only the columns in common.

In [None]:
pd.concat([amzn2, aapl], join='inner')
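
The difference between the two joins can be checked on a tiny invented example. The DataFrames `a` and `b` below are hypothetical; they share only the `date` column.

```python
import pandas as pd

# Hypothetical data with one mismatched column name
a = pd.DataFrame({'date': ['2020-01-02'], 'close': [10.0]})
b = pd.DataFrame({'date': ['2020-01-02'], 'Adj. Close': [20.0]})

# Outer join (the default): union of the columns, gaps filled with NaN
outer = pd.concat([a, b], ignore_index=True)
print(sorted(outer.columns))             # ['Adj. Close', 'close', 'date']
print(int(outer.isna().sum().sum()))     # 2

# Inner join: only the columns present in every DataFrame survive
inner = pd.concat([a, b], join='inner', ignore_index=True)
print(inner.columns.tolist())            # ['date']
```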

## Use `axis=1` to change the direction of concatenation
With `axis=1` the DataFrames are placed side by side. Automatic alignment still happens here, this time on the index.

In [None]:
pd.concat([amzn2, aapl], axis=1)
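
To make the index alignment visible, here is a sketch with two invented DataFrames whose date indexes only partly overlap. The names `a` and `b`, the column names, and the values are all hypothetical.

```python
import pandas as pd

# Hypothetical data indexed by date; only 2020-01-03 appears in both
a = pd.DataFrame({'amzn_close': [10.0, 11.0]},
                 index=pd.to_datetime(['2020-01-02', '2020-01-03']))
b = pd.DataFrame({'aapl_close': [21.0, 22.0]},
                 index=pd.to_datetime(['2020-01-03', '2020-01-06']))

# Rows are matched by index label, not by position
side = pd.concat([a, b], axis=1)
print(side.shape)                       # (3, 2)
print(int(side.isna().sum().sum()))     # 2

# join='inner' keeps only the index labels found in both
both = pd.concat([a, b], axis=1, join='inner')
print(len(both))                        # 1
```

If the rows should instead be paired by position, one option is to call `reset_index(drop=True)` on each DataFrame before concatenating.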