# Stacking data

Merging two data sets on a common key in pandas (using the DataFrame `merge` or `join` methods) can be thought of as a "sideways" or left-right join. We are typically matching rows from one data set with rows from another, using a shared unique identifier.

Another common need is to combine data "vertically", when we have two or more files with the same overall structure (i.e., the same field names and the same kinds of values).

Such an operation is often referred to as "stacking" data, and it's quite common since many public agencies release data in yearly, quarterly, monthly and other time-oriented files.

As long as the file structures are identical, you can use the `pandas.concat` (for *concatenate*) function to combine data sets.
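As a minimal sketch of the file-stacking case, the example below reads two small CSV files with identical columns and concatenates them. The file contents and the names `csv_2022` and `csv_2023` are made up for illustration, and `io.StringIO` stands in for real files on disk.

```python
import io

import pandas as pd

# Hypothetical yearly files with identical columns
# (in-memory stand-ins for files on disk)
csv_2022 = io.StringIO("agency,year,budget\nParks,2022,100\nTransit,2022,250\n")
csv_2023 = io.StringIO("agency,year,budget\nParks,2023,110\nTransit,2023,260\n")

# Read each file into its own DataFrame
frames = [pd.read_csv(f) for f in (csv_2022, csv_2023)]

# Stack the rows of every frame into one DataFrame,
# renumbering the rows from 0
stacked = pd.concat(frames, ignore_index=True)
print(stacked)
```

With real data, the list of frames would more likely come from looping over file paths and calling `pd.read_csv` on each one.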


In [3]:
people = [
    {'name': 'John Doe', 'city': 'New York', 'state': 'NY', 'salary': 50000},
    {'name': 'Jane Smith', 'city': 'Los Angeles', 'state': 'CA', 'salary': 65000},
    {'name': 'Michael Johnson', 'city': 'Chicago', 'state': 'IL', 'salary': 40000},
    {'name': 'Emily Davis', 'city': 'Chicago', 'state': 'IL', 'salary': 480000},
    {'name': 'David Wilson', 'city': 'Los Angeles', 'state': 'CA', 'salary': 60000},
]
more_people = [
    {'name': 'Sarah Brown', 'city': 'Philadelphia', 'state': 'PA', 'salary': 52000},
    {'name': 'Alex Martinez', 'city': 'New York', 'state': 'NY', 'salary': 85000},
    {'name': 'Maria Garcia', 'city': 'New York', 'state': 'NY', 'salary': 160000},
    {'name': 'James Lee', 'city': 'Chicago', 'state': 'IL', 'salary': 80000},
    {'name': 'Linda Harris', 'city': 'San Francisco', 'state': 'CA', 'salary': 100000}
]

In [4]:
import pandas as pd

In [7]:
df = pd.DataFrame(people)
df

|   | name | city | state | salary |
|---|------|------|-------|--------|
| 0 | John Doe | New York | NY | 50000 |
| 1 | Jane Smith | Los Angeles | CA | 65000 |
| 2 | Michael Johnson | Chicago | IL | 40000 |
| 3 | Emily Davis | Chicago | IL | 480000 |
| 4 | David Wilson | Los Angeles | CA | 60000 |


In [8]:
df2 = pd.DataFrame(more_people)
df2

|   | name | city | state | salary |
|---|------|------|-------|--------|
| 0 | Sarah Brown | Philadelphia | PA | 52000 |
| 1 | Alex Martinez | New York | NY | 85000 |
| 2 | Maria Garcia | New York | NY | 160000 |
| 3 | James Lee | Chicago | IL | 80000 |
| 4 | Linda Harris | San Francisco | CA | 100000 |


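To combine the DataFrames, pass them as a list to the `pd.concat` function. We also supply a few extra arguments:

- `axis=0` to perform a "stacking" or vertical join operation. This is the default, but spelling it out makes the intent clear.
- `ignore_index=True` to reset the index on the resulting DataFrame, so the rows are numbered consecutively from 0.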
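To see what `ignore_index=True` changes, here is a small sketch using two throwaway DataFrames. The names `first` and `second` and their contents are hypothetical and not part of the data above.

```python
import pandas as pd

# Two tiny frames, each with the default index 0, 1
first = pd.DataFrame({"name": ["A", "B"], "salary": [1, 2]})
second = pd.DataFrame({"name": ["C", "D"], "salary": [3, 4]})

# Without ignore_index, each frame keeps its original row labels
kept = pd.concat([first, second], axis=0)
print(kept.index.tolist())   # [0, 1, 0, 1]

# With ignore_index=True, the result gets a fresh 0..n-1 index
reset = pd.concat([first, second], axis=0, ignore_index=True)
print(reset.index.tolist())  # [0, 1, 2, 3]
```

Repeated index labels are legal in pandas, but they make label-based lookups such as `kept.loc[0]` return more than one row, which is usually not what you want after stacking.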

In [14]:
pd.concat([df, df2], ignore_index=True, axis=0)

|   | name | city | state | salary |
|---|------|------|-------|--------|
| 0 | John Doe | New York | NY | 50000 |
| 1 | Jane Smith | Los Angeles | CA | 65000 |
| 2 | Michael Johnson | Chicago | IL | 40000 |
| 3 | Emily Davis | Chicago | IL | 480000 |
| 4 | David Wilson | Los Angeles | CA | 60000 |
| 5 | Sarah Brown | Philadelphia | PA | 52000 |
| 6 | Alex Martinez | New York | NY | 85000 |
| 7 | Maria Garcia | New York | NY | 160000 |
| 8 | James Lee | Chicago | IL | 80000 |
| 9 | Linda Harris | San Francisco | CA | 100000 |
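The example above works cleanly because both DataFrames have exactly the same columns. As a hedged sketch of what happens when they do not, the frames below (hypothetical, with a made-up `title` column) differ in structure. By default `pd.concat` keeps the union of the columns and fills the gaps with `NaN`, while `join="inner"` keeps only the columns that every frame shares.

```python
import pandas as pd

# Hypothetical frames whose columns differ:
# the second lacks "state" and adds "title"
a = pd.DataFrame({"name": ["John Doe"], "state": ["NY"], "salary": [50000]})
b = pd.DataFrame({"name": ["Sarah Brown"], "salary": [52000], "title": ["Analyst"]})

# Default join="outer": keep every column, filling missing values with NaN
outer = pd.concat([a, b], ignore_index=True)
print(outer)

# join="inner": keep only the columns present in all frames
inner = pd.concat([a, b], ignore_index=True, join="inner")
print(inner)
```

Unexpected `NaN` values after stacking are often a sign that a column was renamed or dropped between files, so it can be worth comparing column names before concatenating.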
