# Pandas - Combining DataFrames
---

Combining data from several CSV files into a single dataframe.

## Concatenation

1. Create two dataframes from **file-a.csv** and **file-b.csv**.
2. For each dataframe, set the index to be the `id` column.
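
As a minimal sketch of step 2, the cell below shows two common ways to make `id` the index. The CSV contents are invented stand-ins (the real **file-a.csv** and **file-b.csv** are not shown here), and the variable names `a` and `b` are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-ins for file-a.csv and file-b.csv (contents invented for illustration).
csv_a = io.StringIO("id,name\n1,apple\n2,banana\n")
csv_b = io.StringIO("id,name\n3,cherry\n4,date\n")

# Option 1: index_col='id' makes the id column the row index while reading.
a = pd.read_csv(csv_a, index_col='id')

# Option 2: set_index does the same after loading; it returns a new dataframe.
b = pd.read_csv(csv_b).set_index('id')

print(a.index.tolist())  # [1, 2]
print(b.index.tolist())  # [3, 4]
```

Either option leaves `name` as the only data column, with `id` serving as the row labels.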

In [None]:
import pandas as pd

In [None]:
# load the two CSV files
file_a = pd.read_csv('file-a.csv')
file_b = pd.read_csv('file-b.csv')

In [None]:
# ...and combine them

## Concatenating along the row axis

When concatenating, you must specify whether the dataframes are combined along the row axis (**axis=0**) or the column axis (**axis=1**).
Here we combine along the row axis using pandas' [concat](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html) function.
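
To make the difference between the two axes concrete, here is a small sketch on invented data. The dataframes `top` and `bottom` are hypothetical and only stand in for the files used in this notebook.

```python
import pandas as pd

# Two small hypothetical dataframes with the same columns.
top = pd.DataFrame({'id': [1, 2], 'name': ['apple', 'banana']})
bottom = pd.DataFrame({'id': [3, 4], 'name': ['cherry', 'date']})

# axis=0 stacks rows on top of rows: the row counts add up.
rows = pd.concat([top, bottom], axis=0)
print(rows.shape)  # (4, 2)

# axis=1 places the dataframes side by side: the column counts add up.
cols = pd.concat([top, bottom], axis=1)
print(cols.shape)  # (2, 4)

# Stacking rows keeps each dataframe's original row labels ...
print(rows.index.tolist())  # [0, 1, 0, 1]

# ... unless ignore_index=True asks for a fresh 0..n-1 index.
stacked = pd.concat([top, bottom], axis=0, ignore_index=True)
print(stacked.index.tolist())  # [0, 1, 2, 3]
```

The repeated row labels in `rows` are one reason to set a meaningful index such as `id` when stacking.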

In [None]:
# the column labels match, so stack along the row axis (axis=0): rows on top of rows

file = pd.concat([file_a, file_b], axis=0)

Compare the shape of the combined dataframe to the shapes of the original two dataframes.

In [None]:
file_a.shape

In [None]:
file_b.shape

In [None]:
file.shape

In [None]:
file.head()

In [None]:
nutrition = pd.read_csv('nutrition.csv')
nutrition.head()

In [None]:
# use the id column as the row index
file.set_index('id', inplace=True)
file.head()

## Concatenating along the column axis


1. Create a dataframe from **file_c.csv**
2. Set the index to the `id` column, so that the indexes of the two dataframes match

In [None]:
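# order matters: the columns appear left to right in the order the dataframes are listed
# concat aligns rows on the index, so both dataframes should be indexed by id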

pd.concat([file, nutrition], axis=1)

When two dataframes share the same index, they can be concatenated into one dataframe along the column axis.
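
The sketch below illustrates why the shared index matters: with `axis=1`, `concat` aligns rows by index label, not by position. The dataframes `items` and `extra` and the `score` column are invented for illustration.

```python
import pandas as pd

# Hypothetical dataframes indexed by id; note that the rows of `extra` are in a different order.
items = pd.DataFrame({'name': ['apple', 'banana', 'cherry']},
                     index=pd.Index([1, 2, 3], name='id'))
extra = pd.DataFrame({'score': [10, 20]},
                     index=pd.Index([2, 1], name='id'))

# Rows are matched on the index label; id 3 has no partner, so its score is NaN.
wide = pd.concat([items, extra], axis=1)
print(wide)

# join='inner' keeps only the labels present in both dataframes.
both = pd.concat([items, extra], axis=1, join='inner')
print(both)
```

If the indexes did not hold matching `id` values, the same call would pair unrelated rows or fill the gaps with `NaN`, which is why the index is set before concatenating.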

## Joining

Load **file.csv** into a dataframe.

In [None]:
file = pd.read_csv('file.csv')
file.head()

In [None]:
# count the null values in each column
file.isnull().sum()

In [None]:
# show the rows where category_id is null

file.loc[file['category_id'].isnull(), :]

In [None]:
categories = pd.read_csv('categories.csv')
categories.head()

Using pandas' [merge](http://pandas.pydata.org/pandas-docs/stable/merging.html) function, combine **file** with **categories** along the column axis.

In [None]:
# a "left" join keeps every row of the left dataframe, even rows with no match on the right

file.head()

Combine **categories** with **file**. Note that _one of the rows in file does not belong to a category_.
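
A row without a category is handled differently depending on the join type. The following sketch uses invented data (`items` and `cats` are hypothetical, and string category codes are used to keep the key types simple) to compare the default inner join with `how='left'`.

```python
import pandas as pd

# Hypothetical data: the third item has no category.
items = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['apple', 'carrot', 'mystery'],
    'category_id': ['A', 'B', None],
})
cats = pd.DataFrame({'id': ['A', 'B'], 'name': ['fruit', 'vegetable']})

# The default inner join drops the item whose category_id has no match.
inner = pd.merge(items, cats, left_on='category_id', right_on='id')
print(len(inner))  # 2

# A left join keeps every row of `items`; the unmatched columns become NaN.
# indicator=True adds a _merge column saying where each row came from.
left = pd.merge(items, cats, left_on='category_id', right_on='id',
                how='left', indicator=True)
print(len(left))  # 3
print(left['_merge'].tolist())  # ['both', 'both', 'left_only']
```

Which join is appropriate depends on whether the uncategorized row should survive the merge.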

In [None]:
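# merge takes the left and right dataframes plus the key column(s) to join on
# here, category_id in file is matched against id in categories

df = pd.merge(file, categories, left_on='category_id', right_on='id')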

In [None]:
# keys are the current column names, values are the new names
df.rename(columns={'id_x': 'id', 'name_x': '', 'name_y': ''}, inplace=True)
df.head()

In [None]:
# axis=1 drops a column (axis=0 would drop a row)

df.drop('id_y', axis=1, inplace=True)

In [None]:
df.head()
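
The `_x` and `_y` column names come from `merge` itself: when both dataframes have columns with the same name, it appends suffixes to tell them apart. As a hedged alternative to renaming afterwards, the `suffixes` parameter can set clearer names up front. The data and the `_cat` suffix below are invented for illustration.

```python
import pandas as pd

# Hypothetical dataframes that both have id and name columns.
items = pd.DataFrame({'id': [1, 2], 'name': ['apple', 'carrot'],
                      'category_id': ['A', 'B']})
cats = pd.DataFrame({'id': ['A', 'B'], 'name': ['fruit', 'vegetable']})

# Default suffixes: overlapping names get _x (left) and _y (right).
default = pd.merge(items, cats, left_on='category_id', right_on='id')
print(default.columns.tolist())
# ['id_x', 'name_x', 'category_id', 'id_y', 'name_y']

# Custom suffixes: keep the left names as they are, tag the right ones with _cat.
named = pd.merge(items, cats, left_on='category_id', right_on='id',
                 suffixes=('', '_cat'))
print(named.columns.tolist())
# ['id', 'name', 'category_id', 'id_cat', 'name_cat']

# id_cat duplicates category_id, so it can be dropped.
named = named.drop(columns='id_cat')
```

This reaches the same kind of result as the rename and drop cells, in fewer steps.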

In [None]:
# rename columns in categories, then merge again

# rename name to category
# rename id to category_id

categories.rename(columns={'name': 'category', 'id': 'category_id'}, inplace=True)
categories.head()

In [None]:
pd.merge(df, categories, left_on='category_id', right_on='category_id')

In [None]:
df = pd.merge(df, categories, on='category_id')
df.head()
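
The last two cells perform the same merge: when the key column has the same name in both dataframes, `on=` is shorthand for passing that name to both `left_on=` and `right_on=`. A small sketch on invented data (`items` and `cats` are hypothetical) confirms this, and also shows the optional `validate` argument as a guard against duplicate category rows.

```python
import pandas as pd

# Hypothetical dataframes that share a category_id column.
items = pd.DataFrame({'id': [1, 2], 'category_id': ['A', 'B']})
cats = pd.DataFrame({'category_id': ['A', 'B'],
                     'category': ['fruit', 'vegetable']})

# Long form and short form of the same merge.
long_form = pd.merge(items, cats, left_on='category_id', right_on='category_id')
short_form = pd.merge(items, cats, on='category_id')
print(long_form.equals(short_form))  # True

# validate='many_to_one' raises an error if category_id is not unique in `cats`.
checked = pd.merge(items, cats, on='category_id', validate='many_to_one')
print(checked.columns.tolist())  # ['id', 'category_id', 'category']
```

Because the key name is shared, the result contains a single `category_id` column and needs no renaming or dropping.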