# [MERGING AND JOINING](http://pandas.pydata.org/pandas-docs/stable/merging.html#merge-join-and-concatenate)

## [The concat function](http://pandas.pydata.org/pandas-docs/stable/merging.html#concatenating-objects)

In [42]:
import pandas as pd
import numpy as np

In [54]:
data1 = pd.read_csv('../data/data1.csv').set_index('Name')
data1

Name,A,B,C,D
LYB,45.0,74,263,944
OIV,272.0,72,465,242
MBE,65.0,25,833,627
VYV,,68,92,73


In [55]:
data2 = pd.read_csv('../data/data2.csv').set_index('Name')
data2

Name,B,C,D
BMA,836,123,378
BAD,545,6452,838


In [56]:
data3 = pd.read_csv('../data/data3.csv').set_index('Name')
data3

Name,D
JEI,23
VWI,82


Here, we concatenate the three data frames along the rows. The default join is an outer join, which takes the union of the columns; rows from a data frame that lacks a column get NaN in that column:

In [57]:
pd.concat([data1, data2, data3], axis=0)

Name,A,B,C,D
LYB,45.0,74.0,263.0,944
OIV,272.0,72.0,465.0,242
MBE,65.0,25.0,833.0,627
VYV,,68.0,92.0,73
BMA,,836.0,123.0,378
BAD,,545.0,6452.0,838
JEI,,,,23
VWI,,,,82
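
The outer concatenation above can be reproduced without the CSV files. The sketch below rebuilds the three frames inline from the values displayed earlier (an assumption, since the files themselves are not shown) and spells out `join="outer"`, which is the default:

```python
import numpy as np
import pandas as pd

# Inline copies of the frames displayed above; the CSV files are not assumed to exist.
data1 = pd.DataFrame(
    {"A": [45.0, 272.0, 65.0, np.nan], "B": [74, 72, 25, 68],
     "C": [263, 465, 833, 92], "D": [944, 242, 627, 73]},
    index=pd.Index(["LYB", "OIV", "MBE", "VYV"], name="Name"),
)
data2 = pd.DataFrame(
    {"B": [836, 545], "C": [123, 6452], "D": [378, 838]},
    index=pd.Index(["BMA", "BAD"], name="Name"),
)
data3 = pd.DataFrame({"D": [23, 82]}, index=pd.Index(["JEI", "VWI"], name="Name"))

# join="outer" is the default; it is written out here only for clarity.
outer = pd.concat([data1, data2, data3], axis=0, join="outer")

print(outer.columns.tolist())        # union of all column labels
print(outer.shape)                   # every row from every frame is kept
print(outer.loc["JEI"].isna().sum()) # columns missing from data3 are NaN
```

Columns B and C become floating point in the result because NaN has to be stored for the rows coming from data3.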


We can also specify an inner join, which takes the intersection of the columns. Every row is kept, but only the columns present in all three data frames survive (here, just D):

In [58]:
pd.concat([data1, data2, data3], axis=0, join='inner')

Name,D
LYB,944
OIV,242
MBE,627
VYV,73
BMA,378
BAD,838
JEI,23
VWI,82
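
A minimal sketch of what the inner join does along axis=0, using small hypothetical frames (the names `left`, `middle`, and `right` and their values are made up for illustration). It drops columns that are not shared and leaves the rows alone:

```python
import pandas as pd

# Hypothetical frames: only column "D" appears in all three.
left = pd.DataFrame({"A": [1, 2], "D": [10, 20]}, index=["r1", "r2"])
middle = pd.DataFrame({"B": [3], "D": [30]}, index=["r3"])
right = pd.DataFrame({"D": [40]}, index=["r4"])

inner = pd.concat([left, middle, right], axis=0, join="inner")

# All four rows survive; columns A and B are dropped because they are not shared.
print(inner.columns.tolist())
print(len(inner))
```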


In [63]:
prices1 = pd.read_csv('../data/prices1.csv').set_index('Name')
prices1

Name,A,B
IJK,6,84
NAE,47,73
OPQ,37,26
MNW,87,64


In [64]:
prices2 = pd.read_csv('../data/prices2.csv').set_index('Name')
prices2

Name,C
OPQ,635
MNW,42


In [65]:
prices3 = pd.read_csv('../data/prices3.csv').set_index('Name')
prices3

Name,D
NAE,25
OPQ,42
MNW,846
LWX,933


In [66]:
pd.concat([prices1, prices2, prices3], axis=1)

,A,B,C,D
IJK,6.0,84.0,,
LWX,,,,933.0
MNW,87.0,64.0,42.0,846.0
NAE,47.0,73.0,,25.0
OPQ,37.0,26.0,635.0,42.0


In [67]:
pd.concat([prices1, prices2, prices3], axis=1, join='inner')

Name,A,B,C,D
OPQ,37,26,635,42
MNW,87,64,42,846


In [68]:
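# join_axes was removed in pandas 1.0, so reindex instead; df (not defined in this excerpt) supplies the row labels
pd.concat([prices1, prices2, prices3], axis=1).reindex(df.index)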

Name,A,B,C,D
IJK,6.0,84.0,,
NAE,47.0,73.0,,25.0
OPQ,37.0,26.0,635.0,42.0
MNW,87.0,64.0,42.0,846.0
LWX,,,,933.0
MQE,,,,
ITD,,,,
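
The `join_axes` argument is not accepted by `pd.concat` in pandas 1.0 and later. A sketch of getting the same row selection by reindexing the concatenated result is shown below. The frames are rebuilt inline from the values displayed earlier, and `wanted` is a hypothetical stand-in for `df.index`, which is not shown in this notebook; its labels are taken from the output above.

```python
import pandas as pd

# Inline copies of the frames displayed above.
prices1 = pd.DataFrame(
    {"A": [6, 47, 37, 87], "B": [84, 73, 26, 64]},
    index=pd.Index(["IJK", "NAE", "OPQ", "MNW"], name="Name"),
)
prices2 = pd.DataFrame({"C": [635, 42]}, index=pd.Index(["OPQ", "MNW"], name="Name"))
prices3 = pd.DataFrame(
    {"D": [25, 42, 846, 933]},
    index=pd.Index(["NAE", "OPQ", "MNW", "LWX"], name="Name"),
)

# Hypothetical stand-in for df.index, using the row labels from the output above.
wanted = pd.Index(["IJK", "NAE", "OPQ", "MNW", "LWX", "MQE", "ITD"], name="Name")

# Outer-concatenate along the columns, then keep exactly the rows in `wanted`.
aligned = pd.concat([prices1, prices2, prices3], axis=1).reindex(wanted)
print(aligned)
```

Labels in `wanted` that appear in none of the frames (MQE and ITD here) come back as rows filled with NaN.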


## [Using append](http://pandas.pydata.org/pandas-docs/stable/merging.html#concatenating-using-append)

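The append method is a simpler form of concat that stacks rows along axis=0. DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the calls in this section require an older release; pd.concat covers the same cases in current versions.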

In [69]:
values1 = pd.read_csv('../data/values1.csv').set_index('ID')
values1

ID,Value1,Value2,Value3,Value4,Value5
A643,89,67,98,568,687
B843,73,53,827,43,286
C426,38,457,297,54,735


In [70]:
values2 = pd.read_csv('../data/values2.csv').set_index('ID')
values2

ID,Value1,Value2,Value3
H013,27,84,25
I725,84,92,31
J274,35,833,72


In [71]:
values1.append(values2)

ID,Value1,Value2,Value3,Value4,Value5
A643,89,67,98,568.0,687.0
B843,73,53,827,43.0,286.0
C426,38,457,297,54.0,735.0
H013,27,84,25,,
I725,84,92,31,,
J274,35,833,72,,
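
On pandas versions that no longer provide `DataFrame.append`, the same result can be obtained with `pd.concat`. This is a sketch using inline copies of the two frames displayed above, since the CSV files are not assumed to be available:

```python
import pandas as pd

# Inline copies of the frames displayed above.
values1 = pd.DataFrame(
    {"Value1": [89, 73, 38], "Value2": [67, 53, 457], "Value3": [98, 827, 297],
     "Value4": [568, 43, 54], "Value5": [687, 286, 735]},
    index=pd.Index(["A643", "B843", "C426"], name="ID"),
)
values2 = pd.DataFrame(
    {"Value1": [27, 84, 35], "Value2": [84, 92, 833], "Value3": [25, 31, 72]},
    index=pd.Index(["H013", "I725", "J274"], name="ID"),
)

# Stacks values2 under values1; columns missing from values2 are filled with NaN.
stacked = pd.concat([values1, values2])
print(stacked)
```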


## [Appending a single row to a DataFrame](http://pandas.pydata.org/pandas-docs/stable/merging.html#appending-rows-to-a-dataframe)
We can append a single row to a DataFrame by passing a Series or a dictionary to the append method. A dictionary carries no row label of its own, so ignore_index=True is required when passing one:

In [76]:
grades = pd.read_csv('../data/grades.csv')
grades

,Name,Math,Physics,Chemistry
0,Henry,5.8,9.3,4.3
1,Jack,8.7,9.2,5.6
2,Susan,5.5,7.6,8.4


In [78]:
moreGrade = {
    'Name': 'John',
    'Math': 8.3,
    'Physics': 4.8,
    'Chemistry': 7.7
}
grades.append(moreGrade, ignore_index=True)

,Name,Math,Physics,Chemistry
0,Henry,5.8,9.3,4.3
1,Jack,8.7,9.2,5.6
2,Susan,5.5,7.6,8.4
3,John,8.3,4.8,7.7
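
Where `DataFrame.append` is unavailable (pandas 2.0 and later), one way to add a single row is to wrap the dictionary in a one-row DataFrame and concatenate. The sketch below uses an inline copy of the grades table shown above; `more_grade` and `updated` are illustrative names:

```python
import pandas as pd

# Inline copy of the grades table displayed above.
grades = pd.DataFrame(
    {"Name": ["Henry", "Jack", "Susan"], "Math": [5.8, 8.7, 5.5],
     "Physics": [9.3, 9.2, 7.6], "Chemistry": [4.3, 5.6, 8.4]}
)
more_grade = {"Name": "John", "Math": 8.3, "Physics": 4.8, "Chemistry": 7.7}

# A list containing one dict becomes a one-row DataFrame;
# ignore_index=True renumbers the combined rows from 0.
updated = pd.concat([grades, pd.DataFrame([more_grade])], ignore_index=True)
print(updated)
```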


## [SQL-like merging/joining of DataFrame objects](http://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging)

In [79]:
d1 = pd.read_csv('../data/d1.csv')
d1

,ID,DMC,YEM,DIY
0,A345,3759,2478,9764
1,B620,7241,7643,4587
2,C458,4571,2458,8543


In [80]:
d2 = pd.read_csv('../data/d2.csv')
d2

,ID,MEI,IER
0,A345,3759,2478
1,B620,7241,7643


In [81]:
d3 = pd.read_csv('../data/d3.csv')
d3

,ID,KLO,PHE
0,C458,9246,1764
1,D993,2768,2478


In [83]:
pd.merge(d1, d2, how='outer')

,ID,DMC,YEM,DIY,MEI,IER
0,A345,3759,2478,9764,3759.0,2478.0
1,B620,7241,7643,4587,7241.0,7643.0
2,C458,4571,2458,8543,,


In [84]:
pd.merge(d1, d3, how='inner')

,ID,DMC,YEM,DIY,KLO,PHE
0,C458,4571,2458,8543,9246,1764


In [85]:
pd.merge(d1, d3, how='right')

,ID,DMC,YEM,DIY,KLO,PHE
0,C458,4571.0,2458.0,8543.0,9246,1764
1,D993,,,,2768,2478
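
The merges above rely on pandas picking the shared column ID as the key. As a sketch, the key can be named explicitly with `on`, and `indicator=True` adds a `_merge` column showing which side each row came from. The frames are inline copies of d1 and d3 as displayed above, and `both` is an illustrative name:

```python
import pandas as pd

# Inline copies of d1 and d3 as displayed above.
d1 = pd.DataFrame({"ID": ["A345", "B620", "C458"], "DMC": [3759, 7241, 4571],
                   "YEM": [2478, 7643, 2458], "DIY": [9764, 4587, 8543]})
d3 = pd.DataFrame({"ID": ["C458", "D993"], "KLO": [9246, 2768], "PHE": [1764, 2478]})

# on="ID" names the join key instead of relying on shared column names.
# indicator=True records "left_only", "right_only", or "both" for each row.
both = pd.merge(d1, d3, on="ID", how="outer", indicator=True)
print(both[["ID", "_merge"]])
```

Changing `how` to 'inner', 'left', or 'right' keeps only the rows whose `_merge` value matches that kind of join.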


### [The join function](http://pandas.pydata.org/pandas-docs/stable/merging.html#joining-on-index)

In [86]:
v1 = pd.read_csv('../data/v1.csv')
v1

,HJL,BMJ,YVJ
0,32,87,567
1,986,567,878
2,675,90,67
3,467,97,54


In [87]:
v2 = pd.read_csv('../data/v2.csv')
v2

,MNK,BNG,YUI
0,45,86,735
1,246,457,24
2,873,84,742
3,23,87,47


In [88]:
v3 = pd.read_csv('../data/v3.csv')
v3

,MNK,SMH,OPU
0,457,646,98
1,75,24,468
2,55,86,247
3,56,48,856


In [89]:
v1.join(v2)

,HJL,BMJ,YVJ,MNK,BNG,YUI
0,32,87,567,45,86,735
1,986,567,878,246,457,24
2,675,90,67,873,84,742
3,467,97,54,23,87,47


In [90]:
v2.join(v3)

ValueError: columns overlap but no suffix specified: Index(['MNK'], dtype='object')
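
The error arises because both frames have a column named MNK and `join` does not know how to tell them apart. One way around it, sketched here with inline copies of v2 and v3 as displayed above, is to pass `lsuffix` and `rsuffix`. The suffix strings themselves are arbitrary choices:

```python
import pandas as pd

# Inline copies of v2 and v3 as displayed above.
v2 = pd.DataFrame({"MNK": [45, 246, 873, 23], "BNG": [86, 457, 84, 87],
                   "YUI": [735, 24, 742, 47]})
v3 = pd.DataFrame({"MNK": [457, 75, 55, 56], "SMH": [646, 24, 86, 48],
                   "OPU": [98, 468, 247, 856]})

# The suffixes are applied only to the overlapping column name, MNK.
joined = v2.join(v3, lsuffix="_v2", rsuffix="_v3")
print(joined.columns.tolist())
```

Columns that appear in only one frame (BNG, YUI, SMH, OPU) keep their original names.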