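# Merging and Joining
Pandas provides full-featured, high-performance in-memory join operations that closely resemble those of relational databases such as SQL.
Pandas offers a single function, `merge`, as the entry point for all standard database join operations between DataFrame objects.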

In [1]:
import pandas as pd
left = pd.DataFrame({
        'id':[1,2,3,4,5],
        'Name':['Alex','Amy','Allen','Alice','Ayoung'],
        'subject_id':['sub1','sub2','sub4','sub6','sub5']
    })
right = pd.DataFrame({
        'id':[1,2,3,4,5],
        'Name':['Billy','Brain','Bran','Bryce','Betty'],
        'subject_id':['sub2','sub4','sub3','sub6','sub5']
    })
print(left, '\n')
print(right)

     Name  id subject_id
0    Alex   1       sub1
1     Amy   2       sub2
2   Allen   3       sub4
3   Alice   4       sub6
4  Ayoung   5       sub5 

    Name  id subject_id
0  Billy   1       sub2
1  Brain   2       sub4
2   Bran   3       sub3
3  Bryce   4       sub6
4  Betty   5       sub5


Merging two DataFrames on a single key

In [2]:
print(pd.merge(left, right, on='id'))

   Name_x  id subject_id_x Name_y subject_id_y
0    Alex   1         sub1  Billy         sub2
1     Amy   2         sub2  Brain         sub4
2   Allen   3         sub4   Bran         sub3
3   Alice   4         sub6  Bryce         sub6
4  Ayoung   5         sub5  Betty         sub5
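
The `_x` and `_y` endings in the output above are suffixes that `merge` appends to columns that share a name in both frames but are not join keys. As a minimal sketch, assuming the same sample frames, the `suffixes` parameter can supply more descriptive endings; the names `_left` and `_right` are chosen here only for illustration.

```python
import pandas as pd

# Same sample frames as in the cells above.
left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brain', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
})

# Replace the default ('_x', '_y') suffixes with illustrative ones.
merged = pd.merge(left, right, on='id', suffixes=('_left', '_right'))
print(sorted(merged.columns))
```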


Merging two DataFrames on multiple keys

In [3]:
print(pd.merge(left, right, on=['id', 'subject_id']))

   Name_x  id subject_id Name_y
0   Alice   4       sub6  Bryce
1  Ayoung   5       sub5  Betty
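
Only two rows survive because a multi-key merge matches rows on the whole key combination, not on each key separately. The sketch below, assuming the same sample frames, cross-checks this by intersecting the `(id, subject_id)` pairs by hand; the helper names `left_keys`, `right_keys` and `shared` are illustrative only.

```python
import pandas as pd

# Same sample frames as in the cells above.
left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brain', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
})

# Build the set of (id, subject_id) pairs on each side and intersect them.
left_keys = set(zip(left['id'], left['subject_id']))
right_keys = set(zip(right['id'], right['subject_id']))
shared = left_keys & right_keys

# The merge keeps exactly the rows whose key pair appears on both sides.
merged = pd.merge(left, right, on=['id', 'subject_id'])
merged_keys = set(zip(merged['id'], merged['subject_id']))
print(sorted(shared))
print(merged_keys == shared)
```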


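### 1. Merging with the `how` parameter
The `how` argument of `merge` specifies which keys are included in the result table. If a key combination does not appear in the left or the right table, the corresponding values in the joined table are NA. When `how` is omitted, `merge` performs an inner join.
The table below lists the `how` options and their SQL equivalents:

| Merge method | SQL equivalent | Description |
| :----: | :----: | :----: |
| left | LEFT OUTER JOIN | Use keys from the left object |
| right | RIGHT OUTER JOIN | Use keys from the right object |
| outer | FULL OUTER JOIN | Use the union of keys |
| inner | INNER JOIN | Use the intersection of keys |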
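
To make the four options concrete, here is a minimal sketch, assuming the same sample frames, that merges on `subject_id` with each `how` value and records the number of result rows. The dictionary name `row_counts` is illustrative only.

```python
import pandas as pd

# Same sample frames as in the cells above.
left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brain', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
})

# Count the result rows for each join type.
row_counts = {}
for how in ['left', 'right', 'outer', 'inner']:
    merged = pd.merge(left, right, on='subject_id', how=how)
    row_counts[how] = len(merged)

print(row_counts)
# {'left': 5, 'right': 5, 'outer': 6, 'inner': 4}
```

The counts follow from the keys: four `subject_id` values appear on both sides, `sub1` only on the left, and `sub3` only on the right.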

#### Left join

In [4]:
print(pd.merge(left, right, on='subject_id', how='left'))

   Name_x  id_x subject_id Name_y  id_y
0    Alex     1       sub1    NaN   NaN
1     Amy     2       sub2  Billy   1.0
2   Allen     3       sub4  Brain   2.0
3   Alice     4       sub6  Bryce   4.0
4  Ayoung     5       sub5  Betty   5.0
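
Note that `id_y` prints as `1.0`, `2.0`, and so on: the unmatched `sub1` row holds NaN, and a NumPy integer column cannot store NaN, so pandas converts the column to floating point. A short sketch, assuming the same sample frames, that inspects the resulting dtypes:

```python
import pandas as pd

# Same sample frames as in the cells above.
left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brain', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
})

merged = pd.merge(left, right, on='subject_id', how='left')

# id_x comes from the left frame and has no gaps, so it stays an integer column.
print(merged['id_x'].dtype)
# id_y has one missing value (the 'sub1' row), so it is stored as float.
print(merged['id_y'].dtype)
print(merged['id_y'].isna().sum())
```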


#### Right join

In [5]:
print(pd.merge(left, right, on='subject_id', how='right'))

   Name_x  id_x subject_id Name_y  id_y
0     Amy   2.0       sub2  Billy     1
1   Allen   3.0       sub4  Brain     2
2   Alice   4.0       sub6  Bryce     4
3  Ayoung   5.0       sub5  Betty     5
4     NaN   NaN       sub3   Bran     3


#### Outer join

In [8]:
print(pd.merge(left, right, how='outer', on='subject_id'))

   Name_x  id_x subject_id Name_y  id_y
0    Alex   1.0       sub1    NaN   NaN
1     Amy   2.0       sub2  Billy   1.0
2   Allen   3.0       sub4  Brain   2.0
3   Alice   4.0       sub6  Bryce   4.0
4  Ayoung   5.0       sub5  Betty   5.0
5     NaN   NaN       sub3   Bran   3.0
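
In an outer join it is often useful to know which side each row came from. One way to sketch this, assuming the same sample frames, is the `indicator=True` option of `merge`, which adds a `_merge` column with the values `left_only`, `right_only`, or `both`. The variable name `counts` is illustrative only.

```python
import pandas as pd

# Same sample frames as in the cells above.
left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brain', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
})

# indicator=True appends a '_merge' column describing each row's origin.
merged = pd.merge(left, right, on='subject_id', how='outer', indicator=True)
print(merged[['subject_id', '_merge']])

# Tally the rows by origin.
counts = merged['_merge'].value_counts().to_dict()
print(counts)
```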


#### Inner join

In [9]:
print(pd.merge(left, right, on='subject_id', how='inner'))

   Name_x  id_x subject_id Name_y  id_y
0     Amy     2       sub2  Billy     1
1   Allen     3       sub4  Brain     2
2   Alice     4       sub6  Bryce     4
3  Ayoung     5       sub5  Betty     5
